A novel machine learning approach to assess the quality of sensor data using an ensemble classification framework is presented in this paper. The quality of sensor data is indicated by discrete quality flags that indicate the level of uncertainty associated with a sensor reading. Depending on the domain and the problem under consideration, the level of uncertainty is different and thus unsupervised methods like outlier detection fails to match the expectation. The quality flags are normally assigned by domain experts. Considering the volume of sensor data, manual assignment is a laborious task and subject to human error. Given a representative set of labelled data, a supervised classification approach is thus a feasible alternative. The nature of sensor data, however, poses some challenges to the classification task. Data of dubious quality exists in such data sets with very small frequency leading to the class imbalance problem. We thus adopt a cluster oriented sampling approach to address the imbalance issue. In addition, it is beneficial to train multiple classifiers to improve the overall classification accuracy. We thus produce multiple under-sampled training sets using cluster oriented sampling and train base classifiers on each of them. Decisions produced by the base classifiers are fused into a single decision using majority voting. We have evaluated the proposed ensemble classification framework by assessing the quality of marine sensor data obtained from sensors situated at Sullivans Cove, Hobart, Australia. Experimental results reveal that the proposed framework agrees with expert judgement with high accuracy and achieves superior classification performance than other state-of-the-art approaches.
sensors, quality assessment of sensor data, ensemble classifier, class balancing, time series classification