Evaluation of missing value methods for predicting ambient BTEX concentrations in two neighbouring cities in Southwestern Ontario Canada
Miller, L and Xu, X and Wheeler, A and Zhang, T and Hamadani, M and Ejaz, U, Evaluation of missing value methods for predicting ambient BTEX concentrations in two neighbouring cities in Southwestern Ontario Canada, Atmospheric Environment, 181 pp. 126-134. ISSN 1352-2310 (2018) [Refereed Article]
High density air monitoring campaigns provide spatial patterns of pollutant concentrations which are integral in exposure assessment. Such analysis can assist with the determination of links between air quality and health outcomes, however, problems due to missing data can threaten to compromise these studies. This research evaluates four methods; mean value imputation, inverse distance weighting (IDW), inter-species ratios, and regression, to address missing spatial concentration data ranging from one missing data point up to 50% missing data. BTEX (benzene, toluene, ethylbenzene, and xylenes) concentrations were measured in Windsor and Sarnia, Ontario in the fall of 2005. Concentrations and inter-species ratios were generally similar between the two cities. Benzene (B) was observed to be higher in Sarnia, whereas toluene (T) and the T/B ratios were higher in Windsor. Using these urban, industrialized cities as case studies, this research demonstrates that using inter-species ratios or regression of the data for which there is complete information, along with one measured concentration (i.e. benzene) to predict for missing concentrations (i.e. TEX) results in good agreement between predicted and measured values. In both cities, the general trend remains that best agreement is observed for the leave-one-out scenario, followed by 10% and 25% missing, and the least agreement for the 50% missing cases. In the absence of any known concentrations IDW can provide reasonable agreement between observed and estimated concentrations for the BTEX species, and was superior over mean value imputation which was not able to preserve the spatial trend. The proposed methods can be used to fill in missing data, while preserving the general characteristics and rank order of the data which are sufficient for epidemiologic studies.
air quality, VOCs, BTEX, inter-specie ratios, benzene, missing values, imputation