In epidemiological studies, long-term exposure to air pollution is often estimated from measurements at fixed monitoring stations, which are interpolated to the residence of survey participants using Geographical Information Systems (GIS). However, obtaining georeferenced address data from national registries requires a long and cumbersome administrative procedure, since this kind of personal data is protected by privacy regulations. This paper aims to assess whether information collected in health interview surveys, including air pollution annoyance, can be used to build prediction models of individual long-term exposure to air pollution, removing the need for residential address data.
Analyses were based on data from the Belgian Health Interview Survey (BHIS) 2013, linked to GIS-modelled air pollution exposure at the place of residence of participants older than 15 years (n = 9347). First, univariate linear regressions were performed to assess the relationship between air pollution annoyance and modelled exposure to each air pollutant. Second, a multivariable linear regression was fitted for each air pollutant, using a set of variables selected by elastic net with cross-validation from candidates related to environmental annoyance and to the socio-economic and health status of participants. Finally, the ability of the models to classify individuals into three levels of exposure was assessed by means of a confusion matrix.
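The final classification step can be illustrated with a minimal sketch, using the Python standard library only. The exposure values below are hypothetical and are not BHIS data. The use of tertiles of the GIS-modelled exposure as cut points is an assumption made for illustration, and the names exposure_levels and confusion_matrix are likewise illustrative.

```python
from bisect import bisect_right
from statistics import quantiles


def exposure_levels(values, cut_points):
    """Map each exposure value to 0 (low), 1 (medium) or 2 (high)."""
    return [bisect_right(cut_points, v) for v in values]


def confusion_matrix(observed, predicted, n_levels=3):
    """Rows: reference (GIS-modelled) level; columns: predicted level."""
    matrix = [[0] * n_levels for _ in range(n_levels)]
    for obs_level, pred_level in zip(observed, predicted):
        matrix[obs_level][pred_level] += 1
    return matrix


# Hypothetical exposure values (e.g. annual mean concentrations), not BHIS data.
gis_modelled = [8.0, 9.5, 11.0, 12.5, 14.0, 15.5, 17.0, 18.5, 20.0]
regression_predicted = [9.0, 9.0, 13.0, 12.0, 13.5, 14.5, 16.0, 19.0, 21.0]

# Assumption: tertiles of the reference exposure define the three levels.
cut_points = quantiles(gis_modelled, n=3)

obs = exposure_levels(gis_modelled, cut_points)
pred = exposure_levels(regression_predicted, cut_points)

matrix = confusion_matrix(obs, pred)
# Share of individuals assigned to the same level by both approaches.
accuracy = sum(matrix[i][i] for i in range(3)) / len(obs)

for row in matrix:
    print(row)
print(f"overall agreement: {accuracy:.2f}")
```

The diagonal of the matrix counts individuals placed in the same exposure level by the reference and by the prediction model. Off-diagonal cells show whether errors fall between adjacent levels (low versus medium) or between the extremes (low versus high).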
Our results suggest that self-reported air pollution annoyance has limited validity as a direct proxy for air pollution exposure, and that environmental annoyance variables contribute only weakly to the prediction models. Models using variables related to socio-economic status, region, urban level and environmental annoyance predict individual air pollution exposure with a percentage of error ranging from 8% to 18%. Although these models do not provide very accurate predictions of absolute exposure to air pollution, they do allow individuals to be classified into groups of relative exposure, ranking participants from low through medium to high air pollution exposure. Such models offer a rapid assessment tool for identifying the groups of BHIS participants experiencing the highest levels of environmental stress.