
Research Article
Characterizing Air Quality in Urban Areas with Mobile Measurement and High Resolution Open Spatial Data: Comparison of Different Machine-Learning Approaches Using a Visual Interface
@INPROCEEDINGS{10.1007/978-3-030-51005-3_12,
  author={Yao Shen and Stephan Lehmler and Syed Monjur Murshed and Till Riedel},
  title={Characterizing Air Quality in Urban Areas with Mobile Measurement and High Resolution Open Spatial Data: Comparison of Different Machine-Learning Approaches Using a Visual Interface},
  proceedings={Science and Technologies for Smart Cities. 5th EAI International Summit, SmartCity360, Braga, Portugal, December 4-6, 2019, Proceedings},
  proceedings_a={SMARTCITY},
  year={2020},
  month={7},
  keywords={Air quality, Land-use regression, Dashboard, Machine-learning},
  doi={10.1007/978-3-030-51005-3_12}
}
Authors: Yao Shen, Stephan Lehmler, Syed Monjur Murshed, Till Riedel
Year: 2020
SMARTCITY
Springer
DOI: 10.1007/978-3-030-51005-3_12
Abstract
Air quality is one of the most important topics in urban life, as it is of great significance for human health and urban planning. However, accurate assessment and prediction of air quality in urban areas are difficult. Major cities typically have only a limited number of air quality monitoring stations, so inferring air quality in the unsampled areas throughout the city is challenging. Moreover, air quality varies non-linearly across urban areas; it is highly spatially dependent and considerably influenced by multiple factors, such as building distribution, traffic conditions and land use.
In this research, we model air quality in the city of Augsburg using spatial features and high-quality sensor data. We identify high-resolution spatial features, such as the types and areas of different land uses and the road network.
We integrate openly available data into the air quality prediction. To this end, we compare a simple baseline model with linear regression models (Ordinary Least Squares and Ridge Regression) and tree-based machine-learning models (Gradient Boosting and Random Forest). In our evaluation, given the non-linearity of the data, the tree-based models outperform all linear models, which are commonly used in the literature.
In addition, we created an interactive and visual dashboard. This dashboard demonstrates the analytical workflow, gives insight into model performance and uncertainty and visualizes the results.
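The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic data rather than the Augsburg measurements, and the feature names (road density, built-up fraction, green fraction), the response formula and the helper names `make_synthetic_data` and `compare_models` are all hypothetical. It only shows how a mean baseline, two linear models and two tree-based models might be scored under the same cross-validation when the response is non-linear.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score


def make_synthetic_data(n_samples=400, seed=0):
    """Generate a hypothetical land-use data set with a non-linear response.

    Columns of X stand in for road density, built-up fraction and green
    fraction. These are illustrative assumptions, not the paper's features.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, 3))
    # Non-linear "pollutant concentration": an interaction term plus a sine term.
    y = (
        20.0
        + 15.0 * X[:, 0] * X[:, 1]
        + 10.0 * np.sin(6.0 * X[:, 2])
        + rng.normal(0.0, 1.0, size=n_samples)
    )
    return X, y


def compare_models(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated R^2 for each model family."""
    models = {
        "baseline_mean": DummyRegressor(strategy="mean"),
        "ols": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    # The same shuffled folds are used for every model so scores are comparable.
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
        scores[name] = float(r2.mean())
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, score in compare_models(X, y).items():
        print(f"{name:>18s}: mean R^2 = {score:.3f}")
```

On real mobile measurements, plain K-fold splitting may overstate performance because neighbouring samples are spatially correlated; a grouped or spatially blocked split (for example scikit-learn's `GroupKFold`) would be a more cautious choice.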