
Research Article
Is Adding More Modalities Better in a Multimodal Spatio-temporal Prediction Scenario? A Case Study on Japan Air Quality
@INPROCEEDINGS{10.1007/978-3-030-94822-1_22,
  author={Yutaro Mishima and Guillaume Habault and Shinya Wada},
  title={Is Adding More Modalities Better in a Multimodal Spatio-temporal Prediction Scenario? A Case Study on Japan Air Quality},
  proceedings={Mobile and Ubiquitous Systems: Computing, Networking and Services. 18th EAI International Conference, MobiQuitous 2021, Virtual Event, November 8-11, 2021, Proceedings},
  proceedings_a={MOBIQUITOUS},
  year={2022},
  month={2},
  keywords={Multimodal Spatio-temporal Air quality Location GPS},
  doi={10.1007/978-3-030-94822-1_22}
}
Yutaro Mishima
Guillaume Habault
Shinya Wada
Year: 2022
MOBIQUITOUS
Springer
DOI: 10.1007/978-3-030-94822-1_22
Abstract
Nowadays, several spatio-temporal datasets are available for research purposes (e.g., location, traffic, or meteorology datasets). These datasets are increasingly used as multimodal inputs to neural networks for spatio-temporal prediction. However, few methods include functions that explicitly capture cross-modal relationships. This lack will become a serious problem when more complex modalities, and the dependencies among them, need to be taken into consideration. Since more spatio-temporal datasets are expected to become available in the future, it is crucial to tackle this problem. In this paper, we conduct preliminary experiments to check whether an existing multimodal spatio-temporal network performs better when another modality is added. These experiments compare air quality forecasting performance on a trimodal spatio-temporal dataset. The comparison covers several methods, in particular one that has been modified to handle multiple modalities. Based on the results, we confirm that prediction performance does not improve when another modality is simply added. Therefore, methods that capture complex cross-modal relationships are required.