
Research Article
A Study on Effectiveness of BERT Models and Task-Conditioned Reasoning Strategy for Medical Visual Question Answering
@INPROCEEDINGS{10.1007/978-3-031-29126-5_5, author={Chau Nguyen and Tung Le and Nguyen-Khang Le and Trung-Tin Pham and Le-Minh Nguyen}, title={A Study on Effectiveness of BERT Models and Task-Conditioned Reasoning Strategy for Medical Visual Question Answering}, proceedings={Artificial Intelligence for Communications and Networks. 4th EAI International Conference, AICON 2022, Hiroshima, Japan, November 30 - December 1, 2022, Proceedings}, proceedings_a={AICON}, year={2023}, month={3}, keywords={Medical visual question answering Visual question answering Task-conditioned reasoning Conditional reasoning}, doi={10.1007/978-3-031-29126-5_5} }
- Chau Nguyen
Tung Le
Nguyen-Khang Le
Trung-Tin Pham
Le-Minh Nguyen
Year: 2023
A Study on Effectiveness of BERT Models and Task-Conditioned Reasoning Strategy for Medical Visual Question Answering
AICON
Springer
DOI: 10.1007/978-3-031-29126-5_5
Abstract
Medical visual question answering task requires a framework to understand a medical question in natural language and examine the corresponding image to produce the answer to the question. The common framework consists of a language understanding module, a visual understanding module, a signal fusion module, and an answer prediction module. Most existing works employed recurrent neural network-based models for the language understanding module. However, these approaches may not produce robust text presentations and are hard to interpret. On the other hand, BERT models are more robust for text representation and can provide a clue for interpretability via the attention weights between the words. Besides, as the questions consist of closed-answer questions and open-answer questions, the task-conditioned reasoning strategy was proposed to handle each type of question separately while maintaining several modules in the framework to be shared. In this paper, we investigate the effectiveness of pre-trained BERT models and the task-conditioned reasoning strategy for the task of medical visual question answering on the VQA-RAD dataset. Experimental results demonstrate improvements when pre-trained BERT models are combined with the task-conditioned reasoning strategy.