
Research Article
VMHQA: A Vietnamese Multi-choice Dataset for Mental Health Domain Question Answering
@ARTICLE{10.4108/eetsis.7678,
  author={Tu Anh Hoang Nguyen and Quang-Dieu Nguyen and Harius M. Nguyen and Alfred Hoang Nguyen and LOAN Nguyen},
  title={VMHQA: A Vietnamese Multi-choice Dataset for Mental Health Domain Question Answering},
  journal={EAI Endorsed Transactions on Scalable Information Systems},
  volume={12},
  number={4},
  publisher={EAI},
  journal_a={SIS},
  year={2025},
  month={9},
  keywords={VMHQA, Mental Health Dataset, Vietnamese Multiple-Choice Question Answering (MCQA), BERT-based Models, NLP in Mental Health, Retrieval-Augmented Generation (RAG), Agentic Chunking, Large Language Models (LLMs)},
  doi={10.4108/eetsis.7678}
}
- Tu Anh Hoang Nguyen
- Quang-Dieu Nguyen
- Harius M. Nguyen
- Alfred Hoang Nguyen
- LOAN Nguyen
Year: 2025
Journal: EAI Endorsed Transactions on Scalable Information Systems (SIS)
Publisher: EAI
DOI: 10.4108/eetsis.7678
Abstract
This paper introduces VMHQA, a Vietnamese Multiple-Choice Question Answering (MCQA) dataset designed to address critical gaps in mental health resources, particularly in low- and middle-income countries like Vietnam. The dataset comprises 10,000 meticulously curated records across 1,166 mental health subjects, including 249 topics in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and 8,599 contextual paragraphs. Each record adheres to the United States Medical Licensing Examination (USMLE) format, with targeted questions, correct answers, multiple-choice options, and supporting paragraphs from reputable sources such as academic journals and local hospital websites, further inspected by prestigious mental hospitals in Vietnam. VMHQA thus provides a reliable, structured foundation for pre-consultation tools, allowing for early psychological intervention for those concerned about mental health issues. This study also goes beyond data collection to evaluate the effectiveness of VMHQA using cutting-edge machine learning models, such as BERT-based architectures, large language models (LLMs) ranging from 7 to 9 billion parameters, and various generative pre-trained transformer (GPT) frameworks. In addition, we look at how Retrieval-Augmented Generation (RAG) combined with Agentic Chunking can improve the accuracy and interpretability of responses in this specialised domain. The retrieval mechanisms of RAG are examined explicitly for their ability to generate contextually accurate answers sensitive to psychological nuances. Our findings shed light on the effectiveness of these advanced models in handling complex, domain-specific question-answering tasks in mental health, highlighting their potential to make mental health care more accessible and reliable for Vietnamese-speaking communities. VMHQA thus represents a significant step toward making mental health care more accessible, offering hope for improved mental health outcomes.
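As a rough illustration of the record structure described in the abstract (a question, multiple-choice options, a correct answer, and a supporting paragraph), the Python sketch below shows one way such a record and a plain accuracy score could be represented. The class name `MCQARecord`, every field name, and the `accuracy` helper are hypothetical and chosen only for this example; the abstract does not specify the dataset's actual schema or the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MCQARecord:
    """One multiple-choice record. All field names are hypothetical."""
    subject: str              # mental health subject the question belongs to
    question: str             # the targeted question
    options: Tuple[str, ...]  # the multiple-choice options
    answer_index: int         # position of the correct option in `options`
    context: str              # supporting paragraph for the answer

    def __post_init__(self) -> None:
        # Reject records whose answer does not point at an existing option.
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index must point at one of the options")


def accuracy(records: Sequence[MCQARecord], predictions: Sequence[int]) -> float:
    """Fraction of records whose predicted option index equals the answer."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    if not records:
        return 0.0
    correct = sum(r.answer_index == p for r, p in zip(records, predictions))
    return correct / len(records)
```

Under these assumptions, a model's output for each record reduces to a single option index, so the same `accuracy` function could score a BERT-based classifier, an LLM, or a RAG pipeline once their answers are mapped to indices.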
Copyright © 2024 Tu Anh Hoang Nguyen et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.