
Research Article
The Impacts of the Contextual Substitutions in Vietnamese Micro-text Augmentation
@INPROCEEDINGS{10.1007/978-3-030-92942-8_3,
    author={Huu-Thanh Duong and Trung-Kiet Tran},
    title={The Impacts of the Contextual Substitutions in Vietnamese Micro-text Augmentation},
    proceedings={Nature of Computation and Communication. 7th EAI International Conference, ICTCC 2021, Virtual Event, October 28--29, 2021, Proceedings},
    proceedings_a={ICTCC},
    year={2022},
    month={1},
    keywords={Data augmentation, Deep learning, Sentiment analysis, Contextual substitution},
    doi={10.1007/978-3-030-92942-8_3}
}
- Huu-Thanh Duong
- Trung-Kiet Tran
Year: 2022
ICTCC
Springer
DOI: 10.1007/978-3-030-92942-8_3
Abstract
Deep learning models rely on a large amount of annotated training data to learn multiple layers of features or representations and to avoid overfitting. However, annotated datasets are often unavailable, especially for low-resource languages, and building them is tedious, time-consuming, and expensive. Data augmentation has therefore been proposed as a promising approach for generating annotated data from limited data without user intervention. In this paper, we evaluate the importance and impact of contextual words in enhancing the training data, using a pre-trained model that we build from reviews extracted from Vietnamese e-commerce websites. We experiment on the sentiment analysis problem to evaluate the effectiveness of our approach.
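The general idea of contextual substitution can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their pre-trained model, so a toy table of neighbor counts stands in for it here. The function names, the `top_k` parameter, and the English placeholder reviews are all assumptions made for illustration. Each word is replaced by other words observed between the same left and right neighbors, and the new sentence inherits the original label.

```python
from collections import Counter, defaultdict

PAD_L, PAD_R = "<s>", "</s>"  # hypothetical boundary markers


def build_context_model(sentences):
    """Count which words occur between each (left, right) neighbor pair.

    Toy stand-in for a pre-trained contextual model (assumption).
    """
    model = defaultdict(Counter)
    for sentence in sentences:
        toks = [PAD_L] + sentence.split() + [PAD_R]
        for i in range(1, len(toks) - 1):
            model[(toks[i - 1], toks[i + 1])][toks[i]] += 1
    return model


def contextual_substitutions(sentence, label, model, top_k=2):
    """Return (new_sentence, label) pairs with one word swapped per pair."""
    toks = [PAD_L] + sentence.split() + [PAD_R]
    augmented = []
    for i in range(1, len(toks) - 1):
        seen = model.get((toks[i - 1], toks[i + 1]), Counter())
        # Keep the most frequent alternatives, excluding the original word.
        candidates = [w for w, _ in seen.most_common() if w != toks[i]][:top_k]
        for word in candidates:
            new_toks = toks[1:i] + [word] + toks[i + 1:-1]
            # Assumption: the substitution does not change the sentiment label.
            augmented.append((" ".join(new_toks), label))
    return augmented


# Placeholder reviews, not data from the paper.
corpus = [
    "the phone is very good",
    "the phone is very cheap",
    "the case is very good",
    "delivery was very fast",
]
model = build_context_model(corpus)
augmented = contextual_substitutions("the phone is very good", "positive", model)
for text, lbl in augmented:
    print(lbl, "|", text)
```

A substitution chosen only by context can flip the sentiment (for example, swapping a positive adjective for a negative one that fits the same slot), so copying the label as done here is a simplification that a real pipeline would need to check.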