Research Article
Research on Dense Enhanced Document Retrieval Based on G-mixup
@INPROCEEDINGS{10.4108/eai.2-6-2023.2334610,
  author={Jiawei Tang and Junping Liu},
  title={Research on Dense Enhanced Document Retrieval Based on G-mixup},
  proceedings={Proceedings of the 2nd International Conference on Information Economy, Data Modeling and Cloud Computing, ICIDC 2023, June 2--4, 2023, Nanchang, China},
  publisher={EAI},
  proceedings_a={ICIDC},
  year={2023},
  month={8},
  keywords={mixup dense document retrieval graph convolutional neural},
  doi={10.4108/eai.2-6-2023.2334610}
}
- Jiawei Tang
- Junping Liu
Year: 2023
ICIDC
EAI
DOI: 10.4108/eai.2-6-2023.2334610
Abstract
Dense document retrieval models based on Mixup treat words as independent units, which severs the connections between words and ignores the semantic information of the text; they also suffer from insufficient labeled training data. To address these problems, this paper proposes GDAR (Graph Data Augment Retrieval), a dense document retrieval model enhanced with G-mixup graph data augmentation. The model first uses a graph convolutional neural network to convert queries and documents into graph data; it then uses document graphs of the same type to construct a graph generator (graphon); finally, it mixes the graphons in Euclidean space to obtain new graphons, applying linear interpolation and perturbation operations to them to produce new training data with soft labels, which addresses the lack of labeled data in dense document retrieval models. Experiments on the Natural Questions and TriviaQA datasets show that the method improves the T-1 accuracy metric by 4.12% and 4.88%, respectively, over the best baseline method.
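The graphon-mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`estimate_graphon`, `mix_graphons`, `sample_graph`), the degree-sorted block-averaging estimator, and the Bernoulli sampling are all assumptions chosen for brevity, and the graph convolutional encoding and perturbation steps are omitted.

```python
import numpy as np

def estimate_graphon(adjs, res):
    """Assumed estimator: sort nodes by degree, then average edge density
    over a res x res grid of node blocks. Each graph needs >= res nodes."""
    est = np.zeros((res, res))
    for a in adjs:
        n = a.shape[0]
        order = np.argsort(-a.sum(axis=1), kind="stable")  # align by degree
        a = a[np.ix_(order, order)]
        bins = np.arange(n) * res // n        # block index of each node
        p = np.zeros((n, res))
        p[np.arange(n), bins] = 1.0           # node-to-block indicator
        counts = p.sum(axis=0)
        est += (p.T @ a @ p) / np.outer(counts, counts)
    return est / len(adjs)

def mix_graphons(w_a, w_b, y_a, y_b, lam):
    """Linear interpolation of two graphons and their one-hot labels."""
    return lam * w_a + (1 - lam) * w_b, lam * y_a + (1 - lam) * y_b

def sample_graph(w, n, rng):
    """Draw an undirected n-node graph whose edge probabilities come from w."""
    res = w.shape[0]
    idx = np.minimum((rng.random(n) * res).astype(int), res - 1)
    probs = w[np.ix_(idx, idx)]
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # no self-loops
    return (upper | upper.T).astype(int)
```

Under these assumptions, each class of document graphs yields one graphon, and a mixed graphon with weight `lam` produces synthetic graphs that carry the soft label `lam * y_a + (1 - lam) * y_b`.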