Proceedings of the 2nd International Conference on Information Economy, Data Modeling and Cloud Computing, ICIDC 2023, June 2–4, 2023, Nanchang, China

Research Article

Research on Dense Enhanced Document Retrieval Based on G-mixup

@INPROCEEDINGS{10.4108/eai.2-6-2023.2334610,
    author={Jiawei Tang and Junping Liu},
    title={Research on Dense Enhanced Document Retrieval Based on G-mixup},
    proceedings={Proceedings of the 2nd International Conference on Information Economy, Data Modeling and Cloud Computing, ICIDC 2023, June 2--4, 2023, Nanchang, China},
    publisher={EAI},
    proceedings_a={ICIDC},
    year={2023},
    month={8},
    keywords={mixup; dense document retrieval; graph convolutional neural network},
    doi={10.4108/eai.2-6-2023.2334610}
}
    
Jiawei Tang1,*, Junping Liu1
  • 1: Wuhan Textile University
*Contact email: 2359451809@qq.com

Abstract

Dense document retrieval models based on Mixup treat words as independent units, severing the connections between words and ignoring the semantic information of the text; they also suffer from insufficient labeled training data. To address these problems, this paper proposes GDAR (Graph Data Augment Retrieval), a dense document retrieval model with graph-based data augmentation built on G-mixup. The model first uses a graph convolutional neural network to convert queries and documents into graph data; it then constructs a graph generator (graphon) from the document graphs of each class; finally, it mixes the class graphons in Euclidean space to obtain new graphons, and applies linear interpolation and perturbation to them to generate new training data with soft labels, alleviating the shortage of labeled data in dense document retrieval models. Experiments on the Natural Questions and TriviaQA datasets show that the method improves Top-1 accuracy by 4.12% and 4.88%, respectively, over the best baseline method.
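
The augmentation pipeline summarized above (estimate a graphon per document class, mix graphons in Euclidean space, then sample new graphs with soft labels) can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a simple zero-padded averaging estimator for the graphon and Bernoulli edge sampling, and all function and parameter names (random_adj, estimate_graphon, gmixup, lam) are illustrative.

    import numpy as np

    def random_adj(n, rng):
        """Random symmetric 0/1 adjacency matrix, used only for this toy example."""
        upper = np.triu(rng.integers(0, 2, (n, n)), 1)
        return upper + upper.T

    def estimate_graphon(adj_matrices, size):
        """Estimate a class graphon as the average of zero-padded adjacency matrices."""
        padded = []
        for a in adj_matrices:
            p = np.zeros((size, size))
            n = a.shape[0]
            p[:n, :n] = a
            padded.append(p)
        return np.mean(padded, axis=0)  # entries in [0, 1] act as edge probabilities

    def gmixup(graphon_a, graphon_b, lam, n_nodes, rng):
        """Mix two class graphons and sample one synthetic graph plus its soft label."""
        mixed = lam * graphon_a + (1.0 - lam) * graphon_b       # linear interpolation
        mixed += rng.normal(scale=0.01, size=mixed.shape)       # small perturbation (assumed)
        mixed = np.clip(mixed, 0.0, 1.0)
        probs = mixed[:n_nodes, :n_nodes]
        adj = (rng.random((n_nodes, n_nodes)) < probs).astype(float)
        adj = np.triu(adj, 1)
        adj = adj + adj.T                                       # undirected synthetic graph
        soft_label = np.array([lam, 1.0 - lam])                 # soft label over the two classes
        return adj, soft_label

    # Usage: lam drawn from a Beta distribution, as is common for mixup-style methods.
    rng = np.random.default_rng(0)
    class_a = [random_adj(8, rng) for _ in range(4)]
    class_b = [random_adj(8, rng) for _ in range(4)]
    W_a, W_b = estimate_graphon(class_a, 10), estimate_graphon(class_b, 10)
    lam = rng.beta(0.2, 0.2)
    adj_new, y_new = gmixup(W_a, W_b, lam, n_nodes=10, rng=rng)

In this sketch, the adjacency matrices stand in for the query/document graphs that GDAR builds with a graph convolutional neural network; the Beta-distributed mixing coefficient and the perturbation scale are assumptions, not values reported in the paper.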