
Research Article
Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus
@INPROCEEDINGS{10.1007/978-3-030-94551-0_3,
  author={Ri-han Wu and Yi-jie Cao},
  title={Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus},
  proceedings={Advanced Hybrid Information Processing. 5th EAI International Conference, ADHIP 2021, Virtual Event, October 22-24, 2021, Proceedings, Part I},
  proceedings_a={ADHIP},
  year={2022},
  month={1},
  keywords={Corpus Language Information retrieval},
  doi={10.1007/978-3-030-94551-0_3}
}
- Ri-han Wu
- Yi-jie Cao
Year: 2022
ADHIP
Springer
DOI: 10.1007/978-3-030-94551-0_3
Abstract
Cross-language information retrieval (CLIR) focuses on how to use a query expressed in one language to search for information expressed in another language. One of its key problems is which methods to adopt to establish bilingual semantic correspondence. In recent years, topic models have become an effective method in machine learning, information retrieval, and natural language processing. This paper systematically studies a cross-language retrieval model, a cross-language text classification method, and a cross-language text clustering method. Without the help of cross-language resources such as machine translation and bilingual dictionaries, the method can effectively solve the many-to-many problem of vocabulary translation in CLIR and the problem of partial decomposition of unknown words. Experimental results on the cross-language text classification evaluation corpus established in this paper show that, on the bilingual topic space constructed by this method, the performance of cross-language and single-language text classification is close to or better than that of single-language classification on the original feature space, and the performance of cross-language text clustering is close to or better than that of single-language document clustering.