Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus

Ri-han Wu; Yi-jie Cao

Advanced Hybrid Information Processing. 5th EAI International Conference, ADHIP 2021, Virtual Event, October 22-24, 2021, Proceedings, Part I

Research Article


@INPROCEEDINGS{10.1007/978-3-030-94551-0_3,
    author={Ri-han Wu and Yi-jie Cao},
    title={Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus},
    proceedings={Advanced Hybrid Information Processing. 5th EAI International Conference, ADHIP 2021, Virtual Event, October 22-24, 2021, Proceedings, Part I},
    proceedings_a={ADHIP},
    year={2022},
    month={1},
    keywords={Corpus Language Information retrieval},
    doi={10.1007/978-3-030-94551-0_3}
}


Ri-han Wu¹,*, Yi-jie Cao²

1: School of Chinese Language and Literature, Northwest Minzu University
2: School of Ethnology and Sociology, Northwest Minzu University

*Contact email: wurihan21322@yeah.net

Abstract

Cross-language information retrieval (CLIR) studies how to use a query expressed in one language to retrieve information expressed in another language. A key problem is establishing bilingual semantic correspondence, for which different methods can be adopted. In recent years, topic models have become an effective tool in machine learning, information retrieval, and natural language processing. This paper systematically studies a cross-language retrieval model, a cross-language text classification method, and a cross-language text clustering method. Without relying on cross-language resources such as machine translation systems or bilingual dictionaries, the proposed approach can effectively address the many-to-many problem of vocabulary translation in CLIR and the problem of partial decomposition of unknown words. Experiments on the cross-language text classification evaluation corpus built for this paper show that, in the bilingual topic space constructed by this method, both cross-language and monolingual text classification perform close to or better than monolingual classification in the original feature space. Likewise, cross-language text clustering performs close to or better than monolingual document clustering.
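The abstract does not say how the bilingual topic space is used once it is built. As a minimal sketch, assuming every query and document has already been mapped to a probability distribution over the same K shared bilingual topics (the topic-inference step is not shown, and all vectors below are made up), cross-language retrieval reduces to ranking target-language documents by a distribution similarity, and cross-language classification can reuse labelled documents from one language through nearest-centroid matching. Hellinger similarity and the nearest-centroid classifier are choices made here for illustration, not necessarily the paper's; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: each document/query is a topic distribution theta over K topics
# shared by both languages. How theta is inferred is NOT shown here.
TopicVec = Sequence[float]


def hellinger_similarity(p: TopicVec, q: TopicVec) -> float:
    """Return 1 - Hellinger distance between two topic distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same number of topics")
    # Bhattacharyya coefficient; for normalized Hellinger distance H, H^2 = 1 - BC.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def rank_documents(query: TopicVec, docs: Dict[str, TopicVec]) -> List[Tuple[str, float]]:
    """Rank target-language documents by topic similarity to a source-language query."""
    scored = [(doc_id, hellinger_similarity(query, theta)) for doc_id, theta in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def nearest_centroid_label(doc: TopicVec, labelled: Dict[str, List[TopicVec]]) -> str:
    """Classify a document using class centroids built from (possibly) another language."""
    if not labelled:
        raise ValueError("need at least one labelled class")
    best_label, best_sim = "", -1.0
    for label, vecs in labelled.items():
        k = len(vecs[0])
        # The mean of probability vectors is itself a probability vector.
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(k)]
        sim = hellinger_similarity(doc, centroid)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


if __name__ == "__main__":
    # Hypothetical 3-topic vectors: a query in language A, documents in language B.
    query_a = [0.7, 0.2, 0.1]
    docs_b = {"d1": [0.1, 0.1, 0.8], "d2": [0.6, 0.3, 0.1]}
    print(rank_documents(query_a, docs_b)[0][0])  # d2
```

Because both languages share one topic space, no translation step appears anywhere in the ranking or classification code; that is the property the abstract attributes to the bilingual topic approach.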

Keywords: Corpus, Language, Information retrieval

Published: 2022-01-18
Appears in: SpringerLink

