Context-Aware Systems and Applications. 4th International Conference, ICCASA 2015, Vung Tau, Vietnam, November 26-27, 2015, Revised Selected Papers

Research Article

Indexing Based on Topic Modeling and MATHML for Building Vietnamese Technical Document Retrieval Effectively

  • @INPROCEEDINGS{10.1007/978-3-319-29236-6_31,
        author={Tuan Xuan and Linh Khanh and Hung Trung and Ha Thu and Tinh Thanh},
        title={Indexing Based on Topic Modeling and MATHML for Building Vietnamese Technical Document Retrieval Effectively},
        proceedings={Context-Aware Systems and Applications. 4th International Conference, ICCASA 2015, Vung Tau, Vietnam, November 26-27, 2015, Revised Selected Papers},
        proceedings_a={ICCASA},
        year={2016},
        month={4},
        keywords={MathML, Topic modeling, Vietnamese technical text, Search engine, Information retrieval},
        doi={10.1007/978-3-319-29236-6_31}
    }
    
  • Tuan Xuan
    Linh Khanh
    Hung Trung
    Ha Thu
    Tinh Thanh
    Year: 2016
    Indexing Based on Topic Modeling and MATHML for Building Vietnamese Technical Document Retrieval Effectively
    ICCASA
    Springer
    DOI: 10.1007/978-3-319-29236-6_31
Tuan Xuan1,*, Linh Khanh2,*, Hung Trung3,*, Ha Thu2,*, Tinh Thanh4,*
  • 1: Vietnam Ministry of Education and Training
  • 2: Electric Power University
  • 3: Danang University
  • 4: Le Quy Don Technical University
*Contact email: cxtuan@moet.edu.vn, linhbk@epu.edu.vn, vthung@dut.udn.vn, hantt@epu.edu.vn, tinhdt@mta.edu.vn

Abstract

The growth of data on the Internet has given people access to a great deal of information, and it has also raised important problems in information retrieval. Alongside this growth, search engines have been developed to serve users’ needs. Users can retrieve information by content, by keyword, or by whatever else they need. However, the amount of data on the Internet is so large that a single query often returns millions or hundreds of millions of results. Consequently, in a narrow field it is difficult to find relevant information, especially technical information that contains formulas. In this paper, we present a method for building a Vietnamese technical text retrieval system that uses topic modeling and MathML for indexing. The system was built and tested on more than 500 Vietnamese technical texts, and the results show that it meets users’ requirements for accuracy and speed.