Machine Learning and Intelligent Communications. 4th International Conference, MLICOM 2019, Nanjing, China, August 24–25, 2019, Proceedings

Research Article

A Method of Calculating the Semantic Similarity Between English and Chinese Concepts

  • @INPROCEEDINGS{10.1007/978-3-030-32388-2_27,
        author={Jingwen Cao and Tiexin Wang and Wenxin Li and Chuanqi Tao},
        title={A Method of Calculating the Semantic Similarity Between English and Chinese Concepts},
        proceedings={Machine Learning and Intelligent Communications. 4th International Conference, MLICOM 2019, Nanjing, China, August 24--25, 2019, Proceedings},
        proceedings_a={MLICOM},
        year={2019},
        month={10},
        keywords={HowNet, MongoDB, Semantic similarity, Knowledge driven},
        doi={10.1007/978-3-030-32388-2_27}
    }
    
  • Jingwen Cao
    Tiexin Wang
    Wenxin Li
    Chuanqi Tao
    Year: 2019
    A Method of Calculating the Semantic Similarity Between English and Chinese Concepts
    MLICOM
    Springer
    DOI: 10.1007/978-3-030-32388-2_27
Jingwen Cao1,*, Tiexin Wang1,*, Wenxin Li1,*, Chuanqi Tao1,*
  • 1: Nanjing University of Aeronautics and Astronautics
*Contact email: caojingwen1028@126.com, tiexin.wang@nuaa.edu.cn, freedomtot@nuaa.edu.cn, t-chuanqi@163.com

Abstract

In the big data era, data and information processing is a common concern across diverse fields. To make this processing both efficient and intelligent, it is necessary to search for, define, and build the potential links among heterogeneous data. Focusing on this issue, this paper proposes a knowledge-driven method to calculate the semantic similarity between words in a bilingual (English-Chinese) setting. The method is built on the knowledge base “HowNet”, which defines and maintains the “atom taxonomy tree” and the “semantic dictionary”, a knowledge network describing the relationships between word concepts and the attributes of those concepts. Compared to other knowledge bases, HowNet pays more attention to concept-based connections between words. In addition, the proposed method analyzes concepts more completely and is more convenient to compute. The non-relational database MongoDB is employed to improve efficiency and to make full use of the rich knowledge maintained in HowNet. Considering both the structure of HowNet and the characteristics of MongoDB, a set of equations is defined to calculate the semantic similarity.
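
As a rough illustration of the kind of calculation the abstract describes, the sketch below scores two words by the distance between their concepts in a small taxonomy tree. It is not the paper's method: the abstract does not give its equations, so the distance-based formula `alpha / (alpha + distance)`, the constant `ALPHA`, the toy tree `PARENT`, and the bilingual lookup table `WORD_TO_SEMEME` are all assumptions made for illustration and contain no real HowNet data. A plain dictionary stands in for the MongoDB collection the paper uses for storage.

```python
# Hypothetical toy taxonomy (child -> parent); NOT real HowNet content.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "thing",
}

# Hypothetical bilingual lookup: English and Chinese words share concept nodes.
WORD_TO_SEMEME = {
    "person": "human",
    "人": "human",
    "dog": "animal",
    "狗": "animal",
    "tool": "artifact",
    "工具": "artifact",
}

ALPHA = 1.6  # assumed tuning constant, not taken from the paper


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges between two nodes via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes share no common ancestor")


def sememe_similarity(a, b, alpha=ALPHA):
    """Distance-based similarity in (0, 1]; identical nodes score 1.0."""
    return alpha / (alpha + tree_distance(a, b))


def word_similarity(w1, w2):
    """Similarity of two words (either language) via their concept nodes."""
    return sememe_similarity(WORD_TO_SEMEME[w1], WORD_TO_SEMEME[w2])
```

With this toy data, `word_similarity("person", "人")` gives the maximum score because both words map to the same node, while `word_similarity("person", "狗")` scores higher than `word_similarity("person", "工具")` because the concepts are closer in the tree.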