Machine Learning and Intelligent Communications. 4th International Conference, MLICOM 2019, Nanjing, China, August 24–25, 2019, Proceedings

Research Article

Statement Generation Based on Big Data for Keyword Search

  • @INPROCEEDINGS{10.1007/978-3-030-32388-2_41,
        author={Qingqing Liu and Zhengyou Xia},
        title={Statement Generation Based on Big Data for Keyword Search},
        proceedings={Machine Learning and Intelligent Communications. 4th International Conference, MLICOM 2019, Nanjing, China, August 24--25, 2019, Proceedings},
        proceedings_a={MLICOM},
        year={2019},
        month={10},
        keywords={NLG Web crawler Lucene structure},
        doi={10.1007/978-3-030-32388-2_41}
    }
    
  • Qingqing Liu
    Zhengyou Xia
    Year: 2019
    Statement Generation Based on Big Data for Keyword Search
    MLICOM
    Springer
    DOI: 10.1007/978-3-030-32388-2_41
Qingqing Liu1, Zhengyou Xia1,*
  • 1: Nanjing University of Aeronautics and Astronautics
*Contact email: zhengyou_xia@nuaa.edu.cn

Abstract

Natural language generation (NLG) is the process of automatically producing high-quality natural language text from key information through a planning process. Conventional NLG generates sentences by analyzing grammar and semantics, deriving rules, and then organizing elements according to those rules and heuristics. However, sentences generated this way are too rigid, the methods scale poorly, and they adapt badly to the constantly changing style of everyday language. Our goal is to generate fluent, personalized, multi-sentence text for end users. This paper introduces a new NLG system that generates distinctive statements while discarding the knowledge of semantics, syntax, etc. that rule-based generation requires. The system turns out to be simple and efficient. We obtain the required corpus from the web and then apply the idea of a search engine to find, in a large amount of data, sentences that match the meaning of the keywords provided by users. Sentences generated this way are closer to the language people use in daily life. Finally, we apply the system to the web commentary domain and evaluate it on three criteria. The results show that the system works well in this domain and can be developed further.
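
The retrieval idea in the abstract (index a crawled corpus of sentences, then return the sentences that best match the user's keywords) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's keywords mention Lucene, whereas the code below substitutes a toy in-memory inverted index with a simple TF-IDF score. The names `tokenize` and `SentenceIndex`, the scoring formula, and the three-sentence corpus are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SentenceIndex:
    """Toy inverted index over sentences, standing in for a real search engine."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        # term -> {sentence id: term frequency in that sentence}
        self.postings = defaultdict(dict)
        for sid, sentence in enumerate(self.sentences):
            for term, tf in Counter(tokenize(sentence)).items():
                self.postings[term][sid] = tf

    def search(self, query, top_k=3):
        """Return up to top_k sentences ranked by a simple TF-IDF score."""
        n = len(self.sentences)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            matches = self.postings.get(term, {})
            if not matches:
                continue  # keyword does not occur anywhere in the corpus
            idf = math.log(1 + n / len(matches))
            for sid, tf in matches.items():
                scores[sid] += tf * idf
        # Highest score first; ties broken by original sentence order.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [self.sentences[sid] for sid, _ in ranked[:top_k]]


# Made-up stand-in for sentences gathered by a web crawler.
corpus = [
    "The battery life of this phone is amazing.",
    "Delivery was slow but the packaging was fine.",
    "Great phone, the camera is sharp and the battery lasts two days.",
]
index = SentenceIndex(corpus)
results = index.search("battery phone")
```

Here `results` contains only the two sentences that mention the keywords. A production system would more plausibly delegate indexing and ranking to a full-text search library and would match on meaning rather than on exact tokens, as the abstract describes.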