Data and Information in Online Environments. Second EAI International Conference, DIONE 2021, Virtual Event, March 10–12, 2021, Proceedings

Research Article

Evaluating the Effect of Corpus Normalisation in Topics Coherence

Cite
  • @INPROCEEDINGS{10.1007/978-3-030-77417-2_15,
        author={Luana da Silva Sousa and Vinicius Melquiades de Sousa and Rogerio de Aquino Silva and Gustavo Medeiros de Ara\'{u}jo},
        title={Evaluating the Effect of Corpus Normalisation in Topics Coherence},
        proceedings={Data and Information in Online Environments. Second EAI International Conference, DIONE 2021, Virtual Event, March 10--12, 2021, Proceedings},
        proceedings_a={DIONE},
        year={2021},
        month={6},
        keywords={Corpus normalisation, LDA, Topic coherence, Ontology, Natural language processing},
        doi={10.1007/978-3-030-77417-2_15}
    }
    
  • Luana da Silva Sousa
    Vinicius Melquiades de Sousa
    Rogerio de Aquino Silva
    Gustavo Medeiros de Araújo
    Year: 2021
    Evaluating the Effect of Corpus Normalisation in Topics Coherence
    DIONE
    Springer
    DOI: 10.1007/978-3-030-77417-2_15
Luana da Silva Sousa¹, Vinicius Melquiades de Sousa¹, Rogerio de Aquino Silva¹, Gustavo Medeiros de Araújo¹
  • 1: Engineering and Data Science Lab

Abstract

Probabilistic topic models are widely used to better understand the content of documents. Because topic models are entirely unsupervised, statistical and data-driven, the topics they produce are not always meaningful. This work is based on the hypothesis that, since LDA takes word occurrence counts into account, the quality of topics can be affected by semantically normalising the text so that each concept is represented by the same word. Using a knowledge base, we obtain a formal description of the lexemes found in the text and extract the various forms in which each lexeme is mentioned, which are then used to normalise the corpus. To quantify the influence of semantic corpus normalisation on topics, we use the topic coherence metric, as it represents the semantic interpretability of the terms used to describe a particular topic. The first tests of the semantic text normalisation framework showed promising results, which will be investigated in depth in future work.
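The idea in the abstract can be illustrated with a small sketch. It is not the authors' framework: the `CANONICAL` map is a hypothetical stand-in for a knowledge-base lookup, the toy documents are invented for illustration, and the UMass formulation of topic coherence is an assumption, since the abstract does not say which coherence metric was used. The sketch only shows how mapping several surface forms to one word changes the co-occurrence counts that a coherence score depends on.

```python
import math

# Hypothetical surface-form -> canonical-word map.
# It stands in for a knowledge-base lookup and is not taken from the paper.
CANONICAL = {
    "automobile": "car",
    "cars": "car",
    "auto": "car",
    "physician": "doctor",
    "doctors": "doctor",
}


def tokenise(doc):
    """Lowercase a document and split it on whitespace."""
    return doc.lower().split()


def normalise(doc):
    """Tokenise a document and replace each token by its canonical form, if any."""
    return [CANONICAL.get(tok, tok) for tok in tokenise(doc)]


def umass_coherence(topic_words, docs):
    """UMass coherence of an ordered list of topic words.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs with j < i,
    where D counts the documents that contain all the given words.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            d_j = doc_freq(topic_words[j])
            if d_j == 0:
                continue  # skip words that never occur, to avoid dividing by zero
            score += math.log((doc_freq(topic_words[i], topic_words[j]) + 1) / d_j)
    return score


# Toy corpus, invented for illustration only.
raw_docs = [
    "the automobile engine failed",
    "cars need an engine",
    "the car engine is loud",
    "the doctor arrived",
]
topic = ["engine", "car"]

raw_score = umass_coherence(topic, [tokenise(d) for d in raw_docs])
norm_score = umass_coherence(topic, [normalise(d) for d in raw_docs])
print("coherence without normalisation:", raw_score)
print("coherence with normalisation:   ", norm_score)
```

In this toy corpus, "car" and "engine" co-occur in only one document before normalisation and in three afterwards, so the score for the same topic rises. Whether a comparable effect holds on a real corpus with an LDA model is the question the paper investigates.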

Keywords
Corpus normalisation, LDA, Topic coherence, Ontology, Natural language processing
Published
2021-06-15
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-77417-2_15