IoT 22(3): e2

Research Article

Use of Neural Topic Models in conjunction with Word Embeddings to extract meaningful topics from short texts

Cite (BibTeX)
@ARTICLE{10.4108/eetiot.v8i3.2263,
    author={Nassera HABBAT and Houda ANOUN and Larbi HASSOUNI and Hicham NOURI},
    title={Use of Neural Topic Models in conjunction with Word Embeddings to extract meaningful topics from short texts},
    journal={EAI Endorsed Transactions on Internet of Things},
    volume={8},
    number={3},
    publisher={EAI},
    journal_a={IOT},
    year={2022},
    month={9},
    keywords={Neural Topic Models, Pre-training word embedding, Short text, Topic coherence},
    doi={10.4108/eetiot.v8i3.2263}
}
Nassera HABBAT1,*, Houda ANOUN1, Larbi HASSOUNI1, Hicham NOURI1
1: University of Hassan II Casablanca
*Contact email: nassera.habbat@gmail.com

Abstract

Topic modeling uses unsupervised machine learning to discover latent topics hidden within a large collection of documents. A topic model can help with the comprehension, organization, and summarization of large amounts of text, and can reveal hidden topics that vary across the documents of a corpus. Traditional topic models such as pLSA (probabilistic latent semantic analysis) and LDA (latent Dirichlet allocation) suffer a performance loss when applied to short texts, because each short text contains little word co-occurrence information. One technique developed to address this problem, and to make topic modeling of short texts interpretable, is to combine topic models with pre-trained word embeddings (PWE) learned from an external corpus. Recent advances in deep neural networks (DNN) and deep generative models have allowed neural topic models (NTM) to achieve flexibility and efficiency in topic modeling, yet there have been few studies of neural topic models with pre-trained word embeddings for producing meaningful topics from short texts. We conducted an extensive study of five NTMs to test whether additional PWE help generate comprehensible topics, experimenting on datasets in Arabic and French of Moroccan news published on Facebook pages. Several metrics, including topic coherence and topic diversity, are used to evaluate the extracted topics. Our results show that the topic coherence of short texts can be significantly improved by using word embeddings trained on an external corpus.
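One of the evaluation metrics named in the abstract, topic diversity, has a simple standard definition: the fraction of unique words among the top-k words of all extracted topics. As a minimal illustrative sketch (the toy topics below are hypothetical examples, not the paper's actual results), it can be computed as follows:

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top-k words of each topic.

    Values near 1.0 indicate diverse topics; values near 0 indicate
    that the topics share most of their top words (redundant topics).
    """
    top_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

# Toy topics: lists of words ranked by probability under each topic.
topics = [
    ["election", "vote", "party"],
    ["match", "goal", "team"],
    ["election", "economy", "market"],  # shares "election" with topic 1
]
print(topic_diversity(topics, top_k=3))  # 8 unique words / 9 total ≈ 0.889
```

Topic coherence (e.g. NPMI-based coherence) additionally requires word co-occurrence statistics from a reference corpus, which is why it is the metric most directly affected by the short-text sparsity problem the paper studies.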

Keywords
Neural Topic Models, Pre-training word embedding, Short text, Topic coherence
Received
2022-07-30
Accepted
2022-09-29
Published
2022-09-30
Publisher
EAI
http://dx.doi.org/10.4108/eetiot.v8i3.2263

Copyright © 2022 Nassera HABBAT et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Indexed in: EBSCO, ProQuest, DBLP, DOAJ, Portico