sis 20(26): e7

Research Article

Semantic N-Gram Topic Modeling

@ARTICLE{10.4108/eai.13-7-2018.163131,
    author={Pooja Kherwa and Poonam Bansal},
    title={Semantic N-Gram Topic Modeling},
    journal={EAI Endorsed Transactions on Scalable Information Systems},
    volume={7},
    number={26},
    publisher={EAI},
    journal_a={SIS},
    year={2020},
    month={2},
    keywords={Topic Modeling, Latent Dirichlet Allocation, Point wise Mutual Information, Bag of words, Coherence, Perplexity},
    doi={10.4108/eai.13-7-2018.163131}
}
    
Pooja Kherwa1,*, Poonam Bansal1
  • 1: Maharaja Surajmal Institute of Technology, C-4 Janak Puri. GGSIPU. New Delhi-110058, India
*Contact email: poona281280@gmail.com

Abstract

This paper presents a novel approach to effective topic modeling. The approach differs from traditional topic modeling based on the vector space model, which follows the Bag of Words (BOW) approach. Its novelty is that it works in a phrase-based vector space: critical measures such as pointwise mutual information (PMI) and log frequency based mutual dependency (LGMD) are applied to calculate each phrase's suitability for a particular topic, and the most suitable semantic N-Gram phrases and terms are retained for topic modeling. In the experiments, the proposed semantic N-Gram topic modeling is compared with collocation latent Dirichlet allocation (coll-LDA) and with latent Dirichlet allocation (LDA), the most appropriate state-of-the-art topic modeling technique. The evaluation shows that perplexity improves drastically and that the coherence score improves significantly, particularly for short-text data sets such as movie reviews and political blogs.
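To make the phrase-scoring idea concrete, the following is a minimal sketch of how PMI might be used to pick candidate two-word phrases from a token stream. It is not the authors' implementation: the function names `bigram_pmi` and `select_phrases`, the restriction to bigrams, the maximum-likelihood probability estimates, and the `threshold` and `min_count` defaults are all assumptions made for illustration. LGMD is not covered here.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair.

    Assumption: probabilities are plain relative frequencies, with
    no smoothing (hypothetical helper, not from the paper).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[w1] / n_uni   # marginal probability of w1
        p_y = unigrams[w2] / n_uni   # marginal probability of w2
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def select_phrases(tokens, threshold=1.0, min_count=2):
    """Keep bigrams seen at least min_count times with PMI >= threshold.

    Both cut-offs are illustrative defaults, not values from the paper.
    """
    counts = Counter(zip(tokens, tokens[1:]))
    scores = bigram_pmi(tokens)
    kept = [pair for pair, pmi in scores.items()
            if counts[pair] >= min_count and pmi >= threshold]
    # Highest-scoring phrases first.
    return sorted(kept, key=lambda pair: scores[pair], reverse=True)


if __name__ == "__main__":
    text = "new york is big and new york is old".split()
    for pair in select_phrases(text):
        print(" ".join(pair))
```

In a full pipeline, the selected phrases would presumably be merged into single tokens before the documents are passed to a topic model. The frequency cut-off matters because PMI tends to overrate pairs that occur only once.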

Keywords
Topic Modeling, Latent Dirichlet Allocation, Point wise Mutual Information, Bag of words, Coherence, Perplexity
Received: 2019-11-16
Accepted: 2020-02-10
Published: 2020-02-11
Publisher: EAI
http://dx.doi.org/10.4108/eai.13-7-2018.163131

Copyright © 2020 Pooja Kherwa et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.
