ismla 25(1):

Research Article

Enhancing Document Clustering with Hybrid Recurrent Neural Networks and Autoencoders: A Robust Approach for Effective Semantic Organization of Large Textual Datasets

Cite
BibTeX Plain Text
  • @ARTICLE{10.4108/eetismla.4564,
        author={Ratnam Dodda and Suresh Babu Alladi},
        title={Enhancing Document Clustering with Hybrid Recurrent Neural Networks and Autoencoders: A Robust Approach for Effective Semantic Organization of Large Textual Datasets},
        journal={EAI Endorsed Transactions on Intelligent Systems and Machine Learning},
        volume={1},
        number={1},
        publisher={EAI},
        journal_a={ISMLA},
        year={2024},
        month={3},
        keywords={Document Clustering, Recurrent Neural Network, Autoencoders, Hybrid model, Diverse Datasets},
        doi={10.4108/eetismla.4564}
    }
    
  • Ratnam Dodda, Suresh Babu Alladi (2024). Enhancing Document Clustering with Hybrid Recurrent Neural Networks and Autoencoders: A Robust Approach for Effective Semantic Organization of Large Textual Datasets. ISMLA, EAI. DOI: 10.4108/eetismla.4564
Ratnam Dodda1,*, Suresh Babu Alladi1
  • 1: Jawaharlal Nehru Technological University Anantapur
*Contact email: ratnam.dodda@gmail.com

Abstract

This research presents a document clustering method that combines recurrent neural networks (RNNs) with autoencoders. The RNNs capture sequential dependencies in text, while the autoencoders improve the feature representation. The hybrid model, tested on three datasets (20 Newsgroups, Reuters, BBC Sports), outperforms traditional clustering methods, reveals semantic relationships between documents, and is robust to noise. Preprocessing applies text-cleaning steps (tokenization, stopword removal, stemming, lemmatization) to produce a refined dataset. Evaluation metrics (adjusted Rand index, normalized mutual information, completeness, homogeneity, V-measure, accuracy) validate the effectiveness of the model, which offers a practical way to organize and understand large text datasets.
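
Several of the evaluation metrics listed in the abstract can be computed directly from the contingency counts between reference classes and predicted clusters. The following is a minimal sketch in pure Python, not the authors' evaluation code: the function name `clustering_scores` and the toy labels in the usage note are assumptions made for illustration, and only homogeneity, completeness, V-measure, and the adjusted Rand index are covered.

```python
from collections import Counter
from math import comb, log


def _entropy(counts, total):
    # Shannon entropy (natural log) of a label distribution given its counts.
    return -sum((n / total) * log(n / total) for n in counts if n)


def _conditional_entropy(joint, given_counts, total):
    # H(X | Y), where joint maps (x, y) -> count and given_counts maps y -> count.
    return -sum((n / total) * log(n / given_counts[y])
                for (_, y), n in joint.items())


def clustering_scores(true_labels, pred_labels):
    """Return homogeneity, completeness, V-measure and adjusted Rand index."""
    n = len(true_labels)
    if n != len(pred_labels) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")

    classes = Counter(true_labels)    # reference class sizes
    clusters = Counter(pred_labels)   # predicted cluster sizes
    joint = Counter(zip(true_labels, pred_labels))
    joint_swapped = Counter(zip(pred_labels, true_labels))

    h_c = _entropy(classes.values(), n)
    h_k = _entropy(clusters.values(), n)
    h_c_given_k = _conditional_entropy(joint, clusters, n)
    h_k_given_c = _conditional_entropy(joint_swapped, classes, n)

    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    # Completeness: all members of a class land in the same cluster.
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    # V-measure: harmonic mean of homogeneity and completeness.
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total

    # Adjusted Rand index: pair-counting agreement corrected for chance.
    index = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in classes.values())
    sum_b = sum(comb(c, 2) for c in clusters.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    ari = 1.0 if max_index == expected else (index - expected) / (max_index - expected)

    return {
        "homogeneity": homogeneity,
        "completeness": completeness,
        "v_measure": v_measure,
        "adjusted_rand_index": ari,
    }
```

For example, `clustering_scores([0, 0, 1, 1], [1, 1, 0, 0])` scores a clustering that matches the reference classes up to a renaming of cluster ids, so all four metrics reach their maximum. Clustering accuracy is left out because it additionally requires an optimal matching between cluster ids and class labels.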

Keywords
Document Clustering, Recurrent Neural Network, Autoencoders, Hybrid model, Diverse Datasets
Received
2023-12-08
Accepted
2024-03-13
Published
2024-03-18
Publisher
EAI
http://dx.doi.org/10.4108/eetismla.4564

Copyright © 2024 R. Dodda et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
