Enhancing Document Clustering with Hybrid Recurrent Neural Networks and Autoencoders: A Robust Approach for Effective Semantic Organization of Large Textual Datasets

Ratnam Dodda; Suresh Babu Alladi

ismla 25(1):

Research Article

Cite (BibTeX):

@ARTICLE{10.4108/eetismla.4564,
    author={Ratnam Dodda and Suresh Babu Alladi},
    title={Enhancing Document Clustering with Hybrid Recurrent Neural Networks and Autoencoders: A Robust Approach for Effective Semantic Organization of Large Textual Datasets},
    journal={EAI Endorsed Transactions on Intelligent Systems and Machine Learning},
    volume={1},
    number={1},
    publisher={EAI},
    journal_a={ISMLA},
    year={2024},
    month={3},
    keywords={Document Clustering, Recurrent Neural Network, Autoencoders, Hybrid model, Diverse Datasets},
    doi={10.4108/eetismla.4564}
}

Ratnam Dodda¹,*, Suresh Babu Alladi¹

1: Jawaharlal Nehru Technological University Anantapur

*Contact email: ratnam.dodda@gmail.com

Abstract

This research presents a document clustering method that combines recurrent neural networks (RNNs) with autoencoders. RNNs capture sequential dependencies in text, while autoencoders improve the feature representation. The hybrid model, tested on three datasets (20-Newsgroup, Reuters, BBC Sports), outperforms traditional clustering methods, revealing semantic relationships and showing robustness to noise. Preprocessing applies noise-reduction steps (stemming, lemmatization, tokenization, stopword removal) to produce a refined dataset. The evaluation metrics (adjusted Rand index, normalized mutual information, completeness, homogeneity, V-measure, accuracy) validate the model's effectiveness as a solution for organizing and understanding large text datasets.

Keywords: Document Clustering, Recurrent Neural Network, Autoencoders, Hybrid model, Diverse Datasets
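
Several of the evaluation metrics named in the abstract can be computed directly from two label sequences: the reference classes and the predicted clusters. The following is a minimal sketch using only the Python standard library, not the authors' evaluation code; the function names and the toy labels are illustrative assumptions. It covers homogeneity, completeness, V-measure, and the adjusted Rand index; normalized mutual information and clustering accuracy are omitted for brevity.

```python
from collections import Counter
from math import comb, log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from paired label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(true_labels, pred_labels):
    """Return (homogeneity, completeness, V-measure), each in [0, 1]."""
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    # Homogeneity: each cluster contains members of a single class.
    hom = 1.0 if h_true == 0 else 1.0 - conditional_entropy(true_labels, pred_labels) / h_true
    # Completeness: all members of a class land in the same cluster.
    com = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred_labels, true_labels) / h_pred
    # V-measure: harmonic mean of the two.
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index: pair-counting agreement corrected for chance."""
    n = len(true_labels)
    if n < 2:
        return 1.0
    pair_sum = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    true_sum = sum(comb(c, 2) for c in Counter(true_labels).values())
    pred_sum = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = true_sum * pred_sum / comb(n, 2)
    max_index = (true_sum + pred_sum) / 2
    if max_index == expected:
        return 1.0
    return (pair_sum - expected) / (max_index - expected)


# Toy labels (hypothetical): the second class is split across two clusters.
true_labels = [0, 0, 1, 1]
pred_labels = [0, 0, 1, 2]
hom, com, v = homogeneity_completeness_v(true_labels, pred_labels)
ari = adjusted_rand_index(true_labels, pred_labels)
print(f"homogeneity={hom:.3f} completeness={com:.3f} v_measure={v:.3f} ari={ari:.3f}")
```

All four scores are invariant to how cluster IDs are numbered, so a clustering that matches the reference classes up to relabeling scores 1.0 on each. In practice, a library implementation would normally be preferred over hand-written versions.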

Received: 2023-12-08
Accepted: 2024-03-13
Published: 2024-03-18
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eetismla.4564

Copyright © 2024 R. Dodda et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.
