
Research Article

The Cutting-Edge Hadoop Distributed File System: Unleashing Optimal Performance

BibTeX
@ARTICLE{10.4108/eetsis.9027,
    author={Anish Gupta and P. Santhiya and C. Thiyagarajan and Anurag Gupta and Manish Gupta and Rajendra Kr. Dwivedi},
    title={The Cutting-Edge Hadoop Distributed File System: Unleashing Optimal Performance},
    journal={EAI Endorsed Transactions on Scalable Information Systems},
    volume={12},
    number={5},
    publisher={EAI},
    journal_a={SIS},
    year={2025},
    month={10},
    keywords={Hadoop, HDFS, DataNode, NameNode, Write Operation, Read Operation},
    doi={10.4108/eetsis.9027}
}
Anish Gupta1, P. Santhiya2, C. Thiyagarajan3, Anurag Gupta4, Manish Gupta5,*, Rajendra Kr. Dwivedi5
  • 1: Chandigarh Engineering College
  • 2: Sathyabama Institute of Science and Technology
  • 3: Panimalar Engineering College Chennai
  • 4: ABES Engineering College
  • 5: Madan Mohan Malaviya University of Technology
*Contact email: mkgcse@mmmut.ac.in

Abstract

Despite the widespread adoption of 1000-node Hadoop clusters by the end of 2022, Hadoop implementation still encounters various challenges. As a vital software paradigm for managing big data, Hadoop relies on the Hadoop Distributed File System (HDFS), a distributed file system designed to handle data replication for fault tolerance. This technique involves duplicating data across multiple DataNodes (DNs) to ensure data reliability and availability. While data replication is effective, it suffers from inefficiencies due to its reliance on a single-pipelined paradigm, leading to time wastage. To tackle this limitation and optimize HDFS performance, a novel approach is proposed, utilizing multiple pipelines for data block transfers instead of a single pipeline. Additionally, the proposed approach incorporates dynamic reliability evaluation, wherein each DN updates its reliability value after each round and sends this information to the NameNode (NN). The NN then sorts the DNs based on their reliability values. When a client requests to upload a data block, the NN responds with a list of high-reliability DNs, ensuring high-performance data transfer. This proposed approach has been fully implemented and tested through rigorous experiments. The results reveal significant improvements in HDFS write operations, providing a promising solution to overcome the challenges associated with traditional HDFS implementations. By leveraging multiple pipelines and dynamic reliability assessment, this approach enhances the overall performance and responsiveness of Hadoop's distributed file system.
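The reliability-ranking scheme described in the abstract can be sketched in a few lines. The following is a minimal illustrative simulation, not the authors' implementation: the DataNode names, the moving-average update rule, and the replication factor of 3 are all assumptions made for the example. Each DN reports a per-round reliability value, and the NN sorts DNs and hands the client the most reliable targets for a block upload, each written over its own pipeline rather than one chained pipeline.

```python
# Illustrative sketch of reliability-ranked DataNode selection.
# The update rule and parameters below are assumptions, not the
# paper's actual formulas.
from dataclasses import dataclass


@dataclass
class DataNode:
    node_id: str
    reliability: float = 1.0  # updated by the DN after each round

    def report_round(self, successes: int, attempts: int) -> None:
        # Assumed update rule: exponential moving average of the
        # per-round transfer success ratio, then sent to the NN.
        ratio = successes / attempts if attempts else 0.0
        self.reliability = 0.7 * self.reliability + 0.3 * ratio


class NameNode:
    def __init__(self, datanodes):
        self.datanodes = list(datanodes)

    def pick_targets(self, replication: int = 3):
        # Sort DNs by reliability (descending) and return the top
        # `replication` nodes for the client's block upload.
        ranked = sorted(self.datanodes,
                        key=lambda d: d.reliability, reverse=True)
        return ranked[:replication]


# Usage: after dn2 reports a poor round, it drops out of the
# top-three list the NN returns to the client.
dns = [DataNode(f"dn{i}") for i in range(5)]
dns[2].report_round(successes=1, attempts=4)
nn = NameNode(dns)
targets = [d.node_id for d in nn.pick_targets()]
print(targets)  # dn2 is excluded from the three write targets
```

In this sketch the three replicas would then be streamed to the selected DNs concurrently (one pipeline per replica), which is the abstract's proposed alternative to HDFS's default single chained write pipeline.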

Keywords
Hadoop, HDFS, DataNode, NameNode, Write Operation, Read Operation
Received
2025-04-04
Accepted
2025-10-09
Published
2025-10-13
Publisher
EAI
http://dx.doi.org/10.4108/eetsis.9027

Copyright © 2025 Manish Gupta et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Indexed in: EBSCO, ProQuest, DBLP, DOAJ, Portico