
Research Article

Smart Data Prefetching Using KNN to Improve Hadoop Performance

Cite
  • @ARTICLE{10.4108/eetsis.9110,
        author={Rana Ghazali and Douglas G. Down},
        title={Smart Data Prefetching Using KNN to Improve Hadoop Performance},
        journal={EAI Endorsed Transactions on Scalable Information Systems},
        volume={12},
        number={3},
        publisher={EAI},
        journal_a={SIS},
        year={2025},
        month={4},
        keywords={Hadoop Performance, Smart prefetch technique, K-Nearest Neighbor Clustering, MapReduce, Machine Learning, Cache Replacement},
        doi={10.4108/eetsis.9110}
    }
    
  • Rana Ghazali
    Douglas G. Down
    Year: 2025
    Smart Data Prefetching Using KNN to Improve Hadoop Performance
    SIS
    EAI
    DOI: 10.4108/eetsis.9110
Rana Ghazali¹,*, Douglas G. Down²
  • 1: Islamic Azad University, Tehran
  • 2: McMaster University
*Contact email: ghazalir@mcmaster.ca

Abstract

Hadoop is an open-source framework that enables the parallel processing of large data sets across a cluster of machines. Its performance can suffer from costly I/O operations, network data transmission, and high data access time. In recent years, researchers have explored prefetching techniques as a way to reduce data access time. Nevertheless, several issues must be addressed to optimize the prefetching mechanism: launching the prefetch at an appropriate time to avoid conflicts with other operations and minimize waiting time, determining the amount of data to prefetch to avoid overload and underload, and placing the prefetched data in locations that can be accessed efficiently when required. In this paper, we propose a smart prefetch mechanism that consists of three phases designed to address these issues. First, we enhance the task progress rate to calculate the optimal time for triggering prefetch operations. Second, we use K-Nearest Neighbor clustering to identify which data blocks should be prefetched in each round. Third, we employ the data locality feature to determine the placement of prefetched data. Our experimental results demonstrate that the proposed mechanism improves job execution time by an average of 28.33% by increasing the rate of local tasks.
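The abstract does not give the details of the mechanism, so the following is only a minimal illustrative sketch of two of the ideas it names: triggering a prefetch from a task's progress rate, and using a K-Nearest Neighbor vote to pick blocks to prefetch. Everything specific here is an assumption made for illustration and not the authors' method: the function names, the two block features (normalized access frequency and recency), the trigger rule (start prefetching once the estimated remaining task time no longer exceeds the estimated fetch time), and the toy data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def should_trigger_prefetch(progress, progress_rate, prefetch_seconds):
    """Hypothetical trigger rule (an assumption, not taken from the paper).

    progress is the completed fraction of the task (0.0 to 1.0) and
    progress_rate is the fraction completed per second. Prefetching starts
    once the estimated remaining time is no longer than the fetch time.
    """
    if progress_rate <= 0:
        return False  # no progress information yet, so do not prefetch
    remaining_seconds = (1.0 - progress) / progress_rate
    return remaining_seconds <= prefetch_seconds


def select_blocks(history, candidates, k=3):
    """Keep the candidate blocks that KNN labels 1 ("likely needed soon")."""
    return [
        block_id
        for block_id, features in candidates.items()
        if knn_predict(history, features, k) == 1
    ]


# Toy history: (access frequency, recency) -> 1 if the block was needed soon.
history = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0),
]
# Hypothetical candidate blocks with the same two features.
candidates = {"blk_a": (0.85, 0.75), "blk_b": (0.15, 0.15)}

if should_trigger_prefetch(progress=0.9, progress_rate=0.01, prefetch_seconds=15):
    print(select_blocks(history, candidates))
```

In this sketch the placement phase is left out entirely, since the abstract says only that it relies on data locality. An odd k with two labels avoids tied votes.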

Keywords
Hadoop Performance, Smart prefetch technique, K-Nearest Neighbor Clustering, MapReduce, Machine Learning, Cache Replacement
Received
2024-08-28
Accepted
2024-11-01
Published
2025-04-17
Publisher
EAI
http://dx.doi.org/10.4108/eetsis.9110

Copyright © 2025 R. Ghazali et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
