3rd International IEEE/Create-Net Workshop on Networks for Grid Applications

Research Article

A Dynamic Data Grid Replication Strategy to Minimize the Data Missed

  • @INPROCEEDINGS{10.4108/gridnets.2006.14,
        author={Ming Lei and Susan V. Vrbsky and Xiaoyan Hong},
        title={A Dynamic Data Grid Replication Strategy to Minimize the Data Missed},
        booktitle={3rd International IEEE/Create-Net Workshop on Networks for Grid Applications},
        publisher={IEEE},
        proceedings_a={GRIDNETS},
        year={2006},
        month={10},
        keywords={Data Grid, data availability, data missing rate, limited storage, replica strategy},
        doi={10.4108/gridnets.2006.14}
    }
    
  • Ming Lei
    Susan V. Vrbsky
    Xiaoyan Hong
    Year: 2006
    A Dynamic Data Grid Replication Strategy to Minimize the Data Missed
    GRIDNETS
    IEEE
    DOI: 10.4108/gridnets.2006.14
Ming Lei1,*, Susan V. Vrbsky1,*, Xiaoyan Hong1,*
  • 1: Department of Computer Science, University of Alabama, Tuscaloosa, AL 35487-0290
*Contact email: mlei@cs.ua.edu, vrbsky@cs.ua.edu, hxy@cs.ua.edu

Abstract

Data availability in a data grid system is complicated by node failures, data catalog errors, and an unreliable network. To improve job response time and data availability, data is typically replicated in large-scale, data-massive applications. However, the dynamic behavior of Grid users makes it difficult to determine where and how to replicate data to meet the system availability goal. Some strategies for data replication have been proposed previously, but they assume unlimited storage for replicas. In this paper, we present two new metrics to measure system data availability. We then model the system availability problem under limited replica storage and transform it into a classic optimization problem. We present four strategies for limited replica storage that maximize data availability by minimizing the data missed rate (MinDmr), based on a file weight and a prediction function. Our simulations in OptorSim show that the MinDmr algorithm achieves better overall data availability than the other strategies. The results indicate that, as far as the data missing rate is concerned, MinDmr always performs better than the others across varying prediction functions, job schedulers, and file access patterns.
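
The abstract does not spell out the file weight, the prediction function, or the two availability metrics, so the following is only a minimal sketch of the general idea, not the authors' MinDmr algorithm. It assumes a hypothetical weight equal to predicted accesses per unit of storage, uses a greedy knapsack-style heuristic to decide which files a site with limited storage keeps, and computes a simple file-level missing rate over a request trace. All names (`FileInfo`, `choose_replicas`, `missing_rate`) and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    size: int                  # storage units (hypothetical)
    predicted_accesses: float  # output of some prediction function (assumed)


def choose_replicas(files, capacity):
    """Greedy knapsack heuristic: prefer files with the highest
    predicted accesses per unit of storage until capacity is used up."""
    ranked = sorted(files, key=lambda f: f.predicted_accesses / f.size, reverse=True)
    kept, used = [], 0
    for f in ranked:
        if used + f.size <= capacity:
            kept.append(f.name)
            used += f.size
    return kept


def missing_rate(requests, available):
    """Fraction of requested files that are not available locally."""
    if not requests:
        return 0.0
    missed = sum(1 for name in requests if name not in available)
    return missed / len(requests)


# Illustrative data only; not taken from the paper.
files = [
    FileInfo("a", size=4, predicted_accesses=8.0),
    FileInfo("b", size=2, predicted_accesses=6.0),
    FileInfo("c", size=5, predicted_accesses=5.0),
    FileInfo("d", size=1, predicted_accesses=0.5),
]
kept = choose_replicas(files, capacity=7)
requests = ["a", "b", "c", "a", "b", "d", "c", "a"]
rate = missing_rate(requests, set(kept))
print(kept, rate)
```

With this toy trace, file `c` does not fit in the remaining storage, so every request for it counts as missed. A greedy density ranking is only a heuristic for the underlying knapsack-like problem and does not guarantee an optimal selection.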