11th EAI International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness

Research Article

Design and Implementation of Various File Deduplication Schemes on Storage Devices

  • @INPROCEEDINGS{10.4108/eai.19-8-2015.2260903,
        author={Yong-Ting Wu and Min-Chieh Yu and Jenq-Shiou Leu and Eau-Chung Lee and Tian Song},
        title={Design and Implementation of Various File Deduplication Schemes on Storage Devices},
        booktitle={11th EAI International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness},
        publisher={IEEE},
        proceedings_a={QSHINE},
        year={2015},
        month={9},
        keywords={file deduplication; cloud system; storage devices},
        doi={10.4108/eai.19-8-2015.2260903}
    }
    
  • Yong-Ting Wu
    Min-Chieh Yu
    Jenq-Shiou Leu
    Eau-Chung Lee
    Tian Song
    Year: 2015
    Design and Implementation of Various File Deduplication Schemes on Storage Devices
    QSHINE
    IEEE
    DOI: 10.4108/eai.19-8-2015.2260903
Yong-Ting Wu1, Min-Chieh Yu1, Jenq-Shiou Leu1,*, Eau-Chung Lee2, Tian Song3
  • 1: National Taiwan University of Science and Technology
  • 2: QNAP Inc.
  • 3: Tokushima University
*Contact email: jsleu@mail.ntust.edu.tw

Abstract

With the spread of smart devices, people generate large amounts of data in their daily lives and store it in local or remote file systems. Even though modern computer hardware and network technologies can cope with the volume of data being generated, effective file deduplication can still save storage space in both private computing environments and public cloud systems. In this paper, we design and implement several file deduplication schemes on storage devices, each based on a different duplicate-checking rule: file name, file size, and full or partial content hash value. Comprehensive experimental results show that file deduplication based on partial content hashing offers a better trade-off between computation cost and deduplication accuracy.
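
The four duplicate-checking rules named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `RULES` table, the choice of SHA-256, and the assumption that a "partial" hash covers only the first `sample_bytes` bytes of a file are all hypothetical, since the abstract does not say how the partial content is selected.

```python
import hashlib
import os
from collections import defaultdict


def full_hash(path, chunk_size=65536):
    """Hash the entire file content (accurate, but reads every byte)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def partial_hash(path, sample_bytes=4096):
    """Hash only the first `sample_bytes` bytes.

    Assumption: the abstract does not specify which part of the file
    is hashed; taking a fixed-size prefix is just one possible choice.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(sample_bytes))
    return h.hexdigest()


# Hypothetical rule table: each rule maps a path to a comparison key.
RULES = {
    "name": os.path.basename,
    "size": os.path.getsize,
    "partial": partial_hash,
    "full": full_hash,
}


def find_duplicates(paths, rule):
    """Group paths whose key under `rule` matches; return groups of 2+."""
    key_fn = RULES[rule]
    groups = defaultdict(list)
    for p in paths:
        groups[key_fn(p)].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

In this sketch the cheaper rules (name, size, prefix hash) can report files as duplicates even when their full contents differ, whereas the full-content hash reads every byte. That is the cost-versus-accuracy tension the abstract refers to.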