7th International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

OTPM: Failure Handling in Data-intensive Analytical Processing

@INPROCEEDINGS{10.4108/icst.collaboratecom.2011.247083,
        author={Binh Han and Edward Omiecinski and Leo Mark and Ling Liu},
        title={OTPM: Failure Handling in Data-intensive Analytical Processing},
        proceedings={7th International Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2012},
        month={4},
        keywords={failure handling, fault tolerance, analytical processing, parallel query processing},
        doi={10.4108/icst.collaboratecom.2011.247083}
    }
    
Binh Han, Edward Omiecinski, Leo Mark, Ling Liu. OTPM: Failure Handling in Data-intensive Analytical Processing. COLLABORATECOM, ICST, 2012. DOI: 10.4108/icst.collaboratecom.2011.247083
Binh Han¹,*, Edward Omiecinski¹, Leo Mark¹, Ling Liu¹
¹ Georgia Institute of Technology
*Contact email: binhhan@gatech.edu

Abstract

Parallel processing is key to speeding up performance and achieving high throughput when processing large-scale data analytical workloads. However, the failure of any node involved in an analytical query can interrupt the whole process; if the system lacks query fault tolerance, the query must then be restarted from scratch. A complete restart may be too costly for queries over very large databases and may make it impossible to meet the time constraints of decision support systems. In this paper, we present an approach, called operator tracking, that resumes query processing after a failure by keeping track of the point up to which an operator has processed its data. We also consider saving intermediate results using partial materialization. We examine several fundamental parallel database techniques that are widely used today and analyze the performance cost of query processing and recovery when each is combined with our OTPM fault-tolerance approach. Simulation-based experiments show that our approach incurs only a small resume overhead compared to complete pipelining and complete materialization of intermediate results. Moreover, combining our approach with a vertically partitioned database in a shared-nothing environment yields the best performance among the settings considered for parallel processing of data-intensive analytical workloads.
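The abstract describes operator tracking only at a conceptual level. The following is a minimal Python sketch of the general idea, not the authors' OTPM implementation. It assumes a single operator over an in-memory list, a fixed tracking interval, and that the tracked position and the output saved up to that position survive the failure (a real system would need to keep them on stable storage). The saved output is only a rough stand-in for the intermediate results that partial materialization would keep. `TrackedOperator`, `checkpoint_every`, `tracked_pos`, `saved_output` and `tuples_processed` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TrackedOperator:
    """Hypothetical operator that records how far into its input it has progressed."""

    fn: Callable[[int], Optional[int]]  # per-tuple transform; None means the tuple is filtered out
    checkpoint_every: int = 100  # assumed tracking granularity (tuples between tracking points)
    tracked_pos: int = 0  # index of the first input tuple not yet covered by a tracking point
    saved_output: List[int] = field(default_factory=list)  # output kept up to tracked_pos
    tuples_processed: int = 0  # total work done, including work redone after a failure

    def run(self, data: List[int], fail_at: Optional[int] = None) -> List[int]:
        """Process data from tracked_pos onward; optionally simulate a crash at index fail_at."""
        buffer: List[int] = []  # output since the last tracking point; lost on failure
        for i in range(self.tracked_pos, len(data)):
            if fail_at is not None and i == fail_at:
                raise RuntimeError(f"simulated node failure at tuple {i}")
            self.tuples_processed += 1
            out = self.fn(data[i])
            if out is not None:
                buffer.append(out)
            # At each tracking point, keep the buffered output and advance the resume position.
            if (i + 1) % self.checkpoint_every == 0:
                self.saved_output.extend(buffer)
                buffer.clear()
                self.tracked_pos = i + 1
        self.saved_output.extend(buffer)
        self.tracked_pos = len(data)
        return self.saved_output


if __name__ == "__main__":
    data = list(range(1000))
    op = TrackedOperator(fn=lambda x: x * 2 if x % 3 == 0 else None, checkpoint_every=100)
    try:
        op.run(data, fail_at=750)
    except RuntimeError:
        pass  # the node failed; tracked_pos and saved_output are assumed to survive
    result = op.run(data)  # resumes at tuple 700 instead of tuple 0
    print(op.tracked_pos, op.tuples_processed)  # 1000 1050
```

In this toy run, recovering from a failure at tuple 750 redoes only the 50 tuples processed since the last tracking point (1,050 tuple operations in total), whereas a complete restart would cost 750 + 1,000 = 1,750. The tracking interval trades bookkeeping cost during normal execution against the amount of work redone after a failure.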