10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

A Distributed Polygon Retrieval Algorithm using MapReduce

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2014.257705,
        author={Qiulei Guo and Balaji Palanisamy and Hassan Karimi},
        title={A Distributed Polygon Retrieval Algorithm using MapReduce},
        proceedings={10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2014},
        month={11},
        keywords={polygon retrieval mapreduce},
        doi={10.4108/icst.collaboratecom.2014.257705}
    }
    
  • Qiulei Guo
    Balaji Palanisamy
    Hassan Karimi
    Year: 2014
    A Distributed Polygon Retrieval Algorithm using MapReduce
    COLLABORATECOM
    IEEE
    DOI: 10.4108/icst.collaboratecom.2014.257705
Qiulei Guo1, Balaji Palanisamy1,*, Hassan Karimi1
  • 1: University of Pittsburgh
*Contact email: bpalan@pitt.edu

Abstract

The proliferation of data acquisition devices such as 3D laser scanners has led to a surge of large-scale spatial terrain data, which poses many challenges for spatial data analysis and computation. With the advent of several emerging collaborative cloud technologies, a natural and cost-effective approach to managing such large-scale data is to store and share the datasets in a publicly hosted cloud service and process the data within the cloud itself using modern distributed computing paradigms such as MapReduce. For several key spatial data analysis and computation problems, polygon retrieval is a fundamental operation that is often computed under real-time constraints. However, existing sequential algorithms fail to meet this demand effectively, given the unprecedented growth of terrain data in recent years in both volume and rate. In this work, we develop a MapReduce-based parallel polygon retrieval algorithm that aims to minimize the I/O and CPU loads of the map and reduce tasks during spatial data processing. The results of preliminary experiments on a Hadoop cluster demonstrate that the proposed techniques are scalable and reduce the execution time of the polygon retrieval operation by more than 35% compared with existing distributed algorithms.
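
For readers unfamiliar with the operation, the following is a minimal illustrative sketch of polygon retrieval expressed in map and reduce steps. It is not the algorithm proposed in the paper: the function names, the in-memory "splits", and the bounding-box pre-filter are all assumptions made for illustration, and the map and reduce phases are simulated in a single Python process rather than run on Hadoop.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float, float]   # (x, y, elevation); hypothetical record layout
Polygon = List[Tuple[float, float]]  # vertices in order, first vertex not repeated
Box = Tuple[float, float, float, float]


def bounding_box(polygon: Polygon) -> Box:
    """Axis-aligned bounding box of the query polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray-casting test. Points lying exactly on an edge are not treated specially."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_split(split: Iterable[Point], polygon: Polygon, box: Box) -> Iterator[Tuple[str, Point]]:
    """Map step (assumed design): cheap box filter first, exact test second."""
    min_x, min_y, max_x, max_y = box
    for x, y, z in split:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # skip the costlier exact test for points outside the box
        if point_in_polygon(x, y, polygon):
            yield ("hit", (x, y, z))


def reduce_hits(pairs: Iterable[Tuple[str, Point]]) -> List[Point]:
    """Reduce step: gather the matching points emitted by all map tasks."""
    return sorted(value for _, value in pairs)


def retrieve(splits: Iterable[Iterable[Point]], polygon: Polygon) -> List[Point]:
    """Run the simulated map and reduce phases over all input splits."""
    box = bounding_box(polygon)
    pairs = [pair for split in splits for pair in map_split(split, polygon, box)]
    return reduce_hits(pairs)
```

Calling `retrieve(splits, polygon)` with each split holding a list of `(x, y, elevation)` tuples returns the points that fall inside the query polygon. The box filter in the map step is one simple way to reduce per-record CPU work; the paper's own techniques for reducing I/O and CPU load are not described in this abstract.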