Smart Societies, Infrastructure, Technologies and Applications. First International Conference, SCITA 2017, Jeddah, Saudi Arabia, November 27–29, 2017, Proceedings

Research Article

Parallel Shortest Path Graph Computations of United States Road Network Data on Apache Spark

Download
617 downloads
  • @INPROCEEDINGS{10.1007/978-3-319-94180-6_30,
        author={Yasir Arfat and Rashid Mehmood and Aiiad Albeshri},
        title={Parallel Shortest Path Graph Computations of United States Road Network Data on Apache Spark},
        proceedings={Smart Societies, Infrastructure, Technologies and Applications. First International Conference, SCITA 2017, Jeddah, Saudi Arabia, November 27--29, 2017, Proceedings},
        proceedings_a={SCITA},
        year={2018},
        month={7},
        keywords={Big data Apache spark Apache hadoop Graph analytics Apache GraphX Shortest paths Road networks},
        doi={10.1007/978-3-319-94180-6_30}
    }
    
  • Yasir Arfat
    Rashid Mehmood
    Aiiad Albeshri
    Year: 2018
    Parallel Shortest Path Graph Computations of United States Road Network Data on Apache Spark
    SCITA
    Springer
    DOI: 10.1007/978-3-319-94180-6_30
Yasir Arfat1,*, Rashid Mehmood1,*, Aiiad Albeshri1,*
  • 1: King Abdulaziz University
*Contact email: yasirarfat081@gmail.com, RMehmood@kau.edu.sa, aaalbeshri@kau.edu.sa

Abstract

Big data is being generated from various sources such as Internet of Things (IoT) and social media. Big data cannot be processed by traditional tools and technologies due to their properties, volume, velocity, veracity, and variety. Graphs are becoming increasingly popular to model real-world problems; the problems are typically large and, hence, give rise to large graphs, which could be analysed and solved using big data technologies. This paper explores the performance of single source shortest path graph computations using the Apache Spark big data platform. We use the United States road network data, modelled as graphs, and calculate shortest paths between vertices. The experiments are performed on the Aziz supercomputer (a Top500 machine). We solve problems of varying graph sizes, i.e. various states of the US, and analyse Spark’s parallelization behavior. As expected, the speedup is dependent on both the size of the data and the number of parallel nodes.