Research Article
Parallel Shortest Path Graph Computations of United States Road Network Data on Apache Spark
@INPROCEEDINGS{10.1007/978-3-319-94180-6_30, author={Yasir Arfat and Rashid Mehmood and Aiiad Albeshri}, title={Parallel Shortest Path Graph Computations of United States Road Network Data on Apache Spark}, proceedings={Smart Societies, Infrastructure, Technologies and Applications. First International Conference, SCITA 2017, Jeddah, Saudi Arabia, November 27--29, 2017, Proceedings}, proceedings_a={SCITA}, year={2018}, month={7}, keywords={Big data Apache spark Apache hadoop Graph analytics Apache GraphX Shortest paths Road networks}, doi={10.1007/978-3-319-94180-6_30} }
- Yasir Arfat
Rashid Mehmood
Aiiad Albeshri
Year: 2018
Parallel Shortest Path Graph Computations of United States Road Network Data on Apache Spark
SCITA
Springer
DOI: 10.1007/978-3-319-94180-6_30
Abstract
Big data is being generated from various sources such as Internet of Things (IoT) and social media. Big data cannot be processed by traditional tools and technologies due to their properties, volume, velocity, veracity, and variety. Graphs are becoming increasingly popular to model real-world problems; the problems are typically large and, hence, give rise to large graphs, which could be analysed and solved using big data technologies. This paper explores the performance of single source shortest path graph computations using the Apache Spark big data platform. We use the United States road network data, modelled as graphs, and calculate shortest paths between vertices. The experiments are performed on the Aziz supercomputer (a Top500 machine). We solve problems of varying graph sizes, i.e. various states of the US, and analyse Spark’s parallelization behavior. As expected, the speedup is dependent on both the size of the data and the number of parallel nodes.