Research Article
Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework
@INPROCEEDINGS{10.1007/978-3-319-90775-8_4, author={C. Sona and Jaison Mulerikkal}, title={Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework}, proceedings={Mobile Networks and Management. 9th International Conference, MONAMI 2017, Melbourne, Australia, December 13-15, 2017, Proceedings}, proceedings_a={MONAMI}, year={2018}, month={5}, keywords={Pattern matching Hadoop MapReduce Big Data Knuth Morris Pratt Algorithm Boyer Moore Algorithm Franek Jennings Smyth Algorithm}, doi={10.1007/978-3-319-90775-8_4} }
- C. Sona
Jaison Mulerikkal
Year: 2018
Performance Comparison of Distributed Pattern Matching Algorithms on Hadoop MapReduce Framework
MONAMI
Springer
DOI: 10.1007/978-3-319-90775-8_4
Abstract
Creating meaning out of the growing Big Data is an insurmountable challenge data scientists face and pattern matching algorithms are great means to create such meaning from heaps of data. However, the available pattern matching algorithms are mostly tested with linear programming models whose adaptability and efficiency are not tested in distributed programming models such as Hadoop MapReduce, which supports Big Data. This paper explains an experience of parallelizing three of such pattern matching algorithms, namely - Knuth Morris Pratt Algorithm (KMP), Boyer Moore Algorithm (BM) and a lesser known Franek Jennings Smyth (FJS) Algorithm and porting them to Hadoop MapReduce framework. All the three algorithms are converted to MapReduce programs using key value pairs and experimented on single node as well as cluster Hadoop environment. The result analysis with the Project Gutenberg data-set has shown all the three parallel algorithms scale well on Hadoop environment as the data size increases. The experimental results prove that KMP algorithm gives higher performance for shorter patterns over BM, and BM algorithm gives higher performance than KMP for longer patterns. However, FJS algorithm, which is a hybrid of KMP and Boyer horspool algorithm which is advanced version of BM, outperforms both KMP and BM for shorter and longer patterns, and emerges as the most suitable algorithm for pattern matching in a Hadoop environment.