Ubiquitous Communications and Network Computing. Second EAI International Conference, Bangalore, India, February 8–10, 2019, Proceedings

Research Article

Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs

Download
97 downloads
  • @INPROCEEDINGS{10.1007/978-3-030-20615-4_17,
        author={Supriya Movva and Saketh Prata and Sai Sampath and R. Gayathri},
        title={Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs},
        proceedings={Ubiquitous Communications and Network Computing. Second EAI International Conference, Bangalore, India, February 8--10, 2019, Proceedings},
        proceedings_a={UBICNET},
        year={2019},
        month={5},
        keywords={Frequent subgraph mining Subgraph Support threshold Hadoop framework Map-reduce},
        doi={10.1007/978-3-030-20615-4_17}
    }
    
  • Supriya Movva
    Saketh Prata
    Sai Sampath
    R. Gayathri
    Year: 2019
    Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs
    UBICNET
    Springer
    DOI: 10.1007/978-3-030-20615-4_17
Supriya Movva1,*, Saketh Prata1,*, Sai Sampath1, R. Gayathri1
  • 1: Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham
*Contact email: supriya.movva.212@gmail.com, pratasaketh@gmail.com

Abstract

The frequent subgraphs are the subgraphs which appear in a number, more than or equal to a user-defined threshold. Many algorithms assume that the apriori based approach yields an efficient result for finding frequent subgraphs, but in our research, we found out that Apriori algorithm lacks scalability with the main memory. Frequent subgraph mining using Apriori algorithm with FS tree uses adjacency list representation. FS tree is a prefix tree data structure. It implements the algorithm in two phases. In the first phase, it uses the Apriori algorithm to find frequent two edge subgraphs. In the second phase, it uses FS-tree algorithm to search all the frequent subgraphs from frequent two edge subgraphs. Scanning the dataset for every candidate is the drawback of the Apriori algorithm, so the Apriori algorithm with FS-tree is used to overcome the multiple scanning. This algorithm is also implemented in an assumption that the data set fits well in memory. In this paper, we propose parallel map-reduce based frequent subgraph mining technique performed in a distributed environment on the Hadoop framework. The experiments validate the efficiency of the algorithm for generating frequent subgraphs in large graph datasets.