1st International ICST Workshop on Knowledge Discovery and Data Mining

Research Article

Effective Pruning Strategies for Sequential Pattern Mining

  • @INPROCEEDINGS{10.4108/wkdd.2008.2682,
        author={Xu Yusheng and Ma Zhixin and Li Lian and Tharam S. Dillon},
        title={Effective Pruning Strategies for Sequential Pattern Mining},
        proceedings={1st International ICST Workshop on Knowledge Discovery and Data Mining},
        publisher={ACM},
        proceedings_a={WKDD},
        year={2010},
        month={5},
        keywords={},
        doi={10.4108/wkdd.2008.2682}
    }
    
  • Xu Yusheng
    Ma Zhixin
    Li Lian
    Tharam S. Dillon
    Year: 2010
    Effective Pruning Strategies for Sequential Pattern Mining
    WKDD
    ACM
    DOI: 10.4108/wkdd.2008.2682
Xu Yusheng1,*, Ma Zhixin1,*, Li Lian1,*, Tharam S. Dillon2,*
  • 1: School of Information Science and Technology, Lanzhou University, China.
  • 2: School of Information System, Curtin University, Perth, Australia.
*Contact email: xuyusheng@lzu.edu.cn, mazhx@lzu.edu.cn, lil@lzu.edu.cn, tharam.dillon@cbs.curtin.edu.au

Abstract

In this paper, we systematically explore the search space of frequent sequence mining and present two novel pruning strategies, SEP (Sequence Extension Pruning) and IEP (Item Extension Pruning), which can be used in all Apriori-like sequence mining algorithms or lattice-theoretic approaches. With a little more memory overhead, proposed pruning strategies can prune invalidated search space and decrease the total cost of frequency counting effectively. For effectiveness testing reason, we optimize SPAM [2] and present the improved algorithm, SPAMSEPIEP, which uses SEP and IEP to prune the search space by sharing the frequent 2-sequences lists. A set of comprehensive performance experiments study shows that SPAMSEPIEP outperforms SPAM by a factor of 10 on small datasets and better than 30% to 50% on reasonably large dataset.