Complex Sciences. First International Conference, Complex 2009, Shanghai, China, February 23-25, 2009. Revised Papers, Part 1

Research Article

Spam Source Clustering by Constructing Spammer Network with Correlation Measure

  • @INPROCEEDINGS{10.1007/978-3-642-02466-5_88,
        author={Jeongkyu Shin and Seunghwan Kim},
        title={Spam Source Clustering by Constructing Spammer Network with Correlation Measure},
        proceedings={Complex Sciences. First International Conference, Complex 2009, Shanghai, China, February 23-25, 2009. Revised Papers, Part 1},
        proceedings_a={COMPLEX PART 1},
        year={2012},
        month={5},
        keywords={Electronic spam complex network clustering method},
        doi={10.1007/978-3-642-02466-5_88}
    }
    
  • Jeongkyu Shin
    Seunghwan Kim
    Year: 2012
    Spam Source Clustering by Constructing Spammer Network with Correlation Measure
    COMPLEX PART 1
    Springer
    DOI: 10.1007/978-3-642-02466-5_88
Jeongkyu Shin¹,*, Seunghwan Kim¹,*
  • 1: Pohang University of Science and Technology
*Contact email: jkshin@physics.postech.ac.kr, swan@postech.ac.kr

Abstract

Spam filtering is one of the most challenging problems in electronic message systems. Recent studies on identifying the real source of spam generally rely on content filtering, because spammers usually falsify their origin. We propose a method that identifies spam sources through structural analysis of a complex network. We assume that spam sources either share the same victim list or use the same spam-hosting program. We treat the spam source-target relationship as a bipartite network and construct a weighted spam source network by network projection using a correlation measure. We find that community clustering methods are inappropriate for the spammer network. Instead, we group spammers with gradient-based grouping, which uses the correlations between nodes as gradients between them. We convert them into local minima, which helps to cluster spammers into a few spam source groups. We investigate weblog spam data with the proposed method and validate it. The proposed method can be applied to diverse categorization problems, such as multiple text categorization and network subunit clustering.
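The pipeline described in the abstract (bipartite source-target network, correlation-weighted projection, gradient-based grouping) can be illustrated with a small sketch. The abstract does not give the exact correlation measure or gradient rule, so everything here is an assumption made for illustration: the weight between two sources is taken to be the Pearson correlation of their binary target vectors, only positive correlations are kept as edges, and each source follows its most strongly correlated neighbour that has a higher total correlation ("strength") until it reaches a local peak. Climbing to a local maximum of strength is equivalent to descending to a local minimum of its negative. The names `hits`, `project`, and `gradient_groups` are hypothetical.

```python
from math import sqrt


def pearson(a, b, n_targets):
    """Pearson correlation of two sources' 0/1 target-indicator vectors."""
    pa, pb = len(a) / n_targets, len(b) / n_targets
    denom = sqrt(pa * (1 - pa) * pb * (1 - pb))
    if denom == 0:
        return 0.0  # a source hitting no target or every target carries no signal
    return (len(a & b) / n_targets - pa * pb) / denom


def project(hits):
    """Project a {source: set_of_targets} bipartite map onto a weighted source network."""
    n_targets = len(set().union(*hits.values()))
    sources = sorted(hits)
    weights = {s: {} for s in sources}
    for i, s in enumerate(sources):
        for t in sources[i + 1:]:
            w = pearson(hits[s], hits[t], n_targets)
            if w > 0:  # assumption: keep only positively correlated pairs
                weights[s][t] = w
                weights[t][s] = w
    return weights


def gradient_groups(weights):
    """Group sources by following correlation gradients to a local peak."""
    # Assumed scalar field: strength = total correlation; the name breaks ties.
    key = {s: (sum(nb.values()), s) for s, nb in weights.items()}

    def uphill(s):
        higher = [t for t in weights[s] if key[t] > key[s]]
        if not higher:
            return s  # local peak: no neighbour ranks higher
        return max(higher, key=lambda t: (weights[s][t], key[t]))

    groups = {}
    for s in weights:
        node = s
        while uphill(node) != node:  # key strictly increases, so this terminates
            node = uphill(node)
        groups.setdefault(node, []).append(s)
    return groups


# Toy data (invented for illustration): five sources and the targets each one hit.
hits = {
    "a": {1, 2, 3},
    "b": {1, 2, 3, 4},
    "c": {2, 3, 4},
    "x": {7, 8, 9},
    "y": {8, 9, 10},
}
groups = gradient_groups(project(hits))
print(sorted(sorted(g) for g in groups.values()))
# → [['a', 'b', 'c'], ['x', 'y']]
```

In this toy example, sources that share most of their targets end up at the same peak, while sources with disjoint target lists are negatively correlated and never become linked. On real data, the choice of correlation measure and of the scalar field that defines "uphill" would need to follow the paper itself.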