PUED: A Social Spammer Detection Method Based on PU Learning and Ensemble Learning

Yuqi Song; Min Gao; Junliang Yu; Wentao Li; Lulan Yu; Xinyu Xiao

Collaborative Computing: Networking, Applications and Worksharing. 13th International Conference, CollaborateCom 2017, Edinburgh, UK, December 11–13, 2017, Proceedings

Research Article

@INPROCEEDINGS{10.1007/978-3-030-00916-8_14,
    author={Yuqi Song and Min Gao and Junliang Yu and Wentao Li and Lulan Yu and Xinyu Xiao},
    title={PUED: A Social Spammer Detection Method Based on PU Learning and Ensemble Learning},
    proceedings={Collaborative Computing: Networking, Applications and Worksharing. 13th International Conference, CollaborateCom 2017, Edinburgh, UK, December 11--13, 2017, Proceedings},
    proceedings_a={COLLABORATECOM},
    year={2018},
    month={10},
    keywords={Spammer detection, Social network, PU Learning, Ensemble Learning},
    doi={10.1007/978-3-030-00916-8_14}
}

Yuqi Song, Min Gao, Junliang Yu, Wentao Li, Lulan Yu, Xinyu Xiao. (2018). PUED: A Social Spammer Detection Method Based on PU Learning and Ensemble Learning. COLLABORATECOM. Springer. DOI: 10.1007/978-3-030-00916-8_14

Yuqi Song*, Min Gao*, Junliang Yu*, Wentao Li¹*, Lulan Yu*, Xinyu Xiao*

1: University of Technology Sydney

*Contact email: songyq@cqu.edu.cn, gaomin@cqu.edu.cn, yu.jl@cqu.edu.cn, wentao.li@student.uts.edu.au, lulanyu@cqu.edu.cn, xiaoxy@cqu.edu.cn

Abstract

In social networks, people tend to share information with others, so those who use a social network frequently are more likely to be influenced by the interests and opinions of other people. Spammers exploit this characteristic, spreading spam through the network for their own gain and seriously disturbing normal users. Numerous notable studies have addressed social spammer detection, and their methods fall into three types: unsupervised, supervised, and semi-supervised. Although supervised and semi-supervised methods achieve superior detection accuracy, they usually suffer from imbalanced data, since in real situations unlabeled normal users far outnumber spammers. To address this problem, we propose a novel method that relies only on normal users to detect spammers accurately. The method has two steps: the first uses a voting classifier to pick out reliable spammers from the unlabeled samples, and the second trains a random forest detector on the normal users and the reliable spammers. We conduct experiments on two real-world social datasets and show that our method outperforms other supervised methods.
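
The two steps described in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation: the abstract does not say which base classifiers make up the voting classifier or how reliable spammers are selected, so the three base models, the soft-voting rule, the `reliable_frac` parameter, and the synthetic data below are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def pu_two_step(X_normal, X_unlabeled, reliable_frac=0.1, seed=0):
    """Hypothetical two-step PU sketch: label 0 = normal user, 1 = spammer."""
    # Step 1: provisionally treat every unlabeled user as a spammer and
    # fit a voting classifier to separate labeled normal users from them.
    X = np.vstack([X_normal, X_unlabeled])
    y = np.concatenate([
        np.zeros(len(X_normal), dtype=int),
        np.ones(len(X_unlabeled), dtype=int),
    ])
    voter = VotingClassifier(
        estimators=[  # assumed base models; the abstract does not name them
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
            ("dt", DecisionTreeClassifier(max_depth=5, random_state=seed)),
        ],
        voting="soft",
    )
    voter.fit(X, y)

    # Unlabeled users that look least like normal users are kept as
    # "reliable spammers" (assumed rule: top fraction by averaged score).
    spam_score = voter.predict_proba(X_unlabeled)[:, 1]
    k = max(1, int(reliable_frac * len(X_unlabeled)))
    reliable_idx = np.argsort(spam_score)[-k:]
    X_reliable = X_unlabeled[reliable_idx]

    # Step 2: train a random forest on normal users plus reliable spammers.
    X_train = np.vstack([X_normal, X_reliable])
    y_train = np.concatenate([
        np.zeros(len(X_normal), dtype=int),
        np.ones(len(X_reliable), dtype=int),
    ])
    detector = RandomForestClassifier(n_estimators=100, random_state=seed)
    detector.fit(X_train, y_train)
    return detector, reliable_idx


# Synthetic stand-in data (not one of the paper's datasets).
X_all, y_true = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], class_sep=2.0, random_state=0,
)
normal_idx = np.flatnonzero(y_true == 0)[:200]   # labeled normal users
unlabeled_mask = np.ones(len(X_all), dtype=bool)
unlabeled_mask[normal_idx] = False               # everyone else is unlabeled

detector, reliable_idx = pu_two_step(X_all[normal_idx], X_all[unlabeled_mask])
predictions = detector.predict(X_all[unlabeled_mask])
print("reliable spammers selected:", len(reliable_idx))
print("unlabeled users flagged as spammers:", int(predictions.sum()))
```

Selecting a fixed fraction of the unlabeled set, rather than applying an absolute probability threshold, guarantees that the second step always has some spammer examples to train on. A real system would need to tune that fraction against held-out labeled data.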

Keywords: Spammer detection, Social network, PU Learning, Ensemble Learning

Published: 2018-10-17
Appears in: SpringerLink

DOI link: http://dx.doi.org/10.1007/978-3-030-00916-8_14