A Content-Context-Centric Approach for Detecting Vandalism in Wikipedia

Lakshmish Ramaswamy; Raga Tummalapenta; Kang Li; Calton Pu

9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article


@INPROCEEDINGS{10.4108/icst.collaboratecom.2013.254059,
    author={Lakshmish Ramaswamy and Raga Tummalapenta and Kang Li and Calton Pu},
    title={A Content-Context-Centric Approach for Detecting Vandalism in Wikipedia},
    proceedings={9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
    publisher={ICST},
    proceedings_a={COLLABORATECOM},
    year={2013},
    month={11},
    keywords={collaborative online social media vandalism detection content-context www co-occurrence probability top-ranked co-occurrence probability},
    doi={10.4108/icst.collaboratecom.2013.254059}
}


Lakshmish Ramaswamy¹, Raga Tummalapenta¹, Kang Li¹,*, Calton Pu²

1: University of Georgia
2: Georgia Institute of Technology

*Contact email: kangli@cs.uga.edu

Abstract

Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web but have also had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become a target of a wide variety of vandalism attacks. Most existing vandalism detection techniques rely on simple textual features such as the presence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach to vandalism detection and propose a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in an edit fit the topic and the existing content of the Wikipedia article. We present two novel metrics for this purpose, called WWW co-occurrence probability and top-ranked co-occurrence probability. We also develop efficient mechanisms for evaluating these two metrics, as well as machine learning-based schemes that utilize them. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach.
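
The abstract does not define the two metrics, so the following is only an illustrative sketch of the general idea of scoring how well an edit's words fit an article's topic. It assumes one plausible reading of a co-occurrence probability: the fraction of documents mentioning the topic that also mention a given word. The function names, the averaging step, and the toy in-memory corpus (standing in for web search results) are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Set


def cooccurrence_probability(topic: str, word: str, corpus: Iterable[Set[str]]) -> float:
    """Fraction of documents containing `topic` that also contain `word`.

    Assumption: this conditional document frequency is a stand-in for the
    paper's metric, whose exact definition is not given in the abstract.
    """
    topic_docs = 0
    both_docs = 0
    for tokens in corpus:
        if topic in tokens:
            topic_docs += 1
            if word in tokens:
                both_docs += 1
    # No document mentions the topic: treat the probability as zero.
    return both_docs / topic_docs if topic_docs else 0.0


def edit_fit_score(topic: str, edit_words: List[str], corpus: List[Set[str]]) -> float:
    """Average co-occurrence probability of the edit's words with the topic.

    Averaging is a hypothetical aggregation chosen for simplicity.
    """
    if not edit_words:
        return 0.0
    total = sum(cooccurrence_probability(topic, w, corpus) for w in edit_words)
    return total / len(edit_words)


# Toy corpus: each document is represented as a set of lowercase tokens.
TOY_CORPUS: List[Set[str]] = [
    {"volcano", "lava", "eruption"},
    {"volcano", "magma", "lava"},
    {"volcano", "island", "eruption"},
    {"pizza", "cheese", "oven"},
]

on_topic = edit_fit_score("volcano", ["lava", "eruption"], TOY_CORPUS)
off_topic = edit_fit_score("volcano", ["pizza", "cheese"], TOY_CORPUS)
print(on_topic > off_topic)
```

In this toy setting, an edit whose words frequently appear alongside the article topic scores higher than an edit with unrelated words. Under the abstract's framing, scores of this kind would serve as features for a machine learning classifier rather than as a verdict on their own.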

Keywords: collaborative online social media, vandalism detection, content-context, www co-occurrence probability, top-ranked co-occurrence probability

Published: 2013-11-12
Publisher: ICST

DOI: http://dx.doi.org/10.4108/icst.collaboratecom.2013.254059
