Research Article
Multi-modal Fusion for Flasher Detection in a Mobile Video Chat Application
@INPROCEEDINGS{10.4108/icst.mobiquitous.2014.257973,
  author={Lei Tian and Rahat Rafiq and Shaosong Li and David Chu and Richard Han and Qin Lv and Shivakant Mishra},
  title={Multi-modal Fusion for Flasher Detection in a Mobile Video Chat Application},
  proceedings={11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services},
  publisher={ICST},
  proceedings_a={MOBIQUITOUS},
  year={2014},
  month={11},
  keywords={multi-modal fusion, flasher detection, mobile video chat},
  doi={10.4108/icst.mobiquitous.2014.257973}
}
Lei Tian
Rahat Rafiq
Shaosong Li
David Chu
Richard Han
Qin Lv
Shivakant Mishra
Year: 2014
MOBIQUITOUS
ICST
DOI: 10.4108/icst.mobiquitous.2014.257973
Abstract
This paper investigates the development of accurate and efficient classifiers to identify misbehaving users (i.e., “flashers”) in a mobile video chat application. Our analysis is based on video session data collected from a mobile client we built that connects to a popular random video chat service. We show that prior image-based classifiers designed to distinguish normal from misbehaving users in online video chat systems perform poorly on mobile video chat data. We present an enhanced image-based classifier that improves classification performance on mobile data. More importantly, we demonstrate that incorporating multi-modal mobile sensor data from the accelerometer and the camera state (front/back), along with audio, can significantly improve the overall image-based classification accuracy. Our work also shows that leveraging multiple image-based predictions within a session (i.e., the temporal modality) has the potential to further improve classification performance. Finally, we show that the running-time cost of classification can be significantly reduced by employing a multilevel cascaded classifier in which high-complexity features and further image-based predictions are generated only when needed.
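To make the cascade and fusion ideas concrete, the following Python sketch illustrates one plausible reading of the abstract; it is not the authors' implementation, and the class names, feature choices, thresholds, and the simple score-averaging fusion are illustrative assumptions. A cheap stage scores a session snapshot from low-cost sensor features (camera facing, accelerometer motion, audio energy), the expensive image-based stage is invoked only when that score is inconclusive, and per-snapshot scores are averaged across the session to exploit the temporal modality.

# Hypothetical sketch of a multilevel cascade with multi-modal fusion.
# Names, thresholds, and the averaging fusion are illustrative assumptions,
# not the classifier described in the paper.

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SessionSnapshot:
    """One sampled moment of a video chat session (hypothetical schema)."""
    frame: bytes                        # encoded video frame
    front_camera: bool                  # True if the front-facing camera is active
    accel_magnitudes: Sequence[float]   # recent accelerometer magnitudes (m/s^2)
    audio_energy: float                 # short-term audio energy, normalized to [0, 1]


class CascadedFlasherClassifier:
    """Two-level cascade: run the costly image model only when the cheap
    sensor-based score is inconclusive."""

    def __init__(self,
                 sensor_model: Callable[[list[float]], float],
                 image_model: Callable[[bytes], float],
                 low: float = 0.2, high: float = 0.8):
        self.sensor_model = sensor_model   # cheap: P(flasher | sensor features)
        self.image_model = image_model     # expensive: P(flasher | frame)
        self.low, self.high = low, high    # confidence band that triggers level 2

    @staticmethod
    def sensor_features(s: SessionSnapshot) -> list[float]:
        # Low-cost features: camera facing, device motion statistics, audio level.
        mags = list(s.accel_magnitudes) or [0.0]
        mean = sum(mags) / len(mags)
        var = sum((m - mean) ** 2 for m in mags) / len(mags)
        return [float(s.front_camera), mean, var, s.audio_energy]

    def predict_proba(self, s: SessionSnapshot) -> float:
        p_sensor = self.sensor_model(self.sensor_features(s))
        if p_sensor <= self.low or p_sensor >= self.high:
            return p_sensor                   # confident: skip image-based features
        p_image = self.image_model(s.frame)   # only now pay for image analysis
        return 0.5 * (p_sensor + p_image)     # stand-in fusion: average the two scores


def classify_session(clf: CascadedFlasherClassifier,
                     snapshots: Sequence[SessionSnapshot],
                     threshold: float = 0.5) -> bool:
    """Temporal modality: aggregate several per-snapshot predictions per session."""
    scores = [clf.predict_proba(s) for s in snapshots]
    return sum(scores) / len(scores) >= threshold

In this sketch the confidence band [low, high] is the cascade's cost knob: widening it invokes the image-based model more often, trading running time for accuracy, in the spirit of the cascade described in the abstract.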