Research Article
Multi-modal Fusion for Flasher Detection in a Mobile Video Chat Application
@INPROCEEDINGS{10.4108/icst.mobiquitous.2014.257973,
  author={Lei Tian and Rahat Rafiq and Shaosong Li and David Chu and Richard Han and Qin Lv and Shivakant Mishra},
  title={Multi-modal Fusion for Flasher Detection in a Mobile Video Chat Application},
  proceedings={11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services},
  publisher={ICST},
  proceedings_a={MOBIQUITOUS},
  year={2014},
  month={11},
  keywords={multi-modal fusion, flasher detection, mobile video chat},
  doi={10.4108/icst.mobiquitous.2014.257973}
}
Lei Tian
Rahat Rafiq
Shaosong Li
David Chu
Richard Han
Qin Lv
Shivakant Mishra
Year: 2014
MOBIQUITOUS
ICST
DOI: 10.4108/icst.mobiquitous.2014.257973
Abstract
This paper investigates the development of accurate and efficient classifiers to identify misbehaving users (i.e., “flashers”) in a mobile video chat application. Our analysis is based on video session data collected from a mobile client we built that connects to a popular random video chat service. We show that prior image-based classifiers designed to distinguish normal from misbehaving users in online video chat systems perform poorly on mobile video chat data. We present an enhanced image-based classifier that improves classification performance on mobile data. More importantly, we demonstrate that incorporating multi-modal mobile sensor data from the accelerometer and the camera state (front/back), along with audio, can significantly improve the overall image-based classification accuracy. Our work also shows that leveraging multiple image-based predictions within a session (i.e., the temporal modality) has the potential to further improve classification performance. Finally, we show that the running-time cost of classification can be significantly reduced by employing a multilevel cascaded classifier in which high-complexity features and further image-based predictions are generated only when needed.
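To make the cascade and fusion ideas concrete, the following Python sketch illustrates one plausible reading of the abstract; it is not the authors' implementation, and the class names, feature choices, thresholds, and the simple score-averaging fusion are illustrative assumptions. A cheap stage scores a session snapshot from low-cost sensor features (camera facing, accelerometer motion, audio energy), the expensive image-based stage is invoked only when that score is inconclusive, and per-snapshot scores are averaged across the session to exploit the temporal modality.

# Hypothetical sketch of a multilevel cascade with multi-modal fusion.
# Names, thresholds, and the averaging fusion are illustrative assumptions,
# not the classifier described in the paper.

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SessionSnapshot:
    """One sampled moment of a video chat session (hypothetical schema)."""
    frame: bytes                        # encoded video frame
    front_camera: bool                  # True if the front-facing camera is active
    accel_magnitudes: Sequence[float]   # recent accelerometer magnitudes (m/s^2)
    audio_energy: float                 # short-term audio energy, normalized to [0, 1]


class CascadedFlasherClassifier:
    """Two-level cascade: run the costly image model only when the cheap
    sensor-based score is inconclusive."""

    def __init__(self,
                 sensor_model: Callable[[list[float]], float],
                 image_model: Callable[[bytes], float],
                 low: float = 0.2, high: float = 0.8):
        self.sensor_model = sensor_model   # cheap: P(flasher | sensor features)
        self.image_model = image_model     # expensive: P(flasher | frame)
        self.low, self.high = low, high    # confidence band that triggers level 2

    @staticmethod
    def sensor_features(s: SessionSnapshot) -> list[float]:
        # Low-cost features: camera facing, device motion statistics, audio level.
        mags = list(s.accel_magnitudes) or [0.0]
        mean = sum(mags) / len(mags)
        var = sum((m - mean) ** 2 for m in mags) / len(mags)
        return [float(s.front_camera), mean, var, s.audio_energy]

    def predict_proba(self, s: SessionSnapshot) -> float:
        p_sensor = self.sensor_model(self.sensor_features(s))
        if p_sensor <= self.low or p_sensor >= self.high:
            return p_sensor                   # confident: skip image-based features
        p_image = self.image_model(s.frame)   # only now pay for image analysis
        return 0.5 * (p_sensor + p_image)     # stand-in fusion: average the two scores


def classify_session(clf: CascadedFlasherClassifier,
                     snapshots: Sequence[SessionSnapshot],
                     threshold: float = 0.5) -> bool:
    """Temporal modality: aggregate several per-snapshot predictions per session."""
    scores = [clf.predict_proba(s) for s in snapshots]
    return sum(scores) / len(scores) >= threshold

In this sketch the confidence band [low, high] is the cascade's cost knob: widening it invokes the image-based model more often, trading running time for accuracy, in the spirit of the cascade described in the abstract.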