Research Article
An Improved Anti Spam Filter Based on Content, Low Level Features and Noise
@INPROCEEDINGS{10.1007/978-3-642-27299-8_59,
  author={Anand Gupta and Chhavi Singhal and Somya Aggarwal},
  title={An Improved Anti Spam Filter Based on Content, Low Level Features and Noise},
  proceedings={Advances in Computer Science and Information Technology. Networks and Communications. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part I},
  proceedings_a={CCSIT PART I},
  year={2012},
  month={11},
  keywords={Low level feature anti obfuscation technique noise},
  doi={10.1007/978-3-642-27299-8_59}
}
Authors: Anand Gupta, Chhavi Singhal, Somya Aggarwal
Year: 2012
CCSIT PART I
Springer
DOI: 10.1007/978-3-642-27299-8_59
Abstract
Spammers are constantly developing new spam techniques, the latest of which is image spam. Research on spam image identification has so far considered properties such as colour, size, compressibility, entropy, and content. However, we feel that the resulting identification methods are limited by embedded obfuscation such as complex backgrounds, compression artifacts, and a wide variety of fonts and formats. To overcome these limitations, we propose a 4-stage methodology that uses both the low-level features and the content of spam images. The method handles images with noise and images without noise separately. In addition, the colour properties of the images are altered so that OCR (Optical Character Recognition) can more easily read the text embedded in them. The proposed method is tested on a dataset of 1984 spam images and is found to be effective in identifying all types of spam images: those containing (1) only text, (2) only images, or (3) both text and images. The experimental results are encouraging: the technique achieves an accuracy of 92%.
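The abstract does not say how the colour properties are altered before OCR, so the following is only an illustrative sketch of one common preprocessing step, not the authors' method. It assumes an image given as rows of RGB tuples, converts it to grayscale with the ITU-R BT.601 luminance weights, and binarizes it with Otsu's global threshold, a standard technique substituted here for illustration. The function names `to_grayscale`, `otsu_threshold`, and `binarize` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_grayscale(pixels: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB rows to luminance using the ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in pixels
    ]


def otsu_threshold(gray: List[List[int]]) -> int:
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray: List[List[int]], threshold: int) -> List[List[int]]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    # Hypothetical 2x3 image: dark "text" pixels on a light tinted background.
    dark, light = (20, 20, 20), (200, 220, 180)
    image = [[light, dark, light], [dark, light, light]]
    gray = to_grayscale(image)
    print(binarize(gray, otsu_threshold(gray)))
```

A high-contrast black-and-white image of this kind is typically easier for an OCR engine to read than text on a coloured or textured background, which is the general motivation the abstract gives for altering colour properties.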