Advances in Computer Science and Information Technology. Networks and Communications. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part I

Research Article

An Improved Anti Spam Filter Based on Content, Low Level Features and Noise

  • @INPROCEEDINGS{10.1007/978-3-642-27299-8_59,
        author={Anand Gupta and Chhavi Singhal and Somya Aggarwal},
        title={An Improved Anti Spam Filter Based on Content, Low Level Features and Noise},
        proceedings={Advances in Computer Science and Information Technology. Networks and Communications. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part I},
        proceedings_a={CCSIT PART I},
        year={2012},
        month={11},
        keywords={Low level feature anti obfuscation technique noise},
        doi={10.1007/978-3-642-27299-8_59}
    }
    
  • Anand Gupta
    Chhavi Singhal
    Somya Aggarwal
    Year: 2012
    An Improved Anti Spam Filter Based on Content, Low Level Features and Noise
    CCSIT PART I
    Springer
    DOI: 10.1007/978-3-642-27299-8_59
Anand Gupta1,*, Chhavi Singhal1,*, Somya Aggarwal1,*
  • 1: Netaji Subhas Institute of Technology
*Contact email: Omaranand@gmail.com, chhavisinghal28@gmail.com, somya3322@gmail.com

Abstract

Spammers constantly develop new spam techniques, the latest of which is image spam. Until now, research on spam image identification has considered properties such as colour, size, compressibility, entropy and content. However, we feel that the resulting identification methods are limited by embedded obfuscation such as complex backgrounds, compression artifacts and a wide variety of fonts and formats. To overcome these limitations, we propose a 4-stage methodology that uses both the low level features and the content of spam images. The method handles images with noise and images without noise separately. In addition, the colour properties of the images are altered so that OCR (Optical Character Recognition) can easily read the text embedded in the image. The proposed method is tested on a dataset of 1984 spam images and is found to be effective in identifying all types of spam images, namely those containing (1) only text, (2) only images or (3) both text and images. The experimental results are encouraging: the technique achieves an accuracy of 92%.
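
As an illustration of two of the low level properties named in the abstract (entropy and compressibility), the following minimal Python sketch computes both for a flat list of 8-bit grey levels. It is not the authors' implementation: the function names `shannon_entropy` and `compressibility`, the use of zlib as the compressor, and the grey-level input format are all assumptions made for this example.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits per pixel, of a sequence of 0-255 grey levels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # Sum -p * log2(p) over the grey levels that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressibility(pixels):
    """Ratio of zlib-compressed size to raw size (lower = more compressible).

    zlib is an assumed stand-in for whatever compressor a real filter uses.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    raw = bytes(pixels)
    return len(zlib.compress(raw)) / len(raw)


if __name__ == "__main__":
    flat = [0] * 1024                # uniform background, no detail
    varied = list(range(256)) * 4    # every grey level used equally often
    for name, img in (("flat", flat), ("varied", varied)):
        print(name, shannon_entropy(img), compressibility(img))
```

A uniform image yields zero entropy and compresses to a small fraction of its raw size, whereas an image that uses many grey levels scores higher on both measures. Features of this kind could plausibly be fed to a classifier alongside OCR output, but the abstract does not specify how the four stages combine them.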