Collaborative, Autonomic, and Resilient Defenses for Cyber Physical Systems

Research Article

Filtering Spam by Hybrid Approach

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2011.247188,
        author={Mohamed Tabris and Youssif Al-Nashif and Salim Hariri},
        title={Filtering Spam by Hybrid Approach},
        proceedings={Collaborative, Autonomic, and Resilient Defenses for Cyber Physical Systems},
        publisher={IEEE},
        proceedings_a={CYPHYCARD'},
        year={2012},
        month={4},
        keywords={hybrid spam filtering spam anomaly based spam detection email protection},
        doi={10.4108/icst.collaboratecom.2011.247188}
    }
    
  • Mohamed Tabris
    Youssif Al-Nashif
    Salim Hariri
    Year: 2012
    Filtering Spam by Hybrid Approach
    CYPHYCARD'
    ICST
    DOI: 10.4108/icst.collaboratecom.2011.247188
Mohamed Tabris1, Youssif Al-Nashif1, Salim Hariri1,*
  • 1: the University of Arizona
*Contact email: hariri@ece.arizona.edu

Abstract

Spam has become a major cause of financial loss in most organizations. In 2006, 81.6% of email traffic was spam [1]. The losses incurred by companies are growing exponentially, and so is the number of spam emails. This makes spam detection and spam filters critically important. Various techniques are used to filter spam; two of the most prominent are IP blacklisting/whitelisting and content-based filtering. An email can be identified as spam based on the reputation of its source, established by greylisting the source according to its previous history [2]. In this paper, we introduce a method for improving spam filters by using a hybrid technique (content-based and anomaly-based detection). We show how to identify whether an email is spam by implementing models that capture the nature of email headers and the patterns found in email content. The general behavior of spam and legitimate emails under each of these models is obtained and assigned a score; the value of this score is used to differentiate between legitimate emails and spam. Using this hybrid approach, we were able to detect spam with a false positive rate of 0.54% and a false negative rate of 1.34%. We also discuss the relation between phishing and spam and how some anti-phishing techniques can be used in spam filters.
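
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the header models, the content patterns, the weighting, or the decision threshold, so the header checks, the `SPAM_PHRASES` list, `header_weight`, and `threshold` below are all hypothetical choices made for illustration. The `error_rates` helper shows how false positive and false negative rates of the kind reported above could be computed from labeled data.

```python
from dataclasses import dataclass


@dataclass
class Email:
    headers: dict
    body: str


# Hypothetical phrase list; the paper's actual content patterns are not given.
SPAM_PHRASES = ("free money", "act now", "click here", "winner")


def header_anomaly_score(email):
    """Fraction of simple (assumed) header checks that look anomalous, 0.0 to 1.0."""
    sender = email.headers.get("From")
    checks = [
        "Message-ID" not in email.headers,
        "Date" not in email.headers,
        email.headers.get("From", "") == "",
        email.headers.get("Reply-To", sender) != sender,
    ]
    return sum(checks) / len(checks)


def content_score(email):
    """Fraction of the assumed spam phrases that occur in the body, 0.0 to 1.0."""
    text = email.body.lower()
    hits = sum(phrase in text for phrase in SPAM_PHRASES)
    return hits / len(SPAM_PHRASES)


def hybrid_score(email, header_weight=0.5):
    """Weighted combination of the two scores; the weight is an assumption."""
    return (header_weight * header_anomaly_score(email)
            + (1 - header_weight) * content_score(email))


def is_spam(email, threshold=0.4):
    """Classify by comparing the hybrid score with an assumed threshold."""
    return hybrid_score(email) >= threshold


def error_rates(labels, predictions):
    """Return (false positive rate, false negative rate); True means spam."""
    pairs = list(zip(labels, predictions))
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    ham = sum(1 for y in labels if not y)
    spam = sum(1 for y in labels if y)
    return (fp / ham if ham else 0.0, fn / spam if spam else 0.0)
```

A filter built this way would tune the weight and threshold on labeled mail, trading false positives (legitimate mail flagged as spam) against false negatives (spam delivered).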