Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III

Research Article

Comparing Supervised Learning Classifiers to Detect Advanced Fee Fraud Activities on Internet

  • @INPROCEEDINGS{10.1007/978-3-642-27317-9_10,
        author={Abiodun Modupe and Oludayo Olugbara and Sunday Ojo},
        title={Comparing Supervised Learning Classifiers to Detect Advanced Fee Fraud Activities on Internet},
        proceedings={Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III},
        proceedings_a={CCSIT PART  III},
        year={2012},
        month={11},
        keywords={Advanced Fee Fraud Word Clustering Supervised Learning Cluster Features},
        doi={10.1007/978-3-642-27317-9_10}
    }
    
  • Abiodun Modupe
    Oludayo Olugbara
    Sunday Ojo
    Year: 2012
    Comparing Supervised Learning Classifiers to Detect Advanced Fee Fraud Activities on Internet
    CCSIT PART III
    Springer
    DOI: 10.1007/978-3-642-27317-9_10
Abiodun Modupe1,*, Oludayo Olugbara2,*, Sunday Ojo1,*
  • 1: Tshwane University of Technology
  • 2: Durban University of Technology
*Contact email: modupea@tut.ac.za, oludayoo@dut.ac.za, ojoso@tut.ac.za

Abstract

Due to its inherent vulnerability, the Internet is frequently abused for criminal activities such as Advanced Fee Fraud (AFF). At present, it is difficult to accurately detect the activities of AFF defrauders on the Internet. To address this, we compare the classification accuracies of four supervised learning methods: Binary Logistic Regression (BLR), Back-propagation Neural Network (BNN), Naive Bayesian Classifier (NBC), and Support Vector Machine (SVM). A word clustering method (globalCM) is used to create clusters of the words present in the training dataset. A Vector Space Model (VSM) is computed from the words in each e-mail in the training set. The WEKA data mining framework is used to build supervised learning classifiers from the set of VSMs with each learning method. Experiments use stratified 10-fold cross-validation to estimate the classification accuracy of each classifier. Results generally show that SVM with a polynomial kernel gives the best classification accuracy. This study contributes to the problem of detecting unwanted e-mails. The comparison of learning methods also helps a decision maker weigh tradeoffs between method accuracy and complexity.
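The evaluation procedure described in the abstract (a vector space representation of each e-mail, a supervised classifier, and stratified k-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' WEKA setup and does not implement globalCM: the toy e-mails, the function names, the plain term-count vectors, and the hand-written multinomial Naive Bayes are all assumptions made for illustration, and k is set to 2 only because the toy set is too small for 10 folds.

```python
import math
from collections import Counter, defaultdict

def vectorize(text, vocabulary):
    """Map an e-mail to a term-count vector over a fixed vocabulary (a simple VSM)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps the class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def train_nb(X, y):
    """Fit a multinomial Naive Bayes model with Laplace (add-one) smoothing."""
    n_features = len(X[0])
    class_counts = Counter(y)
    term_totals = {c: [0] * n_features for c in class_counts}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            term_totals[label][j] += value
    model = {}
    for c, totals in term_totals.items():
        denominator = sum(totals) + n_features
        log_prior = math.log(class_counts[c] / len(y))
        log_likelihood = [math.log((t + 1) / denominator) for t in totals]
        model[c] = (log_prior, log_likelihood)
    return model

def predict_nb(model, row):
    """Return the class with the highest log posterior score for one vector."""
    def score(c):
        log_prior, log_likelihood = model[c]
        return log_prior + sum(v * l for v, l in zip(row, log_likelihood))
    return max(model, key=score)

def cross_validate(X, y, k):
    """Estimate accuracy by training on k-1 folds and testing on the held-out fold."""
    correct = 0
    for test_indices in stratified_folds(y, k):
        held_out = set(test_indices)
        train_indices = [i for i in range(len(y)) if i not in held_out]
        model = train_nb([X[i] for i in train_indices],
                         [y[i] for i in train_indices])
        correct += sum(predict_nb(model, X[i]) == y[i] for i in test_indices)
    return correct / len(y)

# Hypothetical toy corpus; the paper's dataset is not reproduced here.
emails = [
    ("urgent transfer of funds requires your bank details", "fraud"),
    ("claim your lottery funds send bank details", "fraud"),
    ("urgent lottery transfer claim funds now", "fraud"),
    ("send bank details for urgent funds transfer", "fraud"),
    ("meeting agenda for the project review", "ham"),
    ("please review the project report before the meeting", "ham"),
    ("agenda and report for the review meeting", "ham"),
    ("project meeting moved please review agenda", "ham"),
]
texts = [text for text, _ in emails]
y = [label for _, label in emails]

# Simplification: the vocabulary is built from all e-mails, not per training fold.
vocabulary = sorted({word for text in texts for word in text.split()})
X = [vectorize(text, vocabulary) for text in texts]

accuracy = cross_validate(X, y, k=2)
print(f"Estimated accuracy: {accuracy:.2f}")
```

Comparing several learning methods, as the study does, would amount to calling the same cross-validation routine with a different train/predict pair for each method and comparing the resulting accuracy estimates on identical folds.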