Innovations and Interdisciplinary Solutions for Underserved Areas. 6th EAI International Conference, InterSol 2023, Flic en Flac, Mauritius, September 16-17, 2023, Proceedings

Research Article

Characterization of Malicious URLs Using Machine Learning and Feature Engineering

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-51849-2_2,
        author={Sidwendluian Romaric Nana and Didier Bassol\^{e} and Jean Serge Dimitri Ouattara and Oumarou Si\^{e}},
        title={Characterization of Malicious URLs Using Machine Learning and Feature Engineering},
        proceedings={Innovations and Interdisciplinary Solutions for Underserved Areas. 6th EAI International Conference, InterSol 2023, Flic en Flac, Mauritius, September 16-17, 2023, Proceedings},
        proceedings_a={INTERSOL},
        year={2024},
        month={2},
        keywords={Malicious URL Characterization Feature Engineering Detection Classification},
        doi={10.1007/978-3-031-51849-2_2}
    }
    
Sidwendluian Romaric Nana1,*, Didier Bassolé1, Jean Serge Dimitri Ouattara1, Oumarou Sié1
  • 1: Laboratoire de Mathématiques et d’Informatique
*Contact email: sidnanaroma@gmail.com

Abstract

In this paper, we apply Machine Learning models, combined with Feature Engineering techniques, to the detection and classification of malicious URLs. The models were implemented with scikit-learn using the Random Forest, Support Vector Machine, and XGBoost classifier algorithms. They were trained, tested, and then optimized on a dataset of 641,125 URLs (benign, defacement, malware, and phishing) drawn from several sources, including ISCX-URL2016 from the University of New Brunswick. Through iterative learning, we show that combining certain hyperparameters and features reduces the false positive rate. The resulting scores are close to 100%, with zero false positive rates for some types of URLs. Finally, we compare the performance of our models against models from related work.
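As a rough illustration of the kind of feature engineering and false-positive measurement the abstract describes, the sketch below derives a few lexical features from a URL string and computes a per-class false positive rate. The feature set, the function names `extract_features` and `per_class_fpr`, and the sample URLs are assumptions made for illustration; the abstract does not list the features or the evaluation code the authors actually used.

```python
import re
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Derive simple lexical features from a URL string.

    Hypothetical feature set for illustration only; it is not
    the feature set used in the paper.
    """
    # urlparse needs a scheme to locate the host, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in "@-_?=&%" for ch in url),
        # Hosts given as raw IPv4 addresses are a common lexical signal.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


def per_class_fpr(y_true, y_pred, label) -> float:
    """False positive rate for one class, treated one-vs-rest: FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if p == label and t != label)
    tn = sum(1 for t, p in pairs if p != label and t != label)
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    print(extract_features("http://192.168.0.1/login.php?id=1"))
    y_true = ["benign", "benign", "phishing", "malware"]
    y_pred = ["benign", "phishing", "phishing", "malware"]
    for label in ("benign", "phishing", "malware"):
        print(label, per_class_fpr(y_true, y_pred, label))
```

Feature dictionaries of this form could be converted to numeric vectors and passed to classifiers such as those named in the abstract. Computing the false positive rate per class, rather than one aggregate figure, is what makes a statement like "zero false positive rates for some types of URLs" checkable.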

Keywords
Malicious URL, Characterization, Feature Engineering, Detection, Classification
Published
2024-02-02
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-51849-2_2