
Research Article
Ensemble Fusion for Enhanced Malicious URL Detection by Integrating Machine Learning and Deep Learning Techniques
@INPROCEEDINGS{10.1007/978-3-031-77075-3_27,
  author={Raja Rao PBV and Kiran Sree Pokkuluri and M. Prasad and Neeraj Sharma and BSatya Narayana Murthy and Adina Karunasri},
  title={Ensemble Fusion for Enhanced Malicious URL Detection by Integrating Machine Learning and Deep Learning Techniques},
  proceedings={Cognitive Computing and Cyber Physical Systems. 5th EAI International Conference, IC4S 2024, Bhimavaram, India, April 5--7, 2024, Proceedings, Part-I},
  proceedings_a={IC4S},
  year={2025},
  month={2},
  keywords={Malicious URL, RNN, LSTM, TF-IDF, MLP},
  doi={10.1007/978-3-031-77075-3_27}
}
Raja Rao PBV
Kiran Sree Pokkuluri
M. Prasad
Neeraj Sharma
BSatya Narayana Murthy
Adina Karunasri
Year: 2025
IC4S
Springer
DOI: 10.1007/978-3-031-77075-3_27
Abstract
The exponential rise of malicious activity on the internet underscores the critical need for robust detection mechanisms that safeguard users from potential threats. In this paper, the authors propose a method for enhancing malicious URL detection using ensemble fusion techniques that integrate machine learning (ML) and deep learning (DL) methodologies. The method begins by loading and preprocessing a large-scale dataset of 549,346 URLs sourced from Kaggle. Through feature engineering and extraction, the dataset is transformed into a numerical format suitable for model training, with TF-IDF used to capture the importance of features. Individual models are then trained on the preprocessed data: the ML models Random Forest, XGBoost, and Gradient Boosting, and the DL models Multi-Layer Perceptron (MLP), RNN, LSTM, and GRU. Random Forest achieved a recall of 97% and an accuracy of 97.50%, and LSTM likewise achieved a recall of 97% and an accuracy of 97.50%. Ensemble fusion, specifically stacking with a meta-learner, then combines the predictions from all individual models to produce a final prediction. The proposed ensemble model with logistic regression as the meta-learner achieved an accuracy of 98.4% and a recall of 98%, outperforming the individual models. These findings underscore the robustness of the ensemble fusion approach in accurately identifying malicious URLs.
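The pipeline outlined in the abstract (TF-IDF features, several base classifiers, and a logistic regression meta-learner combined by stacking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, keeps only two of the base models (Random Forest and Gradient Boosting) and omits XGBoost and the DL models, assumes character n-gram TF-IDF (the abstract does not specify the tokenization), and runs on a handful of made-up URLs rather than the Kaggle dataset. The function name `build_stacked_url_classifier` and all hyperparameters are hypothetical.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data for illustration only (0 = benign, 1 = malicious).
# The paper itself uses 549,346 labelled URLs sourced from Kaggle.
urls = [
    "https://www.example.com/index.html",
    "https://docs.example.org/guide/start",
    "https://news.example.net/world/2024",
    "https://shop.example.com/cart/view",
    "http://secure-login.example.biz/verify.php?id=123",
    "http://192.0.2.10/paypa1/update/account.exe",
    "http://free-prizes.example.xyz/claim?win=1",
    "http://bank-alert.example.top/login/confirm.php",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def build_stacked_url_classifier(cv=2, seed=0):
    """Return a TF-IDF + stacking pipeline (hypothetical helper)."""
    # Base learners: a subset of the ML models named in the abstract.
    base_models = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    # Stacking: out-of-fold predictions from the base learners become the
    # input features of the logistic regression meta-learner.
    stack = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        cv=cv,
    )
    # Assumption: character n-grams as TF-IDF terms; the abstract does not
    # say how URLs are tokenized.
    return make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
        stack,
    )


model = build_stacked_url_classifier()
model.fit(urls, labels)
predictions = model.predict(urls)
```

On a realistic dataset the model would be evaluated on a held-out split (for example with `sklearn.model_selection.train_test_split`) and scored with accuracy and recall, the two metrics reported in the abstract. Predictions on the training URLs, as above, only confirm that the pipeline runs.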