
Research Article
Phishing Web Page Detection with Semi-Supervised Deep Anomaly Detection
@INPROCEEDINGS{10.1007/978-3-030-90022-9_20,
  author={Linshu Ouyang and Yongzheng Zhang},
  title={Phishing Web Page Detection with Semi-Supervised Deep Anomaly Detection},
  proceedings={Security and Privacy in Communication Networks. 17th EAI International Conference, SecureComm 2021, Virtual Event, September 6--9, 2021, Proceedings, Part II},
  proceedings_a={SECURECOMM PART 2},
  year={2021},
  month={11},
  keywords={Phishing, Semi-supervised learning, Anomaly detection},
  doi={10.1007/978-3-030-90022-9_20}
}
Authors: Linshu Ouyang, Yongzheng Zhang
Year: 2021
SECURECOMM PART 2
Springer
DOI: 10.1007/978-3-030-90022-9_20
Abstract
Phishing web pages are among the most serious threats to Internet users. Recently, deep learning-based phishing detection methods have achieved significant improvements. However, these supervised deep neural networks require a large number of training samples. They also have difficulty detecting novel phishing web pages. Anomaly detection is a possible way out, yet it remains less explored, possibly for two reasons. First, HTML code lies in a high-dimensional discrete space, which is difficult for existing anomaly detection methods to handle. Second, existing anomaly detection methods may flag other types of anomalies that are beyond the scope of phishing.
In this paper, we propose a novel semi-supervised, deep anomaly detection-based method for detecting phishing web pages. We first use a multi-head self-attention network to learn, from HTML code, a feature representation suitable for anomaly detection. We then build a semi-supervised learner with a Gaussian prior and a contrastive loss, yielding an end-to-end anomaly detector specifically optimized for detecting phishing web pages. Extensive experiments on a real-world dataset demonstrate that our method outperforms other state-of-the-art methods in accuracy by a large margin.
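The abstract does not give the exact loss, so the following is only a minimal sketch of one way a Gaussian prior and a contrastive-style loss can be combined, here in the style of a deviation loss. This is an assumption, not necessarily the authors' formulation. The names `gaussian_reference` and `deviation_loss`, the margin of 5.0, and the 5000 reference draws are all hypothetical choices for illustration. The multi-head self-attention encoder is not shown; the sketch assumes it has already mapped each page to a scalar anomaly score.

```python
import random
import statistics


def gaussian_reference(n_samples=5000, seed=0):
    """Draw reference scores from a standard normal prior.

    Returns the (mean, std) of the draws; these act as the "normal"
    baseline that anomaly scores are compared against.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return statistics.fmean(draws), statistics.pstdev(draws)


def deviation_loss(scores, labels, margin=5.0, reference=None):
    """Contrastive-style loss on scalar anomaly scores (illustrative only).

    scores: anomaly scores produced by some encoder (hypothetical here)
    labels: 0 = unlabeled or benign page, 1 = known phishing page
    margin: assumed minimum deviation, in standard deviations, for phishing
    """
    if not scores:
        raise ValueError("scores must not be empty")
    mu, sigma = reference if reference is not None else gaussian_reference()
    total = 0.0
    for score, label in zip(scores, labels):
        deviation = (score - mu) / sigma
        if label == 0:
            # Pull benign/unlabeled pages toward the prior mean.
            total += abs(deviation)
        else:
            # Push known phishing pages at least `margin` deviations above it.
            total += max(0.0, margin - deviation)
    return total / len(scores)
```

Under these assumptions, a benign page scored at the prior mean and a phishing page scored beyond the margin both contribute zero loss, so only the few labeled phishing samples are needed to shape the score distribution.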