
Research Article
Web Bot Detection Based on Hidden Features of HTTP Access Log
@INPROCEEDINGS{10.1007/978-3-031-33458-0_3,
  author={Kaiyuan Li and Mingrong Xiang and Mitalkumar Kakaiya and Shashank Kaul and Xiaodong Wang},
  title={Web Bot Detection Based on Hidden Features of HTTP Access Log},
  proceedings={Tools for Design, Implementation and Verification of Emerging Information Technologies. 17th EAI International Conference, TridentCom 2022, Melbourne, Australia, November 23-25, 2022, Proceedings},
  proceedings_a={TRIDENTCOM},
  year={2023},
  month={6},
  keywords={Web bot, Web bot detection, Unsupervised learning, Machine learning, Cyber security},
  doi={10.1007/978-3-031-33458-0_3}
}
Authors: Kaiyuan Li, Mingrong Xiang, Mitalkumar Kakaiya, Shashank Kaul, Xiaodong Wang
Year: 2023
TRIDENTCOM
Springer
DOI: 10.1007/978-3-031-33458-0_3
Abstract
Web bots generate a large fraction of the traffic on present-day web servers. They not only threaten website security, performance, and user privacy but also raise concerns about valuable information and digital asset scripting. Much research has explored traffic features, labelled legitimate-user and bot traffic, and produced efficient machine-learning models to detect web bots. However, previous machine-learning methods detect web bots from the observable raw data, which has become more challenging as the logic and technologies of web bots grow increasingly diverse and complex. In this research, we propose an autoencoder-based method to detect web bots by distinguishing the HTTP access behaviours of humans from those of web bots. Our method aims to find hidden features in the raw HTTP access data, allowing web bots whose raw features are scattered to be clustered. Furthermore, we use a polar coordinate transformation strategy to rotate the geometry of the hidden features and resolve the clustering difficulties caused by the randomness of the neural network environment. We compare our web bot detection performance with that of competing methods and obtain an improvement of about 30% in accuracy.