
Research Article
Data Balancing Technique Based on AE-Flow Model for Network Instrusion Detection
@INPROCEEDINGS{10.1007/978-3-031-34790-0_14,
  author={Xuanrui Xiong and Yufan Zhang and Huijun Zhang and Yi Chen and Hailing Fang and Wen Xu and Weiqing Lin and Yuan Zhang},
  title={Data Balancing Technique Based on AE-Flow Model for Network Instrusion Detection},
  proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
  proceedings_a={CHINACOM},
  year={2023},
  month={6},
  keywords={Imbalanced data, Deep generative model-Flow, AutoEncoder, Network Intrusion Detection},
  doi={10.1007/978-3-031-34790-0_14}
}
Xuanrui Xiong
Yufan Zhang
Huijun Zhang
Yi Chen
Hailing Fang
Wen Xu
Weiqing Lin
Yuan Zhang
Year: 2023
CHINACOM
Springer
DOI: 10.1007/978-3-031-34790-0_14
Abstract
In network intrusion detection, some attack types occur rarely, so few samples of them are collected and the classes in the dataset are imbalanced. Training a classifier on an imbalanced dataset biases it toward the majority classes and degrades its performance on minority class samples. To address this problem, researchers usually increase the number of minority class samples and reduce the number of majority class samples to obtain a balanced dataset. Following this approach, we propose a data balancing technique based on the AutoEncoder-Flow (AE-Flow) model. First, we use an AutoEncoder (AE) to improve the deep generative model Flow, obtaining AE-Flow, and use it to learn the distribution of minority class samples and generate new samples. Second, we use the K-means and OneSidedSelection (OSS) algorithms to undersample the majority class samples. Finally, we obtain a balanced dataset and use a machine learning (ML) classifier to perform intrusion detection. We conducted comparative experiments on the NSL-KDD dataset. The experimental results show that the balanced dataset obtained by the proposed method improves the recall on minority class samples and the classification performance on the overall samples.
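
The abstract does not say how K-means and OSS are combined, so the following is only a minimal sketch of one common way to undersample a majority class with K-means: cluster the majority samples, then draw from each cluster in proportion to its size. The function names (`kmeans`, `undersample_majority`), the toy data, and the proportional quota rule are all assumptions made for illustration, not the authors' method. The OSS step and the AE-Flow generator are omitted.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples; returns the clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centres from the data
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centre (squared Euclidean distance)
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return clusters


def undersample_majority(majority, target, k=3, seed=0):
    """Keep roughly `target` majority samples, drawn proportionally from each cluster.

    Assumption: proportional per-cluster quotas; the paper may use a different rule.
    """
    rng = random.Random(seed)
    kept = []
    for members in kmeans(majority, k, seed=seed):
        if not members:
            continue
        quota = max(1, round(len(members) * target / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Toy data (hypothetical): 300 majority samples in three blobs, 30 minority samples.
rng = random.Random(42)
majority = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(100)
]
minority = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(30)]

reduced = undersample_majority(majority, target=len(minority), k=3)
print(len(majority), "->", len(reduced), "majority samples;", len(minority), "minority samples")
```

Sampling per cluster rather than uniformly keeps every region of the majority class represented after reduction. Because of rounding, the number of retained samples can differ from `target` by a few.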