
Research Article
Assessing the Quality of Differentially Private Synthetic Data for Intrusion Detection
@INPROCEEDINGS{10.1007/978-3-031-25538-0_25,
  author={Md Ali Reza Al Amin and Sachin Shetty and Valerio Formicola and Martin Otto},
  title={Assessing the Quality of Differentially Private Synthetic Data for Intrusion Detection},
  proceedings={Security and Privacy in Communication Networks. 18th EAI International Conference, SecureComm 2022, Virtual Event, October 2022, Proceedings},
  proceedings_a={SECURECOMM},
  year={2023},
  month={2},
  keywords={Intrusion detection system, Differential privacy, Generative adversarial networks, Data sharing},
  doi={10.1007/978-3-031-25538-0_25}
}
Md Ali Reza Al Amin
Sachin Shetty
Valerio Formicola
Martin Otto
Year: 2023
SECURECOMM
Springer
DOI: 10.1007/978-3-031-25538-0_25
Abstract
Supervised learning is used effectively in Network Intrusion Detection Systems (IDS) to detect malicious activities in Information Technology (IT) and Operational Technology (OT) environments. Sharing high-quality network data among cybersecurity practitioners increases the chance of detecting new threat campaigns by analyzing updated traffic features. Because data sharing raises privacy concerns, Differential Privacy (DP) has emerged as a promising approach to privacy-preserving analytics. This paper presents an approach to generating high-quality synthetic network features using a differentially private Generative Adversarial Network (DP-GAN) based on the DoppelGANger toolset (https://github.com/fjxmlzn/DoppelGANger). We assess the classification performance of several machine learning (ML) models on a privacy-preserved synthetic dataset derived from the NSL-KDD intrusion dataset. Experiments show that ML algorithms achieve high classification accuracy on the synthetic data (95.95%) with a low privacy budget (ε = 6.73), i.e., low success rates for membership inference attacks. Hence, DP-GAN models offer a promising tool for sharing traffic features with bounded loss of privacy.