
Research Article
Poisoning-Attack Detection Using an Auto-encoder for Deep Learning Models
@inproceedings{10.1007/978-3-031-36574-4_22,
  author    = {El Moadine, Anass and Coatrieux, Gouenou and Bellafqira, Reda},
  title     = {Poisoning-Attack Detection Using an Auto-encoder for Deep Learning Models},
  booktitle = {Digital Forensics and Cyber Crime. 13th EAI International Conference, ICDF2C 2022, Boston, MA, November 16--18, 2022, Proceedings},
  series    = {ICDF2C},
  publisher = {Springer},
  year      = {2023},
  month     = {7},
  keywords  = {Deep Learning Model; Poisoning Attack; Anomaly Detection; Auto-Encoder; Flipping Attack; Pattern Addition Attack},
  doi       = {10.1007/978-3-031-36574-4_22}
}
- El Moadine Anass
- Coatrieux Gouenou
- Bellafqira Reda
Year: 2023
ICDF2C
Springer
DOI: 10.1007/978-3-031-36574-4_22
Abstract
Modern Deep Learning (DL) models can be trained in various ways, including incremental learning. The idea is that a model trained on a user's own data will perform better on new data. The model owner can share the model with other users, who then train it on their data and return it to the model owner. However, these users can perform poisoning attacks (PA) by modifying the model's behavior in the attacker's favor. In the context of incremental learning, we are interested in detecting whether a DL model for image classification has undergone a poisoning attack. To perform such an attack, an attacker can, for example, modify the labels of some training data and then fine-tune the model on it, so that the attacked model misclassifies images similar to the attacked ones while maintaining good classification performance on other images. As a countermeasure, we propose a poisoned-model detector capable of detecting various types of PA. This technique exploits the reconstruction error of a machine-learning-based auto-encoder (AE) trained to model the distribution of the activation maps from the second-to-last layer of the model to protect. By analyzing AE reconstruction errors for some given inputs, we demonstrate that a PA can be distinguished from a fine-tuning operation used to improve classification performance. We demonstrate the performance of our method on a variety of architectures and in the context of a DL model for mass cancer detection in mammography images.
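The detection idea described in the abstract can be sketched with a toy example: fit an auto-encoder on the penultimate-layer activations of the clean model, then flag inputs whose reconstruction error exceeds a threshold calibrated on clean data. This is only an illustrative NumPy stand-in, not the authors' implementation; the linear (PCA-like) auto-encoder, the synthetic Gaussian "activations", the latent size, and the 99th-percentile threshold are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for second-to-last-layer activation vectors:
# "clean" activations follow one distribution; activations induced
# by a poisoned model are assumed to shift away from it.
clean = rng.normal(0.0, 1.0, size=(500, 16))
shifted = rng.normal(3.0, 1.0, size=(50, 16))

# Minimal linear auto-encoder (PCA-like), fit on clean activations only.
mean = clean.mean(axis=0)
_, _, Vt = np.linalg.svd(clean - mean, full_matrices=False)
W = Vt[:4]  # encoder/decoder weights: keep 4 latent dimensions

def reconstruction_error(x):
    z = (x - mean) @ W.T   # encode into the latent space
    x_hat = z @ W + mean   # decode back to activation space
    return np.mean((x - x_hat) ** 2, axis=1)

# Calibrate a threshold on clean-data errors (99th percentile here).
tau = np.percentile(reconstruction_error(clean), 99)

# Activations far from the clean distribution reconstruct poorly.
flagged = reconstruction_error(shifted) > tau
print(f"fraction of shifted activations flagged: {flagged.mean():.2f}")
```

In the paper's setting the AE is a learned neural network and the inputs are real activation maps; the point of the sketch is only the decision rule, i.e. that a distribution shift caused by poisoning shows up as an elevated reconstruction error relative to the fine-tuning baseline.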