
Research Article
Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling
@INPROCEEDINGS{10.1007/978-3-031-54528-3_4,
  author={Qian Yao and Xinli Xiong and Peng Wang and Yongjie Wang},
  title={Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 19th EAI International Conference, CollaborateCom 2023, Corfu Island, Greece, October 4-6, 2023, Proceedings, Part II},
  proceedings_a={COLLABORATECOM PART 2},
  year={2024},
  month={2},
  keywords={Deep reinforcement learning; Opponent modeling; FlipIt game; Non-stationary environment},
  doi={10.1007/978-3-031-54528-3_4}
}
- Qian Yao
- Xinli Xiong
- Peng Wang
- Yongjie Wang
Year: 2024
Defeating the Non-stationary Opponent Using Deep Reinforcement Learning and Opponent Modeling
COLLABORATECOM PART 2
Springer
DOI: 10.1007/978-3-031-54528-3_4
Abstract
In the cyber attack and defense process, the opponent's strategy is often dynamic, random, and uncertain. In an advanced persistent threat scenario in particular, it is difficult to capture the behavior strategy of a long-term latent, highly dynamic, and unpredictable opponent. The FlipIt game can model the stealthy interactions of advanced persistent threats, but traditional reinforcement learning approaches are insufficient for solving such a real-time, non-stationary game model. It is therefore essential to model a non-stationary opponent implicitly and to maintain the defense agent's advantage continuously. In this paper, we propose an extended FlipIt game model that incorporates opponent modeling. We then propose an approach that combines deep reinforcement learning, opponent modeling, and the dropout technique to perceive the behavior of a non-stationary opponent and defeat it. Instead of explicitly identifying the opponent's intention, the defense agent observes the opponent's most recent actions from the game environment, stores this information in its knowledge base, perceives the opponent's strategy, and finally makes decisions that maximize its benefits. Our approach performs well whether the opponent adopts traditional, random, or composite strategies. The experimental results demonstrate that our approach can perceive the opponent quickly and maintain its superiority in suppressing the opponent.
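To make the setting concrete, below is a minimal, illustrative sketch of a discrete-time FlipIt-style game in which the defender observes only the opponent's last known move, as the abstract describes. Everything here is an assumption for illustration, not the authors' implementation: the environment class (FlipItEnv), the flip cost and horizon values, the periodic attacker, the tie-breaking rule on simultaneous flips, and the tabular Q-learner, which stands in for the paper's deep reinforcement learning agent with dropout.

import random

class FlipItEnv:
    """Discrete-time FlipIt sketch: whoever flipped most recently controls
    the shared resource and earns 1 reward per step; each flip costs
    flip_cost. All parameters are illustrative assumptions."""
    def __init__(self, flip_cost=0.4, horizon=200):
        self.flip_cost = flip_cost
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.controller = 0           # 0 = defender, 1 = attacker
        self.last_attacker_flip = -1  # revealed to the defender after each move
        return self._obs()

    def _obs(self):
        # The defender observes only the time since the attacker's last
        # known flip, mirroring "observes the opponent's last actions".
        since = self.t - self.last_attacker_flip if self.last_attacker_flip >= 0 else self.t + 1
        return min(since, 50)  # clip to keep the state space finite

    def step(self, defender_flips, attacker_flips):
        rewards = [0.0, 0.0]
        if defender_flips:
            rewards[0] -= self.flip_cost
            self.controller = 0
        if attacker_flips:
            rewards[1] -= self.flip_cost
            self.controller = 1       # assumed tie-break: attacker's flip lands last
            self.last_attacker_flip = self.t
        rewards[self.controller] += 1.0  # controller accrues the per-step benefit
        self.t += 1
        return self._obs(), rewards, self.t >= self.horizon

def periodic_attacker(t, period=7):
    # A stationary baseline opponent that flips every `period` steps;
    # the paper's non-stationary opponents would switch strategies over time.
    return t % period == 0

def train(episodes=300, alpha=0.1, gamma=0.95, eps=0.1):
    env, q = FlipItEnv(), {}  # (obs, action) -> estimated value
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over {wait, flip}.
            if random.random() < eps:
                act = random.randint(0, 1)
            else:
                act = max((0, 1), key=lambda a: q.get((obs, a), 0.0))
            nxt, rew, done = env.step(act == 1, periodic_attacker(env.t))
            best_next = max(q.get((nxt, a), 0.0) for a in (0, 1))
            target = rew[0] + (0.0 if done else gamma * best_next)
            q[(obs, act)] = q.get((obs, act), 0.0) + alpha * (target - q.get((obs, act), 0.0))
            obs = nxt
    return q

if __name__ == "__main__":
    q = train()
    print("learned Q-entries:", len(q))

Against the periodic attacker above, the learner can discover a flipping schedule keyed to the observed gap since the attacker's last flip; replacing periodic_attacker with an opponent that switches strategies mid-game would reproduce the non-stationary setting the paper targets.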