
Research Article
Deep Reinforcement Learning Approaches Against Jammers with Unequal Sweeping Probability Attacks
Lan Nguyen
Duy Nguyen
Nghi Tran
David Brunnenmeyer
Year: 2025
Deep Reinforcement Learning Approaches Against Jammers with Unequal Sweeping Probability Attacks
INIS
EAI
DOI: 10.4108/eetinis.v12i4.10461
Keywords: Jamming Attacks, Markov Decision Process, Double Deep Q-Networks, Data Rate Game, Q-learning, Reinforcement Learning, Deep Q-Networks
Abstract
This paper investigates deep reinforcement learning (DRL) approaches designed to counter jammers that maximize disruption by employing unequal sweeping probabilities. We first propose a defense model and action strategy based on a Markov Decision Process (MDP) under non-uniform attacks. A key drawback of the standard MDP model, however, is its assumption that the defending agent can acquire sufficient information about the jamming patterns to determine the transition probability matrix. In a dynamic environment, the attacker’s patterns and models are often unknown or difficult to obtain. To overcome this limitation, RL techniques such as Q-learning, deep Q-network (DQN), and double deep Q-network (DDQN) have been considered effective defense strategies that operate without an explicit jamming model. Q-learning-based defense strategies, however, can still be computationally expensive and require a long time to learn the optimal policy, because a large state space or a substantial number of actions causes the Q-table to grow exponentially. Leveraging the flexibility, adaptability, and scalability of RL, we then propose a DQN framework designed to handle large-scale action spaces across expanded channels and jammers. Furthermore, to overcome the inherent overestimation bias present in Q-learning and DQN algorithms, we investigate a DDQN framework. Assuming the estimation error of the action value in DQN follows a zero-mean Gaussian distribution, we then analytically derive the expected loss. Numerical examples are finally presented to characterize the performance of the proposed algorithms and demonstrate the superiority of DDQN over the DQN and Q-learning approaches.
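To illustrate the distinction the abstract draws between DQN and DDQN, the sketch below contrasts the two bootstrap targets. It is a minimal, hypothetical example and not the authors' implementation: the function names (dqn_target, ddqn_target), the use of numpy, the action count n_actions standing in for candidate channels, and the unit-variance zero-mean Gaussian estimation noise are all assumptions made for illustration only.

import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99, done=False):
    # Standard DQN target: max over the target network's action values.
    # The max operator both selects and evaluates an action with the same
    # noisy estimates, which is the source of the overestimation bias.
    if done:
        return reward
    return reward + gamma * np.max(q_target_next)

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    # Double DQN target: the online network selects the action and the
    # target network evaluates it, decoupling selection from evaluation.
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))          # action chosen by online net
    return reward + gamma * q_target_next[a_star]   # value taken from target net

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions = 8                       # e.g., one action per candidate channel (assumed)
    true_q = np.zeros(n_actions)        # suppose every next-state action value is truly 0
    # Zero-mean Gaussian estimation error, mirroring the assumption in the abstract
    q_online_next = true_q + rng.normal(0.0, 1.0, n_actions)
    q_target_next = true_q + rng.normal(0.0, 1.0, n_actions)
    r = 0.0
    print("DQN target :", dqn_target(r, q_target_next))
    print("DDQN target:", ddqn_target(r, q_online_next, q_target_next))

Averaged over many noise draws in this toy setting, the DQN target sits above the true value of zero because it takes the maximum of noisy estimates, whereas the DDQN target is unbiased since action selection and evaluation use independent estimates; this is the overestimation effect the DDQN framework is intended to mitigate.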
Copyright © 2025 Lan K. Nguyen et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.


