sesa 21(27): e2

Research Article

Zero-Trust Based Distributed Collaborative Dynamic Access Control Scheme with Deep Multi-Agent Reinforcement Learning

  • @ARTICLE{10.4108/eai.25-6-2021.170246,
        author={Qiuqing Jin and Liming Wang},
        title={Zero-Trust Based Distributed Collaborative Dynamic Access Control Scheme with Deep Multi-Agent Reinforcement Learning},
        journal={EAI Endorsed Transactions on Security and Safety},
        volume={8},
        number={27},
        publisher={EAI},
        journal_a={SESA},
        year={2020},
        month={12},
        keywords={Zero-Trust, Insider Threats, Dynamic Access Control, Reinforcement Learning},
        doi={10.4108/eai.25-6-2021.170246}
    }
    
  • Qiuqing Jin
    Liming Wang
    Year: 2020
    Zero-Trust Based Distributed Collaborative Dynamic Access Control Scheme with Deep Multi-Agent Reinforcement Learning
    SESA
    EAI
    DOI: 10.4108/eai.25-6-2021.170246
Qiuqing Jin1,2,*, Liming Wang
  • 1: Institute of Information Engineering, Chinese Academy of Sciences
  • 2: University of Chinese Academy of Sciences
*Contact email: jinqiuqing@iie.ac.cn

Abstract

The vast majority of organizations and companies depend heavily on intranets with access control to provide secure data access and authorized resource sharing across departments and networks. However, traditional boundary defense has difficulty mitigating the growing number of threats and attacks, most of which originate from insiders. Common insider threat solutions decouple detection from defense, and therefore require domain knowledge and human intervention to achieve mitigation after the protection. Moreover, these static methods cannot dynamically monitor diverse anomalous events and take corresponding protective measures. In this paper, we present a Zero-Trust based collaborative dynamic access control scheme that rebuilds a secure network architecture from the traffic scheduling perspective for insider threat mitigation. The scheme tightly combines anomaly detection and mitigation execution: it constructs a dynamically updated user trust profile as the evidence for access control, and collaboratively adjusts the mitigation policy in response to any subtle changes in requirements and environment, in a scalable, distributed way. We use the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to optimize the traffic allocation policy, yielding an adaptive and automatic collaborative management scheme that takes network security, the network environment, and user requirements into account. The performance of the scheme is analyzed with a network simulator, and the results are promising for applying deep reinforcement learning (DRL) to threat mitigation.