AIRS: Explanation for Deep Reinforcement Learning based Security Applications


Authors: Jiahao Yu, Wenbo Guo, Qi Qin, Gang Wang, Ting Wang, Xinyu Xing

Recently, we have witnessed the success of deep reinforcement learning (DRL) in many security applications, ranging from malware mutation to selfish blockchain mining. Like all other machine learning methods, the lack of explainability has been limiting its broad adoption as users have difficulty establishing trust in DRL models' decisions. Over the past years, different methods have been proposed to explain DRL models but unfortunately, they are often not suitable for security applications, in which explanation fidelity, efficiency, and the capability of model debugging are largely lacking. In this work, we propose AIRS, a general framework to explain deep reinforcement learning-based security applications. Unlike previous works that pinpoint important features to the agent's current action, our explanation is at the step level. It models the relationship between the final reward and the key steps that a DRL agent takes, and thus outputs the steps that are most critical towards the final reward the agent has gathered. Using four representative security-critical applications, we evaluate AIRS from the perspectives of explainability, fidelity, stability, and efficiency. We show that AIRS could outperform alternative explainable DRL methods. We also showcase AIRS's utility, demonstrating that our explanation could facilitate the DRL model's failure offset, help users establish trust in a model decision, and even assist the identification of inappropriate reward designs.

Subject: USENIX-Sec.2023 - Fall

yu-jiahao@usenixsecurity23@USENIX
