26948@AAAI

Total: 1

#1 Know Your Enemy: Identifying Adversarial Behaviours in Deep Reinforcement Learning Agents (Student Abstract) [PDF] [Copy] [Kimi]

Authors: Seán Caulfield Curley ; Karl Mason ; Patrick Mannion

It has been shown that an agent can be trained with an adversarial policy which achieves high degrees of success against a state-of-the-art DRL victim despite taking unintuitive actions. This prompts the question: is this adversarial behaviour detectable through the observations of the victim alone? We find that widely used classification methods such as random forests are only able to achieve a maximum of ≈71% test set accuracy when classifying an agent for a single timestep. However, when the classifier inputs are treated as time-series data, test set classification accuracy is increased significantly to ≈98%. This is true for both classification of episodes as a whole, and for “live” classification at each timestep in an episode. These classifications can then be used to “react” to incoming attacks and increase the overall win rate against Adversarial opponents by approximately 17%. Classification of the victim’s own internal activations in response to the adversary is shown to achieve similarly impressive accuracy while also offering advantages like increased transferability to other domains.