Enhancing RL Safety with Counterfactual LLM Reasoning

2409.10188

Authors: Dennis Gross, Helge Spieker

Reinforcement learning (RL) policies may exhibit unsafe behavior and are hard to explain. We use counterfactual large language model reasoning to enhance RL policy safety post-training. We show that our approach improves RL policy safety and helps to explain it.

Subject: Machine Learning

Published: 2024-09-16 11:30:39 UTC
