Beyond Reward: A Bounded Measure of Agent Environment Coupling

Authors: Wael Hafez, Cameron Reid, Amit Nazeri

Real-world reinforcement learning (RL) agents operate in closed-loop systems where actions shape future observations, making reliable deployment under distribution shifts a persistent challenge. Existing monitoring relies on reward or task metrics, which capture outcomes but miss early coupling failures. We introduce bipredictability (P), the ratio of shared information in the observation-action-outcome loop to the total available information: a principled, real-time measure of interaction effectiveness with provable bounds that is comparable across tasks. An auxiliary monitor, the Information Digital Twin (IDT), computes P and its diagnostic components from the interaction stream. We evaluate SAC and PPO agents on MuJoCo HalfCheetah under eight agent-side and environment-side perturbations across 168 trials. Under nominal operation, agents exhibit P = 0.33 ± 0.02, below the classical bound of 0.5, revealing an informational cost of action selection. The IDT detects 89.3% of perturbations versus 44.0% for reward-based monitoring, with 4.4x lower median latency. Bipredictability enables early detection of interaction degradation before performance drops and provides a prerequisite signal for closed-loop self-regulation in deployed RL systems.
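The abstract does not state the formula for P, so the following is only an illustrative sketch under an assumption: for two discrete variables X and Y, the ratio I(X;Y) / (H(X) + H(Y)) is one "shared over total information" measure whose maximum is 0.5, which matches the classical bound quoted above. The function names `entropy` and `two_variable_ratio` are hypothetical, and the paper's three-way observation-action-outcome definition may differ.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def two_variable_ratio(xs, ys):
    """Assumed two-variable analogue of P: I(X;Y) / (H(X) + H(Y)).

    This is NOT taken from the paper; it is a stand-in that is bounded
    in [0, 0.5] because I(X;Y) <= min(H(X), H(Y)).
    """
    h_x = entropy(xs)
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    total = h_x + h_y
    if total == 0:
        return 0.0  # both streams are constant; no information to share
    mutual_information = h_x + h_y - h_xy
    return mutual_information / total


# Perfectly coupled streams reach the upper bound; independent ones give zero.
coupled = two_variable_ratio([0, 1, 0, 1], [0, 1, 0, 1])
independent = two_variable_ratio([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy setting, a value well below 0.5 would indicate that part of the available information is not shared between the two streams, which is the kind of gap the abstract attributes to action selection.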

Subjects: Artificial Intelligence, Machine Learning

Publish: 2026-03-01 21:38:39 UTC

2603.01283
