Value-Based Deep RL Scales Predictably

Authors: Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar

Scaling data and compute is critical in modern machine learning. However, scaling also demands _predictability_: we want methods not only to perform well with more compute or data, but also to have their performance be predictable from low-compute or low-data runs, without ever running the large-scale experiment. In this paper, we show that value-based off-policy deep RL scales predictably. First, we show that the data and compute requirements to reach a given performance level lie on a _Pareto frontier_, controlled by the updates-to-data (UTD) ratio. By estimating this frontier, we can extrapolate data requirements into a higher-compute regime, and compute requirements into a higher-data regime. Second, we determine the optimal allocation of a total _budget_ across data and compute to obtain a given performance, and use it to determine hyperparameters that maximize performance for a given budget. Third, this scaling behavior is enabled by first estimating predictable relationships between different _hyperparameters_, which are used to counteract the effects of overfitting and plasticity loss unique to RL. We validate our approach using three algorithms (SAC, BRO, and PQL) on DeepMind Control, OpenAI Gym, and IsaacGym, when extrapolating to higher levels of data, compute, budget, or performance.
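
To make the extrapolation idea concrete, the following is a minimal sketch, not the authors' method. It assumes, purely for illustration, that the data needed to reach a fixed performance level follows a simple power law in the UTD ratio, fits that law on low-UTD runs, and extrapolates to a higher UTD. The abstract does not state the functional form of the frontier, so the power law, the helper names `fit_power_law` and `predict_data`, and all numbers below are hypothetical.

```python
import numpy as np


def fit_power_law(utd, data_needed):
    """Fit data_needed ~= a * utd**(-b) by least squares in log-log space.

    ASSUMPTION: a pure power law; the abstract does not give the actual form.
    """
    slope, intercept = np.polyfit(np.log(utd), np.log(data_needed), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_data(utd, a, b):
    """Evaluate the fitted power law at one or more UTD ratios."""
    return a * np.asarray(utd, dtype=float) ** (-b)


# Hypothetical, noiseless measurements: environment steps needed to reach
# a fixed performance target at several small UTD ratios.
utd_small = np.array([1.0, 2.0, 4.0, 8.0])
data_small = 2.0e5 * utd_small ** (-0.4)

a_hat, b_hat = fit_power_law(utd_small, data_small)

# Extrapolate the data requirement to a UTD ratio that was never run.
predicted_at_16 = float(predict_data(16.0, a_hat, b_hat))

# The UTD ratio is gradient updates per environment step, so the number of
# updates (a rough compute proxy) at that operating point is UTD * data.
updates_at_16 = 16.0 * predicted_at_16
```

Under these assumptions, sweeping the UTD ratio traces out one data-versus-updates curve per performance target. Real measurements would be noisy, and a richer functional form may fit better.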

Subject: ICML.2025 - Poster

FLPFPYJeVU@OpenReview