Reinforcement Learning Foundation Models Should Already Be A Thing

Authors: Abdelrahman Zighem, Jill-Jênn Vie

Foundation models for language and vision are powered by internet-scale data, whereas structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The substitute is synthetic data, which shifts the burden from data collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on data drawn from a synthetic Bayesian prior. We make two points. First, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. Second, MDPs admit a fixed-size sufficient statistic, tabular in shape and independent of the number of episodes observed, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together, these two points define the agenda for an RL foundation model. As a proof of concept, we train a single model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning; offline, competitively with VI-LCB.
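The two points above can be illustrated with a minimal sketch. It is not the authors' method: the abstract specifies neither the prior nor the statistic, so the Dirichlet/Gaussian prior, the uniform-random behaviour policy, and the helper names (`sample_mdp`, `collect_counts`, `to_table`) are all assumptions made for illustration. The sketch samples a small synthetic MDP, rolls out episodes, and summarises them as transition counts and reward sums, one natural candidate for a statistic whose shape depends only on the number of states and actions.

```python
import numpy as np


def sample_mdp(n_states, n_actions, rng):
    """Draw a random tabular MDP from an assumed toy prior (not the paper's)."""
    # One Dirichlet-distributed next-state distribution per (state, action).
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (S, A, S)
    # Deterministic reward per (state, action), drawn from a standard normal.
    R = rng.normal(size=(n_states, n_actions))  # (S, A)
    return P, R


def collect_counts(P, R, n_episodes, horizon, rng):
    """Roll out a uniform-random policy and accumulate counts and reward sums."""
    S, A, _ = P.shape
    counts = np.zeros((S, A, S))   # visits of each (s, a, s') transition
    reward_sum = np.zeros((S, A))  # total reward observed at each (s, a)
    for _ in range(n_episodes):
        s = rng.integers(S)
        for _ in range(horizon):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            reward_sum[s, a] += R[s, a]
            s = s_next
    return counts, reward_sum


def to_table(counts, reward_sum):
    """Flatten the statistic into one row per (s, a) pair: S count columns + 1 reward column."""
    S, A, _ = counts.shape
    return np.concatenate(
        [counts.reshape(S * A, S), reward_sum.reshape(S * A, 1)], axis=1
    )


rng = np.random.default_rng(0)
P, R = sample_mdp(n_states=3, n_actions=2, rng=rng)
small = to_table(*collect_counts(P, R, n_episodes=10, horizon=5, rng=rng))
large = to_table(*collect_counts(P, R, n_episodes=100, horizon=5, rng=rng))
print(small.shape, large.shape)
```

With 3 states and 2 actions, both tables have 6 rows and 4 columns whether they summarise 10 episodes or 100; only the entries grow. A table of this kind is the sort of input a tabular attention architecture could consume, with a policy head in place of the supervised target.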

Subjects: Machine Learning, Artificial Intelligence

Publish: 2026-06-17 08:27:27 UTC

2606.18812