We study reinforcement learning in non-stationary communicating MDPs whose transition drift admits a low-rank plus sparse structure. We propose SVUCRL (Structured Variation UCRL) and prove the dynamic-regret bound $\widetilde{\mathcal O}\bigl( D_{\max} S \sqrt{AT} + D_{\max}\sqrt{(B_r + B_p) K S T} + D_{\max}\,\delta_B\,B_p \bigr)$, where $S$ is the number of states, $A$ the number of actions, $T$ the horizon, $D_{\max}$ the MDP diameter, $B_r$ and $B_p$ the total reward and transition variation budgets, and $K \le SA$ the rank of the structured drift. The first term is the statistical price of learning in stationary problems; the second is the \emph{non-stationarity price}, which scales with $\sqrt{K}$ rather than $\sqrt{SA}$ when the drift is low-rank. The bound matches the $\sqrt{T}$ rate (up to logarithmic factors) and improves on prior $T^{3/4}$-type guarantees. SVUCRL combines: (i) online low-rank tracking with explicit Frobenius-norm guarantees, (ii) incremental RPCA to separate structured drift from sparse shocks, (iii) adaptive confidence widening via a bias-corrected local-variation estimator, and (iv) factor forecasting with an optimal shrinkage center.
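To make component (ii) concrete, the following is a minimal sketch of a low-rank plus sparse split of a drift matrix. It is \emph{not} the paper's incremental RPCA: it is a generic batch alternating-thresholding heuristic (singular-value thresholding for the low-rank part, entrywise soft-thresholding for the sparse part), shown only to illustrate the decomposition $\Delta \approx L + E$ with $\operatorname{rank}(L) \le K$ and $E$ sparse. The names (\texttt{rpca\_split}) and the threshold choices (\texttt{tau}, \texttt{lam}) are illustrative assumptions.

\begin{verbatim}
import numpy as np

def soft_threshold(x, t):
    # Entrywise soft-thresholding: shrink magnitudes by t, zero out the rest.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svd_threshold(x, t):
    # Singular-value soft-thresholding (prox of the nuclear norm).
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(soft_threshold(s, t)) @ vt

def rpca_split(delta, lam=None, tau=None, n_iter=100):
    # Heuristic batch split delta ~ L (low-rank drift) + E (sparse shocks).
    # NOT the paper's incremental algorithm; thresholds are illustrative.
    m, n = delta.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # standard RPCA-style weight
    if tau is None:
        tau = 0.1 * np.linalg.norm(delta, ord=2)  # fraction of spectral norm
    L = np.zeros_like(delta)
    E = np.zeros_like(delta)
    for _ in range(n_iter):
        L = svd_threshold(delta - E, tau)   # update structured (low-rank) part
        E = soft_threshold(delta - L, lam * np.abs(delta).max())  # sparse part
    return L, E

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rank-2 "structured drift" plus a few large sparse "shocks".
    U = rng.normal(size=(50, 2))
    V = rng.normal(size=(2, 60))
    L_true = U @ V
    E_true = np.zeros((50, 60))
    idx = rng.choice(50 * 60, size=30, replace=False)
    E_true.flat[idx] = rng.normal(scale=10.0, size=30)
    L_hat, E_hat = rpca_split(L_true + E_true)
    print("numerical rank of L_hat:", np.linalg.matrix_rank(L_hat, tol=1e-6))
    print("relative error on L:",
          np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
\end{verbatim}

In the non-stationary MDP setting, $\Delta$ would play the role of an estimated change in the transition kernel between epochs; the low-rank factor $L$ drives the $\sqrt{K}$ term in the bound, while the sparse residual $E$ is handled separately.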