Total: 1
Contextual sequential decision-making is fundamental to machine learning, with applications in bandits, sequential hypothesis testing, and online risk control. These tasks often rely on statistical measures like expectation, variance, and quantiles. In this paper, we propose a universal algorithmic framework that learns the full underlying distribution, enabling a unified approach to all contextual online decision-making problems. The challenge lies in the uncountably infinite-dimensional regression, where existing contextual bandit algorithms all yield infinite regret. We innovatively propose an efficient infinite-dimensional functional regression oracle for contextual cumulative distribution functions (CDFs) and model every datum as a combination of context-dependent CDF basis functions. Our analysis reveals that the decay rate of the eigenvalue sequence of the design integral operator governs the regression error rate, and consequently, the utility regret rate. Specifically, when the eigenvalue sequence exhibits a polynomial decay of order $\frac{1}{\gamma}\ge 1$, the utility regret is bounded by $\tilde{O}( T^{\frac{3\gamma+2}{2(\gamma+2)}})$. The case that $\gamma=0$ can recover the existing optimal rate in contextual bandits literature with finite-dimensional regression and so as exponential decay. We also provide a numerical method to compute the eigenvalue sequence of integral operators, enabling the practical implementation of our framework.