Pareto Optimal Risk-Agnostic Distributional Bandits with Heavy-Tail Rewards

Authors: Kyungjae Lee, Dohyeong Kim, Taehyun Cho, Chaeyeon Kim, Yunkyung Ko, Seungyub Han, Seokhun Ju, Dohyeok Lee, Sungbin Lim

This paper studies multi-armed bandits with heavy-tailed rewards in a setting that is agnostic to the risk measure and may involve multiple risk measures. We propose a framework that leverages novel deviation inequalities for the $1$-Wasserstein distance to construct confidence intervals for Lipschitz risk measures. We introduce the distributional LCB (DistLCB) algorithm and establish its asymptotic optimality by deriving the first lower bounds for risk-measure-aware bandits with explicit dependence on the sub-optimality gaps. We further extend DistLCB to multi-risk objectives, which enables Pareto-optimal solutions that account for multiple aspects of the reward distributions. Additionally, we provide a regret analysis with both gap-dependent and gap-independent bounds for the multi-risk setting. Experiments validate the effectiveness of the proposed methods in synthetic and real-world applications.
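To make the link between the $1$-Wasserstein distance and Lipschitz risk measures concrete, here is a minimal sketch. It is not the paper's DistLCB algorithm. It assumes lower-tail CVaR at level `alpha` as the example risk measure (it is $1/\alpha$-Lipschitz with respect to the $1$-Wasserstein distance) and takes the confidence radius as a caller-supplied number, since the abstract does not give the paper's deviation inequality. The function names `wasserstein1`, `lower_tail_cvar`, and `risk_interval` are hypothetical.

```python
import random


def wasserstein1(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples.

    For equal sample sizes this is the mean absolute difference
    between the sorted values (matched order statistics).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def lower_tail_cvar(xs, alpha):
    """Empirical lower-tail CVaR: mean of the worst alpha-fraction of rewards."""
    k = round(alpha * len(xs))
    if k < 1:
        raise ValueError("alpha * len(xs) must be at least 1")
    return sum(sorted(xs)[:k]) / k


def risk_interval(xs, alpha, radius):
    """Interval for the true CVaR, given a Wasserstein radius.

    If the true distribution lies within `radius` of the empirical one in
    1-Wasserstein distance, the 1/alpha-Lipschitz property places the true
    CVaR inside this interval. `radius` is an assumed input here, not the
    paper's bound.
    """
    lipschitz = 1.0 / alpha
    center = lower_tail_cvar(xs, alpha)
    return center - lipschitz * radius, center + lipschitz * radius


# Illustrative check of the Lipschitz property on heavy-tailed (Pareto) samples.
rng = random.Random(0)
n, alpha = 1000, 0.1
xs = [rng.paretovariate(2.5) for _ in range(n)]
ys = [rng.paretovariate(2.5) for _ in range(n)]
gap = abs(lower_tail_cvar(xs, alpha) - lower_tail_cvar(ys, alpha))
bound = wasserstein1(xs, ys) / alpha
print(f"CVaR gap = {gap:.4f}, Lipschitz bound = {bound:.4f}")
```

In a bandit loop, one would presumably compute such an interval per arm and shrink `radius` as the arm's sample count grows; how the radius is chosen under heavy tails is exactly what the paper's deviation inequalities address and is left open here.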

Subject: NeurIPS.2025 - Poster

q8oLLyA34Q@OpenReview