VPPqfZWsuO@OpenReview

Total: 1

#1 Improved Confidence Regions and Optimal Algorithms for Online and Offline Linear MNL Bandits [PDF] [Copy] [Kimi] [REL]

Authors: Yuxuan Han, Jose Blanchet, Zhengyuan Zhou

In this work, we consider the data-driven assortment optimization problem under the linear multinomial logit(MNL) choice model. We first establish a improved confidence region for the maximum likelihood estimator (MLE) of the $d$-dimensional linear MNL likelihood function that removes the explicit dependency on a problem-dependent parameter $\kappa^{-1}$ in previous result (Oh and Iyengar, 2021), which scales exponentially with the radius of the parameter set. Building on the confidence region result, we investigate the data-driven assortment optimization problem in both offline and online settings. In the offline setting, the previously best-known result scales as $\tilde{O}\left(\sqrt{\frac{d}{\kappa n_{S^\star}}}\right)$, where $n_{S^\star}$ the number of times that optimal assortment $S^\star$ is observed (Dong et al., 2023). We propose a new pessimistic-based algorithm that, under a burn-in condition, removes the dependency on $d,\kappa^{-1}$ in the leading order bound and works under a more relaxed coverage condition, without requiring the exact observation of $S^\star$. In the online setting, we propose the first algorithm to achieve $\tilde{O}(\sqrt{dT})$ regret without a multiplicative dependency on $\kappa^{-1}$. In both settings, our results nearly achieve the corresponding lower bound when reduced to the canonical $N$-item MNL problem, demonstrating their optimality.

Subject: NeurIPS.2025 - Poster