We revisit the contextual cascading bandit problem, in which a learning agent recommends an ordered list (\emph{cascade}) of items and a user scans the list sequentially, stopping at the first attractive item. Although cascading bandits underpin applications such as recommender systems and search engines, the role of the cascade length $K$ in shaping regret has remained unclear. Contrary to prior results suggesting that regret grows with $K$, we prove that regret in fact \emph{decreases} once $K$ is sufficiently large. Leveraging this insight, we design a new upper-confidence-bound algorithm built on online mirror descent that attains the sharpest known regret upper bound for contextual cascading bandits, $\tilde{\mathcal{O}}\bigl(\min \lbrace K\bar{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$, where $d$ is the feature dimension and $T$ is the horizon. To complement this upper bound, we establish a nearly matching lower bound of $\Omega \bigl(\min \lbrace K\underline{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$, where $0 \leq \underline{p} \leq \bar{p} < 1$. Together, these results fully characterize how regret scales with $K$, closing the theoretical gap for contextual cascading bandits. Finally, comprehensive experiments validate our theoretical results and demonstrate the effectiveness of our proposed method.
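As a quick sanity check on the claimed $K$-dependence (a standard calculus observation, assuming $0 < \bar{p} < 1$; not a formal result of the paper), differentiating the $K$-dependent factor of the upper bound shows it is eventually decreasing:
\[
\frac{\mathrm{d}}{\mathrm{d}K}\, K\bar{p}^{K-1}
= \bar{p}^{K-1}\bigl(1 + K \ln \bar{p}\bigr) < 0
\quad \text{whenever } K > \frac{1}{\ln(1/\bar{p})},
\]
so the factor $K\bar{p}^{K-1}$ strictly decreases once $K$ exceeds $1/\ln(1/\bar{p})$, consistent with the claim that regret shrinks for sufficiently long cascades.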