We study the 'feasible action search' (FAS) problem for linear bandits, in which a learner attempts to discover a feasible point for a set of linear constraints $\Phi_* a \ge 0$ without knowledge of the matrix $\Phi_* \in \mathbb{R}^{m \times d}$. A FAS learner selects a sequence of actions $a_t$ and uses observations of the form $\Phi_* a_t + \mathrm{noise}$ to either find a point with nearly optimal 'safety margin' or detect that the constraints are infeasible, where the safety margin of an action measures its (signed) distance from the constraint boundary. While of interest in its own right, the FAS problem also directly addresses a key deficiency in the extant theory of 'safe linear bandits' (SLBs) by discovering a safe initialisation for low-regret SLB methods.

We propose and analyse a novel, efficient FAS learner. Our method, FAST, is based on Thompson Sampling: it applies a _coupled_ random perturbation to an estimate of $\Phi_*$ and plays a maximin point of the game induced by this perturbed matrix. We prove that FAST stops in $\tilde{O}(d^3/(\varepsilon^2 M_*^2))$ steps and incurs $\tilde{O}(d^3/|M_*|)$ safety cost to either correctly detect infeasibility or output a point that is at least $(1-\varepsilon) M_*$-safe, where $M_*$ is the _optimal safety margin_ of $\Phi_*$. Further, instantiating prior SLB methods with the output of FAST yields the first SLB algorithms that incur $\tilde{O}(\sqrt{d^3 T/M_*^2})$ regret and $O(1)$ risk without a priori knowledge of a safe action. The main technical novelty lies in extending Thompson Sampling to this multiobjective setting, for which we both propose a coupled noise design and provide an analysis that avoids convexity considerations.
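Under one natural formalisation of these terms (an assumption on our part: the rows $\phi_i$ of $\Phi_*$ are taken unit-norm, so that inner products coincide with signed distances to the constraint boundaries, and $\mathcal{A}$ denotes the learner's action set), the margin of an action and the optimal margin read

$$M(a) \;=\; \min_{i \in [m]} \langle \phi_i, a \rangle, \qquad M_* \;=\; \max_{a \in \mathcal{A}} M(a),$$

so the constraints are strictly feasible exactly when $M_* > 0$, and a point $a$ is $(1-\varepsilon)M_*$-safe when $M(a) \ge (1-\varepsilon) M_*$.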
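To make the perturb-then-optimise step concrete, here is a minimal Python sketch of one FAST-style round. The action set (the unit $\ell_\infty$ ball), the function names, and the specific coupling (a single shared Gaussian direction scaled per row) are illustrative assumptions standing in for, and not necessarily matching, the paper's precise design.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_point(Phi, bound=1.0):
    """Return a maximin point of the game a -> min_i (Phi a)_i over the box
    ||a||_inf <= bound, via the standard LP: max t s.t. Phi a >= t * 1."""
    m, d = Phi.shape
    c = np.zeros(d + 1)
    c[-1] = -1.0                                   # linprog minimises, so minimise -t
    A_ub = np.hstack([-Phi, np.ones((m, 1))])      # encodes t - (Phi a)_i <= 0
    b_ub = np.zeros(m)
    bounds = [(-bound, bound)] * d + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    a, t = res.x[:d], res.x[-1]
    return a, t                                    # t = margin of a under Phi

def fast_round(Phi_hat, row_scales, rng):
    """One illustrative perturb-then-optimise round: every row of the estimate
    is shifted by the SAME Gaussian direction g (the 'coupling'), scaled per
    row; in the actual method the scaling would reflect per-row uncertainty."""
    d = Phi_hat.shape[1]
    g = rng.standard_normal(d)                     # single shared perturbation
    Phi_tilde = Phi_hat + row_scales[:, None] * g  # coupled row-wise shift
    return maximin_point(Phi_tilde)

rng = np.random.default_rng(0)
Phi_hat = np.array([[1.0, 0.2], [-0.3, 1.0]])      # toy 2x2 estimate of Phi_*
a, margin = fast_round(Phi_hat, row_scales=np.full(2, 0.1), rng=rng)
print(a, margin)                                   # candidate action and its sampled margin
```

The LP reformulation (maximise $t$ subject to $\Phi a \ge t\mathbf{1}$) is the standard way to compute a maximin point of the learner-versus-constraint game. A stopping rule, not shown here, would then use such rounds to decide between declaring infeasibility and returning a near-optimal-margin candidate, as the abstract describes.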