Unsolved

Regret Bounds for Noise-Free Kernel-Based Bandits

Posed by Sattar Vakili (2022)

§ Problem Statement

Setup

Let $\mathcal{X}=[0,1]^d\subset\mathbb{R}^d$, let $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a known positive definite kernel with RKHS $(\mathcal{H}_k,\|\cdot\|_{\mathcal{H}_k})$, and fix a known radius $C_k>0$. An unknown deterministic function $f\in\mathcal{H}_k$ satisfies $\|f\|_{\mathcal{H}_k}\le C_k$. At each round $t=1,2,\dots,N$, an algorithm $A$ (possibly randomized) chooses $x_t\in\mathcal{X}$ measurably as a function of the past and then observes the exact value $y_t=f(x_t)$ (noise-free feedback). Let $x^*\in\arg\max_{x\in\mathcal{X}} f(x)$ and define the cumulative regret

$$R_N(A,f)=\sum_{t=1}^N\bigl(f(x^*)-f(x_t)\bigr).$$

Write $\tilde O(\cdot)$ for rates that hide factors polylogarithmic in $N$.
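The protocol above is easy to simulate. A minimal sketch, assuming a toy smooth test function and a naive uniform-grid strategy (both illustrative choices, not part of the problem statement):

```python
import numpy as np

def f(x):
    # Toy smooth function on [0, 1]; stands in for an RKHS member with small norm.
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

def run_uniform_grid(N):
    """Play an N-point uniform grid and return the cumulative regret R_N."""
    xs = np.linspace(0.0, 1.0, N)
    f_star = f(np.linspace(0.0, 1.0, 10_001)).max()  # fine-grid proxy for f(x*)
    # Round t: choose x_t, observe y_t = f(x_t) exactly (noise-free feedback).
    return float(np.sum(f_star - f(xs)))

print(run_uniform_grid(100))
```

A non-adaptive design like this pays regret growing linearly in $N$ on a generic $f$; the open problem asks how much adaptivity can buy when observations are exact.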

Unsolved Problem

Problem 2022. Determine the minimax optimal growth rate

$$\inf_A\ \sup_{\|f\|_{\mathcal{H}_k}\le C_k} R_N(A,f)$$

as a function of $N$ in the noise-free kernel-based bandit setting, and give matching algorithmic upper bounds and information-theoretic lower bounds. In particular, when $k$ is a (stationary, isotropic) Matérn kernel on $[0,1]^d$ with smoothness parameter $\nu>0$, determine the smallest exponent $\alpha=\alpha(d,\nu)$ such that some algorithm $A$ achieves $\sup_{\|f\|_{\mathcal{H}_k}\le C_k} R_N(A,f)=\tilde O(N^{\alpha})$, and prove a matching lower bound (up to polylog factors).

More specifically, prove or refute the COLT 2022 conjecture that (up to polylogarithmic factors)

$$\sup_{\|f\|_{\mathcal{H}_k}\le C_k} R_N(A,f)=
\begin{cases}
O\bigl(N^{(d-\nu)/d}\bigr) & \text{if } d>\nu,\\
O(\log N) & \text{if } d=\nu,\\
O(1) & \text{if } d<\nu.
\end{cases}$$
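As a quick numeric illustration, the conjectured trichotomy can be tabulated next to the exponent implied by noisy-style analyses (a sketch; the $(d,\nu)$ pairs below are arbitrary examples):

```python
def noisy_style_exponent(d, nu):
    """Regret exponent alpha in O(N^alpha) implied by noisy-analysis bounds for Matérn."""
    return (nu + d) / (2 * nu + d)

def conjectured_rate(d, nu):
    """Conjectured noise-free Matérn regret rate (up to polylog factors)."""
    if d > nu:
        return f"N^{(d - nu) / d:.3f}"
    if d == nu:
        return "log N"
    return "O(1)"

for d, nu in [(4, 1.0), (2, 2.0), (1, 2.5)]:
    print(d, nu, conjectured_rate(d, nu),
          f"noisy-style N^{noisy_style_exponent(d, nu):.3f}")
```

Note that for $d>\nu$ the noisy-style exponent $(\nu+d)/(2\nu+d)$ always exceeds the conjectured $(d-\nu)/d$ (their cross-multiplied difference is $2\nu^2>0$), which is exactly the gap the problem asks to close.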


§ Significance & Implications

In the noisy RKHS/GP bandit model, regret rates are now characterized (up to log factors) via information-gain-type complexity terms, but Vakili (COLT 2022) highlights that transferring those analyses to exact (noise-free) observations leaves a substantial, kernel-dependent gap and may even fail to guarantee sublinear regret for some kernels. Pinning down the minimax noise-free rate for Matérn kernels would (i) identify the true exploration cost when observations are exact, (ii) clarify whether cumulative regret can be bounded by $O(\log N)$ or $O(1)$ once the smoothness $\nu$ meets or exceeds the dimension $d$, and (iii) provide a target rate that future deterministic Bayesian-optimization-style algorithms and lower-bound proofs must match.

§ Known Partial Results

  • Vakili (2022): Vakili (COLT 2022) formulates the noise-free kernel bandit problem with $f\in\mathcal{H}_k$, $\|f\|_{\mathcal{H}_k}\le C_k$, exact observations $y_t=f(x_t)$, and cumulative regret $R_N=\sum_{t=1}^N\bigl(f(x^*)-f(x_t)\bigr)$.

  • Vakili (2022): The note contrasts this with the noisy setting, where regret bounds for GP-style methods are often expressed in terms of a maximal information gain quantity (e.g., $\Gamma_{k,\lambda}(N)=\sup_{\{x_i\}_{i=1}^N}\tfrac12\log\det(I+\lambda^{-2}K)$), yielding rates of the form $\tilde O\bigl(\Gamma_{k,\lambda}(N)\sqrt{N}\bigr)$ or $\tilde O\bigl(\sqrt{\Gamma_{k,\lambda}(N)\,N}\bigr)$ depending on the analysis.

  • Vakili (2022): The note reports that such noisy-style bounds can be loose in the noise-free regime: for Matérn kernels they lead to a regret exponent of the form $\tilde O\bigl(N^{(\nu+d)/(2\nu+d)}\bigr)$, which the note argues is not order-optimal when observations are exact.

  • Vakili (2022): The note discusses multiple noise-free upper bounds obtained via different strategies (including an explore-then-commit style approach) and emphasizes that, while these bounds can be sublinear, none are proved to match a minimax lower bound for the noise-free problem.

  • Vakili (2022): The note states a conjectured piecewise-optimal rate for Matérn kernels: $R_N=\tilde O\bigl(N^{(d-\nu)/d}\bigr)$ for $d>\nu$, $\tilde O(\log N)$ for $d=\nu$, and $\tilde O(1)$ for $d<\nu$, motivated heuristically by controlling sums of GP posterior standard deviations under near-uniform sampling.
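The maximal information gain in the second bullet involves a supremum over designs, but the log-det quantity itself is easy to evaluate for any one fixed design. A minimal sketch, assuming a Matérn-1/2 (exponential) kernel with arbitrary lengthscale and $\lambda$ (no supremum is taken):

```python
import numpy as np

rng = np.random.default_rng(1)

def matern_half(X, lengthscale=0.2):
    """Matérn-1/2 (exponential) kernel matrix on points X in [0, 1]."""
    D = np.abs(X[:, None] - X[None, :])
    return np.exp(-D / lengthscale)

def info_gain(X, lam=0.1):
    """0.5 * log det(I + lam^-2 K) for one fixed design X."""
    K = matern_half(X)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / lam**2)
    return 0.5 * logdet

X = rng.uniform(0.0, 1.0, size=50)
print(info_gain(X))
```

Since $K$ is positive semidefinite, every eigenvalue of $I+\lambda^{-2}K$ is at least 1, so the quantity is nonnegative and can only grow as points are added to the design.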
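The explore-then-commit idea mentioned in the fourth bullet is simple to sketch in the noise-free case: spend $m$ exact evaluations on a grid, then commit to the empirical argmax. The budget split and test function below are illustrative assumptions, not the note's algorithm:

```python
import numpy as np

def f(x):
    # Toy smooth function on [0, 1]; stands in for an RKHS member.
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

def explore_then_commit(N, m):
    """Explore f on an m-point grid, then play the best observed point for N - m rounds."""
    grid = np.linspace(0.0, 1.0, m)
    values = f(grid)
    best = grid[np.argmax(values)]                    # commit to the empirical argmax
    f_star = f(np.linspace(0.0, 1.0, 10_001)).max()   # fine-grid proxy for f(x*)
    explore_regret = float(np.sum(f_star - values))
    commit_regret = (N - m) * (f_star - f(best))
    return explore_regret + commit_regret

print(explore_then_commit(N=1000, m=50))
```

The tension the note exploits is visible here: exploration regret scales with $m$, while the committed gap $f(x^*)-f(\hat x)$ shrinks as the grid (and hence $m$) refines, and smoothness of the kernel governs how fast.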
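The heuristic behind the conjecture, controlling sums of GP posterior standard deviations under near-uniform sampling, can also be probed numerically. A sketch with noise-free conditioning (the kernel, lengthscale, and jitter are illustrative choices):

```python
import numpy as np

def matern_half(A, B, lengthscale=0.2):
    """Matérn-1/2 (exponential) cross-kernel matrix between point sets A and B."""
    return np.exp(-np.abs(A[:, None] - B[None, :]) / lengthscale)

def posterior_std(X_obs, X_query, jitter=1e-8):
    """GP posterior standard deviation at X_query after exact observations at X_obs."""
    K = matern_half(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    Kq = matern_half(X_query, X_obs)
    # Posterior variance: k(x, x) - k_q^T K^{-1} k_q, with prior variance k(x, x) = 1.
    var = 1.0 - np.einsum('ij,ji->i', Kq, np.linalg.solve(K, Kq.T))
    return np.sqrt(np.clip(var, 0.0, None))

query = np.linspace(0.0, 1.0, 200)
for m in (10, 40):
    grid = np.linspace(0.0, 1.0, m)
    print(m, posterior_std(grid, query).sum())
```

Refining the uniform design shrinks the posterior standard deviation everywhere; how fast its sum decays with the number of observations is what drives the conjectured Matérn exponents.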

§ References

[1]

Open Problem: Regret Bounds for Noise-Free Kernel-Based Bandits

Sattar Vakili (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Open-problem note in COLT proceedings.

[2]

Convergence rates of efficient global optimization algorithms

Adam D. Bull (2011)

Journal of Machine Learning Research

📍 Cited in the partial results above as a source of (near-)optimal simple-regret rates for expected improvement (EI) under Matérn-type smoothness.
