Unsolved

Regret Bounds for Noise-Free Kernel-Based Bandits

Posed by Sattar Vakili (2022)

§ Problem Statement

Setup

Let $\mathcal{X}=[0,1]^d\subset\mathbb{R}^d$, let $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a known positive definite kernel with RKHS $(\mathcal{H}_k,\|\cdot\|_{\mathcal{H}_k})$, and fix a known radius $C_k>0$. An unknown deterministic function $f\in\mathcal{H}_k$ satisfies $\|f\|_{\mathcal{H}_k}\le C_k$. At each round $t=1,2,\dots,N$, an algorithm $A$ (possibly randomized) chooses $x_t\in\mathcal{X}$ measurably as a function of the past and then observes the exact value $y_t=f(x_t)$ (noise-free feedback). Let $x^*\in\arg\max_{x\in\mathcal{X}} f(x)$ and define the cumulative regret

$$R_N(A,f)=\sum_{t=1}^N\bigl(f(x^*)-f(x_t)\bigr).$$

Write $\tilde O(\cdot)$ for rates that hide factors polylogarithmic in $N$.
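The protocol above is easy to simulate. A minimal sketch, assuming a toy smooth test function and a naive uniform-grid strategy (both illustrative choices, not part of the problem statement):

```python
import numpy as np

def f(x):
    # Toy smooth function on [0, 1]; stands in for an RKHS member with small norm.
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

def run_uniform_grid(N):
    """Play an N-point uniform grid and return the cumulative regret R_N."""
    xs = np.linspace(0.0, 1.0, N)
    f_star = f(np.linspace(0.0, 1.0, 10_001)).max()  # fine-grid proxy for f(x*)
    # Round t: choose x_t, observe y_t = f(x_t) exactly (noise-free feedback).
    return float(np.sum(f_star - f(xs)))

print(run_uniform_grid(100))
```

A non-adaptive design like this pays regret growing linearly in $N$ on a generic $f$; the open problem asks how much adaptivity can buy when observations are exact.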

Unsolved Problem

Problem 2022. Determine the minimax optimal growth rate

$$\inf_A\ \sup_{\|f\|_{\mathcal{H}_k}\le C_k} R_N(A,f)$$

as a function of $N$ in the noise-free kernel-based bandit setting, and give matching algorithmic upper bounds and information-theoretic lower bounds. In particular, when $k$ is a (stationary, isotropic) Matérn kernel on $[0,1]^d$ with smoothness parameter $\nu>0$, determine the smallest exponent $\alpha=\alpha(d,\nu)$ such that some algorithm $A$ achieves $\sup_{\|f\|_{\mathcal{H}_k}\le C_k} R_N(A,f)=\tilde O(N^{\alpha})$, and prove a matching lower bound (up to polylog factors).

More specifically, prove or refute the COLT 2022 conjecture that (up to polylogarithmic factors)

$$\sup_{\|f\|_{\mathcal{H}_k}\le C_k} R_N(A,f)=
\begin{cases}
O\bigl(N^{(d-\nu)/d}\bigr) & \text{if } d>\nu,\\
O(\log N) & \text{if } d=\nu,\\
O(1) & \text{if } d<\nu.
\end{cases}$$
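As a quick numeric illustration, the conjectured trichotomy can be tabulated next to the exponent implied by noisy-style analyses (a sketch; the $(d,\nu)$ pairs below are arbitrary examples):

```python
def noisy_style_exponent(d, nu):
    """Regret exponent alpha in O(N^alpha) implied by noisy-analysis bounds for Matérn."""
    return (nu + d) / (2 * nu + d)

def conjectured_rate(d, nu):
    """Conjectured noise-free Matérn regret rate (up to polylog factors)."""
    if d > nu:
        return f"N^{(d - nu) / d:.3f}"
    if d == nu:
        return "log N"
    return "O(1)"

for d, nu in [(4, 1.0), (2, 2.0), (1, 2.5)]:
    print(d, nu, conjectured_rate(d, nu),
          f"noisy-style N^{noisy_style_exponent(d, nu):.3f}")
```

Note that for $d>\nu$ the noisy-style exponent $(\nu+d)/(2\nu+d)$ always exceeds the conjectured $(d-\nu)/d$ (their cross-multiplied difference is $2\nu^2>0$), which is exactly the gap the problem asks to close.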


§ Significance & Implications

In the noisy RKHS/GP bandit model, regret rates are now characterized (up to log factors) via information-gain-type complexity terms, but Vakili (COLT 2022) highlights that transferring those analyses to exact (noise-free) observations leaves a substantial, kernel-dependent gap and may even fail to guarantee sublinear regret for some kernels. Pinning down the minimax noise-free rate for Matérn kernels would (i) identify the true exploration cost when observations are exact, (ii) clarify whether cumulative regret can be bounded by $O(\log N)$ or $O(1)$ once the smoothness $\nu$ meets or exceeds the dimension $d$, and (iii) provide a target rate that future deterministic Bayesian-optimization-style algorithms and lower-bound proofs must match.

§ Known Partial Results

  • Vakili (2022): Vakili (COLT 2022) formulates the noise-free kernel bandit problem with $f\in\mathcal{H}_k$, $\|f\|_{\mathcal{H}_k}\le C_k$, exact observations $y_t=f(x_t)$, and cumulative regret $R_N=\sum_{t=1}^N\bigl(f(x^*)-f(x_t)\bigr)$.

  • Vakili (2022): The note contrasts this with the noisy setting, where regret bounds for GP-style methods are often expressed in terms of a maximal information gain quantity (e.g., $\Gamma_{k,\lambda}(N)=\sup_{\{x_i\}_{i=1}^N}\tfrac12\log\det(I+\lambda^{-2}K)$), yielding rates of the form $\tilde O\bigl(\Gamma_{k,\lambda}(N)\sqrt{N}\bigr)$ or $\tilde O\bigl(\sqrt{\Gamma_{k,\lambda}(N)\,N}\bigr)$ depending on the analysis.

  • Vakili (2022): The note reports that such noisy-style bounds can be loose in the noise-free regime: for Matérn kernels they lead to a regret exponent of the form $\tilde O\bigl(N^{(\nu+d)/(2\nu+d)}\bigr)$, which the note argues is not order-optimal when observations are exact.

  • Vakili (2022): The note discusses multiple noise-free upper bounds obtained via different strategies (including an explore-then-commit style approach) and emphasizes that, while these bounds can be sublinear, none are proved to match a minimax lower bound for the noise-free problem.

  • Vakili (2022): The note states a conjectured piecewise-optimal rate for Matérn kernels: $R_N=\tilde O\bigl(N^{(d-\nu)/d}\bigr)$ for $d>\nu$, $\tilde O(\log N)$ for $d=\nu$, and $\tilde O(1)$ for $d<\nu$, motivated heuristically by controlling sums of GP posterior standard deviations under near-uniform sampling.
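The maximal information gain in the second bullet involves a supremum over designs, but the log-det quantity itself is easy to evaluate for any one fixed design. A minimal sketch, assuming a Matérn-1/2 (exponential) kernel with arbitrary lengthscale and $\lambda$ (no supremum is taken):

```python
import numpy as np

rng = np.random.default_rng(1)

def matern_half(X, lengthscale=0.2):
    """Matérn-1/2 (exponential) kernel matrix on points X in [0, 1]."""
    D = np.abs(X[:, None] - X[None, :])
    return np.exp(-D / lengthscale)

def info_gain(X, lam=0.1):
    """0.5 * log det(I + lam^-2 K) for one fixed design X."""
    K = matern_half(X)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / lam**2)
    return 0.5 * logdet

X = rng.uniform(0.0, 1.0, size=50)
print(info_gain(X))
```

Since $K$ is positive semidefinite, every eigenvalue of $I+\lambda^{-2}K$ is at least 1, so the quantity is nonnegative and can only grow as points are added to the design.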
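The explore-then-commit idea mentioned in the fourth bullet is simple to sketch in the noise-free case: spend $m$ exact evaluations on a grid, then commit to the empirical argmax. The budget split and test function below are illustrative assumptions, not the note's algorithm:

```python
import numpy as np

def f(x):
    # Toy smooth function on [0, 1]; stands in for an RKHS member.
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

def explore_then_commit(N, m):
    """Explore f on an m-point grid, then play the best observed point for N - m rounds."""
    grid = np.linspace(0.0, 1.0, m)
    values = f(grid)
    best = grid[np.argmax(values)]                    # commit to the empirical argmax
    f_star = f(np.linspace(0.0, 1.0, 10_001)).max()   # fine-grid proxy for f(x*)
    explore_regret = float(np.sum(f_star - values))
    commit_regret = (N - m) * (f_star - f(best))
    return explore_regret + commit_regret

print(explore_then_commit(N=1000, m=50))
```

The tension the note exploits is visible here: exploration regret scales with $m$, while the committed gap $f(x^*)-f(\hat x)$ shrinks as the grid (and hence $m$) refines, and smoothness of the kernel governs how fast.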
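The heuristic behind the conjecture, controlling sums of GP posterior standard deviations under near-uniform sampling, can also be probed numerically. A sketch with noise-free conditioning (the kernel, lengthscale, and jitter are illustrative choices):

```python
import numpy as np

def matern_half(A, B, lengthscale=0.2):
    """Matérn-1/2 (exponential) cross-kernel matrix between point sets A and B."""
    return np.exp(-np.abs(A[:, None] - B[None, :]) / lengthscale)

def posterior_std(X_obs, X_query, jitter=1e-8):
    """GP posterior standard deviation at X_query after exact observations at X_obs."""
    K = matern_half(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    Kq = matern_half(X_query, X_obs)
    # Posterior variance: k(x, x) - k_q^T K^{-1} k_q, with prior variance k(x, x) = 1.
    var = 1.0 - np.einsum('ij,ji->i', Kq, np.linalg.solve(K, Kq.T))
    return np.sqrt(np.clip(var, 0.0, None))

query = np.linspace(0.0, 1.0, 200)
for m in (10, 40):
    grid = np.linspace(0.0, 1.0, m)
    print(m, posterior_std(grid, query).sum())
```

Refining the uniform design shrinks the posterior standard deviation everywhere; how fast its sum decays with the number of observations is what drives the conjectured Matérn exponents.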

§ References

[1]

Open Problem: Regret Bounds for Noise-Free Kernel-Based Bandits

Sattar Vakili (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Open-problem note in COLT proceedings.

[2]

Convergence rates of efficient global optimization algorithms

Adam D. Bull (2011)

Journal of Machine Learning Research

📍 Cited in the partial results above as a source of (near-)optimal simple-regret rates for expected improvement (EI) under Matérn-type smoothness.
