Unsolved

Tight Online Confidence Intervals for RKHS Elements

Posed by Sattar Vakili et al. (2021)

§ Problem Statement

Setup

Let $(\mathcal X,k)$ be a nonempty set equipped with a positive semidefinite kernel $k:\mathcal X\times\mathcal X\to\mathbb R$, and let $\mathcal H_k$ be the associated RKHS with norm $\|\cdot\|_{\mathcal H_k}$. Assume the diagonal is bounded: $\kappa^2:=\sup_{x\in\mathcal X} k(x,x)<\infty$. An unknown target function $f\in\mathcal H_k$ satisfies $\|f\|_{\mathcal H_k}\le B$.

An online algorithm interacts over rounds $t=1,2,\dots$. Let $\mathcal F_t:=\sigma(x_1,y_1,\dots,x_t,y_t)$. At round $t$, the algorithm chooses an $\mathcal F_{t-1}$-measurable point $x_t\in\mathcal X$ and observes

y_t = f(x_t) + \eta_t,

where $(\eta_t)$ is conditionally mean-zero and $R$-sub-Gaussian: for all $\lambda\in\mathbb R$,

\mathbb E\big[\exp(\lambda\eta_t)\mid \mathcal F_{t-1}\big]\le \exp\Big(\tfrac{\lambda^2 R^2}{2}\Big).

Given data up to time $t$ and a regularization parameter $\lambda>0$, define the kernel ridge regression estimator

\hat f_t\in\arg\min_{g\in\mathcal H_k}\ \sum_{s=1}^t (y_s-g(x_s))^2+\lambda\|g\|_{\mathcal H_k}^2.

Let $K_t\in\mathbb R^{t\times t}$ with $(K_t)_{ij}=k(x_i,x_j)$, let $y_{1:t}=(y_1,\dots,y_t)^\top$, and for $x\in\mathcal X$ let $k_t(x)=(k(x_1,x),\dots,k(x_t,x))^\top$. By the representer theorem, one may take

\hat f_t(x)=k_t(x)^\top (K_t+\lambda I)^{-1} y_{1:t}.

Define the associated (regularized) predictive standard deviation proxy

s_t(x)=\sqrt{\,k(x,x)-k_t(x)^\top (K_t+\lambda I)^{-1}k_t(x)\,}.
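The two quantities above are directly computable from data. A minimal NumPy sketch, assuming an RBF kernel for concreteness (the kernel, lengthscale, and function names are illustrative, not part of the problem statement):

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    """RBF kernel k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2)) on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * lengthscale**2))

def krr_mean_and_std(x_train, y_train, x_query, lam=1.0, lengthscale=0.5):
    """Kernel ridge mean \\hat f_t and the proxy s_t at the query points."""
    K = rbf_kernel(x_train, x_train, lengthscale)      # K_t
    kq = rbf_kernel(x_train, x_query, lengthscale)     # columns are k_t(x)
    # Solve (K_t + lam I) A = [y_{1:t} | k_t(x_1) ... ] in one call.
    A = np.linalg.solve(K + lam * np.eye(len(x_train)),
                        np.column_stack([y_train, kq]))
    mean = kq.T @ A[:, 0]                              # k_t(x)^T (K_t + lam I)^{-1} y_{1:t}
    var = (rbf_kernel(x_query, x_query, lengthscale).diagonal()
           - np.sum(kq * A[:, 1:], axis=0))            # k(x,x) - k_t(x)^T (K_t + lam I)^{-1} k_t(x)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Since $k_t(x)^\top (K_t+\lambda I)^{-1} k_t(x)\ge 0$, the proxy satisfies $s_t(x)\le\sqrt{k(x,x)}$, and it shrinks near previously sampled points.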

Unsolved Problem

Characterize, up to the correct order and with explicit dependence on $B$, $R$, $\lambda$, $\delta$, and suitable kernel-complexity quantities, the smallest achievable sequence $(\beta_t)_{t\ge 1}$ such that, for every (possibly adaptive) sampling rule $(x_t)$,

\mathbb P\Big(\forall t\ge 1,\ \forall x\in\mathcal X:\ |f(x)-\hat f_t(x)|\le \beta_t\, s_t(x)\Big)\ge 1-\delta.

In particular, determine whether the existing sequential (adaptive-design) RKHS confidence bounds used in kernelized bandits and reinforcement learning can be substantially tightened, or whether their time and kernel-complexity dependence is information-theoretically necessary for uniform-in-$(t,x)$ guarantees.

§ Significance & Implications

Uniform-in-time, high-probability bounds of the form $|f(x)-\hat f_t(x)|\le \beta_t\, s_t(x)$ are the key step that turns kernel regression error control into exploration guarantees in kernelized bandits and reinforcement learning. If $\beta_t$ is overly conservative in the adaptive (online) design setting, regret analyses inherit this looseness and may fail to certify sublinear regret for standard kernelized algorithms. A tight characterization of the best possible online $\beta_t$ would therefore (i) sharpen regret guarantees where possible and (ii) clarify whether currently weak bounds arise from analysis artifacts or from a genuine limitation of existing algorithms under adaptive sampling.
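How $\beta_t$ drives exploration can be seen in a UCB-style acquisition rule: the next point maximizes $\hat f_t(x)+\beta_t s_t(x)$, so a looser $\beta_t$ forces more exploration. A minimal sketch on a 1-D grid, assuming an RBF kernel with $k(x,x)=1$ (a simplified illustration, not the exact GP-UCB specification of the cited works):

```python
import numpy as np

def ucb_select(grid, x_hist, y_hist, beta, lam=1.0, ls=0.5):
    """Return the grid point maximizing the upper bound \\hat f_t(x) + beta * s_t(x)."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * ls**2))
    if len(x_hist) == 0:
        return grid[0]                                  # no data: any point is maximal
    K = k(x_hist, x_hist) + lam * np.eye(len(x_hist))   # K_t + lam I
    kq = k(x_hist, grid)                                # columns k_t(x) over the grid
    sol = np.linalg.solve(K, np.column_stack([y_hist, kq]))
    mean = kq.T @ sol[:, 0]
    sd = np.sqrt(np.maximum(1.0 - np.sum(kq * sol[:, 1:], axis=0), 0.0))
    return grid[int(np.argmax(mean + beta * sd))]
```

After observing a point, the bonus $\beta\, s_t(x)$ is smallest there, so the rule is pushed toward unexplored regions; the width of that push is exactly what a tight $\beta_t$ controls.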

§ Known Partial Results

  • Vakili et al. (2021): The COLT 2021 open-problem note isolates the difficulty of constructing confidence sequences for RKHS-valued estimation when the design points $x_t$ are chosen sequentially based on past observations.

  • Vakili et al. (2021): The note surveys confidence bounds currently used in analyses of kernelized bandit and RL methods and argues that their resulting confidence widths can be overly loose in the online/adaptive setting.

  • Vakili et al. (2021): A key obstacle emphasized is that standard (non-adaptive or fixed-design) concentration arguments do not directly yield uniform-in-time guarantees when $x_t$ depends on past noise.

  • Vakili et al. (2021): The note highlights that, with current tools, regret analyses for algorithms such as GP-UCB and GP-TS can become too weak to even prove sublinear regret in general, motivating tighter online RKHS confidence intervals.
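For context, a representative width used in these analyses is the self-normalized bound of Chowdhury & Gopalan (2017) [3], which, stated informally (constants and the exact regularization vary across versions), takes the form

```latex
% Representative adaptive-design confidence width (informal):
% with probability at least 1 - \delta, simultaneously for all t and x,
%   |f(x) - \hat f_t(x)| \le \beta_t \, s_t(x), with
\beta_t \;=\; B + R\sqrt{2\big(\gamma_{t-1} + 1 + \ln(1/\delta)\big)},
% where \gamma_t is the maximal information gain of the kernel after t rounds.
```

Since $\gamma_t$ grows with $t$ (polylogarithmically for the squared-exponential kernel, polynomially for Matérn kernels), this $\beta_t$ grows over time, which is precisely the looseness the open problem asks to either remove or prove necessary.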

§ References

[1]

Open Problem: Tight Online Confidence Intervals for RKHS Elements

Sattar Vakili, Jonathan Scarlett, Tara Javidi (2021)

Conference on Learning Theory (COLT), PMLR 134

📍 Open-problem note in COLT proceedings.


[3]

On Kernelized Multi-armed Bandits

Sayak Ray Chowdhury, Aditya Gopalan (2017)

International Conference on Machine Learning (ICML), PMLR 70

📍 Representative kernel bandit analysis relying on RKHS confidence bounds under sequentially chosen points.
