Tight Online Confidence Intervals for RKHS Elements
Posed by Sattar Vakili et al. (2021)
§ Problem Statement
Setup
Let $\mathcal{X}$ be a nonempty set with a positive semidefinite kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, and let $\mathcal{H}_k$ be the associated RKHS with norm $\|\cdot\|_{\mathcal{H}_k}$. Assume the diagonal is bounded: $k(x, x) \le 1$ for all $x \in \mathcal{X}$. An unknown target function $f \in \mathcal{H}_k$ satisfies $\|f\|_{\mathcal{H}_k} \le B$.
An online algorithm interacts for rounds $t = 1, 2, \dots$. Let $\mathcal{F}_{t-1} = \sigma(x_1, y_1, \dots, x_{t-1}, y_{t-1})$. At round $t$, the algorithm chooses $x_t \in \mathcal{X}$ that is $\mathcal{F}_{t-1}$-measurable and observes
$$y_t = f(x_t) + \varepsilon_t,$$
where $\varepsilon_t$ is conditionally mean-zero and $R$-sub-Gaussian: for all $\eta \in \mathbb{R}$,
$$\mathbb{E}\!\left[e^{\eta \varepsilon_t} \mid \mathcal{F}_{t-1}\right] \le \exp\!\left(\frac{\eta^2 R^2}{2}\right).$$
Given data up to time $t$ and regularization $\lambda > 0$, define the kernel ridge regression estimator
$$\hat{f}_t = \operatorname*{arg\,min}_{g \in \mathcal{H}_k} \; \sum_{s=1}^{t} \left(y_s - g(x_s)\right)^2 + \lambda \|g\|_{\mathcal{H}_k}^2.$$
Let $K_t \in \mathbb{R}^{t \times t}$ with $[K_t]_{i,j} = k(x_i, x_j)$, let $Y_t = (y_1, \dots, y_t)^\top$, and for $x \in \mathcal{X}$ let $k_t(x) = (k(x_1, x), \dots, k(x_t, x))^\top$. By the representer theorem, one may take
$$\hat{f}_t(x) = k_t(x)^\top \left(K_t + \lambda I_t\right)^{-1} Y_t.$$
Define the associated (regularized) predictive standard deviation proxy
$$\sigma_t^2(x) = k(x, x) - k_t(x)^\top \left(K_t + \lambda I_t\right)^{-1} k_t(x).$$
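To make the estimator and the variance proxy concrete, here is a minimal NumPy sketch of the two quantities above, using a squared-exponential kernel so that $k(x,x) = 1$ and the bounded-diagonal assumption holds. The names `rbf_kernel` and `krr_predict` are illustrative, not from the note.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 * lengthscale^2));
    # its diagonal satisfies k(x, x) = 1, matching the bounded-diagonal assumption.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * lengthscale**2))

def krr_predict(X, y, Xstar, lam=1.0, lengthscale=1.0):
    """Kernel ridge regression mean and the (regularized) predictive sd proxy.

    Returns (mu, sigma) where, writing k_t(x) for the vector of kernel
    evaluations against the data points,
      mu(x)      = k_t(x)^T (K_t + lam I)^{-1} Y_t          (representer theorem)
      sigma(x)^2 = k(x, x) - k_t(x)^T (K_t + lam I)^{-1} k_t(x).
    """
    K = rbf_kernel(X, X, lengthscale)                  # K_t, shape (t, t)
    Ks = rbf_kernel(X, Xstar, lengthscale)             # columns are k_t(x*), shape (t, m)
    # Solve (K_t + lam I) against [Y_t | Ks] in one call.
    A = np.linalg.solve(K + lam * np.eye(len(X)), np.column_stack([y[:, None], Ks]))
    alpha, V = A[:, 0], A[:, 1:]
    mu = Ks.T @ alpha
    sigma2 = np.clip(1.0 - np.sum(Ks * V, axis=0), 0.0, None)  # k(x, x) = 1 here
    return mu, np.sqrt(sigma2)
```

Far from the data, $\sigma_t(x)$ approaches $\sqrt{k(x,x)} = 1$ and the mean approaches $0$; near observed points it shrinks, which is what makes $\sigma_t$ usable as a confidence width multiplier.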
Unsolved Problem
Characterize, up to the correct order and with explicit dependence on $\delta$, $B$, $R$, $\lambda$, and suitable kernel-complexity quantities, the smallest achievable sequence $\beta_t(\delta)$ such that for every (possibly adaptive) sampling rule,
$$\mathbb{P}\!\left(\forall t \ge 1,\; \forall x \in \mathcal{X}: \; \left|f(x) - \hat{f}_t(x)\right| \le \beta_t(\delta)\, \sigma_t(x)\right) \ge 1 - \delta.$$
In particular, determine whether the existing sequential (adaptive-design) RKHS confidence bounds used in kernelized bandits/RL can be substantially tightened, or whether their time/kernel-complexity dependence is information-theoretically necessary for uniform-in-$t$ guarantees.
§ Discussion
§ Significance & Implications
Uniform-in-time, high-probability bounds of the form $|f(x) - \hat{f}_t(x)| \le \beta_t(\delta)\,\sigma_t(x)$ are the key step that turns kernel regression error control into exploration guarantees in kernelized bandits and reinforcement learning. If $\beta_t(\delta)$ is overly conservative in the adaptive (online) design setting, regret analyses inherit this looseness and may fail to certify sublinear regret for standard kernelized algorithms. Tight characterizations of the best possible online $\beta_t(\delta)$ would therefore (i) sharpen regret guarantees when possible and (ii) clarify whether currently weak bounds arise from analysis artifacts or from a genuine limitation of existing algorithms under adaptive sampling.
§ Known Partial Results
Vakili et al. (2021): The COLT 2021 open-problem note isolates the difficulty of constructing confidence sequences for RKHS-valued estimation when the design points are chosen sequentially based on past observations.
Vakili et al. (2021): The note surveys confidence bounds currently used in analyses of kernelized bandit and RL methods and argues that their resulting confidence widths can be overly loose in the online/adaptive setting.
Vakili et al. (2021): A key obstacle emphasized is that standard (non-adaptive or fixed-design) concentration arguments do not directly yield uniform-in-time guarantees when $x_t$ depends on past noise.
Vakili et al. (2021): The note highlights that, with current tools, regret analyses for algorithms such as GP-UCB and GP-TS can become too weak to even prove sublinear regret in general, motivating tighter online RKHS confidence intervals.
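For concreteness, a representative adaptive-design width of the kind the note surveys (a sketch following the Chowdhury–Gopalan analysis cited below, not verbatim from the note) is:

```latex
% With probability at least 1 - \delta, simultaneously for all t \ge 1 and x \in \mathcal{X}:
\[
  \left| f(x) - \hat{f}_t(x) \right|
  \;\le\;
  \Big( B + R \sqrt{2 \big( \gamma_t + 1 + \ln(1/\delta) \big)} \Big)\, \sigma_t(x),
\]
% where \gamma_t is the maximal information gain of the kernel after t rounds.
```

Since $\gamma_t$ grows with $t$ (polylogarithmically for the squared-exponential kernel, polynomially for Matérn kernels), this width itself grows over time; this is the time/kernel-complexity dependence that the open problem asks to either remove or certify as necessary.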
§ References
Open Problem: Tight Online Confidence Intervals for RKHS Elements
Sattar Vakili, Jonathan Scarlett, Tara Javidi (2021)
Conference on Learning Theory (COLT), PMLR 134
📍 Open-problem note in COLT proceedings.
On Kernelized Multi-armed Bandits
Sayak Ray Chowdhury, Aditya Gopalan (2017)
International Conference on Machine Learning (ICML), PMLR 70
📍 Representative kernel bandit analysis relying on RKHS confidence bounds under sequentially chosen points.