Unsolved

Log-factor-free adaptive contraction on general Minkowski-dimensional domains

Sourced from the work of Tao Tang, Nan Wu, Xiuyuan Cheng, David Dunson

§ Problem Statement

Setup

Let $D\ge 1$, let $\mathcal X\subset\mathbb R^D$ be compact, and let $N(\mathcal X,r)$ be the covering number of $\mathcal X$ by Euclidean balls of radius $r$. Work under an intrinsic-dimension upper-complexity condition of the form

$$N(\mathcal X,r)\le C\,r^{-d}\quad\text{for sufficiently small }r,$$

for some $d\in(0,D]$ and $C<\infty$, together with the regression model

$$X_i\sim P_X,\qquad Y_i=f_0(X_i)+\xi_i,\qquad \xi_i\stackrel{iid}{\sim}N(0,\sigma^2),$$

where $P_X$ is supported on $\mathcal X$.

This setup follows Tang et al. (2024).
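To make the covering-number condition concrete, here is a small illustrative sketch (the helper names are ours, not the paper's): for points sampled from a circle embedded in $\mathbb R^3$, a greedy $r$-net gives an empirical upper bound on $N(\mathcal X,r)$, and the log-log slope of $N$ against $1/r$ recovers the intrinsic dimension $d=1$ rather than the ambient $D=3$.

```python
import numpy as np

def greedy_cover_size(points, r):
    """Greedy r-net: an upper bound on the covering number N(X, r)."""
    remaining = points.copy()
    centers = 0
    while len(remaining):
        c = remaining[0]
        dist = np.linalg.norm(remaining - c, axis=1)
        remaining = remaining[dist > r]  # drop everything covered by this center
        centers += 1
    return centers

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 5000)
# Unit circle in R^3: intrinsic dimension d = 1, ambient dimension D = 3
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)

radii = np.array([0.4, 0.2, 0.1, 0.05])
sizes = np.array([greedy_cover_size(circle, r) for r in radii])
# Slope of log N(r) against log(1/r) estimates the intrinsic dimension d
slope = np.polyfit(np.log(1 / radii), np.log(sizes), 1)[0]
```

The slope comes out near 1, consistent with $N(\mathcal X,r)\lesssim r^{-1}$ for a one-dimensional support regardless of the ambient dimension.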

What the source proves (self-contained, with notation aligned to the paper):

  1. Prior class (geometry-agnostic GP): use a centered GP on $\mathcal X$ with the squared-exponential kernel

$$h_t(x,x')=\exp\!\left(-\frac{\|x-x'\|^2}{2t}\right),\qquad t>0,$$

and a hierarchical/empirical-Bayes prior $p_n(t)$ on the bandwidth $t$. "Geometry-agnostic" means the prior construction does not take the intrinsic dimension as input and does not use manifold charts/atlases or Laplace–Beltrami coordinates; it is built from the ambient Euclidean geometry of the data.
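The prior class above can be sketched numerically. The following is a minimal illustration, not the paper's construction: it replaces the hierarchical prior $p_n(t)$ with an empirical-Bayes surrogate that selects the bandwidth $t$ by maximizing the GP marginal likelihood over a grid, for data on a circle in $\mathbb R^3$ (all names and parameter values are ours).

```python
import numpy as np

def se_kernel(X, Z, t):
    """Squared-exponential kernel h_t(x, x') = exp(-||x - x'||^2 / (2 t))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * t))

def gp_posterior_mean(Xtr, y, Xte, t, noise_var=0.25):
    K = se_kernel(Xtr, Xtr, t) + noise_var * np.eye(len(Xtr))
    return se_kernel(Xte, Xtr, t) @ np.linalg.solve(K, y)

def log_marginal_likelihood(Xtr, y, t, noise_var=0.25):
    K = se_kernel(Xtr, Xtr, t) + noise_var * np.eye(len(Xtr))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
# Regression on a circle in R^3 (intrinsic d = 1, ambient D = 3)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
y = np.sin(3 * theta) + 0.5 * rng.standard_normal(200)

# Empirical-Bayes surrogate for p_n(t): maximize marginal likelihood over a grid
grid = np.exp(np.linspace(np.log(1e-3), np.log(10.0), 40))
t_hat = max(grid, key=lambda t: log_marginal_likelihood(X, y, t))
f_hat = gp_posterior_mean(X, y, X, t_hat)
```

The kernel only sees ambient Euclidean distances, yet the selected bandwidth adapts to the one-dimensional geometry of the support; this is the "geometry-agnostic" mechanism the prior class exploits.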

  2. General assumptions used for the adaptive-rate theorem:

(A1) covering-number complexity: for some $\varrho>0$, $C_{\mathcal X}>0$, and $r_0\in(0,1)$,

$$N(r,\mathcal X,\|\cdot\|_\infty)\le C_{\mathcal X}\,r^{-\varrho},\qquad 0<r\le r_0;$$

(A2) RKHS approximation: for some $s>0$ and constants $\epsilon_0,\nu_1,\nu_2>0$, for every $0<\epsilon<\epsilon_0$ there exists $F^\epsilon\in\mathbb H_\epsilon(\mathcal X)$ such that

$$\sup_{x\in\mathcal X}|F^\epsilon(x)-f_0(x)|\le \nu_1\,\epsilon^{s/2}, \qquad \|F^\epsilon\|_{\mathbb H_\epsilon(\mathcal X)}^2\le \nu_2\,\epsilon^{-\varrho/2};$$

(A3) prior-on-bandwidth condition: $p_n(t)$ places enough mass near $t\asymp n^{-2/(2s+\varrho)}$ and sufficiently little mass at much smaller scales (formalized as two-sided exponential-tail inequalities in Assumption (A3) of the paper).

  3. Posterior contraction definition used in the source: for a semimetric $d_n$, a sequence $\varepsilon_n$ is a contraction rate if

$$\Pi\big(d_n(f,f_0)>\varepsilon_n\mid (X_i,Y_i)_{i=1}^n\big)\to 0$$

in probability.

  4. Main adaptive-rate conclusion under (A1)–(A3): in the fixed-design setting, the paper proves contraction at the rate

$$\bar\varepsilon_n = C\,n^{-\frac{s}{2s+\varrho}}(\log n)^{\frac{D+1}{2+\varrho/s}+\frac{D+1}{2}} \lesssim n^{-\frac{s}{2s+\varrho}}(\log n)^{D+1},$$

and in the random-design setting (with a bounded-signal truncation in the theorem statement) it obtains the same polynomial exponent in $L_2(P_X)$, again up to logarithmic factors.

  5. Meaning of "minimax power" and "adaptation": the polynomial exponent is $s/(2s+\varrho)$ (equivalently $\beta/(2\beta+d)$ when writing $s=\beta$, $\varrho=d$), which is the standard nonparametric minimax exponent for $d$-dimensional regression classes of smoothness $\beta$; "adaptive" means a single prior construction attains this exponent across the paper's class of unknown intrinsic dimensions and smoothness levels, with only logarithmic overhead.
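The arithmetic behind these exponents is easy to verify directly. The sketch below (with illustrative numbers of our choosing) compares the intrinsic-dimension exponent to the ambient one, and shows why the $(\log n)^{D+1}$ overhead is far from innocuous when $D$ is large.

```python
import math

beta, d, D = 2.0, 2.0, 100.0  # smoothness, intrinsic dim, ambient dim (illustrative)

intrinsic_exp = beta / (2 * beta + d)  # s/(2s + rho) with s = beta, rho = d -> 1/3
ambient_exp = beta / (2 * beta + D)    # the exponent a D-dependent rate would give

n = 10**6
sharp_rate = n ** (-intrinsic_exp)                       # n^{-beta/(2 beta + d)}
logged_rate = sharp_rate * math.log(n) ** (D + 1)        # with (log n)^{D+1} overhead
# For this n and D, the log factor makes the bound vacuous (logged_rate >> 1),
# even though the polynomial exponent is already the sharp intrinsic one.
```

This is why removing the logarithmic loss, and not just matching the polynomial exponent, is the substantive open question.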

Unsolved Problem

Determine whether one can remove the logarithmic loss and prove, uniformly over broad classes of $(\mathcal X,P_X,f_0)$ satisfying the source-type assumptions,

$$\Pi\!\left(\|f-f_0\|_{L_2(P_X)}>M\,n^{-\beta/(2\beta+d)}\,\middle|\,(X_i,Y_i)_{i=1}^n\right)\to 0$$

(in expectation under $f_0$) for all sufficiently large $M$, simultaneously adaptively over the unknown intrinsic dimension $d$ and the unknown smoothness $\beta$.


§ Significance & Implications

Tang et al. (2024) identify logarithmic factors as the remaining gap in the general low-intrinsic-dimensional setting. Closing this gap would determine whether fully geometry-agnostic GP priors attain the sharp adaptive minimax rate on general supports satisfying covering-number upper bounds.

§ Known Partial Results

  • Tang et al. (2024): The paper proves adaptive posterior contraction for low-intrinsic-dimensional supports at minimax-power rates up to logarithmic factors under covering-number upper-complexity assumptions. For the manifold setting treated in the paper (with its regularity assumptions), results are also stated up to logarithmic factors.

§ References

[1]

Adaptive Bayesian regression on data with low intrinsic dimensionality

Tao Tang, Nan Wu, Xiuyuan Cheng, David Dunson (2024)

Annals of Statistics (to appear)

📍 arXiv v3, Section 6 (Discussion), first paragraph (RKHS approximation/covering-number dependence for general low-dimensional supports), p. 14

Primary source. Author order follows arXiv v3 metadata; year follows the original arXiv posting year (2024), while arXiv_id points to revision v3.
