Unsolved

Remove the Polylogarithmic Gap to Exact Minimax Optimality

Sourced from the work of Kaizheng Wang

§ Problem Statement

Setup

Let $\mathcal X$ be an input space and $K:\mathcal X\times\mathcal X\to\mathbb R$ a symmetric positive-semidefinite kernel with RKHS $\mathcal H$ and feature map $\phi:\mathcal X\to\mathcal H$ (so $K(x,z)=\langle\phi(x),\phi(z)\rangle_{\mathcal H}$). Observe independent datasets:

$$\mathcal D_s=\{(X_i,Y_i)\}_{i=1}^n\ \text{i.i.d. from source law }P,\qquad \mathcal D_t=\{X_{0i}\}_{i=1}^{n_0}\ \text{i.i.d. from target covariate law }Q_X,$$

where labels in the target sample are unobserved.

This setup follows Wang (2023).

Assume a common regression function under covariate shift:

$$f^\star(x)=\mathbb E[Y\mid X=x]=\mathbb E[Y_0\mid X_0=x]=\langle\phi(x),\theta^\star\rangle_{\mathcal H}$$

for some $\theta^\star\in\mathcal H$, with $P_X\neq Q_X$ allowed. Define

$$\mathcal F=\{f_\theta:\ f_\theta(x)=\langle\phi(x),\theta\rangle_{\mathcal H},\ \theta\in\mathcal H\},$$

$$\mathcal R_Q(f):=\mathbb E[(f(X_0)-Y_0)^2],\qquad \mathcal R_Q(f)-\mathcal R_Q(f^\star)=\mathbb E_{X_0\sim Q_X}\big[(f(X_0)-f^\star(X_0))^2\big].$$

For $\lambda>0$, kernel ridge regression on source labels is

$$\widehat f_\lambda\in\arg\min_{f\in\mathcal F}\Big\{\frac1n\sum_{i=1}^n(f(X_i)-Y_i)^2+\lambda\|f\|_{\mathcal H}^2\Big\}.$$
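In finite samples the minimizer above has the representer-theorem form $\widehat f_\lambda(\cdot)=\sum_{i=1}^n\widehat\alpha_i K(X_i,\cdot)$ with $\widehat\alpha=(\mathbf K+n\lambda I)^{-1}y$. A minimal NumPy sketch; the RBF kernel, bandwidth, regularization level, and toy data are illustrative choices, not from the source:

```python
import numpy as np

def rbf_kernel(A, B, gamma=50.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam, gamma=50.0):
    """Kernel ridge regression: min_f (1/n) sum_i (f(X_i) - y_i)^2 + lam * ||f||_H^2.
    By the representer theorem, f(x) = sum_i alpha_i K(X_i, x) with
    alpha = (K + n * lam * I)^{-1} y."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ alpha

# toy check: recover f*(x) = sin(2*pi*x) from noisy source labels
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
f_hat = krr_fit(X, y, lam=1e-3)
grid = np.linspace(0.0, 1.0, 50)[:, None]
mse = np.mean((f_hat(grid) - np.sin(2 * np.pi * grid[:, 0])) ** 2)
```

Solving the $n\times n$ linear system costs $O(n^3)$; the statistical analysis in the source is agnostic to how the minimizer is computed.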

Assume there exist constants $R,\sigma,M>0$ such that $|f^\star(X)|\le R$ almost surely, the noise $\eta:=Y-f^\star(X)$ is conditionally $\sigma$-sub-Gaussian given $X$, and either $\|\phi(X)\|_{\mathcal H}\le M$ almost surely or $\phi(X)$ is strongly sub-Gaussian in $\mathcal H$ with proxy $M$. Define

$$\Sigma:=\mathbb E[\phi(X)\otimes\phi(X)],\qquad \Sigma_0:=\mathbb E[\phi(X_0)\otimes\phi(X_0)],\qquad \mu^2:=\max\{\sigma^2/R^2,\,M^2\},$$

and effective sample size

$$n_{\mathrm{eff}}:=\sup\{t\le n:\ t\,\Sigma_0\preceq n\,\Sigma+\mu^2 I\}.$$
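In a finite-dimensional feature space the supremum has a closed form: with $A:=n\Sigma+\mu^2 I\succ0$, the constraint $t\,\Sigma_0\preceq A$ holds iff $t\le 1/\lambda_{\max}(A^{-1/2}\Sigma_0 A^{-1/2})$. A sketch under that finite-dimensional assumption; the diagonal covariances and $\mu^2$ below are illustrative:

```python
import numpy as np

def effective_sample_size(Sigma, Sigma0, n, mu2):
    """n_eff = sup{t <= n : t * Sigma0 <= n * Sigma + mu2 * I} in the PSD order.
    Since A := n * Sigma + mu2 * I is positive definite, this equals
    min(n, 1 / lambda_max(A^{-1/2} Sigma0 A^{-1/2}))."""
    A = n * Sigma + mu2 * np.eye(Sigma.shape[0])
    w, V = np.linalg.eigh(A)                      # A = V diag(w) V^T, w > 0
    A_inv_half = V @ np.diag(w ** -0.5) @ V.T
    lam_max = np.linalg.eigvalsh(A_inv_half @ Sigma0 @ A_inv_half)[-1]
    return n if lam_max <= 0 else min(n, 1.0 / lam_max)

Sigma = np.diag([1.0, 0.5, 0.25])     # source second-moment operator
# no covariate shift: Sigma0 == Sigma recovers n_eff = n
no_shift = effective_sample_size(Sigma, Sigma, n=1000, mu2=1.0)
# target mass concentrated where the source has thin coverage shrinks n_eff
shifted = effective_sample_size(Sigma, np.diag([0.25, 0.5, 1.0]), n=1000, mu2=1.0)
```

The shifted case illustrates the intended meaning: directions that $Q_X$ weights heavily but $P_X$ barely covers cap the effective number of useful source samples.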

For suitably chosen $\lambda$, the source proves (up to polylogarithmic factors) a high-probability upper bound of the form

$$\mathcal R_Q(\widehat f)-\mathcal R_Q(f^\star)\ \lesssim\ \inf_{\rho>0}\Big\{R^2\rho+\frac{\sigma^2}{n_{\mathrm{eff}}}\sum_{j\ge1}\frac{\mu_j}{\mu_j+\rho}\Big\}+R^2\mu^2\big(n_{\mathrm{eff}}^{-1}+n_0^{-1}\big),$$

where $(\mu_j)_{j\ge1}$ are the eigenvalues of $\Sigma_0$.
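The infimum over $\rho$ trades the approximation term $R^2\rho$ against a variance term proportional to the effective dimension $\sum_j \mu_j/(\mu_j+\rho)$, and is easy to trace numerically. A sketch for a polynomially decaying spectrum $\mu_j=j^{-2}$; all constants below are illustrative, not from the source:

```python
import numpy as np

# illustrative constants, not taken from the source
R, sigma, n_eff = 1.0, 1.0, 10_000
mu = np.arange(1, 5001, dtype=float) ** -2.0   # truncated polynomial spectrum

def risk_bound(rho):
    """R^2 * rho + (sigma^2 / n_eff) * sum_j mu_j / (mu_j + rho)."""
    return R**2 * rho + (sigma**2 / n_eff) * np.sum(mu / (mu + rho))

# scan rho on a log grid and locate the bias-variance balance point
rhos = np.logspace(-6, 0, 200)
vals = np.array([risk_bound(r) for r in rhos])
rho_star = rhos[np.argmin(vals)]
best = vals.min()
```

For $\mu_j\asymp j^{-2\alpha}$ the effective dimension scales like $\rho^{-1/(2\alpha)}$, so the minimizer balances $R^2\rho$ against $\sigma^2\rho^{-1/(2\alpha)}/n_{\mathrm{eff}}$, giving the familiar $n_{\mathrm{eff}}^{-2\alpha/(2\alpha+1)}$-type polynomial rates.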

Unsolved Problem

Remove the remaining polylogarithmic gap. Determine whether one can (i) construct an estimator with a matching upper bound without extra polylog factors under the same assumptions, or (ii) prove a matching lower bound showing those log factors are unavoidable for this formulation.


§ Significance & Implications

The paper's minimax discussion indicates matching rates up to polylogarithmic factors in the effective-sample-size regime. Resolving whether the remaining log gap is removable would sharpen the statistical optimality picture for pseudo-label-based adaptation and clarify whether the gap is methodological or intrinsic.

§ Known Partial Results

  • Wang (2023): In arXiv:2302.10160v4, Section 4, Theorem 4.1 ("Excess risk") proves non-asymptotic high-probability excess-risk bounds driven by the effective sample size $n_{\mathrm{eff}}$. Corollary 4.1 gives explicit rates (finite-rank, exponential-decay, and polynomial-decay spectral regimes) that are minimax-optimal up to polylogarithmic factors under the paper's assumptions. Section 4, Remark 3 ("Optimality and adaptivity") relates sharpness to Theorem 2 of Ma, Pathak, and Wainwright (2023) in bounded-likelihood-ratio settings.

§ References

[1]

Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Kaizheng Wang (2023)

Annals of Statistics (accepted; forthcoming)

📍 Section 4: definition of effective sample size (Eq. (4.1) in the PDF), Theorem 4.1 ("Excess risk"), Corollary 4.1, and Remark 3 ("Optimality and adaptivity"); Section 7 for broader future directions.

Primary source for the minimax-up-to-polylog results. Year is recorded by arXiv first posting (2023), while v4 reflects a later revision.

[2]

Optimally tackling covariate shift in RKHS-based nonparametric regression

Cong Ma, Reese Pathak, Martin J. Wainwright (2023)

The Annals of Statistics

📍 Theorem 2 (minimax lower bound under bounded likelihood-ratio assumptions), as cited in Wang's Section 4 optimality remark.

Lower-bound benchmark cited by Wang for sharpness comparisons in bounded-likelihood-ratio regimes.

§ Tags