Unsolved

Theory for Debiased PCR Under General Covariate Models

Sourced from the work of Yufan Li, Pragya Sur

§ Problem Statement

Setup

Let $(x_i,y_i)_{i=1}^n$ be independent observations from the linear model

$$y_i=x_i^\top\beta_0+\varepsilon_i,\qquad i=1,\dots,n,$$

where $\beta_0\in\mathbb R^p$ is unknown, $x_i\in\mathbb R^p$ satisfies $\mathbb E[x_i]=0$ and $\mathrm{Cov}(x_i)=\Sigma$ (the design is not assumed to be right-rotationally invariant), and $\varepsilon_i$ is independent of $x_i$ with $\mathbb E[\varepsilon_i]=0$, $\mathrm{Var}(\varepsilon_i)=\sigma^2\in(0,\infty)$. Write $X\in\mathbb R^{n\times p}$ for the design matrix (rows $x_i^\top$) and $y\in\mathbb R^n$ for the response vector.

Let the singular value decomposition of $X$ be $X=UDV^\top$, with rank $r$, singular values $d_1\ge\cdots\ge d_r>0$, left singular vectors $u_1,\dots,u_r$, and right singular vectors $v_1,\dots,v_r$. For a fixed truncation level $k\le r$, define $V_k=(v_1,\dots,v_k)$, $U_k=(u_1,\dots,u_k)$, $D_k=\mathrm{diag}(d_1,\dots,d_k)$, and the rank-$k$ principal components regression estimator

$$\hat\beta_k^{\mathrm{PCR}}=V_kD_k^{-1}U_k^\top y,$$

equivalently, the least-squares estimator constrained to $\mathrm{span}(V_k)$.
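As a minimal sketch of the definition above, the following NumPy snippet computes the rank-$k$ PCR estimator from the thin SVD and checks it against least squares restricted to $\mathrm{span}(V_k)$. The function name `pcr_estimator` and the toy data are assumptions made for illustration; they do not come from the source paper.

```python
import numpy as np

def pcr_estimator(X, y, k):
    """Rank-k PCR estimate V_k D_k^{-1} U_k^T y (illustrative helper)."""
    # Thin SVD; NumPy returns singular values in descending order.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    if not (1 <= k <= len(d)) or d[k - 1] <= 0:
        raise ValueError("k must satisfy 1 <= k <= rank(X)")
    Uk, dk, Vk = U[:, :k], d[:k], Vt[:k].T
    return Vk @ ((Uk.T @ y) / dk)

# Toy data, chosen only to exercise the formula.
rng = np.random.default_rng(0)
n, p, k = 50, 8, 3
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
beta_pcr = pcr_estimator(X, y, k)

# Equivalent view: least squares over beta = V_k @ gamma, i.e. regress y on X V_k.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T
gamma, *_ = np.linalg.lstsq(X @ Vk, y, rcond=None)
beta_ls = Vk @ gamma
```

The two vectors `beta_pcr` and `beta_ls` agree up to floating-point error because $XV_k=U_kD_k$, so the restricted least-squares coefficients are $D_k^{-1}U_k^\top y$.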

Consider a spectrum-aware debiased PCR estimator of the form

$$\hat\beta_k^{\mathrm{dPCR}}=\hat\beta_k^{\mathrm{PCR}}+M_k\frac{X^\top(y-X\hat\beta_k^{\mathrm{PCR}})}{n},$$

where $M_k\in\mathbb R^{p\times p}$ is a data-dependent matrix built from the empirical spectrum/eigenstructure of $X$ to correct truncation and regularization bias.
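The debiasing step can be sketched as below. The text here does not specify how $M_k$ is built, so the hypothetical function `debiased_pcr` takes it as an argument, and the identity matrix used in the example is only a placeholder, not the spectrum-aware choice from the source paper.

```python
import numpy as np

def debiased_pcr(X, y, k, M):
    """PCR estimate plus the correction M X^T (y - X beta_pcr) / n.

    M is supplied by the caller; constructing it is outside this sketch.
    """
    n = X.shape[0]
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    beta_pcr = Vt[:k].T @ ((U[:, :k].T @ y) / d[:k])
    residual = y - X @ beta_pcr
    return beta_pcr + M @ (X.T @ residual) / n

# Toy data, for illustration only.
rng = np.random.default_rng(1)
n, p, k = 40, 6, 2
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)

M_placeholder = np.eye(p)  # hypothetical stand-in, NOT the paper's M_k
beta_dpcr = debiased_pcr(X, y, k, M_placeholder)
beta_no_correction = debiased_pcr(X, y, k, np.zeros((p, p)))  # reduces to PCR
```

Because the PCR residual is orthogonal to the columns of $U_k$, the vector $X^\top(y-X\hat\beta_k^{\mathrm{PCR}})$ lies in the span of the discarded right singular vectors; any correction within $\mathrm{span}(V_k)$ must therefore come from the choice of $M_k$.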

For a contrast vector $a\in\mathbb R^p$, define the conditional centering and scale

$$\mathrm{Bias}_{a,k,n}:=\mathbb E\!\left[a^\top(\hat\beta_k^{\mathrm{dPCR}}-\beta_0)\mid X\right],\qquad \mathrm{SE}_{a,k,n}^2:=\mathrm{Var}\!\left(a^\top\hat\beta_k^{\mathrm{dPCR}}\mid X\right).$$
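Under the additional assumption that $M_k$ depends on the data only through $X$ (suggested by, but not stated outright in, the setup), the estimator is linear in $y$ and both quantities have closed forms. The following worked derivation is a sketch under that assumption; the symbol $A_k$ is notation introduced here for illustration.

```latex
% Assumption: M_k is a function of X alone. A_k is illustrative notation.
% Step 1: X V_k D_k^{-1} U_k^\top = U_k U_k^\top, so the PCR residual is (I_n - U_k U_k^\top) y.
\hat\beta_k^{\mathrm{dPCR}} = A_k y,
\qquad
A_k := V_k D_k^{-1} U_k^\top + \frac{1}{n} M_k X^\top \bigl(I_n - U_k U_k^\top\bigr).

% Step 2: y = X\beta_0 + \varepsilon with E[\varepsilon \mid X] = 0 and
% Cov(\varepsilon \mid X) = \sigma^2 I_n, hence
\mathrm{Bias}_{a,k,n} = a^\top \bigl(A_k X - I_p\bigr)\beta_0,
\qquad
\mathrm{SE}_{a,k,n}^2 = \sigma^2 \,\lVert A_k^\top a \rVert_2^2 .
```

In this form the scale is computable from $X$ up to the unknown $\sigma^2$, while the centering still depends on the unknown $\beta_0$. Neither expression says anything about the limiting distribution of the standardized contrast.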

Unsolved Problem

Give sharp, verifiable conditions on $(\Sigma,\beta_0,\varepsilon_i)$, dimension growth $(p,n,k)$, and contrast classes $a$ (for example, deterministic or random $a$ with bounded norm/sparsity) under which, for general non-rotationally-invariant designs,

$$\frac{a^\top(\hat\beta_k^{\mathrm{dPCR}}-\beta_0)-\mathrm{Bias}_{a,k,n}}{\mathrm{SE}_{a,k,n}} \Rightarrow \mathcal N(0,1),$$

and construct consistent plug-in estimators $\widehat{\mathrm{SE}}_{a,k,n}$ (and, if needed, $\widehat{\mathrm{Bias}}_{a,k,n}$) so that asymptotically valid confidence intervals for $a^\top\beta_0$ follow uniformly over the stated contrast class.

§ Significance & Implications

PCR is widely used in low-rank/high-collinearity settings; inference is often the bottleneck. A general theory would convert debiased PCR from a first construction into a robust inference tool across modern correlated designs. See Li & Sur (2023) for details.

§ Known Partial Results

  • Li & Sur (2023): The abstract presents the first debiased PCR estimator in high dimensions as a by-product of spectrum-aware debiasing. The extension to general covariate models remains open in the cited source.

§ References

[1] Yufan Li, Pragya Sur (2023). Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression. Annals of Statistics (to appear).

📍 Appendix I (Conjectures for general covariate models), Conjecture I.1. Source paper where this open problem is discussed.
