Unsolved

Sharp characterization of misspecification robustness for debiased GD inference

Sourced from the work of Qiyang Han and Xiaocong Xu

§ Problem Statement

Setup

Let $Z_i=(X_i,Y_i)$, $i=1,\dots,n$, be i.i.d. observations from an unknown distribution $P$ on $\mathcal Z=\mathcal X\times\mathcal Y$. Let $\Theta\subseteq\mathbb R^d$ be the parameter space, and let $\ell:\Theta\times\mathcal Z\to\mathbb R$ be a loss that is twice continuously differentiable in $\theta$. Define the population and empirical risks by

$$L_P(\theta)=\mathbb E_P[\ell(\theta;Z)],\qquad L_n(\theta)=\frac1n\sum_{i=1}^n \ell(\theta;Z_i).$$

Assume the population minimizer

$$\theta_P^\star\in\arg\min_{\theta\in\Theta} L_P(\theta)$$

exists and is unique (possibly with $P$ outside the working model class used to motivate $\ell$; i.e., misspecification is allowed).

Starting from $\theta^0\in\Theta$, define the gradient-descent iterates

$$\theta^{s+1}=\theta^s-\eta_s\nabla L_n(\theta^s),\qquad s=0,1,\dots,$$

with step sizes $\eta_s>0$. For each iteration index $t$, let $\widetilde\theta^t$ denote the debiased estimator constructed from the past GD iterates $\{\theta^s\}_{s\le t}$ by the debiasing scheme of interest. For a fixed nonzero contrast vector $a\in\mathbb R^d$, consider the studentized statistic

$$T_{n,t}(a)=\frac{a^\top(\widetilde\theta^t-\theta_P^\star)}{\widehat\sigma_t(a)},$$

where $\widehat\sigma_t(a)$ estimates the asymptotic standard deviation of $a^\top(\widetilde\theta^t-\theta_P^\star)$.
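The ingredients above can be made concrete in a small numerical sketch. Everything model-specific here is an assumption for illustration: a squared loss fit to a nonlinear regression function (so the linear working model is misspecified), constant step size, and a sandwich variance estimate that studentizes the fully iterated estimator. It stands in for, but is not, the paper's debiasing scheme $\widetilde\theta^t$ or its path-dependent estimator $\widehat\sigma_t(a)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) setup: squared loss ell(theta; (x, y)) = (y - x@theta)^2 / 2
# fit to data whose regression function is nonlinear in x, so the linear
# working model is misspecified.
n, d = 2000, 3
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)

# For this Gaussian design the population minimizer has the closed form
# theta_P^star = E[X X^T]^{-1} E[X y] = (E[cos X_0], 0.5, 0) by Stein's identity.
theta_star = np.array([np.exp(-0.5), 0.5, 0.0])

def grad_Ln(theta):
    """Gradient of the empirical risk L_n(theta) for the squared loss."""
    return -X.T @ (y - X @ theta) / n

# Gradient-descent iterates theta^{s+1} = theta^s - eta_s * grad L_n(theta^s),
# here with a constant step size.
theta = np.zeros(d)
eta = 0.1
for s in range(500):
    theta = theta - eta * grad_Ln(theta)

# Sandwich (robust) variance estimate for a^T theta-hat: this studentizes the
# fully iterated M-estimator and is only a stand-in for the paper's estimator
# sigma-hat_t(a), which tracks the iterate path and is not reproduced here.
a = np.array([1.0, 0.0, 0.0])
H = X.T @ X / n                        # empirical Hessian of L_n
S = -X * (y - X @ theta)[:, None]      # per-sample gradients (scores)
V = S.T @ S / n
Hinv = np.linalg.inv(H)
sigma2 = a @ Hinv @ V @ Hinv @ a / n
T = a @ (theta - theta_star) / np.sqrt(sigma2)  # approximately N(0, 1)
```

After 500 steps the iterates are numerically at the empirical minimizer, so `T` here illustrates the $t\to\infty$ endpoint of the statistic $T_{n,t}(a)$ rather than its finite-$t$ behavior.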

Unsolved Problem

Characterize, as sharply as possible, the class of misspecification regimes and loss/design structures under which asymptotically valid normal inference holds despite misspecification, e.g.

$$\sup_{0\le t\le T_n}\sup_{x\in\mathbb R}\left|\mathbb P_P\!\left(T_{n,t}(a)\le x\right)-\Phi(x)\right|\to0\quad\text{as }n\to\infty,$$

for each fixed $a$ and a suitable iteration horizon $T_n$, and identify where this uniform asymptotic normality must fail.
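The inner quantity in the display, $\sup_x|\mathbb P_P(T_{n,t}(a)\le x)-\Phi(x)|$, can be estimated by Monte Carlo as a Kolmogorov distance between the empirical distribution of the statistic and $N(0,1)$. The sketch below is an assumed illustration, not the paper's experiment: it reuses the misspecified linear-fit setup, with the fully iterated (least-squares) estimator standing in for the debiased GD iterate.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo sketch (assumed setup): estimate sup_x |P(T <= x) - Phi(x)| for a
# studentized contrast under a misspecified linear fit, using the fully
# iterated estimator as a stand-in for the debiased GD iterate.
def studentized_T(n=500, d=3):
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)
    theta_star = np.array([math.exp(-0.5), 0.5, 0.0])  # population minimizer
    theta = np.linalg.solve(X.T @ X, X.T @ y)          # limit of the GD iterates
    a = np.array([1.0, 0.0, 0.0])
    H = X.T @ X / n
    S = -X * (y - X @ theta)[:, None]                  # per-sample gradients
    V = S.T @ S / n
    Hinv = np.linalg.inv(H)
    sigma2 = a @ Hinv @ V @ Hinv @ a / n               # sandwich variance
    return float(a @ (theta - theta_star)) / math.sqrt(sigma2)

# Empirical CDF of T over replications vs. the standard normal CDF Phi.
Ts = np.sort([studentized_T() for _ in range(400)])
Phi = 0.5 * (1.0 + np.vectorize(math.erf)(Ts / math.sqrt(2.0)))
ecdf = np.arange(1, Ts.size + 1) / Ts.size
ks = float(np.max(np.abs(ecdf - Phi)))  # small when the normal approximation holds
```

In a regime where uniform asymptotic normality fails, this estimated distance would stay bounded away from zero as $n$ grows; characterizing exactly which regimes behave this way is the open problem.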


§ Significance & Implications

For arXiv:2412.09498v3, the misspecification-robustness claim is supported by the introduction's discussion rather than by the abstract: Section 1.4 (the paragraph after Eq. (1.11)) points to robustness under limited misspecification and refers to Appendix C.2 for simulation evidence. The paper does not provide a sharp necessary-and-sufficient boundary for the maximal class of regimes under which validity holds.

§ Known Partial Results

  • Han and Xu (2024): The paper proves debiased-GD inferential validity within its theorem-specific model framework and reports additional simulation evidence of robustness, but it does not establish a sharp maximal-class characterization of misspecification robustness. The status remains open.

§ References

[1] Qiyang Han and Xiaocong Xu (2024). Gradient descent inference in empirical risk minimization. Annals of Statistics (to appear). arXiv:2412.09498v3; see Section 1.4 (paragraph immediately after Eq. (1.11), misspecification-robustness discussion) and Appendix C.2 (simulation evidence under misspecification). Primary source motivating this synthesized open direction.

§ Tags