Generalization-error estimation beyond Gaussian designs
Sourced from the work of Pierre C. Bellec and Kai Tan
§ Problem Statement
Setup
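Let $(x_i, y_i)_{i=1}^n$ be i.i.d. training data with $x_i \in \mathbb{R}^p$ and
$$y_i = x_i^\top b^* + \varepsilon_i,$$
where $b^* \in \mathbb{R}^p$ is deterministic (or independent of the sample), $x_i$ has mean zero and covariance $\Sigma$, and the noise $\varepsilon_i$ is independent of $x_i$ with mean zero and variance $\sigma^2$. Write $X \in \mathbb{R}^{n \times p}$ for the design matrix with rows $x_i^\top$, and $y = (y_1, \dots, y_n)^\top$.
Assume a high-dimensional regime with $p \asymp n$ under non-Gaussian random designs (e.g., sub-Gaussian or elliptical families with conditions sufficient for asymptotic normality arguments). For fixed iteration index $t$, let $\hat b^t$ be the $t$-th iterate of a first-order method (such as gradient descent (GD), proximal GD, or an accelerated variant) applied to least squares, possibly with a proper closed convex penalty. With independent test covariate $x_{\mathrm{new}}$, distributed as $x_i$, define
$$r_t = \mathbb{E}\big[(x_{\mathrm{new}}^\top(\hat b^t - b^*))^2 \mid X, y\big] = \|\Sigma^{1/2}(\hat b^t - b^*)\|_2^2.$$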
Unsolved Problem
For fixed $t$, construct a data-driven estimator $\hat r_t$ of the generalization error $r_t$ of the $t$-th iterate that admits a valid $\sqrt{n}$-scale limit law under non-Gaussian designs, with finite nondegenerate asymptotic variance, and give conditions under which that variance can be estimated consistently.
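To make the target quantity concrete, the following is a minimal simulation sketch of the setup, not the estimator from the cited work. It assumes a Rademacher design (sub-Gaussian, non-Gaussian, identity covariance), plain gradient descent on unpenalized least squares started at zero, and a hypothetical sparse signal; the function name, dimensions, step size, and noise level are illustrative choices only. Because $b^*$ is known in a simulation, the generalization error of each iterate can be computed exactly, which gives an oracle trajectory that any candidate estimator would have to track.

```python
import numpy as np


def simulate_risk_trajectory(n=200, p=100, T=20, sigma=1.0, seed=0):
    """Run GD on least squares under a Rademacher design and return the
    oracle generalization error of iterates 1..T (assumes Sigma = I_p)."""
    rng = np.random.default_rng(seed)

    # Rademacher entries: mean zero, unit variance, sub-Gaussian, not Gaussian.
    X = rng.choice([-1.0, 1.0], size=(n, p))

    # Hypothetical true coefficient vector (illustrative assumption).
    b_star = np.zeros(p)
    b_star[:10] = 1.0

    # Linear model with Gaussian noise of standard deviation sigma.
    y = X @ b_star + sigma * rng.standard_normal(n)

    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    L = np.linalg.norm(X, ord=2) ** 2 / n
    eta = 1.0 / L

    b = np.zeros(p)  # initial iterate
    risks = []
    for _ in range(T):
        grad = X.T @ (X @ b - y) / n  # gradient of ||y - Xb||^2 / (2n)
        b = b - eta * grad
        # With Sigma = I_p the generalization error is ||b - b_star||^2.
        risks.append(float(np.sum((b - b_star) ** 2)))
    return np.array(risks)


risks = simulate_risk_trajectory()
print(risks)
```

The oracle trajectory uses the unknown $b^*$ and so is unavailable in practice; the open problem asks for a quantity computable from $(X, y)$ alone whose error relative to this trajectory is of order $n^{-1/2}$ with a characterizable limit law.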
§ Discussion
§ Significance & Implications
The cited work studies uncertainty quantification for fixed-time iterates in high-dimensional linear models under Gaussian-design assumptions. A non-Gaussian extension remains a natural and practically important direction, but the exact scope of what is already proved beyond Gaussian settings should be treated cautiously pending a dedicated 2025-2026 literature check.
§ Known Partial Results
Bellec and Tan (2024): Under Gaussian-design assumptions, the cited paper establishes $\sqrt{n}$-scale uncertainty quantification for risk estimation at a fixed iterate $t$ for several first-order algorithms. Whether the non-Gaussian case remains open requires dedicated literature verification.
§ References
Pierre C Bellec, Kai Tan (2024)
Annals of Statistics (to appear)
📍 Section 5 (Discussion), exact paragraph citation to be pinned after direct full-text verification of the final published version.
Primary source motivating this formalization; non-Gaussian extension remains an open direction.