Unsolved

Early-stopping optimality without a U-shape risk assumption

Sourced from the work of Pierre C Bellec, Kai Tan

§ Problem Statement

Setup

Consider the high-dimensional linear regression model with i.i.d. training data Dn={(xi,yi)}i=1nD_n=\{(x_i,y_i)\}_{i=1}^n, where for each ii,

yi=xiβ+εi,xiRp, βRp,y_i=x_i^\top\beta^\star+\varepsilon_i,\qquad x_i\in\mathbb R^p,\ \beta^\star\in\mathbb R^p,

with xiN(0,Σ)x_i\sim N(0,\Sigma) for some positive semidefinite ΣRp×p\Sigma\in\mathbb R^{p\times p}, εiN(0,σ2)\varepsilon_i\sim N(0,\sigma^2), and εi\varepsilon_i independent of xix_i. Let p=pnp=p_n be comparable to nn (for example pn/nκ(0,)p_n/n\to\kappa\in(0,\infty)). Let an iterative algorithm (such as gradient descent, proximal gradient descent, or an accelerated variant) produce estimators β^0,β^1,,β^TnRp\hat\beta^0,\hat\beta^1,\dots,\hat\beta^{T_n}\in\mathbb R^p, where each β^t\hat\beta^t is measurable with respect to DnD_n (and any internal algorithmic randomness), and Tn1T_n\ge 1 may depend on nn.

For each iteration t{1,,Tn}t\in\{1,\dots,T_n\}, define the population prediction risk (generalization error) of β^t\hat\beta^t by

Rt:=E ⁣[(Ynew(Xnew)β^t)2Dn],R_t:=\mathbb E\!\left[(Y^{\mathrm{new}}-(X^{\mathrm{new}})^\top \hat\beta^t)^2\mid D_n\right],

where (Xnew,Ynew)(X^{\mathrm{new}},Y^{\mathrm{new}}) is an independent test pair distributed as (xi,yi)(x_i,y_i). Thus (Rt)t=1Tn(R_t)_{t=1}^{T_n} is a random sequence induced by DnD_n.

Unsolved Problem

Construct a fully data-driven stopping time t^=t^(Dn){1,,Tn}\hat t=\hat t(D_n)\in\{1,\dots,T_n\} such that, without imposing any structural shape condition on tRtt\mapsto R_t (in particular, without assuming unimodality or a U-shape),

Rt^min1tTnRt=oP(1)as n,R_{\hat t}-\min_{1\le t\le T_n}R_t=o_{\mathbb P}(1)\quad\text{as }n\to\infty,

or, preferably, to prove a nonasymptotic oracle inequality of the form

P ⁣(Rt^min1tTnRt+Δn)1αn,\mathbb P\!\left(R_{\hat t}\le \min_{1\le t\le T_n}R_t+\Delta_n\right)\ge 1-\alpha_n,

with explicit Δn0\Delta_n\to 0 and αn0\alpha_n\to 0, under verifiable conditions on (n,pn,Σ,β,σ2)(n,p_n,\Sigma,\beta^\star,\sigma^2) and the iterative update rule.

§ Discussion

Loading discussion…

§ Significance & Implications

The cited Bellec-Tan guarantee is conditional on a U-shape assumption for the risk trajectory. Removing that assumption would substantially broaden the reliability of data-driven early stopping. Post-2024 results in related inverse-problem settings also indicate that additive adaptation slack can be intrinsic, so identifying the sharp achievable guarantee in this linear-regression setting is practically and theoretically important.

§ Known Partial Results

  • Bellec et al. (2024): Bellec-Tan provide trajectory risk estimators and, under their U-shape condition, data-driven selection of t^\hat t is near-oracle up to estimation error. A conservative generic formulation is

    Rt^min1tTnRt+2sup1tTnR^tRt,R_{\hat t}\le \min_{1\le t\le T_n}R_t + 2\sup_{1\le t\le T_n}|\hat R_t-R_t|,

    where R^t\hat R_t denotes the estimated risk used for selection; hence the guarantee is an approximate oracle inequality with explicit additive slack, not exact attainment of mintRt\min_t R_t. In related post-2024 early-stopping theory for inverse problems, explicit non-vanishing remainder terms and necessity phenomena are proved, suggesting that some adaptation slack may be unavoidable without additional structure.

§ References

[1]

Uncertainty quantification for iterative algorithms in linear models with application to early stopping

Pierre C Bellec, Kai Tan (2024)

Annals of Statistics (future paper; no volume/DOI listed )

📍 Section 1.3 (Early stopping), unnumbered paragraph in the introduction, p. 4 of arXiv v1; publication status cross-checked against the Annals of Statistics Future Papers page on 2026-02-17.

Primary source where the U-shape-conditional early-stopping claim is stated.

[2]

Early stopping for conjugate gradients in statistical inverse problems

Laura Hucker, Markus Reiss (2025)

Numerische Mathematik 157 (2025), 1739-1791, doi:10.1007/s00211-025-01469-4

📍 Abstract and Section 3/Theorem 6.8 discussion (notably the explicit additive term and statement that no classical oracle inequality is available in that setting).

Related post-2024 evidence: oracle-type bounds can require explicit additive remainder terms, and a classical exact-oracle form need not hold.

§ Tags