Early-stopping optimality without a U-shape risk assumption
Sourced from the work of Pierre C Bellec, Kai Tan
§ Problem Statement
Setup
Consider the high-dimensional linear regression model with i.i.d. training data , where for each ,
with for some positive semidefinite , , and independent of . Let be comparable to (for example ). Let an iterative algorithm (such as gradient descent, proximal gradient descent, or an accelerated variant) produce estimators , where each is measurable with respect to (and any internal algorithmic randomness), and may depend on .
For each iteration , define the population prediction risk (generalization error) of by
where is an independent test pair distributed as . Thus is a random sequence induced by .
Unsolved Problem
Construct a fully data-driven stopping time such that, without imposing any structural shape condition on (in particular, without assuming unimodality or a U-shape),
or, preferably, to prove a nonasymptotic oracle inequality of the form
with explicit and , under verifiable conditions on and the iterative update rule.
§ Discussion
§ Significance & Implications
The cited Bellec-Tan guarantee is conditional on a U-shape assumption for the risk trajectory. Removing that assumption would substantially broaden the reliability of data-driven early stopping. Post-2024 results in related inverse-problem settings also indicate that additive adaptation slack can be intrinsic, so identifying the sharp achievable guarantee in this linear-regression setting is practically and theoretically important.
§ Known Partial Results
Bellec et al. (2024): Bellec-Tan provide trajectory risk estimators and, under their U-shape condition, data-driven selection of is near-oracle up to estimation error. A conservative generic formulation is
where denotes the estimated risk used for selection; hence the guarantee is an approximate oracle inequality with explicit additive slack, not exact attainment of . In related post-2024 early-stopping theory for inverse problems, explicit non-vanishing remainder terms and necessity phenomena are proved, suggesting that some adaptation slack may be unavoidable without additional structure.
§ References
Pierre C Bellec, Kai Tan (2024)
Annals of Statistics (future paper; no volume/DOI listed )
📍 Section 1.3 (Early stopping), unnumbered paragraph in the introduction, p. 4 of arXiv v1; publication status cross-checked against the Annals of Statistics Future Papers page on 2026-02-17.
Primary source where the U-shape-conditional early-stopping claim is stated.
Early stopping for conjugate gradients in statistical inverse problems
Laura Hucker, Markus Reiss (2025)
Numerische Mathematik 157 (2025), 1739-1791, doi:10.1007/s00211-025-01469-4
📍 Abstract and Section 3/Theorem 6.8 discussion (notably the explicit additive term and statement that no classical oracle inequality is available in that setting).
Related post-2024 evidence: oracle-type bounds can require explicit additive remainder terms, and a classical exact-oracle form need not hold.