Nonasymptotic guarantees for bagged regularized M-estimators
Sourced from the work of Takuya Koriyama, Pratik Patil, Jin-Hong Du, Kai Tan, Pierre C. Bellec
§ Problem Statement
Setup
Formal setup (matching the paper): under Assumptions A--D, let the design matrix X ∈ R^{n×p} have i.i.d. entries, with i.i.d. signal coordinates β_1, …, β_p drawn from a signal prior, i.i.d. noise coordinates ε_1, …, ε_n drawn from a noise distribution, responses y_i = x_i^⊤ β + ε_i, and subsample index sets I_1, …, I_M ⊆ {1, …, n} (with |I_m| = k) drawn independently as in Assumption B. For each m ∈ {1, …, M}, train the residual-loss regularized estimator

  β̂_m ∈ argmin_{b ∈ R^p} Σ_{i ∈ I_m} ℓ_m(y_i − x_i^⊤ b) + Σ_{j=1}^{p} g_m(b_j),

where each loss/penalty pair (ℓ_m, g_m) satisfies Assumption C, and Assumption D gives the additional moment/regularity conditions used in the risk-limit theory. Define the bagged estimator β̂ = M^{−1} Σ_{m=1}^{M} β̂_m and the squared excess prediction risk

  R(β̂) = E[(x_new^⊤ (β̂ − β))² | X, y],

for a test point x_new independent of the training data. (When the noise has variance σ², the full squared prediction risk adds the irreducible term σ².) Let R̂ be the source's data-dependent risk estimator, and let R∞ denote the deterministic proportional-limit risk for the corresponding proportional-asymptotic regime.
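As a concrete illustration of the setup above, the following sketch simulates bagged ridge regression (squared loss with an ℓ2 penalty, one member of the regularized M-estimator family) under an isotropic Gaussian design. The dimensions, seed, subsample size, and noise level are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants only: isotropic Gaussian design, ridge penalty,
# and uniform subsampling stand in for the paper's general setup.
n, p, M, k, lam = 600, 200, 10, 300, 1.0

beta = rng.normal(size=p) / np.sqrt(p)          # i.i.d. signal coordinates
X = rng.normal(size=(n, p))                     # i.i.d. design entries
y = X @ beta + rng.normal(scale=0.5, size=n)    # i.i.d. noise

def ridge(X_sub, y_sub, lam):
    """Closed-form ridge fit on one subsample (squared loss + l2 penalty)."""
    G = X_sub.T @ X_sub + lam * np.eye(p)
    return np.linalg.solve(G, X_sub.T @ y_sub)

# Bagged estimator: average of M fits on independently drawn subsamples.
fits = []
for _ in range(M):
    idx = rng.choice(n, size=k, replace=False)
    fits.append(ridge(X[idx], y[idx], lam))
beta_bag = np.mean(fits, axis=0)

# Squared excess prediction risk at an isotropic test point x_new:
# E[(x_new @ (beta_bag - beta))^2] = ||beta_bag - beta||^2.
excess_risk = np.sum((beta_bag - beta) ** 2)
print(f"excess risk of bagged ridge: {excess_risk:.4f}")
```

Swapping `ridge` for a Huber or lasso fit (different ℓ_m, g_m) changes the ensemble member but not the bagging-and-risk structure.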
This setup follows Koriyama et al. (2025).
Unsolved Problem
The paper explicitly asks whether hyperparameters tuned by minimizing the risk estimator R̂ (e.g., the subsample ratio k/n and the regularization level) are close to the oracle hyperparameters minimizing the true excess risk R (or its deterministic limit R∞), and notes that this would require suitable smoothness (e.g., Hölder/Lipschitz continuity) of the excess risk or its limit as a function of the hyperparameters. A stronger target, proposed here as an extension, is a uniform nonasymptotic guarantee over a tuning class Θ:

  R(θ̂) − inf_{θ ∈ Θ} R(θ) ≤ ε(n, p, k, M, Θ, δ)  with probability at least 1 − δ,  where θ̂ ∈ argmin_{θ ∈ Θ} R̂(θ),

for an appropriate finite-sample risk proxy R̂ (e.g., criteria of the same type as the risk estimator above), with explicit rates in n, p, k, and M. Broader formulations (anisotropic designs, deterministic signals, non-separable penalties, or settings beyond the Assumptions A--D regime) should be treated as extensions beyond this core problem statement.
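To make the tuning question concrete, the sketch below tunes a ridge regularization level over a small grid by minimizing a hypothetical finite-sample risk proxy (here a simple held-out validation error, standing in for the paper's data-dependent risk estimator) and reports the resulting oracle gap R(θ̂) − min_θ R(θ). All constants and the proxy itself are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tuning experiment: the proxy below is NOT the paper's risk
# estimator, only a stand-in used to illustrate the oracle gap.
n, p, noise_sd = 400, 150, 0.5
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=noise_sd, size=n)
X_val = rng.normal(size=(n, p))                      # held-out proxy data
y_val = X_val @ beta + rng.normal(scale=noise_sd, size=n)

def ridge(Xs, ys, lam):
    """Closed-form ridge fit (squared loss + l2 penalty)."""
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ ys)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]                 # tuning class Theta
true_risk, proxy_risk = {}, {}
for lam in grid:
    b = ridge(X, y, lam)
    true_risk[lam] = float(np.sum((b - beta) ** 2))   # ||b - beta||^2
    proxy_risk[lam] = float(np.mean((y_val - X_val @ b) ** 2))

lam_hat = min(grid, key=proxy_risk.get)              # tuned by the proxy
lam_star = min(grid, key=true_risk.get)              # oracle choice
oracle_gap = true_risk[lam_hat] - true_risk[lam_star]
print(f"tuned lam={lam_hat}, oracle lam={lam_star}, oracle gap={oracle_gap:.5f}")
```

A uniform guarantee of the kind posed above would bound `oracle_gap` with high probability, simultaneously over the whole grid (and over subsample ratios, in the bagged setting), at an explicit finite-sample rate.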
§ Discussion
§ Significance & Implications
The paper gives precise proportional asymptotics and a consistent risk estimator, but practical tuning requires finite-sample control of the oracle gap. Proving nonasymptotic guarantees for estimator-based tuning would connect asymptotic risk formulas to defensible hyperparameter selection at realistic sample sizes.
§ Known Partial Results
Koriyama et al. (2025): Under Assumptions A--D, the paper proves deterministic proportional-limit formulas for squared excess prediction risk (including heterogeneous ensembles) and establishes consistency of a data-dependent risk estimator (Corollary 6), but does not provide general uniform nonasymptotic oracle-tuning guarantees.
§ References
Precise Asymptotics of Bagging Regularized M-estimators
Takuya Koriyama, Pratik Patil, Jin-Hong Du, Kai Tan, Pierre C. Bellec (2025)
Annals of Statistics (in press)
📍 Section 3.4, paragraph immediately after Corollary 6 and equation (14), beginning "A natural question arising from the above discussion is whether hyperparameters..." (p. 16 in arXiv v3).
Primary source paper; listed on Annals of Statistics Future Papers as in press (no final volume/issue pages or journal DOI publicly listed).