Overparameterized optimal subsample size for infinite-ensemble subagging
Sourced from the work of Takuya Koriyama, Pratik Patil, Jin-Hong Du, Kai Tan, Pierre C. Bellec
§ Problem Statement
Setup
Assume the homogeneous subagging setup studied in Koriyama et al.: i.i.d. data in proportional asymptotics ($n, p \to \infty$ with $p/n \to \delta \in (0,\infty)$), with Gaussian design $x_i \sim N(0, I_p)$, linear signal-plus-noise response $y_i = x_i^\top \beta + \varepsilon_i$, and finite second moments for signal/noise. Base learners are regularized M-estimators with convex differentiable loss and convex penalty, trained on uniform subsamples of size $k$ with $p/k \to \psi$, and the full-ensemble estimator is the limit $M \to \infty$ of the average of $M$ subsample fits (conditional subsample expectation).
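As an illustrative sketch (not code from the paper), the setup can be instantiated with the ridgeless least-squares base learner, a tractable special case of the M-estimation framework; the constants, the min-norm fit via `pinv`, and the function name `subbag` are all choices made here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance of the setup: isotropic Gaussian design,
# linear signal-plus-noise response, ridgeless (min-norm least squares)
# base learner as a tractable special case of the M-estimation framework.
n, p, k = 300, 150, 75                       # delta = p/n = 0.5, psi = p/k = 2
beta = rng.standard_normal(p) / np.sqrt(p)   # linear signal with ||beta||^2 ~ 1
X = rng.standard_normal((n, p))              # rows x_i ~ N(0, I_p)
y = X @ beta + rng.standard_normal(n)        # signal-plus-noise response

def subbag(M):
    """Average M base learners, each fit on a uniform subsample of size k;
    M -> infinity gives the full-ensemble (conditional expectation) estimator."""
    avg = np.zeros(p)
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        avg += np.linalg.pinv(X[idx]) @ y[idx]   # min-norm LS on the subsample
    return avg / M

# the ensemble stabilizes as M grows, approximating the M -> infinity limit
small, large = subbag(20), subbag(500)
```

In practice the conditional subsample expectation is approximated by a finite but large $M$; the subsample-resampling variance of the average decays like $1/M$.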
Let $R(\psi)$ denote the asymptotic squared prediction risk of the full-ensemble estimator at limiting subsample aspect ratio $\psi = \lim p/k$, and let $\psi^\star = \arg\min_{\psi \ge \delta} R(\psi)$, with $k^\star$ the corresponding optimal subsample size.
Unsolved Problem
In the vanishing-regularization regime $\lambda \to 0^+$, establish whether
$$\psi^\star \ge 1 \qquad \text{(equivalently, } k^\star \le p\text{)}$$
holds under the full generality of the M-estimation framework. Existing results and evidence separate into: (i) proved or analytically derived behavior in specific tractable cases (notably ridgeless/squared-loss settings), (ii) empirical/numerical evidence in other cases (including lasso-type settings), and (iii) the unresolved unified conjecture across general losses/penalties.
For sequences $\lambda_m$ with $\lambda_m \to 0^+$, the limsup formulation to test is correspondingly
$$\limsup_{m \to \infty} \frac{1}{\psi^\star(\lambda_m)} \le 1$$
rather than a pointwise eventual inequality.
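The conjecture can be probed numerically. The following self-contained sketch (a hypothetical experiment, not the paper's code) sweeps the subsample size $k$ across both sides of $k = p$ for the ridgeless base learner and reports the empirical risk minimizer; the grid, sample sizes, and ensemble size $M$ are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def bagged_ridgeless_risk(X, y, beta, k, M=50, sigma2=1.0):
    """Average M min-norm least-squares fits on uniform size-k subsamples
    and return the estimated prediction risk; under isotropic design the
    risk is ||beta_bag - beta||^2 plus the noise floor sigma2."""
    n, p = X.shape
    beta_bag = np.zeros(p)
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        beta_bag += np.linalg.pinv(X[idx]) @ y[idx]
    beta_bag /= M
    return np.sum((beta_bag - beta) ** 2) + sigma2

n, p = 500, 250                              # delta = p/n = 0.5
beta = rng.standard_normal(p) / np.sqrt(p)   # signal strength ~ 1
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

# sweep subsample sizes on both sides of k = p (i.e., psi = p/k = 1)
ks = [50, 100, 125, 200, 400, 500]
risks = {k: bagged_ridgeless_risk(X, y, beta, k) for k in ks}
k_star = min(risks, key=risks.get)
print({k: round(r, 3) for k, r in risks.items()}, "empirical k* =", k_star)
```

A single Monte Carlo draw is noisy, so the empirical minimizer is only suggestive; the conjecture concerns the asymptotic risk curve $R(\psi)$, where the claim is that its minimizer satisfies $\psi^\star \ge 1$ even when the full-data ratio $\delta$ is below one.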
§ Discussion
§ Significance & Implications
This would formalize when implicit regularization from subagging alone is sufficient to control prediction error without explicit penalization. A proof would clarify phase transitions in optimal subsampling and provide principled guidance for choosing the subsample size $k$ in modern high-dimensional regimes. See Koriyama et al. for the current asymptotic formulas and evidence.
§ Known Partial Results
Koriyama et al. (2025): Koriyama et al. provide precise asymptotic risk characterizations for subagged regularized M-estimators. In specialized tractable regimes (notably ridgeless/squared-loss settings), the formulas support overparameterized-optimal-$k$ behavior; for broader settings (including lassoless/lasso-type cases), the paper presents supportive numerical evidence but not a single theorem covering all losses/penalties under vanishing regularization.
§ References
Precise Asymptotics of Bagging Regularized M-estimators
Takuya Koriyama, Pratik Patil, Jin-Hong Du, Kai Tan, Pierre C. Bellec (2025)
Annals of Statistics (to appear)
📍 Section 5.2 ("Optimal subsample size"), first paragraph and Figure 5 discussion of $k^\star$ shifting toward the overparameterized regime for vanishing explicit regularization (arXiv v3 dated 2025-09-27; canonical citation uses base arXiv id).
Primary source; IMS Annals of Statistics future-papers listing and arXiv preprint (latest public revision v3).