Rigorous tempered-overfitting guarantees for MDL ReLU interpolators under label noise
Sourced from the work of Sourav Chatterjee and Timothy Sudijono
§ Problem Statement
Setup
Fix an input space $\mathcal{X} \subseteq \mathbb{R}^d$ and let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. samples from a distribution $P$ on $\mathcal{X} \times \{0,1\}$. Assume there exists a binary target function $f^\star\colon \mathcal{X} \to \{0,1\}$ such that
$$Y_i = f^\star(X_i) \oplus \xi_i,$$
where $\oplus$ is XOR, and conditionally on $X_i$, $\xi_i \sim \mathrm{Bernoulli}(\eta(X_i))$ with $\eta\colon \mathcal{X} \to [0, \eta_{\max}]$ measurable for some fixed $\eta_{\max} < 1/2$. Then $\Pr(Y_i \neq f^\star(X_i) \mid X_i) = \eta(X_i)$ and $f^\star$ is the Bayes-optimal classifier for $P$.
Assume $f^\star$ has finite description complexity in a fixed prefix-free language $\mathcal{L}$: $K_{\mathcal{L}}(f^\star) < \infty$.
Let $\mathcal{F}$ be the class of binary classifiers representable by finite feedforward ReLU networks (real-valued output thresholded at $1/2$), each $f \in \mathcal{F}$ carrying a prefix-free code length $\ell(f)$. Define $\hat f_n$ via penalized ERM with approximate minimization: for a given penalty level $\lambda_n > 0$ and tolerance $\varepsilon_n \ge 0$, choose a measurable $\hat f_n \in \mathcal{F}$ such that
$$\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat f_n(X_i) \neq Y_i\} + \lambda_n\, \ell(\hat f_n) \;\le\; \inf_{f \in \mathcal{F}} \left[ \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{f(X_i) \neq Y_i\} + \lambda_n\, \ell(f) \right] + \varepsilon_n.$$
This setup follows Chatterjee & Sudijono (2024).
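To make the selection rule concrete, here is a minimal numerical sketch of the noisy-label model and MDL-penalized ERM over a toy finite class. Everything in it is a hypothetical stand-in: a dyadic threshold family replaces the ReLU class, and the coding and the schedule $\lambda_n = \log(n)/n$ are illustrative choices, not constructions from Chatterjee & Sudijono (2024).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instantiation of the setup: X = [0, 1], a low-complexity
# threshold target f*, and a constant label-noise rate eta < 1/2.
n, eta = 500, 0.1
f_star = lambda x: (x > 0.5).astype(int)

X = rng.uniform(0.0, 1.0, size=n)
xi = rng.binomial(1, eta, size=n)       # xi_i ~ Bernoulli(eta)
Y = f_star(X) ^ xi                      # Y_i = f*(X_i) XOR xi_i

# Toy stand-in for the ReLU class: dyadic threshold classifiers, with a
# code length ell that grows with the resolution of the threshold.
candidates = [(j / 2**k, k + 1)         # (threshold t, code length ell)
              for k in range(1, 8) for j in range(1, 2**k)]

lam = np.log(n) / n                     # one plausible schedule for lambda_n

def penalized_risk(t, ell):
    """Empirical 0-1 loss of x -> 1{x > t} plus the MDL penalty lam * ell."""
    emp_err = np.mean((X > t).astype(int) != Y)
    return emp_err + lam * ell

# Approximate penalized ERM; here the argmin is exact, i.e. eps_n = 0.
t_hat, ell_hat = min(candidates, key=lambda c: penalized_risk(*c))
print(f"selected threshold = {t_hat:.3f}, code length = {ell_hat}")
print(f"disagreement with f*: {np.mean((X > t_hat).astype(int) != f_star(X)):.3f}")
```

Because the true threshold $0.5$ carries the shortest code, the penalty steers selection away from candidates that memorize the flipped labels; the conjecture asks for a quantitative version of exactly this tempering effect for ReLU networks.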
Unsolved Problem
Under explicit assumptions linking program complexity, ReLU approximation/realizability, coding choices, and the sequences $(\lambda_n)$ and $(\varepsilon_n)$, prove a nonasymptotic high-probability excess-risk bound of the form
$$\Pr_{X}\big(\hat f_n(X) \neq f^\star(X)\big) \le \epsilon_n \quad \text{with probability at least } 1 - \delta_n \text{ over the sample},$$
with explicit $\epsilon_n \to 0$ and $\delta_n \to 0$ as $n \to \infty$ (for fixed $d$ and fixed complexity/noise parameters). Motivation: this would rigorously quantify whether and when MDL regularization tempers overfitting in noisy-label ReLU interpolation, extending the source paper's noiseless guarantees.
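For orientation, the classical Occam/MDL deviation bound indicates the shape such a result might take; the following is a standard heuristic sketch under the stated coding assumptions, not the conjectured theorem, and $f^\star_{\mathrm{ReLU}}$ below denotes an assumed short ReLU realization of $f^\star$.

```latex
% Standard Occam/MDL deviation bound (Hoeffding plus a Kraft-weighted union
% bound); a heuristic pointer toward the conjecture, not a proof of it.
% R(f) is the population 0-1 risk, \hat R_n(f) its empirical counterpart.
\[
  \Pr\!\left[\exists f \in \mathcal{F}:\;
    R(f) > \widehat{R}_n(f) + \sqrt{\frac{\ell(f)\ln 2 + \ln(1/\delta)}{2n}}
  \right] \le \delta,
\]
% obtained by running Hoeffding with failure probability \delta 2^{-\ell(f)}
% per classifier and summing via Kraft's inequality
% \sum_{f \in \mathcal{F}} 2^{-\ell(f)} \le 1 for the prefix-free code.
% Comparing the penalized objective of \hat f_n with that of
% f^\star_{\mathrm{ReLU}} then suggests an excess risk of order
\[
  \lambda_n\,\ell(f^\star_{\mathrm{ReLU}})
  + \sqrt{\frac{\ell(\hat f_n)\ln 2 + \ln(1/\delta)}{2n}}
  + \varepsilon_n .
\]
% The hard open steps are a realizability bound on \ell(f^\star_{\mathrm{ReLU}})
% in terms of K_{\mathcal{L}}(f^\star), and control of \ell(\hat f_n) in the
% near-interpolating regime \lambda_n \to 0 under label noise.
```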
§ Discussion
§ Significance & Implications
Chatterjee & Sudijono (2024) prove strong generalization in noiseless low-complexity settings and indicate noisy extensions only heuristically. A rigorous theorem in the noisy regime would formalize whether MDL-style selection avoids catastrophic memorization and quantify when overfitting is tempered under label noise.
§ Known Partial Results
Chatterjee & Sudijono (2024): Theorem 5.1 establishes high-probability generalization guarantees in the noiseless low-complexity setting, and Section 7 discusses noisy-label extensions only as future work; a fully proved noisy-label theorem remains open.
§ References
Neural Networks Generalize on Low Complexity Data
Sourav Chatterjee, Timothy Sudijono (2024)
arXiv preprint
📍 Theorem 5.1 (the main noiseless high-probability generalization guarantee) and Section 7 (Discussion); the paragraph after Theorem 5.1 notes noisy-observation extensions as future work rather than as a proved noisy-label theorem.
The primary source is the 2024 arXiv preprint; the Annals of Statistics acceptance (reported in 2025) is a separate publication milestone and does not itself supply a noisy-regime theorem.