§ Problem Statement
Setup
Let $(\mathcal{Z}, \mathcal{F})$ be a measurable space and let $P$ be an unknown probability distribution on $\mathcal{Z}$. For each $n \in \mathbb{N}$, let $S_n = (z_1, \ldots, z_n) \sim P^n$ be an i.i.d. sample. Fix a hypothesis class $\mathcal{H}$ and a measurable loss $\ell \colon \mathcal{H} \times \mathcal{Z} \to \mathbb{R}$. Define the population (true) risk $R(h) = \mathbb{E}_{z \sim P}\,\ell(h, z)$.
An empirical risk minimization (ERM) learner is a deterministic rule $A \colon \bigcup_{n \ge 1} \mathcal{Z}^n \to \mathcal{H}$ such that for every sample $S_n$,
$$A(S_n) \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \ell(h, z_i),$$
with a fixed tie-breaking rule so that $A(S_n)$ is single-valued.
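The ERM rule with deterministic tie-breaking can be made concrete for a finite hypothesis class. A minimal sketch; the class, loss, and sample below are illustrative assumptions, not part of the problem statement:

```python
import numpy as np

def erm(sample, hypotheses, loss):
    """ERM over a finite hypothesis class.

    Tie-breaking is deterministic: np.argmin returns the first
    minimizer in the fixed order of `hypotheses`, so the rule is
    single-valued."""
    emp_risk = [np.mean([loss(h, z) for z in sample]) for h in hypotheses]
    return hypotheses[int(np.argmin(emp_risk))]

# Hypothetical instance: constant predictors under squared loss.
hypotheses = [0.0, 2.0, 5.0]
sq_loss = lambda h, z: (h - z) ** 2
best = erm([1.0, 2.0, 3.0], hypotheses, sq_loss)  # empirical risks: 14/3, 2/3, 29/3
```

Here `best` is `2.0`, the candidate closest to the empirical mean of the sample, since squared loss is minimized there.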
Define the expected population risk at sample size $n$ by
$$\bar{R}(n) = \mathbb{E}_{S_n \sim P^n}\big[R(A(S_n))\big].$$
(Here the expectations for $\bar{R}(n)$ and $\bar{R}(n+1)$ are over fresh i.i.d. samples from $P^n$ and $P^{n+1}$, respectively.)
The learner $A$ is (locally) risk-monotone at $n$ if $\bar{R}(n+1) \le \bar{R}(n)$. Call $A$ risk-monotone if this inequality holds for all $n \ge 1$ and all distributions $P$ on $\mathcal{Z}$. Call $A$ weakly risk-monotone if there exists an integer $n_0$ (independent of $P$) such that $\bar{R}(n+1) \le \bar{R}(n)$ holds for all $n \ge n_0$ and all distributions $P$ on $\mathcal{Z}$.
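Local risk-monotonicity at a given $n$ can be probed empirically by Monte Carlo estimation of $\bar{R}(n)$. A sketch on an assumed toy instance (not from the problem statement): $P = \mathcal{N}(0,1)$, constant predictors under squared loss, so ERM is the sample mean and $\bar{R}(n) = 1 + 1/n$ exactly, which is monotone.

```python
import numpy as np

def expected_risk(learner, pop_risk, sampler, n, trials=2000, seed=0):
    """Monte Carlo estimate of Rbar(n) = E_{S_n ~ P^n}[ R(A(S_n)) ]."""
    rng = np.random.default_rng(seed)
    return float(np.mean([pop_risk(learner(sampler(rng, n)))
                          for _ in range(trials)]))

# Toy instance: predict a constant c; R(c) = E[(c - z)^2] = 1 + c^2 for z ~ N(0,1).
learner = np.mean                          # ERM for squared loss over constants
pop_risk = lambda c: 1.0 + c ** 2
sampler = lambda rng, n: rng.standard_normal(n)

r2 = expected_risk(learner, pop_risk, sampler, 2)  # approx 1 + 1/2
r3 = expected_risk(learner, pop_risk, sampler, 3)  # approx 1 + 1/3
```

Such estimates can only suggest monotonicity for one distribution at a time; the definitions above quantify over all $P$, which is what makes the problem hard.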
Unsolved Problem
For ERM learners, determine conditions on $\mathcal{H}$ and $\ell$, and/or restrictions on $P$ (e.g., well-specification/realizability assumptions), that are sufficient and/or necessary for risk-monotonicity or weak risk-monotonicity.
In particular, resolve risk-monotonicity and weak risk-monotonicity (existence of such an $n_0$ and, if it exists, the smallest valid $n_0$) for the following ERM setups on bounded domains:
- Normal variance estimation: $\mathcal{Z} \subset \mathbb{R}$ bounded, $\mathcal{H} = \{\mathcal{N}(0, \sigma^2) : \sigma^2 > 0\}$, and $\ell$ is the negative log-likelihood under $\mathcal{N}(0, \sigma^2)$; ERM corresponds to maximum likelihood with $\hat{\sigma}_n^2 = \tfrac{1}{n} \sum_{i=1}^{n} x_i^2$.
- Univariate linear regression: $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ with $\mathcal{X} \subset \mathbb{R}$ and $\mathcal{Y} \subset \mathbb{R}$ bounded, $\mathcal{H} = \{x \mapsto wx : w \in \mathbb{R}\}$ (no intercept), and $\ell(h, (x, y)) = (h(x) - y)^2$; ERM minimizes empirical mean squared error over $w$.
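For the normal variance setup, the learning curve $\bar{R}(n)$ can be estimated by simulation. A sketch under assumptions not fixed by the problem statement: data drawn from Uniform(-1, 1) (a bounded domain) with the mean known to be 0. Whether such curves are monotone is exactly what the open problem asks, so the code only estimates the curve rather than asserting monotonicity:

```python
import numpy as np

def nll_risk(var):
    # Population NLL risk of N(0, var) when x ~ Uniform(-1, 1), so E[x^2] = 1/3.
    return 0.5 * np.log(2.0 * np.pi * var) + (1.0 / 3.0) / (2.0 * var)

def learning_curve(ns, trials=4000, seed=0):
    """Monte Carlo estimate of Rbar(n) for the variance-MLE learner."""
    rng = np.random.default_rng(seed)
    risks = []
    for n in ns:
        x = rng.uniform(-1.0, 1.0, size=(trials, n))
        var_hat = np.mean(x ** 2, axis=1)   # MLE of the variance (mean known = 0)
        risks.append(float(np.mean(nll_risk(var_hat))))
    return risks

ns = [5, 10, 20, 50]
risks = learning_curve(ns)
```

Small sample sizes are deliberately avoided here: for very small $n$ the expectation of $1/\hat{\sigma}_n^2$ has heavy tails (and is infinite for $n \le 2$ under this distribution), which makes naive Monte Carlo estimates unreliable.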
§ Discussion
§ Significance & Implications
This problem asks for distribution-free (or assumption-qualified) guarantees that the expected population risk of ERM does not increase when the sample size grows by one. Such a guarantee is strictly finer than standard generalization or PAC-style statements because it constrains the entire learning curve $n \mapsto \bar{R}(n)$, not just asymptotic consistency or rates. Pinning down when monotonicity holds (or must fail) clarifies when non-monotone learning curves are an unavoidable consequence of ERM with finite samples versus an artifact of modeling choices (loss, hypothesis class, tie-breaking) or missing distributional assumptions.
§ Known Partial Results
- Viering et al. (2019) formalize expected-risk monotonicity for learning algorithms, including the distribution-free notions of risk-monotonicity and eventual (weak) risk-monotonicity for ERM.
- The same note provides examples showing that risk monotonicity can hold in some ERM settings and fail in others, even when monotonicity is evaluated in expectation over the random training sample.
- The note relates risk monotonicity to PAC-learnability, emphasizing that standard learnability/generalization guarantees (which improve with $n$ in typical bounds) do not automatically imply that $\bar{R}(n+1) \le \bar{R}(n)$ for every $n$ and every $P$.
§ References
Open Problem: Monotonicity of Learning
Tom Viering, Alexander Mey, Marco Loog (2019)
Conference on Learning Theory (COLT), PMLR 99