The Sample Complexity of Multi-Distribution Learning for VC Classes
Posed by Pranjal Awasthi et al. (2023)
§ Problem Statement
Setup
Let $\mathcal{X}$ be a domain and let $\mathcal{H} \subseteq \{0,1\}^{\mathcal{X}}$ be a hypothesis class with VC dimension $d$. Fix parameters $\varepsilon \in (0,1)$, $\delta \in (0,1)$, and $k \in \mathbb{N}$. In the realizable multi-distribution PAC setting, an instance consists of $k$ unknown distributions $D_1, \dots, D_k$ over $\mathcal{X}$ and an unknown target $h^\star \in \mathcal{H}$. For each $i \in [k]$, the learner may draw i.i.d. labeled examples $(x, h^\star(x))$ with $x \sim D_i$, choosing an (adaptive) sample allocation with total sample size $N$. The goal is to output, with probability at least $1-\delta$ over the sampled data, a hypothesis $\hat{h}$ such that for every $i \in [k]$,
$$\mathrm{err}_{D_i}(\hat{h}) := \Pr_{x \sim D_i}\bigl[\hat{h}(x) \neq h^\star(x)\bigr] \le \varepsilon.$$
Define $N(\varepsilon, \delta, k, d)$ to be the minimum $N$ for which there exists a (possibly adaptive) learner that succeeds for every domain $\mathcal{X}$, every class $\mathcal{H}$ with $\mathrm{VC}(\mathcal{H}) \le d$, and every choice of $D_1, \dots, D_k$ and $h^\star \in \mathcal{H}$.
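As a concrete toy instance of this protocol (an illustrative sketch, not an algorithm from the note; the domain, hypothesis class, and distributions below are invented for exposition), the learner can draw an equal number of samples from each distribution and return any hypothesis consistent with the pooled data:

```python
import random

# Toy realizable multi-distribution instance: domain X = {0, ..., 9} and
# threshold hypotheses h_t(x) = 1[x >= t], a class of VC dimension d = 1.
DOMAIN = list(range(10))
THRESHOLDS = list(range(11))  # h_0, ..., h_10 (h_10 labels everything 0)

def h(t, x):
    return 1 if x >= t else 0

def err(t, t_star, dist):
    # err_{D_i}(h_t) = Pr_{x ~ D_i}[h_t(x) != h_{t*}(x)], computed exactly.
    return sum(p for x, p in dist.items() if h(t, x) != h(t_star, x))

def learn(dists, t_star, samples_per_dist, rng):
    # Uniform sample allocation: draw i.i.d. labeled examples (x, h*(x))
    # from each D_i, then return any hypothesis consistent with the pooled
    # sample (ERM under realizability).
    data = []
    for dist in dists:
        xs, ps = zip(*dist.items())
        draws = rng.choices(xs, weights=ps, k=samples_per_dist)
        data += [(x, h(t_star, x)) for x in draws]
    consistent = [t for t in THRESHOLDS
                  if all(h(t, x) == y for x, y in data)]
    return consistent[0]

rng = random.Random(0)
# k = 3 distributions, each concentrated on a different part of the domain;
# a learner that ignores any one of them can fail on it.
dists = [{0: 0.5, 1: 0.5}, {4: 0.5, 5: 0.5}, {8: 0.5, 9: 0.5}]
t_hat = learn(dists, t_star=5, samples_per_dist=50, rng=rng)
worst = max(err(t_hat, 5, d) for d in dists)
print(t_hat, worst)
```

The interesting quantity is `worst`, the maximum error over the $k$ distributions, which is exactly the multi-distribution objective the problem statement bounds by $\varepsilon$.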
Unsolved Problem
Determine the correct asymptotic dependence of $N(\varepsilon, \delta, k, d)$ on $d$, $k$, and $\varepsilon$ (and logarithmically on $1/\delta$), closing the gap between the best known upper bound (for constant $\delta$)
$$N(\varepsilon, \delta, k, d) = O\!\left(\frac{\ln(k)\,(d+k)}{\varepsilon^{2}} + \min\!\left\{\frac{dk}{\varepsilon},\ \frac{d\ln(k)}{\varepsilon^{4}}\right\}\right)$$
and the best known lower bound
$$N(\varepsilon, \delta, k, d) = \Omega\!\left(\frac{d + k\ln(k)}{\varepsilon^{2}}\right).$$
§ Discussion
§ Significance & Implications
This problem isolates the information-theoretic cost of producing a single hypothesis that achieves error at most $\varepsilon$ simultaneously on each of $k$ distributions, when learning a VC class of dimension $d$ under realizability. A tight characterization of $N(\varepsilon, \delta, k, d)$ would determine whether the additional terms appearing in current upper bounds beyond the shared $\varepsilon^{-2}(d + k\ln k)$ rate (e.g., the $\min\{\varepsilon^{-1}dk,\ \varepsilon^{-4}\ln(k)d\}$ term) are unavoidable in the worst case or can be removed by improved algorithms/analyses.
§ Known Partial Results
Awasthi et al. (2023): (Upper bound, constant $\delta$.) For VC dimension $d$ and $k$ distributions, the COLT 2023 open-problem note states an upper bound of $O\!\left(\varepsilon^{-2}\ln(k)(d+k) + \min\{\varepsilon^{-1}dk,\ \varepsilon^{-4}\ln(k)d\}\right)$.
Awasthi et al. (2023): (Lower bound.) The same source states a lower bound of $\Omega\!\left(\varepsilon^{-2}(d + k\ln(k))\right)$.
Awasthi et al. (2023): (Gap.) The current bounds match in their leading $\varepsilon^{-2}\,k\ln(k)$ contribution but differ in logarithmic factors on $d$ and in their additional dependence on $\varepsilon$.
Awasthi et al. (2023): (Methodological hurdle.) The note discusses obstacles that appear fundamental to attempts at tightening these bounds via game-dynamics-based approaches in statistical learning.
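To get a rough quantitative feel for the gap described above, one can tabulate both bounds with all constants set to $1$ and the $\delta$-dependence suppressed (a back-of-envelope sketch, not a computation from the note):

```python
import math

# Upper bound sketch: eps^-2 ln(k)(d + k) + min{eps^-1 dk, eps^-4 ln(k) d},
# with all hidden constants set to 1 and delta-dependence suppressed.
def upper(d, k, eps):
    return (math.log(k) * (d + k) / eps**2
            + min(d * k / eps, math.log(k) * d / eps**4))

# Lower bound sketch: eps^-2 (d + k ln k), same conventions.
def lower(d, k, eps):
    return (d + k * math.log(k)) / eps**2

# Ratio of the two expressions for a few illustrative parameter settings.
for d, k, eps in [(10, 10, 0.1), (100, 10, 0.1), (10, 100, 0.01)]:
    ratio = upper(d, k, eps) / lower(d, k, eps)
    print(f"d={d:3d} k={k:3d} eps={eps}: upper/lower = {ratio:.2f}")
```

The ratio stays bounded by polylogarithmic factors in $k$ for these settings, reflecting that the discrepancy lies in log factors and in the extra $\min\{\cdot\}$ term rather than in the leading $\varepsilon^{-2}$ rate.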
§ References
Open Problem: The Sample Complexity of Multi-Distribution Learning for VC Classes
Pranjal Awasthi, Nika Haghtalab, Eric Zhao (2023)
Conference on Learning Theory (COLT), PMLR 195
Open-problem note in COLT proceedings.