Information Complexity of VC Learning
Posed by Thomas Steinke and Lydia Zakynthinou (2020)
§ Problem Statement
Setup
Let $\mathcal{X}$ be a domain and $\mathcal{H} \subseteq \{0,1\}^{\mathcal{X}}$ a binary hypothesis class with VC dimension $d$. For a distribution $\mathcal{D}$ over $\mathcal{X} \times \{0,1\}$ and a classifier $h : \mathcal{X} \to \{0,1\}$, define the (0-1) risk $\mathrm{err}_{\mathcal{D}}(h) = \Pr_{(x,y) \sim \mathcal{D}}[h(x) \neq y]$.
A (possibly randomized, possibly improper) learner $A$ is a mapping that, on an i.i.d. sample $S \sim \mathcal{D}^n$, outputs a classifier $A(S) : \mathcal{X} \to \{0,1\}$.
Define the conditional mutual information (CMI) of $A$ at sample size $n$ on $\mathcal{D}$ via the following experiment. Draw a supersample $Z = (Z_{i,j})_{i \in [n],\, j \in \{0,1\}} \sim \mathcal{D}^{n \times 2}$ (so each $Z_{i,j} \sim \mathcal{D}$ independently). Draw independent selector bits $U = (U_1, \dots, U_n)$ with $U_i \sim \mathrm{Bernoulli}(1/2)$, form the training sample $Z_U = (Z_{1,U_1}, \dots, Z_{n,U_n})$, run the learner on $Z_U$, and set
$$\mathrm{CMI}_{\mathcal{D}}(A) = I\big(A(Z_U);\, U \mid Z\big),$$
where $I(\cdot\,;\,\cdot \mid \cdot)$ denotes conditional mutual information, taken over the randomness of $Z$, $U$, and the internal randomness of $A$.
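To make the experiment concrete, the following minimal Python sketch estimates $\mathrm{CMI}_{\mathcal{D}}(A)$ by Monte Carlo for a toy distribution and a deterministic majority-vote learner (all names here, such as toy_learner and cmi_given_z, are illustrative and not from the note). For a deterministic learner, $I(A(Z_U); U \mid Z = z)$ equals the entropy of the learner's output under uniformly random selector bits, which can be enumerated exactly when $n$ is small.

```python
# Monte Carlo sketch of the CMI experiment (illustrative, not from the note).
# We keep n tiny so all 2^n selector vectors can be enumerated exactly.
import itertools
import math
import random

n = 4  # sample size

def draw_example():
    """One labeled example from a toy distribution over {0,1} x {0,1}."""
    x = random.randint(0, 1)
    return (x, x)  # label equals the feature in this toy setting

def toy_learner(sample):
    """Deterministic 'learner': output the majority label as a constant classifier."""
    ones = sum(y for (_, y) in sample)
    return 1 if 2 * ones >= len(sample) else 0

def cmi_given_z(z):
    """Exact I(A(Z_U); U | Z = z) for a deterministic learner.

    Enumerate all selector vectors u and tally the learner's outputs.
    Since H(A(Z_U) | U, Z = z) = 0 for a deterministic learner, the
    conditional MI reduces to the entropy (in nats) of the output.
    """
    counts = {}
    for u in itertools.product([0, 1], repeat=n):
        sample = [z[i][u[i]] for i in range(n)]
        h = toy_learner(sample)
        counts[h] = counts.get(h, 0) + 1
    total = 2 ** n
    return -sum(c / total * math.log(c / total) for c in counts.values())

# CMI_D(A) = E_Z[ I(A(Z_U); U | Z) ]: average over fresh supersamples Z.
trials = 2000
estimate = sum(
    cmi_given_z([(draw_example(), draw_example()) for _ in range(n)])
    for _ in range(trials)
) / trials
print(f"estimated CMI ~ {estimate:.4f} nats (trivial bound n ln 2 = {n * math.log(2):.4f})")
```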
It is known from classical VC theory that there exist learners achieving distribution-free excess-risk bounds of the form
$$\mathrm{err}_{\mathcal{D}}(A(S)) - \inf_{h \in \mathcal{H}} \mathrm{err}_{\mathcal{D}}(h) \;\le\; \sqrt{\frac{d + \log(1/\delta)}{n}}$$
with probability at least $1 - \delta$ (up to universal constants).
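As a concrete instance (numbers chosen purely for illustration, not from the note): with $d = 10$, $n = 10^4$, and $\delta = 0.01$,
$$\sqrt{\frac{d + \log(1/\delta)}{n}} = \sqrt{\frac{10 + \ln 100}{10^4}} \approx 0.038,$$
so up to the universal constant, the learner's risk exceeds the best-in-class risk by at most a few percent.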
Unsolved Problem
For every VC class $\mathcal{H}$ with $\mathrm{VC}(\mathcal{H}) = d$, do there exist learners that simultaneously (i) achieve the standard VC-type distribution-free excess-risk guarantee above for all distributions $\mathcal{D}$ and all sample sizes $n$, and (ii) have low information complexity as measured by CMI, e.g., bounded by a function as small as possible in terms of $d$ (and $n$)? More generally, determine tight (asymptotically best-possible) upper and lower bounds, as functions of $d$ and $n$, on
$$\inf_{A} \sup_{\mathcal{D}} \mathrm{CMI}_{\mathcal{D}}(A),$$
where the infimum is over learners $A$ satisfying the VC-type excess-risk guarantee above.
§ Discussion
§ Significance & Implications
VC dimension characterizes learnability via distribution-free generalization/excess-risk rates, while information-based analyses aim to quantify how much a learner's output depends on its sample. This problem asks whether one can always realize the classical VC rates using learners whose dependence on the sample is small in the precise CMI sense, and to pin down the minimal worst-case CMI needed (as a function of $d$ and $n$) among algorithms that achieve VC-optimal learning guarantees.
§ Known Partial Results
Steinke and Zakynthinou (2020): Classical VC theory gives distribution-free excess-risk rates for VC classes, on the order of $\sqrt{(d + \log(1/\delta))/n}$ up to constants.
Steinke and Zakynthinou (2020): The open-problem note states that if one restricts to proper and consistent learners and measures information complexity using standard mutual information (MI), then one cannot in general obtain the desired low-information guarantees (citing Bassily et al., 2018).
Steinke and Zakynthinou (2020): Introduced CMI as an alternative information measure for learning algorithms, motivated by its connection to generalization; a bound of this form is displayed after this list.
Steinke and Zakynthinou (2020): The COLT 2020 open problem asks for (i) existence of VC-rate learners with small CMI for every VC class, and (ii) tight upper/lower bounds on the minimal achievable worst-case CMI in terms of $d$ (and possibly $n$) under VC-type excess-risk requirements.
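For reference, the generalization connection is quantitative. A bound of the following form appears in Steinke and Zakynthinou (2020) (stated here up to the exact constant): for any learner $A$ and distribution $\mathcal{D}$,
$$\Big|\, \mathbb{E}\big[\mathrm{err}_{\mathcal{D}}(A(S)) - \widehat{\mathrm{err}}_S(A(S))\big] \Big| \;\le\; \sqrt{\frac{2\, \mathrm{CMI}_{\mathcal{D}}(A)}{n}},$$
where $\widehat{\mathrm{err}}_S$ is the empirical 0-1 risk on $S$. Moreover, $\mathrm{CMI}_{\mathcal{D}}(A) \le n \ln 2$ always holds, so the measure is meaningful only when it is $o(n)$.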
§ References
Open Problem: Information Complexity of VC Learning
Thomas Steinke, Lydia Zakynthinou (2020)
Conference on Learning Theory (COLT), PMLR 125
📍 Open-problem note in COLT proceedings.