Unsolved

Information Complexity of VC Learning

Posed by Thomas Steinke and Lydia Zakynthinou (2020)

§ Problem Statement

Setup

Let $\mathcal{X}$ be a domain and $\mathcal{H}\subseteq\{0,1\}^{\mathcal{X}}$ a binary hypothesis class with VC dimension $d:=\mathrm{VCdim}(\mathcal{H})<\infty$. For a distribution $\mathcal{D}$ over $\mathcal{X}\times\{0,1\}$ and a classifier $f\in\{0,1\}^{\mathcal{X}}$, define the (0-1) risk $L_{\mathcal{D}}(f):=\Pr_{(x,y)\sim\mathcal{D}}[f(x)\neq y]$.

A (possibly randomized, possibly improper) learner is a mapping $A$ that, on an i.i.d. sample $S=((x_1,y_1),\dots,(x_n,y_n))\sim\mathcal{D}^n$, outputs a classifier $A(S)\in\{0,1\}^{\mathcal{X}}$.

Define the conditional mutual information (CMI) of $A$ at sample size $n$ on $\mathcal{D}$ via the following experiment. Draw a supersample $\tilde S=((Z_{i,0},Z_{i,1}))_{i=1}^n\sim\mathcal{D}^{2n}$ (so each $Z_{i,j}\in\mathcal{X}\times\{0,1\}$). Draw independent selector bits $U=(U_1,\dots,U_n)$ with $U_i\sim\mathrm{Bernoulli}(1/2)$, form the training sample $S=(Z_{1,U_1},\dots,Z_{n,U_n})$, run the learner on $S$, and set

$$\mathrm{CMI}_{\mathcal{D}}(A,n) := I(A(S);\,U\mid\tilde S),$$

where $I(\cdot\,;\cdot\mid\cdot)$ denotes conditional mutual information over the randomness of $\tilde S$, $U$, and the internal randomness of $A$.
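The supersample experiment above is straightforward to set up computationally. Below is a minimal sketch of the sample-construction step, assuming a toy distribution $\mathcal{D}$ (uniform $x$ with a threshold label, not from the source); estimating the mutual information $I(A(S);U\mid\tilde S)$ itself is a separate, harder task and is not attempted here.

```python
import random

def cmi_experiment_sample(n, draw_example):
    """Construct the supersample, selector bits, and training sample
    from the CMI experiment.

    draw_example() returns one (x, y) pair drawn i.i.d. from D.
    """
    # Supersample: n pairs (Z_{i,0}, Z_{i,1}), i.e., 2n i.i.d. draws from D.
    supersample = [(draw_example(), draw_example()) for _ in range(n)]
    # Independent selector bits U_i ~ Bernoulli(1/2).
    selectors = [random.randint(0, 1) for _ in range(n)]
    # Training sample S = (Z_{1,U_1}, ..., Z_{n,U_n}).
    training = [pair[u] for pair, u in zip(supersample, selectors)]
    return supersample, selectors, training

# Toy distribution (illustrative assumption): x uniform on [0, 1],
# deterministic label y = 1[x >= 0.5].
def draw_example():
    x = random.random()
    return (x, int(x >= 0.5))

tilde_S, U, S = cmi_experiment_sample(10, draw_example)
# Each training point is the U_i-th element of the i-th supersample pair.
assert all(S[i] == tilde_S[i][U[i]] for i in range(10))
```

The learner only ever sees $S$; conditioned on $\tilde S$, the bits $U$ encode which of the $2n$ points were actually used for training, which is what the CMI quantity measures.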

It is known from classical VC theory that there exist learners achieving distribution-free excess-risk bounds of the form

$$L_{\mathcal{D}}(A(S)) \le \inf_{h\in\mathcal{H}} L_{\mathcal{D}}(h) + O\!\left(\sqrt{\frac{d\log(n/d)+\log(1/\delta)}{n}}\right)$$

with probability at least $1-\delta$ (up to universal constants).
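For a sense of scale, the excess-risk term can be evaluated numerically. This sketch treats the $O(\cdot)$ constant as $1$ (the actual universal constant is not specified in the bound):

```python
import math

def vc_excess_risk_bound(d, n, delta, c=1.0):
    """VC-type excess-risk term c * sqrt((d*log(n/d) + log(1/delta)) / n)."""
    return c * math.sqrt((d * math.log(n / d) + math.log(1 / delta)) / n)

# Example: VC dimension d = 10, sample size n = 10,000, delta = 0.05.
print(round(vc_excess_risk_bound(10, 10_000, 0.05), 4))  # → 0.0849
```

As expected from the $\sqrt{\cdot/n}$ form, quadrupling $n$ roughly halves the bound (up to the $\log(n/d)$ factor).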

Unsolved Problem

For every VC class $\mathcal{H}$ with $\mathrm{VCdim}(\mathcal{H})=d$, do there exist learners $A$ that simultaneously (i) achieve the standard VC-type distribution-free excess-risk guarantee above for all distributions $\mathcal{D}$ and all $n$, and (ii) have low information complexity as measured by CMI, e.g., $\sup_{\mathcal{D}}\mathrm{CMI}_{\mathcal{D}}(A,n)$ bounded by a function as small as possible in terms of $d$ (and $n$)? More generally, determine tight (asymptotically best-possible) upper and lower bounds, as functions of $(d,n)$, on

$$\inf_{A\ \text{s.t. VC-type excess-risk holds}}\ \sup_{\mathcal{D}}\ \mathrm{CMI}_{\mathcal{D}}(A,n).$$


§ Significance & Implications

VC dimension characterizes learnability via distribution-free generalization/excess-risk rates, while information-based analyses aim to quantify how much a learner's output depends on its sample. This problem asks whether one can always realize the classical VC rates using learners whose dependence on the sample is small in the precise CMI sense, and to pin down the minimal worst-case CMI needed (as a function of $d$ and $n$) among algorithms that achieve VC-optimal learning guarantees.

§ Known Partial Results

  • Steinke et al. (2020): Classical VC theory gives distribution-free excess-risk rates for VC classes, on the order of $\sqrt{(d\log(n/d)+\log(1/\delta))/n}$ up to constants.

  • Steinke et al. (2020): The open-problem note states that if one restricts to proper and consistent learners and measures information complexity using standard mutual information (MI), then one cannot in general obtain the desired low-information guarantees (citing Bassily et al., 2018).

  • Steinke et al. (2020): Introduced CMI as an alternative information measure for learning algorithms, motivated by its connection to generalization.

  • Steinke et al. (2020): The COLT 2020 open problem asks for (i) existence of VC-rate learners with small CMI for every VC class, and (ii) tight upper/lower bounds on the minimal achievable worst-case CMI in terms of $d$ (and possibly $n$) under VC-type excess-risk requirements.

§ References

[1]

Open Problem: Information Complexity of VC Learning

Thomas Steinke, Lydia Zakynthinou (2020)

Conference on Learning Theory (COLT), PMLR 125

📍 Open-problem note in COLT proceedings.

