Unsolved

Direct Sums in Learning Theory

Posed by Steve Hanneke et al. (2024)

§ Problem Statement

Setup

Let $X$ be an instance space and $Y$ a (finite or infinite) label set. For an integer $k\ge 1$, write $\binom{Y}{k}=\{T\subseteq Y:|T|=k\}$. A $k$-list hypothesis is a function $h:X\to \binom{Y}{k}$, with $0$-$1$ list loss on an example $(x,y)\in X\times Y$ given by $\ell(h,(x,y))=\mathbf{1}[y\notin h(x)]$. For a distribution $D$ over $X\times Y$, define the population loss $L_D(h)=\mathbb{E}_{(x,y)\sim D}[\ell(h,(x,y))]$. For an i.i.d. sample $S=((x_1,y_1),\ldots,(x_n,y_n))\sim D^n$, define the empirical loss $L_S(h)=\frac{1}{n}\sum_{i=1}^n \mathbf{1}[y_i\notin h(x_i)]$.
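The list-loss definitions above can be made concrete with a small sketch. This is illustrative code, not from the source; the toy hypothesis and sample are made up for the example.

```python
# Hypothetical sketch of the 0-1 list loss and empirical loss L_S(h).
# A k-list hypothesis maps an instance x to a set of k candidate labels;
# the loss on (x, y) is 1 exactly when the true label y is missing from h(x).

def list_loss(h, x, y):
    """0-1 list loss: 1 if y is not in the predicted label set h(x)."""
    return 0 if y in h(x) else 1

def empirical_loss(h, sample):
    """L_S(h): average 0-1 list loss over a labeled sample S."""
    return sum(list_loss(h, x, y) for x, y in sample) / len(sample)

# Toy 2-list hypothesis over integer instances: predict {x mod 3, (x+1) mod 3}.
h = lambda x: {x % 3, (x + 1) % 3}
S = [(0, 0), (1, 1), (2, 0), (3, 2)]
# Only (3, 2) is missed, since h(3) = {0, 1}, so L_S(h) = 1/4.
```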

A (possibly improper) $k$-list learning rule is a map $A:(X\times Y)^*\to \big(\binom{Y}{k}\big)^X$. Its expected error at sample size $n$ on $D$ is

$$\varepsilon_n(D\mid A)=\mathbb{E}_{S\sim D^n}[L_D(A(S))].$$

A distribution $D$ is realizable by a (standard, $1$-list) concept class $\mathcal{C}\subseteq Y^X$ if there exists $c\in\mathcal{C}$ with $\Pr_{(x,y)\sim D}[y=c(x)]=1$. The (realizable) PAC learning curve of $\mathcal{C}$ is

$$\varepsilon(n\mid\mathcal{C})=\inf_A\sup_{D\ \mathrm{realizable\ by}\ \mathcal{C}}\varepsilon_n(D\mid A),$$

where the infimum ranges over all $1$-list learning rules $A$. For a fixed marginal distribution $\mathcal{D}$ over $X$, define $\mathcal{D}_c$ by $x\sim\mathcal{D}$ and $y=c(x)$, and the fixed-marginal learning curve

$$\varepsilon(n\mid\mathcal{D},\mathcal{C})=\inf_A\sup_{c\in\mathcal{C}}\varepsilon_n(\mathcal{D}_c\mid A).$$

For concept classes $\mathcal{C}_1\subseteq Y_1^{X_1}$ and $\mathcal{C}_2\subseteq Y_2^{X_2}$, define their product (direct sum) class $\mathcal{C}_1\otimes\mathcal{C}_2\subseteq (Y_1\times Y_2)^{X_1\times X_2}$ by

$$\mathcal{C}_1\otimes\mathcal{C}_2=\{(x_1,x_2)\mapsto (c_1(x_1),c_2(x_2)) : c_1\in\mathcal{C}_1,\ c_2\in\mathcal{C}_2\}.$$

For $r\in\mathbb{N}$, write $\mathcal{C}^r=\mathcal{C}\otimes\cdots\otimes\mathcal{C}$ ($r$ times). If $\mathcal{D}$ is a distribution on $X$, write $\mathcal{D}^r$ for the product distribution on $X^r$.
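The product construction $\mathcal{C}_1\otimes\mathcal{C}_2$ can be sketched directly for finite classes. The threshold classes below are made-up toys, purely for illustration.

```python
# Hypothetical sketch of the direct-sum (product) class C1 ⊗ C2: each product
# concept applies c1 to the first coordinate and c2 to the second.

def direct_sum(c1, c2):
    """Return the product concept (x1, x2) -> (c1(x1), c2(x2))."""
    return lambda x: (c1(x[0]), c2(x[1]))

def product_class(C1, C2):
    """All product concepts; |C1 ⊗ C2| = |C1| * |C2| for finite classes."""
    return [direct_sum(c1, c2) for c1 in C1 for c2 in C2]

# Toy binary threshold classes on integers (t=t pins the threshold per lambda).
C1 = [lambda x, t=t: int(x >= t) for t in range(3)]
C2 = [lambda x, t=t: int(x < t) for t in range(2)]
prod = product_class(C1, C2)  # 3 * 2 = 6 product concepts
```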

Unsolved Problem

The question is whether joint learning on product classes can beat the naive linear direct-sum bound, and how key learnability quantities behave under $\otimes$, in the following concrete forms.

  1. Realizable direct-sum (worst-case distributions): For every $\mathcal{C}$ and $r\ge 2$, one has $\varepsilon(n\mid\mathcal{C}^r)\le r\,\varepsilon(n\mid\mathcal{C})$. Do there exist $\mathcal{C}$ and $r\ge 2$ such that $\varepsilon(n\mid\mathcal{C}^r)=o(r\,\varepsilon(n\mid\mathcal{C}))$ as $n\to\infty$?

  2. Realizable direct-sum (fixed marginal): For every $\mathcal{C}$, $\mathcal{D}$, and $r\ge 2$, one has $\varepsilon(n\mid\mathcal{D}^r,\mathcal{C}^r)\le r\,\varepsilon(n\mid\mathcal{D},\mathcal{C})$. Does there exist $(\mathcal{C},\mathcal{D},r)$ such that $\varepsilon(n\mid\mathcal{D}^r,\mathcal{C}^r)=o(r\,\varepsilon(n\mid\mathcal{D},\mathcal{C}))$ as $n\to\infty$?

  3. Uniform convergence under products: With

$$\varepsilon_{\mathrm{UC}}(n\mid\mathcal{C})=\sup_D\ \mathbb{E}_{S\sim D^n}\Big[\sup_{h\in\mathcal{C}}\big|L_D(h)-L_S(h)\big|\Big],$$

determine the dependence of $\varepsilon_{\mathrm{UC}}(n\mid\mathcal{C}^r)$ on $r$ and on $\varepsilon_{\mathrm{UC}}(n\mid\mathcal{C})$.

  4. Agnostic direct sums: With

$$\varepsilon_{\mathrm{agn}}(n\mid\mathcal{C})=\inf_A\sup_D\Big(\mathbb{E}_{S\sim D^n}[L_D(A(S))]-\inf_{c\in\mathcal{C}}L_D(c)\Big),$$

determine the dependence of $\varepsilon_{\mathrm{agn}}(n\mid\mathcal{C}^r)$ on $r$ and on $\varepsilon_{\mathrm{agn}}(n\mid\mathcal{C})$.

  5. Minimal list size under products: For a standard class $\mathcal{C}\subseteq Y^X$, let $K(\mathcal{C})\in\mathbb{N}\cup\{\infty\}$ be the minimal $k$ such that $\mathcal{C}$ is realizable $k$-list PAC learnable (or $\infty$ if no such $k$ exists). Given $\mathcal{C}_1,\mathcal{C}_2$, determine tight bounds on $K(\mathcal{C}_1\otimes\mathcal{C}_2)$ as a function of $K(\mathcal{C}_1)$ and $K(\mathcal{C}_2)$.

  6. Compressibility and products: A finite-size $k$-list sample compression scheme for $\mathcal{C}$ consists of a compressor and a reconstructor such that, for every finite labeled sample $S$ realizable by $\mathcal{C}$, the reconstructor outputs a $k$-list hypothesis $h$ consistent with $S$ (i.e. $y\in h(x)$ for all $(x,y)\in S$) from a bounded-size message produced by the compressor. Let $K_{\mathrm{comp}}(\mathcal{C})$ be the minimal $k$ such that $\mathcal{C}$ is $k$-list compressible. Given $\mathcal{C}_1,\mathcal{C}_2$, determine (or tightly bound) the minimal $k$ for which $\mathcal{C}_1\otimes\mathcal{C}_2$ is realizable $k$-list PAC learnable, as a function of $K_{\mathrm{comp}}(\mathcal{C}_1)$ and $K_{\mathrm{comp}}(\mathcal{C}_2)$.
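The trivial linear upper bounds in the first two questions follow from a coordinate-wise union bound. A sketch of the argument (assuming, for simplicity, a single-task learner $A$ attaining the optimal rate, and writing $A(S_1)\otimes\cdots\otimes A(S_r)$ for the product hypothesis built from the coordinate-projected samples $S_i$):

```latex
% The product hypothesis errs only if some coordinate errs:
\begin{align*}
L_{D}\big(A(S_1)\otimes\cdots\otimes A(S_r)\big)
  &= \Pr_{((x_1,\dots,x_r),(y_1,\dots,y_r))\sim D}
       \big[\exists i:\ y_i \neq A(S_i)(x_i)\big] \\
  &\le \sum_{i=1}^{r} \Pr\big[y_i \neq A(S_i)(x_i)\big]
   \;=\; \sum_{i=1}^{r} L_{D_i}\big(A(S_i)\big),
\end{align*}
```

where $D_i$ is the $i$-th coordinate marginal, realizable by $\mathcal{C}$ whenever $D$ is realizable by $\mathcal{C}^r$. Taking expectations over the sample yields $\varepsilon(n\mid\mathcal{C}^r)\le r\,\varepsilon(n\mid\mathcal{C})$; the open question is whether a joint learner can do asymptotically better.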


§ Significance & Implications

Direct-sum questions in learning theory ask whether learning $r$ labeled prediction problems that factor across coordinates (i.e. a product class $\mathcal{C}^r$) necessarily costs $\Theta(r)$ times the samples needed for one task, or whether joint learning can provably achieve a sublinear-in-$r$ improvement in worst-case expected excess error. A sharp answer would pin down when multitask/product structure yields only the trivial union-bound scaling versus when it yields genuine sample reuse, and it would also clarify how foundational quantities (realizable and agnostic learning curves, uniform convergence rates, and minimal list size/compressibility parameters) behave under forming direct-sum/product concept classes.

§ Known Partial Results

  • Hanneke et al. (2024): Trivial direct-sum upper bounds hold for learning curves: for every $\mathcal{C}$ and $r\ge 2$, $\varepsilon(n\mid\mathcal{C}^r)\le r\,\varepsilon(n\mid\mathcal{C})$; and for every fixed marginal $\mathcal{D}$, $\varepsilon(n\mid\mathcal{D}^r,\mathcal{C}^r)\le r\,\varepsilon(n\mid\mathcal{D},\mathcal{C})$.

  • Hanneke et al. (2024): For minimal list size, the COLT 2024 note states general multiplicative-type bounds relating $K(\mathcal{F}\otimes\mathcal{G})$ to $K(\mathcal{F})$ and $K(\mathcal{G})$ for any classes $\mathcal{F},\mathcal{G}$.

  • Hanneke et al. (2024): For specific multiclass dimensions, the note records partial product rules in terms of the Natarajan dimension $d_N$.

  • Hanneke et al. (2024): Using list Daniely–Shalev-Shwartz (DS) dimension inequalities as stated in the note, non-learnability can tensorize: if $\mathcal{F}$ is not $k$-list learnable and $\mathcal{G}$ is not $k'$-list learnable, then $\mathcal{F}\otimes\mathcal{G}$ is not $kk'$-list learnable.

  • Hanneke et al. (2024): The note highlights a route to linear-in-$r$ lower bounds for realizable direct sums via the DS dimension: it states linear growth of $\mathrm{DS}_1(\mathcal{C}^r)$ in $r$ and relates this (via known DS-based lower bounds on realizable learning curves) to linear-in-$r$ lower bounds on $\varepsilon(n\mid\mathcal{C}^r)$, up to constants and $1/n$ scaling, pointing to the importance of pinning down the tight relationship between $\varepsilon(n\mid\mathcal{C})$ and $\mathrm{DS}_1(\mathcal{C})/n$.

  • Hanneke et al. (2024): For compression versus learnability, the note records that $k$-list compression can be strictly stronger than $k$-list learnability (i.e. there exist $2$-list learnable classes with no finite $2$-list compression scheme), motivating the open question of how $\otimes$ interacts with compressibility-based parameters.

§ References

[1]

Open problem: Direct Sums in Learning Theory

Steve Hanneke, Shay Moran, Tom Waknine (2024)

Conference on Learning Theory (COLT), PMLR 247

📍 Open-problem note in COLT proceedings.

