Unsolved

Can Local Regularization Learn All Multiclass Problems?

Posed by Julian Asilis et al. (2024)

§ Problem Statement

Setup

Fix a domain $X$, a label set $Y$ (possibly infinite), and a hypothesis class $H \subseteq Y^X$. For a distribution $\mathcal{D}$ over $X \times Y$ and a predictor $f:X\to Y$, define the $0$-$1$ risk $L_{\mathcal{D}}(f)=\Pr_{(x,y)\sim \mathcal{D}}[f(x)\neq y]$. For a sample $S=((x_1,y_1),\dots,(x_n,y_n))\in (X\times Y)^n$, define the empirical risk $L_S(h)=\frac{1}{n}\sum_{i=1}^n \mathbf{1}[h(x_i)\neq y_i]$ and the set of sample-consistent hypotheses $H_S:=\{h\in H: L_S(h)=0\}$. Call $\mathcal{D}$ realizable w.r.t. $H$ if there exists $h^*\in H$ with $L_{\mathcal{D}}(h^*)=0$.
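As a concrete sanity check on these definitions, here is a toy Python instantiation (the domain, label set, class, and sample are all invented for illustration) that computes the empirical risk $L_S(h)$ and the sample-consistent set $H_S$:

```python
# Toy instantiation of the definitions above; the domain, labels,
# hypothesis class, and sample are all invented for illustration.
X = [0, 1, 2]                 # domain
Y = [0, 1]                    # label set
H = [
    {0: 0, 1: 0, 2: 0},       # h0
    {0: 1, 1: 0, 2: 0},       # h1
    {0: 1, 1: 1, 2: 0},       # h2
]

def empirical_risk(h, S):
    """L_S(h) = (1/n) * #{i : h(x_i) != y_i}."""
    return sum(h[x] != y for x, y in S) / len(S)

def consistent_set(H, S):
    """H_S = {h in H : L_S(h) = 0}."""
    return [h for h in H if empirical_risk(h, S) == 0.0]

S = [(0, 1), (1, 0)]          # a sample of two labeled points
print([empirical_risk(h, S) for h in H])   # → [0.5, 0.0, 0.5]
print(len(consistent_set(H, S)))           # → 1 (only h1 is consistent)
```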

A (possibly improper) learning algorithm $A$ maps each finite sample $S$ to a predictor $A(S)\in Y^X$. We say $A$ PAC-learns $H$ in the realizable setting if there exists a sample complexity function $m_A:(0,1)^2\to\mathbb{N}$ such that for every realizable $\mathcal{D}$ and every $\epsilon,\delta\in(0,1)$, if $S\sim \mathcal{D}^n$ with $n\ge m_A(\epsilon,\delta)$, then

$$\Pr_{S\sim \mathcal{D}^n}\big[L_{\mathcal{D}}(A(S))\le \epsilon\big] \ge 1-\delta.$$

Let $m_H(\epsilon,\delta):=\inf_A m_A(\epsilon,\delta)$ denote the pointwise-optimal realizable PAC sample complexity over all learners for $H$.

A local regularizer for $H$ is a function $\psi:H\times X\to \mathbb{R}_{\ge 0}$. A learner $A$ is induced by $\psi$ if for every sample $S$ and every test point $x\in X$,

$$A(S)(x) \in \Big\{h(x):\ h\in \operatorname*{argmin}_{g\in H_S} \psi(g,x)\Big\}.$$

(Thus $\psi$ may induce multiple learners via tie-breaking.) Say that $\psi$ learns $H$ if every learner induced by $\psi$ PAC-learns $H$.
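To make the induced-learner mechanism concrete, the following minimal Python sketch (with an invented finite class and an invented regularizer $\psi$) predicts at a test point by minimizing $\psi$ over the sample-consistent hypotheses; ties are broken by list order, giving one of the possibly many learners $\psi$ induces:

```python
# Minimal sketch of a learner induced by a local regularizer psi.
# The hypothesis class and regularizer below are invented for illustration.
H = [
    {0: 0, 1: 0, 2: 0},       # h0
    {0: 1, 1: 0, 2: 1},       # h1
    {0: 1, 1: 0, 2: 0},       # h2
]

def psi(h, x):
    # Example regularizer: at x, prefer hypotheses predicting smaller labels.
    return h[x]

def induced_learner(S, x):
    """Predict h(x) for some h in argmin_{g in H_S} psi(g, x)."""
    H_S = [h for h in H if all(h[xi] == yi for xi, yi in S)]
    h_min = min(H_S, key=lambda h: psi(h, x))   # list order breaks ties
    return h_min[x]

S = [(1, 0)]                  # all three hypotheses are consistent with S
print(induced_learner(S, 2))  # → 0 (h0 and h2 minimize psi at x = 2)
```

Note that the prediction is made pointwise: a different test point may be answered by a different consistent hypothesis, which is exactly what makes induced learners potentially improper.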

Unsolved Problem

Open Problem (Asilis et al., 2024). For multiclass classification, is it true that every realizable-PAC-learnable hypothesis class $H$ can be learned by some local regularizer $\psi$? If so, can $\psi$ be chosen so that at least one induced learner has sample complexity matching $m_H(\epsilon,\delta)$ up to constant factors (or, more permissively, up to polylogarithmic factors in $1/\epsilon$ and $1/\delta$)?


§ Significance & Implications

Local regularization is an ERM-like template: it selects, for each test point $x$, a label among the sample-consistent hypotheses by minimizing a pointwise regularizer $\psi(h,x)$. The question is whether this restricted, optimization-based mechanism is universal for realizable multiclass PAC learning, despite known phenomena unique to multiclass settings (e.g., the need for improper prediction and the use of one-inclusion-graph-based constructions in existing characterizations). A positive answer would yield a simple, uniform recipe for building learners (and potentially near-optimal learners) from an appropriate choice of $\psi$, avoiding the explicit combinatorial machinery of known optimal learners. A negative answer would separate learnability from learnability-by-local-regularization, pinpointing when dependence on the unlabeled geometry of the sample (as in one-inclusion graph methods) is genuinely necessary and cannot be eliminated by any pointwise regularizer.

§ Known Partial Results

  • Asilis et al. (2024): The COLT 2024 note formalizes local regularization via $\psi:H\times X\to\mathbb{R}_{\ge 0}$ and induced learners that, for each test point $x$, output $h(x)$ for a sample-consistent $h\in H_S$ minimizing $\psi(h,x)$.

  • Asilis et al. (2024): The note highlights that, unlike binary classification, where ERM over $H$ attains near-optimal sample complexity whenever the VC dimension is finite, multiclass learning can require qualitatively different (improper) learners.

  • Daniely and Shalev-Shwartz (2014): There exist realizable, PAC-learnable multiclass classes for which any optimal generic learner must be improper; in particular, ERM (and other proper templates optimizing over $H$) cannot be universally optimal.

  • Asilis et al. (2024): Existing multiclass learners and characterizations discussed in the note rely on orientations of one-inclusion graphs associated with $H$, which can depend on the unlabeled sample inputs and may be infinite when $Y$ is infinite.

  • Asilis et al. (2024): The note also studies a strengthened variant, unsupervised local regularization, in which the regularizer may additionally depend on the unlabeled datapoints in the training sample; this variant can learn all multiclass problems with near-optimal sample complexity, leaving open whether the dependence on unlabeled data can be removed.

  • Asilis et al. (2024): The note proposes a candidate counterexample class $H_\triangle$ with $X$ infinite and labels $Y=\{*\}\cup 2^X$, defined from triples $(A,B,C)$ of finite subsets with $|A|=|B|=|C|=k$ and pairwise intersections of size $k/2$; each hypothesis outputs $*$ off $A\cup B\cup C$ and is constant on each of $A\cap B$, $A\cap C$, and $B\cap C$, with values chosen from the corresponding pair of set-labels. The class is argued to be PAC-learnable by a simple default-$*$ rule, but the note conjectures that symmetry may prevent any (supervised) local regularizer from selecting the "natural" label without effectively using unlabeled sample geometry, leading to Open Problem 2 for $H_\triangle$.
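One reading of the $H_\triangle$ construction can be made concrete for a single triple. The Python sketch below assumes (my assumption, not stated above) that $A\cap B\cap C=\emptyset$, so the three pairwise intersections partition $A\cup B\cup C$; it then enumerates the $2^3=8$ hypotheses obtained by choosing one set-label per intersection, for $k=2$:

```python
from itertools import product

# Sketch of one triple of the candidate class H_triangle, under the
# assumption that the pairwise intersections P = A∩B, Q = A∩C, R = B∩C
# are disjoint (each of size k/2), so A∪B∪C = P∪Q∪R.
k = 2
P, Q, R = frozenset({0}), frozenset({1}), frozenset({2})   # each of size k/2
A, B, C = P | Q, P | R, Q | R                              # |A|=|B|=|C|=k

def make_hypothesis(lab_P, lab_Q, lab_R):
    """Output * off A∪B∪C; constant on each pairwise intersection."""
    def h(x):
        if x in P: return lab_P     # chosen from the pair {A, B}
        if x in Q: return lab_Q     # chosen from the pair {A, C}
        if x in R: return lab_R     # chosen from the pair {B, C}
        return '*'
    return h

H_tri = [make_hypothesis(p, q, r)
         for p, q, r in product([A, B], [A, C], [B, C])]
print(len(H_tri))       # → 8 hypotheses for this single triple
print(H_tri[0](7))      # → '*' for any point outside A∪B∪C
```

The default-$*$ rule mentioned above is trivially consistent off the sample here, which is what makes the class easy to PAC-learn yet, conjecturally, hard for a supervised local regularizer to label "naturally" on $A\cup B\cup C$.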

§ References

[1]

Open Problem: Can Local Regularization Learn All Multiclass Problems?

Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng (2024)

Conference on Learning Theory (COLT), PMLR 247

📍 Open-problem note in COLT proceedings.


[3]

Optimal learners for multiclass problems

Amit Daniely, Shai Shalev-Shwartz (2014)

Conference on Learning Theory (COLT), PMLR 35

📍 PMLR

[4]

A Characterization of Multiclass Learnability

Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, Amir Yehudayoff (2022)

IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS 2022)

📍 Related work and partial progress context.

§ Tags