Unsolved

Properly learning decision trees in polynomial time?

Posed by Guy Blanc et al. (2022)

§ Problem Statement

Setup

Let $n,s\in\mathbb{N}$ and let $f:\{0,1\}^n\to\{0,1\}$ be an unknown Boolean function promised to be representable by a rooted Boolean decision tree with at most $s$ leaves (i.e., each internal node queries one input bit, and each leaf is labeled in $\{0,1\}$). Let $\mathcal{U}$ denote the uniform distribution over $\{0,1\}^n$. A randomized proper learner is given $(n,s,\epsilon,\delta)$ and oracle access to membership queries for $f$ (on any adaptively chosen $x\in\{0,1\}^n$, the oracle returns $f(x)$); since $\mathcal{U}$ is known, the learner may also generate independent samples $x\sim\mathcal{U}$. The learner must output an explicit decision tree hypothesis $h:\{0,1\}^n\to\{0,1\}$ such that, with probability at least $1-\delta$ over its internal randomness, the uniform error is at most $\epsilon$:

$$\Pr_{x\sim\mathcal{U}}[h(x)\neq f(x)]\le \epsilon.$$
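As a concrete illustration of the setup, here is a minimal Python sketch of the two resources the learner has: a membership-query oracle for $f$ and uniform samples for estimating error. The tree encoding and all names (`evaluate`, `uniform_error`) are my own choices for illustration, not anything specified in the note.

```python
import random

# Illustrative decision-tree encoding (an assumption of this sketch): a leaf
# is an int 0 or 1; an internal node is a tuple (bit_index, left, right),
# where the left branch is taken when x[bit_index] == 0.
def evaluate(tree, x):
    """Evaluate a decision tree on input x, a tuple of 0/1 bits."""
    while not isinstance(tree, int):
        bit, left, right = tree
        tree = left if x[bit] == 0 else right
    return tree

def uniform_error(h, oracle, n, num_samples=10_000, rng=random):
    """Estimate Pr_{x ~ U}[h(x) != f(x)] by drawing uniform inputs and
    comparing the hypothesis h against the membership-query oracle for f."""
    errors = 0
    for _ in range(num_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if evaluate(h, x) != oracle(x):  # one membership query per sample
            errors += 1
    return errors / num_samples
```

For example, with the target $f(x) = x_0 \wedge x_1$ encoded as `(0, 0, (1, 0, 1))`, the constant-0 hypothesis has estimated uniform error close to $1/4$, since $f$ is 1 on a quarter of the cube.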

Unsolved Problem

Does there exist such a proper learning algorithm whose running time (and total number of membership queries) is bounded by $\mathrm{poly}(n,s,1/\epsilon,\log(1/\delta))$? Equivalently, can properly learning size-$s$ decision trees under $\mathcal{U}$ with membership queries be done in true polynomial time, improving over the best known quasipolynomial and almost-polynomial bounds?


§ Significance & Implications

This is a canonical proper-learning benchmark: the output must be a decision tree (not an arbitrary hypothesis), so the goal is to recover an explicit structured model consistent with the decision-tree representation. Resolving whether polynomial-time proper learning is possible under the uniform distribution with membership queries would pin down the algorithmic complexity of a basic concept class and determine whether the current gap between almost-polynomial/quasipolynomial algorithms and a genuine $\mathrm{poly}(n,s,1/\epsilon)$ algorithm reflects an inherent barrier or is merely an artifact of existing techniques.

§ Known Partial Results

  • Blanc, Lange, Qiao, and Tan (BLQT21) gave an almost-polynomial-time membership-query algorithm for properly learning decision trees under the uniform distribution.

  • Prior to BLQT21, the fastest known approach for this setting ran in quasipolynomial time, obtained as a consequence of the classic distribution-free decision-tree learning algorithm of Ehrenfeucht and Haussler (EH89).

  • The COLT 2022 open-problem note (Blanc, Lange, Qiao, Tan, 2022) isolates the remaining gap (almost-polynomial/quasipolynomial versus polynomial time) and discusses intermediate milestones as potential stepping stones.
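To build intuition for where the quasipolynomial bound comes from, here is a toy brute-force sketch: try every variable at the root under a fixed depth budget, keeping a split only if both subtrees can be completed consistently with a given sample. With a depth budget of roughly $\log s$ the search visits about $n^{O(\log s)}$ candidates, which has the quasipolynomial shape of the EH89-era bound. This is a simplification for intuition only (the encoding and function name are my own), not the actual rank-based EH89 algorithm or the BLQT21 algorithm.

```python
def build_tree(sample, depth_budget):
    """Try to find a decision tree of depth <= depth_budget consistent with
    `sample`, a nonempty list of (x, label) pairs where x is a tuple of bits.
    Returns a tree (leaf int, or (bit, left, right)) or None if no such tree
    is found. Brute force: every variable is tried at every node, so the
    running time is roughly n^depth_budget."""
    labels = {label for _, label in sample}
    if len(labels) <= 1:
        return labels.pop() if labels else 0  # a constant leaf suffices
    if depth_budget == 0:
        return None  # labels disagree but no splits remain
    n = len(sample[0][0])
    for bit in range(n):
        left_sample = [(x, y) for x, y in sample if x[bit] == 0]
        right_sample = [(x, y) for x, y in sample if x[bit] == 1]
        if not left_sample or not right_sample:
            continue  # splitting on this bit makes no progress
        left = build_tree(left_sample, depth_budget - 1)
        if left is None:
            continue
        right = build_tree(right_sample, depth_budget - 1)
        if right is None:
            continue
        return (bit, left, right)
    return None
```

For instance, on the four labeled points of $x_0 \oplus x_1$ the search succeeds with depth budget 2 but fails with budget 1, since XOR has no depth-1 decision tree.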

§ References

[1]

Open Problem: Properly learning decision trees in polynomial time?

Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Open-problem note in COLT proceedings.

