Properly learning decision trees in polynomial time?
Posed by Guy Blanc et al. (2022)
§ Problem Statement
Setup
Let $n, s \in \mathbb{N}$ and let $f \colon \{0,1\}^n \to \{0,1\}$ be an unknown Boolean function promised to be representable by a rooted Boolean decision tree with at most $s$ leaves (i.e., each internal node queries one input bit $x_i$, and each leaf is labeled in $\{0,1\}$). Let $\mathcal{U}$ denote the uniform distribution over $\{0,1\}^n$. A randomized proper learner is given $\varepsilon \in (0,1)$ and oracle access to membership queries for $f$ (on any adaptively chosen $x \in \{0,1\}^n$, the oracle returns $f(x)$); since $\mathcal{U}$ is known, the learner may also generate independent samples $x \sim \mathcal{U}$. The learner must output an explicit decision tree hypothesis $T$ such that, with probability at least $2/3$ over its internal randomness, the uniform error is at most $\varepsilon$:
$$\Pr_{x \sim \mathcal{U}}\bigl[T(x) \neq f(x)\bigr] \le \varepsilon.$$
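As a concrete illustration of the access model, here is a minimal sketch of a membership-query oracle for a decision tree and a uniform-error estimator (the tree encoding and function names are my own, not from the note):

```python
import random

def make_membership_oracle(tree):
    """Membership-query oracle for a decision tree encoded as
    ('leaf', b) or ('node', i, t0, t1): given any x, return f(x)."""
    def oracle(x):
        t = tree
        while t[0] == 'node':
            _, i, t0, t1 = t
            t = t1 if x[i] else t0      # follow the queried bit
        return t[1]
    return oracle

def uniform_error(hypothesis, oracle, n, samples=10000, seed=0):
    """Estimate Pr_{x ~ U}[h(x) != f(x)] by drawing uniform x's,
    which the learner can do since U is known."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        bad += hypothesis(x) != oracle(x)
    return bad / samples
```

For example, against the one-node tree computing $f(x) = x_0$, the constant-0 hypothesis has uniform error close to $1/2$.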
Unsolved Problem
Does there exist such a proper learning algorithm whose running time (and total number of membership queries) is bounded by $\mathrm{poly}(n, s, 1/\varepsilon)$? Equivalently, can size-$s$ decision trees be properly learned under $\mathcal{U}$ with membership queries in true polynomial time, improving over the best known quasipolynomial and almost-polynomial bounds?
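To situate the question, the known time bounds compare roughly as follows (exponents as commonly cited for these works; this side-by-side is my gloss, not taken from the note):

```latex
\underbrace{\mathrm{poly}(n, s, 1/\varepsilon)}_{\text{open: true polynomial}}
\quad\text{vs.}\quad
\underbrace{n^{O(\log\log n)}}_{\text{BLQT21: almost polynomial}}
\quad\text{vs.}\quad
\underbrace{n^{O(\log s)}}_{\text{EH89: quasipolynomial}}
```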
§ Discussion
§ Significance & Implications
This is a canonical proper-learning benchmark: the output must be a decision tree (not an arbitrary hypothesis), so the goal is to recover an explicit structured model consistent with the decision-tree representation. Resolving whether polynomial-time proper learning is possible under the uniform distribution with membership queries would pin down the algorithmic complexity of a basic concept class and determine whether the current gap between almost-polynomial/quasipolynomial algorithms and a genuinely polynomial-time algorithm reflects an inherent barrier or is merely an artifact of existing techniques.
§ Known Partial Results
Blanc et al. (2022): Blanc, Lange, Qiao, and Tan (BLQT21) gave an almost-polynomial-time membership-query algorithm for properly learning decision trees under the uniform distribution.
Blanc et al. (2022): Prior to BLQT21, the fastest known approach for this setting ran in quasipolynomial time, obtained as a consequence of the classic distribution-free decision-tree learning algorithm of Ehrenfeucht and Haussler (EH89).
Blanc et al. (2022): The COLT 2022 open-problem note (Blanc, Lange, Qiao, Tan, 2022) isolates the remaining gap (almost-polynomial/quasipolynomial versus polynomial time) and discusses intermediate milestones as potential stepping stones.
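The Ehrenfeucht–Haussler approach hinges on the fact that a decision tree with $s$ leaves has rank at most $\log_2 s$, so it suffices to search for a low-rank tree consistent with a sample. Below is a minimal sketch of that rank-bounded search (the tree encoding and names are my own; this illustrates the core idea, not the paper's exact algorithm):

```python
from itertools import product

def find_consistent_tree(sample, rank, used=frozenset()):
    """Search for a decision tree of rank <= `rank` consistent with
    `sample`, a list of (x, y) pairs with x a bit-tuple and y in {0,1}.
    Returns ('leaf', b) or ('node', i, t0, t1), or None on failure.
    Runs in roughly (2n)^O(rank) * |sample| time; since a size-s tree
    has rank <= log2(s), this yields a quasipolynomial-time search."""
    labels = {y for _, y in sample}
    if len(labels) <= 1:                  # constant on the sample
        return ('leaf', labels.pop() if labels else 0)
    if rank == 0:
        return None                       # rank-0 trees are leaves
    n = len(sample[0][0])
    for i in range(n):
        if i in used:
            continue                      # already fixed on this path
        s0 = [(x, y) for x, y in sample if x[i] == 0]
        s1 = [(x, y) for x, y in sample if x[i] == 1]
        if not s0 or not s1:
            continue                      # useless split
        # rank(T) <= r iff one child has rank <= r-1, the other <= r
        for r0, r1 in ((rank - 1, rank), (rank, rank - 1)):
            t0 = find_consistent_tree(s0, r0, used | {i})
            if t0 is None:
                continue
            t1 = find_consistent_tree(s1, r1, used | {i})
            if t1 is not None:
                return ('node', i, t0, t1)
    return None

def evaluate(tree, x):
    while tree[0] == 'node':
        _, i, t0, t1 = tree
        tree = t1 if x[i] else t0
    return tree[1]
```

For instance, labeling all of $\{0,1\}^3$ by $f(x) = x_0 \wedge x_1$ (a 3-leaf tree, so rank at most 2) and running the search with rank bound 2 returns a consistent tree.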
§ References
Open Problem: Properly learning decision trees in polynomial time?
Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan (2022)
Conference on Learning Theory (COLT), PMLR 178
📍 Open-problem note in COLT proceedings.