Improper learning of mixtures of Gaussians
Posed by Elad Hazan and Roi Livni (2018)
§ Problem Statement
Setup
Fix integers $d, k \ge 1$, an accuracy parameter $\epsilon > 0$, and a radius $R > 0$. Let $B_R = \{x \in \mathbb{R}^d : \|x\|_2 \le R\}$, and let $\mathcal{D}$ be an unknown distribution supported on $B_R$ (in particular, the motivating case is when $\mathcal{D}$ is an overcomplete mixture of $k$ Gaussians with $k \gg d$).
A (deterministic) $b$-bit pointwise compression scheme is a pair $(f, g)$ with encoder $f : \mathbb{R}^d \to \{0,1\}^b$ and decoder $g : \{0,1\}^b \to \mathbb{R}^d$. Its squared reconstruction error on $\mathcal{D}$ is
$$\mathrm{err}_{\mathcal{D}}(f, g) = \mathbb{E}_{x \sim \mathcal{D}}\big[\|x - g(f(x))\|_2^2\big].$$
Define the (spherical) $k$-center quantization class as follows: for any codebook of centers $c_1, \dots, c_k \in \mathbb{R}^d$, the encoder outputs an index $i(x) \in \arg\min_{j \in [k]} \|x - c_j\|_2$ (ties broken arbitrarily), represented using $\lceil \log_2 k \rceil$ bits, and the decoder outputs $c_{i(x)}$. The optimal $k$-center distortion is
$$\mathrm{opt}_k(\mathcal{D}) = \inf_{c_1, \dots, c_k \in \mathbb{R}^d} \; \mathbb{E}_{x \sim \mathcal{D}}\Big[\min_{j \in [k]} \|x - c_j\|_2^2\Big].$$
An (improper) compression-based learner is an algorithm that, given $n$ i.i.d. samples from $\mathcal{D}$, outputs a (not necessarily $k$-center) compression scheme $(f, g)$ with code length $b$ such that, with probability at least $1 - \delta$ over the samples,
$$\mathrm{err}_{\mathcal{D}}(f, g) \le \mathrm{opt}_k(\mathcal{D}) + \epsilon.$$
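To make the definitions above concrete, here is a minimal sketch of a $k$-center compression scheme and its empirical (Monte-Carlo) squared distortion. The names `encode`, `decode`, `empirical_distortion`, and the random codebook/data are our own illustrative choices, not from the note.

```python
import math
import random

def encode(x, centers):
    # Index of the nearest center (ties broken by smallest index);
    # representable with ceil(log2 k) bits.
    return min(range(len(centers)),
               key=lambda j: sum((xi - ci) ** 2
                                 for xi, ci in zip(x, centers[j])))

def decode(idx, centers):
    # The decoder simply returns the indexed center.
    return centers[idx]

def empirical_distortion(points, centers):
    # Sample analogue of E_{x ~ D} ||x - g(f(x))||^2.
    total = 0.0
    for x in points:
        c = decode(encode(x, centers), centers)
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total / len(points)

random.seed(0)
d, k, R = 3, 4, 1.0
centers = [[random.uniform(-R, R) for _ in range(d)] for _ in range(k)]
points = [[random.uniform(-R, R) for _ in range(d)] for _ in range(200)]
bits_per_point = math.ceil(math.log2(k))  # 2 bits for k = 4
print(bits_per_point, empirical_distortion(points, centers))
```

Note that the $k$-center scheme spends only $\lceil \log_2 k \rceil$ bits per point because the codebook itself lives in the decoder, not in the per-point code.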
Unsolved Problem
Problem (Hazan and Livni, 2018). In the overcomplete regime $k \gg d$, does there exist a computationally efficient (polynomial-time in $d$, $k$, $1/\epsilon$, and the sample size) improper learner achieving the above guarantee while using only
$$b = \mathrm{poly}\big(\log k, \log d, R, 1/\epsilon\big)$$
bits per point (i.e., with no polynomial dependence on $d$ in the code length)? Alternatively, can one prove that no such efficient learner exists under this worst-case, compression-based notion of learning?
§ Discussion
§ Significance & Implications
This problem asks whether allowing improper reconstructions (arbitrary encoders/decoders learned from data) can yield a worst-case additive-$\epsilon$ competitor to the optimal $k$-center distortion while keeping the per-point description length near $\log k$ (up to polylog factors), even when $k \gg d$. A positive result would formalize an efficient, sample-based route to near-optimal quantization error for distributions such as overcomplete Gaussian mixtures without committing to a generative/proper mixture model. A negative result would isolate a concrete barrier showing that, even after relaxing to improper compression schemes, achieving $\mathrm{opt}_k(\mathcal{D}) + \epsilon$ with only polylogarithmically many bits is computationally out of reach in the overcomplete setting.
§ Known Partial Results
Hazan and Livni (2018): The COLT 2018 open-problem note motivates the question from unsupervised learning of (overcomplete) mixtures of Gaussians, and explicitly switches to a worst-case, compression-based objective to allow improper learners.
Hazan and Livni (2018): The benchmark objective is the population $k$-means/$k$-center squared-distortion objective; even the corresponding finite-sample optimization problem is NP-hard in general.
Hazan and Livni (2018): If the code-length constraint is dropped, trivial worst-case additive-$\epsilon$ reconstruction is possible by quantizing all of $B_R$ to an $\epsilon$-net, at the cost of $O(d \log(R/\epsilon))$ bits per point (far larger than $\log k$ when $d$ is large).
Hazan and Livni (2018): A different improper relaxation that reconstructs points from low-dimensional linear structure leads to PCA-type algorithms, but representing per-point coefficients to achieve error $\epsilon$ requires code length scaling on the order of the intrinsic dimension times $\log(R/\epsilon)$, which does not match the desired polylogarithmic dependence on $d$.
Hazan and Livni (2018): The note points to bi-criteria approximations for $k$-means (e.g., allowing more than $k$ centers) and to convex/spectral approaches in the broader comparative/worst-case unsupervised-learning literature as potential starting points, but does not resolve whether polylog-bit improper learning is achievable for $k \gg d$.
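The two code-length obstructions above (the $\epsilon$-net baseline and the PCA-type relaxation) can be made concrete with a back-of-the-envelope bit accounting. This is our own illustrative sketch, with resolution choices we picked, not a calculation from the note:

```python
import math

# (1) Trivial eps-net: quantize each coordinate of [-R, R]^d to resolution
#     delta = 2*sqrt(eps/dim), so total squared error is at most
#     dim * (delta/2)^2 = eps. Cost: dim * ceil(log2(2R/delta)) bits/point.
# (2) PCA-type: the same accounting applied to d' subspace coefficients;
#     cost scales linearly with the intrinsic dimension d'.
# Both are polynomial in the relevant dimension, unlike the ~log2(k) bits
# of a k-center code.

def coord_quant_bits(dim, R, eps):
    delta = 2 * math.sqrt(eps / dim)           # per-coordinate resolution
    levels = math.ceil(2 * R / delta)          # cells per coordinate
    return dim * math.ceil(math.log2(levels))  # bits per point

d, k, R, eps = 100, 10_000, 1.0, 0.01
print("eps-net bits (ambient d):  ", coord_quant_bits(d, R, eps))
print("PCA-type bits (d' = 20):   ", coord_quant_bits(20, R, eps))
print("k-center bits (log2 k):    ", math.ceil(math.log2(k)))
```

With these illustrative parameters the net and PCA-type codes cost hundreds of bits per point, while the $k$-center benchmark needs only $\lceil \log_2 k \rceil = 14$, which is the gap the open problem asks an improper learner to close.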
§ References
Open problem: Improper learning of mixtures of Gaussians
Elad Hazan, Roi Livni (2018)
Conference on Learning Theory (COLT), PMLR 75
📍 Open-problem note in COLT proceedings.