Tight Characterization of Instance-Optimal Identity Testing
Posed by Clément Canonne (2024)
§ Problem Statement
Setup
Let be a known discrete probability distribution over a countable domain , given to the tester via a succinct description, and let . The tester is given i.i.d. samples from an unknown distribution over and must, with success probability at least (over its internal randomness and the samples), distinguish between
where total variation distance is . Define the instance-optimal sample complexity as the minimum integer for which there exists such a (possibly randomized) tester that uses at most samples from .
Unsolved Problem
Characterize up to universal constant factors for all discrete and all , in terms of a simple, explicit instance-dependent functional of the reference distribution and accuracy: identify a functional and absolute constants such that
for all . Equivalently, what is the correct instance-dependent quantity that governs the optimal sample complexity of identity testing to a fixed reference distribution at distance parameter ?
§ Discussion
§ Significance & Implications
Worst-case identity-testing bounds optimize over the hardest reference distributions, but can substantially overestimate the samples needed for a given, fixed . A tight instance-optimal characterization would (i) precisely separate which parts of drive the statistical difficulty at accuracy , (ii) pin down the best achievable adaptivity (designing a single tester whose sample usage matches up to constants for every instance), and (iii) provide a sharp benchmark for lower bounds and algorithm design across qualitatively different regimes (e.g., near-uniform versus heavy-tailed or highly non-uniform references).
§ Known Partial Results
Canonne (2024): introduced the instance-optimal identity-testing formulation and gave upper and lower bounds on expressed via an -type quantity applied to a suitable truncation of the reference distribution .
Canonne (2024): In this line of work, the quasi-norm is typically written as for a nonnegative vector .
Canonne (2024): The resulting upper and lower bounds do not, in general, match within constant factors: for some choices of the multiplicative gap between the best known upper and lower bounds can be arbitrarily large.
Canonne (2024): Consequently, the identification of the correct functional that tightly characterizes up to universal constant factors remains open.
§ References
Open Problem: Tight Characterization of Instance-Optimal Identity Testing
Clément Canonne (2024)
Conference on Learning Theory (COLT), PMLR 247
📍 Open-problem note in COLT proceedings.
Open Problem: Tight Characterization of Instance-Optimal Identity Testing (PDF)
Clément Canonne (2024)
Conference on Learning Theory (COLT), PMLR 247
📍 Proceedings PDF.