Do Good Algorithms Necessarily Query Bad Points?
Posed by Rong Ge et al. (2019)
§ Problem Statement
Setup
Let $\mathcal{X} \subseteq \mathbb{R}^d$ be a closed convex set. Consider stochastic convex optimization of
$$\min_{x \in \mathcal{X}} F(x),$$
where $F$ is convex on $\mathcal{X}$ and attains its minimum at some $x^\star \in \mathcal{X}$. A stochastic first-order oracle, when queried at $x \in \mathcal{X}$, returns a vector $\hat g(x)$ such that (unbiasedness)
$$\mathbb{E}\big[\hat g(x)\big] = \nabla F(x),$$
where the expectation is over the oracle's randomness; the problem class $\mathcal{P}$ further specifies regularity/noise assumptions (e.g. a uniform second-moment bound $\mathbb{E}\|\hat g(x)\|^2 \le G^2$ for all $x \in \mathcal{X}$ and all $F \in \mathcal{P}$).
For $n \ge 1$, define the minimax expected excess-risk rate after $n$ oracle calls by
$$R_n(\mathcal{P}) \;=\; \inf_{\mathsf{A}} \sup_{F \in \mathcal{P}} \mathbb{E}\big[F(\hat x_n) - F(x^\star)\big],$$
where the infimum ranges over all (possibly randomized) algorithms $\mathsf{A}$ that make $n$ oracle queries $x_1, \dots, x_n$ and output $\hat x_n$ measurable with respect to $\big(x_1, \hat g(x_1), \dots, x_n, \hat g(x_n)\big)$ and the algorithm's internal randomness.
Define a (stochastic) first-order algorithm to be non-adaptive if its query points are affine functions of past observed gradients with coefficients fixed before the run: there exist deterministic vectors $v_t \in \mathbb{R}^d$ and deterministic matrices $A_{t,s} \in \mathbb{R}^{d \times d}$ (allowed to depend on $n$ and on known problem parameters, but not on the realized oracle randomness) such that for all $t \ge 1$,
$$x_t \;=\; \Pi_{\mathcal{X}}\Big(v_t + \sum_{s=1}^{t-1} A_{t,s}\, \hat g(x_s)\Big),$$
where $\Pi_{\mathcal{X}}$ is Euclidean projection onto $\mathcal{X}$.
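For concreteness, projected SGD with a step-size schedule fixed in advance is a canonical non-adaptive method: each update applies a deterministic coefficient to the most recent gradient, so every query point is a (projected) affine function of past gradients. A minimal sketch, where the function names and the unit-ball feasible set are illustrative assumptions rather than choices from the source:

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def nonadaptive_sgd(oracle, x1, n, step):
    """Projected SGD with a step schedule fixed before the run.

    The update x_{t+1} = Pi_X(x_t - step(t) * g_t) uses only a
    deterministic coefficient step(t) on the observed gradient g_t,
    so the method is non-adaptive in the sense defined above.
    Returns the list of query points x_1, ..., x_n.
    """
    x = np.array(x1, dtype=float)
    queries = [x.copy()]
    for t in range(1, n):
        g = oracle(x)                      # stochastic gradient at x_t
        x = project_ball(x - step(t) * g)  # fixed, pre-scheduled update
        queries.append(x.copy())
    return queries
```

Here `step` is any deterministic schedule chosen before the run, e.g. `step = lambda t: 1.0 / np.sqrt(t)`; an adaptive method, by contrast, would let the coefficients depend on the realized gradients.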
Unsolved Problem
Problem (Ge et al., 2019). For a given class $\mathcal{P}$, is it information-theoretically impossible for a non-adaptive algorithm to have uniformly minimax-optimal query points at all sufficiently large times? Concretely, must it be the case that for every non-adaptive algorithm and every constant $C > 0$ there exist arbitrarily large $t$ for which
$$\sup_{F \in \mathcal{P}} \mathbb{E}\big[F(x_t) - F(x^\star)\big] \;>\; C \cdot R_t(\mathcal{P})\,?$$
Equivalently, is it necessary that
$$\limsup_{t \to \infty} \; \frac{\sup_{F \in \mathcal{P}} \mathbb{E}\big[F(x_t) - F(x^\star)\big]}{R_t(\mathcal{P})} \;=\; \infty,$$
possibly with a gap that grows (e.g. polylogarithmically) with $t$ for some natural choices of $\mathcal{P}$?
§ Discussion
§ Significance & Implications
Minimax rates are defined for an algorithm's output after $n$ oracle calls, but many stochastic-approximation methods are used in "anytime" or streaming settings where the current iterate itself is deployed. This problem isolates whether non-adaptivity (fixed, pre-scheduled linear use of past gradients) fundamentally prevents all query points from achieving minimax-order excess risk, even when some function of the history (e.g. an averaged iterate) is minimax-optimal. A positive answer would justify algorithmic designs that explicitly trade off statistical optimality against the reliability of intermediate iterates; a negative answer would imply that, at least information-theoretically, minimax-optimal performance need not require repeatedly querying provably bad points.
§ Known Partial Results
Ge et al. (2019): Classical stochastic-approximation results show that, for standard convex/noise classes , SGD with polynomially decaying step sizes combined with iterate averaging can achieve minimax-order excess risk (for the averaged output) under unbiased-gradient and moment conditions.
Ge et al. (2019): The COLT 2019 open problem asks whether such minimax-optimality of an output statistic can coexist with minimax-optimality of the raw query points for non-adaptive first-order methods, or whether the excess risk of some query points $x_t$ must remain separated from $R_t(\mathcal{P})$ infinitely often in the worst case over $F \in \mathcal{P}$.
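This separation between an averaged output and the raw query points can be probed numerically. The toy instance below is an illustrative assumption, not an experiment from the paper: non-adaptive SGD with steps $1/\sqrt{t}$ on $F(x) = \|x\|^2/2$ with Gaussian gradient noise, where the averaged iterate typically ends up far closer to optimal than the last raw query point, which keeps fluctuating at the noise floor set by the step size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (assumed for illustration): F(x) = ||x||^2 / 2 on R^d,
# oracle g(x) = x + noise, so E[g(x)] = grad F(x) and x* = 0.
d, n, sigma = 10, 5000, 1.0

def oracle(x):
    return x + sigma * rng.standard_normal(d)

x = np.ones(d)               # x_1
running_sum = np.zeros(d)
for t in range(1, n + 1):
    running_sum += x                         # accumulate x_1, ..., x_n
    x = x - (1.0 / np.sqrt(t)) * oracle(x)   # non-adaptive step 1/sqrt(t)

x_avg = running_sum / n      # averaged iterate

def excess_risk(z):
    """F(z) - F(x*) with x* = 0."""
    return 0.5 * np.dot(z, z)

print(excess_risk(x), excess_risk(x_avg))
```

On typical runs the averaged iterate's excess risk is an order of magnitude below the last query point's, consistent with the classical averaging results cited above; none of the constants here are tuned.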
§ References
Open Problem: Do Good Algorithms Necessarily Query Bad Points?
Rong Ge, Prateek Jain, Sham M. Kakade, Rahul Kidambi, Dheeraj M. Nagaraj, Praneeth Netrapalli (2019)
Conference on Learning Theory (COLT), PMLR 99
📍 Open-problem note in COLT proceedings.