Anytime Convergence Rate of Gradient Descent
Posed by Guy Kornowski and Ohad Shamir (2024)
§ Problem Statement
Setup
Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex differentiable function with $L$-Lipschitz gradient ("$L$-smooth"), i.e., $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \mathbb{R}^d$. Assume $f$ attains its minimum, and fix an initialization $x_0 \in \mathbb{R}^d$, with some minimizer $x^* \in \arg\min f$ and optimal value $f^* = f(x^*)$. Consider vanilla gradient descent with a predetermined (oblivious) stepsize sequence $(\eta_t)_{t \ge 0}$ (possibly depending on $L$ but not on the stopping time $T$):
$$x_{t+1} = x_t - \eta_t \nabla f(x_t).$$
The classical worst-case guarantee for suitable constant stepsizes $\eta_t \equiv \eta \in (0, 1/L]$ gives an anytime bound of order
$$f(x_T) - f^* \le \frac{c\, L \|x_0 - x^*\|^2}{T}$$
holding for all $T \ge 1$ and all $L$-smooth convex $f$, with a universal constant $c > 0$.
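This guarantee can be checked numerically. Below is a minimal sketch (not from the paper; the quadratic test function and constants are illustrative) that runs vanilla gradient descent with the constant oblivious stepsize $\eta = 1/L$ and verifies the $O(1/T)$ bound at every iterate simultaneously, which is exactly the anytime (per-iterate) sense of the guarantee:

```python
import numpy as np

# Illustrative sketch: vanilla GD with a constant oblivious stepsize on an
# L-smooth convex quadratic, checking the classical anytime bound
#   f(x_T) - f* <= L * ||x0 - x*||^2 / (2T)   for every T at once.

L, mu = 4.0, 0.1                         # curvatures; the smoothness constant is L
H = np.diag([L, mu])                     # Hessian of the quadratic test function
f = lambda x: 0.5 * x @ H @ x            # minimizer x* = 0, so f* = 0
grad = lambda x: H @ x

eta = 1.0 / L                            # oblivious: fixed before any horizon T
x0 = np.array([3.0, -1.0])
D2 = x0 @ x0                             # ||x0 - x*||^2

x, gaps = x0.copy(), []
for _ in range(100):
    x = x - eta * grad(x)                # x_{t+1} = x_t - eta_t * grad f(x_t)
    gaps.append(f(x))                    # optimality gap f(x_T) - f*

anytime_ok = all(g <= L * D2 / (2 * T) for T, g in enumerate(gaps, start=1))
print(anytime_ok)                        # True: the O(1/T) bound holds for all T
```

The open problem asks whether a cleverer (still oblivious) choice of the schedule `eta` can make such a check pass with a faster-decaying right-hand side, for every $L$-smooth convex $f$ rather than this one example.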
Unsolved Problem
Do stepsizes alone yield a strictly faster worst-case anytime rate on the actual iterate $x_T$? Equivalently, does there exist a stepsize sequence $(\eta_t)_{t \ge 0}$ and an exponent $\alpha > 1$ such that, for some universal constant $c > 0$, for every dimension $d$, every $L$-smooth convex $f$ attaining its minimum, every initialization $x_0$, every choice of minimizer $x^*$, and every $T \ge 1$,
$$f(x_T) - f^* \le \frac{c\, L \|x_0 - x^*\|^2}{T^{\alpha}}\,?$$
More generally, what is the best (fastest-decaying) rate function $r(T)$ for which there exists an oblivious stepsize schedule such that
$$f(x_T) - f^* \le c\, L \|x_0 - x^*\|^2\, r(T)$$
holds simultaneously for all $T \ge 1$ and all $L$-smooth convex $f$?
§ Discussion
§ Significance & Implications
This problem isolates the power and limits of "stepsize-only" design for the most basic first-order method. An affirmative answer would show that, without momentum, restarts, or changing the update rule, gradient descent can achieve a worst-case guarantee that decays faster than $1/T$ at every time $T$ using a single oblivious schedule. A negative answer would establish an intrinsic separation between plain gradient descent and accelerated methods in the anytime (per-iterate) sense, pinpointing that any apparent acceleration from clever stepsizes must fail at some times or for some smooth convex objectives.
§ Known Partial Results
Kornowski et al. (2024): With constant stepsize $\eta_t \equiv 1/L$, standard analysis yields an anytime worst-case bound of the form $f(x_T) - f^* \le \frac{L \|x_0 - x^*\|^2}{2T}$ for all $T \ge 1$.
Kornowski et al. (2024): There exist non-constant stepsize schedules that achieve an accelerated bound $f(x_T) - f^* \le c\, L \|x_0 - x^*\|^2 / T^{\alpha}$ for some $\alpha > 1$ at selected, pre-specified horizons $T$ (e.g., along a sparse subsequence of times), but such horizon-specific acceleration does not by itself imply an anytime bound for the specific iterate $x_T$ at every $T$.
Kornowski et al. (2024): A horizon-specific guarantee can be converted into an anytime guarantee only for "best-so-far" performance $\min_{t \le T} f(x_t) - f^*$ (e.g., by selecting a nearby special horizon), which is strictly weaker than bounding $f(x_T) - f^*$ for each $T$.
Kornowski et al. (2024): Any oblivious schedule that would improve the anytime worst-case rate beyond $\Theta(1/T)$ uniformly over all $L$-smooth convex objectives (in the sense that $T \cdot (f(x_T) - f^*) \to 0$ for all such $f$ and all $x_0$) must use unbounded stepsizes, i.e., $\sup_t \eta_t = \infty$.
Kornowski et al. (2024): Large individual steps can cause persistent large errors: in dimension $d = 1$, if at some time $t$ one has $\eta_t \ge 2/L$ and also $\eta_t \ge c \sum_{s < t} \eta_s$ for a constant $c > 0$, then there exists an $L$-smooth convex $f$ such that
$$f(x_{t+1}) - f^* = \Omega\big(L \|x_0 - x^*\|^2\big).$$
Kornowski et al. (2024): Consequently, if a schedule has infinitely many times $t$ at which the ratio $\eta_t / \sum_{s < t} \eta_s$ is bounded away from $0$ while taking such "long" steps ($\eta_t \ge 2/L$), then $f(x_t) - f^*$ can remain on the order of $L \|x_0 - x^*\|^2$ at arbitrarily large times, ruling out any decaying anytime bound for that schedule.
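The mechanism behind the long-step obstruction is easy to see in one dimension. The sketch below (illustrative, not from the paper) takes a single step with $\eta \ge 2/L$ on the $L$-smooth quadratic $f(x) = \frac{L}{2} x^2$: the update maps $x$ to $(1 - \eta L)\, x$ with $|1 - \eta L| \ge 1$, so the iterate overshoots past $x^* = 0$ and the optimality gap does not decrease:

```python
# One "long" gradient step on the 1-D quadratic f(x) = (L/2) x^2,
# whose minimizer is x* = 0 with f* = 0.
L = 4.0
f = lambda x: 0.5 * L * x**2
grad = lambda x: L * x

x = 3.0                            # current iterate
eta_long = 2.5 / L                 # a "long" step: eta * L = 2.5 >= 2

x_next = x - eta_long * grad(x)    # = (1 - eta*L) * x = -1.5 * x = -4.5
print(abs(x_next) >= abs(x))       # True: the step overshoots past x*
print(f(x_next) >= f(x))           # True: the gap grows rather than shrinks
```

Since any schedule beating the anytime $1/T$ rate must eventually take such long steps (by the unbounded-stepsize result above), this overshooting behavior is the crux of the tension the open problem asks about.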
§ References
Open Problem: Anytime Convergence Rate of Gradient Descent
Guy Kornowski, Ohad Shamir (2024)
Conference on Learning Theory (COLT), PMLR 247
📍 Open-problem note in COLT proceedings.