Unsolved

Anytime Convergence Rate of Gradient Descent

Posed by Guy Kornowski and Ohad Shamir (2024)

§ Problem Statement

Setup

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a convex differentiable function with $L$-Lipschitz gradient ("$L$-smooth"), i.e., $\|\nabla f(x)-\nabla f(y)\| \le L\|x-y\|$ for all $x,y$. Assume $f$ attains its minimum, and fix an initialization $x_0 \in \mathbb{R}^d$, some minimizer $x^* \in \arg\min f$, and the optimal value $f^* = f(x^*)$. Consider vanilla gradient descent with a predetermined (oblivious) stepsize sequence $(\eta_t)_{t\ge 0}$ (possibly depending on $L$ but not on the stopping time $T$):

$$x_{t+1} = x_t - \eta_t \nabla f(x_t), \qquad t=0,1,2,\dots$$

The classical worst-case guarantee for suitable constant stepsizes gives an anytime bound of order $f(x_T)-f^* \le C\,L\|x_0-x^*\|^2/T$, holding for all $T\in\mathbb{N}$ and all $L$-smooth convex $f$, with a universal constant $C$.
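The setup above can be exercised numerically. The sketch below runs gradient descent with the oblivious constant stepsize $\eta = 1/L$ on an anisotropic quadratic (an illustrative choice of $L$-smooth convex function, not the worst case for the bound) and checks that the classical anytime $O(1/T)$ guarantee holds at every iterate; the constant $C=1$ used here is an assumption for the demo, not the sharp value.

```python
import numpy as np

# Illustrative L-smooth convex objective: f(x) = (1/2) x^T A x with
# A = diag(L, mu), so the gradient is Lipschitz with constant L.
L, mu = 4.0, 0.1
A = np.array([L, mu])
f = lambda x: 0.5 * np.dot(A * x, x)
grad = lambda x: A * x

eta = 1.0 / L                        # oblivious constant stepsize in (0, 2/L)
x0 = np.array([3.0, -2.0])           # arbitrary initialization
x_star = np.zeros(2)                 # minimizer of f, with f* = 0
R2 = np.dot(x0 - x_star, x0 - x_star)

x = x0.copy()
for T in range(1, 200):
    x = x - eta * grad(x)            # x is now the iterate x_T
    gap = f(x) - f(x_star)
    # anytime worst-case bound f(x_T) - f* <= C * L * R^2 / T, with C = 1
    assert gap <= L * R2 / T
print("anytime O(1/T) bound verified for T = 1..199")
```

On this particular quadratic the gap actually decays geometrically; the $1/T$ rate is only approached in the worst case over a family of smooth convex functions, which is why the assertion holds with room to spare.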

Unsolved Problem

Do stepsizes alone yield a strictly faster worst-case anytime rate on the actual iterate $x_T$? Equivalently, does there exist a stepsize sequence $(\eta_t)_{t\ge 0}$ and an exponent $\alpha>1$ such that for some universal constant $C<\infty$, for every dimension $d$, every $L$-smooth convex $f$ attaining its minimum, every initialization $x_0$, every choice of minimizer $x^*\in\arg\min f$, and every $T\in\mathbb{N}$,

$$f(x_T)-f^* \le C\,\frac{L\|x_0-x^*\|^2}{T^\alpha}\,?$$

More generally, what is the best rate function $r(T)$ for which there exists an oblivious stepsize schedule such that $f(x_T)-f^* \le C\,L\|x_0-x^*\|^2\,r(T)$ holds simultaneously for all $T$ and all $L$-smooth convex $f$?


§ Significance & Implications

This problem isolates the power and limits of "stepsize-only" design for the most basic first-order method. An affirmative answer would show that, without momentum, restarts, or changing the update rule, gradient descent can achieve a worst-case guarantee that decays faster than $1/T$ at every time $T$ using a single oblivious schedule. A negative answer would establish an intrinsic separation between plain gradient descent and accelerated methods in the anytime (per-iterate) sense, pinpointing that any apparent acceleration from clever stepsizes must fail at some times or for some smooth convex objectives.

§ Known Partial Results

  • Kornowski et al. (2024): With constant stepsize $\eta_t\equiv\eta\in(0,2/L)$, standard analysis yields an anytime worst-case bound of the form $f(x_T)-f^* \le C\,L\|x_0-x^*\|^2/T$ for all $T\in\mathbb{N}$.

  • Kornowski et al. (2024): There exist non-constant stepsize schedules that achieve an accelerated bound at selected, pre-specified horizons (e.g., along a sparse subsequence of times), but such horizon-specific acceleration does not by itself imply an anytime bound for the specific iterate $x_T$ at every $T$.

  • Kornowski et al. (2024): A horizon-specific guarantee can be converted into an anytime guarantee only for "best-so-far" performance (e.g., bounding $\min_{t\le T} f(x_t)-f^*$ by selecting a nearby special horizon), which is strictly weaker than bounding $f(x_T)-f^*$ for each $T$.

  • Kornowski et al. (2024): Any oblivious schedule that would improve the anytime worst-case rate beyond $O(1/T)$ uniformly over all $L$-smooth convex objectives (in the sense $f(x_T)-f^* = o(L\|x_0-x^*\|^2/T)$ for all $T$) must use unbounded stepsizes, i.e., $\limsup_{t\to\infty} \eta_t = \infty$.

  • Kornowski et al. (2024): Large individual steps can cause persistent large errors: in dimension $d=1$, if at some time $T$ one has $\eta_T\ge 2/L$ and also $\sum_{t=0}^{T-1}\eta_t\ge 1/L$, then there exists an $L$-smooth convex $f$ such that $f(x_T)-f^*\ge c\,L\|x_0-x^*\|^2$ for a universal constant $c>0$.

  • Kornowski et al. (2024): Consequently, if a schedule has infinitely many times where the ratio $\eta_T/\sum_{t<T}\eta_t$ is bounded away from $0$ while taking such "long" steps, then $f(x_T)-f^*$ can remain on the order of $L\|x_0-x^*\|^2$ at arbitrarily large times, ruling out any decaying anytime bound for that schedule.
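The single-step mechanism behind the last two results can be seen on a one-dimensional quadratic. The sketch below is only an illustration of why a "long" step $\eta \ge 2/L$ makes no progress on some smooth convex function; the paper's actual lower-bound construction is a hard function tailored to the entire schedule, not this quadratic.

```python
# On the L-smooth quadratic f(x) = (L/2) x^2 (minimizer x* = 0), a
# gradient step maps x -> (1 - eta * L) x.  Whenever eta >= 2/L we have
# |1 - eta * L| >= 1, so the step cannot shrink the distance |x - x*|.
L = 1.0
x = 1.0                              # initialization x0
f = lambda x: 0.5 * L * x * x

for eta in [2.0 / L, 2.5 / L, 3.0 / L]:
    x_next = x - eta * L * x         # gradient step with a "long" stepsize
    assert abs(x_next) >= abs(x)     # no progress: |x_next| >= |x|
print("steps with eta >= 2/L never shrink |x - x*| on this quadratic")
```

For $\eta$ strictly larger than $2/L$ the iterate actually moves farther from the minimizer, which is the seed of the "persistent large error" phenomenon quantified in the bullets above.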

§ References

[1] Guy Kornowski, Ohad Shamir (2024). Open Problem: Anytime Convergence Rate of Gradient Descent. Conference on Learning Theory (COLT), PMLR 247. Open-problem note in the COLT proceedings.

