Unsolved

Tight Convergence of SGD in Constant Dimension

Posed by Tomer Koren and Shahar Segal (2020)

§ Problem Statement

Setup

Fix a dimension $d\ge 1$ that is a constant independent of the iteration budget $T$. Let $\mathcal K\subseteq\mathbb R^d$ be a nonempty closed convex set with finite Euclidean diameter $D:=\sup\{\|x-y\|_2 : x,y\in\mathcal K\}<\infty$. Let $f:\mathcal K\to\mathbb R$ be convex and attain its minimum on $\mathcal K$, and fix a minimizer $x^*\in\arg\min_{x\in\mathcal K} f(x)$. Assume access to a stochastic first-order oracle that, for each query point $x\in\mathcal K$, returns a random vector $g(x)$ satisfying (i) unbiasedness: $\mathbb E[g(x)]\in\partial f(x)$, and (ii) bounded second moment: $\mathbb E[\|g(x)\|_2^2]\le G^2$ for a known constant $G<\infty$.

Projected stochastic gradient descent (SGD) starts from some $x_1\in\mathcal K$ and iterates

$$x_{t+1}=\Pi_{\mathcal K}\bigl(x_t-\eta_t g(x_t)\bigr),\qquad t=1,2,\dots,T-1,$$

where $\Pi_{\mathcal K}$ denotes Euclidean projection onto $\mathcal K$ and $(\eta_t)_{t\ge 1}$ is a stepsize schedule that may depend on $T,d,D,G$ but not on additional unknown properties of $(f,\text{oracle})$ beyond the model above. The performance measure is the expected last-iterate suboptimality $\mathbb E[f(x_T)-f(x^*)]$, where the expectation is over the oracle randomness.
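The iteration above can be sketched in a few lines of code. The following is a minimal illustration, not part of the original problem statement: it runs projected SGD on a hypothetical one-dimensional instance $f(x)=|x|$ on $\mathcal K=[-1,1]$, with an oracle that returns a subgradient plus bounded uniform noise (so unbiasedness and the second-moment bound hold with $G=2$), using the fixed stepsize $\eta=D/(G\sqrt T)$.

```python
import math
import random

def projected_sgd(grad_oracle, project, x1, T, etas):
    """Run projected SGD for T iterations; return the last iterate x_T.

    grad_oracle(x) returns a stochastic subgradient estimate g(x);
    project(x) is Euclidean projection onto the feasible set K.
    """
    x = x1
    for t in range(1, T):  # produces x_2, ..., x_T
        x = project(x - etas[t - 1] * grad_oracle(x))
    return x

# Toy 1-d instance: f(x) = |x| on K = [-1, 1], minimizer x* = 0.
D, G = 2.0, 2.0
T = 10_000
eta = D / (G * math.sqrt(T))  # fixed stepsize D/(G sqrt(T))
random.seed(0)

def grad_oracle(x):
    # A subgradient of |x| plus bounded noise: E[g(x)] is a subgradient,
    # and E[g(x)^2] = 1 + 1/3 <= G^2 = 4.
    sub = 0.0 if x == 0 else math.copysign(1.0, x)
    return sub + random.uniform(-1.0, 1.0)

def project(x):
    # Euclidean projection onto [-1, 1].
    return max(-1.0, min(1.0, x))

x_T = projected_sgd(grad_oracle, project, x1=1.0, T=T, etas=[eta] * T)
print(abs(x_T))  # last-iterate suboptimality f(x_T) - f(x*) = |x_T|
```

On this instance the last iterate drifts to a small neighborhood of the minimizer; the open problem concerns the worst-case rate over all instances in the class, not any single example.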

Unsolved Problem

Characterize the minimax-optimal dependence on $T$ of the worst-case expected last-iterate suboptimality achievable by projected SGD over the above problem class when $d$ is fixed. In particular, determine whether there exist constants $c_1,c_2>0$ (depending only on $d,D,G$) and a stepsize schedule such that for all $T$,

$$c_1\,T^{-1/2}\ \le\ \sup_{(f,\text{oracle})}\mathbb E[f(x_T)-f(x^*)]\ \le\ c_2\,T^{-1/2},$$

and whether this $\Theta(1/\sqrt T)$ characterization already holds in the one-dimensional case $d=1$.


§ Significance & Implications

SGD is typically deployed without iterate averaging, so last-iterate guarantees govern the algorithm's actual output. A tight minimax characterization in fixed (especially one) dimension would resolve a basic mismatch between existing upper and lower bounds for $\mathbb E[f(x_T)-f(x^*)]$ in the standard stochastic subgradient oracle model, clarifying whether last-iterate SGD fundamentally matches the canonical $1/\sqrt T$ stochastic-optimization rate in low dimensions or is intrinsically slower.
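For context, the canonical rate referenced here is the classical guarantee for the *averaged* iterate: with the fixed stepsize $\eta_t = D/(G\sqrt T)$, standard analysis of projected SGD over this problem class gives

$$\mathbb E\Bigl[f\Bigl(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} x_t\Bigr)-f(x^*)\Bigr]\ \le\ \frac{DG}{\sqrt T}.$$

This is a textbook bound on the running average, not on $x_T$ itself; the open problem asks whether the last iterate alone attains the same $1/\sqrt T$ order when $d$ is a constant.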

§ Known Partial Results

  • Koren and Segal (2020): their COLT open-problem note highlights that, when the dimension $d$ is a constant independent of $T$, the known upper and lower bounds on the expected last-iterate suboptimality of projected SGD do not match.

  • Koren and Segal (2020): they emphasize that this gap persists even for $d=1$.

  • Koren and Segal (2020): they provide evidence in the one-dimensional case consistent with a $\Theta(1/\sqrt T)$ last-iterate rate.

  • Koren and Segal (2020): they conjecture that the same $\Theta(1/\sqrt T)$ last-iterate rate holds for any constant dimension $d$.

§ References

[1] Tomer Koren, Shahar Segal (2020). Open Problem: Tight Convergence of SGD in Constant Dimension. Conference on Learning Theory (COLT), PMLR 125. Open-problem note in the COLT proceedings.

