Unsolved

The Oracle Complexity of Convex Optimization with Limited Memory

Posed by Blake Woodworth et al. (2019)

§ Problem Statement

Setup

Fix dimension dNd\in\mathbb{N} and parameters L,B,ϵ>0L,B,\epsilon>0. Let FL,B\mathcal{F}_{L,B} be the class of convex functions F:B2(B)RF:B_2(B)\to\mathbb{R} on the Euclidean ball B2(B)={xRd:x2B}B_2(B)=\{x\in\mathbb{R}^d:\|x\|_2\le B\} that are LL-Lipschitz in 2\|\cdot\|_2, i.e., F(x)F(y)Lxy2|F(x)-F(y)|\le L\|x-y\|_2 for all x,yB2(B)x,y\in B_2(B) (equivalently, every subgradient satisfies g2L\|g\|_2\le L wherever it exists). A first-order oracle, queried at xB2(B)x\in B_2(B), returns a pair (F(x),g)(F(x),g) where gF(x)g\in\partial F(x) is an arbitrary valid subgradient.

A deterministic first-order algorithm with memory budget MM bits and TT oracle queries is an interactive procedure that, between queries, retains only a state μt{0,1}M\mu_t\in\{0,1\}^M (initialized to μ1=0\mu_1=0). At round tt, it chooses xt=ϕt(μt)B2(B)x_t=\phi_t(\mu_t)\in B_2(B), receives (F(xt),gt)(F(x_t),g_t) with gtF(xt)g_t\in\partial F(x_t), updates μt+1=ψt(μt,F(xt),gt)\mu_{t+1}=\psi_t(\mu_t,F(x_t),g_t), and after TT rounds outputs x^=ζ(μT+1)B2(B)\hat x=\zeta(\mu_{T+1})\in B_2(B). The algorithm may perform arbitrary computation during a round, but only MM bits persist across rounds.

Define the minimax memory-bounded oracle complexity TL,B(d,M,ϵ)T_{L,B}(d,M,\epsilon) as the smallest TNT\in\mathbb{N} for which there exists such an MM-bit deterministic algorithm guaranteeing

supFFL,B supg1F(x1),,gTF(xT) (F(x^)minxB2(B)F(x))ϵ.\sup_{F\in\mathcal{F}_{L,B}}\ \sup_{g_1\in\partial F(x_1),\ldots,g_T\in\partial F(x_T)}\ \Big(F(\hat x)-\min_{x\in B_2(B)}F(x)\Big)\le\epsilon.

(If no such TT exists, set TL,B(d,M,ϵ)=T_{L,B}(d,M,\epsilon)=\infty.)

Unsolved Problem

Problem 2019. Characterize TL,B(d,M,ϵ)T_{L,B}(d,M,\epsilon), i.e., the achievable query-memory tradeoffs (T,M)(T,M), up to constant factors (and at most polylogarithmic factors). In particular, determine whether one can achieve near-optimal first-order query complexity T=O~(dpolylog(LB/ϵ))T=\tilde O\big(d\,\mathrm{polylog}(LB/\epsilon)\big) with subquadratic memory M=O(d2δ)M=O(d^{2-\delta}) for some fixed δ>0\delta>0, or whether achieving such TT inherently requires (up to logarithmic factors) Ω(d2)\Omega(d^2) bits of memory.

§ Discussion

Loading discussion…

§ Significance & Implications

For nonsmooth convex optimization with an LL-Lipschitz objective over a radius-BB Euclidean ball, the optimal first-order oracle query complexity (without memory constraints) scales as Θ(dlog(LB/ϵ))\Theta\big(d\log(LB/\epsilon)\big) in the worst case, and is achieved by cutting-plane/center-of-mass type methods. However, known near-optimal-query methods maintain rich geometric information (e.g., an evolving polytope/ellipsoid defined by many past cuts), suggesting a quadratic-in-dd memory footprint. This problem asks whether that quadratic memory is an artifact of current implementations or an information-theoretic necessity, thereby isolating the fundamental tradeoff between two core resources for first-order optimization: number of oracle queries versus persistent memory.

§ Known Partial Results

  • Woodworth et al. (2019): Near-optimal query rates are achievable without memory limits: classical cutting-plane methods (e.g., center-of-mass) attain O(dlog(LB/ϵ))O\big(d\log(LB/\epsilon)\big) first-order queries for Lipschitz convex optimization over B2(B)B_2(B), and this dependence is minimax-optimal in the regime considered by the note.

  • Woodworth et al. (2019): Low-memory, simple iterative methods can be made memory-efficient but require many more queries: a discretized, memory-bounded gradient-descent-type scheme achieves ϵ\epsilon-suboptimality using O((L2B2)/ϵ2)O\big((L^2B^2)/\epsilon^2\big) oracle queries with O(dlog(LB/ϵ))O\big(d\log(LB/\epsilon)\big) bits of persistent memory (note Appendix A, Theorem 1).

  • Woodworth et al. (2019): Any method that succeeds uniformly over all FFL,BF\in\mathcal{F}_{L,B} must store nontrivial information: for ϵLB/2\epsilon\le LB/2, at least Ω(dlog(LB/(2ϵ)))\Omega\big(d\log(LB/(2\epsilon))\big) bits of memory are necessary even to output an ϵ\epsilon-suboptimal point in the worst case (note Appendix C, Theorem 5).

  • Woodworth et al. (2019): The note gives an explicit finite-precision/memory accounting showing that a center-of-mass-style method can retain the O(dlog(LB/ϵ))O\big(d\log(LB/\epsilon)\big) query bound while using O(d2log2(LB/ϵ))O\big(d^2\log^2(LB/\epsilon)\big) bits of memory (note Appendix B).

  • Woodworth et al. (2019): The intermediate tradeoff region is open: it is not known whether one can simultaneously approach the O~(d)\tilde O(d)-query regime and use near-linear (in dd) memory, or whether improving query complexity beyond the Θ(1/ϵ2)\Theta(1/\epsilon^2) regime forces (up to logarithmic factors) quadratic memory in dd.

§ References

[1]

Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory

Blake Woodworth, Nathan Srebro (2019)

Conference on Learning Theory (COLT), PMLR 99

📍 Open-problem note in COLT proceedings.

[2]

Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory (PDF)

Blake Woodworth, Nathan Srebro (2019)

Conference on Learning Theory (COLT), PMLR 99

📍 Proceedings PDF.

§ Tags