
Optimal Mixing Time for Log-Concave Sampling

§ Problem Statement

Setup

Let $n \ge 1$, let $V:\mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable, and define the target probability measure

$$\pi(dx)=\frac{1}{Z}e^{-V(x)}\,dx,\qquad Z=\int_{\mathbb{R}^n}e^{-V(x)}\,dx<\infty.$$

Assume $V$ is $\mu$-strongly convex and $L$-smooth, meaning

$$\mu I_n \preceq \nabla^2 V(x)\preceq L I_n\quad\text{for all }x\in\mathbb{R}^n,$$

with condition number $\kappa := L/\mu$. For $\varepsilon\in(0,1)$, define the total-variation mixing time of a Markov process $(X_t)$ with invariant law $\pi$ by

$$t_{\mathrm{mix}}(\varepsilon):=\inf\Big\{t\ge 0:\sup_{x\in\mathbb{R}^n}\big\| \mathcal{L}(X_t\mid X_0=x)-\pi\big\|_{\mathrm{TV}}\le \varepsilon\Big\}.$$
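
To make the definition concrete, for the 1D Gaussian target $\pi = N(0, 1/\mu)$ (i.e. $V(x)=\mu x^2/2$) the overdamped Langevin diffusion is an Ornstein-Uhlenbeck process whose law at time $t$ is Gaussian in closed form, so the TV distance to $\pi$ can be evaluated numerically. A minimal sketch, using a fixed start $x_0$ rather than the supremum over all starts, with the integration grid and time grid as illustrative choices:

```python
import numpy as np

def tv_gaussians(m1, v1, m2, v2):
    """Total variation between N(m1, v1) and N(m2, v2) by trapezoidal integration."""
    grid = np.linspace(-20.0, 20.0, 20001)
    p = np.exp(-(grid - m1) ** 2 / (2.0 * v1)) / np.sqrt(2.0 * np.pi * v1)
    q = np.exp(-(grid - m2) ** 2 / (2.0 * v2)) / np.sqrt(2.0 * np.pi * v2)
    d = np.abs(p - q)
    return 0.5 * np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(grid))

def tv_to_target(t, x0, mu):
    """TV(L(X_t | X_0 = x0), pi) for dX_t = -mu X_t dt + sqrt(2) dB_t, pi = N(0, 1/mu):
    the OU law at time t is N(x0 * exp(-mu t), (1 - exp(-2 mu t)) / mu)."""
    m = x0 * np.exp(-mu * t)
    v = (1.0 - np.exp(-2.0 * mu * t)) / mu
    return tv_gaussians(m, v, 0.0, 1.0 / mu)

def t_mix(eps, x0, mu, t_grid=np.arange(0.05, 20.0, 0.05)):
    """First grid time at which TV <= eps, from the single start x0."""
    for t in t_grid:
        if tv_to_target(t, x0, mu) <= eps:
            return float(t)
    return float("inf")
```

For instance, `t_mix(0.25, x0=3.0, mu=1.0)` returns a finite time on the order of a few units, reflecting the exponential contraction $x_0 e^{-\mu t}$ of the mean toward the target.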

The study of MCMC mixing for log-concave targets goes back to Dyer et al. (1991); non-asymptotic guarantees for the (overdamped) Langevin algorithm were obtained by Dalalyan (2017). For the underdamped Langevin diffusion on $(X_t,V_t)\in\mathbb{R}^n\times\mathbb{R}^n$,

$$dX_t=V_t\,dt,\qquad dV_t=-\gamma V_t\,dt-\nabla V(X_t)\,dt+\sqrt{2\gamma}\,dB_t,$$

where $\gamma>0$ and $(B_t)$ is a standard $n$-dimensional Brownian motion, the invariant law on phase space is proportional to $e^{-V(x)-\|v\|^2/2}$, whose $x$-marginal is $\pi$.
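
A naive Euler-Maruyama discretization of this phase-space diffusion can be sketched as follows. The step size, friction value, and Gaussian example are illustrative choices only; the analyses cited below use more refined integrators:

```python
import numpy as np

def underdamped_step(x, v, grad_V, gamma, h, rng):
    """One Euler-Maruyama step of
       dX_t = V_t dt,   dV_t = -gamma V_t dt - grad_V(X_t) dt + sqrt(2 gamma) dB_t."""
    noise = rng.standard_normal(x.shape)
    x_new = x + h * v
    v_new = v - h * gamma * v - h * grad_V(x) + np.sqrt(2.0 * gamma * h) * noise
    return x_new, v_new

# Toy run: V(x) = ||x||^2 / 2, so grad_V(x) = x and pi = N(0, I_n).
rng = np.random.default_rng(0)
n, gamma, h = 5, 2.0, 0.01
x, v = np.full(n, 3.0), np.zeros(n)
for _ in range(20000):
    x, v = underdamped_step(x, v, lambda y: y, gamma, h, rng)
# After many steps, (x, v) is approximately a draw from the (biased)
# stationary law of the discretized chain, close to N(0, I_n) x N(0, I_n).
```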

Unsolved Problem

Determine the optimal worst-case dependence on $(n,\kappa,\varepsilon)$ of the mixing time (or, for time-discretizations such as Langevin Monte Carlo, the number of gradient evaluations) needed to obtain $\varepsilon$-accuracy in total variation for sampling from this class of targets. In particular, is it possible to achieve, up to polylogarithmic factors,

$$t_{\mathrm{mix}}(\varepsilon)\lesssim \sqrt{\kappa}\,n^{1/2}\,\log(1/\varepsilon)$$

for general strongly log-concave targets? Existing lower bounds are metric- and model-dependent (continuous-time diffusion vs. discretized chains), so the precise minimax TV scaling remains unresolved. See also Chen et al. (2022) and the closely related KLS conjecture.
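
To give a sense of scale, comparing the conjectured $\sqrt{\kappa}\,n^{1/2}\log(1/\varepsilon)$ rate against a $\kappa\, n \log(1/\varepsilon)$ baseline (the baseline exponents here are purely illustrative, not a quoted theorem) shows a savings factor of $\sqrt{\kappa n}$:

```python
import math

def conjectured_rate(n, kappa, eps):
    """Conjectured scaling: sqrt(kappa) * n^(1/2) * log(1/eps)."""
    return math.sqrt(kappa) * math.sqrt(n) * math.log(1.0 / eps)

def baseline_rate(n, kappa, eps):
    """Illustrative kappa * n * log(1/eps) baseline."""
    return kappa * n * math.log(1.0 / eps)

for n, kappa in [(100, 10), (10_000, 100), (1_000_000, 1_000)]:
    ratio = baseline_rate(n, kappa, 1e-3) / conjectured_rate(n, kappa, 1e-3)
    print(f"n={n}, kappa={kappa}: savings factor = {ratio:.0f}")  # ratio = sqrt(kappa * n)
```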


§ Significance & Implications

Sampling from log-concave distributions is a fundamental primitive in statistics, machine learning, and Bayesian inference. A sharp complexity theory should distinguish clearly between (i) continuous-time diffusion contraction rates, (ii) discretization error, and (iii) the chosen metric (TV, KL, or $W_2$). For example, Vempala & Wibisono (2019) analyze the discretized overdamped Langevin algorithm (ULA) in KL/Rényi-type divergences under functional-inequality assumptions (with linear condition-number dependence in their regime, up to logarithmic factors and bias terms), while Cheng et al. (2018) give non-asymptotic bounds for a discretized underdamped method in $W_2$ (not TV). This problem asks for the optimal TV complexity dependence on $(n,\kappa,\varepsilon)$, which is still unresolved.
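
Point (ii) admits a closed-form illustration in a toy setting: for the 1D Gaussian target with $V(x)=\mu x^2/2$, the ULA chain $x_{k+1}=(1-h\mu)x_k+\sqrt{2h}\,\xi_k$ is Gaussian in stationarity with variance $1/(\mu(1-h\mu/2))$, strictly larger than the target variance $1/\mu$ for any fixed step size $h>0$. A deterministic sketch of this discretization bias (assumed toy setting, not a cited theorem):

```python
def ula_stationary_variance(mu, h, n_iter=2000):
    """Fixed point of the ULA variance recursion v <- (1 - h*mu)^2 * v + 2*h
    for the 1D Gaussian target V(x) = mu * x^2 / 2."""
    v = 0.0
    for _ in range(n_iter):
        v = (1.0 - h * mu) ** 2 * v + 2.0 * h
    return v

mu, h = 1.0, 0.1
v_ula = ula_stationary_variance(mu, h)  # closed form: 1 / (mu * (1 - h*mu/2)) ~ 1.0526
v_target = 1.0 / mu                     # exact target variance: 1.0
```

The bias vanishes only as $h\to 0$, which is why discretized-chain guarantees must trade step size against accuracy.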

§ Known Partial Results

  • Vempala & Wibisono (2019): overdamped Langevin, distinguishing diffusion from discretization. The continuous-time diffusion converges exponentially under strong log-concavity/LSI (typically shown in KL or $W_2$), while bounds for the discretized chain (ULA) are stated in divergence metrics (KL/Rényi-type) and must account for discretization bias.

  • Cheng et al. (2018): discretized underdamped Langevin. They provide non-asymptotic guarantees in $W_2$; their headline bound is for $W_2$ accuracy, not TV.

  • Chen et al. (2022) explicitly note an open discretization-analysis gap (at the time) for obtaining linear condition-number dependence under $\alpha$-LSI/$\alpha$-PI.

  • Chen et al. (2022): the KLS conjecture is widely expected to improve the dimension dependence of isotropic log-concave sampling/mixing bounds, but the exact implied exponent and the algorithm/metric dependence should be stated with theorem-specific citations when used.

  • Status: open.
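
The metric distinction running through these results is not cosmetic: even for two 1D Gaussians, $W_2$ and TV behave very differently as the means separate. Using the standard closed forms ($W_2$ between $N(m_1,\sigma_1^2)$ and $N(m_2,\sigma_2^2)$ is $\sqrt{(m_1-m_2)^2+(\sigma_1-\sigma_2)^2}$; TV for equal variances is $2\Phi(|m_1-m_2|/(2\sigma))-1$), a quick illustration with toy values:

```python
import math

def w2_gaussians(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def tv_equal_var_gaussians(m1, m2, s):
    """TV between N(m1, s^2) and N(m2, s^2): 2*Phi(|m1-m2|/(2s)) - 1 = erf(z/sqrt(2))."""
    z = abs(m1 - m2) / (2.0 * s)
    return math.erf(z / math.sqrt(2.0))

# As the means separate, W2 grows linearly while TV saturates at 1.
for d in [0.1, 1.0, 10.0]:
    print(d, w2_gaussians(0.0, 1.0, d, 1.0), tv_equal_var_gaussians(0.0, d, 1.0))
```

This is why a $W_2$ guarantee (as in Cheng et al. (2018)) does not directly translate into the TV guarantee this problem asks about.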

§ References

[1]

Improved analysis for a proximal algorithm for sampling

Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono (2022)

Proceedings of Thirty Fifth Conference on Learning Theory (PMLR 178)

📍 Section 4.2 (Applications), numbered list item 2, p. 8 (PMLR PDF): discussion that linear-in-condition-number discretization analysis for Langevin under $\alpha$-LSI/$\alpha$-PI was not known at that time.

[2]

Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices

Santosh Vempala, Andre Wibisono (2019)

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

[3]

Underdamped Langevin MCMC: A non-asymptotic analysis

Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan (2018)

Proceedings of the 31st Conference on Learning Theory (PMLR 75)

[4]

A random polynomial-time algorithm for approximating the volume of convex bodies

Martin Dyer, Alan Frieze, Ravi Kannan (1991)

Journal of the ACM

📍 Section 3 (Random walks on convex bodies), Theorem 3.1 (rapid mixing of the ball walk via conductance), pp. 8-10.

[5]

Theoretical guarantees for approximate sampling from smooth and log-concave density

Arnak S. Dalalyan (2017)

Journal of the Royal Statistical Society Series B

📍 Section 3 (Main results), Theorem 1 (non-asymptotic convergence bound for unadjusted Langevin algorithm in TV distance), p. 658.
