Unsolved

Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games

Posed by Guillaume Wang et al. (2024)

§ Problem Statement

Setup

Let $T^d=(\mathbb{R}/\mathbb{Z})^d$ be the flat $d$-torus with Lebesgue measure, and let $\mathcal{P}(T^d)$ be the set of Borel probability measures on $T^d$. Fix $\beta>0$ and a smooth payoff $f\in C^\infty(T^d\times T^d)$. Define, for $(\mu,\nu)\in\mathcal{P}(T^d)\times\mathcal{P}(T^d)$,

$$F_\beta(\mu,\nu)=\iint f(x,y)\,d\mu(x)\,d\nu(y)+\beta^{-1}H(\mu)-\beta^{-1}H(\nu),$$

where $H(\rho)=\int_{T^d} r\log r$ if $\rho$ has density $r$ w.r.t. Lebesgue measure and $H(\rho)=+\infty$ otherwise. In this setting $F_\beta$ has a unique saddle point $(\mu^\star,\nu^\star)$ (the entropy-regularized mixed Nash equilibrium).
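To make the objective concrete, here is a minimal sketch that evaluates a Riemann-sum approximation of the functional on the 1D torus. Everything specific in it is an assumption made for illustration and is not taken from the original note: the dimension $d=1$, the uniform grid, the payoff `f_payoff`, and the helper names `entropy` and `F_beta`.

```python
import numpy as np

def f_payoff(x, y):
    # Hypothetical smooth payoff on the 1D torus, chosen only for illustration.
    return np.sin(2.0 * np.pi * (x - y))

def entropy(density, h):
    # Riemann-sum approximation of H(rho) = int r log r,
    # using the convention 0 * log 0 = 0.
    safe = np.where(density > 0, density, 1.0)
    return float(np.sum(density * np.log(safe)) * h)

def F_beta(m, n, beta, payoff=f_payoff):
    # m and n are density values on the same uniform grid of [0, 1).
    k = m.size
    h = 1.0 / k
    grid = np.arange(k) * h
    payoff_matrix = payoff(grid[:, None], grid[None, :])
    coupling = float(m @ payoff_matrix @ n) * h * h
    return coupling + entropy(m, h) / beta - entropy(n, h) / beta
```

For this particular translation-invariant payoff, the potential $\int f(x,y)\,dy$ is constant in $x$, so the pair of uniform densities is the saddle point, and the functional vanishes there up to discretization and rounding error.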

Consider the single-timescale Wasserstein gradient descent-ascent (GDA) flow associated with $F_\beta$: $\mu_t$ follows the Wasserstein gradient flow that decreases $\mu\mapsto F_\beta(\mu,\nu_t)$ while $\nu_t$ follows the Wasserstein gradient flow that increases $\nu\mapsto F_\beta(\mu_t,\nu)$, using the same time parameter $t$. For instance, when $\mu_t=m_t\,dx$ and $\nu_t=n_t\,dy$ have smooth positive densities, writing $\Phi_{\nu}(x)=\int f(x,y)\,d\nu(y)$ and $\Psi_{\mu}(y)=\int f(x,y)\,d\mu(x)$, the formal PDE system is

$$\partial_t m_t=\nabla\cdot\big(m_t\nabla\Phi_{\nu_t}\big)+\beta^{-1}\Delta m_t,\qquad \partial_t n_t=-\nabla\cdot\big(n_t\nabla\Psi_{\mu_t}\big)+\beta^{-1}\Delta n_t,$$

with gradients and Laplacians on $T^d$.
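The formal system above can be recovered from the first variations of the functional. The following is a sketch of that computation under the smooth-positive-density assumption, not a rigorous derivation; the notation $\delta F_\beta/\delta\mu$ and $\delta F_\beta/\delta\nu$ for the first variations is introduced here and does not appear in the original statement.

```latex
\begin{align*}
\frac{\delta F_\beta}{\delta\mu}(\mu_t,\nu_t)(x)
  &= \Phi_{\nu_t}(x)+\beta^{-1}\big(\log m_t(x)+1\big),\\
\frac{\delta F_\beta}{\delta\nu}(\mu_t,\nu_t)(y)
  &= \Psi_{\mu_t}(y)-\beta^{-1}\big(\log n_t(y)+1\big),\\
\partial_t m_t
  &= \nabla\cdot\Big(m_t\nabla\frac{\delta F_\beta}{\delta\mu}\Big)
   = \nabla\cdot\big(m_t\nabla\Phi_{\nu_t}\big)+\beta^{-1}\Delta m_t,\\
\partial_t n_t
  &= -\nabla\cdot\Big(n_t\nabla\frac{\delta F_\beta}{\delta\nu}\Big)
   = -\nabla\cdot\big(n_t\nabla\Psi_{\mu_t}\big)+\beta^{-1}\Delta n_t.
\end{align*}
```

Both evolution equations use the identity $m\nabla\log m=\nabla m$. The entropy enters with opposite signs for the two players, and the ascent flow carries an extra minus sign, so both densities diffuse with the same coefficient $\beta^{-1}$.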

Unsolved Problem

For every smooth $f$ and every $\beta>0$, do trajectories $(\mu_t,\nu_t)$ of this single-timescale Wasserstein GDA flow converge as $t\to\infty$ (e.g. weakly in $\mathcal{P}(T^d)$ for each marginal) to the unique saddle point $(\mu^\star,\nu^\star)$?

§ Significance & Implications

This asks for a qualitative long-time convergence result for a coupled descent-ascent flow in Wasserstein space that models the mean-field (infinite-particle) limit of Langevin descent-ascent in entropy-regularized two-player zero-sum games. A proof (or a counterexample) would clarify whether the natural single-timescale min-max dynamics is intrinsically stabilizing at the PDE/measure level, beyond regimes where one can enforce convergence by separating ascent and descent timescales.

§ Known Partial Results

  • Wang et al. (2024): The functional $F_\beta$ is entropy-regularized (via $\beta^{-1}H(\mu)$ and $-\beta^{-1}H(\nu)$) and admits a unique saddle point $(\mu^\star,\nu^\star)$, interpreted as the entropy-regularized mixed Nash equilibrium.

  • Wang et al. (2024): The associated Wasserstein gradient descent-ascent flow $(\mu_t,\nu_t)$ corresponds to the mean-field limit of a Langevin descent-ascent particle dynamics.

  • Wang et al. (2024): Convergence can be ensured by using different timescales for descent and ascent (a timescale-separated variant), but the single-timescale convergence question remains open for general smooth $f$ and $\beta>0$.

  • Wang et al. (2024): The core difficulty is establishing (or refuting) global asymptotic convergence for this coupled min-max Wasserstein flow with currently available tools for long-time analysis in Wasserstein geometry.
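As an illustration of the particle dynamics mentioned in the list above, here is a minimal Euler-Maruyama sketch of one natural interacting-particle system whose formal mean-field limit is the PDE system in the problem statement. It is an assumption-laden toy rather than the authors' scheme: the 1D torus, the payoff $f(x,y)=\sin(2\pi(x-y))$, the step size, and the names `grad_f`, `langevin_da` and `ascent_speed` are all hypothetical. Setting `ascent_speed=1.0` gives the single-timescale dynamics; other values rescale the ascent player's clock, which is one simple way to model a timescale-separated variant.

```python
import numpy as np

def grad_f(x, y):
    # Partial derivatives of the hypothetical payoff f(x, y) = sin(2*pi*(x - y)).
    c = 2.0 * np.pi * np.cos(2.0 * np.pi * (x - y))
    return c, -c  # (df/dx, df/dy)

def langevin_da(n_particles=200, beta=5.0, dt=1e-3, n_steps=1000,
                ascent_speed=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(n_particles)  # descent player's particles on [0, 1)
    y = rng.random(n_particles)  # ascent player's particles on [0, 1)
    for _ in range(n_steps):
        # Pairwise gradients, shape (n_particles, n_particles).
        dfdx, dfdy = grad_f(x[:, None], y[None, :])
        drift_x = -dfdx.mean(axis=1)  # descend, averaging over the y particles
        drift_y = dfdy.mean(axis=0)   # ascend, averaging over the x particles
        noise_x = rng.standard_normal(n_particles)
        noise_y = rng.standard_normal(n_particles)
        x = x + dt * drift_x + np.sqrt(2.0 * dt / beta) * noise_x
        y = (y + ascent_speed * dt * drift_y
             + np.sqrt(2.0 * ascent_speed * dt / beta) * noise_y)
        # Wrap positions back onto the torus R/Z.
        x %= 1.0
        y %= 1.0
    return x, y
```

A finite-particle, finite-time simulation of this kind can only suggest behaviour for one payoff and one value of $\beta$; it does not bear on the open question, which concerns the limiting flow for every smooth $f$ and every $\beta>0$.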

§ References

[1]

Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games

Guillaume Wang, Lénaïc Chizat (2024)

Conference on Learning Theory (COLT), PMLR 247

📍 Open-problem note in COLT proceedings.

[2]

Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games (PDF)

Guillaume Wang, Lénaïc Chizat (2024)

Conference on Learning Theory (COLT), PMLR 247

📍 Proceedings PDF.
