Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games

§ Problem Statement

Setup

Let $T^d=(\mathbb{R}/\mathbb{Z})^d$ be the flat $d$-torus with Lebesgue measure, and let $\mathcal{P}(T^d)$ be the set of Borel probability measures on $T^d$. Fix $\beta>0$ and a smooth payoff $f\in C^\infty(T^d\times T^d)$. Define, for $(\mu,\nu)\in\mathcal{P}(T^d)\times\mathcal{P}(T^d)$,

$$F_\beta(\mu,\nu)=\iint f(x,y)\,d\mu(x)\,d\nu(y)+\beta^{-1}H(\mu)-\beta^{-1}H(\nu),$$

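where $H(\rho)=\int_{T^d} r\log r\,dx$ if $\rho$ has density $r$ with respect to Lebesgue measure, and $H(\rho)=+\infty$ otherwise. In this setting $F_\beta$ is convex in $\mu$ and concave in $\nu$. It has a unique saddle point $(\mu^\star,\nu^\star)$, the entropy-regularized mixed Nash equilibrium, meaning that $F_\beta(\mu^\star,\nu)\le F_\beta(\mu^\star,\nu^\star)\le F_\beta(\mu,\nu^\star)$ for all $\mu,\nu\in\mathcal{P}(T^d)$.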
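A standard Gibbs variational argument makes the saddle point more concrete. It is added here for orientation and is not taken from the note. Write $\Phi_\nu(x)=\int f(x,y)\,d\nu(y)$ and $\Psi_\mu(y)=\int f(x,y)\,d\mu(x)$. The best responses then have closed forms. The duality gap is a sum of relative entropies to the Gibbs best responses $\hat\mu_\nu\propto e^{-\beta\Phi_\nu}\,dx$ and $\hat\nu_\mu\propto e^{\beta\Psi_\mu}\,dy$:

$$
\begin{aligned}
\min_{\mu'}F_\beta(\mu',\nu)&=-\beta^{-1}\log\int_{T^d}e^{-\beta\Phi_\nu(x)}\,dx-\beta^{-1}H(\nu),\\
\max_{\nu'}F_\beta(\mu,\nu')&=\beta^{-1}\log\int_{T^d}e^{\beta\Psi_\mu(y)}\,dy+\beta^{-1}H(\mu),\\
\max_{\nu'}F_\beta(\mu,\nu')-\min_{\mu'}F_\beta(\mu',\nu)&=\beta^{-1}\big[\mathrm{KL}(\mu\,\|\,\hat\mu_\nu)+\mathrm{KL}(\nu\,\|\,\hat\nu_\mu)\big].
\end{aligned}
$$

The last line uses $\int\Phi_\nu\,d\mu=\int\Psi_\mu\,d\nu$, so the cross terms cancel. The gap is therefore nonnegative. It vanishes exactly when $\mu=\hat\mu_\nu$ and $\nu=\hat\nu_\mu$. In other words, the saddle point solves the fixed-point system $\mu^\star\propto e^{-\beta\Phi_{\nu^\star}}\,dx$, $\nu^\star\propto e^{\beta\Psi_{\mu^\star}}\,dy$.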

Consider the single-timescale Wasserstein gradient descent-ascent (GDA) flow associated with $F_\beta$. Here $\mu_t$ follows the Wasserstein gradient flow that decreases $\mu\mapsto F_\beta(\mu,\nu_t)$, while $\nu_t$ follows the Wasserstein gradient flow that increases $\nu\mapsto F_\beta(\mu_t,\nu)$, using the same time parameter $t$. For instance, suppose $\mu_t=m_t\,dx$ and $\nu_t=n_t\,dy$ have smooth positive densities, and write $\Phi_{\nu}(x)=\int f(x,y)\,d\nu(y)$ and $\Psi_{\mu}(y)=\int f(x,y)\,d\mu(x)$. The formal PDE system is then

$$\partial_t m_t=\nabla\cdot\big(m_t\nabla\Phi_{\nu_t}\big)+\beta^{-1}\Delta m_t,\qquad \partial_t n_t=-\nabla\cdot\big(n_t\nabla\Psi_{\mu_t}\big)+\beta^{-1}\Delta n_t,$$

with gradients and Laplacians on $T^d$ .
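The system comes from the first variations of $F_\beta$. This is a standard formal computation, included as an aid rather than taken from the note:

$$
\frac{\delta F_\beta}{\delta\mu}(\mu,\nu)(x)=\Phi_\nu(x)+\beta^{-1}\big(\log m(x)+1\big),\qquad
\frac{\delta F_\beta}{\delta\nu}(\mu,\nu)(y)=\Psi_\mu(y)-\beta^{-1}\big(\log n(y)+1\big),
$$

$$
\partial_t m_t=\nabla\cdot\Big(m_t\,\nabla\frac{\delta F_\beta}{\delta\mu}(\mu_t,\nu_t)\Big),\qquad
\partial_t n_t=-\nabla\cdot\Big(n_t\,\nabla\frac{\delta F_\beta}{\delta\nu}(\mu_t,\nu_t)\Big).
$$

Substituting the first variations and using $m\nabla\log m=\nabla m$ (and likewise for $n$) recovers the displayed system. The entropy terms become the two heat operators $\beta^{-1}\Delta$. Both signs come out the same because the $\nu$-flow ascends a functional that contains $-\beta^{-1}H(\nu)$.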

Unsolved Problem

For every smooth $f$ and every $\beta>0$, do trajectories $(\mu_t,\nu_t)$ of this single-timescale Wasserstein GDA flow converge as $t\to\infty$ (e.g. weakly in $\mathcal{P}(T^d)$ for each marginal) to the unique saddle point $(\mu^\star,\nu^\star)$?
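One way to probe the question numerically is sketched below. It is offered only as an illustration, not as evidence either way. It discretizes the case $d=1$ on a uniform periodic grid, advances both densities with the same explicit time step, and monitors the duality gap $\max_{\nu'}F_\beta(\mu,\nu')-\min_{\mu'}F_\beta(\mu',\nu)$. By the Gibbs variational principle this gap equals $\beta^{-1}\big[\log\int e^{\beta\Psi_\mu}+\log\int e^{-\beta\Phi_\nu}+H(\mu)+H(\nu)\big]$, and it vanishes only at the saddle point. The following are illustrative assumptions:

- the payoff $f(x,y)=\cos(2\pi(x-y))$;
- the grid size and step size;
- the function names `flux_form_step`, `duality_gap` and `simulate_gda`.

For this particular $f$, $\Phi_\nu$ and $\Psi_\mu$ vanish when the other marginal is uniform. The uniform pair therefore satisfies both Gibbs conditions and is the saddle point. A decaying gap here says nothing about general $f$.

```python
import numpy as np


def flux_form_step(rho, potential, beta, h, dt):
    """One explicit Euler step of d_t rho = d_x(rho * d_x V + beta^{-1} * d_x rho) on a periodic 1-D grid.

    Fluxes live on cell faces i+1/2, so sum(rho) * h is conserved up to round-off.
    """
    rho_next = np.roll(rho, -1)                       # rho_{i+1}
    face_density = 0.5 * (rho + rho_next)             # rho at face i+1/2
    flux = (face_density * (np.roll(potential, -1) - potential) / h
            + (rho_next - rho) / (beta * h))          # J_{i+1/2}
    return rho + dt * (flux - np.roll(flux, 1)) / h   # (J_{i+1/2} - J_{i-1/2}) / h


def duality_gap(m, n, f_matrix, beta, h):
    """Grid version of beta^{-1}[log int e^{beta Psi} + log int e^{-beta Phi} + H(mu) + H(nu)]."""
    phi = f_matrix @ n * h       # Phi_nu(x_i) = sum_j f(x_i, y_j) n_j h
    psi = f_matrix.T @ m * h     # Psi_mu(y_j) = sum_i f(x_i, y_j) m_i h

    def log_integral_exp(values):  # log(h * sum(exp(values))), shifted by the max for stability
        top = values.max()
        return top + np.log(h * np.exp(values - top).sum())

    def neg_entropy(rho):          # H(rho) = int rho log rho, assuming rho > 0 on the grid
        return h * np.sum(rho * np.log(rho))

    return (log_integral_exp(beta * psi) + log_integral_exp(-beta * phi)
            + neg_entropy(m) + neg_entropy(n)) / beta


def simulate_gda(f_matrix, m0, n0, beta, dt, n_steps):
    """Single-timescale GDA: both densities advance with the same dt at every step."""
    h = 1.0 / m0.size
    m, n = m0.copy(), n0.copy()
    gaps = [duality_gap(m, n, f_matrix, beta, h)]
    for _ in range(n_steps):
        phi = f_matrix @ n * h
        psi = f_matrix.T @ m * h
        # mu descends along +Phi; nu ascends, i.e. drifts along the potential -Psi.
        m, n = (flux_form_step(m, phi, beta, h, dt),
                flux_form_step(n, -psi, beta, h, dt))
        gaps.append(duality_gap(m, n, f_matrix, beta, h))
    return m, n, np.array(gaps)


# Hypothetical example on T^1: f(x, y) = cos(2*pi*(x - y)); its saddle point is the uniform pair.
grid_size, beta = 64, 1.0
x = np.arange(grid_size) / grid_size
f_matrix = np.cos(2 * np.pi * (x[:, None] - x[None, :]))   # f_matrix[i, j] = f(x_i, y_j)
m0 = 1.0 + 0.5 * np.cos(2 * np.pi * x)                     # perturbed initial densities
n0 = 1.0 + 0.5 * np.sin(2 * np.pi * x)
m_final, n_final, gaps = simulate_gda(f_matrix, m0, n0, beta, dt=5e-5, n_steps=4000)
```

The flux form keeps the total mass equal to one up to round-off. The explicit step needs roughly $dt\lesssim\beta h^2/2$ to stay stable and positive; an implicit or spectral scheme would allow larger steps.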

§ Significance & Implications

The problem asks for a qualitative long-time convergence result for a coupled descent-ascent flow in Wasserstein space. This flow models the mean-field (infinite-particle) limit of Langevin descent-ascent in entropy-regularized two-player zero-sum games. A proof or a counterexample would clarify whether the natural single-timescale min-max dynamics is intrinsically stabilizing at the PDE/measure level. It would go beyond regimes where convergence can be enforced by separating the descent and ascent timescales.

§ Known Partial Results

Wang et al. (2024): The functional $F_\beta$ is entropy-regularized (via $\beta^{-1}H(\mu)$ and $-\beta^{-1}H(\nu)$) and admits a unique saddle point $(\mu^\star,\nu^\star)$, interpreted as the entropy-regularized mixed Nash equilibrium.
Wang et al. (2024): The associated Wasserstein gradient descent-ascent flow $(\mu_t,\nu_t)$ corresponds to the mean-field limit of a Langevin descent-ascent particle dynamics.
Wang et al. (2024): Convergence can be ensured by using different timescales for descent and ascent (a timescale-separated variant), but the single-timescale convergence question remains open for general smooth $f$ and $\beta>0$.
Wang et al. (2024): The core difficulty is establishing (or refuting) global asymptotic convergence for this coupled min-max Wasserstein flow with currently available tools for long-time analysis in Wasserstein geometry.
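The particle system whose mean-field limit is this flow can be sketched with an Euler-Maruyama discretization:

- each $x$-particle moves along $-\nabla_x f$ averaged over the $y$-particles, plus Gaussian noise of variance $2\beta^{-1}dt$;
- each $y$-particle moves along $+\nabla_y f$ averaged over the $x$-particles, with the same noise.

Formally, the Fokker-Planck equation of this system as $N\to\infty$ is the PDE system stated earlier. Everything specific here is an assumption for illustration:

- the 1-D torus and the payoff $f(x,y)=\cos(2\pi(x-y))$;
- the step size and the particle count;
- the hypothetical `ascent_speed` knob.

`ascent_speed` rescales time for the $y$-particles only, as one way to mimic a timescale-separated variant. The single-timescale question corresponds to `ascent_speed=1.0`.

```python
import numpy as np


def langevin_gda_particles(grad_x_f, grad_y_f, x0, y0, beta, dt, n_steps,
                           ascent_speed=1.0, seed=0):
    """Euler-Maruyama for N-particle Langevin descent-ascent on the 1-D torus [0, 1).

    ascent_speed = 1.0 is the single-timescale dynamics; other values rescale time
    for the y-particles only (a hypothetical way to mimic timescale separation).
    """
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    for _ in range(n_steps):
        xx, yy = x[:, None], y[None, :]              # rows: x-particles, columns: y-particles
        drift_x = -grad_x_f(xx, yy).mean(axis=1)     # -grad Phi_{nu^N}(x_i), nu^N = empirical y-measure
        drift_y = grad_y_f(xx, yy).mean(axis=0)      # +grad Psi_{mu^N}(y_j), mu^N = empirical x-measure
        noise_x = rng.standard_normal(x.size)
        noise_y = rng.standard_normal(y.size)
        x = np.mod(x + dt * drift_x + np.sqrt(2 * dt / beta) * noise_x, 1.0)
        y = np.mod(y + ascent_speed * dt * drift_y
                   + np.sqrt(2 * ascent_speed * dt / beta) * noise_y, 1.0)
    return x, y


# Hypothetical payoff f(x, y) = cos(2*pi*(x - y)) on T^1 and its partial derivatives.
def grad_x_f(x, y):
    return -2 * np.pi * np.sin(2 * np.pi * (x - y))


def grad_y_f(x, y):
    return 2 * np.pi * np.sin(2 * np.pi * (x - y))


init_rng = np.random.default_rng(1)
n_particles = 300
x0 = np.mod(0.25 + 0.05 * init_rng.standard_normal(n_particles), 1.0)   # concentrated start
y0 = np.mod(0.75 + 0.05 * init_rng.standard_normal(n_particles), 1.0)
x_final, y_final = langevin_gda_particles(grad_x_f, grad_y_f, x0, y0,
                                          beta=1.0, dt=2e-3, n_steps=500)
# First Fourier coefficient of each empirical measure; both are 0 for the uniform saddle pair.
order_x = abs(np.mean(np.exp(2j * np.pi * x_final)))
order_y = abs(np.mean(np.exp(2j * np.pi * y_final)))
```

At finite $N$ the empirical order parameters fluctuate at the $O(N^{-1/2})$ level. They only indicate proximity to the uniform saddle point of this toy payoff and do not bear on the general question.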

§ References

[1] Guillaume Wang, Lénaïc Chizat (2024). Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games. Conference on Learning Theory (COLT), PMLR 247. Open-problem note in the COLT proceedings.


§ Tags

colt-open-problem learning-theory optimization game-theory