Unsolved

Optimal Instance-Dependent Sample Complexity for finding Nash Equilibrium in Two Player Zero-Sum Matrix games

Posed by Arnab Maiti (2025)

§ Problem Statement

Setup

Let A[1,1]n×mA \in [-1,1]^{n \times m} be an unknown payoff matrix of a two-player zero-sum matrix game. The row player chooses xΔn:={xR0n:i=1nxi=1}x \in \Delta_n := \{x \in \mathbb{R}^n_{\ge 0}: \sum_{i=1}^n x_i = 1\} and the column player chooses yΔmy \in \Delta_m; the expected payoff to the row player is xAyx^\top A y. The (minimax) value is

v(A):=maxxΔnminyΔmxAy=minyΔmmaxxΔnxAy.v(A) := \max_{x \in \Delta_n}\min_{y \in \Delta_m} x^\top A y = \min_{y \in \Delta_m}\max_{x \in \Delta_n} x^\top A y.

The algorithm does not observe AA directly. On each round tt, it adaptively selects an entry (it,jt)[n]×[m](i_t,j_t) \in [n] \times [m] and receives a random sample XtX_t such that E[Xtit,jt]=Aitjt\mathbb{E}[X_t \mid i_t,j_t] = A_{i_t j_t} and the noise XtAitjtX_t - A_{i_t j_t} is (say) conditionally 1-sub-Gaussian. For accuracy ε>0\varepsilon>0 and confidence δ(0,1)\delta \in (0,1), an output pair (x^,y^)Δn×Δm(\hat x,\hat y) \in \Delta_n \times \Delta_m is an ε\varepsilon-approximate Nash equilibrium (equivalently, an ε\varepsilon-approximate saddle point) if its exploitability is at most ε\varepsilon:

maxxΔnxAy^x^Ay^εandx^Ay^minyΔmx^Ayε.\max_{x\in\Delta_n} x^\top A\hat y - \hat x^\top A\hat y \le \varepsilon \quad\text{and}\quad \hat x^\top A\hat y - \min_{y\in\Delta_m} \hat x^\top A y \le \varepsilon.

Define the instance-dependent sample complexity T(A,ε,δ)T^*(A,\varepsilon,\delta) as the smallest integer TT for which there exists an adaptive algorithm that, on instance AA, makes at most TT noisy entry queries and outputs an ε\varepsilon-approximate Nash equilibrium with probability at least 1δ1-\delta.

Unsolved Problem

Characterize T(A,ε,δ)T^*(A,\varepsilon,\delta) for general two-player zero-sum matrix games under this noisy-entry sampling model by proving matching (up to universal constants and/or logarithmic factors) information-theoretic lower bounds and achievable upper bounds that depend on the specific matrix AA (not only on n,m,ε,δn,m,\varepsilon,\delta).

§ Discussion

Loading discussion…

§ Significance & Implications

An instance-dependent characterization would identify which concrete properties of a particular payoff matrix AA (e.g., entry-wise separations relevant to equilibrium optimality) control the minimal number of noisy samples needed for equilibrium identification, rather than only providing worst-case rates over all AA. This would extend the "gap-dependent" perspective from stochastic bandits to strategic decision problems where performance depends on a coupled minimax structure, and would yield algorithms with provably optimal adaptive sampling behavior on easy instances while certifying unavoidable difficulty on hard instances.

§ Known Partial Results

  • Maiti (2025): Instance-dependent (gap-dependent) sample complexity is well studied for stochastic multi-armed bandits; the COLT 2025 note highlights that an analogous sharp instance-dependent theory for noisy two-player zero-sum matrix games is largely unexplored.

  • Maiti (2025): The COLT 2025 open-problem note formulates the above noisy-entry model and explicitly asks for matching instance-dependent lower and upper bounds for identifying an ε\varepsilon-approximate Nash equilibrium in general n×mn \times m zero-sum matrix games.

§ References

[1]

Open Problem: Optimal Instance-Dependent Sample Complexity for finding Nash Equilibrium in Two Player Zero-Sum Matrix games

Arnab Maiti (2025)

Conference on Learning Theory (COLT), PMLR 291

📍 Open-problem note in COLT proceedings.

§ Tags