Optimal Instance-Dependent Sample Complexity for finding Nash Equilibrium in Two Player Zero-Sum Matrix games
Posed by Arnab Maiti (2025)
§ Problem Statement
Setup
Let be an unknown payoff matrix of a two-player zero-sum matrix game. The row player chooses and the column player chooses ; the expected payoff to the row player is . The (minimax) value is
The algorithm does not observe directly. On each round , it adaptively selects an entry and receives a random sample such that and the noise is (say) conditionally 1-sub-Gaussian. For accuracy and confidence , an output pair is an -approximate Nash equilibrium (equivalently, an -approximate saddle point) if its exploitability is at most :
Define the instance-dependent sample complexity as the smallest integer for which there exists an adaptive algorithm that, on instance , makes at most noisy entry queries and outputs an -approximate Nash equilibrium with probability at least .
Unsolved Problem
Characterize for general two-player zero-sum matrix games under this noisy-entry sampling model by proving matching (up to universal constants and/or logarithmic factors) information-theoretic lower bounds and achievable upper bounds that depend on the specific matrix (not only on ).
§ Discussion
§ Significance & Implications
An instance-dependent characterization would identify which concrete properties of a particular payoff matrix (e.g., entry-wise separations relevant to equilibrium optimality) control the minimal number of noisy samples needed for equilibrium identification, rather than only providing worst-case rates over all . This would extend the "gap-dependent" perspective from stochastic bandits to strategic decision problems where performance depends on a coupled minimax structure, and would yield algorithms with provably optimal adaptive sampling behavior on easy instances while certifying unavoidable difficulty on hard instances.
§ Known Partial Results
Maiti (2025): Instance-dependent (gap-dependent) sample complexity is well studied for stochastic multi-armed bandits; the COLT 2025 note highlights that an analogous sharp instance-dependent theory for noisy two-player zero-sum matrix games is largely unexplored.
Maiti (2025): The COLT 2025 open-problem note formulates the above noisy-entry model and explicitly asks for matching instance-dependent lower and upper bounds for identifying an -approximate Nash equilibrium in general zero-sum matrix games.
§ References
Arnab Maiti (2025)
Conference on Learning Theory (COLT), PMLR 291
📍 Open-problem note in COLT proceedings.
Arnab Maiti (2025)
Conference on Learning Theory (COLT), PMLR 291
📍 Proceedings PDF.