Unsolved

Minimax Rate for Wasserstein Distance Estimation in High Dimensions

§ Problem Statement

Setup

Fix dNd \in \mathbb{N}, p1p \ge 1, and sample sizes n,mNn,m \in \mathbb{N}. Let Pd\mathcal{P}_d be the set of all Borel probability measures on [0,1]dRd[0,1]^d \subset \mathbb{R}^d. For P,QPdP,Q \in \mathcal{P}_d, define the pp-Wasserstein distance

Wp(P,Q):=(infπΠ(P,Q)[0,1]d×[0,1]dxy2pdπ(x,y))1/p,W_p(P,Q):=\left(\inf_{\pi \in \Pi(P,Q)} \int_{[0,1]^d \times [0,1]^d} \|x-y\|_2^p \, d\pi(x,y)\right)^{1/p},

where Π(P,Q)\Pi(P,Q) is the set of all couplings of PP and QQ.

Assume we observe independent samples X1,,Xni.i.d.PX_1,\dots,X_n \stackrel{\mathrm{i.i.d.}}{\sim} P and Y1,,Ymi.i.d.QY_1,\dots,Y_m \stackrel{\mathrm{i.i.d.}}{\sim} Q, with the XX-sample independent of the YY-sample. An estimator of Wp(P,Q)W_p(P,Q) is any measurable map W^=W^(X1,,Xn,Y1,,Ym)R\widehat W=\widehat W(X_1,\dots,X_n,Y_1,\dots,Y_m) \in \mathbb{R}. Its worst-case squared-error risk over Pd\mathcal{P}_d is

Rn,m,d,p(W^):=supP,QPdEPnQm ⁣[(W^Wp(P,Q))2].\mathcal{R}_{n,m,d,p}(\widehat W):=\sup_{P,Q \in \mathcal{P}_d} \mathbb{E}_{P^n \otimes Q^m}\!\left[\left(\widehat W-W_p(P,Q)\right)^2\right].

Define the minimax risk

Mn,m,d,p:=infW^Rn,m,d,p(W^).\mathfrak{M}_{n,m,d,p}:=\inf_{\widehat W}\mathcal{R}_{n,m,d,p}(\widehat W).

Unsolved Problem

Determine the sharp dependence of Mn,m,d,p\mathfrak{M}_{n,m,d,p} on (n,m,d,p)(n,m,d,p) (up to constants, and logarithmic factors where unavoidable) for the full class Pd\mathcal{P}_d. A key distinction is between: (1) rates for empirical-measure approximation (Wp(Pn,P)W_p(P_n,P) and Wp(Qm,Q)W_p(Q_m,Q)), and (2) rates for direct estimation of the functional Wp(P,Q)W_p(P,Q) from samples. In high dimension (notably regimes like d>2pd>2p), does the empirical plug-in estimator

W^emp:=Wp(Pn,Qm),Pn:=1ni=1nδXi,Qm:=1mj=1nδYj,\widehat W_{\mathrm{emp}}:=W_p(P_n,Q_m), \qquad P_n:=\frac1n\sum_{i=1}^n \delta_{X_i}, \quad Q_m:=\frac1m\sum_{j=1}^n \delta_{Y_j},

achieve minimax-optimal squared risk over unrestricted Pd\mathcal{P}_d, or can one do strictly better (possibly under additional smoothness/separation assumptions)? Current literature does not resolve this for the fully nonparametric class.

§ Discussion

Loading discussion…

§ Significance & Implications

Optimal transport and Wasserstein distances are central in statistics and machine learning. For unrestricted high-dimensional distributions, empirical Wasserstein convergence shows strong dimensional effects (with regime changes around d=2pd=2p), but those results do not by themselves settle minimax optimality for direct estimation of Wp(P,Q)W_p(P,Q) under squared loss. Clarifying this gap is important both theoretically and for practical sample-complexity guidance.

§ Known Partial Results

  • Weed et al. (2019): Empirical-measure estimation (not direct functional estimation): Weed & Bach (2019) gives sharp benchmark rates for EWp(Pn,P)\mathbb{E}W_p(P_n,P) over broad classes on [0,1]d[0,1]^d, with regime split at d=2pd=2p: typically n1/2n^{-1/2} for d<2pd<2p, n1/2(logn)1/pn^{-1/2}(\log n)^{1/p} at d=2pd=2p, and n1/dn^{-1/d} for d>2pd>2p (for unsquared loss; squared-loss exponents double).

  • Niles-Weed et al. (2022): Plug-in implications for Wp(P,Q)W_p(P,Q): triangle-inequality arguments transfer empirical-measure upper bounds to Wp(Pn,Qm)Wp(P,Q)|W_p(P_n,Q_m)-W_p(P,Q)|, but these are indirect and do not by themselves prove minimax optimality for direct functional estimation under squared risk.

  • Niles-Weed et al. (2022): Direct Wasserstein-functional estimation: Niles-Weed & Rigollet (2022) provides model-based lower/upper bounds (spiked transport model) showing nontrivial gaps and that structure can help, without resolving the unrestricted minimax rate over all Pd\mathcal P_d.

  • Manole et al. (2024): Smooth-cost / separation regimes: Manole & Niles-Weed (2024) establishes sharp empirical OT rates under smooth-cost regularity assumptions; these results clarify favorable structured regimes but do not close the general worst-case two-sample functional-estimation question.

  • Niles-Weed et al. (2022): Current status: the full minimax characterization for estimating Wp(P,Q)W_p(P,Q) over unrestricted Pd\mathcal P_d remains open, especially in high-dimensional regimes such as d>2pd>2p.

§ References

[1]

Estimation of Wasserstein distances in the spiked transport model

Jonathan Niles-Weed, Philippe Rigollet (2022)

Bernoulli

📍 Section 3.3 (Lower bounds), paragraph after Theorem 3 (states that closing the rate gap for Wasserstein-distance estimation is a fundamental open question).

[2]

Sharp convergence rates for empirical optimal transport with smooth costs

Tudor Manole, Jonathan Niles-Weed (2024)

Annals of Applied Probability

[3]

Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance

Jonathan Weed, Francis Bach (2019)

Bernoulli

📍 Sections 1 and 3 (dimension-dependent rates for empirical-measure convergence in $W_p$, including low-dimensional boundary-log behavior).

§ Tags