Unsolved

Optimal FDR-FNR tradeoff beyond independent two-group mixtures

Sourced from the work of Yutong Nie, Yihong Wu

§ Problem Statement

Setup

For each dimension $n\ge 1$, let $(X^{(n)},H^{(n)})$ be a random pair with $X^{(n)}=(X_1,\dots,X_n)\in\mathcal X^n$ observed and $H^{(n)}=(H_1,\dots,H_n)\in\{0,1\}^n$ latent, where $H_i=0$ means the $i$-th null hypothesis is true and $H_i=1$ means it is false. No independence is assumed: the coordinates of $H^{(n)}$ may be dependent, and the conditional law of $X^{(n)}$ given $H^{(n)}$ may have arbitrary dependence across coordinates. Let $P_n$ denote the joint law of $(X^{(n)},H^{(n)})$.

A multiple-testing rule is a measurable map $\delta_n:\mathcal X^n\to\{0,1\}^n$, with $\delta_{n,i}(X^{(n)})=1$ meaning reject hypothesis $i$. Define

$$R_n=\sum_{i=1}^n \delta_{n,i},\qquad V_n=\sum_{i=1}^n (1-H_i)\delta_{n,i},\qquad U_n=n-R_n,\qquad W_n=\sum_{i=1}^n H_i(1-\delta_{n,i}).$$

Here $V_n$ is the number of false discoveries and $W_n$ is the number of false non-discoveries. The false discovery rate and false non-discovery rate under $P_n$ are

$$\mathrm{FDR}_{P_n}(\delta_n)=\mathbb E_{P_n}\!\left[\frac{V_n}{R_n\vee 1}\right],\qquad \mathrm{FNR}_{P_n}(\delta_n)=\mathbb E_{P_n}\!\left[\frac{W_n}{U_n\vee 1}\right].$$
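The definitions above can be made concrete with a short Python sketch. It computes the counts R, V, U, W for one realization and estimates FDR and FNR by Monte Carlo. The sampler `sample_markov_gaussian` is a hypothetical dependent model chosen only for illustration (a two-state Markov chain for the latent labels with Gaussian observations); it is not a model taken from the source, and the function names and parameters `stay` and `mu` are assumptions.

```python
import numpy as np


def error_counts(h, delta):
    """Return (R, V, U, W) for latent labels h and decisions delta, both in {0,1}^n."""
    h = np.asarray(h, dtype=int)
    delta = np.asarray(delta, dtype=int)
    n = h.size
    R = int(delta.sum())                 # number of rejections
    V = int(((1 - h) * delta).sum())     # false discoveries: true nulls rejected
    U = n - R                            # number of non-rejections
    W = int((h * (1 - delta)).sum())     # false non-discoveries: non-nulls kept
    return R, V, U, W


def fdp_fnp(h, delta):
    """Realized proportions V/(R v 1) and W/(U v 1); their expectations are FDR and FNR."""
    R, V, U, W = error_counts(h, delta)
    return V / max(R, 1), W / max(U, 1)


def sample_markov_gaussian(n, rng, stay=0.9, mu=2.0):
    """Hypothetical dependent model (an assumption, not from the source):
    H is a two-state Markov chain that keeps its state with probability `stay`,
    and X_i = mu * H_i + standard Gaussian noise."""
    h = np.empty(n, dtype=int)
    h[0] = rng.integers(0, 2)
    for i in range(1, n):
        h[i] = h[i - 1] if rng.random() < stay else 1 - h[i - 1]
    x = mu * h + rng.standard_normal(n)
    return x, h


def monte_carlo_fdr_fnr(rule, n=200, reps=500, seed=0):
    """Estimate (FDR, FNR) of `rule` by averaging realized proportions over draws."""
    rng = np.random.default_rng(seed)
    fdps, fnps = [], []
    for _ in range(reps):
        x, h = sample_markov_gaussian(n, rng)
        fdp, fnp = fdp_fnp(h, rule(x))
        fdps.append(fdp)
        fnps.append(fnp)
    return float(np.mean(fdps)), float(np.mean(fnps))
```

For example, with `h = [0, 1, 1, 0]` and `delta = [1, 1, 0, 0]`, `error_counts` returns `(2, 1, 2, 1)` and `fdp_fnp` returns `(0.5, 0.5)`. A call such as `monte_carlo_fdr_fnr(lambda x: (x > 1.0).astype(int))` gives simulation estimates for a fixed-threshold rule under the assumed sampler.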

Fix $\alpha\in(0,1)$. For a dependent model sequence $P=(P_n)_{n\ge1}$, define

$$\Psi_{\mathrm{dep}}(\alpha;P)=\liminf_{n\to\infty}\ \inf_{\delta_n:\ \mathrm{FDR}_{P_n}(\delta_n)\le \alpha}\ \mathrm{FNR}_{P_n}(\delta_n).$$

Unsolved Problem

Characterize $\Psi_{\mathrm{dep}}(\alpha;P)$ for broad classes of dependent high-dimensional models $P$ (beyond independent two-group mixtures), and determine whether there is a strict asymptotic performance gap between compound rules (each $\delta_{n,i}$ may depend on all of $X^{(n)}$) and separable rules (each $\delta_{n,i}$ depends only on $X_i$, possibly with external randomization).
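The distinction between separable and compound rules can be illustrated with a minimal Python sketch. The fixed-threshold rule below is separable (no external randomization is used), while the Benjamini-Hochberg step-up procedure is used here as a familiar example of a compound rule, since its data-driven cutoff depends on every coordinate. Both are illustrative choices rather than procedures attributed to the source; the one-sided Gaussian p-values and the parameters `t` and `alpha` are assumptions, and no claim is made that either rule controls FDR under the dependent models considered here.

```python
import math

import numpy as np


def separable_rule(x, t=1.5):
    """Separable: the decision for coordinate i uses only x[i]."""
    return (np.asarray(x, dtype=float) > t).astype(int)


def bh_rule(x, alpha=0.1):
    """Compound: Benjamini-Hochberg step-up on one-sided Gaussian p-values
    (assumed N(0,1) nulls). The cutoff depends on the whole vector x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Upper-tail p-value 1 - Phi(x_i), written via the complementary error function.
    p = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in x])
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, n + 1) / n
    delta = np.zeros(n, dtype=int)
    if passed.any():
        k = int(np.nonzero(passed)[0].max())  # largest sorted index meeting its threshold
        delta[order[: k + 1]] = 1             # reject the k+1 smallest p-values
    return delta


# Same first coordinate, different remaining coordinates.
x_a = np.array([1.8, 0.0, 0.0, 0.0])
x_b = np.array([1.8, 4.0, 4.0, 4.0])

# The separable decision at index 0 is unchanged; the compound one flips.
print(separable_rule(x_a)[0], separable_rule(x_b)[0])
print(bh_rule(x_a)[0], bh_rule(x_b)[0])
```

In this toy input the first observation is the same in both vectors, so the separable decision for hypothesis 0 cannot change, whereas the compound rule rejects it only when the other coordinates carry strong signal.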

§ Significance & Implications

Many practically important large-scale testing settings exhibit substantial dependence across hypotheses. Extending fundamental-limit theory to such settings would show whether existing procedures remain near-optimal or can be substantially improved; see Nie & Wu (2026).

§ Known Partial Results

  • Nie & Wu (2026): The paper resolves the asymptotic frontier for the two-group random mixture model (and extensions with a fixed non-null proportion), including Gaussian location examples, under the model assumptions studied there. A targeted follow-up check through 2026-02-17 did not find a definitive post-2024 resolution of the tradeoff question for strongly dependent models in this generality.

§ References

[1]

Large-scale Multiple Testing: Fundamental Limits of False Discovery Rate Control and Compound Oracle

Yutong Nie, Yihong Wu (2026)

Annals of Statistics (forthcoming; listed as 2026 in author publication records)

📍 arXiv v3, Section 6.3 ("Weakly dependent data"), paragraph immediately preceding Theorem 24: "Characterizing the optimal FDR-FNR tradeoff for models with strongly dependent data is an open problem."

Primary source paper; arXiv v3 is the fixed accessible version used for locator text.
