Unsolved

Polynomial-time minimax robust mean estimation under star-shaped constraints

Sourced from the work of Akshay Prasadan, Matey Neykov

§ Problem Statement

Setup

Fix integers n,N1n,N \ge 1, a contamination level ϵ[0,1/2)\epsilon \in [0,1/2), a scale parameter σ>0\sigma>0, and a star-shaped set KRnK\subseteq\mathbb{R}^n (that is, there exists kKk^\star\in K such that k+t(xk)Kk^\star+t(x-k^\star)\in K for every xKx\in K and t[0,1]t\in[0,1]). Let

d:=supu,vKuv2[0,].d:=\sup_{u,v\in K}\|u-v\|_2\in[0,\infty].

For δ>0\delta>0 and ARnA\subseteq\mathbb{R}^n, let M(δ,A)\mathcal{M}(\delta,A) be the maximal cardinality of a δ\delta-packing of AA in Euclidean norm (pairwise distances >δ>\delta). For an absolute constant c>0c>0, define the local entropy

MKloc(η,c):=supνKM(η/c,  B(ν,η)K),η>0,\mathcal{M}^{\mathrm{loc}}_K(\eta,c):=\sup_{\nu\in K}\mathcal{M}(\eta/c,\;B(\nu,\eta)\cap K),\qquad \eta>0,

and define

η:=sup{η0:Nη2σ2logMKloc(η,c)}.\eta^\star:=\sup\left\{\eta\ge 0:\frac{N\eta^2}{\sigma^2}\le \log \mathcal{M}^{\mathrm{loc}}_K(\eta,c)\right\}.

This setup follows Prasadan & Neykov (2024).

Data model: there are unobserved clean samples X~i=μ+ξi\widetilde X_i=\mu+\xi_i, i=1,,Ni=1,\dots,N, with unknown μK\mu\in K. An adversary, after seeing all clean samples and the estimation procedure, outputs observed samples X1,,XNX_1,\dots,X_N by changing at most ϵN\epsilon N coordinates arbitrarily. Denote by Cϵ\mathfrak C_\epsilon the class of all such contamination mechanisms. The estimator μ^\widehat\mu is any measurable function of (X1,,XN)(X_1,\dots,X_N).

Noise regimes:

  1. Gaussian: ξiiidN(0,σ2In)\xi_i\stackrel{iid}{\sim}N(0,\sigma^2 I_n).

  2. Known-or-sign-symmetric sub-Gaussian: ξi\xi_i are iid mean-zero sub-Gaussian vectors with parameter at most σ\sigma, and either the full noise law is known or ξi=dξi\xi_i\stackrel{d}{=}-\xi_i.

Unsolved Problem

  1. Unknown sub-Gaussian: ξi\xi_i are iid mean-zero sub-Gaussian vectors with parameter at most σ\sigma, with otherwise unknown law.

§ Discussion

Loading discussion…

§ Known Partial Results

  • Prasadan et al. (2024): This paper proves the above minimax rates information-theoretically and gives matching (but computationally hard) procedures in the stated small-contamination regime. Prior efficient methods either need stronger assumptions (e.g., symmetry/known structure) or are rate-suboptimal in general unknown-noise settings.

§ References

[1]

Information Theoretic Limits of Robust Sub-Gaussian Mean Estimation Under Star-Shaped Constraints

Akshay Prasadan, Matey Neykov (2024)

Annals of Statistics (to appear)

📍 Section 6 (Discussion and Future Work), first paragraph, which asks for "computationally efficient algorithms achieving the same performance under various constraints for the mean".

Primary source for this problem. Year convention here uses the initial arXiv preprint year (2024), not a later revision/acceptance publication year.

§ Tags