Unsolved

Quantify and characterize the efficiency gap induced by convex-loss restriction for non-log-concave errors

Sourced from the work of Oliver Y. Feng, Yu-Chun Kao, Min Xu, Richard J. Samworth

§ Problem Statement

Setup

Let $p \in \mathbb{N}$ be fixed, and suppose we observe i.i.d. pairs $(X_i, Y_i) \in \mathbb{R}^p \times \mathbb{R}$, $i = 1, \dots, n$, from the linear model

$$Y_i = X_i^\top \beta_0 + \varepsilon_i,$$

where $\beta_0 \in \mathbb{R}^p$ is unknown, $X_i$ is independent of $\varepsilon_i$, $\Sigma_X := \mathbb{E}[X_i X_i^\top]$ is positive definite, and $\varepsilon_i$ has density $f$ on $\mathbb{R}$. Assume regularity conditions ensuring standard $M$-estimation asymptotics (e.g., $\mathbb{E}\|X_i\|^{2+\delta} < \infty$ for some $\delta > 0$, integrability and stochastic equicontinuity conditions, and nondegeneracy of curvature terms).

For a convex loss $\rho: \mathbb{R} \to \mathbb{R}$, write $\psi = \rho'$. Assume $\psi$ is absolutely continuous (so $\psi'$ exists a.e.), $\mathbb{E}[\psi'(\varepsilon_i)] \in (0, \infty)$, and $\mathbb{E}[\psi(\varepsilon_i)^2] < \infty$. For Fisher consistency/identification, assume $\mathbb{E}[\psi(\varepsilon_i)] = 0$ and that $\beta_0$ is the unique root of $\mathbb{E}[\psi(Y - X^\top b)X] = 0$. Define the convex $M$-estimator

$$\hat\beta_\rho \in \arg\min_{b \in \mathbb{R}^p} \sum_{i=1}^n \rho(Y_i - X_i^\top b).$$

Its asymptotic covariance matrix is

$$\operatorname{Avar}(\hat\beta_\rho) = \frac{\mathbb{E}[\psi(\varepsilon)^2]}{\mathbb{E}[\psi'(\varepsilon)]^2}\,\Sigma_X^{-1}.$$
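As a sanity check on this sandwich formula, the following minimal sketch (our illustration, not from the paper; the Huber loss with threshold $k=1$, standard Laplace errors, and the scalar location model with $X_i \equiv 1$, so $\Sigma_X = 1$, are all assumptions chosen for tractability) compares the Monte Carlo variance of $\sqrt{n}(\hat\beta_\rho - \beta_0)$ with $\mathbb{E}[\psi(\varepsilon)^2]/\mathbb{E}[\psi'(\varepsilon)]^2$:

```python
import math
import random

K = 1.0  # Huber threshold (our choice for this illustration)

def psi(u):
    """Huber score psi(u) = clip(u, -K, K); its a.e. derivative is 1{|u| < K}."""
    return max(-K, min(K, u))

def m_estimate(ys):
    """Root of b -> sum_i psi(y_i - b), which is nonincreasing in b (bisection)."""
    lo, hi = min(ys), max(ys)
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if sum(psi(y - mid) for y in ys) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(0)
n, reps = 200, 600
betas = []
for _ in range(reps):
    # Location model Y_i = beta_0 + eps_i with beta_0 = 0 and standard Laplace
    # errors (density f(u) = exp(-|u|)/2), drawn as a random sign times Exp(1).
    ys = [(1.0 if random.random() < 0.5 else -1.0) * random.expovariate(1.0)
          for _ in range(n)]
    betas.append(m_estimate(ys))

mean = sum(betas) / reps
mc_var = n * sum((b - mean) ** 2 for b in betas) / (reps - 1)

# Closed forms under standard Laplace errors with K = 1:
#   E[psi'(eps)]  = P(|eps| < 1) = 1 - e^{-1}
#   E[psi(eps)^2] = int_{-1}^{1} u^2 f(u) du + P(|eps| > 1) = 2 - 4 e^{-1}
e1 = math.exp(-1.0)
theory_var = (2.0 - 4.0 * e1) / (1.0 - e1) ** 2
print(f"Monte Carlo: {mc_var:.3f}   sandwich formula: {theory_var:.3f}")
```

The two numbers should agree up to Monte Carlo error (the sandwich value here is about $1.32$); since $\Sigma_X = 1$ in this scalar model, no matrix inverse appears.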

Assume $f$ is known. Let $s_f(u) := \frac{d}{du} \log f(u)$ and $I_f := \mathbb{E}[s_f(\varepsilon)^2]$ (finite, positive Fisher information for location). The oracle maximum-likelihood estimator

$$\hat\beta_{\mathrm{MLE},f} \in \arg\min_{b \in \mathbb{R}^p} \sum_{i=1}^n \bigl(-\log f(Y_i - X_i^\top b)\bigr)$$

has asymptotic covariance

$$\operatorname{Avar}(\hat\beta_{\mathrm{MLE},f}) = I_f^{-1} \Sigma_X^{-1}.$$

For the class $\mathcal{C}_{\mathrm{cvx}}(f)$ of convex losses $\rho$ satisfying the above conditions, define the best convex relative efficiency

$$\mathcal{E}_{\mathrm{cvx}}(f) := \sup_{\rho \in \mathcal{C}_{\mathrm{cvx}}(f)} \frac{\operatorname{Avar}(\hat\beta_{\mathrm{MLE},f})}{\operatorname{Avar}(\hat\beta_\rho)} = \sup_{\rho \in \mathcal{C}_{\mathrm{cvx}}(f)} \frac{\mathbb{E}[\psi'(\varepsilon)]^2}{I_f\,\mathbb{E}[\psi(\varepsilon)^2]},$$

where, when $\rho$ is twice differentiable with integrable $\rho''$, the ratio inside the supremum is equivalently

$$\frac{\mathbb{E}[\rho''(\varepsilon)]^2}{I_f\,\mathbb{E}[\rho'(\varepsilon)^2]}.$$

The matrix ratio is interpreted in the Loewner order (equivalently here as a scalar ratio, since both covariances are multiples of $\Sigma_X^{-1}$).

Unsolved Problem

For non-log-concave $f$ (so $-\log f$ is not convex), determine and characterize $\mathcal{E}_{\mathrm{cvx}}(f)$: compute or bound it for broad classes of such $f$, characterize exactly when $\mathcal{E}_{\mathrm{cvx}}(f) = 1$, and establish nontrivial distribution-free lower bounds on $\mathcal{E}_{\mathrm{cvx}}(f)$ under explicit regularity assumptions on $f$ (e.g., smoothness, tail, and Fisher-information conditions).


§ Significance & Implications

The abstract gives one non-log-concave example (Cauchy) with efficiency above $0.87$, suggesting convex estimators can remain highly competitive. A general theory of the efficiency gap would clarify when convexity is effectively free and when it is statistically costly. This directly informs robustness-computation-efficiency tradeoffs in practice. See Feng et al. (2024) for details.

§ Known Partial Results

  • Feng et al. (2024): The paper derives the optimal convex loss via a score-matching/log-concave-projection principle and proves asymptotic optimality within convex $M$-estimators. It provides the Cauchy case as evidence of high efficiency ($> 0.87$) but does not claim a full characterization across error distributions.

§ References

[1]

Optimal Convex $M$-Estimation via Score Matching

Oliver Y. Feng, Yu-Chun Kao, Min Xu, Richard J. Samworth (2024)

arXiv preprint

📍 arXiv:2403.16688v2, Section 5 (Discussion), paragraph beginning "A particularly interesting question concerns the magnitude of $\eta_f$", PDF p. 19.

Primary preprint source where the efficiency-gap discussion is posed.

[2]

Optimal Convex $M$-Estimation via Score Matching

Oliver Y. Feng, Yu-Chun Kao, Min Xu, Richard J. Samworth

Annals of Statistics (to appear)

📍 Cambridge Apollo metadata record (accepted version), journal field listed as Annals of Statistics.

Journal-publication metadata kept separate from preprint metadata; journal publication year to be filled once finalized.
