Unsolved

Non-asymptotic oracle inequalities for fully data-driven generalized-inverse shrinkage

Sourced from the work of Taras Bodnar, Nestor Parolya

§ Problem Statement

Setup

Let $X_1,\dots,X_n\in\mathbb R^{p}$ be i.i.d. with $\mathbb E[X_i]=0$ and

\[ X_i=\Sigma^{1/2}Z_i,\qquad \mathbb E[Z_i]=0,\ \mathbb E[Z_iZ_i^\top]=I_p, \]

where $\Sigma$ is symmetric positive definite. Work in the high-dimensional regime $p>n$ with $p/n\to c>1$. Assume a uniform bound on the $(4+\varepsilon)$-th moment for some $\varepsilon>0$, e.g.

\[ \sup_{\|u\|_2=1}\mathbb E\,|u^\top Z_i|^{4+\varepsilon}\le K_{4+\varepsilon}<\infty. \]

(If one imposes stronger tails such as sub-Gaussianity, treat that as a deliberate strengthening.)
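As a minimal sketch of data that fits a moment condition of this type without being sub-Gaussian, the following draws $Z_i$ with i.i.d. standardized Student-t entries. The choice of the t distribution, the default `df=5.0` (finite moments of order below 5, so $4+\varepsilon$ with $\varepsilon<1$), and the function name `sample_data` are illustrative assumptions, not part of the source.

```python
import numpy as np


def sample_data(n, p, sigma_sqrt, df=5.0, seed=0):
    """Draw an (n, p) array whose rows are X_i^T, with X_i = Sigma^{1/2} Z_i.

    Assumption for illustration: the entries of Z_i are i.i.d. Student-t
    with `df` degrees of freedom, rescaled to unit variance.
    """
    rng = np.random.default_rng(seed)
    # A Student-t variable with df > 2 has variance df / (df - 2), so this
    # rescaling gives E[Z_i] = 0 and E[Z_i Z_i^T] = I_p.
    z = rng.standard_t(df, size=(n, p)) * np.sqrt((df - 2.0) / df)
    # Row-wise form of X_i = Sigma^{1/2} Z_i.
    return z @ sigma_sqrt.T
```

For example, `sample_data(100, 250, np.eye(250))` gives a sample with $p/n=2.5$ and $\Sigma=I_p$.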

This setup follows Bodnar & Parolya (2024).

Define

\[ S_n=\frac1n\sum_{i=1}^n X_iX_i^\top, \]

its Moore-Penrose inverse $S_n^\dagger$, and ridge inverse $G_n(\lambda)=(S_n+\lambda I_p)^{-1}$ for $\lambda>0$. Let $\Theta_\star=\Sigma^{-1}$ and fix a deterministic symmetric target $T_n$ with bounded operator norm. For $\alpha\in\mathcal A\subset[0,1]$ and $\lambda\in\Lambda\subset(0,\infty)$, define

\[ \widehat\Theta_n^{\mathrm{MP}}(\alpha)=\alpha S_n^\dagger+(1-\alpha)T_n,\qquad \widehat\Theta_n^{\mathrm{R}}(\alpha,\lambda)=\alpha G_n(\lambda)+(1-\alpha)T_n. \]
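The two estimator families can be sketched numerically as follows. This is a minimal illustration, assuming NumPy; the function names, the relative eigenvalue cutoff `rel_tol` used to decide which eigenvalues of $S_n$ count as zero, and the normalized Frobenius loss helper are choices made here, not taken from the source.

```python
import numpy as np


def sample_covariance(x):
    """S_n = (1/n) * sum_i X_i X_i^T for an (n, p) array with rows X_i^T."""
    return x.T @ x / x.shape[0]


def moore_penrose(s, rel_tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(s)
    # Eigenvalues below the relative cutoff are treated as exact zeros
    # (for p > n, S_n has at least p - n of them).
    keep = w > rel_tol * w.max()
    return (v[:, keep] / w[keep]) @ v[:, keep].T


def mp_shrinkage(s, target, alpha):
    """alpha * S_n^dagger + (1 - alpha) * T_n."""
    return alpha * moore_penrose(s) + (1.0 - alpha) * target


def ridge_shrinkage(s, target, alpha, lam):
    """alpha * (S_n + lam * I_p)^{-1} + (1 - alpha) * T_n, for lam > 0."""
    g = np.linalg.inv(s + lam * np.eye(s.shape[0]))
    return alpha * g + (1.0 - alpha) * target


def frobenius_loss(theta, theta_star):
    """Normalized squared Frobenius loss p^{-1} * ||theta - theta_star||_F^2."""
    return np.linalg.norm(theta - theta_star, "fro") ** 2 / theta.shape[0]
```

Evaluating `frobenius_loss` requires $\Theta_\star$, so it is only available in simulations; a data-driven selector of $\alpha$ and $\lambda$ cannot call it.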

Unsolved Problem

With loss $L_n(\Theta)=p^{-1}\|\Theta-\Theta_\star\|_F^2$ and risk $\mathcal R_n(\Theta)=\mathbb E[L_n(\Theta)]$, seek fully data-driven selectors $(\hat\alpha_n,\hat\lambda_n)$ (measurable in the sample, no population oracle input) such that finite-sample oracle inequalities hold:

\[ \mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{MP}}(\hat\alpha_n)\right)-\inf_{\alpha\in\mathcal A}\mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{MP}}(\alpha)\right)\le C\psi_{n,p}, \]

and

\[ \mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{R}}(\hat\alpha_n,\hat\lambda_n)\right)-\inf_{\alpha\in\mathcal A,\lambda\in\Lambda}\mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{R}}(\alpha,\lambda)\right)\le C\psi_{n,p}, \]

with explicit non-asymptotic $\psi_{n,p}$ and transparent constants under the stated moment/regime assumptions.
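To make the benchmark in the first inequality concrete, the following worked computation sketches the oracle weight for the Moore-Penrose family. It is a direct expansion of the quadratic loss, not a result from the source. The symbols $D_n$, $B_n$, and $\alpha^{\circ}$ are notation introduced here, and the computation assumes $0<\mathbb E\|D_n\|_F^2<\infty$, which is not guaranteed by the stated assumptions alone.

```latex
% Notation introduced for this sketch (not from the source):
%   D_n = S_n^\dagger - T_n,  B_n = \Theta_\star - T_n,
%   \langle A,B\rangle = \operatorname{tr}(A^\top B).
\begin{align*}
\widehat\Theta_n^{\mathrm{MP}}(\alpha)-\Theta_\star
  &= \alpha D_n - B_n,\\
\mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{MP}}(\alpha)\right)
  &= \frac1p\Big(\alpha^2\,\mathbb E\|D_n\|_F^2
     - 2\alpha\,\mathbb E\langle D_n,B_n\rangle + \|B_n\|_F^2\Big),\\
\alpha^{\circ}
  &= \frac{\mathbb E\langle D_n,B_n\rangle}{\mathbb E\|D_n\|_F^2},\\
\mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{MP}}(\alpha)\right)
  -\mathcal R_n\!\left(\widehat\Theta_n^{\mathrm{MP}}(\alpha^{\circ})\right)
  &= \frac{(\alpha-\alpha^{\circ})^2}{p}\,\mathbb E\|D_n\|_F^2 .
\end{align*}
```

The second and fourth lines hold for deterministic $\alpha$ only. Here $\alpha^{\circ}$ minimizes the risk over all real $\alpha$; when $\mathcal A$ is a closed interval, the minimizer over $\mathcal A$ is the point of $\mathcal A$ closest to $\alpha^{\circ}$. Since $B_n$ involves $\Theta_\star$, the weight $\alpha^{\circ}$ is not computable from the sample. A data-driven $\hat\alpha_n$ is also correlated with $D_n$, so the last identity does not transfer to it directly, which is part of what an oracle inequality has to control.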

§ Significance & Implications

The source establishes asymptotic behavior in large-dimensional settings, but practical tuning still needs explicit finite-sample guarantees. Non-asymptotic oracle inequalities would quantify reliability gaps for pseudo-inverse and ridge-type precision estimation in the $c>1$ regime.

§ Known Partial Results

  • Bodnar & Parolya (2024): This problem remains open in the specific scope above: fully data-driven tuning with explicit finite-sample oracle inequalities for both Moore-Penrose and ridge-type shrinkage under the $p/n\to c>1$ and bounded $(4+\varepsilon)$-th moment framework. Nearby 2025 non-asymptotic ridge-related results (risk/concentration bounds in adjacent models) reduce technical uncertainty but do not by themselves close this exact oracle-inequality target.

§ References

[1] Taras Bodnar, Nestor Parolya (2024). Reviving pseudo-inverses: Asymptotic properties of large dimensional Moore-Penrose and Ridge-type inverses with applications. arXiv preprint.

📍 Section 1 (Introduction), p. 2, second paragraph: “No other results have been derived either for the Moore-Penrose inverse or for the ridge-type inverse in the non-asymptotic setting…”

Primary source motivating this synthesized open problem.
