Unsolved

Outlier characterization for unbounded link functions (e.g., phase retrieval)

Sourced from the work of Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath

§ Problem Statement

Setup

Let $(d,n)\to\infty$ with $n/d\to\alpha\in(0,\infty)$. For each $d$, let $Y_1,\dots,Y_n\in\mathbb{R}^d$ be i.i.d. samples from a $k$-component Gaussian mixture

$$Y=\mu_{Z}+g,\qquad \mathbb{P}(Z=a)=\pi_a,\ \pi_a>0,\ \sum_{a=1}^k\pi_a=1,\ g\sim\mathcal{N}(0,I_d),$$

where $\mu_1,\dots,\mu_k\in\mathbb{R}^d$ are deterministic class means with $\|\mu_a\|=O(1)$, and $Z$ is independent of $g$. Let $X=[x_1,\dots,x_p]\in\mathbb{R}^{d\times p}$ be deterministic with $\|x_j\|=O(1)$, and define the projected features $U_\ell=X^\top Y_\ell\in\mathbb{R}^p$ for $\ell=1,\dots,n$.
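The sampling model above can be simulated directly. The following is a minimal NumPy sketch, not code from the source: the helper names `sample_mixture` and `projected_features`, the dimensions, and the toy choice of two symmetric means along $e_1$ with $x_1=e_2$ are all hypothetical, picked only to make the setup concrete. Rows of `Y` are the samples $Y_\ell$ and rows of `U` are the features $U_\ell^\top$.

```python
import numpy as np

def sample_mixture(n, means, weights, rng):
    """Draw n i.i.d. samples Y = mu_Z + g from a k-component Gaussian mixture.

    means: (k, d) array whose rows are the class means mu_a.
    weights: length-k probability vector (pi_1, ..., pi_k).
    Returns Y with shape (n, d) (row ell is Y_ell) and the latent labels Z.
    """
    k, d = means.shape
    Z = rng.choice(k, size=n, p=weights)   # latent class labels, P(Z = a) = pi_a
    G = rng.standard_normal((n, d))        # isotropic noise g ~ N(0, I_d), independent of Z
    return means[Z] + G, Z

def projected_features(Y, X):
    """Return U whose row ell is U_ell^T = (X^T Y_ell)^T, shape (n, p)."""
    return Y @ X

# Hypothetical toy sizes: d = 400, alpha = n/d = 2, k = 2, p = 1.
rng = np.random.default_rng(0)
d, alpha = 400, 2.0
n = int(alpha * d)
means = np.zeros((2, d))
means[0, 0], means[1, 0] = 1.5, -1.5       # mu_1 = 1.5 e_1, mu_2 = -1.5 e_1 (assumed)
weights = np.array([0.5, 0.5])
X = np.zeros((d, 1))
X[1, 0] = 1.0                              # x_1 = e_2 (assumed)
Y, Z = sample_mixture(n, means, weights, rng)
U = projected_features(Y, X)
print(Y.shape, U.shape)  # (800, 400) (800, 1)
```

Storing samples as rows keeps every per-sample operation vectorized, which is convenient when forming weighted sums over $\ell$.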

This setup follows Arous et al. (2025).

Consider random matrices of Hessian/information type

$$M_n=\frac1n\sum_{\ell=1}^n w(U_\ell)\,Y_\ell Y_\ell^\top,$$

where $w:\mathbb{R}^p\to\mathbb{R}$ is measurable and may be unbounded (for example, $w(u)$ growing polynomially; in phase-retrieval-type models one gets $w(u)\asymp u^2$ in the single-index case $p=1$). Let $S=\mathrm{span}\{x_1,\dots,x_p,\mu_1,\dots,\mu_k\}$ and $r=\dim S\le p+k$. Assume the empirical spectral distribution of $M_n$ converges almost surely to a deterministic law $\nu$ whose support is not bounded above.
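For numerical exploration, $M_n$, the projector onto $S$, and the alignments $\|P_S v_j\|^2$ can be computed as below. This is a hedged NumPy sketch, not the source's method: `build_M`, `span_projector`, and `top_eigs_and_alignment` are hypothetical helpers, and the toy instance (two means $\pm 1.5\,e_1$, $x_1=e_2$, and the phase-retrieval-type weight $w(u)=u^2$) is an assumption. The dimension $r=\dim S$ is read off as a numerical rank via an SVD of the spanning set.

```python
import numpy as np

def build_M(Y, U, w):
    """M_n = (1/n) sum_ell w(U_ell) Y_ell Y_ell^T; Y is (n, d), U is (n, p), w maps (n, p) -> (n,)."""
    wts = w(U)
    return (Y * wts[:, None]).T @ Y / Y.shape[0]

def span_projector(X, means, rel_tol=1e-10):
    """Orthogonal projector P_S onto S = span{x_1..x_p, mu_1..mu_k}, and r = dim S."""
    A = np.hstack([X, means.T])                     # d x (p + k) spanning set
    Q, s, _ = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s.max())) if s.max() > 0 else 0
    return Q[:, :r] @ Q[:, :r].T, r

def top_eigs_and_alignment(M, P, m):
    """Top m eigenvalues of symmetric M and ||P v_j||^2 for the matching unit eigenvectors."""
    evals, evecs = np.linalg.eigh(M)                # eigh returns ascending order
    vals, V = evals[::-1][:m], evecs[:, ::-1][:, :m]
    return vals, np.sum((P @ V) ** 2, axis=0)

# Hypothetical toy instance: k = 2, means +/- 1.5 e_1, p = 1, x_1 = e_2, w(u) = u^2.
rng = np.random.default_rng(0)
d, n = 300, 600
means = np.zeros((2, d))
means[0, 0], means[1, 0] = 1.5, -1.5
X = np.zeros((d, 1))
X[1, 0] = 1.0
Y = means[rng.choice(2, size=n)] + rng.standard_normal((n, d))
U = Y @ X
M = build_M(Y, U, lambda u: np.sum(u ** 2, axis=1))
P, r = span_projector(X, means)
vals, align = top_eigs_and_alignment(M, P, m=3)
print(r)  # 2
```

Here $S=\mathrm{span}\{e_1,e_2\}$, so $r=2<p+k=3$, illustrating that $r$ can be strictly smaller than $p+k$ when means are collinear.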

Unsolved Problem

In bounded-support settings, outliers are characterized by eigenvalues separating to the right of the finite bulk edge. Here no finite right edge exists. Motivated by the open-direction discussion in the source paper, formulate a replacement theory in this unbounded-support regime:

  1. Give necessary and sufficient conditions, in terms of $(\alpha,\pi_a,\mu_a,X,w)$ (equivalently, the finite-dimensional Gram data of the signal directions plus the link-induced moments), for the existence of finitely many spike-generated eigenvalues of $M_n$ that are spectrally distinguishable from the background spectrum despite $\sup\mathrm{supp}(\nu)=\infty$.

  2. Prove deterministic asymptotic formulas for those eigenvalues and for eigenvector alignment with $S$, i.e. limits of quantities like $\|P_S v_j\|^2$ for the corresponding unit eigenvectors $v_j$, where $P_S$ denotes the orthogonal projection onto $S$.

  3. Identify an appropriate notion of "isolation" when the bulk has unbounded support (for example, separation from the unspiked comparison model at the relevant extreme-value scale) and establish a BBP-type phase transition criterion under that notion.
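One way to experiment with item 3 numerically is to pair $M_n$ with a decoupled comparison matrix built from the same weights. The construction below is an assumption for illustration, not a definition from the source: it keeps $w(U_\ell)$ and the $S^\perp$-component of every $Y_\ell$, but resamples the coordinates spanning $S$ from $\mathcal N(0,1)$. The two matrices then have identical compressions to $S^\perp$ and differ by a perturbation of rank at most $2\dim S$. For nonnegative $w$, the test vector $G_\ell/\|G_\ell\|$ shows that the comparison's top eigenvalue is at least $\frac1n\max_\ell w(U_\ell)\|G_\ell\|^2\approx \max_\ell w(U_\ell)\,d/n$, which grows with $n$ for $w(u)=u^2$; candidate outliers must therefore be judged against this extreme-value scale rather than against a fixed edge. The helper names and the toy instance are hypothetical.

```python
import numpy as np

def weighted_second_moment(Y, wts):
    """(1/n) * sum_ell wts[ell] * Y_ell Y_ell^T for the rows Y_ell of Y."""
    return (Y * wts[:, None]).T @ Y / Y.shape[0]

def spiked_and_comparison(d, alpha, mu_norm, w, rng):
    """Return (M_n, M_cmp) for a hypothetical toy instance.

    Assumed instance: k = 2 equally weighted classes with means +/- mu_norm * e_1,
    p = 1 with x_1 = e_2, so S = span{e_1, e_2}. M_cmp reuses the weights w(U_ell)
    and the coordinates of Y_ell outside S, but resamples the two coordinates
    spanning S, so M_n - M_cmp has rank at most 2 * dim S = 4.
    """
    n = int(alpha * d)
    Y = rng.standard_normal((n, d))
    Y[:, 0] += mu_norm * rng.choice([-1.0, 1.0], size=n)  # class means +/- mu_norm e_1
    wts = w(Y[:, 1:2])                                     # w(U_ell) with U_ell = x_1^T Y_ell
    G = Y.copy()
    G[:, :2] = rng.standard_normal((n, 2))                 # decouple the S-coordinates
    return weighted_second_moment(Y, wts), weighted_second_moment(G, wts)

def top_eigenvalues(M, m):
    """Largest m eigenvalues of a symmetric matrix, in decreasing order."""
    return np.linalg.eigvalsh(M)[::-1][:m]

# Phase-retrieval-type weight for p = 1 (assumed): w(u) = u^2, applied row-wise.
w_sq = lambda u: np.sum(u ** 2, axis=1)
rng = np.random.default_rng(1)
M, M_cmp = spiked_and_comparison(d=300, alpha=2.0, mu_norm=2.0, w=w_sq, rng=rng)
print("M_n  :", np.round(top_eigenvalues(M, 4), 2))
print("M_cmp:", np.round(top_eigenvalues(M_cmp, 4), 2))
```

Repeating this over seeds and growing $d$ gives an empirical view of whether the top eigenvalues of $M_n$ separate from those of the comparison at the relevant scale; it does not by itself settle which comparison model is the right one.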

See Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions (arXiv:2502.15655v3) for context.


§ Significance & Implications

Unbounded-link models (including phase-retrieval-type examples) arise naturally, while rigorous outlier characterizations in this line of work mainly rely on a finite right bulk edge. Extending those results to unbounded-support regimes would broaden the currently analyzable model class.

§ Known Partial Results

  • Arous et al. (2025): The paper develops effective bulk analysis beyond uniformly bounded links, but its outlier arguments are tied to separation past a finite right bulk edge; a full unbounded-support outlier theory is left open (as of February 25, 2025, in arXiv:2502.15655v3).
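For orientation, the classical BBP transition for the Gaussian spiked covariance model shows what "separation past a finite right bulk edge" means. This is a standard result included only as a reference point; it is not the mixture/link setting of the source and is not claimed for $M_n$. Let $Y_1,\dots,Y_n$ be i.i.d. $\mathcal N(0,I_d+\theta vv^\top)$ with $\|v\|=1$, $\theta>0$, $d/n\to\gamma$ (so $\gamma=1/\alpha$ in the notation above), and let $\hat v$ be a unit top eigenvector of $\frac1n\sum_\ell Y_\ell Y_\ell^\top$. Then, almost surely,

$$
\lambda_{\max}\Big(\tfrac1n\sum_{\ell=1}^n Y_\ell Y_\ell^\top\Big)\to
\begin{cases}
(1+\sqrt{\gamma})^2, & \theta\le\sqrt{\gamma},\\[2pt]
(1+\theta)\Big(1+\dfrac{\gamma}{\theta}\Big), & \theta>\sqrt{\gamma},
\end{cases}
\qquad
|\langle \hat v, v\rangle|^2\to
\begin{cases}
0, & \theta\le\sqrt{\gamma},\\[2pt]
\dfrac{1-\gamma/\theta^2}{1+\gamma/\theta}, & \theta>\sqrt{\gamma}.
\end{cases}
$$

The top eigenvalue is recognized as an outlier because its limit exceeds the finite edge $(1+\sqrt{\gamma})^2$; when $\sup\mathrm{supp}(\nu)=\infty$ there is no such edge to exceed, which is the gap the problem asks to fill.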

§ References

[1] Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath (2025). Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions. Annals of Statistics (to appear). arXiv:2502.15655v3.

Source paper; Section 1.5.2 (Parametric regression for the multi-index model) and Remark 1.15 of arXiv v3 motivate the unbounded-link outlier characterization as an open direction.
