Unsolved

Structure-Agnostic Minimax Risk for Partial Linear Model

Posed by Yihong Gu (2025)

§ Problem Statement

Setup

Let $(Y_i,T_i,X_i)_{i=1}^n$ be i.i.d., with $Y,T\in\mathbb{R}$ and $X$ taking values in a measurable space $\mathcal{X}$. Assume the partial linear model

$$Y = \theta_0 T + f_0(X) + \epsilon,\qquad \mathbb{E}[\epsilon\mid X,T]=0,$$

where $\theta_0\in\mathbb{R}$ is the target and $f_0:\mathcal{X}\to\mathbb{R}$ is an unknown nuisance. Define

$$g_0(x):=\mathbb{E}[T\mid X=x],\qquad m_0(x):=\mathbb{E}[Y\mid X=x]=\theta_0 g_0(x)+f_0(x),$$

and the residualized treatment $U:=T-g_0(X)$. Assume finite second moments and a nondegeneracy/conditioning bound

$$\mathbb{E}[Y^2]<\infty,\quad \mathbb{E}[T^2]<\infty,\qquad 0<\sigma_{\mathrm{lower}}^2\le \mathbb{E}[U^2]\le \sigma_{\mathrm{upper}}^2<\infty.$$

A cross-fitted DML estimator is constructed as follows: split the indices into folds; for each $i$, fit nuisance predictors $\widehat m^{(-i)}$ and $\widehat g^{(-i)}$ using data excluding $i$ (or excluding $i$'s fold), evaluate them at $X_i$, and compute

$$\widehat\theta_{\mathrm{DML}}:=\frac{\sum_{i=1}^n \big(T_i-\widehat g^{(-i)}(X_i)\big)\big(Y_i-\widehat m^{(-i)}(X_i)\big)}{\sum_{i=1}^n \big(T_i-\widehat g^{(-i)}(X_i)\big)^2}.$$
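As a concrete illustration, the cross-fitted estimator above can be sketched in a few lines of Python. The synthetic data-generating process ($g_0=\sin$, quadratic $f_0$, $\theta_0=2$) and the polynomial least-squares nuisance learner are hypothetical choices for the sketch, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 2000, 2.0

# Illustrative partial linear model: g0(x) = sin(x), f0(x) = x^2 / 2,
# with standard normal residuals U and eps (assumed, not from the note).
X = rng.uniform(-2, 2, n)
T = np.sin(X) + rng.normal(size=n)            # T = g0(X) + U
Y = theta0 * T + X**2 / 2 + rng.normal(size=n)

def poly_fit_predict(x_train, y_train, x_eval, deg=3):
    """Least-squares polynomial regression as a stand-in black-box learner."""
    coef = np.polyfit(x_train, y_train, deg)
    return np.polyval(coef, x_eval)

# 2-fold cross-fitting: nuisances for each fold are fit on the other fold.
folds = np.array_split(rng.permutation(n), 2)
U_res = np.empty(n)  # T_i - ghat^{(-i)}(X_i)
Y_res = np.empty(n)  # Y_i - mhat^{(-i)}(X_i)
for k, idx in enumerate(folds):
    tr = np.concatenate([folds[j] for j in range(len(folds)) if j != k])
    U_res[idx] = T[idx] - poly_fit_predict(X[tr], T[tr], X[idx])
    Y_res[idx] = Y[idx] - poly_fit_predict(X[tr], Y[tr], X[idx])

theta_dml = np.sum(U_res * Y_res) / np.sum(U_res**2)
```

With $n=2000$ and well-specified nuisance learners, `theta_dml` should land close to $\theta_0=2$; the residualized-treatment denominator plays exactly the conditioning role of $\mathbb{E}[U^2]$ in the bounds above.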

Structure-agnostic learnability assumption: there exist such cross-fitted predictors $(\widehat m^{(-i)},\widehat g^{(-i)})$ constructed from the $n$ samples for which, for an independent draw $X\sim P_X$ (independent of the training sample),

$$\mathbb{E}\big[(\widehat m(X)-m_0(X))^2\big]\le \delta_m^2,\qquad \mathbb{E}\big[(\widehat g(X)-g_0(X))^2\big]\le \delta_g^2,$$

where the expectation averages over the sample and any algorithmic randomness.
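To make this assumption concrete, a learner's out-of-sample $L_2(P_X)$ error can be estimated by Monte Carlo on an independent draw from $P_X$. The sketch below estimates $\delta_g^2$; the choice $g_0=\sin$ and the polynomial learner are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
g0 = np.sin  # illustrative choice of E[T | X = x] (an assumption)

# Training sample for the black-box nuisance learner.
X = rng.uniform(-2, 2, n)
T = g0(X) + rng.normal(size=n)

# Degree-3 polynomial least squares as the learner.
coef = np.polyfit(X, T, 3)

# Monte Carlo estimate of the out-of-sample error
# delta_g^2 = E[(ghat(X) - g0(X))^2] over a fresh draw X ~ P_X.
X_fresh = rng.uniform(-2, 2, 200_000)
delta_g_sq = np.mean((np.polyval(coef, X_fresh) - g0(X_fresh)) ** 2)
```

Here `delta_g_sq` combines the learner's approximation bias and its estimation variance, which is exactly the quantity the minimax risk below is indexed by.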

Let $\mathcal{P}_n(\delta_m,\delta_g;\sigma_{\mathrm{lower}},\sigma_{\mathrm{upper}})$ be the set of all distributions $P$ over $(Y,T,X)$ satisfying the model, the moment and $\mathbb{E}[U^2]$ bounds above, and for which there exist cross-fitted predictors achieving the stated $L_2(P_X)$ errors $(\delta_m,\delta_g)$ from $n$ samples. Define the structure-agnostic minimax mean-squared error

$$R^*(n,\delta_m,\delta_g;\sigma_{\mathrm{lower}},\sigma_{\mathrm{upper}}):=\inf_{\widehat\theta}\ \sup_{P\in\mathcal{P}_n(\delta_m,\delta_g;\sigma_{\mathrm{lower}},\sigma_{\mathrm{upper}})}\ \mathbb{E}_P\big[(\widehat\theta-\theta_0)^2\big],$$

where the infimum is over all estimators measurable w.r.t. the sample.

Unsolved Problem

Characterize $R^*(n,\delta_m,\delta_g;\sigma_{\mathrm{lower}},\sigma_{\mathrm{upper}})$ sharply (up to universal constants and, if unavoidable, logarithmic factors) as a function of $(n,\delta_m,\delta_g,\sigma_{\mathrm{lower}},\sigma_{\mathrm{upper}})$. In particular, determine whether $\widehat\theta_{\mathrm{DML}}$ attains the minimax rate uniformly over all regimes of $(\delta_m,\delta_g)$ under only the structure-agnostic learnability assumption; if not, determine the minimax rate and exhibit an estimator achieving it, clarifying how variance/conditioning through $U$ constrains what is achievable without additional structural assumptions on $m_0$ or $g_0$.

§ Significance & Implications

The problem asks for an information-theoretic benchmark for estimating the scalar coefficient $\theta_0$ in a partial linear model when the only quantitative control on nuisance learning is out-of-sample mean-squared prediction error bounds $(\delta_m,\delta_g)$, with no smoothness/sparsity/parametric structure assumed. A sharp characterization of $R^*$ would pin down the best possible tradeoff between sample size, nuisance prediction accuracy, and treatment-residual conditioning $\mathbb{E}[U^2]$, and would decide whether cross-fitted orthogonal/DML estimation is uniformly rate-optimal in this purely structure-agnostic regime. This directly impacts when black-box prediction guarantees alone justify the commonly used residualization-and-regression pipeline for semiparametric/causal effect estimation, versus when additional assumptions are necessary to control variance-driven limitations tied to the residualized treatment.

§ Known Partial Results

  • Gu (2025): The orthogonal estimating equation underlying DML is first-order insensitive to nuisance errors, so analyses typically reduce MSE control for $\widehat\theta_{\mathrm{DML}}$ to bounding higher-order remainder terms involving the nuisance estimation errors.

  • Chernozhukov et al. (2018): In many standard DML analyses, achieving $n^{-1/2}$-type accuracy (or sharper bounds on $\mathbb{E}[(\widehat\theta-\theta_0)^2]$) requires product-rate conditions on nuisance estimation (informally, that the combined effect of $\widehat m-m_0$ and $\widehat g-g_0$ is small enough) together with nondegeneracy of the residualized treatment.

  • Gu (2025): The COLT 2025 open-problem note highlights a specific gap for structure-agnostic minimax lower bounds in this model: existing techniques do not cleanly capture how variance/conditioning, as reflected by $U=T-\mathbb{E}[T\mid X]$ and its second-moment bounds, may limit (or allow) improvements beyond generic DML rates, leaving the sharp minimax risk characterization open even for estimating $\theta_0$.
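For intuition, the remainder structure behind these product-rate conditions can be sketched via the standard decomposition under cross-fitting (a routine calculation, not a result stated in the note). Writing $\Delta_g:=g_0-\widehat g^{(-i)}$ and $\Delta_m:=m_0-\widehat m^{(-i)}$, expanding the estimator gives

$$\widehat\theta_{\mathrm{DML}}-\theta_0=\frac{\frac{1}{n}\sum_{i=1}^n U_i\epsilon_i+\frac{1}{n}\sum_{i=1}^n \Delta_g(X_i)\,\Delta_m(X_i)-\theta_0\,\frac{1}{n}\sum_{i=1}^n \Delta_g(X_i)^2+(\text{mean-zero cross terms})}{\frac{1}{n}\sum_{i=1}^n \big(T_i-\widehat g^{(-i)}(X_i)\big)^2}.$$

The first term drives the oracle $n^{-1/2}$ rate, scaled by the conditioning $\mathbb{E}[U^2]\in[\sigma_{\mathrm{lower}}^2,\sigma_{\mathrm{upper}}^2]$, while the bias terms are of order $\delta_g\delta_m$ and $\delta_g^2$; whether these generic bounds are minimax-tight in the structure-agnostic class is precisely what remains open.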

§ References

[1]

Open Problem: Structure-Agnostic Minimax Risk for Partial Linear Model

Yihong Gu (2025)

Conference on Learning Theory (COLT), PMLR 291

📍 Open-problem note in COLT proceedings.

[2]

Root-N-Consistent Semiparametric Regression

Peter M. Robinson (1988)

Econometrica

📍 Classical partially linear model results (root-n estimation under regularity conditions).

[3]

Double/debiased machine learning for treatment and structural parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, James Robins (2018)

The Econometrics Journal

📍 Canonical DML framework with orthogonal scores and cross-fitting.
