Partially Resolved

Optimal Distribution-Free Prediction Intervals

§ Problem Statement

Setup

Let dNd \in \mathbb N, α(0,1)\alpha \in (0,1), and let P\mathcal P be the class of all Borel probability distributions on Rd×R\mathbb R^d \times \mathbb R. For any PPP \in \mathcal P, draw i.i.d. pairs (X1,Y1),,(Xn,Yn),(Xn+1,Yn+1)P(X_1,Y_1),\dots,(X_n,Y_n),(X_{n+1},Y_{n+1}) \sim P, where XiRdX_i \in \mathbb R^d and YiRY_i \in \mathbb R. A (possibly randomized) prediction-interval procedure is a measurable map sending (X1,Y1),,(Xn,Yn)(X_1,Y_1),\dots,(X_n,Y_n) and a test covariate xx to an interval C^n(x)=[Ln(x),Un(x)]R\widehat C_n(x)=[L_n(x),U_n(x)]\subseteq\mathbb R, with length C^n(x):=Un(x)Ln(x)|\widehat C_n(x)|:=U_n(x)-L_n(x).

Finite-sample marginal validity under exchangeability at level 1α1-\alpha is

infPPPP ⁣(Yn+1C^n(Xn+1))1α.\inf_{P\in\mathcal P}\mathbb P_P\!\big(Y_{n+1}\in \widehat C_n(X_{n+1})\big)\ge 1-\alpha.

Define

Rn(C^n;α):=supPPEP ⁣[C^n(Xn+1)],Rn,unres(α):=infC^n marginally validRn(C^n;α).\mathcal R_n(\widehat C_n;\alpha):=\sup_{P\in\mathcal P}\mathbb E_P\!\big[|\widehat C_n(X_{n+1})|\big],\qquad \mathcal R_{n,\mathrm{unres}}^*(\alpha):=\inf_{\widehat C_n\ \text{marginally valid}}\mathcal R_n(\widehat C_n;\alpha).

For unrestricted P\mathcal P, the objective is degenerate: Rn,unres(α)=\mathcal R_{n,\mathrm{unres}}^*(\alpha)=\infty (equivalently, no finite uniform expected-length guarantee over all Borel laws).

Unsolved Problem

Pose and solve the minimax question on a restricted class PΘP\mathcal P_\Theta\subset\mathcal P (e.g., moment/tail, noise, smoothness, or shape constraints):

Rn,Θ(α):=infC^n : infPPΘPP(Yn+1C^n(Xn+1))1α supPPΘEP ⁣[C^n(Xn+1)].\mathcal R_{n,\Theta}^*(\alpha):=\inf_{\widehat C_n\ :\ \inf_{P\in\mathcal P_\Theta}\mathbb P_P(Y_{n+1}\in\widehat C_n(X_{n+1}))\ge 1-\alpha} \ \sup_{P\in\mathcal P_\Theta}\mathbb E_P\!\big[|\widehat C_n(X_{n+1})|\big].

Determine sharp rates/constants of Rn,Θ(α)\mathcal R_{n,\Theta}^*(\alpha) in (n,α,d,Θ)(n,\alpha,d,\Theta) and whether computationally efficient procedures attain them.

Now separate conditional targets:

  1. Exact conditional coverage (known impossible distribution-free):
PP, xsupp(PX): PP ⁣(Yn+1C^n(Xn+1)Xn+1=x)1α.\forall P\in\mathcal P,\ \forall x\in\operatorname{supp}(P_X):\ \mathbb P_P\!\big(Y_{n+1}\in\widehat C_n(X_{n+1})\mid X_{n+1}=x\big)\ge 1-\alpha.
  1. Relaxed conditional coverage, e.g. (δ,α)(\delta,\alpha)-approximate conditional coverage:
PPΘ, Aσ(X) with PX(A)δ: PP ⁣(Yn+1C^n(Xn+1)Xn+1A)1α.\forall P\in\mathcal P_\Theta,\ \forall A\in\sigma(X)\ \text{with }P_X(A)\ge\delta:\ \mathbb P_P\!\big(Y_{n+1}\in\widehat C_n(X_{n+1})\mid X_{n+1}\in A\big)\ge 1-\alpha.

Characterize minimax-optimal length under such relaxed conditional criteria (or other precisely specified local/averaged variants) and compare conformal-type procedures to minimax lower bounds.

See Vovk et al. (2005) for conformal prediction, Lei & Wasserman (2014), Romano et al. (2019), Barber et al. (2021), and Gibbs & Candès (2021).

§ Discussion

Loading discussion…

§ Significance & Implications

Distribution-free predictive inference is central in statistics and machine learning. Conformal methods provide finite-sample marginal validity under exchangeability, but practical guarantees depend on finite-sample score calibration choices (typically conservative at the 1/(n+1)1/(n+1) scale due to quantile discretization/tie handling). Exact conditional coverage is impossible without additional assumptions, so the key frontier is sharp efficiency and computational optimality under explicit structural restrictions and under well-defined relaxed conditional targets.

§ Known Partial Results

  • Lei et al. (2014): Unrestricted minimax objective under only distribution-free marginal validity is degenerate: the worst-case expected length over all Borel laws is infinite, so meaningful minimax analysis requires restricting P\mathcal P.

  • Vovk et al. (2005): conformal prediction gives finite-sample marginal validity under exchangeability (with finite-sample calibration/discretization caveats).

  • Lei & Wasserman (2014): early distribution-free prediction-band constructions and analysis for nonparametric regression settings.

  • Romano et al. (2019): conformalized quantile regression with finite-sample marginal validity under exchangeability.

  • Barber et al. (2021): exact conditional coverage is impossible distribution-free except with vacuous/very wide intervals.

  • Gibbs & Candès (2021): adaptive conformal methods for certain distribution-shift regimes, outside exact conditional guarantees.

§ References

[1]

Distribution-free prediction bands for non-parametric regression

Jing Lei, Larry Wasserman (2014)

Journal of the Royal Statistical Society: Series B (Statistical Methodology)

📍 JRSSB 76(1):71-96; methodology and guarantees in Sections 2-3, discussion/open questions in Section 6.

[2]

The limits of distribution-free conditional predictive inference

Rina Foygel Barber, Emmanuel Candès, Aaditya Ramdas, Ryan Tibshirani (2021)

Information and Inference: A Journal of the IMA

[3]

Conformalized Quantile Regression

Yaniv Romano, Evan Patterson, Emmanuel Candès (2019)

Advances in Neural Information Processing Systems (NeurIPS 32)

📍 NeurIPS 2019 paper; Algorithm 1 and Theorem 1 (finite-sample marginal validity under exchangeability).

[4]

Algorithmic Learning in a Random World

Vladimir Vovk, Alexander Gammerman, Glenn Shafer (2005)

Springer

📍 Monograph introducing conformal prediction and finite-sample validity under exchangeability (early chapters).

[5]

Adaptive Conformal Inference Under Distribution Shift

Isaac Gibbs, Emmanuel Candès (2021)

Advances in Neural Information Processing Systems (NeurIPS 34)

📍 Adaptive conformal calibration for covariate/distribution shift settings.

§ Tags