Unsolved

Statistical Identification of Primary Adjustment Sets From Data

Sourced from the work of F. Richard Guo and Qingyuan Zhao

§ Problem Statement

Setup

Let $V$ be a finite set of observed variables, and let $G=(V,E_{\to},E_{\leftrightarrow})$ be an acyclic directed mixed graph (ADMG): $E_{\to}$ contains directed edges $A\to B$, $E_{\leftrightarrow}$ contains bidirected edges $A\leftrightarrow B$, and there is no directed cycle. For $A\in V$, let $\mathrm{de}_G(A)$ be its descendants and $\mathrm{an}_G(A)$ its ancestors; extend to sets by unions.

A path is called a confounding arc between distinct $A,B\in V$ if it has no collider and has an arrowhead at both endpoints. Given $C\subseteq V\setminus\{A,B\}$, such a confounding arc is unblocked given $C$ if none of its non-endpoint vertices belongs to $C$. Write $A \not\!\perp\!\!\!\perp_{\mathrm{arc}} B\mid C$ if there exists an unblocked confounding arc between $A$ and $B$ given $C$, and write $A \perp\!\!\!\perp_{\mathrm{arc}} B\mid C$ otherwise.

For distinct $A,B\in V$, a set $C\subseteq V\setminus\{A,B\}$ is an adjustment set for $(A,B)$ if $C\cap(\mathrm{de}_G(A)\cup \mathrm{de}_G(B))=\varnothing$. Given a baseline set $S\subseteq V\setminus\{A,B\}$, an adjustment set $C$ is primary for $(A,B)$ relative to $S$ if

$$A \perp\!\!\!\perp_{\mathrm{arc}} B \mid S\cup C.$$

It is minimal primary if no proper subset of $C$ is primary relative to $S$.
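On a small graph, the definitions above can be checked by brute force. The following is a minimal sketch, not the procedure of the source paper: the parents-dictionary representation, the function names (`reach_back`, `arc_connected`, `descendants`, `minimal_primary_sets`) and the toy graph are all assumptions made for illustration. It uses the observation that a collider-free path with arrowheads at both ends either forks at a common ancestor, $A\leftarrow\cdots\leftarrow T\to\cdots\to B$, or crosses exactly one bidirected edge, $A\leftarrow\cdots\leftarrow U\leftrightarrow W\to\cdots\to B$, where $U$ may equal $A$ and $W$ may equal $B$.

```python
from itertools import combinations

def reach_back(parents, target, avoid):
    """Vertices with a directed path into `target` that stays outside `avoid`
    (the target itself is included)."""
    seen, stack = {target}, [target]
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen and p not in avoid:
                seen.add(p)
                stack.append(p)
    return seen

def arc_connected(parents, bidirected, a, b, cond):
    """True if some confounding arc between a and b is unblocked given cond."""
    up_a = reach_back(parents, a, set(cond) | {b})
    up_b = reach_back(parents, b, set(cond) | {a})
    if up_a & up_b:  # fork at a common ancestor outside cond
        return True
    # exactly one bidirected edge joining the two directed chains
    return any((u in up_a and w in up_b) or (w in up_a and u in up_b)
               for u, w in bidirected)

def descendants(parents, v):
    """v together with every vertex reachable from v along directed edges."""
    seen, stack = {v}, [v]
    while stack:
        cur = stack.pop()
        for child, ps in parents.items():
            if cur in ps and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def minimal_primary_sets(vertices, parents, bidirected, a, b, base=frozenset()):
    """Enumerate all minimal primary adjustment sets for (a, b) relative to base."""
    banned = descendants(parents, a) | descendants(parents, b)
    pool = sorted(set(vertices) - banned)  # candidates: non-descendants of a and b
    primary = [frozenset(c)
               for k in range(len(pool) + 1)
               for c in combinations(pool, k)
               if not arc_connected(parents, bidirected, a, b, set(base) | set(c))]
    # keep only the sets with no proper primary subset
    return [c for c in primary if not any(d < c for d in primary)]

# Toy ADMG (hypothetical): Z1 -> X, Z1 -> Z2, Z2 -> Y, W -> Y, X -> Y, X <-> W
vertices = ["X", "Y", "Z1", "Z2", "W"]
parents = {"X": {"Z1"}, "Z2": {"Z1"}, "Y": {"X", "Z2", "W"}}
bidirected = [("X", "W")]

result = minimal_primary_sets(vertices, parents, bidirected, "X", "Y")
print(sorted(sorted(c) for c in result))  # [['W', 'Z1'], ['W', 'Z2']]
```

In this toy graph the arc through $Z_1$ and $Z_2$ can be blocked at either vertex, while the arc through the bidirected edge can only be blocked at $W$, so there are two minimal primary sets. The enumeration is exponential in the number of candidate vertices and requires the graph to be known, which is exactly what the problem below does not assume.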

Define

$$\mathcal M_G(A,B;S):=\{C:\ C\text{ is minimal primary for }(A,B)\text{ relative to }S\},$$

fix a deterministic selector $\mathfrak t$ on nonempty finite sets, and define

$$P^\star_G(A,B;S):=\begin{cases} \mathfrak t\big(\mathcal M_G(A,B;S)\big), & \mathcal M_G(A,B;S)\neq\varnothing,\\ \bot, & \mathcal M_G(A,B;S)=\varnothing, \end{cases}$$

where $\bot$ is a convention meaning "no minimal primary adjustment set exists." For the target pair $(X,Y)$, let $P^\star(X,Y;G):=P^\star_G(X,Y;\varnothing)$.
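The problem statement only requires the selector to be deterministic; it does not say which one to use. As one possible choice, the sketch below picks the smallest set and breaks ties lexicographically. The function name `select`, the tie-breaking rule, and the use of `None` as a stand-in for the "no minimal primary adjustment set" symbol are assumptions made for illustration.

```python
def select(family):
    """Pick one set from a finite family: smallest cardinality first,
    ties broken by the sorted list of element names (an assumed rule).
    Returns None as a stand-in for the 'no minimal primary set' symbol."""
    family = list(family)
    if not family:
        return None
    return min(family, key=lambda c: (len(c), sorted(c)))

# Hypothetical family of minimal primary sets, given in arbitrary order
family = [frozenset({"W", "Z2"}), frozenset({"W", "Z1"})]
chosen = select(family)
print(sorted(chosen))  # ['W', 'Z1']
print(select([]))      # None
```

Because the rule depends only on the contents of the family and not on the order in which its members are listed, two graphs with the same family of minimal primary sets yield the same selected target.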

Data are generated from an unknown causal model whose latent projection on $V$ is $G$. Let $\mathbb P$ denote either (i) the observational distribution on $V$, or (ii) a specified family of interventional distributions $\{\mathbb P^{do(I=i)}: i\in\mathcal I\}$ for a known intervention set $\mathcal I$. Let $\mathcal G(\mathbb P)$ be the class of ADMGs compatible with the available distribution(s).

Unsolved Problem

Characterize conditions under which $P^\star(X,Y;G)$ is identifiable from $\mathbb P$ over $\mathcal G(\mathbb P)$, i.e. whether all compatible graphs agree on this target (including the possible value $\bot$). Equivalently, ask whether there exists a functional $\phi$ such that for every admissible $G$,

$$\phi(\mathbb P)=P^\star(X,Y;G).$$
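Read literally, the requirement that all compatible graphs agree is a pairwise condition. As a sketch using only the symbols defined in the setup, it can be written as

$$P^\star(X,Y;G)=P^\star(X,Y;G')\quad\text{for all } G,G'\in\mathcal G(\mathbb P),$$

where two graphs that both give $\bot$ count as agreeing. When this holds, the value of the functional at $\mathbb P$ can be taken to be this common value.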

Under such conditions, construct estimators $\widehat P_n=\widehat P_n(\mathbb P_n)$ from $n$ samples (or pooled interventional samples) that are consistent for this discrete target, i.e. $\Pr(\widehat P_n=P^\star(X,Y;G))\to 1$ as $n\to\infty$.

§ Significance & Implications

The iterative graph expansion method of Guo and Zhao is interactive: it relies on elicited structural information rather than a full graph specification. Understanding when primary adjustment sets are identifiable from data would reduce reliance on subjective input and connect the framework to statistical learning. It would also clarify the limits of what can be identified without complete knowledge of the causal graph.

§ Known Partial Results

  • Guo and Zhao (2025): The paper provides a procedural framework and correctness guarantees conditional on correctly provided primary adjustment sets, but does not claim these sets are statistically identifiable from observational or interventional data alone. This direction remains open in the source paper.

§ References

[1]

Confounder Selection via Iterative Graph Expansion

F. Richard Guo, Qingyuan Zhao (2025)

Annals of Statistics (to appear)

📍 Section 5.4 (Finding primary adjustment sets): source paper raises the learnability/identifiability question for primary adjustment sets. The selector-based $P^\star_G$ / $\phi$ formulation is an explicit formalization layer.

Primary source paper. Citation-year convention used here: first arXiv submission in 2023, cited version is arXiv v3 (2025 revision).
