Statistical Identification of Primary Adjustment Sets From Data
Sourced from the work of F. Richard Guo, Qingyuan Zhao
§ Problem Statement
Setup
Let $V$ be a finite set of observed variables, and let $G = (V, D, B)$ be an acyclic directed mixed graph (ADMG): $D$ contains directed edges $u \to v$, $B$ contains bidirected edges $u \leftrightarrow v$, and there is no directed cycle. For $v \in V$, let $\mathrm{de}_G(v)$ be its descendants and $\mathrm{an}_G(v)$ its ancestors; extend to sets by unions.
A path is called a confounding arc between distinct $a, b \in V$ if it has no collider and has an arrowhead at both endpoints. Given $C \subseteq V \setminus \{a, b\}$, such a confounding arc is unblocked given $C$ if none of its non-endpoint vertices belongs to $C$. Write $a \leftrightsquigarrow b \mid C$ if there exists an unblocked confounding arc between $a$ and $b$ given $C$, and write $a \not\leftrightsquigarrow b \mid C$ otherwise.
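The arc condition above can be checked mechanically on a small graph. The following is a minimal sketch, not taken from the source paper: it assumes the ADMG is given as two edge collections (pairs `(u, v)` meaning `u -> v`, and unordered pairs for bidirected edges), and the function names are hypothetical. It relies on the observation that a collider-free path with arrowheads at both endpoints must either descend from a common ancestor of the two endpoints or pass through exactly one bidirected edge joining an ancestor-or-self of one endpoint to an ancestor-or-self of the other.

```python
def ancestors_avoiding(directed, target, removed):
    """Vertices (including `target`) that reach `target` by a directed path
    none of whose vertices lies in `removed`."""
    parents = {}
    for u, v in directed:
        parents.setdefault(v, set()).add(u)
    seen, stack = {target}, [target]
    while stack:
        v = stack.pop()
        for u in parents.get(v, ()):
            if u not in seen and u not in removed:
                seen.add(u)
                stack.append(u)
    return seen


def has_unblocked_confounding_arc(directed, bidirected, a, b, C=frozenset()):
    """True if some confounding arc between a and b has no interior vertex in C.

    Assumes a != b and that neither a nor b belongs to C.
    """
    C = set(C)
    # Directed paths into a may not pass through b or C, and symmetrically for b.
    up_a = ancestors_avoiding(directed, a, C | {b})
    up_b = ancestors_avoiding(directed, b, C | {a})
    # Case 1: a <- ... <- t -> ... -> b for a common proper ancestor t.
    if (up_a - {a}) & (up_b - {b}):
        return True
    # Case 2: a <- ... <- u <-> w -> ... -> b, where u may be a and w may be b.
    for u, w in bidirected:
        if (u in up_a and w in up_b) or (w in up_a and u in up_b):
            return True
    return False
```

For example, with directed edges `u -> x`, `u -> y`, `x -> y` and no bidirected edges, the arc `x <- u -> y` is unblocked given the empty set and blocked given `{"u"}`, while the directed edge `x -> y` alone never counts because it has a tail at `x`.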
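For distinct $a, b \in V$, a set $S \subseteq V \setminus \{a, b\}$ is an adjustment set for $(a, b)$ if there is no unblocked confounding arc between $a$ and $b$ given $S$, i.e. $a \not\leftrightsquigarrow b \mid S$. Given a baseline set $S_0$, an adjustment set $S$ is primary for $(a, b)$ relative to $S_0$ if
It is minimal primary if no proper subset of $S$ is primary relative to $S_0$.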
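Minimality in this sense can be illustrated independently of the exact primary condition. The sketch below is an assumption-laden illustration: it takes a finite pool of candidate vertices and a caller-supplied predicate (here the hypothetical `is_primary`, standing in for "is a primary adjustment set relative to the baseline") and returns every subset that satisfies the predicate while no proper subset does. It enumerates all subsets, so it is only practical for small pools.

```python
from itertools import combinations


def minimal_satisfying_sets(pool, predicate):
    """All subsets S of `pool` with predicate(S) true and no proper subset
    of S satisfying the predicate. Exponential in len(pool)."""
    minimal = []
    items = sorted(pool)
    # Visit subsets by increasing size, so any satisfying proper subset of S
    # already has a minimal satisfying subset recorded when S is reached.
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            S = frozenset(combo)
            if any(m < S for m in minimal):  # a recorded set is a proper subset
                continue
            if predicate(S):
                minimal.append(S)
    return minimal


# Hypothetical stand-in for the primary condition on a three-vertex pool.
def is_primary(S):
    return "u" in S or {"v", "w"} <= S


minimal_primary = minimal_satisfying_sets({"u", "v", "w"}, is_primary)
```

Processing subsets in order of size is what makes the proper-subset check against already recorded sets sufficient, even when the predicate is not monotone.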
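Define
$$\mathcal{M}_G(a, b; S_0) = \{\, S : S \text{ is a minimal primary adjustment set for } (a, b) \text{ relative to } S_0 \,\},$$
fix a deterministic selector $\mathrm{sel}$ on nonempty finite sets, and define
$$P^\star_G(a, b; S_0) = \begin{cases} \mathrm{sel}\big(\mathcal{M}_G(a, b; S_0)\big) & \text{if } \mathcal{M}_G(a, b; S_0) \neq \emptyset, \\ \bot & \text{otherwise,} \end{cases}$$
where $\bot$ is a convention meaning "no minimal primary adjustment set exists." For the target pair $(X, Y)$, let $P^\star_G := P^\star_G(X, Y; S_0)$.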
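To make the role of the selector concrete, here is a small sketch of one possible deterministic choice rule. The rule itself (smallest cardinality, ties broken by sorted element order) and both function names are assumptions for illustration; the text only requires that some deterministic selector be fixed. `None` stands in for the "no minimal primary adjustment set exists" value.

```python
def lexicographic_selector(sets):
    """One deterministic selector on a nonempty collection of finite sets:
    prefer smaller sets, then the lexicographically smallest sorted elements.
    Elements are assumed to be mutually comparable (e.g. all strings)."""
    if not sets:
        raise ValueError("selector is only defined on nonempty collections")
    return min(sets, key=lambda S: (len(S), sorted(S)))


def selected_target(minimal_primary_sets, selector=lexicographic_selector):
    """Return the selected minimal primary set, or None when there is none."""
    sets = list(minimal_primary_sets)
    return selector(sets) if sets else None
```

Because the key depends only on the contents of each set, the result does not depend on the order in which the candidate sets are listed, which is the property that makes the target a well-defined function of the graph.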
Data are generated from an unknown causal model whose latent projection on $V$ is $G$. Let $\mathcal{P}$ denote either (i) the observational distribution on $V$, or (ii) a specified family of interventional distributions for a known intervention set $\mathcal{I}$. Let $\mathcal{G}(\mathcal{P})$ be the class of ADMGs compatible with the available distribution(s).
Unsolved Problem
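Characterize conditions under which $P^\star_G$ is identifiable from the available distribution(s) $\mathcal{P}$ over the compatible class $\mathcal{G}(\mathcal{P})$, i.e. whether all compatible graphs agree on this target (including the possible value $\bot$). Equivalently, ask whether there exists a functional $\phi$ such that for every admissible $G \in \mathcal{G}(\mathcal{P})$,
$$\phi(\mathcal{P}) = P^\star_G.$$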
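The "all compatible graphs agree" reading of identifiability can be phrased as a simple agreement check. The sketch below assumes, purely for illustration, that the compatible class has already been enumerated as a finite list (which is itself a hard step this note does not address) and that a hypothetical function `target_of` computes the selected target for each graph, returning `None` when no minimal primary adjustment set exists.

```python
def agreed_target(compatible_graphs, target_of):
    """Return (True, value) if `target_of` takes the same value on every
    graph in the list, and (False, None) otherwise."""
    values = [target_of(g) for g in compatible_graphs]
    if not values:
        raise ValueError("need at least one compatible graph")
    first = values[0]
    if all(v == first for v in values[1:]):
        return True, first
    return False, None


# Toy stand-in: graphs are labels and the targets are looked up in a table.
toy_targets = {
    "G1": frozenset({"u"}),
    "G2": frozenset({"u"}),
    "G3": frozenset({"v", "w"}),
}
identified = agreed_target(["G1", "G2"], toy_targets.get)
not_identified = agreed_target(["G1", "G2", "G3"], toy_targets.get)
```

Returning a flag alongside the value keeps the case "all graphs agree that no set exists" (`(True, None)`) distinct from the case "graphs disagree" (`(False, None)`).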
Under such conditions, construct estimators $\hat{P}_n$ from $n$ samples (or pooled interventional samples) that are consistent for this discrete target, i.e. $\Pr(\hat{P}_n = P^\star_G) \to 1$ as $n \to \infty$.
§ Discussion
§ Significance & Implications
The procedure in the source paper is interactive and relies on structural information elicited from the user rather than on a fully specified graph. Understanding when primary adjustment sets are identifiable from data would reduce reliance on subjective input and connect the framework to statistical learning. It would also clarify the limits of what can be identified without complete knowledge of the causal graph.
§ Known Partial Results
Guo and Zhao (2025): The paper provides a procedural framework with correctness guarantees that are conditional on correctly specified primary adjustment sets, but it does not claim that these sets are statistically identifiable from observational or interventional data alone. This direction remains open in the source paper.
§ References
Confounder Selection via Iterative Graph Expansion
F. Richard Guo, Qingyuan Zhao (2025)
Annals of Statistics (to appear)
📍 Section 5.4 (Finding primary adjustment sets): source paper raises the learnability/identifiability question for primary adjustment sets. The selector-based $P^\star_G$ / $\phi$ formulation is an explicit formalization layer.
Primary source paper. Citation-year convention used here: first arXiv submission in 2023, cited version is arXiv v3 (2025 revision).