
Beyond the Mean: A Flexible Framework for Studying Causal Effects Using Linear Models

Published online by Cambridge University Press:  01 January 2025

Christian Gische*
Affiliation:
Humboldt University Berlin
Manuel C. Voelkle
Affiliation:
Humboldt University Berlin
*
Correspondence should be made to Christian Gische, Department of Psychology, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany. Email: christian.gische@hu-berlin.de

Abstract

Graph-based causal models are a flexible tool for causal inference from observational data. In this paper, we develop a comprehensive framework to define, identify, and estimate a broad class of causal quantities in linearly parametrized graph-based models. The proposed method extends the literature, which mainly focuses on causal effects on the mean level and the variance of an outcome variable. For example, we show how to compute the probability that an outcome variable realizes within a target range of values given an intervention, a causal quantity we refer to as the probability of treatment success. We link graph-based causal quantities defined via the do-operator to parameters of the model implied distribution of the observed variables using so-called causal effect functions. Based on these causal effect functions, we propose estimators for causal quantities and show that these estimators are consistent and converge at a rate of $N^{-1/2}$ under standard assumptions. Thus, causal quantities can be estimated based on sample sizes that are typically available in the social and behavioral sciences. In case of maximum likelihood estimation, the estimators are asymptotically efficient. We illustrate the proposed method with an example based on empirical data, placing special emphasis on the difference between the interventional and conditional distribution.

Type
Theory & Methods
Creative Commons
Creative Commons License - CC BY
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Copyright
Copyright © 2021 The Author(s)

Graph-Based Models for Causal Inference

The graph-based approach to causal inference was primarily formalized by Judea Pearl (Reference Pearl1988, Reference Pearl1995, Reference Pearl2009) and Spirtes, Glymour, and Scheines (Reference Spirtes, Glymour and Scheines2001). A causal graph represents a researcher’s theory about the causal structure of the data-generating mechanism. Based on a causal graph, causal inference can be conducted using the interventional distribution, from which standard causal quantities such as average treatment effects (ATEs) can be derived. In the most general formulation, a causal graph is accompanied by a set of nonparametric structural equations. Thus, a common acronym for Pearl’s general nonparametric model is NPSEM, which stands for non-parametric structural equation model (Pearl, Reference Pearl2009; Shpitser, Richardson, & Robins, Reference Shpitser, Richardson and Robins2020).

Graph-based causal models share many common characteristics with the traditional literature on structural equation models (SEM) prevalent in the social and behavioral sciences and economics (Bollen & Pearl, Reference Bollen and Pearl2013; Heckman & Pinto, Reference Heckman and Pinto2015; Pearl, Reference Pearl2009, Reference Pearl2012). However, these two approaches also differ in several aspects including the underlying assumptions (e.g., graph-based models assume modularity), notational conventions (e.g., the meaning of bidirected edges in graphical representations), research focus (e.g., nonparametric identification in graph-based models vs. parametric estimation in traditional SEM), and standard procedures.

Graph-based procedures often focus on a single causal quantity of interest (e.g., the ATE) and on establishing its causal identification based on a minimal set of assumptions (e.g., without making parametric assumptions). Causal quantities are well defined via the do-operator and the resulting interventional distribution, and causal identification can be established based on graphical tools such as the back-door criterion (Pearl, Reference Pearl1995) or a set of algebraic rules called do-calculus (Shpitser & Pearl, Reference Shpitser and Pearl2006; Tian & Pearl, Reference Tian and Pearl2002a). The central insights developed within the graph-based approach relate to causal identification, whereas less attention has been devoted to the estimation of causal quantities.Footnote 1

On the other hand, the traditional literature on SEM frequently assumes parametrized (often linear) models and usually focuses on identification and estimation of the entire model.Footnote 2 Causal quantities such as direct, indirect and total effects can be defined based on reduced-form equations and partial derivatives (Alwin & Hauser, Reference Alwin and Hauser1975; Bollen, Reference Bollen1987; Stolzenberg, Reference Stolzenberg1980). A main focus within the traditional SEM literature lies on the model implied joint distribution of observed variables and its statistical modeling. A considerable body of literature is available on model identification (Bekker, Merckens, & Wansbeek, Reference Bekker, Merckens and Wansbeek1994; Bollen, Reference Bollen1989; Fisher, Reference Fisher1966; Wiley, Reference Wiley1973) and estimation (Browne, Reference Browne1984; Jöreskog, Reference Jöreskog1967; Satorra & Bentler, Reference Satorra and Bentler1994) for parametrized SEM.

In this paper, we combine causal quantities from graph-based models with identification and estimation results from the traditional literature on linear SEM. For this purpose, we formalize the do-operator using matrix algebra in the section on “Graph-Based Causal Models with Linear Equations.” Based on this matrix representation, we derive a closed-form parametric expression of the interventional distribution and several causal quantities in the section entitled “Interventional Distribution.” Linear graph-based models imply a parametrized joint distribution of the observed variables. We define causal effect functions as a mapping from the parameters of the joint distribution of observed variables onto the causal quantities defined via the do-operator in the section entitled “Causal Effect Functions.” Methods for identifying parametrized causal quantities are discussed in the section entitled “Identification of Parametrized Causal Quantities.” Estimators of causal quantities that are consistent and converge at a rate of $N^{-1/2}$ are proposed in the section on “Estimation of Causal Quantities.” We show that the proposed estimators are asymptotically efficient in case of maximum likelihood estimation.

Our work extends the literature on traditional SEM by providing closed-form expressions of graph-based causal quantities in terms of model parameters of linear SEM. Furthermore, we extend the literature on linear graph-based models by providing a unifying estimation framework for (multivariate) causal quantities that also allows estimation of causal quantities beyond the mean and the variance. We illustrate the method using simulated data based on an empirical application and provide a thorough discussion of the differences between conditional and interventional distributions in the illustration section.

Throughout this paper, we focus on situations in which direct causal effects are functionally independent of the values of variables in the system. In other words, direct causal effects are constant. In such situations, the data-generating mechanisms can be adequately represented by linear structural equations and the use of linear graph-based causal models is justified. A priori knowledge that suggests constant direct causal effects sometimes allows identifying causal quantities that would not be identified under the more flexible assumptions of the NPSEM (see illustration section for an example). However, scientific theories that suggest constant direct causal effects might be incorrect and consequently, linear models might be misspecified. We will discuss issues related to model misspecification in the discussion section, where we will also point to future research directions.

Graph-Based Causal Models with Linear Equations

Linear graph-based causal models are an appropriate tool in situations in which a priori scientific knowledge suggests that each of the following statements is true:Footnote 3

  1. The causal ordering of observed variables and unobserved confounders is known.

  2. Interventions only alter the mechanisms that are directly targeted (modularity).Footnote 4

  3. The treatment status of a unit (e.g., person) does not affect the treatment status or the outcome of other units (no interference).

  4. Direct causal effects are constant across units (homogeneity).

  5. Direct causal effects are constant across value combinations of observed variables and unobserved error terms (no effect modification).

  6. Omitted direct causes as comprised in the error terms follow a multivariate normal distribution.Footnote 5

The first three assumptions listed above are generic to the graph-based approach to causal inference and need to hold in its most general nonparametric formulation. Assumptions 4 and 5 justify the use of linear structural equations. Assumption 6 justifies the use of multivariate normally distributed error terms. We further assume that variables are measured on a continuous scale and are observed without measurement error. Throughout this paper, we assume that the model is correctly specified. In the discussion section, we briefly point to the literature on statistical tests of model assumptions and methods for analyzing the sensitivity of causal conclusions with respect to violations of untestable assumptions. Furthermore, we briefly discuss possible ways to relax the model assumptions (e.g., measurement errors, unobserved heterogeneity, effect modification, excess kurtosis in the error terms).

A linear graph-based causal model over the set $\mathcal{V} = \{V_1, \ldots, V_n\}$ of observed variables is defined by the following set of equations (Brito & Pearl, Reference Brito and Pearl2006, p. 2):Footnote 6

(1) $$V_j = \sum_{i=1}^{n} c_{ji} V_i + \varepsilon_j, \qquad j = 1, \ldots, n$$

We assume that all variables are deviations from their means and no intercepts are included in Eq. (1). A nonzero structural coefficient ($c_{ji} \ne 0$) expresses the assumption that $V_i$ has a direct causal influence on $V_j$. Restricting a structural coefficient to zero ($c_{ji} = 0$) indicates the assumption that $V_i$ has no direct causal effect on $V_j$. The parameter $c_{ji}$ quantifies the magnitude of a direct effect. The $q \times 1$ parameter vector $\boldsymbol{\theta}_{\mathcal{F}} \in \boldsymbol{\Theta}_{\mathcal{F}} \subseteq \mathbb{R}^q$ contains all distinct, functionally unrelated, and unknown structural coefficients $c_{ji}$. $\boldsymbol{\Theta}_{\mathcal{F}}$ denotes the parameter space, a subspace of the $q$-dimensional Euclidean space. Restating Eq. (1) in matrix notation yields:

(2) $$\mathbf{V} = \mathbf{C}\mathbf{V} + \boldsymbol{\varepsilon} \quad \Leftrightarrow \quad \mathbf{V} = (\mathbf{I}_n - \mathbf{C})^{-1}\boldsymbol{\varepsilon}$$

The $n \times n$ identity matrix is denoted as $\mathbf{I}_n$. The $n \times n$ matrix of structural coefficients is denoted as $\mathbf{C}$, and we sometimes use the notation $\mathbf{C}(\boldsymbol{\theta}_{\mathcal{F}})$ to emphasize that $\mathbf{C}$ is a function of $\boldsymbol{\theta}_{\mathcal{F}}$. We restrict our attention to recursive systems for which the variables $\mathbf{V}$ can be ordered in such a way that the matrix $\mathbf{C}$ is strictly lower triangular (which ensures the existence of the inverse in Eq. (2); Bollen, Reference Bollen1989). The set of error terms is denoted by $\mathcal{E} = \{\varepsilon_1, \ldots, \varepsilon_n\}$. Each error term $\varepsilon_i$, $i = 1, \ldots, n$, comprises variables that determine the level of $V_i$ but are not explicitly included in the model. Typically, the following assumptions (or a subset thereof) are made (Brito & Pearl, Reference Brito and Pearl2002; Kang & Tian, Reference Kang and Tian2009; Koster, Reference Koster1999):

  (a) $\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0}_n$, where $\mathbf{0}_n$ is an $n \times 1$ vector that contains only zeros.

  (b) $\mathrm{E}(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^\intercal) = \boldsymbol{\Psi}$, where the $n \times n$ matrix $\boldsymbol{\Psi}$ is finite, symmetric, and positive definite.

  (c) $\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}_n, \boldsymbol{\Psi})$, where $N_n$ denotes the $n$-dimensional normal distribution.

A nonzero covariance $\psi_{ij}$ indicates the existence of an unobserved common cause of the variables $V_i$ and $V_j$. The $p \times 1$ parameter vector $\boldsymbol{\theta}_{\mathcal{P}} \in \boldsymbol{\Theta}_{\mathcal{P}} \subseteq \mathbb{R}^p$ contains all distinct, functionally unrelated, and unknown parameters from the error term distribution. $\boldsymbol{\Theta}_{\mathcal{P}}$ denotes the parameter space, a subspace of the $p$-dimensional Euclidean space. We sometimes use the notation $\boldsymbol{\Psi}(\boldsymbol{\theta}_{\mathcal{P}})$ to emphasize that $\boldsymbol{\Psi}$ is a function of $\boldsymbol{\theta}_{\mathcal{P}}$. The resulting model implied joint distribution of the observed variables is denoted by $\{P(\mathbf{v}, \boldsymbol{\theta}) \mid \boldsymbol{\theta} \in \boldsymbol{\Theta}\}$, where $\boldsymbol{\Theta} = \boldsymbol{\Theta}_{\mathcal{F}} \times \boldsymbol{\Theta}_{\mathcal{P}}$, and $P$ is the family of $n$-dimensional multivariate normal distributions.
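Concretely, the reduced form in Eq. (2) together with assumptions (a)–(c) implies the covariance matrix $\boldsymbol{\Sigma}(\boldsymbol{\theta}) = (\mathbf{I}_n - \mathbf{C})^{-1}\boldsymbol{\Psi}(\mathbf{I}_n - \mathbf{C})^{-\intercal}$ of the observed variables. A minimal numpy sketch; the three-variable system and all coefficient values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical recursive example: V1 -> V2, V1 -> V3, V2 -> V3.
C = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],    # c_21: direct effect of V1 on V2
              [0.3, 0.5, 0.0]])   # c_31, c_32
Psi = np.diag([1.0, 0.5, 0.5])    # uncorrelated error terms (no confounding)

n = C.shape[0]
# Reduced form V = (I - C)^{-1} eps gives the model-implied covariance
# Sigma = (I - C)^{-1} Psi (I - C)^{-T} for zero-mean variables.
A = np.linalg.inv(np.eye(n) - C)
Sigma = A @ Psi @ A.T

# Var(V1) equals psi_11 = 1 since V1 has no parents,
# and Cov(V1, V2) equals c_21 * psi_11 = 0.8.
print(Sigma[0, 0], Sigma[1, 0])
```

Because $\mathbf{C}$ is strictly lower triangular, $\mathbf{I}_n - \mathbf{C}$ is always invertible, so the computation never fails for a recursive system.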

The graph $\mathcal{G}$ is constructed by drawing a directed edge from $V_i$ pointing to $V_j$ if and only if the corresponding coefficient is not restricted to zero (i.e., $c_{ji} \ne 0$). A bidirected edge between vertices $V_i$ and $V_j$ is drawn if and only if $\psi_{ij} \ne 0$ (bidirected edges are often drawn using dashed lines). The absence of a bidirected edge between $V_i$ and $V_j$ reflects the assumption that there is no unobserved variable that has a direct causal effect on both $V_i$ and $V_j$ (no unobserved confounding).Footnote 7 For recursive systems, the resulting graph belongs to the class of acyclic directed mixed graphs (ADMG), where mixed refers to the fact that graphs in this class contain directed edges as well as bidirected edges (Richardson, Reference Richardson2003; Shpitser, Reference Shpitser2018). An example model with $n = 6$ variables and the corresponding causal graph is introduced in the illustration section.
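These two construction rules mean the edge sets of $\mathcal{G}$ can be read off directly from the zero patterns of $\mathbf{C}$ and $\boldsymbol{\Psi}$. A short sketch using assumed illustrative matrices (indices 0, 1, 2 stand for $V_1$, $V_2$, $V_3$):

```python
import numpy as np

# Illustrative parameter matrices (not from the paper's example model).
C = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.3, 0.5, 0.0]])
Psi = np.array([[1.0, 0.0, 0.2],   # psi_13 != 0: V1 <-> V3
                [0.0, 0.5, 0.0],
                [0.2, 0.0, 0.5]])

n = C.shape[0]
# Directed edge V_i -> V_j iff c_ji != 0 (row j, column i of C).
directed = [(i, j) for j in range(n) for i in range(n) if C[j, i] != 0]
# Bidirected edge V_i <-> V_j iff psi_ij != 0 for i != j.
bidirected = [(i, j) for i in range(n) for j in range(i + 1, n) if Psi[i, j] != 0]

print(directed)    # [(0, 1), (0, 2), (1, 2)]
print(bidirected)  # [(0, 2)]
```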

At the heart of the graph-based approach to causal inference lies a hypothetical experiment in which the values of a subset of observed variables are controlled by an intervention. This exogenous intervention is formally denoted via the do-operator, namely $do(\mathbf{x})$, where $\mathbf{x}$ denotes the interventional levels and $\mathcal{X} \subseteq \mathcal{V}$ denotes the subset of variables that are controlled by the experimenter. The system of equations under the intervention $do(\mathbf{x})$ is obtained from the original system by replacing the equation for each variable $V_i \in \mathcal{X}$ (i.e., for each variable that is subject to the $do(\mathbf{x})$-intervention) with the equation $V_i = x_i$, where $x_i$ is a constant interventional level (Pearl, Reference Pearl2009; Spirtes et al., Reference Spirtes, Glymour and Scheines2001). Note that the $do(\mathbf{x})$-intervention does not alter the equations for variables that are not subject to intervention, an assumption known as autonomy or modularity (Pearl, Reference Pearl2009; Peters, Janzing, & Schölkopf, Reference Peters, Janzing and Schölkopf2017; Spirtes et al., Reference Spirtes, Glymour and Scheines2001).
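One way to mimic this equation replacement in matrix form, under the assumptions above, is to zero out the rows of $\mathbf{C}$ and the error (co)variances belonging to the intervened variables and fix those variables at their interventional levels. The following simulation sketch illustrates the idea; the model and all numbers are assumptions for illustration, not the paper's own procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-variable recursive system.
C = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.3, 0.5, 0.0]])
Psi = np.diag([1.0, 0.5, 0.5])

def simulate(C, Psi, do=None, size=100_000, rng=rng):
    """Draw from the (interventional) distribution of V.

    `do` maps variable indices to constant interventional levels; the
    corresponding structural equations are replaced by V_i = x_i.
    """
    C_do, Psi_do = C.copy(), Psi.copy()
    mean = np.zeros(C.shape[0])
    for i, x_i in (do or {}).items():
        C_do[i, :] = 0.0                   # cut all incoming direct effects
        Psi_do[i, :] = Psi_do[:, i] = 0.0  # remove the error term of V_i
        mean[i] = x_i                      # the equation becomes V_i = x_i
    eps = rng.multivariate_normal(mean, Psi_do, size=size)
    # Solve the modified system V = C_do V + eps in reduced form.
    return eps @ np.linalg.inv(np.eye(C.shape[0]) - C_do).T

V_do = simulate(C, Psi, do={1: 2.0})  # do(V2 = 2)
# Analytically, E[V3 | do(V2 = 2)] = c_31 * E[V1] + c_32 * 2 = 0.5 * 2 = 1.
print(V_do[:, 2].mean())
```

Because only the intervened rows are modified, the equations of the remaining variables stay untouched, which is exactly the modularity assumption.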

The probability distribution of the variables $\mathbf{V}$ one would observe had the intervention $do(\mathbf{x})$ been uniformly applied to the entire population is called the interventional distribution, and it is denoted as $P(\mathbf{V}\mid do(\mathbf{x}))$ (Footnote 8). The interventional distribution differs formally and conceptually from the conditional distribution $P(\mathbf{V}\mid \mathbf{X}=\mathbf{x})$. The former describes a situation in which the data-generating mechanism has been altered by an external $do(\mathbf{x})$-type intervention in a (hypothetical) experiment.
The latter describes a situation in which the data-generating mechanism of $\mathbf{V}$ has not been altered, but evidence $\mathbf{X}=\mathbf{x}$ about the values of a subset of variables $\mathcal{X}\subseteq\mathcal{V}$ is available. These differences will be discussed further in the illustration section (see also, e.g., Gische, West, & Voelkle, 2021; Pearl, 2009).

In the remainder of this section, we translate the changes in the data-generating mechanism induced by the $do(\mathbf{x})$-intervention into matrix notation (see Hauser and Bühlmann, 2015, for a similar approach). The following definition introduces the required notation.

Definition 1.

(interventions in linear graph-based models)

  1. Variables $\mathcal{X}\subseteq\mathcal{V}$ are subject to an external intervention, where $|\mathcal{X}|=K_x \le n$ denotes the set size. The $K_x\times 1$ vector of interventional levels is denoted by $\mathbf{x}$. The external intervention is denoted by $do(\mathbf{x})$.

  2. Let $\mathcal{I}\subseteq\{1,2,\ldots,n\}$, $|\mathcal{I}|=K_x$, denote the index set of variables that are subject to intervention. The index set of all variables that are not subject to intervention is denoted by $\mathcal{N}$, namely $\mathcal{N}:=\{1,2,\ldots,n\}\setminus\mathcal{I}$, $|\mathcal{N}|=n-K_x$, where the operator $\setminus$ denotes the set complement.

  3. Let $\boldsymbol{\imath}_i \in \mathbb{R}^n$ be the $i$-th unit vector, namely a (column) vector with entry 1 in the $i$-th component and zeros elsewhere. The $n \times K_x$ matrix $\mathbf{1}_{\mathcal{I}}:=(\boldsymbol{\imath}_i)_{i\in\mathcal{I}}$ contains all unit vectors with an interventional index. The $n \times (n-K_x)$ matrix $\mathbf{1}_{\mathcal{N}}$ is defined analogously, namely $\mathbf{1}_{\mathcal{N}}:=(\boldsymbol{\imath}_i)_{i\in\mathcal{N}}$. The matrices $\mathbf{1}_{\mathcal{I}}$ and $\mathbf{1}_{\mathcal{N}}$ are called selection matrices.

  4. Let $\mathbf{I}_{\mathcal{N}}$ be an $n \times n$ diagonal matrix with zeros and ones as diagonal values. The $i$-th diagonal value is equal to one if $i \in \mathcal{N}$ and zero otherwise.

Note that all of the elements of the matrices $\mathbf{1}_{\mathcal{I}}$, $\mathbf{1}_{\mathcal{N}}$, and $\mathbf{I}_{\mathcal{N}}$ are either zero or unity. The variables $\mathbf{V}$ in a linear graph-based model under the intervention $do(\mathbf{x})$ are determined by the following set of structural equations (Footnote 9):

(3) $\text{given } do(\mathbf{x}): \quad \mathbf{V} = \mathbf{I}_{\mathcal{N}}\mathbf{C}\mathbf{V} + \mathbf{I}_{\mathcal{N}}\boldsymbol{\varepsilon} + \mathbf{1}_{\mathcal{I}}\mathbf{x}$

The corresponding interventional reduced form equation is given by:

(4) $\mathbf{V}\mid do(\mathbf{x}) = (\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}(\mathbf{I}_{\mathcal{N}}\boldsymbol{\varepsilon} + \mathbf{1}_{\mathcal{I}}\mathbf{x}) = \underbrace{(\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}}_{=:\,\mathbf{T}_1\ (n \times n)}\boldsymbol{\varepsilon} + \underbrace{(\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{1}_{\mathcal{I}}}_{=:\,\mathbf{a}_1\ (n \times K_x)}\mathbf{x}$

The matrix $\mathbf{I}_{\mathcal{N}}\mathbf{C}$ is obtained from $\mathbf{C}$ by replacing its rows with interventional indexes by rows of zeros, and consequently $(\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})$ is non-singular. Equation (4) states that $\mathbf{V}\mid do(\mathbf{x})$ is a linear transformation of the random vector $\boldsymbol{\varepsilon}$. The corresponding transformation matrix is labeled $\mathbf{T}_1$, and the additive constant is $\mathbf{a}_1\mathbf{x}$.
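As a concrete illustration of Eq. (4), the following sketch builds the selection matrices of Definition 1 and computes $\mathbf{T}_1$ and $\mathbf{a}_1$. The three-variable chain $X \rightarrow M \rightarrow Y$ and its coefficients (0.5 and 0.8) are our own toy assumptions, not the paper's empirical example:

```python
import numpy as np

# Hypothetical three-variable chain X -> M -> Y (illustrative numbers only):
# M = 0.5 * X + eps_M,  Y = 0.8 * M + eps_Y,  intervention on X (index 0).
n = 3
C = np.array([[0.0, 0.0, 0.0],   # X is exogenous
              [0.5, 0.0, 0.0],   # M depends on X
              [0.0, 0.8, 0.0]])  # Y depends on M

I_idx = [0]      # index set of intervened variables (calligraphic I)
N_idx = [1, 2]   # index set of non-intervened variables (calligraphic N)

one_I = np.eye(n)[:, I_idx]   # n x K_x selection matrix 1_I
one_N = np.eye(n)[:, N_idx]   # n x (n - K_x) selection matrix 1_N
I_N = one_N @ one_N.T         # n x n diagonal matrix I_N

# Eq. (4): T_1 transforms the errors, a_1 maps the interventional levels.
T1 = np.linalg.inv(np.eye(n) - I_N @ C) @ I_N
a1 = np.linalg.inv(np.eye(n) - I_N @ C) @ one_I

print(a1.ravel())  # total effects of do(x) on (X, M, Y): [1.0, 0.5, 0.4]
```

Note how $\mathbf{a}_1$ accumulates the product of path coefficients ($0.5 \cdot 0.8 = 0.4$ for $Y$), while row 0 of $\mathbf{T}_1$ is zero because the intervened variable no longer depends on any error term.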

The target quantity of interest is the interventional distribution of those variables that are not subject to intervention, denoted by $\mathbf{V}_{\mathcal{N}}$. The reduced form equation of all non-interventional variables is given by:

(5) $\mathbf{V}_{\mathcal{N}}\mid do(\mathbf{x}) = \mathbf{1}_{\mathcal{N}}^{\intercal}\,\mathbf{V}\mid do(\mathbf{x}) = \mathbf{1}_{\mathcal{N}}^{\intercal}(\mathbf{T}_1\boldsymbol{\varepsilon} + \mathbf{a}_1\mathbf{x}) = \underbrace{\mathbf{1}_{\mathcal{N}}^{\intercal}\mathbf{T}_1}_{=:\,\mathbf{T}_2}\boldsymbol{\varepsilon} + \underbrace{\mathbf{1}_{\mathcal{N}}^{\intercal}\mathbf{a}_1}_{=:\,\mathbf{a}_2}\mathbf{x}$

Important characteristics of the distribution of a linear transformation of a random vector depend on the rank of the transformation matrix.

Lemma 2.

(rank of transformation matrices) The $n \times n$ transformation matrix $\mathbf{T}_1 := (\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}$ has reduced rank $n - K_x$. The $(n - K_x) \times n$ transformation matrix $\mathbf{T}_2 := \mathbf{1}_{\mathcal{N}}^{\intercal}(\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}$ has full row rank $n - K_x$.

Proof.

See Appendix. $\square$
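The rank claims of Lemma 2 can also be checked numerically. The sketch below does so for a hypothetical three-variable chain with assumed coefficients (0.5 and 0.8) and an intervention on the first variable, so $K_x = 1$:

```python
import numpy as np

# Numeric check of Lemma 2 on an illustrative toy model (not the paper's data):
# three-variable chain, intervention on index 0, K_x = 1, so n - K_x = 2.
n, Kx = 3, 1
C = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
one_N = np.eye(n)[:, [1, 2]]
I_N = one_N @ one_N.T

T1 = np.linalg.inv(np.eye(n) - I_N @ C) @ I_N   # n x n, reduced rank
T2 = one_N.T @ T1                               # (n - K_x) x n, full row rank

print(np.linalg.matrix_rank(T1), np.linalg.matrix_rank(T2))  # 2 2
```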

Based on the reduced form equations, we derive the interventional distribution and its features in the following section.

Interventional Distribution

Combining the reduced form stated in Eq. (4) with the assumptions on the first- and second-order moments of the error term distribution yields the following moments of the interventional distribution:

(6a) $\mathrm{E}(\mathbf{V}\mid do(\mathbf{x})) = \mathrm{E}(\mathbf{T}_1\boldsymbol{\varepsilon} + \mathbf{a}_1\mathbf{x}) = \mathbf{a}_1\mathbf{x} = (\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{1}_{\mathcal{I}}\mathbf{x}$
(6b) $\mathrm{V}(\mathbf{V}\mid do(\mathbf{x})) = \mathrm{V}(\mathbf{T}_1\boldsymbol{\varepsilon} + \mathbf{a}_1\mathbf{x}) = \mathbf{T}_1\boldsymbol{\Psi}\mathbf{T}_1^{\intercal} = (\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n - \mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}$

The results are obtained via a direct application of the rules for computing moments of linear transformations of random variables. Note that these results do not require multivariate normality of the error terms. The interventional mean vector depends functionally on the vector of interventional levels $\mathbf{x}$, whereas the interventional covariance matrix is functionally independent of $\mathbf{x}$. The interventional distribution in linear graph-based models with multivariate normal error terms is given as:

Result 3.

(interventional distribution for Gaussian linear graph-based models)

(7a) $\mathbf{V}\mid do(\mathbf{x}) \sim N_n^{\,n-K_x}(\,\mathbf{a}_1\mathbf{x}\,,\ \mathbf{T}_1\boldsymbol{\Psi}\mathbf{T}_1^{\intercal}\,)$
(7b) $\mathbf{V}_{\mathcal{N}}\mid do(\mathbf{x}) \sim N_{n-K_x}(\,\mathbf{a}_2\mathbf{x}\,,\ \mathbf{T}_2\boldsymbol{\Psi}\mathbf{T}_2^{\intercal}\,)$

Proof.

Both results follow from the fact that linear transformations of multivariate normal vectors are also multivariate normal (Rao, 1973). Results on the rank of the transformation matrices $\mathbf{T}_1$ and $\mathbf{T}_2$ can be found in Lemma 2. $\square$

Equation (7a) states that the interventional distribution of all variables is a singular normal distribution in $\mathbb{R}^n$ with reduced rank $n-K_x$, as indicated by the superscript $n-K_x$. Singularity follows from the fact that the $K_x$ interventional variables are no longer random given the $do(\mathbf{x})$-intervention, but are fixed to the constant interventional levels $\mathbf{x}$. Therefore, the random vector $\mathbf{V}\mid do(\mathbf{x})$ satisfies the restriction $\mathbf{1}_{\mathcal{I}}^{\intercal}(\mathbf{V}\mid do(\mathbf{x})) = \mathbf{x}$ with probability one. Equation (7b) states that the vector of all non-interventional variables follows an $(n-K_x)$-dimensional normal distribution.
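The interventional moments of Eqs. (6a) and (6b), and the singularity just described, can be verified numerically. The sketch below again uses a hypothetical chain $X \rightarrow M \rightarrow Y$; the coefficients (0.5, 0.8) and error variances (1.0, 0.25, 0.25) are assumptions for illustration only:

```python
import numpy as np

# Interventional moments, Eqs. (6a)-(6b), for a hypothetical chain X -> M -> Y.
# Coefficients (0.5, 0.8) and error variances (1.0, 0.25, 0.25) are assumed.
n = 3
C = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
Psi = np.diag([1.0, 0.25, 0.25])   # error-term covariance matrix
one_I = np.eye(n)[:, [0]]
one_N = np.eye(n)[:, [1, 2]]
I_N = one_N @ one_N.T

T1 = np.linalg.inv(np.eye(n) - I_N @ C) @ I_N
a1 = np.linalg.inv(np.eye(n) - I_N @ C) @ one_I

x = np.array([2.0])                # interventional level do(X = 2)
mean_do = a1 @ x                   # Eq. (6a): depends on x
cov_do = T1 @ Psi @ T1.T           # Eq. (6b): does not depend on x

print(mean_do)                     # [2.0, 1.0, 0.8]
print(cov_do[0])                   # zero row: X is fixed, hence the singularity
```

The zero row (and column) of the interventional covariance matrix at the interventional index is exactly the rank deficiency stated in Lemma 2 and Result 3.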

Typically, one is interested in a subset $\mathcal{Y}\subseteq\mathcal{V}_{\mathcal{N}}$ of outcome variables. The marginal interventional distribution $P(\mathbf{y}\mid do(\mathbf{x}))$ can be obtained as follows:

Result 4.

(marginal interventional distribution for Gaussian linear graph-based models) Let the outcome variables $\mathcal{Y}$ be a subset of the non-interventional variables (i.e., $\mathcal{Y}\subseteq\mathcal{V}_{\mathcal{N}}$, $|\mathcal{Y}|=K_y$). The index set of the outcome variables is denoted by $\mathcal{I}_y$. Then, the following result holds:

(8) $\mathbf{y}\mid do(\mathbf{x}) \sim N_{K_y}(\,\mathbf{1}_{\mathcal{I}_y}^{\intercal}\mathbf{a}_1\mathbf{x}\,,\ \mathbf{1}_{\mathcal{I}_y}^{\intercal}\mathbf{T}_1\boldsymbol{\Psi}\mathbf{T}_1^{\intercal}\mathbf{1}_{\mathcal{I}_y}\,)$

The result follows from the fact that the family of multivariate normal distributions is closed with respect to marginalization (Rao, 1973). An important special case of Result 4 is the ATE of a single variable $V_i$ on another variable $V_j$, which is obtained by setting $\mathcal{Y}=\{V_j\}$ and $\mathcal{X}=\{V_i\}$ (and consequently $\mathcal{I}_x=\{i\}$, $\mathcal{I}_y=\{j\}$, $K_y=K_x=1$). The ATE of the intervention $do(x)$ relative to the intervention $do(x')$ (where $x$ and $x'$ are distinct treatment levels) on $Y$ is defined as the mean difference $\mathrm{E}(y\mid do(x))-\mathrm{E}(y\mid do(x'))$.
For a single outcome variable $\{V_j\}$, the selection matrix $\mathbf{1}_{\mathcal{I}_y}$ simplifies to the unit vector $\boldsymbol{\imath}_j$, and $\mathrm{E}(y\mid do(x))-\mathrm{E}(y\mid do(x'))$ can be expressed as $\boldsymbol{\imath}_j^{\intercal}\mathbf{a}_1(x-x')$ (using the mean expression of the normal distribution in Eq. (8)).
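Once the total-effect vector $\mathbf{a}_1$ is available, the ATE $\boldsymbol{\imath}_j^{\intercal}\mathbf{a}_1(x-x')$ can be computed directly. The following sketch uses a hypothetical three-variable model with made-up coefficients (our illustration, not the paper's empirical example):

```python
import numpy as np

# Hypothetical 3-variable model V = C V + eps (made-up coefficients, not from
# the paper), with an intervention do(x) on V1.
C = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.4, 0.0]])          # structural coefficients
I_N = np.diag([0.0, 1.0, 1.0])           # deletes the structural equation of V1
one_I = np.array([1.0, 0.0, 0.0])        # unit vector selecting V1
a1 = np.linalg.solve(np.eye(3) - I_N @ C, one_I)  # total-effect vector a_1

def ate(j, x, x_prime):
    """ATE on V_j: i_j^T a_1 (x - x') = E(V_j | do(x)) - E(V_j | do(x'))."""
    return a1[j] * (x - x_prime)

print(ate(2, 2.0, 1.0))  # total effect of do(x=2) vs. do(x'=1) on V3: 0.5
```

With these coefficients, the effect of $V_1$ on $V_3$ accumulates over the direct path ($0.3$) and the path through $V_2$ ($0.5\cdot 0.4$), giving $a_1[2]=0.5$.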

The probability density function (pdf) of the interventional distribution of all non-interventional variables is given as follows:

(9) $$f(\mathbf{v}_{\mathcal{N}}\mid do(\mathbf{x}))=(2\pi)^{-\frac{n-K_x}{2}}\,|\mathbf{T}_{2}{\boldsymbol{\Psi}}\mathbf{T}_{2}^{\intercal}|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\mathbf{v}_{\mathcal{N}}-\mathbf{a}_{2}\mathbf{x})^{\intercal}(\mathbf{T}_{2}{\boldsymbol{\Psi}}\mathbf{T}_{2}^{\intercal})^{-1}(\mathbf{v}_{\mathcal{N}}-\mathbf{a}_{2}\mathbf{x})\right)$$

Many features of the interventional distribution that hold substantive interest in applied research (e.g., probabilities of interventional events, quantiles of the interventional distribution) can be calculated from the pdf via integration. For example, a physician would like a patient's blood glucose level (outcome) to fall into a predefined range of values (e.g., to avoid hypo- or hyperglycemia) given an injection of insulin (intervention). More formally, let $[\mathbf{y}^{low},\mathbf{y}^{up}]$ denote a predefined range of values of a set of outcome variables $\mathcal{Y}\subseteq\mathcal{V}_{\mathcal{N}}$. The interventional probability $P(\mathbf{y}^{low}\le\mathbf{y}\le\mathbf{y}^{up}\mid do(\mathbf{x}))$ is given by:

(10) $$P(\mathbf{y}^{low}\le\mathbf{y}\le\mathbf{y}^{up}\mid do(\mathbf{x}))=\int_{\mathbf{y}^{low}}^{\mathbf{y}^{up}} f(\mathbf{y}\mid do(\mathbf{x}))\,\mathsf{d}\mathbf{y}$$
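For a multivariate normal interventional distribution, the integral in Eq. (10) is a rectangle probability, which can be computed by inclusion-exclusion over the corners of the box. The sketch below assumes a hypothetical bivariate outcome with made-up interventional mean and covariance (not values from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of the "probability of treatment success" in Eq. (10) for a
# hypothetical bivariate outcome y = (V2, V3) under do(x); the interventional
# mean and covariance below are made-up values, not from the paper.
mean = np.array([1.0, 1.0])              # E(y | do(x))
cov = np.array([[0.5, 0.2],
                [0.2, 0.33]])            # V(y | do(x))
mvn = multivariate_normal(mean=mean, cov=cov)

def prob_in_box(lo, hi):
    """P(lo <= y <= hi | do(x)) via inclusion-exclusion over the box corners."""
    return (mvn.cdf([hi[0], hi[1]]) - mvn.cdf([lo[0], hi[1]])
            - mvn.cdf([hi[0], lo[1]]) + mvn.cdf([lo[0], lo[1]]))

p = prob_in_box([0.0, 0.0], [2.0, 2.0])
print(round(p, 3))
```

In two dimensions, the four cdf evaluations reproduce exactly the double integral in Eq. (10); in higher dimensions the same inclusion-exclusion scheme applies with $2^{K_y}$ corners.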

The interventional distribution and its features will be used to formally define parametric causal quantities in the following section.

Causal Effect Functions

In this section, we formally define terms containing the do-operator as causal quantities, denoted by $\boldsymbol{\gamma}$. According to this definition, any feature of the interventional distribution that can be expressed using the do-operator is a causal quantity. Let the space of causal quantities be denoted by $\boldsymbol{\Gamma}$. As discussed earlier in the section "Graph-Based Causal Models with Linear Equations," linear causal models imply a joint distribution of observed variables that is parametrized by $\boldsymbol{\theta}\in\boldsymbol{\Theta}\subseteq\mathbb{R}^{q+p}$ and denoted by $\{P(\mathbf{v},\boldsymbol{\theta})\mid\boldsymbol{\theta}\in\boldsymbol{\Theta}\}$.
A function $\mathbf{g}$ that maps the parameters $\boldsymbol{\theta}$ of the model implied joint distribution onto a causal quantity $\boldsymbol{\gamma}$ is called a causal effect function. This idea is illustrated in Fig. 1 and stated in Definition 5.

Figure 1. Causal Effect Functions. Figure 1 displays the mapping $\mathbf{g}:\boldsymbol{\Theta}\mapsto\boldsymbol{\Gamma}$ that corresponds to a causal effect function $\boldsymbol{\gamma}=\mathbf{g}(\boldsymbol{\theta})$. The domain $\boldsymbol{\Theta}\subseteq\mathbb{R}^{q+p}$ (left-hand side) contains the parameters of the model implied joint distribution of observed variables (no do-operator). The co-domain $\boldsymbol{\Gamma}\subseteq\mathbb{R}^{r}$ (right-hand side) contains causal quantities $\boldsymbol{\gamma}$ that are defined via the do-operator.

Definition 5.

(causal quantity and causal effect function) Let $\boldsymbol{\gamma}$ be an $r$-dimensional feature of the interventional distribution. Let $\boldsymbol{\Theta}_{\boldsymbol{\gamma}}\subseteq\boldsymbol{\Theta}$ be an $s$-dimensional subspace of the parameter space of the model implied joint distribution of observed variables. A mapping $\mathbf{g}$

(11) $$\mathbf{g}:\boldsymbol{\Theta}_{\boldsymbol{\gamma}}\mapsto\mathbb{R}^{r},\quad\text{with }\boldsymbol{\gamma}=\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}}),\quad\boldsymbol{\theta}_{\boldsymbol{\gamma}}\in\boldsymbol{\Theta}_{\boldsymbol{\gamma}}\subseteq\mathbb{R}^{s},\ \boldsymbol{\gamma}\in\mathbb{R}^{r}$$

is called a causal effect function. The image $\boldsymbol{\gamma}$ of a causal effect function is called a causal quantity, which is parametrized by $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$. If the value of a causal quantity depends on other variables (e.g., the interventional level $\mathbf{x}\in\mathbb{R}^{K_x}$, the values $\mathbf{v}_{\mathcal{N}}\in\mathbb{R}^{n-K_x}$ of non-interventional variables), we include these variables as auxiliary arguments in the causal effect function, separated by a semicolon (e.g., $\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}};\mathbf{x},\mathbf{v}_{\mathcal{N}})$).

This idea can be applied to the interventional mean from Eq. (6a) by defining it as a causal quantity $\boldsymbol{\gamma}_1$ as follows:

(12a) $$\boldsymbol{\gamma}_1:=\mathrm{E}(\mathbf{V}\mid do(\mathbf{x}))=\mathbf{g}_1(\boldsymbol{\theta}_{\mathcal{F}};\mathbf{x})=(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C}(\boldsymbol{\theta}_{\mathcal{F}}))^{-1}\mathbf{1}_{\mathcal{I}}\mathbf{x}$$
(12b) $$\mathbf{g}_1:\boldsymbol{\Theta}\supseteq\boldsymbol{\Theta}_{\mathcal{F}}\mapsto\mathbb{R}^{n}\subseteq\boldsymbol{\Gamma}$$

The right-hand side of Eq. (12a) is free of the do-operator and contains the parameter vector $\boldsymbol{\theta}_{\mathcal{F}}$ (structural coefficients) as a main argument and the interventional level $\mathbf{x}$ as an auxiliary argument. Thus, the causal effect function $\mathbf{g}_1$ maps the parameter vector $\boldsymbol{\theta}_{\mathcal{F}}$ onto the interventional mean.
The interventional mean is an $n\times 1$ vector, and therefore the co-domain of $\mathbf{g}_1$ is $\mathbb{R}^{n}$ (i.e., $r=n$), as stated in Eq. (12b).
Note that the causal effect function $\mathbf{g}_1$ depends on the distinct and functionally unrelated structural coefficients $\boldsymbol{\theta}_{\mathcal{F}}$ but is independent of the parameters from the error term distribution $\boldsymbol{\theta}_{\mathcal{P}}$. Therefore, the domain of $\mathbf{g}_1$ is $\boldsymbol{\Theta}_{\mathcal{F}}$ and $s=q$.
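As a concrete numerical illustration, the mapping in Eq. (12a) can be evaluated directly. The following sketch uses a hypothetical three-variable model with made-up coefficients (our illustration, not the paper's empirical example):

```python
import numpy as np

# Hypothetical 3-variable model V = C V + eps with an intervention do(x) on V1.
# All numerical values are illustrative and not taken from the paper.
n = 3
C = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.4, 0.0]])            # structural coefficients C(theta_F)
I_n = np.eye(n)
I_N = np.diag([0.0, 1.0, 1.0])             # deletes the structural equation of V1
one_I = np.array([[1.0], [0.0], [0.0]])    # selection matrix 1_I (n x K_x)

def interventional_mean(C, x):
    """gamma_1 = (I_n - I_N C(theta_F))^{-1} 1_I x, cf. Eq. (12a)."""
    return np.linalg.solve(I_n - I_N @ C, one_I @ np.atleast_2d(x))

print(interventional_mean(C, 2.0).ravel())  # [2. 1. 1.]
```

Setting $x=2$ fixes the mean of $V_1$ at 2; the means of $V_2$ and $V_3$ then follow from the remaining structural equations, illustrating that $\mathbf{g}_1$ requires only $\boldsymbol{\theta}_{\mathcal{F}}$ and the auxiliary argument $\mathbf{x}$.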

The interventional covariance matrix from Eq. (6b) can be expressed using the notation from Definition 5 as follows:

(13) $$\boldsymbol{\gamma}_2:=\mathrm{vech}(\mathrm{V}(\mathbf{V}\mid do(\mathbf{x})))=\mathbf{g}_2(\boldsymbol{\theta})=\mathrm{vech}\left((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C}(\boldsymbol{\theta}_{\mathcal{F}}))^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}(\boldsymbol{\theta}_{\mathcal{P}})\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C}(\boldsymbol{\theta}_{\mathcal{F}}))^{-\intercal}\right)$$

To avoid matrix-valued causal effect functions, we defined $\boldsymbol{\gamma}_2$ as the half-vectorized interventional covariance matrix, which is of dimension $r=n(n+1)/2$ (the operator $\mathrm{vech}$ stands for half-vectorization). The interventional covariance matrix is a function of both the structural coefficients $\boldsymbol{\theta}_{\mathcal{F}}$ and the entries of the covariance matrix $\boldsymbol{\theta}_{\mathcal{P}}$.
Thus, $\boldsymbol{\theta}_{\boldsymbol{\gamma}_2}=\boldsymbol{\theta}$ and $s=q+p$. No auxiliary arguments are included in the causal effect function $\mathbf{g}_2$, since the value of $\boldsymbol{\gamma}_2$ depends only on the values of $\boldsymbol{\theta}$ (recall that $\mathbf{I}_n$, $\mathbf{I}_{\mathcal{N}}$, and $\mathbf{1}_{\mathcal{I}}$ are constant zero-one matrices).
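Eq. (13) can likewise be evaluated numerically. The following sketch computes the half-vectorized interventional covariance matrix for a hypothetical three-variable model with made-up parameter values (not taken from the paper):

```python
import numpy as np

# Sketch of Eq. (13): the half-vectorized interventional covariance matrix for
# a hypothetical 3-variable model; all parameter values are made up.
C = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.4, 0.0]])          # C(theta_F)
Psi = np.diag([1.0, 0.5, 0.25])          # Psi(theta_P)
I_N = np.diag([0.0, 1.0, 1.0])           # intervention on V1

def vech(M):
    """Half-vectorization: stack the columns of the lower triangle of M."""
    return M.T[np.triu_indices_from(M)]

Ainv = np.linalg.inv(np.eye(3) - I_N @ C)
Sigma = Ainv @ I_N @ Psi @ I_N @ Ainv.T  # interventional covariance matrix
gamma_2 = vech(Sigma)                    # r = n(n+1)/2 = 6 entries
print(gamma_2)  # the intervened V1 has zero interventional variance
```

The first row and column of the interventional covariance matrix are zero because $V_1$ is held fixed by the intervention, which is exactly the effect of the zero-one matrix $\mathbf{I}_{\mathcal{N}}$ in Eq. (13).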

The interventional pdf $f(\mathbf{v}_{\mathcal{N}}\mid do(\mathbf{x}))$ from Eq. (9) can be formally defined as a causal effect function as follows:

(14) $$\begin{aligned}\gamma_3:=\ &g_3(\boldsymbol{\theta};\mathbf{x},\mathbf{v}_{\mathcal{N}})=(2\pi)^{-\frac{n-K_x}{2}}|\mathbf{T}_{2}(\boldsymbol{\theta}_{\mathcal{F}})\boldsymbol{\Psi}(\boldsymbol{\theta}_{\mathcal{P}})\mathbf{T}_{2}(\boldsymbol{\theta}_{\mathcal{F}})^{\intercal}|^{-\frac{1}{2}}\\ &\times\exp\left(-\frac{1}{2}(\mathbf{v}_{\mathcal{N}}-\mathbf{a}_{2}(\boldsymbol{\theta}_{\mathcal{F}})\mathbf{x})^{\intercal}(\mathbf{T}_{2}(\boldsymbol{\theta}_{\mathcal{F}})\boldsymbol{\Psi}(\boldsymbol{\theta}_{\mathcal{P}})\mathbf{T}_{2}(\boldsymbol{\theta}_{\mathcal{F}})^{\intercal})^{-1}(\mathbf{v}_{\mathcal{N}}-\mathbf{a}_{2}(\boldsymbol{\theta}_{\mathcal{F}})\mathbf{x})\right)\end{aligned}$$

The interventional density depends on both the structural coefficients and the parameters of the error term distribution, yielding $\boldsymbol{\theta}_{\gamma_3}=\boldsymbol{\theta}$, $\boldsymbol{\Theta}_{\gamma_3}=\boldsymbol{\Theta}$, and $s=q+p$. The interventional density is scalar-valued, and thus $r=1$.
Since the value of the interventional pdf depends on $\mathbf{x}$ and $\mathbf{v}_{\mathcal{N}}$, both are included as auxiliary arguments in the causal effect function $g_3$, namely $g_3(\boldsymbol{\theta};\mathbf{x},\mathbf{v}_{\mathcal{N}})$.
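To make Eq. (14) concrete, the sketch below evaluates the interventional density by hand for a hypothetical three-variable model (made-up numbers, not the paper's example) and cross-checks it against an off-the-shelf multivariate normal pdf:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Evaluates the interventional pdf of Eqs. (9)/(14) by hand for a hypothetical
# 3-variable model (made-up numbers) and cross-checks it against scipy.
C = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.3, 0.4, 0.0]])          # C(theta_F)
Psi = np.diag([1.0, 0.5, 0.25])          # Psi(theta_P)
I_N = np.diag([0.0, 1.0, 1.0])           # intervention on V1
sel = [1, 2]                             # indices of the non-intervened V2, V3

Ainv = np.linalg.inv(np.eye(3) - I_N @ C)
a2 = (Ainv @ np.array([1.0, 0.0, 0.0]))[sel]   # mean coefficients a_2
T2 = (Ainv @ I_N)[sel, :]                      # rows belonging to V_N
Sigma = T2 @ Psi @ T2.T                        # T_2 Psi T_2^T

x, v_N = 2.0, np.array([1.2, 0.8])
d = v_N - a2 * x
k = len(sel)
f_manual = ((2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** -0.5
            * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))
f_scipy = multivariate_normal(mean=a2 * x, cov=Sigma).pdf(v_N)
print(np.isclose(f_manual, f_scipy))  # True
```

The agreement of the two evaluations illustrates that $g_3(\boldsymbol{\theta};\mathbf{x},\mathbf{v}_{\mathcal{N}})$ is simply a multivariate normal density whose mean and covariance are themselves functions of $\boldsymbol{\theta}$ and the auxiliary argument $\mathbf{x}$.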

Probabilities of interventional events can be understood as causal quantities in the following way:

(15) $$\gamma_4:=P(\mathbf{y}^{low}\le\mathbf{y}\le\mathbf{y}^{up}\mid do(\mathbf{x}))=g_4(\boldsymbol{\theta}_{\gamma_4};\mathbf{x},\mathbf{y}^{low},\mathbf{y}^{up})=\int_{\mathbf{y}^{low}}^{\mathbf{y}^{up}} f(\mathbf{y}\mid do(\mathbf{x}))\,\mathsf{d}\mathbf{y}$$

where $\boldsymbol{\theta}_{\gamma_4}$ is the subset of parameters that appear in the marginal interventional pdf $f(\mathbf{y}\mid do(\mathbf{x}))$. The causal effect function $g_4$ is scalar-valued and thus $r=1$.
The value of the interventional probability depends on $\mathbf{x}$, $\mathbf{y}^{low}$, and $\mathbf{y}^{up}$ ($\mathbf{y}$ is integrated out), which are included as auxiliary arguments in the causal effect function $g_4$.

Identification of Parametrized Causal Quantities

The meaning of the term "identification" as used in the nonparametric graph-based approach differs slightly from its meaning in the field of traditional SEM. A graph-based causal quantity is said to be identified if it can be expressed as a functional of joint, marginal, or conditional distributions of observed variables (Pearl, 2009). The latter distributions can in principle be estimated based on observational data using nonparametric statistical models. In other words, an identified nonparametric causal quantity could in theory be computed from an infinitely large sample without further limitations (Footnote 10). Graph-based tools for identification exploit the causal structure depicted in the causal graph and are independent of the functional form of the structural equations. Thus, causal identification is established without the risk of misspecifying the functional form.

By contrast, model identification in traditional parametric SEM relies on the solvability of a system of nonlinear equations in terms of a finite number of model parameters. A single parameter $\theta \in \boldsymbol{\Theta}$ is identified if it can be expressed as a function of moments of the joint distribution of observed variables in a unique way (Bekker et al., 1994; Bollen & Bauldry, 2010). If all parameters in $\boldsymbol{\theta}$ are identified, then the model is identified. Definition 6 uses causal effect functions to combine the above ideas.

Definition 6.

(causal identification of parametrized causal quantities) Let $\boldsymbol{\gamma}$ be a parametrized causal quantity in a linear graph-based model. $\boldsymbol{\gamma}$ is said to be causally identified if (i) it can be expressed in a unique way as a function of the parameter vector $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$ via a causal effect function, namely $\boldsymbol{\gamma}=\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})$, and (ii) the value of $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$ can be uniquely computed from the joint distribution of the observed variables.

Based on this insight, graph-based techniques for causal identification in linear models have been derived, for example, by Brito and Pearl (2006), Drton, Foygel, and Sullivant (2011), and Kuroki and Cai (2007). Furthermore, part (ii) of the above definition has been dealt with extensively in the literature on traditional linear SEM (see, e.g., Bekker et al., 1994; Bollen, 1989; Fisher, 1966; Wiley, 1973).

We now illustrate Definition 6 for the causal quantities defined in Eqs. (22a) and (22b) from the illustration section. For the interventional mean stated in Eq. (22a), part (i) of the definition is satisfied, since the causal quantity $\gamma_1$ can be expressed as a function of the parameter $\theta_{\gamma_1}=c_{yx}$ in a unique way as follows: $\gamma_1 := \mathrm{E}(Y_3 \mid do(x_2)) = g_1(\theta_{\gamma_1}; x_2) = c_{yx}x_2$. Part (ii) of the above definition requires that the single structural coefficient $c_{yx}$ can be uniquely computed from the joint distribution of the observed variables.

Similarly, part (i) of the definition is satisfied for the causal quantity $\gamma_2 := \mathrm{V}(Y_3 \mid do(x_2)) = g_2(\boldsymbol{\theta}_{\gamma_2})$ in Eq. (22b). Part (ii) of the above definition requires that each of the structural coefficients and (co)variances on the right-hand side of Eq. (22b), namely $\boldsymbol{\theta}_{\gamma_2}=(c_{yx}, c_{yy}, \psi_{x_1x_1}, \psi_{x_1y_1}, \psi_{y_1y_1}, \psi_{y_1y_2}, \psi_{y_2y_3}, \psi_{yy})^\intercal$, can be uniquely computed from the joint distribution of the observed variables.

Note that both of the causal quantities discussed above require only a subset of parameters to be identified (i.e., it is not required to identify the entire parameter vector $\boldsymbol{\theta}$). After causal identification of a parametrized causal quantity has been established, it can be estimated from a sample using the techniques described in the following section.

Estimation of Causal Quantities

Estimators of causal quantities as defined in Eq. (11) are constructed by replacing the parameters in the causal effect function with a corresponding estimator, namely $\widehat{\boldsymbol{\gamma}}=\mathbf{g}(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$. This plug-in procedure is summarized in the following definition.

Definition 7.

(estimation of parametrized causal quantities) Let $\boldsymbol{\gamma}$ be an identified causal quantity in a linear graph-based model and $\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})$ the corresponding causal effect function. Let $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ denote an estimator of $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$; then $\widehat{\boldsymbol{\gamma}} := \mathbf{g}(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$ is an estimator of the causal quantity $\boldsymbol{\gamma}$.
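To make the plug-in step of Definition 7 concrete, the following sketch estimates the interventional mean $\gamma_1 = c_{yx}x_2$ from the illustration section in two steps: estimate the structural coefficient, then evaluate the causal effect function at the estimate. The data-generating values and the simple OLS step are illustrative assumptions, not the estimation methods discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)
N, c_yx_true, x2 = 10_000, 0.5, 2.0

# Simulate one linear structural equation: Y = c_yx * X + error
x = rng.normal(size=N)
y = c_yx_true * x + rng.normal(scale=0.3, size=N)

# Step 1: estimate the structural parameter (here by OLS)
c_yx_hat = (x @ y) / (x @ x)

# Step 2: plug the estimate into the causal effect function
# g1(theta; x2) = c_yx * x2, the interventional mean E(Y | do(X = x2))
gamma1_hat = c_yx_hat * x2
```

In applications, $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ would instead come from one of the SEM estimation procedures described next; the plug-in step itself is unchanged.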

A main strength of the traditional SEM literature is that a variety of estimation procedures have been developed. Common estimation techniques include maximum likelihood (ML; Jöreskog, 1967; Jöreskog & Lawley, 1968), generalized least squares (GLS; Browne, 1974), and asymptotically distribution-free (ADF; Browne, 1984) estimation. Note that some estimation techniques do not rely on the assumption of multivariate normal error terms, and for others robust versions have been proposed that allow for certain types of deviations from multivariate normality (Satorra & Bentler, 1994; Yuan & Bentler, 1998).

In the following, we assume that causal effect functions $\mathbf{g}$ and estimators $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ satisfy certain regularity conditions stated as Properties A.1 and A.2 in the Appendix. The following theorem establishes the asymptotic properties of estimators of causal quantities $\widehat{\boldsymbol{\gamma}}=\mathbf{g}(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$.

Theorem 8.

(asymptotic properties of estimators of causal quantities) Let $\boldsymbol{\gamma}$ be a causal quantity and $\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})$ the corresponding causal effect function. Let $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ be an estimator of $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$. Assume that $\mathbf{g}$ and $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ satisfy Property A.1 and Property A.2, respectively.

(16a) $$\widehat{\boldsymbol{\gamma}} = \mathbf{g}(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}) \xrightarrow{\ p\ } \mathbf{g}(\boldsymbol{\theta}^{*}_{\boldsymbol{\gamma}}) = \boldsymbol{\gamma}^{*}$$
(16b) $$\sqrt{N}\left(\widehat{\boldsymbol{\gamma}} - \boldsymbol{\gamma}^{*}\right) \xrightarrow{\ d\ } N_{r}\big(\,\mathbf{0}_{r}\,,\ \mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}})\,\big)$$
(16c) $$\text{with} \quad \mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}}) := \frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}^{\intercal}}\Big|_{\boldsymbol{\theta}_{\boldsymbol{\gamma}}=\boldsymbol{\theta}_{\boldsymbol{\gamma}}^{*}}\ \mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})\ \frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}}\Big|_{\boldsymbol{\theta}_{\boldsymbol{\gamma}}=\boldsymbol{\theta}_{\boldsymbol{\gamma}}^{*}}$$

where $\boldsymbol{\theta}_{\boldsymbol{\gamma}}^{*}$ denotes the true population value and $\xrightarrow{\ p\ }$ ($\xrightarrow{\ d\ }$) refers to convergence in probability (distribution) as the sample size $N$ tends to infinity. $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$ denotes the covariance matrix of the limiting distribution.

Proof.

The results are obtained via a straightforward application of standard results on transformations of convergent sequences of random variables (Mann & Wald, 1943; Serfling, 1980, Chapter 1.7), one of which is known as the multivariate delta method (Cramér, 1946; Serfling, 1980, Chapter 3.3). $\square$

Theorem 8 establishes that the estimator $\widehat{\boldsymbol{\gamma}}=\mathbf{g}(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$ is consistent and converges at a rate of $N^{-1/2}$ to the true population value $\boldsymbol{\gamma}^{*}=\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}}^{*})$. The rate of convergence is independent of the finite number of parameters and variables in the model. If the causal effect function contains auxiliary variables, then the results in Theorem 8 hold pointwise for any fixed value combination of the auxiliary variables.
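The delta-method covariance in Eq. (16c) can be checked numerically. The sketch below compares the delta-method standard deviation $|g'(\theta^{*})|\cdot\mathrm{SD}(\widehat{\theta})$ with a Monte Carlo estimate for a scalar example with $g(\theta)=\theta^{2}$; all numerical values are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star, sigma, N, reps = 2.0, 1.0, 500, 4000

def g(theta):
    # a smooth (illustrative) causal effect function
    return theta ** 2

# Monte Carlo: distribution of g(theta_hat) when theta_hat ~ N(theta*, sigma^2 / N)
theta_hats = rng.normal(theta_star, sigma / np.sqrt(N), size=reps)
mc_sd = np.std(g(theta_hats))

# Delta method: SD(g(theta_hat)) is approximately |g'(theta*)| * sigma / sqrt(N)
delta_sd = abs(2 * theta_star) * sigma / np.sqrt(N)
```

For this smooth $g$ and moderate $N$, the two standard deviations agree closely, illustrating the first-order accuracy of the approximation.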

Note that the results in Theorem 8 hold whenever an estimator satisfies Property A.2; they do not depend on a particular estimation method. However, if $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$ is estimated via maximum likelihood, the proposed estimator $\widehat{\boldsymbol{\gamma}}$ of the causal quantity has the following properties:

Theorem 9.

(asymptotic efficiency of $\widehat{\boldsymbol{\gamma}}=\mathbf{g}(\widehat{\boldsymbol{\theta}}^{ML}_{\boldsymbol{\gamma}})$) Let the situation be as in Theorem 8 and let $\widehat{\boldsymbol{\theta}}^{ML}_{\boldsymbol{\gamma}}$ denote the maximum likelihood estimator of $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$. Then, the estimator $\widehat{\boldsymbol{\gamma}}=\mathbf{g}(\widehat{\boldsymbol{\theta}}^{ML}_{\boldsymbol{\gamma}})$

  (i) is the maximum likelihood estimator $\widehat{\boldsymbol{\gamma}}^{ML}$ of the causal quantity $\boldsymbol{\gamma}$;

  (ii) is asymptotically efficient, namely the asymptotic covariance matrix $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}})$ reaches the Cramér–Rao lower bound.

Proof.

Result (i) is a direct consequence of the functional invariance of the ML estimator (Zehna, 1966; see, for example, Casella & Berger, 2002, Chapter 7.2), and result (ii) was established by Cramér (1946) and Rao (1945). $\square$
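The functional invariance used in result (i) can be illustrated numerically: maximizing the likelihood directly in terms of $\gamma = g(\theta)$ yields the same estimate as plugging $\widehat{\theta}^{ML}$ into $g$. A minimal sketch for a normal mean with known unit variance and the (hypothetical) transformation $g(\mu)=\exp(\mu)$:

```python
import numpy as np

data = np.array([1.2, 0.8, 1.5, 1.0, 0.5])  # hypothetical N(mu, 1) sample

# MLE of mu for a normal mean with known unit variance: the sample mean
mu_hat = data.mean()

# Functional invariance: the MLE of gamma = exp(mu) is exp(mu_hat)
gamma_hat = np.exp(mu_hat)

# Check: maximize the likelihood directly over a fine grid of gamma values
grid = np.linspace(0.5, 10.0, 200_001)
loglik = -0.5 * ((data[:, None] - np.log(grid)) ** 2).sum(axis=0)
gamma_direct = grid[np.argmax(loglik)]
```

Up to the grid resolution, `gamma_direct` coincides with `gamma_hat`, which is exactly the invariance property exploited in result (i).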

To make inference feasible in practical applications, a consistent estimator of $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}})$ is required.

Corollary 10.

(consistent estimator of $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}})$) Let the situation be as in Theorem 8 and let the estimator of $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}})$ be defined as:

(17) $$\widehat{\mathrm{AV}}(\sqrt{N}\widehat{\boldsymbol{\gamma}}) := \frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}^{\intercal}}\Big|_{\boldsymbol{\theta}_{\boldsymbol{\gamma}}=\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}}\ \widehat{\mathrm{AV}}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})\ \frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}}\Big|_{\boldsymbol{\theta}_{\boldsymbol{\gamma}}=\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}}$$

Then, $\widehat{\mathrm{AV}}(\sqrt{N}\widehat{\boldsymbol{\gamma}})$ is a consistent estimator of $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\gamma}})$ if $\widehat{\mathrm{AV}}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}) \xrightarrow{\ p\ } \mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$.

Proof.

Note that the partial derivatives $\frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}^{\intercal}}$ are continuous (see Property A.1) and that $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}} \xrightarrow{\ p\ } \boldsymbol{\theta}^{*}_{\boldsymbol{\gamma}}$ holds (see Property A.2). Thus, the result is a direct consequence of standard results on transformations of convergent sequences of random variables (Mann & Wald, 1943; Serfling, 1980, Chapter 1.7). $\square$

Equation (17) states that estimates of the asymptotic covariance matrix of a causal quantity $\widehat{\boldsymbol{\gamma}}$ can be computed based on (i) the estimate of the asymptotic covariance matrix $\widehat{\mathrm{AV}}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$ and (ii) the Jacobian matrix $\frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}^{\intercal}}$ (evaluated at $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$). Estimation results for (i) the asymptotic covariance matrix depend on the estimation method that is used to obtain $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$. For many standard procedures (e.g., 3SLS, ADF, GLS, GMM, ML, IV), theoretical results on the asymptotic covariance matrix are available in the corresponding literature, and estimators are implemented in various software packages (see, e.g., Muthén & Muthén, 1998–2017; Rosseel, 2012). Explicit expressions for (ii) the Jacobian matrices of the causal effect functions $\mathbf{g}_1$, $\mathbf{g}_2$, $g_3$, and $g_4$ are provided in the following corollary.
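As a sketch of how Eq. (17) can be put into practice when analytic Jacobians are not at hand, the following code approximates the Jacobian by central finite differences and forms the sandwich product. The effect function and parameter values in the illustration are hypothetical; in applications, the analytic expressions of the corollary below are preferable.

```python
import numpy as np

def jacobian_fd(g, theta, eps=1e-6):
    """Central finite-difference Jacobian of a (vector-valued) g at theta."""
    theta = np.asarray(theta, dtype=float)
    g0 = np.atleast_1d(g(theta))
    J = np.zeros((g0.size, theta.size))
    for j in range(theta.size):
        e = np.zeros(theta.size)
        e[j] = eps
        J[:, j] = (np.atleast_1d(g(theta + e)) - np.atleast_1d(g(theta - e))) / (2 * eps)
    return J

def av_gamma_hat(g, theta_hat, av_theta_hat):
    """Plug-in estimator of AV(sqrt(N) * gamma_hat), cf. Eq. (17)."""
    J = jacobian_fd(g, theta_hat)
    return J @ av_theta_hat @ J.T

# Illustration with a hypothetical effect function g(theta) = theta_0 * theta_1:
# the Jacobian at (2, 3) is (3, 2), so with AV(theta_hat) = I the result is 13.
av_hat = av_gamma_hat(lambda t: t[0] * t[1], np.array([2.0, 3.0]), np.eye(2))
```

The same two inputs, an estimate of $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$ from the chosen estimation method and a Jacobian evaluated at $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$, are all that Eq. (17) requires.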

Corollary 11.

(Jacobian matrices of basic causal effect functions) Let the causal effect functions $\mathbf{g}_1$, $\mathbf{g}_2$, $g_3$, and $g_4$ be defined as in Eqs. (12a), (13), (14), and (15), respectively. Then, the Jacobian matrices with respect to $\boldsymbol{\theta}$ are given by:

$$
\frac{\partial \mathbf{g}_1(\boldsymbol{\theta}_{\boldsymbol{\gamma}_1};\mathbf{x})}{\partial \boldsymbol{\theta}^\intercal} = \Big(\big(\mathbf{x}^\intercal \mathbf{1}_{\mathcal{I}}^\intercal (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\big) \otimes \big((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\big)\Big) \frac{\partial \operatorname{vec}\mathbf{C}}{\partial \boldsymbol{\theta}^\intercal} \tag{18a}
$$

$$
\frac{\partial \mathbf{g}_2(\boldsymbol{\theta}_{\boldsymbol{\gamma}_2})}{\partial \boldsymbol{\theta}^\intercal} = \mathbf{L}_n\Big[\mathbf{G}_{2,\mathbf{C}}\,\frac{\partial \operatorname{vec}\mathbf{C}}{\partial \boldsymbol{\theta}^\intercal} + \mathbf{G}_{2,\boldsymbol{\Psi}}\,\frac{\partial \operatorname{vec}\boldsymbol{\Psi}}{\partial \boldsymbol{\theta}^\intercal}\Big] \tag{18b}
$$

$$
\frac{\partial g_3(\boldsymbol{\theta}_{\gamma_3};\mathbf{x},\mathbf{v}_{\mathcal{N}})}{\partial \boldsymbol{\theta}^\intercal} = f(\mathbf{v}_{\mathcal{N}}\mid do(\mathbf{x}))\,\big[\mathbf{G}_{3,\boldsymbol{\mu}},\mathbf{G}_{3,\boldsymbol{\Sigma}}\big] \begin{pmatrix} \mathbf{1}_{\mathcal{N}}^\intercal \,\frac{\partial \mathbf{g}_1(\boldsymbol{\theta}_{\boldsymbol{\gamma}_1};\mathbf{x})}{\partial \boldsymbol{\theta}^\intercal} \\[4pt] (\mathbf{1}_{\mathcal{N}}^\intercal \otimes \mathbf{1}_{\mathcal{N}}^\intercal)\,\mathbf{D}_n\,\frac{\partial \mathbf{g}_2(\boldsymbol{\theta}_{\boldsymbol{\gamma}_2})}{\partial \boldsymbol{\theta}^\intercal} \end{pmatrix} \tag{18c}
$$

$$
\frac{\partial g_4(\boldsymbol{\theta}_{\gamma_4};\mathbf{x},\mathbf{y}^{\text{low}},\mathbf{y}^{\text{up}})}{\partial \boldsymbol{\theta}^\intercal} = \big[\mathbf{G}_{4,\mu},\mathbf{G}_{4,\sigma^2}\big] \begin{pmatrix} \boldsymbol{\imath}_{j}^\intercal \,\frac{\partial \mathbf{g}_1(\boldsymbol{\theta}_{\boldsymbol{\gamma}_1};\mathbf{x})}{\partial \boldsymbol{\theta}^\intercal} \\[4pt] \boldsymbol{\imath}_{(j-1)n+j}^\intercal \,\mathbf{D}_n\,\frac{\partial \mathbf{g}_2(\boldsymbol{\theta}_{\boldsymbol{\gamma}_2})}{\partial \boldsymbol{\theta}^\intercal} \end{pmatrix} \tag{18d}
$$

The unit vector in the upper entry of the vector in Eq. (18d) is of dimension $(n \times 1)$, and the unit vector in the lower entry is of dimension $(n^2 \times 1)$. The matrices denoted by $\mathbf{G}$ with a subscript are defined as follows:

$$
\begin{aligned}
\mathbf{G}_{2,\mathbf{C}} &:= (\mathbf{I}_{n^2}+\mathbf{K}_n)\big[\big((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}\big) \otimes \mathbf{I}_n\big]\big[(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal} \otimes \big((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\big)\big] \\
\mathbf{G}_{2,\boldsymbol{\Psi}} &:= \big[(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1} \otimes (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\big](\mathbf{I}_{\mathcal{N}} \otimes \mathbf{I}_{\mathcal{N}}) \\
\mathbf{G}_{3,\boldsymbol{\mu}} &:= (\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal \boldsymbol{\Sigma}_{\mathcal{N}}^{-1} \\
\mathbf{G}_{3,\boldsymbol{\Sigma}} &:= \tfrac{1}{2}\Big(\big[(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal \otimes (\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\big]\big(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1} \otimes \boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\big) - \operatorname{vec}\big(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\big)^\intercal\Big) \\
\mathbf{G}_{4,\mu} &:= -\frac{1}{\sigma_y}\left[\phi\left(\frac{y^{\text{up}}-\mu_y}{\sigma_y}\right) - \phi\left(\frac{y^{\text{low}}-\mu_y}{\sigma_y}\right)\right] \\
\mathbf{G}_{4,\sigma^2} &:= -\frac{1}{2\sigma_y^{2}}\left[\phi\left(\frac{y^{\text{up}}-\mu_y}{\sigma_y}\right)\left(\frac{y^{\text{up}}-\mu_y}{\sigma_y}\right) - \phi\left(\frac{y^{\text{low}}-\mu_y}{\sigma_y}\right)\left(\frac{y^{\text{low}}-\mu_y}{\sigma_y}\right)\right]
\end{aligned}
$$
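The closed-form entries $\mathbf{G}_{4,\mu}$ and $\mathbf{G}_{4,\sigma^2}$ can be checked numerically against central finite differences of the interval probability $g_4 = \Phi\big((y^{\text{up}}-\mu_y)/\sigma_y\big) - \Phi\big((y^{\text{low}}-\mu_y)/\sigma_y\big)$. The following sketch uses hypothetical values for the interventional moments and bounds; nothing here is taken from the paper's data:

```python
import numpy as np
from scipy.stats import norm

def interval_prob(mu, sigma2, y_low, y_up):
    # g4: interventional probability P(y_low <= Y <= y_up) for Y ~ N(mu, sigma2)
    s = np.sqrt(sigma2)
    return norm.cdf((y_up - mu) / s) - norm.cdf((y_low - mu) / s)

def G4_analytic(mu, sigma2, y_low, y_up):
    # closed-form derivatives of g4 with respect to mu and sigma^2
    s = np.sqrt(sigma2)
    z_up, z_low = (y_up - mu) / s, (y_low - mu) / s
    G4_mu = -(norm.pdf(z_up) - norm.pdf(z_low)) / s
    G4_s2 = -(norm.pdf(z_up) * z_up - norm.pdf(z_low) * z_low) / (2 * sigma2)
    return G4_mu, G4_s2

# hypothetical interventional moments and bounds
mu, sigma2, y_low, y_up = 10.0, 400.0, -40.0, 80.0
G4_mu, G4_s2 = G4_analytic(mu, sigma2, y_low, y_up)

# central finite differences of g4 in both arguments
h = 1e-5
fd_mu = (interval_prob(mu + h, sigma2, y_low, y_up)
         - interval_prob(mu - h, sigma2, y_low, y_up)) / (2 * h)
fd_s2 = (interval_prob(mu, sigma2 + h, y_low, y_up)
         - interval_prob(mu, sigma2 - h, y_low, y_up)) / (2 * h)
assert np.isclose(G4_mu, fd_mu, atol=1e-8)
assert np.isclose(G4_s2, fd_s2, atol=1e-8)
```

Such a finite-difference check is a cheap safeguard when implementing delta-method standard errors by hand.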

where $\mathbf{L}_n$, $\mathbf{D}_n$, and $\mathbf{K}_n$ denote the elimination matrix, duplication matrix, and commutation matrix for $n \times n$ matrices, respectively (Magnus & Neudecker, 1979, 1980). $\mu_y$ and $\sigma_y$ denote univariate interventional moments.

Proof.

See Appendix. $\square$

Note that the Jacobian matrix for interventional probabilities stated in Eq. (18d) is given for a single outcome variable $Y = V_j$ (i.e., $|\mathcal{Y}| = K_y = 1$). For simplicity of notation, the derivatives in Corollary 11 are taken with respect to the entire parameter vector $\boldsymbol{\theta}$. Recall that a causal quantity is a function of the $s \times 1$ subvector $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$. Consequently, the $r \times (q+p)$ Jacobian matrix $\frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}^\intercal}$ will contain $(q+p-s)$ columns with zero entries, which can be eliminated by post-multiplication with an appropriate selection matrix.
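This column-elimination step can be sketched with a toy Jacobian; the matrix `J` and its zero pattern below are hypothetical:

```python
import numpy as np

# toy 2 x 4 Jacobian in which the 2nd and 4th parameters do not enter
# the causal quantity, so the corresponding columns are zero
J = np.array([[1.0, 0.0, 2.0, 0.0],
              [3.0, 0.0, 4.0, 0.0]])

keep = np.flatnonzero(~np.all(J == 0.0, axis=0))  # indices of nonzero columns
S = np.eye(J.shape[1])[:, keep]                   # (q+p) x s selection matrix

J_reduced = J @ S  # multiplying by the selection matrix drops the zero columns
assert J_reduced.shape == (2, 2)
assert np.allclose(J_reduced, [[1.0, 2.0], [3.0, 4.0]])
```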

These asymptotic results can be used for approximate causal inference based on finite samples, as will be illustrated in the following section.

Illustration

We illustrate the method proposed in the previous sections using simulated data. In this way, the data-generating process is known, and we know with certainty that the model is correctly specified. For didactic purposes, we link the simulated data to a real-world example: the data are simulated according to a modified version of the model used in a study by Ito et al. (1998).

Our simulation mimics an observational study in which $N=100$ persons are randomly drawn from a target population of homogeneous individuals and measured at three successive occasions ($\Delta t = 6$ min). Variables $X_1, X_2, X_3$ represent mean-centered blood insulin levels at the three measurement occasions, measured in micro international units per milliliter (mcIU/ml). Variables $Y_1, Y_2, Y_3$ represent mean-centered blood glucose levels measured in milligrams per deciliter (mg/dl).
Mean-centered blood glucose levels below $-40\ \text{mg/dl}$ or above $80\ \text{mg/dl}$ indicate hypo- or hyperglycemia, respectively. Both hypo- and hyperglycemia should be avoided, yielding an acceptable range for blood glucose levels of $[y^{\text{low}}, y^{\text{up}}] = [-40, 80]$. The graph of the assumed linear graph-based model is depicted in Fig. 2.
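For a normally distributed glucose outcome, the probability of staying within this acceptable range is a difference of two normal CDF values. A minimal sketch with hypothetical interventional moments $\mu_y = 5$ and $\sigma_y = 25$ (not estimates from the study):

```python
from scipy.stats import norm

def prob_in_range(mu_y, sigma_y, y_low=-40.0, y_up=80.0):
    # P(y_low <= Y <= y_up) for Y ~ N(mu_y, sigma_y^2):
    # the probability of treatment success for the glucose outcome
    return norm.cdf((y_up - mu_y) / sigma_y) - norm.cdf((y_low - mu_y) / sigma_y)

print(round(prob_in_range(mu_y=5.0, sigma_y=25.0), 4))  # → 0.9627
```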

Figure 2. Causal Graph (ADMG) in the Absence of Interventions. Figure 2 displays the ADMG corresponding to the linear graph-based model. The dashed bidirected edge between $X_1$ and $Y_1$ represents a correlation due to an unobserved common cause. Directed edges are labeled with the corresponding path coefficients that quantify direct causal effects. For example, the direct causal effect of $X_2$ on $Y_3$ is quantified by $c_{yx}$.
Traditionally, disturbances (residuals, error terms), denoted by $\boldsymbol{\varepsilon}$ in Eq. (19), are not explicitly drawn in an ADMG.

Each directed edge corresponds to a direct causal effect and is quantified by a nonzero structural coefficient. We assume that direct causal effects are identical (stable) over time. For example, we assign the same parameter $c_{yx}$ to the directed edges $X_1 \xrightarrow{c_{yx}} Y_2$ and $X_2 \xrightarrow{c_{yx}} Y_3$ to indicate that we assume time-stable direct effects of $X_{t-1}$ on $Y_t$.
The absence of a directed edge from, say, $X_1$ to $Y_3$ in the ADMG encodes the assumption that there is no direct effect of insulin levels at $t=1$ on glucose levels at $t=3$.
In other words, we assume that $X_1$ only indirectly affects $Y_3$ via $X_2$ or via $Y_2$. Furthermore, we assume the absence of effect modification, which justifies the use of the following system of linear structural equations:

$$
\underbrace{\begin{pmatrix} X_1\\ Y_1\\ X_2\\ Y_2\\ X_3\\ Y_3 \end{pmatrix}}_{\mathbf{V}} = \underbrace{\begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ c_{xx}&c_{xy}&0&0&0&0\\ c_{yx}&c_{yy}&0&0&0&0\\ 0&0&c_{xx}&c_{xy}&0&0\\ 0&0&c_{yx}&c_{yy}&0&0 \end{pmatrix}}_{\mathbf{C}} \begin{pmatrix} X_1\\ Y_1\\ X_2\\ Y_2\\ X_3\\ Y_3 \end{pmatrix} + \underbrace{\begin{pmatrix} \varepsilon_{x1}\\ \varepsilon_{y1}\\ \varepsilon_{x2}\\ \varepsilon_{y2}\\ \varepsilon_{x3}\\ \varepsilon_{y3} \end{pmatrix}}_{\boldsymbol{\varepsilon}} \tag{19}
$$

Each bidirected edge in the ADMG indicates the existence of an unobserved confounder. In linear graph-based models, unobserved confounders are formalized as covariances between error terms. The covariance matrix of the error terms implied by the graph is given by:

$$
\boldsymbol{\Psi} = \begin{pmatrix}
\psi_{x_1x_1}&\psi_{x_1y_1}&\psi_{x_1x_2}&0&0&0\\
\psi_{x_1y_1}&\psi_{y_1y_1}&0&\psi_{y_1y_2}&0&0\\
\psi_{x_1x_2}&0&\psi_{xx}&\psi_{xy}&\psi_{x_2x_3}&0\\
0&\psi_{y_1y_2}&\psi_{xy}&\psi_{yy}&0&\psi_{y_2y_3}\\
0&0&\psi_{x_2x_3}&0&\psi_{xx}&\psi_{xy}\\
0&0&0&\psi_{y_2y_3}&\psi_{xy}&\psi_{yy}
\end{pmatrix} \tag{20}
$$

The entries $\psi_{x_1x_1}$, $\psi_{y_1y_1}$, and $\psi_{x_1y_1}$ describe the (co-)variances of the initial values of blood insulin and blood glucose. (Co-)variances of error terms at time 2 and time 3 are assumed to be constant and are denoted by $\psi_{xx}$, $\psi_{yy}$, and $\psi_{xy}$.
Serial correlations in the $X$-series ($Y$-series) are denoted by $\psi_{x_1x_2}$, $\psi_{x_2x_3}$ ($\psi_{y_1y_2}$, $\psi_{y_2y_3}$). The covariances $\mathrm{COV}(X_t, Y_t)$, $t=1,2,3$, encode the assumption that the contemporaneous relationship of blood insulin and blood glucose is confounded.
The absence of a bidirected edge between $X_t$ and $Y_{t+1}$ encodes the assumption that there are no unobserved confounders that affect the lagged relationship of blood insulin and blood glucose.
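When coding such a patterned covariance matrix, indexing slips are easy to make; assembling $\boldsymbol{\Psi}$ programmatically and checking symmetry and positive definiteness guards against them. The numerical values below are hypothetical placeholders, not estimates:

```python
import numpy as np

# hypothetical error (co-)variances (not estimates from the paper)
p_x1x1, p_y1y1, p_x1y1 = 120.0, 300.0, 30.0
p_xx, p_yy, p_xy = 80.0, 200.0, 20.0
p_x1x2, p_x2x3 = 15.0, 10.0
p_y1y2, p_y2y3 = 25.0, 18.0

# Psi as in Eq. (20): sparsity pattern mirrors the bidirected edges of the ADMG
Psi = np.array([
    [p_x1x1, p_x1y1, p_x1x2, 0.0,    0.0,    0.0   ],
    [p_x1y1, p_y1y1, 0.0,    p_y1y2, 0.0,    0.0   ],
    [p_x1x2, 0.0,    p_xx,   p_xy,   p_x2x3, 0.0   ],
    [0.0,    p_y1y2, p_xy,   p_yy,   0.0,    p_y2y3],
    [0.0,    0.0,    p_x2x3, 0.0,    p_xx,   p_xy  ],
    [0.0,    0.0,    0.0,    p_y2y3, p_xy,   p_yy  ],
])

assert np.allclose(Psi, Psi.T)              # symmetric by construction
assert np.all(np.linalg.eigvalsh(Psi) > 0)  # positive definite for these values
```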

Further, we assume that the error terms follow a multivariate normal distribution. Thus, the linear graph-based model is parametrized by the following vector of distinct, functionally unrelated, and unknown parameters: $\boldsymbol{\theta}^\intercal = (\boldsymbol{\theta}^\intercal_{\mathcal{F}}, \boldsymbol{\theta}^\intercal_{\mathcal{P}})$ with $\boldsymbol{\theta}^\intercal_{\mathcal{F}} = (c_{xx}, c_{xy}, c_{yx}, c_{yy})$ and $\boldsymbol{\theta}^\intercal_{\mathcal{P}} = (\psi_{x_1x_1}, \psi_{y_1y_1}, \psi_{x_1y_1}, \psi_{xx}, \psi_{yy}, \psi_{xy}, \psi_{x_1x_2}, \psi_{x_2x_3}, \psi_{y_1y_2}, \psi_{y_2y_3})$.

We are interested in the effect of an intervention on blood insulin at the second measurement occasion (i.e., $X_2$) on blood glucose levels at the third measurement occasion (i.e., $Y_3$). We set the interventional level of blood insulin to one standard deviation, namely $x_2 = \sqrt{\mathrm{V}(X_2)} = 11.54$. The graph of the causal model under the intervention $do(x_2)$ is depicted in Fig. 3.

Figure 3. Causal Graph (ADMG) Under the Intervention $do(x_2)$. Figure 3 displays the ADMG of the graph-based model under the intervention $do(x_2)$. Edges that enter node $X_2$ (i.e., that have an arrowhead pointing at node $X_2$) are removed, since the value of $X_2$ is now set by the experimenter via the intervention $do(x_2)$.
The interventional value $x_2$ is determined neither by the values of the causal predecessors of $X_2$ nor by unobserved confounding variables. All other causal relations are unaffected by the intervention, reflecting the assumption of modularity.

Based on the above description of the research situation and the hypothetical experiment, all terms in Definition 1 are uniquely determined and given by:

(21)
$$\begin{aligned}
n&=6, \ \mathcal{X}=\{X_2\}, \ \mathcal{Y}=\{Y_3\}, \ K_x=K_y=1, \ \mathcal{I}=\{3\}, \ \mathcal{N}=\{1,2,4,5,6\}\\
\mathbf{x}&=x_2=\sqrt{\text{V}(X_2)}, \
\mathbf{1}_{\mathcal{I}}=\begin{pmatrix} 0\\ 0\\ 1\\ 0\\ 0\\ 0 \end{pmatrix}, \
\mathbf{1}_{\mathcal{N}}=\begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 0&0&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1 \end{pmatrix}, \
\mathbf{I}_{\mathcal{N}}=\begin{pmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1 \end{pmatrix}
\end{aligned}$$
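The selection matrices in Eq. (21) are mechanical to construct once the index sets $\mathcal{I}$ and $\mathcal{N}$ are fixed. The following sketch (Python/NumPy, not part of the original analysis) builds $\mathbf{1}_{\mathcal{I}}$, $\mathbf{1}_{\mathcal{N}}$, and $\mathbf{I}_{\mathcal{N}}$ from the 1-based index sets above; the relation $\mathbf{I}_{\mathcal{N}} = \mathbf{1}_{\mathcal{N}}\mathbf{1}_{\mathcal{N}}^\intercal$ is read off the displayed matrices and assumed here.

```python
import numpy as np

n = 6
I_set = [3]               # indices of intervened variables (1-based, as in Eq. 21)
N_set = [1, 2, 4, 5, 6]   # indices of non-intervened variables

# 1_I: n x |I| selection matrix picking out the intervened coordinate(s)
one_I = np.zeros((n, len(I_set)))
for col, idx in enumerate(I_set):
    one_I[idx - 1, col] = 1.0

# 1_N: n x |N| selection matrix picking out the non-intervened coordinates
one_N = np.zeros((n, len(N_set)))
for col, idx in enumerate(N_set):
    one_N[idx - 1, col] = 1.0

# I_N: n x n identity matrix with the intervened row/column zeroed out
I_N = one_N @ one_N.T
```

The same construction generalizes to any number of intervened variables by extending `I_set`.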

The target quantity of causal inference in this example is the interventional distribution $P(Y_3 \mid do(x_2))$, which can be characterized, for example, by the following causal quantities:Footnote 13

(22a) $$\gamma_{1} := \mathrm{E}(Y_3 \mid do(x_2)) = c_{yx}x_2$$
(22b) $$\gamma_{2} := \mathrm{V}(Y_3 \mid do(x_2)) = c_{yx}^2 c_{yy}^2 \psi_{x_1x_1} + c_{yy}^4 \psi_{y_1y_1} + 2 c_{yx} c_{yy}^3 \psi_{x_1y_1} + (1 + c_{yy}^2)\,\psi_{yy} + 2 c_{yy}^3 \psi_{y_1y_2} + 2 c_{yy} \psi_{y_2y_3}$$
(22c) $$\gamma_{3} := f(y_3 \mid do(x_2)) = (2\pi)^{-\frac{1}{2}}\,\big(\mathrm{V}(Y_3 \mid do(x_2))\big)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\frac{(y_3-c_{yx}x_2)^2}{\mathrm{V}(Y_3 \mid do(x_2))}\right)$$
(22d) $$\gamma_{4} := P(y^{low} \le Y_3 \le y^{up} \mid do(x_2)) = \Phi\left(\frac{y^{up}-\mathrm{E}(Y_3 \mid do(x_2))}{\sqrt{\mathrm{V}(Y_3 \mid do(x_2))}}\right) - \Phi\left(\frac{y^{low}-\mathrm{E}(Y_3 \mid do(x_2))}{\sqrt{\mathrm{V}(Y_3 \mid do(x_2))}}\right)$$

where $\Phi$ denotes the cumulative distribution function (cdf) of the standard normal distribution. A central goal of a treatment at time 2 (i.e., $do(x_2)$) is to avoid hypo- or hyperglycemia at time 3. We therefore refer to the event $\{y^{low} \le Y_3 \le y^{up} \mid do(x_2)\}$ as treatment success. Using this terminology, the causal quantity $\gamma_{4}$ from Eq. (22d) is called the probability of treatment success.
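To make Eq. (22d) concrete, the following sketch evaluates the probability of treatment success numerically. The numbers are taken from the illustration itself: the population interventional distribution $P(Y_3 \mid do(x_2)) = N_1(-0.6x_2,\,1096.39)$ quoted later in the text, and the bounds $y^{low}=-40$, $y^{up}=80$. The function is a generic implementation of Eq. (22d), not the authors' code.

```python
from math import sqrt
from scipy.stats import norm

# Population values of the interventional distribution used in the example:
# E(Y_3 | do(x_2)) = -0.6 * x_2 and V(Y_3 | do(x_2)) = 1096.39.
c_yx = -0.6        # total causal effect of X_2 on Y_3 (population value)
var_do = 1096.39   # interventional variance, Eq. (22b)

def prob_treatment_success(x2, y_low=-40.0, y_up=80.0):
    """gamma_4 in Eq. (22d): P(y_low <= Y_3 <= y_up | do(x_2))."""
    mean_do = c_yx * x2            # gamma_1, Eq. (22a)
    sd_do = sqrt(var_do)
    return (norm.cdf((y_up - mean_do) / sd_do)
            - norm.cdf((y_low - mean_do) / sd_do))

p = prob_treatment_success(11.54)
print(round(p, 3))  # -> 0.837 (population value at x_2 = 11.54)
```

Because only the interventional mean depends on $x_2$, shifting the interventional level slides a fixed-width normal density across the target range $[y^{low}, y^{up}]$.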

The causal effect functions corresponding to these causal quantities are stated below and satisfy Property A.1:

(23a) $$\gamma_{1} = g_1(\theta_{\gamma_1}; x_2), \ \text{with} \ \theta_{\gamma_1} = c_{yx}$$
(23b) $$\gamma_{2} = g_2(\boldsymbol{\theta}_{\gamma_2}), \ \text{with} \ \boldsymbol{\theta}_{\gamma_2} = (c_{yx}, c_{yy}, \psi_{x_1x_1}, \psi_{x_1y_1}, \psi_{y_1y_1}, \psi_{y_1y_2}, \psi_{y_2y_3}, \psi_{yy})^\intercal$$
(23c) $$\gamma_{3} = g_3(\boldsymbol{\theta}_{\gamma_3}; x_2, y_3), \ \text{with} \ \boldsymbol{\theta}_{\gamma_3} = \boldsymbol{\theta}_{\gamma_2}$$
(23d) $$\gamma_{4} = g_4(\boldsymbol{\theta}_{\gamma_4}; x_2, y^{low}, y^{up}), \ \text{with} \ \boldsymbol{\theta}_{\gamma_4} = \boldsymbol{\theta}_{\gamma_2}$$

Figure 4 displays the pdfs of the interventional distributions that result from three distinct (hypothetical) experiments in which different interventional levels are chosen, namely $-11.54$, $0$, and $11.54$. Note that the interventional mean $\gamma_{1} = g_1(\theta_{\gamma_1}; x_2)$ is functionally dependent on the interventional level $x_2$ (see also Eq. [22a]). Thus, the location of the interventional distributions in Fig. 4 depends on the interventional level $x_2$.
By contrast, the interventional variance $\gamma_{2} = g_2(\boldsymbol{\theta}_{\gamma_2})$ is functionally independent of $x_2$ (see also Eq. [22b]). Consequently, the scale of the interventional distributions in Fig. 4 is the same for all interventional levels.

Figure 4. Interventional Distributions for Three Distinct Treatment Levels. Figure 4 displays several features of the interventional distribution for three distinct interventional levels $x_2=11.54$ (solid), $x_2'=0$ (dashed), and $x_2''=-11.54$ (dotted). The pdfs of the interventional distributions are represented by the bell-shaped curves. The interventional means are represented by vertical line segments. The interventional variances correspond to the width of the bell-shaped curves and are equal across the different interventional levels. The probabilities of treatment success are represented by the shaded areas below the curves in the interval $[-40, 80]$.

Equations (23a–d) display the causal effect functions corresponding to the causal quantities $\gamma_{1},\dots,\gamma_{4}$. Definition 6 states that the parametrized causal quantities $\gamma_{1},\dots,\gamma_{4}$ are identified if the corresponding parameters $\theta_{\gamma_1}$, $\boldsymbol{\theta}_{\gamma_2}$, $\boldsymbol{\theta}_{\gamma_3}$, and $\boldsymbol{\theta}_{\gamma_4}$ can be uniquely computed from the joint distribution of the observed variables. We show in the Appendix that the entire parameter vector $\boldsymbol{\theta}$ can be uniquely computed from the joint distribution of the observed variables.
In fact, the values of $\boldsymbol{\theta}$ can be uniquely computed from the covariance matrix of the observed variables alone.Footnote 14

The joint distribution of the observed variables is given by $\{P(\mathbf{v}, \boldsymbol{\theta}) \mid \boldsymbol{\theta} \in \boldsymbol{\Theta}\}$, where $P$ is the family of 6-dimensional multivariate normal distributions. We estimated all parameters simultaneously by minimizing the maximum likelihood discrepancy function between the model-implied covariance matrix and the sample covariance matrix. The ML estimator $\widehat{\boldsymbol{\theta}}^{ML}$ is consistent, asymptotically efficient, and asymptotically normally distributed (Bollen, 1989) and therefore satisfies Property A.2. Additionally, the asymptotic covariance matrix of the ML estimator is known (e.g., Bollen, 1989), and consistent estimates thereof are implemented in many statistical software packages (e.g., in the R package lavaan; Rosseel, 2012). The corresponding estimation results for $\boldsymbol{\theta}$ are displayed in Table 1.

Since Property A.1 and Property A.2 are satisfied, the asymptotic properties of the estimators $\widehat{\gamma}_1$, $\widehat{\gamma}_2$, $\widehat{\gamma}_3$, and $\widehat{\gamma}_4$ can be established via Theorem 8. The Jacobian matrices of the causal effect functions in Eq. (23) can be calculated according to Corollary 11. Estimates of the causal quantities are reported in Table 2, together with estimates of the asymptotic standard errors and approximate z-values.
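For the simplest of these quantities, $\gamma_1 = c_{yx}\,x_2$, the delta-method computation behind the reported standard errors reduces to scaling the ASE of $c_{yx}$ by $|x_2|$. The sketch below illustrates the general pattern $\mathrm{ASE}(\widehat{\gamma}) = \sqrt{J\,\widehat{\mathrm{Acov}}(\widehat{\boldsymbol{\theta}})\,J^\intercal}$, where $J$ is the Jacobian of the causal effect function. The standard error 0.15 for $c_{yx}$ is a hypothetical stand-in for ML output, not a value from Table 1.

```python
import numpy as np

# Delta-method sketch for gamma_1 = g_1(c_yx; x_2) = c_yx * x_2.
x2 = 11.54
c_yx_hat = -0.6   # population value of c_yx, used here as the point estimate
se_cyx = 0.15     # hypothetical asymptotic standard error of c_yx

gamma1_hat = c_yx_hat * x2
jacobian = np.array([x2])            # d gamma_1 / d c_yx = x_2
acov = np.array([[se_cyx ** 2]])     # asymptotic covariance of theta_gamma1
ase_gamma1 = float(np.sqrt(jacobian @ acov @ jacobian))

z_value = gamma1_hat / ase_gamma1    # approximate z-statistic for H0: gamma_1 = 0
```

For $\gamma_2,\dots,\gamma_4$ the Jacobian is a row vector over the eight entries of $\boldsymbol{\theta}_{\gamma_2}$, but the matrix product is identical in form.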

Table 1 Parameters in the Linear Graph-Based Model.

The estimation results $\widehat{\boldsymbol{\theta}}^{ML}$ for the model parameters $\boldsymbol{\theta}$ (using a covariance-based maximum likelihood estimator with $N=100$) are displayed together with the true population values used for data simulation. The z-values are reported for the null hypothesis that the population quantity equals zero. Structural coefficients are displayed in the upper part, and the variance–covariance parameters are displayed in the lower part. ASE = asymptotic standard error.

Table 2 Causal Quantities in the Linear Graph-Based Model.

The estimation results for the causal quantities $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$, and $\gamma_{4}$ are displayed together with the population values used for data simulation. The z-values are reported for the null hypothesis that the population quantity equals zero.
$^\dagger$The estimates $\widehat{\gamma}_{3}$ and $\widehat{\gamma}_{4}$ depend on $x_2$, $y_3$, $y_3^{low}$, or $y_3^{up}$ in a nonlinear way.
The displayed quantities are calculated for $x_2=11.54$, $y_3=0$, $y_3^{low}=-40$, and $y_3^{up}=80$. ASE = asymptotic standard error.

Figure 5. Estimate of the Probability Density Function of the Interventional Distribution. Figure 5 displays the estimated interventional pdf $\widehat{f}(y_3 \mid do(x_2=11.54))$ (black solid line) with pointwise 95% confidence intervals, that is, $\pm 1.96 \cdot \widehat{\mathrm{ASE}}[\widehat{f}(y_3 \mid do(x_2=11.54))]$ (gray shaded area). The true population interventional pdf $f(y_3 \mid do(x_2=11.54))$ is displayed by the gray dashed line.

From Theorem 8, we know that $\widehat{\gamma}_3 = g_3(\widehat{\boldsymbol{\theta}}_{\gamma_3}; x_2, y_3) = \widehat{f}(y_3 \mid do(x_2)) \xrightarrow{\ p\ } f(y_3 \mid do(x_2))$ holds pointwise for any $(y_3, x_2) \in \mathbb{R}^2$. Figure 5 displays the estimated interventional pdf together with its population counterpart, as well as pointwise asymptotic confidence intervals, for the fixed interventional level $x_2=11.54$ over the range $y_3 \in [-100, 100]$.

Figure 5 shows that a sample size of $N=100$ yields very precise estimates of the interventional pdf over the whole range of values $y_3 \in [-100, 100]$, which is a consequence of the rate of convergence $N^{-1/2}$ established in Theorem 8.

Figure 6 displays the estimated probability that the blood glucose level falls into the acceptable range (i.e., hypo- and hyperglycemia are avoided) at $t=3$, given an intervention $do(x_2)$ on blood insulin at $t=2$, as a function of the interventional level $x_2$. As in the case of the interventional pdf, Fig. 6 shows that a sample size of $N=100$ yields very precise estimates of interventional probabilities over the whole range of values $x_2 \in [-50, 50]$.
Given the intervention $do(x_2=11.54)$, the probability of treatment success (i.e., a blood glucose level within the acceptable range at $t=3$) equals .85, as depicted in Fig. 6. Since the curve in Fig. 6 has a unique (local and global) maximum, the interventional level can be chosen such that the probability of treatment success is maximized. The maximal probability of treatment success is equal to .94 and can be obtained by administering the intervention $do(x_2^*=-38.3)$. Note that the curve is relatively flat around its maximum, meaning that slight deviations from the optimal treatment level will result in only a small decrease in the probability of treatment success.
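The optimal interventional level can be found numerically by maximizing $g_4$ over $x_2$. The following sketch does this with the population values of the example ($c_{yx}=-0.6$, $\mathrm{V}(Y_3 \mid do(x_2))=1096.39$); the optimum of $-38.3$ reported in the text is based on the estimated parameters, so the population-level optimum computed here differs somewhat.

```python
from math import sqrt
from scipy.optimize import minimize_scalar
from scipy.stats import norm

c_yx, sd_do = -0.6, sqrt(1096.39)   # population values from the example
y_low, y_up = -40.0, 80.0           # acceptable range for Y_3

def neg_success_prob(x2):
    """Negative of gamma_4 (Eq. 22d), for use with a minimizer."""
    mean_do = c_yx * x2
    return -(norm.cdf((y_up - mean_do) / sd_do)
             - norm.cdf((y_low - mean_do) / sd_do))

res = minimize_scalar(neg_success_prob, bounds=(-50.0, 50.0), method="bounded")
x2_opt, p_max = res.x, -res.fun

# For a normal interventional distribution with x_2-independent variance, the
# optimum places the interventional mean at the midpoint of the target range:
# c_yx * x2_opt = (y_low + y_up) / 2, i.e., x2_opt = 20 / (-0.6) ~ -33.3.
```

The flatness of the curve around the optimum, noted above, follows from the vanishing first derivative of the normal cdf difference at the midpoint condition.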

Figure 6. Estimated Probability of Treatment Success. Figure 6 displays the estimated probability of treatment success (i.e., $\widehat{\gamma}_4=\widehat{P}(-40 \le Y_3 \le 80 \mid do(x_2))$; black solid line) as a function of the interventional level $x_2$. The pointwise confidence intervals $\pm 1.96\cdot \widehat{\mathrm{ASE}}[\widehat{P}(-40 \le Y_3 \le 80 \mid do(x_2))]$ are displayed by the (very narrow) gray shaded area around the solid black line (see electronic version for high resolution). The vertical dashed lines are drawn at the interventional levels $x_2=11.54$ and $x_2=-38.3$. The horizontal dashed lines correspond to the probabilities of treatment success for the interventions $do(X_2=11.54)$ and $do(X_2=-38.3)$.
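Because $Y_3 \mid do(x_2)$ is normal with mean $-0.6\,x_2$ and variance $1096.39$ (the population values reported in the next subsection; Fig. 6 itself is based on estimated parameters, so its numbers .85, .94, and $-38.3$ differ slightly from the population-based values computed here), the success probability and its maximizer can be sketched as follows:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_success(x2, lo=-40.0, hi=80.0, slope=-0.6, var=1096.39):
    """P(lo <= Y_3 <= hi | do(x2)) for Y_3 | do(x2) ~ N(slope * x2, var)."""
    mu, sd = slope * x2, sqrt(var)
    return norm_cdf((hi - mu) / sd) - norm_cdf((lo - mu) / sd)

# Success probability of the intervention do(x2 = 11.54); approx. .84
# under the population values (vs. .85 under the estimated parameters).
print(round(p_success(11.54), 2))

# A normal interval probability with fixed variance is maximized when the
# mean hits the interval midpoint: -0.6 * x2 = (-40 + 80) / 2 = 20.
x2_opt = 20.0 / -0.6
print(round(x2_opt, 1), round(p_success(x2_opt), 2))  # approx. -33.3 and .93
```

Under the estimated parameters underlying Fig. 6, the corresponding maximizer is $do(x_2=-38.3)$ with a success probability of .94; the small discrepancies stem solely from the difference between population values and estimates.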

Interventional Distribution vs. Conditional Distribution

To illustrate the conceptual differences between the interventional and conditional distribution, we use the numeric population values from the first row of Table 1 and Table 2, respectively. The interventional distribution is given by $P(Y_3 \mid do(x_2))=N_1(-0.6x_2,\,1096.39)$ and differs from both the conditional distribution, $P(Y_3 \mid X_2=x_2)=N_1(1.76x_2,\,353.99)$, and the unconditional distribution, $P(Y_3)=N_1(0,\,766.91)$, as depicted in Fig. 7 (Footnote 15).

Figure 7. Marginal, Conditional, and Interventional Distribution. The panels depict (i) the pdf of the unconditional distribution $P(Y_3)$ (top panel), (ii) the conditional distribution $P(Y_3 \mid X_2=x_2)$ (middle panel), and (iii) the interventional distribution $P(Y_3 \mid do(x_2))$ (bottom panel). In (ii) the level $x_2=11.54\ \text{mg/dl}$ was passively measured, whereas in (iii) the intervention $do(X_2=11.54)$ was performed. The central vertical black solid lines are drawn at the mean, and the shaded areas cover $95\%$ of the probability mass.

The unconditional distribution (top panel) corresponds to a situation where no prior observation is available and no intervention is performed. Note that the conditional distribution (middle panel) is shifted to the right (for $X_2=11.54$), whereas the interventional distribution (bottom panel) is shifted to the left for $do(X_2=11.54)$, as displayed in Fig. 7. The differences between $P(Y_3 \mid X_2=11.54)$ and $P(Y_3 \mid do(X_2=11.54))$ reflect the fundamental difference between the mode of seeing, namely passive observation, and the mode of doing, namely active intervention (Pearl, 2009).

On the one hand, observing a blood insulin level of $X_2=11.54$ at the second measurement occasion leads to an expected value of $20.31\ \text{mg/dl}$ for blood glucose at the third measurement occasion (i.e., $\mathrm{E}(Y_3 \mid X_2=11.54)=1.76\cdot 11.54=20.31$). Using the conditional variance $\mathrm{V}(Y_3 \mid X_2=11.54)=353.99$ to compute a $95\%$ forecast interval yields $P(Y_3 \in [-16.56,\,57.19] \mid X_2=11.54)=.95$, as indicated by the shaded area under the curve in the middle panel of Fig. 7.

On the other hand, setting the level of blood insulin to $do(X_2=11.54)$ at the second occasion by an active intervention leads to an expected value of $-6.92\ \text{mg/dl}$ for blood glucose at the third measurement occasion (i.e., $\mathrm{E}(Y_3 \mid do(x_2=11.54))=-0.60\cdot 11.54=-6.92$). Using the interventional variance $\mathrm{V}(Y_3 \mid do(x_2=11.54))=1096.39$ to compute a $95\%$ forecast interval yields $P(Y_3 \in [-71.82,\,57.97] \mid do(x_2=11.54))=.95$, as indicated by the shaded area under the curve in the bottom panel of Fig. 7.
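Both forecast intervals follow directly from the stated means and variances; a minimal sketch using the population values quoted above:

```python
from math import sqrt

def forecast_interval(mean, var, z=1.96):
    """Central 95% interval of a normal distribution: mean +/- z * sd."""
    sd = sqrt(var)
    return (mean - z * sd, mean + z * sd)

x2 = 11.54

# Conditional (mode of seeing): Y_3 | X_2 = x2 ~ N(1.76 * x2, 353.99)
lo_c, hi_c = forecast_interval(1.76 * x2, 353.99)

# Interventional (mode of doing): Y_3 | do(x2) ~ N(-0.6 * x2, 1096.39)
lo_i, hi_i = forecast_interval(-0.6 * x2, 1096.39)

print(round(lo_c, 2), round(hi_c, 2))  # close to [-16.56, 57.19]
print(round(lo_i, 2), round(hi_i, 2))  # close to [-71.82, 57.97]
```

The conditional interval is narrower and centered to the right of zero; the interventional interval is wider and centered to the left, mirroring the middle and bottom panels of Fig. 7.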

Based on both the conditional and interventional distribution, valid statements about values of blood glucose can be made. A patient who measures a high level of insulin at time 2 in the absence of an intervention (e.g., self-measured monitoring of blood insulin; mode of seeing) will predict a high level of blood glucose at time 3 based on the conditional distribution. A physician who actively administers a high dose of insulin at time 2 (e.g., via an insulin injection; mode of doing) will forecast a low value of blood glucose at time 3 based on the interventional distribution.

Incorrect conclusions arise if the conditional distribution is used to forecast the effects of interventions or, the other way around, the interventional distribution is used to predict future values of blood glucose in the absence of interventions. For example, a physician who correctly uses the interventional distribution to choose the optimal treatment level would administer $do(X_2=-38.3)$, resulting in a $94\%$ probability of treatment success (see Fig. 6). A physician who erroneously uses the conditional distribution to specify the optimal treatment level would administer $do(X_2=11.54)$. Such a non-optimal intervention would result in an $85\%$ probability of treatment success. Thus, the incorrect decision results in an absolute decrease of 9 percentage points in the probability of treatment success (Gische et al., 2021).

Discussion

Graph-based causal models combine a priori assumptions about the causal structure of the data-generating mechanism (e.g., encoded in an ADMG) with observational data to make inferences about the effects of (hypothetical) interventions. Causal quantities are defined via the do-operator and may comprise any feature of the interventional distribution (e.g., the mean vector, the covariance matrix, the pdf). This flexibility allows researchers to analyze effects of interventions beyond changes in the mean level. Causal effect functions map the parameters of the model implied joint distribution of the observed variables onto causal quantities and therefore enable analyzing causal quantities using tools from the literature on traditional SEM. We propose an estimator for causal quantities and show that it is consistent and converges at a rate of $N^{-1/2}$. In case of maximum likelihood estimation, the proposed estimator is asymptotically efficient.

In the remainder of the paper, we discuss several situations in which linear graph-based models are misspecified and how the proposed procedure can be extended to be applicable in such situations.

Causal Structure, Modularity, and Conditional Interventions

A researcher’s beliefs about the causal structure are encoded in the graph. Based on the concept of d-separation, every ADMG implies a set of (conditional) independence relations between observable variables that can be tested parametrically (Chen, Tian, & Pearl, 2014; Shipley, 2003; Thoemmes, Rosseel, & Textor, 2018) or nonparametrically (Richardson, 2003; Tian & Pearl, 2002b). One drawback of these tests is that they only distinguish between equivalence classes of ADMGs and do not evaluate the validity of a single graph.
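To make the idea of a testable implication concrete, consider a hypothetical three-variable chain $Z \rightarrow X \rightarrow Y$ (not the model from the empirical example). d-separation implies $Z \perp Y \mid X$, which in a linear model corresponds to a vanishing partial correlation:

```python
import random

random.seed(1)

def corr(u, v):
    """Pearson correlation of two equal-length samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sum((x - mu) ** 2 for x in u) ** 0.5
    sv = sum((y - mv) ** 2 for y in v) ** 0.5
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (su * sv)

def partial_corr(u, v, w):
    """First-order partial correlation of u and v controlling for w."""
    ruv, ruw, rvw = corr(u, v), corr(u, w), corr(v, w)
    return (ruv - ruw * rvw) / (((1 - ruw**2) * (1 - rvw**2)) ** 0.5)

# Simulate the hypothetical chain Z -> X -> Y with arbitrary coefficients.
N = 50_000
Z = [random.gauss(0, 1) for _ in range(N)]
X = [0.8 * z + random.gauss(0, 1) for z in Z]
Y = [0.5 * x + random.gauss(0, 1) for x in X]

print(round(corr(Z, Y), 2))             # marginally dependent
print(round(partial_corr(Z, Y, X), 2))  # approximately zero given X
```

Models from the same equivalence class imply the same set of such vanishing (partial) correlations, which is why these tests cannot single out one graph.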

One way of dealing with this situation is to further analyze the equivalence class to which a specified model belongs (Richardson & Spirtes, 2002). Some authors have proposed methods to draw causal conclusions based on common features of an entire equivalence class instead of using a single model (Hauser & Bühlmann, 2015; Maathuis, Kalisch, & Bühlmann, 2009; Perkovic, 2020; Zhang, 2008). However, equivalence classes can be large, and their members might not agree with respect to the causal effects of interest (He & Jia, 2015).

Another approach discussed in the literature is to complement the available observational data with experimental data. If these experiments are optimally chosen, the size of an equivalence class can be substantially reduced (Eberhardt, Glymour, & Scheines, 2005; Hyttinen, Eberhardt, & Hoyer, 2013). The idea of combining observational and experimental data is theoretically appealing for many reasons, and it has stimulated the development of a variety of techniques (He & Geng, 2008; Peters, Bühlmann, & Meinshausen, 2016; Sontakke, Mehrjou, Itti, & Schölkopf, 2020). Most importantly, the combination of observational and interventional data allows differentiating causal models that cannot be distinguished solely based on observational data.

Furthermore, the availability of experimental evidence enables (partly) testing further causal assumptions such as the assumption of modularity, which cannot be tested solely based on observational data. While modularity seems rather plausible if the mechanisms correspond to natural laws (e.g., chemical or biological processes, genetic laws, laws of physics), it needs additional reflection if the mechanisms describe human behavior. For example, humans might respond to an intervention by adjusting behavioral mechanisms different from the one that is intervened on. The proposed method can readily be adjusted to capture such violations of the modularity assumption if an intervention changes other mechanisms in a known way. However, if the ways in which humans adjust their behavior in response to an intervention are unknown, they need to be learned. Well-designed experiments may be particularly useful for this purpose.

Throughout the manuscript, we focus on specific do-type interventions that assign fixed values to the interventional variables according to an exogenous rule. However, in practical applications interventional values are often chosen conditionally on the values of other observed variables. In our illustrative example, the interventional insulin level at $t=2$ might be chosen in response to the glucose level observed at $t=1$. Such situations are discussed in the literature on conditional interventions (Pearl, 2009) and dynamic treatment plans (Pearl & Robins, 1995; Robins, Hernán, & Brumback, 2000). In principle, the proposed method can be extended to evaluate conditional interventions and effects of dynamic treatment plans. However, the derivation of closed-form representations of parametrized causal quantities and the corresponding causal effect functions in these settings requires further research.

Finally, consequences of specific violations of non-testable causal assumptions can be gauged via sensitivity analyses and robustness checks (Ding & VanderWeele, 2016; Dorie, Harada, Carnegie, & Hill, 2016; Franks, D’Amour, & Feller, 2020; Rosenbaum, 2002).

Effect Modification and Heterogeneity

In this article, we have focused on situations in which direct causal effects are constant across value combinations of observed variables and error terms. In such situations, the use of linear models is justified. Statistical tests for linearity of the functional relations exist for both nested and non-nested models (Amemiya, 1985; Lee, 2007; Schumacker & Marcoulides, 1998). If these tests provide evidence against linearity, the assumption of constant direct effects is likely to be violated.

Theoretical considerations often suggest the existence of so-called effect modifiers (moderators), which can be modeled in parametrized graph-based models via nonlinear structural equations (Amemiya, 1985; Klein & Muthén, 2007). However, a closed-form representation of the entire interventional distribution in case of nonlinear structural relations cannot be derived via a direct application of the method proposed in this paper. The extent to which the proposed parametric method can be generalized to capture common types of nonlinearity (e.g., simple product terms that capture certain types of effect modification) is a focus of ongoing research. Preliminary results suggest that parametrized closed-form expressions of certain features of the interventional distribution (e.g., its moments) can be obtained (Kan, 2008; Wall & Amemiya, 2003), which in turn enables analyzing ATEs and other causal quantities.

Furthermore, we assumed that the direct causal effects quantified by structural coefficients are equal across individuals in the population. However, (unobserved) heterogeneity in mean levels or direct effects might be present in many applied situations. A common procedure to capture specific types of unobserved heterogeneity is to include random intercepts or random coefficients in panel data models (Hamaker, Kuiper, & Grasman, 2015; Usami, Murayama, & Hamaker, 2019; Zyphur et al., 2019). Gische et al. (2021) apply the method proposed in this paper to linear cross-lagged panel models with additive person-specific random intercepts and show how absolute values of optimal treatment levels differ across individuals.

Even though additive random intercepts capture unobserved person-specific differences in the mean levels of the variables, these models still imply constant effects of changes in treatment level across persons. The latter implication might be overly restrictive in many applied situations in which treatment effects vary across individuals (e.g., different patients respond differently to variations in treatment level). An extension of the proposed methods to more complex dynamic panel data models (e.g., models including random slopes) requires further research. Several alternative approaches to modeling effect heterogeneity have been proposed, for example, within the social and behavioral sciences (Xie, Brand, & Jann, 2012), economics (Athey & Imbens, 2016), the political sciences (Imai & Ratkovic, 2013), and the computer sciences (Nie & Wager, 2020; Wager & Athey, 2018).

Measurement Error and Non-Normality

We assumed that variables are observed without measurement error. The proposed method can be extended to define, identify, and estimate causal effects among latent variables. In other words, measurement errors and measurement models can be included. The model implied joint distribution of observed variables in latent variable SEM is known (Bollen, 1989), and the derivation of the parametric expressions for causal quantities and causal effect functions in such models is subject to ongoing research.

However, measurement models for latent variables often can only mitigate measurement error issues (unless the true measurement model is known and everything is correctly specified). Furthermore, the degree to which interventions on certain types of latent constructs are feasible in practice needs further discussion (e.g., see Bollen, 2002; Borsboom, Mellenbergh, & van Heerden, 2003; van Bork, Rhemtulla, Sijtsma, & Borsboom, 2020).

Some population results derived in this paper rely on multivariate normally distributed error terms (e.g., Result 3), while others do not (e.g., the moments of the interventional distribution in Eqs. (6a) and (6b) or Theorem 8). For the former results, a systematic analytic inquiry of the consequences of incorrectly assuming multivariate normal error terms requires specific knowledge about the type of misspecification. If such knowledge is not available, one could attempt to assess the sensitivity of, for example, the interventional pdf, to misspecifications in the error term distribution via simulation studies.

Some estimation results derived in this paper rely on a known parametric distributional family of the error terms (e.g., Theorem 9 requires maximum likelihood estimation), while others do not (e.g., Theorem 8 ensures consistency of the estimators of causal quantities for a broad class of estimators including ADF or WLS estimation of $\boldsymbol{\theta}$). Thus, inference about the interventional moments can be conducted in the absence of parametric assumptions on the error term distribution. Furthermore, it has been shown that ML estimators in linear SEM are robust to certain types of distributional misspecification but sensitive to others (West, Finch, & Curran, 1995), and robust estimators have been developed for several types of distributional misspecifications (Satorra & Bentler, 1994; Yuan & Bentler, 1998).

Conclusion

Causal graphs (e.g., ADMGs) allow researchers to express their causal beliefs in a transparent way and provide a sound basis for the definition of causal effects using the do-operator. Causal effect functions enable analyzing causal quantities in parametrized models. They are a flexible tool that allows researchers to model causal effects beyond the mean and covariance structure and can thus be applied in a large variety of research situations. Consistent and asymptotically efficient estimators of parametric causal quantities are provided that yield precise estimates based on sample sizes commonly available in the social and behavioral sciences.

Acknowledgements

The first author thanks Stephen G. West for the careful editing and helpful comments; Bernd Droge and Grégoire Njacheun-Njanzoua for the insightful discussions on asymptotic inference; and the editor and reviewers for their helpful comments which significantly strengthened the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Declarations

Disclosure statement

The authors do not have any conflicts of interest to disclose.

Appendix

Proof of Lemma 2.

Proof of $\text{rank}(\mathbf{T}_1)=n-K_x$: The matrix $(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}$ is lower triangular with ones on the diagonal and thus has full rank $n$ (Lütkepohl, 1997, result 9.14.1(4)(c), p. 165). By construction, $\mathbf{I}_{\mathcal{N}}$ is a diagonal matrix in which $K_x$ diagonal elements are equal to zero, which implies $\text{rank}(\mathbf{I}_{\mathcal{N}})=n-K_x$ (Lütkepohl, 1997, result 9.4(3)(a), p. 120). Thus, $\text{rank}(\mathbf{T}_1)=\text{rank}((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}})=n-K_x$, where the last equality follows from result 4.3.1(9) in Lütkepohl (1997).
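As a numerical sanity check of this rank argument (a hypothetical example with $n=3$ variables, intervening on the second one, so $K_x=1$; the coefficients in $\mathbf{C}$ are arbitrary), exact rational arithmetic reproduces $\text{rank}(\mathbf{T}_1)=\text{rank}(\mathbf{T}_2)=n-K_x=2$:

```python
from fractions import Fraction as F

def rank(m):
    """Rank via exact Gauss-Jordan elimination on a copy of m."""
    a = [row[:] for row in m]
    rows, cols, r = len(a), len(a[0]), 0
    for col in range(cols):
        piv = next((i for i in range(r, rows) if a[i][col] != 0), None)
        if piv is None:
            continue
        a[r], a[piv] = a[piv], a[r]
        for i in range(rows):
            if i != r and a[i][col] != 0:
                f = a[i][col] / a[r][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return r

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mat_inv(a):
    """Inverse via Gauss-Jordan elimination on the augmented matrix [A | I]."""
    n = len(a)
    aug = [row[:] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

n, interv = 3, {1}  # three variables; intervene on the second one (K_x = 1)
I_N = [[F(int(i == j and i not in interv)) for j in range(n)] for i in range(n)]
C = [[F(0), F(0), F(0)],          # arbitrary lower-triangular coefficients
     [F(7, 10), F(0), F(0)],
     [F(3, 10), F(-3, 5), F(0)]]

INC = mat_mul(I_N, C)
A = [[F(int(i == j)) - INC[i][j] for j in range(n)] for i in range(n)]
T1 = mat_mul(mat_inv(A), I_N)                      # T1 = (I_n - I_N C)^{-1} I_N
T2 = [T1[i] for i in range(n) if i not in interv]  # keep non-interventional rows

print(rank(T1), rank(T2))  # both equal n - K_x = 2
```

As the proof states, the column (and diagonal entry) of $\mathbf{T}_1$ belonging to the interventional variable is zero, and $\mathbf{T}_2$ retains only the non-interventional rows.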

Proof of $\text{rank}(\mathbf{T}_2)=n-K_x$: $(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}$ is a lower triangular matrix of full rank $n$ that has ones on the diagonal. Postmultiplying $(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}$ with $\mathbf{I}_{\mathcal{N}}$ sets all columns with index $i \in \mathcal{I}$ to zero. More formally, $[\mathbf{t}_1]_{\bullet i}=\mathbf{0}_{n \times 1}$, $i \in \mathcal{I}$, where $[\mathbf{t}_1]_{\bullet i}$ denotes the $i$-th column of the matrix $\mathbf{T}_1$. Similarly, $[\mathbf{t}_1]_{i \bullet}$ denotes the $i$-th row of the matrix $\mathbf{T}_1$. Thus, all diagonal elements $t_{ii}$ with $i \in \mathcal{I}$ are equal to zero. Premultiplying $\mathbf{T}_1$ with $\mathbf{1}_{\mathcal{N}}^{\intercal}$ deletes all rows $[\mathbf{t}_1]_{i \bullet}$ that have an index $i \in \mathcal{I}$. The deleted rows are exactly those rows that have $t_{ii}=0$ as diagonal elements. The matrix $\mathbf{T}_2=\mathbf{1}_{\mathcal{N}}^{\intercal}\mathbf{T}_1$ contains only those rows of $\mathbf{T}_1$ that have a non-interventional index, that is, rows with diagonal elements $t_{ii}$ equal to 1. The resulting structure of $\mathbf{T}_2$ is illustrated below:

$$\mathbf{T}_{2}=\begin{array}{cccccccccccc|l}
[\mathbf{t}_{2}]_{\bullet 1} & &\dots & & [\mathbf{t}_{2}]_{\bullet j_2} & &\dots & & & [\mathbf{t}_{2}]_{\bullet j_3} & \dots & [\mathbf{t}_{2}]_{\bullet n} & \\
\hline
1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & [\mathbf{t}_{2}]_{1 \bullet}\\
* & * & \dots & * & 1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & [\mathbf{t}_{2}]_{2 \bullet}\\
* & * & \dots & * & * & * & \dots & * & 1 & 0 & \dots & 0 & [\mathbf{t}_{2}]_{3 \bullet}\\
\vdots & & & & & & & & & & & \vdots & \vdots \\
* & * & \dots & * & * & * & \dots & * & * & * & \dots & 1 & [\mathbf{t}_{2}]_{(n-K_x)\bullet}
\end{array}$$

The ordered set of non-interventional indexes is given by $\mathcal{N}:=\{1,2,\ldots,n\}\setminus\mathcal{I}=\{j_1,j_2,\ldots,j_{n-K_x}\}$. For clarity of display (and without loss of generality), we assume $j_1=1$ and $j_{n-K_x}=n$, that is, variables $V_1$ and $V_n$ are not subject to intervention. Due to the step structure of the matrix $\mathbf{T}_2$, with the rightmost nonzero element of each row equal to one, the matrix $\mathbf{T}_2$ has full row rank, that is, $\text{rank}(\mathbf{T}_2)=n-K_x$. $\square$
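The rank argument can also be checked numerically. The following sketch (a hypothetical $n=4$ model with an intervention on the second variable, not part of the proof) constructs $\mathbf{T}_1=(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}$ and $\mathbf{T}_2=\mathbf{1}_{\mathcal{N}}^{\intercal}\mathbf{T}_1$ and verifies the full row rank $n-K_x$:

```python
import numpy as np

# Hypothetical example: n = 4 variables in causal order, intervention on
# variable 2 (0-based index set I = {1}); all coefficient values are made up.
n = 4
I_set = [1]                                       # interventional indices
N_set = [i for i in range(n) if i not in I_set]   # non-interventional indices
K_x = len(I_set)

# Strictly lower triangular matrix of structural coefficients C.
C = np.zeros((n, n))
C[1, 0], C[2, 0], C[2, 1], C[3, 2] = 0.5, 0.3, 0.7, 0.4

# I_N: diagonal matrix with zeros at interventional indices.
I_N = np.diag([0.0 if i in I_set else 1.0 for i in range(n)])
# 1_N: n x (n - K_x) selection matrix of the non-interventional rows.
one_N = np.eye(n)[:, N_set]

T1 = np.linalg.inv(np.eye(n) - I_N @ C) @ I_N   # columns i in I are zero
T2 = one_N.T @ T1                               # drop rows i in I

print(np.linalg.matrix_rank(T2))  # 3, i.e., full row rank n - K_x
```

Changing the causal ordering or the intervention set reproduces the same step structure described above, with ones as the rightmost nonzero entries.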

Sketch of proof of local identification of the example model

Due to space restrictions and the necessity to state high-dimensional vectors and matrices explicitly, a detailed and fully reproducible version of the proof is given in the online supplementary material.

Let $\mathbf{V}=\mathbf{C}\mathbf{V}+\boldsymbol{\varepsilon}$ be a linear graph-based model as defined in Eq. (2), with $n=6$, where $\mathbf{C}$ and $\boldsymbol{\Psi}$ are given in Eqs. (19) and (20), respectively. Plugging these quantities into Eq. (3.3.6) from Bekker et al. (1994) yields:

(A.1)
$$\tilde{\mathbf{J}}=\begin{pmatrix} \mathbf{R}_{\boldsymbol{\Psi}}(\mathbf{I}_{36}+\mathbf{K}_6)(\mathbf{I}_6 \otimes \boldsymbol{\Psi})\\ \mathbf{R}_{\mathbf{C}}(\mathbf{I}_6 \otimes (\mathbf{I}-\mathbf{C})^\intercal) \end{pmatrix} \quad , \quad (43 \times 36)$$

The $(32 \times 36)$ matrix $\mathbf{R}_{\boldsymbol{\Psi}}$ and the $(11 \times 36)$ matrix $\mathbf{R}_{\mathbf{C}}$ encode the zero restrictions and equality constraints imposed on the covariance matrix and the matrix of structural coefficients in Eqs. (20) and (19), respectively. The matrix $\mathbf{K}_6$ denotes the commutation matrix for $n \times n$ matrices (Magnus & Neudecker, 1979). Theorem 3.3.1 in Bekker et al. (1994) states that, under certain regularity conditions, the parameter vector $\boldsymbol{\theta}$ is locally identified if and only if the Jacobian matrix $\tilde{\mathbf{J}}$ has full column rank. We show that $\mathrm{rank}(\tilde{\mathbf{J}})=36$. The exact form of the restriction matrices $\mathbf{R}_{\boldsymbol{\Psi}}$ and $\mathbf{R}_{\mathbf{C}}$, as well as the Mathematica (Wolfram Research Inc., 2018) code used to evaluate the rank of $\tilde{\mathbf{J}}$, are provided in the online supplementary material. $\square$
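The same rank criterion can be probed numerically instead of symbolically. The sketch below (a hypothetical just-identified 3-variable recursive model with diagonal $\boldsymbol{\Psi}$, not the paper's 6-variable example) approximates the Jacobian of the model-implied moments $\mathrm{vech}\,\boldsymbol{\Sigma}(\boldsymbol{\theta})$ by central finite differences and checks that it has full column rank at a generic interior point:

```python
import numpy as np

# Hypothetical recursive 3-variable model: Sigma(theta) = B Psi B',
# with B = (I - C)^{-1}, C strictly lower triangular, Psi diagonal.
def sigma(theta, n=3):
    c21, c31, c32, p1, p2, p3 = theta
    C = np.zeros((n, n))
    C[1, 0], C[2, 0], C[2, 1] = c21, c31, c32
    Psi = np.diag([p1, p2, p3])
    B = np.linalg.inv(np.eye(n) - C)
    S = B @ Psi @ B.T
    return S[np.tril_indices(n)]          # vech(Sigma): 6 distinct moments

theta0 = np.array([0.5, 0.3, 0.7, 1.0, 1.0, 1.0])   # made-up interior point
eps = 1e-6
J = np.column_stack([                     # 6 x 6 finite-difference Jacobian
    (sigma(theta0 + eps * e) - sigma(theta0 - eps * e)) / (2 * eps)
    for e in np.eye(len(theta0))
])

print(np.linalg.matrix_rank(J))  # 6: full column rank, locally identified
```

A numerical rank check at one point is only a heuristic for local identification (it can misclassify measure-zero exceptional points), which is why the paper relies on the symbolic evaluation of $\tilde{\mathbf{J}}$.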

Properties Required for Theorem 8:

Property A.1.

(properties of causal effect functions $\mathbf{g}$) Let $\boldsymbol{\gamma}$ be a causal quantity and $\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})$ the corresponding causal effect function. Let $\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})$ be continuously differentiable with respect to $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$ in a neighborhood of the true population parameter value $\boldsymbol{\theta}^*_{\boldsymbol{\gamma}} \in \boldsymbol{\Theta}_{\boldsymbol{\gamma}}$. The $r \times s$ matrix of partial derivatives is non-singular and is denoted by $\frac{\partial \mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}})}{\partial \boldsymbol{\theta}_{\boldsymbol{\gamma}}^\intercal}$. If the causal effect function contains auxiliary variables, say $\mathbf{g}(\boldsymbol{\theta}_{\boldsymbol{\gamma}};\mathbf{x},\mathbf{v}_{\mathcal{N}})$, then non-singularity of the matrix of partial derivatives is assumed to hold for every fixed value combination $(\mathbf{x},\mathbf{v}_{\mathcal{N}}) \in \mathbb{R}^{K_x}\times\mathbb{R}^{n-K_x}$ (Footnote 16).

Property A.2.

(statistical properties of $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$) Let $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ be an estimator of $\boldsymbol{\theta}_{\boldsymbol{\gamma}}$ with:

(A.2a)
$$\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}} \xrightarrow{\ p\ } \boldsymbol{\theta}^*_{\boldsymbol{\gamma}}$$

(A.2b)
$$\sqrt{N}\left(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}-\boldsymbol{\theta}^*_{\boldsymbol{\gamma}}\right) \xrightarrow{\ d\ } N_{s}\big(\,\mathbf{0}_s\,,\,\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})\,\big)$$

where $\boldsymbol{\theta}^*$ denotes the true population value and $\xrightarrow{\ p\ }$ ($\xrightarrow{\ d\ }$) refers to convergence in probability (distribution) as the sample size $N$ tends to infinity. The covariance matrix of the limiting distribution is denoted by $\mathrm{AV}(\sqrt{N}\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$ and is assumed to be finite (Footnote 17).
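Properties (A.2a) and (A.2b) are exactly the ingredients the delta method needs to transfer consistency and asymptotic normality from $\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}$ to a smooth function $\mathbf{g}(\widehat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}})$. The following Monte Carlo sketch (hypothetical scalar setting with made-up data-generating values; $g(\theta)=\theta^2$ stands in for a generic causal effect function) illustrates that the variance of $\sqrt{N}(g(\widehat{\theta})-g(\theta^*))$ approaches the delta-method value $g'(\theta^*)^2\,\mathrm{AV}(\sqrt{N}\widehat{\theta})$:

```python
import numpy as np

# Hypothetical illustration: theta = population mean, estimated by the
# sample mean (consistent and sqrt(N)-asymptotically normal).
rng = np.random.default_rng(0)
theta_star, sd, N, reps = 2.0, 1.5, 1000, 2000

g = lambda t: t ** 2
dg = lambda t: 2 * t                         # derivative of g

draws = rng.normal(theta_star, sd, size=(reps, N))
theta_hat = draws.mean(axis=1)               # reps independent estimates
z = np.sqrt(N) * (g(theta_hat) - g(theta_star))

av_theoretical = dg(theta_star) ** 2 * sd ** 2   # delta-method variance: 36.0
print(av_theoretical, round(z.var(), 1))         # empirical variance is close
```

The empirical variance of `z` fluctuates around 36 by Monte Carlo error; increasing `N` and `reps` tightens the agreement.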

Proof of Corollary 11.

We follow the definition of a matrix differential and a matrix derivative in Magnus and Neudecker (1999). To complete the proof, we make extensive use of results (a) from matrix differential calculus (Abadir & Magnus, 2005; Magnus & Neudecker, 1999) and (b) on the $\operatorname{vec}$-operator and Kronecker products (see Lütkepohl, 1997, for an overview).

Proof of Equation (18a):

(A.3)
$$\begin{aligned} \mathrm{E}(\mathbf{V} \mid do(\mathbf{x})) &= (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{1}_{\mathcal{I}}\mathbf{x}\\ \Rightarrow\quad \mathsf{d}\,\mathrm{E}(\mathbf{V} \mid do(\mathbf{x})) &= \mathsf{d}\big[(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{1}_{\mathcal{I}}\mathbf{x}\big] = (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}[\mathsf{d}\mathbf{C}](\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{1}_{\mathcal{I}}\mathbf{x}\\ \Leftrightarrow\quad \operatorname{vec}\big(\mathsf{d}\,\mathrm{E}(\mathbf{V} \mid do(\mathbf{x}))\big) &= \operatorname{vec}\big((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}[\mathsf{d}\mathbf{C}](\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{1}_{\mathcal{I}}\mathbf{x}\big)\\ \Leftrightarrow\quad \mathsf{d}\,\mathrm{E}(\mathbf{V} \mid do(\mathbf{x})) &= \big((\mathbf{x}^\intercal\mathbf{1}_{\mathcal{I}}^\intercal(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal})\otimes((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}})\big)\,\mathsf{d}\operatorname{vec}\mathbf{C}\\ \Leftrightarrow\quad \mathsf{d}\,\mathrm{E}(\mathbf{V} \mid do(\mathbf{x})) &= \underbrace{\big((\mathbf{x}^\intercal\mathbf{1}_{\mathcal{I}}^\intercal(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal})\otimes((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}})\big)\frac{\partial\operatorname{vec}\mathbf{C}}{\partial\boldsymbol{\theta}^\intercal}}_{=\frac{\partial\mathrm{E}(\mathbf{V}\mid do(\mathbf{x}))}{\partial\boldsymbol{\theta}^\intercal}}\,\mathsf{d}\boldsymbol{\theta} \end{aligned}$$

Note that $\operatorname{vec}\mathsf{d}\mathbf{C}=\frac{\partial\operatorname{vec}\mathbf{C}}{\partial\boldsymbol{\theta}^\intercal}\mathsf{d}\boldsymbol{\theta}$ holds by definition and that each entry of the matrix $\mathbf{C}=\mathbf{C}(\boldsymbol{\theta})$ is either equal to a single element of $\boldsymbol{\theta}$ or equal to zero. Thus, the $n^2 \times p$ Jacobian matrix $\frac{\partial\operatorname{vec}\mathbf{C}}{\partial\boldsymbol{\theta}^\intercal}$ is a zero-one matrix.
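The Kronecker-product Jacobian in Eq. (A.3) can be verified against a finite-difference approximation. The sketch below uses a hypothetical model with $n=4$ and one intervened variable, and exploits that the right factor of the Kronecker product, $(\mathbf{x}^\intercal\mathbf{1}_{\mathcal{I}}^\intercal(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal})$, is simply the transposed interventional mean:

```python
import numpy as np

# Hypothetical n = 4 model, intervention do(X = 2) on variable 2.
n, I_set = 4, [1]
C = np.zeros((n, n))
C[1, 0], C[2, 0], C[2, 1], C[3, 2] = 0.5, 0.3, 0.7, 0.4
I_N = np.diag([0.0 if i in I_set else 1.0 for i in range(n)])
one_I = np.eye(n)[:, I_set]
x = np.array([2.0])

def mean_do(C):
    B = np.linalg.inv(np.eye(n) - I_N @ C)
    return B @ one_I @ x                     # E(V | do(x))

B = np.linalg.inv(np.eye(n) - I_N @ C)
b = mean_do(C).reshape(1, -1)                # x' 1_I' (I - I_N C)^{-T}
J_analytic = np.kron(b, B @ I_N)             # n x n^2 Jacobian w.r.t. vec(C)

# Finite-difference Jacobian, column-major (vec) ordering of C.
eps, cols = 1e-7, []
for j in range(n):
    for i in range(n):
        Cp = C.copy(); Cp[i, j] += eps
        cols.append((mean_do(Cp) - mean_do(C)) / eps)
J_fd = np.column_stack(cols)

print(np.allclose(J_analytic, J_fd, atol=1e-5))  # True
```

Note the column-major (`vec`) ordering of the perturbation loop, which must match the Kronecker-product convention used in the derivation.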

Proof of Equation (18b):

(A.4)
$$\begin{aligned} \mathrm{V}(\mathbf{V} \mid do(\mathbf{x})) &= (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\\ \Rightarrow\quad \mathsf{d}\,\mathrm{V}(\mathbf{V} \mid do(\mathbf{x})) &= \mathsf{d}\big[(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\big]\\ &= (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}[\mathsf{d}\mathbf{C}](\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\\ &\quad + (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}[\mathsf{d}\boldsymbol{\Psi}]\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\\ &\quad + (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}[\mathsf{d}\mathbf{C}^{\intercal}]\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal} \end{aligned}$$

Vectorizing Eq. (A.4) yields the following term for $\operatorname{vec}\mathsf{d}\,\mathrm{V}(\mathbf{V} \mid do(\mathbf{x}))$:

(A.5)
$$\begin{aligned} &(\mathbf{I}_{n^2}+\mathbf{K}_n)\big[((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}})\otimes\mathbf{I}_n\big]\big[(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\otimes((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}})\big]\operatorname{vec}\mathsf{d}\mathbf{C}\\ &\quad+\big[(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\otimes(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\big](\mathbf{I}_{\mathcal{N}}\otimes\mathbf{I}_{\mathcal{N}})\operatorname{vec}\mathsf{d}\boldsymbol{\Psi} \end{aligned}$$

where $\mathbf{K}_n$ denotes the commutation matrix for $n \times n$ matrices (Magnus & Neudecker, 1979). For simplicity of notation, we define the following $n^2 \times n^2$ matrices:

G 2 , C : = ( I n 2 + K n ) [ ( ( I n - I N C ) - 1 I N Ψ I N ) I n ] [ ( I n - I N C ) - ( ( I n - I N C ) - 1 I N ) ] G 2 , Ψ : = [ ( I n - I N C ) - 1 ( I n - I N C ) - 1 ] ( I N I N ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\mathbf {G}}_{2,\mathbf {C}}&:=({\mathbf{I}}_{n^2}+{\mathbf {K}}_n) \big [(({\mathbf{I}}_n-{\mathbf{I}}_{{\mathcal {N}}}{\mathbf {C}})^{-1}{\mathbf{I}}_{{\mathcal {N}}} {\varvec{\Psi }}{\mathbf{I}}_{{\mathcal {N}}})\otimes {\mathbf{I}}_n\big ]\big [({\mathbf{I}}_n-{\mathbf{I}}_{{\mathcal {N}}} {\mathbf {C}})^{-\intercal } \otimes (({\mathbf{I}}_n-{\mathbf{I}}_{{\mathcal {N}}} {\mathbf {C}})^{-1}{\mathbf{I}}_{{\mathcal {N}}})\big ]\\ {\mathbf {G}}_{2,\varvec{\Psi }}&:=\big [({\mathbf{I}}_n-{\mathbf{I}}_{{\mathcal {N}}} {\mathbf {C}})^{-1}\otimes ({\mathbf{I}}_n-{\mathbf{I}}_{{\mathcal {N}}}{\mathbf {C}})^{-1}\big ]({\mathbf{I}}_{{\mathcal {N}}} \otimes {\mathbf{I}}_{{\mathcal {N}}}) \end{aligned}$$\end{document}

Substituting $\mathbf{G}_{2,\mathbf{C}}$ and $\mathbf{G}_{2,\boldsymbol{\Psi}}$ into the expression for $\operatorname{vec}\mathsf{d}\mathrm{V}(\mathbf{V}\mid do(\mathbf{x}))$ yields:

(A.6)
$$
\operatorname{vec}\mathsf{d}\mathrm{V}(\mathbf{V}\mid do(\mathbf{x})) = \underbrace{\Big[\mathbf{G}_{2,\mathbf{C}}\frac{\partial\operatorname{vec}\mathbf{C}}{\partial\boldsymbol{\theta}^\intercal}+\mathbf{G}_{2,\boldsymbol{\Psi}}\frac{\partial\operatorname{vec}\boldsymbol{\Psi}}{\partial\boldsymbol{\theta}^\intercal}\Big]}_{=\frac{\partial\operatorname{vec}\mathrm{V}(\mathbf{V}\mid do(\mathbf{x}))}{\partial\boldsymbol{\theta}^\intercal}}\mathsf{d}\boldsymbol{\theta}
$$
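As an informal numerical illustration (not part of the proof), Eq. (A.6) can be verified by finite differences. The sketch below assumes the interventional covariance has the form $\mathrm{V}(\mathbf{V}\mid do(\mathbf{x})) = (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\boldsymbol{\Psi}\mathbf{I}_{\mathcal{N}}(\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}$ used in the main text; the concrete matrices (`C`, `Psi`, `I_N`) are illustrative stand-ins, and the two bracketed factors in the definition of $\mathbf{G}_{2,\mathbf{C}}$ are multiplied out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
C = 0.3 * rng.standard_normal((n, n))      # illustrative structural coefficients
A = rng.standard_normal((n, n))
Psi = A @ A.T                              # symmetric error covariance
I_N = np.diag([0.0, 1.0, 1.0, 1.0])        # selector zeroing the intervened variable
I_n = np.eye(n)

vec = lambda M: M.reshape(-1, order="F")   # column-major vectorization

def K_mat(n):
    # commutation matrix: K_mat(n) @ vec(A) == vec(A.T)
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i * n + j, j * n + i] = 1.0
    return K

def V_do(C, Psi):
    # assumed interventional covariance from the main text
    B = np.linalg.inv(I_n - I_N @ C)
    return B @ I_N @ Psi @ I_N @ B.T

B = np.linalg.inv(I_n - I_N @ C)
# G2_C equals (I + K_n)[(B I_N Psi I_N B') x (B I_N)], the product of the
# two bracketed factors in the definition above
G2_C = (np.eye(n * n) + K_mat(n)) @ np.kron(B @ I_N @ Psi @ I_N @ B.T, B @ I_N)
G2_Psi = np.kron(B @ I_N, B @ I_N)

# central finite difference of vec V(V | do(x)) along a direction (dC, dPsi)
dC = rng.standard_normal((n, n))
dPsi = rng.standard_normal((n, n)); dPsi = dPsi + dPsi.T
eps = 1e-6
numeric = (vec(V_do(C + eps * dC, Psi + eps * dPsi))
           - vec(V_do(C - eps * dC, Psi - eps * dPsi))) / (2 * eps)
analytic = G2_C @ vec(dC) + G2_Psi @ vec(dPsi)
assert np.allclose(numeric, analytic, rtol=1e-5, atol=1e-6)
```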

Note that $\operatorname{vec}\mathsf{d}\boldsymbol{\Psi}=\frac{\partial\operatorname{vec}\boldsymbol{\Psi}}{\partial\boldsymbol{\theta}^\intercal}\mathsf{d}\boldsymbol{\theta}$ holds by definition, and each entry of the matrix $\boldsymbol{\Psi}=\boldsymbol{\Psi}(\boldsymbol{\theta})$ is either equal to a single element of $\boldsymbol{\theta}$ or equal to zero. Thus, the $n^2 \times p$ Jacobian matrix $\frac{\partial\operatorname{vec}\boldsymbol{\Psi}}{\partial\boldsymbol{\theta}^\intercal}$ is a zero-one matrix.
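To make the zero-one structure of this Jacobian concrete, consider a hypothetical two-parameter example with $\boldsymbol{\Psi}(\boldsymbol{\theta})=\operatorname{diag}(\theta_1,\theta_2)$; the example and its dimensions are illustrative only.

```python
import numpy as np

# Hypothetical 2x2 example: Psi(theta) = diag(theta_1, theta_2), so
# vec(Psi) = (theta_1, 0, 0, theta_2)' and the 4 x 2 Jacobian
# d vec(Psi) / d theta' contains only zeros and ones.
n, p = 2, 2
jac = np.zeros((n * n, p))
jac[0, 0] = 1.0   # vec position 0 holds Psi[0, 0] = theta_1
jac[3, 1] = 1.0   # vec position 3 holds Psi[1, 1] = theta_2

theta = np.array([0.7, 1.3])
vec_Psi = jac @ theta
Psi = vec_Psi.reshape((n, n), order="F")  # column-major, matching vec
assert np.allclose(Psi, np.diag(theta))
```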
Since $\mathrm{V}(\mathbf{V}\mid do(\mathbf{x}))$ is symmetric, one often works with the half-vectorized version, given by:

(A.7)
$$
\operatorname{vech}\mathsf{d}\mathrm{V}(\mathbf{V}\mid do(\mathbf{x})) = \underbrace{\mathbf{L}_n\Big[\mathbf{G}_{2,\mathbf{C}}\frac{\partial\operatorname{vec}\mathbf{C}}{\partial\boldsymbol{\theta}^\intercal}+\mathbf{G}_{2,\boldsymbol{\Psi}}\frac{\partial\operatorname{vec}\boldsymbol{\Psi}}{\partial\boldsymbol{\theta}^\intercal}\Big]}_{=\frac{\partial\operatorname{vech}\mathrm{V}(\mathbf{V}\mid do(\mathbf{x}))}{\partial\boldsymbol{\theta}^\intercal}}\mathsf{d}\boldsymbol{\theta}
$$

where $\mathbf{L}_n$ denotes the elimination matrix for $n \times n$ matrices (Magnus & Neudecker, 1980).
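For readers less familiar with these operators, the commutation matrix $\mathbf{K}_n$ and the elimination matrix $\mathbf{L}_n$ can be constructed explicitly and checked against their defining properties, $\mathbf{K}_n\operatorname{vec}(\mathbf{A})=\operatorname{vec}(\mathbf{A}^\intercal)$ and $\mathbf{L}_n\operatorname{vec}(\mathbf{S})=\operatorname{vech}(\mathbf{S})$. The following numpy sketch uses an illustrative $n=4$.

```python
import numpy as np

def commutation_matrix(n):
    # K_n maps vec(A) to vec(A.T) for any n x n matrix A
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            # column-major vec: entry A[i, j] sits at position j*n + i
            K[i * n + j, j * n + i] = 1.0
    return K

def elimination_matrix(n):
    # L_n maps vec(S) to vech(S): the column-wise stack of the
    # lower-triangular part (diagonal included) of S
    cols = [j * n + i for j in range(n) for i in range(j, n)]
    L = np.zeros((len(cols), n * n))
    for r, c in enumerate(cols):
        L[r, c] = 1.0
    return L

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
S = A + A.T                        # a symmetric matrix

K, L = commutation_matrix(n), elimination_matrix(n)
vec = lambda M: M.reshape(-1, order="F")

assert np.allclose(K @ vec(A), vec(A.T))
vech_S = vec(S)[[j * n + i for j in range(n) for i in range(j, n)]]
assert np.allclose(L @ vec(S), vech_S)
```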

Proof of Equation (18c): We treat the interventional pdf $f(\mathbf{v}_{\mathcal{N}}\mid do(\mathbf{x}))$ as a function $\varphi$ of the interventional mean vector and the interventional covariance matrix:

(A.8a)
$$
\varphi(\boldsymbol{\mu}_{\mathcal{N}},\boldsymbol{\Sigma}_{\mathcal{N}}) = (2\pi)^{-\frac{n-K_x}{2}}|\boldsymbol{\Sigma}_{\mathcal{N}}|^{-\frac{1}{2}}\times\exp\left(-\frac{1}{2}(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})\right)
$$
(A.8b)
$$
\boldsymbol{\mu}_{\mathcal{N}} := \mathbf{1}_{\mathcal{N}}^\intercal\,\mathrm{E}(\mathbf{V}\mid do(\mathbf{x})), \qquad \boldsymbol{\Sigma}_{\mathcal{N}} := \mathbf{1}_{\mathcal{N}}^\intercal\,\mathrm{V}(\mathbf{V}\mid do(\mathbf{x}))\,\mathbf{1}_{\mathcal{N}}
$$

Further, we treat $\varphi$ as a product of two functions, that is, $\varphi=\varphi_1\cdot\varphi_2$, with:

(A.9a)
$$
\varphi_1(\boldsymbol{\Sigma}_{\mathcal{N}}) := (2\pi)^{-\frac{n-K_x}{2}}|\boldsymbol{\Sigma}_{\mathcal{N}}|^{-\frac{1}{2}}
$$
(A.9b)
$$
\varphi_2(\boldsymbol{\mu}_{\mathcal{N}},\boldsymbol{\Sigma}_{\mathcal{N}}) := \exp\left(-\frac{1}{2}(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})\right)
$$

We express $\varphi$ from Eq. (A.8a) as a function of $\varphi_1$ and $\varphi_2$ and apply the product rule, yielding:

(A.10)
$$
\mathsf{d}\varphi = \mathsf{d}[\varphi_1\cdot\varphi_2] = [\mathsf{d}\varphi_1]\cdot\varphi_2+\varphi_1\cdot[\mathsf{d}\varphi_2]
$$

Both $\varphi_1$ and $\varphi_2$ are composite functions:

(A.11)
$$
\varphi_1 = g_1(f_1(\boldsymbol{\Sigma}_{\mathcal{N}})), \qquad \varphi_2 = h_2(g_2[\mathbf{f}_{21}(\boldsymbol{\mu}_{\mathcal{N}}),\mathbf{f}_{22}(\boldsymbol{\Sigma}_{\mathcal{N}})])
$$

with:

(A.12a)
$$
f_1(\boldsymbol{\Sigma}_{\mathcal{N}}) = |\boldsymbol{\Sigma}_{\mathcal{N}}|, \quad \mathbb{R}^{(n-K_x)\times(n-K_x)}\to\mathbb{R}
$$
(A.12b)
$$
g_1(f_1) = (2\pi)^{-\frac{n-K_x}{2}}f_1^{-\frac{1}{2}}, \quad \mathbb{R}\to\mathbb{R}
$$
(A.12c)
$$
\mathbf{f}_{21}(\boldsymbol{\mu}_{\mathcal{N}}) = (\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}}), \quad \mathbb{R}^{n-K_x}\to\mathbb{R}^{n-K_x}
$$
(A.12d)
$$
\mathbf{f}_{22}(\boldsymbol{\Sigma}_{\mathcal{N}}) = \boldsymbol{\Sigma}_{\mathcal{N}}^{-1}, \quad \mathbb{R}^{(n-K_x)\times(n-K_x)}\to\mathbb{R}^{(n-K_x)\times(n-K_x)}
$$
(A.12e)
$$
g_2(\mathbf{f}_{21},\mathbf{f}_{22}) = \mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathbf{f}_{21}, \quad \mathbb{R}^{n-K_x}\times\mathbb{R}^{(n-K_x)\times(n-K_x)}\to\mathbb{R}
$$
(A.12f)
$$
h_2(g_2) = \exp\!\left(-\frac{1}{2}g_2\right), \quad \mathbb{R}\to\mathbb{R}
$$

The differentials of $\varphi_1$ and $\varphi_2$ are computed using Cauchy's rule of invariance (Magnus & Neudecker, 1999). We start with $\varphi_1$ and compute the differential of the innermost function $f_1(\boldsymbol{\Sigma}_{\mathcal{N}})$:

(A.13)
$$
\mathsf{d}f_1 = |\boldsymbol{\Sigma}_{\mathcal{N}}|\operatorname{tr}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}) = |\boldsymbol{\Sigma}_{\mathcal{N}}|\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-\intercal})^\intercal\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}} = f_1\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}
$$
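Equation (A.13) is Jacobi's formula for the differential of the determinant. As a quick sanity check (not part of the derivation), it can be compared against a finite-difference directional derivative; the matrices below are illustrative stand-ins for $\boldsymbol{\Sigma}_{\mathcal{N}}$ and $\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)    # positive definite stand-in for Sigma_N
dSigma = rng.standard_normal((m, m)); dSigma = dSigma + dSigma.T

vec = lambda M: M.reshape(-1, order="F")

# analytic differential from Jacobi's formula, Eq. (A.13)
analytic = np.linalg.det(Sigma) * vec(np.linalg.inv(Sigma)) @ vec(dSigma)

# central finite-difference approximation of the directional derivative
eps = 1e-6
numeric = (np.linalg.det(Sigma + eps * dSigma)
           - np.linalg.det(Sigma - eps * dSigma)) / (2 * eps)
assert np.isclose(analytic, numeric, rtol=1e-5, atol=1e-6)
```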

Next, we obtain the differential of $g_1$ with respect to $f_1$:

(A.14)
$$
\frac{\mathsf{d}g_1}{\mathsf{d}f_1} = -\frac{1}{2}(2\pi)^{-\frac{n-K_x}{2}}f_1^{-\frac{3}{2}} = -\frac{1}{2}\varphi_1 f_1^{-1} \quad\Rightarrow\quad \mathsf{d}g_1 = -\frac{1}{2}\varphi_1 f_1^{-1}\mathsf{d}f_1
$$

Plugging Eq. (A.13) into Eq. (A.14) yields:

(A.15)
$$
\mathsf{d}\varphi_1 = \frac{\mathsf{d}g_1}{\mathsf{d}f_1}\mathsf{d}f_1 = -\frac{1}{2}\varphi_1 f_1^{-1} f_1\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}} = -\frac{1}{2}\varphi_1\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}
$$

For $\varphi_2$, we start with the differentials of $\mathbf{f}_{21}(\boldsymbol{\mu}_{\mathcal{N}})$ and $\mathbf{f}_{22}(\boldsymbol{\Sigma}_{\mathcal{N}})$:

(A.16a)
$$
\mathsf{d}\mathbf{f}_{21} = \mathsf{d}(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}}) = -\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}} \quad\Rightarrow\quad \frac{\partial\mathbf{f}_{21}}{\partial\boldsymbol{\mu}_{\mathcal{N}}^\intercal} = -\mathbf{I}_{n-K_x}
$$
(A.16b)
$$
\mathsf{d}\mathbf{f}_{22} = \mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}^{-1} = -\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}[\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}]\boldsymbol{\Sigma}_{\mathcal{N}}^{-1} \quad\Rightarrow\quad \operatorname{vec}\mathsf{d}\mathbf{f}_{22} = -(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}
$$
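Both forms in Eq. (A.16b), the matrix differential of the inverse and its Kronecker-product vectorization, can be checked numerically. A minimal sketch with an illustrative positive-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)    # positive definite stand-in for Sigma_N
dSigma = rng.standard_normal((m, m)); dSigma = dSigma + dSigma.T

vec = lambda M: M.reshape(-1, order="F")
Sigma_inv = np.linalg.inv(Sigma)

# matrix form of the differential of the inverse
analytic = -Sigma_inv @ dSigma @ Sigma_inv

# equivalent vec form via the Kronecker product, as in Eq. (A.16b)
vec_analytic = -np.kron(Sigma_inv, Sigma_inv) @ vec(dSigma)
assert np.allclose(vec(analytic), vec_analytic)

# central finite-difference check of the directional derivative
eps = 1e-6
numeric = (np.linalg.inv(Sigma + eps * dSigma)
           - np.linalg.inv(Sigma - eps * dSigma)) / (2 * eps)
assert np.allclose(analytic, numeric, atol=1e-6)
```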

Next, we obtain the total differential of $g_2$ by applying the product rule twice and using the symmetry of $\mathbf{f}_{22}$:

(A.17)
$$
\begin{aligned}
\mathsf{d}g_2 &= \mathsf{d}[\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathbf{f}_{21}] = [\mathsf{d}\mathbf{f}_{21}^\intercal]\mathbf{f}_{22}\mathbf{f}_{21}+\mathbf{f}_{21}^\intercal[\mathsf{d}\mathbf{f}_{22}]\mathbf{f}_{21}+\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathsf{d}\mathbf{f}_{21}\\
&= 2\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathsf{d}\mathbf{f}_{21}+(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)\operatorname{vec}\mathsf{d}\mathbf{f}_{22}
\end{aligned}
$$
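The second line of Eq. (A.17) collapses three product-rule terms into two, using the identity $(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)\operatorname{vec}\mathsf{d}\mathbf{f}_{22}=\mathbf{f}_{21}^\intercal[\mathsf{d}\mathbf{f}_{22}]\mathbf{f}_{21}$ and the symmetry of $\mathbf{f}_{22}$. A small numerical sketch with illustrative vectors and matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 3
f21 = rng.standard_normal(m)               # stands in for v_N - mu_N
A = rng.standard_normal((m, m))
f22 = A @ A.T                              # symmetric, like Sigma_N^{-1}
df21 = rng.standard_normal(m)
df22 = rng.standard_normal((m, m)); df22 = df22 + df22.T

vec = lambda M: M.reshape(-1, order="F")

def g2(t):
    # quadratic form evaluated along the perturbation direction (df21, df22)
    u = f21 + t * df21
    return u @ (f22 + t * df22) @ u

# central difference approximates the total differential of g2 at t = 0
eps = 1e-6
lhs = (g2(eps) - g2(-eps)) / (2 * eps)

# right-hand side of Eq. (A.17)
rhs = 2 * f21 @ f22 @ df21 + np.kron(f21, f21) @ vec(df22)
assert np.isclose(lhs, rhs, rtol=1e-5, atol=1e-8)
```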

The last mapping applied in this chain is $h_2(g_2)$, a scalar function of a scalar argument:

(A.18)
$$
\frac{\mathsf{d}h_2}{\mathsf{d}g_2} = \frac{\mathsf{d}}{\mathsf{d}g_2}\exp\!\left(-\frac{1}{2}g_2\right) = -\frac{1}{2}\exp\!\left(-\frac{1}{2}g_2\right) = -\frac{1}{2}\varphi_2 \quad\Rightarrow\quad \mathsf{d}h_2 = -\frac{1}{2}\varphi_2\mathsf{d}g_2
$$

Plugging Eq. (A.17) into Eq. (A.18) yields:

(A.19)
$$
\mathsf{d}\varphi_2 = \frac{\mathsf{d}h_2}{\mathsf{d}g_2}\mathsf{d}g_2 = -\frac{1}{2}\varphi_2\left[2\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathsf{d}\mathbf{f}_{21}+(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)\operatorname{vec}\mathsf{d}\mathbf{f}_{22}\right]
$$

Plugging Eqs. (A.16) into Eq. (A.19) yields:

(A.20)
$$
\begin{aligned}
\mathsf{d}\varphi_2 &= -\frac{1}{2}\varphi_2\big(2\mathbf{f}_{21}^\intercal\mathbf{f}_{22}[-\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}}]+(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)[-(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}]\big)\\
&= \varphi_2\big[\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}}+\frac{1}{2}(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}\big]
\end{aligned}
$$

We now insert Eqs. (A.9a), (A.9b), (A.15), and (A.20) into Eq. (A.10):

(A.21)
$$
\begin{aligned}
\mathsf{d}\varphi &= \mathsf{d}f(\mathbf{v}_{\mathcal{N}}\mid do(\mathbf{x}))\\
&= \Big(-\frac{1}{2}\varphi_1\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}\Big)\varphi_2 + \varphi_1\cdot\Big(\varphi_2\big[\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}}+\frac{1}{2}(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}\big]\Big)\\
&= \varphi_1\varphi_2\Big[\mathbf{f}_{21}^\intercal\mathbf{f}_{22}\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}}+\Big(-\frac{1}{2}\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal+\frac{1}{2}(\mathbf{f}_{21}^\intercal\otimes\mathbf{f}_{21}^\intercal)(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})\Big)\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}\Big]\\
&= \varphi\Big[(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}}+\frac{1}{2}\big(\big[(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\otimes(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\big](\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})-\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal\big)\operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}\Big]\\
&= f(\mathbf{v}_{\mathcal{N}}\mid do(\mathbf{x}))\big[\mathbf{G}_{3,\boldsymbol{\mu}},\mathbf{G}_{3,\boldsymbol{\Sigma}}\big]\begin{pmatrix}\mathsf{d}\boldsymbol{\mu}_{\mathcal{N}}\\ \operatorname{vec}\mathsf{d}\boldsymbol{\Sigma}_{\mathcal{N}}\end{pmatrix}
\end{aligned}
$$

where we have resubstituted the expressions for $\varphi_1$, $\varphi_2$, $\varphi$, $\mathbf{f}_{21}$, and $\mathbf{f}_{22}$ and introduced the following terms for simplicity of notation:

$$
\begin{aligned}
\mathbf{G}_{3,\boldsymbol{\mu}} &:= (\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\\
\mathbf{G}_{3,\boldsymbol{\Sigma}} &:= \frac{1}{2}\big(\big[(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\otimes(\mathbf{v}_{\mathcal{N}}-\boldsymbol{\mu}_{\mathcal{N}})^\intercal\big](\boldsymbol{\Sigma}_{\mathcal{N}}^{-1}\otimes\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})-\operatorname{vec}(\boldsymbol{\Sigma}_{\mathcal{N}}^{-1})^\intercal\big)
\end{aligned}
$$
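These definitions can be verified numerically: the gradient of the normal density with respect to $\boldsymbol{\mu}_{\mathcal{N}}$ and $\operatorname{vec}\boldsymbol{\Sigma}_{\mathcal{N}}$ should equal $\varphi\cdot[\mathbf{G}_{3,\boldsymbol{\mu}},\mathbf{G}_{3,\boldsymbol{\Sigma}}]$. The sketch below uses an illustrative dimension `m` in place of $n-K_x$ and randomly generated stand-ins for $\mathbf{v}_{\mathcal{N}}$, $\boldsymbol{\mu}_{\mathcal{N}}$, and $\boldsymbol{\Sigma}_{\mathcal{N}}$.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3                              # plays the role of n - K_x
v = rng.standard_normal(m)         # fixed evaluation point v_N
mu = rng.standard_normal(m)
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)    # positive definite Sigma_N

vec = lambda M: M.reshape(-1, order="F")

def phi(mu, Sigma):
    # multivariate normal density evaluated at the fixed point v
    d = v - mu
    return ((2 * np.pi) ** (-m / 2) * np.linalg.det(Sigma) ** (-0.5)
            * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))

Sigma_inv = np.linalg.inv(Sigma)
d = v - mu
G3_mu = d @ Sigma_inv
G3_Sigma = 0.5 * (np.kron(d, d) @ np.kron(Sigma_inv, Sigma_inv) - vec(Sigma_inv))

eps = 1e-6
# central-difference gradient with respect to mu
grad_mu = np.array([(phi(mu + eps * e, Sigma) - phi(mu - eps * e, Sigma))
                    / (2 * eps) for e in np.eye(m)])
assert np.allclose(grad_mu, phi(mu, Sigma) * G3_mu, rtol=1e-3, atol=1e-8)

# central-difference gradient with respect to each entry of Sigma
grad_Sigma = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        E = np.zeros((m, m)); E[i, j] = 1.0
        grad_Sigma[i, j] = (phi(mu, Sigma + eps * E)
                            - phi(mu, Sigma - eps * E)) / (2 * eps)
assert np.allclose(vec(grad_Sigma), phi(mu, Sigma) * G3_Sigma, rtol=1e-3, atol=1e-8)
```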

From the definitions in Eq. (A.8b), it immediately follows that:

(A.22) d μ N = d [ 1 N E ( V d o ( x ) ) ] = 1 N d E ( V d o ( x ) ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textsf {d}{\varvec{\mu }}_{{\mathcal {N}}}&=\textsf {d}[{\mathbf {1}}_{{\mathcal {N}}}^\intercal \mathrm {E}({\mathbf {V}} \mid do({\mathbf {x}}))] ={\mathbf {1}}_{{\mathcal {N}}}^\intercal \textsf {d}\mathrm {E}({\mathbf {V}} \mid do({\mathbf {x}})) \end{aligned}$$\end{document}
$$\operatorname{vec}\textsf{d}\boldsymbol{\Sigma}_{\mathcal{N}} = \operatorname{vec}\textsf{d}\big[\mathbf{1}_{\mathcal{N}}^\intercal\, \mathrm{V}(\mathbf{V} \mid do(\mathbf{x}))\,\mathbf{1}_{\mathcal{N}}\big] = \big(\mathbf{1}_{\mathcal{N}}^\intercal \otimes \mathbf{1}_{\mathcal{N}}^\intercal\big) \operatorname{vec}\textsf{d}\,\mathrm{V}(\mathbf{V} \mid do(\mathbf{x})) \tag{A.23}$$
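Equation (A.23) rests on the vectorization identity $\operatorname{vec}(\mathbf{ABC}) = (\mathbf{C}^\intercal \otimes \mathbf{A})\operatorname{vec}(\mathbf{B})$. A minimal numerical sketch in Python; the selection matrix `S` below is a hypothetical stand-in for $\mathbf{1}_{\mathcal{N}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3

# S: hypothetical (n x k) 0/1 selection matrix, standing in for 1_N
S = np.zeros((n, k))
S[[0, 2, 4], [0, 1, 2]] = 1.0

M = rng.standard_normal((n, n))
M = M @ M.T  # symmetric matrix, standing in for V(V | do(x))

# left-hand side: vec(S' M S), using column-major (Fortran-order) vec
lhs = (S.T @ M @ S).reshape(-1, order="F")
# right-hand side: (S' kron S') vec(M), via vec(ABC) = (C' kron A) vec(B)
rhs = np.kron(S.T, S.T) @ M.reshape(-1, order="F")

assert np.allclose(lhs, rhs)
```

Note that `order="F"` is essential: the identity holds for the column-stacking definition of $\operatorname{vec}$.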

Using Eqs. (A.3) and (A.4), we obtain the final result:

$$\begin{aligned}
&\textsf{d}f(\mathbf{v}_{\mathcal{N}} \mid do(\mathbf{x})) =\\
&\underbrace{f(\mathbf{v}_{\mathcal{N}} \mid do(\mathbf{x}))\big[\mathbf{G}_{3,\boldsymbol{\mu}},\mathbf{G}_{3,\boldsymbol{\Sigma}}\big]
\begin{pmatrix}
\mathbf{1}_{\mathcal{N}}^\intercal \Big(\big(\mathbf{x}^\intercal \mathbf{1}_{\mathcal{I}}^\intercal (\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-\intercal}\big) \otimes \big((\mathbf{I}_n-\mathbf{I}_{\mathcal{N}}\mathbf{C})^{-1}\mathbf{I}_{\mathcal{N}}\big)\Big) \dfrac{\partial \operatorname{vec}\mathbf{C}}{\partial \boldsymbol{\theta}^\intercal}\\[2ex]
\big(\mathbf{1}_{\mathcal{N}}^\intercal \otimes \mathbf{1}_{\mathcal{N}}^\intercal\big)\left(\mathbf{G}_{2,\mathbf{C}}\dfrac{\partial \operatorname{vec}\mathbf{C}}{\partial \boldsymbol{\theta}^\intercal} + \mathbf{G}_{2,\boldsymbol{\Psi}}\dfrac{\partial \operatorname{vec}\boldsymbol{\Psi}}{\partial \boldsymbol{\theta}^\intercal}\right)
\end{pmatrix}}_{\frac{\partial f(\mathbf{v}_{\mathcal{N}} \mid do(\mathbf{x}))}{\partial \boldsymbol{\theta}^\intercal}}\textsf{d}\boldsymbol{\theta}
\end{aligned} \tag{A.24}$$
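The factor $f(\mathbf{v}_{\mathcal{N}} \mid do(\mathbf{x}))\,\mathbf{G}_{3,\boldsymbol{\mu}}$ appearing in Eq. (A.24) is the derivative of a multivariate normal density with respect to its mean vector. This can be checked against central finite differences; the dimensions and values below are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
k = 3
mu = rng.standard_normal(k)
A = rng.standard_normal((k, k))
Sigma = A @ A.T + k * np.eye(k)   # a well-conditioned covariance matrix
v = rng.standard_normal(k)        # a fixed evaluation point

f = multivariate_normal(mu, Sigma).pdf(v)
# G_{3,mu} = (v - mu)' Sigma^{-1}, as defined above
G3_mu = (v - mu) @ np.linalg.inv(Sigma)

# central finite differences of f with respect to each entry of mu
eps = 1e-6
fd = np.array([
    (multivariate_normal(mu + eps * e, Sigma).pdf(v)
     - multivariate_normal(mu - eps * e, Sigma).pdf(v)) / (2 * eps)
    for e in np.eye(k)
])

# df/dmu' = f * G_{3,mu}
assert np.allclose(fd, f * G3_mu, atol=1e-8)
```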

Proof of Equation (18d): The general definition of $g_4$ for a vector of outcome variables $\mathbf{Y}$ is given in Eq. (15). The following derivation is restricted to the case of a single (scalar) outcome variable $Y$, that is, $|\mathcal{Y}| = K_y = 1$.

$$\gamma_4 := P(y^{low} \le y \le y^{up} \mid do(\mathbf{x})) = g_4(\boldsymbol{\theta}_{\gamma_4};\mathbf{x},y^{low},y^{up}) = \int_{y^{low}}^{y^{up}} f(y \mid do(\mathbf{x}))\,\textsf{d}y \tag{A.25}$$

Let $Y$ be the $j$-th entry of $\mathbf{V}$. For simplicity of notation, we denote the scalar interventional mean and the scalar interventional variance as:

$$\mu_y = \mu_y(\boldsymbol{\theta}) := \mathrm{E}(y \mid do(\mathbf{x})) = \boldsymbol{\imath}_j^\intercal\, \mathrm{E}(\mathbf{V} \mid do(\mathbf{x})) \tag{A.26a}$$
$$\sigma_y^2 = \sigma_y^2(\boldsymbol{\theta}) := \mathrm{V}(y \mid do(\mathbf{x})) = \boldsymbol{\imath}_j^\intercal\, \mathrm{V}(\mathbf{V} \mid do(\mathbf{x}))\,\boldsymbol{\imath}_j \tag{A.26b}$$
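Under normality, the probability of treatment success in Eq. (A.25) reduces to a difference of standard normal CDFs evaluated at the standardized bounds. A minimal sketch, assuming hypothetical values for the interventional moments $\mu_y$ and $\sigma_y$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# hypothetical interventional moments (Eqs. A.26a, A.26b); in practice
# they are computed from the estimated model parameters theta
mu_y, sigma_y = 1.5, 0.8
y_low, y_up = 1.0, 3.0

# gamma_4 = P(y_low <= Y <= y_up | do(x)) via the standard normal CDF
gamma_4 = norm.cdf((y_up - mu_y) / sigma_y) - norm.cdf((y_low - mu_y) / sigma_y)

# cross-check: numerical integration of the interventional density (A.25)
gamma_4_num, _ = quad(lambda y: norm.pdf(y, loc=mu_y, scale=sigma_y),
                      y_low, y_up)

assert abs(gamma_4 - gamma_4_num) < 1e-8
```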

Again, we take the derivative with respect to the entire parameter vector $\boldsymbol{\theta}$:

$$\begin{aligned}
\frac{\partial}{\partial \boldsymbol{\theta}^\intercal} g_4(\boldsymbol{\theta};\mathbf{x},y^{up},y^{low}) &= \frac{\partial}{\partial \boldsymbol{\theta}^\intercal} \int_{\frac{y^{low}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}}^{\frac{y^{up}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}} \phi(u)\,\textsf{d}u = \int_{\frac{y^{low}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}}^{\frac{y^{up}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}} \frac{\partial}{\partial \boldsymbol{\theta}^\intercal}\phi(u)\,\textsf{d}u\\
&\quad + \phi\!\left(\frac{y^{up}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}\right) \frac{\partial}{\partial \boldsymbol{\theta}^\intercal}\!\left[\frac{y^{up}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}\right] - \phi\!\left(\frac{y^{low}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}\right) \frac{\partial}{\partial \boldsymbol{\theta}^\intercal}\!\left[\frac{y^{low}-\mu_y(\boldsymbol{\theta})}{\sigma_y(\boldsymbol{\theta})}\right]
\end{aligned} \tag{A.27}$$

The last equality in Eq. (A.27) follows from Leibniz's rule for partial differentiation of an integral (Dieudonné, 1969). The derivative under the integral sign (first term after the last equality) is equal to zero since the pdf of the standard normal $\phi(u)$ is functionally independent of $\boldsymbol{\theta}$. For simplicity of notation, we use $\mu_y$ and $\sigma_y^2$ instead of $\mu_y(\boldsymbol{\theta})$ and $\sigma_y^2(\boldsymbol{\theta})$ in the following. The two partial derivatives in the second line of Eq. (A.27) share the structure $\varphi_3 = h_3[f_{31}(\mu_y), f_{32}(\sigma_y^2)]$ and differ only in the constants $y^{up}$ and $y^{low}$. The functions below are stated for $y^{up}$ and are defined analogously for $y^{low}$ (we do not state the latter explicitly):

$$f_{31}(\mu_y) = (y^{up}-\mu_y),\ \ \mathbb{R} \to \mathbb{R}, \qquad f_{32}(\sigma_y^2) = (\sigma_y^2)^{-\frac{1}{2}},\ \ \mathbb{R}^+ \to \mathbb{R}^+ \tag{A.28a}$$
$$h_3(f_{31},f_{32}) = f_{31}\,f_{32},\ \ \mathbb{R} \times \mathbb{R}^+ \to \mathbb{R} \tag{A.28b}$$

The corresponding differentials and derivatives are given by:

$$\frac{\partial h_3}{\partial f_{31}} = (\sigma_y^2)^{-\frac{1}{2}}, \qquad \frac{\partial h_3}{\partial f_{32}} = (y^{up}-\mu_y), \qquad \frac{\textsf{d}f_{31}}{\textsf{d}\mu_y} = -1, \qquad \frac{\textsf{d}f_{32}}{\textsf{d}\sigma_y^2} = -\frac{1}{2}(\sigma_y^2)^{-\frac{3}{2}} \tag{A.29}$$

The differential of $\varphi_3 = h_3[f_{31}(\mu_y(\boldsymbol{\theta})), f_{32}(\sigma_y^2(\boldsymbol{\theta}))]$ can be evaluated as follows using the total differential, Cauchy's invariance rule, and the chain rule:

$$\textsf{d}\varphi_3 = \frac{\partial h_3}{\partial \boldsymbol{\theta}^\intercal}\textsf{d}\boldsymbol{\theta} = \left(\frac{\partial h_3}{\partial f_{31}}\frac{\partial f_{31}}{\partial \mu_y}\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal} + \frac{\partial h_3}{\partial f_{32}}\frac{\partial f_{32}}{\partial \sigma_y^2}\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal}\right)\textsf{d}\boldsymbol{\theta} \tag{A.30}$$

Inserting Eqs. (A.28) and (A.29) into Eq. (A.30) yields the following term for $y^{up}$ (analogously for $y^{low}$):

$$\frac{\partial h_3}{\partial \boldsymbol{\theta}^\intercal} = -\frac{1}{\sigma_y}\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal} - \frac{1}{2\sigma_y^2}\left(\frac{y^{up}-\mu_y}{\sigma_y}\right)\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal} \tag{A.31}$$

Inserting Eqs. (A.28), (A.29), (A.30), and (A.31) into the derivative of the causal effect function $g_4$ (Eq. [A.27]) and rearranging yields:

$$\begin{aligned}
\frac{\partial}{\partial \boldsymbol{\theta}^\intercal} g_4(\boldsymbol{\theta};\mathbf{x},y^{up},y^{low}) &= \phi\!\left(\frac{y^{up}-\mu_y}{\sigma_y}\right)\left(-\frac{1}{\sigma_y}\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal} - \frac{1}{2\sigma_y^2}\left(\frac{y^{up}-\mu_y}{\sigma_y}\right)\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal}\right)\\
&\quad - \phi\!\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\left(-\frac{1}{\sigma_y}\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal} - \frac{1}{2\sigma_y^2}\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal}\right)\\
&= -\frac{1}{\sigma_y}\left[\phi\!\left(\frac{y^{up}-\mu_y}{\sigma_y}\right) - \phi\!\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\right]\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal}\\
&\quad - \frac{1}{2\sigma_y^2}\left[\phi\!\left(\frac{y^{up}-\mu_y}{\sigma_y}\right)\left(\frac{y^{up}-\mu_y}{\sigma_y}\right) - \phi\!\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\right]\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal}
\end{aligned} \tag{A.32}$$
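The two bracketed factors in the final expression of Eq. (A.32) can be validated against central finite differences, treating $\mu_y$ and $\sigma_y^2$ directly as free arguments; the numerical values below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

y_low, y_up = 1.0, 3.0

def g4(mu, s2):
    """gamma_4 as a function of the interventional mean and variance."""
    s = np.sqrt(s2)
    return norm.cdf((y_up - mu) / s) - norm.cdf((y_low - mu) / s)

def analytic_grad(mu, s2):
    """The factors multiplying dmu/dtheta' and dsigma^2/dtheta' in (A.32)."""
    s = np.sqrt(s2)
    zu, zl = (y_up - mu) / s, (y_low - mu) / s
    g_mu = -(norm.pdf(zu) - norm.pdf(zl)) / s
    g_s2 = -(norm.pdf(zu) * zu - norm.pdf(zl) * zl) / (2 * s2)
    return g_mu, g_s2

mu0, s20 = 1.5, 0.64  # hypothetical interventional moments
g_mu, g_s2 = analytic_grad(mu0, s20)

# central finite differences in each argument
eps = 1e-6
fd_mu = (g4(mu0 + eps, s20) - g4(mu0 - eps, s20)) / (2 * eps)
fd_s2 = (g4(mu0, s20 + eps) - g4(mu0, s20 - eps)) / (2 * eps)

assert np.isclose(g_mu, fd_mu, atol=1e-6)
assert np.isclose(g_s2, fd_s2, atol=1e-6)
```

The full derivative with respect to $\boldsymbol{\theta}$ then follows by the chain rule, as in Eq. (A.33).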

The derivatives $\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal}$ and $\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal}$ are obtained from the general expressions in Eqs. (A.3) and (A.6) by selecting the corresponding rows. Row selection can be achieved by premultiplication with a selection matrix:

$$\frac{\partial}{\partial \boldsymbol{\theta}^\intercal} g_4(\boldsymbol{\theta};\mathbf{x},y^{up},y^{low}) = \big[\mathbf{G}_{4,\mu},\mathbf{G}_{4,\sigma^2}\big]
\begin{pmatrix}
\boldsymbol{\imath}_j^\intercal\, \dfrac{\partial \mathrm{E}(\mathbf{V} \mid do(\mathbf{x}))}{\partial \boldsymbol{\theta}^\intercal}\\[2ex]
\boldsymbol{\imath}_{(j-1)n+j}^\intercal\, \dfrac{\partial \operatorname{vec} \mathrm{V}(\mathbf{V} \mid do(\mathbf{x}))}{\partial \boldsymbol{\theta}^\intercal}
\end{pmatrix} \tag{A.33}$$

Here, the unit vector in the upper entry of the vector in Eq. (A.33) is of dimension $(n \times 1)$ and the unit vector in the lower entry is of dimension $(n^2 \times 1)$. The matrices denoted by $\mathbf{G}$ with a subscript are defined as follows:

$$\mathbf{G}_{4,\mu} := -\frac{1}{\sigma_y}\left[\phi\!\left(\frac{y^{up}-\mu_y}{\sigma_y}\right) - \phi\!\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\right] \tag{A.34a}$$
$$\mathbf{G}_{4,\sigma^2} := -\frac{1}{2\sigma_y^2}\left[\phi\!\left(\frac{y^{up}-\mu_y}{\sigma_y}\right)\left(\frac{y^{up}-\mu_y}{\sigma_y}\right) - \phi\!\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\left(\frac{y^{low}-\mu_y}{\sigma_y}\right)\right] \tag{A.34b}$$

where $\frac{\partial \mu_y}{\partial \boldsymbol{\theta}^\intercal}$ is obtained from $\frac{\partial \mathrm{E}(\mathbf{V} \mid do(\mathbf{x}))}{\partial \boldsymbol{\theta}^\intercal}$ by selecting the $j$-th row. Since $\frac{\partial \operatorname{vec} \mathrm{V}(\mathbf{V} \mid do(\mathbf{x}))}{\partial \boldsymbol{\theta}^\intercal}$ is a vectorized quantity, $\frac{\partial \sigma_y^2}{\partial \boldsymbol{\theta}^\intercal}$ is obtained by selecting the $((j-1)n+j)$-th row.
$\square$
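As a sanity check on the row-selection step in Eq. (A.33): premultiplication with a unit vector extracts the $j$-th row of the mean derivative and the $((j-1)n+j)$-th row of the vectorized variance derivative. The Jacobian stand-ins below are hypothetical random matrices:

```python
import numpy as np

n, j = 4, 3  # hypothetical dimension of V and (1-based) position of Y in V

rng = np.random.default_rng(1)
J_mean = rng.standard_normal((n, 7))       # stand-in for dE(V|do(x))/dtheta'
J_vecvar = rng.standard_normal((n * n, 7)) # stand-in for d vec V(V|do(x))/dtheta'

iota = np.zeros(n)                          # unit vector of dimension (n x 1)
iota[j - 1] = 1.0
iota2 = np.zeros(n * n)                     # unit vector of dimension (n^2 x 1)
iota2[(j - 1) * n + j - 1] = 1.0            # 1-based position (j-1)n + j

# premultiplication with a unit vector picks out a single row
assert np.allclose(iota @ J_mean, J_mean[j - 1])
assert np.allclose(iota2 @ J_vecvar, J_vecvar[(j - 1) * n + j - 1])
```

With a column-major $\operatorname{vec}$, position $(j-1)n+j$ indeed corresponds to the diagonal element $\sigma_y^2$ of the interventional covariance matrix.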

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-021-09811-z.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Exceptions to this statement include, for example, Ernest and Bühlmann (2015) and Bhattacharya, Nabi, and Shpitser (2020). Furthermore, statistical procedures from related fields such as the potential outcome framework (Robins, 1986, 1987; Robins, Rotnitzky, & Zhao, 1994; Rosenbaum & Rubin, 1983; van der Laan & Rubin, 2006) or econometrics (Chernozhukov, Fernández-Val, Newey, Stouli, & Vella, 2020; Matzkin, 2015) could be adjusted such that they can be used to estimate causal quantities in the NPSEM framework.

2 However, techniques for the identification (e.g., rank and order conditions) and estimation (e.g., limited information estimators) of single structural equations have been developed (cf. Bollen, 1996; Bollen, Kolenikov, & Bauldry, 2014; Bowden & Turkington, 1985; Fisher, 1966).

3 If the listed statements are indeed true, the causal Markov assumption is implied. For a detailed discussion of the logical relation between causal assumptions encoded in graph-based models and causal assumptions from the Neyman–Rubin potential outcome framework (e.g., ignorability, SUTVA), see, for example, Holland (1988), Pearl (2009), and Shpitser, Richardson, and Robins (2020).

4 Similar concepts such as autonomy (Aldrich, 1989), exogeneity (Mouchart, Russo, & Wunsch, 2009), and invariance (Cartwright, 2009) have been discussed in the econometric literature. However, we believe that these concepts are not part of the canonical assumptions of traditional SEM as used in the social and behavioral sciences.

5 Many results derived in this paper (e.g., the moments of the interventional distribution in Eqs. (6a) and (6b) or Theorem 8) do not rely on multivariate normality. However, Result 3 on the distributional family of the interventional distribution requires multivariate normality.

6 Throughout this article, we use the following conventions: Sets of random variables are denoted by calligraphic letters (e.g., $\mathcal{V} = \{V_1,\ldots,V_n\}$). Single random variables from a set are denoted by corresponding upper-case Latin letters (e.g., $V_i$). The column vector containing all random variables in a set is denoted by the corresponding bold Latin letter (e.g., $\mathbf{V} = (V_1,\ldots,V_n)^\intercal$). Realizations of a random vector $\mathbf{V}$ are denoted by lower-case Latin letters (e.g., $\mathbf{v}$).

7 Note that bidirected edges in a causal graph (see Fig. 2 in the illustration section) represent a nonzero covariance between error terms that is due to an unobserved common cause. This convention for causal graphs differs from that of path diagrams in the traditional SEM literature, where bidirected edges simply indicate a correlation without being specific about its origin.

8 An alternative approach to compute the distribution of an outcome variable under different (hypothetical) treatments is Robins' (1986) g-formula. For similarities and differences between the two approaches, see, for example, Hernán and Robins (2020), Pearl (2009), and Pearl and Robins (1995).

9 A detailed justification that the matrix expressions in Eq. (3) adequately represent the changes to the linear system imposed by the do-operator is provided in the online supplementary material.

10 In practice, nonparametric estimation of multivariate distributions requires certain regularity conditions and large sample sizes due to reduced rates of convergence (as compared to parametric estimation procedures). This practical limitation will be particularly pronounced in high dimensional systems with continuous variables, a phenomenon known as the curse of dimensionality.

11 Additional estimation techniques include two- and three-stage least squares (2SLS, 3SLS; Bollen, 1996; Sargan, 1988; Theil, 1971), instrumental variables (IV; Bowden & Turkington, 1985), and the generalized method of moments (GMM; Bollen et al., 2014; Hansen, 1982; Hayashi, 2011).

12 A more detailed description of the data simulation is provided in the online supplementary material.

13 For the detailed derivation of analytic expressions and computational details, we refer the reader to the online supplementary material.

14 Put more technically, we show that the model is locally identified using a generalized version of Wald's (1950) rank rule (Bekker, Merckens, & Wansbeek, 1994). Given the triangular structure of the matrix of structural coefficients and the special structure of the covariance restrictions, we believe that the model is also globally identified (Hausman & Taylor, 1983; Hsiao, 1983).

15 For the detailed derivation of analytic expressions and computational details, we refer the reader to the online supplementary material.

16 Note that the functions $\mathbf{g}_1$, $\mathbf{g}_2$, $g_3$, and $g_4$ introduced in Eqs. (12a), (13), (14), and (15) satisfy Property A.1 at every point in the interior of $\boldsymbol{\Theta}$ for any fixed $(\mathbf{x},\mathbf{v}_{\mathcal{N}}) \in \mathbb{R}^{K_x} \times \mathbb{R}^{n-K_x}$.

17 Note that many standard estimators from the field of linear SEM (e.g., 3SLS, ADF, GLS, GMM, ML, IV) satisfy Property A.2 under fairly general conditions.

References

Abadir, K. M., & Magnus, J. R. (2005). Matrix algebra. Cambridge University Press. https://doi.org/10.1017/CBO9780511810800
Aldrich, J. (1989). Autonomy. Oxford Economic Papers, 41(1), 15–34.
Alwin, D. F., & Hauser, R. M. (1975). The decomposition of effects in path analysis. American Sociological Review, 40(1), 37–47.
Amemiya, T. (1985). Advanced econometrics (1st ed.). Harvard University Press.
Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.
Bekker, P. A., Merckens, A., & Wansbeek, T. J. (1994). Identification, equivalent models, and computer algebra. Academic Press.
Bhattacharya, R., Nabi, R., & Shpitser, I. (2020). Semiparametric inference for causal effects in graphical models with hidden variables. Retrieved from https://arxiv.org/abs/2003.12659
Bollen, K. A. (1987). Total, direct, and indirect effects in structural equation models. Sociological Methodology, 17, 37–69.
Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons. https://doi.org/10.1002/9781118619179
Bollen, K. A. (1996). An alternative two-stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109–121.
Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605–634.
Bollen, K. A., & Bauldry, S. (2010). A note on algebraic solutions to identification. The Journal of Mathematical Sociology, 34(2), 136–145.
Bollen, K. A., Kolenikov, S., & Bauldry, S. (2014). Model-implied instrumental variable-generalized method of moments (MIIV-GMM) estimators for latent variable models. Psychometrika, 79(1), 20–50.
Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation models. In S. L. Morgan (Ed.), Handbook of causal analysis for social research (pp. 301–328). Springer. https://doi.org/10.1007/978-94-007-6094-3_15
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203–219.
Bowden, R. J., & Turkington, D. A. (1985). Instrumental variables. Cambridge University Press. https://doi.org/10.1017/CCOL0521262410
Brito, C., & Pearl, J. (2002). A new identification condition for recursive models with correlated errors. Structural Equation Modeling: A Multidisciplinary Journal, 9(4), 459–474.
Brito, C., & Pearl, J. (2006). Graphical condition for identification in recursive SEM. In R. Dechter & T. S. Richardson (Eds.), Proceedings of the 23rd conference on uncertainty in artificial intelligence (pp. 47–54). AUAI Press.
Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8(1), 1–24.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62–83.
Cartwright, N. (2009). Causality, invariance, and policy. In D. Ross & H. Kincaid (Eds.), The Oxford handbook of philosophy of economics. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195189254.003.0015
Casella, G., & Berger, R. (2002). Statistical inference. Duxbury.
Chen, B., Tian, J., & Pearl, J. (2014). Testable implications of linear structural equation models. In Proceedings of the 28th AAAI conference on artificial intelligence (pp. 2424–2430). AAAI Press.
Chernozhukov, V., Fernández-Val, I., Newey, W., Stouli, S., & Vella, F. (2020). Semiparametric estimation of structural functions in nonseparable triangular models. Quantitative Economics, 11(2), 503–533.
Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press.
Dieudonné, J. (1969). Foundations of modern analysis. In Pure and applied mathematics. Academic Press.
Ding, P., & VanderWeele, T. J. (2016). Sensitivity analysis without assumptions. Epidemiology, 27(3), 368–377.
Dorie, V., Harada, M., Carnegie, N. B., & Hill, J. (2016). A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Statistics in Medicine, 35(20), 3453–3470.
Drton, M., Foygel, R., & Sullivant, S. (2011). Global identifiability of linear structural equation models. Annals of Statistics, 39(2), 865–886.
Eberhardt, F., Glymour, C., & Scheines, R. (2005). On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables (pp. 178–184). AUAI Press.
Ernest, J., & Bühlmann, P. (2015). Marginal integration for nonparametric causal inference. Electronic Journal of Statistics, 9(2), 3155–3194.
Fisher, F. (1966). The identification problem in econometrics. McGraw-Hill.
Franks, A., D'Amour, A., & Feller, A. (2020). Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association, 115(532), 1730–1746. https://doi.org/10.1080/01621459.2019.1604369
Gische, C., West, S. G., & Voelkle, M. C. (2021). Forecasting causal effects of interventions versus predicting future outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 28(3), 475–492.
Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. P. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102–116.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4), 1029–1054.
Hauser, A., & Bühlmann, P. (2015). Jointly interventional and observational data: Estimation of interventional Markov equivalence classes of directed acyclic graphs. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(1), 291–318.
Hausman, J. A., & Taylor, W. E. (1983). Identification in linear simultaneous equations models with covariance restrictions: An instrumental variables interpretation. Econometrica, 51(5), 1527–1549.
Hayashi, F. (2011). Econometrics. Princeton University Press.
He, Y.-B., & Geng, Z. (2008). Active learning of causal networks with intervention experiments and optimal designs. Journal of Machine Learning Research, 9, 2523–2547.
He, Y.-B., & Jia, J. (2015). Counting and exploring sizes of Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 16, 2589–2609.
Heckman, J. J., & Pinto, R. (2015). Causal analysis after Haavelmo. Econometric Theory, 31(1), 115–151.
Hernán, M. A., & Robins, J. M. (2020). Causal inference: What if. Chapman & Hall/CRC.
Holland, P. W. (1988). Causal inference, path analysis, and recursive structural equations models. Sociological Methodology, 18, 449–484.
Hsiao, C. (1983). Identification. In Z. Griliches & M. D. Intriligator (Eds.), Handbook of econometrics (Vol. 1). North-Holland.
Hyttinen, A., Eberhardt, F., & Hoyer, P. O. (2013). Experiment selection for causal discovery. Journal of Machine Learning Research, 14(57), 3041–3071.
Imai, K., & Ratkovic, M. (2013). Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics, 7(1), 443–470.
Ito, K., Wada, T., Makimura, H., Matsuoka, A., Maruyama, H., & Saruta, T. (1998). Vector autoregressive modeling analysis of frequently sampled oral glucose tolerance test results. The Keio Journal of Medicine, 47(1), 28–36.
Jöreskog, K. G. (1967). A general approach to confirmatory maximum likelihood factor analysis. ETS Research Bulletin Series, 1967(2), 183–202.
Jöreskog, K. G., & Lawley, D. N. (1968). New methods in maximum likelihood factor analysis. British Journal of Mathematical and Statistical Psychology, 21(1), 85–96.
Kan, R. (2008). From moments of sum to moments of product. Journal of Multivariate Analysis, 99(3), 542–554.
Kang, C., & Tian, J. (2009). Markov properties for linear causal models with correlated errors. Journal of Machine Learning Research, 10, 41–70.
Klein, A. G., & Muthén, B. O. (2007). Quasi-maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42(4), 647–673.
Koster, J. T. A. (1999). On the validity of the Markov interpretation of path diagrams of Gaussian structural equations systems with correlated errors. Scandinavian Journal of Statistics, 26(3), 413–431.
Kuroki, M., & Cai, Z. (2007). Evaluation of the causal effect of control plans in nonrecursive structural equation models. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (pp. 227–234). AUAI Press.
Lee, S.-Y. (2007). Structural equation modeling: A Bayesian approach. John Wiley & Sons.
Lütkepohl, H. (1997). Handbook of matrices (1st ed.). Wiley.
Maathuis, M. H., Kalisch, M., & Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. The Annals of Statistics, 37(6A), 3133–3164.
Magnus, J. R., & Neudecker, H. (1979). The commutation matrix: Some properties and applications. The Annals of Statistics, 7(2), 381–394.
Magnus, J. R., & Neudecker, H. (1980). The elimination matrix: Some lemmas and applications (Other publications TiSEM). Tilburg University, School of Economics and Management. Retrieved from https://pure.uvt.nl/ws/portalfiles/portal/649691/26951_6623.pdf
Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics (2nd ed.). Wiley.
Mann, H. B., & Wald, A. (1943). On stochastic limit and order relationships. The Annals of Mathematical Statistics, 14(3), 217–226.
Matzkin, R. L. (2015). Estimation of nonparametric models with simultaneity. Econometrica, 83(1), 1–66.
Mouchart, M., Russo, F., & Wunsch, G. (2009). Structural modelling, exogeneity, and causality. In H. Engelhardt, H. Kohler, & A. Fürnkranz-Prskawetz (Eds.), Causal analysis in population studies (Vol. 23, pp. 59–82). Springer.
Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user's guide (8th ed.) [Computer software manual]. Los Angeles, CA. Retrieved from https://www.statmodel.com/
Nie, X., & Wager, S. (2020). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 299–319. https://doi.org/10.1093/biomet/asaa076
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference (1st ed.). Morgan Kaufmann. https://doi.org/10.1016/B978-0-08-051489-5.50001-1
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–688.
Pearl, J. (2009). Causality (2nd ed.). Cambridge University Press.
Pearl, J. (2012). The causal foundations of structural equation modeling. In R. Hoyle (Ed.), Handbook of structural equation modeling (pp. 68–91). Guilford Press.
Pearl, J., & Robins, J. M. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In P. Besnard & S. Hanks (Eds.), Uncertainty in artificial intelligence (pp. 444–453). Morgan Kaufmann.
Perkovic, E. (2020). Identifying causal effects in maximally oriented partially directed acyclic graphs. In J. Peters & D. Sontag (Eds.), Proceedings of the 36th conference on uncertainty in artificial intelligence (UAI) (Vol. 124, pp. 530–539). PMLR.
Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5), 947–1012.
Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of causal inference. The MIT Press.
Rao, C. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.
Rao, C. (1973). Linear statistical inference and its applications (2nd ed.). Wiley.
Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30(1), 145–157.
Richardson, T. S., & Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30(4), 962–1030.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9–12), 1393–1512.
Robins, J. M. (1987). A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases, 40(Suppl 2), 139–161.
Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560.
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846–866.
Rosenbaum, P. R. (2002). Observational studies (2nd ed.). Springer. https://doi.org/10.1007/978-1-4757-3692-2
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Sargan, D. (1988). Lectures on advanced econometric theory. Basil Blackwell.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Sage Publications.
Schumacker, R., & Marcoulides, G. (1998). Interaction and nonlinear effects in structural equation modeling. Lawrence Erlbaum Associates.
Serfling, R. (1980). Approximation theorems of mathematical statistics. John Wiley. https://doi.org/10.1002/9780470316481
Shipley, B. (2003). Testing recursive path models with correlated errors using d-separation. Structural Equation Modeling: A Multidisciplinary Journal, 10(2), 214–221.
Shpitser, I. (2018). Identification in graphical causal models. In M. Maathuis, M. Drton, S. Lauritzen, & M. Wainwright (Eds.), Handbook of graphical models (pp. 381–403). CRC Press.
Shpitser, I., & Pearl, J. (2006). Identification of conditional interventional distributions. In R. Dechter & T. S. Richardson (Eds.), Proceedings of the 22nd conference on uncertainty in artificial intelligence (pp. 437–444). AUAI Press.
Shpitser, I., Richardson, T. S., & Robins, J. M. (2020). Multivariate counterfactual systems and causal graphical models. Preprint on arXiv. Retrieved from arXiv:2008.06017
Sontakke, S. A., Mehrjou, A., Itti, L., & Schölkopf, B. (2020). Causal curiosity: RL agents discovering self-supervised experiments for causal representation learning. Preprint on arXiv. Retrieved from arXiv:2010.03110
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search (2nd ed.). The MIT Press.
Stolzenberg, R. M. (1980). The measurement and decomposition of causal effects in nonlinear and nonadditive models. Sociological Methodology, 11, 459–488.
Theil, H. (1971). Principles of econometrics. Wiley.
Thoemmes, F., Rosseel, Y., & Textor, J. (2018). Local fit evaluation of structural equation models using graphical criteria. Psychological Methods, 23(1), 27–41.
Tian, J., & Pearl, J. (2002a). A general identification condition for causal effects. In Proceedings of the 18th national conference on artificial intelligence (pp. 567–573). AAAI Press / MIT Press.
Tian, J., & Pearl, J. (2002b). On the testable implications of causal models with hidden variables. In A. Darwiche & N. Friedman (Eds.), Proceedings of the 18th conference on uncertainty in artificial intelligence (pp. 519–527). Morgan Kaufmann.
Usami, S., Murayama, K., & Hamaker, E. L. (2019). A unified framework of longitudinal models to examine reciprocal relations. Psychological Methods, 24(5), 637–657.
van Bork, R., Rhemtulla, M., Sijtsma, K., & Borsboom, D. (2020). A causal theory of error scores. Preprint on PsyArXiv. https://doi.org/10.31234/osf.io/h35sa
van der Laan, M. J., & Rubin, D. (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1). https://doi.org/10.2202/1557-4679.1043
Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.
Wald, A. (1950). Note on the identification of economic relations. In T. C. Koopmans (Ed.), Statistical inference in dynamic economic models. Wiley.
Wall, M. M., & Amemiya, Y. (2003). A method of moments technique for fitting interaction effects in structural equation models. British Journal of Mathematical and Statistical Psychology, 56, 47–63.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56–75). SAGE.
Wiley, D. (1973). The identification problem for structural equations with unmeasured variables. In A. Goldberger & O. Duncan (Eds.), Structural equation models in the social sciences (pp. 69–83). Academic Press.
Wolfram Research Inc. (2018). Mathematica, Version 11.3 [Computer software manual]. Champaign, IL. Retrieved from https://www.wolfram.com/mathematica
Xie, Y., Brand, J. E., & Jann, B. (2012). Estimating heterogeneous treatment effects with observational data. Sociological Methodology, 42(1), 314–347.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. Sociological Methodology, 28(1), 363–396.
Zehna, P. W. (1966). Invariance of maximum likelihood estimators. The Annals of Mathematical Statistics, 37(3), 744.
Zhang, J. (2008). Causal reasoning with ancestral graphs. Journal of Machine Learning Research, 9, 1437–1474.
Zyphur, M. J., Allison, P. D., Tay, L., Voelkle, M. C., Preacher, K. J., Zhang, Z., & Diener, E. (2019). From data to causes I: Building a general cross-lagged panel model (GCLM). Organizational Research Methods. https://doi.org/10.1177/1094428119847278

Figure 1. Causal Effect Functions. Figure 1 displays the mapping $\mathbf{g}: \boldsymbol{\Theta} \mapsto \boldsymbol{\Gamma}$ that corresponds to a causal effect function $\boldsymbol{\gamma} = \mathbf{g}(\boldsymbol{\theta})$. The domain $\boldsymbol{\Theta} \subseteq \mathbb{R}^{q+p}$ (left-hand side) contains the parameters of the model-implied joint distribution of the observed variables (no do-operator). The co-domain $\boldsymbol{\Gamma} \subseteq \mathbb{R}^{r}$ (right-hand side) contains causal quantities $\boldsymbol{\gamma}$ that are defined via the do-operator.


Figure 2. Causal Graph (ADMG) in the Absence of Interventions. Figure 2 displays the ADMG corresponding to the linear graph-based model. The dashed bidirected edge drawn between $X_1$ and $Y_1$ represents a correlation due to an unobserved common cause. Directed edges are labeled with the corresponding path coefficients that quantify direct causal effects. For example, the direct causal effect of $X_2$ on $Y_3$ is quantified by $c_{yx}$. Traditionally, disturbances (residuals, error terms), denoted by $\boldsymbol{\varepsilon}$ in Eq. (19), are not explicitly drawn in an ADMG.


Figure 3. Causal Graph (ADMG) Under the Intervention $do(x_2)$. Figure 3 displays the ADMG of the graph-based model under the intervention $do(x_2)$. Edges that enter node $X_2$ (i.e., that have an arrowhead pointing at node $X_2$) are removed since the value of $X_2$ is now set by the experimenter via the intervention $do(x_2)$. The interventional value $x_2$ is determined neither by the values of the causal predecessors of $X_2$ nor by unobserved confounding variables. All other causal relations are unaffected by the intervention, reflecting the assumption of modularity.


Figure 4. Interventional Distributions for Three Distinct Treatment Levels. Figure 4 displays several features of the interventional distribution for three distinct interventional levels: $x_2 = 11.54$ (solid), $x_2' = 0$ (dashed), and $x_2'' = -11.54$ (dotted). The pdfs of the interventional distributions are represented by the bell-shaped curves. The interventional means are represented by vertical line segments. The interventional variances correspond to the width of the bell-shaped curves and are equal across the different interventional levels. The probabilities of treatment success are represented by the shaded areas below the curves in the interval $[-40, 80]$.
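The probability of treatment success in the caption is simply the mass of the (Gaussian) interventional distribution over the target range $[-40, 80]$. A minimal sketch of that computation using only the standard library; the function name and the interventional mean and standard deviation below are illustrative placeholders, not the paper's estimates:

```python
import math

def prob_treatment_success(mu, sigma, lower=-40.0, upper=80.0):
    """Mass of a normal interventional distribution over [lower, upper]."""
    # Standard normal CDF expressed via the error function.
    def Phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return Phi((upper - mu) / sigma) - Phi((lower - mu) / sigma)

# Hypothetical interventional mean and SD for some treatment level do(x2):
p = prob_treatment_success(mu=20.0, sigma=30.0)
```

Shifting the interventional mean toward either boundary of the interval lowers this mass, which is why the estimated probability of treatment success varies with the interventional level $x_2$.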


Table 1 Parameters in the Linear Graph-Based Model.


Table 2 Causal Quantities in the Linear Graph-Based Model.


Figure 5. Estimate of the Probability Density Function of the Interventional Distribution. Figure 5 displays the estimated interventional pdf $\widehat{f}(y_3 \mid do(x_2 = 11.54))$ (black solid line) with pointwise 95% confidence intervals, that is, $\pm 1.96 \cdot \widehat{\mathrm{ASE}}[\widehat{f}(y_3 \mid do(x_2 = 11.54))]$ (gray shaded area). The true population interventional pdf $f(y_3 \mid do(x_2 = 11.54))$ is displayed by the gray dashed line.


Figure 6. Estimated Probability of Treatment Success. Figure 6 displays the estimated probability of treatment success (i.e., $\widehat{\gamma}_4 = \widehat{P}(-40 \le Y_3 \le 80 \mid do(x_2))$; black solid line) as a function of the interventional level $x_2$. The pointwise confidence intervals $\pm 1.96 \cdot \widehat{\mathrm{ASE}}[\widehat{P}(-40 \le Y_3 \le 80 \mid do(x_2))]$ are displayed by the (very narrow) gray shaded area around the solid black line (see electronic version for high resolution). The vertical dashed lines are drawn at the interventional levels $x_2 = 11.54$ and $x_2 = -38.3$. The horizontal dashed lines correspond to the probabilities of treatment success for the treatments $do(X_2 = 11.54)$ and $do(X_2 = -33.3)$.

Figure 8

Figure 7. Marginal, Conditional, and Interventional Distribution. The panels depict (i) the pdf of the unconditional distribution $P(Y_3)$ (top panel), (ii) the conditional distribution $P(Y_3 \mid X_2 = x_2)$ (middle panel), and (iii) the interventional distribution $P(Y_3 \mid do(x_2))$ (bottom panel). In (ii) the level $x_2 = 11.54\ \text{mg/dl}$ was passively measured, whereas in (iii) the intervention $do(X_2 = 11.54)$ was performed.
The central vertical black solid lines are drawn at the mean, and the shaded areas cover $95\%$ of the probability mass.
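The contrast between panels (ii) and (iii) can be made concrete in a toy linear Gaussian model with correlated errors (confounding): the conditional mean of $Y_3$ given $X_2 = x_2$ absorbs the error covariance via the regression coefficient, whereas the interventional mean uses only the structural coefficient. All parameter values below are hypothetical and chosen purely for illustration, not taken from the empirical example:

```python
# Toy model: X2 = u1,  Y3 = b * X2 + u2,  with Cov(u1, u2) = s12 != 0.
b = 0.5      # hypothetical structural coefficient X2 -> Y3
v1 = 1.0     # hypothetical Var(u1), error variance of X2
s12 = 0.3    # hypothetical Cov(u1, u2), inducing confounding
x2 = 11.54   # conditioning / interventional level

# Conditional mean: the regression of Y3 on X2 picks up the confounding,
# since Cov(Y3, X2) = b * v1 + s12 and Var(X2) = v1.
cond_mean = (b * v1 + s12) / v1 * x2

# Interventional mean: do(X2 = x2) cuts the dependence on u1,
# so only the structural path b contributes.
int_mean = b * x2
```

With $s_{12} \ne 0$ the two means differ, which is exactly why the middle and bottom panels of Figure 7 are centered at different values despite the identical level $x_2 = 11.54$.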

Supplementary material: File
Gische and Voelkle supplementary material (659.5 KB)