1 Motivation and preview
There is an uncontroversial sense in which causal reasoning is more difficult than purely probabilistic or statistical reasoning. The latter seems hard enough: estimating probabilities, predicting future events from past observations, determining statistical significance, and adjudicating between statistical hypotheses—these are already formidable tasks, long mired in controversy. No free lunch theorems [Reference Belot6, Reference Shalev-Shwartz and Ben-David41] show that strong assumptions are necessary to gain any inductive purchase on such problems, and there is considerable disagreement about what kinds of assumptions are reasonable in different epistemic and practical circumstances [Reference Efron15]. Problems of causal inference only seem to make our tasks harder. Inferring causal effects, predicting the outcomes of interventions, determining causal direction, and learning a causal model—these problems typically demand statistical reasoning, but they also demand more on the part of the investigator. They may require that we actively interrogate the world through deliberate experimentation rather than passive observation, or that we antecedently accept strong assumptions sufficient to justify the causal conclusions we want to reach, or (very often) both. Indeed, statistical indistinguishability is the norm in causal inference, even with substantive assumptions [Reference Spirtes, Glymour and Scheines43]. As formalized in the causal hierarchy theorem of Bareinboim et al. [Reference Bareinboim, Correa, Ibeling, Icard, Geffner, Dechter and Halpern5] (see also [Reference Ibeling and Icard26]), it is not only impossible to infer causal information from purely correlational (or “observational”) data, but also generically impossible to infer counterfactual or explanatory information from purely experimental (or “interventional”) data. From an inferential perspective, probabilistic information vastly underdetermines causal information.
A feature common to both statistical inference and causal inference is that the most prominent approaches to each can be understood, at least in part, as attempts to turn an inductive problem into a deductive one. This is famously true of frequentist methods in the tradition associated with Neyman and Pearson (see [Reference Neyman31]), but is arguably true of Bayesian approaches as well. As Gelman and Shalizi [Reference Gelman and Shalizi20] suggest, “Statistical models are tools that let us draw inductive inferences on a deductive background,” rendering statistical inferences “deductively guaranteed by probabilistic assumptions” (p. 27). Indeed, one of the benefits of specifying a Bayesian probability model is that it provides an answer to virtually any question about the probability of a hypothesis conditional on data. Given the model and the data, this answer follows as a matter of logic.
Causal underdetermination is likewise confronted with methods for formulating precise inductive assumptions, sometimes allowing answers to causal questions to be derived by mere calculation.
Example 1.1 (Do-calculus)
As one prominent example, the do-calculus of Pearl and collaborators (see [Reference Pearl33] and Chapter 3 of [Reference Pearl34]) establishes systematic correspondences between qualitative (“graphical”) properties of a causal scenario and certain conditional independence statements involving causal quantities. A typical causal quantity of interest is the (average) causal effect, e.g., how likely Y is to take on value y given an intervention setting X to x. In a formal language (introduced in the sequel as $\mathcal {L}_{\mathrm {causal}}$ ), we write this as ${\mathbf {P}}([X=x]Y=y)$ , or more briefly, ${\mathbf {P}}([x]y)$ .
Absent assumptions, it is never possible to infer the value of ${\mathbf {P}}([x]y)$ from observational data [Reference Bareinboim, Correa, Ibeling, Icard, Geffner, Dechter and Halpern5]. Suppose, however, that we could assume the causal structure has something like the following shape (known in the literature as the front door graph):
For a standard example, we might assume that any causal effect of smoking (X) on cancer (Y) will be mediated by tar deposited in the lungs (Z), and moreover that any unknown sources of variation (U) on X or on Y (or on both), such as a person’s genotype, do not directly influence Z. Under these circumstances, the do-calculus licenses several substantive causal assumptions, which may be rendered precisely in $\mathcal {L}_{\mathrm {causal}}$ . Let $\Gamma $ be the set of equality statements below:
(i) ${\mathbf {P}}([x]z) = {\mathbf {P}}(z|x)$,
(ii) ${\mathbf {P}}([z]x) = {\mathbf {P}}(x)$,
(iii) ${\mathbf {P}}([x]y | [x]z)={\mathbf {P}}([x,z]y)={\mathbf {P}}([z]y)$,
(iv) ${\mathbf {P}}([z]y | [z]x)={\mathbf {P}}(y|x,z)$.
For instance, (i) says that the causal effect of $X=x$ on $Z=z$ simply coincides with the conditional probability ${\mathbf {P}}(Z=z|X=x)$ . Appealing to a combination of laws of probability and distinctively causal laws involving the “causal-conditional” statements like $[x]y$ , it is possible to show that the following equality is in fact entailed by the statements $\Gamma $ , that is, by (i)–(iv):
$${\mathbf {P}}([x]y) = \sum _{z} {\mathbf {P}}(z|x) \sum _{x'} {\mathbf {P}}(y|x',z)\, {\mathbf {P}}(x'). \qquad (1)$$
In other words, (1) shows that the causal effect of $X=x$ on $Y=y$ can simply be calculated from suitable observational data involving the variables $X,Y,Z$ .
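To make (1) concrete, the following sketch evaluates its right-hand side on a hypothetical observational distribution over binary $X, Y, Z$; the numbers (and the helper names `p` and `front_door`) are purely illustrative assumptions, not data from any study.

```python
# Hypothetical observational joint distribution P(X, Y, Z) over binary variables.
# The numbers below are made up purely for illustration.
P = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}  # keys are (x, y, z); the values sum to 1

def p(**fix):
    """Marginal probability of the event fixing the given coordinates of (X, Y, Z)."""
    keys = {"x": 0, "y": 1, "z": 2}
    return sum(pr for xyz, pr in P.items()
               if all(xyz[keys[k]] == v for k, v in fix.items()))

def front_door(x, y):
    """Right-hand side of (1): sum_z P(z|x) * sum_x' P(y|x',z) * P(x')."""
    total = 0.0
    for z in (0, 1):
        pz_given_x = p(x=x, z=z) / p(x=x)
        inner = sum(p(x=xp, y=y, z=z) / p(x=xp, z=z) * p(x=xp) for xp in (0, 1))
        total += pz_given_x * inner
    return total

print(front_door(x=1, y=1))  # estimate of P([X=1] Y=1) under the front-door assumptions
```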
Methods such as these extend beyond the specific problem of estimating causal effects, to include estimation of counterfactual quantities as well. For instance, we may want to determine—from experimental data and background assumptions—the joint probability that an individual would survive if and only if they are assigned a certain treatment, a quantity we would write as ${\mathbf {P}}([X=1]Y=1 \wedge [X=0]Y=0)$ . Inferential techniques similar to those in Example 1.1 have been employed in such settings, and have even been automated (e.g., [Reference Duarte, Finkelstein, Knox, Mummolo and Shpitser14]).
More broadly, a number of different approaches to inductive inference, both statistical and causal, can be assimilated to a regimentation something like this:
$$\text {Inductive Assumptions},\ \text {Data} \ \models \ \text {Conclusion}. \qquad (2)$$
In Example 1.1, $\Gamma $ are the inductive assumptions, the data would be information about ${\mathbf {P}}(X,Y,Z)$ , and the conclusion would be an estimate of the causal effect of $X=x$ on $Y=y$ . In a standard Bayesian analysis, the inductive assumption might be a prior probability model for some latent variables (e.g., parameters for a class of probability measures), while the data would be values of some observable variables, and the conclusion might be the posterior values for the hidden variables, or perhaps posterior predictive values for some yet-to-be-observed variables. A critical job of the statistician or data scientist is to identify suitable inductive assumptions that a relevant party judges reasonable (or, ideally if feasible, which are themselves empirically verifiable) and that are sufficiently strong to license meaningful conclusions from the types of data available.
From this vantage point our titular question takes on a new significance. Rather than asking about the difficulty of an inference task in terms of the strength of assumptions needed to justify the inference, we could instead ask how difficult it is in general, computationally speaking, to reason from inductive assumptions (together with data) to an inferential conclusion, in the strong sense of (2). In other words, we ask how difficult questions like (2) could be across different logical languages for describing relevant assumptions, data, and conclusions.
The contrast of interest in this article is between languages $\mathcal {L}_{\mathrm {prob}}$ , suitable for probabilistic reasoning, and languages $\mathcal {L}_{\mathrm {causal}}$ , which extend the corresponding probabilistic languages to encompass causal reasoning in addition. In short, $\mathcal {L}_{\mathrm {prob}}$ encompasses “pure” probabilistic reasoning about some set of random variables. In $\mathcal {L}_{\mathrm {causal}}$ we also reason about the probabilities of causal conditionals, the causal effect ${\mathbf {P}}\big ([x]y\big )$ being a simple example. Such mixed reasoning is crucial for applications like the do-calculus, where causal conclusions depend on distinctively causal assumptions (such as (i)–(iv) in Example 1.1). Some of the emblematic principles of $\mathcal {L}_{\mathrm {causal}}$ reveal a subtle interplay between the probabilistic and causal-conditional components. For example, the following formula states that if causal interventions which set the values of the variable X thereby affect the values taken by the variable Y, then the converse cannot be true:
This formula emerges as an instance of a more general scheme in a complete axiomatization of $\mathcal {L}_{\mathrm {causal}}$ (see [Reference Ibeling and Icard25]), implying that X and Y cannot each causally affect the other.
In light of the considerable empirical (and expressive) gulf between these two kinds of languages, we might expect to see a parallel jump in computational complexity when moving from $\mathcal {L}_{\mathrm {prob}}$ to $\mathcal {L}_{\mathrm {causal}}$ . In a certain respect, $\mathcal {L}_{\mathrm {causal}}$ can be seen as a combination of logics, embedding one modal system (a conditional logic) inside another (a probability logic), with non-trivial interactions between the two (such as (3)). It is common wisdom that such combinations may in general drive up complexity, in some cases even resulting in undecidability (see, e.g., [Reference Kurucz, Benthem, Blackburn and Wolter28]). As a famous example, even seemingly innocuous combinations of modalities for knowledge and time (each independently of low complexity) can lead to $\Pi ^1_1$ -hardness [Reference Halpern and Vardi21]. The present work introduces two main results, which show that this does not happen here: causal reasoning and probabilistic reasoning are, in a precise and robust sense, equally difficult.
The distinction between $\mathcal {L}_{\mathrm {prob}}$ and $\mathcal {L}_{\mathrm {causal}}$ is orthogonal to another distinction, namely how much arithmetic we admit in our formal language of probability over a set of probability terms ${\mathbf {P}}(\delta )$ . A wide range of probability logics have been studied in the literature, from pure qualitative comparisons between probability terms (e.g., [Reference de Finetti19]) to richer fragments capable of reasoning about polynomials over such terms (e.g., [Reference Scott and Krauss40]). For any such choice $\mathcal {L}_{\mathrm {prob}}$ of probabilistic language we can consider the extension $\mathcal {L}_{\mathrm {causal}}$ to allow not only probability terms, but also causal-probability terms like those introduced above. A strength of our analysis is that we provide a complexity-reflecting reduction from $\mathcal {L}_{\mathrm {causal}}$ to $\mathcal {L}_{\mathrm {prob}}$ in a way that is independent of our choice of probabilistic primitives. Thus, across the landscape of probability logics, we see no increase in complexity. Summarizing, our main result states:
Theorem 1.2 Causal reasoning is no harder than probabilistic reasoning. In particular:
1. Reasoning about (causal or non-causal) probabilities is as hard as reasoning about sums of (causal or non-causal) probabilities; both are as hard as reasoning about Boolean formulas, or about sums of real numbers.
2. Reasoning about (causal or non-causal) conditional probabilities is as hard as reasoning about arbitrary polynomials in (causal or non-causal) probabilities; both are as hard as reasoning about arbitrary polynomials in real numbers.
The above theorem is stated formally below as Theorem 3.3. While the relationship between probabilistic and causal languages is our main focus, it is worth pointing out that some of our results are of interest beyond the connection with causality. In particular, we find that reasoning in the language of conditional comparative probability is precisely as hard as reasoning in the full existential first-order theory of real numbers ( $\mathsf {\exists }\mathbb {R}$ ), thus establishing another notable example of a problem complete for this complexity class. It is also noteworthy that this expressively weak probabilistic language is—from a computational perspective—as complex as the most expressive causal languages we consider in the paper (namely, $\mathcal {L}_{\mathrm {causal}}^{\mathrm {poly}}$ ).
1.1 Relation to previous work
There is a long line of work on probability logic, including a host of results about complexity [Reference Abadi and Halpern1, Reference Fagin, Halpern and Megiddo18, Reference Ognjanović, Rašković and Marković32, Reference Speranski42]. As just mentioned, our contribution advances this literature. Concerning causal reasoning, there have been a number of complexity studies for various non-probabilistic causal notions [Reference Aleksandrowicz, Chockler, Halpern and Ivrii4, Reference Eiter and Lukasiewicz16]. Most germane to the present study is Halpern’s [Reference Halpern22] analysis of the satisfiability problem for deterministic reasoning about causal models, which he shows to be ${\mathsf {NP}}$ -complete (the same as propositional logical reasoning). Eiter and Lukasiewicz [Reference Eiter and Lukasiewicz16] studied numerous model-checking queries in a probabilistic setting, including the problem of determining the probability of a specific causal query. They show that this problem is complete for the class $\#\mathsf {P}$ , the “counting analogue” to ${\mathsf {NP}}$ which also characterizes the problem of determining (approximations for) probabilities of (even very simple) propositional expressions [Reference Roth35].
Our interest in the present contribution is the complexity of reasoning—viz. testing for satisfiability, validity, or entailment, as portrayed in (2)—for probabilistic and causal languages. While this angle has not yet been explored thoroughly in the literature, our study is indebted to, and draws upon, much of this previous work. Theorem 1.2 synthesizes as well as greatly extends a heretofore piecemeal line of results [Reference Fagin, Halpern and Megiddo18, Reference Ibeling24, Reference Ibeling and Icard25]. Moreover, the results just mentioned by Halpern [Reference Halpern22] and by Eiter and Lukasiewicz [Reference Eiter and Lukasiewicz16]—see also [Reference Darwiche13]—could be said to lend further support to the claim that causal reasoning is no more difficult (in the sense of computational complexity) than purely probabilistic reasoning.
1.2 Overview of the paper
In the next two sections (Sections 2 and 3), we introduce the languages and the notions from computational complexity needed to state Theorem 1.2 more formally. The proof of this main result appears in Section 4. Finally, in Section 5 we zoom out to consider what our results show about the relationship between probabilistic and causal reasoning, as well as consider a number of outstanding problems in this domain. In our presentation we assume no prior knowledge of causal modeling, complexity theory, or probability logic. Only elementary logic and probability are presupposed.
2 Introducing causal and probabilistic languages
In this section, we introduce the syntax and semantics for a series of probabilistic and causal languages. With a precise syntax and semantics in hand, we illustrate that these languages form an expressive hierarchy.
2.1 Syntax
Let $\mathbf {V}$ be a (possibly infinite) collection, representing the (endogenous) random variables under consideration. Informally, these are the variables that we may want to observe, change, query, or otherwise reason about explicitly.
For each variable $V \in \mathbf {V}$ , let $\text {Val}(V)$ denote the finite signature (range) of V. For example, for two binary variables we have $\mathbf {V} = \{X, Y\}$ with $\text {Val}(X) = \text {Val}(Y) = \{0, 1\}$ . We introduce the following deterministic languages:
Choose either $\mathcal {L}_{\mathrm {prop}}$ or $\mathcal {L}_{\mathrm {full}}$ as the base language $\mathcal {L}$ . The former is essentially a propositional language with extended ranges, while the latter is a causal conditional language. The semantics of these formulas will be introduced in Section 2.2, but intuitively we can interpret a formula of $\mathcal {L}_{\mathrm {full}}$ , such as $[X=1]Y=0$ , as expressing a subjunctive conditional: were X to take on value $1$ , then Y would come to have value $0$ . We understand the conditional causally, in a sense to be made precise below.
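For orientation, the base languages can be set up roughly as follows (a sketch in the spirit of the presentation in [Reference Ibeling and Icard25]; the official definitions, in particular the treatment of the intervention language $\mathcal {L}_{\mathrm {int}}$ used below, may differ in detail):

$\mathcal {L}_{\mathrm {prop}}$ : $\beta \; ::= \; V = v \mid \neg \beta \mid \beta \wedge \beta $ , for $V \in \mathbf {V}$ and $v \in \text {Val}(V)$ ;
$\mathcal {L}_{\mathrm {int}}$ : $\alpha \; ::= \; \top \mid V = v \mid \alpha \wedge \alpha $ , conjunctions of assignments containing at most one assignment per variable;
$\mathcal {L}_{\mathrm {full}}$ : $\varepsilon \; ::= \; [\alpha ]\beta \mid \neg \varepsilon \mid \varepsilon \wedge \varepsilon $ , where $\alpha \in \mathcal {L}_{\mathrm {int}}$ and $\beta \in \mathcal {L}_{\mathrm {prop}}$ .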
So-called terms over the base language are the main ingredient of our probabilistic languages. The most basic term is ${\mathbf {P}}(\delta )$ for $\delta \in \mathcal {L}$ , representing the probability of $\delta $ . By varying the composite terms admitted, we can define polynomial, conditional, linear, and comparative languages. Where $\delta , \delta ' \in \mathcal {L}$ are formulas of $\mathcal {L}$ :
We define for each $* \in \{\mathrm {comp}, \mathrm {lin},\mathrm {cond},\mathrm {poly}\}$ the causal and purely probabilistic languages:
Several of these probabilistic languages have appeared in the literature. For instance, $\mathcal {L}^{\mathrm {poly}}_{\mathrm {prob}}$ appeared already in early work by Scott and Krauss [Reference Scott and Krauss40], while $\mathcal {L}^{\mathrm {lin}}_{\mathrm {prob}}$ was introduced explicitly by Fagin et al. [Reference Fagin, Halpern and Megiddo18]. The language $\mathcal {L}^{\mathrm {poly}}_{\mathrm {causal}}$ was introduced and studied recently in [Reference Ibeling and Icard25] (see also [Reference Bareinboim, Correa, Ibeling, Icard, Geffner, Dechter and Halpern5, Reference Eiter and Lukasiewicz16]). Many of these languages, however, have not yet received explicit treatment.
2.2 Semantics
2.2.1 Structural causal models
The semantics for all of these languages will be defined relative to structural causal models, which can be understood as a very general framework for encoding data-generating processes. In addition to the endogenous variables $\mathbf {V}$ , structural causal models also employ exogenous variables $\mathbf {U}$ as a source of random variation among endogenous settings. For extended introductions, see, e.g., [Reference Bareinboim, Correa, Ibeling, Icard, Geffner, Dechter and Halpern5, Reference Pearl34].
Definition 2.1. A structural causal model (SCM) $\mathfrak {M}$ is a tuple $\mathfrak {M} = (\mathcal {F}, {\mathbb {P}}, \mathbf {U}, \mathbf {V})$ , with:
(a) $\mathbf {V}$ a set of endogenous variables, with each $V \in \mathbf {V}$ taking on possible values $\text {Val}(V)$,
(b) $\mathbf {U}$ a set of exogenous variables, with each $U \in \mathbf {U}$ taking on possible values $\text {Val}(U)$,
(c) $\mathcal {F} =\{f_V\}_{V \in \textbf {V}}$ a set of structural functions, such that $f_V$ determines the value of V given the values of the exogenous variables $\mathbf {U}$ and those of the other endogenous variables $V' \in \mathbf {V}$, and
(d) ${\mathbb {P}}$ a probability measure on a $\sigma $-algebra $\sigma (\mathbf {U})$ on $\mathbf {U}$.
Here we will assume for convenience that $\text {Val}(V)$ and $\text {Val}(U)$ are all finite.
In addition, we adopt the common assumption that our SCMs are recursive:
Definition 2.2. An SCM $\mathfrak {M}$ is recursive if there is a well-order $\prec $ on $\mathbf {V}$ such that $\mathcal {F}$ respects $\prec $ in the following sense: for any $V \in \mathbf {V}$ , whenever $\mathbf {v}_1,\mathbf {v}_2:V \mapsto \text {Val}(V)$ have the property that $\mathbf {v}_1(V') = \mathbf {v}_2(V')$ for all $V' \prec V$ , we are guaranteed that $f_V(\mathbf {v}_1,\mathbf {u}) = f_V(\mathbf {v}_2,\mathbf {u})$ .
Intuitively, $\mathfrak {M}$ is recursive if for all $V \in \textbf {V}$ , the function $f_V$ ensures that the value of V is determined only by the exogenous random variables $U \in \mathbf {U}$ and endogenous random variables $V^\prime \in \textbf {V}$ for which $V^\prime \prec V$ . Thus in a recursive model $\mathfrak {M}$ , the probability measure ${\mathbb {P}}$ on $\sigma (\textbf {U})$ induces a joint probability distribution ${\mathbb {P}}(\textbf {V})$ over values of the variables $V \in \textbf {V}$ .
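To illustrate how the measure ${\mathbb {P}}$ on exogenous settings induces ${\mathbb {P}}(\textbf {V})$, here is a minimal sketch for a toy recursive SCM; the variables, equations, and probabilities are hypothetical and chosen only for illustration.

```python
# Hypothetical exogenous distribution: two independent fair binary variables U1, U2.
P_U = {(u1, u2): 0.25 for u1 in (0, 1) for u2 in (0, 1)}

# Structural equations listed in an order respecting the well-order V1 < V2:
# each f_V may read the exogenous setting u and the values of earlier endogenous variables.
def f_V1(u, vals):
    return u[0]

def f_V2(u, vals):
    return vals["V1"] ^ u[1]   # V2 depends on V1 and on U2

equations = [("V1", f_V1), ("V2", f_V2)]

def induced_joint(equations, P_U):
    """Push the exogenous measure forward through the recursive equations."""
    joint = {}
    for u, pr in P_U.items():
        vals = {}
        for name, f in equations:      # recursiveness: evaluate in the order given by <
            vals[name] = f(u, vals)
        key = tuple(sorted(vals.items()))
        joint[key] = joint.get(key, 0.0) + pr
    return joint

print(induced_joint(equations, P_U))
```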
Causal interventions represent the result of a manipulation to the causal system, and are defined in the standard way (e.g., [Reference Pearl34, Reference Spirtes, Glymour and Scheines43]):
Definition 2.3. An intervention is a partial function $i : V \mapsto \text {Val}(V)$ . It specifies variables $\text {dom}(i) \subseteq \mathbf {V}$ to be held fixed and the values to which they are fixed. An intervention i induces a mapping, also denoted i, of systems of equations $\mathcal {F} = \{f_V\}_{V \in \textbf {V}}$ , such that $i (\mathcal {F})$ is identical to $\mathcal {F}$ , but with $f_V$ replaced by the constant function $f_V(\cdot ) = i(V)$ for each $V \in \text {dom}(i)$ . Similarly, where $\mathfrak {M}$ is a model with equations $\mathcal {F}$ , we write $i(\mathfrak {M})$ for the model which is identical to $\mathfrak {M}$ but with the equations $i(\mathcal {F})$ in place of $\mathcal {F}$ .
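Under this representation, the mapping $i(\mathcal {F})$ of Definition 2.3 amounts to swapping the targeted equations for constants; a minimal sketch (reusing the hypothetical `equations`, `induced_joint`, and `P_U` from the previous snippet):

```python
def intervene(equations, assignment):
    """Return i(F): replace f_V by the constant function with value assignment[V] for V in dom(i)."""
    new_equations = []
    for name, f in equations:
        if name in assignment:
            v = assignment[name]
            new_equations.append((name, lambda u, vals, v=v: v))  # constant function f_V(.) = i(V)
        else:
            new_equations.append((name, f))
    return new_equations

# Distribution induced by the intervened model i(M), with i = [V1 = 1]:
# print(induced_joint(intervene(equations, {"V1": 1}), P_U))
```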
In order to guarantee that interventions lead to a well-defined semantics, we work with structural causal models which are measurable:
Definition 2.4. We say that $\mathfrak {M}$ is measurable if under every finite intervention i, the joint distribution ${\mathbb {P}}(\mathbf {V})$ associated with the model $i(\mathfrak {M})$ is well-defined.
For measurable models, one can define a notion of causal influence:
Definition 2.5. A model $\mathfrak {M}$ induces the influence relation $V_i \rightsquigarrow V_j$ when there exist values $v, v^\prime \in \text {Val}(V_j)$ and interventions $\alpha ,\alpha ^\prime $ differing only in the value they impose upon $V_i$ for whichFootnote 1
Given an enumeration of variables $V_1,\ldots ,V_n$ compatible with a well-order $\prec $ , the model $\mathfrak {M}$ is compatible with $\prec $ when it induces no instance $V_i \rightsquigarrow V_j$ with $i> j$ .
To illustrate the preceding definitions, we return to the front door graph shown in Example 1.1, and demonstrate an example of an SCM that is compatible with this graph:
Example 2.6. Consider the SCM $\mathfrak {M} = (\mathcal {F}, {\mathbb {P}}, \mathbf {U}, \mathbf {V})$ , with the exogenous variables $\mathbf {U} = \{U, U_X, U_Y, U_Z\}$ , each of which takes the value $1$ with some fixed positive probability (and the value $0$ otherwise), and with three endogenous variables $\mathbf {V} = \{X, Y, Z\}$ . The equations $\mathcal {F} = \{f_V\}_{V \in \mathbf {V}}$ are given by
We observe that $\mathfrak {M}$ is measurable and recursive with the ordering $\prec $ given by $X \prec Z \prec Y$ . Further, $X \rightsquigarrow Z$ and $Z \rightsquigarrow Y$ , so that $\mathfrak {M}$ indeed realizes the front door graph and is compatible with $\prec $ .
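For concreteness, here is a sketch of one possible instantiation of such an SCM in code (the equations and exogenous probabilities are our own illustrative assumptions, not necessarily those of Example 2.6), together with a simple search for the influence relations of Definition 2.5:

```python
from itertools import product

# Illustrative equations: U confounds X and Y, while Z mediates X's effect on Y.
def f_X(u, vals): return u["U"] | u["U_X"]
def f_Z(u, vals): return vals["X"] ^ u["U_Z"]
def f_Y(u, vals): return (vals["Z"] & (1 - u["U"])) | u["U_Y"]

equations = [("X", f_X), ("Z", f_Z), ("Y", f_Y)]   # order respects X < Z < Y

def evaluate(equations, u, intervention=None):
    """Solve the recursive equations under an optional intervention (Definition 2.3)."""
    intervention = intervention or {}
    vals = {}
    for name, f in equations:
        vals[name] = intervention[name] if name in intervention else f(u, vals)
    return vals

def influences(equations, cause, effect, exo_names=("U", "U_X", "U_Y", "U_Z")):
    """Search for a witness to cause ~> effect in the spirit of Definition 2.5:
    two interventions differing only in the value imposed on `cause` that yield
    different values of `effect` for some exogenous setting.  (This searches one
    convenient family of intervention pairs, which suffices for this example.)"""
    other = [n for n, _ in equations if n not in (cause, effect)]
    for bits in product((0, 1), repeat=len(exo_names)):
        u = dict(zip(exo_names, bits))
        for fixed in product((0, 1), repeat=len(other)):
            base = dict(zip(other, fixed))
            outcomes = {evaluate(equations, u, dict(base, **{cause: v}))[effect] for v in (0, 1)}
            if len(outcomes) > 1:
                return True
    return False

print(influences(equations, "X", "Z"), influences(equations, "Z", "Y"))  # True True
print(influences(equations, "Y", "X"))  # False: compatible with X < Z < Y
```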
2.2.2 Interpretations of terms and truth definitions
It suffices to give the semantics for $\mathcal {L}_{\mathrm {causal}}^{\mathrm {poly}}$ , since this language includes all of the other languages introduced above. A model is a recursive and measurable SCM $\mathfrak {M} =(\mathcal {F}, {\mathbb {P}}, \mathbf {U},\mathbf {V})$ . For each assignment $\mathbf {u}: U \mapsto \text {Val}(U)$ of values to exogenous variables, each $V \in \mathbf {V}$ , and each $v \in \text {Val}(V)$ , we define $\mathcal {F}, \textbf {u} \models V = v$ if the equations $\mathcal {F}$ together with the assignment $\textbf {u}$ assign the value v to V. Conjunction and negation are defined in the usual way, giving semantics for $\mathcal {F}, \textbf {u} \models \beta $ for any $\beta \in \mathcal {L}_{\mathrm {prop}}$ . If $\mathcal {F}, \mathbf {u} \models \beta $ holds for all $\mathbf {u}$ , then we simply write $\mathcal {F} \models \beta $ . When the relation $\mathcal {F},\mathbf {u} \models \beta $ does not depend on $\mathbf {u}$ at all—that is, we have $\mathcal {F}, \mathbf {u} \models \beta $ iff $\mathcal {F}, \mathbf {u'} \models \beta $ for all $\mathbf {u},\mathbf {u'}$ and all formulas $\beta $ —we say that the equations $\mathcal {F}$ are deterministic. For $\beta ,\beta ^\prime \in \mathcal {L}_{\mathrm {prop}}$ , we write $\beta \models \beta ^\prime $ when $\mathcal {F} \models \beta \rightarrow \beta ^\prime $ for all $\mathcal {F}$ , where material implication is defined in the usual way.
For each intervention $\alpha \in \mathcal {L}_{int}$ and each $\beta \in \mathcal {L}_{\mathrm {prop}}$ , we define $\mathcal {F}, \mathbf {u} \models [\alpha ] \beta $ iff $i_\alpha (\mathcal {F}), \textbf {u}\models \beta $ , where $i_\alpha $ is the intervention which effects the assignments described by $\alpha $ . We also allow that $\alpha $ may be the trivial intervention $\top $ , in which case we simply write $\beta $ instead of $[\alpha ]\beta $ . We define the value of a basic term ${\mathbf {P}}(\epsilon )$ in $\mathfrak {M}$ to be ${\mathbb {P}}\big (\{\textbf {u} : \mathcal {F}, \textbf {u} \models \epsilon \}\big )$ , the probability of the set of exogenous settings at which $\epsilon $ holds.
For conditional probability terms we use the usual ratio definition, computed from the interpretation of unconditional terms just given, whenever the conditioning formula receives positive probability (the degenerate case of a zero-probability condition is handled by a fixed stipulation). For two terms $\textbf {t}_1,\textbf {t}_2$ , we define $\mathfrak {M} \models \textbf {t}_1 \geq \textbf {t}_2$ iff the value of $\textbf {t}_1$ under this interpretation is at least that of $\textbf {t}_2$ . The semantics for negation and conjunction are defined in the usual way, giving a semantics for $\mathfrak {M}\models \varphi $ for any $\varphi \in \mathcal {L}_{\mathrm {causal}}^{\mathrm {poly}}$ .
With this semantics, probability behaves as expected. For example, we have the following validity for any $\epsilon _1,\epsilon _2$ :
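A representative instance, assuming the intended principle is the familiar additivity law for probability terms (cf. [Reference Fagin, Halpern and Megiddo18]), is
$${\mathbf {P}}(\epsilon _1 \wedge \epsilon _2) + {\mathbf {P}}(\epsilon _1 \wedge \neg \epsilon _2) = {\mathbf {P}}(\epsilon _1),$$
an instance which, since it involves a sum, belongs to the linear and polynomial variants of the language.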
Causal interventions behave as expected as well. Indeed, fix any model $\mathfrak {M}$ with equations $\mathcal {F} $ , any variable $V \in \textbf {V}$ , and any assignment $\textbf {u}$ of values to the exogenous variables. Then V takes on at least and at most one value upon the intervention $\alpha $ : this is trivial if $\alpha $ intervenes on V, and it otherwise follows immediately from the fact that once $\textbf {u}$ is fixed, the values of all variables are determined by the equations $i_\alpha (\mathcal {F})$ . In other words, in the language $\mathcal {L}_{\mathrm {causal}}^*$ for any $* \in \{\mathrm {comp, lin, cond, poly}\}$ , we have the validity for all $\mathfrak {M}$ and $\textbf {u}$ :
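One way to render this observation (our formulation, which may differ from the official one) is that, for every variable $V$ and intervention $\alpha $ ,
$$\mathcal {F}, \textbf {u} \models \bigvee _{v \in \text {Val}(V)} \Big ( [\alpha ](V = v) \wedge \bigwedge _{v' \neq v} \neg [\alpha ](V = v') \Big ),$$
so that the probability of the displayed formula equals $1$ in every model.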
More generally, for each $\alpha \in \mathcal {L}_{\mathrm {int}}$ , the indexed box $[\alpha ]$ can be thought of as a normal, functional modal operator.
Having introduced the syntax and semantics for several languages and pointed to some basic validities, we recall in the next subsection various results and examples that illustrate the expressive relationships between these languages.
2.3 A two-dimensional expressive hierarchy
Definition 2.7. For a formula $\varphi $ in any of the languages just introduced, let Mod $(\varphi ) = \{\mathfrak {M}: \mathfrak {M} \models \varphi \}$ be the class of its models. For two languages $\mathcal {L}_{1}$ and $\mathcal {L}_{2}$ , we say that $\mathcal {L}_{2}$ is at least as expressive as $\mathcal {L}_{1}$ if for every $\varphi \in \mathcal {L}_1$ there is some $\psi \in \mathcal {L}_2$ such that Mod $(\varphi ) = $ Mod $(\psi )$ . We say $\mathcal {L}_{2}$ is strictly more expressive than $\mathcal {L}_{1}$ if $\mathcal {L}_{2}$ is at least as expressive as $\mathcal {L}_{1}$ but not vice versa.
In this section, mostly rehearsing familiar results and examples, we illustrate that the languages $\mathcal {L}^*$ for $\mathcal {L} \in \{\mathcal {L}_{\mathrm {prob}}, \mathcal {L}_{\mathrm {causal}}\}$ and $* \in \{\mathrm {comp, lin, cond, poly}\}$ form an expressive hierarchy along two axes. First, the purely probabilistic language $\mathcal {L}_{\mathrm {prob}}^*$ is always less expressive than the corresponding causal language $\mathcal {L}_{\mathrm {causal}}^*$ . Second, $\mathcal {L}^{\mathrm {comp}}$ is less expressive than both $\mathcal {L}^{\mathrm {lin}}$ and $\mathcal {L}^{\mathrm {cond}}$ , both of which are less expressive than the language $\mathcal {L}^{\mathrm {poly}}$ . Where each arrow indicates a strict increase in expressivity, the hierarchy can be shown graphically:Footnote 2
2.3.1 First axis: from probabilistic to causal
To illustrate the expressivity of causal as opposed to purely probabilistic languages, we recall a variation by Bareinboim et al. [Reference Bareinboim, Correa, Ibeling, Icard, Geffner, Dechter and Halpern5] on an example due to Pearl [Reference Pearl34]:
Example 2.8 (Causation without correlation)
Let $\mathfrak {M} = (\mathcal {F}, {\mathbb {P}}, \mathbf {U}, \mathbf {V})$ , where $\textbf {U}$ contains two independent binary variables $U_1,U_2$ , and $\textbf {V}$ contains two variables $V_1, V_2$ such that $f_{V_1} = U_1$ and $f_{V_2} = U_2$ . Then $V_1$ and $V_2$ are independent. Having observed this, one could not conclude that $V_1$ has no causal effect on $V_2$ ; indeed, consider the model $\mathfrak {M}^\prime $ , which is like $\mathfrak {M}$ , except with the mechanisms:
Here $\mathbf {1}_S$ is the indicator function for statement S, equal to $1$ if S holds and $0$ otherwise. In this case ${\mathbb {P}}_{\mathfrak {M}}(V_1,V_2) = {\mathbb {P}}_{\mathfrak {M}^\prime }(V_1,V_2)$ , so that the models are indistinguishable in any of the probabilistic languages $\mathcal {L}_{\mathrm {prob}}^*$ . However, the models are distinguishable in $\mathcal {L}_{\mathrm {causal}}^{\mathrm {comp}}$ , and so in all of the other causal languages: the two models disagree about how $V_2$ behaves under (combinations of) interventions on $V_1$ . For instance, the following statement
belongs to $\mathcal {L}_{\mathrm {causal}}^{\mathrm {comp}}$ and distinguishes $\mathfrak {M}$ from $\mathfrak {M}^\prime $ .
As shown in [Reference Bareinboim, Correa, Ibeling, Icard, Geffner, Dechter and Halpern5] (cf. also [Reference Suppes and Zanotti45]), the pattern in Example 2.8 is universal: for any model $\mathfrak {M}$ it is always possible to find some $\mathfrak {M}^\prime $ that agrees with $\mathfrak {M}$ on all of $\mathcal {L}_{\mathrm {prob}}^{\mathrm {poly}}$ but disagrees on $\mathcal {L}_{\mathrm {causal}}^{\mathrm {poly}}$ .Footnote 3
Theorem 2.9. $\mathcal {L}_{\mathrm {causal}}^{\mathrm {poly}}$ is more expressive than $\mathcal {L}_{\mathrm {prob}}^{\mathrm {poly}}$ . What is stronger, no $\mathcal {L}_{\mathrm {prob}}^{\mathrm {poly}}$ -theory (i.e., maximally consistent set in this language) uniquely determines a $\mathcal {L}_{\mathrm {causal}}^{\mathrm {poly}}$ -theory.
2.3.2 Second axis: from qualitative to quantitative
Focusing just on probabilistic languages, we will show that $\mathcal {L}_{\mathrm {prob}}^{\mathrm {comp}}$ is less expressive than both $\mathcal {L}_{\mathrm {prob}}^{\mathrm {lin}}$ and $\mathcal {L}_{\mathrm {prob}}^{\mathrm {cond}}$ , and that both of these are less expressive than the language $\mathcal {L}_{\mathrm {prob}}^{\mathrm {poly}}$ . In each case, it suffices to give two measures ${\mathbb {P}}_1(\textbf {V})$ and ${\mathbb {P}}_2(\textbf {V})$ which are indistinguishable in the less expressive language but which can be distinguished by some statement in the more expressive one.
Comparative probability
First, we claim that $\mathcal {L}_{\mathrm {prob}}^{\mathrm {comp}}$ is less expressive than $\mathcal {L}_{\mathrm {prob}}^{\mathrm {lin}}$ . Suppose we have just a single binary variable X, abbreviating $X=1$ by q and $X=0$ by $\neg q$ . Take two measures ${\mathbb {P}}_1$ and ${\mathbb {P}}_2$ with $0 < {\mathbb {P}}_i(\neg q) < {\mathbb {P}}_i(q)$ for $i \in [2]$ , chosen so that ${\mathbb {P}}_1(q) = 2\,{\mathbb {P}}_1(\neg q)$ (e.g., ${\mathbb {P}}_1(q) = 2/3$ ) while ${\mathbb {P}}_2(q) \neq 2\,{\mathbb {P}}_2(\neg q)$ (e.g., ${\mathbb {P}}_2(q) = 3/4$ ). The qualitative order on the four events $q, \neg q, \top , \bot $ is then the same, but, for instance, ${\mathbb {P}}_1(q) = {\mathbb {P}}_1(\neg q)+{\mathbb {P}}_1(\neg q)$ , while ${\mathbb {P}}_2(q) \neq {\mathbb {P}}_2(\neg q)+{\mathbb {P}}_2(\neg q)$ .
Next, we recall an example due to Luce [Reference Luce29], which shows that $\mathcal {L}_{\mathrm {prob}}^{\mathrm {comp}}$ is less expressive than $\mathcal {L}_{\mathrm {prob}}^{\mathrm {cond}}$ . Let $p,q,r$ each be events corresponding to the three possible values taken by a random variable, and consider two measures ${\mathbb {P}}_1$ and ${\mathbb {P}}_2$ chosen so that they induce one and the same comparative order on all events built from $p, q, r$ .
However, the conditional probabilities differ: ${\mathbb {P}}_1(r | q \lor r) < {\mathbb {P}}_1(q | p \lor q)$ , while ${\mathbb {P}}_2(r | q \lor r)> {\mathbb {P}}_2(q | p \lor q)$ . In other words, the measures ${\mathbb {P}}_1$ and ${\mathbb {P}}_2$ are indistinguishable in $\mathcal {L}_{\mathrm {prob}}^{\mathrm {comp}}$ but distinguishable in $\mathcal {L}_{\mathrm {prob}}^{\mathrm {cond}}$ .
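Such a pair of measures can be verified mechanically. The numbers below are our own illustrative choice (not the values from Luce's example); the script checks that every unconditional comparison agrees between the two measures while a comparison of conditional probabilities distinguishes them.

```python
from itertools import product

# Our own illustrative numbers (not Luce's): two measures on the atoms p, q, r.
P1 = {"p": 0.60, "q": 0.30, "r": 0.10}
P2 = {"p": 0.55, "q": 0.25, "r": 0.20}

events = [frozenset(s) for s in ([], ["p"], ["q"], ["r"],
                                 ["p", "q"], ["p", "r"], ["q", "r"], ["p", "q", "r"])]
prob = lambda P, e: sum(P[a] for a in e)
cond = lambda P, a, b: prob(P, frozenset([a]) & b) / prob(P, b)   # P(a | b)

# Every unconditional comparison agrees between the two measures ...
assert all((prob(P1, e1) >= prob(P1, e2)) == (prob(P2, e1) >= prob(P2, e2))
           for e1, e2 in product(events, repeat=2))

# ... yet a comparison of conditional probabilities tells them apart.
print(cond(P1, "r", frozenset(["q", "r"])) < cond(P1, "q", frozenset(["p", "q"])))  # True
print(cond(P2, "r", frozenset(["q", "r"])) > cond(P2, "q", frozenset(["p", "q"])))  # True
```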
Polynomials in probabilities
To show that $\mathcal {L}_{\text {prob}}^{\text {lin}}$ is less expressive than $\mathcal {L}_{\text {prob}}^{\text {poly}}$ , we simply identify a formula $\varphi \in \mathcal {L}_{\text {prob}}^{\text {poly}}$ such that there is no $\psi \in \mathcal {L}_{\text {prob}}^{\text {lin}}$ with Mod $(\varphi ) = $ Mod $(\psi )$ . For this we can take the example ${\mathbb {P}}(A \wedge B)={\mathbb {P}}(\neg A \vee \neg B) \land {\mathbb {P}}(A|B) = {\mathbb {P}}(B)$ . (This is in fact expressible already in $\mathcal {L}_{\text {prob}}^{\text {cond}}$ .) This enforces that ${\mathbb {P}}(B)= 1/\sqrt {2}$ : the first conjunct forces ${\mathbb {P}}(A \wedge B) = 1/2$ , and the second gives ${\mathbb {P}}(B)^2 = {\mathbb {P}}(A \wedge B)$ . By contrast, Ibeling et al. [Reference Ibeling, Icard, Mierzewski and Mossé27] show that every satisfiable formula in $\mathcal {L}_{\text {prob}}^{\text {lin}}$ has models in which every probability is rational.
Finally, we give an example to show that $\mathcal {L}^{\mathrm {cond}}_{\mathrm {prob}}$ is less expressive than $\mathcal {L}^{\mathrm {poly}}_{\mathrm {prob}}$ . As above, let $p,q,r$ be events corresponding to possible values taken by a random variable, and fix two suitable measures ${\mathbb {P}}_1$ and ${\mathbb {P}}_2$ on them. One can verify by exhaustion that all comparisons of conditional probabilities agree between ${\mathbb {P}}_1$ and ${\mathbb {P}}_2$ , and thus they are indistinguishable in $\mathcal {L}_{\text {prob}}^{\mathrm {cond}}$ . At the same time, there are statements in $\mathcal {L}_{\text {prob}}^{\mathrm {poly}}$ on which the models differ. For example, ${\mathbb {P}}_1(r){\mathbb {P}}_1(q) < {\mathbb {P}}_1(p)$ , whereas ${\mathbb {P}}_2(r){\mathbb {P}}_2(q)> {\mathbb {P}}_2(p)$ . This shows that $\mathcal {L}_{\mathrm {prob}}^{\mathrm {cond}}$ is less expressive than $\mathcal {L}_{\mathrm {prob}}^{\mathrm {poly}}$ . Further, we observe that ${\mathbb {P}}_1, {\mathbb {P}}_2$ can be distinguished in $\mathcal {L}_{\mathrm {prob}}^{\mathrm {lin}}$ : ${\mathbb {P}}_i(q) \geq 0.2$ holds for $i = 1$ but not for $i=2$ , and this threshold statement can be written in $\mathcal {L}_{\mathrm {prob}}^{\mathrm {lin}}$ as a comparison between a sum of probability terms and ${\mathbf {P}}(\top )$ .
Together, this observation and the earlier remark that ${\mathbb {P}}(A \wedge B)={\mathbb {P}}(\neg A \vee \neg B) \land {\mathbb {P}}(A|B) = {\mathbb {P}}(B)$ is expressible in $\mathcal {L}_{\text {prob}}^{\text {cond}}$ show that $\mathcal {L}^{\mathrm {lin}}$ and $\mathcal {L}^{\mathrm {cond}}$ are incomparable in expressivity.
Summarizing the results of this section:
Theorem 2.10. $\mathcal {L}^{\mathrm {lin}}$ and $\mathcal {L}^{\mathrm {cond}}$ are incomparable in expressive power. Both are strictly more expressive than $\mathcal {L}^{\mathrm {comp}}$ and strictly less expressive than $\mathcal {L}^{\mathrm {poly}}$ .
3 Introducing computational complexity
In this section, we introduce the ideas from complexity theory needed to state our main results. We denote by ${\mathsf {SAT}}_{\mathrm {prob}}^*, {\mathsf {SAT}}_{\mathrm {causal}}^*$ the satisfiability problems for $\mathcal {L}_{\mathrm {prob}}^*, \mathcal {L}_{\mathrm {causal}}^*$ , respectively, where $* \in \{\mathrm {comp, lin, cond, poly}\}$ . There are two key definitions:
Definition 3.1. Say that a map $\varphi \mapsto \psi $ preserves and reflects satisfiability when $\varphi $ is satisfiable if and only if $\psi $ is satisfiable. Such a map is called a many-one reduction of $\varphi $ to $\psi $ . Such a map is said to run in polynomial time if it is computable by a Turing machine in a number of time steps that is a polynomial function of the length $|\varphi |$ of the input formula. When the Turing machine is non-deterministic, the map is said to be non-deterministic as well; in this case we say that the reduction is an $\mathsf {NP}$ -reduction.
Definition 3.2. A decision problem maps an input, represented as a binary string, to an output “yes” or “no.” For example, ${\mathsf {SAT}}_{\mathrm {prob}}^*$ maps a standard encoding of the formula $\varphi \in \mathcal {L}_{\mathrm {prob}}^*$ to “yes” if it is satisfiable and to “no” otherwise. When each member of a collection $\mathcal {C}$ of decision problems can be reduced via some deterministic, polynomial-time map to a particular decision problem $ c\in \mathcal {C}$ , one says that the problem c is $\mathcal {C}$ -complete. The class $\mathcal {C}$ of decision problems is called a complexity class.
When a problem c is complete for some complexity class, this means that the complexity class $\mathcal {C}$ fully characterizes the difficulty of the problem: the problem c is at least as “hard” as any of the problems in $\mathcal {C}$ , and it is itself in $\mathcal {C}$ . Thus any two problems which are complete for a complexity class are equally hard, since each can be reduced in deterministic polynomial time to the other. Complete problems facilitate results relating complexity classes: to show that a class $\mathcal {C}$ is contained in another $\mathcal {C}^\prime $ (provided $\mathcal {C}^\prime $ is closed under such reductions), it suffices to give a deterministic, polynomial-time, many-one reduction from a problem c which is complete for $\mathcal {C}$ to any problem $c^\prime \in \mathcal {C}^\prime $ .
Fagin et al. [Reference Fagin, Halpern and Megiddo18] showed that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {lin}}$ is complete for the complexity class ${\mathsf {NP}}$ . That ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {comp}}$ is also ${\mathsf {NP}}$ -complete follows quickly from this result and the Cook–Levin theorem [Reference Cook12], which says that Boolean satisfiability is ${\mathsf {NP}}$ -complete as well. For clarity, we include these known results in the statement of our main result, which gives completeness results for all of the other probabilistic and causal languages defined above:
Theorem 3.3. We characterize two sets of tasks:
1. ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {comp}}, {\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {lin}}, {\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {comp}}, {\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {lin}}$ are ${\mathsf {NP}}$ -complete.
2. ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {cond}}, {\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {poly}}, {\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {cond}}, {\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {poly}}$ are ${\exists \mathbb {R}}$ -complete.
This is the more formal statement of Theorem 1.2.
Since problems that are complete for a class are all equally hard, our main results imply that causal and probabilistic reasoning in these languages do not differ in complexity. In the remainder of this section, we introduce the complexity classes ${\mathsf {NP}}$ and ${\exists \mathbb {R}}$ . We note that the inclusions ${\mathsf {NP}} \subseteq {\exists \mathbb {R}} \subseteq \mathsf {PSPACE}$ are known [Reference Canny9], where $\mathsf {PSPACE}$ is the set of problems solvable using polynomial space; it is an open problem whether either inclusion is strict. Further, ${\mathsf {NP}}$ and ${\mathsf {PSPACE}}$ are closed under many-one ${\mathsf {NP}}$ -reductions, and ten Cate et al. [Reference ten Cate, Kolaitis and Othman11] show that ${\exists \mathbb {R}}$ is also closed under many-one ${\mathsf {NP}}$ -reductions:
Definition 3.4. A complexity class $\mathcal {C}$ is closed under many-one ${\mathsf {NP}}$ reductions if to show that a problem is in $\mathcal {C}$ , it suffices to find a polynomial-time ${\mathsf {NP}}$ -reduction of the problem to one that is known to be in $\mathcal {C}$ .
3.1 The class ${\mathsf {NP}}$
The class ${\mathsf {NP}}$ contains any problem that can be solved by a non-deterministic Turing machine in a number of steps that grows polynomially in the input size. Equivalently, it contains any problem solvable by a polynomial-time deterministic Turing machine, when the machine is provided with a polynomial-size certificate, which we think of as providing the solution to the problem, or “lucky guesses.” In this case we think of the deterministic Turing machine as a verifier, tasked with ensuring that the certificate communicates a valid solution to the problem.
Hundreds of problems are known to be ${\mathsf {NP}}$ -complete. Among them are Boolean satisfiability and the decision problems associated with several natural graph properties, for example possession of a clique of a given size or possession of a Hamiltonian path. See [Reference Ruiz-Vanoye, Pérez-Ortega, Pazos R., Díaz-Parra, Frausto-Solís, Huacuja, Cruz-Reyes and Martínez F.36] for a survey of such problems and their relations.
3.2 The class ${\exists \mathbb {R}}$
The Existential Theory of the Reals (ETR) contains all true sentences of the form $\exists x_1 \cdots \exists x_n \, \mathcal {S}(x_1,\ldots ,x_n)$ , where $\mathcal {S}$ is a system of equalities and inequalities of arbitrary polynomials in the variables $x_1,\ldots ,x_n$ . For example, one can state in ETR the existence of the golden ratio, which is the only root of the polynomial $f(x) =x^2 - x -1$ greater than one, by “there exists $x>1$ satisfying $f(x) = 0$ .” The decision problem of determining whether a given sentence $\varphi $ of this form is true (i.e., whether $\varphi \in $ ETR) is complete (by definition) for the complexity class ${\exists \mathbb {R}}$ .
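Spelled out, the golden-ratio sentence just mentioned is simply
$$\exists x \, \big (x > 1 \;\wedge \; x^2 - x - 1 = 0\big ).$$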
The class ${\exists \mathbb {R}}$ is the real analogue of ${\mathsf {NP}}$ , in two senses. Firstly, the satisfiability problem that is complete for ${\exists \mathbb {R}}$ features real-valued variables, while the satisfiability problems that are complete for ${\mathsf {NP}}$ typically feature integer- or Boolean-valued variables. Secondly, and more strikingly, Erickson et al. [Reference Erickson, van der Hoog and Miltzow17] recently showed that while ${\mathsf {NP}}$ is the class of decision problems with answers that can be verified in polynomial time by machines with access to unlimited integer-valued memory, ${\exists \mathbb {R}}$ is the class of decision problems with answers that can be verified in polynomial time by machines with access to unlimited real-valued memory.
As with ${\mathsf {NP}}$ , a myriad of problems are known to be ${\exists \mathbb {R}}$ -complete. We include some examples that illustrate the diversity of such problems:
• In graph theory, there is the ${\exists \mathbb {R}}$ -complete problem of deciding whether a given graph can be realized by a straight line drawing [Reference Schaefer39].
• In game theory, there is the ${\exists \mathbb {R}}$ -complete problem of deciding whether an (at least) three-player game has a Nash equilibrium with no probability exceeding a fixed threshold [Reference Bilò and Mavronicolas7].
• In geometry, there is the ${\exists \mathbb {R}}$ -complete “art gallery” problem of deciding whether a given polygon can be guarded by a given number of points, from which all points of the polygon are visible [Reference Abrahamsen, Adamaszek and Miltzow2].
• In machine learning, there is the ${\exists \mathbb {R}}$ -complete problem of deciding whether there exist weights for a neural network, trained on a given set of data, such that the total error of the network falls below a given threshold [Reference Abrahamsen, Kleist and Miltzow3].
For discussions of further ${\exists \mathbb {R}}$ -complete problems, see [Reference Cardinal10, Reference Schaefer38].
4 Our results
In this section, we prove our main result (Theorem 1.2). To do this, we first establish that one can reduce satisfiability problems for causal languages to corresponding problems for purely probabilistic languages.
4.1 Reduction
Definition 4.1. Fix a set $OP$ of operations on $\mathbb {R}$ , and for a given placeholder set S, let $OP(S)$ be the set of terms generated by application of operations in $OP$ to members of S. Define
The semantics for these languages are restricted to recursive SCMs.
Proposition 4.2 (Reduction)
There exists a many-one ${\mathsf {NP}}$ reduction from ${\mathsf {SAT}}_{\mathcal {L}_{\mathrm {causal}}}$ to ${\mathsf {SAT}}_{\mathcal {L}_{\mathrm {prob}}}$ .
We first give a prose overview of the main ideas underlying the reduction. Fix $\varphi \in \mathcal {L}_{\mathrm {causal}}$ . The key observation is that the reduction is straightforward when every $\epsilon $ with ${\mathbf {P}}(\epsilon )$ mentioned in $\varphi \in \mathcal {L}_{\mathrm {causal}}$ is a complete state description, where a complete state description says, for each possible intervention and each variable, what value that variable takes upon that intervention. Indeed, complete state descriptions have three nice properties:
1. Polynomial-time comparison to ordering. One can easily check whether a complete state description implies influence relations conflicting with a given order $\prec $ on the variables appearing in it. Indeed, one simply reads off, from the intervention statements appearing in $\epsilon $ , which variables influence which.
2. Existence of model matching probabilities. If a collection of complete state descriptions does not conflict with an order $\prec $ , then any probability distribution on the descriptions $\epsilon $ has a recursive model that induces it; briefly, one can simply take a distribution over deterministic models for the mutually unsatisfiable descriptions $\epsilon $ .
3. Small model property. At most $|\varphi |$ complete state descriptions are mentioned in $\varphi $ , and so, when $\varphi $ is satisfiable, some model of $\varphi $ assigns positive probability to at most that many.
These properties will allow a reduction to go through. Indeed, fix $\varphi \in \mathcal {L}_{\mathrm {causal}}$ . Given that $\varphi $ is satisfiable, one can request as an ${\mathsf {NP}}$ certificate an ordering $\prec $ and (relying on #3) the small set of complete state descriptions receiving positive probability. One then checks (relying on #1) that these descriptions do not conflict with $\prec $ . Since $\varphi $ is satisfiable only if there exists a measure satisfying its inequalities, one can safely translate those inequalities into the probabilistic language, giving a satisfiable probabilistic formula $\psi $ . If the probabilistic formula $\psi $ is satisfiable via some measure, one can (relying on #2) infer a corresponding recursive model for the causal formula $\varphi $ . Thus the map $\varphi \mapsto \psi $ preserves and reflects satisfiability.
As it turns out, the same reduction goes through in the general case, when the $\epsilon $ for which ${\mathbf {P}}(\epsilon )$ is mentioned in $\varphi $ need not be complete state descriptions. Roughly, the strategy is to simply replace every $\epsilon $ such that ${\mathbf {P}}(\epsilon )$ is mentioned in $\varphi \in \mathcal {L}_{\mathrm {causal}}$ with an equivalent disjunction of complete state descriptions. The primary complication with this strategy is that there are too many possible interventions, variables, and values those variables could take on; truly complete state descriptions are exponentially long, making the reduction computationally intractable. To address this issue, we work with a restricted class of state descriptions, which feature only the interventions, variables, and values appearing in the input formula $\varphi $ :
Definition 4.3. Fix a formula $\varphi \in \mathcal {L}_{\mathrm {prop}} \cup \mathcal {L}_{\mathrm {causal}}$ . Let $\mathcal {I}$ contain all interventions appearing in $\varphi $ and let $\mathbf {V}_\varphi $ denote all variables appearing in $\varphi $ . For each variable $V \in \mathbf {V}_\varphi $ , let $\text {Assignments}_\varphi (V)$ contain $V =v$ whenever $V=v$ or $V\neq v$ appears in $\varphi $ , and let it also contain one assignment $V = v^*$ not satisfying either of these conditions. Let $\Delta _\varphi $ contain all possible interventions paired with all possible assignments, where the possibilities are restricted to $\varphi $ :
$$\Delta _\varphi = \Big \{\, \bigwedge _{\alpha \in \mathcal {I}} [\alpha ] \bigwedge _{V \in \mathbf {V}_\varphi } \beta _V^\alpha \;:\; \beta _V^\alpha \in \text {Assignments}_\varphi (V) \text { for each } \alpha \in \mathcal {I} \text { and } V \in \mathbf {V}_\varphi \Big \}.$$
Call $\bigwedge _{V \in \mathbf {V}_\varphi } \beta _V^\alpha $ the results of the intervention $\alpha $ , and $\beta _V^\alpha $ the result for V of the intervention $\alpha $ . We write $\alpha \in \delta $ when $\delta \in \Delta _{\varphi }$ as shorthand for $\alpha \in \mathcal {I}$ . We write $V \in \alpha $ when $\alpha $ contains some assignment $V= v$ .
The following three lemmas confirm that even working with this restricted class of state descriptions, (versions of) the three nice properties outlined above are retained.
Definition 4.4. Fix a formula $\varphi \in \mathcal {L}_{\mathrm {prop}} \cup \mathcal {L}_{\mathrm {causal}}$ and $ \Delta ^{\prime } \subseteq \Delta _\varphi $ . Fix a well-order $\prec $ on $\textbf {V}_{\varphi }$ . Enumerate the variables $V_1,\ldots ,V_n$ in $\textbf {V}_\varphi $ in a way consistent with $\prec $ . The formula $\delta \in \Delta ^{\prime }$ is compatible with $\prec $ when there exists a model $\mathfrak {M}$ that assigns positive probability to $\delta $ and that is compatible with $\prec $ . Define $\Delta _\prec $ to contain all $\delta \in \Delta _\varphi $ compatible with $\prec $ .
Lemma 4.5 (Polytime comparison to ordering)
Fix $\varphi \in \mathcal {L}_{\mathrm {prop}} \cup \mathcal {L}_{\mathrm {causal}}$ . Given a well-order $\prec $ on $\textbf {V}_\varphi $ and a set $\Delta ^{\prime }$ with $|\Delta ^{\prime }| \leq |\varphi |$ , one can check that $\Delta ^{\prime } \subseteq \Delta _{\varphi }$ and that each $\delta \in \Delta ^{\prime }$ is compatible with $\prec $ in time polynomial in $|\varphi |$ .
This lemma shows that given some statement $\varphi $ and a set of formulas $\Delta ^{\prime }$ , one can efficiently (i.e., in polynomial time) check that the formulas $\delta \in \Delta ^{\prime }$ satisfy two conditions. The first condition is that the formulas $\delta $ describe, in the fullest terms possible, the ways that $\varphi $ could be true (i.e., $\Delta ^{\prime } \subseteq \Delta _\varphi $ ). The second is that the formulas $\delta $ do not rule out the causal influence relations specified by the order $\prec $ , for example the relations $X \prec Z \prec Y$ induced by the model of smoking’s effect on lung cancer discussed in Examples 1.1 and 2.6.
Proof. Checking that $\Delta ^{\prime } \subseteq \Delta _{\varphi }$ is fast, since one can simply scan $\varphi $ to make sure that the interventions mentioned in each $\delta \in \Delta ^{\prime }$ are precisely those mentioned in $\varphi $ ; that the variables appearing in the results of every intervention in each $\delta \in \Delta ^{\prime }$ are precisely those mentioned in $\varphi $ ; and that for each such variable V, at most one of its assignments $V=v$ in $\Delta ^{\prime }$ does not appear as an assignment or a negated assignment in $\varphi $ .
We now give an algorithm to check whether $\delta \in \Delta ^{\prime }$ is compatible with $\prec $ . We first give prose and formal descriptions of the algorithm and then consider its runtime and correctness.
Order the variables $V_1,\ldots ,V_n$ in $\textbf {V}_\varphi $ in a way consistent with the well-order $\prec $ . For each variable $V_i$ with $i \in [n]$ , do the following. First, for each intervention $\alpha $ in $\delta $ that mentions $V_i$ , confirm that the intervention leads to satisfiable results: if $\delta $ says that upon the intervention $\alpha $ which sets $V_i= v$ , the variable $V_i$ takes a value $v^\prime \neq v$ , we reject $\delta $ , which necessarily has probability 0. Next, for each pair of interventions $\alpha , \alpha ^\prime $ in $\delta $ which do not intervene on the value assigned to $V_i$ , check whether both interventions result in the same assignments to variables $V_j$ for all $j < i$ ; we say that such interventions $\alpha ,\alpha ^\prime $ have agreement on all $V_j$ for $j < i$ . If this is the case, and yet $\delta $ says that these two interventions result in different values for $V_i$ , reject $\delta $ ; since $V_i$ can depend only on the values of $V_j$ for $j < i$ , when these values are constant, $V_i$ must be constant as well. Here is a formal description of the algorithm. We will write $V \in \alpha $ to denote that the variable V appears (or is mentioned) in the intervention $\alpha $ , i.e., that $V = v$ is a conjunct in $\alpha $ for some value v.
Algorithm 1 Check that $\delta \in \Delta ^{\prime }$ is compatible with $\prec $
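The following is a sketch, in executable form, of the procedure described in prose above; the encoding of $\delta $ (as a map from interventions, represented as frozensets of assignments, to their results) is our own choice of representation.

```python
def compatible(delta, order):
    """Check a state description delta against the well-order `order` (list of variables).

    `delta` maps each intervention (a frozenset of (variable, value) pairs) to its results:
    a dict giving one value for every variable in `order`.  Returns False if delta is
    unsatisfiable or forces an influence conflicting with `order`, and True otherwise.
    """
    interventions = list(delta)
    for i, V in enumerate(order):
        for alpha in interventions:
            forced = dict(alpha)
            # First check: an intervention setting V must yield that very value for V.
            if V in forced and delta[alpha][V] != forced[V]:
                return False
        for alpha in interventions:
            for alpha2 in interventions:
                if any(V == var for var, _ in alpha) or any(V == var for var, _ in alpha2):
                    continue  # the pair must not intervene on V itself
                earlier = order[:i]
                # Second check: agreement on all earlier variables forces agreement on V.
                if all(delta[alpha][W] == delta[alpha2][W] for W in earlier) \
                        and delta[alpha][V] != delta[alpha2][V]:
                    return False
    return True

# A description in the spirit of the text's second example: V3 flips although V1, V2 agree.
a  = frozenset({("V1", 1), ("V4", 1)})
a2 = frozenset({("V1", 1), ("V4", 0)})
delta_prime = {a:  {"V1": 1, "V2": 0, "V3": 0, "V4": 1},
               a2: {"V1": 1, "V2": 0, "V3": 1, "V4": 0}}
print(compatible(delta_prime, ["V1", "V2", "V3", "V4"]))  # False
```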
Below, we show that the above algorithm indeed runs in time $\mathrm {poly}(|\varphi |)$ and is correct, but for clarity, let us step through its execution on some examples. Consider the input $\delta := [V_1= 0] V_1 = 1$ . Then, by the first “if” clause in the algorithm, $\delta $ is rejected as unsatisfiable, since the intervention $[V_1= 0]$ leads to impossible results. For another example, let $\delta ^\prime $ be a formula whose conjuncts include results for the two interventions $\alpha = [V_1 = 1\land V_4 = 1]$ and $\alpha ^\prime = [V_1 =1\land V_4= 0]$ .
Then in the second “if” clause on the third iteration, $\delta ^\prime $ is rejected as incompatible with $\prec $ , because the interventions $\alpha $ and $\alpha ^\prime $ do not intervene on $V_3$ , result in the same values for $V_1$ and $V_2$ , but do not result in the same value for $V_3$ , contradicting the fact that $V_3$ ’s value must depend only on those assigned to $V_1$ and $V_2$ .
It is helpful in considering these examples and the runtime of the algorithm to imagine a table of values whose rows are the interventions $\alpha $ appearing in $\delta ^\prime $ , whose columns are the variables $V_1,\ldots ,V_n$ , and whose cell in row $\alpha $ and column $V_i$ records the value that $\delta ^\prime $ says $V_i$ takes upon the intervention $\alpha $ .
In effect, the second “for” loop over all interventions $\alpha ,\alpha ^\prime $ constructs such a table, starting with the leftmost column $V_1$ and proceeding to the right. The algorithm rejects $\delta ^\prime $ when two cells in the column $V_i$ and rows $\alpha $ and $\alpha ^\prime $ (with $V_i\notin \alpha , \alpha ^\prime $ ) do not assign the same value to $V_i$ but agree on all columns $V_j$ to the left. The restriction that $V_i$ does not appear in $\alpha $ or $\alpha ^\prime $ must be included because distinct interventions $\alpha , \alpha ^\prime $ can disagree on the values they impose on $V_i$ when intervening on it, regardless of the values assigned to $V_j$ with $j < i$ ; such disagreement does not constitute a violation of the ordering $\prec $ .
Let us first confirm that this algorithm runs in time $\mathrm {poly}(|\varphi |)$ and then show its correctness. We observe that $\max \{|\delta |,n\} = \mathrm {poly}(|\varphi |)$ . The algorithm contains an $O(n)$ loop over $V_1,\ldots ,V_n$ and two $O(|\delta |^2) $ loops over interventions. The work performed inside of these loops takes time $O(n \cdot |\delta |)$ , since we are simply reading $\delta $ and checking values for the variables $V_j$ for all $j < i$ , which can be stored in a lookup table (like the one above) of size $O(n \cdot |\delta |)$ . Thus the runtime of the algorithm is indeed $\mathrm {poly}(|\varphi |)$ .
Finally, we confirm that the algorithm is correct. Fix any $\delta \in \Delta ^{\prime }$ and recall that $\delta $ is of the form $\bigwedge _{\alpha \in \mathcal {I}} [\alpha ] \bigwedge _{V \in \mathbf {V}_\varphi } \beta _V^\alpha ,$
where $\beta _V^\alpha \in \text {Assignments}_\varphi (V)$ . First, suppose that the above algorithm declares $\delta $ compatible with $\prec $ . We will inductively construct a deterministic model of equations $\mathcal {F} = \{f_{V_i} \}_{i \in [n]}$ and show that $\mathcal {F} \models \delta $ and $\mathcal {F}$ is compatible with $\prec $ . Define $f_{V_1}$ to be the constant function sending all arguments to $\beta _{V_1}$ , where $\beta _{V_1}$ is the value of $V_1$ upon any intervention $\alpha \in \delta $ with $V_1 \not \in \alpha $ ; the second “for” loop in the algorithm ensures that there is at most one such value, and if there is no such value, $\beta _{V_1}$ can be chosen arbitrarily. Then $f_{V_1} \models \bigwedge _{\alpha } [\alpha ] \beta _{V_1}^\alpha $ . Indeed, this holds by construction for $\alpha $ with $V_1 \not \in \alpha $ , and it holds trivially for $\alpha $ with $V_1 \in \alpha $ , because, by the first “for” loop, each $\alpha $ is compatible with its results. For the inductive step, define $f_{V_i}(V_1 = \beta _{V_1},\ldots ,V_{i-1}=\beta _{V_{i-1}}) = \beta _{V_i}$ , where $\beta _{V_i}$ is the value of $V_i$ upon any intervention $\alpha \in \delta $ for which $V_i \not \in \alpha $ and $\beta _{V_j}^\alpha = \beta _{V_j}$ for all $j < i$ ; by the same reasoning, there is at most one such value, and if there is no such value, $\beta _{V_i}$ can be chosen arbitrarily. Then by the same reasoning, $f_{V_i} \models \bigwedge _{\alpha } [\alpha ] \beta _{V_i}^\alpha $ . Because this holds for all $i \in [n]$ , we have $\mathcal {F} \models \delta $ . By construction, $\mathcal {F}$ is compatible with $\prec $ , as desired.
Now, suppose that $\delta $ is compatible with $\prec $, so that $\delta $ is not self-contradictory and there exists some $\mathcal {F}=\{f_{V_i} \}_{i \in [n]}$ compatible with $\prec $ for which $\mathcal {F}\models \delta $. We claim that the above algorithm indeed declares $\delta $ compatible with $\prec $. Suppose for a contradiction that on iteration i, the algorithm rejects $\delta $ as incompatible with $\prec $. Since $\delta $ is not self-contradictory, it follows by the definition of the algorithm that for some interventions $\alpha , \alpha ^\prime $ (with $V_i \not \in \alpha , \alpha ^\prime $) which agree on all $V_j$ for $j < i$, we have $[\alpha ] V_i = v$ and $[\alpha ^\prime ] V_i = v^\prime $ with $v \neq v^\prime $. Let $\beta _j$ (for $j < i$) be the common value assigned to $V_j$ under both $\alpha $ and $\alpha ^\prime $. Since $\mathcal {F}$ is compatible with $\prec $ and $V_i \not \in \alpha , \alpha ^\prime $, the value of $V_i$ under either intervention is obtained by applying $f_{V_i}$ to the values $\beta _1,\ldots ,\beta _{i-1}$, so that
$v \;=\; f_{V_i}(V_1 = \beta _1,\ldots ,V_{i-1}=\beta _{i-1}) \;=\; v^\prime ,$
which is impossible, since $v \neq v^\prime $.
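The model construction from the first half of the correctness argument can be rendered in the same style. The sketch below (again ours, reusing the hypothetical representation above) builds, for each $V_i$, a lookup table over values of $V_1,\ldots ,V_{i-1}$, falls back to an arbitrary default value wherever $\delta $ is silent, and then solves the resulting recursive equations under a given intervention.

```python
def build_equations(delta, order, default=0):
    """For an accepted delta, build the equations F = {f_{V_i}} as lookup tables."""
    equations = {}
    for i, v in enumerate(order):
        table = {}
        for alpha, res in delta:
            if v in alpha:
                continue  # rows intervening on V_i do not constrain f_{V_i}
            prefix = tuple(res[w] for w in order[:i])
            table[prefix] = res[v]  # acceptance guarantees at most one value here
        equations[v] = (table, default)
    return equations


def evaluate(equations, order, intervention):
    """Solve the recursive equations under `intervention` (a dict of imposed values)."""
    values = {}
    for i, v in enumerate(order):
        if v in intervention:
            values[v] = intervention[v]
        else:
            table, default = equations[v]
            values[v] = table.get(tuple(values[w] for w in order[:i]), default)
    return values
```

By the inductive argument above, `evaluate` reproduces, for each intervention mentioned in an accepted $\delta $, exactly the results that $\delta $ asserts.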
Lemma 4.6 (Existence of model matching probabilities)
Fix $\varphi \in \mathcal {L}_{\mathrm {prop}} \cup \mathcal {L}_{\mathrm {causal}}$, and suppose ${\mathbb {P}}$ is a measure on $\Delta _{\prec } \subseteq \Delta _{\varphi }$ for some $\prec $. Then there is a model $\mathfrak {M}$ inducing the measure ${\mathbb {P}}$ on $\Delta _{\prec }$, i.e., a model that assigns to each $\delta \in \Delta _\prec $ probability exactly ${\mathbb {P}}(\delta )$.
Proof. Let us first define the model $\mathfrak {M}$ and then show that it is recursive. Let $\mathbf {V}_{\varphi }$ denote all variables appearing in $\varphi $. We define $\mathfrak {M} = \big (\mathcal {F}, {\mathbb {P}}_{\mathfrak {M}}, \{U\}, \mathbf {V}_{\varphi }\big )$, where $\mbox {Val}(U) = \Delta _\varphi $ and ${\mathbb {P}}_{\mathfrak {M}} (U = \delta ) = {\mathbb {P}}(\delta )$ for all $\delta \in \Delta _\varphi $. Enumerate the variables $\mathbf {V}_\varphi = \{V_1,\ldots ,V_n\}$ in a way consistent with $\prec $. Fix any $\delta \in \Delta _\varphi $. Recall that $\delta $ is of the form
$\delta \;=\; \bigwedge _{\alpha } [\alpha ] \bigwedge _{i \in [n]} \beta _{V_i}^{\alpha },$
where $\beta _{V_i}^\alpha \in \text {Assignments}_\varphi (V_i)$. If $\delta $ is satisfiable, it has a model, i.e., a deterministic system of equations $\mathcal {F}^\delta = \{f_{V_i}^\delta \}_{i \in [n]}$ such that $\mathcal {F}^\delta \models \delta $. Turning now to define the equations $\mathcal {F} = \{f_{V_i}\}_{i \in [n]}$, for any assignment $\textbf {v}$ to the variables $V_j$ for $j < i$, put $f_{V_i}(\textbf {v}, U = \delta ) = f_{V_i}^{\delta }(\textbf {v})$ whenever $\delta $ is satisfiable, and let $f_{V_i}(\textbf {v}, U = \delta )$ take an arbitrary (fixed) value otherwise.
By the above equations and the mutual unsatisfiability of the elements of $\Delta _{\prec }$, it follows that for all $\delta \in \Delta _\prec $ the probability that $\mathfrak {M}$ assigns to $\delta $ is exactly ${\mathbb {P}}_{\mathfrak {M}}(U = \delta ) = {\mathbb {P}}(\delta )$: a context $U = \delta ^{\prime }$ with $\delta ^{\prime } \in \Delta _\prec $ makes $\delta $ true just in case $\delta ^{\prime } = \delta $, and all remaining contexts receive probability 0. This is as required.
It remains for us to confirm that $\mathfrak {M}$ is recursive. We claim that the influence relationships $V_i \rightsquigarrow V_j$ induced by the model $\mathfrak {M}$ are simply those induced by the deterministic models $\mathcal {F}^\delta $. This would complete the proof, since by assumption we have $\delta \in \Delta _{\prec }$, so that $\mathcal {F}^\delta $ is compatible with $\prec $, and therefore $i < j$. Suppose that $\mathfrak {M}$ induces the influence relation $V_i \rightsquigarrow V_j$. Then for some interventions $\alpha , \alpha ^\prime $ which disagree only on the value assigned to $V_i$, some assignment $\textbf {u}$ to U, and some distinct values $v, v^\prime $ of $V_j$, the intervention $\alpha $ results in $V_j = v$ while $\alpha ^\prime $ results in $V_j = v^\prime $ in the context $\textbf {u}$.
Let $\delta $ be the value that $\textbf {u}$ assigns to U. We claim that the deterministic model $\mathcal {F}^\delta $ likewise induces $V_i \rightsquigarrow V_j$; that is, under $\mathcal {F}^\delta $, the intervention $\alpha $ results in $V_j = v$ while $\alpha ^\prime $ results in $V_j = v^\prime $.
Indeed, this follows from the fact that $f_{V_i}(\textbf {v}, U= \delta ) = f_{V_i}^\delta (\textbf {v})$ for all $i \in [n]$ .
Lemma 4.7 (Small model property)
Fix $\varphi \in \mathcal {L}_{\mathrm {prop}} \cup \mathcal {L}_{\mathrm {causal}}$ . If $\varphi $ is satisfiable, then $\varphi $ has a small model, in the sense that the model assigns positive probability to at most $|\varphi |$ elements $\delta \in \Delta _{\varphi }$ .
Proof. Since $\varphi $ is satisfiable, it has a recursive model $\mathfrak {M}$ with some order $\prec $. Given the existence of $\mathfrak {M}$, we claim there exists a small model $\mathfrak {M}_{\mathrm {small}}$. Indeed, consider the system of equations in the unknowns $\{{\mathbb {P}}(\delta ) : \delta \in \Delta _\prec \}$ given by
$\sum _{\substack {\delta \in \Delta _\prec \\ \delta \models \epsilon }} {\mathbb {P}}(\delta ) \;=\; {\mathbb {P}}_{\mathfrak {M}}(\epsilon ),$
with one equation for each $\epsilon $ such that ${\mathbf {P}}(\epsilon )$ is mentioned in $\varphi $.
There are at most $|\varphi |$ equations, and since for each $\epsilon $ , there exists some $\delta \in \Delta _\prec $ for which $\delta \models \epsilon $ , the equations are non-trivial. Suppose for the moment that ${\mathbb {P}} = {\mathbb {P}}_{\mathfrak {M}}$ is a solution. Then by a fact of linear algebra (see Lemma 4.8 of [Reference Fagin, Halpern and Megiddo18]), since the at most $|\varphi |$ linear equations have a solution, they have a solution ${\mathbb {P}}={\mathbb {P}}_{\mathrm {small}}$ in which at most $|\varphi |$ of the variables ${\mathbb {P}}_{\mathrm {small}}(\delta )$ are nonzero. By Lemma 4.6, we then infer from the existence of ${\mathbb {P}}_{\mathrm {small}}$ that the desired model $\mathfrak {M}_{\mathrm {small}}$ exists.
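The linear-algebra step can be made concrete as follows. The routine below is a generic support-reduction sketch (our illustration, not code from [Reference Fagin, Halpern and Megiddo18]): given the system as a matrix `A` with one row per equation, a right-hand side `b`, and any nonnegative solution `x`, it repeatedly moves along a null-space direction until at most `A.shape[0]` coordinates remain nonzero, without ever violating the equations or nonnegativity.

```python
import numpy as np

def reduce_support(A, b, x, tol=1e-12):
    """Given A @ x == b with x >= 0, return a nonnegative solution whose
    number of nonzero entries is at most the number of equations."""
    x = np.asarray(x, dtype=float).copy()
    assert np.allclose(A @ x, b), "x must solve the system"
    m = A.shape[0]
    while True:
        support = np.flatnonzero(x > tol)
        if len(support) <= m:
            return x
        # More supported columns than rows: A restricted to the support has a
        # nontrivial null space; the last right-singular vector lies in it.
        _, _, vh = np.linalg.svd(A[:, support])
        z = np.zeros_like(x)
        z[support] = vh[-1]
        if not np.any(z < -tol):
            z = -z  # ensure some supported coordinate can be pushed to zero
        neg = z < -tol
        t = np.min(x[neg] / -z[neg])      # largest step keeping x + t*z >= 0
        x = np.maximum(x + t * z, 0.0)    # some supported coordinate hits zero
```

In the proof, the rows of the system correspond to the (at most $|\varphi |$) probability terms mentioned in $\varphi $, and the columns to the elements of $\Delta _\prec $.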
It remains to confirm that ${\mathbb {P}} = {\mathbb {P}}_{\mathfrak {M}}$ is indeed a solution to the above system of equations. To show this, we must show that for $\epsilon $ with ${\mathbf {P}}(\epsilon )$ mentioned in $\varphi $,
$\sum _{\substack {\delta \in \Delta _\prec \\ \delta \models \epsilon }} {\mathbb {P}}_{\mathfrak {M}}(\delta ) \;=\; {\mathbb {P}}_{\mathfrak {M}}(\epsilon ).$
By our choice of $\prec $ , we know that the recursive model $\mathfrak {M}$ will assign probability 0 to all $\delta \not \in \Delta _\prec $ . It thus suffices to show that the above holds when $\Delta _\prec $ is replaced with the larger set $\Delta _\varphi $ . To do this, we will put $\epsilon $ into a more manageable form; afterwards, establishing the above equality will be relatively straightforward.
If $\epsilon $ mentions only one intervention $\alpha $, we claim that $\models \epsilon \leftrightarrow [\alpha ] \beta $, where $\beta $ is an assignment of variables in $\varphi $ to various values. Indeed, negation and conjunction distribute over $[\alpha ]$, in the sense that $\models \neg [\alpha ] \beta \leftrightarrow [\alpha ] \neg \beta $ and $\models [\alpha ] \beta \land [\alpha ] \beta ^\prime \leftrightarrow [\alpha ] (\beta \land \beta ^\prime )$, so $[\alpha ]$ can be assumed to appear on the outside. Further, since by the validity $\mathsf {Def}$, each variable takes one and only one value upon the intervention $\alpha $, we can replace $\beta $ with a disjunction over all assignments to all variables in $\varphi $ which agree with $\beta $. Let us use $\mathbf {v}_\varphi $ to denote such an assignment $\bigwedge _{V \in \mathbf {V}_\varphi } \beta _V$ where $\beta _V \in \mathrm {Assignments}(V)$, as defined in Definition 4.3. Summing up:
$\models \; \epsilon \;\leftrightarrow \; [\alpha ] \bigvee _{\substack {\mathbf {v}_\varphi \\ \mathbf {v}_\varphi \models \beta }} \mathbf {v}_\varphi .$
The exact same ideas apply when $\epsilon $ mentions several interventions $[\alpha _1]\beta _1, \ldots , [\alpha _k]\beta _k$, in which case $\epsilon $ is equivalent, as a validity, to a disjunction of conjunctions of the form $\bigwedge _{i \in [k]} [\alpha _i] \mathbf {v}^{(i)}_\varphi $, where each $\mathbf {v}^{(i)}_\varphi $ is a complete assignment to the variables of $\varphi $.
Thus, since all interventions, variables, and assignments appearing in $\epsilon $ are mentioned by the $\delta \in \Delta _\varphi $, and one can always add trivial interventions $[\alpha ] \top $, we see that, as a validity, $\epsilon $ is equivalent to a disjunction of formulas $\delta \in \Delta _\varphi $. Finally, we conclude with the observation that since the $\delta \in \Delta _\varphi $ are mutually unsatisfiable, additivity for the measure ${\mathbb {P}}$ (according to which ${\mathbb {P}}(\delta \lor \delta ^\prime ) = {\mathbb {P}}(\delta ) + {\mathbb {P}}(\delta ^\prime )$ for mutually unsatisfiable $\delta , \delta ^\prime $) tells us that
${\mathbb {P}}(\epsilon ) \;=\; \sum _{\substack {\delta \in \Delta _\varphi \\ \delta \models \epsilon }} {\mathbb {P}}(\delta ),$
as desired.
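To illustrate the normal-form step on a toy case of our own (not an example from the text): suppose $\varphi $ mentions only the binary variables $X, Y$ and the single intervention $\alpha $, and let $\epsilon = \neg [\alpha ](X=1)$. Then $\models \epsilon \leftrightarrow [\alpha ](X=0) \leftrightarrow [\alpha ]\big ((X=0 \land Y=0) \lor (X=0 \land Y=1)\big )$, and additivity gives ${\mathbb {P}}(\epsilon ) = {\mathbb {P}}([\alpha ](X=0 \land Y=0)) + {\mathbb {P}}([\alpha ](X=0 \land Y=1))$: a sum over exactly those complete descriptions entailing $\epsilon $.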
With the lemmas in hand, we now give the desired reduction:
Proof of Proposition 4.2. Fix a ${\mathsf {SAT}}_{\mathcal {L}_{\mathrm {causal}}}$ instance $\varphi $. We first describe the ${\mathsf {NP}}$ certificate and many-one reduction and then prove soundness and completeness. The ${\mathsf {NP}}$ certificate consists of an order $\prec $ on ${\mathbf {V}}_{\varphi }$ and a set $\Delta ^{\prime }$ of size at most $|\varphi |$. The reduction proceeds as follows.
1. Check that $\Delta ^{\prime } \subseteq \Delta _{\varphi }$ and that each $\delta \in \Delta ^{\prime }$ is compatible with $\prec $ . We note that by Lemma 4.5, this can be done in time polynomial in $|\varphi |$ .
2. Replace each ${\mathbf {P}}(\epsilon )$ appearing in $\varphi $ with ${\mathbf {P}}\Big (\bigvee _{\substack {\delta \in \Delta ^{\prime }\\\delta \models \epsilon }} f(\delta )\Big )$, where f is a bijection between $\Delta ^{\prime }$ and an arbitrary set of mutually unsatisfiable statements in $\mathcal {L}_{\mathrm {prop}}$. Call the resulting $\mathcal {L}_{\mathrm {prob}}$ formula $\varphi (\Delta ^{\prime })$.
We note that checking $\delta \models \epsilon $ can be done in polynomial time, since $\delta $ is a complete description of the results of all interventions.
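As a deliberately schematic rendering of step 2 (our own sketch, not the paper's), the routine below assumes the probability terms of $\varphi $ are collected in a list `epsilons` and that a helper `entails(delta, eps)` decides $\delta \models \epsilon $; both names, and the string placeholders, are hypothetical.

```python
def rewrite_probability_terms(epsilons, Delta_prime, entails):
    """Return, for each epsilon, the disjunction that replaces P(epsilon)."""
    # The placeholder names d_0, d_1, ... stand in for mutually unsatisfiable
    # statements of L_prop (e.g., complete conjunctions over ceil(log2 |Delta'|)
    # fresh atoms); the map k -> "d_k" plays the role of the bijection f.
    names = [f"d_{k}" for k in range(len(Delta_prime))]
    rewritten = []
    for eps in epsilons:
        disjuncts = [names[k] for k, delta in enumerate(Delta_prime)
                     if entails(delta, eps)]
        # P(eps) becomes P( \/ of the selected f(delta) ); an empty
        # disjunction is the contradiction, so the term becomes P of falsum.
        rewritten.append(" | ".join(disjuncts) if disjuncts else "False")
    return rewritten
```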
First we show completeness. If $\varphi $ is satisfiable, then by Lemma 4.7 it has a small model, assigning positive probability only to the elements of some $\Delta ^{\prime } \subseteq \Delta _\prec $ of size at most $|\varphi |$, for some ordering $\prec $; and the probabilities given by this model also solve $\varphi (\Delta ^{\prime })$. So the certificate exists and the reduction succeeds in producing a satisfiable formula.
Now we show soundness. If $\varphi (\Delta ^{\prime })$ is satisfiable, it is solved by some measure ${\mathbb {P}}$. This is a measure defined on $f(\Delta ^{\prime })$, and hence induces a measure on $\Delta ^{\prime } \subseteq \Delta _\prec $. Thus, since each $\delta \in \Delta ^{\prime }$ is compatible with $\prec $, by Lemma 4.6 there exists a model $\mathfrak {M}$ assigning probability ${\mathbb {P}}(f(\delta ))$ to each $\delta \in \Delta ^{\prime }$ and probability 0 to every other $\delta \in \Delta _\varphi $. This $\mathfrak {M}$ is a model of the inequalities stated by $\varphi $ and is recursive, so $\varphi $ is satisfiable as well. $\Box $
4.2 Characterization
Recall that Theorem 3.3 characterizes two sets of tasks:
1. ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {comp}}$, ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {lin}}$, ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {comp}}$, and ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {lin}}$ are ${\mathsf {NP}}$-complete.
2. ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {cond}}$, ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {poly}}$, ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {cond}}$, and ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {poly}}$ are ${\exists \mathbb {R}}$-complete.
These results can be summarized in a diagram, holding for $* \in \{\mathrm {prob}, \mathrm {causal}\}$, in which a line separates the ${\exists \mathbb {R}}$-complete problems from the ${\mathsf {NP}}$-complete problems, and an arrow from one satisfiability problem to another indicates that any instance of the former problem is an instance of the latter.
We note that these results imply the existence of a deterministic, polynomial-time many-one reduction from ${\mathsf {SAT}}_{\mathrm {causal}}^{*}$ to ${\mathsf {SAT}}_{\mathrm {prob}}^{*}$, for any $* \in \{\mathrm {comp, lin, cond, poly}\}$, whereas Proposition 4.2 only gives a non-deterministic reduction. To illustrate, recall the model of smoking’s effect on lung cancer discussed in Examples 1.1 and 2.6. Consider again the task of determining whether smoking makes one more likely to develop lung cancer, given one’s causal assumptions $\Gamma $ and one’s observation of statistical correlations between smoking, tar deposits in the lungs, and lung cancer. In other words, the task is to determine whether
where the correlational data include statements such as ${\mathbf {P}}(Y = 1 |X = 1)> {\mathbf {P}}(Y =1 | X = 0)$ and $ 0.7> {\mathbf {P}}(Y = 1| X=1) > 0.6$ . The above result implies that this task is no more difficult than that of determining whether an analogous entailment
holds, given purely probabilistic assumptions $\Gamma ^\prime $ . Indeed, given Equation (4), one can efficiently (i.e., in polynomial time) construct a probabilistic equation with the form of Equation (5) such that both entailments have the same truth-value; the causal inference goes through if and only if the purely probabilistic inference goes through.
To show the results in the second part of Theorem 1.2, we borrow the following lemma from [Reference Abrahamsen, Adamaszek and Miltzow2]:
Lemma 4.8. Fix variables $x_1,\ldots ,x_n$, and a set of equations of the form $x_i + x_j = x_k$ or $x_i x_j = 1$, for $i,j,k \in [n]$. Let $\exists \mathbb {R}$-inverse be the problem of deciding whether there exist reals $x_1,\ldots ,x_n$ satisfying the equations, subject to the restrictions $\tfrac {1}{2} \leq x_i \leq 2$ for all $i \in [n]$. This problem is ${\exists \mathbb {R}}$-complete.
Here, for reasons of space we only outline the two steps in the proof of Lemma 4.8. First, one shows that finding a real root of a degree 4 polynomial with rational coefficients is ${\exists \mathbb {R}}$-complete, and then one repeatedly performs variable substitutions to get the constraints $x_i + x_j = x_k$ and $x_i x_j = 1$. Second, one shows that any such polynomial has a root within a closed ball about the origin, and then one shifts and scales this ball to contain exactly the range $[\tfrac {1}{2}, 2]$.
With the lemma in hand, we show the theorem:
Proof of Theorem 1.2
We begin with the first statement. Using the fact that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {comp}}$ is ${\mathsf {NP}}$-hard, it suffices to show that ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {lin}}$ is inside ${\mathsf {NP}}$; indeed, since all of the satisfiability problems mentioned in the first statement include ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {comp}}$ and are included in ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {lin}}$, they would all then be ${\mathsf {NP}}$-hard and inside ${\mathsf {NP}}$, and so would all be ${\mathsf {NP}}$-complete. It is known both that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {lin}}$ is inside ${\mathsf {NP}}$ and that ${\mathsf {NP}}$ is closed under many-one ${\mathsf {NP}}$ reductions; by Proposition 4.2, this places ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {lin}}$ inside ${\mathsf {NP}}$, as desired.
We turn now to the second statement. By the same reasoning, it suffices to show that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {cond}}$ is ${\exists \mathbb {R}}$ -hard and that ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {poly}}$ is inside ${\exists \mathbb {R}}$ . We claim that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {poly}}$ is inside ${\exists \mathbb {R}}$ ; ${\exists \mathbb {R}}$ is closed under many-one ${\mathsf {NP}}$ reductions [Reference ten Cate, Kolaitis and Othman11], so Proposition 4.2 will place ${\mathsf {SAT}}_{\mathrm {causal}}^{\mathrm {poly}}$ in ${\exists \mathbb {R}}$ immediately.
To show that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {poly}}$ is inside ${\exists \mathbb {R}}$, we slightly extend a proof by [Reference Ibeling and Icard25] that the problem is in ${\mathsf {PSPACE}}$. Suppose that $\varphi \in \mathcal {L}_{\text {prob}}^{\text {poly}}$ is satisfied by some model $\mathbb {P}$. Again using the fact that $\exists \mathbb {R}$ is closed under $\mathsf {NP}$-reductions, we will provide a reduction of $\varphi $ to a formula $\psi \in \mathsf {ETR}$. Let E contain all $\epsilon $ such that ${\mathbf {P}}(\epsilon )$ appears in $\varphi $. Then consider the formula
$\bigwedge _{\epsilon \in E} {\mathbf {P}}(\epsilon ) = {\mathbb {P}}(\epsilon ),$
where each value ${\mathbb {P}}(\epsilon )$ is treated as a constant.
The measure ${\mathbb {P}}$ satisfies the above formula, so by Lemma 4.7, the above formula is satisfied by some model ${\mathbb {P}}_{\text {small}}$ assigning positive probability to a subset $\Delta ^+ \subseteq \Delta _\varphi $ of size at most $|E| \leq |\varphi |$ . Thus adding to $\varphi $ the constraint $\sum _{\delta \in \Delta ^+} {\mathbf {P}}(\delta ) = 1$ and replacing each ${\mathbf {P}}(\epsilon )$ appearing in $\varphi $ with $\sum _{\delta \in \Delta ^+: \delta \models \epsilon }{\mathbf {P}}(\delta )$ gives a formula $\psi $ belonging to $\mathsf {ETR}$ which has a model, namely ${\mathbb {P}}_{\text {small}}$ —and conversely, the mutual unsatisfiability of the $\delta \in \Delta ^+$ , together with the fact that they sum to unity, ensures that any model of $\psi $ is a model of $\varphi $ . Further, the size constraints on E and $\Delta ^+$ ensure that $\psi $ can be formed in polynomial time.
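As a toy illustration of this translation (ours, not an example from the paper): let $\varphi $ be the purely probabilistic formula ${\mathbf {P}}(A \land B)> {\mathbf {P}}(A)\cdot {\mathbf {P}}(B)$ over two propositional atoms, and, for simplicity, take $\Delta ^+$ to consist of all four state descriptions $A \land B$, $A \land \neg B$, $\neg A \land B$, $\neg A \land \neg B$, with associated variables $p_1,\ldots ,p_4$. The resulting $\mathsf {ETR}$ formula is $\exists p_1 \cdots p_4 \, \big (p_1,\ldots ,p_4 \geq 0 \;\land \; p_1+p_2+p_3+p_4 = 1 \;\land \; p_1> (p_1+p_2)(p_1+p_3)\big )$, which is satisfiable over the reals exactly when $\varphi $ is satisfiable.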
Let us conclude the proof by showing that ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {cond}}$ is ${\exists \mathbb {R}}$ -hard. To do this, consider an $\exists \mathbb {R}$ -inverse problem instance $\varphi $ with variables $x_1,\ldots ,x_n$ . It suffices to find in polynomial time a ${\mathsf {SAT}}_{\mathrm {prob}}^{\mathrm {cond}}$ instance $\psi $ preserving and reflecting satisfiability. We first describe the reduction and then show that it preserves and reflects satisfiability.
Corresponding to the variables $x_1,\ldots ,x_n$, let us define events $\delta _1,\ldots ,\delta _n$, with an additional event $\delta _{n+1}$, and with the events $\delta _i \in \mathcal {L}_{\mathrm {prop}}$ mutually unsatisfiable. We introduce further events $\delta _{i,j}$ for $i,j \in [n]$. We must now translate into $\mathcal {L}_{\mathrm {prob}}^{\mathrm {cond}}$ the constraints of the form (a) $\tfrac {1}{2} \leq x_i \leq 2$, (b) $x_i + x_j = x_k$, and (c) $x_i x_j = 1$. At a high level, the strategies are (a) to require that for any model ${\mathbb {P}}$ of $\psi $, we have $\tfrac {1}{4n} \leq {\mathbb {P}}(\delta _i) \leq \tfrac {1}{n}$, (b) to express constraints on sums using probabilities of disjoint unions of events, and (c) to express constraints on products using conditional probabilities. The reduction will proceed so that the map $x_i \mapsto x_i/2n$ sends solutions of $\varphi $ to solutions of $\psi $ (and so that the map’s inverse sends solutions of $\psi $ to solutions of $\varphi $).
For each $i \in [n]$, we would like to require that $\tfrac {1}{4n} \leq {\mathbf {P}}(\delta _i) \leq \tfrac {1}{n}$, adding to these the constraint ${\mathbf {P}}(\delta _{n+1}) = {\mathbf {P}}(\neg \bigvee _{i\leq n}\delta _i)$; the former is analogous to the constraint that $\tfrac {1}{2} \leq x_i \leq 2$, and the latter ensures that the probabilities ${\mathbf {P}}(\delta _i)$ sum to 1. The only obstacle to including these constraints is that $\mathcal {L}_{\mathrm {prob}}^{\mathrm {cond}}$ does not contain constants for $\tfrac {1}{4n}$ and $\tfrac {1}{n}$.
This obstacle can be addressed as follows. We replace each such constant with ${\mathbf {P}}\big (\bigvee _{\substack {\ell \in [m]\\b_\ell =1}} \epsilon _\ell \big )$, where:
• The variable $b_\ell $ denotes the $\ell $-th digit in the binary expansion $b_1 \ldots b_m$ of the constant in question.
• The events $\epsilon _\ell \in \mathcal {L}_{\mathrm {prop}}$ for $\ell \in [m]$ are mutually unsatisfiable and mention neither any of the variables mentioned in the $\delta _{i,j}$ for $i,j\in [n]$ nor any of the variables mentioned by $\delta _k$ for $k \in [n+1]$ .
• We impose the constraint ${\mathbf {P}}(\epsilon _1) = {\mathbf {P}}(\neg \epsilon _1)$ and, for each integer $\ell $ between 2 and m, impose the two constraints ${\mathbf {P}}(\epsilon _\ell ) = {\mathbf {P}}(\epsilon _{\ell , \ell -1} \land \epsilon _{\ell -1})$ and ${\mathbf {P}}(\epsilon _{\ell , \ell -1}| \epsilon _{\ell -1}) = {\mathbf {P}}(\epsilon _{\ell -1})$ . Here, the events $\epsilon _{\ell , \ell -1}$ mention none of the variables mentioned in $\epsilon _{\ell ^\prime }$ for any $\ell ^\prime \in [m]$ , nor those mentioned in $\delta _{i,j},\delta _i$ for any $i,j \in [n+1]$ .
These constraints ensure that for any solution ${\mathbb {P}}$ to $\psi $, we have ${\mathbb {P}}(\epsilon _j) = 2^{-j}$, so that (by mutual unsatisfiability) ${\mathbb {P}}\big (\bigvee _{\substack {\ell \in [m]\\ b_\ell =1}} \epsilon _\ell \big ) = \sum _{\ell \in [m] \,:\, b_\ell = 1} 2^{-\ell }$, which is exactly the constant with binary expansion $b_1 \ldots b_m$. The same procedure allows for the expression of arbitrary rational constants.
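As a hypothetical illustration (the particular constant is ours): if the binary expansion in question is $0.101$, i.e., the constant $\tfrac {5}{8}$, then the replacing term is ${\mathbf {P}}(\epsilon _1 \lor \epsilon _3)$, whose value under any solution is $2^{-1} + 2^{-3} = \tfrac {5}{8}$.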
Turning next to sums and products, we replace each constraint of the form $x_i + x_j = x_k$ with ${\mathbf {P}}(\delta _i \vee \delta _j) = {\mathbf {P}}(\delta _k)$, and we replace each constraint of the form $x_ix_j = 1$ with the two constraints ${\mathbf {P}}(\delta _{i,j} | \delta _i) = {\mathbf {P}}(\delta _j)$ and ${\mathbf {P}}(\delta _{i,j} \land \delta _i) = \tfrac {1}{4n^2}$, with the constant $\tfrac {1}{4n^2}$ expressed as above. The latter two constraints are included in order to ensure that for any solution ${\mathbb {P}}$ to $\psi $, we have ${\mathbb {P}}(\delta _i)\,{\mathbb {P}}(\delta _j) = \tfrac {1}{4n^2}$, giving the desired multiplicative constraint. The events $\delta _{i,j}\in \mathcal {L}_{\mathrm {prop}}$ for $i,j\in [n]$ are to mention none of the variables mentioned in $\delta _k$ for any $k \in [n+1]$.
This completes our description of the reduction. The operations performed are very simple, so to check the runtime of this reduction, it suffices to check the length of $\psi $. The probabilistic equation corresponding to $x_i + x_j = x_k$ is of the same length and introduces no new variables besides the original $\delta _i$ for $i \in [n]$. We next observe that our substitution for the equation $x_i x_j =1$ introduces two new constraints, each with length polynomial in $|\varphi |$. Finally, we observe that the constraints introduced in the above itemized list number at most $2m$, each being of length polynomial in $m $, which is in turn polynomial in $|\varphi |$.
Let us now show that the map $\varphi \mapsto \psi $ preserves satisfiability. Given a solution $x_1,\ldots ,x_n$ to $\varphi $, we will construct a solution to $\psi $. Define ${\mathbb {P}}(\delta _i) = \tfrac {x_i}{2n}$ for $i \in [n]$ and ${\mathbb {P}}(\delta _{n+1}) = 1- \sum _{i \leq n} {\mathbb {P}}(\delta _i)$. Then as $\tfrac {1}{2} \leq x_i \leq 2$, we have $\tfrac {1}{4n} \leq {\mathbb {P}}(\delta _i) \leq \tfrac {1}{n}$, and the probabilities ${\mathbb {P}}(\delta _i) $ sum to 1, as required. Further, since $x_i + x_j = x_k$, we have (by mutual unsatisfiability) ${\mathbb {P}}(\delta _i \vee \delta _j) = {\mathbb {P}}(\delta _i) + {\mathbb {P}}(\delta _j) = {\mathbb {P}}(\delta _k)$, as required. Finally, let us check that the constraints ${\mathbb {P}}(\delta _{i,j} | \delta _i) = {\mathbb {P}}(\delta _j)$ and ${\mathbb {P}}(\delta _{i,j} \land \delta _i) = \tfrac {1}{4n^2}$ are satisfied. Using the definition of conditional probability, these amount to the constraint that
${\mathbb {P}}(\delta _{i,j} \land \delta _i) \;=\; {\mathbb {P}}(\delta _i)\,{\mathbb {P}}(\delta _j) \;=\; \tfrac {1}{4n^2},$
which is satisfied, since ${\mathbb {P}}(\delta _i) = \tfrac {x_i}{2n}$, ${\mathbb {P}}(\delta _j) = \tfrac {x_j}{2n}$, and $x_ix_j =1$. To prove that $\varphi \mapsto \psi $ reflects satisfiability, we use the same reasoning, this time defining $x_i = 2n \cdot {\mathbb {P}}(\delta _i)$.
5 Conclusion and outlook
We have shown that questions posed in probabilistic causal languages can be systematically reduced to purely probabilistic queries; from a computational perspective, the former are therefore no more complex than the latter. In fact, we demonstrated a kind of bifurcation between two classes of languages. On the one hand, languages encompassing at most addition enjoy an ${\mathsf {NP}}$-complete satisfiability problem, whether the language is causal or not. However, as soon as we admit even a modicum of multiplication into the language, causal and probabilistic languages alike become hard for the class ${\exists \mathbb {R}}$, and even the full language of polynomials over (causal) probability terms is ${\exists \mathbb {R}}$-complete. At the low end, this applies already to a language with no explicit addition or multiplication, but just inequalities between conditional probability terms, or even simple independence statements for pairs of variables. Within the resulting landscape of formal systems, we have identified an important sense in which causal reasoning is no more difficult than purely probabilistic reasoning. The substantial empirical and expressive gulf between causation and “mere (statistical) association” is evidently not reflected in a complexity gap.
It should be acknowledged that, from the standpoint of inferential practice, questions of the form (2) constitute just one part of a larger methodological pipeline. In some sense this is only a final stage in the process of going from an inductive problem to a deductive conclusion. The formulation of reasonable inductive assumptions can itself be an arduous task, as can translating those assumptions into a language like $\mathcal {L}_{\mathrm {prob}}$ or $\mathcal {L}_{\mathrm {causal}}$ (that is, into the set $\Gamma $ ). Take once again the example of do-calculus (Example 1.1). The idea behind this method is that in many contexts investigators will be in a position to make reasonable qualitative (viz. graphical) assumptions, perhaps justified by expert knowledge, to the effect that some variables are not causally impacted in a direct way by certain other variables. Even when this method involves nothing more than assuming a specific causal (directed acyclic) graph, it may still take work to determine which causal-probabilistic statements are licensed by the graph. Many subtasks in this connection have been studied. For instance, determining whether three sets of variables in a graph stand in the so-called d-separation relation (which in turn guarantees conditional independence) is known to be very easy (it is linear time; see, e.g., [Reference Schachter37]). Nonetheless, there are certainly other questions related to complexity that one might ask in this and other settings.
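For concreteness, here is a compact d-separation check via the moralized ancestral graph, a standard textbook construction offered for illustration (not code from [Reference Schachter37]); it runs in roughly quadratic time rather than the optimal linear time, but it makes plain why the problem is computationally easy.

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Test whether X and Y are d-separated by Z in the DAG given by
    `parents`, a dict mapping each node to the list of its parents."""
    X, Y, Z = set(X), set(Y), set(Z)

    # 1. Restrict attention to X, Y, Z and their ancestors.
    relevant, frontier = set(), deque(X | Y | Z)
    while frontier:
        v = frontier.popleft()
        if v not in relevant:
            relevant.add(v)
            frontier.extend(parents.get(v, []))

    # 2. Moralize: link each node to its parents and each pair of co-parents,
    #    then forget edge directions.
    undirected = {v: set() for v in relevant}
    for v in relevant:
        ps = list(parents.get(v, []))
        for p in ps:
            undirected[v].add(p); undirected[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                undirected[ps[i]].add(ps[j]); undirected[ps[j]].add(ps[i])

    # 3. d-separation holds iff removing Z disconnects X from Y in this graph.
    frontier, seen = deque(X - Z), set(X - Z)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False
        for w in undirected[v] - Z:
            if w not in seen:
                seen.add(w); frontier.append(w)
    return True


# Chain X -> T -> Y: blocked by {T}, unblocked given nothing.
g = {"X": [], "T": ["X"], "Y": ["T"]}
print(d_separated(g, {"X"}, {"Y"}, {"T"}))   # True
print(d_separated(g, {"X"}, {"Y"}, set()))   # False
```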
Moving beyond statistical and causal inference tasks narrowly construed, the results in this article raise a number of further research questions, both technical and conceptual. For instance, one can easily imagine versions of our causal languages in a multi-agent setting, with a (causal) probability operator ${\mathbf {P}}_a$ for multiple agents a. As has been widely recognized, strategic interaction routinely involves reasoning about causality and counterfactuals (see, e.g., [Reference Stalnaker44]). Existing formal proposals for capturing these styles of reasoning have been largely qualitative, with counterfactual patterns formalized using models of belief revision rather than structural causal models (see, e.g., [Reference Board8]). Whereas (“pure”) probability-logical languages have been thoroughly explored in the game theory literature (e.g., [Reference Heifetz and Mongin23]), the causal-probability-logical languages studied here would be quite natural to investigate in that context. Echoing our themes in the present article: what happens to computational complexity in this multi-agent setting, and, specifically, would a reduction to pure (multi-)probability still be possible?
In a more technical vein, there are natural questions about further extensions to even the most expressive languages we considered. To take just one example, much of probabilistic and causal reasoning employs tools from information theory like (conditional) entropy that in turn rely on logarithmic principles, or alternatively (via inversion), reasoning about exponentiation. A major open problem in logic—known as Tarski’s exponential function problem—is to determine whether the first-order theory of the reals with exponentiation is decidable. Short of that, one might hope to show that some of the weaker (causal-)probability languages studied here remain decidable, perhaps even of relatively low complexity, when exponentiation is added. However, for the strongest languages, such as $\mathcal {L}^{\mathrm {poly}}_{\mathrm {prob}}$ , this may prove difficult. As Macintyre and Wilkie [Reference Macintyre, Wilkie and Odifreddi30] have shown, decidability of the existential theory of the reals with the unary function $e^x$ would already imply a positive answer to Tarski’s problem.
The reader will surely think of further questions and extensions pertaining to our work in this article. We hope that the systems, results, and methods offered here will be useful in these various directions moving forward, and more generally will help to catalyze further research at the fruitful intersection of logic, probability, causality, and complexity.