
Fast Inference for Probabilistic Answer Set Programs Via the Residual Program

Published online by Cambridge University Press:  15 January 2025

DAMIANO AZZOLINI
Affiliation:
Department of Environmental and Prevention Sciences, University of Ferrara, Ferrara, Italy (e-mail: damiano.azzolini@unife.it)
FABRIZIO RIGUZZI
Affiliation:
Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy (e-mail: fabrizio.riguzzi@unife.it)

Abstract

When computing the probability of a query from a probabilistic answer set program, some parts of the program may not influence the probability of the query, yet they impact the size of the grounding. Identifying and removing them is crucial to speed up the computation. Algorithms for SLG resolution offer the possibility of returning the residual program, which can be used for computing answer sets for normal programs that do not have a total well-founded model. The residual program does not contain the parts of the program that do not influence the probability. In this paper, we propose to exploit the residual program for performing inference. Empirical results on graph datasets show that the approach leads to significantly faster inference. The paper has been accepted at the ICLP2024 conference and is under consideration in Theory and Practice of Logic Programming (TPLP).

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1 Introduction

Statistical Relational Artificial Intelligence (Raedt et al., Reference Raedt, Kersting, Natarajan and Poole2016) is a subfield of Artificial Intelligence that aims at representing uncertain domains with interpretable languages. One of these languages is probabilistic answer set programming (PASP) under the credal semantics (CS), that is, Answer Set Programming (ASP, and we use the same acronym to denote answer set programs) extended with probabilistic facts. Inference in PASP often requires grounding the whole program, due to the model-driven ASP solving approach. In contrast, formalisms based on query-driven languages, such as PITA (Riguzzi and Swift, Reference Riguzzi and Swift2011) and ProbLog2 (Dries et al., Reference Dries, Kimmig, Meert, Renkens, Van den Broeck, Vlasselaer and De Raedt2015), only ground the relevant part of the program. For a specific class of ASP, namely normal programs without odd loops over negation, we propose to extract the relevant part of the program using SLG resolution (Chen and Warren, Reference Chen and Warren1996), which offers the possibility of returning the residual program; the residual program can then be used to compute the answer sets of the program. At a high level, the process is the following: first, we convert a PASP into a Prolog program that is interpreted under the Well-founded semantics (WFS). Then, we leverage SLG resolution via tabling to compute the residual program for a given query. Lastly, we convert the residual program into a PASP, often smaller than the original one, and call a solver to compute the probability of the query. In this way, we reduce the size of the program that needs to be grounded, substantially reducing the execution time, as demonstrated by several experiments on graph datasets.

The paper is structured as follows: Section 2 provides some background knowledge, Section 3 introduces our solution to extract the residual program, which is tested in Section 4, Section 5 discusses related work, and Section 6 concludes the paper.

2 Background

In this paper, we consider normal logic programs, that is, programs composed of normal rules of the form $r=h{:\!-}\ b_1, \dots, b_m, not\ c_1,\ldots, not\ c_n$ , where $h$ , $b_i$ for $i=1,\ldots, m$ and $c_j$ for $j=1,\ldots, n$ are atoms. Given a rule $r$ , we call $H(r)=h$ , $B^+(r)=\{b_1,\ldots, b_m\}$ , $B^-(r)=\{c_1,\ldots, c_n\}$ , and $B(r)=\{b_1, \dots, b_m, not\ c_1,\ldots, not\ c_n\}$ the head, positive body, negative body, and body of $r$ , respectively. A rule with an empty body is called a fact. We indicate the Herbrand base of a program $P$ with $B_P$ and its grounding with $ground(P)$ . We use the standard notation $name/arity$ to denote a predicate with name $name$ and arity, that is, number of arguments, $arity$ . The call graph of a program $P$ is a directed graph with one node for each predicate in the program. There is an edge from a predicate $p/n$ to a predicate $q/m$ if $p/n$ is the predicate of the head atom of a rule and $q/m$ is the predicate of a literal in the body of that rule. The edge is labeled as positive ( $+$ ) or negative ( $-$ ) depending on whether the literal is positive or negative in the body of the considered rule. A program $P$ includes Odd Loops Over Negation (OLON) if its call graph contains a cycle with an odd number of negative edges. Figure 1 shows examples of programs with and without OLON, together with their call graphs. The dependency graph of a program $P$ is a directed graph with one node for each atom in the Herbrand base of the program. There is an edge from an atom $a$ to an atom $b$ if $a$ is the head of a rule in the grounding of $P$ and $b$ is the atom of a literal in the body of that rule. A semantics is relevant (Dix, Reference Dix1995) if the truth value of an atom $a$ depends only on the truth values of the atoms in the relevant sub-graph of the dependency graph, that is, the sub-graph containing the nodes reachable from $a$ .

Fig. 1. Programs, call graphs (CG), and dependency graphs (DG) with (Figure 1a) and without (Figure 1b) OLON. The dependency graph of both programs is the same; they differ only in the call graph: for the left program, the call graph contains an edge labeled with $-$ (negative), while for the right program the same edge is labeled with $+$ (positive).
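The OLON check described above reduces to a reachability question on the signed call graph: a program contains an OLON exactly when some predicate can reach itself along a path with an odd number of negative edges. The following is a minimal Python sketch (the triple-based edge encoding is ours, not taken from the paper):

```python
from collections import deque

def has_olon(edges):
    """Detect odd loops over negation on a signed call graph.
    edges: iterable of (head_pred, body_pred, negative) triples, one per
    head-to-body dependency, with negative=True for negative literals."""
    adj = {}
    nodes = set()
    for h, b, neg in edges:
        adj.setdefault(h, []).append((b, 1 if neg else 0))
        nodes.update((h, b))
    for start in nodes:
        # BFS over states (predicate, parity of negative edges traversed so far)
        seen = {(start, 0)}
        queue = deque([(start, 0)])
        while queue:
            node, parity = queue.popleft()
            for nxt, sign in adj.get(node, []):
                new_parity = (parity + sign) % 2
                if nxt == start and new_parity == 1:
                    return True   # closed walk with an odd number of negations
                if (nxt, new_parity) not in seen:
                    seen.add((nxt, new_parity))
                    queue.append((nxt, new_parity))
    return False
```

For instance, a hypothetical program with the cycle $a \rightarrow b$ labeled $-$ and $b \rightarrow a$ labeled $+$ contains an OLON, while the same cycle with both edges labeled $+$ does not, mirroring the two programs of Figure 1.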

2.1 Stable model semantics

The stable model semantics (SMS) (Gelfond and Lifschitz, Reference Gelfond and Lifschitz1988) associates zero or more stable models to logic programs. An interpretation is a subset of $B_P$ . The reduct of a ground program $P$ w.r.t. an interpretation $I$ , $P^I$ , also known as Gelfond-Lifschitz reduct, is the set of rules in the grounding of $P$ that have their body true in $I$ , that is $P^I=\{r\in ground(P) \mid B^+(r)\subseteq I, B^-(r)\cap I=\emptyset \}$ . A stable model or answer set (AS) of a program $P$ is an interpretation $I$ such that $I$ is a minimal model under set inclusion of $P^I$ . With $AS(P)$ we denote the set of answer sets of a program $P$ . We also consider projected answer sets (Gebser et al., Reference Gebser, Kaufmann, Schaub, van Hoeve and Hooker2009) on a set of ground atoms $V$ , defined as $AS_V(P) = \{A \cap V \mid A \in AS(P)\}$ . Answer Set Programming (ASP) (Brewka et al., Reference Brewka, Eiter and Truszczyński2011) considers programs under the SMS.

Example 1 The following program is composed of (in order of appearance) three facts and four normal rules.

It has the following 8 answer sets (we only report the $path/2$ atoms, for brevity): $\{\}$ , $\{path(a,c)\}$ , $\{path(a,b)\}$ , $\{path(b,d)\}$ , $\{path(a,c), path(b,d)\}$ , $\{path(a,b), path(a,c)\}$ , $\{path(a,d), path(a,b), path(b,d)\}$ , and $\{path(a,d), path(a,b), path(a,c), path(b,d)\}$ . These answer sets are exactly the ones obtained by projecting the original answer sets on the $path/2$ atoms.

ASP has been extended with various constructs beyond normal logic programs: disjunction, constraints, explicit negation, and aggregates. The SMS for normal programs is not relevant, as shown by the following example.

Example 2 Consider the query $q$ and the program { $q \,{:\!-}\ a.$ , $a.$ }. This program has a single AS $I=\{q,a\}$ , so $q$ is true in all AS (we say that $q$ is skeptically true). However, if we add the rule $c \,{:\!-}\ not \ c$ , which acts as a constraint, the program has no AS, so $q$ is no longer skeptically true, even if the relevant sub-graph of $q$ in the dependency graph includes only the nodes $q$ and $a$ .

In this paper, we restrict our attention to normal programs without OLON. These programs always have at least one AS, and the SMS in this case is relevant (Marple and Gupta, Reference Marple and Gupta2014).

2.2 Well-founded semantics

The WFS (Van Gelder et al., Reference Van Gelder, Ross and Schlipf1991) assigns a three-valued model to a program. A three-valued interpretation $I$ is a pair $I=\langle I_T ; I_F \rangle$ where $I_T$ and $I_F$ are disjoint subsets of $B_P$ representing the sets of true and false atoms, respectively. Given a three-valued interpretation $I = \langle I_T ; I_F \rangle$ for a program $P$ , an atom $a$ is i) true in $I$ if $a \in I_T$ and ii) false in $I$ if $a \in I_F$ , while a literal $not \ a$ is i) true in $I$ if $a \in I_F$ and ii) false in $I$ if $a \in I_T$ . If $a$ belongs neither to $I_T$ nor to $I_F$ , it is undefined. Furthermore, we define the functions $t(I)$ , $f(I)$ , and $u(I)$ returning the true, false, and undefined atoms, respectively. Lastly, we can define a partial order on three-valued interpretations as $\langle I_T ; I_F \rangle \leq \langle J_T ; J_F \rangle$ if $I_T \subseteq J_T$ and $I_F \subseteq J_F$ .

We recall here the iterated fixpoint definition of the WFS from Przymusinski (Reference Przymusinski1989). Consider two sets of ground atoms, $T$ and $F$ , a normal logic program $P$ , and a three-valued interpretation $I$ . We define the following two operators:

  • $\mathit{OT}_{I}^{P}(T) = \{a \mid a$ is not true in $I$ and there exists a clause $h \leftarrow l_1,\dots, l_m$ of $P$ such that $a = h\theta$ for a grounding substitution $\theta$ of the clause and, $\forall i \in \{1,\dots, m\}$ , $l_i\theta$ is true in $I$ or $l_i\theta \in T\}$ and

  • $\mathit{OF}_{I}^{P}(F) = \{a \mid a$ is not false in $I$ and for every clause $h \leftarrow l_1,\dots, l_m$ of $P$ and every grounding substitution $\theta$ of the clause such that $a = h\theta$ , there exists an $i \in \{1,\dots, m\}$ such that $l_i\theta$ is false in $I$ or $l_i\theta \in F\}$ .

In other words, $\mathit{OT}_{I}^{P}(T)$ is the set of atoms that can be derived from $P$ knowing $I$ and $T$ while $\mathit{OF}_{I}^{P}(F)$ is the set of atoms that can be shown false in $P$ knowing $I$ and  $F$ . Przymusinski (Reference Przymusinski1989) proved that both operators are monotonic and so they have a least and greatest fixpoint ( $\mathit{lfp}$ and $\mathit{gfp}$ , respectively). Furthermore, the iterated fixpoint operator $\mathit{IFP}^P(I) = I \cup \langle \mathit{lfp}(\mathit{OT}_{I}^{P}), \mathit{gfp}(\mathit{OF}_{I}^{P}) \rangle$ has also been proved monotonic by Przymusinski (Reference Przymusinski1989). The Well-Founded model (WFM) of a normal program $P$ is the least fixpoint of $\mathit{IFP}^P$ , that is $\mathit{WFM}(P)=\mathit{lfp}(\mathit{IFP}^P)$ . If $u(\mathit{WFM}(P)) = \{\}$ (i.e., the set of undefined atoms of the WFM of $P$ is empty), the WFM is two-valued and the program is called dynamically stratified. The WFS enjoys the property of relevance, and the SMS and WFS are related since, for a normal program $P$ , the WFM of $P$ is a subset of every stable model of $P$ seen as a three-valued interpretation, as proven by Van Gelder et al. (Reference Van Gelder, Ross and Schlipf1991).

2.3 SLG resolution and tabling

SLG resolution was proposed by Chen and Warren (Reference Chen and Warren1996) and was proven sound and complete for the WFS under certain conditions. Its implementation in the most common Prolog systems, such as XSB (Swift and Warren, Reference Swift and Warren2012) and SWI (Wielemaker et al., Reference Wielemaker, Schrijvers, Triska and Lager2012), is based on tabling. In the forest-of-trees model of SLG resolution (Swift, Reference Swift, Barahona and Alferes1999), a tree is generated for each sub-goal encountered during the derivation of a query. Nodes are of the form $\textit{fail}$ or

\begin{equation*} AnswerTemplate \,{:\!-}\ GoalList | DelayList \end{equation*}

where $AnswerTemplate$ is a (partial) instantiation of the sub-goal and $GoalList$ and $DelayList$ are lists of literals. $DelayList$ contains the literals that have been delayed; delaying is needed to allow the evaluation of a query under the WFS (where the computation cannot follow a fixed order for literal selection) with a Prolog engine (where the computation follows a fixed order for selecting literals in a rule). An answer is a leaf with an empty $GoalList$ ; it is called unconditional if its $DelayList$ is also empty, and conditional otherwise.

The XSB and SWI implementations of SLG allow mixing it with SLDNF resolution. To obtain the SLG behavior on a predicate, the user should declare the predicate as tabled via the directive $table/1$ and use $tnot$ instead of $not$ or $\backslash +$ to express negation. After the full evaluation of a query, a forest of trees is built where each leaf node is either $\textit{fail}$ or an answer. If there are conditional answers, we also get the residual program $P^r_q$ , that is, the program whose rule heads are the $AnswerTemplate$ of conditional answer nodes and whose bodies contain the literals of the corresponding delay lists. SLG resolution is sound and complete with respect to the WFS in the sense that atoms that are instantiations of unconditional answers are true, atoms that are instantiations of sub-goals whose tree has only $\textit{fail}$ leaves are false, and atoms that are instantiations of conditional answers are undefined in the WFM of the program. Let us now show an example.

Example 3 The following program defines three tabled predicates.

If we query $path(a,d)$ , we get the following residual program:

2.4 Probabilistic answer set programming

The CS (Cozman and Mauá, Reference Cozman and Mauá2020) allows the representation of uncertain domains with ASP extended with ProbLog probabilistic facts (De Raedt et al., Reference De Raedt, Kimmig, Toivonen and Veloso2007) of the form $p_i::a_i$ , where $p_i \in [0,1]$ is a probability and $a_i$ is a ground atom. Such programs are called Probabilistic ASP (PASP, and we use the same acronym to denote Probabilistic Answer Set Programs). A world is obtained by adding to the ASP a subset of the atoms $a_i$ such that $p_i::a_i$ is a probabilistic fact. Every PASP with $n$ probabilistic facts thus has $2^n$ worlds. The probability of a world $w$ is computed as: $P(w) = \prod _{a_i \in w} p_i \cdot \prod _{a_i \not \in w} (1 - p_i)$ . Each world is an answer set program and may have zero or more answer sets but, for the CS to be defined, every world is required to have at least one AS. If the program is normal and without OLON, the CS always exists.

In regular Probabilistic Logic Programming (PLP) (Riguzzi, Reference Riguzzi2022), each world is assigned a WFM which is required to be two-valued, and the probability $P(q)$ of a ground literal (called query) $q$ is given by the sum of the probabilities of the worlds where $q$ is true. In PASP, each world may have more than one AS, so the question is: how to distribute the probability mass of a world among its AS? The CS answers this question by not assuming a specific distribution but allowing all the possible ones, leading to the association of a probability interval to $q$ . The upper bound $\overline{P}(q)$ and the lower bound $\underline{P}(q)$ are given by

(1) \begin{equation} \overline{P}(q) = \sum _{w_i \mid \exists m \in AS(w_i), \ m \models q} P(w_i), \, \, \, \underline{P}(q) = \sum _{w_i \mid \forall m \in AS(w_i), \ m \models q} P(w_i). \end{equation}

In other words, the upper probability is the sum of the probabilities of the worlds where the query is present in at least one answer set, while the lower probability is the sum of the probabilities of the worlds where the query is present in every answer set. If every world has exactly one answer set, the lower and upper probability coincide; otherwise $\overline{P}(q) \geq \underline{P}(q)$ .
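Equation (1) can be checked by brute force on small programs: enumerate the worlds, compute their probabilities, and sum them according to whether the query holds in some or in all answer sets. A sketch, where `answer_sets_of` is a hypothetical oracle (in practice, an ASP solver):

```python
from itertools import product

def credal_bounds(prob_facts, answer_sets_of, query):
    """Enumerate all worlds of a PASP and apply Equation (1).
    prob_facts: dict mapping each probabilistic fact to its probability.
    answer_sets_of: maps a world (frozenset of selected facts) to its answer sets."""
    facts = sorted(prob_facts)
    lower = upper = 0.0
    for choice in product([False, True], repeat=len(facts)):
        world = frozenset(f for f, sel in zip(facts, choice) if sel)
        # probability of the world: product of p for selected facts, 1-p otherwise
        pw = 1.0
        for f, sel in zip(facts, choice):
            pw *= prob_facts[f] if sel else 1 - prob_facts[f]
        models = answer_sets_of(world)
        if any(query in m for m in models):
            upper += pw        # query true in at least one answer set
        if models and all(query in m for m in models):
            lower += pw        # query true in every answer set
    return lower, upper

# toy oracle: the world {a} has two answer sets and the query q holds in one of them
tables = {frozenset(): [set()], frozenset({'a'}): [{'a', 'q'}, {'a'}]}
lo, up = credal_bounds({'a': 0.4}, lambda w: tables[w], 'q')
```

With these toy answer sets the query contributes only to the upper bound, giving the interval $[0, 0.4]$ .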

To clarify, consider the following example.

Example 4 The following PASP defines three probabilistic facts.

It has $2^3 = 8$ worlds, listed in Table 1. Consider the query $path(a,d)$ . Call $w_5$ the world where $e(a,b)$ and $e(b,d)$ are present and $e(a,c)$ absent. It has 4 answer sets but only one of them includes the query. Call $w_7$ the world where all the probabilistic facts are true. It has 8 answer sets but only two include the query. Overall, the probability of the query is $[0,P(w_5) + P(w_7)] = [0,0.03]$ . Note that in $w_5$ and $w_7$ the query is present only in some answer sets, so they contribute only to the upper probability.

Table 1. Worlds and probabilities for Example 4. The column #q/#AS contains the number of answer sets where the query $path(a,d)$ is true and the total number of answer sets

Several approaches exist to perform inference in PASP, such as projected answer set enumeration (Azzolini et al., Reference Azzolini, Bellodi, Riguzzi, Gottlob, Inclezan and Maratea2022) or second level algebraic model counting (2AMC) (Kiesel et al., Reference Kiesel, Totis and Kimmig2022). Here we focus on the latter since it has been proved more effective (Azzolini and Riguzzi, Reference Azzolini, Riguzzi, Basili, Lembo, Limongelli and Orlandini2023). The components of a 2AMC problem are: a propositional theory $T$ whose variables are divided into two disjoint sets, $X_o$ and $X_i$ , two commutative semirings $R^{i} = (D^i, \oplus ^i, \otimes ^i, n_{\oplus ^i}, n_{\otimes ^i})$ and $R^{o} = (D^o, \oplus ^o, \otimes ^o, n_{\oplus ^o}, n_{\otimes ^o})$ , two weight functions, $w_i: lit(X_i)\rightarrow D^i$ and $w_o: lit(X_o)\rightarrow D^o$ , and a transformation function $f: D^i\rightarrow D^o$ , where $lit(X)$ is the set of literals built on the variables from $X$ . The 2AMC task is represented as:

(2) \begin{equation} \begin{split} 2AMC(T) =& \bigoplus \nolimits _{I_{o} \in \mu (X_{o})}^{o} \bigotimes \nolimits ^{o}_{a \in I_{o}} w_{o}(a) \otimes ^{o} f( \bigoplus \nolimits _{I_{i} \in \varphi (T \mid I_{o})}^{i} \bigotimes \nolimits ^{i}_{b \in I_{i}} w_{i}(b) ) \end{split} \end{equation}

where $\mu (X_{o})$ is the set of assignments to the variables in $X_{o}$ and $\varphi (T \mid I_{o})$ is the set of assignments to the variables in $T$ that satisfy $I_{o}$ . In other words, the 2AMC task requires solving an algebraic model counting (AMC) (Kimmig et al., Reference Kimmig, Van den Broeck and De Raedt2017) task on the variables $X_i$ for each assignment of the variables $X_o$ . The probability of a query $q$ in a PASP $P$ can be computed by translating $P$ into a propositional theory $T$ with exactly the same models as the AS of $P$ and using (Azzolini and Riguzzi, Reference Azzolini, Riguzzi, Basili, Lembo, Limongelli and Orlandini2023) (i) $\mathcal{R}^{i} = (\mathbb{N}^2, +, \cdot, (0,0), (1,1))$ with $w_i$ mapping $not \ q$ to $(0, 1)$ and all other literals to $(1, 1)$ ; (ii) as transformation function $f(n_1,n_2)$ computing $(v_{lp},v_{up})$ where $v_{lp} = 1$ if $n_1 = n_2$ , 0 otherwise, and $v_{up} = 1$ if $n_1 \gt 0$ , 0 otherwise; and (iii) $\mathcal{R}^{o} = ([0, 1]^2, +, \cdot, (0, 0),(1, 1))$ , with $w_o$ associating $(p, p)$ and $(1 - p, 1 - p)$ to $a$ and $not \ a$ , respectively, for every probabilistic fact $p :: a$ , and $(1, 1)$ to all the remaining literals. Here $X_o$ contains all the probabilistic atoms and $X_i$ contains the remaining atoms from $B_P$ . aspmc (Eiter et al., Reference Eiter, Hecher and Kiesel2021) is a tool that can solve 2AMC via knowledge compilation (Darwiche and Marquis, Reference Darwiche and Marquis2002) targeting negation normal form (NNF) formulas. One measure to assess the size of a program, and so the complexity of reasoning on it, is the treewidth (Bodlaender et al., Reference Bodlaender1993), which measures how far a graph is from being a tree. The treewidth of a graph can be obtained via tree decomposition, a process that generates a tree starting from a graph, whose nodes are bags, that is, subsets of the vertices of the graph.
A graph may have more than one tree decomposition, and its treewidth is the minimum integer $t$ such that there exists a tree decomposition whose bags have size at most $t+1$ .
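The transformation function and the outer semiring described above can be made concrete. Given, for each world, the pair $(n_1, n_2)$ produced by the inner AMC (the number of models containing the query and the total number of models), the outer level recovers the credal bounds. In the sketch below, the counts for $w_5$ and $w_7$ are those stated in Example 4; the probabilities and the counts for the remaining worlds are illustrative placeholders:

```python
from itertools import product

def f(n1, n2):
    # transformation function: (models with q, total models) -> (v_lp, v_up)
    return (1 if n1 == n2 else 0, 1 if n1 > 0 else 0)

def credal_from_counts(prob_facts, counts):
    """Outer semiring level: weight each world and sum the transformed inner results.
    counts maps a world (frozenset of true probabilistic atoms) to (n1, n2)."""
    lower = upper = 0.0
    for world, (n1, n2) in counts.items():
        w = 1.0
        for atom, p in prob_facts.items():
            w *= p if atom in world else 1 - p
        v_lp, v_up = f(n1, n2)
        lower += w * v_lp
        upper += w * v_up
    return lower, upper

facts = {'eab': 0.1, 'eac': 0.1, 'ebd': 0.1}        # illustrative probabilities
counts = {}
for ch in product([0, 1], repeat=3):
    world = frozenset(a for a, c in zip(sorted(facts), ch) if c)
    counts[world] = (0, 1)                           # placeholder: query in no answer set
counts[frozenset({'eab', 'ebd'})] = (1, 4)           # w5: query in 1 of 4 answer sets
counts[frozenset({'eab', 'eac', 'ebd'})] = (2, 8)    # w7: query in 2 of 8 answer sets
lo, up = credal_from_counts(facts, counts)
```

Only $w_5$ and $w_7$ get $v_{up}=1$ , so the upper probability is $P(w_5)+P(w_7)$ , and no world has the query in all of its answer sets, so the lower probability is 0.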

3 Extracting the residual program for PASP

From Example 4, we can see that the probability of the query $path(a,d)$ is not influenced by the probabilistic fact $e(a,c)$ . Let us write $P(e(a,b)) = p_0$ , $P(e(a,c)) = p_1$ , and $P(e(b,d)) = p_2$ , for brevity. The upper probability of $path(a,d)$ is computed as $P(w_5) + P(w_7) = p_0 \cdot (1-p_1) \cdot p_2 + p_0 \cdot p_1 \cdot p_2 = (p_0 \cdot p_2) \cdot ((1-p_1) + p_1) = p_0 \cdot p_2$ , so the value of $p_1$ is irrelevant and the probabilistic fact $e(a,c)$ can be removed from the program. However, during the grounding process, the probabilistic fact is still considered, increasing the size of the grounding. The same happens with rules that do not influence the probability of a query. While the programmer should take care of writing a compact program, encoding exactly the minimal information needed to answer a query, this is usually difficult to do. Consider again Example 4: it is difficult to immediately spot that $e(a,c)$ is irrelevant to the computation of the probability of $path(a,d)$ . To overcome this, the PLP systems PITA (Riguzzi and Swift, Reference Riguzzi and Swift2011) and ProbLog2 (Dries et al., Reference Dries, Kimmig, Meert, Renkens, Van den Broeck, Vlasselaer and De Raedt2015) build a proof for a query containing only the rules that are actually involved in the probability computation. This is possible in PLP since the WFS enjoys the property of relevance. The SMS for normal programs without OLON also enjoys the property of relevance, so we aim to do the same in PASP by exploiting the residual program.
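The cancellation of $p_1$ can also be verified numerically (the probability values below are arbitrary):

```python
def upper_path_ad(p0, p1, p2):
    # P(w5) + P(w7), as in the derivation above
    return p0 * (1 - p1) * p2 + p0 * p1 * p2

# the p1 terms cancel, so the bound equals p0 * p2 for any p1
for p1 in (0.0, 0.25, 0.99):
    assert abs(upper_path_ad(0.3, p1, 0.6) - 0.3 * 0.6) < 1e-12
```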

We first provide a definition and two results regarding the residual program.

Definition 1 Given a normal program $P$ , the $\mathit{WF}$ reduct of $P$ , indicated with $P^{\mathit{WF}}$ , is obtained by removing from $ground(P)$ the rules whose body is false in $\mathit{WFM}(P)$ and by removing from the body of the remaining rules the literals that are true in $\mathit{WFM}(P)$ .
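Definition 1 translates directly into code for ground programs represented as (head, positive body, negative body) triples; a minimal sketch (the representation is ours):

```python
def wf_reduct(rules, T, F):
    """WF reduct of a ground program given its WFM <T;F>.
    Drop rules whose body is false in the WFM; strip body literals true in it."""
    out = []
    for h, pos, neg in rules:
        if any(b in F for b in pos) or any(c in T for c in neg):
            continue  # body false in WFM(P): drop the rule
        out.append((h,
                    frozenset(b for b in pos if b not in T),   # drop true positive literals
                    frozenset(c for c in neg if c not in F)))  # not c is true iff c in F
    return out
```

For instance, with the program $\{a., \ q \,{:\!-}\ a, not\ b., \ r \,{:\!-}\ b.\}$ and WFM $\langle \{a,q\};\{b,r\}\rangle$ , the rule for $r$ is dropped and the rule for $q$ loses its whole body.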

Lemma 1 Given a normal program $P$ , $AS(P)=AS(P^{\mathit{WF}})$ .

Proof Consider an $A\in AS(P)$ . Then $t(\mathit{WFM}(P))\subseteq A$ and $f(\mathit{WFM}(P))\subseteq (B_P\setminus A)$ (see Section 2.2). Consider a rule $r\in P^A$ (i.e., in the reduct of $P$ w.r.t. $A$ ). Then $(P^{\mathit{WF}})^A$ contains a rule $r'$ that differs from $r$ only because its body does not contain literals that are true in all answer sets, and so also in $A$ . Since $r$ is satisfied in $A$ , $r'$ is also satisfied in $A$ . So $A$ is a model of $(P^{\mathit{WF}})^A$ . Moreover, $A$ is also a minimal model of $(P^{\mathit{WF}})^A$ : otherwise, there would be at least one atom $a$ that could be removed from $A$ , leading to a set $A'$ that would still be a model of $(P^{\mathit{WF}})^A$ . However, since $A$ is minimal for $P^A$ , there is a rule $r=a\,{:\!-}\ body$ with $body$ true in $A$ . Since there would be a corresponding rule $r'=a\,{:\!-}\ body'$ in $(P^{\mathit{WF}})^A$ with $body'$ still true, $a$ cannot be removed from $A$ , contradicting the hypothesis. So $A\in AS(P^{\mathit{WF}})$ .

On the other hand, consider an $A\in AS(P^{\mathit{WF}})$ and a rule $r \in (P^{\mathit{WF}})^A$ . Then $ground(P)$ contains a rule $r'$ that differs from $r$ only because its body contains additional literals that are true in all answer sets of $P$ . Since $r$ is satisfied in $A$ , $r'$ is also satisfied in $A$ . So $A$ is a model of $P^A$ . Moreover, $A$ is also minimal: otherwise, there would be at least one atom $a$ that could be removed from $A$ , leading to a set $A'$ that would still be a model of $P^A$ . However, since $A$ is minimal for $(P^{\mathit{WF}})^A$ , there is a rule $r=a\,{:\!-}\ body$ with $body$ true in $A$ . Since there would be a corresponding rule $r'=a\,{:\!-}\ body'$ in $P^A$ with $body'$ still true, $a$ cannot be removed from $A$ , contradicting the hypothesis. So $A\in AS(P)$ .

Theorem 1 Given a normal program $P$ without OLON together with its residual program $P^r_q$ for a query $q$ , the answer sets projected onto the Herbrand base $B_{P^r_q}$ of $P^r_q$ coincide with the answer sets of $P^r_q$ , that is

\begin{equation*} AS_{B_{P^r_q}}(P) = AS(P^r_q). \end{equation*}

Proof $AS( P^{\mathit{WF}})=AS(P)$ by Lemma 1, so we prove that $ AS_{B_{P^r_q}}(P^{\mathit{WF}}) = AS(P^r_q)$ . By the soundness of SLG resolution and the fact that it analyzes the whole relevant sub-graph, the truth of the body of each rule $r\in P^r_q$ is not influenced by the truth value of atoms outside $B_{P^r_q}$ . Therefore, an $A\in AS(P^r_q)$ can be extended to an $A'\in AS(P^{\mathit{WF}})$ such that $A=A'\cap B_{P^r_q}$ . Thus, $A\in AS_{B_{P^r_q}}(P^{\mathit{WF}})$ . In the other direction, if $A'\in AS(P^{\textit{WF}})$ , consider $A=A'\cap B_{P^r_q}$ . Since $B_{P^r_q}$ contains all the atoms in the relevant sub-graph, the truth of the body of each rule $r\in P^r_q$ is not influenced by the truth value of atoms outside $B_{P^r_q}$ , and $A$ must be an AS of $P^r_q$ .

To handle PASP, we first translate the PASP into a normal program. We convert each probabilistic fact $p::a$ into the pair of rules

\begin{equation*} a \,{:\!-}\ tnot(na). \qquad na \,{:\!-}\ tnot(a). \end{equation*}

where $na$ is a fresh atom not appearing elsewhere in the program. This pair of rules encodes the possibility that a probabilistic fact may or may not be selected. Then, we replace the negation symbol applied to each atom $b$ with $tnot(b)$ and declare as tabled all the predicates appearing in the program. We extract the residual program and replace each pair of rules mimicking a probabilistic fact with the probabilistic fact it represents (i.e., the two rules above are replaced with $p::a$ ). Then, we call a standard solver such as aspmc (Eiter et al., Reference Eiter, Hecher and Kiesel2021, Reference Eiter, Hecher and Kiesel2024) or PASTA (Azzolini et al., Reference Azzolini, Bellodi, Riguzzi, Gottlob, Inclezan and Maratea2022).
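The translation step can be sketched as a simple source-to-source transformation. The sketch below assumes probabilistic facts are given as (probability, atom) pairs and derives the fresh atom by prefixing `n` (a naming convention of ours, clash-free only by assumption):

```python
def translate_pasp(prob_facts, rules):
    """Translate a PASP into the normal program queried under SLG.
    prob_facts: list of (probability, atom) pairs; rules: Prolog rule strings.
    Each fact p::a becomes the even loop  a :- tnot(na).  na :- tnot(a)."""
    lines, mapping = [], {}
    for p, atom in prob_facts:
        na = 'n' + atom               # fresh atom (assumed not to clash)
        mapping[na] = (p, atom)       # kept to restore p::a after the residual step
        lines.append(f'{atom} :- tnot({na}).')
        lines.append(f'{na} :- tnot({atom}).')
    lines.extend(rules)
    return lines, mapping
```

After the residual program is extracted, `mapping` allows replacing any surviving pair of rules for $a$ and $na$ with the original probabilistic fact $p::a$ before calling the PASP solver.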

Theorem 2 Given a PASP $P$ together with its residual program $P^r_q$ for a query $q$ , let $\overline{P}(q)$ be the upper probability of $q$ in $P$ and $\overline{P'}(q)$ be the upper probability of $q$ in $P^r_q$ . Then

\begin{equation*} \overline {P}(q)=\overline {P'}(q). \end{equation*}

The same is true for the lower probability.

Proof If the clauses generated for some probabilistic fact are absent from the residual program, they do not influence the probability. Let us prove it by induction on the number $n$ of probabilistic facts whose clauses are absent. If $n=1$ and the fact is $p_1::a_1$ , consider an AS $A$ in $AS_{B_{P^r_q}}(P^r_q)$ , associated with a world $w$ of $P^r_q$ . Then, there are two subsets of $AS(P)$ , $\mathcal{A}'$ and $\mathcal{A}''$ , such that $\forall I\in\mathcal{A}'\cup\mathcal{A}'':I\supseteq A$ , $\forall I\in\mathcal{A}':a_1\in I$ and $\forall I\in\mathcal{A}'':na_1\in I$ . $\mathcal{A}'$ is the set of AS of a world $v'$ such that $a_1\in v'$ and $\mathcal{A}''$ is the set of AS of a world $v''$ such that $na_1\in v''$ . However, if $q\in A$ , then $\forall I\in\mathcal{A}'\cup\mathcal{A}'':q\in I$ , so $v'$ and $v''$ either both contribute to one of the probability bounds or neither does. The contribution, if present, would be given by $P(w)\cdot p_1+P(w)\cdot (1-p_1)=P(w)$ , so the fact $p_1::a_1$ does not influence the probability of $q$ . Now suppose the theorem holds for $n-1$ probabilistic facts whose clauses are not present and consider the $n$ -th fact $p_n::a_n$ . Let us call $P^*$ the program $P$ without the fact $p_n::a_n$ . Then $(P^*)^r_q=P^r_q$ , and we can repeat the reasoning for $n=1$ .

4 Experiments

We ran the experiments on a computer running at 2.40 GHz with 32 GB of RAM, with cutoff times of 100, 300, and 500 s.

4.1 Datasets description

We considered two datasets with two variations each and with an increasing number of instances. The reachability (reach) dataset models a reachability problem in a probabilistic graph. All the instances have the rules (here we model negation with $\backslash +$ , since it is the symbol adopted in aspmc):

where the $e/2$ facts are probabilistic with probability 0.1. We developed two variations of this dataset: reachBA and reachGrid. The difference between the two lies in the generation of the $e/2$ facts: for the former, they are generated by following a Barabási-Albert model with the initial number of nodes equal to the size of the instance and 2 edges to attach from a new node to existing nodes (these two values are, respectively, the values of the $n$ and $m$ parameters of the method barabasi_albert_graph of the NetworkX Python library (Hagberg et al., Reference Hagberg, Schult, Swart, Varoquaux, Vaught and Millman2008) we used to generate them). The query is $path(0,n-1)$ . For the latter, the $e/2$ facts are such that they form a two-dimensional grid. In this case, the query is $path(0,i)$ , where $i$ is a random node (different for every dataset).

The smokers dataset contains a set of programs modeling a social network where some people smoke, and others are influenced by this behavior. Each person is indexed with a number, starting from 0. The base program is

Each $stress/1$ atom is probabilistic with probability 0.1 and each $\mathit{influences}/2$ atom is probabilistic with probability 0.2. Also for this dataset we consider two variations, smokersBA and smokersGrid, generated with the same structure as reachBA and reachGrid, respectively. For smokersBA the query is $smokes(n-1)$ , where $n$ is the number of people in the network, while for smokersGrid the query is $smokes(i)$ , where $i$ is a random person (different for every dataset). For all the instances, the probability associated with probabilistic facts does not influence the time required to compute the probability of the query.

Fig. 2. Cactus plot for aspmc and $\mathrm{aspmc}^r$ with 100, 300, and 500 s of time limit on the smokersGrid and smokersBA datasets.

Fig. 3. Cactus plot for aspmc and $\mathrm{aspmc}^r$ with 100, 300, and 500 s of time limit on the reachGrid and reachBA datasets.

4.2 Results

In the following, aspmc denotes the results obtained by applying aspmc directly on the considered instance, while $\mathrm{aspmc}^r$ denotes the results obtained by first computing the residual program and then passing it to aspmc. For all the experiments, the extraction of the residual program was done using the predicate $call\_residual\_program/2$ available in SWI. The extraction takes less than one second, so we report only the total execution times, without separating the two components of $\mathrm{aspmc}^r$ . This also motivated the decision to test only one Prolog system, namely SWI: we could have also used XSB, but the results would not have been much different given the almost instantaneous extraction of the residual program. Given the probabilistic nature of the generation of Barabási-Albert graphs and of the query for grid graphs, the results are averaged over 10 runs. For reachBA and smokersBA, the query is the same in every run but the structure of the graph changes (i.e., each instance has a different graph structure). For reachGrid and smokersGrid, the grid graph is the same in each of the 10 runs but the query changes in each run. Figures 2 and 3 show the cactus plots for the four datasets. For smokersGrid (Figure 2a), bare aspmc cannot solve more than 20 instances while $\mathrm{aspmc}^r$ can solve up to 60. Similar considerations hold for smokersBA (Figure 2b), where $\mathrm{aspmc}^r$ can solve over 100 instances while aspmc stops at 10. For both datasets, the time limit does not substantially influence the number of solved instances, since most of the curves almost coincide. Analogous considerations apply to reachGrid (Figure 3a) and reachBA (Figure 3b).
The improvement provided by the residual program extraction can be further assessed from Tables 2 and 3, which report the number of solved instances and the average number of bags, the average treewidth, and the average number of vertices obtained from the tree decomposition performed by aspmc, with and without residual program extraction. Note that the result of the tree decomposition is available even if the instance cannot be solved within the time limit (i.e., the averages are always over the 10 runs). For example, for size 80 of the reachBA dataset, the number of bags goes from 1956 to 47 after the residual program extraction. Overall, the residual program extraction has a huge impact on the simplification of the program, both in terms of a more compact representation and in terms of execution time.
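The scale of this simplification can be conveyed by a much simpler, purely illustrative relevance computation: starting from the query atom, collect only the ground rules whose heads are reachable top-down through rule bodies. This is not SLG resolution (which additionally handles loops through negation and produces conditional answers), and the ground program below is hypothetical, but it shows why a query-driven extraction can discard entire components of the grounding that are unrelated to the query.

```python
# Hypothetical ground program: head atom -> list of bodies,
# where each body is a list of atoms. Atoms without rules
# (the edge/2 atoms here) play the role of (probabilistic) facts.
rules = {
    "path(a,b)": [["edge(a,b)"]],
    "path(a,c)": [["edge(a,b)", "path(b,c)"]],
    "path(b,c)": [["edge(b,c)"]],
    # A component completely unrelated to the query below:
    "path(x,y)": [["edge(x,y)"]],
    "path(x,z)": [["edge(x,y)", "path(y,z)"]],
    "path(y,z)": [["edge(y,z)"]],
}

def relevant_subprogram(query, rules):
    # Top-down reachability from the query atom over rule bodies.
    todo, seen = [query], set()
    while todo:
        atom = todo.pop()
        if atom in seen:
            continue
        seen.add(atom)
        for body in rules.get(atom, []):
            todo.extend(body)
    # Keep only the rules whose head was reached.
    return {h: bs for h, bs in rules.items() if h in seen}

sub = relevant_subprogram("path(a,c)", rules)
print(sorted(sub))  # the unrelated path(x,_)/path(y,_) rules are gone
```

Only two of the six rules survive for the query `path(a,c)`; on the reachability datasets of the paper, the residual program extraction achieves a reduction of the same flavor (e.g., from 1956 bags to 47 in the tree decomposition).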

Table 2. Values for the reachBA and smokersBA datasets in the format ( $\mathrm{aspmc}^r$ - aspmc) for the tests with 500 s of time limit. $\mu$ stands for (rounded) mean, tw. for treewidth, and vert. for vertices related to the dependency graph. The column # unsolved contains the number of unsolved instances for the specific size

Table 3. Values for the reachGrid and smokersGrid datasets in the format ( $\mathrm{aspmc}^r$ - aspmc) for the tests with 500 s of time limit. $\mu$ stands for (rounded) mean, tw. for treewidth, and vert. for vertices related to the dependency graph. The column # unsolved contains the number of unsolved instances for the specific size

5 Related works

The residual program extraction is at the heart of PITA (Riguzzi and Swift, Reference Riguzzi and Swift2011) and ProbLog2 (Dries et al., Reference Dries, Kimmig, Meert, Renkens, Van den Broeck, Vlasselaer and De Raedt2015), the first adopting Prolog SLG resolution to cache the parts of the program that have already been analyzed. There are other semantics for representing uncertainty with an answer set program, such as LPMLN (Lee and Yang, Reference Lee, Yang, Singh and Markovitch2017), P-log (Baral et al., Reference Baral, Gelfond and Rushton2009), and smProbLog (Totis et al., Reference Totis, De Raedt and Kimmig2023). LPMLN allows defining weighted rules and assigns weights to answer sets, while P-log adopts probabilistic facts but requires normalization for the computation of the probability. Furthermore, P-log has an interface built on top of XSB (Anh et al., Reference Anh, Kencana Ramli, Damásio, Garcia de la Banda and Pontelli2008) that leverages its tabling mechanisms to speed up inference. The relation between LPMLN and P-log has been studied in detail (Balai and Gelfond, Reference Balai, Gelfond and Kambhampati2016; Lee and Yang, Reference Lee, Yang, Singh and Markovitch2017). Another possibility to associate weights with rules is via weak constraints, available in all ASP solvers, which however cannot be directly interpreted as probabilities. smProbLog is the semantics closest to the CS: both support probabilistic facts added on top of an ASP. The probability of a stable model in smProbLog is the probability of its corresponding world $w$ divided by the number of answer sets of $w$ . The CS has also been extended by Rocha and Gagliardi Cozman (Reference Rocha and Gagliardi Cozman2022) to also handle worlds without answer sets, at the price of requiring three truth values (true, false, and undefined). The residual program extraction may help to speed up inference also under these alternative semantics: exploring this is an interesting direction for future work.
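The difference between the CS and the smProbLog-style aggregation can be made concrete with a small numeric sketch. The probabilities, worlds, and answer-set counts below are hypothetical; the sketch computes the CS lower probability (worlds where the query holds in every answer set), the CS upper probability (worlds where it holds in some answer set), and the smProbLog value, where each world's mass is split uniformly among its answer sets.

```python
# Two independent probabilistic facts a and b (hypothetical values).
probs = {"a": 0.3, "b": 0.6}

# world (truth of a, truth of b) -> one boolean per answer set of that
# world: does the query q hold in that answer set? (Hypothetical.)
answer_sets_q = {
    (False, False): [False],        # one answer set, q false
    (False, True):  [True, False],  # two answer sets, q true in one
    (True,  False): [True],         # one answer set, q true
    (True,  True):  [True, True],   # two answer sets, q true in both
}

def world_prob(world):
    # Product of fact probabilities (or complements) for this world.
    p = 1.0
    for fact, val in zip(probs, world):
        p *= probs[fact] if val else 1 - probs[fact]
    return p

# Credal semantics: lower/upper probability of the query q.
lower = sum(world_prob(w) for w, s in answer_sets_q.items() if all(s))
upper = sum(world_prob(w) for w, s in answer_sets_q.items() if any(s))

# smProbLog-style value: each world's probability is divided
# uniformly among its answer sets.
smp = sum(world_prob(w) * sum(s) / len(s)
          for w, s in answer_sets_q.items())

print(lower, upper, smp)
```

With these numbers, the CS yields the interval [0.30, 0.72] while the smProbLog-style aggregation yields the single value 0.51, which always lies inside the credal interval.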
We consider the SLG resolution implemented in SWI-Prolog. However, as already discussed in Section 2.3, SLG resolution was initially proposed and implemented in the XSB system (Swift and Warren, Reference Swift and Warren2012). Our approach is general and can be built on top of any Prolog system that supports SLG resolution.

The problem of grounding in ASP has also been addressed by the s(ASP) (Marple et al., Reference Marple, Salazar and Gupta2017) and s(CASP) (Arias et al., Reference Arias, Carro, Salazar, Marple and Gupta2018) systems, which are top-down goal-driven ASP interpreters (the latter also allowing constraints). The result of a query in these systems is a subset of the stable models of the whole program containing only the atoms needed to prove the query, and the evaluation of a query does not need to ground the whole program. s(ASP) combines several techniques, such as coinductive SLD resolution to handle cycles through negation and constructive negation based on dual rules to identify why a particular query fails. DLV (Leone et al., Reference Leone, Pfeifer, Faber, Eiter, Gottlob, Perri and Scarcello2006) also has a query mode. However, none of these systems target probabilistic inference. Furthermore, our approach does not aim to replace the ASP solver, but rather to reduce the size of the program that should be grounded. Another possibility to extract the residual program is by analyzing the dependency graph; however, to do so, it is often necessary to ground the whole program first. With our approach based on SLG resolution, only the relevant part of the program is grounded.

6 Conclusions

In this paper, we proposed to speed up inference in PASP via the extraction of the residual program. The residual program represents the part of the program that is needed to compute the probability of a query, and it is often smaller than the original program. This allows a reasoner to ground a smaller portion of the program to compute the probability of a query, reducing the execution time. We extract the residual program by applying SLG resolution and tabling. Empirical results on graph datasets show that (i) the time spent to extract the residual program is negligible w.r.t. the inference time and (ii) querying the residual program is much faster than querying the original program.

Acknowledgements

This work has been partially supported by Spoke 1 “FutureHPC & BigData” of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC) funded by MUR Missione 4 - Next Generation EU (NGEU) and by Partenariato Esteso PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 8 “Pervasive AI”, funded by MUR through PNRR - M4C2 - Investimento 1.3 (Decreto Direttoriale MUR n. 341 of 15th March 2022) under the Next Generation EU (NGEU). Both authors are members of the Gruppo Nazionale Calcolo Scientifico – Istituto Nazionale di Alta Matematica (GNCS-INdAM).

Competing interests

The authors declare none.

Footnotes

1 Implementation and datasets are available at: https://github.com/damianoazzolini/aspmc

References

Anh, H. T., Kencana Ramli, C. D. P. and Damásio, C. V. 2008. An implementation of extended P-log using XASP. In Logic Programming, Garcia de la Banda, M. and Pontelli, E., Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 739-743.
Arias, J., Carro, M., Salazar, E., Marple, K. and Gupta, G. 2018. Constraint answer set programming without grounding. Theory and Practice of Logic Programming 18, 3-4, 337-354.
Azzolini, D., Bellodi, E. and Riguzzi, F. 2022. Statistical statements in probabilistic logic programming. In Logic Programming and Nonmonotonic Reasoning, Gottlob, G., Inclezan, D. and Maratea, M., Eds. Springer International Publishing, Cham, 43-55.
Azzolini, D. and Riguzzi, F. 2023. Inference in probabilistic answer set programming under the credal semantics. In AIxIA 2023 - Advances in Artificial Intelligence, Basili, R., Lembo, D., Limongelli, C. and Orlandini, A., Eds. Lecture Notes in Artificial Intelligence, vol. 14318. Springer, Heidelberg, Germany, 367-380.
Balai, E. and Gelfond, M. 2016. On the relationship between P-log and LPMLN. In Proc. of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, Kambhampati, S., Ed. IJCAI/AAAI Press, New York, NY, USA, 915-921.
Baral, C., Gelfond, M. and Rushton, N. 2009. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming 9, 1, 57-144.
Bodlaender, H. L. 1993. A tourist guide through treewidth. Acta Cybernetica 11, 1-2, 1-21.
Brewka, G., Eiter, T. and Truszczyński, M. 2011. Answer set programming at a glance. Communications of the ACM 54, 12, 92-103.
Chen, W. and Warren, D. S. 1996. Tabled evaluation with delaying for general logic programs. Journal of the ACM 43, 1, 20-74.
Cozman, F. G. and Mauá, D. D. 2020. The joy of probabilistic answer set programming: Semantics, complexity, expressivity, inference. International Journal of Approximate Reasoning 125, 218-239.
Darwiche, A. and Marquis, P. 2002. A knowledge compilation map. Journal of Artificial Intelligence Research 17, 229-264.
De Raedt, L., Kimmig, A. and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Veloso, M. M., Ed. AAAI Press, 2462-2467.
Dix, J. 1995. A classification theory of semantics of normal logic programs: I. Strong properties. Fundamenta Informaticae 22, 3, 227-255.
Dries, A., Kimmig, A., Meert, W., Renkens, J., Van den Broeck, G., Vlasselaer, J. and De Raedt, L. 2015. ProbLog2: Probabilistic logic programming. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2015), Lecture Notes in Computer Science, vol. 9286. Springer, 312-315.
Eiter, T., Hecher, M. and Kiesel, R. 2021. Treewidth-aware cycle breaking for algebraic answer set counting. In Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, KR 2021, 269-279.
Eiter, T., Hecher, M. and Kiesel, R. 2024. aspmc: New frontiers of algebraic answer set counting. Artificial Intelligence 330, 104109.
Gebser, M., Kaufmann, B. and Schaub, T. 2009. Solution enumeration for projected Boolean search problems. In Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, van Hoeve, W.-J. and Hooker, J. N., Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 71-86.
Gelfond, M. and Lifschitz, V. 1988. The stable model semantics for logic programming. In 5th International Conference and Symposium on Logic Programming (ICLP/SLP 1988), MIT Press, USA, 1070-1080.
Hagberg, A. A., Schult, D. A. and Swart, P. J. 2008. Exploring network structure, dynamics, and function using NetworkX. In Proc. of the 7th Python in Science Conference, Varoquaux, G., Vaught, T. and Millman, J., Eds. Pasadena, CA, USA, 11-15.
Kiesel, R., Totis, P. and Kimmig, A. 2022. Efficient knowledge compilation beyond weighted model counting. Theory and Practice of Logic Programming 22, 4, 505-522.
Kimmig, A., Van den Broeck, G. and De Raedt, L. 2017. Algebraic model counting. Journal of Applied Logic 22, 46-62.
Lee, J. and Yang, Z. 2017. LPMLN, weak constraints, and P-log. In Proc. of the Thirty-First AAAI Conference on Artificial Intelligence, Singh, S. and Markovitch, S., Eds. AAAI Press, San Francisco, California, USA, 1170-1177.
Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S. and Scarcello, F. 2006. The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic 7, 3, 499-562.
Marple, K. and Gupta, G. 2014. Dynamic consistency checking in goal-directed answer set programming. Theory and Practice of Logic Programming 14, 4-5, 415-427.
Marple, K., Salazar, E. and Gupta, G. 2017. Computing stable models of normal logic programs without grounding. CoRR abs/1709.00501.
Przymusinski, T. C. 1989. Every logic program has a natural stratification and an iterated least fixed point model. In Proc. of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS '89. Association for Computing Machinery, New York, NY, USA, 11-21.
Raedt, L. D., Kersting, K., Natarajan, S. and Poole, D. 2016. Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning 10, 2, 1-189.
Riguzzi, F. 2022. Foundations of Probabilistic Logic Programming: Languages, Semantics, Inference and Learning, 2nd ed. River Publishers, Gistrup, Denmark.
Riguzzi, F. and Swift, T. 2011. The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theory and Practice of Logic Programming 11, 4-5, 433-449.
Rocha, V. H. N. and Gagliardi Cozman, F. 2022. A credal least undefined stable semantics for probabilistic logic programs and probabilistic argumentation. In Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, KR 2022, 309-319.
Swift, T. 1999. A new formulation of tabled resolution with delay. In Progress in Artificial Intelligence, 9th Portuguese Conference on Artificial Intelligence, EPIA '99, Barahona, P. and Alferes, J. J., Eds. Lecture Notes in Computer Science, vol. 1695. Springer, Berlin, 163-177.
Swift, T. and Warren, D. S. 2012. XSB: Extending Prolog with tabled logic programming. Theory and Practice of Logic Programming 12, 1-2, 157-187.
Totis, P., De Raedt, L. and Kimmig, A. 2023. smProbLog: Stable model semantics in ProbLog for probabilistic argumentation. Theory and Practice of Logic Programming, 1-50.
Van Gelder, A., Ross, K. A. and Schlipf, J. S. 1991. The well-founded semantics for general logic programs. Journal of the ACM 38, 3, 620-650.
Wielemaker, J., Schrijvers, T., Triska, M. and Lager, T. 2012. SWI-Prolog. Theory and Practice of Logic Programming 12, 1-2, 67-96.
Fig. 1. Programs, call graphs (CG), and dependency graphs (DG) with (Figure 1a) and without (Figure 1b) OLON. The dependency graph of both programs is the same. They only differ in the call graph: for the left program, the call graph contains an edge labeled with $-$ (negative) while for the right program the same edge is labeled with $+$ (positive).

Table 1. Worlds and probabilities for Example 4. The column #q/# A.S. contains the number of answer sets where the query $path(a,d)$ is true and the total number of answer sets
