1 Introduction
Abstract Interpretation (Cousot and Cousot 1977) allows constructing sound program analysis tools that can extract properties of a program by safely approximating its semantics. Abstract interpretation-based analysis proved practical and effective in the context of (Constraint) Logic Programming ((C)LP) (García de la Banda et al. 1996; Warren et al. 1988; Muthukumar and Hermenegildo 1990; Van Roy and Despain 1990; Le Charlier and Van Hentenryck 1994; Kelly et al. 1998), which was also one of its very first application areas (Giacobazzi and Ranzato 2022), and the techniques developed for CLP have also proved useful in the analysis and verification of other programming languages via semantic translation into Constraint Horn Clauses (CHCs) (Henriksen and Gallagher 2006; De Angelis et al. 2021; Méndez-Lojo et al. 2007a). In the context of static analysis of (C)LP programs, variable sharing soon emerged as a fundamental property, which has led to very active and continuous development of variable sharing analysis domains. Sharing proved immediately necessary for ensuring correctness and precision when inferring most other useful program properties, such as modes, determinacy, non-failure, and cost, among others. In fact, some early LP analyses were actually incorrect because variable sharing was not taken into account. Sharing analyses have also proven fundamental in the optimization of unification (Søndergaard 1986) and in automatic (and-)parallelization (Cabeza and Hermenegildo 1994; Hermenegildo and Rossi 1995; Pontelli et al. 1997; Bueno et al. 1999; García de la Banda et al. 2000). For example, in parallelization it is crucial to detect precisely whether two variables are independent (i.e., they do not share), whether a variable is ground, etc. Sharing has also been used as the basis for more complex analyses of related properties such as linearity, paths, or freeness (Muthukumar and Hermenegildo 1991; Bruynooghe and Codish 1993; King and Soper 1994; Amato and Scozzari 2014; Amato et al. 2022). Furthermore, beyond (C)LP, sharing analysis is directly related to the aliasing and points-to analyses in imperative programming, widely used in the context of languages with pointers and dynamic memory (Landi and Ryder 1992; Steensgaard 1996; Aiken et al. 2003; Bravenboer and Smaragdakis 2009; Navas et al. 2009; Rountev et al. 2001; Whaley and Lam 2002), sometimes directly applying domains stemming from (C)LP (Zanardini 2018; Méndez-Lojo and Hermenegildo 2008). In fact, (C)LP sharing analyses constituted some of the very first abstract interpretation-based pointer aliasing analyses for any programming language.
In this paper we concentrate on a popular abstract domain for variable sharing analysis: set-sharing analysis (Jacobs and Langen 1989; Muthukumar and Hermenegildo 1989). This domain captures which sets of program variables share run-time variables, encoding this information in sharing sets. For example, assume $X$, $Y$, and $Z$ are the syntactic program variables that we need to track, and consider the substitution (run-time store) $\{X/f(M, K, a), Y/g(b, M), Z/g(a,b)\}$. This substitution is abstracted to the set-sharing abstraction $\{\{X\}, \{X, Y\}\}$, where $\{X, Y\}$ represents that there is (or, more precisely, may be) one or more variables shared between the terms to which $X$ and $Y$ are bound, and $\{X\}$ represents that there is one or more variables that appear only in $X$. Additionally, $Z$ not appearing in any set means that it contains no variables, that is, $Z$ is bound to a ground term. Set-sharing encodes not only variable independence, that is, the fact that substitutions that affect a given variable will not affect another one, but also groundness, grounding dependencies (e.g., the fact that if $X$ becomes ground then $Y$ becomes ground, but not the other way around), independence relationships, etc. This representational richness does come, however, at a price: the scalability of set-sharing domains to a wide set of realistic programs is challenging, since the number of sharing sets carried by the abstraction can be exponential in the number of variables of the clause being analyzed. This has prompted much work on improvements and alternative representations of set-sharing abstractions, with the objective of improving performance while maintaining precision as much as possible. In contrast, in this work, rather than defining new sharing abstract domains or modifying existing ones, we concentrate on developing techniques that can be incorporated in the analysis framework to address the root causes of the performance issues faced by set-sharing domains, such as the number of variables, without affecting precision. We draw inspiration from techniques used in the context of compiler optimizations, which significantly reduce the number of variables present in the abstractions.
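Before moving on, to make the set-sharing abstraction above concrete, the following minimal sketch (our own illustration, not the analyzer's actual code) computes the set-sharing abstraction of a single substitution; it assumes substitutions are represented as lists of Name-Term pairs, with program variable names as atoms:

```prolog
:- use_module(library(lists)).

% sharing_abstraction(+Subst, -Sharing): hypothetical helper computing the
% set-sharing abstraction of one substitution.
sharing_abstraction(Subst, Sharing) :-
    findall(SS,
            ( member(_-T, Subst), term_variables(T, Us), member(U, Us),
              occurs_in(U, Subst, SS) ),
            SSs),
    sort(SSs, Sharing).                          % remove duplicates

% occurs_in(+U, +Subst, -SS): SS is the set of program variables whose
% value contains the run-time variable U.
occurs_in(U, Subst, SS) :-
    findall(V,
            ( member(V-T, Subst), term_variables(T, Us), strict_member(U, Us) ),
            Vs),
    sort(Vs, SS).

strict_member(U, [X|_]) :- U == X.               % identity check, no unification
strict_member(U, [_|Xs]) :- strict_member(U, Xs).

% ?- sharing_abstraction(['X'-f(M,K,a), 'Y'-g(b,M), 'Z'-g(a,b)], S).
% S = [['X'],['X','Y']]   % K appears only in X; M is shared by X and Y; Z is ground
```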
The rest of the paper is structured as follows: Section 2 provides the necessary background, covering set-sharing abstract domains and top-down analysis. Section 3 presents our approach: first, a program transformation that can provide an optimal solution, and second, an alternative solution based on variable trimming that can be applied during analysis without modifying the program. Section 4 reports on our experimental evaluation and, finally, Section 5 summarizes our conclusions and discusses some lines of future work.
2 Notation and preliminaries
We represent variables by uppercase letters (e.g., $X$, $Y$, $Z$) and atoms by lowercase letters (e.g., $a$, $b$, $c$). $\mathcal{P}(S)$ denotes the powerset of a set $S$ and $\mathcal{P}^0(S)$ its proper powerset, that is, $\mathcal{P}^0(S)=\mathcal{P}(S)\backslash \{\emptyset \}$. Given a term $T$, $vars(T)$ denotes the set of its variables. A Constraint Logic Program (CLP) is a set of clauses of the form $H \: \mbox{:-} \: A_{1}, \ldots , A_{n}$, where the literals $A_{1}, \ldots , A_{n}$ form the body and $H$ is a positive literal called the head of the clause. A substitution is a set $\theta = \{V_1/t_1, \ldots , V_n/t_n\}$ with $V_i$ distinct variables and $t_i$ terms. We say that $t_i$ is the value of $V_i$ in $\theta$. The set $\{V_1,\ldots ,V_n\}$ is the domain of $\theta$ (dom($\theta$)).
The main idea behind Abstract Interpretation (Cousot and Cousot 1977) is to interpret the program over a special, abstract domain whose elements are finite representations of possibly infinite sets of actual substitutions in the concrete domain. We denote the concrete domain as $D_{\gamma }$, the abstract domain as $D_{\alpha }$, and the functions that relate sets of concrete substitutions with abstract substitutions as the abstraction function $\alpha : D_{\gamma } \rightarrow D_{\alpha }$ and the concretization function $\gamma : D_{\alpha } \rightarrow D_{\gamma }$. The concrete domain is typically a complete lattice with the set inclusion order, which induces an ordering relation in the abstract domain, represented by $\sqsubseteq$. Under this relation the abstract domain is usually a complete lattice and $(D_{\gamma }, \alpha , D_{\alpha }, \gamma )$ is a Galois insertion/connection (Cousot and Cousot 1977). Several frameworks for abstract interpretation exist; this work focuses on top-down frameworks, discussed later in this section.
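As a reminder of the standard conditions (classical material, not specific to this paper), monotone $\alpha$ and $\gamma$ form a Galois connection when

$$\forall c \in D_{\gamma }:\; c \subseteq \gamma (\alpha (c)) \qquad \text{and} \qquad \forall a \in D_{\alpha }:\; \alpha (\gamma (a)) \sqsubseteq a,$$

and a Galois insertion when, additionally, $\alpha (\gamma (a)) = a$ for every $a$, that is, when no two abstract elements share the same concretization.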
Set-Sharing Analyses. As mentioned in the introduction, in static analysis of logic programs, tracking the variables shared among terms is essential. A set of program variables $V_1,\ldots , V_m$ share if, in some execution of the program, they may respectively be bound to terms $\mathtt{T}_{1},\ldots ,\mathtt{T}_{m}$ such that $vars(\mathtt{T}_{1})\cap \ldots \cap vars(\mathtt{T}_{m}) \neq \emptyset$. One of the most accurate abstract domains defined for tracking sharing information is set-sharing (Jacobs and Langen 1989; Muthukumar and Hermenegildo 1989). This domain captures which program variables share run-time variables, condensing this information in a concise set representation. As an example, consider program variables X, Y, Z, W, T, and assume they are bound at run-time as follows: $\theta =\{\mathtt{X/f(M,a), Y/g(b,M,c), Z/[1,M,3], W/g(b), T/h(K,m)}\}$. This substitution (run-time state) is represented in the set-sharing domain by the sharing abstraction $\{\{\mathtt{X,Y,Z}\},\{\mathtt{T}\}\}$. The first element, $\{\mathtt{X,Y,Z}\}$, represents the fact that there is (at least one) variable (i.e., M) that appears in all of X, Y, Z, but not in T or W; and the second element, $\{\mathtt{T}\}$, represents that there is (at least one) variable that appears in T (i.e., K) but not in any of the others. We say that X, Y and Z “share,” and that T does not “share” with X, Y, or Z. The fact that W does not appear in any set means it contains no variables and thus, it is ground. This representation also captures that there are no other variables in X, Y, or Z in addition to the one(s) they share, which has important implications with respect to grounding: after a program statement that grounds one of them (e.g., Z=[1,2,3]), we know both X and Y will also be grounded. However, grounding T does not ground any of X, Y, or Z. Other abstract domains have also been studied, notably pair-sharing, which keeps track of pairs of variables that share. For instance, the pair-sharing abstraction of the example above is $\{(\mathtt{X}, \mathtt{Y}), (\mathtt{Y},\mathtt{Z}), (\mathtt{X}, \mathtt{Z})\}$. The intricacies of the relation and tradeoffs between set-sharing and pair-sharing are beyond the scope of this paper; the reader is referred to, for example, Bagnara et al. (1997) and Bueno and García de la Banda (2004) for further discussion of this topic. In this work, our subject of study is set-sharing analyses.
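As an aside, the pair-sharing abstraction mentioned above can be read off a set-sharing abstraction by collecting all two-element subsets of its sharing sets. A small sketch (a hypothetical helper of ours, with sharing sets as lists of atoms):

```prolog
:- use_module(library(lists)).

% pair_sharing(+SetSharing, -Pairs): pairs of variables that occur
% together in some sharing set.
pair_sharing(SetSharing, Pairs) :-
    findall(X-Y,
            ( member(SS, SetSharing), select(X, SS, Rest),
              member(Y, Rest), X @< Y ),
            Ps),
    sort(Ps, Pairs).

% ?- pair_sharing([[x,y,z],[t]], P).
% P = [x-y,x-z,y-z]
```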
The set-sharing domain has attracted a lot of attention in the literature and has been enhanced in different ways and extended with other kinds of information such as freeness or linearity (Muthukumar and Hermenegildo 1991; Bruynooghe et al. 1994; Filé 1994; King and Soper 1994; Codish et al. 1996; Fecht 1996; Zaffanella et al. 1999; Hill et al. 2004; Navas et al. 2006; Trias et al. 2008; Amato and Scozzari 2009; Amato et al. 2022).
However, the set representation, which allows the sharing domain to offer high precision, is also one of the reasons why this family of domains presents scalability challenges. A set-sharing abstraction is represented as a set of sets, each of them capturing a possible sharing that may occur at run time. When little (or no) definite sharing information is available, more (or all) sharing sets are possible. Given the set $V$ of variables appearing in the program fragment being analyzed, the size of a set-sharing abstraction is bounded by the abstraction that captures all possible non-empty sharing sets, $\mathcal{P}^0(V)$. Thus, the size of an abstraction is, in the worst case, exponential in the number of variables that appear in the program. To overcome these problems, various representations have been proposed, such as collapsing subsets of the abstraction into “cliques” (sets of variables that represent their proper powerset). These representations reduce the size of the sharing abstraction and can improve performance, even more so when equipped with widenings (albeit then at the cost of losing precision) (Zaffanella et al. 1999; Navas et al. 2006). For example, the set-sharing abstraction $\{\{\mathtt{X}\},\{\mathtt{X,Y}\},\{\mathtt{Y}\},\{\mathtt{Z}\}\}$ can be represented using cliques as the pair $(\{\{\mathtt{X,Y}\}\}, \{\{\mathtt{Z}\}\})$, where the clique $\{\mathtt{X,Y}\}$ represents $\mathcal{P}^0(\{\mathtt{X,Y}\})$.
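For illustration, the set-sharing denoted by a single clique is simply its proper powerset; the following sketch (ours, purely illustrative: actual clique-based domains avoid this expansion precisely because of its exponential cost) enumerates it:

```prolog
% clique_sharing(+Clique, -Sharing): all non-empty subsets of a clique,
% i.e., the sharing sets it stands for. Exponential by design.
clique_sharing(Clique, Sharing) :-
    findall(SS, (subseq(Clique, SS), SS \= []), SSs),
    sort(SSs, Sharing).

subseq([], []).
subseq([X|Xs], [X|Ys]) :- subseq(Xs, Ys).
subseq([_|Xs], Ys) :- subseq(Xs, Ys).

% ?- clique_sharing([x,y], S).
% S = [[x],[x,y],[y]]
```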
The PLAI Top-Down Analyzer. Top-down analyses are a family of static analyses that build an analysis graph starting from a series of program entry points. This approach was first used in analyzers such as MA3 and Ms (Warren et al. 1988), and matured in the PLAI analyzer (Muthukumar and Hermenegildo 1990, 1992) using an optimized fixpoint algorithm now also referred to as the top-down algorithm or solver. This algorithm was later applied to the analysis of CLP/CHCs (García de la Banda et al. 1996) and imperative programs (Henriksen and Gallagher 2006; De Angelis et al. 2021; Méndez-Lojo et al. 2007b), and used in analyzers such as GAIA (Le Charlier and Van Hentenryck 1994), the CLP($\mathcal{R}$) analyzer (Kelly et al. 1998), or Goblint (Seidl and Vogler 2021; Tilscher et al. 2023).
The graph constructed by the PLAI algorithm during analysis is a finite, abstract object whose concretization approximates the (possibly infinite) set of (possibly infinite) maximal AND-trees of the concrete semantics. This approach separates the abstraction of the structure of the concrete trees (the paths through the program) from the abstraction of the substitutions at the nodes of those concrete trees (the program states in those paths). The first abstraction, $T_{\alpha }$, is typically built-in, as an abstract domain of analysis graphs. The framework is parametric on a second abstract domain, $D_{\alpha }$, whose elements appear as labels in the nodes of the analysis graph. A more detailed recent discussion can be found in De Angelis et al. (2021). Assume we are analyzing a literal Goal in the body of some clause in the program, and that Head $\: \mbox{:-} \:$ Body is a clause in a predicate whose head unifies with Goal. Assume also that the substitution affecting Goal at the time of this call is approximated by the abstract substitution Call such that $\mathtt{vars}(\mathtt{Goal})$ $\subseteq$ $\mathtt{dom}(\mathtt{Call})$ and $\mathtt{vars}(\mathtt{Call})\cap (\mathtt{vars}(\mathtt{Head})\cup \mathtt{vars}(\mathtt{Body}))=\emptyset$. The success (exit state) of Goal after having executed the above clause is represented by the abstract substitution Success given by:

$$\begin{array}{rcl}\mathtt{Proj} & = & \mathtt{project}(\mathtt{vars}(\mathtt{Goal}), \mathtt{Call})\\ \mathtt{Entry} & = & \mathtt{augment}(\mathtt{callToEntry}(\mathtt{Proj}, \mathtt{Goal}, \mathtt{Head}), \mathtt{vars}(\mathtt{Body})\setminus \mathtt{vars}(\mathtt{Head}))\\ \mathtt{Exit} & = & \mathtt{entryToExit}(\mathtt{Entry}, \mathtt{Head}, \mathtt{Body})\\ \mathtt{Prime} & = & \mathtt{exitToPrime}(\mathtt{project}(\mathtt{vars}(\mathtt{Head}), \mathtt{Exit}), \mathtt{Head}, \mathtt{Goal})\\ \mathtt{Success} & = & \mathtt{extend}(\mathtt{Call}, \mathtt{Prime})\end{array}$$
As an example, let Goal $=\mathtt{p(A,f(B),E)}$ , Call $=\{\{\mathtt{A}\},\{\mathtt{B,C}\}, \{\mathtt{A,C,D}\}\}$ (notice that E is ground, since it does not appear in any sharing set) and Head $\,\: \mbox{:-} \:\,$ Body be the clause p(f(X),f(Y),Z) $\,\: \mbox{:-} \:\,$ [X|T1]=[Z,Y|T2], whose Head unifies with Goal. The success abstraction is computed as follows:
(i) First, the abstraction Call is projected onto the variables in Goal by means of the project function, obtaining Proj $=\{\{\mathtt{A}\}, \{\mathtt{B}\}\}$ .
(ii) Then, callToEntry(Proj, Goal, Head) yields an abstract substitution that represents the unification p(A,f(B),E)=p(f(X),f(Y),Z) (i.e., Goal=Head) in the context represented by Proj. In our example, such abstraction is $\{\{\mathtt{X}\}, \{\mathtt{Y}\}\}$, where Z becomes ground because it is unified with E, which is ground.
(iii) Now, the domain of such abstraction is extended with the variables in Body that do not appear in Head (i.e., T1 and T2), and the abstraction is updated accordingly by including safely approximated information. This is done by operation augment, which returns the Entry abstraction $\{\{\mathtt{X}\}, \{\mathtt{Y}\}, \{\mathtt{T1}\}, \{\mathtt{T2}\}\}$ .
(iv) Then, Body is traversed so that each of its literals is (recursively) analyzed by procedure entryToExit, described in Algorithm 1. In our example, entryToExit(Entry, Head, Body) proceeds as follows:
(a) First, the Exit abstraction is initialized with the current Entry abstraction (Line 2), and the first literal of the body is selected (Line 3), which in this case corresponds to the only literal in the body: [X|T1]=[Z,Y|T2].
(b) The PLAI framework proceeds differently depending on the kind of literal being analyzed (see Lines 4–15). Since the literal [X|T1]=[Z,Y|T2] is neither a recursive call nor a call to a predicate in the analysis scope, the steps in Lines 9–15 are performed as explained below.
(c) First, the abstraction is projected onto the variables in the literal, returning $\{\{\mathtt{X}\}, \{\mathtt{Y}\}, \{\mathtt{T1}\}, \{\mathtt{T2}\}\}$, and the operation abstractLiteral is invoked, which captures the abstract behavior of the literal. In our example, it performs the abstract unification [X|T1]=[Z,Y|T2]. Since Z is ground, and the unification induces X=Z, the groundness information is propagated to X. The unification also induces T1=[Y|T2], which results in the creation of new sharing sets accordingly. The $\mathtt{Abs}_{\mathtt{exit}}$ abstraction obtained after these operations is $\{\{\mathtt{Y},\mathtt{T1}\}, \{\mathtt{T1},\mathtt{T2}\}, \{\mathtt{Y},\mathtt{T1},\mathtt{T2}\}\}$ (a sketch of this abstract unification operation is given after this enumeration).
(d) Finally, such abstraction is used to update the previous exit abstraction by the extend operation (Line 15), yielding Exit=$\{\{\mathtt{Y},\mathtt{T1}\}, \{\mathtt{T1},\mathtt{T2}\}, \{\mathtt{Y},\mathtt{T1},\mathtt{T2}\}\}$.
(v) After the execution of entryToExit, the obtained Exit abstraction is projected over vars(Head) and represented in the context of the variables of Call. This is done by operation exitToPrime(project(vars(Head), Exit), Head, Goal), which captures the effects of the unification Head=Goal. In our example, it yields Prime= $\{\{\mathtt{B}\}\}$ , since the groundness information has been propagated from X to A.
(vi) The analysis concludes with the extend(Call, Prime) operation, which updates the initial Call abstraction with the new information inferred in the Prime abstraction, obtaining the success abstraction ${\mathtt{Success}}=\{\{\mathtt{B,C}\}\}$: the sharing sets $\{\mathtt{A}\}$ and $\{\mathtt{A,C,D}\}$ have been deleted because A is ground and, since D no longer appears in any sharing set, this groundness information is also propagated to D.
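To fix ideas about the abstract unification in step (c), the following is a simplified sketch for the plain set-sharing domain, following the classical definition $amgu(X{=}T, S) = (S \setminus (rel(X)\cup rel(T))) \cup bin(rel(X)^{*}, rel(T)^{*})$ of Jacobs and Langen (1989). The code is our own illustration (the freeness refinements used by domains such as shfr are omitted); program variables are atoms and sharing sets are ordered sets:

```prolog
:- use_module(library(lists)).
:- use_module(library(ordsets)).

% amgu(+X, +TVars, +S, -S1): abstract unification of variable X with a term
% whose variable set is TVars, over the set-sharing abstraction S.
amgu(X, TVars, S, S1) :-
    rel([X], S, RelX),
    rel(TVars, S, RelT),
    star_union(RelX, StarX),
    star_union(RelT, StarT),
    bin(StarX, StarT, New),
    ord_union(RelX, RelT, Rel),
    ord_subtract(S, Rel, Irrel),      % sharing sets untouched by the binding
    ord_union(Irrel, New, S1).

rel(Vars, S, Rel) :-                  % sets of S containing some V in Vars
    findall(SS, (member(SS, S), member(V, Vars), memberchk(V, SS)), Rel0),
    sort(Rel0, Rel).

star_union(Sets, Star) :-             % closure under union (exponential!)
    findall(U, (subseq(Sets, Sub), Sub \= [], union_all(Sub, U)), Us),
    sort(Us, Star).

bin(A, B, Bin) :-                     % pairwise unions of two collections
    findall(U, (member(SA, A), member(SB, B), ord_union(SA, SB, U)), Us),
    sort(Us, Bin).

subseq([], []).
subseq([X|Xs], [X|Ys]) :- subseq(Xs, Ys).
subseq([_|Xs], Ys) :- subseq(Xs, Ys).

union_all([Set], Set).
union_all([Set|Sets], U) :-
    Sets \= [], union_all(Sets, U0), ord_union(Set, U0, U).

% Step (c) above, once X has been grounded: T1 = [Y|T2].
% ?- amgu(t1, [y,t2], [[t1],[t2],[y]], S).
% S = [[t1,t2],[t1,t2,y],[t1,y]]
```

Note that grounding (e.g., X=Z with Z ground) is the particular case TVars = []: the bin part is empty and all sharing sets containing X are simply removed.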
Some final remarks are in order. If no predicate head can be unified with the goal under analysis, the bottom abstraction ($\bot$) is returned (representing that the exit state is unreachable). If several clauses are available, all of them are analyzed, and a collection of prime abstractions $\mathtt{Prime}_1,\ldots ,\mathtt{Prime}_m$ is obtained, one abstraction per clause, where $m$ is the number of clauses. Then, the success abstraction is computed as Success=extend(Call, computeLub($\mathtt{Prime}_1,\ldots ,\mathtt{Prime}_m$)), where computeLub yields the least upper bound of the collection of abstractions (other operators, including disjunction and widenings, are possible).
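In the case of set-sharing, the least upper bound of abstractions defined over the same variables is simply the union of their sharing sets (the domain is ordered by set inclusion). Continuing with the hypothetical representation used in our sketches, and assuming SWI-Prolog's library(apply):

```prolog
:- use_module(library(apply)).
:- use_module(library(ordsets)).

% compute_lub(+Primes, -Lub): lub of a list of set-sharing abstractions.
compute_lub(Primes, Lub) :-
    foldl(ord_union, Primes, [], Lub).

% ?- compute_lub([[[a],[a,b]], [[b]]], L).
% L = [[a],[a,b],[b]]
```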
In the entryToExit loop (Lines 3–5), when the current literal under analysis corresponds to a recursive call (Lines 4–5), the analyzer computes a fixpoint for the associated call pattern. Such a call pattern is determined by the current literal Goal and the abstraction Call representing the environment under which Goal is executed (this may require the use of a widening operation to ensure termination). A detailed discussion of the different fixpoint computation methods is outside the scope of this work, and we believe that it is not necessary for understanding our approach and contribution. The reader is referred to, for example, Muthukumar and Hermenegildo (1990, 1992) for more details.
In the case that the current literal is not recursive but corresponds to a call to a predicate within the analysis scope, the associated call pattern is analyzed (Lines 6–7) following steps (i) to (vi) illustrated above using our example, with the clauses that unify with the literal.
Finally, if the literal to be analyzed does not correspond to either of the two cases discussed above and the analysis domain does not know how to abstract it either (Lines 9–11), the invocation of the abstractLiteral operation returns a fail atom. The analyzer then computes the top-most information for vars(Literal) by calling the top-most function (Line 12). Then, the original Call abstraction is extended with such top-most information. In our example, if the abstract domain did not implement how to abstract the unification [X|T1]=[Z,Y|T2], the top-most abstraction would be $\mathcal{P}^0(\{\mathtt{X,T1,T2,Y,Z}\})$. Notice that this can be quite common when, for example, the analyzer has to process a call to a predicate that is imported from a library whose source code is not available, is not in the analysis scope, etc. In this case, the top-most abstraction has to be assumed to ensure correctness.
3 Environment reassociation and abstract environment trimming
Given a clause $H\: \mbox{:-} \: B_1,\ldots , B_n$, its environment is the set of all the variables appearing in the clause, defined as $\mathtt{vars}(H)\cup \bigcup _{i=1}^{n} \mathtt{vars}(B_i)$. As mentioned in Section 2, a set-sharing abstraction at any analysis point contains, at most, $2^V - 1$ sharing sets, where $V$ is the size of the environment. Intuitively, a clause should be faster to analyze if it has fewer variables. Since it is not possible to artificially reduce the number of variables present in a clause, we propose two techniques: rearranging the literals of the body into new predicates, and modifying the domain of the abstraction without altering the clause being analyzed.
3.1 Environment reassociation
Expression reassociation (Briggs and Cooper 1994), also known as reordering or restructuring, is a technique that involves changing the grouping of terms in an expression without altering its overall value. It is used for optimization purposes, such as improving performance, reducing floating-point errors, eliminating redundancies, etc.
Given a clause $H\: \mbox{:-} \: B_1,\ldots , B_n$, a partition is a collection $P_1,\ldots ,P_s$ such that each $P_j$ is a subsequence of consecutive literals, $P_j=B_{i}, B_{i+1},\ldots ,B_{k}$ with $1\leq i\leq k \leq n$, such that: a) for each $P_j$, $P_{j+1}$ with $j\in \{1,\ldots ,s-1\}$, if the first element of $P_{j+1}$ is $B_k$, then the last element of $P_{j}$ is $B_{k-1}$; b) $B_1 \in P_1$; and c) $B_n \in P_s$.
Folding the literals encapsulated by each $P_j$ generates new auxiliary predicates whose environments are smaller than (or equal to) the environment of the original predicate. This procedure can be repeated recursively over the auxiliary predicates, obtaining a new collection of predicates with reduced environments, as illustrated below. Our goal is then to find an optimal partition: a (possibly recursive) partition where the number of variables in the environments of each of the auxiliary predicates generated is minimal.
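As a small illustration (with a made-up clause, not the paper's example), folding a two-element partition proceeds as follows:

```prolog
% Hypothetical original clause:
%   p(X,Z) :- q(X,A), r(A,B), s(B,C), t(C,Z).
% With the partition P1 = q,r and P2 = s,t, folding yields:
p(X, Z) :- aux1(X, B), aux2(B, Z).
aux1(X, B) :- q(X, A), r(A, B).
aux2(B, Z) :- s(B, C), t(C, Z).
% The environment shrinks from {X,Z,A,B,C} (5 variables, hence up to
% 2^5-1 = 31 sharing sets) to {X,A,B} and {B,C,Z} (3 variables each,
% hence up to 2^3-1 = 7 sharing sets).
```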
Consider the clause of predicate qplan/3 shown in Figure 1a (the meaning of the comments will be explained later). We refer to body literals by $\mathtt{L}_i$ , where $i$ is a position in the body of the clause. For example, $\mathtt{L}_4$ = $\mathtt{mark(P0,L,0,V1)}$ .
The collections $P_1$ = $\mathtt{L}_1,\ldots , \mathtt{L}_3$, $P_2$ = $\mathtt{L}_4,\ldots , \mathtt{L}_6$, and $P_3$ = $\mathtt{L}_7,\ldots , \mathtt{L}_9$ define a partition of the predicate qplan. This partition, when folded, generates three auxiliary predicates: $\mathtt{aux1}(\mathtt{P0,X0,Vg,N}) \: \mbox{:-} \: \mathtt{L}_1,\ldots , \mathtt{L}_3$, $\mathtt{aux2}(\mathtt{P0,Vg,P2}) \: \mbox{:-} \: \mathtt{L}_4,\ldots , \mathtt{L}_6$, and $\mathtt{aux3}(\mathtt{N,X0,P2,X,P}) \: \mbox{:-} \: \mathtt{L}_7,\ldots , \mathtt{L}_9$, with environments containing 5, 6, and 6 variables, respectively. Finally, Figure 1b presents a transformation of qplan obtained by recursively reassociating the predicate. Each auxiliary clause is annotated with the worst-case size of any set-sharing abstraction traversing it. While in the original program the maximum size is $2^{12}-1$, it is reduced to $2^{6}-1$ in the transformed program.
3.2 Abstract environment trimming
The technique of environment reassociation described above allows obtaining, given a clause, a collection of clauses where the number of variables appearing in each one is reduced with respect to the original one. However, applying transformations to the program under analysis may not always be desirable. In this section we provide an alternative approach, where the domain of the abstractions is dynamically modified as the analysis of a clause processes each body literal. Although the resulting abstraction domains might not be optimal, this technique avoids the transformations, since such dynamic domain modifications are performed as part of the abstract operations. Given a clause $\mathtt{Head}\: \mbox{:-} \:\mathtt{B}_{1},\ldots ,\mathtt{B}_{n}$, a variable X is a live variable while analyzing $\mathtt{B}_{i}$ (which we refer to as the analysis point $\mathtt{B}_{i}$) if $\mathtt{X} \in \mathtt{vars}(\mathtt{Head})$ or there exist $j, k$ with $1 \leq j\leq i \leq k \leq n$ such that X belongs to both vars($\mathtt{B}_{j}$) and vars($\mathtt{B}_{k}$). Conversely, a variable $\mathtt{X}$ is a dead variable at analysis point $\mathtt{B}_{i}$ if it is not a head variable and does not appear in the body at or after that point, that is, $\mathtt{X} \not \in \mathtt{vars}(\mathtt{Head})$ and $\mathtt{X} \not \in \bigcup _{j=i}^n \mathtt{vars}(\mathtt{B}_{j})$. Our definition of liveness is similar to that used in imperative programming, with the difference that variables are not reassigned and that variables become live on their first appearance, since logic variables do not need to be declared in the body of a clause (Aho et al. 2006, pp. 608–610). Figure 1a shows, at each program point, which body variables become live or die. In that sense the technique is reminiscent of the concept of environment trimming used in the Warren Abstract Machine (Warren 1987; Ait-Kaci 1991), but more general, since variables also come in instead of only going out, and the objective is of course different: optimizing abstract operations versus saving stack space.
Algorithm 2 presents the operations $LIVE-VARS$ and $DEAD-VARS$ to obtain the variables that become live and those that are dead at a given analysis point. $LIVE-VARS$ receives the set of current live variables (LiveVars) and the literal that is going to be analyzed ($\mathtt{B}$), and returns the set of new live variables at that analysis point, that is, the variables that appear in the literal but were not in LiveVars. The other operation, $DEAD-VARS$, takes as input the set of current live variables (LiveVars), the variables of the head (HVars), and the set of literals that have not been analyzed yet ($\{\mathtt{B}_{i},\ldots ,\mathtt{B}_{n}\}$), and produces as output the set containing the variables of LiveVars that do not appear in any of those literals nor belong to the clause head. More efficient procedures to determine the liveness of variables are possible; however, we checked experimentally that they do not offer substantial improvements, and thus we decided to keep these simpler definitions. With these auxiliary operations, the analyzer can determine, at each analysis point, whether a variable is live or dead. With this information, it is possible to restrict the domain of the abstractions to the set of live variables. To do so, the computation of the Success abstraction is modified slightly:

$$\begin{array}{rcl}\mathtt{Proj} & = & \mathtt{project}(\mathtt{vars}(\mathtt{Goal}), \mathtt{Call})\\ \mathtt{Entry} & = & \mathtt{callToEntry}(\mathtt{Proj}, \mathtt{Goal}, \mathtt{Head})\\ \mathtt{Exit} & = & \mathtt{entryToExit}(\mathtt{Entry}, \mathtt{Head}, \mathtt{Body})\\ \mathtt{Prime} & = & \mathtt{exitToPrime}(\mathtt{project}(\mathtt{vars}(\mathtt{Head}), \mathtt{Exit}), \mathtt{Head}, \mathtt{Goal})\\ \mathtt{Success} & = & \mathtt{extend}(\mathtt{Call}, \mathtt{Prime})\end{array}$$
In this case, the Entry abstraction is obtained by directly computing callToEntry, instead of by augmenting the result of the callToEntry invocation, as was done in the schema presented in Section 2. Thus, in this approach, the domain of the Entry abstraction is exactly the set of head variables (which are the only variables live at that analysis point). Finally, the function entryToExit presented in Algorithm 1 is modified so that it keeps the abstraction defined only over the live variables. Such a modified version is described by Algorithm 3. There, a set containing the live variables is carried around while analyzing a clause body ($\{\mathtt{B}_{1},\ldots ,\mathtt{B}_{m}\}$). Such a set is initialized with the variables of the clause head (Line 3), and updated by adding new variables to it as they become live (Lines 6–7) and by removing the dead variables (Lines 21–22). Accordingly, the abstraction is augmented with the new live variables (Line 8) and reduced when some of them die (Line 23).
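The following is a direct, unoptimized reading of these two operations in the spirit of Algorithm 2 (our sketch; it assumes a hypothetical helper lit_vars/2 that returns the ordered set of variable names of a literal):

```prolog
:- use_module(library(lists)).
:- use_module(library(ordsets)).

% live_vars(+LiveVars, +B, -NewLive): variables of literal B that were not
% live before, i.e., those becoming live at this analysis point.
live_vars(LiveVars, B, NewLive) :-
    lit_vars(B, BVars),
    ord_subtract(BVars, LiveVars, NewLive).

% dead_vars(+LiveVars, +HVars, +RestBody, -Dead): live variables that are not
% head variables and do not occur in any literal still to be analyzed.
dead_vars(LiveVars, HVars, RestBody, Dead) :-
    findall(V,
            ( member(V, LiveVars),
              \+ memberchk(V, HVars),
              \+ ( member(B, RestBody), lit_vars(B, Vs), memberchk(V, Vs) )
            ),
            Dead).
```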
4 Experimental evaluation
We have conducted an extensive experimental study to assess the benefits of the proposed techniques, which we will refer to here as reassociation and trimming for short. In particular, we measured, for a variety of programs, the effects of applying both techniques to a number of abstract domains that use set sharing: the classical set-sharing domain, share (Muthukumar and Hermenegildo 1989), the combination of sharing and freeness, shfr (Muthukumar and Hermenegildo 1991), and the combination of sharing and freeness including cliques, shfr-clique (Navas et al. 2006); the latter is the most efficient of the three, while share and shfr are, in general, more precise but slower. All experiments were performed on Debian GNU/Linux 12 (bookworm) 64-bit (amd64) with 128 GB RAM and 800 GB of disk space. We focus on analysis times, since precision is unchanged.
We start by showing in Table 1 the detailed results for one of our benchmarks, the LPdoc documenter (Hermenegildo 2000; Hermenegildo and Morales 2011), which is a relatively large, real-world application, and whose results are typical. The LPdoc source code is composed of 26 modules that exhibit different challenges: for some of them analysis terminates in times that range from a few milliseconds to several minutes, while others cannot be analyzed, either because of a timeout, set for these experiments at 5 min per module, or by running out of memory. In either of these cases we will say that the analysis fails. For each abstract domain, column TC shows the total time that the classical domain requires to analyze the modules, columns TR and TT show the total analysis times when applying reassociation and trimming, respectively, to the corresponding classical domain, and columns $\rho$R and $\rho$T present the relative speedups, computed as TC/TR and TC/TT, respectively. Total analysis times are the sum of the times for the loading of the module, the pre-processing (including the transformation required by the reassociation method), and the actual analysis. Some statistics are included at the bottom of the table: the total number of modules (Mods) and the number of modules for which the different analyses fail, for the classical approach (FC), when applying reassociation (FR), and when applying trimming (FT). Finally, $\mu$T and $\mu$R show the means of the speedups obtained. We can see that applying reassociation and trimming allows analyzing a significant number of modules that could not be analyzed with the classical techniques. In particular, when applied to the shfr-clique domain they achieve a full analysis of the LPdoc source code. The mean speedups show that trimming outperforms reassociation in this application. This is due to the significant improvements that trimming brings to the modules images and refsdb. Specifically, in the case of images, the trimming approach significantly reduces analysis times for the share and shfr domains, resulting in speedups comparable to those achieved by shfr-clique; however, trimming introduces a slow-down when applied to the shfr-clique domain, while reassociation does not produce any significant change. In the texinfo module, reassociation performs 10 times faster than trimming, while in the remaining modules both methods behave similarly. Our hypothesis is that the overhead introduced by the transformation is what causes reassociation to slightly under-perform in some cases.
The proposed methods were also evaluated across a set of classic benchmarks (see Table C.1 in the appendices). The benchmarks include a variety of examples, ranging from simple predicates that feature only direct recursion, such as qsort and append, to more complex cases with mutual recursion and elaborate aliasing. Some benchmarks are extracted from real-world programs. For example, aiakl is part of an analyzer for the AKL language, while read and rdtok are Prolog parsers. Additionally, parts of actual programs are also included, such as ann (the &-Prolog parallelizer), qplan (the core of the Chat-80 application), and witt (a conceptual clustering application). As expected, in these comparatively less challenging programs the advantages obtained are smaller, but they are still significant. For instance, trimming and reassociation bring mean relative speedups of $1.1$ and $1.06$, respectively, for the share domain; $0.95$ and $1.5$ for shfr; and $2.2$ and $1.97$ for shfr-clique.
Most of these modules already required small analysis times (less than $0.1$ s, which may indicate they offer fewer opportunities for optimization). The improvements are most significant for qplan, which had the largest analysis times to begin with: $5.7$ s (share), $1$ s (shfr), and $5$ s (shfr-clique). In this case, trimming and reassociation bring analysis times of $1.1$ and $0.5$ s, respectively, with share; and $0.3$ and $0.2$ s with shfr. Moreover, with shfr-clique both techniques bring the time down to $0.1$ s ( $50\times$ speedup, the largest one). The full experimental results can be found in Appendix C.1.
Given the positive results, in a second phase of our experimentation we decided to greatly expand the code base used to measure the impact of applying abstract environment trimming, in order to explore whether the improvements obtained for the classic benchmarks and the LPdoc application would translate much more widely. All in all we have analyzed around 1200 program modules. These include, in addition to the previously mentioned LPdoc documenter (28 modules) and classic benchmarks (26 modules): all the libraries distributed with the Ciao system (Hermenegildo et al. 2012) (comprising more than 1000 modules); PLAI, CiaoPP's program analyzer (Hermenegildo et al. 2003) (56 modules); s(CASP) (Arias et al. 2018), a top-down interpreter for ASP programs with constraints (44 modules); and Spectector (Guarnieri et al. 2020), a tool for automatically detecting leaks caused by speculatively executed instructions in x64 assembly programs (15 modules). The $\approx$ 1000 library modules in the standard Ciao distribution come from many sources, including libraries ported from SICStus (Carlsson and Mildner 2012); libraries common with Yap (Santos Costa et al. 2012), XSB (Swift and Warren 2012), SWI (Wielemaker et al. 2012), etc., including those from the Prolog Commons project (https://prolog-commons.org/); the classic libraries from O'Keefe and those for Constraint Logic Programming over Rationals and Reals (Holzbaur 1995); libraries for implementing language extensions such as functional syntax or the Ciao assertion language; etc., developed by around 100 different programmers. We argue that this selection of benchmarks constitutes a good representation of real-life code written in Prolog. The detailed results of these experiments can be found in Appendix C, while here we present them in a more manageable aggregated form.
We first present in Figure 2 a cactus plot of the results. Cactus plots display, on the $X$-axis, the number of benchmarks that can be analyzed, that is, those for which the analysis does not time out or run out of memory, and, on the $Y$-axis, the accumulated analysis time. The plots for each abstract domain and analysis technique are generated by taking all the analysis times, sorting them in increasing order, and then plotting the points $(i, t_i)$, where $t_i$ is the accumulated time up to the $i$-th position. This way, the analysis time of the benchmark in the $(i+1)$-th position is $t_{i+1} - t_i$. Darker colors represent abstract domains with trimming, while lighter colors represent domains with classic analyses. Empty circles correspond to the classic set-sharing domain, crosses to shfr, and stars to shfr-clique. The module compilation times are given for comparison, represented by gray triangles, resulting in the gray vertical line. The plot has been zoomed in to exclude points with $X$-values (numbers of benchmarks) less than $800$, as the most interesting information lies in the more challenging benchmarks beyond, which require the highest analysis times. Also, points corresponding to modules for which the analysis fails are excluded. This figure allows us to observe that, when the domains are used with abstract trimming, they perform better than their classical counterparts. Specifically, share and shfr reduce the number of modules they are unable to analyze from $134$ and $130$ to $81$ and $74$, respectively, corresponding to $39.55$% and $43.07$% improvements. For shfr-clique, applying abstract environment trimming results in failure to analyze only $21$ modules versus the $47$ that the classical approach failed to analyze, while reducing the accumulated time by $8.36$ min. This translates to a reduction in timeouts of $55.32$% while reducing the accumulated time by $34.3$%. We have also obtained mean speedups of $5.82$, $5.91$, and $22.08$ when analyzing with share, shfr, and shfr-clique, respectively. Moreover, the number of modules that can be analyzed in a time lower than the compilation time also increases.
The cactus plot shows how the analysis results accumulate. In order to show how these results relate individually, Figure 3 presents a scatter plot displaying the time required to analyze each module. Given a point $(x, y)$, the value $x$ corresponds to the time required by the classical analysis to analyze a given module, while $y$ corresponds to the time required to analyze that same module using abstract trimming. Modules whose analysis fails with the classic approach, or with both approaches, are not displayed. If the points are close to the orange line, it means that the times are very similar; if they are above the line, abstract trimming introduces an overhead; if they are below, abstract trimming speeds up the analysis. To complement this information, Figure 4 presents a scatter plot displaying a comparison between the time required to analyze each module with the classical analysis ($X$-axis) and the speedup obtained by applying the abstract trimming technique ($Y$-axis). The speedup values range from very close to zero to significantly larger numbers (see the $0.09$ and the $1098$ speedups obtained by abstract trimming when analyzing the “errors” module with shfr-clique and the “images” module with shfr, as shown in Table 1). To better represent these values, we have applied a base-10 logarithm to the resulting speedups. Thus, values between 0 and 1 become negative (with larger absolute values the closer they are to 0), while values greater than 1 are scaled in the positive plane. The most significant performance improvements are observed on the right-most side of the figures, corresponding to the modules where the classical approach takes more time. Conversely, in cases where the classical approach is very fast (left-most part of the figures), the abstract trimming technique does not yield much speedup and introduces some overhead, which is however small. Another observation is that in most benchmarks the analysis times are quite low, but our target has of course been the rest, which present significant challenges.
To concentrate on this set we have collected the speedup results considering only the benchmarks taking more than 0.1, 0.5, and 1 s when analyzed with share (which represents the slower domain). These results are presented in Table 2. For example, the row starting with “ $\mu$ T>1s (70-74-160)” shows the mean speedup for the benchmarks successfully analyzed such that the time required to analyze them with share is greater than 1 s or share times out. The results show that in the case of share, 70 of these more challenging modules are successfully analyzed with both trimming and reassociation (note that both approaches need to succeed in order to compute the means), 74 in the case of shfr, and 160 in the case of shfr-clique. Similar very positive results are obtained for the other cases.
5 Conclusions
We have proposed a number of techniques for addressing the scalability problems inherent to set-sharing analyses. We have focused on the root of the problem: the potentially exponential dependency of the size of the abstractions on the number of variables. We have cast this problem as an instance of expression reassociation and provided an optimal solution using program transformations. Additionally, we have proposed a practical solution that can be integrated into top-down analyzers, based on the liveness of variables in the body of the clause being analyzed. We have conducted an extensive experimental evaluation over more than 1100 program modules taken from both production code and classical benchmarks. We have obtained significant speedups and, more importantly, the number of modules that require a timeout was cut in half. As a result, many more programs can be analyzed precisely in reasonable times. We believe that these results suggest that the proposed local techniques significantly improve the scalability of set-sharing analyses, and can thus enhance the practicality of top-down set-sharing analysis for production code. As a possible avenue for future work, note that the definition of live variables used in this work is local to each clause of the predicate being analyzed. Future lines of work could explore a more global notion that also considers the calls to predicates within the clause under analysis. This would presumably incur additional cost, but could also allow further reduction in the size of the domains of the sharing abstractions.
Competing interests
The authors declare that they have no competing interests.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1471068424000358