Table Of Content7
1
0
2
n
a
J
Efficiently Computing Provenance Graphs for
0
2
Queries with Negation
]
B
D
. Seokki Lee, Sven Ko¨hler, Bertram Luda¨scher, Boris Glavic
s
c
[
1
IIT DB Group Technical Report
v
9 IIT/CS-DB-2016-03
9
6
5
0
2016-10
.
1
0
7
1
:
v
i
X
r
a
http://www.cs.iit.edu/∼dbgroup/
LIMITED DISTRIBUTION NOTICE: The research presented in this report may be submitted as a whole or in parts for
publication and will probably be copyrighted if accepted for publication. It has been issued as a Technical Report for early
dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IIT-DB
prior to publication should be limited to peer communications and specific requests. After outside publication, requests should
be filled only by reprints or legally obtained copies of the article (e.g. payment of royalties).
Efficiently Computing Provenance Graphs for
Queries with Negation
Seokki Lee∗ Sven Ko¨hler† Bertram Luda¨scher‡ Boris Glavic∗
∗Illinois Institute of Technology. {[email protected], [email protected]}
‡University of Illinois at Urbana-Champaign. {[email protected]}
†University of California at Davis. {[email protected]}
Abstract—Explainingwhyananswerisintheresultofaquery r1 :Q(X,Y):−Train(X,Z),Train(Z,Y),¬Train(X,Y)
or why it is missing from the result is important for many RelationTrain
ResultofqueryQ
applications including auditing, debugging data and queries, fromCity toCity
and answering hypothetical questions about data. Both types of s c nneewwyyoorrkk waschhinicgatgoondc washinXgtondc chiYcago
questions, i.e., why and why-not provenance, have been studied chicago seattle newyork seattle
extensively. In this work, we present the first practical approach w n seattle chicago seattle seattle
for answering such questions for queries with negation (first- washingtondc seattle chicago chicago
orderqueries).OurapproachisbasedonarewritingofDatalog Fig. 1: Example train connection database and query
rules(calledfiringrules)thatcapturessuccessfulrulederivations
within the context of a Datalog query. We extend this rewriting Q(n,s)
tosupportnegationandtocapturefailedderivationsthatexplain
missinganswers.Givena(whyorwhy-not)provenancequestion, r1(n,s,w) r1(n,s,c)
wecomputeanexplanation,i.e.,thepartoftheprovenancethatis g1(n,w) g2(w,s) g3(n,s) g1(n,c) g2(c,s)
1 1 1 1 1
relevanttoanswerthequestion.Weintroduceoptimizationsthat
prunepartsofaprovenancegraphearlyonifwecandetermine T(n,w) T(w,s) T(n,s) T(n,c) T(c,s)
thattheywillnotbepartoftheexplanationforagivenquestion.
Fig. 2: Provenance graph explaining WHYQ(n,s)
We present an implementation that runs on top of a relational
database using SQL to compute explanations. Our experiments
demonstrate that our approach scales to large instances and
significantly outperforms an earlier approach which instantiates (FO) queries (i.e., non-recursive Datalog with negation) and
the full provenance to compute explanations.
an efficient method for explaining a (missing) answer using
SQL. Our approach is based on the observation that typically
I. INTRODUCTION
only a part of provenance, which we call explanation in this
Provenance for relational queries records how results of a work, is actually relevant for answering the user’s provenance
query depend on the query’s inputs. This type of information question about the existence or absence of a result.
can be used to explain why (and how) a result is derived by a
Example 1. Consider the relation Train in Fig.1 that stores
query over a given database. Recently, approaches have been
trainconnectionsintheUS.TheDatalogruler inFig.1com-
developed that use provenance-like techniques to explain why 1
puteswhichcitiescanbereachedwithexactlyonetransfer,but
atuple(orasetoftuplesdescribeddeclarativelybyapattern)
notdirectly.Weusethefollowingabbreviationsinprovenance
is missing from the query result. However, the two problems
graphs:T=Train;n=NewYork;s=Seattle;w=Washington
of computing provenance and explaining missing answers
DC and c = Chicago. Given the result of this query, the user
have been treated mostly in isolation. A notable exception
may be interested to know why he/she is able to reach Seattle
is [22] which computes causes for answers and non-answers.
from New York (WHYQ(n,s)) with one intermediate stop but
However, the approach requires the user to specify which
not directly or why it is not possible to reach New York from
missing inputs to consider as causes for a missing output.
Seattle in the same fashion (WHYNOTQ(s,n)).
Capturing provenance for a query with negation necessitates
the unification of why and why-not provenance, because to An explanation for either type of question should justify
explainaresultofthequerywehavetodescribehowexisting the existence (absence) of a result as the success (failure) to
and missing intermediate results (via positive and negative derive the result through the rules of the query. Furthermore,
subqueries,respectively)leadtothecreationoftheresult.This it should explain how the existence (absence) of tuples in the
has also been recognized by Ko¨hler et al. [20]: asking why a database caused the derivation to succeed (fail). Provenance
tuple t is absent from the result of a query Q is equivalent graphsprovidingthistypeofjustificationforWHYQ(n,s)and
to asking why t is present in ¬Q. Thus, a provenance model WHYNOTQ(s,n) are shown in Fig.2 and Fig.3, respectively.
thatsupportsquerieswithnegationnaturallysupportswhy-not Therearethreetypesofgraphnodes:rulenodes(boxeslabeled
provenance.Inthispaper,wepresentaframeworkthatanswers with a rule identifier and the constant arguments of a rule
why and why-not questions for queries with negation. To this derivation), goal nodes (rounded boxes labeled with a rule
end, we introduce a graph model for provenance of first-order identifier and the goal’s position in the rule’s body), and tuple
Q(s,n) in Fig.3) because there exists no connection from Seattle to
WashingtonDC(atuplenodeT(s,w)inred),andWashington
r1(s,n,w) r1(s,n,c) r1(s,n,s) r1(s,n,n)
DC to New York (a tuple node T(w,n) in red). Note that the
g1(s,w) g2(w,n) g2(c,n) g1(s,s) g2(s,n) g1(s,n) g2(n,n) successful goal ¬T(s,n) (there is no direct connection from
1 1 1 1 1 1 1
Seattle to New York) does not contribute to the failure of this
T(s,w) T(w,n) T(c,n) T(s,s) T(s,n) T(n,n)
derivation and, thus, is not part of the explanation. A failed
Fig. 3: Provenance graph explaning WHYNOTQ(s,n) negatedgoalisexplainedbyanexistingtupleinthedatabase.
Thatis,ifatuple(s,n)wouldexistintheTrainrelation,then
an additional failed goal node g3(s,n) would be part of the
1
nodes (ovals). In these provenance graphs, nodes are either explanation and be connected to each failed rule derivation.
colored as green (successful/existing) or red (failed/missing).
Overview and Contributions. Provenance games [20], a
Example 2. Consider the explanation (provenance graph in game-theoretical formalization of provenance for first-order
Fig.2) for question WHYQ(n,s). Seattle can be reached from (FO) queries, also supports queries with negation. However,
New York by either stopping in Washington DC or Chicago theapproachiscomputationallyexpensive,becauseitrequires
and there is no direct connection between these two cities. instantiationofaprovenancegraphexplainingallanswersand
Thesetwooptionscorrespondtotwosuccessfulderivationsfor missinganswers.Forinstance,theprovenancegraphproduced
rule r with X=n, Y=s, and Z=w (or Z=c, respectively). by this approach for our toy example already contains more
1
In the provenance graph, there are two rule nodes denoting than 64 (=43) nodes (i.e., only counting nodes corresponding
thesesuccessfulderivationsofQ(n,s)byruler .Aderivation to rule derivations), because there are 43 ways of binding
1
is successful if all goals in the body evaluate to true, i.e., values from adom(I)={c,n,s,w} to the 3 variables (X, Y,
a successful rule node is connected to successful goal nodes and Z) of the rule r . Typically, most of the nodes will not
1
(e.g., r is connected to g1, the 1st goal in the rule’s body). endupbeingpartoftheexplanationfortheuser’sprovenance
1 1
A positive (negated) goal is successful if the corresponding question.Toefficientlycomputetheexplanation,weintroduce
tuple is (is not) in the database. Thus, a successful goal node anewDatalogprogramwhichcomputespartoftheprovenance
is connected to the node corresponding to the existing (green) graph of an explanation bottom-up. Evaluating this program
or missing (red) tuple justifying the goal, respectively. over instance I returns the edge relation of an explanation.
The main driver of our approach is a rewriting of Datalog
Supporting negation and missing answers is quite challeng-
rules that captures successful and failed rule derivations. This
ing,becauseweneedtoenumerateallpotentialwaysofderiv-
rewritingreplacestherulesoftheprogramwithso-calledfiring
ing a missing answer (or intermediate result corresponding to
rules. Firing rules for positive queries were first introduced
a negated subgoal) and explain why each of these derivations
in [19]. These rules are similar to other query instrumentation
has failed. An important question in this respect is how to
techniquesthathavebeenusedforprovenancecapturesuchas
bound the set of missing answers to be considered. Under the
therewriterulesofPerm[13].Oneofourmajorcontributions
open world assumption, provenance would be infinite. Using
is to extend this concept for negation and failed rule deriva-
the closed world assumption, only values that exist in the
tions which is needed to support Datalog with negation and
database or are postulated by the query are used to construct
missing answers. Firing rules provide sufficient information
missing tuples. As is customary in Datalog, we refer to this
forconstructingexplanations.However,tomakethisapproach
set of values as the active domain adom(I) of a database
efficient, we need to avoid capturing rule derivations that will
instance I. We will revisit the assumption that all derivations
not contribute an explanation, i.e., they are not connected to
with constants from adom(I) are meaningful later on.
the nodes corresponding to the provenance question in the
Example3. TheexplanationforWHYNOTQ(s,n)isshownin provenancegraph.Weachievethisbypropagatinginformation
Fig.3, i.e., why it is not true that New York is reachable from from the user’s provenance question throughout the query to
Seattle with exactly one transfer, but not directly. The tuple prune rule derivations early on 1) if they do not agree with
Q(s,n) is missing from the query result because all potential the constants in the question or 2) if we can determine that
ways of deriving this tuple through the rule r have failed. In basedontheirsuccess/failurestatustheycannotbepartofthe
1
thisexample,adom(I)={c,n,s,w}and,thus,thereexistfour explanation.Forinstance,inourrunningexample,Q(n,s)can
failed derivations of Q(s,n) choosing either of these cities as onlybeconnectedtosuccessfulderivationsoftheruler1 with
the intermediate stop between Seattle and New York. A rule X=nandY=s.Wehavepresentedaproof-of-conceptversion
derivation fails if at least one goal in the body evaluates to of our approach as a poster [21]. Our main contributions are:
false.Intheprovenancegraph,onlyfailedgoalsareconnected • We introduce a provenance graph model for full first-
to the failed rule derivations explaining missing answers. order (FO) queries, expressed as non-recursive Datalog
Failedpositivegoalsinthebodyofafailedruleareexplained queries with negation (or Datalog for short).
by missing tuples (red tuple nodes). For instance, we cannot • We extend the concept of firing rules to support negation
reach New York from Seattle with an intermediate stop in and missing answers.
Washington DC (the first failed rule derivation from the left • We present an efficient method for computing explana-
tionstoprovenancequestions.Unlikethesolutionin[20], variables,weuser(c ,...,c )todenotethederivationthatis
1 n
our approach avoids unnecessary work by focusing the the result of binding X =c . We call a derivation successful
i i
computation on relevant parts of the provenance graph. wrt. an instance I if each atom in the body of the rule is true
• We prove the correctness of our algorithm that computes in I and failed otherwise.
the explanation to a provenance question.
B. Negation and Domains
• We present a full implementation of our approach in
the GProM [1] system. Using this system, we compile To be able to explain why a tuple is missing, we have to
Datalog into relational algebra expressions, and translate enumerateallfailedderivationsofthistupleand,foreachsuch
the expressions into SQL code that can be executed by a derivation, explain why it failed. As mentioned in Sec. I, the
standard relational database backend. question is what is a feasible set of potential answers to be
The remainder of this paper is organized as follows. We considered as missing. While the size of why-not provenance
formally define the problem in Sec.II, discuss related work in istypicallyinfiniteundertheopenworldassumption,wehave
Sec.III, and present our approach for computing explanations to decide how to bound the set of missing answers in the
in Sec.IV. We then discuss our implementation (Sec.V), closed world assumption. We propose a simple, yet general,
present experiments (Sec.VI), and conclude in Sec.VII. solution by assuming that each attribute of an IDB or EDB
relation has an associated domain.
II. PROBLEMDEFINITION
Definition 1 (Domain Assignment). Let S ={R ,...,R } be
We now formally define the problem addressed in this 1 n
a database schema where each R (A ,...,A ) is a relation
work: how to find the subgraph of a provenance graph for i 1 m
schema. Given an instance I of S, a domain assignment dom
a given query (input program) P and instance I that explains
is a function that associates with each attribute R.A a domain
existence/absence of a tuple in/from the result of P.
of values. We require dom(R.A)⊇adom(R.A).
A. Datalog
In our approach, the user specifies each dom(R.A) as a
A Datalog program P consists of a finite set of rules query dom that returns the set of admissible values for
R.A
ri : R(X(cid:126)):−R1(X(cid:126)1),..., Rn(X(cid:126)n) where X(cid:126)j denotes a tuple the domain of attribute R.A. We provide reasonable defaults
of variables and/or constants. We assume that the rules of to avoid forcing the user to specify dom for every attribute,
a program are labeled r1 to rm. R(X(cid:126)) is the head of the e.g., dom(R.A)=adom(R.A) for unspecified domR.A. These
rule, denoted head(ri), and R1(X(cid:126)1),...,Rn(X(cid:126)n) is the body associated domains fulfill two purposes: 1) to reduce the
(each Rj(X(cid:126)j) is a goal). We use vars(ri) to denote the size of explanations and 2) to avoid semantically meaningless
set of variables in ri. In this paper, consider non-recursive answers. For instance, if there would exist another attribute
Datalogwithnegation,sogoalsRj(X(cid:126)j)inthebodyareliterals, Price in the relation Train in Fig.1, then adom(I) would
i.e., atoms A(X(cid:126) ) or their negation ¬A(X(cid:126) ). Recursion is not also include all the values that appear in this attribute. Thus,
j j
allowed. All rules r of a program have to be safe, i.e., every some failed rule derivations for r would assign prices to the
1
variable in r must occur positively in r’s body (thus, head variable representing intermediate stops. Different attributes
variables and variables in negated goals must also occur in a may represent the same type of entity (e.g., fromCity and
positivegoal).Forexample,Fig.1showsaDatalogquerywith toCity in our example) and, thus, it would make sense to
a single rule r . Here, head(r ) is Q(X,Y) and vars(r ) is use their combined domain values when constructing missing
1 1 1
{X,Y,Z}. The rule is safe since the head variables and the answers.Fornow,weleaveituptotheusertospecifyattribute
variables in the negated goal also occur positively in the body domains. Using techniques for discovering semantic relation-
(X and Y in both cases). The set of relations in the schema ships among attributes to automatically determine feasible
over which P is defined is referred to as the extensional attribute domains is an interesting avenue for future work.
database (EDB), while relations defined through rules in P When defining provenance graphs in the following, we
form the intensional database (IDB), i.e., the IDB relations are only interested in rule derivations that use constants
are those defined in the head of rules. We require that P has from the associated domains of attributes accessed by the
a distinguished IDB relation Q, called the answer relation. rule. Given a rule r and variable X used in this rule, let
GivenP andinstanceI,weuseP(I)todenotetheresultofP attrs(r,X) denote the set of attributes that variable X is
evaluated over I. Note that P(I) includes the instance I, i.e., bound to in the body of the rule. For instance, in Fig.1,
allEDBatomsthataretrueinI.ForanEDBorIDBpredicate attrs(r ,Z)={Train.fromCity,Train.toCity}. We say a
1
R, we use R(I) to denote the instance of R computed by P rule derivation r(c ,...,c ) is domain grounded iff c ∈
1 n i
(cid:84)
and R(t)∈P(I) to denote that t∈R(I) according to P. dom(A) for all i∈{1,...,n}.
A∈attrs(r,Xi)
We use adom(I) to denote the active domain of instance
C. Provenance Graphs
I, i.e., the set of all constants that occur in I. Similarly, we
use adom(R.A) to denote the active domain of attribute A of Provenance graphs justify the existence or absence of a
relation R. In the following, we make use of the concept of a query result based on the success or failure of derivations
rule derivation. A derivation of a rule r is an assignment of using a query’s rules, respectively. They also explain how
variables in r to constants from adom(I). For a rule with n the existence or absence of tuples in the database caused
derivations to succeed or fail, respectively. Here, we present a compact provenance graphs. For instance, observe that the
constructive definition of provenance graphs that provide this node g3(n,s) is shared by two rule nodes in the explanation
1
type of justification. Nodes in these graphs carry two types of shown in Fig2.
labels: 1) a label that determines the node type (tuple, rule,
D. Questions and Explanations
or goal) and additional information, e.g., the arguments and
rule identifier of a derivation; 2) the success/failure status of Recall that the problem we address in this work is how
nodes. to explain the existence or absence of (sets of) tuples using
provenancegraphs.Suchasetoftuplesiscalledaprovenance
Definition 2 (ProvenanceGraph). LetP beafirst-order(FO)
question (PQ) in this paper. The two questions presented in
query,I adatabaseinstance,dom adomainassignmentforI,
Example1useconstantsonly,butwealsosupportprovenance
andLthedomaincontainingallstrings.Theprovenancegraph
questionswithvariables,e.g.,foraquestionQ(n,X)wewould
PG(P,I)isagraph(V,E,L,S)withnodesV,edgesE,and
return all explanations for existing or missing tuples where
nodelabellingfunctionsL:V →LandS :V →{T,F}.We
the first attribute is n, i.e., why or why-not a city X can be
require that ∀v,v(cid:48) ∈ V : L(v) = L(v(cid:48)) → v = v(cid:48). PG(P,I)
reachedfromNewYorkwithonetransfer,butnotdirectly.We
is defined as follows:
say a tuple t(cid:48) of constants matches a tuple t of variables and
• Tuple nodes: For each n-ary EDB or IDB predicate R constants written as t(cid:48) (cid:50) t if we can unify t(cid:48) with t, i.e., we
and tuple (c ,...,c ) of constants from the associated
1 n can equate t(cid:48) with t by applying a valuation that substitutes
domains (c ∈dom(R.A )), there exists a node v labeled
i i variables in t with constants from t(cid:48).
R(c ,...,c ). S(v) = T iff R(c ,...,c ) ∈ P(I) and
1 n 1 n
S(v)=F otherwise. Definition 3 (Provenance Question). Let P be a query, I an
• Rule nodes: For every successful domain grounded instance, and Q an IDB predicate. A provenance question
derivation r (c ,...,c ), there exists a node v in V la- PQ is an atom Q(t) where t = (v ,...,v ) is a tuple
i 1 n 1 n
beled r (c ,...,c ) with S(v)=T. For every failed do- consisting of variables and constants from the associated
i 1 n
main grounded derivation ri(c1,...,cn) where head(ri domain (dom(Q.A) for each attribute Q.A). WHYQ(t) and
(c1,...,cn))(cid:54)∈P(I), there exists a node v as above but WHYNOTQ(t) restrict the question to existing and missing
withS(v)=F.Inbothcases,v isconnectedtothetuple tuples t(cid:48) (cid:50)t, respectively.
node head(r (c ,...,c )).
i 1 n In Example1, we have presented subgraphs of PG(P,I) as
• Goal nodes: Let v be the node corresponding to a
explanationsforPQs,implicitlyclaimingthatthesesubgraphs
derivation r (c ,...,c ) with m goals. If S(v) = T,
i 1 n aresufficientforexplainingthePQ.Below,weformallydefine
then for all j ∈ {1,...,m}, v is connected to a goal
this type of explanation.
node v labeled gj with S(v ) = T. If S(v) = F, then
j i j
for all j ∈{1,...,m}, v is connected to a goal node vj Definition4(Explanation). TheexplanationEXPL(P,Q(t),I)
withS(v )=F ifthejth goalisfailedinr (c ,...,c ). for Q(t) (PQ) according to P and I, is the subgraph of
j i 1 n
Each goal is connected to the corresponding tuple node. PG(P,I)containingonlynodesthatareconnectedtoatleast
one node Q(t(cid:48)) where t(cid:48) (cid:50) t. For WHYQ(t), only existing
Our provenance graphs model query evaluation by con-
tuples t(cid:48) are considered to match t. For WHYNOTQ(t) only
struction. A tuple node R(t) is successful in PG(P,I) iff
missing tuples are considered to match t.
R(t) ∈ P(I). This is guaranteed, because each tuple built
fromvaluesoftheassociateddomainexistsasanodev inthe Given this definition of explanation, note that 1) all nodes
graphanditslabelS(v)isdecidedbasedonR(t)∈P(I).Fur- connected to a tuple node matching the PQ are relevant for
thermore, there exists a successful rule node r((cid:126)c)∈PG(P,I) computingthistupleand2)onlynodesconnectedtothisnode
iff the derivation r((cid:126)c) succeeds for I. Likewise, a failed rule are relevant for the outcome. Consider t(cid:48) where t(cid:48) (cid:50) t for a
node r((cid:126)c) exists iff the derivation r((cid:126)c) is failed over I and Q(t) (PQ). If Q(t(cid:48)) ∈ P(I), then all successful derivations
head(r((cid:126)c))(cid:54)∈P(I). Fig.2 and3 show subgraphs of PG(P,I) with head Q(t(cid:48)) justify the existence of t(cid:48) and these are
for the query from Fig.1. Since Q(n,s) ∈ P(I) (Fig.2), precisely the rule nodes connected to Q(t(cid:48)) in PG(P,I). If
this tuple node is connected to all successful derivations with Q(t(cid:48))(cid:54)∈P(I),thenallderivationswithheadQ(t(cid:48))havefailed
Q(n,s) in the head which in turn are connected to goal nodes and are connected to Q(t) in the provenance graph. Each
for each of the three goals of rule r . Q(s,n)∈/ P(I) (Fig.3) such derivation is connected to all of its failed goals which
1
and, thus, its node is connected to all failed derivations with are responsible for the failure. Now, if a rule body references
Q(s,n)asahead.Here,wehaveassumedthatallcitiescanbe IDB predicates, then the same argument can be applied to
considered as starting and end points of missing train connec- reason that all rules directly connected to these tuples explain
tions, i.e., both dom(T.fromCity) and dom(T.toCity) are why they (do not) exist. Thus, by induction, the explanation
defined as adom(T.fromCity)∪adom(T.toCity). Thus, we contains all relevant tuple and rule nodes that explain the PQ.
have considered derivations r (s,n,Z) for Z ∈{c,n,s,w}.
1
Animportantcharacteristicofourprovenancegraphsisthat III. RELATEDWORK
eachnodev inagraphisuniquelyidentifiedbyitslabelL(v). Our provenance graphs have strong connections to other
Thus, common subexpressions are shared leading to more provenance models for relational queries, most importantly
provenance games and the semiring framework, and to ap- explain a missing answer based on the query [3], [4], [6],
proaches for explaining missing answers. [25] (i.e., which operators filter out tuples that would have
contributed to the missing answer) or based on the input
ProvenanceGames.Provenancegames[20]modeltheevalu-
data [16], [17] (i.e., what tuples need to be inserted into
ationofagivenquery(inputprogram)P overaninstanceI as
the database to turn the missing answer into an answer).
a 2-player game in a way that resembles SLD(NF) resolution.
The missing answer problem was first stated for query-based
If the position (a node in the game graph) corresponding to
explanations in the seminal paper by Chapman et al. [6].
a tuple t is won (the player starting in this position has a
Huang et al. [17] first introduced an instance-based approach.
winning strategy), then t ∈ P(I) and if the position is lost,
Sincethen,severaltechniqueshavebeendevelopedtoexclude
then t (cid:54)∈ P(I). By virtue of supporting negation, provenance
spuriousexplanations,tosupportlargerclassesofqueries[16],
games can uniformly answer why and why-not questions for
and to support distributed Datalog systems in Y! [26]. The
queries with negation. However, provenance games may be
approachesforinstance-basedexplanations(withtheexception
hardtocomprehendfornon-powerusersastheyrequiresome
ofY!)haveincommonthattheytreatthemissinganswerprob-
backgroundingametheorytobeunderstood,e.g.,thewon/lost
lem as a view update problem: the missing answer is a tuple
status of derivations in a provenance game is contrary to
thatshouldbeinsertedintoaviewcorrespondingtothequery
what may be intuitively expected. That is, a rule node is
andthisinserthastobetranslatedtoaninsertintothedatabase
lost if the derivation is successful. The status of rule nodes
instance.Anexplanationisthenoneparticularsolutiontothis
in our provenance graphs matches the intuitive expectation
view update problem. In contrast to these previous works, our
(e.g., the rule node is successful if the derivation exists).
provenance graphs explain missing answers by enumerating
Ko¨hler et al. [20] also present an algorithm that computes the
all failed rule derivations that justify why the answer is not
provenancegameforP andI.However,thisapproachrequires
in the result. Thus, they are arguably a better fit for use cases
instantiation of the full game graph (which enumerates all
such as debugging queries, where in addition to determining
existing and missing tuples) and evaluation of a recursive
Datalog¬ program over this graph using the well-founded which missing inputs justify a missing answer, the user also
needstounderstandwhyderivationshavefailed.Furthermore,
semantics [11]. In constrast, our approach computes expla-
we do support queries with negation. Importantly, solutions
nations that are succinct subgraphs containing only relevant
for view update missing answer problems can be extracted
provenance. We use bottom-up evaluation instrumented with
from our provenance graphs. Thus, in a sense, provenance
firingrulestocaptureprovenance.Furthermore,weenablethe
graphs with our approach generalize some of the previous
user to restrict provenance for missing answers.
approaches(fortheclassofqueriessupported,e.g.,wedonot
Database Provenance. Several provenance models for
supportaggregationyet).Interestingly,recentworkhasshown
database queries have been introduced in related work, e.g.,
that it may be possible to generate more concise summaries
see [7], [18]). The semiring annotation framework generalizes
of provenance games [12], [24] which is particularly useful
thesemodelsforpositiverelationalalgebra(and,thus,positive
for negation and missing answers to deal with the potentially
non-recursive Datalog). In this model, tuples in a relation are
largesizeoftheresultingprovenance.Similarly,somemissing
annotated with elements from a commutative semiring K. An
answer approaches [16] use c-tables to compactly represent
essential property of the K-relational model is that semiring
setsofmissinganswers.Theseapproachesarecomplementary
N[X],thesemiringofprovenancepolynomials,generalizesall
to our work.
other semirings. It has been shown in [20] that provenance
games generalize N[X] for positive queries and, thus, all ComputingProvenanceDeclaratively.Theconceptofrewrit-
ingaDatalogprogramusingfiringrulestocaptureprovenance
other provenance models expressible as semirings. Since our
as variable bindings of rule derivations was introduced by
graphs are equivalent to provenance games in the sense that
Ko¨hler et al. [19] for provenance-based debugging of positive
there exist lossless transformations between both models (the
Datalog queries. These rules are also similar to relational
discussion is beyond the scope of this paper), our graphs also
encode N[X]. Provenance graphs which are similar to our implementations of provenance polynomials in Perm [13],
LogicBlox [14], and Orchestra [15]. Zhou et al. [27] leverage
graphs restricted to positive queries have been used as graph
suchrulesforthedistributedExSPANsystemusingeitherfull
representationsofsemiringprovenance(e.g.,see[8],[9],[18]).
propagation or reference based provenance. An extension of
Both our graphs and the boolean circuits representation of
firing rules for negation is the main enabler of our approach.
semiring provenance [9] explicitly share common subexpres-
sionsintheprovenance.However,whilethesecircuitssupport
recursive queries, they do not support negation. Exploring the IV. COMPUTINGEXPLANATIONS
relationship of provenance graphs for queries with negation
Recall from Sec. I that our approach generates a new
and m-semirings (semirings with support for set difference) is Datalog program GP(P,Q(t),I) by rewriting a given query
an interesting avenue for future work. Justifications for logic
(input program) P to return the edge relation of explana-
programs [23] are also closely related.
tion EXPL(P,Q(t),I) for a provenance question Q(t) (PQ).
Why-not and Missing Answers. Approaches for explaining In this section, we explain how to generate the program
missing answers can be classified based on whether they GP(P,Q(t),I) using the following steps:
1. We unify the input program P with the PQ by propagating Algorithm 1 Unify Program With PQ
theconstantsinttop-downthroughouttheprogramtobeable 1: procedure UNIFYPROGRAM(P, Q(t))
to later prune irrelevant rule derivations. 2: todo←[Q(t)]
3: done←{}
2. Afterwards, we determine for which nodes in the graph
4: PU =[]
we can infer their success/failure status based on the PQ. We
5: while todo(cid:54)=[] do
model this information as annotations on heads and goals of 6: a←POP(todo)
rules and propagate these annotations top-down. 7: INSERT(done,a)
3. Based on the annotated and unified version created in the 8: rules←GETRULESFORATOM(P,a)
9: for all r∈rules do
previous steps, we generate firing rules that capture variable
10: unRule←UNIFYRULE(r,a)
bindings for successful and failed rule derivations.
11: PU ←PU::unRule
4. To be in the result of one of the firing rules obtained in 12: for all g∈body(unRule) do
the previous step is a necessary, but not sufficient, condition 13: if g(cid:54)∈done then todo←todo::g
forthecorrespondingPG(P,I)fragmenttobeconnectedtoa 14: return PU
node matching the PQ. To guarantee that only relevant frag-
ments are returned, we introduce additional rules that check
connectivity to confirm whether each fragment is connected. that agree with this binding can derive tuples that agree with
5. Finally, we create rules that generate the edge relation of this binding. Based on this unification step, we know which
the provenance graph (i.e., EXPL(P,Q(t),I)) based on the bindingsmayproducefragmentsofPG(P,I)thatarerelevant
rule binding information that the firing rules have captured. for explaining the PQ. The algorithm implementing this step
In the following, we will explain each step in detail and is given as Algorithm1.
illustrate its application based on the question WHYQ(n,s)
B. Add Annotations based on Success/Failure
from Example1, i.e., why is New York connected to Seattle
via a train connection with one intermediate stop, but is not ForWHY andWHYNOT questions,weonlyconsidertuples
that are existing and missing, respectively. Based on this
directly connected to Seattle.
information, we can infer restrictions on the success/failure
A. Unify the Program with PQ status of nodes in the provenance graph that are connected
to PQ node(s) (belong to the explanation). We store these
The node Q(n,s) in the provenance graph (Fig.2) is only
restrictions as annotations T, F, and F/T on heads and goals
connected to rule derivations which return Q(n,s). For in-
of rules. Here, T indicates that we are only interested in
stance,ifvariableX isboundtoanothercityx(e.g.,Chicago)
successfulnodes,F thatweareonlyinterestedinfailednodes,
in a derivation of the rule r , then this rule cannot return
1
andF/T thatweareinterestedinboth.Theseannotationsare
the tuple (n,s). This reasoning can be applied recursively to
determinedusingatop-downpropagationseededwiththePQ.
replacevariablesinruleswithconstants.Thatis,weunifythe
rules in the program top-down with the PQ. This step may Example 5. Continuing with our running example question
produce multiple duplicates of a rule with different bindings. WHYQ(n,s), we know that Q(n,s) is successful because
Weusesuperscriptstomakeexplicitthevariablebindingused the tuple is in the result (Fig.1). This implies that only
by a replica of a rule. successful rule nodes and their successful goal nodes can be
connected to this tuple node. Note that this annotation does
Example 4. Given the question WHYQ(n,s), we unify the
not imply that the rule r would be successful for every Z
single rule r using the assignment (X=n,Y=s): 1
1
(i.e., every intermediate stop between New York and Seattle).
r(X=n,Y=s) :Q(n,s):−T(n,Z),T(Z,s),¬T(n,s) It only indicates that it is sufficient to focus on successful rule
1
derivations since failed ones cannot be connected to Q(n,s).
We may have to create multiple partially unified versions
of a rule or an EDB atom. For example, to explore suc- r(X=n,Y=s),T :Q(n,s)T :−T(n,Z)T,T(Z,s)T,¬T(n,s)T
1
cessful derivations of Q(n,s) we are interested in both train
We now propagate the annotations of the goals in r through-
connections from New York to some city (T(n,Z)) and from 1
outtheprogram.Thatis,foranygoalthatisanIDBpredicate,
any city to Seattle (T(Z,s)). Furthermore, we need to know
we propagate its annotation to the head of all rules deriving
whetherthereisadirectconnectionfromNewYorktoSeattle
the goal’s predicate and, then, propagate these annotations
(T(n,s)). The general idea of this step is motivated by [2]
to the corresponding rule bodies. Note that the inverted
which introduced “magic sets”, an approach for rewriting
annotation is propagated for negated goals. For instance, if
logical rules to cut down irrelevant facts by using additional
T would be an IDB predicate, then the annotation on the
predicates called “magic predicates”. Similar techniques exist
goal ¬T(n,s)T would be propagated as follows. We would
instandardrelationaloptimizationunderthenameofpredicate
annotatetheheadofallrulesderivingT(n,s)withF,because
move around. We use the technique of propagating variable
Q(n,s) can only exist if T(n,s) does not exist.
bindings in the query to restrict the computation based on
the user’s interest. This approach is correct because if we Partially unified atoms (such as T(n,Z)) may occur in
bind a variable in the head of rule, then only rule derivations both negative and positive goals of the rules of the program.
Algorithm 2 Success/Failure Annotations F (n,s):−F (n,s,Z)
Q,T r1,T
1: procedure ANNOTPROGRAM(PU, Q(t)) F (n,s,Z):−F (n,Z),F (Z,s),F (n,s)
2: state←typeof(Q(t)) r1,T T,T T,T T,F
3: todo←[Q(t)state] F (n,Z):−T(n,Z)
T,T
4: done←{}
5: PA =[] FT,T(Z,s):−T(Z,s)
6: while todo(cid:54)=[] do F (n,s):−¬T(n,s)
T,F
7: a←POP(todo)
8: state←typeof(a) Fig. 4: Example firing rules for WHYQ(n,s)
9: INSERT(done,a)
10: rules←GETUNRULESFORATOM(P,a)
11: for all r∈rules do
firing rule consists of the body of the rule (but using the
12: annotRule←ANNOTRULE(r,state)
13: PA ←PA::annotRule firing version of each predicate in the body) and a new head
14: for all g∈body(annotRule) do predicate that contains all variables used in the rule. In this
15: if state = F then way, the firing rule captures all the variable bindings of a rule
16: newstate←F/T
derivation. Furthermore, for each IDB predicate R that occurs
17: else
as a head of a rule r, we create a firing rule that has the
18: newstate←state
19: if ISNEGATED(g) then firing version of predicate R in the head and a firing version
20: newstate←SWITCHSTATE(state) of the rules r deriving the predicate in the body. For EDB
21: if gnewstate (cid:54)∈done∧ISIDB(g) then predicates, we create firing rules that have the firing version
22: todo←todo::gnewstate ofthepredicateintheheadandtheEDBpredicateinthebody.
23: for all r∈PA do
24: if typeof(r)=F/T then
25: PA ←REMOVEANNOTATEDRULES(PA,r,{F,T}) Example6. ConsidertheannotatedprograminExample5for
26: return PA the question WHYQ(n,s). We generate the firing rules shown
in Fig.4. The firing rule for r(X=n,Y=s),T (the second rule
1
fromthetop)isderivedfromtheruler byaddingZ (theonly
1
We denote such atoms using a F/T annotation. The use existential variable) to the head, renaming the head predicate
of these annotations will become more clear in the next as F , and replacing each goal with its firing version (e.g.,
r1,T
subsection when we introduce firing rules. The pseudocode F for the two positive goals and F for the negated goal).
T,T T,F
forthealgorithmthatdeterminestheseannotationsisgivenas Notethatnegatedgoalsarereplacedwithfiringrulesthathave
Algorithm2. In short: invertedannotations(e.g.,thegoal¬T(n,s)T isreplacedwith
F (n,s)). Furthermore, we introduce firing rules for EDB
1) Annotate the head of all rules deriving tuples matching T,F
tuples (the three rules from the bottom in Fig.4)
the question with T (why) or F (why-not).
2) Repeat the following steps until a fixpoint is reached:
As mentioned in Sec. III, firing rules for successful rule
a) Propagate the annotation of a rule head to goals in
derivations have been used for declarative debugging of posi-
the rule body as follows: propagate T for T annotated
tive Datalog programs [19] and, for non-recursive queries, are
heads and F/T for F annotated heads.
essentially equivalent to rewrite rules that instrument a query
b) Foreachannotatedgoalintherulebody,propagateits
tocomputeprovenancepolynomials[1],[13].Weextendfiring
annotation to all rules that have this atom in the head.
rules to support queries with negation and capture missing
For negated goals, unless the annotation is F/T, we
answers. To construct a PG(P,I) fragment corresponding
propagatetheinvertedannotation(e.g.,F forT)tothe
to a missing tuple, we need to find failed rule derivations
head of rules deriving the goal’s predicate.
with the tuple in the head and ensure that no successful
derivations exist with this head (otherwise, we may capture
C. Creating Firing Rules
irrelevant failed derivations of existing tuples). In addition,
To be able to compute the relevant subgraph of PG(P,I) we need to determine which goals are failed for each failed
(the explanation) for the provenance question PQ, we need to rule derivation because only failed goals are connected to the
determine successful and/or failed rule derivations. Each rule node representing the failed rule derivation in the provenance
derivation paired with the information whether it is successful graph. To capture this information, we add additional boolean
over the given database instance (and which goals are failed variables - V for goal gi - to the head of a firing rule that
i
in case it is not successful) corresponds to a certain subgraph. record for each goal whether it failed or not. The body of a
Successful rule derivations are always part of PG(P,I) for a firing rule for failed rule derivations is created by replacing
givenquery(inputprogram)P whereasfailedrulederivations everygoalinthebodywithitsF/T firingversion,andadding
only appear if the tuple in the head failed, i.e., there are no the firing version of the negated head to the body (to ensure
successful derivations of any rule with this head. To capture thatonlybindingsformissingtuplesarecaptured).Firingrules
the variable bindings of successful/failed rule derivations, capturing failed derivations use the F/T firing versions of
we create “firing rules”. For successful rule derivations, a their goals because not all goals of a failed derivation have
F (s,n):−¬F (s,n) Algorithm 3 Create Firing Rules
Q,F Q,T
FQ,T(s,n):−Fr1,T(s,n,Z) 12:: proPcFediruer←e C[]REATEFIRINGRULES(PA, Q(t))
Fr1,F(s,n,Z,V1,V2,¬V3):−FQ,F(s,n),FT,F/T(s,Z,V1), 43:: tsotadtoe←←[Qty(pte)ostfa(teQ](t))
F (Z,n,V ),F (s,n,V )
T,F/T 2 T,F/T 3 5: done←{}
Fr1,T(s,n,Z):−FT,T(s,Z),FT,T(Z,n),FT,F(s,n) 67:: whRile(tt)oσd←o(cid:54)=PO[]Pd(toodo) (cid:46) create rules for a predicate
FT,F/T(s,Z,true):−FT,T(s,Z) 8: INSERT(done,R(t)σ)
F (s,Z,false):−F (s,Z) 9: if ISEDB(R) then
T,F/T T,F 10: CREATEEDBFIRINGRULE(PFire,R(t)σ)
FT,T(s,Z):−T(s,Z) 11: else
FT,F(s,Z):−domT.toCity(Z),¬T(s,Z) 1123:: rCuRlEeAsT←EIDGEBTNREUGLREUS(LRE((Pt)Fσi)re,R(t)σ)
Fig. 5: Example firing rules for WHYNOTQ(s,n) 14: for all r∈rules do (cid:46) create firing rule for r
15: args←(vars(r)−vars(head(r)))
16: args←args(head(r))::args
17: CREATEIDBPOSRULE(PFire,R(t)σ,r,args)
to be failed and the failure status determines whether the 18: CREATEIDBFIRINGRULE(PFire,R(t)σ,r,args)
corresponding goal node is part of the explanation. A firing 19: return PFire
rule capturing missing IDB or EDB tuples may not be safe,
i.e., it may contain variables that only occur in negated goals.
In fact, these variables should be restricted to the associated tuple T(s,n) (a direct train connection from Seattle to New
domains for the attributes the variables are bound to. Since York) is missing. This causes the third goal of r1 to succeed
the associated domain dom for an attribute R.A is given as for any derivation where X=s and Y=n. For each partially
an unary query dom , we can use these queries directly in unified EDB atom annotated with F/T, we create four rules:
R.A
firing rules to restrict the values the variable is bound to. This one for existing tuples (e.g., FT,T(s,Z):−T(s,Z)), one for
is how we ensure that only missing answers formed from the thefailurecase(e.g.,FT,F(s,Z):−domT.toCity(Z),¬T(s,Z)),
associated domains are considered and that firing rules are andtwofortheF/T firingversion.Forthefailurecase,weuse
always safe. predicate domT.toCity to only consider missing tuples (s,Z)
whereZ isavaluefromtheassociateddomainofthisattribute.
Example 7. Reconsider the question WHYNOTQ(s,n) from
The algorithm that creates the firing rules for an annotated
Example1. The firing rules generated for this question are
input query is shown as Algorithm3 (the pseudocode for the
shown in Fig.5. We exclude the rules for the second goal
subprocedures is given as Algorithm4). It maintains a list of
T(Z,n) and the negated goal ¬T(s,n) which are analogous
annotated atoms that need to be processed which is initialized
to the rules for the first goal T(s,Z). Since Q(s,n) is failed
with the Q(t) (PQ). For each such atom R(t)σ (here σ is the
(becausetuple(s,n),i.e.,NewYorkcannotbereachablefrom
annotation of the atom), it creates firing rules for each rule r
Seattle with exactly one transfer, is not in the result), we
thathasthisatomasaheadandapositivefiringruleforR(t).
are only interested in failed rule derivations of the rule r
1 Furthermore, if the atom is annotated with F/T or F, then
with X=s and Y=n. Furthermore, each rule node in the
additional firing rules are added to capture missing tuples and
provenancegraphcorrespondingtosucharulederivationwill
failed rule derivations.
onlybeconnectedtofailedsubgoals.Thus,weneedtocapture
which goals are successful or failed for each such failed EDB atoms. For an EDB atom R(t)T, we use procedure
derivation. This can be modelled through boolean variables CREATEEDBFIRINGRULE to create one rule FR,T(t):−R(t)
V , V , and V (since there are three goals in the body) that returns tuples from relation R that matches t. For miss-
1 2 3
that are true if the corresponding goal is successful and false ing tuples (R(t)F), we extract all variables from t (some
otherwise. The firing version F (s,n,Z,V ,V ,¬V ) of r arguments may be constants propagated during unification)
will contain all variable bindinrg1,sF for deriva1tion2s of3r such1 and create a rule that returns all tuples that can be formed
1
that Q(s,n) is the head (i.e., guaranteed by adding F (s,n) from values of the associated domains of the attributes these
Q,F
to the body), the rule derivations are failed, and the ith goal variables are bound to and do not exist in R. This is achieved
is successful or failed for this binding iff Vi is true or false, by adding goals dom(Xi) as explained in Example7.
respectively.Toproduceallthesebindings,weneedrulescap- Rules. Consider a rule r :R(t):−g (X(cid:126) ),...,g (X(cid:126) ). If the
1 1 n n
turingsuccessfulandfailedtuplenodesforeachsubgoalofthe head of r is annotated with T, then we create a rule with
ruler .WedenotesuchrulesusingaF/T annotationanduse head F (X(cid:126)) where X(cid:126) = vars(r) and the same body as r
1 r,T
abooleanvariable(trueorfalse)torecordwhetheratupleex- except that each goal is replaced with its firing version with
ists(e.g.,F (s,Z,true):−F (s,Z)isoneoftheserules). appropriate annotation (e.g., T for positive goals). For rules
T,F/T T,T
Negatedgoalsaredealtwithbynegatingthisbooleanvariable, annotated with F or F/T, we create one additional rule with
i.e., the goal is successful if the corresponding tuple does not headF (X(cid:126),V(cid:126))whereX(cid:126) isdefinedasabove,andV(cid:126) contains
r,F
exist. For instance, F (s,n,false) represents the fact that V iftheith goalofr ispositiveand¬V otherwise.Thebody
T,F/T i i
Algorithm 4 Create Firing Rules Subprocedures with these annotations. For each R(t)F, we create a rule with
1: procedure CREATEEDBFIRINGRULE(PFire, R(t)σ) ¬FR,T(t) in the body using the associated domain queries to
2: [X1,...,Xn]←vars(t) restrictvariablebindings.ForR(t)F/T,weaddtwoadditional
3: rT ←FR,T(t):−R(t) rules as shown in Fig.5 for EDB atoms.
4: rF ←FR,F(t):−dom(X1),...,dom(Xn),¬R(t)
5: rF/T−1 ←FR,F/T(t,true):−FR,T(t) Theorem 1 (Correctness of Firing Rules). Let P be an input
6: rF/T−2 ←FR,F/T(t,false):−FR,F(t) program,r denotearuleofP withmgoals,andP bethe
7: if σ=T then Fire
8: PFire ←PFire::rT firingversionofP.Weuser(t)|=P(I)todenotethattherule
9: else if σ=F then derivation r(t) is successful in the evaluation of program P
10: PFire ←PFire::rT ::rF over I. The firing rules for P correctly determine existence of
11: else tuples, successful rule derivations, and failed rule derivations
12: PFire ←PFire::rT ::rF ::rF/T−1::rF/T−2 for missing answers:
1: procedure CREATEIDBNEGRULE(PFire, R(t)σ)
2: [X1,...,Xn]←vars(t) • FR,T(t)∈PFire(I)↔R(t)∈P(I)
3: if σ(cid:54)=T then • FR,F(t)∈PFire(I)↔R(t)(cid:54)∈P(I)
564::: if σPrnF=eiwrFe←/←TFPRt,hFFe(itnr)e::−:rdnoewm(X1),...,dom(Xn),¬FR,T(t) •• FFrr,,TF((tt,)V(cid:126)∈)P∈FirPeF(Iir)e(↔I)r↔(t)r|=(tP) ((cid:54)|=I)P(I)∧head(r(t)) (cid:54)∈
7: rT ←FR,F/T(t,true):−FR,T(t) P(I) and for i∈{1,...,m} we have that Vi is false iff
8: rF ←FR,F/T(t,false):−FR,F(t) ith goal fails in r(t).
9: PFire ←PFire::rT ::rF
1: procedure CREATEIDBPOSRULE(PFire, R(t)σ, r, args) Proof: We prove Theorem 1 by induction over the struc-
2: rpred ←FR,T(t):−Fr,T(args) tureofaprogram.Forthebasecase,weconsiderprogramsof
3: PFire ←PFire::rpred “depth” 1, i.e., only EDB predicates are used in rule bodies.
1: procedure CREATEIDBFIRINGRULE(PFire, R(t)σ, r, args) Then,weprovecorrectnessforprogramsofdepthn+1based
2: bodynew ←[] onthecorrectnessofprogramsofdepthn.Wedefinethedepth
3: for all gi(X(cid:126))∈body(r) do dofpredicates,rules,andprogramsasfollows:1)forallEDB
4: σgoal ←T predicatesR,wedefined(R)=0;2)foranIDBpredicateR,
5: if ISNEGATED(g) then
wedefined(R)=max d(r),i.e.,themaximaldepth
6: σgoal ←F head(r)=R
87:: bgondewyn←ewF←prebdo(gdiy),nσegowal:(:X(cid:126)gn)ew aismdo(nrg)a=llmrualexsRr∈bwodiyth(r)hde(aRd()r+)=1,Ri.e;.3,)ththeemdaexpitmhaolfdaepruthleorf
9: if g(X(cid:126))T (cid:54)∈(done∪todo)∧σ=T then all predicates in its body plus one; 4) the depth of a program
10: todo←todo::g(X(cid:126))σgoal P is the maximum depth of its rules: d(P)=maxr∈P d(r).
11: rnew ←Fr,T(args):−bodynew 1) Base Case. Assume that we have a program P with depth
12: PFire ←PFire::rnew 1, e.g., r :Q(X(cid:126)):−R(X(cid:126)1),...,R(X(cid:126)n). We first prove that the
13: if σ(cid:54)=T then
positive and negative versions of firing rules for EDB atoms
14: for all gi ∈body(r) do
are correct, because only these rules are used for the rules of
15: if ISNEGATED(gi) then
16: args←args::¬bi depth 1 programs. A positive version of EDB firing rule FR,T
17: else createsacopyoftheinputrelationRand,thus,atuplet∈Riff
18: args←args::bi t∈FR,T. For the negative version FR,F, all variables are bound
19: bodynew ←[] to associated domains dom and it is explicitly checked that
20: for all gi(X(cid:126))∈body(r) do ¬R(X(cid:126)) is true. Finally, F uses F and F to determine
21: gnew ←Fpred(gi),F/T(X(cid:126),bi) whether the tuple exists inR,FR/.TSince tRh,eTse ruleRs,Fare correct, it
22: bodynew ←bodynew::gnew
23: if g(X(cid:126))F/T (cid:54)∈(done∪todo) then follows that FR,F/T is correct. The positive firing rule for the
24: todo←todo::g(X(cid:126))σgoal rule r (Fr,T) is correct since its body only contains positive
25: rnew ←Fr,σ(args):−bodynew and negative EDB firing rules (FR,T respective FR,F) which are
26: PFire ←PFire::rnew already known to be correct. The correctness of the positive
firingversionofarule’sheadpredicate(F )followsnaturally
Q,T
from the correctness of F . The negative version of the rule
r,T
F (X(cid:126),V(cid:126))containsanadditionalgoal(i.e.,¬Q(X(cid:126)))anduses
ofthisrulecontainstheF/T versionofeverygoalinr’sbody r,F
the firing version F to return only bindings for failed
plus an additional goal F to ensure that the head atom is R,F/T
R,F derivations. Since F has been proven to be correct, we
failed. As an example for this type of rule, consider the third R,F/T
only need to prove that the negative firing version of the head
rule from the top in Fig.5.
predicate of r is correct. For a head predicate with annotation
IDB atoms. For each rule r with head R(t), we create a rule F, we create two firing rules (F and F ). The rule F
Q,T Q,F Q,T
F (t):−F (X(cid:126)) where X(cid:126) is the concatenation of t with all was already proven to be correct. F is also correct, because
R,T r,T Q,F
existentialvariablesfromthebodyofr.IDBatomswithF or it contains only F and domain queries in the body which
Q,T
F/T annotations are handled in the same way as EDB atoms were already proven to be correct.