
Techniques for Measuring the Inferential Strength of Forgetting Policies

Patrick Doherty¹, Andrzej Szałas¹,²

¹Department of Computer and Information Science, Linköping University, Sweden
²Institute of Informatics, University of Warsaw, Poland
patrick.doherty@liu.se, andrzej.szalas@{mimuw.edu.pl, liu.se}
arXiv:2404.02454v1 [cs.AI] 3 Apr 2024

Abstract

The technique of forgetting in knowledge representation has been shown to be a powerful and useful knowledge engineering tool with widespread application. Yet, very little research has been done on how different policies of forgetting, or the use of different forgetting operators, affect the inferential strength of the original theory. The goal of this paper is to define loss functions for measuring changes in inferential strength based on intuitions from model counting and probability theory. Properties of such loss measures are studied and a pragmatic knowledge engineering tool is proposed for computing loss measures using ProbLog. The paper includes a working methodology for studying and determining the strength of different forgetting policies, in addition to concrete examples showing how to apply the theoretical results using ProbLog. Although the focus is on forgetting, the results are much more general and should have wider application to other areas.

1 Introduction

The technique of forgetting (Lin and Reiter 1994) in knowledge representation has been shown to be a powerful and useful knowledge engineering tool with many applications (see Section 9 on related work).

Recently, it has been shown (Doherty and Szałas 2024) that two symmetric and well-behaved forgetting operators can be defined that provide a qualitative best upper and lower bound on forgetting in the following sense: strong or standard forgetting (Lin and Reiter 1994) provides a strongest necessary condition on a theory with a specific forgetting policy, while weak forgetting (Doherty and Szałas 2024) provides a weakest sufficient condition on a theory with a specific forgetting policy. In the former, one loses inferential strength relative to the original theory wrt necessary conditions, while in the latter one loses inferential strength wrt sufficient conditions. That is, some necessary (respectively, sufficient) conditions that can be expressed in the language of the original theory may be lost when the language becomes restricted after forgetting. A question then arises as to how these losses can be measured, both individually relative to each operator and comparatively relative to each other. The loss function used should be well-behaved and also quantitative, so as to ensure fine-grained measurement.

The goal of this paper is to define such measures in terms of loss functions, show useful properties associated with them, show how they can be applied, and also show how they can be computed efficiently. The basis for doing this will be to use intuitions from the area of model counting, both classical and approximate (Gomes, Sabharwal, and Selman 2021), and from probability theory. In doing this, a pragmatic knowledge engineering tool based on the use of ProbLog (De Raedt and Kimmig 2015) will be described that allows for succinctly specifying the inferential strength of arbitrary theories, in themselves, and their associated losses using the dual forgetting operators. Initial focus is placed on propositional theories for clarity, but the techniques are then shown to generalize naturally to 1st-order theories, with some restrictions. Although the focus in this paper is on measuring inferential strength relative to forgetting operators, the techniques are much more general than this and should be applicable to other application areas including traditional model counting and approximate model counting through sampling.

Let us begin by describing an approach to measuring inferential strength using intuitions from model counting and then show how this measurement technique can be used in the context of forgetting. Consider the following simple theory, Tc, extracted from an example considered in (Nayak and Levy 1995), where jcar stands for "Japanese car", ecar for "European car", and fcar for "foreign car":

jcar → (car ∧ reliable ∧ fcar)    (1)
ecar → (car ∧ fast ∧ fcar)    (2)
fcar → (ecar ∨ jcar).    (3)

The theory partitions the set of all assignments of truth values to the propositional variables occurring in a theory (worlds) into two sets: (i) the worlds satisfying Tc (models), and (ii) the worlds not satisfying Tc.

This suggests loss measures based on model counting to reflect theory strength. In such a case, entailment strength can be measured by the number of models: the more models a theory T has, the fewer conclusions it may entail, and the fewer models T has, the more conclusions it possibly entails.

Though model counting enjoys quite successful implementations (Fichte, Hecher, and Hamiti 2021), it is generally complex (Gomes, Sabharwal, and Selman 2021) with challenging scalability issues.

In a more general case, the proportion between (i) and (ii) can be represented by the probability of Tc, another useful measure of inferential strength. Here we shall deal with probability spaces, where samples are worlds, events are sets of worlds specified by formulas (given as models of formulas), and the probability function is given by:

P(T) =def |{w ∈ W : w |= T}| / |W|,    (4)

where T is a theory, W is the set of worlds, |= is the satisfiability relation, and |.| denotes the cardinality of a set. That is, P(T) is the fraction between the number of models of T and the number of all worlds.

For example, there are 2^6 = 64 assignments of truth values to the propositional variables occurring in Tc (worlds) and 13 such assignments satisfying Tc (models of Tc), so P(Tc) = 13/64 = 0.203125. Since evaluating probabilities can, in many cases, be very efficient (at least when approximated by sampling), we focus on probabilities rather than model counting directly.¹ We will also define more general probabilistic-based loss measures, where one does not just count models with uniform distributions, but also pays attention to arbitrary probability distributions. This provides a much more flexible and general inferential measurement technique. These intuitions provide the basis for the specification of loss functions for forgetting operators, based on probabilistic measures on worlds, that measure the inferential strength of the original theory and the loss in inferential strength after applying a specific forgetting policy and operator.

¹Note that given the probability of a theory T and the number of variables in the theory, the model count can easily be derived.

Two forgetting operators considered in (Doherty and Szałas 2024), strong (standard) and weak forgetting, are shown to have very nice intrinsic properties in the following sense. Given a specific policy of forgetting, when applying that policy to a theory using the weak forgetting operator, the resultant models of the revised theory are a subset of the models of the original theory. Similarly, when applying the strong forgetting operator to the same theory using the same forgetting policy, the resultant models of the revised theory are a superset of the models of the original theory. As mentioned previously, the revised theories characterize the weakest sufficient and strongest necessary theories relative to a specific forgetting policy. One can therefore measure the gap between the original theory and the theories resulting from forgetting in terms of the size of the gaps between the mentioned sets of models associated with the revised theories. Intuitively, the larger the gaps are, the more one loses in terms of reasoning strength.

The motivations for desiring such measures include the following:

• no measures have yet been provided in the literature that indicate how reasoning strength changes when the technique of forgetting is used;
• such measures are useful indicators for belief base engineers when determining proper policies of forgetting for various applications;
• such measures are generally needed as one of the criteria that can be used for automated selection of symbols to forget, e.g., for optimizing a selected class of queries, when forgetting a particular choice of symbols can substantially increase the system's performance without seriously affecting its inferential capabilities.

The working methodology for the approach is as follows, where details are provided in the paper:

1. Given a propositional theory T, transform it into a stratified logic program using the transformation rules in Section 4.
2. Specify a probability distribution on worlds using the propositional facts (variables in T) as probabilistic variables in ProbLog, as described in Section 2.2.
3. Using ProbLog's query mechanism, described in Section 2.2, query for the probability P(T) of theory T.

For forgetting, given a propositional theory Th(p̄, q̄) and a forgetting policy p̄:

4. Specify the 2nd-order theories F^NC(Th(p̄, q̄); p̄) and F^SC(Th(p̄, q̄); p̄), representing strong and weak forgetting, respectively, relative to the forgetting policy p̄, as described in Section 2.1.
5. Apply quantifier elimination techniques to both theories as exemplified in Section 7, resulting in propositional theories without p̄.
6. Apply steps 1 to 3 above to each of the theories.
7. The resulting probabilities P(T), P(F^NC(T; p̄)), and P(F^SC(T; p̄)), for each of the three theories, can then be used to compute loss values for the respective losses of inferential strength associated with strong and weak forgetting relative to the original theory, as described in Section 3 and Section 6.

Generalization of this approach to the 1st-order case is described in Section 8.

The original results of this paper include:

• formal specification of model counting-based and probability-based loss measures for forgetting with an analysis of their properties;
• an equivalence preserving technique for transforming any formula of propositional or 1st-order logic into a set of stratified logic programming rules in time linear wrt the size of the input formula;
• algorithms for computing these loss measures implemented using ProbLog, a well-known probabilistic logic programming language.

ProbLog is chosen as a basis for this research since it provides a straightforward means of computing probabilistic measures in general, and it also has built-in mechanisms for doing approximate inference by sampling (constant time by fixing the number of samples). Consequently, both classical and approximation-based model counting techniques can be leveraged in defining loss measures.

The paper is structured as follows. In Section 2 we describe the essentials of strong and weak forgetting and the ProbLog constructs used in the paper. In Section 3 we define model counting-based loss measures for forgetting and show their properties. Section 4 is devoted to how one computes model counting-based loss measures using ProbLog. In Section 5, probabilistic-based loss measures are defined which allow for arbitrary probability distributions over worlds rather than uniform distributions. Section 6 focuses on how one computes probabilistic-based loss measures using ProbLog. In Section 7, we consider an example showing how both types of loss measures are used to describe loss of inferential strength relative to forgetting policies. Section 8 provides a generalization of the techniques to the 1st-order case. Section 9 is devoted to a discussion of related work. Section 10 concludes the paper with a summary and final remarks. Finally, ProbLog sources of all programs discussed in the paper are listed in the Appendix in ready-to-validate form using an online ProbLog interpreter referenced in Section 10.

2 Preliminaries

2.1 Forgetting

For the sake of simplicity, we start with presenting ideas focusing on classical propositional logic with an enumerable set of propositional variables, V⁰, and standard connectives ¬, ∧, ∨, →, ≡. We assume the classical two-valued semantics with truth values T (true) and F (false). To define forgetting operators, we shall also use 2nd-order quantifiers ∃p, ∀p, where p is a propositional variable. The meaning of the quantifiers in the propositional context is:

∃p A(p) ≡def A(p = F) ∨ A(p = T);    (5)
∀p A(p) ≡def A(p = F) ∧ A(p = T),    (6)

where A(p = expr) denotes the formula obtained from A by substituting all occurrences of p in A by the expression expr.

A theory is a finite set of propositional formulas.² As usual, we identify propositional theories with conjunctions of their formulas. By a vocabulary (a signature) of T, denoted by V⁰_T, we mean all propositional variables occurring in T. By a world for a theory (formula) T we mean any assignment of truth values to the propositional variables in V⁰_T:

w : V⁰_T → {T, F}.    (7)

The set of worlds of T is denoted by W_T.

²Such finite sets of formulas are frequently called belief bases.

The rest of this subsection is based on (Doherty and Szałas 2024). Let Th(p̄, q̄) be a propositional theory over a vocabulary consisting of the propositional variables in the tuples p̄, q̄.³ When forgetting p̄ in the theory Th(p̄, q̄), one can delineate two alternative views, with both resulting in a theory expressed in the vocabulary containing q̄ only:

• strong (standard) forgetting F^NC(Th(p̄, q̄); p̄): a theory that preserves the entailment of necessary conditions over the vocabulary q̄;
• weak forgetting F^SC(Th(p̄, q̄); p̄): a theory that preserves the entailment by sufficient conditions over q̄.

³When we refer to tuples of variables as arguments, like in Th(p̄, q̄), we always assume that p̄ and q̄ are disjoint.

As shown in (Doherty and Szałas 2024) (with Point 1 proved earlier, in (Lin and Reiter 1994)), we have the following theorem.

Theorem 2.1. For arbitrary tuples of propositional variables p̄, q̄ and a theory Th(p̄, q̄):
1. F^NC(Th(p̄, q̄); p̄) ≡ ∃p̄ Th(p̄, q̄).
2. F^NC(Th(p̄, q̄); p̄) is the strongest (wrt →) formula over the vocabulary q̄ preserving the entailment of necessary conditions over a vocabulary disjoint with p̄.
3. F^SC(Th(p̄, q̄); p̄) ≡ ∀p̄ Th(p̄, q̄).
4. F^SC(Th(p̄, q̄); p̄) is the weakest (wrt →) formula over the vocabulary q̄ preserving the entailment by sufficient conditions over a vocabulary disjoint with p̄. □

From Theorem 2.1 it easily follows that:

|= F^SC(T; p̄) → T;  |= T → F^NC(T; p̄).    (8)

2.2 ProbLog

To compute the probability of T, one can use the well-known probabilistic programming language ProbLog (De Raedt and Kimmig 2015).

ProbLog extends classical Prolog with constructs specific for defining probability distributions as well as with stratified negation. It uses the symbol \+ for negation, where '\' stands for 'not' and '+' stands for 'provable'. That is, \+ represents negation as failure. Stratification in logic programs and rule-based query languages is a well-known idea (see (Abiteboul, Hull, and Vianu 1995) and references there), so we will not discuss it here.

The ProbLog constructs specific for defining probability distributions used in the paper are listed below. For simplicity, we restrict the presentation to propositional logic, since we mainly focus on that in the paper. For additional features, the reader is referred to the related literature, including (De Raedt et al. 2016; De Raedt and Kimmig 2015), in addition to other references there.

• To specify probabilistic variables, the operator '::' is used. The specification u :: p states that p is true with probability u and false with probability (1 − u). Probabilistic variables can be used to specify probabilistic facts and heads of probabilistic rules.

• Queries are used to compute probabilities. The probability of p is returned as the result of the query query(p) or subquery(p, P), where P obtains the probability of p.
• Annotated disjunctions support choices. If u₁, ..., u_k ∈ [0.0, 1.0] are such that 0.0 ≤ u₁ + ... + u_k ≤ 1.0, and p₁, ..., p_k are propositional variables, then an annotated disjunction is an expression of the form:

u₁ :: p₁; ...; u_k :: p_k.    (9)

It states that at most one of u₁ :: p₁, ..., u_k :: p_k is chosen as a probabilistic fact.

Let W denote the set of all worlds assigning truth values to the variables occurring in a given ProbLog program. The distribution semantics assumes that probabilistic variables are independent. The probability of a queried propositional variable p is defined as the sum of the probabilities of the worlds where it is true:

P(p) =def Σ_{w ∈ W : w |= p} P(w),

where the probability of a world, P(w), is evaluated as the product of the probabilities u_i of the probabilistic facts true in that world and (1 − u_i) of those false in it:

P(w) =def ∏_{1≤i≤k, w |= p_i} u_i ∗ ∏_{1≤i≤k, w ⊭ p_i} (1 − u_i).

Approximate evaluation of probabilities, with sampling, is available through ProbLog's interpreter using the command:

problog sample model.txt --estimate -N n

where 'model.txt' is a text file with a ProbLog source, and 'n' is a positive integer fixing the number of samples.

3 Model-Counting-Based Loss Measures of Forgetting

Let mod(T) and #T denote the set of models and the number of models of T, respectively.⁴ As an immediate corollary of (8) we have:

mod(F^SC(T; p̄)) ⊆ mod(T) ⊆ mod(F^NC(T; p̄)),    (10)

so also:

#F^SC(T; p̄) ≤ #T ≤ #F^NC(T; p̄).    (11)

For any propositional theory T, the probability of T, P(T), is defined by Equation (4). Model counting-based measures of forgetting can now be defined as follows.

Definition 3.1. By model counting-based loss measures of forgetting we understand:

loss_m^NC(T, p̄) =def P(F^NC(T; p̄)) − P(T);    (12)
loss_m^SC(T, p̄) =def P(T) − P(F^SC(T; p̄));    (13)
loss_m^T(T, p̄) =def P(F^NC(T; p̄)) − P(F^SC(T; p̄)),    (14)

where the underlying distribution on worlds is assumed to be uniform. □

⁴Though the vocabulary after applying forgetting operators is a subset of the vocabulary of the original theory T, for the sake of uniformity of presentation we assume that the considered worlds, and thus the models too, are built over the vocabulary of T.

Obviously, loss_m^T(T, p̄) = loss_m^NC(T, p̄) + loss_m^SC(T, p̄). The intuition behind the loss_m measures is the following:

• (12) indicates how much, in terms of the number of models, is lost with strong forgetting compared to the original theory. In this case, the new theory is inferentially weaker than the original theory, and certain necessary conditions that were previously entailed by the original theory are no longer entailed by the resultant theory;
• (13) indicates how much, in terms of the number of models, is lost with weak forgetting compared to the original theory. In this case, the new theory is inferentially stronger than the original theory, although certain sufficient conditions that previously entailed the original theory no longer entail the resultant theory;
• (14) indicates the total loss, measured as the size of the gap between strong and weak forgetting.

We have the following interesting properties of the defined loss measures.

Proposition 3.2. Let ∗ ∈ {NC, SC, T}. Then for every propositional theory T:
1. loss_m^∗(T, p̄) ∈ [0.0, 1.0];
2. loss_m^∗(T, p̄) = 0.0 when the variables from p̄ do not occur in T; in particular:
   (a) loss_m^∗(F, p̄) = 0.0 (for the constant false);
   (b) loss_m^∗(T, p̄) = 0.0 (for the constant true);
3. loss_m^∗(⋁_{1≤i} T_i, p̄) = Σ_{1≤i} loss_m^∗(T_i, p̄) when, for all 1 ≤ i ≠ j, T_i ∧ T_j ≡ F, i.e., the T_i, T_j represent mutually disjoint sets (countable additivity);
4. loss_m^∗(T, ǭ) = 0.0, where ǭ stands for the empty tuple;
5. loss_m^∗(T, p̄) = 1.0 when p̄ contains all propositional variables occurring in T;
6. loss_m^∗(T, p̄) ≤ loss_m^∗(T, q̄) when all propositional variables occurring in p̄ also occur in q̄ (loss_m^∗ is monotonic wrt the forgotten vocabulary). □

Points 1, 2(a), and 3 show that loss_m^∗ is a measure (with 3 being a well-known property of probability). Point 2 shows that nothing is lost when one forgets redundant variables (not occurring in the theory), in particular from the constants F or T. Point 4 shows that loss_m^∗ is 0.0 when nothing is forgotten, and Point 5 shows that loss_m^∗ is 1.0 when everything is forgotten. Point 6 shows that the more propositional variables are forgotten, the greater loss_m^∗ is.

4 Computing Model Counting-Based Loss Measures Using ProbLog

To compute the probability of T, we use ProbLog. Though many #SAT solvers counting the number of satisfying assignments exist – see (Fichte, Hecher, and Hamiti 2021; Chakraborty, Meel, and Vardi 2021; Gomes, Sabharwal, and Selman 2021; Soos and Meel 2019) and references there – we have chosen ProbLog, since:

• it allows for computing exact probability as well as probability based on approximate (sampling based) evaluation;
• the resulting probability and loss measures can be used within a probabilistic logic programming framework in a uniform manner.

Let us start with an example of a ProbLog program computing the probability of the theory Tc discussed in Section 1. We encode Tc as Program 1. Indeed, r is equivalent to ¬jcar ∨ (car ∧ reliable ∧ fcar), i.e., to (1), s is equivalent to ¬ecar ∨ (car ∧ fast ∧ fcar), i.e., to (2), and u is equivalent to ¬fcar ∨ jcar ∨ ecar, i.e., to (3). Since t is the conjunction of r, s and u, it is equivalent to Tc. Therefore, the probability of t, computed by ProbLog's interpreter, is equal to P(Tc). Indeed, there are 6 propositional variables and each possible world encodes an assignment of truth values to these propositions, each with probability 0.5. So there are 2^6 possible worlds, each having the probability 1/2^6, and t is true in 13 of them (in those satisfying Tc). Accordingly, the ProbLog interpreter answers that the probability of t (i.e., of Tc) is 0.203125.

Program 1: ProbLog program for computing the probability of Tc.
1  0.5::car.  0.5::reliable. 0.5::fast.
2  0.5::fcar. 0.5::jcar.     0.5::ecar.
3  r :- car, reliable, fcar.
4  r :- \+ jcar.             % r represents (1)
5  s :- car, fast, fcar.
6  s :- \+ ecar.             % s represents (2)
7  u :- jcar.
8  u :- ecar.
9  u :- \+ fcar.             % u represents (3)
10 t :- r, s, u.             % t represents (1) ∧ (2) ∧ (3)

Let us show how this approach works in general. First, one has to define a transformation of an arbitrary propositional formula into a set of rules. Let A be a propositional formula. The rules of the program π⁰_A encoding A are constructed as shown in Table 1, where the variables r indexed with formulas are auxiliary propositional variables. The intended meaning of each auxiliary variable r_B is that it is equivalent to the formula B. It is assumed that whenever a subformula occurs more than once, a single auxiliary variable is used to substitute all of its occurrences.

Table 1: Transformation of an arbitrary propositional formula to a stratified logic program.

π⁰_A =def
  • ∅, with r_A =def A,    when A is a propositional variable;
  • π⁰_B together with:
      r_A :- \+ r_B.       when A = ¬B;
  • π⁰_C, π⁰_D together with:
      r_A :- r_C, r_D.     when A = C ∧ D;
  • π⁰_C, π⁰_D together with:
      r_A :- r_C.
      r_A :- r_D.          when A = C ∨ D;
  • π⁰_C, π⁰_D together with:
      r_A :- \+ r_C.
      r_A :- r_D.          when A = C → D;
  • π⁰_C, π⁰_D together with:
      r_A′ :- \+ r_C.
      r_A′ :- r_D.
      r_A′′ :- \+ r_D.
      r_A′′ :- r_C.
      r_A :- r_A′, r_A′′.  when A = C ≡ D.

As an example of the transformation defined in Table 1, consider Program 2, transforming the formula q ≡ (p ∧ ¬q ∧ s) into a set of rules. Note that the number of auxiliary variables is often optimized while encoding longer expressions consisting of the same commutative connectives, such as a conjunction with more than two arguments, A₁ ∧ ... ∧ A_k with k > 2. In this case, one usually uses a single variable together with the rule r_{A₁∧...∧A_k} :- r_{A₁}, ..., r_{A_k}. An example of such an optimization is demonstrated in Line 3 of Program 2, which contains a conjunction of three literals. Similarly, one optimizes auxiliary variables in disjunctions with more than two arguments.

Program 2: A program π⁰_{q≡(p∧¬q∧s)} encoding the formula q ≡ (p ∧ ¬q ∧ s).
1 % π⁰_p, π⁰_q, π⁰_s are empty; r_p =def p, r_q =def q, r_s =def s
2 r_¬q :- \+ q.
3 r_{p∧¬q∧s} :- p, r_¬q, s.
4 r_A′ :- \+ q.
5 r_A′ :- r_{p∧¬q∧s}.
6 r_A′′ :- \+ r_{p∧¬q∧s}.
7 r_A′′ :- q.
8 r_{q≡(p∧¬q∧s)} :- r_A′, r_A′′.

Proposition 4.1. For every propositional formula A, the size of π⁰_A is linear in the size of A.
Proof: By construction of π⁰_A: for each connective occurring in A at most 5 additional rules are added. If there are n connectives then π⁰_A contains at most 5n rules. □

Similarly, it is easily seen that the following proposition also holds.

Proposition 4.2.
• Let A be a propositional formula. Then each world w ∈ W_A determines the values of the auxiliary variables in π⁰_A.
• Given a world w ∈ W_A, the truth values of the auxiliary variables in π⁰_A can be determined in time linear in the size of A. □

The following lemmas are used to show Theorem 4.5.

Lemma 4.3. For every propositional formula A, π⁰_A is a stratified program.
Proof: By construction of π⁰_A: whenever a variable, say r_F, is negated by \+, it appears in the body of a rule with a variable r_E in its head such that F is a subformula of E. This provides a natural stratification reflecting the subformula nesting. □

One can now define a ProbLog program, Π^m_A, computing the probability of a formula A. Π^m_A is specified by the

following probabilistic facts and rules:

Π^m_A =def  0.5::p₁. ... 0.5::p_k.
            π⁰_A.    (15)

In this case, p₁, ..., p_k are all the variables occurring in A; they are assumed to be pairwise independent; and the probability of A is given by querying for the probability of the auxiliary variable r_A representing the formula A in the transformation defined in Table 1. r_A is included in π⁰_A, the stratified logic program associated with A.

Note that:
• Π^m_A as specified by (15) is a ProbLog program since, by Lemma 4.3, π⁰_A is a stratified logic program;
• the probabilities 0.5 assigned to the propositional variables are used as a technical fix that ensures a uniform distribution on worlds and guarantees that the program specifies the true number of models where A is true.

Lemma 4.4. Let π⁰_A occur in the program Π^m_A. Then for every subformula B of A occurring in the construction of π⁰_A, we have that r_B is equivalent to B.
Proof: By Lemma 4.3, π⁰_A specifies a stratified logic program. The semantics of stratified logic programs can be defined in terms of least models of rules expressed by implications. In a manner similar to the well-known completion technique (Clark 1977),

each r_B appearing in π⁰_A in the head of a rule (or in the heads of two rules, as in the case of ∨, → and ≡) is actually equivalent to the rule's body (respectively, to the disjunction of the bodies) due to the completion. This observation is used below when claiming equivalences, rather than implications, directly reflecting the rules.    (16)

We proceed by structural induction on B.
• When B is a propositional variable, say p, then π⁰_B, being π⁰_p, is ∅, and r_B =def p, so the induction hypothesis is trivially true.
• When B is a negated formula, say ¬E, then r_B, being r_¬E, is defined by r_¬E :- \+ r_E, where \+ stands for negation as failure. By observation (16), r_¬E ≡ \+ r_E. Note that the probabilistic clauses in Π^m_A, given in the first line of (15), assign truth values to all propositional variables occurring in A. Therefore, the truth values of all subformulas of A are uniquely determined, in which case the only reason for a failure to prove a subformula of A, say E, is the falsity of E. Therefore, in Π^m_A, negation as failure becomes classical negation. Thus r_B = r_¬E ≡ \+ r_E ≡ ¬r_E. By the inductive assumption, r_E is equivalent to E, so ¬r_E is equivalent to ¬E, which shows the equivalence of r_B and B.
• When B is a non-negated subformula of A occurring in the construction of π⁰_A, one uses the inductive assumption together with observation (16) and obvious tautologies of propositional logic to show that the auxiliary variable r_B is equivalent to B. □

The following theorem can now be shown, stating that the programs Π^m_A can be used to calculate the probability measures for a formula (theory), which are then used as a basis for computing the values of the considered model counting-based measures.

Theorem 4.5. For every propositional formula A, the probability of A is equal to the probability of r_A given by the ProbLog program Π^m_A defined by (15).
Proof: By Lemma 4.4, r_A is equivalent to A, so in every possible world, r_A is true iff A is true. Assume p₁, ..., p_k are all the propositional variables occurring in A. There are 2^k possible worlds, so if A is true in n worlds, its probability is n/2^k. Each propositional variable is assigned the probability 0.5 in Π^m_A, so the probability of each possible world is 1/2^k. Since r_A is true exactly in the worlds where A is true and, by the distribution semantics (Sato 1995) used in ProbLog, the probability of r_A is the sum of the probabilities of the possible worlds where r_A is true, the value of r_A is:

P(r_A) = Σ_{w ∈ W : w |= r_A} 1/2^k = n/2^k = P(A). □

In order to compute the measures loss_m^NC(T, p̄), loss_m^SC(T, p̄) and loss_m^T(T, p̄), it now suffices to run the programs Π^m_{F^NC(T;p̄)}, Π^m_{F^SC(T;p̄)} and Π^m_T. The probabilities P(F^NC(T; p̄)), P(F^SC(T; p̄)) and P(T) required in Definition 3.1 are given respectively as the probabilities of r_{F^NC(T;p̄)}, r_{F^SC(T;p̄)} and r_T.

5 Probabilistic-Based Loss Measures of Forgetting

Model counting-based loss measures are defined using a uniform probability distribution on worlds. Probabilistic-based loss measures generalize this idea to allow arbitrary probability distributions on worlds. In general, it is assumed that an arbitrary probability distribution over worlds is provided, and one is then interested in the probabilities of formulas wrt this distribution. Let us then assume that a probability distribution on worlds is given, P_W : W → [0.0, 1.0]. Let T be a propositional theory. The probability of T wrt P_W is then defined by:

P₀(T) =def Σ_{w ∈ W, w |= T} P_W(w).    (17)

Probabilistic-based loss measures of forgetting are defined as follows.

Definition 5.1. By probabilistic-based loss measures of forgetting wrt the probability distribution P₀, we understand:

loss_p^NC(T, p̄) =def P₀(F^NC(T; p̄)) − P₀(T);    (18)
loss_p^SC(T, p̄) =def P₀(T) − P₀(F^SC(T; p̄));    (19)
loss_p^T(T, p̄) =def P₀(F^NC(T; p̄)) − P₀(F^SC(T; p̄)),    (20)

where the underlying distribution on worlds is assumed to be arbitrary. □
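Definition 5.1 can be checked concretely by brute force. The sketch below uses a district-style distribution in the spirit of the example in Section 6 (P(ecar) = 0.2, P(jcar) = 0.3, all other variables 0.5) and, purely for illustration, the forgetting policy p̄ = (ecar, jcar) — this policy choice is ours, not the paper's Section 7 example. F^NC and F^SC are obtained semantically via Theorem 2.1, i.e., as ∃p̄ Tc and ∀p̄ Tc per (5) and (6):

```python
import math
from itertools import product

# Independent per-variable marginals, cf. the first two lines of (21).
MARGINALS = {"car": 0.5, "reliable": 0.5, "fast": 0.5,
             "fcar": 0.5, "jcar": 0.3, "ecar": 0.2}
WORLDS = [dict(zip(MARGINALS, v))
          for v in product([True, False], repeat=len(MARGINALS))]

def p_world(w):
    """P_W(w): product of marginals, by the distribution semantics."""
    return math.prod(u if w[v] else 1.0 - u for v, u in MARGINALS.items())

def p0(formula):
    """P0(T) as in (17): total mass of the worlds satisfying the formula."""
    return sum(p_world(w) for w in WORLDS if formula(w))

impl = lambda a, b: (not a) or b
tc = lambda w: (impl(w["jcar"], w["car"] and w["reliable"] and w["fcar"])
                and impl(w["ecar"], w["car"] and w["fast"] and w["fcar"])
                and impl(w["fcar"], w["ecar"] or w["jcar"]))

def fix(f, var, val):              # A(var = val)
    return lambda w: f({**w, var: val})

def exists(var, f):                # ∃var f, cf. (5) and Theorem 2.1(1)
    return lambda w: fix(f, var, False)(w) or fix(f, var, True)(w)

def forall(var, f):                # ∀var f, cf. (6) and Theorem 2.1(3)
    return lambda w: fix(f, var, False)(w) and fix(f, var, True)(w)

f_nc = exists("ecar", exists("jcar", tc))   # strong forgetting of (ecar, jcar)
f_sc = forall("ecar", forall("jcar", tc))   # weak forgetting of (ecar, jcar)

loss_nc = p0(f_nc) - p0(tc)                 # (18)
loss_sc = p0(tc) - p0(f_sc)                 # (19)
loss_t  = p0(f_nc) - p0(f_sc)               # (20)
print(round(loss_nc, 6), round(loss_sc, 6), round(loss_t, 6))
# → 0.35625 0.33125 0.6875
```

Here F^SC turns out unsatisfiable (forgetting both ecar and jcar in Tc leaves no world satisfying all instantiations), so loss_p^SC equals P₀(Tc); the losses are nonnegative by (10) and add up as loss_p^T = loss_p^NC + loss_p^SC. Setting all marginals to 0.5 in the same code yields the model counting-based measures of Definition 3.1.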

As in the case of model counting-based loss measures, loss^T_p(T, p̄) = loss^NC_p(T, p̄) + loss^SC_p(T, p̄). The following proposition also applies.

Proposition 5.2. For ∗ ∈ {NC, SC, T} and every propositional theory T, the measures loss^∗_p(T, p̄) enjoy properties analogous to those stated in Points 1–6 of Proposition 3.2, where loss^∗_m is substituted by loss^∗_p. □

6 Computing Probabilistic-Based Loss Measures Using ProbLog

As shown through research on probabilistic logic programming and applications of the distribution semantics, the probability PW can be specified by using ProbLog probabilistic facts and rules (see, e.g., (De Raedt et al. 2016; Pfeffer 2016; Riguzzi 2023; Sato and Kameya 1997) and references therein).

Assuming that PW can be specified in ProbLog, a ProbLog program, Π^p_A, can be defined that computes the probability of A wrt the probability distribution PW:

  Π^p_A  def=  ProbLog specification of PW over p1, . . . , pk.
               0.5 :: q1. . . . 0.5 :: qn.                       (21)
               π^0_A.

In this case, p1, . . . , pk are all probabilistic variables used to define the probability distribution PW over worlds, p1, . . . , pk are pairwise independent, q1, . . . , qn are all variables occurring in A other than p1, . . . , pk, and the probability of A is given by querying for the probability of rA, which is included in π^0_A, the stratified logic program associated with A.

Computing the probabilistic-based loss measures loss^NC_p(T, p̄), loss^SC_p(T, p̄) and loss^T_p(T, p̄) can now be done by analogy with the method described in the last paragraph of Section 4.

As an example, consider theory Tc from Section 1, and assume that in a district of interest the probability of ecar is 0.2 and the probability of jcar is 0.3. Assume further that the choices of jcar and ecar exclude each other. Program 3 below can then be used, where an annotated disjunction (Line 1) is used to ensure a suitable probability distribution together with the mutual exclusion of ecar, jcar. In this case P0(Tc) is 0.33125.

Program 3: ProbLog program for computing the probability of Tc when P(ecar) = 0.2 and P(jcar) = 0.3.
1  0.2 :: ecar; 0.3 :: jcar.
2  0.5 :: car. 0.5 :: reliable. 0.5 :: fast. 0.5 :: fcar.
3  Lines 3–10 of Program 1

In summary, ProbLog is used to specify a given probability distribution on worlds, and then the rules of π are used for computing probabilistic-based loss measures.

Notice that, as in Section 4, the probabilistic facts 0.5 :: qi (1 ≤ i ≤ n) serve as a technical fix that enables the generation of all possible worlds, where propositional variables whose probabilities are not given explicitly are also included (Line 2 in the example). The following proposition shows that adding these additional probabilistic facts 0.5 :: qi does not affect the probability distribution on formulas over a vocabulary consisting of only p1, . . . , pk.

In the following proposition, Ws̄ denotes the set of worlds assigning truth values to the propositional variables in s̄.

Proposition 6.1. Let Pp̄ : Wp̄ → [0.0, 1.0] be a probability distribution on Wp̄, specified by the first line in (21), let Pp̄,q̄ : Wp̄,q̄ → [0.0, 1.0] be the whole probability distribution specified in the first two lines of (21), and let T be an arbitrary theory over vocabulary p̄. Then Pp̄(T) = Pp̄,q̄(T), where the probability of T is defined by (17).

Proof: According to (17),

  Pp̄,q̄(T)  def=  Σ_{w ∈ Wp̄,q̄, w ⊨ T} Pp̄,q̄(w).

By (21), the probability of each qi ∈ q̄ is 1/2. There are n variables in q̄, so Pp̄,q̄(w) = Pp̄(w′) ∗ (1/2)^n, where w′ ∈ Wp̄ is the restriction of w to p̄ and, since T is a theory over p̄, w ⊨ T iff w′ ⊨ T. Therefore,

  Pp̄,q̄(T) = Σ_{w ∈ Wp̄,q̄, w ⊨ T} Pp̄(w′) ∗ (1/2)^n.

Every world w ∈ Wp̄,q̄ extends the corresponding world w′ ∈ Wp̄ by assigning truth values to each variable in q̄. Each such world w′ has 2^n copies in Wp̄,q̄ (each copy accompanied by a different assignment of truth values to the variables in q̄). Therefore,

  Pp̄,q̄(T) = Σ_{w′ ∈ Wp̄, w′ ⊨ T} 2^n ∗ Pp̄(w′) ∗ (1/2)^n = Σ_{w′ ∈ Wp̄, w′ ⊨ T} Pp̄(w′) = Pp̄(T). □

7 Generating Loss Measures: an Example

In this example, we will use the theory Tc and generate both model counting-based and probabilistic-based loss measures for the forgetting policy p̄ = {ecar, jcar}.

In Section 4, it was shown how Tc could be transformed into a stratified logic program where its probabilistic variables were all assigned the value 0.5, reflecting a uniform probability on worlds. In this case, P0(Tc) = 0.203125. In Section 6, a probability distribution where ecar and jcar were assigned the probabilities 0.2 and 0.3, respectively, resulted in P0(Tc) = 0.33125.

In the next step, F^NC(Tc; ecar, jcar) and F^SC(Tc; ecar, jcar) need to be reduced to propositional theories using quantifier elimination. Using the results in Section 2.1, forgetting ecar and jcar in theory Tc, given by formulas (1)–(3) in Section 1, is defined by:

  F^NC(Tc; ecar, jcar) ≡ ∃ecar ∃jcar Tc;   (22)
  F^SC(Tc; ecar, jcar) ≡ ∀ecar ∀jcar Tc.   (23)

Using formulas (1)–(3) together with (22) and (23), it can be shown that:
• F^NC(Tc; ecar, jcar) is equivalent to:

    ∃ecar ∃jcar ( jcar → (car ∧ reliable ∧ fcar) ∧
                  ecar → (car ∧ fast ∧ fcar) ∧                  (24)
                  fcar → (ecar ∨ jcar) ).

  Using (5) and some equivalence preserving transformations, one can show that (24) is equivalent to the propositional formula:5

    fcar → ((car ∧ fast) ∨ (car ∧ reliable)).                   (25)

• F^SC(Tc; ecar, jcar) is equivalent to:

    ∀ecar ∀jcar ( jcar → (car ∧ reliable ∧ fcar) ) ∧
    ∀ecar ∀jcar ( ecar → (car ∧ fast ∧ fcar) ) ∧                (26)
    ∀ecar ∀jcar ( fcar → (ecar ∨ jcar) ),

  i.e., to:

    (car ∧ reliable ∧ fcar) ∧ (car ∧ fast ∧ fcar) ∧ ¬fcar,      (27)

  which simplifies to F.

5 Here we have used the DLS algorithm (Doherty, Łukaszewicz, and Szałas 1997) together with obvious simplifications of the resulting formula.

The next step in the working methodology is to encode F^NC(Tc; ecar, jcar) as a stratified logic program in ProbLog. There is no need to encode F^SC(Tc; ecar, jcar), since it is equivalent to F and its probability P(F^SC(Tc; ecar, jcar)) = 0.0.6

6 Note that this is a limiting case for F^SC(Tc; p̄), where the resultant theory is inconsistent. Similarly, but not in this example, the limiting case for F^NC(Tc; p̄) would be where the resultant theory is a tautology. In the former case, the forgetting policy is simply too strong, whereas in the latter, it would be too weak. What is important to observe is that the computational techniques proposed can identify these limiting cases, where the probabilities of the resultant theories would be 0.0 and 1.0, respectively.

Program 4 below contains rules encoding formula (25), which is equivalent to F^NC(Tc; ecar, jcar). In order to compute the model counting-based loss measures, probabilistic facts of the form 0.5 :: p for each propositional variable p occurring in F^NC(Tc; ecar, jcar), i.e.,

  0.5 :: car. 0.5 :: reliable. 0.5 :: fast. 0.5 :: fcar.        (28)

are added to Program 4. To determine the probability P(F^NC(Tc; ecar, jcar)), one sets up a ProbLog query about the probability of the auxiliary variable r_{fcar→((car∧fast)∨(car∧reliable))}, defined in Lines 6–7 of Program 4. The query returns P(F^NC(Tc; ecar, jcar)) = 0.6875.

Program 4: ProbLog program π^0_(25) encoding formula (25), equivalent to F^NC(Tc; ecar, jcar)
1  r_¬fcar :– \+ fcar.
2  r_car∧fast :– car, fast.
3  r_car∧reliable :– car, reliable.
4  r_(car∧fast)∨(car∧reliable) :– r_car∧fast.
5  r_(car∧fast)∨(car∧reliable) :– r_car∧reliable.
6  r_fcar→((car∧fast)∨(car∧reliable)) :– r_¬fcar.
7  r_fcar→((car∧fast)∨(car∧reliable)) :– r_(car∧fast)∨(car∧reliable).

In order to compute the probabilistic-based loss measures, the probability distribution is the same as in Program 3. That is, rather than adding specification (28) to Program 4, one should instead add Lines 1–2 of Program 3. As before, the probability of F^NC(Tc; ecar, jcar) is given by the probability of its encoding, specified by the auxiliary variable r_{fcar→((car∧fast)∨(car∧reliable))}. But in fact, this would not change the probability P(F^NC(Tc; ecar, jcar)). Since formula F^NC(Tc; ecar, jcar) does not contain the variables ecar, jcar, its probability is 0.6875 when computing both loss^T_m and loss^T_p.

The values of the model counting-based as well as the probabilistic-based measures of forgetting for Tc with the forgetting policy p̄ = {ecar, jcar} are shown in Table 2.

Table 2: The values of measures of forgetting ecar, jcar from Tc.

  loss^NC_m  loss^SC_m  loss^T_m  loss^NC_p  loss^SC_p  loss^T_p
  0.484375   0.203125   0.6875    0.3125     0.375      0.6875

Notice that, in this particular case, loss^NC_m is greater than loss^NC_p. This indicates that the reasoning strength wrt necessary conditions is lost to a greater extent when we count equally probable models, compared to the case when models get nonequal probabilities. At the same time, as regards reasoning with sufficient conditions, loss^SC_m is smaller than loss^SC_p. In this case, though the same models result from forgetting, the probabilistic measures indicate how the reasoning strength changes when real-world probabilities of models are taken into account.

8 Extending Measures to the 1st-Order Case

In this section we deal with classical 1st-order logic. For the 1st-order language we assume finite sets of relation symbols, R, (1st-order) variables, VI, and constants, C. We assume that C is the domain of the 1st-order interpretations (worlds) we consider.

In addition to propositional connectives we also allow the quantifiers ∀, ∃ ranging over the domain.7 We assume the standard semantics for 1st-order logic. A variable occurrence x is bound in a formula A when it occurs in the scope of a quantifier ∀x or ∃x. A variable occurrence is free when the occurrence is not bound. In the rest of this section we write A(x̄) to indicate that x̄ are all variables that occur free in A. Formula A is closed when it contains no free variables. By an atomic formula (atom, for short) we mean any formula of the form r(z̄), where r ∈ R and each z in z̄ is a variable in VI or a constant in C. We write r(x̄) to indicate that x̄ are all variables in z̄. By a ground atom we mean an atom containing only constants, if any.

7 We avoid function symbols, as is standard in rule-based query languages and underlying belief bases – see, e.g., (Abiteboul, Hull, and Vianu 1995).

A 1st-order theory (belief base) is a finite set of closed 1st-order formulas, T, understood as a single formula being the
conjunction of formulas in T.8 In what follows we always assume that the set of constants (and thus the domain) consists of all and only the constants occurring in the considered theory.9

8 The assumption as to the closedness of formulas is used to simplify the presentation. It can be dropped in a standard manner by assuming an external assignment of constants to free variables.
9 That is, we assume the Domain Closure Axiom.

The extension of the definitions of model counting-based and probabilistic measures is now straightforward when we assume that worlds assign truth values to ground atoms rather than to propositional variables, and that models are worlds where the given theories (formulas) are true. We shall use the same symbols loss^∗_m, loss^∗_p to denote these measures.

In order to use ProbLog, we have to adjust the transformation of formulas provided in Table 1 to cover the extended logical language, as shown in Table 3, where rather than auxiliary variables, we use auxiliary relation symbols.

Table 3: Transformation of an arbitrary 1st-order formula to a stratified logic program. We specify variables x̄ to indicate the arguments of rules' heads and all free variables occurring in formulas in rules' bodies.

π^I_A def=

• when A is an atom:
    ∅;  r_A def= A;
• when A = ¬B(x̄):
    π^I_B.
    r_A(x̄) :– \+ r_B(x̄).
• when A = (C ∧ D)(x̄):
    π^I_C. π^I_D.
    r_A(x̄) :– r_C(x̄), r_D(x̄).
• when A = (C ∨ D)(x̄):
    π^I_C. π^I_D.
    r_A(x̄) :– r_C(x̄).
    r_A(x̄) :– r_D(x̄).
• when A = (C → D)(x̄):
    π^I_C. π^I_D.
    r_A(x̄) :– \+ r_C(x̄).
    r_A(x̄) :– r_D(x̄).
• when A = (C ≡ D)(x̄):
    π^I_C. π^I_D.
    r_A′(x̄) :– \+ r_C(x̄).
    r_A′(x̄) :– r_D(x̄).
    r_A′′(x̄) :– \+ r_D(x̄).
    r_A′′(x̄) :– r_C(x̄).
    r_A(x̄) :– r_A′(x̄), r_A′′(x̄).
• when A(x̄) = ∃yB(y, x̄):
    π^I_B.
    r_A(x̄) :– r_B(y, x̄).
• when A(x̄) = ∀yB(y, x̄):
    π^I_B.
    r_B′(y, x̄) :– \+ r_B(y, x̄).
    r_A′(x̄) :– r_B′(y, x̄).
    r_A(x̄) :– \+ r_A′(x̄).

As an example, consider a formula A, being ∀x∃y(s(x, y, a) ∧ t(y, b)), where x, y are variables and a, b are constants. The set of rules is given as Program 5.

Program 5: A program π^I_{∀x∃y(s(x,y,a)∧t(y,b))} encoding the formula ∀x∃y(s(x, y, a) ∧ t(y, b)).
1  % π^I_{s(x,y,a)}, π^I_{t(y,b)} are empty; r_{s(x,y,a)} def= s(x, y, a), r_{t(y,b)} def= t(y, b)
2  r_{s(x,y,a)∧t(y,b)}(x, y) :– s(x, y, a), t(y, b).
3  r_{∃y(s(x,y,a)∧t(y,b))}(x) :– r_{s(x,y,a)∧t(y,b)}(x, y).
4  r_{B′}(x) :– \+ r_{∃y(s(x,y,a)∧t(y,b))}(x).
5  r_{A′}() :– r_{B′}(x).
6  r_{∀x∃y(s(x,y,a)∧t(y,b))}() :– \+ r_{A′}().

Similarly to the transformation given in Table 1, for the transformation given in Table 3 we have:
• propositions analogous to Proposition 4.1, Point 1 of Proposition 4.2, and Proposition 6.1; the linear complexity claimed in Point 2 of Proposition 4.2 does not hold in general – here we have deterministic polynomial time wrt the number of constants occurring in A;10
• lemmas analogous to Lemmas 4.3 and 4.4;
• a theorem analogous to Theorem 4.5.

10 The polynomial time complexity reflects the complexity of 1st-order queries (Abiteboul, Hull, and Vianu 1995): worlds can be seen as relational databases, and formulas as queries to such databases.

The construction of the ProbLog program Π^m_A given in Equation (15) is adjusted to the 1st-order case by replacing probabilistic propositional variables by all ground atoms. That is, if A(x̄) is a 1st-order formula then Π^m_A becomes:

  Π^m_A  def=  0.5 :: r1(c̄11). . . . 0.5 :: r1(c̄1m1).
               . . .
               0.5 :: rl(c̄l1). . . . 0.5 :: rl(c̄lml).        (29)
               π^I_A(x̄).

where r1, . . . , rl are all relation symbols in R, for 1 ≤ i ≤ l, mi is the arity of relation ri, and r1(c̄11), . . . , r1(c̄1m1), . . . , rl(c̄l1), . . . , rl(c̄lml) are all ground atoms with constants occurring in A(x̄).

The construction of the 1st-order version of Π^p_A adjusts the construction (21) as follows:

  Π^p_A  def=  ProbLog specification of PW over selected ground atoms specified in (29).
               0.5 :: a1. . . . 0.5 :: an.                     (30)
               π^I_A.

where a1, . . . , an are all the other ground atoms.

For the above construction we have a proposition analogous to Proposition 6.1.

Essentially, while extending our results to 1st-order logic, we simply switch from propositional variables to ground atoms, and then suitably deal with quantifiers. The resulting rules form stratified logic programs, so they remain within the ProbLog syntax. Though the rules for quantifiers are similar to standard transformations like, e.g., that in (Lloyd and Topor 1984), the use of auxiliary relation symbols allows us to properly deal with negation, by using it in a stratified manner.

Table 4 provides a comparison of the transformations given in Tables 1 and 3 and the transformations of (Tseitin 1968) and (Lloyd and Topor 1984) from the point of view of the properties needed in our paper. The rows 'Sat', 'Equiv', '#Models' respectively indicate whether a given transformation preserves satisfiability, equivalence (in the sense of Lemma 4.4), and the number of models, 'Size' – the (worst case) size of the
resulting formula wrt the size of the input formula, '1st-order' – whether the technique is applicable to 1st-order logic, and 'Stratified' – whether the result is stratified wrt negation.

Table 4: A comparison of properties of formula transformations considered in the paper.

              Tseitin    Lloyd-Topor    Table 1    Table 3
  Sat         +          +              +          +
  Equiv       −          −              +          +
  #Models     +          −              +          +
  Size        linear     exponential    linear     linear
  1st-order   −          +              −          +
  Stratified  −          −              +          +

9 Related Work

Strong (standard) forgetting, F^NC(), was introduced in the foundational paper (Lin and Reiter 1994), where model-theoretical definitions and an analysis of the properties of the standard forget() operator are provided. The paper (Lin and Reiter 1994) opened a research subarea, summarized, e.g., in (van Ditmarsch et al. 2009) or, more recently, in (Eiter and Kern-Isberner 2019; Doherty and Szałas 2024). Numerous applications of forgetting include (van Ditmarsch et al. 2009; Beierle et al. 2019; Delgrande 2017; Del-Pinto and Schmidt 2019; Eiter and Kern-Isberner 2019; Zhao and Schmidt 2017; Zhao and Schmidt 2016; Gonçalves, Knorr, and Leite 2021; Wang, Wang, and Zhang 2013; Zhang and Foo 2006). In summary, while the topic of forgetting has gained considerable attention in knowledge representation, measures of forgetting have not been proposed in the literature. Notice that the weak forgetting operator F^SC(), introduced in (Doherty and Szałas 2024), plays a substantial role in the definitions of the measures loss^SC_m, loss^T_m, loss^SC_p and loss^T_p.

The model counting problem #SAT, as well as #SAT-solvers, have been intensively investigated (Fichte, Hecher, and Hamiti 2021; Chakraborty, Meel, and Vardi 2021; Gomes, Sabharwal, and Selman 2021; Soos and Meel 2019). The methods and algorithms for #SAT provide a solid alternative for implementing model counting-based measures. In particular, projected model counting, such as that considered in (Lagniez and Marquis 2019; Lagniez, Lonca, and Marquis 2020; Gebser, Kaufmann, and Schaub 2009), could be used for computing the measures loss^NC_m, since variables are projected out using existential quantification, which directly corresponds to standard forgetting (see Point 1 of Theorem 2.1).

We have chosen ProbLog as a uniform framework for computing both model counting-based as well as probabilistic measures. This, in particular, has called for a new formula transformation, as shown in Table 1. Our transformation is inspired by Tseitin's transformation of arbitrary propositional formulas into Conjunctive Normal Form (see, e.g., (Tseitin 1968; Kuiter et al. 2022; Lagniez, Lonca, and Marquis 2020)). Though Tseitin's transformation preserves satisfiability, it does not preserve equivalence, nor does it lead to stratified clauses. On the other hand, our transformation preserves equivalence in the sense formulated in Lemma 4.4 and allows its use together with probabilistic logic programming languages that use stratified negation, including ProbLog. For 1st-order quantifiers, the transformation shown in Table 3 uses a rather standard technique, similar to the well-known transformation of (Lloyd and Topor 1984). However, the use of auxiliary relation symbols again allows us to obtain stratified programs. On the other hand, the transformations of (Lloyd and Topor 1984) and alike could not be directly applied here, as they generally do not ensure stratification. For a comparison of Tseitin's, Lloyd-Topor's and our transformations from the point of view of the properties needed in our paper, see Table 4.

In the past decades there has been a fundamental interest in combining logic with probabilities, in particular in the form of probabilistic programming languages, which could also be used for computing measures such as those proposed in our paper. Such languages include CLP(BN) (Costa et al. 2008), CP (Vennekens, Denecker, and Bruynooghe 2006), ICL (Poole 2008), LPMLN (Lee and Yang 2017), PRISM (Sato and Kameya 1997), and ProPPR (Wang, Mazaitis, and Cohen 2013). For books surveying related approaches see (De Raedt et al. 2016; Pfeffer 2016; Riguzzi 2023). Our choice of ProbLog is motivated by the rich and extensive research around it, both addressing theory and applications, as well as by its efficient implementation.

To deal with 2nd-order quantifiers, which are inevitably associated with forgetting, one can use a variety of techniques – see, e.g., (Gabbay, Schmidt, and Szałas 2008) and, for the context of forgetting, (Doherty and Szałas 2024). For eliminating 2nd-order quantifiers, in this paper we used the DLS algorithm (Doherty, Łukaszewicz, and Szałas 1997).

The measures considered in our paper are relatively independent of a particular formalism, since we only require algorithms for counting models or probabilities over them. For example, forgetting is particularly useful in rule-based languages, when one simplifies a belief base to improve querying performance. This is especially useful for forgetting in Answer Set Programs (Gonçalves, Knorr, and Leite 2021; Wang, Wang, and Zhang 2013; Zhang and Foo 2006), where the corresponding entailment tasks, centering around necessary conditions, are typically intractable. Our approach can be adjusted to this framework using algorithms for counting ASP models, like those investigated in (Aziz et al. 2015).

10 Conclusions

Two types of loss measures have been proposed, one model counting-based and the other probabilistic-based, to measure the losses in inferential strength for theories where
different forgetting policies and operators are used. The model counting-based approach is based on an underlying uniform probability distribution for worlds, whereas the probabilistic-based approach is based on the use of arbitrary probability distributions on worlds. The former can be seen as a non-weighted approach and the latter as a weighted approach to measuring inferential strength and the losses ensued through the use of forgetting. A computational framework based on ProbLog is proposed that allows analysis of any propositional or closed 1st-order theory with finite domains relative to different forgetting policies.

The framework proposed should have applications beyond forgetting. For instance, the example in Section 1 has been used in (Nayak and Levy 1995) as a means of modeling abstraction through the use of forgetting, where one abstracts from ecar, jcar. The loss measures proposed in this paper should be useful as one criterion for determining degrees of abstraction. Earlier in the paper, it was pointed out that model counts can be derived directly from the model counting-based approach to loss. It would be interesting to empirically compare the use of ProbLog and the techniques proposed here with existing model counting approaches as mentioned in Section 9.

It should be mentioned that all ProbLog programs specified in this paper are accessible for experimentation and verification using a special ProbLog web interface.11

11 For the convenience of the reader, ProbLog sources, in ready-to-copy-and-paste form, are included in the appendix. They can be tested using the web interface accessible from https://dtai.cs.kuleuven.be/problog/editor.html.

Appendix A: ProbLog Programs

Program 1

The following program computes the probability of theory Tc expressed by (1)–(3).

0.5::car. 0.5::reliable. 0.5::fast.
0.5::fcar. 0.5::jcar. 0.5::ecar.

r :- car, reliable, fcar.
r :- \+jcar. % r represents (1)
s :- car, fast, fcar.
s :- \+ecar. % s represents (2)
u :- jcar.
u :- ecar.
u :- \+ fcar. % u represents (3)
t :- r, s, u. % t represents (1)&(2)&(3)

query(t).

Program 2

The following program computes the probability of the formula q ≡ (p ∧ ¬q ∧ s).

0.5::p. 0.5::q. 0.5::s.

r_not_q :- \+ q.
r_p_and_not_q_and_s :- p, r_not_q, s.
r_A1 :- \+ q.
r_A1 :- r_p_and_not_q_and_s.
r_A2 :- \+ r_p_and_not_q_and_s.
r_A2 :- q.
r_q_equiv_p_and_not_q_and_s :- r_A1, r_A2.

query(r_q_equiv_p_and_not_q_and_s).

Program 3

The following program computes the probability of theory Tc expressed by (1)–(3) when P(ecar) = 0.2 and P(jcar) = 0.3, and the choices of jcar and ecar exclude each other.

0.2::ecar; 0.3::jcar.
0.5::car. 0.5::reliable.
0.5::fast. 0.5::fcar.

r :- car, reliable, fcar.
r :- \+jcar. % r represents (1)
s :- car, fast, fcar.
s :- \+ecar. % s represents (2)
u :- jcar.
u :- ecar.
u :- \+ fcar. % u represents (3)
t :- r, s, u. % t represents (1)&(2)&(3)

query(t).

Program 4

The following program computes the probability of the formula (25) equivalent to F^NC(Tc; ecar, jcar). To compute the probability of (25) when P(ecar) = 0.2, P(jcar) = 0.3, and ecar, jcar exclude each other, it suffices to replace the first line by: 0.2::ecar; 0.3::jcar.

0.5::ecar. 0.5::jcar.
0.5::car. 0.5::reliable.
0.5::fast. 0.5::fcar.

r_not_fcar :- \+ fcar.
r_car_and_fast :- car, fast.
r_car_and_reliable :- car, reliable.
r_car_and_fast_or_car_and_reliable :- r_car_and_fast.
r_car_and_fast_or_car_and_reliable :- r_car_and_reliable.
r_fcar_impl_car_and_fast_or_car_and_reliable :- r_not_fcar.
r_fcar_impl_car_and_fast_or_car_and_reliable :- r_car_and_fast_or_car_and_reliable.

query(r_fcar_impl_car_and_fast_or_car_and_reliable).

Program 5

The following program computes the probability of formula ∀x∃y(s(x, y, a) ∧ t(y, b)), assuming that the underlying domain consists of a, b (expressed by dom(a), dom(b)).

0.5::s(X,Y,Z) :- dom(X), dom(Y), dom(Z).
0.5::t(X,Y) :- dom(X), dom(Y).

dom(a). dom(b).

r_sxya_and_tyb(X,Y) :- s(X,Y,a), t(Y,b).
r_exists_y_sxya_and_tyb(X) :- r_sxya_and_tyb(X,Y).
r_B1(X) :- \+ r_exists_y_sxya_and_tyb(X).
r_A1 :- r_B1(X).
r_forall_x_exists_y_sxya_and_tyb :- \+ r_A1.

query(r_forall_x_exists_y_sxya_and_tyb).

Acknowledgments

The research has been supported by a research grant from the ELLIIT Network Organization for Information and Communication Technology, Sweden.

References

Abiteboul, S.; Hull, R.; and Vianu, V. 1995. Foundations of Databases. Addison-Wesley.

Aziz, R. A.; Chu, G.; Muise, C. J.; and Stuckey, P. J. 2015. Stable model counting and its application in probabilistic logic programming. In Bonet, B., and Koenig, S., eds., Proc. 29th AAAI Conf. on AI, 3468–3474. AAAI Press.

Beierle, C.; Kern-Isberner, G.; Sauerwald, K.; Bock, T.; and Ragni, M. 2019. Towards a general framework for kinds of forgetting in common-sense belief management. Künstliche Intell. 33(1):57–68.

Biere, A.; Heule, M.; van Maaren, H.; and Walsh, T., eds. 2021. Handbook of Satisfiability, volume 336 of Frontiers in AI and Applications. IOS Press.

Chakraborty, S.; Meel, K. S.; and Vardi, M. Y. 2021. Approximate model counting. In Biere et al. (2021). 1015–1045.

Clark, K. L. 1977. Negation as failure. In Gallaire, H., and Minker, J., eds., Logic and Data Bases, Symposium on Logic and Data Bases, Advances in Data Base Theory, 293–322. New York: Plenum Press.

Costa, S. V.; Page, D.; Quazi, M.; and Cussens, J. 2008. CLP(BN): Constraint logic programming for probabilistic knowledge. In De Raedt et al. (2008), 156–188.

De Raedt, L., and Kimmig, A. 2015. Probabilistic (logic) programming concepts. Machine Learning 100(1):5–47.

De Raedt, L.; Frasconi, P.; Kersting, K.; and Muggleton, S., eds. 2008. Probabilistic Inductive Logic Programming – Theory and Applications, volume 4911 of LNCS. Springer.

De Raedt, L.; Kersting, K.; Natarajan, S.; and Poole, D. 2016. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Synthesis Lectures on AI and ML. Morgan & Claypool Pub.

Del-Pinto, W., and Schmidt, R. A. 2019. ABox abduction via forgetting in ALC. In The 33rd AAAI Conf. on AI, 2768–2775.

Delgrande, J. P. 2017. A knowledge level account of forgetting. J. Artif. Intell. Res. 60:1165–1213.

Doherty, P., and Szałas, A. 2024. Dual forgetting operators in the context of weakest sufficient and strongest necessary conditions. Artif. Intell. 326:104036.

Doherty, P.; Łukaszewicz, W.; and Szałas, A. 1997. Computing circumscription revisited. J. Automated Reasoning 18(3):297–336.

Eiter, T., and Kern-Isberner, G. 2019. A brief survey on forgetting from a knowledge representation and reasoning perspective. Künstliche Intell. 33(1):9–33.

Fichte, J. K.; Hecher, M.; and Hamiti, F. 2021. The model counting competition 2020. ACM J. of Experimental Algorithmics 26(13):1–26.

Gabbay, D. M.; Schmidt, R. A.; and Szałas, A. 2008. Second-Order Quantifier Elimination. Foundations, Computational Aspects and Applications, volume 12 of Studies in Logic. College Publications.

Gebser, M.; Kaufmann, B.; and Schaub, T. 2009. Solution enumeration for projected boolean search problems. In van Hoeve, W. J., and Hooker, J. N., eds., 6th Int. Conf. CPAIOR, volume 5547 of LNCS, 71–86. Springer.

Gomes, C. P.; Sabharwal, A.; and Selman, B. 2021. Model counting. In Biere et al. (2021). 993–1014.

Gonçalves, R.; Knorr, M.; and Leite, J. 2021. Forgetting in Answer Set Programming – A survey. Theory and Practice of Logic Programming 1–43.

Kuiter, E.; Krieter, S.; Sundermann, C.; Thüm, T.; and Saake, G. 2022. Tseitin or not Tseitin? The impact of CNF transformations on feature-model analyses. In 37th ASE, IEEE/ACM Int. Conf. on Automated Software Engineering, 110:1–110:13.

Lagniez, J.-M., and Marquis, P. 2019. A recursive algorithm for projected model counting. In The 33rd AAAI Conf. on AI, 1536–1543. AAAI Press.

Lagniez, J.-M.; Lonca, E.; and Marquis, P. 2020. Definability for model counting. Artif. Intell. 281:103229.

Lee, J., and Yang, Z. 2017. LPMLN, weak constraints, and P-log. In Proc. 31st AAAI Conf., 1170–1177.

Lin, F., and Reiter, R. 1994. Forget it! In Proc. of the AAAI Fall Symp. on Relevance, 154–159.

Lloyd, J. W., and Topor, R. W. 1984. Making Prolog more expressive. Journal of Logic Programming 1(3):225–240.

Nayak, P. P., and Levy, A. Y. 1995. A semantic theory of abstractions. In Proc. 14th IJCAI 95, 196–203. Morgan Kaufmann.

Pfeffer, A. 2016. Practical Probabilistic Programming. Manning Pub. Co.

Poole, D. 2008. The independent choice logic and beyond. In De Raedt et al. (2008), 222–243.

Riguzzi, F. 2023. Foundations of Probabilistic Logic Programming. Languages, Semantics, Inference and Learning. Series in Software Engineering. River Publishers.

Sato, T., and Kameya, Y. 1997. PRISM: A language for symbolic-statistical modeling. In Proc. of the 15th IJCAI, 1330–1339. Morgan Kaufmann.

Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proc. 12th Int. Conf. on Logic Programming ICLP, 715–729.

Soos, M., and Meel, K. S. 2019. BIRD: engineering an efficient CNF-XOR SAT solver and its applications to approximate model counting. In 33rd AAAI, 1592–1599.
Tseitin, G. S. 1968. On the complexity of derivation in propositional calculus. In Structures in Constructive Mathematics and Mathematical Logic, 115–125. Steklov Mathematical Institute.

van Ditmarsch, H.; Herzig, A.; Lang, J.; and Marquis, P. 2009. Introspective forgetting. Synth. 169(2):405–423.

Vennekens, J.; Denecker, M.; and Bruynooghe, M. 2006. Representing causal information about a probabilistic process. In Fisher, M.; van der Hoek, W.; Konev, B.; and Lisitsa, A., eds., Proc. Logics in AI, 10th JELIA, volume 4160 of LNCS, 452–464. Springer.

Wang, W. Y.; Mazaitis, K.; and Cohen, W. W. 2013. Programming with personalized pagerank: a locally groundable first-order probabilistic logic. In He, Q.; Iyengar, A.; Nejdl, W.; Pei, J.; and Rastogi, R., eds., 22nd ACM Int. Conf. on Information and Knowledge Management, 2129–2138.

Wang, Y.; Wang, K.; and Zhang, M. 2013. Forgetting for Answer Set Programs revisited. In Rossi, F., ed., Proc. IJCAI'2013, 1162–1168.

Zhang, Y., and Foo, N. Y. 2006. Solving logic program conflict through strong and weak forgettings. Artif. Intell. 170(8):739–778.

Zhao, Y., and Schmidt, R. A. 2016. Forgetting concept and role symbols in ALCOIHµ+(∇,⊓)-ontologies. In Kambhampati, S., ed., Proc. IJCAI'2016, 1345–1352.

Zhao, Y., and Schmidt, R. A. 2017. Role forgetting for ALCOQH(universal role)-ontologies using an Ackermann-based approach. In Sierra, C., ed., Proc. IJCAI'17, 1354–1361.
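As a closing cross-check of Appendix A, the 1st-order evaluation performed by Program 5 can also be reproduced outside ProbLog by grounding the quantifiers over the finite domain, in the spirit of Section 8. The Python sketch below is ours and not part of the paper's toolchain; it enumerates truth assignments of the ground atoms that actually occur in the formula ∀x∃y(s(x, y, a) ∧ t(y, b)). The remaining ground atoms have probability 0.5 and do not affect the result, by the argument analogous to Proposition 6.1.

```python
from itertools import product

DOM = ("a", "b")  # the domain: all constants occurring in the theory

# The ground atoms of s(x, y, a) and t(y, b) for x, y ranging over DOM.
ATOMS = ([("s", x, y, "a") for x in DOM for y in DOM] +
         [("t", y, "b") for y in DOM])

def formula(world):
    # forall x exists y (s(x, y, a) and t(y, b)), evaluated by grounding
    # the quantifiers over DOM, as in Section 8.
    return all(any(world[("s", x, y, "a")] and world[("t", y, "b")]
                   for y in DOM)
               for x in DOM)

def uniform_probability(fml):
    # All ground atoms are independent with probability 0.5, so the
    # probability is the number of satisfying assignments / 2^|ATOMS|.
    worlds = [dict(zip(ATOMS, vals))
              for vals in product((False, True), repeat=len(ATOMS))]
    return sum(1 for w in worlds if fml(w)) / len(worlds)

p = uniform_probability(formula)  # 17/64 = 0.265625
```

Enumeration gives 17/64 = 0.265625 for the formula of Program 5 under the uniform distribution on ground atoms; the paper does not report this value, so it is computed here solely by the brute-force count above.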
