Semester 2
School of Computing
IMPORTANT INFORMATION
This tutorial letter contains model solutions for the self-assessment assignment.
University of South Africa
Solution
SELF ASSESSMENT ASSIGNMENT 01
Study material: Chapters 9 and 18. You may skip Sections 9.3 and 9.4. From Chapter 18 you only
need to study Sections 18.1, 18.2, and 18.3.
Question 1
• Eliminate implication:
∀y (¬∀x P(x) ∨ ∃x (∀z Q(x, z) ∨ ∀z R(x, y, z)))
• Move ¬ inwards
∀y (∃x ¬P(x) ∨ ∃x (∀z Q(x, z) ∨ ∀z R(x, y, z)))
• Standardize variables: The variable name x is used in the scope of two different quantifiers,
so we change the name of one of them to avoid confusion when we drop the quantifiers. The
same applies for the variable name z.
∀y (∃x ¬P(x) ∨ ∃u (∀z Q(u, z) ∨ ∀v R(u, y, v)))
Note that the scope of (∀z) is the predicate Q(u, z), and the scope of (∀v) is the predicate
R(u, y, v). On the other hand, the scope of (∀y) is the whole formula.
• Skolemize
∀y (¬P(f(y)) ∨ (∀z Q(g(y), z) ∨ ∀v R(g(y), y, v)))
Here f and g are Skolem functions, and their arguments are all the universally quantified
variables in whose scope the existential quantifier appears. In this case, ∃u appears only in
the scope of ∀y. Note that the variable u appears twice in the sentence, so we replace each
occurrence of u with g(y).
• Drop universal quantifiers: At this point, all remaining variables are universally quantified.
Moreover, the sentence is equivalent to one in which all the universal quantifiers have been
moved to the left: ∀y, z, v (¬P(f(y)) ∨ (Q(g(y), z) ∨ R(g(y), y, v)))
We can therefore drop the universal quantifiers: ¬P(f(y)) ∨ Q(g(y), z) ∨ R(g(y), y, v)
The sentence is now in CNF: it consists of a single conjunct (one clause), a disjunction of three
literals.
Question 2
a. Anyone who passes their history exam and who wins the lottery is happy.
COS3751/203/2/2018
d. John is lucky.
(2.2) Translate the above English sentences to FOL statements using the vocabulary you
defined above.
c. ¬Study(John)
d. Lucky(John)
It is important to standardize the variables (to use different variable names) to avoid
confusion when dropping the universal quantifiers.
(2.3) Convert the FOL statements obtained in 2.2 into clause form.
c. No conversion required.
d. No conversion required.
e. ∀s (Lucky(s) ⇒ Win(s, Lottery))
≡ ¬Lucky(s) ∨ Win(s, Lottery)
3. ¬Lucky(q) ∨ Pass(q, r)
4. ¬Study(John)
5. Lucky(John)
Note that the universal quantifiers have been dropped because all variables were uni-
versally quantified. Skolem functions are introduced only to remove existential
quantifiers. (See Section 9.5 of R&N.)
5. Lucky(John) assumption
12. ∅ 10&11
We have shown that the negation of the goal together with the premises (all in clause
form) produce a contradiction (empty clause). Therefore the goal Happy(John), which
translates to ‘John is happy’, is a logical consequence of the other sentences.
It is important to show which clauses take part in each resolution step to produce the resol-
vent. You must also always show the substitutions.
Remember: given a fact with a constant, one cannot simply replace the constant
with a variable to derive a new (general) statement. For example, suppose we add
Carol to the list of persons in the vocabulary for the above problem. We cannot derive
Lucky(x) via {x/John}, and then Lucky(Carol) via {Carol/x}: substitutions bind variables,
never constants.
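The restriction can be made concrete in code. Below is a minimal sketch (our own illustration, not part of the model solution; the tuple representation and helper names are assumptions) of applying a substitution to a literal, where a substitution may bind variables only:

```python
def is_variable(t):
    """Convention used here: variables are lowercase identifiers, constants capitalised."""
    return isinstance(t, str) and t[:1].islower()

def substitute(literal, subst):
    """Apply a substitution {variable: term} to a literal ('Pred', arg1, ...).
    Substitutions bind variables only, never constants."""
    if not all(is_variable(v) for v in subst):
        raise ValueError("substitutions bind variables only")
    pred, *args = literal
    return (pred, *(subst.get(a, a) for a in args))
```

For example, substitute(("Lucky", "x"), {"x": "John"}) yields ("Lucky", "John"), whereas an attempted binding {"John": "x"} is rejected, which is exactly why Lucky(Carol) cannot be derived from Lucky(John).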
Question 3
x1  x2  x3  |  f(x1, x2, x3)
 0   0   0  |  1
 0   0   1  |  0
 0   1   0  |  1
 0   1   1  |  1
 1   0   0  |  1
 1   0   1  |  0
 1   1   0  |  0
 1   1   1  |  1
No information gain values were given, so it is simply a matter of picking an order in which
to evaluate the variables. The next step is then to simplify the decision tree by consolidating
equivalent leaf nodes.
Variable order x1, x2, x3 (full tree; the leaves are the values of f):
x1 = 0:
  x2 = 0: x3 = 0 → 1; x3 = 1 → 0
  x2 = 1: x3 = 0 → 1; x3 = 1 → 1
x1 = 1:
  x2 = 0: x3 = 0 → 1; x3 = 1 → 0
  x2 = 1: x3 = 0 → 0; x3 = 1 → 1
Simplified (the x1 = 0, x2 = 1 subtree collapses to the leaf 1):
x1 = 0:
  x2 = 0: x3 = 0 → 1; x3 = 1 → 0
  x2 = 1 → 1
x1 = 1:
  x2 = 0: x3 = 0 → 1; x3 = 1 → 0
  x2 = 1: x3 = 0 → 0; x3 = 1 → 1
Variable order x1, x3, x2 (simplified):
x1 = 0:
  x3 = 0 → 1
  x3 = 1: x2 = 0 → 0; x2 = 1 → 1
x1 = 1:
  x3 = 0: x2 = 0 → 1; x2 = 1 → 0
  x3 = 1: x2 = 0 → 0; x2 = 1 → 1
Variable order x2, x1, x3 (simplified):
x2 = 0:
  x1 = 0: x3 = 0 → 1; x3 = 1 → 0
  x1 = 1: x3 = 0 → 1; x3 = 1 → 0
x2 = 1:
  x1 = 0 → 1
  x1 = 1: x3 = 0 → 0; x3 = 1 → 1
Variable order x2, x3, x1 (simplified):
x2 = 0:
  x3 = 0 → 1
  x3 = 1 → 0
x2 = 1:
  x3 = 0: x1 = 0 → 1; x1 = 1 → 0
  x3 = 1 → 1
Variable order x3, x2, x1 (simplified):
x3 = 0:
  x2 = 0 → 1
  x2 = 1: x1 = 0 → 1; x1 = 1 → 0
x3 = 1:
  x2 = 0 → 0
  x2 = 1 → 1
Variable order x3, x1, x2 (simplified):
x3 = 0:
  x1 = 0 → 1
  x1 = 1: x2 = 0 → 1; x2 = 1 → 0
x3 = 1:
  x1 = 0: x2 = 0 → 0; x2 = 1 → 1
  x1 = 1: x2 = 0 → 0; x2 = 1 → 1
(3.2) When we construct a decision tree without the benefit of gain values, the order in
which we evaluate the variables is important. Why?
Different orders can yield different trees: a good order makes it possible to consolidate leaf
nodes with identical values, producing a smaller, more compact tree.
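The effect can be seen concretely. The short sketch below (our own illustration, not part of the model solution) builds a tree for the truth table above under a given variable order, collapsing any subtree whose rows all share the same output, and counts the leaves for every order:

```python
from itertools import product, permutations

# f(x1, x2, x3) from the truth table in Question 3, rows 000..111.
TABLE = dict(zip(product((0, 1), repeat=3), (1, 0, 1, 1, 1, 0, 0, 1)))

def build(rows, order):
    """Grow a decision tree over the variable indices in `order` (0 = x1, ...),
    collapsing to a leaf as soon as all remaining rows agree on the output."""
    outputs = {TABLE[r] for r in rows}
    if len(outputs) == 1:
        return outputs.pop()                      # leaf node
    var, rest = order[0], order[1:]
    return {var: {b: build([r for r in rows if r[var] == b], rest)
                  for b in (0, 1)}}

def leaves(tree):
    """Count the leaf nodes of a (possibly collapsed) tree."""
    if not isinstance(tree, dict):
        return 1
    (_, branches), = tree.items()
    return sum(leaves(b) for b in branches.values())

sizes = {order: leaves(build(list(TABLE), order))
         for order in permutations(range(3))}
```

For this table the orders x2, x3, x1 and x3, x2, x1 give 5-leaf trees, while the other four orders give 7 leaves, which is exactly why the evaluation order matters.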
Question 4
The National Credit Act, introduced in South Africa in 2007, places more responsibility on a bank to
determine whether a loan applicant will be able to afford the loan. Blue Bank has a table of information
on 14 loan applications it has received in the past.
Use the information in this table to construct a decision tree that will assist the bank in determining
the risk associated with a new loan application.
In order to determine the information gain of the different attributes of a collection of data,
the calculation of entropy is important, since the notion of information gain is defined in terms of entropy.
Entropy can be described as a measure of the impurity of an arbitrary collection of examples. Russell
& Norvig describe entropy as a measure of the uncertainty of a random variable, and mention that
the acquisition of information corresponds to a reduction in entropy. For example, a random variable
with only one value has no uncertainty, so its entropy is defined as zero; we gain no
information by observing its value.
If all the examples in a collection belong to the same class, the entropy of the collection is 0. If the
examples are split equally between two classes, the entropy is 1. More generally, for a collection
classified into c classes, the maximum possible entropy is log2(c).
Formally, the entropy of a random variable V with values vk, each with probability P(vk), is defined
as:

H(V) = Σ_{k=1}^{n} P(vk) log2(1/P(vk)) = − Σ_{k=1}^{n} P(vk) log2 P(vk)    (1)
Perhaps a more comprehensible way of defining the entropy of any collection S is as follows:

Entropy(S) = − Σ_{i=1}^{c} pi log2(pi)    (2)
This corresponds to what Russell & Norvig refer to as information content. If the possible answers
vi have probabilities P(vi), then the information content I of the actual answer is given by:

I(P(v1), ..., P(vn)) = − Σ_{i=1}^{n} P(vi) log2 P(vi)    (3)
The negative sign can be seen as a way of making the values of pi log2(pi) positive: each pi is a
probability (0 < pi ≤ 1), so log2(pi) ≤ 0.
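Equation (2) translates directly into code. Here is a minimal sketch (our own illustration), taking per-class counts rather than probabilities; the 0 · log2(0) = 0 convention is handled by skipping empty classes:

```python
from math import log2

def entropy(counts):
    """Entropy of a collection described by its per-class counts,
    e.g. [5, 3, 6] for 5 Low, 3 Medium and 6 High examples."""
    total = sum(counts)
    # Skip empty classes: by convention 0 * log2(0) contributes 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)
```

For example, entropy([7]) is 0 (no uncertainty), entropy([2, 2]) is 1 (an even binary split), and entropy([5, 3, 6]) ≈ 1.531, the value computed for the full example set below.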
In what follows we will use the ID3 algorithm to develop our decision tree.
Our collection S of 14 examples has four meaningful attributes (Credit History, Debt, Collateral,
Income) and a three-way classification (Low, Medium, High). Since there are no repeated values in the
No. attribute, it plays no role in the classification and can be ignored (although we will show the No.
attribute in further tables so that we know which examples we are working with).
For purposes of clarity in the formulae, we will shorten the attribute labels and values as follows:
• Credit History = CH
• Debt = D
• Collateral = C
• Income = I
• Bad = B
• Unknown = U
• Good = G
• High = H
• Low = L
• <R15K = 15K
• R15K-R35K = 15K-35K
• >R35K = 35K
• Medium = M
We will also not be repeating calculations at every step: we show the full set of calculations for the
first level of the tree, after that we will only show the final result of the calculation.
We summarise the example set S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14} as S = [5L, 3M, 6H].
We start by calculating the entropy of our set of examples:
Entropy(S) = − Σ_{i=1}^{3} pi log2 pi
           = −(5/14)log2(5/14) − (3/14)log2(3/14) − (6/14)log2(6/14)
           = −(5/14)(log2 5 − log2 14) − (3/14)(log2 3 − log2 14) − (6/14)(log2 6 − log2 14)
           = 0.531 + 0.476 + 0.524
           = 1.531    (maximum possible = log2 3 ≈ 1.585)
We have to determine the information gain (IG) of the different attributes in order to select the best
choice for the root node. The information gain measures the expected reduction in entropy – the
higher the IG, the greater the expected reduction in entropy.
In what follows we calculate the information gain for each of the four attributes.
Credit History:
Values(CH) = B, G, U
S = [5L , 3M , 6H ]
SB ← [0L , 1M , 3H ]
SG ← [3L , 1M , 1H ]
SU ← [2L , 1M , 2H ]
Entropy(SB) = −(0/4)log2(0/4) − (1/4)log2(1/4) − (3/4)log2(3/4)
            = 0.811
Entropy(SG) = −(3/5)log2(3/5) − (1/5)log2(1/5) − (1/5)log2(1/5)
            = 1.371
Entropy(SU) = −(2/5)log2(2/5) − (1/5)log2(1/5) − (2/5)log2(2/5)
            = 1.522

Gain(S, CH) = Entropy(S) − Σ_{v ∈ Values(CH)} (|Sv|/|S|) Entropy(Sv)
            = Entropy(S) − (4/14)Entropy(SB) − (5/14)Entropy(SG) − (5/14)Entropy(SU)
            = 1.531 − (4/14)(0.811) − (5/14)(1.371) − (5/14)(1.522)
            = 0.266
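The same computation can be scripted. A minimal sketch (our own illustration; entropy is restated so the snippet stands alone) of Gain(S, A) from per-class count vectors:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(parent, subsets):
    """Information gain of a split: `parent` and each subset are per-class
    count vectors, e.g. parent [5, 3, 6] splitting into SB, SG, SU."""
    total = sum(parent)
    return entropy(parent) - sum(sum(s) / total * entropy(s) for s in subsets)

# Credit History split of S = [5L, 3M, 6H]:
g_ch = gain([5, 3, 6], [[0, 1, 3], [3, 1, 1], [2, 1, 2]])
```

Here g_ch ≈ 0.266, matching the hand calculation above; the Debt split [[3, 2, 2], [2, 1, 4]] gives ≈ 0.063.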
Debt:
Values(D) = L, H
S = [5L , 3M , 6H ]
SL ← [3L , 2M , 2H ]
SH ← [2L , 1M , 4H ]
Entropy(SL) = −(3/7)log2(3/7) − (2/7)log2(2/7) − (2/7)log2(2/7)
            = 1.557
Entropy(SH) = −(2/7)log2(2/7) − (1/7)log2(1/7) − (4/7)log2(4/7)
            = 1.379

Gain(S, D) = Entropy(S) − (7/14)Entropy(SL) − (7/14)Entropy(SH)
           = 1.531 − (7/14)(1.557) − (7/14)(1.379)
           = 0.063
Collateral:
Values(C) = Y, N
S = [5L , 3M , 6H ]
SY ← [2L , 1M , 0H ]
SN ← [3L , 2M , 6H ]
Entropy(SY) = −(2/3)log2(2/3) − (1/3)log2(1/3) − (0/3)log2(0/3)
            = 0.918
Entropy(SN) = −(3/11)log2(3/11) − (2/11)log2(2/11) − (6/11)log2(6/11)
            = 1.435

Gain(S, C) = Entropy(S) − (3/14)Entropy(SY) − (11/14)Entropy(SN)
           = 1.531 − (3/14)(0.918) − (11/14)(1.435)
           = 0.207
Income:
Values(I) = 15K, 15K-35K, 35K
S = [5L , 3M , 6H ]
S15 ← [0L , 0M , 4H ]
S15−35 ← [0L , 2M , 2H ]
S35 ← [5L , 1M , 0H ]
Entropy(S15) = −(0/4)log2(0/4) − (0/4)log2(0/4) − (4/4)log2(4/4)
             = 0
Entropy(S15−35) = −(0/4)log2(0/4) − (2/4)log2(2/4) − (2/4)log2(2/4)
                = 1
Entropy(S35) = −(5/6)log2(5/6) − (1/6)log2(1/6) − (0/6)log2(0/6)
             = 0.650

Gain(S, I) = Entropy(S) − (4/14)Entropy(S15) − (4/14)Entropy(S15−35) − (6/14)Entropy(S35)
           = 1.531 − (4/14)(0) − (4/14)(1) − (6/14)(0.650)
           = 0.967
Attribute Income provides the highest information gain (0.967), i.e. the best prediction for our target
attribute, Risk. Income becomes the root node of our decision tree.
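Using the count vectors read off the summaries above, the choice of root can be checked programmatically (a sketch under our own naming; entropy and gain are restated so the snippet is self-contained):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(parent, subsets):
    total = sum(parent)
    return entropy(parent) - sum(sum(s) / total * entropy(s) for s in subsets)

S = [5, 3, 6]  # [Low, Medium, High] counts for the 14 examples
splits = {
    "Credit History": [[0, 1, 3], [3, 1, 1], [2, 1, 2]],   # B, G, U
    "Debt":           [[3, 2, 2], [2, 1, 4]],              # L, H
    "Collateral":     [[2, 1, 0], [3, 2, 6]],              # Y, N
    "Income":         [[0, 0, 4], [0, 2, 2], [5, 1, 0]],   # 15K, 15K-35K, 35K
}
gains = {attr: gain(S, subs) for attr, subs in splits.items()}
root = max(gains, key=gains.get)
```

Running this selects "Income" as the root, with the gains ordered Income > Credit History > Collateral > Debt, in agreement with the hand calculations.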
[Decision tree: root node Income, with branches <R15K, R15K-R35K and >R35K.]
The ID3 algorithm now performs a recursion with the three subsets of our examples, based on the
three possible values of the Income attribute (<R15K, R15K-R35K, >R35K).
The first subset corresponds to Income = <R15K. We notice that the target attribute is the same for
all four of these examples, i.e. all four examples produce the same target value, High. We have our
first leaf node, labelled High.
[Decision tree: Income at the root; the <R15K branch ends in the leaf High.]
Our subset collection S15K−35K = {2, 3, 12, 14} = [2M, 2H] of 4 examples has three meaningful
attributes (Credit History, Debt, Collateral) and a binary classification (Medium, High).
We start by calculating the Entropy for this subset:
Entropy(S15K−35K) = − Σ_{i=1}^{2} pi log2 pi
                  = −(2/4)log2(2/4) − (2/4)log2(2/4)
                  = 0.5 + 0.5
                  = 1
Calculate the information gain for the attributes of the subset (entropy for each attribute is calculated
in the same fashion as above).
Credit History:
Gain(S15K−35K, CH) = Entropy(S15K−35K) − (1/4)Entropy(SB) − (1/4)Entropy(SG) − (2/4)Entropy(SU)
                   = 1 − (1/4)(0) − (1/4)(0) − (2/4)(1)
                   = 0.5
Debt:
Gain(S15K−35K, D) = Entropy(S15K−35K) − (1/4)Entropy(SL) − (3/4)Entropy(SH)
                  = 1 − (1/4)(0) − (3/4)(0.918)
                  = 0.312
Collateral:
Gain(S15K−35K, C) = Entropy(S15K−35K) − (0/4)Entropy(SY) − (4/4)Entropy(SN)
                  = 1 − (0/4)(0) − (4/4)(1)
                  = 0
Attribute Credit History provides the highest Information Gain in this subset (0.5). Thus it becomes
our next decision node.
[Decision tree: Income at the root; <R15K → High; the R15K-R35K branch leads to a Credit History
node with branches Bad, Good and Unknown.]
Our subset collection of 6 examples, corresponding to Income = >R35K, has three meaningful
attributes (Credit History, Debt, Collateral) and a binary classification (Low, Medium):
Entropy(S35K) = − Σ_{i=1}^{2} pi log2 pi
              = −(5/6)log2(5/6) − (1/6)log2(1/6)
              = 0.65
We now proceed to calculate the information gain for each of the three attributes in the subset:
Credit History:
Values(CH) = B, G, U
S35K = [5L , 1M ]
SB ← [0L , 1M ]
SG ← [3L , 0M ]
SU ← [2L , 0M ]
Entropy(SB) = −(0/1)log2(0/1) − (1/1)log2(1/1)
            = 0
Entropy(SG) = −(3/3)log2(3/3) − (0/3)log2(0/3)
            = 0
Entropy(SU) = −(2/2)log2(2/2) − (0/2)log2(0/2)
            = 0

Gain(S35K, CH) = Entropy(S35K) − (1/6)Entropy(SB) − (3/6)Entropy(SG) − (2/6)Entropy(SU)
               = 0.65 − (1/6)(0) − (3/6)(0) − (2/6)(0)
               = 0.65
Debt:
Values(D) = L, H
S35K = [5L, 1M]
SL ← [3L, 1M]
SH ← [2L, 0M]
Entropy(SL) = −(3/4)log2(3/4) − (1/4)log2(1/4)
            = 0.811
Entropy(SH) = −(2/2)log2(2/2) − (0/2)log2(0/2)
            = 0

Gain(S35K, D) = Entropy(S35K) − (4/6)Entropy(SL) − (2/6)Entropy(SH)
              = 0.65 − (4/6)(0.811) − (2/6)(0)
              = 0.109
Collateral:
Values(C) = Y, N
S35K = [5L, 1M]
SY ← [2L, 1M]
SN ← [3L, 0M]
Entropy(SY) = −(2/3)log2(2/3) − (1/3)log2(1/3)
            = 0.918
Entropy(SN) = −(3/3)log2(3/3) − (0/3)log2(0/3)
            = 0

Gain(S35K, C) = Entropy(S35K) − (3/6)Entropy(SY) − (3/6)Entropy(SN)
              = 0.65 − (3/6)(0.918) − (3/6)(0)
              = 0.191
In summary: Gain(S35K, CH) = 0.65, Gain(S35K, D) = 0.109, and Gain(S35K, C) = 0.191.
Attribute Credit History provides the highest information gain in this subset and becomes the
decision node on this branch of the decision tree.
[Decision tree: Income at the root; <R15K → High; Credit History nodes on both the R15K-R35K
and >R35K branches.]
The algorithm continues the recursion at the next level. The subset corresponding to Income =
R15K-R35K and Credit History = Bad contains a single example (14).
[Decision tree updated: on the R15K-R35K branch, Credit History = Bad becomes a leaf labelled High.]
The subset corresponding to Income = R15K-R35K and Credit History = Good also contains a single
example (12).
[Decision tree updated: on the R15K-R35K branch, Credit History = Good becomes a leaf labelled
Medium.]
The subset corresponding to Income = R15K-R35K and Credit History = Unknown contains two
examples (2, 3). This subset collection of 2 examples has two meaningful attributes (Debt,
Collateral) and a binary classification (Medium, High; shortened for clarity to M, H):
Entropy(SUnknown) = −(1/2)log2(1/2) − (1/2)log2(1/2)
                  = 1
Debt:
Values(D) = L, H
SUnknown = [1M, 1H]
SL ← [1M, 0H]
SH ← [0M, 1H]
Entropy(SL) = −(1/1)log2(1/1) − (0/1)log2(0/1)
            = 0
Entropy(SH) = −(0/1)log2(0/1) − (1/1)log2(1/1)
            = 0

Gain(SUnknown, D) = Entropy(SUnknown) − (1/2)Entropy(SL) − (1/2)Entropy(SH)
                  = 1 − (1/2)(0) − (1/2)(0)
                  = 1
Collateral:
Values(C) = N
SUnknown = [1M, 1H]
SN ← [1M, 1H]
Entropy(SN) = −(1/2)log2(1/2) − (1/2)log2(1/2)
            = 1

Gain(SUnknown, C) = Entropy(SUnknown) − (2/2)Entropy(SN)
                  = 1 − (2/2)(1)
                  = 0
In summary:
Gain(SUnknown, D) = 1
Gain(SUnknown, C) = 0
Debt gives us perfect information gain, and thus becomes the next decision node.
[Decision tree updated: a Debt node is added under Credit History = Unknown on the R15K-R35K
branch.]
We return to Credit History (we first finish all the nodes on the same level). The subset
corresponding to Income = >R35K and Credit History = Bad contains only one example (8).
[Decision tree updated: on the >R35K branch, Credit History = Bad becomes a leaf labelled Medium.]
The subset corresponding to Income = >R35K and Credit History = Good contains three examples
(9, 10, 13). This subset of 3 examples has two meaningful attributes (Debt, Collateral) and a
single classification (Low). Hence we again have a leaf node, Low.
[Decision tree updated: on the >R35K branch, Credit History = Good becomes a leaf labelled Low.]
The subset corresponding to Income = >R35K and Credit History = Unknown contains 2 examples
(5, 6). This subset of 2 examples has two meaningful attributes (Debt, Collateral) and a single
classification (Low). Hence we again have a leaf node, Low.
[Decision tree updated: on the >R35K branch, Credit History = Unknown becomes a leaf labelled Low.]
We can now do the last set of calculations. With Debt = High we have only one example left (2),
which makes this a leaf node (High). Similarly, when Debt = Low we have one example left (3),
creating the final leaf node (Medium). These last two steps complete our decision tree (note that
Collateral plays no role in the decisions made by this tree).
[Final decision tree:
Income = <R15K → High
Income = R15K-R35K → Credit History: Bad → High; Good → Medium; Unknown → Debt: High → High, Low → Medium
Income = >R35K → Credit History: Bad → Medium; Good → Low; Unknown → Low]
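The finished tree can also be written down as a nested structure with a small classifier. This is our own reconstruction from the branches derived above (the attribute and value spellings are our assumptions), not code from the study material:

```python
# Reconstructed final tree; Collateral never appears, as noted above.
TREE = ("Income", {
    "<R15K":     "High",
    "R15K-R35K": ("Credit History", {
        "Bad":     "High",
        "Good":    "Medium",
        "Unknown": ("Debt", {"High": "High", "Low": "Medium"}),
    }),
    ">R35K":     ("Credit History", {
        "Bad":     "Medium",
        "Good":    "Low",
        "Unknown": "Low",
    }),
})

def classify(applicant, tree=TREE):
    """Walk the tree using the applicant's attribute values; leaves are risk labels."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[applicant[attribute]]
    return tree
```

For example, classify({"Income": "<R15K"}) returns "High", and classify({"Income": ">R35K", "Credit History": "Good"}) returns "Low".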
Copyright © UNISA 2018 (v2018.2.1)