AiML 4 2 SG
Objectives
• This lesson covers the following objectives:
−Define information entropy
−Understand variance
−Calculate information entropy
−Understand information entropy
AiML 4-2
Information Entropy Copyright © 2020, Oracle and/or its affiliates. All rights reserved.
Information Entropy
• Information entropy is a concept introduced by the
mathematician Claude Shannon in 1948
• The idea originated from the concept of
entropy (disorder) in statistical thermodynamics, and
refers to uncertainty in data
• Data with a high level of uncertainty (randomness)
carries more usable information when observed
Information Entropy
• Example:
−If given new data about a topic, then there is new
information
−Something is learned, so that information has high
entropy
−If given already-known data, then no learning takes place
−That information has low (or zero) entropy
Information Entropy
• Information entropy lets us quantify which split in a
decision tree is best by measuring the uncertainty
(variance) in the data
• The formulas used in the following examples look
complex, but they do not require knowledge of
high-level mathematics
Variance
• Variance measures how far a data set is spread out
• The technical definition is
−“The average of the squared differences from the mean”
• Variance will not be derived formally here; instead, a
few example data sets will be examined
• There is also an online calculator available:
−http://www.alcula.com/calculators/statistics/variance/
Variance
• Examples
−5,5,5,5,5 has a variance of 0, which means all the numbers
are the same. They do not vary
−5,5,5,5,6 has a variance of 0.16, which means the numbers
are very close. They vary little
−5,5,5,5,2000 has a variance of 636804, which means there is
a much larger spread in the numbers
• If the variance is high, we cannot predict a typical data
value with much confidence, but if the variance is
narrow we can be more confident about the data values
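The three examples above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; it computes the population variance, i.e. the "average of the squared differences from the mean" quoted earlier.

```python
def variance(data):
    """Population variance: the average of the squared differences from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

print(variance([5, 5, 5, 5, 5]))     # 0.0      -- the numbers do not vary
print(variance([5, 5, 5, 5, 6]))     # ≈ 0.16   -- the numbers vary little
print(variance([5, 5, 5, 5, 2000]))  # 636804.0 -- a much larger spread
```

The online calculator linked above uses the same formula, so the results should agree.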
Information Entropy
• Information entropy is measured as:
−Entropy(S) = −Σ_{x=1}^{n} p(x) · log2 p(x)
• This reads as:
−Entropy equals the negative sum, over outcomes x = 1 to n,
of p of x multiplied by log base 2 of p of x
• This entropy equation returns how much information to
expect from some action
• The higher the number, the more information we
obtain
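The formula can be written almost directly in Python. A minimal sketch, using the standard library's `math.log2` and the usual convention that terms with p(x) = 0 contribute nothing (0 · log2 0 is taken as 0):

```python
import math

def entropy(probabilities):
    """Shannon entropy: the sum of -p(x) * log2 p(x) over all outcomes x."""
    # Skip zero-probability outcomes, following the convention 0 * log2(0) = 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1.0]))       # 0.0 -- one certain outcome, no information
print(entropy([0.5, 0.5]))  # 1.0 -- a fair coin, maximum for two outcomes
```

The examples that follow plug specific probabilities into exactly this sum.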
Entropy Example
• Consider this example:
−There is a bag of 10 green marbles
−Calculate the entropy of selecting a green marble from the
bag
−p(x) is the probability of picking a green marble out of the
total number of marbles
−In this example there are only green marbles, so the
probability is 10/10 = 1
• Use p(x) = 1 in the entropy equation
Entropy Example
• Entropy(S) = −Σ_{x=1}^{n} p(x) · log2 p(x)
• In this example n = 1, as there is one color of marble
−Entropy(S) = −p(x) · log2 p(x)
−Entropy(S) = −1 · log2(1)
−Entropy(S) = −1 · 0
−Entropy(S) = 0
• p(x) in this example = 1
Entropy Example
• Entropy(S) = 0
• We obtain an entropy of 0
• Zero entropy means no information is gained, which
would make this a poor training set
Entropy Example 2
• There is a bag of 5 green marbles and 5 red marbles
• Entropy(S) = −Σ_{x=1}^{n} p(x) · log2 p(x)
• In this example, given two types of marbles, n = 2
• The sum therefore contains two terms, one for each
color of marble
• Writing out the sum for two outcomes (+ and −) gives:
Entropy Example 2
• Entropy(S) = −p(+) · log2 p(+) − p(−) · log2 p(−)
Entropy Example 2
• Entropy(S) = −p(green) · log2 p(green) − p(red) · log2 p(red)
• Entropy(S) = −(5/10) · log2(5/10) − (5/10) · log2(5/10)
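Plugging the numbers in confirms the result. A minimal sketch using Python's `math.log2`:

```python
import math

# Entropy of a bag with 5 green and 5 red marbles (10 total).
p_green = 5 / 10
p_red = 5 / 10
entropy_s = -p_green * math.log2(p_green) - p_red * math.log2(p_red)
print(entropy_s)  # 1.0 -- a 50/50 bag has maximum entropy for two outcomes
```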
Calculating Logarithms
• A calculator may be able to do the following
calculation directly:
−log2(5/10)
• Another option is the change-of-base formula:
−log2(5/10) = log(5/10) / log(2)
• The result is log2(5/10) = −1, so each
(5/10) · log2(5/10) term equals −0.5
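The change-of-base identity above can be checked with any common-log or natural-log function. A sketch in Python:

```python
import math

# Change of base: log2(x) = log(x) / log(2); any log base works for the ratio.
value = math.log(5 / 10) / math.log(2)
print(value)             # -1.0
print((5 / 10) * value)  # -0.5, one p(x) * log2 p(x) term of the entropy sum
```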
Entropy Example 3
• Flipping a coin gives 2 outcomes
• Call these heads or tails
• This is referred to as binary output, because it has 2
states
• The probability of heads or tails is 0.5 each
• Entropy(S) = −(1/2) · log2(1/2) − (1/2) · log2(1/2) = 1
• The result is entropy = 1
Entropy Example 3
• What if there is a coin with 2 heads?
• The probability of heads is 1, and entropy = 0
• If the coin has 2 tails, the probability of heads = 0, and
entropy = 0
• Consider coins that are weighted to give a 10% chance
of heads, then a 20% chance, etc.
• The following graph shows entropy values of these
probabilities
Entropy of Coin Toss with Weighted Coins
[Graph: "Entropy On Coin Toss" — entropy (y-axis) plotted against
probability of heads (x-axis); entropy is 0 at probabilities 0 and 1
and peaks at 1 for a fair coin (probability 0.5)]
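The entropy values for these weighted coins can be computed directly. A minimal sketch, assuming the usual convention that 0 · log2 0 = 0:

```python
import math

def coin_entropy(p_heads):
    """Entropy of a coin toss with the given probability of heads."""
    total = 0.0
    for p in (p_heads, 1 - p_heads):
        if p > 0:  # skip zero-probability outcomes: 0 * log2(0) is taken as 0
            total -= p * math.log2(p)
    return total

# Sweep the weightings described above: 0%, 10%, 20%, ... chance of heads.
for p in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    print(f"p(heads) = {p:.1f}  entropy = {coin_entropy(p):.3f}")
```

The printed values rise from 0 toward 1, tracing out the left half of the curve; by symmetry the right half (0.6 through 1.0) mirrors it.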
Entropy of Coin Toss
• So what can be deduced from the coin toss?
• If there is a 2-sided coin with the same side on both,
then the entropy = 0 because the result is already
known
• There is no information to be gained from this
• The highest entropy is 1, when there is a fair, 2-sided
coin
• In that case, each toss yields a full bit of new
information
How Much Information
• In the coin toss, there is a maximum of 1 bit of
information
• In other examples this can be greater than 1
• In the coin toss, there are 2 equally likely options, so
the maximum information is log2(2) = 1 bit
Bits of Information
• With a six-sided die, there are 6 possible results, so
the maximum information is log2(6) ≈ 2.585 bits
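As a sketch of why the die carries more information than the coin: for n equally likely outcomes, the maximum information is log2(n) bits.

```python
import math

# Maximum information for n equally likely outcomes is log2(n) bits.
print(math.log2(2))  # 1.0 -- a fair coin toss carries 1 bit
print(math.log2(6))  # about 2.585 -- a fair six-sided die carries more than 2 bits
```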
Next
• In the next lesson we work through a full example of
the ID3 algorithm
• Make yourself comfortable!
Summary
• In this lesson, you should have learned how to:
−Define information entropy
−Understand variance
−Calculate information entropy
−Understand information entropy