Information Theory - Shannon's Entropy
This function takes a set of values, or *messages*, `X` and produces the entropy of
that set. To do this, it needs to know the probability of each message in `X`
occurring; this information is provided by the `p` function.
The values of `p` must sum to `1` over all the possible messages.
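A minimal sketch of such an entropy function, assuming Julia-style syntax, the name `H` (not given above), and the natural logarithm (so entropy is measured in nats):

```
# Shannon entropy of the messages in X under the probability function p.
# Uses the natural log, so the result is in nats.
H(p, X) = -sum(x -> p(x) * log(p(x)), X)
```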
```
data = [:A, :B, :C]

# Maximum entropy: every message is equally likely
pmax(x) = 1/3
sum(pmax, data)  # => 1.0
```
We can now calculate the entropy of the data under different probability assignments
for the messages:
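For example (a sketch: the name `p60` is assumed, and the `60/20/20` split is simply one way to make `B` the most likely message):

```
# Uniform probabilities: maximum uncertainty
H(pmax, data)   # => ~1.0986, i.e. log(3)

# Skewed probabilities: :B is now the most likely message (60%)
p60(x) = x == :B ? 0.6 : 0.2
sum(p60, data)  # => 1.0
H(p60, data)    # => ~0.95
```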
When the messages are most uncertain - they each have the same probability - the
entropy is maxed out at about `1.1` (the exact value is `log(3) ≈ 1.0986` nats, since
there are three equally likely messages). If we make one message - `B`, say - most
probable with a `60%` likelihood, then the entropy is only about `0.95`.
If we designate one of the messages as very likely to occur, then the entropy should
go down, since there is very little uncertainty.
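As a sketch, suppose `B` accounts for `9,999` out of every `10,000` messages and the leftover probability is split evenly between `A` and `C` (the name `pcertain` and the exact split of the remaining mass are assumptions):

```
# :B is almost certain; :A and :C share the leftover probability
pcertain(x) = x == :B ? 9_999/10_000 : 1/20_000
sum(pcertain, data)  # => ~1.0
H(pcertain, data)    # => ~0.0011, almost no uncertainty left
```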
Yep! `H` is very small now because `9,999` out of `10,000` messages will be `B` - the
world is certain!