
### Shannon's Entropy

The entropy of a variable is a measure of the uncertainty of its value.

H(X) = -sum(p(x) * ln(p(x)), x in X)

This function takes a set of values, or *messages*, `X` and produces the entropy of
that set. To do this, it needs to know the probability of each message in `X`
occurring; this information is provided by the `p` function.
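
To make this concrete, here is a minimal Python sketch of `H`, assuming the natural
logarithm used in the formula above; the names `entropy` and `p` are just
illustrative.

```python
import math

def entropy(data, p):
    """Shannon entropy (in nats) of a set of messages,
    given a probability function p that sums to 1 over the set."""
    return -sum(p(x) * math.log(p(x)) for x in data)
```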

The values of `p` must sum to `1` across all the possible messages.

data = [A, B, C]

# Maximum entropy
pmax(x) = 1/3
sum(pmax, data) => 1

# NOT a valid p function because it sums to 0.75, not 1
pbad(x) = 1/4
sum(pbad, data) => 0.75

# A modeled p function in which B is the most likely message
pmodel(x) = if x == B then 0.6 else 0.2
sum(pmodel, data) => 1

We can now calculate the entropy of the data given the various probabilities of
messages:

H(data, p = pmax) => 1.09861

H(data, p = pmodel) => 0.95027
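
The same numbers fall out of the Python sketch above (here `data`, `pmax`, and
`pmodel` are spelled out as ordinary Python values):

```python
data = ["A", "B", "C"]

pmax = lambda x: 1 / 3                       # uniform: every message equally likely
pmodel = lambda x: 0.6 if x == "B" else 0.2  # B is the most likely message

print(entropy(data, pmax))    # ~1.09861, i.e. ln(3)
print(entropy(data, pmodel))  # ~0.95027
```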

When the messages are most uncertain - they each have the same probability - the
entropy is maxed out at `ln(3) ≈ 1.1` (for `n` equally likely messages the maximum is
`ln(n)`). If we make the message `B` most probable - `60%` likelihood - then the
entropy drops to about `0.95`.

If we designate one of the messages as very likely to occur then the entropy should
go down since there is very little uncertainty.

# A p function in which B is almost certain
pcertain(x) = if x == B then 0.9999 else 0.00005
sum(pcertain, data) => 1

H(data, p = pcertain) => 0.00109

Yep! H is very small now because `9,999` out of `10,000` messages will be `B` - the
world is certain!
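
The near-certain case in the same Python sketch:

```python
# B occurs 9,999 times out of 10,000; A and C split the leftover probability
pcertain = lambda x: 0.9999 if x == "B" else 0.00005

print(entropy(data, pcertain))  # ~0.00109, almost no uncertainty left
```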
