Note: random variables are denoted with capital letters, so when you see, say, P(X=x) that
means, “the probability that a random variable, (capital) X, is equal to some value, (lowercase)
x.”
Example: a family has two children, each independently equally likely to be a boy or a girl. Let X be the number of boys.

Outcomes:

First child    Second child    X = number of boys
Boy            Boy             2
Boy            Girl            1
Girl           Boy             1
Girl           Girl            0

x    P(X=x)
0    1/4
1    1/2
2    1/4
Expected value: E(X) = 0·(1/4) + 1·(1/2) + 2·(1/4) = 1
Variance: Var(X) = E(X²) − E(X)² = (0²·(1/4) + 1²·(1/2) + 2²·(1/4)) − 1² = 3/2 − 1 = 1/2
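As a quick sanity check, here is a minimal Python sketch (my addition, not from the original notes) that computes the expectation and variance directly from the PMF table above:

# PMF of X = number of boys in a two-child family
pmf = {0: 1/4, 1: 1/2, 2: 1/4}

expectation = sum(x * p for x, p in pmf.items())
second_moment = sum(x**2 * p for x, p in pmf.items())
variance = second_moment - expectation**2

print(expectation)  # 1.0
print(variance)     # 0.5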
Outcomes (three tosses of a fair coin):

First toss    Second toss    Third toss    X = number of heads
H             H              H             3
H             H              T             2
H             T              H             2
H             T              T             1
T             H              H             2
T             H              T             1
T             T              H             1
T             T              T             0

x    P(X=x)
0    1/8
1    3/8
2    3/8
3    1/8

Expected value: E(X) = 0·(1/8) + 1·(3/8) + 2·(3/8) + 3·(1/8) = 3/2
Variance: Var(X) = E(X²) − E(X)² = (0²·(1/8) + 1²·(3/8) + 2²·(3/8) + 3²·(1/8)) − (3/2)² = 3 − 9/4 = 3/4
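For larger examples it's easier to build the PMF by enumerating outcomes programmatically. A small sketch (my addition, assuming the fair coin above):

from itertools import product
from collections import Counter

# Enumerate all 8 equally likely sequences of three fair-coin tosses
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)

# PMF: each outcome has probability 1/8
pmf = {x: c / len(outcomes) for x, c in counts.items()}
print(pmf)  # {3: 0.125, 2: 0.375, 1: 0.375, 0: 0.125}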
Formulas for other sorts of ranges follow straightforwardly from the definition of the CDF:

P(X > a) = 1 − F_X(a)
P(a < X < b) = F_X(b) − F_X(a)

In other words, the probability that a continuous random variable lies in a range, a < X < b, is equal to the area under the curve of the PDF between a and b:

P(a < X < b) = ∫_a^b f_X(x) dx
Note: Non-strict inequality (≤, ≥) and strict inequality (<, >) are interchangeable here. (We are taking an integral, so we are concerned with the area under a curve, and adding or removing a single point doesn't affect the area.)
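As an illustration (my sketch, using the Unif(0,1) example discussed next), numerically integrating the PDF over (a, b) matches F_X(b) − F_X(a):

# Unif(0,1): f(x) = 1 on [0, 1], F(x) = x there
def pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def cdf(x):
    return min(max(x, 0.0), 1.0)

a, b, n = 0.2, 0.7, 10_000
width = (b - a) / n
# Midpoint-rule approximation of the area under the PDF between a and b
area = sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width

print(area)             # ~ 0.5
print(cdf(b) - cdf(a))  # 0.5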
Example: suppose we choose a number uniformly at random between 0 and 1, and let X be the number chosen. The PDF is

f_X(x) = 1 for 0 ≤ x ≤ 1

and

f_X(x) = 0 otherwise

Note that here the intuitive concept of all the numbers being equally likely to be chosen corresponds neatly to all of them having equal likelihood as given by the PDF.
(Graph of PDF)
Computing the CDF gives

F_X(x) = 0 for x < 0
F_X(x) = x for 0 ≤ x ≤ 1
F_X(x) = 1 for x > 1

For example, F_X(−2) = P(X < −2) = 0. This is intuitively obvious because if we're choosing a number between 0 and 1, we'll never choose a number less than −2.

Similarly, F_X(5) = P(X < 5) = 1. If we're choosing a number between 0 and 1, then X < 5 covers all possible choices, so it makes sense that X < 5 has probability 1.
Note: It's not a coincidence that we got F_X(x) = 1 for x > 1; our PDF was defined so that this would happen. If we were picking numbers from a different range, we would have to define our PDF differently. As we saw previously, the area under the PDF corresponds to the probability that X will be in a given range. Therefore the total area under the PDF must equal 1, because X must be some number.
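A quick empirical check (my sketch): sampling uniform numbers and measuring the fraction that fall below a threshold should approximate F_X(x) = x:

import random

random.seed(0)
samples = [random.random() for _ in range(100_000)]  # Unif(0,1) draws

for x in (0.25, 0.5, 0.9):
    frac = sum(s < x for s in samples) / len(samples)
    print(x, frac)  # frac ~ x, matching F_X(x) = x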
Stated more formally, for all PDFs we must have

∫_{−∞}^{∞} f_X(x) dx = 1
Expectation: E(X) = ∫_{−∞}^{∞} x·f_X(x) dx
Variance: Var(X) = E(X²) − E(X)² = ∫_{−∞}^{∞} x²·f_X(x) dx − E(X)²
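For the Unif(0,1) example above these integrals give E(X) = 1/2 and Var(X) = 1/12; a small numeric-integration sketch (my addition) confirms this:

n = 100_000
width = 1.0 / n
xs = [(i + 0.5) * width for i in range(n)]  # midpoints on [0, 1]

# f_X(x) = 1 on [0, 1], so the integrals reduce to sums of x and x**2 times width
e_x = sum(x * width for x in xs)
e_x2 = sum(x**2 * width for x in xs)

print(e_x)            # ~ 0.5
print(e_x2 - e_x**2)  # ~ 1/12 ~ 0.0833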
Two random variables X and Y are independent if P(X = x and Y = y) = P(X = x)·P(Y = y) for all values x and y. Intuitively, this is the idea that the value of one random variable doesn't affect the value of the other.
For example, if we tossed a biased coin 10 times, and the probability of getting heads on this coin was ⅓, we would say the number of heads follows Bi(10, ⅓).
(For worked out examples, see the section on discrete random variables)
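The general Bi(n, p) PMF is P(X = k) = C(n, k)·p^k·(1 − p)^(n − k). A short sketch (my addition) for the biased-coin example above:

from math import comb

n, p = 10, 1/3  # ten tosses, P(heads) = 1/3

def binomial_pmf(k):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(3))                                 # the most likely count of heads
print(sum(binomial_pmf(k) for k in range(n + 1)))      # ~ 1.0, as any PMF must
print(sum(k * binomial_pmf(k) for k in range(n + 1)))  # E(X) = n*p ~ 3.33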
Also, if we have two independent random variables, X1 ~ Pois(λ1) and X2 ~ Pois(λ2), then (X1 + X2) ~ Pois(λ1 + λ2).
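This can be checked directly by convolving the two PMFs. A sketch (my addition; the rates 2.0 and 3.0 are arbitrary example values):

from math import exp, factorial

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2 = 2.0, 3.0  # hypothetical example rates
k = 4

# P(X1 + X2 = k) as a convolution of the two PMFs
convolved = sum(pois_pmf(j, lam1) * pois_pmf(k - j, lam2) for j in range(k + 1))

print(convolved)                 # ~ 0.1755
print(pois_pmf(k, lam1 + lam2))  # same value: Pois(lam1 + lam2) at k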
The Normal Distribution, X ~ N(μ, σ²)

PDF:

f_X(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
Note: exp(x) is the same as e^x; exp(x) is just easier to read when the exponent starts getting complicated.
(Graph of PDF, note that it is symmetric around x=μ)
CDF:

F_X(x) = ∫_{−∞}^{x} (1 / (σ√(2π))) · exp(−(t − μ)² / (2σ²)) dt

Note: this integral can't be evaluated using elementary operations; see the section on the standard normal distribution. It does, however, have the following properties:

P(μ − σ < X < μ + σ) ≈ 0.6827 (one sigma)
P(μ − 2σ < X < μ + 2σ) ≈ 0.9545 (two sigma)
P(μ − 3σ < X < μ + 3σ) ≈ 0.9973 (three sigma)
Expectation: E(X) = μ (given as a parameter)
Variance: Var(X) = σ² (given as a parameter)
To work with the CDF we standardize: if X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1), so P(X < x) = Φ((x − μ)/σ), and we use a table to look up the appropriate value of Φ to get our answer.
Example:

Let's say we have a random variable X ~ N(10, 4) and we want to calculate the probability P(X < 12). First, we have to define our transformation:

Z = (X − 10)/2, since σ = √4 = 2

Then P(X < 12) = P(Z < (12 − 10)/2) = P(Z < 1) = Φ(1) ≈ 0.8413.
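The standard normal CDF can also be evaluated with the error function, Φ(z) = (1 + erf(z/√2))/2. A quick check of the example (my sketch):

from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 10.0, 2.0  # X ~ N(10, 4), so sigma = 2
z = (12.0 - mu) / sigma

print(phi(z))  # ~ 0.8413, i.e. P(X < 12)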
The central limit theorem states that given a set of n i.i.d. random variables, {X1, X2, …, Xn}, each with the same distribution as some X with E(X) = μ and Var(X) = σ², then as n approaches infinity we have

(X1 + X2 + … + Xn − nμ) / (σ√n) → N(0, 1)

That is, the standardized sum converges in distribution to a standard normal, regardless of the distribution of X.
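A minimal simulation sketch (my addition): standardized sums of many Unif(0,1) draws should have mean ≈ 0 and standard deviation ≈ 1, as N(0, 1) predicts:

import random
from statistics import mean, stdev

random.seed(1)
n, trials = 1_000, 2_000
mu, sigma = 0.5, (1 / 12) ** 0.5  # mean and std dev of Unif(0,1)

# Standardized sums of n i.i.d. Unif(0,1) variables
zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * n**0.5)
      for _ in range(trials)]

print(mean(zs))   # ~ 0
print(stdev(zs))  # ~ 1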
Markov Inequality

For a non-negative random variable, X ≥ 0, we can use the following inequality:

P(X ≥ t) ≤ E(X)/t ∀t > 0

Note: the upside-down A here simply reads as 'for all.' In other words, this is true for all t > 0.
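The proof is short; as a sketch in LaTeX (my addition), for a continuous non-negative X:

\mathbb{E}(X) = \int_0^\infty x\, f_X(x)\, dx
            \ge \int_t^\infty x\, f_X(x)\, dx
            \ge \int_t^\infty t\, f_X(x)\, dx
            = t\, P(X \ge t)

Dividing both sides by t gives the inequality.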
Chebyshev Inequality

For any random variable X with finite mean and variance:

P(|X − E(X)| ≥ t) ≤ Var(X)/t² ∀t > 0
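This follows from applying the Markov inequality to the non-negative random variable (X − E(X))²; sketched in LaTeX (my addition):

P(|X - \mathbb{E}(X)| \ge t) = P\big((X - \mathbb{E}(X))^2 \ge t^2\big)
                           \le \frac{\mathbb{E}\big((X - \mathbb{E}(X))^2\big)}{t^2}
                           = \frac{\mathrm{Var}(X)}{t^2}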
Examples
Direct calculation
We have X ~ Bi(10, ⅓), and X must be an integer with 0 ≤ X ≤ 10, so we can calculate the probability of any event directly by summing the relevant terms of the PMF.
That's quite a bit of math, even with a relatively small set of possible values. It would be impractical to calculate the exact value if there were a very large number of possible values, which is why we have approximation/bounding techniques like the following. Finally, summing all of these terms gives the exact answer.
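The exact numeric answer in the original isn't recoverable here, but as an illustrative sketch (my addition, using a hypothetical threshold of 5 heads), the direct calculation looks like this:

from math import comb

n, p, t = 10, 1/3, 5  # t = 5 is an assumed example threshold

# P(X >= t) summed term by term from the binomial PMF
exact_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))
print(exact_tail)  # ~ 0.2131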
Markov Inequality

X ≥ 0 (we can never have a negative number of heads), so we can use the Markov inequality to give us an upper bound here, with E(X) = np = 10/3.
Chebyshev Inequality

For the Chebyshev inequality we also need the variance: Var(X) = np(1 − p) = 10·(⅓)·(⅔) = 20/9.
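Continuing the hypothetical threshold of 5 from the sketch above (my addition), the two bounds compare to the exact tail like this:

n, p, t = 10, 1/3, 5  # threshold t = 5 is an assumed example value
ex, var = n * p, n * p * (1 - p)  # E(X) = 10/3, Var(X) = 20/9

markov = ex / t  # P(X >= 5) <= E(X)/5
# P(X >= 5) <= P(|X - E(X)| >= 5 - E(X)) <= Var(X)/(5 - E(X))^2
chebyshev = var / (t - ex) ** 2

print(markov)     # ~ 0.667
print(chebyshev)  # ~ 0.8

Both are valid upper bounds on the exact tail (≈ 0.2131), just loose ones; that looseness is the price of needing only the mean (Markov) or the mean and variance (Chebyshev).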
Example 2: Testing algorithm runtime

Let's say that we have an algorithm with runtime T. We know the variance, but we don't know the mean. We want to perform a series of sample runs, {T1, T2, …, Tn}, to approximate the mean, but we want to know how many runs we should perform to be reasonably sure we have a good approximation. How many runs do we need for there to be a 95% chance that the population mean is within 0.5 of the sample mean?
Given: Var(T) = σ² is known; E(T) = μ is unknown. Let T̄ = (T1 + T2 + … + Tn)/n be the sample mean, so E(T̄) = μ and Var(T̄) = σ²/n.
By using an equivalent expression for the range we can get the following: the population mean is within 0.5 of the sample mean exactly when |T̄ − E(T̄)| < 0.5, so we want

P(|T̄ − E(T̄)| ≥ 0.5) ≤ 0.05

So it is the right form for the Chebyshev inequality! T is a continuous random variable, so strict/non-strict inequality is irrelevant here. Applying Chebyshev with t = 0.5:

P(|T̄ − E(T̄)| ≥ 0.5) ≤ Var(T̄)/0.5² = (σ²/n)/0.25 = 4σ²/n
From there all we have to do is find an integer value for n that satisfies the following:

4σ²/n ≤ 0.05, i.e. n ≥ 80σ²
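As a closing sketch (my addition; the actual variance value from the original isn't recoverable, so σ² = 2.0 below is hypothetical):

from math import ceil

sigma_sq = 2.0           # hypothetical known variance of the runtime T
eps, alpha = 0.5, 0.05   # want P(|sample mean - mu| >= eps) <= alpha

# Chebyshev: sigma_sq / (n * eps**2) <= alpha  =>  n >= sigma_sq / (alpha * eps**2)
n = ceil(sigma_sq / (alpha * eps**2))
print(n)  # 160 runs for sigma_sq = 2.0 (matching n >= 80 * sigma_sq)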