
Probability Theory Review 2

Random Variables and Distributions

Random Variable (Discrete)


A random variable is a variable whose possible values are numerical outcomes of a random
phenomenon.

Note: random variables are denoted with capital letters, so when you see, say, P(X = x), that
means "the probability that a random variable, (capital) X, is equal to some value, (lowercase)
x."

Expected value (mean) of a discrete random variable

E(X) = Σ_x x · P(X = x), summing over all possible values x

Variance of a discrete random variable

Var(X) = Σ_x (x − E(X))² · P(X = x) = E(X²) − (E(X))²

Example 1: A couple with two kids (X = number of boys)


Outcomes:

First child    Second child    X = number of boys
Boy            Boy             2
Boy            Girl            1
Girl           Boy             1
Girl           Girl            0

x    P(X = x)
0    1/4
1    1/2
2    1/4
Expected value:

E(X) = 0 · (1/4) + 1 · (1/2) + 2 · (1/4) = 1

Variance:

Var(X) = E(X²) − (E(X))² = (0 · 1/4 + 1 · 1/2 + 4 · 1/4) − 1² = 1.5 − 1 = 0.5
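If you want to check this sort of calculation numerically, here is a minimal Python sketch (ours, not part of the original notes) that enumerates the four equally likely outcomes:

from itertools import product
from collections import Counter

# Enumerate the 4 equally likely orderings of two children.
outcomes = list(product(["Boy", "Girl"], repeat=2))
counts = Counter(kids.count("Boy") for kids in outcomes)
pmf = {x: n / len(outcomes) for x, n in counts.items()}   # {2: 0.25, 1: 0.5, 0: 0.25}

mean = sum(x * p for x, p in pmf.items())                 # 1.0
var = sum((x - mean) ** 2 * p for x, p in pmf.items())    # 0.5
print(pmf, mean, var)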

Example 2: 3 biased coin tosses (X = number of heads)


P(Heads) = 2/3
P(Tails) = 1/3

Outcomes:
First toss    Second toss    Third toss    X = number of heads
Heads         Heads          Heads         3
Heads         Heads          Tails         2
Heads         Tails          Heads         2
Tails         Heads          Heads         2
Heads         Tails          Tails         1
Tails         Heads          Tails         1
Tails         Tails          Heads         1
Tails         Tails          Tails         0

x    P(X = x)
0    (1/3)³ = 1/27
1    3 · (2/3) · (1/3)² = 6/27
2    3 · (2/3)² · (1/3) = 12/27
3    (2/3)³ = 8/27

Expected value:

E(X) = 0 · (1/27) + 1 · (6/27) + 2 · (12/27) + 3 · (8/27) = 54/27 = 2

Variance:

Var(X) = E(X²) − (E(X))² = (6/27 + 48/27 + 72/27) − 2² = 14/3 − 4 = 2/3
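The same check works for the biased coin by weighting each sequence of tosses by its probability. A minimal Python sketch (illustrative, not from the notes):

from itertools import product

p_heads, n = 2 / 3, 3
pmf = {k: 0.0 for k in range(n + 1)}
for tosses in product("HT", repeat=n):
    prob = 1.0
    for t in tosses:
        prob *= p_heads if t == "H" else 1 - p_heads
    pmf[tosses.count("H")] += prob   # add this sequence's probability to its head count

mean = sum(k * p for k, p in pmf.items())                 # 2.0
var = sum((k - mean) ** 2 * p for k, p in pmf.items())    # ~0.6667
print(pmf, mean, var)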

Random Variable (Continuous)


When a random variable is continuous, the methods we use for discrete cases break down. For
example, if we choose a number between 0 and 1 at random, P(X = x) will be 0 for all x because
there are an infinite number of possible choices. However, clearly some number must be
chosen. To deal with this we introduce the idea of likelihood, which is given by a probability
density function (PDF), f_X(x).

While we still can't say anything interesting about probability at a point, we can use the PDF to
talk about probability over a range. To do this, we take the following integral to get the
cumulative distribution function (CDF), F_X(x):

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} f_X(t) dt

Formulas for other sorts of ranges follow straightforwardly from the definition of the CDF:

P(a < X < b) = F_X(b) − F_X(a) = ∫_{a}^{b} f_X(t) dt

P(X > a) = 1 − F_X(a)

In other words, the probability that a continuous random variable lies in a range, a < X < b,
is equal to the area under the curve of the PDF between a and b.

Note: Non-strict inequality (≤, ≥) and strict inequality (<, >) are interchangeable here. (We are
taking an integral, so we are concerned with the area under a curve. Adding or removing a single
point doesn't affect the area.)

Expected value (mean) of a continuous random variable

E(X) = ∫_{−∞}^{∞} x · f_X(x) dx

Variance of a continuous random variable

Var(X) = ∫_{−∞}^{∞} (x − E(X))² · f_X(x) dx = E(X²) − (E(X))²

Example: Choosing a number at random between 0 and 1 (Uniform distribution)


Because our random variable, X, follows the uniform distribution (i.e. all values are equally likely
to be chosen), our PDF is

f_X(x) = 1 for 0 ≤ x ≤ 1

and

f_X(x) = 0 otherwise

Note that here the intuitive concept of all the numbers being equally likely to be chosen
corresponds neatly to all of them having equal likelihood as given by the PDF.
(Graph of PDF)

Integrating to get the CDF yields the following:

F_X(x) = 0 for x < 0
F_X(x) = x for 0 ≤ x ≤ 1
F_X(x) = 1 for x > 1

A couple examples of calculating probabilities using the CDF:

P(X < −2) = F_X(−2) = 0

This is intuitively obvious: if we're choosing a number between 0 and 1, we'll never
choose a number less than −2.

P(X < 5) = F_X(5) = 1

If we're choosing a number between 0 and 1, then X < 5 covers all possible choices, so it makes
sense that X < 5 has probability 1.
Note: It's not a coincidence that we got F_X(x) = 1 for x > 1; our PDF was defined so that this
would happen. If we were picking numbers from a different range we would have to define our
PDF differently. As we saw previously, the area under the PDF corresponds to the probability
that X will be in a given range. Therefore the total area under the PDF must equal 1, because X
must be some number.

Stated more formally, for all PDFs we must have

∫_{−∞}^{∞} f_X(x) dx = 1

Expectation:

E(X) = ∫_{0}^{1} x dx = 1/2

Variance:

Var(X) = ∫_{0}^{1} x² dx − (1/2)² = 1/3 − 1/4 = 1/12
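A quick Monte Carlo check of these values in Python (a sketch we added; random.random() draws from Uniform(0, 1)):

import random

# Draw many Uniform(0, 1) samples and compare against E(X) = 1/2, Var(X) = 1/12.
samples = [random.random() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)   # ~0.5 and ~0.0833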

Independent Random Variables


Two discrete random variables, X and Y, are defined to be independent if the following is true:

P(X = x, Y = y) = P(X = x) · P(Y = y) for all values x and y

Intuitively, this is the idea that the value of one random variable doesn't affect the value of the
other.

Properties of Expectation and Variance


(In the following equations, all k are constant values)

E(kX) = k · E(X)
E(X + k) = E(X) + k
E(X + Y) = E(X) + E(Y)
Var(kX) = k² · Var(X)
Var(X + k) = Var(X)

The following is only true if X and Y are independent:

E(XY) = E(X) · E(Y)
Var(X + Y) = Var(X) + Var(Y)
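These properties are easy to verify by simulation. A Python sketch (ours; the two fair dice are just an arbitrary choice of independent X and Y):

import random

n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]   # X: a fair die
ys = [random.randint(1, 6) for _ in range(n)]   # Y: an independent fair die

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# Linearity of expectation holds regardless; the last two checks rely on independence.
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))   # both ~7.0
print(var([x + y for x, y in zip(xs, ys)]), var(xs) + var(ys))      # both ~5.83
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))   # both ~12.25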


Binomial Distribution: X~Bi(n,p)
The binomial distribution is the distribution we've been using in our discrete cases thus far. It
describes the number of 'successes' in a series of independent trials that each have the same
two possible outcomes with fixed probabilities (such as coin flips). This distribution has two
parameters:

n - the number of trials


p - the probability of ‘success’

PMF: P(X = k) = C(n, k) · p^k · (1 − p)^(n−k), where C(n, k) counts the ways to choose which k
of the n trials succeed

For example, if we tossed a biased coin 10 times, and the probability of getting heads on
this coin was 1/3, we would say the number of heads followed Bi(10, 1/3)

(For worked out examples, see the section on discrete random variables)
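A minimal Python sketch of the PMF (our illustration, not from the notes; math.comb requires Python 3.8+):

from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(3, 10, 1 / 3))                          # ~0.2601, the most likely count
print(sum(binomial_pmf(k, 10, 1 / 3) for k in range(11)))  # 1.0: the PMF sums to 1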

Poisson Distribution: X~Pois(λ)


The Poisson distribution is used to describe how many times an event will occur in a given time
period when we know how many times it happens on average. For example, if we know that a
website gets an average of 3 views every minute, we could use the Poisson distribution to
determine the probability that it will get, say, 5 views, or 100 views in that amount of time.
The Poisson distribution takes a single parameter, λ, which is the average number of
occurrences in the given time frame.

PMF: P(X = k) = λ^k · e^(−λ) / k!

Also, if we have two independent random variables, X_1 ~ Pois(λ_1) and X_2 ~ Pois(λ_2), then
(X_1 + X_2) ~ Pois(λ_1 + λ_2)

Example: A website that gets 3 views every minute


Let’s say that we are running a website that gets on average 3 views every minute and we want
to use that information to get an idea of how likely we are to get other amounts of traffic. The
number of views per minute follows Pois(3), so from there we just plug whatever value we want
into the formula.
# of views, k P(X=k)

...

10
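The table values can be reproduced with a few lines of Python (a sketch we added, not from the notes):

from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)

for k in (0, 1, 2, 3, 10):
    print(k, poisson_pmf(k, 3))   # 0.0498, 0.1494, 0.2240, 0.2240, 0.0008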

Normal Distribution: X ~ N(μ, σ²)


The normal distribution is the familiar 'bell curve' distribution. It takes two parameters:
μ - the mean of the distribution
σ² - the variance of the distribution

PDF:

f_X(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

Note: exp(x) is the same as e^x; exp(x) is just easier to read when the exponent starts getting
complicated.
(Graph of PDF, note that it is symmetric around x=μ)

CDF:

F_X(x) = ∫_{−∞}^{x} (1 / (σ√(2π))) · exp(−(t − μ)² / (2σ²)) dt

Note: this integral can't be evaluated using elementary operations; see the section on
the standard normal distribution. It does however have the following properties:

P(μ − σ < X < μ + σ) ≈ 0.68 (one sigma)

P(μ − 2σ < X < μ + 2σ) ≈ 0.95 (two sigma)

P(μ − 3σ < X < μ + 3σ) ≈ 0.997 (three sigma)

Expectation:

E(X) = μ (given as parameter)

Variance:

Var(X) = σ² (given as parameter)

Standard Normal Distribution: N(μ = 0, σ² = 1)


As mentioned above, the CDF for the normal distribution is difficult to calculate. To get around
this, we transform the distribution we're working with to the standard normal distribution. From
there we can use a table to look up the value of the CDF for the standard normal distribution, ϕ.
To transform any normal distribution, N(μ, σ²), to the standard normal distribution, N(0, 1), we
use the following procedure:

Given a random variable, X ~ N(μ, σ²), we define Y such that

Y = (X − μ) / σ

To calculate P(X < k), we use the following equation:

P(X < k) = P(Y < (k − μ) / σ) = ϕ((k − μ) / σ)

And we use a table to look up the appropriate value for ϕ to get our answer.

Example:
Let's say we have a random variable X ~ N(10, 4) and we want to calculate the probability
P(X < 12). First, we have to define our transformation (here σ² = 4, so σ = 2):

Y = (X − 10) / 2

Then we find the corresponding value in the standard normal distribution:

P(X < 12) = P(Y < (12 − 10) / 2) = ϕ(1) ≈ 0.8413
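If no table is handy, ϕ can be computed from the error function. A Python sketch (ours; math.erf is in the standard library):

from math import erf, sqrt

def phi(y):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1 + erf(y / sqrt(2)))

# P(X < 12) for X ~ N(10, 4): sigma = 2, so we look up phi((12 - 10) / 2).
print(phi(1.0))   # ~0.8413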

Central Limit Theorem


A quick definition before we dive into the theorem:
A set of random variables, {X_1, X_2, …, X_n}, is independent and identically distributed
(i.i.d.) if they are all mutually independent and they all follow the same distribution.

The central limit theorem states that given a set of n i.i.d. random variables, {X_1, X_2, …, X_n},
which are identical to some X with E(X) = μ and Var(X) = σ², then as n approaches infinity we
have

(X_1 + X_2 + … + X_n) ~ N(nμ, nσ²)

Note: for finite n this is an APPROXIMATION; it only becomes exact as n approaches infinity.

(See examples below)
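One way to see the theorem in action is to simulate it. A Python sketch (ours; the choice of Uniform(0, 1) summands is arbitrary):

import random

# Sum n i.i.d. Uniform(0, 1) variables many times; by the CLT the sums
# should be approximately N(n * 1/2, n * 1/12).
n, trials = 30, 50_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials
print(mean, n * 0.5)   # both ~15.0
print(var, n / 12)     # both ~2.5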

Markov Inequality
For a non-negative random variable, X ≥ 0, we can use the following inequality:

P(X ≥ t) ≤ E(X) / t, ∀ t > 0

Note: the upside-down A here simply reads as 'for all.' In other words, this is true for all t > 0.

(See examples below)

Chebyshev Inequality
For a random variable X with expectation E(X) and variance Var(X):

P(|X − E(X)| ≥ t) ≤ Var(X) / t², ∀ t > 0

(see examples below)

Examples

Example 1: Biased coin toss


Let's say we have a biased coin with P(Heads) = 1/3 and P(Tails) = 2/3, which we toss ten times,
and we want to find P(X > 2 · 10 · (1/3)) = P(X > 20/3), where X is the number of heads (i.e. the
probability of getting more than twice the expected number of heads). There are a number of
different ways we could go about it.

Direct calculation
We have X ~ Bi(10, 1/3), which must be an integer 0 ≤ X ≤ 10, so we have

P(X > 20/3) = P(X ≥ 7) = Σ_{k=7}^{10} C(10, k) · (1/3)^k · (2/3)^(10−k) = 1161/59049 ≈ 0.0197

Quite a bit of math, even with a relatively small set of possible values. It would be impractical to
calculate the exact value if there were a very large number of values, which is why we have
approximation/bounding techniques like the following.
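Incidentally, the exact value above is easy to grind out in a few lines of Python (our sketch, not part of the notes):

from math import comb

p, n = 1 / 3, 10
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(7, n + 1))
print(exact)   # ~0.0197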

Central limit theorem


Let's define T_i ~ Bi(1, 1/3) as the number of heads in the ith coin toss (i.e. T_i = 1 if the ith toss
came up heads and T_i = 0 if it came up tails). Then we have

X = T_1 + T_2 + … + T_10, with E(T_i) = 1/3 and Var(T_i) = (1/3)(2/3) = 2/9

Because {T_1, T_2, ..., T_n} is independent and identically distributed, we can apply the central
limit theorem:

X ≈ N(10 · (1/3), 10 · (2/9)) = N(10/3, 20/9)

Then all we have to do is transform to the standard normal distribution:

P(X > 20/3) = P(Y > (20/3 − 10/3) / √(20/9)) = P(Y > 2.24)

Finally,

P(X > 20/3) ≈ 1 − ϕ(2.24) ≈ 1 − 0.9875 = 0.0125

Markov Inequality
X ≥ 0 (we can never have a negative number of heads), so we can use the Markov inequality to
give us an upper bound here:

P(X ≥ 20/3) ≤ E(X) / (20/3) = (10/3) / (20/3) = 1/2
Chebyshev Inequality
Here E(X) = 10/3 and Var(X) = 20/9, and X > 20/3 implies |X − 10/3| > 10/3, so

P(X > 20/3) ≤ P(|X − 10/3| ≥ 10/3) ≤ (20/9) / (10/3)² = (20/9) / (100/9) = 1/5
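Putting Example 1 together, a Python sketch (ours) computing the exact probability, the CLT approximation, and both bounds side by side:

from math import comb, erf, sqrt

p, n = 1 / 3, 10
mu, var = n * p, n * p * (1 - p)   # E(X) = 10/3, Var(X) = 20/9

exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(7, n + 1))
z = (2 * mu - mu) / sqrt(var)                 # ~2.236
clt = 1 - 0.5 * (1 + erf(z / sqrt(2)))        # 1 - phi(z)
markov = mu / (2 * mu)                        # Markov bound
chebyshev = var / mu**2                       # Chebyshev bound with t = mu
print(exact, clt, markov, chebyshev)          # ~0.0197, ~0.0127, 0.5, 0.2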
Example 2: Testing algorithm runtime
Let's say that we have an algorithm with runtime T. We know the variance, but we don't know
the mean. We want to perform a series of sample runs, {T_1, T_2, …, T_n}, to approximate the
mean, but we want to know how many runs we should perform to be reasonably sure we have a
good approximation. How many runs do we need to do for there to be a 95% chance that the
population mean is within 0.5 of the sample mean?
Given:

Var(T) = σ² (a known value), E(T) = μ (unknown), and the sample mean
T̄ = (T_1 + T_2 + … + T_n) / n, with E(T̄) = μ and Var(T̄) = σ²/n

We need to find n such that:

P(T̄ − 0.5 < μ < T̄ + 0.5) ≥ 0.95

By using an equivalent expression for the range we can get the following:

P(|T̄ − μ| < 0.5) ≥ 0.95

And by swapping the direction of the inequalities we get:

P(|T̄ − μ| ≥ 0.5) ≤ 0.05

Which looks an awful lot like the Chebyshev inequality.

So it is the right form for the Chebyshev inequality! T is a continuous random variable, so
strict/non-strict inequality is irrelevant here.

From there all we have to do is find an integer value for n that satisfies the following:

(σ²/n) / 0.5² ≤ 0.05

Basic algebra yields the answer:

n ≥ σ² / (0.05 · 0.25) = 80σ²
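As a sketch of the final step in Python (ours; the value σ² = 4 is a hypothetical stand-in, since the specific variance used in the original example was lost):

from math import ceil

sigma_sq = 4.0   # HYPOTHETICAL variance; substitute the actual Var(T)
eps, alpha = 0.5, 0.05
n = ceil(sigma_sq / (alpha * eps**2))   # n >= sigma^2 / (0.05 * 0.5^2) = 80 * sigma^2
print(n)   # 320 when sigma^2 = 4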
