
CMPE107 Notes

Andrew Walsh (alwalsh@ucsc.edu)


November 30, 2011
1 Strategy
1. Identify the sample space (Ω): all elements must be mutually exclusive and collectively exhaustive
(cover all possible outcomes).
2. Assign probabilities to all elements:
(a) Estimate based on past experience.
(b) Analysis of random experiments.
(c) Assumptions (eg. equiprobable, Poisson process, Markov property).
3. Identify the event of interest, translating from a statement to a subset of Ω.
4. Compute the probabilities of events of interest.
2 Terms
Axioms: A proposition that is not proven or demonstrated but considered either to be self-evident or
to define and delimit the realm of analysis.
Bernoulli Distribution: A discrete binary distribution determined by the probability of success or
failure.
Bernoulli Random Variable: A random variable taking the values 0 (failure) or 1 (success); the binomial
random variable generalizes it as the function of two parameters: the number of trials and the probability of success.
Bernoulli Trials: An experiment of some number of trials with the probability of success in a given
trial.
Birth/Death Process: A case of the time-continuous Markov process where states indicate the
number of elements in a system. Useful for modeling populations in mechanical, electrical and natural
systems (eg. human population, service queues).
Certain: An occurrence of an event that is inevitable.
Conditional Probability: The probability that event A occurs when the sample space is limited to
event B. This is read the probability of A, given B.
Conditioning Event: An event that constrains another event to a subset of itself.
Correlation: A statistical relationship between two dependent random variables.
Covariance: How much two random variables change together.
Counterexample: An example that contradicts a proposition and invalidates the result.
Cumulative Distribution Function: A probability function that is the sum of all probabilities up
to some given value of the function (F(t) = P(X ≤ t)).
Expected Value: The weighted average or mean value of all random variable values.
Events: A subset of the sample space.
Impossible: An occurrence of an event that will never happen.
Independence: The occurrence of an event having no effect on the probability of another.
Independent Increments: When non-overlapping time intervals are formed by independent random
variables; X_s − X_t and X_u − X_v where X(t) is a random function.
Joint Distribution: The probability distribution of events occurring in terms of multiple random
variables defined on the same probability space.
Independent & Identically Distributed (iid): When two random variables are independent of
one another and have the same distribution.
Marginal Distribution: The probability distribution of a subset of a collection of random variables.
Memorylessness: Also referred to as the Markov property; a stochastic process has this property when
the conditional probability distribution of future states depends solely on the present state.
Markov Chain: A random process with the memoryless property; it depicts the probability of transitioning
out of any possible countable set of states. Useful for diagramming transition probabilities
of given states in some model.
Markov Process: A stochastic process with the memoryless/Markov property; it is conditionally
independent of other states.
Mixed Random Variable: A random variable where the value is determined as the result of two
steps in a trial (eg. flip a coin and roll a die if heads: X = 0 for tails or X ∈ {1, . . . , 6} if heads).
Moment: A quantitative measure of the shape of a set of values (its distribution).
Poisson Distribution: The limiting form of the binomial distribution under the conditions dictated by the Poisson
process.
Poisson Process: A process that satisfies a set of specified conditions and can be expressed using a Poisson
distribution. These conditions are that the number of trials be large, the probability of success be small,
and the average number of successes remain a fixed, moderate value (np = λ).
Poisson Random Variable: A discrete random variable expressed as a function of a single parameter
that defines the rate of change and is greater than zero.
Probability Space: Consists of the sample space, set of events and assignments of the probabilities
for those events.
Random: When an event may or may not occur in any experiment.
Random Variable: A function that associates a number with the outcomes of a random experiment.
Sample Points: Elements of the sample space.
Sample Space: The set of all possible outcomes even if those outcomes are uncertain.
Standard Deviation: The level of dispersion there is from the expected value or mean.
Stationary Increments: A property where the probability distribution of an increment X_s − X_t
depends solely on the length of the s − t time interval.
Stochastic Process: A non-deterministic process which is described by a probability distribution.
As opposed to deterministic systems (eg. ordinary differential equations), it can be used to describe
seemingly random systems (eg. stock markets).
Theorem: Proven statements that are formulated as a logical consequence from axioms.
Trial: The single performance of a well-defined experiment.
Variance: How far a value lies from the expected value or mean.
3 Prerequisites
Knowing common summations and techniques for integration/derivation will help in simplifying the
evaluation of an expression.
\sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k} = (x+y)^{n} \qquad \text{(Binomial Theorem)}

\sum_{k=0}^{\infty} a r^{k} = \frac{a}{1-r}, \; (-1 < r < 1) \qquad \text{(Geometric Summation)}

\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \qquad \text{(Definite Arithmetic Summation)}

\sum_{i=1}^{n} i^{k} \approx \frac{1}{k+1} n^{k+1} \qquad \text{(Definite Polynomial Summation)}

\sum_{i=0}^{\infty} x^{i} = \frac{1}{1-x} \qquad \text{(Indefinite Exponential Summation)}

\sum_{i=0}^{\infty} i x^{i} = \frac{x}{(x-1)^{2}} \qquad \text{(Indefinite Exponential Summation)}

\int u \, dv = uv - \int v \, du \qquad \text{(Integration by Parts)}

(fg)' = f'g + fg' \qquad \text{(Product Rule)}

\int e^{2x} \, dx = \frac{1}{2} e^{2x} + C \qquad \text{(Integrating with Euler's constant)}

\frac{d}{dx} e^{2x} = 2e^{2x} \qquad \text{(Differentiating with Euler's constant)}

\sum_{i=0}^{\infty} \frac{x^{i}}{i!} = e^{x} \qquad \text{(Summation Representation of Euler's constant)}
4 Axioms of Probability
4.1 Sample Space & Events
The sample space describes the top set of all possible outcomes and an event is a subset of that space. The
sets can be described in set builder notation and be treated with the conventional set operators.
eg. Probability of a hard drive crash.
The sample space, S, that a hard drive will crash in terms of months m: S = {m : m ≥ 0}
The event, E, that a hard drive will crash after 36 months: E = {m : m > 36}
The complement of the event E: E^c = {m : m ≤ 36}.
4.2 Axioms
P is called a probability if it satisfies the following axioms:
1. P(A) ≥ 0: The probability of the occurrence of an event is always nonnegative.
2. P(S) = 1: The probability of the occurrence of the event S that is certain is 1. This ensures that S
covers all possible outcomes.
3. If A = \{A_1, A_2, A_3, \ldots\} is a set of mutually exclusive events, then P(\bigcup_i A_i) = \sum_i P(A_i).
When these axioms are satisfied, S and P can be considered a probability model.
4.3 Basic Theorems
1. P(\emptyset) = 0

2. P(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} P(A_i) for mutually exclusive A_i

3. P(A^c) = 1 - P(A)
4.4 Inclusion-Exclusion Principle
In order to calculate P(A_1 \cup A_2 \cup \cdots \cup A_n), the unioned events must have mutually exclusive sample points.
Therefore, if A_1 \cap A_2 \neq \emptyset, in order to calculate P(A_1 \cup A_2), the events must be made exclusive:

P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)
4.5 Continuity of Probability Function
For any increasing or decreasing sequence of events, \{E_n : n \geq 1\},

\lim_{n \to \infty} E_n = \bigcup_{n=1}^{\infty} E_n \qquad \text{(increasing)}

\lim_{n \to \infty} E_n = \bigcap_{n=1}^{\infty} E_n \qquad \text{(decreasing)}

\lim_{n \to \infty} P(E_n) = P\left(\lim_{n \to \infty} E_n\right)
eg. Probability of a population dying out.
Some individuals in a population, E_i, produce offspring that form a successive generation, E_{i+1}. If the
probability of an extinction for the n^{th} generation is e^{-(2n^2+7)/(6n^2)}, what is the probability of it surviving
forever?

P\{\text{surviving forever}\} = 1 - P\{\text{extinction}\}
= 1 - P\left(\bigcup_{i=1}^{\infty} E_i\right)
= 1 - P\left(\lim_{n \to \infty} E_n\right)
= 1 - \lim_{n \to \infty} e^{-(2n^2+7)/(6n^2)}
= 1 - e^{-1/3}
5 Combinatorics
Let n be the total elements in the set and k the number of elements to be chosen.

Per(n, k) = \frac{n!}{(n-k)!} \qquad \text{(Permutation: Order matters.)}

C(n, k) = \frac{n!}{k!(n-k)!} \qquad \text{(Combination: Order doesn't matter.)}
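Both counts are available directly in Python's standard library, which makes a quick sanity check easy (the values n = 5, k = 3 are illustrative):

```python
import math

# Permutations: order matters. Choosing 3 of 5 elements in order:
assert math.perm(5, 3) == math.factorial(5) // math.factorial(5 - 3) == 60

# Combinations: order doesn't matter, so divide out the k! orderings:
assert math.comb(5, 3) == math.perm(5, 3) // math.factorial(3) == 10
```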
6 Conditional Probability & Independence
6.1 Conditional Probability
Determines the probability of an event, A, given a subset of the sample space B.

P(A|B) = \frac{P(AB)}{P(B)}
6.2 Law of Multiplication
The theorem of conditional probability can also be used to find the intersection of two events:

P(AB) = P(A)P(B|A) \quad \text{if } P(A) \neq 0
\phantom{P(AB)} = P(A|B)P(B) \quad \text{if } P(B) \neq 0

P(A_1 A_2 A_3 \cdots A_{n-1} A_n) = P(A_1)\,P(A_2|A_1)\,P(A_3|A_2 A_1) \cdots P(A_n|A_1 A_2 A_3 \cdots A_{n-1})
6.3 Law of Total Probability
Given the following constraints over \Omega = \{A_1, \ldots, A_n\}:

1. A_i \cap A_j = \emptyset if i \neq j (mutually exclusive)
2. P(A_i) > 0 for i = 1 \ldots n
3. \bigcup A_i = \Omega (collectively exhaustive)

P(A) = P(A_1)P(A|A_1) + \cdots + P(A_n)P(A|A_n)
6.4 Independence
Given two events A and B, they are independent iff:

P(AB) = P(A)P(B)

Therefore it can be inferred that:

P(AB) = P(A|B)P(B) = P(B|A)P(A)
P(AB) = P(A)P(B) = P(B)P(A)

... and generalized to:

P(A_1 A_2 \cdots A_n) = P(A_1)\,P(A_2) \cdots P(A_n)
6.5 Bayes Formula
Given the conditioning event A and a set of events, A_1 \ldots A_n, that are mutually exclusive and collectively
exhaustive, it follows that:

P(A_i|A) = \frac{P(A_i)\,P(A|A_i)}{\sum_{j=1}^{n} P(A_j)\,P(A|A_j)}
7 Random Variables & Distributions
7.1 Random Variable
A real-valued function that associates a number with the outcome of a random experiment:

X : \Omega \to \mathbb{R}
\{X = x\} \equiv \{\omega : X(\omega) = x\}
\{X \leq x\} \equiv \{\omega : X(\omega) \leq x\}
\{y < X \leq x\} \equiv \{\omega : y < X(\omega) \leq x\}
7.2 Cumulative/Probability Distribution Function (CDF)
Given some value x, let F(x) be the function over all values up to and including x; it represents the sum of
all probabilities up to x:

F(x) = P(X \leq x), \quad x \in \mathbb{R}

The following constraints must hold for this to be true.
1. F must be non-decreasing; it must be accumulating for each greater value.
2. \lim_{x \to +\infty} F(x) = 1
3. \lim_{x \to -\infty} F(x) = 0

It follows that to find any probability range, given a CDF, the following cases can be derived (F(a^-) denotes
the limit of F from the left at a):

P(X \leq a) = F(a) \qquad P(a < X \leq b) = F(b) - F(a)
P(X > a) = 1 - F(a) \qquad P(a < X < b) = F(b^-) - F(a)
P(X < a) = F(a^-) \qquad P(a \leq X \leq b) = F(b) - F(a^-)
P(X \geq a) = 1 - F(a^-) \qquad P(a \leq X < b) = F(b^-) - F(a^-)
P(X = a) = F(a) - F(a^-)
7.3 Probability Mass Function (PMF - Discrete)
Given a collection of discrete points over the random variable X, let the sum of their probabilities be certain.

\sum_{i=1}^{\infty} p(x_i) = 1

With respect to a CDF given the sum of all probabilities up to and including t:

P(X \leq t) = F(t) = \sum_{i=1}^{n-1} p(x_i) \quad \text{where } x_{n-1} \leq t < x_n

Likewise a PMF value can be derived by:

p(x_n) = F(x_n) - F(x_{n-1})
7.4 Probability Density Function (PDF - Continuous)
Given a continuous distribution, as opposed to mass points, let a and b be the bounds of a distribution
function f(x).

\int_{-\infty}^{\infty} f(x) \, dx = 1

With respect to a CDF given the sum of all probabilities bounded by a and b:

P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx
7.5 Expected Value (First Moment)
Given a random variable X, some function h(X), and distribution functions p(x) (discrete) or f(x) (continuous),
let the expected value be the weighted average / mean:

\mu = E[h(X)] = \sum_{x_i} h(x_i) \, p(x_i) \quad \text{if discrete and converges}

\phantom{\mu = E[h(X)]} = \int_{-\infty}^{\infty} h(x) f(x) \, dx \quad \text{if continuous and converges}
7.6 Variance, Covariance, Standard Deviation & Correlation
Let \sigma^2 be the variance and \sigma be the standard deviation of a random variable X:

\sigma^2 = Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2

\phantom{\sigma^2} = \sum_{x_i} (x_i - E[X])^2 \, p(x_i) \quad \text{if discrete}

\phantom{\sigma^2} = \int_{-\infty}^{\infty} (x - E[X])^2 f(x) \, dx \quad \text{if continuous}

Let X and Y be random variables, and a and b be constants; the variance has the following properties:

Var[X \pm Y] = Var[X] + Var[Y] \pm 2\,Cov[X, Y]
Var[aX \pm b] = Var[aX] = a^2 \, Var[X]
Var[aX \pm bY] = a^2 \, Var[X] + b^2 \, Var[Y] \pm 2ab \, Cov[X, Y]

Let X and Y be random variables, and a and b be constants; the covariance has the following properties:

Cov[X, Y] = E[XY] - E[X]E[Y]
Cov[X, a] = 0
Cov[X, X] = Var[X]
Cov[aX, bY] = ab \, Cov[X, Y]
Cov[X + a, Y + b] = Cov[X, Y]
Cov[X, Y] = 0 if X and Y are independent (the converse does not hold in general)

The correlation coefficient of two random variables:

\rho_{X,Y} = \frac{Cov[X, Y]}{\sigma_X \sigma_Y} \qquad \rho^2_{X,Y} = \frac{Cov^2[X, Y]}{Var[X] \, Var[Y]}
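The covariance identities can be checked empirically on a small sample. The four (x, y) pairs below are arbitrary illustration data, treated as equally likely outcomes:

```python
# Hypothetical joint sample: four equally likely (x, y) pairs.
pairs = [(1, 2), (2, 4), (3, 5), (4, 9)]
n = len(pairs)
ex = sum(x for x, _ in pairs) / n            # E[X]
ey = sum(y for _, y in pairs) / n            # E[Y]
exy = sum(x * y for x, y in pairs) / n       # E[XY]

# Cov[X,Y] = E[XY] - E[X]E[Y] agrees with the definitional form:
cov = exy - ex * ey
cov_def = sum((x - ex) * (y - ey) for x, y in pairs) / n
assert abs(cov - cov_def) < 1e-12

# Cov[X,X] = Var[X]:
var_x = sum((x - ex) ** 2 for x, _ in pairs) / n
cov_xx = sum(x * x for x, _ in pairs) / n - ex * ex
assert abs(cov_xx - var_x) < 1e-12
```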
7.7 Joint & Marginal Distribution (Bivariate - Discrete)
Let X and Y be discrete random variables defined on a sample space; their possible values are A and B
respectively.

\sum_{x \in A} \sum_{y \in B} p_{XY}(x, y) = 1

The function input value can be fixed along a particular value for x or y to yield the marginal distribution.

p_X(x) = \sum_{y \in B} p(x, y) \qquad p_Y(y) = \sum_{x \in A} p(x, y)

The expected value can be derived from the marginal distribution:

E[X] = \sum_{x \in A} x \, p_X(x) \qquad E[Y] = \sum_{y \in B} y \, p_Y(y)

The joint conditional can be expressed as:

E[X^k | Y = y_j] = \sum_{x_i} x_i^k \, p_{X|Y}(x_i | y_j), \quad k = 1, 2, \ldots
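Marginalization amounts to summing the joint PMF over the other variable. The small 2×2 joint distribution below is a hypothetical example:

```python
from fractions import Fraction

# Hypothetical joint PMF over X in {0,1} and Y in {0,1}.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}
assert sum(joint.values()) == 1

# Marginals: sum the joint over the other variable.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}
assert p_x[0] == Fraction(1, 2) and p_y[1] == Fraction(5, 8)

# Expected value from the marginal: E[X] = sum of x * p_X(x).
e_x = sum(x * p for x, p in p_x.items())
assert e_x == Fraction(1, 2)
```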
7.8 Joint & Marginal Density (Bivariate - Continuous)
Let X and Y be continuous random variables defined on the (uncountably infinite) sample space \mathbb{R} \times \mathbb{R};
their possible values are A and B respectively.

P(X \in A, Y \in B) = \int_B \int_A f(x, y) \, dx \, dy, \qquad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, dx \, dy = 1

The function input value can be fixed along a particular value for x or y to yield the marginal density.

f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx

The expected value can be derived from the marginal density:

E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx \qquad E[Y] = \int_{-\infty}^{\infty} y \, f_Y(y) \, dy

The joint conditional can be expressed as:

E[X^k | Y = y] = \int_{-\infty}^{\infty} x^k \, f_{X|Y}(x|y) \, dx, \quad k = 1, 2, \ldots
8 Case Distributions
8.1 Exponential
Let \lambda be an exponential parameter and x the point along a distribution; the CDF is:

F(x) = 1 - e^{-\lambda x}

Correspondingly, the probability density function is the derivative of the CDF:

f(x) = \frac{d}{dx} F(x) = \frac{d}{dx} \left(1 - e^{-\lambda x}\right) = \lambda e^{-\lambda x}

The n^{th} expected value can be derived by solving E[X^n] = \int x^n \lambda e^{-\lambda x} \, dx:

E[X^n] = \frac{n!}{\lambda^n} \qquad Var[X] = \frac{1}{\lambda^2}
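These moments can be checked by Monte Carlo simulation. A sketch (the rate λ = 2 and the sample size are illustrative; tolerances are loose enough for the sampling noise):

```python
import random

# Exponential(lam) should have mean 1/lam and variance 1/lam^2
# (special cases of E[X^n] = n!/lam^n for n = 1, 2).
random.seed(0)
lam = 2.0
samples = [random.expovariate(lam) for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
assert abs(mean - 1 / lam) < 0.01    # ~0.5
assert abs(var - 1 / lam**2) < 0.01  # ~0.25
```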
8.2 Poisson Distribution
Let \lambda be an exponential parameter and k be the number of occurrences of an event for the PMF:

p(k) = e^{-\lambda} \frac{\lambda^k}{k!}

Considering two independent Poisson random variables with parameters \lambda and \mu, their sum can be used in the general form:

p(k) = e^{-(\lambda+\mu)} \frac{(\lambda+\mu)^k}{k!}

The expected value of a Poisson distribution can be derived by solving the general form:

E[X] = \sum_{i=1}^{\infty} i \, e^{-\lambda} \frac{\lambda^i}{i!} = \lambda e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} = \lambda e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = \lambda e^{-\lambda} e^{\lambda} = \lambda

Intuitively, this makes sense since a binomial random variable with parameters n and p would have the
average np = \lambda. Further, it can be shown that E[X^2] = \lambda + \lambda^2, so Var[X] = \lambda = E[X]. Therefore

E[X] = Var[X] = \lambda
8.3 Bernoulli & Binomial Distribution
Given an experiment with only two outcomes, let n be the total number of trials where k is the number of
successful outcomes, p is the probability of success, q is the probability of failure (q = 1 - p) and X is a
Bernoulli random variable; X \in \{0, 1\}.
Since the expected value of a Bernoulli random variable X with parameter p is simply E[X] = 0 \cdot P(X =
0) + 1 \cdot P(X = 1) = P(X = 1), the expected value, variance and standard deviation can be expressed as:

E[X] = p \qquad Var[X] = pq \qquad \sigma_X = \sqrt{pq}

Let Y be a binomial random variable with parameters n and p, counting the k successes among the n trials.

P(Y = k) = \binom{n}{k} p^k q^{n-k} = \binom{n}{k} p^k (1-p)^{n-k}

The expected value and variance of the binomial random variable are similar to the Bernoulli but different
in that they consider the number of trials, n:

E[Y] = np \qquad Var[Y] = npq \qquad \sigma_Y = \sqrt{npq}
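The binomial mean and variance follow directly from the PMF; the check below uses the illustrative parameters n = 10, p = 0.3:

```python
import math

# Binomial PMF: P(Y = k) = C(n,k) p^k q^(n-k).
n, p = 10, 0.3
q = 1 - p
pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
assert abs(sum(pmf) - 1) < 1e-12               # probabilities sum to 1

mean = sum(k * pk for k, pk in enumerate(pmf))
assert abs(mean - n * p) < 1e-12               # E[Y] = np = 3
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
assert abs(var - n * p * q) < 1e-12            # Var[Y] = npq = 2.1
```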
8.4 Geometric Distribution
Let k be the number of Bernoulli trials before a success; p is the probability of success and q is the probability
of failure:

p(k) = q^k p = (1 - p)^k p

The expected value of a Geometric distribution can be derived by solving the general form:

E[X] = \sum_{k=0}^{\infty} k \, q^k p = p \sum_{k=0}^{\infty} k (1 - p)^k = \frac{q}{p}

Therefore the expected value and variance are:

E[X] = \frac{q}{p} \qquad Var[X] = \frac{q}{p^2}
8.5 Continuous Uniform Distribution
Let a and b be two points bounding a continuous distribution; the PDF is:

f(x) = \frac{1}{b - a}, \quad a \leq x \leq b

The expected value of a continuous uniform distribution can be derived by solving the general form:

E[X] = \int_a^b x \, f(x) \, dx = \frac{1}{2}(a + b)

Therefore the expected value and variance are:

E[X] = \frac{1}{2}(a + b) \qquad Var[X] = \frac{(b - a)^2}{12}
8.6 Normal/Gaussian Distribution
Let \mu and \sigma^2 be the mean (center) and variance (spread) of a continuous Gaussian distribution respectively; the curve is
defined as N(\mu, \sigma^2) and the PDF is:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}

The distribution can be standardized in standard normal form as:

\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2}

The continuous CDF cannot be solved for in closed form:

\Phi(x) = \int_{-\infty}^{x} \varphi(t) \, dt

Therefore, it can only be evaluated using approximations (see Approximations: Standard Normal Distribution
Approximation).
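In practice the approximation is done for you: Python's standard library evaluates the normal CDF numerically, and standardizing z = (x − μ)/σ reduces any N(μ, σ²) to the standard normal (the parameters μ = 10, σ = 2 below are illustrative):

```python
from statistics import NormalDist

# Phi has no closed form, but the standard library approximates it.
std = NormalDist(mu=0, sigma=1)
assert abs(std.cdf(0) - 0.5) < 1e-9        # symmetry: Phi(0) = 1/2
assert abs(std.cdf(1.96) - 0.975) < 1e-3   # the classic lookup-table value

# A general N(mu, sigma^2) point standardizes as z = (x - mu) / sigma.
general = NormalDist(mu=10, sigma=2)
x = 13.0
z = (x - general.mean) / general.stdev     # z = 1.5
assert abs(general.cdf(x) - std.cdf(z)) < 1e-12
```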
9 Moments
The expected value E[X] = \mu is the first moment; E[X^2] is considered the second moment and E[X^n] is the
n^{th} moment. The general form is:

E[X^n] = \sum_{x_i} x_i^n \, p(x_i) \quad \text{if discrete and converges}

\phantom{E[X^n]} = \int_{-\infty}^{\infty} x^n f(x) \, dx \quad \text{if continuous and converges}
9.1 Law of Total Moments
Let X and Y be random variables and k denote the k^{th} moment of a given random variable.

E[E[X^k | Y]] = E[X^k]

\phantom{E[E[X^k | Y]]} = \sum_{x_i} x_i^k \, p_X(x_i)

\phantom{E[E[X^k | Y]]} = \int_{-\infty}^{\infty} x^k f_X(x) \, dx
9.2 Moment Generating Function (MGF)
Let \theta \in \mathbb{R} be some fixed value and X be the random variable that it is defined over.

\psi(\theta) = E[e^{\theta X}]

\phantom{\psi(\theta)} = \sum_{x_i} e^{\theta x_i} p(x_i) \quad \text{if discrete}

\phantom{\psi(\theta)} = \int_{-\infty}^{\infty} e^{\theta x} f(x) \, dx \quad \text{if continuous}

The n^{th} expected value can be derived from the MGF by taking the n^{th} derivative at \theta = 0:

E[X^n] = \left. \frac{d^n \psi_X(\theta)}{d\theta^n} \right|_{\theta = 0}

Let X and Y be independent random variables; the MGF of their sum is the product of the separate MGFs:

\psi_{X+Y}(\theta) = \psi_X(\theta) \, \psi_Y(\theta)

Special distribution cases for the Poisson and exponential respectively:

\psi(\theta) = e^{\lambda(e^{\theta} - 1)} \qquad \psi(\theta) = \frac{\lambda}{\lambda - \theta}
9.3 Generating Function (GF)
Let z be some fixed value and N be the discrete random variable whose PMF it is defined over.

g(z) = \sum_{n=0}^{\infty} z^n \, p_N(n)

Note that this is simply a re-parameterization of the MGF with e^{\theta} = z; therefore \theta = 0 corresponds to
z = e^0 = 1. The moment and variance for a GF can be stated as:

E[N] = g'(1) \qquad Var[N] = [g''(1) + g'(1)] - (g'(1))^2
10 Approximations
It is not always practical to determine the distribution of a random variable but an approximation can be
made if the characteristics of the distribution are known (ie. mean, variance, standard deviation).
10.1 Markov's & Chebyshev's Inequalities
Recall that \mu is the expected value and \sigma^2 the variance. Let X be a discrete non-negative random variable
and t > 0:

P(X \geq t) \leq \frac{E[X]}{t} \qquad \text{(Markov's)}

P(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2} \qquad \text{(Chebyshev's)}

Chebyshev's inequality can also be reparameterized where t = k\sigma:

P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}
Given n Bernoulli trials, let \epsilon and \delta be the desired rate of error and probability of failure respectively.
Therefore, to find an upper bound to the number of trials (n) needed to attain the given rate of error (\epsilon)
and failure probability (\delta):

n \geq \frac{1}{4 \epsilon^2 \delta}
10.2 One-Sided Inequality
Taken as a one-sided variant of Chebyshev's inequality:

P(X \leq t) \leq \frac{\sigma^2}{\sigma^2 + (t - E[X])^2} \quad \text{if } t < E[X]

P(X > t) \leq \frac{\sigma^2}{\sigma^2 + (t - E[X])^2} \quad \text{if } t > E[X]
10.3 Law of Large Numbers
Given a number of successes S_n among n Bernoulli trials where success is defined as when A occurs. Let
S_n/n and P(A) be the experimental and theoretical probabilities respectively, and \epsilon be the confidence interval.

P\left( \left| \frac{S_n}{n} - P(A) \right| \geq \epsilon \right) \leq \frac{P(A)(1 - P(A))}{n \epsilon^2} \leq \frac{0.25}{n \epsilon^2} \quad \left(\text{maximum value when } P(A) = \frac{1}{2}\right)
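The bound can be checked empirically by simulating many batches of Bernoulli trials (p = 0.5, n = 1000, ε = 0.05 and the number of batches are illustrative choices):

```python
import random

# Estimate P(|S_n/n - P(A)| >= eps) over many simulated batches and
# compare against the 0.25/(n eps^2) bound.
random.seed(1)
p, n, eps, runs = 0.5, 1000, 0.05, 2000
exceed = 0
for _ in range(runs):
    s = sum(random.random() < p for _ in range(n))  # S_n for one batch
    if abs(s / n - p) >= eps:
        exceed += 1
bound = 0.25 / (n * eps**2)  # = 0.1
assert exceed / runs <= bound  # the empirical rate is far below the bound
```

The Chebyshev-style bound is loose: at these parameters the true deviation probability is tiny, while the bound only guarantees it is below 0.1.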
10.4 Standard Normal Distribution Approximation
Given a normal distribution with mean \mu and variance \sigma^2, an approximation to the CDF at point
x can be found by computing z with the following equation and looking up its value in a
table.

P(Z \leq z) = \Phi(z), \quad z = \frac{x - \mu}{\sigma}
10.5 Standard Normal Approximation Error
Given a standard normal distribution approximated over n trials, let \epsilon be the error tolerance and \delta the level
of experimental uncertainty relative to the theoretical probability p.

P\left( \left| \frac{S_n}{n} - p \right| \leq \epsilon \right) \approx 2 \, \Phi\left( \epsilon \sqrt{\frac{n}{pq}} \right) - 1 = \delta
10.6 Central-Limit Theorem
Given iid random variables \{X_1, X_2, \ldots, X_n\} where \mu and \sigma are finite, \sigma > 0 and S_n = \sum_{i=1}^{n} X_i. Let there
be a cumulative probability in bounds x and y along a standard normal distribution with a significantly
large number of trials:

\lim_{n \to \infty} P\left( x \leq \frac{S_n - n\mu}{\sigma \sqrt{n}} \leq y \right) = \Phi(y) - \Phi(x)
10.7 r^{th} Percentile
Given an exponential random variable, X, and a percentile function, \pi(r), find the r^{th} percentile for all
X \leq \pi(r):

P(X \leq \pi(r)) = \frac{r}{100} = r\%

1 - e^{-\lambda \pi(r)} = \frac{r}{100}

e^{-\lambda \pi(r)} = \frac{100 - r}{100}

\pi(r) = \frac{1}{\lambda} \ln\left( \frac{100}{100 - r} \right)

Therefore the final function becomes:

\pi(r) = E[X] \ln\left( \frac{100}{100 - r} \right)
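For r = 50 the formula recovers the familiar exponential median ln(2)/λ. A quick check (λ = 2 is illustrative):

```python
import math

# pi(r) = (1/lam) * ln(100 / (100 - r)) for an Exponential(lam) variable.
lam = 2.0
def percentile(r):
    return (1 / lam) * math.log(100 / (100 - r))

median = percentile(50)
assert abs(median - math.log(2) / lam) < 1e-12
# Sanity check against the CDF: F(pi(r)) should equal r/100.
assert abs((1 - math.exp(-lam * median)) - 0.5) < 1e-12
```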
11 Stochastic Processes
11.1 Poisson Process
Given some system with time t elapsed and n countable elements, let N(t) be a random variable related to n.
Constrain the system to have independent increments, stationary increments and non-overlapping intervals;
it constitutes a Poisson process with rate \lambda. Therefore, the probability of n elements in a system at time t
can be expressed as:

P(N(t) = n) = \frac{e^{-\lambda t} (\lambda t)^n}{n!} = P_n(t)

In addition, the probability distribution of the wait time until the next occurrence is exponential.
11.2 Markov Chains
Given some countable set of states, S_n = \{0, 1, \ldots, n\}, let S_1 be a state set where \alpha and \beta are the transition
probabilities out of states 0 and 1 respectively (0 \to 1 with probability \alpha, 1 \to 0 with probability \beta, and
self-loops with probabilities 1 - \alpha and 1 - \beta).
For each state, all outgoing transitions will sum to 1 and can be described by a transition matrix:

P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}

The above example can be extended to n states for an n \times n transition matrix.
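Iterating a distribution through the transition matrix converges to the stationary distribution, which for the two-state chain is (β, α)/(α + β). A sketch (α = 0.3, β = 0.6 are hypothetical transition probabilities):

```python
# Two-state chain: P(0 -> 1) = alpha, P(1 -> 0) = beta.
alpha, beta = 0.3, 0.6

P = [[1 - alpha, alpha],
     [beta, 1 - beta]]
assert all(abs(sum(row) - 1) < 1e-12 for row in P)  # rows sum to 1

def step(dist):
    """One step of the chain: new_dist = dist * P (row-vector convention)."""
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

# Start entirely in state 0 and iterate; the distribution converges to
# the stationary distribution (beta, alpha) / (alpha + beta).
dist = [1.0, 0.0]
for _ in range(100):
    dist = step(dist)
assert abs(dist[0] - beta / (alpha + beta)) < 1e-9  # ~2/3
```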
11.3 Continuous-Time Markov Chains
The two-state diagram extends to continuous time: for states 0, 1 and 2, transitions occur at birth rates
\lambda_0 (0 \to 1) and \lambda_1 (1 \to 2) and death rates \mu_1 (1 \to 0) and \mu_2 (2 \to 1), forming a birth/death process.