
1

Probability and statistics


Dr. K.W. Chow
Mechanical Engineering
2
Contents
Review of basic concepts:
- permutations
- combinations
- random variables
- conditional probability

Binomial distribution

3
Contents
Poisson distribution

Normal distribution

Hypothesis testing
4
Basics
Principle of counting:





With m women (group A) and n men (group B), there are mn different
combinations of marriage (i.e. for each lady, there are n
possible marriage combinations, thus mn in total).
5
Basics
Permutation (order important ):

Form a 3-digit number from (1, 2, ..., 9)

Combination (order unimportant ):

Mary marries John = John marries Mary

6
Permutations
Permutations of n things taken r at a
time (assuming no repetitions):
For the first slot / vacancy, there are n
choices.
For the second slot / vacancy, there are
(n - 1) choices.
Thus there are n(n - 1)...(n - r + 1) =
n!/(n - r)! ways.
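A minimal sketch to check the counting formula n!/(n - r)! numerically. The values n = 9, r = 3 (a 3-digit number from the digits 1 to 9 with no repeated digit) are chosen for illustration only.

```python
import math
from itertools import permutations

n, r = 9, 3  # illustrative: digits 1..9, three slots, no repetition

# Formula: n!/(n - r)!
by_formula = math.factorial(n) // math.factorial(n - r)

# Brute force: count every ordered arrangement explicitly
by_enumeration = sum(1 for _ in permutations(range(1, n + 1), r))

# The standard library offers the same count directly
print(by_formula, by_enumeration, math.perm(n, r))
```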



7
Combinations
Combinations of n things taken r at a
time (assuming order unimportant):
Permutations: n(n - 1)...(n - r + 1) =
n!/(n - r)! ways.
Every r! permutations of the same r things
are equivalent to a single combination.
Hence number of combinations:
n!/((n - r)! r!)
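A short sketch of the "divide by r!" argument. The values n = 5, r = 2 are made up for illustration.

```python
import math
from itertools import combinations

n, r = 5, 2  # illustrative values, not from the slides

n_perm = math.perm(n, r)               # ordered selections: n!/(n - r)!
n_comb = n_perm // math.factorial(r)   # each unordered selection was counted r! times

# Brute force: count the unordered selections explicitly
by_enumeration = sum(1 for _ in combinations(range(n), r))

print(n_perm, n_comb, by_enumeration, math.comb(n, r))
```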


8
Conditional Probability
The probability that an event B occurs, given
that another event A has happened.

Definition:

$$P(B \text{ given } A) = P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)}$$

Note that when B and A are independent,
then

$$P(B \text{ and } A) = P(B)\,P(A)$$
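A small sketch of the definition using a fair die. The events A, B and C below are hypothetical examples, not taken from the slides.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # fair die, every outcome has probability 1/6

def prob(event):
    """Probability of an event (a set of outcomes) for a fair die."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # even outcome
B = {4, 5, 6}  # outcome greater than 3
C = {1, 2}     # outcome at most 2

# Definition: P(B | A) = P(B and A) / P(A)
p_B_given_A = prob(B & A) / prob(A)
p_C_given_A = prob(C & A) / prob(A)

print(p_B_given_A, prob(B))  # different values: A and B are not independent
print(p_C_given_A, prob(C))  # same value: A and C are independent
```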
9
Random variables
(Intuitive) Random variables are
quantities whose values are random
and to which a probability distribution is
assigned.

Either discrete or continuous.
10
Random variables
Example of random variables:


Outcome of rolling a fair die
11
Random variables
All possible outcomes belong to the set:

$$\Omega = \{1, 2, 3, 4, 5, 6\}$$

Outcome is random.
Probabilities of every outcome are the same,
i.e. the outcomes follow the uniform
distribution.
Hence the outcomes are random variables.
12
Random variables
(Rigorous definition) Random variable is
a MAPPING from elements of the
sample space to a set of real numbers
(or an interval on the real line).

e.g. for a fair die, the outcomes are mapped to the
numbers {1, 2, 3, 4, 5, 6}, each with probability 1/6.
13
Probability density function
In physics, mass of an object is the integral
of density over the volume of that object:

$$m = \int_V \rho \, dV$$

Probability density function (pdf) f(x) is
defined such that the probability of a random
variable X occurring between a and b is
equal to the integral of f between a and b.
14
Probability density function
Defining properties:
Probability density function is non-negative.

The integral over the whole sample space
(e.g. the whole real axis) must be unity.

$$P(a \le X \le b) = \int_a^b f(\theta)\, d\theta$$
15
Probability density function
The probability is not meaningful at a single
point: it does not make sense to ask
what the chance of x = 1.23 is for a
continuous random variable, as that
chance is zero (infinitely many points).

16
Probability density function
For discrete random variables, the
probability at a point is equal to the
probability density function evaluated at that
point:

$$P(X = x_n) = p_n$$

Probability between two points (inclusive):

$$P(x_i \le X \le x_j) = \sum_{n=i}^{j} p_n$$
17
Cumulative density function
Cumulative density function (cdf) F is
related to pdf by:

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(\theta)\, d\theta$$

Note: the lower limit is the smallest value
that X can take, not necessarily $-\infty$.

18
Cumulative density function
For discrete random variables:

$$F(x) = P(X \le x) = \sum_{x_n \le x} p_n$$

cdfs for discrete random variables are
discontinuous.
19
Cumulative density function
[Figures: cdf of a discrete random variable; cdf of a continuous random variable]
20
Expectation and variance of
random variables
Expectation (or mean): Integral or sum of the
probability of an outcome multiplied by that
outcome.

For continuous variables, the probability of
X falling in the interval (x, x+dx) is:

$$f(x)\, dx, \quad dx \to 0$$
21
Expectation and variance of
random variables
The expectation is:

$$E(X) = \int x f(x)\, dx$$

The integral is taken over the whole sample
space.

Not all distributions have expectation, since
the integral may not exist, e.g. the Cauchy
distribution.
22
Expectation and variance of
random variables
For discrete variables, the probability of an
outcome is:

$$P(X = x_n) = p_n$$

The expectation is:

$$E(X) = \sum_n x_n p_n$$
23
Expectation and variance of
random variables
Expectation represents the average
amount one "expects" as the outcome
of the random trial when identical
experiments are repeated many times.
24
Expectation and variance of
random variables
Example: Expectation of rolling a fair die:

$$E(X) = \sum_n x_n p_n = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5$$

Note that this expected value is never
achieved !!
25
Expectation and variance of
random variables
Standard deviation $\sigma$: a measure of how a
distribution is spread out relative to the
mean.

Definition:

$$\sigma^2 = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2$$
26
Expectation and variance of
random variables
Variance is defined as the square of standard
deviation:

$$\mathrm{Var}(X) = \sigma^2 = E(X^2) - (E(X))^2$$
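As a quick check of the two forms of the variance, here is a sketch for the fair die used in the expectation example. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

values = range(1, 7)   # outcomes of a fair die
p = Fraction(1, 6)     # probability of each outcome

mean = sum(x * p for x in values)         # E(X)
mean_sq = sum(x * x * p for x in values)  # E(X^2)

var_definition = sum((x - mean) ** 2 * p for x in values)  # E((X - E(X))^2)
var_shortcut = mean_sq - mean ** 2                         # E(X^2) - (E(X))^2

print(mean, var_definition, var_shortcut)
```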
27
Binomial distribution
Bernoulli experiment: outcome is either
success or fail.

The number of successes in n independent
Bernoulli experiments is governed by the
Binomial distribution.

This is a distribution with discrete random
variables.
28
Binomial distribution
Suppose we perform an experiment 4
times. What is the chance of getting
three successes? (Chance for success
= p, chance for failure = q, p + q = 1).

29
Binomial distribution
Scenario:
p, p, p, q
p, p, q, p
p, q, p, p
q, p, p, p
There are $^{4}C_{3} = 4$ ways of placing the
failure case.

30
Binomial distribution
Thus the chance is $4p^3q$.

For a simpler case, getting 2 heads in
throwing a fair coin 3 times:
H, H, T;
H, T, H;
T, H, H.
31
Binomial distribution
Example: chance of getting exactly 2 heads
when a fair coin is tossed 3 times is:

$$f(2) = \binom{3}{2} \left(\frac{1}{2}\right)^{2} \left(\frac{1}{2}\right)^{3-2} = \frac{3}{8}$$
32
Binomial distribution
The probability density function for r
successes in a fixed number (n) of trials is:

$$f(r) = \binom{n}{r} p^{r} (1-p)^{n-r}, \quad r = 0, 1, 2, \dots, n$$

where r is the number of successes, and p
is the probability of success of each trial.
33
Binomial distribution
Expectation:

$$E(X) = np$$

Variance:

$$\mathrm{Var}(X) = npq$$
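A minimal sketch that evaluates the binomial formula for the coin example (n = 3, p = 1/2) and checks E(X) = np and Var(X) = npq by direct summation. The helper name `binomial_pmf` is just an illustrative choice.

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 3, 0.5  # three tosses of a fair coin
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * f for r, f in enumerate(pmf))               # should equal n p
var = sum((r - mean) ** 2 * f for r, f in enumerate(pmf))  # should equal n p q

print(pmf[2], mean, var)
```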
34
Binomial distribution
Methods to derive the formula E(X) = np
for the binomial distribution:
(1) Direct argument: Gain of p at each trial.
Hence total gain of np in n trials.
(2) Direct summation of series.
(3) Differentiate the series expansion of
the binomial theorem.
35
Binomial distribution
[Figure: the probability density function]
36
Binomial distribution
[Figure: the cumulative density function]
37
Poisson distribution
Poisson distribution is a special limiting case
of the binomial distribution by taking:

$$n \to \infty, \quad p \to 0$$

while keeping the product np finite.
The probability density function is:

$$f(r) = \frac{\lambda^{r} \exp(-\lambda)}{r!}, \quad \lambda = np$$
38
Poisson distribution
Expectation of the Poisson distribution:

$$E(X) = \lambda = np$$

Variance of the Poisson distribution:

$$\mathrm{Var}(X) = \lambda = np$$
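The limiting process can be seen numerically. The sketch below keeps the product np fixed at 2 (an arbitrary illustrative value) and compares the binomial probability of r = 3 successes with the Poisson value as n grows.

```python
import math

def binomial_pmf(r, n, p):
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    return lam**r * math.exp(-lam) / math.factorial(r)

lam, r = 2.0, 3  # illustrative: np held at 2, probability of 3 events
errors = []
for n in (10, 100, 10000):
    p = lam / n  # p shrinks as n grows, so that n p stays fixed
    diff = abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam))
    errors.append(diff)
    print(n, binomial_pmf(r, n, p), poisson_pmf(r, lam))
```

The gap between the two columns shrinks as n increases, which is the sense in which the Poisson distribution is a limit of the binomial.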
39
The Poisson distribution
Physical meaning: a large number of
trials (n going to infinity), and the
probability of the event occurring by
itself is pretty small (p approaching
zero).
BUT (!!) the combined effect is finite (np
being finite).
40
The Poisson distribution
Examples:
(a) The number of incorrectly dialed
telephone calls if you have to dial a
huge number of calls.
(b) Number of misprints in a book.
(c) Number of accidents on a highway in
a given period of time.
41
Poisson distribution
[Figure: the probability density function (usually shows a single maximum)]
42
Poisson distribution
[Figure: the cumulative density function (must start from zero and end up at one)]
43
Normal distribution
The normal distribution for a continuous
random variable is a bell-shaped curve
with a maximum at the mean value.
It is a special limit of the binomial
distribution when the number of data
points is large (i.e. n going to infinity
but without special conditions on p).

44
Normal distribution
As such the normal distribution is
applicable to many physical problems
and phenomena.
The Central Limit Theorem in the
theory of probability asserts the
usefulness of the normal distribution.
45
Normal distribution
The probability density function:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$

where
$\mu$ = mean,
$\sigma$ = standard deviation.
46
Normal distribution
The curve is symmetric about $x = \mu$.
[Figure: the probability density function]
47
Normal distribution
For small standard deviation, the curve
is tall, sharply peaked and narrow.
For large standard deviation, the curve
is short and widely spread out.
(As the area under the curve must sum
up to one to be a probability density
function).
48
Normal distribution
[Figure: the cumulative density function]
49
Normal distribution
Cumulative density function or probability of
a normally distributed random variable falling
within the interval (a, b):

$$P(a \le X \le b) = \int_a^b f(x)\, dx$$

Values of the above integral can be found
from standard tables.
50
Simple tutorial examples for
the normal distribution
It is obviously not possible to tabulate the
normal distribution pdf for all values of
mean and standard deviation. In
practice, we reduce, by simple scaling
arguments, every normal distribution
problem to one with mean zero and
standard deviation one. (Notation: $N(\mu, \sigma^2)$)
51
The normal approximation to
the binomial distribution
In many situations, the binomial
distribution formulation is impractical as
the computation of the factorial term is
problematic.
The normal distribution provides a good
approximation to the binomial
distribution.
52
The normal approximation to
the binomial distribution
Example: chance of getting
exactly 59 heads in tossing a
fair coin 100 times:
The exact formulation is:

$$^{100}C_{59} \left(\tfrac{1}{2}\right)^{59} \left(\tfrac{1}{2}\right)^{41}$$

but 100! is difficult to calculate.

53
Normal distribution
Instead we use the normal
distribution (a continuous random
variable (rv)) to approximate the
binomial distribution (a discrete rv):


$$\mu = np = 50, \quad \sigma = \sqrt{npq} = 5$$
54
The normal approximation to
the binomial distribution
We use the mean (np) and variance (npq)
of the binomial distribution as the
corresponding parameters of the
normal distribution.
We use an interval of length one to cover
every integer, e.g. to cover an integer
of 59, we use the interval (58.5, 59.5).
55
Normal distribution
Set:

$$\mu = np, \quad \sigma = \sqrt{npq}$$

Form the standard variable:

$$Z = \frac{X - \mu}{\sigma}$$
56
Normal distribution




$$\frac{58.5 - 50}{5} \le Z \le \frac{59.5 - 50}{5}, \quad \text{i.e.} \quad 1.7 \le Z \le 1.9$$

Find the probability of this range of Z from
tables:

$$P(1.7 \le Z \le 1.9) = 0.0159$$

Value obtained from binomial formulation:
0.0159 (agree to three decimal places)
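With a computer, both numbers can be checked directly. The sketch below uses Python's exact integer arithmetic for the binomial term and `statistics.NormalDist` in place of the tables.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
mu = n * p                           # 50
sigma = math.sqrt(n * p * (1 - p))   # 5

# Exact binomial probability of exactly 59 heads
exact = math.comb(n, 59) * p**59 * (1 - p)**41

# Normal approximation: area over the interval (58.5, 59.5)
normal = NormalDist(mu, sigma)
approx = normal.cdf(59.5) - normal.cdf(58.5)

print(round(exact, 4), round(approx, 4))
```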
57
Normal / binomial distributions
(For your information): Class example on
university admission.
Yield rate = (number of students who
actually attend) / (number of offers or
admission letters sent to students)
Yield rates vary from year to year. Even Harvard
has only a yield ratio of about 0.6 to 0.8.
58
Normal distribution
A large state university with a yield ratio
of say 0.3.
Will send out 450 offers or letters of
admission.
What is the chance of more than 150 students
actually coming to campus (i.e. the university cannot
accommodate beyond this limit of 150)?
59
Normal distribution
The exact binomial formulation: sum

$$^{450}C_{r}\,(0.3)^{r}\,(0.7)^{450-r}$$

over r from 151 to 450. (a) 450! is too
large and (b) sum of 300 terms??
60
Normal distribution
Use (150.5, 151.5) for r = 151,
(151.5, 152.5) for r = 152,
(152.5, 153.5) for r = 153 and so on.
n = 450, p = 0.3
[150.5 - (450)(0.3)]/Sqrt[450(0.3)(0.7)]
= 1.59

61
The normal approximation to
the binomial distribution
Upper limit of 450.5 can effectively
be taken as positive infinity. Thus
we need to find the area of the
normal curve between 1.59 and
infinity. From table this area is
0.0559. Hence the chance of 151
admitted students or more actually
coming to campus is 0.0559.
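The admission example can be reproduced in a few lines. This is only a sketch: the 300-term exact sum that is awkward by hand is easy for a computer, and `statistics.NormalDist` stands in for the normal table. Small differences from the table value come from rounding z to 1.59.

```python
import math
from statistics import NormalDist

n, p, limit = 450, 0.3, 150
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact binomial tail: r = 151, 152, ..., 450
exact = sum(math.comb(n, r) * p**r * (1 - p)**(n - r)
            for r in range(limit + 1, n + 1))

# Normal approximation: area to the right of 150.5
z = (limit + 0.5 - mu) / sigma
approx = 1 - NormalDist().cdf(z)

print(round(z, 2), approx, exact)
```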
62
Chi-squared distribution
Chi-squared distribution is a distribution
for continuous random variables.

Commonly used in statistical
significance tests.
63
Chi-squared distribution
If $X_i$ are independent and identically
distributed random variables which follow
the normal distribution $N(\mu, \sigma^2)$, then

$$Y = \sum_{i=1}^{k} \left( \frac{X_i - \mu}{\sigma} \right)^2$$

has a chi-squared distribution of degree-of-freedom k.
64
Chi-squared distribution
The probability density function is:

$$f(x) = \frac{x^{k/2 - 1} \exp(-x/2)}{2^{k/2}\, \Gamma(k/2)}, \quad x > 0$$

where $\Gamma$ is the gamma-function.
65
Chi-squared distribution
[Figure: the pdf]
66
Chi-squared distribution
[Figure: the cdf]
67
Sum of random variables
Consider the problem of throwing a die
twice. What is the chance of getting a
sum of the two outcomes at 7? The
answer is the combination of (1,6), (2,5),
(3,4), (4,3), (5,2), (6,1) or 6 outcomes
out of 36 possible ones, i.e. a chance of
6/36 = 1/6.
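The counting argument can be confirmed by listing all 36 outcomes. A short sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two throws
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each possible sum, as an exact fraction
p_sum = {s: Fraction(c, 36) for s, c in counts.items()}

print(counts[7], p_sum[7])
```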
68
Sum of continuous r. v.
Now consider a more complicated
problem of finding the probability
density function of the sum of
two continuous random variables.
69
Sum of normal r. v.
Suppose Z = X + Y and each of X, Y is
$N(\mu, \sigma^2)$. We consider the simpler case
of N(0, 1) first. Suppose Z is to attain a
value of z, and if X is of value x, then Y
MUST have the value of z - x, and now
we integrate over x from negative infinity
to plus infinity.
70
Sum of normal r. v.
On calculating the integrals, Z is found to
go like N(0, 2). In general, if X and Y are independent and

$$X \sim N(\mu_1, \sigma_1^2), \quad Y \sim N(\mu_2, \sigma_2^2)$$

then

$$X + Y \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$$


71
Linearity of normal r. v.
Suppose Z = aX + b, where X is $N(\mu, \sigma^2)$,
and a, b are scalars, then
(a) Mean of Z = $a\mu + b$
(b) Variance of Z = $a^2 \sigma^2$
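Python's `statistics.NormalDist` implements these rules for independent normal variables, so they can be checked quickly. The numbers below (means, standard deviations, a and b) are made up for illustration.

```python
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=2.0)   # illustrative parameters
Y = NormalDist(mu=3.0, sigma=1.5)

# Sum of two independent normal variables: means add, variances add
S = X + Y

# Linear scaling Z = a X + b: mean a*mu + b, variance a^2 * sigma^2
a, b = 3.0, 4.0
Z = X * a + b

print(S.mean, S.variance)
print(Z.mean, Z.variance)
```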

72
Sum of normal r. v.
(a) The mean is just shifted accordingly to
this linear scaling.
(b) b does NOT affect the variance of Z.
This makes sense as b is just a
translation of the data and should not
affect how the data are spread out.
Note also that $a^2$ is involved.
73
A sequence of random
variables
Now consider the problem of doing a
series of experiments, and assume the
outcome of each experiment is random.
Alternatively, we are collecting a large
number of data points, and we assume
each data point might be considered as
the outcome of a random experiment
(e.g. asking for information in a census).
74
Sequence of random variables
Now consider a sequence of n random
variables (e.g. throwing a die n times,
doing the experiment n times, or asking
for the age of n residents in a
census, etc.). Each outcome is a
random variable $X_r$, r = 1, 2, 3, ..., n.

75
The Sample Mean (Careful!!)
The sample mean is defined by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

The sample mean is a random
variable itself!!!
76
The Sample Variance (Careful)
The sample variance is defined by:

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2$$

Note: the denominator is n - 1
to get an unbiased estimation.
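A small sketch of the sample mean and sample variance on a made-up data set. It also shows that the standard library's `statistics.variance` uses the n - 1 denominator, while `statistics.pvariance` divides by n.

```python
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up sample, for illustration only
n = len(data)

xbar = sum(data) / n                                # sample mean
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # sample variance, n - 1 denominator

print(xbar, s2)
print(statistics.mean(data), statistics.variance(data), statistics.pvariance(data))
```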
77
Unbiased Estimator
A function or an expression of the
random variables in a sample is an
UNBIASED ESTIMATOR of a
quantity if its expectation
(mean) equals the true value of
that quantity, e.g. the
Sample Mean is an unbiased
estimator of the mean.
78
Mean and S.D. of the Sample
Mean
Since all $X_i$ are normally distributed
$N(\mu, \sigma^2)$, the mean and variance of the sample
mean are:

$$E(\bar{X}) = \mu, \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
79
t- distribution
Arises in the problem of estimating the mean
of a normally distributed population when the
standard deviation is unknown.

The random variable:

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

follows a t-distribution with degree of freedom
n - 1.
80
t- distribution
The probability density function is:

$$f(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\; \Gamma\left(\frac{k}{2}\right)} \left( 1 + \frac{t^2}{k} \right)^{-\frac{k+1}{2}}, \quad t \in \mathbb{R}$$

with k as the degree-of-freedom
81
t- distribution
[Figure: the pdf]
82
t- distribution
[Figure: the cdf]
83
Hypothesis testing
Example 1:
Sample space: All cars in America
Statement (hypothesis): 30% of
them are trucks.
84
Hypothesis testing
Impossible to examine all cars in
the country (impractical).
Test a sample of cars, e.g. find
500 cars in a random manner. If
close to 30% of them are trucks,
accept the claim.
85
Hypothesis testing
Example 2:
Sample space: All students at HKU
Statement (hypothesis): The
average balance of their bank
accounts is 100 dollars.
86
Hypothesis testing
Not enough time and money to ask
all students. They might not tell you
the truth anyway.
Test a sample of students, e.g.
find 50 students in a random
manner. If the statement holds,
accept the claim.
87
Hypothesis testing
The original hypothesis is also
known as the null hypothesis,
denoted by $H_0$.
Null hypothesis, $H_0$: $\mu$ = a given
value.
Alternative hypothesis, $H_1$: $\mu \ne$ the
given value.
88
Hypothesis testing
Type I error:
Probability that we reject the null
hypothesis when it is true.
Type II error:
Probability that we accept the null
hypothesis when it is false (other
alternatives are true).

89
Hypothesis testing
Class Example A. Claim: 60% of all
households in a city buy milk from
company A. Choose a random
sample of 10 families; if 3 or fewer
families buy milk from company A,
reject the claim.
$H_0$: p = 0.6 versus $H_1$: p < 0.6
90
Hypothesis testing
One sided test ($\mu_0$ = a given value):
$H_0$: $\mu = \mu_0$ versus $H_1$: $\mu < \mu_0$
$H_0$: $\mu = \mu_0$ versus $H_1$: $\mu > \mu_0$
Two sided test:
$H_0$: $\mu = \mu_0$ versus $H_1$: $\mu \ne \mu_0$



91
Hypothesis testing
Implication in terms of finding the
area from the normal curve:
For 1-sided test, find the area in one
tail only.
For 2-sided test, the area in both tails
must be accounted for.
92
Hypothesis testing
Probability model: Binomial dist.
Type I error: rejecting null
hypothesis even though it is true,
i.e. (we are so unfortunate in
picking the data such that) 3 or fewer
families buy milk from company A,
even though p is actually 0.6.
93
Hypothesis testing
That very small chance of picking
such unfortunate data, far away
from the mean, is called the
LEVEL OF SIGNIFICANCE.

94
Hypothesis testing


$$P(X \le 3) = \sum_{r=0}^{3} \binom{10}{r} (0.6)^{r} (0.4)^{10-r} = 0.0548$$
95
Hypothesis testing
Type II error: accepting null
hypothesis when the alternative
is true. Usually cannot do much as
we need to fix a value of p before
we can compute a binomial
distribution.


96
Hypothesis testing
A simple case of p = 0.3 is illustrated here:

$$P(X \le 3) = \sum_{r=0}^{3} \binom{10}{r} (0.3)^{r} (0.7)^{10-r} = 0.6496$$

Hence the chance that the alternative is
rejected (hence accepting the null
hypothesis) is:

$$1 - 0.6496 = 0.3504$$
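Both error probabilities for the milk example (reject the claim when 3 or fewer of the 10 families buy from company A) can be recomputed in a few lines. The helper name `binom_cdf` is an illustrative choice.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial distribution with n trials and success chance p."""
    return sum(math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

n, threshold = 10, 3

# Type I error: the claim is rejected (X <= 3) although p really is 0.6
alpha = binom_cdf(threshold, n, 0.6)

# Type II error when the true value is p = 0.3: the claim is accepted (X > 3)
beta = 1 - binom_cdf(threshold, n, 0.3)

print(round(alpha, 4), round(beta, 4))
```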
97
Hypothesis testing
The previous example utilizes the
binomial distribution. Let us consider
one where we need to use the
normal approximation to the
binomial.
98
Hypothesis testing
Class Example B: A drug is only
25% effective. For a trial with 100
patients, the doctors will believe
that the drug is more than 25%
effective if 33 or more patients
show improvement.
99
Hypothesis testing
What is the chance that the doctors
will (falsely) believe that the drug should be
endorsed even though it is really only 25%
effective? i.e. What is the chance
that we have such a group of good
patients that most of them improve
on their own?
100
Hypothesis testing
For binomial distribution, we sum

$$^{100}C_{r}\,(0.25)^{r}\,(0.75)^{100-r}$$

for r = 33 to 100.


101
Hypothesis testing
We use the normal approximation
and consider
(32.5 - 100(0.25))
/Sqrt[100(0.25)(0.75)]
= 1.732
102
Hypothesis testing
We then find the area of the normal
curve to the right of 1.732 (as the
upper limit of 100.5 is effectively
infinity). That will be the Type I
error.
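A sketch of the calculation for the drug example, with `statistics.NormalDist` used in place of the normal table and the exact binomial sum shown alongside for comparison.

```python
import math
from statistics import NormalDist

n, p, threshold = 100, 0.25, 33
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Normal approximation: area to the right of 32.5
z = (threshold - 0.5 - mu) / sigma
type_1_approx = 1 - NormalDist().cdf(z)

# Exact binomial sum over r = 33, ..., 100
type_1_exact = sum(math.comb(n, r) * p**r * (1 - p)**(n - r)
                   for r in range(threshold, n + 1))

print(round(z, 3), type_1_approx, type_1_exact)
```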
103
Hypothesis testing
In practice we work in reverse. We
fix the magnitude of the Type I error,
i.e. the level of significance, and
then determine the threshold
level of patients for endorsing the
drug.
104
Hypothesis testing
Probably the most important
application is to test hypotheses
involving the sample mean. The
standard deviation may or may not
be known (the more logical case is
that it is unknown).
105
Hypothesis testing
If the standard deviation of the
whole population is known, then
the standard variable is:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
106
Hypothesis testing
This is neither practical nor reasonable, as
the standard deviation of the whole
population is usually unknown.
With the SAMPLE standard deviation S, the
variable in this case is:

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
107
Hypothesis testing
S is the sample standard deviation
obtained by taking the square root of
the sample variance.

Use the t- distribution instead of normal
distribution tables.
108
Hypothesis testing
Class example C:
CLAIM: Life expectancy of 70 years
in a metropolitan area.
In a city, from an examination of the
death records of 100 persons, the
average life span is 71.8 years.
109
Hypothesis testing
i.e. you actually have noted the 100
data points, add them together and
divide by 100 to get the sample
mean of 71.8
110
Hypothesis testing
$H_0$: $\mu$ = 70 versus
$H_1$: $\mu$ > 70
Using a level of significance of 0.05,
i.e.
z = (Xbar - mu)/(sigma/sqrt(n))
must be compared with 1.645.
111
Hypothesis testing
For the present example, assume
sigma is known at 8.9, then
(71.8 - 70)/(8.9/Sqrt[100])
= 2.02
As 2.02 > 1.645,
Reject $H_0$, life span is bigger than 70
years.
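The same test written as a short sketch. The critical value 1.645 is recovered from the standard normal distribution with `NormalDist().inv_cdf(0.95)` rather than read from a table.

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n = 71.8, 70.0, 8.9, 100  # values from the class example
level = 0.05                                # level of significance (one-sided test)

z = (xbar - mu0) / (sigma / math.sqrt(n))   # standard variable
z_crit = NormalDist().inv_cdf(1 - level)    # one-sided critical value

reject_h0 = z > z_crit
print(round(z, 2), round(z_crit, 3), reject_h0)
```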

112
Hypothesis testing
Testing hypothesis is DIFFERENT
from solving a differential equation,
e.g. to solve
dy/dx = y, y(0) = 1;
Once you identify y = exp(x), that is
the exact solution beyond all doubt.
113
Hypothesis testing
Nobody can argue with you
regarding the true solution of the
differential equation.
In Hypothesis Testing, we do NOT
prove that the mean is a certain
value. We just assert that the data
are CONSISTENT with that claim.