AMA1611 Lecture 5 and 6 Powerpoints
AMA1611 Lecture 5 and 6 Powerpoints
Data Analytics
Fundamentals
Solution:
As a table,
! 0 1 2 3 4
1 4 6 4 1
Pr ) = !
16 16 16 16 16
7 Example 2
Solution:
As a formula,
4
#&$
Pr # = % = = %
16 16
for % = 0, 1, 2, 3, 4.
Example 2
8
Find the probability distribution of the number of heads when
a coin is tossed 4 times.
Solution:
As a graph,
Note: Total area of rectangle = 1.
0.375
0.25 0.25
0.0625 0.0625
0 1 2 3 4
9 Example 3
An experiment of tossing two fair dice. Let ! be the sum of
two dice. Express the probability distribution of ! in a table
form.
Solution:
2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12
10 Example 3
An experiment of tossing two fair dice. Let ! be the sum of
two dice. Express the probability distribution of ! in a table
form.
Solution: As a table,
! 2 3 4 5
+ )=! 1⁄36 2⁄36 3⁄36 4⁄36 2 3 4 5 6 7
3 4 5 6 7 8
! 6 7 8 9
+ )=! 5⁄36 6⁄36 5⁄36 4⁄36 4 5 6 7 8 9
5 6 7 8 9 10
! 10 11 12
+ )=! 3⁄36 2⁄36 1⁄36 6 7 8 9 10 11
7 8 9 10 11 12
11 3.1.1 Random Variables
´ The probability function, ! " , of a discrete random
variable # expresses the probability that # takes the
value ", as a function of ". That is
! " = Pr # = "
where the function is evaluated at all possible values of ".
´ Properties of probability function Pr # = " :
1. Pr # = " ≥ 0 for any value ".
2. The individual probabilities sum to 1; that is
6 Pr # = " = 1.
.
12 Example 4
Find the probability function of the number of boys
on a committee of 3 selected at random from 4
boys and 3 girls.
Solution:
Let ! be the number of boys on a committee.
#&$ × %&%&$
Pr ! = % =
'&%
for % = 0, 1, 2 and 3.
13 3.1.1 Random Variables
Continuous Probability Distribution
1. The total area under this curve bounded by the
%-axis is equal to one.
2. The area under the curve between lines % = -
and % = . gives the probability that ! lies
between - and ., which can be denoted by
Pr(- ≤ ! ≤ .).
3. We call 2(%) a "probability density function", i.e.
p.d.f.
14 3.1.2 Mathematical Expectations
= 0.21
16 3.1.2 Mathematical Expectations
Definition
Let ! be a random variable. The expectation of the
squared discrepancy about the mean, 3 ! − 4( ) , is
called the variance, denoted 7() , and given by
8-9 ! = 7() = 3 ! − 4( )
= 8 % − 4( )
Pr ! = %
$
= 8 % ) Pr ! = % − 4()
$
17 3.1.2 Mathematical Expectations
1. 3 -! + . = -3 ! + . = -4( + .
Standard deviation
C. V. = ×100%
Mean
20 Example 6
Decide which project to invest, Project A or Project B.
PROJECT A PROJECT B
Solution:
Without considering risk, choose B because , - > , / . But,
01231456 / = 0.3 150 − 205 / + 0.3 200 − 205 / + 0.4 (250 −
205)/
= 1725
⇒ AB / = 41.53
/ /
01231456 - = 0.2 −400 − 220 + 0.6 300 − 220 + 0.1 (400 −
220)/
/
+ 0.1 800 − 220
= 117600
⇒ AB - = 342.93
Risk averse management might prefer A.
Example 6
22
Solution:
Risk can be compared more satisfactorily by taking the
ratio of the standard deviation to the mean of profit.
41.53
C. V. of project A = ×100% = 20.3%
205
342.93
C. V. of project B = ×100% = 155.9%
220
As a result, B is relatively riskier than A
3.2 The Normal Distribution
23
Definition
A continuous random variable # is defined to be a normal
random variable if its probability function (p.d.f.) is given by
H H G−O (
F G = LMN − PQR − ∞ < G < +∞
I JK J I
where
´ U = the mean of #
´ V = the standard deviation of #
´ W ≈ 3.14159
Notation: We use # ~ Z U, V / to mean # is a normal random
variable with mean U and variance V /.
Example 7
24
Figure below shows three normal probability distributions,
each of which has the same mean but a different standard
deviation. Even though these curves differ in appearance, all
three are “normal curves”.
-3 -2 -1 0 1 2 3
3.2 The Normal Distribution
25
Properties of the normal distribution
1. It is a continuous distribution.
2. The curve is symmetric and bell-shaped about
a vertical axis through the mean 4, i.e. mean =
mode = median = 4.
3. The total area under the curve and above the
horizontal axis is equal to 1.
3.2 The Normal Distribution
26
Properties of the normal distribution
4. Area under the normal curve:
´ Approximately 68% of the values in a normally distributed population within 1
standard deviation from the mean.
´ Approximately 95.5% of the values in a normally distributed population within 2
standard deviation from the mean.
´ Approximately 99.7% of the values in a normally distributed population within 3
standard deviation from the mean.
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
-4 -3 -2 -1 0 1 2 3 4
3.2 The Normal Distribution
27
Definition
The distribution of a normal random variable with U = 0 and V = 1 is
called a standard normal distribution. Usually a standard normal
random variable is denoted by \.
Notation: \ ~ Z 0, 1
-3 -2 -1 0 1 2 3
30
Example 8(b)
Given R ~ T(0, 1), find
U(0 < R < 1.73).
Solution:
U 0 < R < 1.73 = 0.5 −
0.0418 = 0.4582
-3 -2 -1 0 1 2 3
30
Example 8(c)
31
Given R ~ T(0, 1), find U(−2.42 < R <
0.8).
Solution:
U −2.42 < R < 0.8 = 1 − 0.00776
−0.2119 = 0.78034
-3 -2 -1 0 1 2 3
Example 8(d)
32
-3 -2 -1 0 1 2 3
33 Example 8(e)(i)
Given 9 ~ ;(0, 1), find the value <
that has 5% of the area below it.
Solution:
= 9 < < = 0.05
< = −1.645
-3 -2 -1 0 1 2 3
34 Example 8(e)(ii)
Given ( ~ 1(0, 1), find the value - that has
39.44% of the area between 0 and -.
Solution:
4 0 < ( < - = 0.3944
4 ( > - = 0.5 − 0.3944
4 ( > - = 0.1056
- = 1.25
!=
-3 -2 -1 0 1 2 3
35 3.2 The Normal Distribution
Theorem
If ! is a normal random variable with mean 4 and
standard deviation 7, then
!−4
R=
7
is a standard normal random variable and hence
%B − 4 %) − 4
Pr %B < ! < %) = Pr <R<
7 7
36 Example 9
Given A ~ 1 50, 10! , find 4 45 < A < 62 .
Solution:
"#$% '$% (!$%
4 45 < A < 62 = 4 < <
& & &
"#$#) '$#) (!$#)
=4 < <
*) *) *)
= 4 −0.5 < ( < 1.2
= 1 − 0.3085 − 0.1151
= 0.5764
-3 -2 -1 0 1 2 3
Example 10(a)
37
The charge account at a certain department store is
approximately normally distributed with an average balance
of $80 and a standard deviation of $30. What is the
probability that a charge account randomly selected has a
balance over $125.
Solution:
Let A be the balance of charge account ($).
A ~ 1(80, 30! )
'$+) *!#$+)
4 A > 125 = 4 >
,) ,)
= 4 ( > 1.5
= 0.0668
-3 -2 -1 0 1 2 3
Example 10(b)
The charge account at a certain department store is approximately
normally distributed with an average balance of $80 and a standard
deviation of $30. What is the probability that a charge account randomly
selected has a balance between $65 and $95.
Solution:
Let A be the balance of charge account ($).
A ~ 1(80, 30! )
(#$+) '$+) -#$+)
4 65 < A < 95 = 4 < <
,) ,) ,)
-3 -2 -1 0 1 2 3
Example 11
-3 -2 -1 0 1 2 3
On an examination the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the
grades are curved to follow a normal distribution, what is the lowest possible A and the highest possible B?
Solution:
39
Assume that grades are given in whole number. The lowest possible A is 83 and the highest possible B
is 82.
End of Lecture 5. After Lecture
Lecture notes:
Read Chapter 3 3.1 to 3.2
Coming Tutorial 5:
Discuss Chapter 3 Ex. 5,6,10 and 12
3.3 The Binomial Distribution
41
Definition
In a binomial experiment with a constant probability ^ of
success at each trial, the probability distribution of the
binomial random variable #, the number of successes in 4
independent trials, is called the binomial distribution.
Notation: # ~ _ 4, ^
4 . =>.
] # = " = =`. ^ . a=>. = " ^ a for " = 0, 1, … , 4
where ^ + a = 1.
44 Example 12(1)
1. In testing 10 items as they come off an assembly line, where each test or
trial may indicate a defective or a non-defective item. Let ! be the
number of defective items among the 10 tested items. What is the
distribution of !?
Solution:
i. There are n = 10 identical observations or trials.
ii. Each trial has two possible outcomes, one called defective and the
other non-defective. The outcomes are mutually exclusive and
collectively exhaustive for each trial.
iii. The probabilities of defective, &, and of non-defective, 1 − &, remain
the same for all trials.
iv. The outcomes of trials are independent of each other.
Therefore,
! ~ ) 10, &
Example 12(2)
45
2. Five cards are drawn with replacement from an ordinary deck
(without the Joker cards) and each trial is labelled a success or failure
depending on whether the card is red or black. Let D be the number
of red cards among the 5 cards drawn. What is the distribution of D?
Solution:
i. There are n = 5 identical observations or trials.
ii. Each trial has two possible outcomes, one called red and the
other black. The outcomes are mutually exclusive and
collectively exhaustive for each trial.
iii. The probabilities of red, F = 26⁄52, and of black, 1 − F, remain the
same for all trials.
iv. The outcomes of trials are independent of each other.
Therefore,
D ~ H 5, 26⁄52
Example 13(a) – Approach 1
46
Theorem
The mean (4 ) and variance (7 ) ) of the binomial
distribution with parameters \ and ] are:
4 = \] and
7 ) = \]^, respectively
where ] + ^ = 1.
52 Example 15
A packaging machine produces 20 percent defective
packages. A random sample of ten packages is selected,
what are the mean and standard deviation of the
binomial distribution of that process?
Solution:
Let # be the number of defective packages among the 10
randomly chosen packages. Then, # ~ _(10, 0.2).
, # = 4^ = 10 0.2 = 2
AB # = 012 # = 4^a = 10 0.2 0.8 ≈ 1.265
53 3.3 The Binomial Distribution
The Normal Approximation to the Binomial Distribution
Theorem
Given # is a random variable which follows the binomial
distribution with parameters 4 and ^, then
n=4, np=1, nq=3 n=10, np=2.5, nq=7.5 n=25, np=6.25, nq=18.75 n=50, np=12.5, nq=37.5
0.45 0.3 0.2 0.14
0.4 0.18
0.25 0.12
0.35 0.16
0.14 0.1
0.3 0.2
0.12
0.25 0.08
0.15 0.1
0.2 0.06
0.08
0.15 0.1
0.06 0.04
0.1 0.04
0.05 0.02
0.05 0.02
0 0 0 0
0 1 2 3 4 0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20 22 24 0 3 6 9 12151821242730333639424548
55 3.3 The Binomial Distribution
Remark: If both 4^ and 4a are greater than 5, the
approximation will be good.
For example, when ^ = 0.05,
n=4, np=0.2, nq=3.8 n=10, np=0.5, nq=9.5 n=25, np=1.25, nq=23.75 n=50, np=2.5, nq=47.5
0.9 0.7 0.4 0.3
0.8 0.35
0.6 0.25
0.7
0.3
0.5
0.6 0.2
0.25
0.5 0.4
0.2 0.15
0.4 0.3
0.15
0.3 0.1
0.2
0.2 0.1
0.1 0.05
0.1 0.05
0 0 0 0
0 1 2 3 4 0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20 22 24 0 3 6 9 12151821242730333639424548
3.3 The Binomial Distribution
56
Remark: If both \] and \^ are greater than 5, the
approximation will be good.
For example, when ] = 0.95,
n=4, np=3.8, nq=0.2 n=10, np=9.5, nq=0.5 n=25, np=23.75, nq=1.25 n=50, np=47.5, nq=2.5
0.9 0.7 0.4 0.3
0.8 0.35
0.6 0.25
0.7
0.3
0.5
0.6 0.2
0.25
0.5 0.4
0.2 0.15
0.4 0.3
0.15
0.3 0.1
0.2
0.2 0.1
0.1 0.05
0.1 0.05
0 0 0 0
0 1 2 3 4 0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20 22 24 0 3 6 9 12151821242730333639424548
57 3.3 The Binomial Distribution
The Normal Approximation to the Binomial Distribution
Theorem
" − h. i − 4^ " + h. i − 4^
Pr # = " = Pr <\<
4^a 4^a
Example 16
58
A process yields 10% defective items. If 100 items are randomly selected from
the process, what is the probability that the number of defective items exceeds
13?
Solution:
Let A be the number of defective items in the 100. Then, A ~ H(100, 0.1).
Approach 1:
4 the number of defective items exceeds 13 = 4 A > 13
= 4 A = 14 + 4 A = 15 + ⋯ + 4 A = 100 ≈ 0.1239
Approach 2:
4 the number of defective items exceeds 13 = 4 A > 13
= 1 − 4 A ≤ 13 = 1 − 4 A = 0 − 4 A = 1 − ⋯ − 4 A = 13
≈ 0.1239
Can we obtain an approximately accurate answer with less amount of
calculation?
59
Example 16
Since 4^ = 100 0.1 = 10 and 4a = 100 0.9 = 90 are greater than 5,
the Normal approximation will be good. In other words, we have
# ~ Z 100 0.1 , 100 0.1 0.9 ~ j 10, 9 .
] the number of defective items exceeds 13 = ] # > 13
F>@? @BGE.F>@?
=] >
B B
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
13.5
60 Example 16
4 the number of defective items exceeds 13 = 4 A > 13
F>@? @BG?.E>@?
=] >
B B
= \ > 1.1667
= 0.1210
-3 -2 -1 0 1 2 3
Example 17 – Exact
61
24.5
30.5
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
63 Example 17 – Normal Approximation
Solution:
= ] 1.1619 ≤ \ ≤ 2.7111
= 0.1230 − 0.00336
≈ 0.1196
-3 -2 -1 0 1 2 3
End of Lecture 6. After Lecture
Lecture notes:
Read Chapter 3 3.3
(exclude 3.4 The Possion Distribution)
Coming Tutorial 6:
Discuss Chapter 3 Ex. 9, 11 and 13(a)
3.4 The Poisson Distribution
65
Experiments yielding numerical values of a random
variable #, the number of successes (observations)
occurring during a given time interval (or in a specified
region) are often called Poisson experiments.
A Poisson experiment has the following properties:
1. The number of successes in any interval is independent
of the number of successes in other non-overlapping
intervals.
2. The probability of a single success occurring during a
short interval is proportional to the length of the time
interval and does not depend on the number of
successes occurring outside this time interval.
3. The probability of more than one success in a very small
interval is negligible.
66 3.4 The Poisson Distribution
Examples of random variables following Poisson
Distribution
1. The number of customers who arrive during a
time period of length _.
2. The number of telephone calls per hour
received by an office.
3. The number of typing errors per page.
4. The number of accidents per day at a junction.
67 3.4 The Poisson Distribution
Definition
The probability distribution of the Poisson random
variable ! is called the Poisson distribution.
Notation: ! ~ Po ` where ` is the average number
of successes occurring in the given time interval.
a &N `$
U !=% = for % = 0, 1, 2, …
%!
where a ≈ 2.718282.
68 3.4 The Poisson Distribution
Motivation
For example, the number of typing errors per page
can be modelled by Binomial distribution.
Since there are a lot of word on a page and the
chance of having a typo is small, we have a very
large \ and a very small ].
\ $ O&$ a &N `$
] ^ ≈
% %!
where ` = \].
69 Example 18
The average number of radioactive particles passing
through a counter during 1 millisecond in a laboratory
experiment is 4. What is the probability that 6 particles
enter the counter in a given millisecond?
Solution:
Let # be the number of radioactive particles passing
through the counter in 1 millisecond. Then, # ~ ]m(4).
6 >D4C
] #=6 = = 0.1042
6!
70 Example 19
Ships arrive in a harbour at a mean rate of two per
hour. Suppose that this situation can be described
by a Poisson distribution. Find the probabilities for a
30-minute period that
(a) No ships arrive;
(b) Three ships arrive.
71 Example 19(a)
Ships arrive in a harbour at a mean rate of two per hour.
Suppose that this situation can be described by a Poisson
distribution. Find the probabilities for a 30-minute period
that (a) No ships arrive.
Solution:
Let # be the number of arrivals in 30 minutes. Then,
# ~ ]m(1).
6 >@1?
] No ships arrive = ] # = 0 = = 6 >@ ≈ 0.3679
0!
72 Example 19(b)
Ships arrive in a harbour at a mean rate of two per hour.
Suppose that this situation can be described by a Poisson
distribution. Find the probabilities for a 30-minute period
that (b) Three ships arrive.
Solution:
Let # be the number of arrivals in 30 minutes. Then,
# ~ ]m(1).
6 >@1B
] Three ships arrive = ] # = 3 = ≈ 0.06131
3!
3.4 The Poisson Distribution
73
Theorem
The mean and variance of the Poisson distribution are both
equal to o.
Poisson approximation to the binomial distribution
If 4 is large and ^ is near 0 or near 1 in the binomial
distribution, then the binomial distribution can be
approximated by the Poisson distribution with parameter
o = 4^.
General speaking, the Poisson distribution will provide a
good approximation to binomial when
i. 4 is at least 20 and ^ is at most 0.05; or
ii. 4 is at least 100, the approximation will generally be
excellent provided ^ < 0.1.
Example 20 – Exact
74 If the probability that an individual suffers a bad reaction from a
certain injection is 0.001, determine the probability that out of
2000 individuals, more than 2 individuals will suffer a bad
reaction.
Solution:
Let # be the number of individuals suffering a bad reaction. Then,
# ~ _ 2000, 0.001 .
] # >2 =1−] # ≤2
=1−] # =0 −] # =1 −] # =2
= 1 − /???`? 0.001 ? 0.999 /???>? − /???`@ 0.001 @ 0.999 /???>@
# %QQ&#
U !=4 = %QQ&# 0.02 0.98 ≈ 0.133841964
77 Example 21 – Poisson Approximation
Solution:
\ = 300 is at least 20, the Poisson approximation
will be excellent provided ] = 0.02 ≤ 0.05. Then,
! ~ Ue 300×0.02 = Ue 6 .
i&S jT
f g=h = ≈ k. lmmnopjln
h!