Professional Documents
Culture Documents
MATHEMATICS
Analysis and Approaches (SL and HL)
Lecture Notes
Christos Nikolaidis
TOPIC 4
STATISTICS AND PROBABILITY
Only for HL
June 2019
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Discrete OR Continuous
{10,20,30} [40,100]
{0,1,2,3,…} R
(finite or numerable set) (interval)
1
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Colored
Freq
Balls
Blue 13
Green 8
Red 10
Yellow 3
Age Frequency
[0,10) 7
[10,20) 5
[20,30) 1
[30,40) 3
2
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
3
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
Here
10 + 20 + 20 + 20 + 30 + 30 + 40 + 50 +70 +70 +80
mean = = 40
11
median = 30
4
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
NOTICE
• For the data 10, 20, 30
Median = 20
For the data 10, 20, 30, 40
Median = 25
That is, for an even number of data,
median = the mean of the two middle values
• The median is not the n -th entry as one would possibly expect.
2
10, 20, 30, 40, 50, 60, 70, 80, 90, 100
µ=
∑x i
5
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 1
Find
a) the integers a ≤ b ≤ c , given that mean=4, mode=5, median=5.
The median implies that b=5. The mode implies that also c=5.
a+5 +5
Then = 4 ⇔ a + 10 = 12 ⇔ a = 2
3
Therefore, the numbers are 2,5,5.
♦ MEASURES OF SPREAD
We use the same set of data
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
A) STANDARD DEVIATION
2 In fact,
the Greek letter σ is used for the whole population;
the Latin letter sn is used for a sample of the population
6
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
As the estimation of the values Q1, Q2, Q3 is quite tricky, let us see
some extra cases in the following example.
b) For n=8 entries: 10, 20, 30, 40, 50, 60, 70, 80
The median is Q2=45 (the 4.5th entry). Hence Q1=25, Q3=65.
c) For n=9 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90
The median is Q2=50 (the 5th entry). Hence Q1=25, Q3=75.
d) For n=10 entries: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
Then Q2=55 (the 5.5th entry). Hence Q1=30, Q3=80.
7
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
NOTICE
The square of the standard deviation is called variance. That is
2
variance = σ2 or sn
♦ USE OF GDC
Notice that
The standard deviation in the GDC is denoted by σχ
The variance is not given; it is simply the square of σχ
min Q1 Q2 Q3 max
8
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ MORE DETAILS
1) Percentiles
The values Q1, Q2, Q3 are also called
Q1 : 25th-percentile
Q2 : 50th-percentile
Q3 : 75th-percentile
2) Outliers
Such a value is viewed as being too far from the central values to
be reasonable. In our example,
Q1 - 1.5×IQR = 20 - 1.5×50 = - 55
9
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
∑x
2
i 10 2 + 202 + 202 + L + 80 2 23400
= = =2127.27
n 11 11
Hence
σ 2 =2127.27- 402 =527.27
∑(x − x) ∑(x
2 2
2 i i − 2xi x + x 2 )
σ = =
n n
∑x ∑x + ∑x ∑x
2 2 2
i i i nx 2
= − 2x = − 2xx +
n n n n n
∑x ∑x
2 2
i 2 2 i
= − 2x + x = − x2
n n
10
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Data Frequency
x f
10 1
20 3
30 2
40 1
50 1
70 2
80 1
n=11
µ=
f1 x1 + f2 x 2 + f3 x3 +L
or otherwise µ=
∑f x
i i
n n
11
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
It helps here to add an extra column in the table above with the
so-called cumulative frequencies:
♦ MEASURES OF SPREAD
A) STANDARD DEVIATION
12
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ USE OF GDC
13
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ GROUPED DATA
Suppose that 100 students took an exam and obtained scores from
1 to 60 (full marks), according to the following table:
0 < x ≤ 10 5 8 8
10 < x ≤ 20 15 12 20
20 < x ≤ 30 25 10 30
30 < x ≤ 40 35 25 55
40 < x ≤ 50 45 35 90
50 < x ≤ 60 55 10 100
n=100
14
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
x: up to ≤ 10 ≤ 20 ≤ 30 ≤ 40 ≤ 50 ≤ 60
y: c.f 8 20 30 55 90 100
15
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Below that graph we can easily draw box and whisker plot:
In the same way we can find any percentile. For example, for the
40th-percentile
Estimate 40% of n: here 40% of 100 students is 40;
Draw a horizontal line at y=40 until you meet the curve;
Then draw a vertical line;
Hence
40th-percentile = 35.
Q1 - 1.5×IQR = 26 - 1.5×50 = - 4
Q3 + 1.5×IQR = 46 + 1.5×50 = 76
16
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Given that fi is the frequency of the entry xi, the formulas now
become:
σ 2
=
∑f (x − µ)
i i
2
n
thus
σ=
∑f (x − µ)
i i
2
In our example,
and then
standard deviation = 527.27 = 22.96
∑f x
2
2 i i r
sn = - x2
n
r
For our example, we have x =40 and
∑f x
2
i i 1 × 10 2 + 3 × 20 2 + 2 × 20 2 + L 23400
= = =2127.27
n 11 11
Hence
σ 2 =2127.27- 402 =527.27
17
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
4.4 REGRESSION
x 10 12 15 20 23 28 30
y 120 135 174 213 270 301 305
300
200
100
x
-5 5 10 15 20 25 30 35
The closest to the ends ±1, the more our data are linearly related.
(-1 implies a negative slope while +1 implies a positive slope)
The closest to 0, the less our data are linearly related.
There is also a line y=ax+b that best fits our data; it is known as
regression line. We can easily obtain these details by using a GDC.
18
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ USE OF GDC
300
200
y =9.83x+23.1
100
x
-5 5 10 15 20 25 30 35
Notice that x=18 is within the range of our list while x=40 is not.
f(18)=200 is known as interpolation, f(40)=416 as extrapolation.
In general, interpolations are more reliable than extrapolations.
19
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
• CALC
• 2VAR: We obtain all the statistics, separately for x’s and y’s
In our example
x = 19.7 y =216.9
Thus the line passes through the point M(19.7, 216.9).
20
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
x y
y
10
r =1
1 2 8
perfect positive
2 4 6 correlation
3 6 4
Regression line:
4 8 2
x y=2x
5 10 1 2 3 4 5
x y y
r =-1
10
1 10 perfect negative
8
2 8 correlation
6
3 6 4
Regression line:
4 4 2
y=-2x+12
5 2 1 2 3 4 5
x
x y y
10
r =0.98
1 2
8 strong positive
2 3
6 correlation
3 7 4
4 8 Regression line:
2
5 10 x y=2.1x-0.3
1 2 3 4 5
x y y
10
r =-0.98
1 10 strong negative
8
2 8 6 correlation
3 7 4
Regression line:
4 3 2
x y=-2.1x+12.3
5 2 1 2 3 4 5
x y y
10
r= 0
1 8
8 no correlation
2 2
6 at all
3 5 4
4 2 Regression line:
2
5 8
x y=5
1 2 3 4 5
21
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ BASIC NOTIONS
In elementary set theory, a set is just a collection of objects (or
elements). It is usually denoted by a capital letter. For example,
a∈B
f ∉B
Let us consider the set A = {1,2,3}. The subsets of A are sets that
contain some (or none or all) elements of A. There are 8 subsets:
∅
{1}, {2}, {3}
{1,2}, {1,3}, {2,3}
{1,2,3}
22
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
In general,
if A contains n elements, there are 2n subsets.
B⊂ A
♦ VENN DIAGRAMS
We usually refer to a large set S, called universal set, and consider
several subsets of S.
Let
S = { a,b,c,d,e,f,g,h,i,j }
A
a b d f h
c e g i j
23
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
A B
a b d f h
c e g i j
S 10
A B
3 2 2
We denote by
n(A) = the number of elements of set A
In our example
n(S) = 10
n(A)=5 n(B)=4
Notice that the number n(A)=5 does not appear on the Venn
diagram. The subset A consists of two regions of size 3 and 2, thus
n(A)=3+2=5
24
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Now we can study some basic operations between sets. Let us refer
again to our example where S = { a,b,c,d,e,f,g,h,i,j } and
A = {a,b,c,d,e}
B = {d,e,f,g}
S
A
S
A B
S
A B
25
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ A BASIC PREPERTY
S A B
26
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
5.5 PROBABILITY
n(A)
P(A) =
TOTAL
For example, in the following Venn diagram, the sample space S
contains 100 elements, while the event A contains 30 elements
100
30
70
n(A) 30
P(A) = = = 0.3
TOTAL 100
We understand that
0 ≤ P(A) ≤ 1
Clearly
P(∅)=0 and P(S) = 1
27
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ COMPLEMENTARY EVENTS
In our example above P(A΄) = 0.7
In general
P(A΄) = 1- P(A)
♦ COMBINED EVENTS
Remember the basic property for combined events
A B
20 10 30
40
Then
P(A) = 0.3, P(B) = 0.4, P(A)=0.7, P(B)=0.6
Also
P(A∩B)=0.1, P(A∪B)=0.6
Clearly
P(A∪B) = P(A) + P(B) – P(A∩B)
0.6 = 0.3 + 0.4 – 0.1
A Venn diagram may also contain probabilities instead of numbers
of elements. The Venn diagram above takes the form
A B
0.4
28
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
S A B
EXAMPLE 1
Given that P(A) = 0.5, P(B) = 0.3, P(A∪B)=0.6, let us construct a
Venn diagram representing the combined events A and B
Notice that
P(A∪B)≠P(A)+P(B)
0.6 ≠ 0.8
The difference implies the existence of an intersection; P(A∩B)=0.2
Starting from the intersection 0.2 we may easily complete the
following Venn diagram
A B
0.4
29
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ TABLES
Another way to represent sets in order to find probabilities is the
tabular form below. It is appropriate when the sample space is
partitioned in disjoint subsets according two different criteria; for
example MALE-FEMALE and SMOKERS-NON SMOKERS.
120
• male is P(male) = = 0.6
200
80
• female is P(female) = = 0.4
200
60
• smoker is P(smoker) = = 0.3
200
140
• non-smoker is P(non - smoker) = = 0.7
200
40
• male AND smoker P(male ∩ smoker) = = 0.2
200
140
• male OR smoker P(male ∪ smoker) = = 0.7
200
30
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ TWO DICE
1 2 3 4 5 6
1 • • • • • •
2 • • • • • •
3 • • • • • •
4 • • • • • •
5 • • • • • •
6 • • • • • •
Notice that there is only one combination ones (the first dot; 1-1)
but two combinations of one-two (1-2 and 2-1).
1
P(two sixes) = (the very last dot)
36
11
P(at leat one six) = (last column and last row)
36
10
P(exactly one six) = (why?)
36
6
P(same score) = (the main diagonal: 1-1, etc)
36
4
P(sum of scores = 9) = (the dotted line)
36
6
P(sum of scores > 9) = (below the dotted line)
36
26
P(sum of scores < 9) = (above the dotted line)
36
31
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
is different than
1
Clearly P(A) = .
100
1
then P(A | B) = (there are 90 two-digit numbers)
90
n(A ∩ B) P(A ∩ B)
P(A | B) = or P(A | B) =
n(B) P(B)
32
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
A B
20 10 30
40
30
We know that P(A) = . What about P(A|B) ?
100
We start with the given event B; now the total number is not 100,
the size of the whole sample space, but only 40, the size of B:
?
P(A | B) =
40 ← given B
How many elements of A are inside the given space B? Only 10.
Therefore,
10
P(A | B) =
40
NOTICE
In fact, in the last result we apply the formula
n(A ∩ B) 10
P(A | B) = = = 0.25
n(B) 40
33
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
• Similarly we obtain
10
P(B | A) =
30 ← given A
• Similarly we obtain
30 20 40
P(A'| B) = P(A | B' ) = P(A'| B' ) =
40 60 60
♦ P(A|B) IN A TABLE
Perhaps it is much easier to observe the conditional probability in
tables. Consider again the example
Clearly,
60
P(smoker) =
200
40
P(smoker|male) =
120 ← given male
• Similarly we obtain
40
P(male|smoker) =
60 ← given smoker
• Similarly we obtain
20 60
P(female|smoker) = ≈ 0.33 P(non-smoker|female) = = 0.75
60 80
34
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ INDEPENDENT EVENTS
P(A|B) = P(A)
P(A ∩ B)
• In this case the definition P(A | B) = gives
P(B)
To summarize
EXAMPLE 1
120
A B
20 10 30
60
35
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
NOTICE
• Many students confuse the terms
Mutually exclusive events and Independent events
Remember
Mutually exclusive events means A ∩B = ∅
Independent events means P(A ∩ B) = P(A) ⋅ P(B)
• Mind that
P(A∪B) = P(A) + P(B) – P(A∩B) holds in general
P(A ∩ B) = P(A) ⋅ P(B) holds for independent events
1 1 1
P(A ∩ B) = P(A) ⋅ P(B) = =
6 2 12
36
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 2
Let P(A)=0.4 and P(B)=0.3. Find P(A∪B) in the following cases
a) A and B are mutually exclusive
b) A and B are independent
c) P(A ∩ B) = 0.2
d) P(A | B) = 0.2
Solution
a) P(A∪B) = P(A) + P(B) = 0.4 + 0.3 = 0.7
b) P(A∪B) = P(A) + P(B) - P(A) ⋅ P(B) = 0.4+0.3–(0.4)(0.3) = 0.58
c) P(A∪B) = P(A) + P(B) – P(A∩B) = 0.4+0.3–0.2 = 0.5
P(A ∩ B)
d) P(A | B) = ⇒ P(A∩B)= P(A | B) P(B)= (0.2)(0.3) = 0.06
P(B)
Hence, P(A∪B) = P(A)+P(B)–P(A∩B) = 0.4+0.3–0.06 = 0.64
EXAMPLE 3
Let A and B be independent events with
P(A)=0.4 and P(A∪B)=0.7.
Find P(B).
Solution
For independent events it holds
P(A∪B) = P(A) + P(B) - P(A) ⋅ P(B)
⇔ 0.7 = 0.4 + P(B) – 0.4 P(B)
⇔ 0.3 = 0.6 P(B)
⇔ P(B) = 0.5
37
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
AAAA BBBBBB
We play the game twice. All possible scenarios are shown below; the
corresponding probabilities are shown on the branches of the tree:
A
Scenario AA
0.4
A
0.4
0.6 B Scenario AB
Scenario BA
0.4 A
0.6 B
0.6 B Scenario BB
38
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
A
0.16
0.4
A
0.4
0.6 B
0.24
0.24
0.4 A
0.6 B
0.6 B 0.36
• to obtain no A is 0.36
39
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
AAAA BBBBBB
C
0.12
0.3
A
0.4
0.7 D
0.28
0.24
0.4 C
0.6 B
0.6 D 0.36
Thus
40
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
NOTICE
In the tree diagram above, the value 0.3 of the branch AC is in fact
the conditional probability
C
P(A∩
∩C)
P(C|A)
A
P(A)
P(D|A) D
P(A∩
∩D)
P(B∩
∩C)
P(C|B) C
P(B) B
P(D|B) D P(B∩
∩D)
41
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 1.
We throw a die.
If we get 1 we stop.
If we get 2,3,4 or 5 we toss a coin.
If we get 6 we toss two coins.
Find the probability that only one head is obtained.
Solution.
For our convenience, we denote the results of the die by
A={1}, B={2,3,4,5}, C={6}
1/6
1/2 H
4/6 B
1/2
1/6 T H
1/2
C 1/2 1/2
H T
H
1/2 1/2
T
1/2 T
42
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
10 balls
We select two balls, one after the other. All possible outcomes are
clearly shown on the following tree diagram
30
B 90
5
9
B
6
10 4 24
90
9
W
6 B 24
9
90
4
10 W
3 12
9
W 90
6 5 30 1
P(both balls are BLACK) = ⋅ = =
10 9 90 3
6 4 24 8
P(only one ball is BLACK) = ⋅ ×2 = ×2 =
10 9 90 15
6 5 4 3 42 7
P(balls of same color) = ⋅ + ⋅ = =
10 9 10 9 90 15
43
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
C 0.12
0.3
A
0.4
0.7
D 0.28
C 0.24
0.4
0.6 B
0.6
D 0.36
We said that P(C|A) = 0.3 is shown on the tree (on the branch AC).
0.12 ← combination AC
P(A | C) =
0.12 + 0.24 ← given C
P(A ∩ C)
Actually, it is the formula P(A | C) =
P(C)
Therefore,
0.12 0.24
P(A | C) = ≈ 0.33 P(B | C) = ≈ 0.67
0.36 0.36
0.28 0.36
P(A | D) = ≈ 0.44 P(B | D) = ≈ 0.56
0.64 0.64
44
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Α P(A∩
∩B) = P(B)P(A|B)
P(Α|Β)
Β
P(B)
P(Α΄|Β) P(A΄∩
∩B) = P(Β)P(A΄|Β)
Α΄
Α P(A∩
∩B΄) = P(B΄)P(A|B΄)
P(Α|B΄)
P(B΄) Β΄
P(Α΄|B΄) P(A΄∩
∩B΄) = P(B΄)P(A΄|Β΄)
Α΄
P(B)P(A | B)
P(B | A) =
P(B)P(A | B) + P(B΄ )P(A | B΄ )
Assume now that the first level of the tree diagram consists of 3
disjoint events B1,B2,B3. Bayes’ theorem for P(B1|A) takes the form
P(B1 )P(A | B1 )
P(B1 | A) =
P(B1 )P(A | B1 ) + P(B2 )P(A | B2 ) + P(B3 )P(A | B3 )
45
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 2.
In a private school party, 30% of the students wear RED suits, 20%
wear GREEN suits and 50% wear BLUE suits. 25% of the RED
students, 35% of the GREEN students and 45% of the BLUE
students are MALE. Find the probability that a MALE student
wears GREEN suit, that is
P(GREEN|MALE).
Solution.
Instead of applying the Bayes’ formula we will construct a tree
diagram to obtain the “inverse given” probability.
MALE 0.075
0.25
RED
D
0.3 FEMALE
MALE 0.070
0.2 GREEN 0.35
0.5
FEMALE
BLUE 0.45
MALE 0.225
FEMALE
Therefore,
0.070 0.07
P(GREEN|MALE) = = ≈ 0.189
0.075 + 0.070 + 0.225 0.37
46
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Discrete OR Continuous
X∈{0,1,2,3,…} X∈R
10, 20, 30
with probabilities
0.2, 0.3, 0.5
X 10 20 30
Clearly
P(X=10) = 0.2
47
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
X x1 x2 x3 …
P(X=x) p1 p2 p3 …
it holds
(i) pi ≥ 0 , for all i
(ii) ∑p i = 1, i.e p1 + p 2 + p3 + L = 1
We write
P(X=x1) = p1, P(X=x2) = p2, and so on.
X 10 20 30
48
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 1
Consider
X 10 20 30
P(X=x) a b 0.5
EXAMPLE 2
Consider again the same table above. But now we select one of the
numbers 10, 20, 30 at random.
If we select 10 we earn 6 points
If we select 20 we earn 1 point
If we select 30 we lose 2 points
X 10 20 30
Profit 6 points 1 point -2 points
Prob 0.2 0.3 0.5
49
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Explanation
In other words, if we play this game 10 times we expect to earn 5
points on average.
Indeed, if we play the game 10 times we expect to obtain
2 times the number 10, that is 2 × 6=12 points
3 times the number 20, that is 3 × 1=3 points
5 times the number 30, that is 5 × (-2)=-10 points
In total, 12+3-10 = 5 points
EXAMPLE 3
We throw two dice.
If we obtain TWO SIXES we earn 15€
If we obtain ONLY ONE SIX we earn 1€
If we obtain NO SIX we lose 1€
Notice. If the first winning prize was not 15€ but 14€, the expected
1
profit would be -
36
In other words, if we play the game 36000 times (or otherwise bet
36000€) we expect to lose 1000€.
50
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ MEDIAN-MODE
These measures, known from statistics, are defined analogously:
MODE = The value X=a of the highest probability
MEDIAN = The value X=m where the probability splits
in two equal parts (0.5-0.5)
Look at the examples below
X 10 20 30 X 10 20 30
P(X=x) 0.4 0.3 0.3 P(X=x) 0.2 0.3 0.5
MODE = 10 MODE = 30
MEDIAN = 20 MEDIAN = 25 (why?)
An equivalent definition is
Var(X) = E(X2)-µ2
where
E(X2) = x12 × p1 + x22 × p2 + x32 × p3 + …
EXAMPLE 4
Consider again the probability distribution
X 10 20 30
P(X=x) 0.2 0.3 0.5
51
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
n
p(x) = p x (1 − p) n- x x = 0,1,2,3, K, n
x
n = number of trials
p = probability of success
while
X is the random variable for the number of successes
EXAMPLE 1
We toss a die 5 times. Let the success be to obtain a six.
Then
1
n=5 and p=
6
The probability to obtain
5
1
always a six is P(X=5) =
6
5
5
no six at all is P(X=0) =
6
2 3
5 1 5
2 sixes (and thus 3 no-sixes) is P(X=2) =
2 6 6
In general, the probability to obtain x sixes (and (5-x) no-sixes) is
x 5- x
5 1 5
P(X=x) =
x 6 6
X 0 1 2 3 4 5
3125 3125 1250 250 25 1
P(X=x)
65 65 65 65 65 65
=0.4019 =0.4019 =0.1607 =0.0322 =0.0032 =0.0001
♦ GDC
Our GDC (Casio fx) gives the results for a Binomial distribution
For both
53
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Then for each value of x (or x and y), EXE gives the result
1
For our example above, where n=5 and p= , we can easily answer
6
the following questions:
54
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 2
A box contains 5 balls, 1 BLACK and 4 WHITE. We win if we select
a BLACK ball. We play this game 10 times.
Find
(a) The probability to win exactly 4 times
(b) The probability to win at most 4 times
(c) The probability to win at least once
(d) The expected number of winning games.
(e) The variance of the number of winning games.
Solution
The variable
X = number of winning games
1
follows a binomial distribution with n=10 and p= =0.2
5
[we may also write X∼B(10,0.2)]
(a) The probability to win exactly 4 times is
10
(0.2 ) (0.8) =0.088
4 6
P(X=4) =
4
or by GDC: Bpd(4) = 0.088
(b) The probability to win at most 4 times is
P(X≤4) = P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4)
I agree, this is too much! Use directly GDC.
It is Bcd(0-4) = 0.967
(c) The probability to win al least once is
P(X≥1) = 1-P(X=0) = 1-0.107 = 0.893
or by GDC: Bcd (1-10) = 0.893
(d) The expected number is E(X)=np=10×0.2 = 5
(e) The variance is Var(X) = np(1-p) = 5×0.8=4
55
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 3
Solution
10 1
p (1- p ) ⇔ 10p(1-p)9=0.268
9
P(X=1) =
1
y=10x(1-x)9-0.268
p = 0.038 or p = 0.2
EXAMPLE 4
Solution
n
P(X=1) = (0.2) 1 (0.8)
n-1
⇔ 0.2n(0.8)n-1 = 0.268
1
y=0.2x(0.8)x-1 - 0.268
n ≈ 1.5 and n ≈ 10
n
Notice:the symbol cannot be used in graph mode. So remember
r
n n n n(n - 1)
= 1 = n =
0 1 2 2
56
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
57
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
µ = mean
σ = standard deviation.
-∞ µ +∞
Roughly speaking, there is a highly likely mean value µ and all the
other values of X spread out symmetrically about the mean. As we
move away from the mean (either to the left or to the right of the
mean) the probability decreases dramatically!
• Weight of people
• Height of people
• Time spent in a super market
• Weight of a pack of coffee labeled 500 g.
58
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
It is estimated4 that
ranges between
Percentage of the population
in general for our problem
about 68% of the population µ-σ and µ+σ [65,85]
about 95% of the population µ-2σ and µ+2σ [55,95]
almost the whole population µ-3σ and µ+3σ [45,105]
68.3%
x
-∞ 65 75 85 +∞
NOTICE
• The whole area under the curve is 1 (i.e. 100%). The area before
the mean as well as the area after the mean is 0.5 (i.e. 50%)
• Theoretically, the distribution of X ranges between -∞ to +∞.
In practice, we may assume that almost the whole population
(in fact 99,7%) ranges between µ-3σ and µ+3σ.
• The standard deviation σ indicates the spread of the population.
For example, assume that
Greeks: µ=75 kg σ=10 kg
Italians: µ=75 kg σ=8 kg
This implies that both populations have the same mean but
Italians are closer to the mean than Greeks. In other words,
almost the whole population is between µ±3σ, namely
75±30 i.e. 45-105 kg for Greeks
75±24 i.e. 51-99 kg for Italians
59
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
-∞ 60 75 82 +∞
60
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
NOTICE
• For P(60≤X≤82) the GDC gives p=0.691 (question (a))
Below this result some extra information is given:
z:Low =-1.5 z:Up =0.7
This means that
the lower bound is 1.5 st. devs below µ: 75-1.5×10=60
the upper bound is 0.7 st. devs above µ: 75+0.7×10=82
We will refer to those values z later on; they are known as
standardized values.
• The probability that the weight is 1 standard deviation away
from the mean, that is between 65 and 85 kg is
P(65≤X≤85)≅0.683, (68.3%) as we said in the introduction.
Notice that z:Low =-1 z:Up =1
• The probabilities above refer to a selection of one person only.
For example,
P(a person is between 60 and 82 kg) = 0.691,
P(a person is not between 60 and 82 kg) =1-0.691= 0.309
If we select two people,
P(both between 60 and 82 kg) = (0.691)2
P(none between 60 and 82 kg) = (0.309)2
P(only one between 60 and 82 kg) = 2× (0.691)×(0.309)
• A question may combine the normal and binomial distributions:
Suppose that we select 10 people. What is the probability that
exactly three of them are between 60 and 82 kg?
The normal distribution gives p=0.691.
A binomial distribution (of a new variable Y) is defined with
n=10 and p=0.691.
Hence
10
P(Y=3) = (0.691 )3 (0.309)7 =0.0106
3
or directly by the GDC, P(Y=3) = 0.0106.
61
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
0.067
-∞ a +∞
62
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 1
The mass of packs of a certain type of coffee is normally distributed
with a mean of 500 g and standard deviation of 15 g.
(a) Find the probability that a pack weighs more than 520 g
(b) The lightest 4% of the packs weigh less than a, while the
heaviest 4% of the packs weigh more than b. Find a and b.
63
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
X- µ
Z=
σ
For our example:
X follows N(µ,σ2) with µ=75 and σ=10,
X - 75
Z= follows N(0,1).
10
We also say that,
60 - 75
the standardised value of x=60 is z 60 = = −1.5
10
82 - 75
the standardised value of x=82 is z82 = = 0.7
10
Check by your GDC that
P(60≤X≤82)=0.691 and P(-1.5≤ Z ≤0.7) = 0.691
In diagrams:
X
0.691
x
-∞ 60 75 82 +∞
standardisation
0.691
x
-∞ -1.5 0 0.7 +∞
64
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
65
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 2
For a random variable X we know that
35% is less than 60
25% is more than 90
That is
P(X≤60)=0.35, P(X≥90)=0.25.
Find µ and σ.
Solution
(The diagram is not necessary - sometimes it helps)
The standardized values Z60 and Z90 can be obtained by the GDC:
Tail: Left, Area: 0.35, σ=1, µ=0 gives Z60 = -0.385
Tail: Right, Area: 0.25, σ=1, µ=0 gives Z90 = 0.674
Then
60 - µ 90 - µ
- 0.385 = and 0.674 =
σ σ
We obtain the system
µ-0.385σ =60
µ+0.674σ =90.
The solution of the system is µ=70.9 and σ= 28.3
66
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
ONLY FOR
HL
67
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
68
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
x
1 2 3 4
(i) f(x)≥0
(ii) the area of the triangle under the graph is 1
69
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
P(X=a) = 0
it holds
(i) f(x) ≥ 0 , i.e. the function is non-negative
+∞
(ii) ∫ f(x)dx = 1 ,
−∞
i.e the total area under the curve is 1
Notice that
P(a≤X≤b) and P(a<X<b)
are exactly the same as the probability that X takes a particular
value, say P(X=a) is zero!
P(X=2) = 0, P(X=3.7) = 0
70
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
It is defined by
Var(X) = E(X-µ)2
that is
+∞
2
Var(X) = ∫ (x - µ)
−∞
f(x)dx
Var(X) = E(X2)-µ2
where
+∞
2
E(X2) = ∫x
−∞
f(x)dx
Notice:
The initial definition gives
+∞ 4
2 8 2x
Var(X) = ∫ (x - µ) f(x)dx = ∫ (x − ) dx
−∞ 0
3 8
71
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
X DISCRETE X CONTINUOUS
+∞
Ε(Χ2) = ∑x 2
i pi Ε(Χ2) = ∫x
2
f(x)dx
−∞
Var(X) = E(X2)-µ2
♦ MODE
It is the value of x where f(x) has its maximum.
For our example,
MODE = 4
♦ MEDIAN
It is the value of m where P(X≤m)=0.5
∫ f(x)dx = 0.5
−∞
♦ QUARTILES
The lower quartile Q1 and the upper quartile Q3 are defined by
∫ f(x)dx = 0.25
−∞
and ∫ f(x)dx = 0.75
−∞
72
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 1
Let X be a continuous random variable in [0,4] with pdf
y
0.5
x
, 0≤x≤2
f(x) = 4
x
1- , 2≤x≤4
x 4
1 2 3 4
14 2
Then Var(X) = − 22 =
3 3
2
x
• Since ∫ 4 dx = 0.5, the median is 2.
0
b
f (x), a ≤ x ≤ b
Notice. Let f(x) = 1 . We check ∫f 1 (x)dx =A
f2 (x), b ≤ x ≤ c a
median
If A > 0.5, the median is between a and b, we solve ∫f 1 (x)dx = 0.5
a
c
If A < 0.5, the median is between b and c, we solve ∫f 2 (x)dx = 0.5
median
73
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Suppose that
a
1
b a1 b1 c1 d1
2
c a2 b2 c2 d2
3
d a3 b3 c3 d3
A B
4 3
In general,
A B
m n
74
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 1
6 6
EXAMPLE 2
26 26
b) there are 26 choices for the first box. Once we complete the first
box there are 25 choices for the second box
26 25
75
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ WORDS OF r LETTERS
n n n … n
There are n choices for each box. Therefore, the total number of
words is
n ⋅ n ⋅ n L n = nr
r times
EXAMPLE 3
Solution
a) 264 = 456976
b) 54 = 625
c) 104 = 10000
d) 24 = 16
2 2 2 2
24 = 16
76
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 4
26 26 10 10 10
♦ FACTORIAL: n!
We define
n! = 1 ⋅ 2 ⋅ 3L n
For example
1!= 1
2!= 1 ⋅ 2 = 2
3! = 1 ⋅ 2 ⋅ 3 = 6
4! = 1 ⋅ 2 ⋅ 3 ⋅ 4 = 24
5! = 1 ⋅ 2 ⋅ 3 ⋅ 4 ⋅ 5 = 120
77
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
♦ REARRANGEMENTS OF n OBJECTS
Therefore,
5 4 3 2 1
In general
EXAMPLE 5
Three people, Alex, Bill, Chris must sit in three chairs in a row!
There are 3!=6 ways to be arranged. Indeed,
COMBINATIONS PERMUTATIONS
We do not mind about the order We mind about the order
(The r objects are seen as a group) AB is different than BA
nCr nPr
78
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 6
AB
AC BC
AD BD CD
AE BE CE DE
AB BA CA DA EA
AC BC CB DB EB
AD BD CD DC EC
AE BE CE DE ED
n n! n!
nCr = = nPr =
r r! (n − r)! (n − r)!
5! 5!
For example 5C2= =10 while 5P2= =20
2!3! 3!
79
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
NOTICE
n
=n Indeed, there are n ways to choose 1 out of n
1
[the 1st,or the 2nd, or the 3rd, or …, or the nth]
n
=1 Indeed, there is only 1 way to choose n out of n
n
[that is to choose all the n objects]
n
=1 Indeed, there is only 1 way to choose 0 out of n
0
[that is to choose nothing!]
Notice also that
n n(n - 1)
=
2 2
Indeed,
n n! 1 ⋅ 2 ⋅ 3L n (n - 1)n
= = =
2 2! (n - 2)! (1 ⋅ 2) ⋅ (1 ⋅ 2 ⋅ 3L (n - 2)) 2
For example,
5 5 ⋅ 4 10 10 ⋅ 9
= =10 = =45
2 2 2 2
EXAMPLE 7
10 9 8
= 720 possible ways
80
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 8 (LOTTO)
49
49C6 or
6
That is
49! 44 ⋅ 45 ⋅ 46 ⋅ 47 ⋅ 48 ⋅ 49
= = 13983816
6!43! 1 ⋅ 2 ⋅3⋅ 4 ⋅5 ⋅6
This implies that if we choose 6 numbers the probability to win is 1
out of 13983816, that is almost 1 out of 14million!
EXAMPLE 9
There are 26 Latin letters (A, B, C, … , Z). When we construct
words of 3 letters, the order of letters plays a role, thus we have
permutations. However, it will be better to use the Box-technique
instead of permutation formula nPr. The total number of these
words is
26 26 26
Find
a) how many of them consist of different letters.
b) how many of them begin with A.
c) how many of them do not begin with A.
d) how many of them do not contain the letter F.
e) how many of them contain the letter F (at least once).
Solution
a)
26 25 24
81
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
b)
1 26 26
reserved
c)
25 26 26
25 25 25
Fxx 1 25 25 252=625
xFx 25 1 25 252=625
xxF 25 25 1 252=625
FFx 1 1 25 25
FxF 1 25 1 25
xFF 25 1 1 25
FFF 1 1 1 1
82
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
EXAMPLE 10
A school class consists of 30 students, 10 boys and 20 girls. We
select a committee of 5 students. Clearly, the total number of all
possible committees is
30
5
Solution
10 20
a) ways
2 3
10 20 10
b) = ways
5 0 5
c) There are two cases. The committee contains only boys or only
10 20
girls. Hence, the number of possible ways is +
5 5
83
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
84
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Solution
We have seen that the TOTAL number of possible 5-member
30
committees is
5
Hence, the answers are
10 20
2 3
a)
30
5
10 20 10
5 0 5
b) =
30 30
5 5
10 20
+
5 5
c)
30
5
10 20 10 20 10 20
+ +
0 5 1 4 2 3
d)
30
5
85