
EdExcel: Statistics 2

Yan Jiaqi

May 3, 2024

Yan Jiaqi EdExcel: Statistics 2 May 3, 2024 1 / 33
Chapter 3. Approximation


Original        Approximation   Parameter             Condition                  Rule
Distribution    Distribution
B(n, p)         Po(λ)           λ = np                n large, p small           np ≤ 10
B(n, p)         N(µ, σ²)        µ = np, σ² = npq      n large, p close to 0.5
Po(λ)           N(µ, σ²)        µ = σ² = λ            λ is large
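
As a rough numerical check of the first row of the table, the sketch below compares an exact binomial probability with its Poisson approximation. The values n = 100 and p = 0.03 are illustrative assumptions, chosen only so that n is large, p is small and np ≤ 10; they do not come from the slides.

```python
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# Illustrative values only (assumed): n large, p small, np = 3 <= 10.
n, p = 100, 0.03
exact = binom_cdf(2, n, p)        # exact binomial probability
approx = poisson_cdf(2, n * p)    # Poisson approximation with lambda = np

print(f"B({n}, {p}):  P(X <= 2) = {exact:.4f}")
print(f"Po({n * p}): P(X <= 2) = {approx:.4f}")
```

The two probabilities should agree to about two decimal places here; the agreement typically worsens as p grows for fixed n.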

Continuous Random Variables

Sampling Distribution


Knowing the Population

A population is the whole set of items of interest.


Census and Sampling

There are two ways to gather information about a population.


A census observes every member of a population.
Sampling observes a part (the sample) of a population to represent the population.


Census
Advantages
• It should give a completely accurate result
Disadvantages
• Time-consuming and expensive
• Cannot be used when the testing process destroys the item
• Hard to process a large quantity of data

Sampling
Advantages
• Less time-consuming and expensive than a census
• Fewer people needed to respond
• Less data to process than in a census
Disadvantages
• The data may not be as accurate
• The sample may not be large enough to give information about small subgroups of the population


Sampling

Identifying the population, sampling frame, sampling units, sample size, etc.

[Figure: diagram labelled Sample, Sampling Unit, Sampling Frame and Population]

Population: The entire group of individuals or objects that we wish to know something about.
Sampling Unit: The individual person, animal, or object (NOT the data) that has the measurement (observation) taken on them/it.
Sampling Frame: A list of the sampling units from which a sample may be taken.
Sample: The individuals or objects who provide the data to be collected.

Statistic and Parameter

              Observation method    Numerical Characteristics    Symbols
Population    Census                Parameter                    µ, σ, p, ρ, …
Sample        Sampling              Statistic                    x̄, s, p̂, r, …


Sampling Distribution

We use X_i to denote an observation from the distribution X.

A statistic is a quantity calculated only from the observations in a sample.
The distribution of a statistic is called the Sampling Distribution.


Finding Sampling Distribution

1 Identify the underlying population distribution;
2 List all possible combinations of samples with the given size;
3 Calculate the corresponding probabilities.
Some relevant statistics: max X_i, min X_i, m(X_i), X̄, ….


Example
A manufacturer of light bulbs sells 60 watt and 100 watt bulbs in the ratio of 3 : 1.
1 Find the mean and variance of the wattage of the light bulbs in this population.

A random sample of 3 light bulbs is taken from a store containing bulbs in this ratio.
2 List all the possible samples.
3 Find the sampling distribution of the mean X̄.
4 Find the sampling distribution of the mode M.
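
The enumeration asked for in this example can be sketched in Python as below. This is only an illustration, not the slides' own solution: it assumes the store is large enough that the three bulbs are drawn independently, each being 60 W with probability 3/4 and 100 W with probability 1/4. The names `population`, `mean_dist` and `mode_dist` are made up for the sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Population: 60 W and 100 W bulbs in the ratio 3 : 1.
# Assumption: draws are independent (the store is effectively infinite).
population = {60: Fraction(3, 4), 100: Fraction(1, 4)}

# Part 1: population mean and variance, Var(X) = E(X^2) - E(X)^2.
mu = sum(x * p for x, p in population.items())
var = sum(x**2 * p for x, p in population.items()) - mu**2

# Parts 2-4: enumerate every ordered sample of size 3 and accumulate
# the probability of each value of the sample mean and the sample mode.
mean_dist = Counter()
mode_dist = Counter()
for sample in product(population, repeat=3):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]
    mean_dist[Fraction(sum(sample), 3)] += prob
    # With three bulbs and two wattages, the mode is always unique.
    mode = Counter(sample).most_common(1)[0][0]
    mode_dist[mode] += prob

print("mean =", mu, " variance =", var)
for value, prob in sorted(mean_dist.items()):
    print("P(sample mean =", value, ") =", prob)
for value, prob in sorted(mode_dist.items()):
    print("P(mode =", value, ") =", prob)
```

Using `Fraction` keeps every probability exact, so the distributions can be compared directly with a hand calculation.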

Chapter 7. Hypothesis Testing (without CLT)


Small probability = Impossible

In this chapter, a general procedure for hypothesis testing is established.

Example
Suppose you have a set of six-faced dice, some fair and some biased so that the number 6 appears more often than the others.
If you pick a die and roll it 16 times, how many times would you need to roll a 6 to convince yourself that this die is biased?

Would you believe the die is fair if you rolled 16 sixes in 16 rolls?
Let X be the number of sixes in 16 rolls of a fair die. Then X ∼ B(16, 1/6) and
P(X = 16) = (1/6)¹⁶ = 3.54 × 10⁻¹³.

x           0      ···  5      6      7      8            ···  15            16
P(X = x)    0.054  ···  0.076  0.028  0.008  1.78 × 10⁻³  ···  2.84 × 10⁻¹¹  3.54 × 10⁻¹³
F(x)        0.054  ···  0.962  0.990  0.998  0.9996       ···  1.000         1.000
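
The table entries can be recomputed with a few lines of Python. The sketch below also searches for the fewest sixes k for which P(X ≥ k) falls below a chosen cut-off; the cut-off 0.05 is an assumption made for illustration, and the helper names `pmf` and `cdf` are not from the slides.

```python
from math import comb

n, p = 16, 1 / 6  # 16 rolls of a fair die

def pmf(x):
    """P(X = x) for X ~ B(16, 1/6)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(x):
    """F(x) = P(X <= x); an empty sum gives 0 when x < 0."""
    return sum(pmf(i) for i in range(x + 1))

for x in (0, 5, 6, 7, 8, 15, 16):
    print(f"x = {x:2d}  P(X = x) = {pmf(x):.3g}  F(x) = {cdf(x):.4f}")

# Assumed cut-off (illustrative): treat a probability below 0.05 as "small"
# and find the fewest sixes k such that P(X >= k) = 1 - F(k - 1) < 0.05.
alpha = 0.05
k = next(k for k in range(n + 1) if 1 - cdf(k - 1) < alpha)
print("fewest sixes with P(X >= k) < 0.05:", k)
```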



[Figure: plot of Prob against x for x = 0, 1, …, 16]
If you believe a probability of 0.05 is small enough, then


Vocabulary of Hypothesis Testing

Term (Symbol): Explanation
Null Hypothesis (H0): an equation about parameters, assumed to be true.
Alternative Hypothesis (H1): an inequality about parameters, set against H0.
Test Statistic (T): a random variable calculated from the sample.
Observed Value (t_obs): the value of T calculated from the sample values.
Significance Level (α): a small probability you choose, such that an event with probability < α is regarded as unlikely to happen.


Vocabulary of Hypothesis Testing (Cont’)

Term (Symbol): Explanation
Critical Region (Rejection Region): P(T ≥ t) < α, P(T ≤ t) < α or P(|T| ≥ t) < α. The set of values of T found from these probability inequalities: the most extreme values of T, with total probability less than α.
Critical Value: the first value in the critical region.
Acceptance Region (Confidence Interval): P(T < t) ≥ 1 − α, P(T > t) ≥ 1 − α or P(|T| < t) ≥ 1 − α. The set of central values of T, with total probability at least 1 − α.
Actual Significance Level: P(T ∈ Critical Region). The probability that T falls in the critical region, which is meaningful when T is discrete.
p-Value: p = P(T ≥ t_obs). The probability of obtaining a result equal to or more extreme than what was actually observed.


Framework for Hypothesis Testing


1 State the null hypothesis H0 and the alternative hypothesis H1.
2 Define a test statistic T, find the observed value t_obs.
3 Deduce the distribution of T given that H0 is true.
4 Test for the Significance Level α.

                H1        Critical Region                         p-value
Left-tailed     p < p0    P(T ≤ t) < α                            P(T ≤ t_obs)
Right-tailed    p > p0    P(T ≥ t) < α                            P(T ≥ t_obs)
Two-tailed      p ≠ p0    P(|T| ≥ t) < α                          P(|T| ≥ |t_obs|)
                          P(T ≤ t) < α/2 and P(T ≥ t) < α/2       2 · P(T ≤ t_obs) or 2 · P(T ≥ t_obs)

Critical region: Reject H0 if t_obs is in the critical region.
p-Value: Reject H0 if p < α.
5 Draw a conclusion.
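
The p-value column of the table in step 4 could be coded along the following lines. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names are invented, the two-tailed case doubles the smaller one-tailed probability (one common reading of the "2 · P(…) or 2 · P(…)" entry), and the data at the end (7 sixes in 16 rolls) are hypothetical.

```python
from math import comb

def binomial_pmf(n, p):
    """Return {x: P(T = x)} for T ~ B(n, p)."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

def p_value(pmf, t_obs, tail):
    """p-value of t_obs under a discrete null distribution.

    tail is "left", "right" or "two". The two-tailed value doubles the
    smaller one-tailed probability (an assumed convention), capped at 1.
    """
    left = sum(q for t, q in pmf.items() if t <= t_obs)
    right = sum(q for t, q in pmf.items() if t >= t_obs)
    if tail == "left":
        return left
    if tail == "right":
        return right
    return min(1.0, 2 * min(left, right))

def reject_h0(pmf, t_obs, tail, alpha):
    """Step 4, p-value route: reject H0 when the p-value is below alpha."""
    return p_value(pmf, t_obs, tail) < alpha

# Hypothetical data: 7 sixes in 16 rolls, H0: p = 1/6 against H1: p > 1/6.
null = binomial_pmf(16, 1 / 6)
print(p_value(null, 7, "right"), reject_h0(null, 7, "right", 0.05))
```

Passing the null distribution as a dictionary keeps the helper usable for any discrete test statistic, not only a binomial one.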

Example
A single observation x is taken from a binomial distribution B(10, p) and a value of 5 is obtained. Use
this observation to test H0 : p = 0.25 against H1 : p > 0.25 using a 5% significance level.

Test statistic X ∼ B(10, 0.25) under H0, observed value x_obs = 5, significance level α = 0.05, right-tailed.

Method 1 (Critical Region)
Critical region: P(X ≥ x) < 0.05.
P(X ≤ 4) = 0.9219 and P(X ≤ 5) = 0.9803, so P(X ≥ 5) = 0.0781 > 0.05 and P(X ≥ 6) = 0.0197 < 0.05.
The critical region is 6 to 10.
The observed value 5 is NOT in the critical region.

Method 2 (p-value)
p-value = P(X ≥ x_obs) = P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.9219 = 0.0781 > 0.05.

Both methods give the same decision: do NOT reject the null hypothesis H0.
There is insufficient evidence to doubt p = 0.25.
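
Both methods for this example can be checked numerically. The sketch below uses only the standard library; the helper name `upper_tail` is an assumption made for the illustration.

```python
from math import comb

n, p0, alpha, x_obs = 10, 0.25, 0.05, 5

def upper_tail(x):
    """P(X >= x) for X ~ B(10, 0.25), the distribution under H0."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(x, n + 1))

# Method 1: the critical region is every x with P(X >= x) < alpha.
critical_region = [x for x in range(n + 1) if upper_tail(x) < alpha]

# Method 2: the p-value is the upper-tail probability at the observed value.
p_val = upper_tail(x_obs)

print("critical region:", critical_region)
print("p-value:", round(p_val, 4))
print("reject H0:", x_obs in critical_region, p_val < alpha)
```

The last line prints the decision from each method side by side, which makes it easy to confirm that they agree.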

Example
Over a period of time, Agnetha has discovered that the carrots she grows have a 25% chance of being
longer than 7 cm. She tries a new type of fertiliser to help them grow. In a random sample of 30
carrots, 13 are longer than 7 cm. Agnetha claims that the new fertiliser has changed the probability of a
carrot being longer than 7 cm. Test Agnetha’s claim at the 5% significance level. State your hypotheses
clearly.

H0: p = 0.25
H1: p ≠ 0.25
Test statistic X ∼ B(30, 0.25) under H0, observed value x_obs = 13, significance level 0.05, two-tailed.

Method 1 (Critical Region)

Critical region: P(X ≤ x) < 0.025 or P(X ≥ x) < 0.025.
P(X ≤ 2) = 0.0106 and P(X ≤ 3) = 0.0374, so the lower tail is 0 to 2.
P(X ≤ 11) = 0.9493 and P(X ≤ 12) = 0.9784, so P(X ≥ 12) = 0.0507 > 0.025 and P(X ≥ 13) = 0.0216 < 0.025; the upper tail is 13 to 30.
The critical region is 0 to 2 and 13 to 30.
The observed value 13 is in the critical region.
We reject H0 at the 5% level of significance.
There is sufficient evidence to support the claim that the new fertiliser has changed the probability.

Method 2 (p-value)

p-value = 2 · P(X ≥ x_obs) = 2 · P(X ≥ 13) = 2 · (1 − P(X ≤ 12)) = 2 · (1 − 0.9784) = 2 · 0.0216 = 0.0432

The p-value is 0.0432 < 0.05.
We reject H0 at the 5% level of significance.
There is sufficient evidence to support the claim that the new fertiliser has changed the probability.
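
The two-tailed test for the carrot data can be recomputed as a check. In this sketch each tail is given α/2 = 0.025, and the p-value doubles the upper-tail probability because the observed value lies above the mean np = 7.5; the helper names are assumptions made for the illustration.

```python
from math import comb

n, p0, alpha, x_obs = 30, 0.25, 0.05, 13

def pmf(x):
    """P(X = x) for X ~ B(30, 0.25), the distribution under H0."""
    return comb(n, x) * p0**x * (1 - p0)**(n - x)

def lower_tail(x):
    """P(X <= x)."""
    return sum(pmf(i) for i in range(x + 1))

def upper_tail(x):
    """P(X >= x)."""
    return sum(pmf(i) for i in range(x, n + 1))

# Method 1: each tail of the critical region has probability below alpha / 2.
lower = [x for x in range(n + 1) if lower_tail(x) < alpha / 2]
upper = [x for x in range(n + 1) if upper_tail(x) < alpha / 2]

# Method 2: x_obs is above the mean, so double the upper-tail probability.
p_val = 2 * upper_tail(x_obs)

print("lower tail:", lower[0], "to", lower[-1])
print("upper tail:", upper[0], "to", upper[-1])
print("p-value:", round(p_val, 4), " reject H0:", p_val < alpha)
```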

EdExcel S3

Chapter 1. Sampling


1. Simple Random Sampling.


Sampling Methods

Advantages Disadvantages
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Quota Sampling


Using Random Number Table

Chapter 2. Combination of Random Variables


Linearity of Expectation and Variance

Let X, Y, Z, … be random variables. Then

E(aX + bY + cZ + ···) = a E(X) + b E(Y) + c E(Z) + ···

Moreover, if they are independent,

Var(aX + bY + cZ + ···) = a² Var(X) + b² Var(Y) + c² Var(Z) + ···
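
Both identities can be verified exactly on a small example. The sketch below is illustrative only: X (a fair die), Y (a fair coin scored 0 or 1) and the constants a = 2, b = −3 are hypothetical choices, and the two variables are assumed independent so that joint probabilities are products of the marginals.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * q for x, q in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - E(X)^2."""
    return sum(x**2 * q for x, q in dist.items()) - mean(dist)**2

# Hypothetical independent variables: X is a fair die, Y a fair coin (0 or 1).
X = {x: Fraction(1, 6) for x in range(1, 7)}
Y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
a, b = 2, -3

# Distribution of aX + bY built from the joint distribution; independence
# is what allows the marginal probabilities to be multiplied.
combo = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    value = a * x + b * y
    combo[value] = combo.get(value, 0) + px * py

# Compare each side of the two identities (exact, thanks to Fraction).
print(mean(combo), a * mean(X) + b * mean(Y))
print(variance(combo), a**2 * variance(X) + b**2 * variance(Y))
```

Note that the variance of aX + bY adds b² Var(Y) even when b is negative, which is why the sign of b does not appear on the right-hand side.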


If
