
Permutations and Combinations

Definition:
- A permutation is an ordered arrangement of objects.
- A combination is a selection of objects in which the order of selection does not matter.

Permutations:
Given / Objects taken / No. of permutations:
- n distinct objects, all n taken: n!
- n distinct objects, r taken, no repetitions: nPr = nCr × r!
- n distinct objects, r taken, with repetitions: n^r
- n objects, not all distinct (n1 of type 1, n2 of type 2, ..., nk of type k, where n = n1 + n2 + ... + nk), all n taken: n!/(n1! n2! ... nk!)
- n objects, not all distinct, r taken: involves both combination and permutation (consider cases)

Combinations:
Given / No. of ways to select r objects:
- n distinct objects: nCr
- n objects, not all distinct: no direct way to calculate; need to consider different cases.
Example 1: Given a, a, a, b, b, c. To choose 3 letters (arrangement not
required):
Case 1: All distinct (abc) - 1 way
Case 2: Contains an identical pair (aab, aac, bba, bbc) - 4 ways
Case 3: All identical (aaa) - 1 way
Total no. of ways = 1 + 4 + 1 = 6

Example 2: Given a, a, a, b, b, c. To find the no. of 3-letter codes (arrangement required):
Case 1: All distinct (abc) - 1 × 3! = 6 ways
Case 2: Contains an identical pair (aab, aac, bba, bbc) - 4 × 3!/2! = 12 ways
Case 3: All identical (aaa) - 1 way
Total no. of ways = 6 + 12 + 1 = 19
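The two examples above can be verified by brute force. A minimal sketch (assuming Python and its standard itertools module) that counts the distinct codes and the distinct selections:

    from itertools import permutations

    letters = ['a', 'a', 'a', 'b', 'b', 'c']

    # Example 2: distinct 3-letter codes (arrangement required)
    codes = set(permutations(letters, 3))
    print(len(codes))                                  # 19

    # Example 1: distinct selections of 3 letters (arrangement not required)
    selections = {tuple(sorted(code)) for code in codes}
    print(len(selections))                             # 6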
Circular Permutations:
- Arranging n individuals at a round table with n identical seats (not numbered): (n − 1)! ways.
- If the seats of the round table are numbered, perform the calculation as if the seats were not numbered, then multiply by the number of seats.
- If the number of seats is more than the number of individuals to be arranged, treat the empty seats as "ghosts" (identical individuals) and calculate accordingly.
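A small sketch (hypothetical, in Python) confirming the (n − 1)! count by treating seatings that are rotations of one another as identical:

    from itertools import permutations
    from math import factorial

    def circular_arrangements(n):
        seen = set()
        for p in permutations(range(n)):
            i = p.index(0)
            seen.add(p[i:] + p[:i])   # rotate so person 0 comes first; rotations collapse together
        return len(seen)

    print(circular_arrangements(5), factorial(4))   # 24 24, i.e. (5 - 1)!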

Techniques when dealing with restrictions:


1) Grouping ("must be together")
2) Slotting ("cannot be together", "must be separated")
3) Taking the complement ("two items cannot be together", "at least 1")
Note: When there are two groups of people, each with a "cannot be together" restriction within the group (e.g. the sisters cannot be together and the brothers must be separated), the slotting method does not work and the complement method must be used.
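A brute-force illustration (hypothetical Python sketch) of the grouping and complement counts for arranging 5 people in a row, where persons A and B either must be together or must be separated:

    from itertools import permutations

    people = ['A', 'B', 'C', 'D', 'E']
    together = sum(abs(p.index('A') - p.index('B')) == 1 for p in permutations(people))
    print(together)              # 48 = 2! x 4!   (grouping: treat A and B as one unit)
    print(120 - together)        # 72 = 5! - 48   (complement: "A and B not together")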
Probability
Basics:
For equally likely outcomes from a finite sample space S, the probability of an event A, written P(A), is defined as

P(A) = (Number of ways event A can occur) / (Total number of possible outcomes) = n(A)/n(S)

Useful results:
1. 0 ≤ P(A) ≤ 1; answers should always be between 0 and 1 inclusive AND exact where possible.
2. ∑ P(Ai) = 1, where the sum is over all possible outcomes Ai, i = 1, 2, ..., n.
3. P(A') = 1 − P(A)
4. P(A∪B) = P(A) + P(B) − P(A∩B) [Note: this is always true; more results/relationships can be derived using a Venn diagram]
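A quick numerical illustration of result 4 (a hypothetical Python sketch): roll a fair die once, let A = "even number" and B = "number greater than 3":

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}
    A = {2, 4, 6}
    B = {4, 5, 6}
    P = lambda E: Fraction(len(E), len(S))    # equally likely outcomes: P(E) = n(E)/n(S)

    print(P(A | B))                           # 2/3
    print(P(A) + P(B) - P(A & B))             # 2/3, as P(A∪B) = P(A) + P(B) − P(A∩B)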

Conditional Probability:
If A and B are two events and 𝑃(𝐵) ≠ 0, then the probability that event A occurs, given that event B has
already occurred, is written as 𝑃(𝐴|𝐵) and is calculated using the formula:
P(A|B) = P(A∩B) / P(B)

Note:
1. P(B|A) = P(B∩A) / P(A), where P(A) ≠ 0.
2. P(A∩B) = P(A|B)P(B) = P(B|A)P(A).
3. P(A'|B) = 1 − P(A|B).
4. In general, P(A|B) ≠ P(B|A).

Special Events:
Mutually exclusive events: A and B are said to be mutually exclusive events if they cannot occur at the
same time, i.e. 𝐴∩𝐵 = ∅ or 𝑃(𝐴∩𝐵) = 0.
Thus we have 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵).

Independent events: A and B are said to be independent events if the occurrence of event B does not affect the probability of occurrence of event A, and vice versa. [Note: in a Venn diagram, A and B intersect]
To prove independence of events A and B, we can show either one of the following:
1. 𝑃(𝐴|𝐵)=𝑃(𝐴)
2. 𝑃(𝐵|𝐴)=𝑃(𝐵)
3. 𝑃(𝐴∩𝐵)= 𝑃(𝐴)𝑃(𝐵)
It can be easily shown that if A and B are independent, then the following pairs are also independent:
A and B’, A’ and B’, A’ and B.

Note: We DO NOT assume independence unless we have proven it or condition is given in the question.
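A short check (hypothetical Python sketch) of the conditional probability formula and the independence criterion, using two tosses of a fair coin with A = "first toss is a head" and B = "second toss is a head":

    from fractions import Fraction
    from itertools import product

    S = set(product('HT', repeat=2))          # sample space of two tosses
    A = {s for s in S if s[0] == 'H'}         # first toss is a head
    B = {s for s in S if s[1] == 'H'}         # second toss is a head
    P = lambda E: Fraction(len(E), len(S))

    print(P(A & B) / P(B))                    # P(A|B) = 1/2, equal to P(A)
    print(P(A & B) == P(A) * P(B))            # True, so A and B are independent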

Representing using Tree Diagrams:

A special example involving use of Geometric Series formula:


A, B and C, in that order, throw a tetrahedral die. The first one to throw a 4 wins; the game continues indefinitely until someone wins. Find the probability that A wins.
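A sketch of the solution: A wins on their 1st, 4th, 7th, ... throw, so P(A wins) = (1/4)[1 + (3/4)³ + (3/4)⁶ + ...] = (1/4) / (1 − (3/4)³) = 16/37, summing a geometric series with first term 1/4 and common ratio (3/4)³. A quick check in Python (hypothetical sketch):

    from fractions import Fraction

    p = Fraction(1, 4)          # probability of throwing a 4 on any one throw
    q = 1 - p
    print(p / (1 - q**3))       # 16/37, the sum to infinity of the geometric series p * (q^3)^k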
Discrete Random Variable
Definition:
(1) A discrete random variable is one which can only take a finite or countably infinite number of values.

(2) Probability Distribution Function:


Let 𝑋 be a discrete random variable taking values 𝑥1 , 𝑥2 , ... , 𝑥𝑛 . Then, the probability
distribution function (pdf) of 𝑋 is the function 𝑓 that maps each value 𝑥𝑘 to the probability that
𝑋 = 𝑥𝑘 , i.e., 𝑓(𝑥) = 𝑃(𝑋 = 𝑥) for 𝑥 = 𝑥1 , 𝑥2 , ... , 𝑥𝑛 .

(3) Cumulative Distribution Function:


If X is a discrete random variable with probability distribution function P(X = x) for x = x1, x2, ..., then the cumulative distribution function (cdf) of X is given by P(X ≤ x) = ∑ P(X = r), where the sum is over all r ≤ x.

Probability Distribution:
A table or formula giving the values of 𝑃(𝑋 = 𝑥) for every 𝑥 in sample space is called the probability
distribution of 𝑋. For the experiment of tossing a fair coin 3 times where 𝑋 is the number of heads
obtained, the probability distribution of 𝑋 is as follows:
x           0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8

The probability distribution of X satisfies the following:


1. 0 ≤ 𝑃(𝑋 = 𝑥) ≤ 1 for all 𝑥 in 𝑆.

2. ∑ P(X = x) = 1, where the summation is over all values of x in S.

Expectation of Discrete Random Variable:


The expectation (or mean, or expected value) of a discrete random variable X taking values in a set S is given by E(X) = ∑ x P(X = x), where the summation is over all x in S.

Independent Discrete Random Variables:


Let X and Y be two discrete random variables taking on possible values x1, x2, ... and y1, y2, ... respectively. The random variables X and Y are said to be independent if, for all i and j,
P(X = xi and Y = yj) = P(X = xi) P(Y = yj).
Functions of a Discrete Random Variable:

The expectation of g(X), where g is a function of X, is given by E(g(X)) = ∑ g(x) P(X = x), where the summation is over all x in S. In particular, E(X²) = ∑ x² P(X = x).

Variance and Standard Deviation of a Discrete Random Variable:


The variance of a discrete random variable X is given by Var(X) = E[(X − µ)²] = E(X²) − µ², where µ = E(X).
The standard deviation of X, denoted by σ, is defined as σ = √Var(X).
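A small sketch (hypothetical Python) computing E(X), E(X²), Var(X) and σ for the three-coin-toss distribution tabulated above:

    from fractions import Fraction
    from math import sqrt

    dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

    mean = sum(x * p for x, p in dist.items())       # E(X) = 3/2
    e_x2 = sum(x**2 * p for x, p in dist.items())    # E(X²) = 3
    var = e_x2 - mean**2                             # Var(X) = E(X²) − µ² = 3/4
    print(mean, var, sqrt(var))                      # 3/2 3/4 0.866...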

Properties of Expectation and Variance:


(Note that these properties hold for both discrete and continuous random variables.)
Let X and Y be random variables and a and b be constants. We have

Expectation:
(1) E(a) = a
(2) E(aX) = aE(X)
(3) E(aX ± b) = aE(X) ± b
(4) E(aX ± bY) = aE(X) ± bE(Y)

Variance:
(1) Var(a) = 0
(2) Var(aX) = a² Var(X)
(3) Var(aX ± b) = a² Var(X)
(4) If X and Y are independent random variables, then Var(aX ± bY) = a² Var(X) + b² Var(Y)
Binomial Distribution
Definition:
A binomial random variable is a special discrete random variable (A binomial distribution is a special
case of a discrete distribution).

Conditions for an experiment to follow binomial distribution:


1. It consists of 𝑛 independent trials.
2. The outcome of each trial is either a success or a failure.
3. The probability of success for each trial, denoted by 𝑝, remains constant.

Note that the above must be stated in context of a given question.

For example, a biased coin has probability 0.6 of obtaining a head in any toss. Find the probability of getting 6 heads if the coin is tossed 8 times. The conditions in context are as follows:
1. The coin is tossed independently 8 times.
2. The outcome of each toss is either a head (success) or tail (failure).
3. The probability of obtaining a head (success) remains constant at 0.6.

Binomial Distribution:
If a discrete random variable 𝑋 follows a binomial distribution, we write 𝑋~𝐵(𝑛, 𝑝), where the
parameters of the distribution 𝑛 and 𝑝 refer to the number of trials and the probability of success for each
trial respectively.
The probability distribution of X is given by P(X = x) = nCx · p^x · (1 − p)^(n−x), for x = 0, 1, 2, ..., n (in MF26).

The mean and variance of 𝑋 are given by


(1) 𝐸(𝑋) = 𝑛𝑝 (in MF26)
(2) 𝑉𝑎𝑟(𝑋) = 𝑛𝑝(1 − 𝑝) (in MF26)
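For the coin example above, X ~ B(8, 0.6). A hypothetical sketch in Python, assuming scipy is available:

    from scipy.stats import binom

    n, p = 8, 0.6
    X = binom(n, p)

    print(X.pmf(6))                   # P(X = 6) ≈ 0.209
    print(X.cdf(6))                   # P(X ≤ 6) ≈ 0.894
    print(n * p, n * p * (1 - p))     # E(X) = 4.8, Var(X) = 1.92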
Normal Distribution
Definition and Basics:
Continuous random variables are quantities that take uncountably many possible values. They best describe data such as height, mass, time, distance, etc.

A normal random variable is a special continuous random variable. For continuous random variables, probability is calculated as an area under the probability density function.

Note that 𝑃(𝑋 < 𝑥) ≠ 𝑃(𝑋 ≤ 𝑥) for discrete random variables BUT 𝑃(𝑋 < 𝑥) = 𝑃(𝑋 ≤ 𝑥) for
continuous random variables.

Normal Distribution:
A normal random variable X with mean µ and variance σ² has the following probability density function f(x):

f(x) = [1/(σ√(2π))] e^(−(x − µ)²/(2σ²)), for −∞ < x < ∞

We say that X follows a normal distribution. We write X ~ N(µ, σ²), where E(X) = µ and Var(X) = σ².

Properties of a Normal Curve:


Let X ~ N(µ, σ²).
(1) The curve is symmetrical about the line x = µ.
(2) The mean, median and mode are all equal to µ.
(3) The curve approaches the x-axis as x → ±∞.
(4) Areas under the graph give probabilities, i.e. P(a < X < b) = ∫ from a to b of f(x) dx, where y = f(x) is the probability density function of the normal curve. Hence P(a < X < b) is given by the area under the graph from x = a to x = b.
(5) The total area under the curve is 1.
(6) P(µ − σ < X < µ + σ) ≈ 0.68
    P(µ − 2σ < X < µ + 2σ) ≈ 0.95
    P(µ − 3σ < X < µ + 3σ) ≈ 0.997
    i.e. approximately 68%, 95% and 99.7% of the values drawn from a normal distribution lie within 1, 2 and 3 standard deviations of the mean respectively.
Standard Normal Distribution:
Let X ~ N(µ, σ²). The random variable Z, the standard normal variable, is defined by Z = (X − µ)/σ.
The standard normal distribution is Z ~ N(0, 1).
The process of converting X ~ N(µ, σ²) into Z ~ N(0, 1) is known as standardization.

P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = P(Z ≤ (x − µ)/σ)

Standardization is usually applied when there are unknown parameter(s), so that we cannot use the GC commands normalcdf or invNorm directly.
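A hypothetical sketch (Python, assuming scipy is available, with made-up parameters X ~ N(50, 5²)) mirroring normalcdf, invNorm and the standardization identity:

    from scipy.stats import norm

    mu, sigma = 50, 5
    X = norm(mu, sigma)                     # scipy takes the standard deviation, not the variance

    print(X.cdf(60))                        # P(X ≤ 60), like normalcdf
    print(norm.cdf((60 - mu) / sigma))      # same value via standardization: P(Z ≤ 2)
    print(X.ppf(0.95))                      # like invNorm: the x with P(X ≤ x) = 0.95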

Using the Properties of Expectation and Variance of Random Variables, we have the following results for
independent Normal Random Variables

Let X ~ N(µ1, σ1²) and Y ~ N(µ2, σ2²) be independent random variables and a and b be constants. We have
(1) X ± Y ~ N(µ1 ± µ2, σ1² + σ2²)
(2) aX ± b ~ N(aµ1 ± b, a²σ1²)
(3) aX ± bY ~ N(aµ1 ± bµ2, a²σ1² + b²σ2²)
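For instance (a hypothetical sketch with assumed parameters), if X ~ N(50, 4) and Y ~ N(60, 9) are independent, then X + Y ~ N(110, 13):

    from math import sqrt
    from scipy.stats import norm

    S = norm(110, sqrt(13))                 # X + Y ~ N(110, 4 + 9)
    print(S.cdf(115) - S.cdf(105))          # P(105 < X + Y < 115) ≈ 0.83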
Sampling
Definition:
A population is the entire collection of data (persons or items) that we want to study, e.g. apples produced
by a farm.

A sample is random if every element in the population has an equal chance of being selected, and the
selection of an element is independent of another, e.g. ‘Every biscuit bar has an equal chance of being
selected, and the selection of one biscuit bar is not affected or influenced by the selection of another
biscuit bar’. Note that it is not sufficient to say ‘each biscuit bar has an equal chance of being selected’.

A sample is non-random if each element in the population does not have an equal chance of being selected. Certain segments of the population end up over-represented, as some members are "systematically or deliberately excluded" from the study, and the sample is biased.

The Sample Mean X̄ as a Random Variable:

Let X1, X2, X3, ..., Xn be a random sample of size n taken from an infinite population (or a finite population if sampling is done with replacement) with mean µ and variance σ². Then the sample mean X̄, defined by

X̄ = (1/n) ∑X = (X1 + X2 + X3 + ... + Xn)/n,

is a random variable with E(X̄) = µ and Var(X̄) = σ²/n.

The Distribution of the Sample Mean:


Normal population: Let X1, X2, X3, ..., Xn be a random sample of size n taken from a normal population with mean µ and variance σ². Then
(1) X̄ ~ N(µ, σ²/n)
(2) X1 + X2 + X3 + ... + Xn ~ N(nµ, nσ²)

Non-normal population: Let X1, X2, X3, ..., Xn be a large random sample of size n taken from a non-normal population with mean µ and variance σ². Then, since the sample size n is large,
(3) X̄ ~ N(µ, σ²/n) approximately, by the Central Limit Theorem
(4) X1 + X2 + X3 + ... + Xn ~ N(nµ, nσ²) approximately, by the Central Limit Theorem

Tips:
1. To identify questions on CLT, look out for cases where there is no mention of "normally distributed" or "normal population" and the question asks for "the probability that the sample mean / average value of X / sum…", "by using a suitable approximation" or "estimate the probability".
2. Do not write X ~ N(µ, σ²) if the question never mentions "normally distributed" or "normal population". There are other kinds of distributions that a population could follow.
3. When the population variance σ² is unknown, the unbiased estimate of population variance s² will be used and the notation will change accordingly. This is especially important when we write the distribution X̄ ~ N(µ, s²/n) approximately under hypothesis testing.
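A simulation sketch (hypothetical, Python with numpy) illustrating the Central Limit Theorem: means of samples of size 50 drawn from a non-normal (exponential) population with µ = 2 and σ = 2 have mean close to µ and variance close to σ²/n:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, n = 2.0, 50
    sample_means = rng.exponential(scale=mu, size=(10_000, n)).mean(axis=1)

    print(sample_means.mean())    # close to µ = 2
    print(sample_means.var())     # close to σ²/n = 4/50 = 0.08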
Unbiased Estimates of Population Mean and Population Variance:
In statistics, an estimate is considered "good" if it is unbiased, i.e. the average value of the sample statistic (used to estimate the population parameter), taken over all possible samples, gives the true value of the population parameter.
In particular, the average value of X̄ gives the true value of µ, that is, E(X̄) = µ.

Population parameter: µ
Estimate from sample: sample mean x̄
Unbiased estimate? Yes, the sample mean is an unbiased estimate of the population mean.
How? x̄ = Σx/n, or x̄ = Σ(x − a)/n + a.

Population parameter: σ²
Estimate from sample: sample variance
Unbiased estimate? No, the sample variance is NOT an unbiased estimate of the population variance.
How? Calculate the unbiased estimate of population variance s² by using one of the following:
(1) s² = [n/(n − 1)] × [Σ(x − x̄)²/n]
(2) s² = [1/(n − 1)] × [Σx² − (Σx)²/n]
(3) s² = [n/(n − 1)] × (sample variance)
(4) s² = [1/(n − 1)] × [Σ(x − a)² − (Σ(x − a))²/n]
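A quick numerical check (hypothetical Python sketch with made-up data) that formula (2) and formula (3) agree, and that both match the n − 1 divisor used by the statistics module:

    import statistics

    x = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9]     # made-up sample data
    n = len(x)
    xbar = sum(x) / n

    sample_var = sum((xi - xbar)**2 for xi in x) / n                 # divisor n: biased
    s2_v2 = (sum(xi**2 for xi in x) - sum(x)**2 / n) / (n - 1)       # formula (2)
    s2_v3 = n / (n - 1) * sample_var                                 # formula (3)

    print(s2_v2, s2_v3, statistics.variance(x))                      # all three agree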
Hypothesis Testing
Definition:
(1) Null hypothesis H0: A statement that the population mean is equal to a specified value, which usually represents the status quo.

(2) Alternative hypothesis H1: The conclusion about the population mean that we accept when the null hypothesis is rejected. It is a statement that the population mean differs from the specified value; the difference can take the form of "not equal to", "greater than" or "less than" the specified value.

Tip: If the question suggests a one-tail test, i.e. H1: µ < 60 or H1: µ > 60, we can use the value of the sample mean to judge which one to choose. For example, if x̄ = 57, it is more meaningful to test against H1: µ < 60.

Test statistic:
In our syllabus, we use either (X̄ − µ)/(σ/√n) (when σ² is known) or (X̄ − µ)/(s/√n) (when σ² is unknown) as the test statistic.

(1) Case 1: σ² is known
(X̄ − µ)/(σ/√n) ~ N(0, 1) ⇒ use a Z-test.
- Population (X) is normal: if X is normal, then X̄ ~ N(µ, σ²/n) exactly for any n ≥ 1.
- Population not normal or unknown: if n is large, then by CLT, X̄ ~ N(µ, σ²/n) approximately.
- What if n is small and the population is not normal? X̄ will not be normally distributed. We can still proceed with a Z-test, but the question will require us to assume that X is normally distributed.

(2) Case 2: σ² is unknown (n needs to be large)
(X̄ − µ)/(s/√n) ~ N(0, 1) approximately ⇒ use a Z-test.
When n is large, by CLT, X̄ ~ N(µ, s²/n) approximately.

Tips:
- 𝑛 = 30 is also considered as sufficiently large for CLT to work.
- If the question asks whether it is necessary to assume that X is normally distributed, check whether CLT applies.
- Also check that the sample is a random sample for the test to be valid.
Level of significance:
a) Definition of 5% level of significance: the probability of wrongly rejecting H0 when H0 is true is 0.05. Give your definition in the context of the question if required, e.g. "there is a 0.05 probability of wrongly concluding that the mean IQ of the students is more than 118 when in fact it is 118".
b) The smallest level of significance for rejecting H0 is the p-value, since H0 is rejected whenever α% ≥ p.
c) Useful implications:
- Reject 𝐻0 at 5% ⇒ reject 𝐻0 at 10% or any α higher than 5%.
- Do not reject 𝐻0 at 5% ⇒ do not reject 𝐻0 at 1% or any α lower than 5%.
P-value:
a) Definition: If H0 is rejected at a certain p-value, the probability of wrongly rejecting H0 when H0 is true is p. E.g. "The p-value = P(X̄ ≥ 121) refers to the probability of getting a sample with average IQ at least 121. This is also the value we will put down on our script if the question asks for the smallest level of significance at which H0 can be rejected in favour of H1."
b) The p-value is also the smallest level of significance for rejecting H0.
c) The p-value of a two-tail test = 2 × (p-value of the corresponding one-tail test based on the same sample).
For example, H0: µ = 60 vs H1: µ < 60.
Suppose x̄ = 57, σ² = 81 and n = 30, so the test statistic (x̄ − µ)/(σ/√n) = −1.826.
p-value = P(X̄ < 57 when X̄ ~ N(60, 81/30)) = P(Z < −1.826 when Z ~ N(0, 1)) = 0.0339
For H0: µ = 60 vs H1: µ ≠ 60, p-value = 2(0.0339).

Two perspectives of rejecting 𝐻0:


Take an example: H0: µ = 60 vs H1: µ < 60, level of significance = 5%.
Suppose x̄ = 57, σ² = 81 and n = 30, so the test statistic = −1.826. Using a Z-test:

Probability perspective:
- p-value = 0.0339
- Reject H0 when p-value ≤ 5%.
- This is faster in a test with no unknowns, as the G.C. calculates the p-value for you.

Critical value perspective:
- Test statistic (x̄ − µ)/(σ/√n) = −1.826
- Reject H0 when the test statistic ≤ −1.645 (the rejection or critical region).
- This is used when there are unknowns in the question, e.g. µ0 is stated instead of 60, n is unknown, σ² is unknown, or even x̄ is unknown due to missing data. For example, if σ is unknown and H0 is rejected, we use the appropriate test statistic: (57 − 60)/(σ/√30) < −1.645 (the critical value for a Z-test at the 5% level of significance) and solve for the possible values of σ.

Understand that 0.0339 < 5% is equivalent to −1.826 < −1.645, so we conclude "reject H0 at 5%" whichever perspective we take.

Conclusion:
Rejecting H0 means accepting H1. Not rejecting H0 is NOT the same as accepting H0.
Make sure you conclude by saying
- reject H0 at the α% level of significance; there is sufficient evidence that H1 is true, in the context of the question, OR
- do not reject H0 at the α% level of significance; there is insufficient evidence that H1 is true, in the context of the question.

Perform a Hypothesis Test:


Step 1 Understand the given question and write down the null hypothesis 𝐻0 and the
alternative hypothesis 𝐻1

Step 2 Write down the level of significance α (usually given in the question)

Step 3 Decide on the test statistic to be used and determine its distribution

Step 4 Use G.C. to calculate the 𝑝-value

Step 5 Reject 𝐻0 if 𝑝-value ≤ α, OR


Do not reject 𝐻0 if 𝑝-value > α
Write down the conclusion in the context of the question

Example:
Let X be the IQ of a student in ABC University. [Remember to always define X if it is not defined in the question.]

Step 1: To test H0: µ = 118 vs H1: µ > 118. [Read the question carefully to decide >, < or ≠.]

Step 2: Perform a 1-tail test at the 5% level of significance. [Decide if a 1-tail or 2-tail test should be performed.]

Step 3: (Sample from a Normal population of known variance)
Under H0, X̄ ~ N(µ0, σ²/n), where µ0 = 118 and σ = 12.
From the sample, x̄ = 121 and n = 50.

OR

Step 3: (Large sample from a Normal population of unknown variance)
Under H0, X̄ ~ N(µ0, s²/n) approximately, with µ0 = 118.
From the sample, x̄ = 121 and s = 97.454.

OR

Step 3: (Large sample from a non-Normal population of unknown variance)
Under H0, since n = 60 is large, X̄ ~ N(µ0, s²/n) approximately by the Central Limit Theorem, with µ0 = 118.
From the sample, x̄ = 121 and s = 97.454.

Step 4: Using a z-test, p-value = P(X̄ ≥ 121) = 0.0385 (3 s.f.)
["H1: µ > µ0" ⇒ p-value = P(X̄ ≥ x̄);
"H1: µ < µ0" ⇒ p-value = P(X̄ ≤ x̄);
"H1: µ ≠ µ0" ⇒ p-value = 2P(X̄ ≤ x̄) if x̄ < µ0, or 2P(X̄ ≥ x̄) if x̄ > µ0.]

Step 5: Since p-value = 0.0385 < 0.05, we reject H0 and conclude that there is sufficient evidence, at the 5% level of significance, to support the claim that the mean IQ of students in ABC University is greater than 118.

OR (if the test is performed at the 1% level of significance)

Step 5: Since p-value = 0.0385 > 0.01, we do not reject H0 and conclude that there is insufficient evidence, at the 1% level of significance, to support the claim that the mean IQ of students in ABC University is greater than 118.

[Notice that the end of the sentence describes the alternative hypothesis H1.]
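A hypothetical sketch (Python, assuming scipy) reproducing the p-value in Step 4 for the known-variance case (µ0 = 118, σ = 12, n = 50, x̄ = 121):

    from math import sqrt
    from scipy.stats import norm

    mu0, sigma, n, xbar = 118, 12, 50, 121
    z = (xbar - mu0) / (sigma / sqrt(n))    # test statistic ≈ 1.768
    p_value = 1 - norm.cdf(z)               # P(X̄ ≥ 121) for H1: µ > 118
    print(p_value)                          # 0.0385 (3 s.f.) < 0.05, so reject H0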

Important cases where conclusion is given and we are asked to find:


Case 1: Level of significance α%
Carry on as if we are performing a test. After finding the 𝑝-value:
If given that H0 is rejected, then p-value ≤ α/100; if given that H0 is not rejected, then p-value > α/100.

Case 2: Sample mean x̄ (no standardization required)

Example: For a test at the 5% level of significance and under H0, X̄ ~ N(µ0, σ²/n), where µ0 = 118 and σ = 12. From the sample, n = 50.
If given that H0 is rejected, use: [GC invNorm: key in standard deviation = 12/√50, not 12]

1-tail (e.g. H1: µ > 118):
p-value ≤ 0.05
P(X̄ ≥ x̄) ≤ 0.05
Using GC, P(X̄ ≥ 120.79) = 0.05, so x̄ ≥ 121 (3 s.f.)

2-tail (e.g. H1: µ ≠ 118):
p-value ≤ 0.05
2P(X̄ ≥ x̄) ≤ 0.05 or 2P(X̄ ≤ x̄) ≤ 0.05
P(X̄ ≥ x̄) ≤ 0.025 or P(X̄ ≤ x̄) ≤ 0.025
…

Case 3: µ0 or n or σ² or s²
All steps are similar to those of Case 2 above, except that we have to standardize in order to solve for the unknown parameter.
Example: (to find µ0)

1-tail (e.g. H1: µ > µ0; from the sample, x̄ = 121)
p-value ≤ 0.05
P(X̄ ≥ 121) ≤ 0.05
P(Z ≥ (121 − µ0)/(12/√50)) ≤ 0.05
Using GC, P(Z ≥ 1.6449) = 0.05
(121 − µ0)/(12/√50) ≥ 1.6449

Correlation & Regression
Definition:
An independent variable is the variable whose change will have an effect on the dependent variable.
Sometimes the independent variable can be controlled so that the variable only assumes a set of
predetermined values.

A scatter diagram is a two-dimensional plot, with the values of one variable plotted along each axis. We
plot the independent variable along the horizontal axis. A scatter diagram is used to show visually the
relation between two variables, and it helps to identify outliers.

The product moment correlation coefficient, denoted by 𝑟, is a measure of the strength of the linear
relation between two variables. The value of 𝑟 is independent of the units of the variables, and -1≤ 𝑟 ≤1.

Description of scatter diagram / Relation between variables / r-value:
- Points lie close to a straight line of positive gradient: positive linear correlation between variables; r > 0
- Points lie close to a straight line of negative gradient: negative linear correlation between variables; r < 0
- All points lie on a straight line of positive gradient: perfect positive linear correlation between variables; r = 1
- All points lie on a straight line of negative gradient: perfect negative linear correlation between variables; r = −1
- Points are spread randomly without visible trend: no clear relation between variables; r ≈ 0
- Points lie close to a curve: non-linear relation between variables; r depends on how close the curve is to a straight line

Note the product moment correlation coefficient merely gives an idea of the linear relationship between
the variables. It does not imply any cause-and-effect relationship between the variables. There may be
intermediate variables involved in the relationship which we do not know about, or there may even be
more than one explanation to the linear relation.

Linear regression attempts to model the relationship between two variables by fitting a linear equation to
a set of observed data.

Regression Line:
The least squares regression line of 𝑦 on 𝑥 has the form 𝑦 = 𝑎 + 𝑏𝑥. It is used when
- 𝑥 is the independent variable and 𝑦 is the dependent variable; or
- The independent/dependent variable cannot be determined and you want to estimate 𝑦 for a given
value of 𝑥.

The least squares regression line of 𝑥 on 𝑦 has the form 𝑥 = 𝑐 + 𝑑𝑦. It is used when
- 𝑦 is the independent variable and 𝑥 is the dependent variable; or
- the independent/dependent variable cannot be determined and you want to estimate 𝑥 for a given
value of 𝑦

Note:
- Both the regression lines of y on x and x on y pass through the point (x̄, ȳ).
- In general, the regression line of 𝑦 on 𝑥 is not the same as the regression line of 𝑥 on 𝑦.
- The regression lines of 𝑦 on 𝑥 and 𝑥 on 𝑦 are the same if and only if the product moment
correlation coefficient between 𝑥 and 𝑦 is 1 or -1. The closer the coefficient is to either of these
values, the closer are the lines to each other.
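A hypothetical sketch (Python with numpy, using made-up data) that computes r and both regression lines, and checks that each passes through (x̄, ȳ):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # made-up data
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

    r = np.corrcoef(x, y)[0, 1]        # product moment correlation coefficient
    b, a = np.polyfit(x, y, 1)         # y on x:  y = a + bx
    d, c = np.polyfit(y, x, 1)         # x on y:  x = c + dy

    print(r)                           # close to 1: strong positive linear correlation
    print(a + b * x.mean(), y.mean())  # the y-on-x line passes through (x̄, ȳ)
    print(c + d * y.mean(), x.mean())  # so does the x-on-y line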

Estimate a Value using a Regression Line


Using the regression line to estimate a value within the range of data is known as interpolation. Using the
line to estimate values outside the range of data is known as extrapolation.
Extrapolating using values beyond the given range is unreliable as the regression model may not be
applicable outside the range.

An estimate obtained using the linear regression line is reliable if


- there is linear correlation between the variables, and
- the estimate is obtained by interpolation.

Linearization of Data
There are cases where the relation between the variables is non-linear. However, through a suitable transformation of the data, it may still be possible to find a linear relation between the variables for the transformed data, e.g. x and y have a non-linear relation if y = ax² + b, but there is a linear relation between x² and y.
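A short sketch of such a transformation (hypothetical Python, with exact made-up data y = 3x² + 2): regress y against x² instead of x:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = 3 * x**2 + 2                          # non-linear in x, linear in x²

    print(np.corrcoef(x, y)[0, 1])            # less than 1: x and y are not perfectly linearly related
    print(np.corrcoef(x**2, y)[0, 1])         # 1 (to machine precision): x² and y are linearly related
    b, a = np.polyfit(x**2, y, 1)             # recovers b = 3, a = 2
    print(a, b)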

Reminder when using GC:


For the GC to give the value of 𝑟 together with the equation of the linear regression line, ensure that STAT
DIAGNOSTICS is turned ON.
