Definition:
- A permutation is an ordered arrangement of objects.
- A combination is a selection of objects in which the order of selection does not matter.
Permutations:
Given | Objects taken | No. of permutations
$n$ distinct objects | $n$ | $n!$
$n$ distinct objects, no repetitions | $r$ | $^{n}P_{r}$ or $^{n}C_{r} \times r!$
$n$ distinct objects, with repetitions | $r$ | $n^{r}$
$n$ objects, not all distinct ($n_1$ of type 1, $n_2$ of type 2, ..., $n_k$ of type $k$, where $n = n_1 + n_2 + \dots + n_k$) | $n$ | $\dfrac{n!}{n_1!\,n_2!\cdots n_k!}$
$n$ objects, not all distinct | $r$ | Involves both combinations & permutations
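The counting formulas in the table can be checked by brute force on small cases. A minimal Python sketch, assuming illustrative values that are not from the notes: $n = 5$ distinct letters, $r = 3$, and the word "BANANA" for the not-all-distinct case.

```python
from itertools import permutations, product
from math import comb, factorial, perm

n, r = 5, 3
objects = "ABCDE"  # n = 5 distinct objects (illustrative)

# n distinct objects, all n taken: n!
assert len(list(permutations(objects))) == factorial(n)

# n distinct objects, r taken, no repetitions: nPr = nCr * r!
count_no_rep = len(list(permutations(objects, r)))
assert count_no_rep == perm(n, r) == comb(n, r) * factorial(r)

# n distinct objects, r taken, with repetitions: n^r
assert len(list(product(objects, repeat=r))) == n ** r

# n objects, not all distinct: n! / (n1! n2! ... nk!)
word = "BANANA"  # B x1, A x3, N x2
distinct_arrangements = len(set(permutations(word)))  # set removes repeated arrangements
assert distinct_arrangements == factorial(6) // (factorial(1) * factorial(3) * factorial(2))

print(count_no_rep, distinct_arrangements)  # 60 60
```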
Combinations:
Given | No. of ways to select $r$ objects
$n$ distinct objects | $^{n}C_{r}$
$n$ objects, not all distinct | No direct way to calculate. Need to consider different cases.
Example 1: Given a, a, a, b, b, c. To choose 3 letters (arrangement not
required):
Case 1: All distinct (abc) - 1 way
Case 2: Contains an identical pair (aab, aac, bba, bbc) - 4 ways
Case 3: All identical (aaa) - 1 way
Total no. of ways = 1 + 4 +1 = 6
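Example 1 can be checked by brute force. In this minimal Python sketch, `itertools.combinations` treats the six letters as distinguishable positions, so a set is used to keep one copy of each distinct selection.

```python
from itertools import combinations

letters = "aaabbc"

# combinations() works on positions, so identical selections such as
# ('a', 'a', 'b') are produced several times; sorting the input first makes
# every duplicate an identical tuple, and the set keeps one copy of each.
selections = set(combinations(sorted(letters), 3))

for s in sorted(selections):
    print("".join(s))
print(len(selections))  # 6
```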
Useful results:
1. 0 ≤ 𝑃(𝐴) ≤ 1, answers should always be between 0 and 1 inclusive AND exact where possible
2. $\sum_{i=1}^{n} P(A_i) = 1$, where $A_1, A_2, \dots, A_n$ are all the possible outcomes.
3. 𝑃(𝐴') = 1 − 𝑃(𝐴)
4. 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴∩𝐵) [Note: this is always true; more results/relationships can be derived using a Venn diagram]
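Results 3 and 4 can be checked on a small sample space with exact fractions. This sketch assumes one roll of a fair die, with A = "even number" and B = "greater than 3" chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative)

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |S|."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # number greater than 3

lhs = prob(A | B)                      # P(A u B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)  # result 4
print(lhs, rhs)                        # 2/3 2/3
print(prob(sample_space - A) == 1 - prob(A))  # True: result 3
```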
Conditional Probability:
If A and B are two events and 𝑃(𝐵) ≠ 0, then the probability that event A occurs, given that event B has
already occurred, is written as 𝑃(𝐴|𝐵) and is calculated using the formula:
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
Note:
1. $P(B|A) = \dfrac{P(B \cap A)}{P(A)}$, where $P(A) \neq 0$.
2. 𝑃(𝐴∩𝐵)=𝑃(𝐴|𝐵)𝑃(𝐵)=𝑃(𝐵|𝐴)𝑃(𝐴).
3. 𝑃(𝐴'|𝐵)=1 − 𝑃(𝐴|𝐵).
4. In general, 𝑃(𝐴|𝐵) ≠ 𝑃(𝐵|𝐴).
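A sketch with two rolls of a fair die (an illustrative sample space, not from the notes) shows the formula in use and Note 4 in particular: with A = "first roll is a six" and B = "total is 10", $P(A|B) = \tfrac{1}{3}$ but $P(B|A) = \tfrac{1}{6}$.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die: 36 equally likely ordered outcomes (illustrative).
S = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

A = {o for o in S if o[0] == 6}     # first roll is a six
B = {o for o in S if sum(o) == 10}  # total is 10

p_A_given_B = prob(A & B) / prob(B)
p_B_given_A = prob(A & B) / prob(A)

print(p_A_given_B, p_B_given_A)  # 1/3 1/6, so P(A|B) != P(B|A)
# Note 2: P(A & B) = P(A|B)P(B) = P(B|A)P(A)
print(prob(A & B) == p_A_given_B * prob(B) == p_B_given_A * prob(A))  # True
```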
Special Events:
Mutually exclusive events: A and B are said to be mutually exclusive events if they cannot occur at the
same time, i.e. 𝐴∩𝐵 = ∅ or 𝑃(𝐴∩𝐵) = 0.
Thus we have 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵).
Independent events: A and B are said to be independent events if the occurrence of event B does not affect the probability of occurrence of event A and vice versa. [Note: in a Venn diagram, A and B intersect (when 𝑃(𝐴) and 𝑃(𝐵) are non-zero)]
To prove that events A and B are independent, we can show any one of the following:
1. 𝑃(𝐴|𝐵)=𝑃(𝐴)
2. 𝑃(𝐵|𝐴)=𝑃(𝐵)
3. 𝑃(𝐴∩𝐵)= 𝑃(𝐴)𝑃(𝐵)
It can be easily shown that if A and B are independent, then the following pairs are also independent:
A and B’, A’ and B’, A’ and B.
Note: We DO NOT assume independence unless we have proven it or it is given as a condition in the question.
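Test 3 is the easiest to check by enumeration. On an illustrative two-dice sample space (not from the notes), "first roll even" and "total is 7" turn out to be independent, while "first roll even" and "total is 8" do not; the complements check the pair A′ and B′ above.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two fair dice (illustrative)

def prob(event):
    return Fraction(len(event), len(S))

def independent(E, F):
    """Test 3: E and F are independent iff P(E & F) = P(E)P(F)."""
    return prob(E & F) == prob(E) * prob(F)

A = {o for o in S if o[0] % 2 == 0}  # first roll even
B = {o for o in S if sum(o) == 7}    # total is 7
C = {o for o in S if sum(o) == 8}    # total is 8

print(independent(A, B))          # True
print(independent(S - A, S - B))  # True: complements of independent events
print(independent(A, C))          # False: 1/12 != (1/2)(5/36)
```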
$P(X \le x) = \sum_{r \le x} P(X = r)$.
Probability Distribution:
A table or formula giving the values of 𝑃(𝑋 = 𝑥) for every 𝑥 in sample space is called the probability
distribution of 𝑋. For the experiment of tossing a fair coin 3 times where 𝑋 is the number of heads
obtained, the probability distribution of 𝑋 is as follows:
𝑥 | 0 | 1 | 2 | 3
𝑃(𝑋 = 𝑥) | 1/8 | 3/8 | 3/8 | 1/8
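The table can be reproduced by listing all $2^3 = 8$ equally likely outcomes and counting heads. A small Python sketch using exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes of tossing a fair coin 3 times.
outcomes = list(product("HT", repeat=3))
heads = Counter(o.count("H") for o in outcomes)  # how many outcomes give each x

distribution = {x: Fraction(heads[x], len(outcomes)) for x in range(4)}
for x, p in distribution.items():
    print(x, p)  # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```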
The expectation of $g(X)$, where $g$ is a function of $X$, is $E(g(X)) = \sum_{x \in S} g(x)\,P(X = x)$. In particular,
$E(X^2) = \sum_{x \in S} x^2\,P(X = x)$.
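Applying the expectation formula to the coin table gives $E(X) = \tfrac{3}{2}$ and $E(X^2) = 3$. The sketch below computes these with exact fractions; it also uses $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, a standard result that is not stated in this section.

```python
from fractions import Fraction

# X = number of heads in 3 tosses of a fair coin (table above).
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(g, dist):
    """E(g(X)) = sum over x in S of g(x) P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

E_X = expectation(lambda x: x, dist)
E_X2 = expectation(lambda x: x ** 2, dist)
var_X = E_X2 - E_X ** 2  # Var(X) = E(X^2) - [E(X)]^2

print(E_X, E_X2, var_X)  # 3/2 3 3/4
```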
For example, a biased coin has probability 0.6 of obtaining a head in any toss. Find the probability of getting 6 heads if the coin is tossed 8 times. The conditions, in context, are as follows:
1. The coin is tossed independently 8 times.
2. The outcome of each toss is either a head (success) or tail (failure).
3. The probability of obtaining a head (success) remains constant at 0.6.
Binomial Distribution:
If a discrete random variable 𝑋 follows a binomial distribution, we write 𝑋~𝐵(𝑛, 𝑝), where the
parameters of the distribution 𝑛 and 𝑝 refer to the number of trials and the probability of success for each
trial respectively.
The probability distribution of 𝑋 is given by (in MF26)
$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$, for $x = 0, 1, 2, \dots, n$.
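For the biased-coin example above, $X \sim B(8, 0.6)$ and the required probability is $P(X = 6) = \binom{8}{6}(0.6)^6(0.4)^2$. A minimal Python check of that arithmetic (the helper name `binomial_pmf` is ours, not from MF26):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p6 = binomial_pmf(6, 8, 0.6)
print(round(p6, 4))  # 0.209

# Sanity check: the probabilities for x = 0, 1, ..., 8 sum to 1.
total = sum(binomial_pmf(x, 8, 0.6) for x in range(9))
print(abs(total - 1) < 1e-12)  # True
```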
A normal random variable is a special type of continuous random variable. For continuous random variables, probability is calculated as the area under the probability density function.
Note that for discrete random variables, $P(X < x) = P(X \le x) - P(X = x)$, so in general $P(X < x) \neq P(X \le x)$, BUT $P(X < x) = P(X \le x)$ for continuous random variables, since $P(X = x) = 0$.
Normal Distribution:
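A normal random variable 𝑋 with mean µ and variance $\sigma^2$ has the following probability density function f(𝑥):
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
We say that 𝑋 follows a normal distribution. We write $X \sim N(\mu, \sigma^2)$, where $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.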
Using the properties of expectation and variance of random variables, we have the following results for independent normal random variables.
Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ be independent random variables and $a$ and $b$ be constants. We have
(1) $X \pm Y \sim N(\mu_1 \pm \mu_2,\ \sigma_1^2 + \sigma_2^2)$
(2) $aX \pm b \sim N(a\mu_1 \pm b,\ a^2\sigma_1^2)$
(3) $aX \pm bY \sim N(a\mu_1 \pm b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$
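To see result (3) in action, this sketch simulates independent normal variables with the Python standard library. The parameters ($\mu_1 = 1$, $\sigma_1 = 2$, $\mu_2 = 5$, $\sigma_2 = 1$, $a = 2$, $b = 3$) are arbitrary choices for illustration, and the minus sign is used to show that the variances still add.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
N = 200_000     # number of simulated values of aX - bY

# Illustrative parameters: X ~ N(1, 2^2), Y ~ N(5, 1^2), independent; a = 2, b = 3.
mu1, sigma1, mu2, sigma2, a, b = 1, 2, 5, 1, 2, 3

# random.gauss(mu, sigma) takes the standard deviation, not the variance.
values = [a * random.gauss(mu1, sigma1) - b * random.gauss(mu2, sigma2) for _ in range(N)]

mean_est = statistics.fmean(values)
var_est = statistics.fmean((v - mean_est) ** 2 for v in values)

print(mean_est)  # close to a*mu1 - b*mu2 = 2 - 15 = -13
print(var_est)   # close to a^2*sigma1^2 + b^2*sigma2^2 = 16 + 9 = 25
```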
Sampling
Definition:
A population is the entire collection of data (persons or items) that we want to study, e.g. apples produced
by a farm.
A sample is random if every element in the population has an equal chance of being selected, and the selection of an element is independent of the selection of any other, e.g. ‘Every biscuit bar has an equal chance of being selected, and the selection of one biscuit bar is not affected or influenced by the selection of another biscuit bar’. Note that it is not sufficient to say ‘each biscuit bar has an equal chance of being selected’.
A sample is non-random if not every element in the population has an equal chance of being selected. Certain segments of the population are then over-represented, as some members are “systematically or deliberately excluded” from the study, and the sample is biased.
Tips:
1. To identify questions on the Central Limit Theorem (CLT), look out for questions that make no mention of “normally distributed” or “normal population” but ask for “the probability that the sample mean / average value of X / sum…”, “by using a suitable approximation” or “estimate the probability”.
2. Do not write $X \sim N(\mu, \sigma^2)$ if the question never mentions “normally distributed” or “normal population”. There are other kinds of distributions that a population could follow.
3. When the population variance $\sigma^2$ is unknown, the unbiased estimate of the population variance, $s^2$, is used instead and the notation changes accordingly. This is especially important when we write the distribution $\bar{X} \sim N\left(\mu, \dfrac{s^2}{n}\right)$ approximately in hypothesis testing.
Unbiased Estimates of Population Mean and Population Variance:
In statistics, an estimate is considered to be “good” if it is unbiased, i.e. the average value of the sample statistic (used to estimate the population parameter) over all possible samples gives the true value of the population parameter.
In particular, the average value of $\bar{X}$ gives the true value of µ, that is, $E(\bar{X}) = \mu$.
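Population variance $\sigma^2$: the sample variance is NOT an unbiased estimate of the population variance. Calculate the unbiased estimate of the population variance, $s^2$, by using one of the following:
(1) $s^2 = \dfrac{n}{n-1}\left[\dfrac{\Sigma(x-\bar{x})^2}{n}\right]$
(2) $s^2 = \dfrac{1}{n-1}\left[\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}\right]$
(3) $s^2 = \dfrac{n}{n-1}\,(\text{sample variance})$
(4) $s^2 = \dfrac{1}{n-1}\left[\Sigma(x-a)^2 - \dfrac{(\Sigma(x-a))^2}{n}\right]$, where $a$ is a constant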
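The four formulas for $s^2$ are algebraically equivalent. This sketch checks that on a small made-up data set, with an arbitrary coding constant $a = 5$ for formula (4), and compares against `statistics.variance`, which in the Python standard library divides by $n - 1$.

```python
import statistics
from fractions import Fraction

def unbiased_variance_estimates(data, a=0):
    """Return s^2 from formulas (1)-(4); `a` is the coding constant for (4)."""
    xs = [Fraction(x) for x in data]  # exact arithmetic
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    sample_var = ss / n  # divides by n: the biased sample variance
    f1 = Fraction(n, n - 1) * (ss / n)
    f2 = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)
    f3 = Fraction(n, n - 1) * sample_var
    coded = [x - a for x in xs]
    f4 = (sum(c * c for c in coded) - sum(coded) ** 2 / n) / (n - 1)
    return f1, f2, f3, f4

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
estimates = unbiased_variance_estimates(data, a=5)
print([str(e) for e in estimates])  # ['32/7', '32/7', '32/7', '32/7']
print(statistics.variance(data))    # the same value as a float
```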
Hypothesis Testing
Definition:
(1) Null hypothesis 𝐻0: a statement that the population mean is equal to a specific value, which usually represents the status quo.
(2) Alternative hypothesis 𝐻1: the conclusion about the population mean that we accept when the null hypothesis is rejected. It states that the population mean is different from the specified value; the change can be expressed as not equal to, greater than or smaller than the specified value.
Tips: If the question suggests a one-tail test, i.e. 𝐻1: µ < 60 or µ > 60, we can use the value of the sample mean to decide which one to use. For example, if $\bar{x} = 57$, it is more meaningful to test against 𝐻1: µ < 60.
Test statistic:
In our syllabus, we use either $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ (when $\sigma^2$ is known) or $\dfrac{\bar{X}-\mu}{s/\sqrt{n}}$ (when $\sigma^2$ is unknown) as the test statistic.
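(1) Case 1: $\sigma^2$ is known
$\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ ⇒ Use Z-test
Reason:
- If 𝑋 is normal, $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ exactly for any $n \ge 1$.
- If 𝑛 is large, by the CLT, $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ approximately.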
Tips:
- 𝑛 = 30 is also considered sufficiently large for the CLT to apply.
- If the question asks whether it is necessary to assume that 𝑋 is normally distributed, check whether the CLT applies (if it does, the assumption is not necessary).
- Also check that the sample is a random sample for the test to be valid.
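The CLT tips above can be illustrated by simulation. This sketch assumes an exponential population with mean 2 (so $\sigma^2 = 4$), which is clearly non-normal, and samples of size $n = 40$; all of these values are illustrative, not from the notes.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

# Non-normal population (illustrative): exponential with mean 2, so mu = 2, sigma^2 = 4.
mu, sigma2 = 2.0, 4.0
n, reps = 40, 10_000

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(reps)
]

m = statistics.fmean(sample_means)
v = statistics.fmean((xb - m) ** 2 for xb in sample_means)
print(m, v)  # close to mu = 2 and sigma^2 / n = 0.1

# If Xbar is approximately N(mu, sigma^2/n), about 95% of sample means lie within 1.96 SE of mu.
se = math.sqrt(sigma2 / n)
coverage = sum(abs(xb - mu) <= 1.96 * se for xb in sample_means) / reps
print(coverage)  # roughly 0.95
```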
Level of significance:
a) Definition of 5% level of significance: the probability of wrongly rejecting 𝐻0 when 𝐻0 is true is 0.05. Give the definition in the context of the question if required. E.g. “there is a 0.05 probability of wrongly concluding that the mean IQ of the students is more than 118 when in fact it is 118”.
b) 𝐻0 is rejected at α% level of significance whenever α% ≥ 𝑝-value, so the smallest level of significance for rejecting 𝐻0 is the 𝑝-value.
c) Useful implications:
- Reject 𝐻0 at 5% ⇒ reject 𝐻0 at 10% or any α higher than 5%.
- Do not reject 𝐻0 at 5% ⇒ do not reject 𝐻0 at 1% or any α lower than 5%.
P-value:
a) Definition: if 𝐻0 is rejected at a level of significance equal to the 𝑝-value 𝑝, the probability of wrongly rejecting 𝐻0 when 𝐻0 is true is 𝑝. E.g. “The 𝑝-value = $P(\bar{X} \ge 121)$ is the probability, assuming 𝐻0 is true, of getting a sample with mean IQ of at least 121. This is also the value we put down on our script if the question asks for the smallest level of significance at which 𝐻0 can be rejected in favour of 𝐻1”.
b) 𝑃-value is also the smallest level of significance for rejecting 𝐻0.
c) 𝑃-value of a two-tail test = 2 × (𝑝-value of the corresponding one-tail test based on the same sample)
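For example, 𝐻0: µ = 60 vs 𝐻1: µ < 60.
Suppose $\bar{x} = 57$, $\sigma^2 = 81$ and $n = 30$, so the test statistic is $\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}} = -1.826$.
𝑝-value $= P\left(\bar{X} < 57 \text{ when } \bar{X} \sim N\left(60, \tfrac{81}{30}\right)\right) = P(Z < -1.826 \text{ when } Z \sim N(0, 1)) = 0.0339$
For 𝐻0: µ = 60 vs 𝐻1: µ ≠ 60, 𝑝-value = 2(0.0339).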
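The worked 𝑝-value can be reproduced with the standard library's `statistics.NormalDist` (Python 3.8+). This sketch follows the example's numbers and shows that standardising first, or working directly with $\bar{X} \sim N(60, 81/30)$, gives the same tail probability.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, sigma2, n = 60, 57, 81, 30  # values from the example above

se = sqrt(sigma2 / n)             # standard error sigma / sqrt(n)
z = (xbar - mu0) / se             # test statistic
p_one_tail = NormalDist().cdf(z)  # H1: mu < 60, so P(Z < z)
p_two_tail = 2 * p_one_tail       # H1: mu != 60

# Same tail probability without standardising: P(Xbar < 57), Xbar ~ N(60, 81/30).
p_direct = NormalDist(mu0, se).cdf(xbar)

print(round(z, 3), round(p_one_tail, 4))   # -1.826 0.0339
print(abs(p_direct - p_one_tail) < 1e-12)  # True
```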
Conclusion:
Reject 𝐻0 means accept 𝐻1. Do not reject 𝐻0 is NOT the same as accept 𝐻0.
Make sure you conclude by saying either
- reject 𝐻0 at α% level of significance; there is sufficient evidence that 𝐻1 is true (stated in the context of the question); OR
- do not reject 𝐻0 at α% level of significance; there is insufficient evidence that 𝐻1 is true (stated in the context of the question).
Step 2: Write down the level of significance α (usually given in the question).
Step 3: Decide on the test statistic to be used and determine its distribution.
Example:
Let 𝑋 be the IQ of a student in ABC University. [Note: remember to always define 𝑋 if it is not defined in the question.]
Step 1: To test 𝐻0: µ = 118 vs 𝐻1: µ > 118. [Note: read the question carefully to decide >, < or ≠.]
Step 2: Perform a 1-tail test at 5% level of significance. [Note: decide whether a 1-tail or 2-tail test should be performed.]
Step 3: (Sample from a Normal population of known variance)
Under 𝐻0, $\bar{X} \sim N\left(\mu_0, \dfrac{\sigma^2}{n}\right)$, where $\mu_0 = 118$ and $\sigma = 12$.
From the sample, $\bar{x} = 121$ and $n = 50$.
OR
Step 3: (Large sample from a non-Normal population of unknown variance)
Under 𝐻0, since $n = 60$ is large, $\bar{X} \sim N\left(\mu_0, \dfrac{s^2}{n}\right)$ approximately by the Central Limit Theorem, with $\mu_0 = 118$.
From the sample, $\bar{x} = 121$ and $s = 97.454$.
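Step 4: Using a z-test, 𝑝-value = $P(\bar{X} \ge 121) = 0.0385$ (3 s.f.).
[Note: the form of the 𝑝-value depends on 𝐻1:
- “𝐻1: µ > µ0” ⇒ $P(\bar{X} \ge \bar{x})$
- “𝐻1: µ < µ0” ⇒ $P(\bar{X} \le \bar{x})$
- “𝐻1: µ ≠ µ0” ⇒ $2P(\bar{X} \le \bar{x})$ if $\bar{x} < \mu_0$, or $2P(\bar{X} \ge \bar{x})$ if $\bar{x} > \mu_0$]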
Step 5: Since 𝑝-value = 0.0385 < 0.05, we reject 𝐻0 and conclude that
there is sufficient evidence, at 5% level of significance, to support the
claim that the mean IQ of students in ABC University is greater than
118.
OR
(if test is performed at 1% level of significance)
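A short Python sketch (standard library `statistics.NormalDist`, Python 3.8+) that reproduces the Step 4 𝑝-value for the known-variance version of Step 3 and applies the Step 5 decision rule at both the 5% and 1% levels of significance. It is a check on the working, not a replacement for writing the steps out.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 118, 12, 50, 121  # known-variance version of Step 3

# Under H0, Xbar ~ N(mu0, sigma^2 / n); NormalDist takes the standard deviation.
xbar_dist = NormalDist(mu0, sigma / sqrt(n))

# H1: mu > mu0, so the p-value is the upper tail P(Xbar >= xbar).
p_value = 1 - xbar_dist.cdf(xbar)
print(f"p-value = {p_value:.3g}")  # 3 s.f., as in Step 4

# Reject H0 at level alpha whenever p-value <= alpha.
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```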
Case 3: µ0 or 𝑛 or $\sigma^2$ or $s^2$ unknown
All steps are similar to those of Case 2 above, except that we have to standardize in order to solve for the unknown parameter.
Example: (to find µ0)
𝑝-value ≤ 0.05
$P(\bar{X} \ge 121) \le 0.05$
$P\left(Z \ge \dfrac{121 - \mu_0}{\sqrt{12^2/50}}\right) \le 0.05$
…
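A sketch of one way to finish this kind of inequality numerically, assuming (as in the known-variance version of the example) $\sigma = 12$, $n = 50$ and $\bar{x} = 121$, with µ0 the unknown. The critical value comes from `statistics.NormalDist().inv_cdf`; how to round the final answer depends on the question.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar, alpha = 12, 50, 121, 0.05  # values assumed from the example
se = sigma / sqrt(n)

# P(Z >= (121 - mu0)/se) <= 0.05  <=>  (121 - mu0)/se >= z_crit,
# where z_crit is the value with P(Z >= z_crit) = 0.05.
z_crit = NormalDist().inv_cdf(1 - alpha)
mu0_max = xbar - z_crit * se  # H0 is rejected when mu0 <= mu0_max

print(round(z_crit, 4), round(mu0_max, 2))  # 1.6449 118.21
```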
Correlation & Regression
Definition:
An independent variable is the variable whose change will have an effect on the dependent variable.
Sometimes the independent variable can be controlled so that the variable only assumes a set of
predetermined values.
A scatter diagram is a two-dimensional plot, with the values of one variable plotted along each axis. We
plot the independent variable along the horizontal axis. A scatter diagram is used to show visually the
relation between two variables, and it helps to identify outliers.
The product moment correlation coefficient, denoted by 𝑟, is a measure of the strength of the linear relation between two variables. The value of 𝑟 is independent of the units of the variables, and −1 ≤ 𝑟 ≤ 1.
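As a concrete illustration of 𝑟, this sketch computes it from the usual sums $S_{xy}$, $S_{xx}$ and $S_{yy}$ on made-up data, and checks that rescaling one variable (a change of units) leaves 𝑟 unchanged.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative data

r = pmcc(x, y)
# Changing the units of x (e.g. metres to centimetres) leaves r unchanged.
r_scaled = pmcc([100 * v for v in x], y)
print(round(r, 4), abs(r - r_scaled) < 1e-12)
```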
If the points lie close to a curve, there is a non-linear relation between the variables, and the value of 𝑟 depends on how close the curve is to a straight line.
Note that the product moment correlation coefficient merely gives an idea of the linear relationship between the variables. It does not imply any cause-and-effect relationship between them. There may be intermediate variables involved in the relationship which we do not know about, or there may even be more than one explanation for the linear relation.
Linear regression attempts to model the relationship between two variables by fitting a linear equation to
a set of observed data.
Regression Line:
The least squares regression line of 𝑦 on 𝑥 has the form 𝑦 = 𝑎 + 𝑏𝑥. It is used when
- 𝑥 is the independent variable and 𝑦 is the dependent variable; or
- the independent/dependent variable cannot be determined and you want to estimate 𝑦 for a given
value of 𝑥.
The least squares regression line of 𝑥 on 𝑦 has the form 𝑥 = 𝑐 + 𝑑𝑦. It is used when
- 𝑦 is the independent variable and 𝑥 is the dependent variable; or
- the independent/dependent variable cannot be determined and you want to estimate 𝑥 for a given
value of 𝑦.
Note:
- Both the regression lines of 𝑦 on 𝑥 and 𝑥 on 𝑦 pass through the point $(\bar{x}, \bar{y})$.
- In general, the regression line of 𝑦 on 𝑥 is not the same as the regression line of 𝑥 on 𝑦.
- The regression lines of 𝑦 on 𝑥 and 𝑥 on 𝑦 are the same if and only if the product moment correlation coefficient between 𝑥 and 𝑦 is 1 or −1. The closer the coefficient is to either of these values, the closer the lines are to each other.
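A sketch of both least squares lines on made-up data. The same helper fits 𝑥 on 𝑦 by swapping the roles of the variables; the product of the two gradients $b \cdot d$ equals $r^2$, which is why the lines coincide only when $r = \pm 1$.

```python
def regression_y_on_x(xs, ys):
    """Least squares line y = a + b x, with b = Sxy / Sxx and a = ybar - b xbar."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative data

a, b = regression_y_on_x(x, y)  # y on x: y = a + b x
c, d = regression_y_on_x(y, x)  # x on y (roles swapped): x = c + d y

xbar, ybar = sum(x) / len(x), sum(y) / len(y)
print(abs(ybar - (a + b * xbar)) < 1e-12)  # True: y on x passes through (xbar, ybar)
print(abs(xbar - (c + d * ybar)) < 1e-12)  # True: x on y passes through (xbar, ybar)
# b * d = r^2; the lines coincide only if this equals 1.
print(b * d)
```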
Linearization of Data
There are cases where the relation between the variables is non-linear. However, through a suitable transformation of the data, it may still be possible to find a linear relation between the variables for the transformed data, e.g. 𝑥 and 𝑦 have a non-linear relation if $y = ax^2 + b$, but there is a linear relation between $x^2$ and 𝑦.
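For the example $y = ax^2 + b$, this sketch uses made-up data generated from $y = 3x^2 + 2$. It compares 𝑟 for the raw data with 𝑟 after transforming 𝑥 to $x^2$, then reads $a$ and $b$ off the straight-line fit.

```python
from math import sqrt

def sums(us, vs):
    """Return (mean_u, mean_v, Suv, Suu, Svv)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return mu, mv, suv, suu, svv

xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 2 + 2 for x in xs]  # illustrative data: y = 3x^2 + 2 exactly

# r for the raw data: high, but below 1 because the relation is not linear.
_, _, sxy, sxx, syy = sums(xs, ys)
r_raw = sxy / sqrt(sxx * syy)

# Transform x -> u = x^2; now y = b + a*u is an exact straight line.
us = [x ** 2 for x in xs]
mu, my, suy, suu, syy_u = sums(us, ys)
r_transformed = suy / sqrt(suu * syy_u)
a = suy / suu    # gradient of y on u
b = my - a * mu  # intercept

print(round(r_raw, 4), round(r_transformed, 4), a, b)
```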