Formulas
Week 1-2. Descriptive Statistics and Probability
1.1 Terms
1.2 Counting Techniques
1.3 Probability
1.4 Measures of Central Tendency, Variation and Position
Week 3. Random Variables
3.1 Discrete and Continuous Random Variables
Week 4. Probability Distributions
4.1 Special Discrete Probability Distributions
Week 5. Normal Distribution
5.1 Normal Curve Distribution
Week 6. Sampling Distributions
6.1 Sampling Distribution/Mean and Variance of the Sampling Distribution
6.2 Central Limit Theorem
Week 7. Estimation of Parameters
7.1 Concepts
7.2 Point Estimate of a Population Mean
7.3 Confidence Interval Estimates of the Population Mean
7.4 Point Estimate for the Population Proportion
7.5 Interval Estimates for Population Proportions
Week 8. Hypothesis Testing
8.1 Test of Hypothesis
8.2 Types of Error in Hypothesis Testing
8.3 Hypothesis Test of a Population Mean
8.4 Hypothesis Test of a Population Proportion
8.5 Hypothesis Test of a Difference of Two Means (Population)
8.6 Hypothesis Test of a Difference of Two Proportions
8.7 Hypothesis Test of a Difference of Two Means (Paired Samples)
Week 9. Exploring Relationships
9.1 Bivariate Data/Correlation Analysis
9.2 Pearson Product-Moment Correlation
9.3 Regression Analysis
9.4 Test of Rho
References
Formulas

Counting Techniques
• Multiplication Rule: a × b × ⋯ × k
• Counting by Cases: m1 + m2 + ⋯ + mk
• Permutations: nPr = n!/(n − r)!
• Special Permutations (objects repeated): n!/(n1! n2! ⋯ nk!)
• Circular Permutation: (n − 1)!
• Combinations: nCr = n!/((n − r)! r!)

Probability
• Probability of an Event: P(A) = n(A)/N
• Complementary Rule/Probability Rule for Complements: P(A^c) = 1 − P(A)
• Bayes' Rule:
  P(Br|A) = P(Br ∩ A)/P(A) = P(Br)·P(A|Br)/P(A) = P(Br)·P(A|Br) / Σ(i=1..k) P(Bi)·P(A|Bi)

Binomial Distribution
• Probability: P(x) = nCx · p^x · (1 − p)^(n−x)
• Mean: np
• Variance: np(1 − p)

Hypergeometric Distribution
• Probability: P(x) = (kCx × (N−k)C(n−x)) / NCn
• Mean: nk/N
• Variance: (nk/N)(1 − k/N)((N − n)/(N − 1))

Poisson Distribution
• Probability: P(x) = e^(−μ) · μ^x / x!
• Mean and variance: both μ

Conversion of x to z
• z = (x − μ)/σ

Sampling Distribution
• Mean: μ_x̄ = μ_x
• Standard Deviation: σ_x̄ = σ_x/√n
• Variance (with replacement): σ²_x̄ = σ²_x/n
• Variance (without replacement): σ²_x̄ = (σ²_x/n) × (N − n)/(N − 1)

Estimation of Population Mean
• Point Estimate: μ = x̄
• Case 1 (σ known): e = Z(α/2) · σ/√n
• Case 2A (σ unknown, n ≥ 30): e = Z(α/2) · s/√n
• Case 2B (σ unknown, n < 30): e = t(α/2) · s/√n, degrees of freedom (df) = n − 1

Estimation of Population Proportion
• Point Estimate: p = p̂ = x/n
• Interval Estimation: e = Z(α/2) · √(p̂(1 − p̂)/n)
• Sample Size Determination: n = Z(α/2)² · p̂(1 − p̂)/e²

Hypothesis Testing: Population Mean
• Case 1: z = (x̄ − μ0)/(σ/√n)
• Case 2: t = (x̄ − μ0)/(s/√n), degrees of freedom (df) = n − 1

Hypothesis Testing: Difference of Two Means
• Pooled t: t = (x̄1 − x̄2) / √(sp²(1/n1 + 1/n2))
  pooled variance (sp²) = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2), df = n1 + n2 − 2
• Paired Samples: t = d̄/(sd/√n)

Pearson Product-Moment Correlation Coefficient
• r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² · Σ(yi − ȳ)²)
1.2 Counting Techniques
Multiplication Rule- If an experiment consists of several independent events, the total number of outcomes is
a × b × ⋯ × k
Note: Use this when there are multiple independent events, each with its own outcomes, and you want to know how many outcomes there are for all the events together.
Counting by Cases- If an experiment has k cases, then the total number of outcomes of the experiment is
m1 + m2 + ⋯ + mk
Permutations- nPr = n!/(n − r)!
Note: Use this when you are counting the number of ways to choose and arrange a given number of objects from a set of objects.
Circular Permutation- the number of ways to arrange n objects in a circle is
(n − 1)!
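These counting rules (together with combinations from the formula sheet) can be checked with Python's `math` module; a quick sketch, where the values 5 and 3 are only illustrative:

```python
import math

# Multiplication rule: a coin flip (2 outcomes) and a die roll (6 outcomes)
total = 2 * 6                     # outcomes for both events together

# Permutations: choose and arrange 3 of 5 distinct objects
perms = math.perm(5, 3)           # n!/(n-r)! = 60

# Circular permutation: arrange 5 objects around a circle
circular = math.factorial(5 - 1)  # (n-1)! = 24

# Combinations: choose 3 of 5 objects, order ignored
combs = math.comb(5, 3)           # n!/((n-r)! r!) = 10

print(total, perms, circular, combs)  # 12 60 24 10
```

`math.perm` and `math.comb` require Python 3.8 or later.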
1.3 Probability
• Union (∪)- the union of 2 events is the collection of all outcomes that are
elements of one or the other of the sets, or of both of them. It corresponds to
combining descriptions of the two events using the word “or”. e.g. A ∪ B
• Intersection (∩)- the intersection of 2 events is the collection of all outcomes
that are elements of both of the sets. It corresponds to combining descriptions
of the two events using the word “and”. e.g. A ∩ B
• Complement (A′ or Ac )- the complement of an event is the event containing
elements from the sample space that are not in the event. e.g. for S={1,2,3},
A={1,2}, Ac ={3}.
Mutually Exclusive Events- two events whose intersection is the empty set; it is impossible for both events to occur together.
Probability Rule for Complements
P(A^c) = 1 − P(A)
Bayes' Rule- if the events B1, ⋯, Bk constitute a partition of the sample space, where P(Bi) ≠ 0 for all i, and the probability of any event A in the sample space is not 0, then
P(Br|A) = P(Br)·P(A|Br) / Σ(i=1..k) P(Bi)·P(A|Bi)
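A small numeric sketch of Bayes' rule; the three-machine setup and its rates are hypothetical:

```python
# Three machines (the partition B1, B2, B3) produce 50%, 30%, 20% of output,
# with defect rates 1%, 2%, 3%. A = "a randomly chosen item is defective".
priors = [0.50, 0.30, 0.20]        # P(B_i)
likelihoods = [0.01, 0.02, 0.03]   # P(A | B_i)

# Total probability: P(A) = sum over i of P(B_i) * P(A|B_i)
p_a = sum(p * l for p, l in zip(priors, likelihoods))

# Posterior for machine 1: P(B_1 | A) = P(B_1) * P(A|B_1) / P(A)
posterior_1 = priors[0] * likelihoods[0] / p_a
print(round(p_a, 4), round(posterior_1, 4))
```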
1.4 Measures of Central Tendency, Variation and Position
Central Tendency
• Mean
  o Population (μ)
    Σxi/N
  o Sample (x̄)
    Σxi/n
• Median (Middlemost Value)
  Median Position = (n + 1)/2
Note: if the set has an even number of values, take the average of the middle two numbers (the median position rounded up and rounded down, e.g. Median Position = 7.5, take the 7th and 8th values).
Variation
• Standard Deviation
  o Population (σ)
    √(Σ(xi − μ)²/N)
  o Sample (s)
    √(Σ(xi − x̄)²/(n − 1))
• Variance
  o Population (σ²)
    Σ(xi − μ)²/N
  o Sample (s²)
    Σ(xi − x̄)²/(n − 1)
• Interquartile Range (IQR)
  IQR = Q3 − Q1
• Coefficient of Variation
  Coefficient of Variation = (standard deviation/mean) × 100%
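Python's `statistics` module implements these sample measures directly; a sketch with hypothetical data (note the n − 1 divisor for the sample statistics):

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5]   # hypothetical sample, n = 8

mean = statistics.mean(data)       # sum(x)/n
median = statistics.median(data)   # even n: average of the middle two values
s = statistics.stdev(data)         # sample standard deviation, divisor n - 1
var = statistics.variance(data)    # sample variance, divisor n - 1
cv = s / mean * 100                # coefficient of variation, in percent

print(mean, median, round(var, 3), round(cv, 1))
```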
Position
Week 3. Random Variables
3.1 Discrete and Continuous Random Variables
• Mean (μx/E[X])
  Σ x ∙ p(x)
• Standard Deviation (σx)
  σx = √Var[X]
Week 4. Probability Distributions
4.1 Special Discrete Probability Distributions
Uniform Distribution
• Conditions
  o Each outcome of the experiment has equal probability.
• Mean (𝜇𝑥 /𝐸[𝑋])
∑ 𝑥 ∙ 𝑝(𝑥)
Binomial Distribution
• Conditions
o Experiment consists of 𝑛 independent repeated trials.
o Each trial results in either a success or a failure.
o The probability of success (𝑝) is constant.
• Formula (x is the number of successes):
  P(x) = nCx · p^x · (1 − p)^(n−x)
• Mean (𝜇𝑥 /𝐸[𝑋])
𝐸[𝑋] = 𝑛𝑝
• Variance (𝜎𝑥2 /𝑉𝑎𝑟[𝑋])
𝑉𝑎𝑟[𝑋] = 𝑛𝑝(1 − 𝑝)
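The binomial formula can be sketched directly with `math.comb`; the 10 trials with p = 0.5 below are only illustrative:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical example: 10 independent trials, success probability 0.5
p3 = binomial_pmf(3, 10, 0.5)   # probability of exactly 3 successes
mean = 10 * 0.5                  # np
var = 10 * 0.5 * (1 - 0.5)       # np(1 - p)
print(round(p3, 4), mean, var)
```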
Hypergeometric Distribution
• Conditions
o A random sample of size n is selected from a population of size N.
o k of the N population items are considered successes and the rest (N − k) are
  considered failures.
• Formula (x is the number of successes):
  P(x) = (kCx × (N−k)C(n−x)) / NCn
• Mean (μx/E[X])
  E[X] = nk/N
• Variance (σx²/Var[X])
  Var[X] = (nk/N)(1 − k/N)((N − n)/(N − 1))
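A sketch of the hypergeometric formula using `math.comb`; the deck-of-cards numbers are hypothetical:

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """P(X = x): x successes in a sample of size n drawn without
    replacement from N items, of which k are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Hypothetical example: 5 cards drawn from 52, counting hearts (k = 13)
p2 = hypergeom_pmf(2, 52, 13, 5)   # probability of exactly 2 hearts
mean = 5 * 13 / 52                 # nk/N
print(round(p2, 4), mean)
```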
Poisson Distribution
• Conditions
o Experiments yield the number of outcomes occurring during a given
time interval or in a specified region.
• Formula:
  P(x) = e^(−μ) · μ^x / x!
• The mean and variance are both 𝜇.
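A minimal sketch of the Poisson formula; the rate μ = 4 is hypothetical:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e**(-mu) * mu**x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Hypothetical example: on average 4 calls arrive per hour;
# probability of exactly 2 calls in an hour (mean and variance are both 4)
p2 = poisson_pmf(2, 4)
print(round(p2, 4))
```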
Week 5. Normal Distribution
5.1 Normal Curve Distribution
Normal Curve
• Conversion of x to z:
  z = (x − μ)/σ
Week 6. Sampling Distributions
6.1 Sampling Distribution/Mean and Variance of the Sampling Distribution
• Mean
  μ_x̄ = μ_x
• Standard Deviation
  σ_x̄ = σ_x/√n
• Variance (with replacement)
  σ²_x̄ = σ²_x/n
• Variance (without replacement)
  σ²_x̄ = (σ²_x/n) × (N − n)/(N − 1)
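The claim that sample means center on the population mean can be checked by simulation; a sketch where the population 1..100 and the sample size are arbitrary choices:

```python
import random
import statistics

random.seed(1)                        # make the sketch reproducible
population = list(range(1, 101))      # hypothetical population
mu = statistics.fmean(population)     # population mean, 50.5
n = 25

# Draw many samples WITH replacement and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(10_000)
]

# The average of the sample means should be close to mu (mu_x-bar = mu_x).
print(round(mu, 2), round(statistics.fmean(sample_means), 2))
```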
Week 7. Estimation of Parameters
7.1 Concepts
Estimator- a rule or formula used for obtaining a value of the parameter of interest.
Estimate- a numerical value that results from substituting the sample values into the formula.
7.2 Point Estimate of a Population Mean
The best point estimate for the population mean is the sample mean:
μ = x̄
7.3 Confidence Interval Estimates of the Population Mean
Confidence Interval: x̄ ± e
Case 1: σ is known
  e = Z(α/2) · σ/√n
Case 2A: σ is unknown and n is large (n ≥ 30)
  e = Z(α/2) · s/√n
Case 2B: σ is unknown and n is small (n < 30)
  e = t(α/2) · s/√n
  degrees of freedom (df) = n − 1
Note: Z(α/2) and t(α/2) are the z and t scores with an area of α/2 to the right.
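Case 1 can be sketched with Python's `statistics.NormalDist`; the sample numbers below are hypothetical:

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical data: sigma is known (Case 1)
x_bar, sigma, n, conf = 85.0, 8.0, 64, 0.95

# Z_(alpha/2): the z score with an area of alpha/2 to its right
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%

e = z * sigma / sqrt(n)                        # margin of error
print(round(x_bar - e, 2), round(x_bar + e, 2))  # the interval x_bar ± e
```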
7.4 Point Estimate for the Population Proportion
From the sample values:
p = p̂ = x/n
7.5 Interval Estimates for Population Proportions
Confidence Interval: p̂ ± e
e = Z(α/2) · √(p̂(1 − p̂)/n)
Sample Size Determination:
n = Z(α/2)² · p̂(1 − p̂)/e²
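A worked sketch of the proportion interval and the sample-size formula; the 120-out-of-300 sample is hypothetical:

```python
from statistics import NormalDist
from math import sqrt, ceil

# Hypothetical sample: 120 successes out of 300 observations
x, n = 120, 300
p_hat = x / n                                  # point estimate, 0.4

z = NormalDist().inv_cdf(0.975)                # Z_(alpha/2) for 95% confidence
e = z * sqrt(p_hat * (1 - p_hat) / n)          # margin of error

# Sample size needed for a margin of error of 0.03 at the same confidence
n_needed = ceil(z**2 * p_hat * (1 - p_hat) / 0.03**2)
print(round(p_hat, 2), round(e, 4), n_needed)
```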
Week 8. Hypothesis Testing
8.1 Test of Hypothesis
• Two-tailed Test: H0: μ = 85, Ha: μ ≠ 85
• Upper-Tail One-Tailed Test: H0: μ = 85, Ha: μ > 85
• Lower-Tail One-Tailed Test: H0: μ = 85, Ha: μ < 85
Note: The equal sign always goes with the null hypothesis, but this does not prevent you from having a null hypothesis that uses ≤ or ≥.
Types of Testing
• Two-Tailed Test- if the problem asks you to show that the mean is different from the hypothesized value.
• Upper-Tailed Test- if the problem asks you to show that the mean is greater than the hypothesized value.
• Lower-Tailed Test- if the problem asks you to show that the mean is less than the hypothesized value.
Level of Significance (α)- the probability of committing a Type I error; the total area of the rejection region.
Test Statistic Value- the value computed from the sample data.
• Two-tailed test
  o The rejection region is on both ends of the curve.
  o The critical values are ±Z(α/2).
• Upper-tail test
  o The rejection region is on the right side (positive area).
  o The critical value is Z(α).
• Lower-tail test
  o The rejection region is on the left side (negative area).
  o The critical value is −Z(α).
Note: Two-tailed tests have the area halved (α/2) because the total rejection area must be α and the rejection region is on both sides.
8.4 Hypothesis Test of a Population Proportion
Same steps as population mean testing, but with a different test statistic value:
z = (p̂ − P0) / √(P0(1 − P0)/n)
8.5 Hypothesis Test of a Difference of Two Means (Population)
Since it is still hypothesis testing, the same steps apply; only the test statistic value changes.
Case 1: the population standard deviations (σ) are given, or are not given but n1, n2 ≥ 30
z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)  or  z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Case 2: the population standard deviations (σ) are not given but are known to be equal, and n1, n2 < 30
t = (x̄1 − x̄2) / √(sp²(1/n1 + 1/n2))
pooled variance (sp²) = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)
df = n1 + n2 − 2
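A worked sketch of Case 2, using hypothetical summary statistics:

```python
from math import sqrt

# Hypothetical summary statistics for two small samples
x1, s1, n1 = 23.0, 4.0, 12
x2, s2, n2 = 20.0, 5.0, 10

# Pooled variance: both sample variances combined, weighted by their df
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(round(sp2, 3), round(t, 3), df)
```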
8.6 Hypothesis Test of a Difference of Two Proportions
z = (p̂1 − p̂2) / √(P̄(1 − P̄)(1/n1 + 1/n2))
where P̄ = (x1 + x2) / (n1 + n2)
8.7 Hypothesis Test of a Difference of Two Means (Paired Samples)
Paired/Matched Data- data that are connected with each other and cannot be interchanged.
Matched samples are used to test for a difference between the true means of paired data.
• Two-tailed Test
  H0: μd = 0 (no significant difference in the population means)
  Ha: μd ≠ 0 (there is a significant difference in the population means)
• Lower-tailed Test
  H0: μd = 0 (no significant difference in the population means)
  Ha: μd < 0 (population mean 1 is less than population mean 2; the difference is negative)
• Upper-tailed Test
  H0: μd = 0 (no significant difference in the population means)
  Ha: μd > 0 (population mean 1 is greater than population mean 2; the difference is positive)
Formula (df = n − 1):
t = d̄ / (sd/√n)
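A sketch of the paired-samples statistic; the before/after scores are hypothetical:

```python
import statistics
from math import sqrt

# Hypothetical paired data: scores before and after a review session
before = [72, 80, 65, 78, 70]
after = [75, 84, 70, 77, 76]

d = [a - b for a, b in zip(after, before)]   # paired differences
d_bar = statistics.fmean(d)                  # mean difference
s_d = statistics.stdev(d)                    # sample sd of differences, n - 1
n = len(d)

t = d_bar / (s_d / sqrt(n))
df = n - 1
print(round(d_bar, 2), round(t, 3), df)
```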
Week 9. Exploring Relationships
9.1 Bivariate Data/Correlation Analysis
Bivariate Data- data for two variables, usually two types of related data.
Scatter Plot/Scatter Diagram- a plot of the pairs of values of two variables in a rectangular coordinate plane, displaying the relationship between the two variables. It is used as a tool to analyze graphically the association between the two variables, including its direction and strength.
9.2 Pearson Product-Moment Correlation
Formula:
r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² · Σ(yi − ȳ)²)
9.3 Regression Analysis
Simple Linear Regression Model- the equation that describes how a variable y is related to a variable x and an error term ε.
y = β0 + β1x + ε
Where:
𝑦- dependent variable
𝑥 - independent variable
β0 and β1- parameters of the model (measures computed using population data)
Estimated Regression Equation- since population data is usually unknown, estimates
of 𝛽0 and 𝛽1 (𝑏0 and 𝑏1 ) using sample data may be used.
𝑦̂ = 𝑏0 + 𝑏1 𝑥
Note: 𝑦̂ is only the estimated value of 𝑦
In the estimated simple linear regression equation, b0 is the y-intercept and b1 is the slope; b0 and b1 are called the coefficients of the estimated regression equation.
b1 gives the change in y for every one-unit increase in x. If it is positive, then y is directly related to x; if it is negative, then y is inversely related to x.
Note: The coefficients can be found in Excel in the same way the Pearson Product-Moment Correlation Coefficient is found.
9.4 Test of Rho
The hypothesis test of Rho (ρ) is done in the same way as other hypothesis tests, but with a different test statistic value:
t = r√(n − 2) / √(1 − r²)
degrees of freedom (df) = n − 2
References
Shafer & Zhang. (2019). Introductory Statistics. https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(Shafer_and_Zhang)
Pearson Correlation Coefficient Calculator. (n.d.). https://www.socscistatistics.com/tests/pearson/
Various personal notes and handouts given to me by our teacher last year (STEM).