Midterm Test Revision: Charanjit Kaur
Learning Outcomes
25/08/2023
Population:
All members of a group about which you want to draw a conclusion.
E.g., all voters in an election, all Telstra shareholders, all invoices submitted to Medicare for reimbursement, etc.
Data Types
• Categorical (Qualitative): numerical operations are not meaningful.
• Numerical (Quantitative): numerical operations are meaningful.
Central Tendency: What is the typical or central value?
Variation: How much variation is there in the distribution?
Shape: Are there any unusual values that contribute to the distribution?
Median: The middle value if values are sorted from smallest to largest (50th percentile).
50% of values are equal to or lower than the median, and 50% are equal to or higher.
In Excel: =MEDIAN(…)
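A minimal Python sketch of the median rule above: sort the values, take the middle one, and average the two middle values when n is even (the same convention Excel's MEDIAN follows). The helper name `median` and the data values are hypothetical; the result is checked against the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [7, 1, 3, 9, 4, 6]       # hypothetical data; sorted: 1, 3, 4, 6, 7, 9
print(median(data))             # 5.0, the average of 4 and 6
print(statistics.median(data))  # 5.0
```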
All are measures of central tendency, but which one should we use?
Variance: the average squared deviation (distance) from the mean, reported in squared units. For a sample, the sum of squared deviations is divided by n − 1.
In Excel: =VAR.S(…)
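A short sketch of the sample variance that VAR.S reports: sum the squared deviations from the mean, then divide by n − 1. The function name `sample_variance` and the data are hypothetical; `statistics.variance` is the standard-library equivalent, and taking the square root gives the sample standard deviation (Excel's STDEV.S) back in the original units.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1 (the VAR.S convention)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data with mean 5
print(sample_variance(data))         # 32 / 7, about 4.571 (squared units)
print(statistics.variance(data))     # same value from the standard library
print(sample_variance(data) ** 0.5)  # sample standard deviation, in the original units
```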
Excel Functions:
For the cumulative probability P(X ≤ x): =NORM.DIST(xvalue, mean, stdev, TRUE)
For a percentile (the x value with a given cumulative probability): =NORM.INV(prob, mean, stdev)
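The same two lookups can be sketched in Python with the standard library's `statistics.NormalDist`, whose `cdf` plays the role of NORM.DIST(…, TRUE) and whose `inv_cdf` plays the role of NORM.INV. The distribution N(100, 15) and the values 130 and 0.95 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(mean = 100, stdev = 15)
X = NormalDist(mu=100, sigma=15)

# Like =NORM.DIST(130, 100, 15, TRUE): P(X <= 130)
p = X.cdf(130)

# Like =NORM.INV(0.95, 100, 15): the 95th percentile of X
x95 = X.inv_cdf(0.95)

print(round(p, 4))    # 0.9772
print(round(x95, 2))  # 124.67
```

For an upper-tail probability P(X > x), subtract from one: `1 - X.cdf(x)`.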
Potential biases:
• Selection bias – not every individual in the population has an equal chance of being chosen
• Non-response bias – the data collection process leads to systematic non-response from certain groups
• $\bar{x}$ is an estimate of $E(X) = \mu$. (A sample statistic is only an estimate of the truth: it is not exact and has variation/error around it.)
• Assume we take data samples repeatedly and compute the sample mean for each sample. The distribution of these sample means is the sampling distribution of the sample mean, which portrays its variability.
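• Central Limit Theorem: If the sample size $n$ is large, $\bar{x} \sim N\left(\mu, \frac{s}{\sqrt{n}}\right)$ approximately, where the second parameter is the standard deviation of $\bar{x}$ (its standard error).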
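The repeated-sampling idea can be simulated. This is a sketch under stated assumptions: a hypothetical skewed (exponential) population with mean 2 and standard deviation 2, samples of size n = 40, and 5,000 repetitions. The mean of the simulated sample means should sit near μ and their spread near σ/√n, even though the population itself is not normal. Here the population σ is known; the formula above plugs in s as its estimate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

MU = 2.0      # mean of the hypothetical exponential population (its sigma is also 2.0)
N = 40        # sample size n
REPS = 5000   # number of repeated samples

sample_means = []
for _ in range(REPS):
    sample = [random.expovariate(1 / MU) for _ in range(N)]  # expovariate takes the rate 1/mean
    sample_means.append(statistics.fmean(sample))

grand_mean = statistics.fmean(sample_means)  # centre of the sampling distribution, close to MU
spread = statistics.pstdev(sample_means)     # its spread, close to sigma / sqrt(n)
print(round(grand_mean, 3), round(spread, 3), round(MU / N ** 0.5, 3))
```

A histogram of `sample_means` would look roughly bell-shaped, which is the CLT at work.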
Standard error: $SE(\bar{x}) = \frac{\text{std deviation}}{\sqrt{n}}$
If the sample size ($n$) ↑, the standard error ↓, the interval width ↓, and the estimate is more precise.
The bigger the sample, the more information we have, which increases the precision of the interval estimate built around the sample mean: the interval is narrower.
If the level of confidence (1 − α) ↑, the critical value ↑, the width ↑, and the estimate is less precise.
The more confident we want to be, the more values we need to include in our confidence interval, so the wider the interval.
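Both effects can be checked numerically. A minimal sketch, assuming an interval of the form x̄ ± z* × s/√n with a normal critical value z* (small samples would normally use a t critical value instead). The helper names `z_confidence_interval` and `width` and the summary numbers x̄ = 50, s = 10 are hypothetical.

```python
from statistics import NormalDist

def z_confidence_interval(xbar, s, n, confidence=0.95):
    """Interval xbar +/- z* x s/sqrt(n), using a standard normal critical value."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    se = s / n ** 0.5                             # standard error
    return xbar - z_crit * se, xbar + z_crit * se

def width(interval):
    lower, upper = interval
    return upper - lower

# Larger n: smaller standard error, narrower interval
for n in (25, 100, 400):
    print(n, round(width(z_confidence_interval(50, 10, n)), 3))

# Higher confidence: larger critical value, wider interval
for conf in (0.90, 0.95, 0.99):
    print(conf, round(width(z_confidence_interval(50, 10, 100, conf)), 3))
```

Because width is proportional to 1/√n, quadrupling the sample size roughly halves the interval.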
STATISTICS
• DESCRIPTIVE (Sample)
• INFERENTIAL (Sampling Distribution)
  - ESTIMATION: Point & Interval
  - HYPOTHESIS TESTS
1. Formulate $H_0$ & $H_1$
2. Decide on the significance level $\alpha$
3. Calculate the p-value
4. Apply the decision rule: reject $H_0$ if p-value < $\alpha$, OR retain it if p-value > $\alpha$
• The alternative hypothesis ($H_1$) is what we are searching for evidence of. It contains a “≠”, “>” or “<” sign.
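Right-tailed test: $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$
Left-tailed test: $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$
Two-tailed test: $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$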
$\text{Test Statistic} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$
Step 3: judge whether or not the test statistic is outstanding (“far from zero”) in the direction of the alternative.
Decision:
• P-value for a right-tailed test: =1-NORM.S.DIST(test statistic, TRUE)
• P-value for a left-tailed test: =NORM.S.DIST(test statistic, TRUE)
• P-value for a two-tailed test: =2*NORM.S.DIST(??, TRUE)
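A sketch of the three p-value rules and the decision rule from step 4, using `statistics.NormalDist().cdf` as the counterpart of NORM.S.DIST(z, TRUE). The helper names `p_value` and `decide`, the test statistic 1.5 and α = 0.05 are hypothetical; the two-tailed case is written here in the common form 2 × (1 − Φ(|z|)).

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, like NORM.S.DIST(z, TRUE)

def p_value(z, tail):
    """P-value of a z test statistic for a right-, left- or two-tailed test."""
    if tail == "right":
        return 1 - PHI(z)
    if tail == "left":
        return PHI(z)
    if tail == "two":
        return 2 * (1 - PHI(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

def decide(p, alpha=0.05):
    """Decision rule: reject H0 if the p-value is below alpha, otherwise retain it."""
    return "reject H0" if p < alpha else "do not reject H0"

z = 1.5  # hypothetical test statistic
for tail in ("right", "left", "two"):
    p = p_value(z, tail)
    print(tail, round(p, 4), decide(p))
```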
                        $H_0$ is TRUE          $H_0$ is FALSE
Do not reject $H_0$     CORRECT DECISION!      TYPE II ERROR (β)
Reject $H_0$            TYPE I ERROR (α)       CORRECT DECISION!
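The link between α and the Type I error can be illustrated by simulation. A sketch under stated assumptions: a hypothetical normal population in which H0: μ = 100 is actually true, samples of n = 50, a two-tailed z test at α = 0.05, and 4,000 repetitions. The share of (wrong) rejections should land near α; it may come out slightly above 0.05 because s stands in for σ, which a t critical value would correct.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed so the run is reproducible

MU0, SIGMA = 100.0, 15.0        # hypothetical population in which H0: mu = 100 is TRUE
N, REPS, ALPHA = 50, 4000, 0.05
phi = NormalDist().cdf          # standard normal CDF

rejections = 0
for _ in range(REPS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (N - 1))  # sample std deviation
    z = (xbar - MU0) / (s / math.sqrt(N))                          # test statistic
    p = 2 * (1 - phi(abs(z)))                                      # two-tailed p-value
    if p < ALPHA:
        rejections += 1  # H0 is true here, so every rejection is a Type I error

type_i_rate = rejections / REPS
print(type_i_rate)  # expected to land near ALPHA
```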