
Lesson structure

Recap of lecture 4: Understanding Statistical Uncertainty


Part 1: Sampling distribution and confidence intervals [Questions & Discussion (Tutor-led)]
• Quiz Part A
Part 2: Sampling variability of proportions [Questions & Discussion (Tutor-led)]
• Quiz Part B
Summary

Please make sure to sit in your assignment groups


Basic Concepts of Statistics: Population and Sample
Sample:
A subset of the population selected for analysis.
• Often chosen randomly
• Preferably representative of the population
Statistic (Estimate): Computable summaries of the sample
• Sample mean ($\bar{x}$)
• Sample standard deviation (𝑠)

Population:
All members of a group about which you want to draw a conclusion.
Parameter: A measurable characteristic of a population
• population mean (𝜇)
• population standard deviation (𝜎)

Sampling Distribution of the Sample Mean

Central Limit Theorem: If the sample size $n$ is large,

$$\bar{x} \sim N\left(\text{Mean},\ \frac{\text{Std deviation}}{\sqrt{\text{sample size}}}\right), \qquad \text{Standard error: } SE(\bar{x}) = \frac{s}{\sqrt{n}}$$

• The sample mean $\bar{x}$ is centred around the true mean
• Its uncertainty is measured by the standard error $SE(\bar{x}) = \frac{s}{\sqrt{n}}$
• Large sample size → $\bar{x}$ is a more precise estimate of the true population mean
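
As a rough illustration of the standard error of the sample mean, the sketch below computes s / sqrt(n) for a few sample sizes. The value of `s` and the sample sizes are hypothetical and are not taken from the tutorial data; the function name `standard_error` is only illustrative.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return s / math.sqrt(n)


# Hypothetical sample standard deviation (assumption, not tutorial data).
s = 2.0
for n in (25, 100, 400):
    # Each fourfold increase in n halves the standard error.
    print(n, standard_error(s, n))
```

The standard error shrinks with the square root of the sample size, so quadrupling the sample only halves the uncertainty.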
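Confidence Interval for the Population Mean
Plausible range of the unknown population mean given some level of probability

Lower bound: $\bar{x} - z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
Upper bound: $\bar{x} + z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
Critical value: $z_{1-\frac{\alpha}{2}} = \text{NORM.S.INV}\left(1 - \frac{\alpha}{2}\right)$

[Figure: the central area $1-\alpha$ lies between the lower bound and the upper bound; the distance between the two bounds is the width of the confidence interval.]
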
The width of a confidence interval indicates the precision of the estimate.
• (1−𝛼) is referred to as the level of confidence
• 𝛼 is the total probability left in the two “tail ends” outside the confidence interval. E.g. for a 90%
confidence interval, 𝛼 = 1 − 0.9 = 0.1
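
The interval calculation can also be sketched outside Excel. The Python example below is a minimal sketch, assuming the z-based interval described on this slide; `NormalDist().inv_cdf` plays the same role as NORM.S.INV. The function name `mean_confidence_interval` and the summary values (`x_bar`, `s`, `n`) are hypothetical and are not taken from the tutorial worksheet.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval of the mean."""
    alpha = 1.0 - confidence
    # Standard normal quantile, the counterpart of NORM.S.INV(1 - alpha/2).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics (assumption, not the worksheet values).
x_bar, s, n = 10.0, 2.0, 100

lower90, upper90 = mean_confidence_interval(x_bar, s, n, confidence=0.90)
lower95, upper95 = mean_confidence_interval(x_bar, s, n, confidence=0.95)
print(f"90% CI: ({lower90:.3f}, {upper90:.3f})")
print(f"95% CI: ({lower95:.3f}, {upper95:.3f})")
```

Both intervals share the same centre; the higher confidence level uses a larger critical value, so its interval is wider.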
Part A: Tutor-led (Sampling distribution and confidence intervals)

Refer to the data set for the Week 5 tutorial and go to the "Cassava" worksheet. You will
see the price received by farmers in Timor-Leste for Cassava, measured per
kilogram. You have already analysed the summary statistics in the Week 4 Tutorial, and we
have provided this information in the same worksheet.
Take note of the sample mean, sample standard deviation, sample size and the reported
standard error from the summary statistics obtained using the Data Analysis Toolpak.

Instructions:
Construct the 90% confidence interval of the true mean price of Cassava. (Note: The
95% interval has been calculated for you and is available in the datasheet. We
encourage you to re-calculate this in your own time as practice).
Part A : Questions

T1. Report and interpret the 90% confidence interval of the population mean in 3
decimal places.

T2. Compare the widths of the two confidence intervals calculated. Indicate which
two statements are INCORRECT.
• The two sets of confidence intervals have the same point estimate.
• The true mean price is guaranteed to be within the range of the widest
interval.
• With a different price sample, the confidence intervals could be different.
• The 95% confidence interval estimate is more precise than the 90% interval
estimate.

T3. Assume that 100 new observations were obtained for the prices of
Cassava, and these were added to the existing data set. Calculate the standard
error of the mean under this larger data set, assuming that the mean and
standard deviation of the data set remain unchanged. State your answer to 4
decimal places.

T4. Calculate the new 90% confidence interval for the larger data set in
T3 above. What was the effect of the increase in the sample size on the
confidence interval?
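
To see the general effect of extra observations, the sketch below compares the width of a 90% interval before and after adding 100 observations while holding the standard deviation fixed. The values of `s` and `n_before` are hypothetical (they are not the Cassava figures), and `ci_width` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def ci_width(s: float, n: int, confidence: float) -> float:
    """Full width of the z-based interval: 2 * z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / math.sqrt(n)


# Hypothetical values for illustration only (not the Cassava data).
s = 0.5
n_before = 50
n_after = n_before + 100  # 100 extra observations, same s

width_before = ci_width(s, n_before, 0.90)
width_after = ci_width(s, n_after, 0.90)
print(f"width before: {width_before:.4f}")
print(f"width after:  {width_after:.4f}")
# The ratio depends only on the sample sizes: sqrt(n_before / n_after).
print(f"ratio:        {width_after / width_before:.4f}")
```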
Quiz Part A

• Click on Week 5 Tutorial quiz on Moodle


• Follow the instructions and complete only the questions in Part A
• Time allowed: 20 minutes
Summary of what we did so far

• How to construct and interpret confidence intervals:


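Lower & upper bounds of a $1-\alpha$ confidence interval for the population mean:

$$\bar{x} - z_{1-\frac{\alpha}{2}}\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + z_{1-\frac{\alpha}{2}}\frac{s}{\sqrt{n}}, \qquad \text{i.e.} \quad \bar{x} \pm z_{1-\frac{\alpha}{2}}\frac{s}{\sqrt{n}}$$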

• Sample statistics are uncertain due to random sampling


• 95% confidence interval: This interval is expected to cover the true
population mean 95% of the time in repeated sampling.
• Effects of changing sample size – more precision as you collect more data.
• Standard deviation and confidence level can also affect the width. Refer to
Seminar 4 slides.
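
The repeated-sampling interpretation of a 95% interval can be checked with a small simulation. The sketch below is only illustrative: it draws many samples from a normal population with a known mean, builds a z-based interval from each sample, and counts how often the interval covers the true mean. The population values (`mu`, `sigma`), the sample size and the number of trials are all assumptions chosen for the demonstration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def coverage_rate(mu, sigma, n, confidence, trials, seed=0):
    """Share of simulated z-based intervals that contain the true mean mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        margin = z * stdev(sample) / math.sqrt(n)
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials


# Hypothetical population and simulation settings (assumptions).
rate = coverage_rate(mu=10.0, sigma=2.0, n=50, confidence=0.95, trials=2000, seed=1)
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The observed share should land close to 0.95, although any single interval either contains the true mean or does not.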
Part B: Sampling variability of proportions
Let’s analyse the proportion of houses in a Melbourne suburb that are sold by auction. Go back
to the Real Estate problem and refer to the worksheet labelled RealEstate_sold by auction.
It provides information on properties sold by auction and not by auction in columns A and
B, respectively. We have also provided the summary statistics.

T1. We know that of the 329 properties sold in Melbourne, 178 are sold by auction. What
is the point estimate for the proportion of properties sold by auction in Melbourne? State
your answer in 3 decimal places.

T2. What is the standard error of the proportion of properties sold by auction? State your
answer in 3 decimal places.

T3. Complete Table 1 and construct the 95% confidence interval of the true proportion of
houses sold by auction. State your answers as a percentage using 3 decimal places
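
The steps in T1 to T3 can be sketched in Python as follows. This is a minimal sketch, assuming the usual normal-approximation standard error for a proportion, sqrt(p(1 − p)/n), which the slides refer to but do not spell out. The counts 178 and 329 come from T1; the function name `proportion_confidence_interval` is illustrative.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Point estimate, standard error and z-based interval for a proportion."""
    p_hat = successes / n
    # Assumed formula: normal-approximation standard error of a proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Counts stated in T1: 178 of 329 properties sold by auction.
p_hat, se, (lower, upper) = proportion_confidence_interval(178, 329)
print(f"point estimate: {p_hat:.3f}")
print(f"standard error: {se:.3f}")
print(f"95% CI: {lower:.3%} to {upper:.3%}")
```

Only the standard error differs from the mean case; the critical value and the "estimate ± margin" structure are unchanged.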

T4. Is the following claim validated by the analysis: “More than half of properties in this
Melbourne suburb are sold by Auction”?
• Yes
• No

T5. We wish to use statistical evidence to validate the claim that “Properties sold by
auction achieve higher prices on average”. Complete Table 2 and construct the confidence
interval estimates for the true mean price of houses sold by auction and not by auction.
Using the two intervals, can we validate the claim?
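
One informal way to compare the two segments is to check whether their intervals overlap. The sketch below assumes z-based intervals for each group; the summary values for the two groups are hypothetical (they are not the Table 2 figures), and `mean_ci` and `intervals_overlap` are illustrative helper names.

```python
import math
from statistics import NormalDist


def mean_ci(x_bar, s, n, confidence=0.95):
    """z-based confidence interval (lower, upper) for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def intervals_overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical summaries (price in $'000), NOT taken from the worksheet.
auction = mean_ci(x_bar=950.0, s=300.0, n=180)
not_auction = mean_ci(x_bar=820.0, s=280.0, n=150)

print("auction:    ", auction)
print("not auction:", not_auction)
print("overlap:    ", intervals_overlap(auction, not_auction))
```

If the intervals do not overlap, the data support a difference in mean prices; if they do overlap, the comparison is inconclusive rather than evidence that the means are equal.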
Quiz Part B

• Click on Week 5 Tutorial quiz on Moodle


• Follow the instructions and complete only the questions in Part B
• Time allowed: 20 minutes
Confidence intervals of Proportions: Summary

• Confidence intervals of proportions – adjust the calculation of standard errors.
• Other steps remain the same as for the confidence interval of the mean.
• Comparison of segmented data – comparing confidence intervals can be
useful.
