013-3547250
ainina@tmsk.uitm.edu.my
Contents
Sampling Distribution
• Introduction
• Sampling Distribution of the Sample Mean, x̄
• Central Limit Theorem (CLT)
• Probability Distribution of the Sample Mean, x̄
Estimation
• Introduction
• Point Estimation
• Interval Estimation (Confidence Interval)
• One Population Mean
• Two Population Means
• One Population Variance
• Ratio of Two Population Variances
Sampling Distribution: Introduction
• The following are some required definitions:
Population: all elements under study, whether living or non-living objects.
Sample: a subset or part of the population.
Population parameter: a summary measure or characteristic obtained from the population.
Sample statistic: a summary measure or characteristic obtained from a sample.
Population distribution: the probability distribution of the population data.
Sampling distribution: the probability distribution of a sample statistic.
• In statistical inference, we are interested in making inferences, generalizations or conclusions about a population based on sample information.
Sampling Distribution: Introduction
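• Examples of numerical descriptive measures are the mean, standard deviation and variance. The formulas are given in the following table:
Measure: Population (parameter); Sample (statistic)
Mean: $\mu = \frac{\sum X}{N}$; $\bar{x} = \frac{\sum x}{n}$
Standard deviation: $\sigma = \sqrt{\frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]}$; $s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]}$
Variance: $\sigma^2 = \frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]$; $s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$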
• Example 2.1 : A new battery of Model XY has a life span with a mean of 40 months and standard deviation of 4 months. A
sample of 50 batteries selected showed an average life span of 39 months and standard deviation of 4.2 months.
• Therefore, μ = 40, σ = 4, x̄ = 39, s = 4.2
Sampling Distribution: Sampling Distribution of the Sample Mean, x̄
• A sampling distribution of sample means is a distribution obtained by using the means computed from random samples of a
specific size taken from a population.
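• Example 2.2: A lecturer gives a 10-point quiz to a population of 4 students. The results of the quiz were 2, 4, 6 and 8. The population standard deviation is
$\sigma = \sqrt{\frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]} = \sqrt{\frac{1}{4}\left[120 - \frac{20^2}{4}\right]} = 2.2361$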
• Now, take a sample of size 2 with replacement and find the mean of each sample.
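The sampling step in Example 2.2 can be checked by brute force. The sketch below is only an illustration (the variable names are not from the original slides): it lists all 16 equally likely samples of size 2 drawn with replacement from the quiz scores and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # quiz scores from Example 2.2
n = 2                       # sample size

# All 4**2 = 16 equally likely ordered samples drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)                 # population mean
sigma = pstdev(population)            # population standard deviation (divisor N)
mean_of_means = mean(sample_means)    # mean of the sampling distribution
sd_of_means = pstdev(sample_means)    # standard deviation of the sampling distribution

print("mu =", mu, " mean of sample means =", mean_of_means)
print("sigma / sqrt(n) =", sigma / sqrt(n), " sd of sample means =", sd_of_means)
```

The two printed pairs should agree, which is the pattern that the later formulas for the mean and standard error of the sample mean describe.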
• If the original population is not normally distributed, the distribution of the sample mean will be approximately normal for a sample size of 30 or more.
[Figure: normal and non-normal population distributions together with their respective sampling distributions of x̄ for different sample sizes, n.]
Rule of thumb: n ≥ 30 (considered large)
Sampling Distribution: Probability Distribution of the Sample Mean, x̄
The Central Limit Theorem (CLT) on the Distribution of the Sample Mean, x̄
The mean of the sample means will be the same as the population mean, $\mu_{\bar{x}} = \mu$.
The standard deviation (standard error) of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
• Note: If the population is not normally distributed or there is no information regarding the population, then the distribution of the sample means tends to be approximately normal when the sample size is sufficiently large, that is, when n ≥ 30.
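Sampling Distribution: Probability Distribution of the Sample Mean, x̄
Sampling distribution of the Sample Mean, X̄ (CLT)
Notation
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Formula
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
where X̄ = sample mean, μ = population mean, σ = population standard deviation, n = sample size
Mean, $\mu_{\bar{X}} = \mu$
Variance, $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$
Standard deviation (standard error of the mean), $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$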
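As a rough illustration of the Z formula, the sketch below reuses the battery figures from Example 2.1 (μ = 40, σ = 4, n = 50, x̄ = 39). The question it answers, how likely a sample mean of 39 months or less would be, is a hypothetical one added here and is not part of the original example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 40, 4, 50   # population mean, population sd, sample size (Example 2.1)
x_bar = 39                 # observed sample mean (Example 2.1)

standard_error = sigma / sqrt(n)      # sigma_xbar = sigma / sqrt(n)
z = (x_bar - mu) / standard_error     # Z = (xbar - mu) / (sigma / sqrt(n))
prob = NormalDist().cdf(z)            # P(Xbar <= 39), using the normal approximation

print(f"standard error = {standard_error:.4f}, z = {z:.4f}, P(Xbar <= 39) = {prob:.4f}")
```

Because n = 50 is at least 30, the normal approximation is used here without assuming that battery lifetimes themselves are normally distributed.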
Sampling Distribution: Probability Distribution of the Sample Mean, x̄
Notation
Population: $X \sim N(\mu, \sigma^2)$
Sample mean: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
ii. The estimator should be relatively efficient. That is, of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has the smallest variance. $\hat{\theta}_1$ is a more efficient estimator of θ than $\hat{\theta}_2$ if $\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$.
iii. The estimator should be consistent. For a consistent estimator, as the sample size increases, the value of the estimator approaches the value of the parameter estimated. For an unbiased estimator this holds when $\lim_{n \to \infty} \operatorname{Var}(\hat{\theta}) = 0$.
Estimation: Introduction
• Estimation refers to the process by which one makes inferences about a population, based on information obtained
from a sample.
• Point estimation is a single value calculated from the sample data, used to estimate the population parameter.
• Interval estimation is an interval or a range of values used to estimate the parameter. This estimate may or may not contain the value of the parameter being estimated. It is also known as a confidence interval.
For example, suppose we want to estimate the mean income of workers in Company A. For n = 25 workers,
• Point estimate: the mean income, x̄ = RM2500/month.
• Interval estimate: the mean income is between RM2300 and RM2700/month.
Estimation: Point Estimation
• Point estimator: A single number calculated from the sample to estimate the population parameter.
• To generalize the estimation to the population, the sample must be a random sample. A random sample is a sample in which each element in the population has an equal chance of being included.
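• The following table indicates the best point estimator for each parameter:
Parameter: Best point estimator
μ: $\bar{x} = \frac{\sum x}{n}$
σ: $s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]}$
σ²: $s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$
p: $\hat{p} = \frac{x}{n}$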
Estimation: Point Estimation
Example 2.6: A sample of 8 career women is selected and their total exercise time in a week is recorded. The resulting observations are 10.2, 9.3, 11.9, 9.2, 8.3, 11.2, 10.4 and 9.5. What are the point estimates of the mean and standard deviation of exercise time?
(Ans: 10, 1.1662)
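The answer to Example 2.6 can be reproduced with Python's standard library. This is a minimal sketch; `statistics.stdev` uses the n − 1 divisor, which matches the formula for s given above.

```python
from statistics import mean, stdev

# Weekly exercise times of the 8 sampled career women (Example 2.6)
times = [10.2, 9.3, 11.9, 9.2, 8.3, 11.2, 10.4, 9.5]

x_bar = mean(times)   # point estimate of the population mean
s = stdev(times)      # point estimate of the population sd (divisor n - 1)

print(f"x_bar = {x_bar:.4f}, s = {s:.4f}")
```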
Estimation: Interval Estimation
• Interval estimation: two numbers calculated from the sample to form an interval within which the parameter is expected to
lie with a specified level of confidence.
• The interval is constructed around the point estimate. This interval estimate is also known as confidence interval.
• We can write the confidence interval for a parameter θ as:
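Point estimate ± margin of error
One population mean (e.g. construct a 95% confidence interval for the population mean)
σ² known: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
σ² unknown: $\bar{x} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}}$
Two population means (e.g. construct a 95% confidence interval for the difference of the population means)
Independent samples, σ² known: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Dependent samples: $\bar{d} \pm t_{\alpha/2,\,n-1} \frac{s_d}{\sqrt{n}}$, where $\bar{d} = \frac{\sum d}{n}$ and $s_d = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$
Population variance
One population, variance: $\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$
One population, standard deviation: $\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}$
Ratio of two populations, variances: $\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,v_1,\,v_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,v_2,\,v_1}$
Ratio of two populations, standard deviations: $\sqrt{\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,v_1,\,v_2}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,v_2,\,v_1}}$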
Estimation: Interval Estimation: One Population Mean, μ (Variance σ² Known)
One Population Mean and known σ²
Assumption
• Random sample
• The population is normally distributed
• Small (n < 30) or large (n ≥ 30) sample
• σ² (or σ) is known
Formula
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Note: The width of the confidence interval depends on the size of the margin of error which depends on the values of z, σ and n.
However, the value of σ is beyond our control. Therefore, the width of the confidence interval can be controlled either through the
value of z (depends on α) or the size of the sample, n.
Confidence level and the width of the confidence interval
- The higher the confidence level, the wider the confidence interval, and vice versa.
Sample size and the width of the confidence interval
- The bigger the sample, the narrower the confidence interval, and vice versa.
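The two statements about interval width can be seen numerically. The sketch below assumes a hypothetical population standard deviation of 10 (not taken from the slides) and uses a helper `ci_width`, a name introduced here for illustration, that returns 2 · z · σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(confidence, sigma, n):
    """Width of the z-interval for the mean: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 critical value
    return 2 * z * sigma / sqrt(n)

sigma = 10  # hypothetical population standard deviation, for illustration only

# Higher confidence level -> wider interval (n held at 25)
for confidence in (0.90, 0.95, 0.99):
    print(f"confidence = {confidence:.2f}, width = {ci_width(confidence, sigma, 25):.3f}")

# Larger sample -> narrower interval (confidence held at 95%)
for n in (25, 100, 400):
    print(f"n = {n}, width = {ci_width(0.95, sigma, n):.3f}")
```

Since the width is proportional to 1/√n, quadrupling the sample size halves the width.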
Estimation: Interval Estimation: One Population Mean, μ (Variance σ² Known)
Example 2.9: The following data represent a sample of assets (in millions of RM) of 10 companies in Selangor. Find the 95% confidence interval of the mean. Assume that the assets (in millions of RM) of all companies in Selangor are approximately normally distributed and the standard deviation of the population is 21.154.
12.23 2.89 13.19 73.25 11.59 8.74 7.92 40.22 5.01 2.27
Below is the output for the analysis of data in Example 2.9 using Minitab software.
One-Sample Z: Assets_Value
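The interval for Example 2.9 can also be worked out directly from x̄ ± z(α/2) · σ/√n. The following is a sketch using only the data and σ stated in the example; it is not a reproduction of the Minitab output.

```python
from math import sqrt
from statistics import NormalDist, mean

# Assets (in millions of RM) of the 10 sampled companies (Example 2.9)
assets = [12.23, 2.89, 13.19, 73.25, 11.59, 8.74, 7.92, 40.22, 5.01, 2.27]
sigma = 21.154       # known population standard deviation
confidence = 0.95

n = len(assets)
x_bar = mean(assets)                                  # point estimate of mu
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # z_{alpha/2}
margin = z * sigma / sqrt(n)                          # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.3f}, margin = {margin:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```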
• Two samples are independent if they are drawn from two different populations and the elements of the first sample have no relationship to the elements of the second sample.
Example: To determine the difference in mean pH of rainfall in Shah Alam and Klang.
• Two samples are dependent if they are drawn from two different populations and the elements of the first sample are related to the elements of the second sample.
Example: To determine the effectiveness of Kevin Zahari’s diet program, participants’ weights before and after the program are measured.
Let μ1 and μ2 be the means of the first and second populations respectively. We want to find the confidence interval of the difference between the two population means, μ1 − μ2. Then x̄1 − x̄2 is the sample statistic used to construct the confidence interval.
Estimation: Interval Estimation: Two Population Means (Independent: Variance σ² Known)
Two Population Means and known σ²
Assumption
• Random sample
• The population is normally distributed
• Small (n < 30) or large (n ≥ 30) sample
• σ1² and σ2² are known
Formula
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Estimation: Interval Estimation: Two Population Means (Independent: Variance σ² Known)
Example 2.14: An experiment was conducted in which two types of engines, A and B were compared. Gas mileage in miles
per gallon was measured. 75 experiments were conducted using engine type A and 50 experiments were done for engine type
B. The gasoline used and other conditions were held constant. The average gas mileage for engine A was 42 miles per gallon
and the average for engine B was 36 miles per gallon.
Find a 96% confidence interval for μA − μB, where μA and μB are the population mean gas mileages for engine A and engine B respectively. Assume that the population standard deviations are 8 and 6 for engines A and B respectively.
(Ans: (3.4240, 8.5760))
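The stated answer to Example 2.14 can be checked with a short calculation. This sketch applies the two-sample z-interval with known variances to the figures in the example; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n_a, n_b = 75, 50            # number of experiments for engines A and B
mean_a, mean_b = 42, 36      # sample mean gas mileage (miles per gallon)
sigma_a, sigma_b = 8, 6      # known population standard deviations
confidence = 0.96

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)            # z_{alpha/2} for alpha = 0.04
standard_error = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)    # sqrt(sigma1^2/n1 + sigma2^2/n2)
margin = z * standard_error

lower = (mean_a - mean_b) - margin
upper = (mean_a - mean_b) + margin
print(f"z = {z:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

The whole interval lies above zero, which suggests that engine A has the higher mean gas mileage.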
Estimation: Interval Estimation: Two Population Means (Independent: Variance σ² Unknown: σ1² = σ2²)
Two Population Means, Dependent Samples
Matched or paired samples
• Involve a procedure whereby pairs of observations are matched as closely as possible according to certain relevant characteristics
Formula
The (1 − α)100% confidence interval for the mean difference between two observations from matched samples, μd, is
$\bar{d} \pm t_{\alpha/2,\,n-1} \frac{s_d}{\sqrt{n}}$
$\bar{d} = \frac{\sum d}{n}$
$s_d = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$
where
μd = the mean of the paired differences of the population
d̄ = the mean of the paired differences of the sample
s_d = the standard deviation of the paired differences of the sample
n = the number of paired difference values
Estimation: Interval Estimation: Two Population Means (Dependent)
Example 2.19: The manufacturer of a gasoline additive claimed that the use of this additive increases gasoline mileage. A
random sample of six cars was selected and these cars were driven for one week without the gasoline additive and then for
one week with the gasoline additive. The following table gives the miles per gallon for these cars without and with the
gasoline additive. Assume that the population paired differences are normally distributed.
Without 24.6 28.3 18.9 23.7 15.4 29.5
With 26.3 31.7 18.2 25.3 18.3 30.9
Construct a 95% confidence interval for the difference in mean mileage per gallon for cars without and with the gasoline
additive and interpret the interval. (Ans: (-3.2150, -0.2184))
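The stated answer to Example 2.19 can be checked with the paired-sample formula. In this sketch the differences are taken as d = without − with, and the critical value t(0.025, 5) = 2.571 is an assumption read from a standard t table rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

without = [24.6, 28.3, 18.9, 23.7, 15.4, 29.5]         # mpg without the additive
with_additive = [26.3, 31.7, 18.2, 25.3, 18.3, 30.9]   # mpg with the additive

# Paired differences, d = without - with
d = [w0 - w1 for w0, w1 in zip(without, with_additive)]
n = len(d)
d_bar = mean(d)     # mean of the paired differences
s_d = stdev(d)      # sd of the paired differences (divisor n - 1)

t_crit = 2.571      # assumed table value t_{0.025, 5} for a 95% interval
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

Both limits are negative, so with 95% confidence the mean mileage without the additive is lower than with it, which is consistent with the manufacturer's claim.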
Estimation: Interval Estimation: Two Population Means (Dependent)
Example 2.20: Many engineering students have problems with data analysis using statistical software. A professor who teaches a statistics for engineering course offered a two-day workshop on this topic. The following table gives the test scores of seven engineering students before and after they attended the workshop.
Before 56 69 48 74 65 71 58
After 62 73 44 85 71 70 69
The data collected was analysed and the output is shown as follows:
Paired T-Test and CI: before, after
a) Show that the 95% confidence interval for the difference in mean test scores before and after attending the workshop is between -9.94 and 0.51.
b) Can we conclude that attending the workshop increases the test score?
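One way to work through part (a) by hand is sketched below. It assumes the differences are taken as d = before − after, and it uses the table value t(0.025, 6) = 2.447, which is read from a standard t table and not from the missing Minitab output.

```latex
\begin{aligned}
d &= -6,\ -4,\ 4,\ -11,\ -6,\ 1,\ -11, \qquad \sum d = -33, \qquad \sum d^2 = 347 \\
\bar{d} &= \frac{\sum d}{n} = \frac{-33}{7} = -4.7143 \\
s_d &= \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}
     = \sqrt{\frac{1}{6}\left[347 - \frac{(-33)^2}{7}\right]} = \sqrt{31.9048} = 5.6484 \\
\bar{d} \pm t_{0.025,\,6}\,\frac{s_d}{\sqrt{n}} &= -4.7143 \pm 2.447 \cdot \frac{5.6484}{\sqrt{7}}
     = -4.7143 \pm 5.2241 = (-9.94,\ 0.51)
\end{aligned}
```

For part (b), the interval contains 0, so at the 95% confidence level the data do not show that the workshop changes the mean test score.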
Estimation: Interval Estimation: Population Variance
Characteristics: Chi-Square Distribution, F-Distribution
Variable         Method      CI for StDev    CI for Variance
ski_lift_ticket  Chi-Square  (3.87, 8.74)    (15.0, 76.4)
                 Bonett      (3.35, 10.09)   (11.2, 101.8)
Based on the confidence interval in the given output, can we conclude that the standard deviation of the price (in dollars) of an adult single-day ski lift ticket differs?
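One quick consistency check on this output: for the chi-square method, the confidence limits for the standard deviation are the square roots of the limits for the variance. The sketch below only verifies that relationship from the printed limits; it does not recompute the interval, because the underlying data are not shown.

```python
from math import sqrt

var_ci = (15.0, 76.4)   # chi-square CI for the variance, as printed in the output

# Taking square roots of the variance limits gives the limits for the standard deviation
sd_ci = tuple(round(sqrt(limit), 2) for limit in var_ci)
print(sd_ci)
```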
Estimation: Interval Estimation: Ratio of Two Population Variances
Two Population Variances (F distribution)
Assumption
The populations are normally distributed.
Characteristics
• The F distribution is continuous and skewed to the right
• The shape of the F distribution depends on two numbers of degrees of freedom (d.f.): one for the numerator and another for the denominator
• The units of an F distribution are non-negative and denoted by $F_{\alpha,\,v_1,\,v_2}$
• where v1 is the d.f. for the numerator and v2 is the d.f. for the denominator
Formula
The (1 − α)100% confidence interval for the ratio of two population variances, $\frac{\sigma_1^2}{\sigma_2^2}$, is
$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,v_1,\,v_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,v_2,\,v_1}$
The (1 − α)100% confidence interval for the ratio of two population standard deviations, $\frac{\sigma_1}{\sigma_2}$, is
$\sqrt{\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,v_1,\,v_2}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,v_2,\,v_1}}$
where v1 = n1 − 1 and v2 = n2 − 1
Estimation: Interval Estimation: Ratio of Two Population Variances
Example 2.23: The manufacturer of a small battery-powered tape recorder decides to include four alkaline batteries with its
product. Two battery suppliers are being considered; each has its own brand (brand 1 and brand 2). The supervising
inspector of incoming quality believes that the battery lifetimes follow a normal distribution with equal variances. A sample
experiment is conducted: each of ten batteries (five of each brand) is connected to a test device that places a small drain on
the battery power and records the battery lifetime. The following results (in hours) are obtained:
Brand 1 43 48 38 41 51
Brand 2 30 26 37 31 34
a) Construct a 95% confidence interval on the ratio of the variances of lifetimes of the battery of the two brands. Interpret
the confidence interval obtained. (Ans: (0.1668, 15.3714))
b) Does the interval support the supervising inspector’s belief that the variances of the lifetimes of the two brands are equal?
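The stated answer to part (a) can be checked as follows. This is a sketch under one assumption: the critical value F(0.025, 4, 4) = 9.60 is read from a standard F table, since the Python standard library has no F distribution. Both samples have 4 degrees of freedom, so the same critical value serves for both limits.

```python
from statistics import variance

brand_1 = [43, 48, 38, 41, 51]   # lifetimes in hours
brand_2 = [30, 26, 37, 31, 34]

s1_sq = variance(brand_1)   # sample variance of brand 1 (divisor n - 1)
s2_sq = variance(brand_2)   # sample variance of brand 2 (divisor n - 1)
ratio = s1_sq / s2_sq

f_crit = 9.60   # assumed table value F_{0.025, 4, 4}; v1 = v2 = 4
lower = ratio / f_crit   # (s1^2 / s2^2) * 1 / F_{alpha/2, v1, v2}
upper = ratio * f_crit   # (s1^2 / s2^2) * F_{alpha/2, v2, v1}

print(f"s1^2 = {s1_sq:.1f}, s2^2 = {s2_sq:.1f}, CI = ({lower:.4f}, {upper:.4f})")
```

For part (b), the interval contains 1, so the data are consistent with the inspector's belief that the two variances are equal.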
Estimation: Interval Estimation: Ratio of Two Population Variances
Example 2.24: The following Minitab output was obtained from two independent samples selected from two normally distributed populations with unknown and unequal variances. Show that the 95% confidence interval of the ratio of standard deviations for the two populations is as given in the output.
Test and CI for Two Variances: S1, S3
Statistics
Variable   N   StDev   Variance   95% CI for StDevs
S1        13   8.309   69.038     (5.958, 13.716)
S3         9   6.564   43.092     (4.434, 12.576)

Method   CI for StDev Ratio   CI for Variance Ratio
F        (0.618, 2.372)       (0.381, 5.626)
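A hand calculation along the following lines reproduces the output. It assumes the table values F(0.025, 12, 8) ≈ 4.20 and F(0.025, 8, 12) ≈ 3.51, with v1 = 13 − 1 = 12 and v2 = 9 − 1 = 8. These critical values are read from a standard F table and are not part of the Minitab output.

```latex
\begin{aligned}
\frac{s_1^2}{s_2^2} &= \frac{69.038}{43.092} = 1.6021 \\
\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{0.025,\,12,\,8}} &= \frac{1.6021}{4.20} \approx 0.381,
\qquad
\frac{s_1^2}{s_2^2} \cdot F_{0.025,\,8,\,12} = 1.6021 \times 3.51 \approx 5.62 \\
\sqrt{0.381} &\approx 0.618, \qquad \sqrt{5.62} \approx 2.37
\end{aligned}
```

These limits agree with the Minitab intervals (0.381, 5.626) and (0.618, 2.372) up to the rounding of the table values.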