Professional Documents
Culture Documents
Mas S Mohktar Email: Mas - Dayana@um - Edu.my Phone (Office) : 0379677681
Mas S Mohktar Email: Mas - Dayana@um - Edu.my Phone (Office) : 0379677681
Email: mas_dayana@um.edu.my
Phone (office): 0379677681
Sampling distributions
The Central Limit Theorem
Confidence Intervals
Basic statistical concepts:
1. Population is a set of measurements or items of interest
A characteristic of the population is called a parameter
2. Sample is any subset from the population of interest
A characteristic of the sample is called a statistic
That is, we are interested in a particular characteristic of the
population (a parameter)
To get an idea about the parameter, we select a (random)
sample and observe a related characteristic in the sample (a
statistic)
Then, based on assumption of the behavior of this statistic we
make guesses about the related population parameter
This is called inference, since we infer something about the
population
Statistical inference is performed in two ways: Testing of
hypotheses and estimation
Suppose that our focus is on estimating the mean value of some
continuous r.v. of interest
The obvious approach would be to use the mean of the sample
as an estimate of the unknown population mean μ
The quantity X is called an estimator of the parameter μ
Suppose now that one obtains repeated samples from the
population and measures their means𝑥1 , 𝑥2 ,…, 𝑥𝑛
If you consider each sample mean as a single observation, from
a r.v. 𝑋 , then it will have a probability distribution
This is known as the sampling distribution of means of samples
of size n
Given a underlying population with mean μ and standard
deviation σ, the distribution of sample means for sample of size
n has three important properties:
1. The mean of the sampling distribution is identical to the
population mean μ
2. The standard deviation of the sampling distribution is equal to
𝜎
which is known as the standard error of the mean
𝑛
3. Provided that n is large enough, the shape of the sampling
distribution is approximately normal
Cholesterol level in KL males 20-74 years old
The serum cholesterol levels for all 20-74 year-old KL males
has mean μ = 211 mg/100 ml and the standard deviation is = 46
mg/100 ml
That is, each individual serum cholesterol level is distributed
around μ = 211 mg/100 ml, with variability expressed by the
standard deviation
If we select repeated samples of size 25 from the population,
what proportion of the samples will have a mean value of 230
mg/100 ml or above?
If the sample size 25 is large enough, the central limit theorem
states that the distribution of sample means of size 25 is
approximately normal with mean μ =211 mg/100 ml, and the
standard error of the mean is
𝜎 46
= = 9.2 mg/100
𝑛 25
If 𝑥= 230, then we have
From the table, we find the area to the right of z = 2.07 is 0.019
Therefore, only about 1.9% of the samples will have a mean
greater than 230 mg/100 ml
What if?
To calculate the upper and lower cutoff points enclosing the
middle 95% of the means of samples of size n = 25 drawn from
this population we work as follows:
The cutoff points in the standard normal distribution are −1.96
and +1.96
We can translate this to a statement about serum cholesterol
levels
= 193.0 ≤ 𝑥25 ≤ 229.0
Equivalently, we have
or
This means that we are 95% confident that the interval
will cover μ
However, it is important to realize that the estimator 𝑋 is a r.v.,
where the parameter μ is a constant (maybe unknown)
Therefore, the interval is random and has a 95% chance of
covering μ before a sample is selected
Once a sample has been drawn and the CI has been calculated,
either μ is within the interval or not
There is no longer any probability involved
We do not say that there is a 95% probability that μ lies in
this single CI
If 95% is not an acceptably high confidence, we may elect to
construct a 99% confidence interval
Similarly to the last case, this interval will be of the form