Distribution and Estimation

Lecture note 5 Spring 2013
Sampling Distributions and Estimation
Sampling Variation
• Sample statistic – a random variable whose value depends on which
population items happen to be included in the random sample.
• Depending on the sample size, the sample statistic could either represent the
population well or differ greatly from the population.
• This sampling variation can easily be illustrated.
• Consider eight random samples of size n = 5 from a large population of GMAT
scores for MBA applicants.
• The sample means (\(\bar{x}\)) tend to be close to the population mean (μ =
520.78).
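The sampling variation described above can be sketched with a short simulation. The lecture's GMAT data are not reproduced in these notes, so the population below is a hypothetical stand-in: 10,000 normally distributed scores centered on the quoted mean of 520.78, with an assumed standard deviation of 86.8. Only the pattern matters: each sample of size 5 gives a different mean, scattered around the population mean.

```python
import random
import statistics

# Hypothetical stand-in population (assumption): the real GMAT scores are not
# available here, so we generate 10,000 scores around the quoted mean 520.78
# with an assumed standard deviation of 86.8.
random.seed(1)
population = [random.gauss(520.78, 86.8) for _ in range(10_000)]
mu = statistics.fmean(population)

# Eight random samples of size n = 5, as on the slide
sample_means = []
for i in range(8):
    sample = random.sample(population, 5)
    xbar = statistics.fmean(sample)
    sample_means.append(xbar)
    print(f"sample {i + 1}: mean = {xbar:7.2f}")

print(f"population mean = {mu:.2f}")
```

With samples this small, individual sample means can miss the population mean by a wide margin; rerunning with a larger sample size shows the means clustering more tightly.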
Estimators and Sampling Distributions

Some Terminology
• Estimator – a statistic derived from a sample to infer the value of a
population parameter.
• Estimate – the value of the estimator in a particular sample.
Population parameters are represented by Greek letters and the corresponding
statistics by Roman letters.
Examples of Estimators
Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of
all possible values the statistic may assume when a random sample of size
n is taken.
• An estimator is a random variable since samples vary.
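• Sampling error = \(\hat{\theta} - \theta\), the difference between the estimate
\(\hat{\theta}\) and the true parameter \(\theta\).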
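For a very small population, the sampling distribution can be listed exhaustively. The sketch below uses a hypothetical population of five values and enumerates every possible sample of size 2 (without replacement), taking the sample mean as the estimator. The population values and the variable names are illustrative assumptions.

```python
from itertools import combinations
from statistics import fmean

population = [2, 4, 6, 8, 10]    # hypothetical population, N = 5
theta = fmean(population)        # true parameter: the population mean, 6.0
n = 2

# Every possible sample of size n and the value of the estimator in each one
estimates = [fmean(s) for s in combinations(population, n)]

# Sampling distribution: each distinct estimate with its probability
dist = {v: estimates.count(v) / len(estimates) for v in sorted(set(estimates))}

# Sampling error of each possible estimate
sampling_errors = [est - theta for est in estimates]

for value, prob in dist.items():
    print(f"estimate = {value:4.1f}   probability = {prob:.1f}")
```

The ten possible sample means range from 3 to 9, and their average equals the population mean of 6, so the positive and negative sampling errors cancel on average.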
Bias
Bias is the difference between the expected value of the estimator and the true
parameter:
Bias = \(E(\hat{\theta}) - \theta\)
An estimator is unbiased if \(E(\hat{\theta}) = \theta\).
• On average, an unbiased estimator neither overstates nor understates the true
parameter.
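Bias can be approximated by simulation: draw many samples, average the estimates, and subtract the true parameter. The sketch below is an illustration under assumed values (a normal population with variance 4 and samples of size 5). It compares two estimators of the population variance: the sample variance with divisor n − 1 and the version with divisor n.

```python
import random
import statistics

random.seed(0)
sigma2 = 4.0           # true parameter (assumed): population variance
n, reps = 5, 20_000    # sample size and number of simulated samples

est_n_minus_1 = []     # variance estimates with divisor n - 1
est_n = []             # variance estimates with divisor n
for _ in range(reps):
    sample = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    est_n_minus_1.append(statistics.variance(sample))   # divisor n - 1
    est_n.append(statistics.pvariance(sample))          # divisor n

# Simulated bias = average estimate minus the true parameter
bias_n_minus_1 = statistics.fmean(est_n_minus_1) - sigma2
bias_n = statistics.fmean(est_n) - sigma2

print(f"bias with divisor n - 1: {bias_n_minus_1:+.3f}")
print(f"bias with divisor n:     {bias_n:+.3f}")
```

The divisor n − 1 version should show a simulated bias close to zero, while the divisor n version understates the variance on average (in theory by σ²/n, which is 0.8 under these assumed values).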
Efficiency
Efficiency refers to the variance of the estimator’s sampling distribution.
A more efficient estimator has a smaller variance.
Consistency
A consistent estimator converges toward the parameter being estimated as the
sample size increases.
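Consistency can be illustrated with the sample mean. The following sketch assumes a hypothetical normal population (μ = 50, σ = 10) and measures how widely simulated sample means are spread at three sample sizes; the function name and all parameter values are illustrative.

```python
import random
import statistics

random.seed(2)
mu, sigma = 50.0, 10.0     # hypothetical population parameters (assumption)

def spread_of_sample_means(n, reps=300):
    """Standard deviation of `reps` simulated sample means, each from n draws."""
    means = [
        statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

spreads = {n: spread_of_sample_means(n) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"n = {n:4d}: spread of sample means = {spread:.3f}")
```

The spread shrinks as n grows, which is what it means for the estimates to converge toward the parameter.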
Sample Mean and Central Limit Theorem
• The sample mean is an unbiased estimator of μ; therefore,
\(E(\bar{X}) = \mu\)
• The standard error of the mean is the standard deviation of the sampling
error of \(\bar{X}\):
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\)
If the population is exactly normal, then the sample mean follows a normal
distribution
• For example, the average price, μ, of a 5 GB MP3 player is $80.00 with a
standard deviation, σ, equal to $10.00. What will be the mean and
standard error of the sample mean for a sample of 20 players?
\(E(\bar{X}) = \mu = 80\)
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{20}} = 2.236\)
• If the distribution of prices for these players is a normal distribution, then the
sampling distribution of \(\bar{X}\) is N(80.00, 2.236), where 2.236 is the standard error.
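The arithmetic in the MP3 player example can be checked in a few lines. The variable names are illustrative; the numbers are the ones given in the example.

```python
import math

mu = 80.00      # population mean price
sigma = 10.00   # population standard deviation
n = 20          # sample size

expected_mean = mu                      # E(X-bar) = mu
standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)

print(f"E(X-bar) = {expected_mean:.2f}, standard error = {standard_error:.3f}")
# E(X-bar) = 80.00, standard error = 2.236
```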
Central Limit Theorem (CLT) for a Mean

If a random sample of size n is drawn from a population with mean μ and
standard deviation σ, the distribution of the sample mean \(\bar{x}\) approaches a normal
distribution with mean μ and standard deviation \(\sigma/\sqrt{n}\) as the sample size increases.
• If the population is normal, the distribution of the sample mean is normal
regardless of sample size.
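A simulation gives a rough feel for the theorem. The sketch below assumes a strongly skewed population (exponential with rate 1, so μ = 1 and σ = 1) and samples of size 30; these choices are illustrative and are not taken from the lecture. It checks that the sample means center on μ, that their spread is near σ/√n, and that roughly 95% of them fall within 1.96 standard errors of μ, as a normal distribution would predict.

```python
import math
import random
import statistics

random.seed(3)
mu, sigma = 1.0, 1.0    # mean and s.d. of an exponential population with rate 1
n, reps = 30, 5000      # sample size and number of simulated samples

# Draw many samples from the skewed population and keep each sample mean
means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

se_theory = sigma / math.sqrt(n)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Share of sample means within 1.96 standard errors of mu
share_within = sum(abs(m - mu) <= 1.96 * se_theory for m in means) / reps

print(f"mean of sample means: {mean_of_means:.3f} (theory {mu:.3f})")
print(f"s.d. of sample means: {sd_of_means:.3f} (theory {se_theory:.3f})")
print(f"share within 1.96 SE: {share_within:.3f} (normal theory 0.950)")
```

The population itself stays skewed no matter how large n is; only the distribution of the sample mean becomes approximately normal.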
Questions
1. (T or F) The expected value of an unbiased estimator is equal to the parameter
whose value is being estimated.
2. (T or F) The efficiency of an estimator depends on the variance of the
estimator’s sampling distribution.
3. (T or F) The Central Limit Theorem says that if n exceeds 30, the population
will be normal.
4. A certain population has a mean of 529 and a standard deviation of 29.7. Many
samples of size 36 are randomly selected and their means calculated.
1) What value would you expect to find for the standard deviation of all these
sample means?
2) What shape would you expect the distribution of all these sample means to
have? Why?
Confidence Interval for a Mean (μ)

What is a Confidence Interval?
• A sample mean \(\bar{x}\) is a point estimate of the population mean μ.
• A confidence interval for the mean is a range
\(\mu_{\text{lower}} < \mu < \mu_{\text{upper}}\)
• The confidence level is the probability that the procedure used to construct the
interval produces an interval containing the true population mean.
• The confidence level (usually expressed as a percentage, 100(1 − α)%) is the
central area under the curve of the sampling distribution, leaving α/2 in each tail.
Confidence Interval for a Mean (μ) with Unknown σ

Student’s t Distribution
• Use the Student’s t distribution instead of the normal distribution when the
population is normal but the standard deviation σ is unknown and the sample
size is small.
\(\bar{x} \pm t \frac{s}{\sqrt{n}}\)
• The confidence interval for μ (unknown σ) is
\(\bar{x} - t \frac{s}{\sqrt{n}} < \mu < \bar{x} + t \frac{s}{\sqrt{n}}\)
Student’s t Distribution
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution is dependent on the size of the sample.
Degrees of Freedom
• Degrees of Freedom (d.f.) is a parameter based on the sample size that is
used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate s, less
the number of intermediate estimates used in the calculation.
d.f. = ν = n − 1
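The count behind d.f. = n − 1 shows up directly when s is computed by hand: one intermediate estimate (the sample mean) is used, so the sum of squared deviations is divided by n − 1. The data below are hypothetical; the comparison with `statistics.stdev` is only a cross-check.

```python
import math
import statistics

sample = [12.0, 15.0, 9.0, 14.0, 10.0]   # hypothetical data (assumption)
n = len(sample)

xbar = sum(sample) / n    # the one intermediate estimate used to compute s
df = n - 1                # degrees of freedom: n observations less 1 estimate

# Sample standard deviation with divisor n - 1
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / df)

print(f"n = {n}, d.f. = {df}, s = {s:.4f}")
print(f"statistics.stdev gives {statistics.stdev(sample):.4f}")
```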
Example GMAT Scores Again

Here are the GMAT scores from 20 applicants to an MBA program
Construct a 90% confidence interval for the mean GMAT score of all MBA
applicants.
\(\bar{x} = 510\), s = 73.77
• Since σ is unknown, use the Student’s t for the confidence interval with ν =
n − 1 = 20 − 1 = 19 d.f.
• First find \(t_{0.95} = 1.729\) (0.05 in the upper tail, 19 d.f.) from Appendix D.
• The 90% confidence interval is:
\(\bar{x} \pm t \frac{s}{\sqrt{n}} = 510 \pm 1.729 \frac{73.77}{\sqrt{20}} = 510 \pm 28.52\)
• We are 90% confident that the true mean GMAT score is within the interval
481.48 < μ < 538.52.
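The GMAT interval can be reproduced with a small helper. This is a minimal sketch: the function name `t_confidence_interval` is illustrative, and the critical value 1.729 (19 d.f., 0.05 in the upper tail) is read from a t table rather than computed, because the Python standard library has no t distribution.

```python
import math

def t_confidence_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Values from the example; t_crit is looked up in a t table for 19 d.f.
lower, upper = t_confidence_interval(xbar=510, s=73.77, n=20, t_crit=1.729)
print(f"{lower:.2f} < mu < {upper:.2f}")
# 481.48 < mu < 538.52
```

Passing the critical value in as an argument keeps the helper independent of any particular table or statistics package.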
Confidence Interval Width
• Confidence interval width reflects
- the sample size,
- the confidence level, and
- the standard deviation.
• To obtain a narrower interval and more precision
- increase the sample size or
- lower the confidence level (e.g., from 90% to 80% confidence).
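The three influences on width can be explored numerically. To stay within the Python standard library, the sketch below makes a simplifying assumption: it uses the z-value (known σ) instead of the t-value, so the widths are slightly narrower than t-based ones for small n. The function name `ci_width` and the inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Width of a z-based interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

base = ci_width(sigma=10, n=20, confidence=0.90)
print(f"baseline (sigma=10, n=20, 90%): {base:.2f}")
print(f"larger sample (n=80):           {ci_width(10, 80, 0.90):.2f}")
print(f"lower confidence (80%):         {ci_width(10, 20, 0.80):.2f}")
print(f"larger s.d. (sigma=20):         {ci_width(20, 20, 0.90):.2f}")
```

Because n enters through its square root, the sample size must be quadrupled to halve the width.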
Using Appendix D
• Beyond ν = 50, Appendix D shows ν in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the
next lower ν.
• This is a conservative procedure since it causes the interval to be slightly
wider.
• For d.f. above 150, use the z-value.
STATA Practice
Data file: rent.xls
1. Import the given Excel file and save it as a *.dta file.
2. Find the sample mean and sample variance of the “rent” variable.
3. Construct 90% and 95% confidence intervals for the population mean of the “rent”
variable.
4. Explain which statistic is used to construct the confidence intervals, and how.
Questions
1. (T or F) As n increases, the width of the confidence interval will decrease,
ceteris paribus.
2. (T or F) When the sample standard deviation is used to construct a confidence interval
for the mean, we would use the Student’s t distribution instead of the normal
distribution.
3. (T or F) For a sample size of 20, a 95 percent confidence interval using the t
distribution would be wider than one constructed using the z distribution.
4. Given the following information: the sample size n = 20, the sample mean =
75.3, and the population standard deviation is unknown.
A) Find the 99% confidence interval for the mean.
B) Are the assumptions satisfied? Explain why.
C) How large a sample should be taken if the population mean is to be
estimated with 99% confidence to within $72? The population has a
standard deviation of $800.