
Lecture note 5 Spring 2013

Lecture note 5

Sampling distributions
and Estimation


Sampling Variation

• Sample statistic – a random variable whose value depends on which population items happen to be included in the random sample.

• Depending on the sample size, the sample statistic could either represent the
population well or differ greatly from the population.

• This sampling variation can easily be illustrated.


• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.

• The sample means (\( \bar{X} \)) tend to be close to the population mean (μ = 520.78).
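The idea of sampling variation can be tried out with a short simulation. The sketch below is only illustrative: the actual GMAT population is not reproduced in these notes, so the population here is a hypothetical normal one (centre 520, spread 85, both chosen arbitrarily), and names such as `population` and `sample_means` are made up for the example.

```python
import random
import statistics

# Hypothetical stand-in for the GMAT population: the real scores are not
# available here, so a large synthetic population is generated instead.
rng = random.Random(2013)  # fixed seed so the run is repeatable
population = [rng.gauss(520, 85) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw eight random samples of size n = 5 and record each sample mean.
n = 5
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(8)]

print(f"population mean: {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    # Each sample mean differs from mu because different items were drawn.
    print(f"sample {i}: mean = {m:.2f}, difference from mu = {m - mu:+.2f}")
```

Rerunning with a larger `n` should make the eight means cluster more tightly around the population mean.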


Estimators and Sampling Distributions


Some Terminology
• Estimator – a statistic derived from a sample to infer the value of a
population parameter.
• Estimate – the value of the estimator in a particular sample.
Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.


Examples of Estimators


Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of
all possible values the statistic may assume when a random sample of size
n is taken.
• An estimator is a random variable since samples vary.

• Sampling error = \( \hat{\theta} - \theta \)


Bias
Bias is the difference between the expected value of the estimator and the true parameter.

\( \text{Bias} = E(\hat{\theta}) - \theta \)

An estimator is unbiased if \( E(\hat{\theta}) = \theta \).

• On average, an unbiased estimator neither overstates nor understates the true parameter.
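Bias can be made concrete with a small simulation. The sketch below compares two estimators of a population variance, one dividing the sum of squared deviations by n and one dividing by n – 1. This is a standard textbook illustration rather than an example from these notes; the population (normal, σ² = 4), the sample size, and the helper `variance_with_divisor` are all assumptions made for the example.

```python
import random
import statistics

rng = random.Random(5)   # fixed seed so the run is repeatable
sigma2 = 4.0             # true population variance (sigma = 2), chosen for illustration
n = 5                    # a small sample makes the bias easy to see
reps = 20_000            # number of repeated samples

def variance_with_divisor(xs, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

div_n, div_n_minus_1 = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    div_n.append(variance_with_divisor(sample, n))
    div_n_minus_1.append(variance_with_divisor(sample, n - 1))

# Bias = (average of the estimates over many samples) - (true parameter)
bias_n = statistics.fmean(div_n) - sigma2
bias_n_minus_1 = statistics.fmean(div_n_minus_1) - sigma2
print(f"estimated bias, dividing by n:     {bias_n:+.3f}")
print(f"estimated bias, dividing by n - 1: {bias_n_minus_1:+.3f}")
```

The first estimator should come out noticeably negative (it understates the variance on average), while the second should be close to zero.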


Efficiency

Efficiency refers to the variance of the estimator’s sampling distribution.

A more efficient estimator has a smaller variance.


Consistency
A consistent estimator converges toward the parameter being estimated as the
sample size increases.


Sample Mean and Central Limit Theorem

• The sample mean is an unbiased estimator of μ; therefore,

\( E(\bar{X}) = \mu \)

The standard error of the mean is the standard deviation of the sampling error of \( \bar{X} \):

\( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \)


If the population is exactly normal, then the sample mean follows a normal distribution centered at μ with standard error \( \sigma/\sqrt{n} \).


• For example, the average price, μ, of a 5 GB MP3 player is $80.00 with a standard deviation, σ, equal to $10.00. What will be the mean and standard error of the sample mean for a sample of 20 players?

\( E(\bar{X}) = \mu = 80 \)

\( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{20}} = 2.236 \)

• If the distribution of prices for these players is normal, then the sampling distribution of \( \bar{X} \) is N(80.00, 2.236), where the second value is the standard error.
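The arithmetic in the MP3 player example can be checked in a few lines. This is a minimal sketch using only the numbers given in the example; the variable names are chosen for illustration.

```python
import math

mu = 80.00      # population mean price in dollars (from the example)
sigma = 10.00   # population standard deviation in dollars (from the example)
n = 20          # sample size

expected_mean = mu                      # E(X-bar) = mu
standard_error = sigma / math.sqrt(n)   # sigma_X-bar = sigma / sqrt(n)

print(f"E(X-bar)       = {expected_mean:.2f}")
print(f"standard error = {standard_error:.3f}")  # 2.236
```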


Central Limit Theorem (CLT) for a Mean


If a random sample of size n is drawn from a population with mean μ and standard deviation σ, the distribution of the sample mean \( \bar{X} \) approaches a normal distribution with mean μ and standard deviation \( \sigma/\sqrt{n} \) as the sample size increases.

• If the population is normal, the distribution of the sample mean is normal regardless of sample size.
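The Central Limit Theorem can be explored by simulation. The sketch below draws repeated samples from a strongly right-skewed population and tracks how the sample means behave as n grows. The exponential population (μ = 1, σ = 1), the sample sizes 2 and 50, and the helper names are assumptions made for the illustration, not part of the lecture.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
MU, SIGMA = 1.0, 1.0     # mean and standard deviation of an exponential(1) population

def simulate_sample_means(n, reps=5_000):
    """Means of `reps` random samples of size n from the skewed population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def skewness(values):
    """Average cubed z-score: roughly 0 for a symmetric (e.g. normal) shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

results = {}
for n in (2, 50):
    means = simulate_sample_means(n)
    results[n] = {
        "mean": statistics.fmean(means),     # should be near mu
        "sd": statistics.pstdev(means),      # should be near sigma / sqrt(n)
        "theory_sd": SIGMA / math.sqrt(n),
        "skew": skewness(means),             # should shrink toward 0 as n grows
    }
    r = results[n]
    print(f"n = {n:2d}: mean = {r['mean']:.3f}, sd = {r['sd']:.3f} "
          f"(theory {r['theory_sd']:.3f}), skewness = {r['skew']:.2f}")
```

With the larger sample size, the simulated standard deviation should sit close to \( \sigma/\sqrt{n} \) and the skewness should be much nearer zero, which is the behaviour the theorem describes.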


Questions
1. (T or F) The expected value of an unbiased estimator is equal to the parameter whose value is being estimated.

2. (T or F) The efficiency of an estimator depends on the variance of the estimator’s sampling distribution.

3. (T or F) The Central Limit Theorem says that if n exceeds 30, the population
will be normal.


4. A certain population has a mean of 529 and a standard deviation of 29.7. Many samples of size 36 are randomly selected and their means are calculated.
1) What value would you expect to find for the standard deviation of all these sample means?
2) What shape would you expect the distribution of all these sample means to have? Why?


Confidence Interval for a Mean (μ)


What is a Confidence Interval?
• A sample mean \( \bar{x} \) is a point estimate of the population mean μ.

• A confidence interval for the mean is a range

\( \mu_{\text{lower}} < \mu < \mu_{\text{upper}} \)

• The confidence level is the probability that the confidence interval contains
the true population mean.
• The confidence level (usually expressed as a percentage) is the area under the curve of the sampling distribution.


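Confidence Interval for a Mean (μ) with Unknown σ


Student’s t Distribution
• Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.

\( \bar{x} \pm t \frac{s}{\sqrt{n}} \)

where s is the sample standard deviation and t is the Student’s t critical value for the chosen confidence level.

• The confidence interval for μ (unknown σ) is

\( \bar{x} - t \frac{s}{\sqrt{n}} < \mu < \bar{x} + t \frac{s}{\sqrt{n}} \)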


Student’s t Distribution


• t distributions are symmetric and shaped like the standard normal distribution.

• The t distribution is dependent on the size of the sample.


Degrees of Freedom

• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate s, less the number of intermediate estimates used in the calculation.

d.f. = ν = n – 1


Example GMAT Scores Again


Here are the GMAT scores from 20 applicants to an MBA program


Construct a 90% confidence interval for the mean GMAT score of all MBA
applicants

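\( \bar{x} = 510 \), \( s = 73.77 \)

• Since σ is unknown, use the Student’s t for the confidence interval with ν = n – 1 = 20 – 1 = 19 d.f.

• First find \( t_{0.95} \) (the 95th percentile, which leaves 5% in the upper tail) from Appendix D: for 19 d.f., t = 1.729.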


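• The 90% confidence interval (using t = 1.729 for 19 d.f.) is:

\( \bar{x} \pm t \frac{s}{\sqrt{n}} = 510 \pm 1.729 \cdot \frac{73.77}{\sqrt{20}} = 510 \pm 28.52 \)

• We are 90% confident that the true mean GMAT score is within the interval 481.48 < μ < 538.52.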
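The interval above can be reproduced from the summary statistics. The sketch below is a minimal illustration: the function `t_interval` is a hypothetical helper, and the critical value 1.729 (90% confidence, 19 d.f.) is typed in by hand as it would be read from a t table, since the Python standard library has no t quantile function.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval xbar +/- t * s / sqrt(n) for a mean with unknown sigma."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Summary statistics from the GMAT example; t_crit is the table value
# for 90% confidence with n - 1 = 19 degrees of freedom.
lower, upper = t_interval(xbar=510, s=73.77, n=20, t_crit=1.729)
print(f"{lower:.2f} < mu < {upper:.2f}")  # 481.48 < mu < 538.52
```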


Confidence Interval Width

• Confidence interval width reflects
- the sample size,
- the confidence level and
- the standard deviation.
• To obtain a narrower interval and more precision
- increase the sample size or
- lower the confidence level (e.g., from 90% to 80% confidence).
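The effect of sample size and confidence level on interval width can be sketched numerically. To keep the example self-contained, it uses normal (z) critical values from the standard library rather than t values, which is a simplification of the t-based interval in these notes; the helper `ci_width` and the reuse of s = 73.77 from the GMAT example are choices made for illustration.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, confidence):
    """Width of a z-based interval: 2 * z * sd / sqrt(n)."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / math.sqrt(n)

base = ci_width(73.77, n=20, confidence=0.90)
more_data = ci_width(73.77, n=80, confidence=0.90)        # four times the sample size
lower_confidence = ci_width(73.77, n=20, confidence=0.80)

print(f"n = 20, 90%: width = {base:.2f}")
print(f"n = 80, 90%: width = {more_data:.2f}")
print(f"n = 20, 80%: width = {lower_confidence:.2f}")
```

Because the width is proportional to \( 1/\sqrt{n} \), quadrupling the sample size halves the width, while lowering the confidence level narrows the interval through a smaller critical value.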


Using Appendix D
• Beyond ν = 50, Appendix D shows ν in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the next lower ν.
• This is a conservative procedure since it causes the interval to be slightly
wider.
• For d.f. above 150, use the z-value.


STATA Practice
Data file: rent.xls
1. Import the given Excel file and save it as a *.dta file.

2. Find the sample mean and sample variance of the “rent” variable.

3. Construct 90% and 95% confidence intervals for the population mean of the “rent” variable.

4. Explain which statistic is used to construct the confidence intervals, and how.


Questions
1. (T or F) As n increases, the width of the confidence interval will decrease, ceteris paribus.

2. (T or F) When the sample standard deviation is used to construct a confidence interval for the mean, we would use the Student’s t distribution instead of the normal distribution.

3. (T or F) For a sample size of 20, a 95 percent confidence interval using the t distribution would be wider than one constructed using the z distribution.


4. Given the following information: the sample size n = 20, the sample mean = 75.3, and the population standard deviation is unknown.
A) Find the 99% confidence interval for the mean.

B) Are the assumptions satisfied? Explain why.

C) How large a sample should be taken if the population mean is to be estimated with 99% confidence to within $72? The population has a standard deviation of $800.
