Interval estimation

ASW, Chapter 8

Economics 224, Notes for October 8, 2008

Central limit theorem – CLT (ASW, 271)
The sampling distribution of the sample mean, x̄, is approximated by a normal distribution when the sample is a simple random sample and the sample size, n, is large. In this case, the mean of the sampling distribution is the population mean, μ, and the standard deviation of the sampling distribution is the population standard deviation, σ, divided by the square root of the sample size. The latter is referred to as the standard error of the mean. In symbols, the standard error is σ_x̄ = σ/√n. A sample size of 100 or more elements is generally considered sufficient to permit using the CLT. If the population from which the sample is drawn is symmetrically distributed, n > 30 may be sufficient to use the CLT.

Large random sample from any population

                                                    No. of elements   Mean   Standard deviation
Any population                                      N                 μ      σ
Sampling distribution of x̄ when sample is random    n                 μ      σ_x̄ = σ/√n

A sample size n of greater than 100 is generally considered sufficiently large to use these results from the CLT.
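The CLT results in the table can be checked with a small simulation. The sketch below is only an illustration: the exponential population (μ = 1, σ = 1), the number of repeated samples, and the function name `simulate_sample_means` are assumptions chosen for the example, not part of the course material.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, num_samples, seed=224):
    """Draw repeated random samples of size n from a right-skewed
    (exponential, rate 1) population and return the sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(num_samples)]

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
n = 100
means = simulate_sample_means(n, num_samples=2000)

print("mean of sample means:", round(mean(means), 3))   # should be close to mu
print("sd of sample means:  ", round(stdev(means), 3))  # should be close to sigma/sqrt(n)
print("sigma / sqrt(n):     ", 1.0 / sqrt(n))
```

Even though the assumed population is skewed, the sample means should cluster around μ with a spread close to σ/√n.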

Probability that a sample mean is within a specified distance of the population mean
Population: μ = 2352 and σ = 1485
Random sample of n = 50
x̄ has mean μ = 2352
x̄ has standard deviation of 210
What is the probability that a particular random sample of size n = 50 has a mean that is within $100 of the population mean? See next slide.

• The sampling distribution of the sample means is normal since the sample size n = 50 is large, with μ = 2352 and σ_x̄ = 210.
• Within $100 of the mean is from 2352 - 100 = 2252 to 2352 + 100 = 2452.
• The required probability is the area under the normal curve between 2252 and 2452.
• Obtain the corresponding Z-values.
  Z = (2252 - 2352)/210 = -0.48 and Z = (2452 - 2352)/210 = 0.48
  For Z = -0.48, cumulative probability is 0.3156
  For Z = 0.48, cumulative probability is 0.6844
  Required probability is 0.6844 – 0.3156 = 0.3688
The probability that a sample yields a sample mean within $100 of the population mean is 0.37.
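The same steps can be sketched in Python, using the standard library's `NormalDist` in place of the normal table. Because the Z-value is not rounded to 0.48 before the lookup, the unrounded probability is about 0.366 instead of 0.3688. Both round to 0.37.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2352, 1485, 50    # values from the example
distance = 100                   # "within $100 of the population mean"

se = sigma / sqrt(n)             # standard error of the mean, about 210
z = distance / se                # Z-value of the upper limit, about 0.48
std_normal = NormalDist()        # standard normal: mean 0, sd 1

# Area under the normal curve between -z and +z.
prob = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"standard error = {se:.1f}")
print(f"Z = ±{z:.2f}")
print(f"P(sample mean within ${distance} of mu) = {prob:.2f}")
```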

Standard error for the sample mean
• The standard deviation of the sampling distribution of the sample mean is also referred to as the standard error: σ_x̄ = σ/√n
• As n increases, the standard error decreases, so the sample means are less variable. That is, for a larger n, the sample means tend to be closer to the population mean.
• In the last example, if n = 200, the standard error is 1485 divided by the square root of 200, or 105. With this larger sample size, the probability that a sample mean is within $100 of the population mean is the area under a normal curve between z = -0.95 and z = 0.95, or 0.6578. As n increases, there is an increased probability that the sample mean lies within any specified distance from the mean. See the next slide for a diagram.

Example of the effect of changing sample size

n       σ_x̄ = σ/√n    Value of standard error    Probability of sample mean
                      when σ = 1485              being within $100 of μ
50      σ/7.071       210                        0.37
200     σ/14.142      105                        0.66
500     σ/22.361      66.4                       0.87
1000    σ/31.623      47.0                       0.97

From these calculations, note how the larger sample size produces sampling distributions where the sample mean is generally closer to the population mean μ. The last column shows how there is increased probability that the sample mean is within $100 of the population mean µ as n becomes larger.
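The table can be reproduced with a short loop. This is a minimal sketch: the helper name `within_probability` is made up for the illustration, and σ = 1485 and the $100 distance come from the example.

```python
from math import sqrt
from statistics import NormalDist

def within_probability(sigma, n, distance):
    """Return (standard error, P(|sample mean - mu| <= distance))
    using the normal approximation from the CLT."""
    se = sigma / sqrt(n)
    z = distance / se
    prob = NormalDist().cdf(z) - NormalDist().cdf(-z)
    return se, prob

print("n      std. error   P(within $100)")
for n in (50, 200, 500, 1000):
    se, prob = within_probability(sigma=1485, n=n, distance=100)
    print(f"{n:<6} {se:<12.1f} {prob:.2f}")
```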

Constructing interval estimates of a parameter
• The general form for the interval estimate of a population parameter is
  Point estimate of parameter ± Margin of error
• The margin of error is an amount that is added to and subtracted from the point estimate of a statistic, to produce an interval estimate of the parameter.
• The size of the margin of error depends on
  – The type of sampling distribution for the sample statistic.
  – The percentage of the area under the sampling distribution that a researcher decides to include – usually 90%, 95%, or 99%. This is termed a confidence level.
• Each interval estimate is an interval constructed around the point estimate, along with a confidence level.

Examples of interval estimates
• Statistics Canada reports that mean weekly food expenditures for Prairie households in 2001 were $127.78. But the data were obtained from a sample so there is sampling error associated with this estimate. The “true” value of the mean is between $123.78 and $131.78, 68% of the time and between $119.78 and $135.78, 95% of the time. Source: Statistics Canada, Food Expenditure in Canada 2001, catalogue no. 62-554-XIE, pp. 16, 70, 81.
• “The margin of error is estimated to be plus or minus 3.51 per cent, 19 times out of 20.” From the Palliser electoral district poll reporting Conservative at 43.5% of decided voters, NDP at 35.7%, Liberal at 17.3%, and Green at 3.3%, conducted by Sigma Analytics. Source: Leader-Post, Regina, October 3, 2008, pp. A1-A2.

Statistics Canada uses the following method
For example, if the estimate of an average expenditure for a given category is $75 and the corresponding CV is 5%, then the “true” value is between $71.25 and $78.75, 68% of the time and between $67.50 and $82.50, 95% of the time. (p. 70 of 62-554-XIE). The intervals for mean food expenditure on the last slide were constructed from this.
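The arithmetic behind this method can be sketched as follows. The reading that the 68% interval is the estimate plus or minus one standard error (CV × estimate) and the 95% interval is plus or minus two standard errors is inferred from the numbers in the example. The function name `cv_intervals` is hypothetical.

```python
def cv_intervals(estimate, cv_percent):
    """Return the 68% and 95% intervals implied by a coefficient of
    variation (CV), where standard error = estimate * CV / 100."""
    se = estimate * cv_percent / 100
    interval_68 = (estimate - se, estimate + se)          # ± 1 standard error
    interval_95 = (estimate - 2 * se, estimate + 2 * se)  # ± 2 standard errors
    return interval_68, interval_95

i68, i95 = cv_intervals(estimate=75, cv_percent=5)
print("68% interval:", i68)
print("95% interval:", i95)
```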

Modified FIGURE 8.1 SAMPLING DISTRIBUTION OF THE SAMPLE MEAN AMOUNT SPENT FROM SIMPLE RANDOM SAMPLES OF 100 CUSTOMERS
A sampling distribution of the sample mean x̄ for a simple random sample of 100 individuals from a population with a standard deviation of 20. The mean of the sampling distribution of x̄ is the population mean μ and its standard deviation, or standard error, is 2. This distribution can be used to construct an interval estimate of μ.

Constructing an interval estimate for a population mean μ
• Obtain the point estimate of μ, that is, the sample mean x̄.
• Determine the distribution of the sample mean. If n is large, then the Central Limit Theorem can be used and x̄ is normally distributed with mean μ and standard deviation σ_x̄ = σ/√n
• Select a confidence level. The most common level is 95%.
• Obtain the margin of error associated with the confidence level. For a normal distribution, the interval from Z = -1.96 to Z = 1.96 contains 95% of the area under the curve or of the sample means. See next slide to illustrate this.
• The 95% interval estimate is x̄ - 1.96 σ/√n to x̄ + 1.96 σ/√n

Modified FIGURE 8.2 SAMPLING DISTRIBUTION OF x̄ SHOWING THE LOCATION OF SAMPLE MEANS THAT ARE WITHIN 3.92 OF μ
In this example, the standard error is 2 and the margin of error is 2 x 1.96 = 3.92. For the general case, 1.96 is multiplied by the standard error to determine the margin of error.
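As a rough sketch of the general case, the Z-value for any confidence level can be taken from the standard normal distribution and multiplied by the standard error. The function name `margin_of_error` and its parameters are illustrative. The values σ = 20 and n = 100 are those of the figure.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error for a sample mean: Z times the standard error,
    where Z leaves (1 - confidence)/2 of the area in each tail."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

# Figure example: sigma = 20 and n = 100 give a standard error of 2.
print(round(margin_of_error(sigma=20, n=100), 2))
```

The result agrees with 2 x 1.96 = 3.92 to two decimal places.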

Example of interval estimates - I
Statistics of total income, Saskatchewan females employed full-time and full-year, by age, 2003

              Income in thousands of dollars
Age group     Mean     Standard deviation     Sample size
25-34         33.3     13.5                   55
35-44         40.3     20.9                   57
45-54         45.1     25.7                   37
55-64         40.1     25.9                   31

Source: Data for this question adapted from Statistics Canada, General Social Survey of Canada, 2003. Cycle 17: Social Engagement [machine readable data file]. 1st Edition. Ottawa, ON: Statistics Canada [publisher and distributor] 10/1/2004. Obtained through University of Regina Data Library Services.

Example of interval estimates - II
• Obtain 95% interval estimates for the mean income of all full-time, full-year employed females in Saskatchewan in these age groups.
• Describe the pattern of mean income by age.
Analysis: The pattern in the samples is clear – increased mean income from ages 25-34 to 45-54, then a decline for ages 55-64. However, the data from each of the four age groups is a sample, so interval estimates are necessary to comment on whether this pattern appears to hold for all females.

Example of interval estimates - III
Obtain an interval estimate for the mean income of all females aged 25-34. Call this μ.
• The point estimate of μ is the sample mean, x̄ = 33.3
• Since n = 55 is reasonably large, the Central Limit Theorem will be used. Thus, x̄ is normally distributed with mean μ and standard deviation σ_x̄ = σ/√n
• Select the 95% confidence level, as requested.
• In a normal distribution, Z = -1.96 to Z = 1.96 has 95% of the area under the curve or of the sample means.
• The 95% interval estimate is x̄ - 1.96 σ/√n to x̄ + 1.96 σ/√n
• In this example, s is used as an estimate of σ.
• The interval is 33.3 - 1.96(13.5/√55) to 33.3 + 1.96(13.5/√55)
• The margin of error is ±3.6 and the 95% interval estimate of μ is (29.7, 36.9) thousand dollars.
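The calculation for the 25-34 age group can be checked with a few lines of Python. This is a minimal sketch: the function name `interval_estimate` is made up for the illustration, and Z = 1.96 is passed in directly to match the slide.

```python
from math import sqrt

def interval_estimate(xbar, s, n, z=1.96):
    """Large-sample interval estimate of a population mean,
    using s as the estimate of sigma."""
    margin = z * s / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

# Ages 25-34: mean 33.3, standard deviation 13.5 (thousand dollars), n = 55.
margin, (lower, upper) = interval_estimate(xbar=33.3, s=13.5, n=55)
print(f"margin of error = ±{margin:.1f}")
print(f"95% interval estimate = ({lower:.1f}, {upper:.1f}) thousand dollars")
```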

Example of interval estimates - IV

              Income in thousands of dollars
Age group     Mean     Standard deviation     Sample size     Margin of error     95% interval estimates
25-34         33.3     13.5                   55              ±3.6                (29.7, 36.9)
35-44         40.3     20.9                   57              ±5.4                (34.9, 45.7)
45-54         45.1     25.7                   37              ±8.2                (36.9, 53.3)
55-64         40.1     25.9                   31              ±9.1                (31.0, 49.2)

• Explain why the margins of error differ as they do.
• Explain the pattern of mean income by age for all females of each age group, now that interval estimates are available.

Example of interval estimates - V
• The margin of error is greater when s is larger or n is smaller. All these interval estimates have the same Z = ±1.96 associated with the 95% confidence level. A larger confidence level produces a larger Z, a larger margin of error, and a wider interval.
• The intervals for each of the groups between ages 35 and 64 overlap a lot, meaning that there may not be differences in the mean income for all females of these ages. The intervals for the 45-54 and 25-34 age groups do not overlap, so it is fairly certain that the mean income of all females aged 25-34 is lower than that of all those aged 45-54.
• Note that the target or sample populations in this example are not really all Saskatchewan females of each age group, but only those employed full-time and full-year.

Interpretation of interval estimates
• The interval estimate is an interval of values around the sample mean x̄. We hope that this interval contains the population mean μ.
• With repeated random sampling, if a 95% confidence level is selected, the probability is 0.95 that the intervals contain the population mean μ. A particular interval may or may not contain μ, but the method employed here means that 95% of the intervals constructed this way cross the population mean μ. (For example, 95% confidence intervals for the two poor samples – samples 65 and 171 – in the 192 sample simulation do not contain the population mean). See following slide for an illustration of this.
• When reporting a confidence interval, make sure you report both the interval and the confidence level. One without the other is meaningless.
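The repeated-sampling interpretation can be illustrated with a simulation. This sketch is not the 192 sample simulation referred to above. It assumes a normal population with μ = 2352 and σ = 1485 (borrowed from the earlier example) and treats σ as known. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, num_samples, z=1.96, seed=2008):
    """Proportion of interval estimates (sample mean ± z * sigma/sqrt(n))
    that contain mu, over repeated random samples from an assumed
    normal population."""
    rng = random.Random(seed)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(num_samples):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples

share = coverage(mu=2352, sigma=1485, n=50, num_samples=1000)
print(f"share of intervals containing mu: {share:.3f}")
```

The share should be close to 0.95. The remaining intervals come from samples whose means happen to fall far from μ.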

Thus. where n < 30. it is acceptable to use s as an estimate of σ for purposes of constructing an interval estimate. the t-distribution should be used.Determination of σ • In order to construct an interval estimate. as n increases. For a small sample. it is necessary to obtain some estimate of σ.1 and 8. In addition. • In sections 8. the t-distribution approaches the normal distribution. ASW distinguish methods for when σ is known and unknown. . the variability of the population from which the sample is drawn.2. so long as n > 30. again using s as an estimate of σ . In practice σ is rarely known and in note 1. assume the CLT holds and assume s provides a reasonable estimate of σ. ASW state this. 299. This is required to obtain an    estimate of the standard error of the sample mean x n • Generally. For large sample size. p. the sample standard deviation s is used as an estimate of σ.

Selecting a confidence level
• There is no one confidence level that is appropriate for all circumstances.
• By tradition, the default level is 95%.
• Greater confidence level means greater certainty that the interval estimate of µ actually contains µ. But for 99% or 99.9% confidence level, the interval may be very wide.
• Smaller confidence levels (e.g. 80% or 90%) produce smaller margins of error and seemingly more precise interval estimates, but they are less likely to contain µ.
• Issues such as manufacturing products to be safe for human use, e.g. foods, should require high confidence levels (99.9%+). But this may increase costs of manufacture and checking for safety.
• Use the level requested or the level others have used when researching similar issues.
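The trade-off between confidence level and width can be seen by computing Z and the margin of error at several levels. The sketch below reuses s = 13.5 and n = 55 from the 25-34 age group in the income example. The helper name `z_for` is made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

s, n = 13.5, 55    # 25-34 age group, income in thousands of dollars

def z_for(confidence):
    """Z-value with the given central area under the standard normal curve."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

print("level    Z       margin of error")
for level in (0.80, 0.90, 0.95, 0.99, 0.999):
    z = z_for(level)
    print(f"{level:<8.1%} {z:<7.3f} ±{z * s / sqrt(n):.1f}")
```

Each step up in confidence level gives a larger Z and therefore a wider interval for the same sample.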

Cautions about interval estimates
• There are many assumptions involved in interval estimation:
  – The sample is randomly selected from a population.
  – The sample size is sufficiently large to use the CLT.
  – The population is not too skewed (note 2, ASW, 308).
  – The population standard deviation is known or s is a good estimate of σ.
  – The selection of a confidence level is an arbitrary process.
• As a result, interval estimates are not precise, but are estimates or approximations.
• Larger n, repeated sampling, comparisons with other studies, and careful sampling and survey design and practice can improve the quality of the estimates.

Next week
• t-distribution (ASW, sections 8.1, 8.2).
• Sample size (ASW, section 8.3)
• Interval estimates for proportions (ASW, sections 6.3, 7.6, 8.4).
• Extra office hour – Friday, October 10, 1-3 p.m., CL 237.