STATISTICAL ESTIMATION AND SMALL SAMPLING THEORIES

C5606/4/ 1

UNIT 4

STATISTICAL ESTIMATION AND SMALL SAMPLING THEORIES

OBJECTIVES

GENERAL OBJECTIVE
Use statistics to make estimates of parameters.

SPECIFIC OBJECTIVES
After completing this unit, you should be able to:
- Find the confidence interval for the mean when $\sigma$ is known.
- Determine the minimum sample size for finding a confidence interval for the mean.
- Find the confidence interval for the mean when $\sigma$ is unknown and $n < 30$.
- Estimate population parameters from a large sample using point and interval estimates, and explain the concept of a confidence interval.
- Estimate the mean of the population when the standard deviation of the population is known.
- Estimate the mean and standard deviation of a population from sample data.
- Estimate the mean of a population based on a small sample size.

INPUT

4.0 CONFIDENCE INTERVAL AND SAMPLE SIZE

One aspect of inferential statistics is estimation, which is the process of estimating the value of a parameter from information obtained from a sample. Look at the following statements:
- One out of four Polytechnic students is currently dieting.
- 72% of Malaysians cannot afford to buy a brand new Mercedes Benz.
- The average kindergarten student has seen more than 5000 hours of television.
- The average amount of pocket money for a Poly student is RM500 per semester.

Since the populations from which these values were obtained are large, these values are only estimates of the true parameters and are derived from data collected from samples. The statistical procedures for estimating the population mean, variance and standard deviation will be explained in this module.

An important question in estimation is that of sample size. How large should the sample be in order to make an accurate estimate?

4.1 CONFIDENCE INTERVALS FOR THE MEAN ($\sigma$ KNOWN OR $n \ge 30$)

Suppose a Poly director wishes to estimate the average age of the students attending classes this semester. The director could select a random sample of 100 students and find the average age of these students, say 22.3 years. From the sample mean, the director could infer that the average age of all the students is 22.3 years. This type of estimate is called a point estimate. A point estimate is a specific numerical value used as an estimate of a parameter. The best point estimate of the population mean $\mu$ is the sample mean $\bar{X}$.

Sample measures (i.e., statistics) are used to estimate population measures (i.e., parameters). The sample mean is the best estimate of the population mean because the means of samples vary less than other statistics such as medians and modes when many samples are selected from the same population.

A good estimator should satisfy the three properties described next.

Three properties of a good estimator
1. The estimator should be an unbiased estimator. That is, the expected value or the mean of the estimates obtained from samples of a given size is equal to the parameter being estimated.
2. The estimator should be consistent. For a consistent estimator, as the sample size increases, the value of the estimator approaches the value of the parameter estimated.
3. The estimator should be a relatively efficient estimator. That is, of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has the smallest variance.

4.1.1 CONFIDENCE INTERVALS

As stated in the previous module, the sample mean will be, for the most part, somewhat different from the population mean due to sampling error. How good, then, is a point estimate? Because the accuracy of a point estimate is questionable, statisticians use another type of estimate called an interval estimate.

An interval estimate of a parameter is an interval or a range of values used to estimate the parameter. This estimate may or may not contain the value of the parameter being estimated. For example, an interval estimate for the average age of all the students might be $26.9 < \mu < 27.7$, or $27.3 \pm 0.4$ years. Either the interval contains the parameter or it does not.

A degree of confidence (usually expressed as a percentage) can be assigned before an interval estimate is made. For instance, one may wish to be 95% confident that the interval contains the true population mean. Another question then arises. Why 95%? Why not 99% or 99.5%? If one desires to be more confident (99% or 99.5%), then the interval must be larger. For example, a 99% confidence interval for the mean age of the Poly students might be $26.7 < \mu < 27.9$, or $27.3 \pm 0.6$ years. Hence, a trade-off occurs. To be more confident that the interval contains the true population mean, one must make the interval wider.

The confidence level of an interval estimate of a parameter is the probability that the interval estimate will contain the parameter. A confidence interval is a specific interval estimate of a parameter determined by using data obtained from a sample and by using the specific confidence level of the estimate.

Intervals constructed in this way are called confidence intervals. Three common confidence intervals are used: the 90%, the 95%, and the 99% confidence interval.

The central limit theorem states that when the sample size is large, approximately 95% of the sample means will fall within $\pm 1.96$ standard errors of the population mean. That is,

$\mu \pm 1.96 \left( \frac{\sigma}{\sqrt{n}} \right)$

Now, if a specific sample mean is selected, say $\bar{X}$, there is a 95% probability that it falls within the range of $\mu \pm 1.96 (\sigma / \sqrt{n})$. Likewise, there is a 95% probability that the interval specified by $\bar{X} \pm 1.96 (\sigma / \sqrt{n})$ will contain $\mu$. Stated another way,

$\bar{X} - 1.96 \left( \frac{\sigma}{\sqrt{n}} \right) < \mu < \bar{X} + 1.96 \left( \frac{\sigma}{\sqrt{n}} \right)$

Hence, one can be 95% confident that the population mean is contained within that interval when the values of the variable are normally distributed in the population. Since other confidence intervals are used in statistics, the symbol $z_{\alpha/2}$ is used in the general formula for confidence intervals. The Greek letter $\alpha$ (alpha) represents the total area in both of the tails of the standard normal distribution curve, and $\alpha/2$ represents the area in each one of the tails.

The relationship between $\alpha$ and the confidence level is that the stated confidence level is the percentage equivalent to the decimal value of $1 - \alpha$, and vice versa. When the 95% confidence interval is to be found, $\alpha = 0.05$, since $1 - 0.05 = 0.95$. When $\alpha = 0.01$, then $1 - \alpha = 1 - 0.01 = 0.99$, and the 99% confidence interval is being calculated.
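The meaning of "95% confident" can be checked by simulation. The sketch below is illustrative only: the function name `coverage`, the population values (mean 22.3, standard deviation 2.0), the sample size and the number of trials are all assumptions chosen for the demonstration, not values prescribed by this unit. It draws many samples from a normal population, builds the interval $\bar{X} \pm z_{\alpha/2}(\sigma/\sqrt{n})$ for each one, and reports the fraction of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=22.3, sigma=2.0, n=50, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    All default values are hypothetical and serve only as an illustration.
    """
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / n ** 0.5        # z_{alpha/2} * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials


print(coverage())             # should be close to 0.95
print(coverage(level=0.99))   # wider intervals, so the fraction is higher
```

The reported fraction should land near the stated confidence level, which is the long-run sense in which an interval "contains the parameter 95% of the time".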

Formula for the Confidence Interval of the Mean for a Specific $\alpha$

$\bar{X} - z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) < \mu < \bar{X} + z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$

For a 95% confidence interval, $z_{\alpha/2} = 1.96$; and for a 99% confidence interval, $z_{\alpha/2} = 2.58$.

The term $z_{\alpha/2} (\sigma / \sqrt{n})$ is called the maximum error of estimate. For a specific value, say $\alpha = 0.05$, 95% of the sample means will fall within this error value on either side of the population mean. The maximum error of estimate is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.
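The formula can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `z_interval` is hypothetical, and the critical value is computed with the standard library's `statistics.NormalDist` rather than read from a table, so it is slightly more precise than the rounded table values 1.96 and 2.58. The sample call uses the figures from item 1 of Example 4.1 (mean 23.2, $\sigma = 2$, $n = 50$).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, level):
    """Return (lower, upper, E) for the mean when sigma is known or n >= 30.

    E is the maximum error of estimate, z_{alpha/2} * sigma / sqrt(n).
    """
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    e = z * sigma / sqrt(n)
    return x_bar - e, x_bar + e, e


lower, upper, e = z_interval(x_bar=23.2, sigma=2, n=50, level=0.95)
print(round(lower, 1), round(upper, 1), round(e, 1))
```

Raising `level` increases $z_{\alpha/2}$ and widens the interval, while raising `n` shrinks the maximum error of estimate.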

Example 4.1

1. One of the Polytechnic directors wishes to estimate the average age of the students currently enrolled. From last year's records, it is known that the standard deviation is 2 years. A sample of 50 students is selected, and the mean age is 23.2 years. Find the 95% confidence interval of the population mean.

2. A well known tonic drink is known to increase the pulse rate of its users. The standard deviation of the pulse rate is known to be 5 beats per minute. A sample of 30 users had an average pulse rate of 104 beats per minute. Find the 99% confidence interval of the true mean.

3. A sample of 30 koperasi has the mean $\bar{X} = 11.091$ (assets in millions of RM) and the standard deviation $s = 14.405$. Find the 90% confidence interval of the mean.

4. It is required to determine the mean diameter of a long length of wire. The diameter of the wire is measured in 16 places selected at random throughout its length and the mean of these values is 0.314 mm. If the standard deviation of the diameter of the wire is given by the manufacturers as 0.025 mm, determine (a) the 80% confidence interval of the estimated mean diameter of the wire, and (b) with what degree of confidence it can be said that the mean diameter is $0.314 \pm 0.01$ mm.

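Solution to Example 4.1

1. Since the 95% confidence interval is desired, $z_{\alpha/2} = 1.96$. Hence, substituting in the formula

$\bar{X} - 1.96 \left( \frac{\sigma}{\sqrt{n}} \right) < \mu < \bar{X} + 1.96 \left( \frac{\sigma}{\sqrt{n}} \right)$

$23.2 - 1.96 \left( \frac{2}{\sqrt{50}} \right) < \mu < 23.2 + 1.96 \left( \frac{2}{\sqrt{50}} \right)$

$23.2 - 0.6 < \mu < 23.2 + 0.6$

$22.6 < \mu < 23.8$, or $23.2 \pm 0.6$ years.

The director can say, with 95% confidence, that the average age of the students is between 22.6 and 23.8 years, based on 50 students.

2. Since the 99% confidence interval is desired, $z_{\alpha/2} = 2.58$. Substituting in the formula

$\bar{X} - 2.58 \left( \frac{\sigma}{\sqrt{n}} \right) < \mu < \bar{X} + 2.58 \left( \frac{\sigma}{\sqrt{n}} \right)$

$104 - 2.58 \left( \frac{5}{\sqrt{30}} \right) < \mu < 104 + 2.58 \left( \frac{5}{\sqrt{30}} \right)$

$104 - 2.4 < \mu < 104 + 2.4$

$101.6 < \mu < 106.4$, which rounds to $102 < \mu < 106$, or $104 \pm 2$.

One can be 99% confident that the mean pulse rate of all users is between 102 and 106 beats per minute, based on a sample of 30 users.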

3. STEP 1 It is known that the mean $\bar{X}$ is 11.091 and the standard deviation $s$ is 14.405.

STEP 2 Find $\alpha/2$. Since the 90% confidence interval is to be used, $\alpha = 1 - 0.90 = 0.10$, and $\alpha/2 = 0.10/2 = 0.05$.

STEP 3 Find $z_{\alpha/2}$. Subtract 0.05 from 0.5000 to get 0.4500. The corresponding $z$ from the table is 1.65.

STEP 4 Substitute in the formula

$\bar{X} - z_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) < \mu < \bar{X} + z_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$

($s$ is used in place of $\sigma$ when $\sigma$ is unknown, since $n \ge 30$)

$11.091 - 1.65 \left( \frac{14.405}{\sqrt{30}} \right) < \mu < 11.091 + 1.65 \left( \frac{14.405}{\sqrt{30}} \right)$

$6.752 < \mu < 15.430$

Hence, one can be 90% confident that the population mean of the assets is between RM6.752 million and RM15.430 million, based on a sample of 30 koperasi.

4. (a) For the population, $\sigma = 0.025$ mm; for the sample, $n = 16$ and $\bar{X} = 0.314$ mm. Because an infinite number of measurements can be obtained for the diameter of the wire, the population is infinite and the estimated value of the confidence interval of the population mean is given by

$\bar{X} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) = 0.314 \pm \frac{1.28 (0.025)}{\sqrt{16}} = 0.314 \pm 0.008$ mm.

That is, the 80% confidence interval is from 0.306 mm to 0.322 mm. This indicates that the estimated mean diameter of the wire is between 0.306 mm and 0.322 mm and that this prediction is likely to be correct 80 times out of 100.

(b) To determine the confidence level, the given data is equated to the expression for the interval:

$0.314 \pm 0.01 \text{ mm} = \bar{X} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$

so that $z_{\alpha/2} \left( \frac{0.025}{\sqrt{16}} \right) = 0.01$, i.e. $z_{\alpha/2} = 1.6$. From the standard normal table, the area between $z = 0$ and $z = 1.6$ is 0.4452, so the confidence level is $2 \times 0.4452 = 0.8904$, or about 89%.
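Part (b) runs the interval formula backwards: the margin is given and the confidence level is the unknown. A minimal sketch of that calculation follows, assuming a hypothetical helper named `confidence_for_margin` and using the standard library's normal distribution in place of the printed table. The two-sided confidence level for a critical value $z$ is $2\Phi(z) - 1$, where $\Phi$ is the standard normal cumulative distribution function.

```python
from math import sqrt
from statistics import NormalDist


def confidence_for_margin(e, sigma, n):
    """Return (z, level) such that e = z * sigma / sqrt(n)."""
    z = e * sqrt(n) / sigma              # solve the margin formula for z
    level = 2 * NormalDist().cdf(z) - 1  # area between -z and +z
    return z, level


# Wire data from Example 4.1, item 4(b): margin 0.01 mm, sigma 0.025 mm, n = 16
z, level = confidence_for_margin(e=0.01, sigma=0.025, n=16)
print(round(z, 2), round(level, 3))
```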

ACTIVITY 4A

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT!

1. What is the difference between a point estimate and an interval estimate of a parameter? Which is better? Why?
2. What is the maximum error of estimate?
3. What are the three properties of a good estimator?
4. What is necessary to determine a sample size?
5. Find each:
   a) $z_{\alpha/2}$ for the 99% confidence interval
   b) $z_{\alpha/2}$ for the 98% confidence interval
   c) $z_{\alpha/2}$ for the 95% confidence interval
   d) $z_{\alpha/2}$ for the 90% confidence interval
   e) $z_{\alpha/2}$ for the 94% confidence interval
6. A sample of the mathematics test scores for 35 first semester students has a mean of 82. The standard deviation of the sample is 15.
   a) Find the 95% confidence interval of the mean test score of all first semester students.
   b) Find the 99% confidence interval of the mean test score of all first semester students.
   c) Which interval is larger? Explain why.

SOLUTION TO ACTIVITY 4A

1. A point estimate of a parameter specifies a specific value, such as $\mu = 87$; an interval estimate specifies a range of values for the parameter, such as $84 < \mu < 90$. The advantage of an interval estimate is that a specific confidence level (say 95%) can be selected, and one can be 95% confident that the interval contains the parameter that is being estimated.
2. The maximum error of estimate is the likely range of values to the right or left of the statistic which may contain the parameter.
3. A good estimator should be unbiased, consistent, and relatively efficient.
4. For one to be able to determine sample size, the maximum error of estimate and degree of confidence must be specified and the population standard deviation must be known.
5. a) 2.58 b) 2.33 c) 1.96 d) 1.65 e) 1.88
6. a) $77 < \mu < 87$ b) $75 < \mu < 89$ c) The 99% confidence interval is larger because the confidence level is larger.

INPUT

4.2 SAMPLE SIZE

Sample size determination is closely related to statistical estimation. How large a sample is necessary to make an accurate estimate? The answer depends on three things: the maximum error of estimate, the population standard deviation, and the degree of confidence. For the purpose of this unit, it is assumed that the population standard deviation of the variable is known or has been estimated from a previous study.

The formula for sample size is derived from the maximum error of estimate formula,

$E = z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$

and this formula is solved for $n$ as follows:

$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$

If the result is not a whole number, it is rounded up to the next whole number.

Example 4.2
You are asked to estimate the average age of the students in this Poly. How large a sample is necessary? You want to be 99% confident that the estimate is accurate within one year. From a previous study, the standard deviation of the ages is known to be 3 years.

Solution to Example 4.2

Since $\alpha = 0.01$ (or $1 - 0.99$), $z_{\alpha/2} = 2.58$, and $E = 1$, substituting in the formula, you get

$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 = \left( \frac{(2.58)(3)}{1} \right)^2 = 59.9$

which is rounded up to 60. You therefore need a sample of at least 60 students in order to be 99% confident that the estimate is within 1 year of the true mean age.
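The same calculation can be sketched in Python. The function name `min_sample_size` is hypothetical, and the critical value is passed in explicitly so that the table value (2.58) can be used exactly as in the worked solution. Rounding up with `ceil` reflects the rule that a fractional result always moves to the next whole number.

```python
from math import ceil


def min_sample_size(sigma, e, z):
    """Smallest n with z * sigma / sqrt(n) <= e, i.e. n = (z * sigma / e)^2 rounded up."""
    return ceil((z * sigma / e) ** 2)


# Example 4.2: sigma = 3 years, E = 1 year, 99% confidence (table value z = 2.58)
print(min_sample_size(sigma=3, e=1, z=2.58))
```

Because $E$ appears squared in the denominator, halving the allowed error roughly quadruples the required sample size.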

ACTIVITY 4B

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT!

1. A study of 40 poly lecturers showed that they spent, on average, 12.6 minutes correcting a student's weekly quiz.
   a) Find the 90% confidence interval of the mean time for all quizzes when $\sigma = 2.5$ minutes.
   b) If a lecturer stated that he spent, on average, 30 minutes correcting a quiz, what would be your reaction?
2. A study found that Poly students spend an average of RM185.00 per month on their cellular phone bills. If a sample of 49 students was used, find the 90% confidence interval of the mean. Assume the standard deviation of the sample is RM1.56.
3. The mean weight of 84 soil samples is 61.2 grams and the standard deviation is 7.9 grams. Find the 95% confidence interval for the true mean.
4. A poly director wishes to estimate the average number of hours his part-time lecturers teach per week. The standard deviation from a previous study is 2.6 hours. How large a sample must be selected if he wants to be 99% confident that the true mean differs from the sample mean by no more than 1 hour?
5. You are required to estimate the fresh weights of concrete cubes. How large a sample must be selected if you are required to be 90% confident that the true mean is within 600 grams of the sample mean? The standard deviation of the fresh weights is known to be 800 grams.
6. A class lecturer would like to estimate the average number of sick days that students use per year. It is assumed that the standard deviation is 2.5 days. How large a sample must be selected if the lecturer wants to be 95% confident of getting an interval that contains the true mean with a maximum error of 1 day?

SOLUTION TO ACTIVITY 4B

1. (a) $11.9 < \mu < 13.3$
   (b) It would be highly unlikely, since this is far larger than 13.3 minutes.
2. RM184.63 < $\mu$ < RM185.37
3. $59.5 < \mu < 62.9$
4. 45
5. 5
6. 25

INPUT

4.3 CONFIDENCE INTERVALS FOR THE MEAN ($\sigma$ UNKNOWN AND $n < 30$)

When $\sigma$ is known and the variable is normally distributed, or when $\sigma$ is unknown and $n \ge 30$, the standard normal distribution is used to find confidence intervals for the mean. However, in many situations, the population standard deviation is not known and the sample size is less than 30. In such situations, the standard deviation from the sample can be used in place of the population standard deviation for confidence intervals. But a somewhat different distribution, called the t distribution, must be used when the sample size is less than 30 and the variable is normally or approximately normally distributed.

Characteristics of the t Distribution

The t distribution shares some characteristics of the normal distribution and differs from it in others. The t distribution is similar to the standard normal distribution in the following ways.
1. It is bell-shaped.
2. It is symmetrical about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x-axis.

The t distribution differs from the standard normal distribution in the following ways.
1. The variance is greater than 1.
2. The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to the sample size.
3. As the sample size increases, the t distribution approaches the standard normal distribution.

See the figure below.

Many statistical distributions use the concept of degrees of freedom, and the formulas for finding the degrees of freedom vary for different statistical tests. The degrees of freedom are the number of values that are free to vary after a sample statistic has been computed, and they tell the researcher which specific curve to use when a distribution consists of a family of curves. For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary. But once 4 values are selected, the 5th value must be a specific number to get a sum of 50, since 50/5 = 10. Hence, the degrees of freedom are $5 - 1 = 4$, and this value tells the researcher which t curve to use.

The symbol d.f. will be used for degrees of freedom. The d.f. for a confidence interval for the mean are found by subtracting 1 from the sample size, i.e. d.f. = $n - 1$.

Formula for a Specific Confidence Interval for the Mean When $\sigma$ Is Unknown and $n < 30$

$\bar{X} - t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) < \mu < \bar{X} + t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$

The degrees of freedom are $n - 1$.

Notes: When to use the z or t distribution

Is $\sigma$ known?
   Yes: Use $z_{\alpha/2}$ values no matter what the sample size is.
   No: Is $n \ge 30$?
      Yes: Use $z_{\alpha/2}$ values and $s$ in place of $\sigma$ in the formula.
      No: Use $t_{\alpha/2}$ values and $s$ in the formula.
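The decision rule can be written as a small function. This is only a sketch of the rule as stated in this unit: the function name `choose_distribution` and its return strings are hypothetical, and the rule assumes, as the unit does, that the variable is at least approximately normally distributed when the sample is small.

```python
def choose_distribution(sigma_known: bool, n: int) -> str:
    """Return which critical value to use for a confidence interval of the mean."""
    if sigma_known:
        return "z with sigma"      # sigma known: z at any sample size
    if n >= 30:
        return "z with s"          # sigma unknown, large sample: z, s replaces sigma
    return "t with s"              # sigma unknown, small sample: t with n - 1 d.f.


print(choose_distribution(sigma_known=True, n=16))
print(choose_distribution(sigma_known=False, n=30))
print(choose_distribution(sigma_known=False, n=10))
```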

Example 4.3
1. Find the $t_{\alpha/2}$ value for a 95% confidence interval when the sample size is 22.
2. Ten randomly selected automobiles were stopped, and the tread depth of the right front tire was measured. The mean was 0.32 mm, and the standard deviation was 0.08 mm. Find the 95% confidence interval of the mean depth. Assume that the variable is approximately normally distributed.
3. The data represent a sample of the number of home fires started by candles for the past several years. Find the 99% confidence interval for the mean number of home fires started by candles each year.

   5460 9930 5900 6090 6310 7160 8440

Solution to Example 4.3

1. d.f. = 22 - 1, or 21. Find 21 in the left column and 95% in the row labeled confidence intervals. The intersection where the two meet gives the value for $t_{\alpha/2}$, which is 2.080. See the figure below.

Note: At the bottom of the table where d.f. = $\infty$, the $z_{\alpha/2}$ values can be found for specific confidence intervals. The reason is that as the degrees of freedom increase, the t distribution approaches the standard normal distribution.

2. Since $\sigma$ is unknown and $s$ must replace it, the t distribution (see table F) must be used for the 95% confidence interval. Hence, with 9 degrees of freedom, $t_{\alpha/2} = 2.262$. The 95% confidence interval of the population mean is found by substituting in the formula

$\bar{X} - t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) < \mu < \bar{X} + t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$

$0.32 - 2.262 \left( \frac{0.08}{\sqrt{10}} \right) < \mu < 0.32 + 2.262 \left( \frac{0.08}{\sqrt{10}} \right)$

$0.26 < \mu < 0.38$

Therefore, one can be 95% confident that the population mean tread depth of all right front tires is between 0.26 and 0.38 mm based on a sample of 10 tires.

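3. STEP 1 Find the mean and standard deviation for the data. Use the formulas or your calculator.
   The mean $\bar{X} = 7041.4$
   The standard deviation $s = 1610.3$

STEP 2 Find $t_{\alpha/2}$ from table F. Use the 99% confidence interval with d.f. = 6. It is 3.707.

STEP 3 Substitute in the formula and solve.

$\bar{X} - t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) < \mu < \bar{X} + t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$

$7041.4 - 3.707 \left( \frac{1610.3}{\sqrt{7}} \right) < \mu < 7041.4 + 3.707 \left( \frac{1610.3}{\sqrt{7}} \right)$

$4785.2 < \mu < 9297.6$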

One can be 99% confident that the population mean of home fires started by candles each year is between 4785.2 and 9297.6, based on a sample of home fires occurring over a period of 7 years.
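The three steps can be reproduced from the raw data with a short Python sketch. The function name `t_interval` is hypothetical. Python's standard library has no t table, so the critical value 3.707 (99%, d.f. = 6) is entered by hand from the table, exactly as in STEP 2; `statistics.stdev` computes the sample standard deviation with the $n - 1$ divisor.

```python
from math import sqrt
from statistics import mean, stdev

# Home fires started by candles, one value per year (data from Example 4.3, item 3)
fires = [5460, 9930, 5900, 6090, 6310, 7160, 8440]
T_99_DF6 = 3.707   # t_{alpha/2} read from the t table: 99% confidence, d.f. = 6


def t_interval(data, t_crit):
    """Return (lower, upper) for the mean using a table value t_crit with n - 1 d.f."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                  # sample standard deviation (divides by n - 1)
    e = t_crit * s / sqrt(n)         # maximum error of estimate
    return x_bar - e, x_bar + e


lower, upper = t_interval(fires, T_99_DF6)
print(round(lower, 1), round(upper, 1))
```

Small differences in the last digit compared with the hand calculation can arise because the worked solution rounds the mean and standard deviation before substituting.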

ACTIVITY 4C

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT!

For the following activities, assume that all variables are approximately normally distributed.

1. A sample of 8 measurements of the diameter of a bar is made and the mean of the sample is 2.470 cm. The standard deviation of the sample is 0.21 mm. Determine (a) the 95% confidence interval and (b) the 80% confidence interval for an estimate of the actual diameter of the bar.
2. A sample of 15 electric lamps is selected randomly from a large batch and the lamps are tested until they fail. The mean and standard deviation of the time to failure are 1177 hours and 25 hours respectively. Determine the confidence level based on an estimated failure time of $1177 \pm 5.8$ hours.
3. The value of the ultimate tensile strength of a material is determined by measurements on 10 samples of the material. The mean and standard deviation of the results are found to be 4.38 MPa and 0.06 MPa respectively. Determine the 95% confidence interval for the mean of the ultimate tensile strength of the material.
4. Use the data in problem 3 above to determine the 99% confidence interval for the mean of the ultimate tensile strength of the material.
5. The time taken for a chemical reaction to take place is measured 5 times and is found to be: 0.28 hours, 0.30 hours, 0.27 hours, 0.33 hours and 0.31 hours. Determine the 95% and 99% confidence intervals for the estimated true reaction time.

SOLUTION TO ACTIVITY 4C

1. (a) The 95% confidence interval is from 2.455 cm to 2.485 cm. (b) The 80% confidence interval is from 2.463 cm to 2.477 cm.
2. It is likely that 80% of all the lamps will fail between 1171.2 and 1182.8 hours ($t_{\alpha/2} = 0.868$).
3. $4.343 < \mu < 4.417$
4. $4.324 < \mu < 4.436$
5. $0.275 < \mu < 0.321$; $0.258 < \mu < 0.338$

SELF ASSESSMENT 4

You are approaching success. Try all the questions in this self-assessment section and check your answers on the next page. If you encounter any problems, consult your instructor. Good luck.

1. When should the t distribution be used to find a confidence interval for the mean?
2. Determine whether each statement is true or false. If the statement is false, explain why.
   a) Interval estimates are preferred over point estimates since a confidence level can be specified.
   b) An estimator is consistent if, as the sample size decreases, the value of the estimator approaches the value of the parameter estimated.

Select the best answer.

3. When a 99% confidence interval is calculated instead of a 95% confidence interval with n being the same, the maximum error of estimate will be
   a. Smaller
   b. Larger
   c. The same
   d. It cannot be determined

4. When the population standard deviation is unknown and the sample size is less than 30, what table value should be used in computing a confidence interval for a mean?
   a. z
   b. t
   c. None of the above

Complete the following statements with the best answer.

5. The maximum difference between the point estimate of a parameter and the actual value of the parameter is called __________________________________.
6. The three confidence intervals used most often are the ____%, ____%, and ____%.
7. The specific resistance of a reel of German silver wire of nominal diameter 0.5 mm is estimated by determining the resistance of 7 samples of the wire. These were found to have resistance values (in ohms per meter) of 1.12, 1.15, 1.10, 1.14, 1.15, 1.10 and 1.11. Determine the 95% confidence interval for the true specific resistance of the reel of wire.
8. In determining the melting point of a metal, 5 determinations of the melting point are made. The mean and standard deviation of the five results are 232.27 °C and 0.742 °C. Calculate the confidence with which the prediction that the melting point of the metal is between 231.48 °C and 233.06 °C can be made.
9. The standard deviation of the masses of 500 blocks is 150 kg. A random sample of 40 blocks has a mean mass of 2.40 Mg.
   a) Determine the 95% and 99% confidence intervals for estimating the mean mass of the remaining 460 blocks.
   b) With what degree of confidence can it be said that the mean mass of the remaining 460 blocks is $2.40 \pm 0.035$ Mg?

In the following exercises, assume that all variables are approximately normally distributed.

10. The average hemoglobin for a sample of 20 lecturers was 16 grams per 100 milliliters, with a sample standard deviation of 2 grams. Find the 99% confidence interval of the true mean.
11. A sample of 6 adult elephants had an average weight of 12200 pounds, with a sample standard deviation of 200 pounds. Find the 95% confidence interval of the true mean.
12. A recent study of 28 city residents showed that the mean of the time they had lived at their present address was 9.3 years. The standard deviation of the sample was 2 years. Find the 90% confidence interval of the true mean.
13. A recent study of 25 students showed that they spent an average of RM18.53 for petrol per week. The standard deviation of the sample was RM3.00. Find the 95% confidence interval of the true mean.
14. The average yearly income for 28 married couples in Politeknik is RM58219.00. The standard deviation of the sample is RM56.00. Find the 95% confidence interval of the true mean.

FEEDBACK TO SELF ASSESSMENT 4

Have you tried the questions??? If YES, check your answers now.

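1. The t distribution should be used when $\sigma$ is unknown and $n < 30$.
2. (a) True (b) False. For a consistent estimator, the value of the estimator approaches the value of the parameter as the sample size increases, not decreases.
3. b
4. b
5. Maximum error of estimate
6. 90; 95; 99
7. $1.11\ \Omega\,\text{m}^{-1} < \mu < 1.14\ \Omega\,\text{m}^{-1}$
8. 95%
9. (a) $2.355 < \mu < 2.445$; $2.341 < \mu < 2.459$ (b) 86%
10. $15 < \mu < 17$
11. $11990 < \mu < 12410$
12. $8.7 < \mu < 9.9$
13. RM17.29 < $\mu$ < RM19.77
14. RM58197.00 < $\mu$ < RM58241.00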