CHAPTER 1

PARAMETER ESTIMATION


INTRODUCTION
Parameter estimation is the first step in inferential statistics. It is the process of estimating the value of a parameter using information obtained from a sample. The process of acquiring information from samples and using that information to draw conclusions about populations is called statistical inference. To do statistical inference, we require the skills and knowledge of descriptive statistics, probability distributions, and sampling distributions. The process is summarized in Figure 1.1.

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. There are two approaches to parameter estimation:
i) Point estimation. A point estimate is a single value, so it either equals the true value exactly or misses it. Note that true value = parameter value.
ii) Interval estimation.

An estimator is the statistic used to obtain the point estimate. An estimate is a specific value or range of values used to approximate a population parameter.

Why do we estimate? Can we get the exact value from the population?

[Figure 1.1: Relationship between parameter and statistic. Population (e.g. all UUM students) → census (collect information/data) → population measurement → PARAMETER. Sampling process (samples are taken at random) → sample, a part of the population units (e.g. a number of UUM students) → survey (collect information/data) → sample measurement → STATISTIC. The statistic is used to estimate the parameter.]

POINT ESTIMATION
A point estimate is a specific numerical value of a parameter, that is, a single value (or point) used to approximate a population parameter. A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.

Table 1.1: Symbols for parameters and statistics

Measure              Parameter    Statistic/estimator
Mean                 $\mu$        $\bar{x}$
Variance             $\sigma^2$   $s^2$
Standard deviation   $\sigma$     $s$
Proportion           $p$          $\hat{p}$

Table 1.2: Formulas for statistics

Statistic/estimator         Formula
Sample mean                 $\bar{x} = \frac{\sum x}{n}$
Sample variance             $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation   $s = \sqrt{s^2}$
Sample proportion           $\hat{p} = \frac{x}{n}$

Characteristics of a Good Estimator
The objective of each characteristic of a good estimator is to obtain an estimator whose sampling distribution is centered on the parameter being estimated. The characteristics are:
- unbiasedness
- consistency
- relative efficiency

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. An estimator $\hat{\theta}$ is an unbiased estimator for the parameter $\theta$ if $E(\hat{\theta}) = \theta$.
E.g. the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$, since $E(\bar{x}) = \mu$.

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.
E.g. $\bar{x}$ is a consistent estimator of $\mu$ because $V(\bar{x}) = \frac{\sigma^2}{n}$. That is, as $n$ grows larger, the variance of $\bar{x}$ grows smaller.

If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.
E.g. for a symmetric population, both the sample median $\tilde{x}$ and the sample mean $\bar{x}$ are unbiased estimators of the population mean. However, according to the variances, we choose the sample mean $\bar{x}$, since it is relatively efficient compared to the sample median $\tilde{x}$.
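The consistency and relative efficiency properties can be checked with a small simulation. The sketch below is only an illustration under assumed settings (a standard normal population, 2000 repeated samples, a fixed seed); the function name `estimator_variances` is hypothetical and not part of the notes.

```python
import random
import statistics


def estimator_variances(n, reps=2000, seed=1):
    """Simulate the sampling variance of the mean and the median.

    Assumption: samples come from a standard normal population
    (mu = 0, sigma = 1), chosen only for illustration.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


# Relative efficiency: for the same n, the mean varies less than the median.
var_mean_25, var_median_25 = estimator_variances(n=25)

# Consistency: the variance of the mean shrinks as n grows (sigma^2 / n).
var_mean_100, var_median_100 = estimator_variances(n=100)

print(f"n=25:  var(mean)={var_mean_25:.4f}, var(median)={var_median_25:.4f}")
print(f"n=100: var(mean)={var_mean_100:.4f}, var(median)={var_median_100:.4f}")
```

With these settings the simulated variance of the mean should be close to $\sigma^2/n$, and smaller than that of the median.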

Example 1: A sample of 10 frogs was taken at random and the weight (in grams) of each frog was recorded as given below:
250 230 190 200 210 195 225 200 230 240
1. compute the point estimate for the mean weight of the frogs
2. estimate the standard deviation for the weight of the frogs
3. estimate the proportion of frogs that have weight not more than 200 grams

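Solution:
i. Let $\bar{x}$ represent the mean weight of the frogs. Then $\bar{x} = \frac{\sum x}{n} = \frac{2170}{10} = 217$ grams.
ii. The estimate of the standard deviation for the weight of the frogs is given by $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{3860}{9}} \approx 20.71$ grams.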

Solution:
iii. Let $x$ be the number of frogs with weight not more than 200 grams and $\hat{p}$ be the point estimate for the proportion of frogs with weight not more than 200 grams. The frogs with weight not more than 200 grams are 200, 200, 195 and 190. Thus, $x = 4$. Then, $\hat{p} = \frac{x}{n} = \frac{4}{10} = 0.4$.
Note: the answers for questions i) and ii) can be found directly from your calculator using the SD mode. Those who use a Casio calculator can refer to Appendix 1 for the complete procedure.
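As an alternative to the calculator, the three point estimates of Example 1 can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Weights (in grams) of the 10 frogs from Example 1.
weights = [250, 230, 190, 200, 210, 195, 225, 200, 230, 240]
n = len(weights)

# i. Point estimate of the mean weight (sample mean).
mean_weight = statistics.mean(weights)

# ii. Point estimate of the standard deviation (divides by n - 1).
sd_weight = statistics.stdev(weights)

# iii. Point estimate of the proportion of frogs weighing at most 200 g.
x = sum(1 for w in weights if w <= 200)
p_hat = x / n

print(mean_weight, round(sd_weight, 2), p_hat)
```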

Example 2: The ages of 15 students who came to the recreational club during last weekend are given below:
8 17 15 15 13 15 10 12 16 16 12 16 17 15 18
Calculate the point estimate of the:
i. average age of students
ii. variance of age of students
iii. proportion of students with age more than 15 years old.

Example 3: A research study has been done to determine the percentage of UUM's staff living in Jitra. From a sample of 200 randomly chosen people, 88 of them are living in Jitra.
i. Obtain the point estimate for the percentage of UUM's staff living in Jitra.
ii. Estimate the mean and the standard deviation of the proportion.

Interval Estimation
No matter how good the point estimator is, we have to admit that the point estimate can sometimes give a value which is 100% different from the true value. Besides that, point estimators do not reflect the effects of larger sample sizes. Thus, it is recommended to use an interval estimator, which is less precise but safer, to estimate population parameters. An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

The value of an interval estimator lies between lower and upper boundaries. Generally, we write the value as
Lower bound < population parameter < Upper bound
If $\hat{\theta}$ is the point estimate of the parameter $\theta$, then the interval estimate is
$$\hat{\theta} - k\,S(\hat{\theta}) < \theta < \hat{\theta} + k\,S(\hat{\theta})$$
where
$S(\hat{\theta})$ is the standard deviation of the estimator
$k$ is the critical value from the distribution of the estimator (the distribution can be defined based on the Central Limit Theorem)

Once the interval estimate is obtained, we can conclude (with some ___% of certainty) that the population parameter of interest is between some lower and upper bounds. In this section we will discuss the interval estimates for the mean and the proportion, as summarized in Figure 1.2.

[Figure 1.2: Interval estimation]

Interval estimation for mean
Generally, the interval estimator for one population mean is given by
$$\bar{x} - Z_{\alpha/2}\,S(\bar{x}) < \mu < \bar{x} + Z_{\alpha/2}\,S(\bar{x}) \quad \text{or} \quad \bar{x} \pm Z_{\alpha/2}\,S(\bar{x})$$
Note: the Z distribution can be replaced by the t distribution if the condition to use the Z distribution is not satisfied.

To determine whether to use the Z or t distribution, we have to follow the Central Limit Theorem.
[Figure 1.3: Central Limit Theorem]

Characteristics of the Z Distribution
When the standard deviation of the population is known or the sample size taken is more than or equal to 30, the normal Z distribution can be used.
[Figure 1.4: Condition to use normal Z distribution]

Characteristics of the t Distribution
1. When the population standard deviation is unknown and the sample size is less than 30, the t distribution with $n - 1$ degrees of freedom must be used instead of the Z distribution.
2. The degrees of freedom are the number of values that are free to vary after a sample statistic has been computed.

The t distribution differs from the standard normal distribution in the following ways.
i. The variance is greater than 1.
ii. The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size.
As the sample size increases, the t distribution approaches the standard normal distribution.

[Figure 1.5: The Z Normal and t distribution]

[Figure 1.6: t distribution with different degrees of freedom]

When to use the Z or t distribution?
[Figure 1.7: Criteria for choosing Z or t distribution]
Is the population standard deviation σ known?
- Yes: use the Z distribution no matter what the sample size is. *
- No: is the sample size n ≥ 30?
  - Yes: use the Z distribution and s in place of σ.
  - No: use the t distribution and s in the formula. **
* The variable must be normally distributed when n < 30.
** The variable must be approximately normally distributed.
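The decision rule of Figure 1.7 can be sketched as a small function. This is only an illustration: the function name `choose_distribution` is hypothetical, and the threshold is taken as n ≥ 30, following the statement that Z can be used when the sample size is "more than or equal to 30".

```python
def choose_distribution(sigma_known: bool, n: int) -> str:
    """Return "Z" or "t" following the criteria of Figure 1.7.

    Assumption: the population is (approximately) normal whenever
    n < 30, as required by the notes under the figure.
    """
    if sigma_known:
        # Known population standard deviation: Z for any sample size.
        return "Z"
    if n >= 30:
        # Unknown sigma but large sample: Z with s in place of sigma.
        return "Z"
    # Unknown sigma and small sample: t with n - 1 degrees of freedom.
    return "t"


print(choose_distribution(sigma_known=True, n=10))   # Z
print(choose_distribution(sigma_known=False, n=50))  # Z
print(choose_distribution(sigma_known=False, n=10))  # t
```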

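Therefore, the confidence interval for a mean has 3 formulas:
1. confidence interval for a mean with known population standard deviation
$$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$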

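2. confidence interval for a mean with unknown population standard deviation, sample size more than or equal to 30 ($n \ge 30$)
$$\bar{x} - Z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{S}{\sqrt{n}} \quad \text{or} \quad \bar{x} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$$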

3. confidence interval for a mean with unknown population standard deviation, sample size less than 30 ($n < 30$)
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \quad \text{or} \quad \bar{x} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$

The graphical view of the interval estimate:
[Figure 1.8: Graphical view of confidence interval, showing the width of the interval between the lower confidence limit (LCL) and the upper confidence limit (UCL)]

The probability $(1 - \alpha)$ is called the confidence level (or degree of confidence). The confidence level is the relative frequency of times the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times. $\alpha$ is called the significance level, or the probability that a Type I error will occur. There are four commonly used confidence levels:

Confidence level $(1 - \alpha)$   $\alpha$   $\alpha/2$   $Z_{\alpha/2}$
0.90                              0.10       0.05         1.6449
0.95                              0.05       0.025        1.9600
0.98                              0.02       0.01         2.3263
0.99                              0.01       0.005        2.5758
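The critical values $Z_{\alpha/2}$ for the four commonly used confidence levels can be recomputed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    # Z_{alpha/2} leaves an area of alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1 - alpha / 2)


z_table = {level: round(z_critical(level), 4) for level in (0.90, 0.95, 0.98, 0.99)}

for level, z in z_table.items():
    print(f"confidence {level:.2f}: Z = {z:.4f}")
```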

These are examples of critical values for the t distribution.

Confidence level $(1 - \alpha)$   0.90     0.95     0.98     0.99
$\alpha$                          0.10     0.05     0.02     0.01
$\alpha/2$                        0.05     0.025    0.01     0.005
df                                3        5        7        9
$t_{\alpha/2,\,df}$               2.3534   2.5706   2.9980   3.2498

Example 4: A computer company samples demand during lead time over 25 time periods:
235 421 394 261 386 374 361 439 374 316 309 514 348 302 296 499 462 344 466 332 253 369 330 535 334
It is known that the standard deviation of demand over lead time is 75 computers. Estimate the mean demand over lead time with 95% confidence level in order to set inventory levels.
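Since the population standard deviation is known in Example 4, the interval $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ applies. The following is a minimal sketch of that calculation; the function name `z_interval` is hypothetical and the critical value is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(data, sigma, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    n = len(data)
    x_bar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # maximum error E
    return x_bar - margin, x_bar + margin


# Demand during lead time over 25 periods (Example 4).
demand = [235, 421, 394, 261, 386, 374, 361, 439, 374, 316, 309, 514, 348,
          302, 296, 499, 462, 344, 466, 332, 253, 369, 330, 535, 334]

lower, upper = z_interval(demand, sigma=75, confidence=0.95)
print(f"95% CI for mean demand: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval is roughly 340.76 to 399.56 computers.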

Example 5: The president of a large university wishes to estimate the average age of the students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 50 students is selected, and the mean is found to be 23.2 years. Find the 95% confidence interval of the population mean.
Example 6: A survey of 30 adults found that the mean age of a person's primary vehicle is 5.6 years. Assuming the standard deviation of the population is 0.8 year, find the 99% confidence interval of the population mean.

Example 7: A cereal company selects twenty-five 12-ounce boxes of corn flakes every 10 minutes and weighs the boxes. Suppose the weights have a normal distribution with variance 0.04 ounces². One such sample yields [sample mean]. Calculate the 90% confidence interval of the population mean.
Example 8: Ten randomly selected automobiles were stopped, and the tread depth of the right front tire was measured. The mean was 0.32 inch and the standard deviation was 0.08 inch. Find the 95% confidence interval of the mean depth. Assume that the variable is approximately normally distributed.

Example 9: The average production of peanuts in the state of Virginia is 3000 pounds per acre. A new plant food has been developed and is tested on 60 individual plots of land. The mean yield with the new plant food is 3120 pounds of peanuts per acre with a standard deviation of 578 pounds. Find the 95% confidence interval for the mean yield of peanuts per acre with the new plant food. Interpret the interval.

Example 10: The following daily highs were recorded in the city of Chicago on 20 randomly selected December days.
32 49 21 32 25 34 25 36 31 38 27 40 22 30 44 28 39 36 18 38
Find a confidence interval for the mean daily high temperature. Should we use the t or Z distribution? Explain.

Interval estimation for proportion
Whenever the information is given as a percentage, a proportion or a number of successes for a specific event, the problem being investigated has something to do with proportion. The procedures for drawing inferences about a proportion involve nominal and sometimes ordinal scale data (i.e. categorical data). Examples of categorical data: gender (male and female), opinion (poor and good), examination result (pass and fail), job satisfaction (satisfied and unsatisfied), attendance (absent, present), etc.

The point estimate for the proportion is given by
$$\hat{p} = \frac{x}{n}$$
where
$\hat{p}$ = symbol for the sample proportion
$x$ = number of sample units that possess the characteristic of interest
$n$ = sample size.

Knowing that:
- the sample size $n$ is big, and
- both $n\hat{p}$ and $n\hat{q}$ are greater than or equal to 5,
then the formula to estimate the confidence interval for a proportion is given by
$$\hat{p} - Z_{\alpha/2}\,S(\hat{p}) < p < \hat{p} + Z_{\alpha/2}\,S(\hat{p})$$
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{p} + \hat{q} = 1$.

Example 11: A recent study of 100 people in Miami found 27 were obese. What is the proportion of individuals living in Miami who are obese? Obtain the 95% confidence interval of the proportion of individuals living in Miami who are obese and interpret.
Example 12: A survey found that out of 200 workers, 168 said they were interrupted three or more times an hour by phone messages, faxes, etc. Estimate, with 90% confidence level, the percentage of the workers who are not interrupted three or more times an hour.
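The proportion interval $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ can be sketched as below and applied to Examples 11 and 12. The function name `proportion_interval` is hypothetical, and the check that both $n\hat{p}$ and $n\hat{q}$ are at least 5 mirrors the condition stated for this formula.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Large-sample confidence interval for one population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Condition for using the normal Z distribution.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("n * p_hat and n * q_hat must both be at least 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Example 11: 27 of 100 people were obese, 95% confidence.
lo_11, hi_11 = proportion_interval(x=27, n=100, confidence=0.95)

# Example 12: 200 - 168 = 32 workers were NOT interrupted, 90% confidence.
lo_12, hi_12 = proportion_interval(x=200 - 168, n=200, confidence=0.90)

print(f"Example 11: ({lo_11:.4f}, {hi_11:.4f})")
print(f"Example 12: ({lo_12:.4f}, {hi_12:.4f})")
```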

Example 13: In a random sample of 500 observations, we found the proportion of successes to be 48%. Estimate with 95% confidence the population proportion of successes.
Example 14: A random sample of 1500 pine trees was tested for traces of the Bark Beetle infestation. The result showed that 153 of the trees showed such traces. Assuming the data is approximately normally distributed, calculate the point estimate of the proportion of pine trees that have been infested, and find a 95% confidence interval for the proportion of pine trees that have been infested.

Example 15: The quality control manager at Ameen Company claims that the production of model A telephones is "out of control" when the overall rate of defects exceeds 4%. The test of a random sample of 150 telephones revealed that 9 of them are defective. Construct a 98% confidence interval for the proportion of defective telephones.
Example 16: A statistics practitioner working for major league baseball wants to supply radio and television commentators with interesting statistics. He observed several hundred games and counted the number of times a runner on first base attempted to steal second base. He found there were 373 such events, of which 259 were successful. Estimate with 95% confidence the population proportion of all attempted thefts of second base that are successful.

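Sample size
Sample size for Mean
Recall that the interval formula for estimating a population mean is
$$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where the term $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is the error ($E$). Note that the maximum error is
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$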

Using the maximum error formula, we can then calculate the sample size $n$, which is given by
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \frac{Z_{\alpha/2}^2\,\sigma^2}{E^2}$$

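Sample size for proportion
Recall that the interval formula for estimating a population proportion is
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where the term $Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ is the error ($E$).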

Note that the maximum error is
$$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Using the maximum error formula, we can then calculate the sample size $n$, which is given by
$$n = \hat{p}(1 - \hat{p})\left(\frac{Z_{\alpha/2}}{E}\right)^2 = \hat{p}\hat{q}\left(\frac{Z_{\alpha/2}}{E}\right)^2$$
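Both sample size formulas can be sketched in code as follows. The function names and the input values are hypothetical illustrations, not taken from the notes; rounding the result up to the next whole number is an assumption based on common practice, so that the maximum error is not exceeded.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Z_{alpha/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, max_error, confidence):
    """n = (Z * sigma / E)^2, rounded up (rounding up is an assumption)."""
    n = (z_critical(confidence) * sigma / max_error) ** 2
    return ceil(n)


def sample_size_proportion(p_hat, max_error, confidence):
    """n = p_hat * (1 - p_hat) * (Z / E)^2, rounded up (an assumption)."""
    n = p_hat * (1 - p_hat) * (z_critical(confidence) / max_error) ** 2
    return ceil(n)


# Hypothetical inputs, chosen only to illustrate the formulas.
n_mean = sample_size_mean(sigma=2, max_error=0.5, confidence=0.95)
n_prop = sample_size_proportion(p_hat=0.5, max_error=0.05, confidence=0.95)
print(n_mean, n_prop)
```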

Conclusion
In conclusion, the width of the confidence interval estimate is affected by
- the population standard deviation,
- the confidence level,
- the sample size.

The width of the confidence interval is a function of the confidence level, the population standard deviation, and the sample size:
$$\bar{x} - Z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{S}{\sqrt{n}} \quad \text{or} \quad \bar{x} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$$
A larger confidence level produces a wider confidence interval.
[Figure 1.9: Relationship between width and confidence level]

Larger values of the standard deviation produce wider confidence intervals.
[Figure 1.10: Relationship between width and standard deviations]
Increasing the sample size decreases the width of the confidence interval while the confidence level can remain unchanged.
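The three effects on the width can be illustrated numerically. In this sketch the width is taken as $2\,Z_{\alpha/2}\,\sigma/\sqrt{n}$ (upper bound minus lower bound); the function name `interval_width` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Width of the Z interval for a mean: 2 * Z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


base = interval_width(sigma=10, n=25, confidence=0.95)
higher_confidence = interval_width(sigma=10, n=25, confidence=0.99)
larger_sigma = interval_width(sigma=20, n=25, confidence=0.95)
larger_n = interval_width(sigma=10, n=100, confidence=0.95)

print(f"base width:             {base:.2f}")
print(f"99% instead of 95%:     {higher_confidence:.2f}  (wider)")
print(f"sigma = 20, not 10:     {larger_sigma:.2f}  (wider)")
print(f"n = 100 instead of 25:  {larger_n:.2f}  (narrower)")
```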

INTERVAL ESTIMATION FOR TWO MEANS
Previously, we discussed the techniques to estimate parameters for one population mean. Now, consider this parameter but with two populations. With two populations, our interest will now be on the difference between two population means.

[Figure 1.11: Independent populations and samples. Population 1: parameters $\mu_1$ and $\sigma_1^2$; sample of size $n_1$; statistics $\bar{x}_1$ and $s_1^2$. Population 2: parameters $\mu_2$ and $\sigma_2^2$; sample of size $n_2$; statistics $\bar{x}_2$ and $s_2^2$.]

There are two different types of samples:
Dependent samples, also called related (or paired) samples, occur when the response of the nth person in the second sample is partly a function of the response of the nth person in the first sample. There are two (2) common forms of sample dependency:
- before-after and other studies in which the same people are surveyed at different points in time, including panel studies;
- matched-pairs studies in which similar people are surveyed at different points in time.
Independent samples are samples that are completely unrelated to one another.

Interval estimation for difference of two independent means
In order to test and estimate the difference between two means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another. The statistic used is the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$.

Two assumptions need to be fulfilled in order to estimate the difference between two independent means:
- The samples must be independent of each other, that is, there can be no relationship between the subjects in each sample.
- The populations from which the samples were obtained must be normally distributed.

There are four (4) different formulas to estimate the confidence interval for the difference between two independent means, which are:
i. Confidence interval when both population variances (or standard deviations) are known
ii. Confidence interval when both population variances (or standard deviations) are unknown but both sample sizes are more than or equal to 30
iii. Confidence interval when both population variances (or standard deviations) are unknown, any one or both sample sizes are less than 30 and both population variances are assumed equal
iv. Confidence interval when both population variances (or standard deviations) are unknown, any one or both sample sizes are less than 30 and both population variances are assumed unequal
As with the interval estimator for one mean, the same considerations apply in deciding which formula to use to estimate the difference between two means.
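For reference, the four cases are usually written in the following standard textbook forms. These are given here as an assumption about the intended formulas; the symbols $s_p^2$ (pooled variance) and $\nu$ (approximate degrees of freedom for unequal variances) are introduced only for this sketch.

```latex
% i. Both population variances known
(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}
  \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

% ii. Variances unknown, n_1 \ge 30 and n_2 \ge 30
(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}
  \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

% iii. Variances unknown, small sample(s), variances assumed equal
(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\; n_1 + n_2 - 2}
  \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

% iv. Variances unknown, small sample(s), variances assumed unequal
(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\; \nu}
  \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}
           {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```

In case iv the value of $\nu$ is generally not a whole number; rounding it down before reading a t table is a common convention.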

[Figure 1.12: Flow diagram for choosing the correct distribution]
Are both σ1 and σ2 known?
- Yes: use zα/2 values no matter what the sample size is. *
- No: are both n1 and n2 ≥ 30?
  - Yes: use zα/2 values and s in place of σ.
  - No: use tα/2 values and s in the formula. ** Is σ1² = σ2²?
    - Yes: conduct the equal-variances t-test; use tα/2 values with the pooled variance estimator.
    - No: use tα/2 values with unequal variances.
* The variable must be normally distributed when n < 30.
** The variable must be approximately normally distributed.

[Figure 1.13: Flow diagram for choosing the correct confidence interval formula. The decision steps are: are both σ1 and σ2 known? Are both n1 and n2 ≥ 30? Is σ1² = σ2²? Each branch ends in the corresponding confidence interval formula.]
* The variable must be normally distributed when n < 30.
** The variable must be approximately normally distributed.

Example 17: Two random samples of 40 students were drawn independently from two normal populations. The following statistics regarding their scores in a final exam were obtained. Construct a 95% confidence interval for the difference between the means.

Solution
The populations' standard deviations are unknown. However, since both sample sizes are large enough (both $n \ge 30$), according to the Central Limit Theorem, the sample means follow a normal distribution. The 95% confidence interval for the difference between the means is therefore based on the Z distribution, with $s_1$ and $s_2$ in place of $\sigma_1$ and $\sigma_2$.

40. Construct a 99% confidence interval for the difference between the mean amount spent by all male and all female customers at this supermarket and interpret the interval.Example 18: A random sample of 22 male customers who shopped at this supermarket showed that they spent an average of RM80 with standard deviation of RM17. 67 . While a random sample of 20 female customers who shopped at the same supermarket showed that they spent an average of RM96 with standard deviation RM14. Assume that the amount spent at this supermarket by all the male and female customers are normally distributed with equal but unknown standard deviation.50.

Example 19: Because of the rising costs of industrial accidents, many chemical, mining, and manufacturing firms have instituted safety courses. Employees are encouraged to take these courses designed to heighten safety awareness. A company is trying to decide which one of two courses to institute. To help make a decision, eight employees take Course 1 and another eight take Course 2. Each employee takes a test, which is graded out of a possible 25. The safety test results are shown below. Assume that the scores are normally distributed. Construct a 90% confidence interval for the difference of means.

Course 1   Course 2
14         20
21         18
17         22
14         15
17         23
19         21
20         19
16         15

Example 20: Random samples of children aged 4 to 6 years sent to kindergarten in Bandar A and Bandar B were taken to find the number of hours spent on outdoor activities in the kindergarten daily. A sample of 321 children in Bandar B and 94 children in Bandar A give means of 3.01 hours and 2.88 hours, respectively. From past studies, the population standard deviation for the children in Bandar B is assumed to be 1.09, while the population standard deviation for the children in Bandar A is 1.01. Find a 95% confidence interval for the difference between the two population means.

Interval estimation for the difference between two proportions
We will now look at procedures for drawing inferences about the difference between populations whose data are nominal (i.e. categorical). With nominal data, we can calculate the proportions of occurrences of each type of outcome. Thus, the parameter to be estimated in this section is the difference between two population proportions: $p_1 - p_2$.

Assumptions for doing inferences about two proportions
i. We have proportions from two independent simple random samples.
ii. In order to use the normal Z distribution, for both samples the conditions $n_1\hat{p}_1 \ge 5$, $n_1(1 - \hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$ and $n_2(1 - \hat{p}_2) \ge 5$ must be satisfied.

To draw inferences about the parameter $p_1 - p_2$, we take samples from each population, calculate the sample proportions and look at their difference.
$$\hat{p}_1 = \frac{x_1}{n_1} \quad \text{and} \quad \hat{p}_2 = \frac{x_2}{n_2}$$
$(\hat{p}_1 - \hat{p}_2)$ is an unbiased estimator for $(p_1 - p_2)$.

The confidence interval estimator for $(p_1 - p_2)$ is given by:
$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
or
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

Example 21: A Consumer Packaged Goods (CPG) company has been testing the marketing of two new versions of soap packaging. Version one (bright colors) was distributed in one supermarket, while version two (simple colors) was distributed in another. Construct a 95% confidence interval for the difference between the two proportions of successes of packaged soap sales.

Example 22: A random sample of 500 respondents was selected in a large city to determine information concerning consumer behavior. Among the questions asked was, "Do you enjoy shopping?" Of 240 male respondents, 136 answered yes. Of 260 female respondents, 224 answered yes. Construct a 95% confidence interval estimate of the difference between the proportions of males and females who enjoy shopping.
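The interval for the difference between two proportions can be sketched as below and applied to Example 22, taking males as sample 1 and females as sample 2 (the ordering is an assumption; reversing it only flips the signs). The function name `two_proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    # Conditions for using the normal Z distribution.
    if min(n1 * p1, n1 * q1, n2 * p2, n2 * q2) < 5:
        raise ValueError("each of n*p_hat and n*q_hat must be at least 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - margin, diff + margin


# Example 22: 136 of 240 males and 224 of 260 females answered yes.
lower, upper = two_proportion_interval(136, 240, 224, 260, confidence=0.95)
print(f"95% CI for p_male - p_female: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the whole interval lies below 0, which is consistent with a smaller proportion of males than females enjoying shopping.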

SPSS NOTES FOR OBTAINING THE CONFIDENCE INTERVAL OF MEAN
Step 1 : Select the Analyze menu → Select Descriptive Statistics

Step 2 : Click on Explore → Select the appropriate variable from the list of variables
Step 3 : Click on the arrow button to move the variable into the Dependent List box

Step 4 : Click on Statistics → Select the appropriate statistics, e.g. Descriptives. You can change the degree of confidence (usually 90% and above).
Step 5 : Then, click on Continue → Click on OK

Example
A random sample of 10 university students was surveyed to determine the amount of time spent weekly using a personal computer. The times are: 13, 14, 5, 6, 8, 10, 7, 12, 15, and 3. If the times are normally distributed with a standard deviation of 5.2 hours, estimate with 90% confidence the mean weekly time spent using a personal computer by all university students.

Descriptives (times)

                                               Statistic   Std. Error
Mean                                           9.30        1.300
90% Confidence Interval for Mean  Lower Bound  6.92
                                  Upper Bound  11.68
5% Trimmed Mean                                9.33
Median                                         9.00
Variance                                       16.900
Std. Deviation                                 4.111
Minimum                                        3
Maximum                                        15
Range                                          12
Interquartile Range                            8
Skewness                                       -.040       .687
Kurtosis                                       -1.396      1.334

At 90% confidence level, the mean weekly time spent using a personal computer by all university students is between 6.92 and 11.68 hours.
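The main figures in the SPSS output can be reproduced by hand. In the sketch below, the bounds 6.92 and 11.68 are matched by using the sample standard deviation with a t critical value for 9 degrees of freedom; the value 1.8331 is taken from a standard t table and is an assumption of this sketch, as is the observation that the output does not use the stated population standard deviation of 5.2 hours.

```python
from math import sqrt
import statistics

# Weekly computer-use times (hours) for the 10 students.
times = [13, 14, 5, 6, 8, 10, 7, 12, 15, 3]
n = len(times)

mean_time = statistics.fmean(times)      # Mean
variance = statistics.variance(times)    # Variance (divides by n - 1)
std_dev = statistics.stdev(times)        # Std. Deviation
std_error = std_dev / sqrt(n)            # Std. Error of the mean
median_time = statistics.median(times)   # Median

# Assumed table value: t_{0.05, 9} = 1.8331 (90% confidence, df = 9).
t_crit = 1.8331
lower = mean_time - t_crit * std_error
upper = mean_time + t_crit * std_error

print(f"mean={mean_time:.2f}, variance={variance:.3f}, sd={std_dev:.3f}")
print(f"std. error={std_error:.3f}, median={median_time:.2f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```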

SUMMARY
A point estimator is a good estimator if it has the qualities of a good estimator, which are unbiasedness, consistency and relative efficiency. Unlike point estimation, interval estimation involves an interval constructed around the point estimate with a probability of $(1 - \alpha)$. To construct the interval, information regarding the sampling distribution of the statistic is important.

The Central Limit Theorem enables us to determine the sampling distribution of the sample statistic based on the sample size and knowledge of the population variance. If we want to know whether the population mean/proportion equals a certain value k, and the confidence interval for the mean/proportion includes the value k, we can conclude, at the given level of confidence, that there is no evidence that the mean/proportion differs from k.

If the confidence interval for the difference between two means/proportions includes 0, we can say that there is no significant difference (fail to reject) between the means/proportions of the two populations, at the given level of confidence.

END OF CHAPTER 1