MATH 403 - ENGINEERING DATA ANALYSIS

Chapter 8

Introduction

The previous chapters discussed how a population parameter can be estimated from sample data using a point estimate or a confidence interval. In many situations, however, there are two competing claims about the value of a parameter, and the analyst must determine which claim the data support. This is done by statistical inference.

Inferential statistics is the branch of statistics that deals with estimating population values, called parameters, and with making statements about computed statistics at a stated degree of confidence. Hypothesis testing is the method of statistical inference that helps determine how accurate such generalizations are. This chapter focuses on the basic principles of hypothesis testing of means, variances and proportions involving a single sample of data.

Intended Learning Outcomes

At the end of this module, it is expected that the students will be able to:
1. Test hypotheses on the mean of a normal distribution using either a Z-test or a t-test.
2. Test hypotheses on the variance or standard deviation of a normal distribution.
3. Test hypotheses on a population proportion.
4. Use the P-value approach for making decisions in hypothesis tests.

8.1. Hypothesis Testing

Hypothesis testing is a decision-making process for evaluating claims about a population. The goal of this process is to make a judgment about the difference between the sample statistic and a hypothesized population parameter. In this process, the researcher must define the population under study, state the hypothesis to be investigated, give the significance level, select a sample, collect data, perform the required test and reach a conclusion. The z-test and t-test are statistical tests for hypotheses on means, while the chi-square test is used for hypotheses on the variance or standard deviation.
Null and Alternative Hypothesis

The null hypothesis, denoted as $H_0$, is the statement of equality indicating that no relationship exists between the variables under study. This statement is tested for the purpose of being rejected or not rejected. The alternative hypothesis, denoted as $H_1$ (also written $H_a$), is also termed the research hypothesis. It is a statement of the expectation derived from the theory under study.

Type I and Type II Error

In hypothesis testing, there are four possible outcomes:

                  Reject H0           Do not reject H0
H0 is true        Type I error        Correct decision
H0 is false       Correct decision    Type II error

A type I error occurs if one rejects the null hypothesis when it is true. The probability of committing a type I error is called the significance level and is denoted by the Greek letter alpha ($\alpha$). The common values of $\alpha$ are 1%, 5% and 10%. A type II error occurs if one does not reject the null hypothesis when it is false. Its probability is denoted by the Greek letter beta ($\beta$).

Significance Level and Confidence Level

The level of significance is the maximum probability of committing a type I error. That is, $P(\text{type I error}) = \alpha$. Generally, statisticians agree on using three arbitrary significance levels: 0.10, 0.05 and 0.01. That is, when the null hypothesis is true, the probability of rejecting it (a type I error) is 10%, 5% or 1%, and the probability of a correct decision is 90%, 95% or 99%, depending on which level of significance is used. The probability of a correct decision, $1 - \alpha$, is the confidence level: the chance of not rejecting the null hypothesis when in fact it is true.

8.1.1. One-sided and Two-sided Hypothesis

In order to state the hypothesis correctly, the researcher must translate the claim correctly into mathematical symbols. There are three possible sets of statistical hypotheses.

1. $H_0$: parameter = specific value
   $H_1$: parameter $\neq$ specific value
   This is a two-tailed test.
2. $H_0$: parameter = specific value
   $H_1$: parameter < specific value
   This is a left-tailed test.
3. $H_0$: parameter = specific value
   $H_1$: parameter > specific value
   This is a right-tailed test.

8.1.2. P-value in Hypothesis Tests

In testing hypotheses in which the test statistic is discrete, the critical region may be chosen arbitrarily. If $\alpha$ is too large, it can be reduced by making an adjustment in the critical value. It may be necessary to increase the sample size to offset the decrease that occurs automatically in the power of the test.

In statistical analysis, it has become customary to choose a significance level of 0.10, 0.05 or 0.01 and to select the critical region accordingly; the rejection or non-rejection of the null hypothesis $H_0$ then depends on that region. For example, if the test is two-tailed, $\alpha$ is set at the 0.05 level of significance and the test statistic follows, say, the standard normal distribution, then a z-value is observed from the data and the critical region is $z > 1.96$ or $z < -1.96$, where the value 1.96 is found as $z_{0.025}$ in the table of Areas Under the Normal Curve. A value of z in the critical region prompts the statement "The value of the test statistic is significant," which we can then translate into the user's language. For example, if the hypothesis is given by $H_0: \mu = 12$, $H_1: \mu \neq 12$, one might say, "The mean differs significantly from the value 12."

The philosophy that the maximum risk of making a type I error should be controlled is the root of the pre-selection of a significance level. However, this approach does not account for values of test statistics that are "close" to the critical region. Suppose, for example, in the illustration with $H_0: \mu = 12$ versus $H_1: \mu \neq 12$, a value of $z = 1.84$ is observed; strictly speaking, with $\alpha = 0.05$, the value is not significant. But the risk of committing a type I error if one rejects $H_0$ in this case could hardly be considered severe.
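The fixed-significance decision for the two-tailed illustration above can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` is used in place of the printed table of Areas Under the Normal Curve; the function names are hypothetical, and the second test value (2.10) is an illustrative number that does not come from the text.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha: float) -> float:
    """Return z_{alpha/2}, the cutoff that leaves alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def in_critical_region(z: float, alpha: float) -> bool:
    """True when |z| exceeds the two-tailed critical value."""
    return abs(z) > two_tailed_critical_value(alpha)


# alpha = 0.05 reproduces the tabulated z_{0.025}
print(round(two_tailed_critical_value(0.05), 2))   # 1.96

# z = 1.84 (the value discussed above) is not significant at 0.05
print(in_critical_region(1.84, 0.05))              # False

# z = 2.10 is a hypothetical value chosen only for illustration
print(in_critical_region(2.10, 0.05))              # True
```

The function only answers "reject" or "do not reject"; it says nothing about how close 1.84 is to the cutoff, which is the limitation this section goes on to address.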
In fact, in a two-tailed scenario, one can quantify this risk as

$$P = 2P(Z > 1.84 \text{ when } \mu = 12) = 2(0.0329) = 0.0658.$$

That is, 0.0658 is the probability of obtaining a value of z as large as or larger in magnitude than 1.84 when in fact $\mu = 12$. This is important information for the user, although the evidence against $H_0$ is not as strong as that which would result from rejection at the $\alpha = 0.05$ level. For this reason, the P-value approach has been used extensively in applied statistics. It is designed to give an alternative, in terms of a probability, to a mere "reject" or "do not reject" conclusion.

The P-value also gives important information when the z-value falls well into the ordinary critical region. For example, if z is 2.75, it is informative for the user to observe that $P = 2(0.0030) = 0.0060$, and thus the z-value is significant at a level considerably less than 0.05. It is important to know that under the condition of $H_0$, a value of $z = 2.75$ is an extremely rare event. That is, a value at least that large in magnitude would occur only 60 times in 10,000 experiments.

A P-value is the lowest level of significance at which the observed value of the test statistic is significant. It is the smallest level of $\alpha$ that would lead to rejection of $H_0$ with the given data.

8.1.3. General Procedure for Test of Hypothesis

The following are the steps in hypothesis testing using the fixed probability of type I error approach.

1. State the null and alternative hypothesis.
2. Determine the level of significance and the direction of the test. The direction of the test is based on whether the alternative hypothesis is stated as a left-tailed, right-tailed or two-tailed test.
3. Determine the appropriate statistical test based on the level of measurement of the data gathered.
4. Write the decision rule expressing how to reject or not reject the null hypothesis.
5. Compute the test statistic and compare it with the critical value. The test statistic plays a vital role in rejecting or not rejecting the null hypothesis.
6. State the decision based on the computed value when compared to the critical value.
7. Draw a scientific or engineering conclusion for the given problem.

If you will be testing the hypothesis using significance testing, that is, the P-value approach, follow these steps:

1. State the null and alternative hypothesis.
2. Determine the appropriate statistical test based on the level of measurement of the data gathered.
3. Compute the test statistic.
4. Compute the P-value based on the computed value of the test statistic.
5. State the decision based on the resulting P-value and knowledge of the scientific system.
6. Draw a scientific or engineering conclusion for the given problem.

8.2. Test on the Mean of a Normal Distribution, Variance Known

Following the steps in hypothesis testing for a single mean, the hypothesized value is referred to as the hypothesized mean ($\mu_0$). The null hypothesis is stated as:

$$H_0: \mu = \mu_0$$

The alternative hypothesis can be written as one of:

$$H_1: \mu \neq \mu_0 \qquad H_1: \mu > \mu_0 \qquad H_1: \mu < \mu_0$$

The decision rule is stated as follows: for a two-tailed test, reject the null hypothesis if the absolute value of the test statistic exceeds the critical value; for a one-tailed test, reject it if the test statistic lies beyond the critical value in the direction of the alternative. Otherwise, do not reject the null hypothesis.

To draw an inference on a mean in the one-population case, assuming that the observations are normally distributed and the variance is known, the Z-test is used. It can be used when the sample size is equal to or greater than 30 ($n \geq 30$). The Z-statistic, $z_c$, is the test statistic used to decide on the rejection of the null hypothesis in favor of the alternative hypothesis. It is computed as:

$$z_c = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean computed from the gathered data, $\mu_0$ is the hypothesized mean, $\sigma$ is the population standard deviation, which is known or given, and $n$ is the sample size. The critical value is obtained using the z-tabular value.
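The Z-statistic formula above, together with the P-value steps of Section 8.1.3, can be sketched in a few lines. This is an illustrative sketch only: the function names and the `tail` argument are hypothetical, the input numbers (sample mean 52, hypothesized mean 50, sigma 6, n = 36) are made up for the demonstration, and `statistics.NormalDist` from the Python standard library stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """z_c = (x_bar - mu0) / (sigma / sqrt(n)) for a single mean, sigma known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def z_p_value(z: float, tail: str) -> float:
    """P-value of an observed z for a 'left', 'right' or 'two' tailed test."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Hypothetical data: x_bar = 52, mu0 = 50, sigma = 6, n = 36
z = z_statistic(52, 50, 6, 36)
print(z)                                   # 2.0
print(round(z_p_value(z, "right"), 4))     # 0.0228
```

Keeping the statistic and the P-value in separate functions mirrors the separate "compute the test statistic" and "compute the P-value" steps of the procedure, so the same `z_p_value` helper can serve any test whose statistic is standard normal.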
For a two-tailed test, the critical value is the z-value that leaves an area of $\alpha/2$ in each tail, written symbolically as $z_{\alpha/2}$. For a one-tailed test, the critical value leaves an area of $\alpha$ in one tail and is written as $z_{\alpha}$.

Figure 1. The Normal Distribution or Z-Distribution for Testing the Hypothesis $H_0: \mu = \mu_0$ with critical values for (a) $H_1: \mu \neq \mu_0$, (b) $H_1: \mu > \mu_0$, (c) $H_1: \mu < \mu_0$

Example 1. A random sample of 100 students enrolled in a Statistics course under Professor X shows that the average grade in the midterm examination is 85%. Professor X claims that the average grade of the students in the midterm is at least 80% with a standard deviation of 16%. Is there evidence to say that the claim is correct at the 5% level of significance?

Solution:
1. $H_0: \mu = 80\%$
   $H_1: \mu > 80\%$
2. $\alpha = 0.05$, right-tailed test
3. Test statistic: $z_c = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
4. Critical region: $z > 1.645$. Reject $H_0$ if $z_c$ is greater than 1.645.
5. Computing the z-statistic: $z_c = \dfrac{85 - 80}{16 / \sqrt{100}} = 3.125$
6. Reject $H_0$ since 3.125 is greater than 1.645.
7. Therefore, the Professor's claim is correct at the 5% level of significance.

Using the P-value approach, the P-value corresponding to $z = 3.125$ is 0.0009 using the table for Areas Under the Normal Curve. This is evidence stronger than the 0.05 level of significance in favor of the alternative hypothesis, $H_1$.

Example 2. A manufacturer of solar lamps claims that the mean useful life of their new product is 8 months with a standard deviation of 0.5 month. To test this claim, a random sample of 50 solar lamps was tested and found to have a mean life of 7.8 months. Test the hypothesis that $\mu = 8$ months against the alternative hypothesis that $\mu \neq 8$ months using the 1% level of significance.

Solution:
1. $H_0: \mu = 8$ months
   $H_1: \mu \neq 8$ months
2. $\alpha = 0.01$, two-tailed test
3. Test statistic: $z_c = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
4. Critical region: $z < -2.575$ or $z > 2.575$. Reject $H_0$ if $z_c < -2.575$ or $z_c > 2.575$.
5. Computing the z-statistic: $z_c = \dfrac{7.8 - 8}{0.5 / \sqrt{50}} = -2.8284$, say $-2.83$
6. Reject $H_0$ since $-2.83$ is less than $-2.575$.
7. Therefore, the mean useful life of the new product is not equal to 8 months. In fact, it is less than 8 months at the 1% level of significance.

Using the P-value approach and considering that this is a two-tailed test, the P-value is twice the area to the left of $z = -2.83$. Using the table for Areas Under the Normal Curve,

$$P = P(|z| > 2.83) = 2P(z < -2.83) = 2(0.0023) = 0.0046$$

This results in rejection of $H_0$ at a level of significance less than 1%.

8.3. Test on the Mean of a Normal Distribution, Variance Unknown

To draw an inference on a mean in the one-population case, assuming the population is normally distributed but the variance is unknown and the sample size is less than 30, the t-test is used. The test statistic used is the t-statistic, $t_c$, which is computed as follows:

$$t_c = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean computed from the gathered data, $\mu_0$ is the hypothesized mean, $s$ is the sample standard deviation and $n$ is the sample size. The critical value is obtained using the t-tabular value. For a two-sided test, the critical value is obtained at $\alpha/2$ and at degrees of freedom (d.f.) equal to $n - 1$, written as $t_{\alpha/2,\,n-1}$. For a one-sided test, the value is obtained at $\alpha$ and at $n - 1$ degrees of freedom, written as $t_{\alpha,\,n-1}$.

Figure 2. T-Distribution for Testing the Hypothesis $H_0: \mu = \mu_0$ with critical values for (a) $H_1: \mu \neq \mu_0$, (b) $H_1: \mu > \mu_0$, (c) $H_1: \mu < \mu_0$

Example. The College of Engineering of a State University gives an entrance exam to incoming freshmen. Those who get scores equal to or higher than the set passing score are accepted in the College. The average score of the incoming freshmen was 80% before the implementation of the K to 12 education system. Due to this implementation, the entrance exam was suspended for two years and it is thought that the quality of the first-year students has diminished. However, with the vision, mission, goals and objectives of the University and the College towards quality education, the Dean wants to determine if the quality of freshmen students has changed.
He wants to know if it has improved or diminished, so he takes a small random sample of 15 freshmen students and administers the same entrance exam. The average score is found to be 83% with a standard deviation of 5%. Determine whether the quality has changed using the 1% level of significance.

Solution:
1. $H_0: \mu = 80\%$
   $H_1: \mu \neq 80\%$
2. $\alpha = 0.01$, two-tailed test
3. Test statistic: $t_c = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
4. Critical region: $t = \pm 2.977$. Reject $H_0$ if $t_c$ is less than $-2.977$ or greater than 2.977. This is obtained from the table for Critical Values of the t-distribution using $\alpha/2 = 0.005$ and degrees of freedom $v = 15 - 1 = 14$.
5. Computing the t-statistic: $t_c = \dfrac{83 - 80}{5 / \sqrt{15}} = 2.32$
6. Do not reject $H_0$ since 2.32 is less than 2.977 but greater than $-2.977$.
7. Therefore, there is not enough evidence that the quality of freshmen students has changed at the 1% level of significance.

The P-value corresponding to $t = 2.32$ with 14 degrees of freedom is 0.036 or 3.6%. Since this is a two-tailed test,

$$P = P(|t| > 2.32) = 2P(t < -2.32) = 0.036$$

Because 0.036 is greater than $\alpha = 0.01$, the P-value approach leads to the same decision.

8.4. Test on Variance and Standard Deviation of a Normal Distribution

The chi-square distribution is used to test a claim about a single variance or standard deviation. The formula for the chi-square test for a single variance is given by:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

where $n$ is the sample size, $s^2$ is the sample variance and $\sigma^2$ is the (hypothesized) population variance, with degrees of freedom equal to $n - 1$. There are three assumptions for the chi-square test: the sample must be randomly selected from the population, the population must be normally distributed for the variable under study, and the observations must be independent of each other.

Figure 3. Chi-Squared Distribution for Testing the Hypothesis $H_0: \sigma^2 = \sigma_0^2$ with critical values for (a) $H_1: \sigma^2 \neq \sigma_0^2$, (b) $H_1: \sigma^2 > \sigma_0^2$, (c) $H_1: \sigma^2 < \sigma_0^2$

Example 1. A company claims that the variance of the sugar content of its ice cream is equal to 25 mg/oz. A sample of 20 servings is selected, and the sugar content is measured.
The variance of the sample is found to be 36. At the 10% level of significance, is there enough evidence to reject the claim?

Solution:
1. $H_0: \sigma^2 = 25$ mg/oz
   $H_1: \sigma^2 \neq 25$ mg/oz
2. $\alpha = 0.10$, two-tailed test
3. Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2}$
4. Critical region: $\chi^2 < 10.117$ or $\chi^2 > 30.144$. Reject $H_0$ if $\chi^2$ is less than 10.117 or greater than 30.144. This is obtained from the table for Critical Values of the Chi-Squared distribution using $\alpha/2 = 0.05$ and degrees of freedom $v = 20 - 1 = 19$.
5. Computing the $\chi^2$-statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2} = \dfrac{(19)(36)}{25} = 27.36$
6. Do not reject $H_0$ since $10.117 < 27.36 < 30.144$.
7. Therefore, at the 10% level of significance, there is not enough evidence to reject the company's claim that the variance of the sugar content is equal to 25 mg/oz.

8.5. Test on a Population Proportion

This section considers the problem of testing the hypothesis that the proportion of successes in a binomial experiment equals some specified value. That is, the null hypothesis $H_0$ that $p = p_0$, where $p$ is the parameter of the binomial distribution, is tested. The alternative hypothesis may be one of the usual one-sided or two-sided alternatives: $p < p_0$, $p > p_0$ or $p \neq p_0$.

The following are the steps in testing a proportion for small samples:
1. $H_0: p = p_0$
   $H_1$: alternatives are $p < p_0$, $p > p_0$ or $p \neq p_0$
2. Choose a level of significance equal to $\alpha$.
3. Test statistic: binomial variable $X$ with $p = p_0$.
4. Computations: find $x$, the number of successes, and compute the appropriate P-value.
5. Decision: draw the appropriate conclusion based on the P-value.

Example 1. A home developer claims that solar panels are installed in 65% of all homes being constructed today in a certain subdivision. Would you agree with this claim if a random survey of new homes in this subdivision shows that 8 out of 15 had solar panels installed? Use a 0.10 level of significance.

Solution:
1. $H_0: p = 0.65$
   $H_1: p \neq 0.65$
2. $\alpha = 0.10$, two-tailed test
3. Test statistic: binomial variable $X$ with $p = 0.65$ and $n = 15$
4. Computations: $x = 8$ and $np_0 = (15)(0.65) = 9.75$. Since $x < np_0$, the P-value is twice the lower-tail probability. Using the table for Binomial Probability Sums,
   $$P = 2P(X \leq 8 \text{ when } p = 0.65) = 2\sum_{x=0}^{8} b(x; 15, 0.65) = 2(0.2452) = 0.4904$$
5. Do not reject $H_0$, since the P-value is greater than 0.10, and conclude that there is not enough evidence to doubt the claim of the home developer.

For large $n$, an approximation is required. When the hypothesized value $p_0$ is very close to 0 or 1, the Poisson distribution with parameter $\mu = np_0$ may be used. However, the normal-curve approximation, with parameters $\mu = np_0$ and $\sigma^2 = np_0q_0$, where $q_0 = 1 - p_0$, is usually preferred for large $n$ and is very accurate as long as $p_0$ is not extremely close to 0 or 1. Using the normal approximation, the z-value for testing $p = p_0$ is given by

$$z = \frac{x - np_0}{\sqrt{np_0q_0}}$$

which is a value of the standard normal variable $Z$. Hence, for a two-tailed test at the $\alpha$-level of significance, the critical region is $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$. For the one-sided alternative $p < p_0$, the critical region is $z < -z_{\alpha}$, and for the alternative $p > p_0$, the critical region is $z > z_{\alpha}$.

Example 2. A semiconductor company produces microcontrollers for robotic applications. The company is said to demonstrate capability to its customers if the process produces defective items not exceeding 5%. To determine this, a random sample of 200 microcontrollers was tested and four defective items were found. Will you agree that the company demonstrates process capability at the 0.05 level of significance? Use the P-value approach.

Solution:
1. $H_0: p = 0.05$
   $H_1: p < 0.05$
2. Test statistic: $z = \dfrac{x - np_0}{\sqrt{np_0q_0}}$
3. Computing the z-statistic: $z = \dfrac{4 - 200(0.05)}{\sqrt{200(0.05)(0.95)}} = -1.95$
4. The P-value from the table for Areas Under the Normal Curve is $P(z < -1.95) = 0.0256$.
5. Since the P-value is less than 0.05, reject $H_0$.
6. Therefore, at the 5% level of significance, the company demonstrates process capability for the customers.
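The two proportion tests of Section 8.5 can be reproduced without tables. The sketch below is illustrative only: the function names are hypothetical, the exact P-value is obtained by summing binomial probabilities directly (so it may differ slightly from a figure read or interpolated from a printed table), and the two-tailed rule used is the "double the smaller tail" convention that the small-sample example applies. Only the Python standard library is assumed.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for a binomial variable with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def exact_two_tailed_p(x: int, n: int, p0: float) -> float:
    """Small-sample P-value: twice the tail on the side of n*p0 where x falls."""
    if x < n * p0:
        tail = binomial_cdf(x, n, p0)            # P(X <= x)
    else:
        tail = 1 - binomial_cdf(x - 1, n, p0)    # P(X >= x)
    return min(1.0, 2 * tail)                    # a probability cannot exceed 1


def proportion_z(x: int, n: int, p0: float) -> float:
    """Large-sample statistic z = (x - n*p0) / sqrt(n*p0*q0), with q0 = 1 - p0."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))


# Small sample: 8 successes out of 15, hypothesized p0 = 0.65, two-tailed
print(round(exact_two_tailed_p(8, 15, 0.65), 4))

# Large sample: 4 defectives out of 200, hypothesized p0 = 0.05, left-tailed
z = proportion_z(4, 200, 0.05)
print(round(z, 2), round(NormalDist().cdf(z), 4))
```

The left-tailed P-value printed for the large-sample case uses the unrounded z, so it can differ in the last decimal place from a value looked up in a table at z rounded to two decimals.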
