ELEMENTARY STATISTICS WITH COMPUTER APPLICATIONS
NORMAL, BINOMIAL AND POISSON DISTRIBUTIONS

7.5 The Normal Distribution

The normal distribution, or Gaussian distribution, is the most important of all distributions because it describes the situation in which very large values are rather rare, very small values are rather rare, but the middle values are rather common. It is one of the useful models for the population relative frequency distribution. The normal distribution was proposed by C.F. Gauss (1777-1855) as a model for the relative frequency distribution of errors, such as errors of measurement; thus it was named the Gaussian distribution. It is defined by the equation

$$n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where
$\mu$ = mean
$\sigma$ = standard deviation
$\pi$ = 3.14159
$e$ = 2.71828

A graph of the normal distribution, which is often called the normal curve, is shown below. This curve provides an adequate model for the relative frequency of data collected from many disciplines.

[Figure: The Normal Curve. About 68.27% of the area lies between $\mu - \sigma$ and $\mu + \sigma$, 95.45% between $\mu - 2\sigma$ and $\mu + 2\sigma$, and 99.73% between $\mu - 3\sigma$ and $\mu + 3\sigma$.]

When the normal distribution is used to approximate a binomial distribution with $N$ trials, success probability $p$ and $q = 1 - p$, it has mean $\mu = Np$, variance $\sigma^2 = Npq$ and standard deviation $\sigma = \sqrt{Npq}$.

The total area bounded by the normal curve and the x-axis is 1; hence the area under the curve between two ordinates $x = a$ and $x = b$, where $a < b$, represents the probability that $x$ lies between $a$ and $b$. The probability is denoted by $P(a \le x \le b)$.

The normal curve has the following characteristics:
• The graph has a single peak at the center that occurs at the mean (Mean = Median = Mode).
• The graph is symmetrical about the mean $\mu$.
• The graph never touches the horizontal axis.
• The area under the graph is equal to 1.
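The density and the areas quoted for the normal curve can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `normal_pdf`, `normal_cdf` and `area_within` are illustrative choices, and the area to the left of a point is computed with the error function from Python's standard `math` module.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve n(x; mu, sigma) at the point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def area_within(k, mu=0.0, sigma=1.0):
    """Area between mu - k*sigma and mu + k*sigma."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"area within {k} standard deviation(s): {area_within(k):.4f}")
```

Because the areas depend only on the number of standard deviations from the mean, calling `area_within(1, mu=100, sigma=16)` gives the same result as `area_within(1)`.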
[Figure: The Standardized Normal Curve]

7.5.1 The Standardized Normal Curve

When the variable $x$ is expressed in terms of standard units, $z = \frac{x - \mu}{\sigma}$, the equation

$$y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

is replaced by the so-called standard form

$$y = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}z^{2}}$$

7.6 Standardized Normal Distribution

The equation $y = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^{2}}$ defines $z$ as normally distributed with mean 0 (zero) and variance 1. Any normal distribution can be converted into the "standardized normal distribution" by subtracting the mean $\mu$ from each observation $x$ and dividing by the standard deviation $\sigma$.

A z-score or z-value is the distance between a selected value or raw score, designated by $x$, and the mean $\mu$, divided by the standard deviation. It measures the distance between a particular value of $x$ and the arithmetic mean in units of the standard deviation $\sigma$.

In symbols,

$$z = \frac{x - \mu}{\sigma}$$

Referring to the appendix on areas under a normal curve, if z = 1.59, then the probability that an observation is between 0 and 1.59 standard deviations of the mean is 0.4441 or 44.41%. This is arrived at by going down the column of the table headed by the letter z to 1.5, then moving horizontally to the right and reading the probability under the column headed 0.09.

[Figure: area under the standard normal curve between z = 0 and z = 1.59]

Example 1: Find the area beneath a standard normal curve between the mean z = 0 and the point z = -1.58.

Solution:

[Figure: shaded area between z = -1.58 and z = 0]

Since z = -1.58 is negative, it lies to the left of the mean and the unknown area is the shaded area shown.

Note: Since the normal curve is symmetric, the area between the mean (z = 0) and z = 1.58 is 0.4429. The area between z = -1.58 and z = 0 is also 0.4429.

Example 2: Find the probability that a normally distributed random variable will be within z = ±1 standard deviation of its mean.
Solution:

[Figure: area under the standard normal curve between z = -1.0 and z = 1.0]

The area between z = 0 and z = 1 is 0.3413, and by symmetry the same area lies between z = -1 and z = 0. Therefore

P = 2(0.3413) = 0.6826

Example 3: Find the probability that a normally distributed random variable x will lie more than 1.5 standard deviations above its mean.

Solution: The unknown probability is the dark shaded area. The total area under a standard normal curve is 1. Half this area lies to the left of the mean and half to the right. Therefore, the probability P that x will lie more than 1.5 standard deviations above the mean is equal to 0.5 minus the area A between z = 0 and z = 1.5.

P = 0.5 - A = 0.5 - 0.4332 = 0.0668

Example 4: Find the area under the normal curve between z = 0.5 and z = 2.37.

Solution:

[Figure: area under the standard normal curve between z = 0.5 and z = 2.37]

The area A that we seek lies to the right of the mean because both z values are positive. Let $A_1$ represent the area between z = 0 and z = 0.5 and $A_2$ represent the area between z = 0 and z = 2.37. The area that is unknown is

$A = A_2 - A_1 = 0.4911 - 0.1915 = 0.2996$

Example 5: Find the area under the normal curve between z = -1.0 and z = 2.5.

Solution:

[Figure: area under the standard normal curve between z = -1.0 and z = 2.5]

Unknown area $A = A_1 + A_2 = 0.3413 + 0.4938 = 0.8351$

7.7 Application of the Areas Under a Normal Curve

There are many social science variables that are normally distributed. However, there are many more variables that have properties close enough to those of the normal distribution that the properties of the normal distribution apply. Diverse variables such as IQ, the height of adult men, the average temperature on Christmas Day in Luzon, and the number of eggs laid by chickens in a month are all approximately normally distributed. If it is assumed that a variable is approximately normally distributed, a number of interesting conclusions about that variable may be drawn.
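The table look-ups in Examples 1 to 5 can be cross-checked with a short computation. This is a sketch under stated assumptions rather than the book's method: `phi` and `area_between` are illustrative names, and the area to the left of z comes from the error function instead of the printed table, so the last digit can differ from the table by rounding.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(z1, z2):
    """Area under the standard normal curve between two z values."""
    low, high = sorted((z1, z2))
    return phi(high) - phi(low)


def area_above(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - phi(z)


if __name__ == "__main__":
    results = {
        "Example 1, between -1.58 and 0": area_between(-1.58, 0.0),
        "Example 2, between -1 and 1": area_between(-1.0, 1.0),
        "Example 3, above 1.5": area_above(1.5),
        "Example 4, between 0.5 and 2.37": area_between(0.5, 2.37),
        "Example 5, between -1.0 and 2.5": area_between(-1.0, 2.5),
    }
    for label, area in results.items():
        print(f"{label}: {area:.4f}")
```

Example 2 comes out near 0.6827 rather than 0.6826 because the table entry 0.3413 is itself rounded before being doubled.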
Problem 1:
The weekly incomes of assistant managers of our local banks are normally distributed with a mean of P40,000 and a standard deviation of P5,000. What is the z-score for an income x of a) P30,000? b) P42,000?

Solution:
a. x = P30,000

$$z = \frac{x - \mu}{\sigma} = \frac{30{,}000 - 40{,}000}{5{,}000} = -2.0$$

b. x = P42,000

$$z = \frac{42{,}000 - 40{,}000}{5{,}000} = 0.4$$

Problem 2:
Consider the normal distribution of IQs with mean $\mu$ of 100 and a standard deviation of 16. What percentage of IQs are:
a. greater than 92?
b. between 92 and 120?
c. less than 120?

Solution:
a. Greater than 92

$$z = \frac{92 - 100}{16} = -0.5$$

[Figure: area under the standard normal curve to the right of z = -0.5]

Referring to the areas under a normal curve, the area between $\mu$ = 100 (z = 0) and z = -0.5 is 0.1915. This means that 19.15% of IQs lie between 92 and 100. Since the percentage of IQs greater than the mean is 0.5 or 50%, the percentage of IQs above 92 is 0.1915 + 0.50 = 0.6915 or 69.15%.

b. Between 92 and 120

$$z_1 = \frac{92 - 100}{16} = -0.5 \qquad z_2 = \frac{120 - 100}{16} = 1.25$$

Since the percentage of IQs between z = -0.5 and z = 0 is 0.1915 or 19.15% and the percentage of IQs between z = 0 and z = 1.25 is 0.3944 or 39.44%,
Unknown percentage = area between z = -0.5 and z = 1.25
= 0.1915 + 0.3944
= 0.5859 = 58.59%

c. Less than 120

$$z = \frac{120 - 100}{16} = \frac{20}{16} = 1.25$$

Unknown percentage = percentage of IQs below the mean + percentage of IQs between z = 0 and z = 1.25
= 0.5 + 0.3944
= 0.8944 = 89.44%

Problem 3:
Suppose a basketball player plays in two leagues in summer. In the first league, he scored 12 points while in the second he scored 16. At first glance it appears that he did better in the second league. If we know the means and the standard deviations for points scored in the leagues, we could compare the player's shooting output in the two leagues.
Assume that the scores in each league are normally distributed with mean and standard deviation as follows:

| League 1 | League 2
Mean | 8.5 | 14.4
Standard Deviation | 2.2 | 4.6

Solution:

$$z_1 = \frac{12 - 8.5}{2.2} = 1.59 \qquad z_2 = \frac{16 - 14.4}{4.6} = 0.35$$

His score in the first league is over 1.5 standard deviations above the mean, whereas his score in the second league is only 0.35 standard deviation above the mean. Thus, the player played better in the first league.

Exercise 7.4

1. Suppose a relative frequency distribution has mean $\mu$ = 500 and standard deviation $\sigma$ = 100. Give the z-score corresponding to
a. x = 620
b. x = 660
c. x = 700

2. Find the area under a normal curve:
a. between the mean and a point z = 1.32 standard deviations to the right of the mean
b. between z = -2.0 and z = -1.56
c. between z = -1.5 and z = 2.25
d. between z = 2.0 and z = 3.0
e. above z = -1.28
f. below z = 1.54

[Table: Proportions of Area Under the Standard Normal Curve]

The alternative hypothesis will be accepted if the sample data provide us with evidence that the null hypothesis is false. In short, the rejection of the null hypothesis implies that the alternative hypothesis is accepted.

8.5 Types of Errors

Error is one of the many things man is afraid to commit. Even in real-life situations, we hardly come out with a decision immediately because of our fear of committing an error. The same is true in hypothesis testing: there is also a possibility of committing an error in deciding whether to accept or reject the hypothesis.
This is because partial information obtained from the sample is used to draw a conclusion about the entire population.

In hypothesis testing, four outcomes are possible, two of which lead to incorrect decisions. The four possible outcomes are described in Table 8.1.

Table 8.1 The Four Possible Outcomes for a Hypothesis Test

Decision | $H_0$ is true | $H_0$ is false
Do not reject $H_0$ | Correct decision | Type II error
Reject $H_0$ | Type I error | Correct decision

Based on Table 8.1, incorrect decisions occur if either a true null hypothesis is rejected or a false null hypothesis is not rejected. The first incorrect decision is called a Type I error and the second incorrect decision is called a Type II error.

8.5.1 Definition of Type I and Type II Errors

Type I error: Rejecting the null hypothesis when in fact the null hypothesis is true. It is also known as an alpha error.
Type II error: Not rejecting the null hypothesis when in fact the null hypothesis is false. It is also known as a beta error.

HYPOTHESIS TESTING

The probability of committing a type I error is the probability of rejecting a true null hypothesis. In other words, it is the probability that the test statistic will be in the rejection region if, in fact, the null hypothesis is true. The probability of a type I error is called the level of significance of the hypothesis test and is denoted by the Greek letter $\alpha$ (alpha).

8.6 Definition of Level of Significance

The significance level $\alpha$ of a hypothesis test is defined to be the probability of committing a type I error, that is, the probability of rejecting the null hypothesis when it is in fact true.

The probability of committing a type II error is the probability of not rejecting a false null hypothesis. In other words, it is the probability that the test statistic will be in the non-rejection region if, in fact, the null hypothesis is false. The Greek letter $\beta$ (beta) is used to denote the probability of a type II error. The probability, $\beta$, of a type II error depends on the true value of $\mu$.
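To make the dependence of $\beta$ on the true mean concrete, the sketch below computes $\beta$ for a left-tailed z-test at several true values of $\mu$. Everything numeric here is a hypothetical illustration, not taken from the text: the claimed mean of 100, the standard deviation of 16, the sample size of 64 and the function name `beta_left_tailed` are all assumptions, and the critical value -1.645 corresponds to $\alpha$ = .05.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def beta_left_tailed(mu_true, mu0, sigma, n, z_crit=-1.645):
    """Probability of NOT rejecting H0: mu = mu0 (against Ha: mu < mu0)
    when the true population mean is mu_true."""
    standard_error = sigma / math.sqrt(n)
    # Sample means below this cutoff fall in the rejection region.
    cutoff = mu0 + z_crit * standard_error
    # Non-rejection happens when the sample mean is at or above the cutoff.
    return 1.0 - phi((cutoff - mu_true) / standard_error)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    mu0, sigma, n = 100.0, 16.0, 64
    for mu_true in (100.0, 98.0, 96.0, 94.0):
        b = beta_left_tailed(mu_true, mu0, sigma, n)
        print(f"true mean {mu_true}: probability of not rejecting H0 = {b:.4f}")
```

When the true mean equals the claimed mean, the null hypothesis is true and the value returned is 1 - $\alpha$ rather than a type II error probability; as the true mean moves further below the claimed mean, $\beta$ shrinks.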
The choice of level of significance depends on how much risk of committing a type I error the statistician or researcher is willing to accept. Statisticians commonly use 1%, 5% or 10% for convenience as limits for how unlikely a value of the sample mean ($\bar{x}$) they will tolerate before rejecting the hypothesis. Using the 0.05 level of significance in testing a hypothesis implies that, if the null hypothesis is true, there is a 5% probability of wrongly rejecting it and a 95% probability of correctly not rejecting it.

Suppose, for example, we decide on a significance level of 1%, that is, $\alpha$ = .01. Then we choose the cut-off point, or critical value, so that if $H_0$ is true, only 1% of the possible $\bar{x}$-values fall in the rejection region. See Figure 8.1.

[Fig. 8.1: Rejection region ($\alpha$ = .01) and non-rejection region, separated by the critical value]

8.7 One-tailed and Two-tailed Tests

One way of determining the type of test used in hypothesis testing is based on how the alternative hypothesis is formulated.

A one-tailed test is used when the alternative hypothesis is directional, which means that the value of one measure is stated to be either greater than (>) or less than (<) the other measure.

A one-tailed test is a hypothesis test for which the rejection region lies at only one tail of the distribution. A one-tailed test is classified as a left-tailed test or a right-tailed test. If the alternative hypothesis states that the population mean ($\mu$) is less than the specified value $\mu_0$, then it is a left-tailed test, and the alternative hypothesis can be expressed as $\mu < \mu_0$. It is a right-tailed test if the alternative hypothesis states that the population mean ($\mu$) is greater than the specified value $\mu_0$, and the alternative hypothesis can be expressed as $\mu > \mu_0$.

A two-tailed test is used when the alternative hypothesis is non-directional, which means that the values of two measures of the same kind are stated to be not equal. A two-tailed test has a not-equal sign ($\ne$) in the alternative hypothesis.
When the alternative hypothesis states that the population mean ($\mu$) is not equal to the specified value $\mu_0$, it can be expressed as $\mu \ne \mu_0$.

A two-tailed test is a hypothesis test for which the rejection region lies on both tails of the distribution, one on the left and one on the right.

Table 8.2 Rejection Regions for Two-tailed, Left-tailed, and Right-tailed Tests

| Two-tailed Test | Right-tailed Test | Left-tailed Test
Sign in $H_a$ | $\ne$ | > | <
Rejection region | both sides | right side | left side

Table 8.2 summarizes the discussion about the two-tailed, right-tailed, and left-tailed tests.

[Fig. 8.2: Graphical representation of rejection regions for two-tailed, left-tailed and right-tailed tests at the 5% and 1% levels of significance. Panels: z = -1.645, $\alpha$ = .05, one-tailed test (left tail); z = -2.33, $\alpha$ = .01, one-tailed test (left tail); z = 1.645, $\alpha$ = .05, one-tailed test (right tail); z = 2.33, $\alpha$ = .01, one-tailed test (right tail); z = -1.96 and z = 1.96, $\alpha$ = .05, two-tailed test; z = -2.575 and z = 2.575, $\alpha$ = .01, two-tailed test.]

Critical Region in Testing Hypothesis

Level of Significance | One-tailed Test, Left-tailed | One-tailed Test, Right-tailed | Two-tailed Test
.05 | z < -1.645 | z > 1.645 | z > 1.96 or z < -1.96
.01 | z < -2.33 | z > 2.33 | z > 2.575 or z < -2.575
.10 | z < -1.28 | z > 1.28 | z > 1.645 or z < -1.645

Note: Reject the null hypothesis when the computed value of z lies within the area of rejection.

8.8 Some Terminologies To Remember

Test statistic: The statistic used as a basis for deciding whether the null hypothesis should be rejected.
Rejection region: The set of values of the test statistic that leads to rejection of the null hypothesis.
Non-rejection region: The set of values of the test statistic that leads to non-rejection of the null hypothesis.
Critical value: The values of the test statistic that separate the rejection and non-rejection regions.

8.9 A Hypothesis Testing Procedure

1. Formulate the null and alternative hypotheses.
2. Decide the level of significance, $\alpha$.
3. Choose the appropriate test statistic.
4. Establish the critical region.
5. Compute the value of the test statistic.
6. Decide whether to accept or reject the null hypothesis.
7. Draw a conclusion.

In this section and the succeeding sections, we are going to discuss testing hypotheses that involve a single population mean. Two cases are involved in testing hypotheses about means: the large-sample case ($n \ge 30$) and the small-sample case ($n < 30$).

In testing hypotheses, the z-test or the t-distribution may be used depending on the number of cases involved. The z-test is used in comparing two means if the population standard deviation ($\sigma$) is known. It should be emphasized that if the population is normally distributed and $\sigma$ is known, the z-test can be used for any sample size n. However, in many practical cases, the population standard deviation is unknown but the sample is sufficiently large, that is, $n \ge 30$. The sample standard deviation (s) is then used as an estimator of the population standard deviation.

Below are the discussions of the different cases in testing the mean.

A. Hypothesis About Means (Comparing Sample Mean and Population Mean)

1. $$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

2. $$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where:
z = z-test value
$\bar{x}$ = sample mean
$\mu$ = population mean or claimed mean in $H_0$
$\sigma$ = population standard deviation
s = sample standard deviation
n = number of cases, greater than or equal to 30

Problem 1:
The treasurer of a certain university claims that the mean monthly salary of their college professors is P21,750 with a standard deviation of P6,000. A researcher takes a random sample of 75 college professors, who were found to have a mean monthly salary of P19,375.00. Do the 75 college professors have lower salaries than the rest?
Test the claim at the $\alpha$ = .05 level of significance. Apply the different steps in testing hypothesis to solve the given problem.

Solution:
Step 1. $H_0$: The mean monthly salary of the college professors is P21,750. ($\mu$ = 21,750)
$H_a$: The mean monthly salary of the college professors is lower than P21,750. ($\mu$ < 21,750)
Step 2. $\alpha$ = .05
Step 3. A one-tailed test is used because $H_a$ is directional.
Step 4. The tabular or critical value of z at the 0.05 level of significance for a left-tailed test is -1.645.
Step 5. Compute the z-value.
Given: $\bar{x}$ = P19,375.00, $\mu$ = P21,750.00, $\sigma$ = P6,000, n = 75

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{19{,}375 - 21{,}750}{6{,}000 / \sqrt{75}} = \frac{-2{,}375}{692.82} = -3.43$$

Step 6. The computed value of z = -3.43 lies in the rejection region (-3.43 < -1.645), therefore reject $H_0$ and accept $H_a$.
Step 7. Conclusion: The mean monthly salary of the college professors is lower than P21,750.

Problem 2:
The mean weight of the baggage carried into an airplane by individual passengers at Tuguegarao City Airport is 19.8 kilograms. A statistician takes a random sample of 110 passengers and obtains a sample mean weight of 18.5 kilograms with a standard deviation of 8.5 kilograms. Test the claim at the $\alpha$ = .01 level of significance.

Solution:
Step 1. $H_0$: $\mu$ = 19.8 kg
$H_a$: $\mu$ < 19.8 kg
Step 2. $\alpha$ = .01
Step 3. The alternative hypothesis is expressed in a directional statement, therefore use a one-tailed test.
Step 4. The tabular or critical value of z at the 0.01 level of significance for a left-tailed test is -2.33.
Step 5. Compute the z-value.
Given: $\bar{x}$ = 18.5 kg, $\mu$ = 19.8 kg, s = 8.5 kg, n = 110

$$z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{18.5 - 19.8}{8.5 / \sqrt{110}} = \frac{-1.3}{0.8104} = -1.60$$

Step 6. The computed value of z = -1.60 lies in the non-rejection region (-1.60 > -2.33), therefore accept the null hypothesis ($H_0$).
Step 7. Conclusion: There is no significant difference between the sample mean weight of baggage carried by individual passengers and the claimed mean of 19.8 kilograms.
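The arithmetic in Problems 1 and 2 can be reproduced with a few lines of code. This is a minimal sketch, not the book's computer application: `z_statistic` and `left_tailed_decision` are illustrative names, and the critical values are passed in by hand from the critical region table rather than computed.

```python
import math


def z_statistic(sample_mean, claimed_mean, std_dev, n):
    """z = (sample mean - claimed mean) / (standard deviation / sqrt(n))."""
    return (sample_mean - claimed_mean) / (std_dev / math.sqrt(n))


def left_tailed_decision(z, z_critical):
    """Reject H0 when the computed z falls below the (negative) critical value."""
    return "reject H0" if z < z_critical else "do not reject H0"


if __name__ == "__main__":
    # Problem 1: professors' salaries, alpha = .05, critical z = -1.645
    z1 = z_statistic(19375, 21750, 6000, 75)
    print(f"Problem 1: z = {z1:.2f}, {left_tailed_decision(z1, -1.645)}")

    # Problem 2: baggage weights, alpha = .01, critical z = -2.33
    z2 = z_statistic(18.5, 19.8, 8.5, 110)
    print(f"Problem 2: z = {z2:.2f}, {left_tailed_decision(z2, -2.33)}")
```

The same `z_statistic` function covers both formulas for the single-mean case, since they differ only in whether the population or the sample standard deviation is supplied.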
B. Difference Between Means (Sample Means)

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where:
$\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
$s_1^2$ = variance of the first sample
$s_2^2$ = variance of the second sample
$n_1$ = number of cases in the first sample
$n_2$ = number of cases in the second sample

Problem 3:
A sample of 70 observations is selected from a normal population. The sample mean is 2.78 and the sample standard deviation is 0.83. Another sample of 58 observations is selected from a normal population. The sample mean is 2.63 and the sample standard deviation is 0.75. Test the hypothesis that the two population means are equal using the $\alpha$ = .05 level of significance.

Solution:
Step 1. $H_0$: $\mu_1 = \mu_2$
$H_a$: $\mu_1 \ne \mu_2$
Step 2. $\alpha$ = 0.05
Step 3. The alternative hypothesis is expressed in a non-directional statement, therefore the two-tailed test is used.
Step 4. The tabular value of z at $\alpha$ = .05 is ±1.96.
Step 5. Compute the z-value.
Given: $\bar{x}_1$ = 2.78, $s_1$ = 0.83, $n_1$ = 70; $\bar{x}_2$ = 2.63, $s_2$ = 0.75, $n_2$ = 58

$$z = \frac{2.78 - 2.63}{\sqrt{\dfrac{0.83^2}{70} + \dfrac{0.75^2}{58}}} = \frac{0.15}{\sqrt{0.01954}} = \frac{0.15}{0.1398} = 1.07$$

Step 6. The computed value of z = 1.07 lies in the non-rejection region, therefore accept the null hypothesis ($H_0$).
Step 7. Conclusion: There is no significant difference between the means of the two samples.

C. Hypothesis Testing about a Single Proportion

$$z = \frac{\hat{p} - P}{\sqrt{\dfrac{P(1 - P)}{n}}}$$

where:
P = population proportion
$\hat{p}$ = sample proportion
n = number of cases

Problem 4:
A barangay captain claims that at least 85% of the residents in his barangay have household pets. To test this claim, a researcher randomly selected a sample of 550 residents and found that 495 of them do have household pets. At the 0.05 level of significance, what can you conclude?

Solution:
Step 1. $H_0$: P = 85%
$H_a$: P > 85%
Step 2. $\alpha$ = 0.05
Step 3.
The alternative hypothesis is expressed in a directional statement, therefore a one-tailed test is used.
Step 4. The tabular value of z at $\alpha$ = .05 for a one-tailed (right-tailed) test is 1.645.
Step 5. Compute the z-value.
Given: $\hat{p}$ = 495/550 = 0.90, P = 0.85, n = 550

$$z = \frac{0.90 - 0.85}{\sqrt{\dfrac{0.85(1 - 0.85)}{550}}} = \frac{0.05}{\sqrt{0.00023181818}} = \frac{0.05}{0.015225576} = 3.28$$

Step 6. The computed value of z = 3.28 lies in the rejection region, therefore reject the null hypothesis ($H_0$) and accept the alternative hypothesis ($H_a$).
Step 7. Conclusion: The sample proportion is significantly greater than 85%, so the data are consistent with the barangay captain's claim.

D. Hypothesis Testing About Two Proportions

$$z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

where:
$p_1$ = proportion of the first sample
$p_2$ = proportion of the second sample
$n_1$ = number of cases in the first sample
$n_2$ = number of cases in the second sample

Problem 5:
A researcher made a survey regarding the proportion of male and female college students who dropped their business mathematics subject before the midterm examination. In a sample of 600 female students, 49 dropped their business mathematics subject before the midterm examination. In a sample of 450 male students, 36 dropped their business mathematics subject before the midterm examination. Is there a difference in the proportion of male and female students who dropped their business mathematics subject? Use the $\alpha$ = 0.05 level of significance.

Solution:
Step 1. $H_0$: $P_1 = P_2$
$H_a$: $P_1 \ne P_2$
Step 2. $\alpha$ = .05
Step 3. Use a two-tailed test because $H_a$ is non-directional.
Step 4. The critical or tabular value of z at $\alpha$ = .05 is ±1.96.
Step 5. Compute the z-value.
Given: $p_1$ = 49/600 = 0.0817, $p_2$ = 36/450 = 0.08, $n_1$ = 600, $n_2$ = 450

$$z = \frac{0.0817 - 0.08}{\sqrt{\dfrac{(0.0817)(1 - 0.0817)}{600} + \dfrac{(0.08)(1 - 0.08)}{450}}} = \frac{0.0017}{\sqrt{0.0002886}} = \frac{0.0017}{0.01699} = 0.10$$

Step 6. The computed value of z = 0.10 lies in the area of non-rejection, therefore accept the null hypothesis ($H_0$).
Step 7.
Conclusion: There is no significant difference between the two sample proportions.

When the sample size is small (n < 30) and the population standard deviation is unknown, use the sample standard deviation (s) as an estimator of the population standard deviation ($\sigma$). In cases like this, the t-distribution is appropriate as the test statistic. In using the t-distribution as the test statistic, it is always assumed that the sampled population is normal or approximately normal.

The t-distribution was developed by an employee of an Irish brewery, William S. Gosset (1876-1937). He chose to publish his findings using the pen name "Student." To honor his work, the distribution is known today as Student's t-distribution.

E. Test Concerning Means (Comparing Sample Mean and Population Mean)

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where:
t = t value
$\bar{x}$ = sample mean
$\mu$ = population mean
s = sample standard deviation
n = number of cases, less than 30
df = n - 1

Problem 6:
According to the Department of Education, high school teachers work an average of 40 hours per week during the school year. A district supervisor of a certain school surveyed 28 randomly selected teachers and found that they work an average of 42.6 hours a week, with a standard deviation of 3.75 hours. Test whether the mean number of hours worked by teachers in the supervisor's school district differs from the national average. Use $\alpha$ = .01.

Solution:
Step 1. $H_0$: $\mu$ = 40 hours
$H_a$: $\mu \ne$ 40 hours

Critical Values of t

Level of significance for a directional (one-tailed) test: .10 | .05 | .025 | .01 | .005
Level of significance for a nondirectional (two-tailed) test: .20 | .10 | .05 | .02 | .01

df | .10 | .05 | .025 | .01 | .005
1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657
2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925
3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841
4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604
5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032
6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707
7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499
8 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355
9 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250
10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169
11 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106
12 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055
13 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012
14 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977
15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947
16 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921
17 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898
18 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878
19 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861
20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845
21 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831
22 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819
23 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807
24 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797
25 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787
26 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779
27 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771
28 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763
29 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756
30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750
40 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704
60 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660
120 | 1.289 | 1.658 | 1.980 | 2.358 | 2.617
$\infty$ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576

The value listed in the table is the critical value of t for the number of degrees of freedom listed in the left column for a directional (one-tailed) or nondirectional (two-tailed) test at the significance level indicated at the top of each column. If the observed t is greater than or equal to the tabled value, reject $H_0$. Since the t distribution is symmetrical about t = 0, these critical values represent both + and - values for nondirectional tests.

Step 2. $\alpha$ = .01
Step 3. The alternative hypothesis expresses a non-directional statement, therefore a two-tailed test is used.
Step 4. If one sample is given, use df = n - 1.
df = n - 1 = 28 - 1
df = 27
The tabular value of t = 2.771.
Step 5. Compute the t-value.
Given: $\bar{x}$ = 42.6 hours, $\mu$ = 40 hours, s = 3.75 hours, n = 28

$$t = \frac{42.6 - 40}{3.75 / \sqrt{28}} = \frac{2.6}{3.75 / 5.2915} = \frac{2.6}{0.7087} = 3.67$$

Step 6. The computed value of t = 3.67 is greater than the tabular value of t = 2.771, thus reject the null hypothesis ($H_0$) and accept the alternative hypothesis ($H_a$).
Step 7. Conclusion: There is a significant difference between the mean weekly working hours of the 28 teachers and the national average.

F. Difference Between Means

1.
t-test for independent samples

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where:
$\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
$s_1^2$ = variance of the first sample
$s_2^2$ = variance of the second sample
$n_1$ = number of cases in the first sample
$n_2$ = number of cases in the second sample
df = $n_1 + n_2 - 2$

2. t-test for dependent samples

$$t = \frac{\bar{d}}{s / \sqrt{n}}, \qquad \bar{d} = \frac{\sum d}{n}, \qquad s = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}$$

where:
$\bar{d}$ = mean of the differences
$\sum d^2$ = sum of the squared differences
$\sum d$ = sum of the differences
n = number of cases
s = standard deviation of the differences
df = n - 1

Problem 7:
An agronomist randomly selected 20 mature calamansi trees of one variety, which have a mean height of 10.8 feet with a standard deviation of 1.25 feet, while 12 randomly selected calamansi trees of another variety have a mean height of 9.6 feet with a standard deviation of 1.45 feet. Test whether the difference between the two sample means is significant. Use $\alpha$ = .05.

Solution:
Step 1. $H_0$: $\mu_1 = \mu_2$
$H_a$: $\mu_1 \ne \mu_2$
Step 2. $\alpha$ = .05
Step 3. The alternative hypothesis is non-directional, thus the two-tailed test is used.
Step 4. Since there are two samples used,
df = $n_1 + n_2 - 2$ = 20 + 12 - 2 = 32 - 2
df = 30
The tabular value of t = 2.042.
Step 5. Compute the t-value.
Given: $\bar{x}_1$ = 10.8 ft., $s_1$ = 1.25 ft., $n_1$ = 20; $\bar{x}_2$ = 9.6 ft., $s_2$ = 1.45 ft., $n_2$ = 12

$$t = \frac{10.8 - 9.6}{\sqrt{\dfrac{1.25^2}{20} + \dfrac{1.45^2}{12}}} = \frac{1.2}{\sqrt{0.2533}} = \frac{1.2}{0.5033} = 2.38$$

Step 6. The computed value of t = 2.38 is greater than the tabular value of t = 2.042, thus reject the null hypothesis ($H_0$) and accept the alternative hypothesis ($H_a$).
Step 7. Conclusion: There is a significant difference between the means of the two samples.

Solution (t-test for dependent samples, n = 10):

$$\bar{d} = \frac{\sum d}{n} = \frac{13}{10} = 1.3$$

$$s = \sqrt{8.4555556} = 2.91$$

$$t = \frac{\bar{d}}{s / \sqrt{n}} = \frac{1.3}{2.91 / \sqrt{10}} = \frac{1.3}{0.920222799} = 1.41$$

Step 6. The computed value of t = 1.41 is less than the tabular value of t = 2.262, therefore accept the null hypothesis ($H_0$).
Step 7. Conclusion: There is no significant difference between the mean scores of students before and after the review class. This implies that the review was not effective.

Table D Critical Values of Chi-square, $\chi^2$
[Table D body: critical values of $\chi^2$ by degrees of freedom and level of significance for a nondirectional test]

The table lists the critical values of chi-square for the degrees of freedom shown at the left for tests corresponding to those significance levels which head each column.

In the previous sections, we dealt with hypothesis testing about a population mean, two population means and proportions. In these tests, it is always assumed that the population is approximately normal and the given data are at least interval scale.

What if the given data are nominal or ordinal scale, and no assumptions are made about the population? What test statistic is appropriate for these types of data?

When the data are nominal or ordinal, the hypothesis tests used are called non-parametric, or distribution-free, tests. This implies that these tests are free of assumptions regarding the distribution of the population.

The chi-square goodness-of-fit test, which was developed by Karl Pearson, is one of the most commonly used non-parametric tests. The purpose of the goodness-of-fit test is to determine how well an observed set of data fits an expected set of data.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where
$\chi^2$ = chi-square value
O = observed frequency
E = expected frequency

8.9.1 Chi-square Goodness-of-fit Test: Equal Expected Frequencies

Example:
There are three (3) gates at the University of the East. The building maintenance supervisor would like to know if the gates are equally utilized. As an experiment, 600 students are observed as they enter the school. The number of students using each gate is reported below. At the .01 significance level, can we conclude that there is a difference in the use of the three gates?

Gate | Number of Students
Recto | 245
Lepanto | 205
Gastambide | 150
Total | 600

Because there are 600 students in the sample, we expect that 200 students fall in each of the three categories. These categories are called cells.
Gate | O | E
Recto | 245 | 200
Lepanto | 205 | 200
Gastambide | 150 | 200
Total | 600 | 600

Solution:
Step 1. Formulate the $H_0$ and $H_a$.
$H_0$: There is no difference between the set of observed frequencies and the set of expected frequencies.
$H_a$: There is a difference between the set of observed frequencies and the set of expected frequencies.
Step 2. $\alpha$ = .01
Step 3. The test statistic follows the chi-square distribution. It is designated as $\chi^2$ and is computed by

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with k - 1 degrees of freedom, where
k is the number of categories,
O is an observed frequency in a particular category, and
E is an expected frequency in a particular category.

$$\chi^2 = \frac{(245 - 200)^2}{200} + \frac{(205 - 200)^2}{200} + \frac{(150 - 200)^2}{200} = \frac{2{,}025}{200} + \frac{25}{200} + \frac{2{,}500}{200} = 10.125 + 0.125 + 12.5 = 22.75$$

Step 4. Formulate the decision rule.
With k - 1 = 3 - 1 = 2 degrees of freedom and $\alpha$ = .01, the critical value of $\chi^2$ is 9.210.
Since the computed value of $\chi^2$ = 22.75 is greater than the critical value of $\chi^2$ = 9.210, reject $H_0$ and accept $H_a$.
Step 5. Conclusion: There is a significant difference between the set of observed frequencies and the set of expected frequencies. The three gates are not equally utilized.

8.9.2 Contingency Table Analysis

When we applied the goodness-of-fit test previously in this chapter, we were concerned with only a single variable and a single trait. The chi-square distribution can also be used when we are considering two traits.

Example:
A number of employees at a large chemical plant were asked to indicate a preference for one of three pension plans. The results are given in the following table. Is there a relationship between the pension plan selected and the job classification of the employees? Use the .05 significance level.
Job Class | Plan A: O | Plan A: E | Plan B: O | Plan B: E | Plan C: O | Plan C: E | Total
Supervision | 20 | 34.67 | 26 | 47.27 | 58 | 22.06 | 104
Clerical | 38 | 78.67 | 160 | 107.27 | 38 | 50.06 | 236
Labor | 162 | 106.67 | 114 | 145.45 | 44 | 67.88 | 320
Total | 220 | | 300 | | 140 | | 660

df = (number of rows - 1)(number of columns - 1)
df = (r - 1)(c - 1)

$$\text{Expected frequency for a cell} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$$

For the Plan A column:

$$\frac{(104)(220)}{660} = 34.67 \qquad \frac{(236)(220)}{660} = 78.67 \qquad \frac{(320)(220)}{660} = 106.67$$

Solution:
Step 1. $H_0$: There is no association between the job classification of employees and their selected pension plan preference.
$H_a$: There is an association between the job classification of employees and their selected pension plan preference.
Step 2. $\alpha$ = .05
Step 3. Compute the $\chi^2$.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

$$\chi^2 = \frac{(20 - 34.67)^2}{34.67} + \frac{(38 - 78.67)^2}{78.67} + \frac{(162 - 106.67)^2}{106.67} + \frac{(26 - 47.27)^2}{47.27} + \frac{(160 - 107.27)^2}{107.27} + \frac{(114 - 145.45)^2}{145.45} + \frac{(58 - 22.06)^2}{22.06} + \frac{(38 - 50.06)^2}{50.06} + \frac{(44 - 67.88)^2}{67.88}$$

$\chi^2$ = 6.21 + 21.03 + 28.70 + 9.57 + 25.92 + 6.80 + 58.55 + 2.91 + 8.40
$\chi^2$ = 168.09

Step 4. Formulate the decision rule.
With df = (3 - 1)(3 - 1) = 4 and $\alpha$ = .05:

Computed Value of $\chi^2$ | Critical Value of $\chi^2$
168.09 | 9.488

Since the computed value of $\chi^2$ = 168.09 is greater than the critical value of $\chi^2$ = 9.488, reject $H_0$ and accept $H_a$.
Step 5. Conclusion: Job classification is associated with the pension plan preference of employees.
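Both chi-square computations in this section follow the same pattern, which the sketch below reproduces. It is an illustration under stated assumptions, not the book's procedure: the function names are invented for this example, the observed counts are the ones tabulated above, and the critical values still have to be read from the chi-square table.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_frequencies(table):
    """Expected count for each cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def chi_square_contingency(table):
    """Chi-square statistic and degrees of freedom for a two-way table."""
    expected = expected_frequencies(table)
    statistic = sum(
        chi_square(obs_row, exp_row) for obs_row, exp_row in zip(table, expected)
    )
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Goodness of fit: three gates, 200 students expected at each.
    gates = chi_square([245, 205, 150], [200, 200, 200])
    print(f"gates: chi-square = {gates:.2f}")

    # Contingency table: rows are job classes, columns are Plans A, B, C.
    pension = [[20, 26, 58], [38, 160, 38], [162, 114, 44]]
    statistic, df = chi_square_contingency(pension)
    print(f"pension plans: chi-square = {statistic:.2f}, df = {df}")
```

The contingency result can differ from the hand computation in the second decimal place because the code keeps the expected frequencies unrounded, whereas the worked solution rounds each one to two decimals first.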
