Enumeration and Data Analysis

This section aims to:
1. discuss the principles embodied in ...
2. ...
3. apply chi-square in practical situations.

In the analysis of enumeration data, the researcher often needs to decide whether several independent samples should be regarded as having come from the same population. Sample values almost always differ somewhat, and the problem is to determine whether the observed differences reflect real differences among populations or only chance variation among samples from the same population.

Enumeration data are expressed in the form of frequencies, which represent the number of items within specified qualities, descriptions, or categories. Enumeration data may be classified according to the number of variables described, as either one-way or two-way classification. Each variable is further subdivided into more specific categories.

A one-way classification has only one variable described by at least two categories. Take, for instance, the variable civil status. Civil status may be subdivided into more specific categories: single, married, widowed, and married but legally separated.

Two-way classification enumeration data have two variables described by their respective categories. The frequencies given are applicable to both variables. Data with two-way classification are best summarized and presented in a contingency table made up of several rows and columns, the rows representing the categories of one variable and the columns representing the categories of the other variable.

Characteristics of the Chi-Square

The statistic used in the analysis of enumeration data is known as the chi-square test. The chi-square test can be used for one variable or two variables for which there are two or more categories each. It reflects discrepancies between the observed and the expected or theoretical frequencies of individuals, objects, or events falling in the various categories.

When categories have to be combined, the researcher will be limited in this combining by the nature of the data.
That is, the results of the statistical test may not be interpretable if the combination of the categories has been capricious. The adjacent categories which are combined must have some common property or mutual identity if interpretation of the outcome of the test after combining categories is to be meaningful. The need to combine categories can be avoided if the researcher uses a sufficiently large N in his research.

Applications of the Chi-Square

The chi-square test can be applied in any of the following:
1. test of goodness of fit
2. test of homogeneity (two or more samples, one criterion variable)
3. test of independence (one sample, two criterion variables)

Steps in Using Chi-Square

These are the steps in using the chi-square test for k independent samples:

1. State the null hypothesis. The null hypothesis may be stated in any of these ways:
   a. The sample distribution conforms with the hypothetical or theoretical distribution.
   b. The actual observed proportion is not significantly different from the ideal or expected proportion.
   c. One variable does not depend on the other variable, or the two variables are independent of each other.

2. Set the level of significance, also known as alpha ($\alpha$).

3. Cast the observed frequencies in a k x r contingency table, using the k columns for the groups. Determine the expected frequency for each cell by finding the product of the marginal totals common to the cell and dividing this product by N. (N is the sum of each group of marginal totals. It represents the total number of independent observations. Inflated N's invalidate the test.) Determine the degrees of freedom using the formula:
   For one-way classification: DF = number of categories - 1
   For two-way classification: DF = (k - 1)(r - 1)

4. Locate the tabular value of the chi-square in the chi-square distribution table by getting the value where the desired level of significance and the computed degrees of freedom intersect.

5.
Compute the chi-square value using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed number of cases and E = expected number of cases. For a two-way classification, the expected number of cases in a cell is

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

6. State the conclusion arrived at by the acceptance or rejection of the null hypothesis. If the computed value of the chi-square is less than the tabular value, the null hypothesis is accepted. If the computed value of the chi-square is greater than the tabular value, the null hypothesis is rejected.

Correction for Continuity

It has been indicated that the chi-square test requires a relatively large N, because the sampling distribution of the test statistic approximates the sampling distribution given in the chi-square table only when N is large. The question naturally arises, then, as to how large N has to be in order to make use of this test. The answer depends on the number of cells and the marginal totals. Generally, the smaller the number of cells and the more nearly equal all the marginal totals are, the smaller the total N can be. The criteria usually used for deciding whether or not the number of cases is sufficient involve the expected frequencies in each cell. Whenever any of the expected frequencies are in the neighborhood of 5 or smaller, it is advisable to make some kind of modification as indicated below.

The chi-square distribution is assumed to be a continuous one. Actually, however, when the number of cases is relatively small, it is impossible for the computed value of chi-square to take on very many different values. This is true because observed frequencies must always be integers. In correcting for continuity we imagine that observed frequencies actually can take on all possible values, and we make use of those values within a distance of half a unit on either side of the integer obtained which will give the most conservative results. In the case of the 2 x 2 table, a correction for continuity can very easily be made.
This correction consists of either adding or subtracting 0.5 from the observed frequencies in order to reduce the magnitude of chi-square. The corrected version becomes:

$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$

Test of Goodness of Fit

A chi-square goodness of fit test is performed in order to determine if a set of observed data corresponds to some theoretical distribution.

Example 11.1.1. A coin is tossed 80 times, resulting in 50 heads and 30 tails. Does this differ from what is expected by chance? Set alpha at the 5% level.

Solution. We follow the steps in using the chi-square test.

1. $H_0$: The result does not differ from what is expected by chance.
   $H_1$: The result differs from what is expected by chance.

2. Set the 5% level of significance.

3. Since the given data is a one-way distribution, the degree of freedom is computed as:
   DF = number of categories - 1 = 2 - 1 = 1

4. The tabular value at 1 DF and 5% significance level is 3.84 (see appendix).

5. For one-way distribution, get the expected frequencies using the following formula:

   $$E = np$$

   where n = total frequency and p = expected proportion.

   $E_1 = 80 \times 0.5 = 40$
   $E_2 = 80 \times 0.5 = 40$

   Then we substitute the values in the formula of the chi-square. Each term is divided by its expected frequency, which is 40 for both heads and tails:

   $$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(50 - 40)^2}{40} + \frac{(30 - 40)^2}{40} = 2.50 + 2.50 = 5.00$$

6. Since 3.84 (tabular value) < 5.00 (computed value), we reject the null hypothesis. The result differs from what is expected by chance.

Test of Homogeneity (Two or More Samples, One Criterion Variable)

The chi-square test is frequently used to determine if two or more populations are homogeneous. By this is meant that the data distributions are similar with respect to a particular criterion variable. The samples drawn from each population in a test of homogeneity need not be of equal size. However, it is recommended that they be so whenever possible, for when they are, calculations are made easier.

Example 11.1.2.
In an experiment involving two groups of samples, 100 male and 100 female subjects were asked to state a preference between frozen orange juice and a newly developed type of preserved juice. Do the preferences of the two groups differ? Use the 1% level of significance.

               Orange Juice Preference
Sex            Frozen     Preserved
Males            69          31
Females          48          52

Solution. We follow the steps in using the chi-square test.

1. $H_0$: The preferences of the two groups do not differ significantly.
   $H_1$: The preferences of the two groups differ significantly.

2. Set the 1% level of significance.

3. Since the given data is a two-way classification, the degree of freedom is computed as:
   DF = (k - 1)(r - 1)
   DF = (2 - 1)(2 - 1)
   DF = 1

4. The tabular value at 1 DF and 1% significance level is 6.63 (see appendix).

5. For two-way classification, get the expected frequencies using the following formula:

   $$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

               Orange Juice Preference
Sex            Frozen     Preserved     Total
Males            69          31          100
Females          48          52          100
Total           117          83          200

   $E_{11} = \frac{117 \times 100}{200} = 58.5$
   $E_{12} = \frac{83 \times 100}{200} = 41.5$
   $E_{21} = \frac{117 \times 100}{200} = 58.5$
   $E_{22} = \frac{83 \times 100}{200} = 41.5$

   Then we substitute the values into the formula with correction for continuity, since we have a 2 x 2 table with 1 degree of freedom:

   $$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E} = \frac{(|69 - 58.5| - 0.5)^2}{58.5} + \frac{(|31 - 41.5| - 0.5)^2}{41.5} + \frac{(|48 - 58.5| - 0.5)^2}{58.5} + \frac{(|52 - 41.5| - 0.5)^2}{41.5} = 8.24$$

6. Since 6.63 (tabular value) < 8.24 (computed value), we reject the null hypothesis. The preferences of the two groups differ significantly.

Test of Independence (One Sample, Two Criterion Variables)

The one-sample test of independence differs from the test of homogeneity in that for each sample member there are measures on two variables. The sample used in a test of independence consists of members randomly drawn from the same population. This test is used to see if measures taken on two criterion variables are either independent or associated with one another in a given population.
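The arithmetic of Example 11.1.2 can be checked with a short Python sketch. The function names `expected_frequencies` and `chi_square` and the variable `juice` are illustrative assumptions, not part of the text; the same expected-frequency rule (row total times column total, divided by the grand total) also serves the test of independence.

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed, yates=False):
    """Chi-square statistic for a contingency table.

    With yates=True, 0.5 is subtracted from each |O - E| before squaring,
    which is the correction for continuity used for 2 x 2 tables.
    """
    expected = expected_frequencies(observed)
    correction = 0.5 if yates else 0.0
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (abs(o - e) - correction) ** 2 / e
    return total


# Observed frequencies from Example 11.1.2 (rows: males, females;
# columns: frozen, preserved).
juice = [[69, 31],
         [48, 52]]

print(expected_frequencies(juice))              # [[58.5, 41.5], [58.5, 41.5]]
print(round(chi_square(juice, yates=True), 2))  # 8.24
```

The computed 8.24 exceeds the tabular value of 6.63 at 1 DF and the 1% level, which agrees with the decision reached in the example.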
The calculation of a chi-square test of independence is similar to that made with a test of homogeneity.

Example 11.1.3. One hundred individuals, aged 20-59, were given a psychomotor test. Their age and score were classified as shown in the accompanying table:

                   Score
Age        High    Average    Low
40-59       23       20        17
20-39       18       12        10

Test for the dependency of the scores obtained in the psychomotor test and the individual's age. Set alpha at the 10% level.

Solution. We follow the steps in using the chi-square test.

1. $H_0$: The scores obtained in the psychomotor test do not depend on the individual's age.
   $H_1$: The scores obtained in the psychomotor test depend on the individual's age.

2. Set the 10% level of significance.

3. Since the given data is a two-way classification, the degree of freedom is computed as:
   DF = (k - 1)(r - 1)
   DF = (2 - 1)(3 - 1)
   DF = 2

4. The tabular value at 2 DF and 10% significance level is 4.61 (see appendix).

5. For two-way classification, get the expected frequencies using the following formula:

   $$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

                   Score
Age        High    Average    Low    Total
40-59       23       20        17      60
20-39       18       12        10      40
Total       41       32        27     100

   $E_{11} = \frac{41 \times 60}{100} = 24.6$     $E_{21} = \frac{41 \times 40}{100} = 16.4$
   $E_{12} = \frac{32 \times 60}{100} = 19.2$     $E_{22} = \frac{32 \times 40}{100} = 12.8$
   $E_{13} = \frac{27 \times 60}{100} = 16.2$     $E_{23} = \frac{27 \times 40}{100} = 10.8$

   Then we substitute the values into the formula:

   $$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(23 - 24.6)^2}{24.6} + \frac{(20 - 19.2)^2}{19.2} + \frac{(17 - 16.2)^2}{16.2} + \frac{(18 - 16.4)^2}{16.4} + \frac{(12 - 12.8)^2}{12.8} + \frac{(10 - 10.8)^2}{10.8} = 0.44$$

6. Since 0.44 (computed value) < 4.61 (tabular value), we accept the null hypothesis.
The scores obtained in the psychomotor test do not depend on the individual's age.

Appendix C

CHI-SQUARE DISTRIBUTION TABLE

              Level of Significance (alpha)
DF       0.10      0.05      0.025     0.01      0.005
1        2.71      3.84      5.02      6.63      7.88
2        4.61      5.99      7.38      9.21     10.60
3        6.25      7.81      9.35     11.34     12.84
4        7.78      9.49     11.14     13.28     14.86
5        9.24     11.07     12.83     15.09     16.75
6       10.64     12.59     14.45     16.81     18.55
7       12.02     14.07     16.01     18.48     20.28
8       13.36     15.51     17.53     20.09     21.96
9       14.68     16.92     19.02     21.67     23.59
10      15.99     18.31     20.48     23.21     25.19
11      17.28     19.68     21.92     24.73     26.76
12      18.55     21.03     23.34     26.22     28.30
13      19.81     22.36     24.74     27.69     29.82
14      21.06     23.68     26.12     29.14     31.32
15      22.31     25.00     27.49     30.58     32.80
16      23.54     26.30     28.85     32.00     34.27
18      25.99     28.87     31.53     34.81     37.16
20      28.41     31.41     34.17     37.57     40.00
24      33.20     36.42     39.36     42.98     45.56
30      40.26     43.77     46.98     50.89     53.67
40      51.81     55.76     59.34     63.69     66.77
60      74.40     79.08     83.30     88.38     91.95
120    140.23    146.57    152.21    158.95    163.65
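The table lookup and the decision rule can be sketched in Python for the coin-toss data of Example 11.1.1 and the age-by-score data of Example 11.1.3. This is a minimal sketch under stated assumptions: the dictionary `CRITICAL` holds only the three tabular values quoted in the examples, and the names `goodness_of_fit`, `independence`, and `decide` are hypothetical helpers, not part of the text.

```python
# Tabular values copied from the chi-square table, keyed by (DF, alpha).
# Only the entries needed for the worked examples are included here.
CRITICAL = {(1, 0.05): 3.84, (1, 0.01): 6.63, (2, 0.10): 4.61}


def goodness_of_fit(observed, proportions):
    """One-way chi-square: expected frequency is E = n * p per category."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, proportions))


def independence(observed):
    """Two-way chi-square without continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(observed, row_totals)
               for o, c in zip(row, col_totals))


def decide(chi2, df, alpha):
    """Reject H0 only when the computed value exceeds the tabular value."""
    return "reject H0" if chi2 > CRITICAL[(df, alpha)] else "accept H0"


# Example 11.1.1: 50 heads and 30 tails, expected proportion 0.5 each.
coin = goodness_of_fit([50, 30], [0.5, 0.5])
print(round(coin, 2), decide(coin, 1, 0.05))   # 5.0 reject H0

# Example 11.1.3: rows are ages 40-59 and 20-39; columns are
# high, average, and low scores.
ages = independence([[23, 20, 17], [18, 12, 10]])
print(round(ages, 2), decide(ages, 2, 0.10))   # 0.44 accept H0
```

Keeping the critical values in a small dictionary mirrors the manual procedure of reading the table at the intersection of DF and alpha; a statistics library could supply the values instead, but that is outside what the text describes.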
