Professional Documents
Culture Documents
Biometry 210 PDF
Biometry 210 PDF
0.95 — datas in close agreement with hypothesis; perhaps suspiciously so? df = 4 and calculated x? = 6.85 105 25 1.0 o1 3367.78 949 111 13.1 18.5 85 Conclusion: 0.10
3 Fig 9.3 93.1 | NOTES ON THEORETICAL x? a) _x7has only positive values (sum of squares) b) For f=1, most probable values are near zero, ©) Forf>1,2?>z, thus having a long tail. 4) For f large, x? = 724 is unlikely to have values close to zero, because all z would have to be simultaneously small. Hence the Mode increases with f and the general shape is Positive Skew. 9.32 PROPERTIES OF THEORETICAL x? a) 2¢* is a Continuous distribution. b) _Itis defined by one parameter, F the degrees of freedom. ©) Mean (yi) a) x? + NDasf + 0. Variance (0) = 2f. 933 RELATIONSHIP BETWEEN PEARSON'S x? AND THEORETICAL x? a) The distribution of Pearson's x” is the same as Theoretical x” if each nj is ND about mean npj. In practice, since 1, are integers this cannot happen, and thus Pearson's x? is discontinuous. b) —_ Discontinuity of Pearson's x? is much less pronounced — if n is large. i.e., Pearson's x? + Theoretical x?n + oo and provided no p, is small.9.3.4 a) Dy 56 CONDITIONS FOR A REASONABLY GOOD APPROXIMATION nn should be fairly large. No p, should be small so that any np, is small, .e., no expected frequency should be small, usually accepted as <5. NOTE: If condition (b) is not fulfilled, the distribution of the n, concerned will tend to be Poisson rather than normal ©) General Rule: NO EXPECTED FREQUENCY SHOULD BE <5 For certain “goodness of fit" type problems discussed later, this rule is relaxed to <1. NOTE: i) To comply with the General Rule it is sometimes necessary to combine adjacent classes. This reduces the DF of the test, with subsequent loss of sensitivity. There are occasions, however, where the combining of classes is not logically possible. ii) The main risk is that a significantly high value of x? may occur through applying the test when it cannot give an accurate result. In other words, suspect a large x? value which comes mainly from a class with aa small expected frequency. 9.4 APPLICATIONS OF CHI-SQUARE 9.4.1 RATIOS/PROPORTIONS GIVEN BY HYPOTHESIS. ( Genetics ) EXAMPLE 9.4: Consider the investigation into whether seed colour (yellow or green) is inherited independently of seed shape (round or wrinkled), Attribute : RY__wy RG) _wG Totals ‘Observed frequency (1) 315 -10l—«108—=—«32 556 Hypothetical ratio (Mendelian) 9 3003 1 16 Hypothetical proportion (p,) 6 ee 16 1 Expected frequency (np) 31275 10425 104.25 34.75 $56 (n - op) 225-325 3.75 -2.75 0 Thus xd = {2B + S85 + Be } - 0.4700 (0.90
2] number of samples, the observed table is regarded as one of a large finite number of possible tables (with the same marginal frequencies) which could have arisen by chance on the independence hypothesis. Now, if two attributes are INDEPENDENT, the probabilities corresponding to the various combinations of the two types of attributes in the cells of the table are given by the MULTIPLICATIVE LAW of probabi37 Remember, the Multiplicative Law for Independent Events states that : ~ Iftwo events A & Bare independent, P (event A and B) = P(event A) x P(event B) Consider the following table of frequenci Attributes Type A Type B: 1 2 3 Totals 1 mu oo m0 2 at = i na 95 na Totals ila ie a n Now, a) 92 = Probability that an individual selected at random (from observed n) will have the i attribute of Type A. b) “t= Probability that an individual (so selected) will have the it* attribute of Type B. ©) Thus "22% — probability that an individual so selected will, on hypothesis of independence, have the combination of these two atibutes (@) Expected frequency: cell(ij)= "8 x n=" ; compare this with observed frequency nj ie. EXPECTED FREQUENCY cell (i,j) = ®" you pqama Tos! “Fotal Frequency {9.3} calculated for all combinations of rows and columns totals, i... for (rx e) cells (classes). And = Dee . over all (rx €) classes in the contingency table. NOTE: When expected frequencies are calculated using marginal totals x? has (r—1e—1) df since there are only (r— 1X€ — 1) independent deviations, EXAMPLE 9.5: Independence of Hair and Eye colour (c.f. Mead and Curnow, page 226) Hair Colour: Eye Colour: __Fair Brown Black Red Total Blue 1 807 189 47~——«8IT Grey/Green 9461387 746533132 Brown 11S__438_288 16 857 Total 12829 26321223 116 6800, QUESTION: Is the distribution of hair colour the same for each eye colour and vice versa? —i.c., are the attributes INDEPENDENT?Altematively, do people with blue eyes, for example, tend to have fair hair more often than those with brown eyes in other words is there some ASSOCIATION between attributes?58 NOTE: For any (F x c) Contingency table, INDEPENDENCE means that the proportions of Type A attributes are the same for each attribute of Type B. If this is NOT the case, the attributes are said to be ASSOCIATED. Thus Hy : Attributes A and B are Independent. Ha: Attributes A and B are Associated. METHOD OF COMPUTATION: For each cell (i,j) calculate the expected frequency = 1 x (R,C,) where n is the total number of observations and R, and. ; are the corresponding row and column TOTALS for the (i,j)"* cell. For the above example we thus have : Hair Colour EyeColour Fair Brown Black Red Total © 1768 807 189 47 2811 Blue E 1169 1088 = 506 48 2811 (0-4) 599-281-316 “1 Grey/Green O 946 1387146 3 3132 E 1303 1212 563 3 3132 (0+) 357175 183 0 Brown O 5 438 288 16 857 E347 332) 154 1s 857 (+) 242 106 134 1 Total 2829 26321223 he 6800 NOTE: Forexample: 332" = 425857 = ete and, xs ayay = (HB + +H) ie, x§=1075 (p<0.01) i.e., the result is highly significant. Conclusion: Ho is rejected, and we conclude that the attributes of hair and eye colour are, in some way, associated, 9.4.2.2 SPECIAL CASE — (2x2) CONTINGENCY TABLE. Consider 2 x2 table of frequencies: totals ab atb © od c+d fowls ate bed atb+c+d=TOTAL(N) = dab" beers) thes: uncorrected) = Ge ppye rN AoE) coli} (Ditlerene of erssproducts}"x TOTAL Product of marginal tals NOTE: Probability statements for (2 x 2) tables can be seriously in error, especially for small number of observations and YATES suggested the following “correction for continuity’ [ad—be | ~ } TOTAL}? x TOTAL aderteorrected) = ee (9.4b}59 9.4.2.3 SPECIAL CASE — (2 x ©) or (R x 2) CONTINGENCY TABLE: As in the case of (2 x 2) contingency tables, a “quick! formula is available for the calculation of x? in the case of (Fx 2) and (2 x ¢) contingency tables, viz : — xe = Deere 19.5} with df = (F— 1) (— 1) as before. This use of formula{9.5} will be demonstrated with the following example. It is to be noted that it is usually more convenient to use the column with the smaller numbers: here the column “number with mastitis’ is labelled (11). EXAMPLE 9.6: It is postulated that mastti is tabulated below. genetically transmitted through the sire. The incidence of mastitis in the progeny of 8 bulls number number Sire _ with mastitis (ma) without mastitis (m2) Totals (mio) 1 6 30 36 2 in 40 31 3 8 26 34 4 6 21 2 5 7 2 29 6 4 42 56 a 6 20 26 8 4 19 23 Totals 62 220 (= nea) 282(=n) Ho: Mastitis is independent of sire. Applying formula {9.5} we get : (6736+...+4703). css 312) — 13232 (ns) x (62x OVE Ors Conclusion: H, is accepted, the incidence of mastitis is independent of sire. 9.5 ADDITIVE AND PARTITIVE PROPERTY OF x? 95.1 PARTITIVE PROPERTY ‘Any x2-variate with f DF may be partitioned into f independent x® — variates each with 1 DF. Note: i) This is similar to the complete orthogonal partitioning of Sums of Squares in ANOVA. ii) The partitioning need not be complete, ie., some component x® may have more than 1 DF. 95.2 ADDITIVE PROPERTY ‘The sum of a number of independent x? variates is also a x? — variate with DF equal to the sum of the individual DF. EXAMPLE 9.7. Ina series of tests (e.g. tossing of a coin by 6 different people) the following values of x each with | DF were obtained: x DE e 3.27 1 0.05 — 0.10 934 1 <0.01 6.08 1 0.01 - 0.02 251 I 0.10 —0.20 5.61 1 159 1 TOTAL x? 28.40 6 ‘CONCLUSION Although the individual tests give varying results, TOTAL x? provides convincing evidence against the hypothesis.NOTE: a) DO NOT confuse the addition of the contributions to x? from the various classes in the application of the formula > ( %2% ) 2= x? { c.f Example 9.5} with the Additive Property of x2. pp z b) The Additive Property of x? does not make any stipulation that the x? values must have the same DF. ©) TOTAL x? does NOT take any account of the directions of the deviations from the hypothesis within each independent estimate of x?. d) TOTAL x? does not reflect whether or not any deviations from the hypothesis are consistent over the conditions of the individual (independent) experiments. 9.6 x? ASA TEST OF HETEROGENEITY 2) The additive property of x? does not make any stipulation that the x? values must have the same DF. b) Total x? does take any account ofthe directions of the deviations from the hypothesis. ©) Total x? does not reflect whether or not any deviations from the hypothesis is consistent over the conditions of the individual (independent) experiments. 4) Often an experiment is carried out under a number of similar (or expressly changed) conditions with a view to testing a hypothesis under such conditions. Often an experiment is carried out under a number of similar, or expressly changed conditions with a view to testing a hypothesis under such conditions. Thus if we pool the data from all tests in a single frequency table and apply the ? test we need some assurance of HOMOGENEITY throughout the series before we can accept agreement with the hypothesis. Such a testis provided by a “Test of Homogeneity”.EXAMPLE 9.8 Seed shape segregation (Genetics). . NOTE: It is often more convenient to use the formula : ad = ee where (ky + ky ) is the hypothetical ratio, for the calculation of x? when there are only two classes and the probabilities are given by hypothesis Applying the above method to the data of the Example 9.7 we get: Ho: Segregation of seed shape according to the hypothesis of : Round : Angular :: 3 : 1, is consistent over all progeny, ic the results are HOMOGENEOUS. (ma) (m2) Plant No. Round seed Angular seed Totals(no) x7=%53"" DF 1 45 n 37 0.4737 1 2 27 8 35 0.0857 1 3 24 7 31 0.0968 1 4 19 10 29 1.3908 1 s 32 n 43 0.0078 1 6 26 6 32 0.6667 1 7 88 24 112 0.7619 1 8 2 10 32 0.6667 1 9 28 6 34 0.9804 1 10 25 7 32 0.1667 1 Totals 336(no,)-—*101 437(n) $2972 10 Pooled 34, = 0.8306 (025
1.0. ‘The degrees of freedom for x is6=(8 — 1 — 1), there being a further loss of I df since it was necessary to estimate y. (as X ) from the data itself. bma/2006