Table No. 1: Boys Girls
expectation was that the word frequencies would not differ between boys and girls. The raw frequencies of each word, for both boys and girls, are shown in Table 1.
Word            Girls   Boys
relationship        6      4
university          7     17
exam               12     22
teacher            40     19
television         13     16
food               32     36
shower              7      8
radio              12     13
book               68     70
magazine            2      9
hell               42     65
The problem with the raw frequencies is that they do not reflect the real proportions of the categories, because the number of words in the COLT corpus labeled as girls is not the same as the number labeled as boys: 50.11% of the corpus words belong to boys and 46.44% to girls, and the remaining 3.45% are unlabeled. We therefore normalize the figures by multiplying each of the boys' word frequencies by 50/50.11 and each of the girls' word frequencies by 50/46.44.
Table no. 1
The table with the normalized frequencies has the following values:
Word            Girls    Boys
relationship     6.45    3.99
university       7.53   16.96
exam            12.91   21.95
teacher         43.06   18.95
television      13.99   15.96
food            34.45   35.92
shower           7.53    7.98
radio           12.91   12.97
book            73.2    69.84
magazine         2.15    8.98
hell            45.21   64.87
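The normalization described above can be sketched in a few lines of Python. This is an illustrative sketch, not necessarily the procedure used to produce the table: the names `WORDS`, `RAW`, `SHARE`, `truncate` and `normalize` are hypothetical, and truncating to two decimals is an assumption about how the tabulated values were obtained, so the last digit may differ from the table.

```python
import math

# Raw counts copied from the raw-frequency table (same word order in each list).
WORDS = ["relationship", "university", "exam", "teacher", "television",
         "food", "shower", "radio", "book", "magazine", "hell"]
RAW = {
    "girls": [6, 7, 12, 40, 13, 32, 7, 12, 68, 2, 42],
    "boys": [4, 17, 22, 19, 16, 36, 8, 13, 70, 9, 65],
}
# Percentage of corpus words belonging to each group, as stated in the text.
SHARE = {"girls": 46.44, "boys": 50.11}
TARGET = 50.0  # each group is rescaled as if it held 50% of the corpus


def truncate(value, places=2):
    """Cut a non-negative value to `places` decimals (assumed rounding rule)."""
    scale = 10 ** places
    return math.floor(value * scale) / scale


def normalize(counts, share, target=TARGET):
    """Multiply every raw count by target/share, e.g. 50/46.44 for girls."""
    factor = target / share
    return [truncate(count * factor) for count in counts]


normalized = {
    group: dict(zip(WORDS, normalize(RAW[group], SHARE[group])))
    for group in RAW
}

# Print one row per word: word, girls, boys.
for word in WORDS:
    girls = normalized["girls"][word]
    boys = normalized["boys"][word]
    print(f"{word:<13}{girls:>8.2f}{boys:>8.2f}")
```

Scaling both groups to a common 50% share keeps the adjusted counts close to the raw ones while making the two columns directly comparable.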
To see the differences in frequencies, we draw the chart in Figure no. 1, which shows the corresponding word frequencies for each category, boys and girls.
[Figure: chart of the word frequencies for boys and girls; axis labeled freqdata$GIRLS, with ticks at 20, 40, 60 and 80]
Figure no. 1
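A comparison of this kind can also be approximated without plotting software. The sketch below is a plain-text bar chart in Python, which is a substitute technique and not a reproduction of the original figure. It uses a handful of rows copied from the normalized table for brevity; the names `ROWS`, `bar` and `text_chart`, and the scale of one `#` per two units, are hypothetical choices.

```python
# A few rows copied from the normalized table: (word, girls, boys).
ROWS = [
    ("university", 7.53, 16.96),
    ("teacher", 43.06, 18.95),
    ("book", 73.2, 69.84),
    ("magazine", 2.15, 8.98),
    ("hell", 45.21, 64.87),
]


def bar(value, per_char=2.0):
    """Return a bar of '#' characters, one per `per_char` units (rounded)."""
    return "#" * round(value / per_char)


def text_chart(rows):
    """Build two lines per word: one bar for girls, one for boys."""
    lines = []
    for word, girls, boys in rows:
        lines.append(f"{word:<11} girls {bar(girls):<40} {girls:.2f}")
        lines.append(f"{'':<11} boys  {bar(boys):<40} {boys:.2f}")
    return "\n".join(lines)


print(text_chart(ROWS))
```

Placing the two bars for each word directly under one another makes the larger gaps, such as those for teacher and hell, easy to spot.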