
From the COLT corpus, the frequencies of 11 words were investigated for both boys and girls. Our initial expectation was that the word frequencies would not differ between boys and girls. The raw frequencies of each word, for both boys and girls, are shown in Table 1.
Word           Girls   Boys
relationship       6      4
university         7     17
exam              12     22
teacher           40     19
television        13     16
food              32     36
shower             7      8
radio             12     13
book              68     70
magazine           2      9
hell              42     65

The problem with the raw frequencies is that they do not reflect the true proportions of the two categories, since the number of words in the COLT corpus labelled as girls' speech is not the same as the number labelled as boys' speech: 50.11% of the corpus words belong to boys and 46.44% to girls, and the remaining 3.45% are unlabelled. We therefore normalize the figures by multiplying each of the boys' word frequencies by 50/50.11 and each of the girls' word frequencies by 50/46.44.
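The normalization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts and percentages are copied from the text, while the names `normalize`, `TARGET_SHARE` and the list variables are hypothetical and not part of the original analysis. The printed values are rounded to two decimals, so they may differ from a truncated table in the last digit.

```python
# Raw counts copied from Table 1, in the order the words are listed.
WORDS = ["relationship", "university", "exam", "teacher", "television",
         "food", "shower", "radio", "book", "magazine", "hell"]
GIRLS_RAW = [6, 7, 12, 40, 13, 32, 7, 12, 68, 2, 42]
BOYS_RAW = [4, 17, 22, 19, 16, 36, 8, 13, 70, 9, 65]

# Share of corpus words attributed to each group, in percent (from the text).
GIRLS_SHARE = 46.44
BOYS_SHARE = 50.11
TARGET_SHARE = 50.0  # hypothetical name: the equal share both groups are scaled to


def normalize(raw_counts, group_share, target_share=TARGET_SHARE):
    """Scale counts as if the group made up target_share percent of the corpus."""
    factor = target_share / group_share
    return [count * factor for count in raw_counts]


girls_norm = normalize(GIRLS_RAW, GIRLS_SHARE)
boys_norm = normalize(BOYS_RAW, BOYS_SHARE)

# Print one row per word: word, girls (normalized), boys (normalized).
for word, g, b in zip(WORDS, girls_norm, boys_norm):
    print(f"{word:<13}{g:>8.2f}{b:>8.2f}")
```

Because both groups are scaled to the same 50% share, the resulting figures can be compared directly, which the raw counts do not allow.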

Table no. 1

The table with the normalized frequencies has the following values:
Word           Girls    Boys
relationship    6.45    3.99
university      7.53   16.96
exam           12.91   21.95
teacher        43.06   18.95
television     13.99   15.96
food           34.45   35.92
shower          7.53    7.98
radio          12.91   12.97
book           73.2    69.84
magazine        2.15    8.98
hell           45.21   64.87

Table no. 2 (Normalized frequencies)

To see the differences in frequencies, we draw the chart in Figure no. 1, which shows the corresponding word frequencies for each category, boys and girls.


[Bar chart: word frequencies (0 to 80) for each word, with one series for Boys and one for Girls]

Figure no.1

Frequencies of words spoken by girls

[Histogram: "frequencies spoken by girls"; x-axis freqdata$GIRLS (0 to 80), y-axis Frequency (0 to 6)]

[Bar chart: word frequencies (0 to 70) for each word, Girls only]

Figure no.1
