Text/Speech/
Video + Annotation
Digital media
Formal vs informal
(Meyer, 2002)
d.Arts 3,48%
Speech
Demographically
Sampled 153 4,211,216 41%
Unclassified 54 761,973 7%
Writing
Type Number of Texts Number of Words % of Written Corpus
Imaginative 625 19,664,309 22%
Natural Science 144 3,752,659 4%
Applied Science 364 7,369,290 8%
Social Science 510 13,290,441 15%
World Affairs 453 16,507,399 18%
Commerce 284 7,118,321 8%
Arts 259 7,523,846 8%
Belief & thought 146 3,053,672 3%
Leisure 374 9,990,080 11%
Unclassified 50 1,740,527 2%
Total 3209 89,740,554 99%
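The percentage column can be checked against the word counts. The sketch below is only an illustration: it assumes each share is the word count divided by the stated total of 89,740,554 and rounded to the nearest whole percent. The names `WORD_COUNTS`, `STATED_TOTAL` and `percent_shares` are made up for this example.

```python
# Word counts per written text type, copied from the table above.
WORD_COUNTS = {
    "Imaginative": 19_664_309,
    "Natural Science": 3_752_659,
    "Applied Science": 7_369_290,
    "Social Science": 13_290_441,
    "World Affairs": 16_507_399,
    "Commerce": 7_118_321,
    "Arts": 7_523_846,
    "Belief & thought": 3_053_672,
    "Leisure": 9_990_080,
    "Unclassified": 1_740_527,
}

# Total number of words as stated in the table's Total row (assumed denominator).
STATED_TOTAL = 89_740_554


def percent_shares(counts, total):
    """Return each category's share of `total`, rounded to a whole percent."""
    return {name: round(100 * words / total) for name, words in counts.items()}


shares = percent_shares(WORD_COUNTS, STATED_TOTAL)
for name, pct in shares.items():
    print(f"{name:<16} {pct:>3}%")
print(f"{'Sum of shares':<16} {sum(shares.values()):>3}%")
```

Under these assumptions the whole-number shares add up to 99 rather than 100, which is consistent with the Total row.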
Speech
Type Number of Texts Number of Words % of Spoken Corpus
Dialogues 180 360,000 59%
Private (direct conversations, distance conversations) 100 200,000 33%
Public (class lessons, broadcast discussions, broadcast interviews, parliamentary debates, legal cross-examinations, business transactions) 80 160,000 26%
Monologues 120 240,000 40%
Unscripted (spontaneous commentaries, speeches, demonstrations, legal presentations) 70 140,000 23%
Scripted (broadcast news, broadcast talks, speeches (not broadcast)) 50 100,000 17%
Total 300 600,000 99%
Draft
S1B-071d
S1B-072d
etc.
Lexical version
S1B-071l
S1B-072l
etc.
S1B-071p1
S1B-072p1
etc.
S1B-071p2
S1B-072p2
etc.
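The file names above appear to combine a text ID (e.g. S1B-071) with a short suffix for the processing stage. As a rough sketch, such names could be split like this. The pattern is inferred only from the examples on the slide, and `parse_file_name` and `STAGE_LABELS` are hypothetical names. Only the "d" and "l" suffixes are labelled on the slide, so "p1" and "p2" are left without a label.

```python
import re

# Pattern inferred from the slide's examples only (an assumption):
# a text ID such as "S1B-071", followed by a stage suffix such as "d", "l", "p1".
FILE_NAME = re.compile(r"^(?P<text_id>[A-Z]\d[A-Z]-\d{3})(?P<stage>[a-z]\d?)$")

# Stage labels given on the slide; other suffixes are deliberately not guessed.
STAGE_LABELS = {"d": "draft", "l": "lexical version"}


def parse_file_name(name):
    """Split a file name into (text ID, stage suffix, stage label or None)."""
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    stage = match.group("stage")
    return match.group("text_id"), stage, STAGE_LABELS.get(stage)


for example in ["S1B-071d", "S1B-071l", "S1B-072p1", "S1B-072p2"]:
    print(example, "->", parse_file_name(example))
```

Keeping the text ID separate from the stage suffix makes it easy to group all versions of one text together.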
Lecture 3
Corpus Design II (Annotation)
Readings: Meyer (2002) Ch 4; Sampson
and McCarthy (2005) Ch 39; Garside
(1997) Chs 4, 5, 16
Inform me and Ayisigi (in writing) of
your chosen corpus tool for the software
review by 17 March. Check with Ayisigi
beforehand that the tool suits the task criteria.