
Corpora and Concordancing: Benefits to Classroom Instruction

Krystal Blanton, Kim Bailey, and Boyd Davis

Background

The advent of computer-supported word storage and searching techniques has changed
the way we write, edit, translate, and teach. This discussion is an overview of uses for
today's corpora, which are carefully-selected collections of machine-readable texts in
multiple genres, and concordancing, which is a technique for working with corpora that
has benefits for classroom instruction in language and literature.
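
As a rough illustration of what a concordancer does, the minimal Python sketch below lists each occurrence of a search word with a fixed window of text on either side (a "keyword in context" display). The function name kwic, the width parameter, and the bracket highlighting are our own illustrative assumptions, not features of any concordancing program discussed in this article; the sample lines are from Blake's poetry.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`.

    Hypothetical helper for illustration only; real concordancers also
    sort, count, and link each line back to its source.
    """
    # Collapse whitespace so line breaks in the text do not split a context.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keyword falls in one column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


sample = (
    "The night was dark no father was there. "
    "And his dark secret love. "
    "Dark disputes & artful teazing."
)

for line in kwic(sample, "dark", width=20):
    print(line)
```

Aligning the search word in a single column is what lets a reader scan down the display and compare the contexts on either side; matching whole words only keeps a search for dark from also returning darkness.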

For linguistic purposes, David Crystal states that corpora are "representative sample[s]
of language, compiled for the purpose of linguistic analysis" (410). They are used to
observe patterns of language features such as conversation characteristics, ideologies
through word association, and registers of spoken language versus academic writing
(Conrad 77). Corpora allow lexicographers to read examples of language usage in
different contexts quickly and easily. Modern-day lexicographers emphasize their
importance: "This means not only that dictionaries can be produced and revised more
quickly than before-thus providing more up-to-date information about the language-
but also that definitions can (hopefully) be more complete and precise, since a larger
sample of natural examples is being examined" (McEnery and Wilson 106-07). In
addition to lexicographical purposes, corpora support investigations of idioms and
phrases, themes and motifs, writing errors and discourse topics, cases of disputed
authorship, and comparisons of an author's work across different stages or periods,
which aid in a variety of scholarly examinations.

Developing a corpus is more than the simple act of collecting text. The British linguists
Tony McEnery and Andrew Wilson have divided issues surrounding corpus
development into four categories: sampling and representativeness, finite size,
machine-readable form, and a standard reference (29). Sampling and representativeness
refers to building a corpus from a variety of languages, text genres, or authors in order
to have a sample which presents a broad range of language categories. Finite size means
that a corpus has specific limits: for example, it may include recent texts to be studied
for new words or for changes in the meaning of words. The machine-readable form of a
corpus allows one to search the corpus instantly and easily, using concordancing or
Postscript 70 Blanton, Bailey, & Davis 71

other text-analysis software. A standard reference means that the corpus provides a
standard against which subsequent analyses of the same type can be measured
(McEnery and Wilson 30-31).

One widely known example of a corpus is the British National Corpus (BNC). This
corpus, begun in 1991, contains over 100 million words, and provides web and
electronic access to transcripts and audio. It comprises two parts. The written
section makes up ninety percent of the corpus and pulls from print media and other
texts. The spoken section contains various contexts of demographically diverse
conversations. The Oxford University Press collected the written materials in the form
of newspapers, periodicals, books, letters, essays, and other text, while the Longman
[Publishing] Group UK collected the spoken materials in the forms of informal
conversations, meetings, radio shows, and phone calls. All materials were put into a
standard international format. This corpus represents a wide range of modern British
English usage, from multiple ages, various regions and social classes, as well as diverse
genres of conversation. The target users of this corpus are primarily workers in the
fields of lexicography, artificial intelligence, speech recognition and synthesis, literary
studies, and several areas within linguistics, particularly areas dealing with first and
second-language acquisition and the development of books and materials for teacher
training (British National Corpus).

The American National Corpus (ANC) is currently being developed as a concentrated
corpus of American English, comparable to the BNC. Originated in 2002 and released in
the fall of 2003, the ANC will continue its collection phase through the fall of 2005, at
which time both textual and audio data will be available on search and retrieval software.
It will contain a core corpus of at least 100 million words and will provide textual data
across a variety of genres, including transcriptions of spoken data. Audio speech data,
video, etc. might be added in a later phase. The target users of this corpus are language
and linguistic researchers, as well as educators (American National Corpus).

Recently, the Charlotte Narrative and Conversation Collection (CNCC) has been invited
to be a part of the ANC. The CNCC was formally established in 1995 when Professor
Boyd Davis led students to record oral narratives and conversations with native and
non-native speakers of English. The CNCC audio files and transcripts are housed in
Special Collections at the University of North Carolina as part of the New South Voices
digital sound project, which is located in Charlotte's J. Murrey Atkins Library. The
CNCC presents local North Carolinians' narratives, offering monologues, conversations
and interviews that represent different varieties of the demographic and cultural
characteristics in the state and especially in the Charlotte-Metrolina region, a set of
eight counties ringing the urban center of Charlotte. Participants represent various age
groups, ethnicities and languages. The CNCC corpus contains over 600 spoken
interviews and conversations, and provides web access and delivery of transcripts and
audio/video: access to the full collection is anticipated for winter 2004. An interesting
component of the CNCC is a collection that focuses on changes over time in the speech
of aging persons who have cognitive deficits, such as Alzheimer's disease. Since 1998,
Davis and Drs. Linda Moore (Nursing), Dena Shenk (Gerontology) and Ruth Greene
(Psychology, Johnson C. Smith University) have recorded conversations with persons
with Alzheimer's disease. Faculty in Gerontology, Applied Linguistics and Speech
Communications from Canada, Germany and New Zealand are currently coding or
soliciting conversations for this component of the CNCC (New South Voices).

Because corpora can be quite large, concordancing programs, which allow the user to
search and sort data easily, are often employed for research purposes. Teachers,
linguists, lexicographers, and researchers use corpora to examine the state of different
varieties of the English language, such as comparing past and current usage, or to
objectively comment on language usage for research purposes (Crystal 410).

Educators could think of a corpus as a webpage on the Internet. If a teacher wants to
research snakes for her science class, she can enter the word 'snakes' into a search
engine, such as Google, which will, in turn, return results on websites that contain
information about snakes. The search engine acts in the same way a concordancer does,
and the return page acts similarly to a concordance, in that the search engine highlights
at least one occurrence of the word 'snakes' on a number of pages. Here are the first
seven from about 2,300,000 hits from Google's search for 'snakes' on May 12, 2004:

Snakes of North America
Snakes of North America. CLASS REPTILIA ORDER SQUAMATA. SUBORDER SERPENTES.
FAMILY LEPTOTYPHLOPIDAE (slender blind snakes). ...
www.pitt.edu/~mcs2/herp/SoNA.html - 24k - Cached - Similar pages

Snakes and Reptiles - The Ultimate Herpetological Index!
Snakes and Reptiles, the ultimate herpetological index on the Web!
Created by Jason Shadel... Jason's Snakes and Reptiles. Enter if you dare! ...
www.snakesandreptiles.com/ - 6k - Cached - Similar pages

Natural Resources & Environmental Conservation
...Snakes of Massachusetts. How To Use This Guide. ...Information on
Snakes. If you want general information on snakes go to Information on Snakes....
www.umass.edu/umext/nrec/snake_pit/ - 8k - May 10, 2004 - Cached - Similar pages

Snakes Reptiles Rattlesnakes Photos and Information that
... snake.jpg (32882 bytes). Rubber Snakes Huge 2 pound rubber snakes for
sale. ... Take this test and really find out! Garter Snakes Get your own
Garter snake today! ...
www.everwonder.com/david/snakes/ - 21k - May 10, 2004 - Cached - Similar pages

Florida Venomous Snakes 1
An online fieldguide to the venomous snakes of Florida is presented. ...
Florida Museum of Natural History's. Guide to Florida's Venomous Snakes. ...
www.flmnh.ufl.edu/natsci/herpetology/fl-guide/venomsnk.htm - 31k - May 10, 2004 - Cached - Similar pages

Venomous Snakes
Venomous Snakes. Venomous means poisonous! ...Snakes are only 1 inch
tall, and they're scared of you! Most snakes are not venomous, but a few are....
pelotes.jea.com/vensnake.htm - 7k - Cached - Similar pages

BBC - Number Time - Snakes and ladders
... Web Links Schools Help Copyright. About the BBC Contact Us Help
Like this page? Send it to a friend! Snakes and ladders. Back to 'Play a
game' menu. ...
www.bbc.co.uk/schools/numbertime/games/snakes.shtml - 15k - Cached - Similar pages

Hot Snakes
www.hotsnakes.com/ - 2k - Cached - Similar pages

In the same way, University of Nagoya Professor Mitsuharu Matsuoka's online
concordancer displays the results of queries, here, of Dickens' Bleak House. These are the
first six occurrences (by line number) of the 67 instances of the word cold; the full list
could let readers look at how Dickens reinforces connections between physical and
emotional states:

[illegible]: pulpit breaks out into a cold sweat; and there is a general smell and
[illegible]: her, said slowly in a cold low voice I see her knitted brow and
1011: then. When she gave me one cold parting kiss upon my forehead, like a
[illegible]: that Miss Donny thought the cold had been too severe for me and lent me her
[illegible]: flame, and looking raw and cold - that I read the words in the newspaper
1828: The evening was so very cold and the rooms had such a marshy smell

What corpora and concordancing can do for ELL students

Teachers who have learned to use corpora and concordancing programs have found
them useful in teaching grammar and vocabulary to both English language learners
(ELLs) and native English speakers; the teachers remark that these tools are helpful in
developing and expanding students' writing and vocabulary skills.

Project MORE, an OELA-funded teacher-training initiative sponsored by the U.S.
Department of Education, is an expansion of the CNCC. Project MORE, which supports
collecting, transcribing, and editing narratives in English and other languages and
encoding their metadata, draws on the narratives in the CNCC to create classroom
materials to aid teachers in their work with ELLs. The goals of the project are to
provide a resource for those engaged in teacher education, support
professional development for K-12 content-area and ESL teachers, and create classroom
materials for K-12 content-area and ESL classes. Target languages for inclusion in the
corpus are based on the predominant native languages spoken by the student
population in Charlotte-Mecklenburg Schools (CMS). These languages are multiple
varieties of Spanish, Arabic, and Chinese, as well as Japanese, Vietnamese, Hmong,
Russian, Korean, Gujarati, and German. The project targets practicing content-area
teachers in CMS, prospective content-area teachers enrolled at UNC Charlotte and
seeking NC licensure, and practicing English as a Second Language (ESL) teachers who
coach their content-area colleagues to implement these narratives for use as models in
their classrooms. Both corpora and concordancing are used to create supplemental ESL
teaching materials keyed to both the NC Standards and textbooks in use in CMS, to
train current and future ESL and content-area teachers on how to use computer-based
technologies and implement these in their classes, and to develop the cultural
competence of current and future public school teachers.

Teachers can search the Michigan Corpus of Academic Spoken English (MICASE) for
words or phrases in specified contexts, returning concordance results with references to
files, full utterances, and speakers. They do this by going to
http://www.hti.umich.edu/m/micase and running concordances on several words to see
the different meanings that they have in different academic disciplines. Teachers can
then teach vocabulary through context using concordances and propose other
classroom applications based on the concordancing activities. One such activity was
developed for Project MORE by Barbara Boal, a middle school teacher. To help ELL
students master vocabulary, Ms. Boal wrote a fictitious interview between herself and
one of the characters in a story from the CNCC, recorded the interview for a listening
activity, and created a Cloze Activity for students to use as they listened to the
interview a second or third time. The following is a portion of the Cloze Activity she
gave to her students:

MB  Since having Christmas trees was a new tradition and custom for
    American (15) ____, did many people (16) ____ the trees?
HS  Oh my yes! The people loved them! Captain Herman had many old
    friends and always, there were new (17) ____ to buy the trees all year.
    Oh, they were the best in all of Chicago. People were so happy to have
    them. The lights and (18) ____ were so beautiful.
    ...
MB  What happened then?
HS  No one knows for sure. The captain, his boat, and the crew were never
    seen again. Days later, two (23) ____ found some of the trees.

The table below summarizes the students' responses for Items 15-17 and 23:

Responses to the Cloze Activity from ELL Students (High-Beginning Class)
[Table only partly legible in the source; entries cut off in the source end in "..."]
buy, -, buy, Buy, buy, buy, buy, buy, buy
custom..., -, customer..., Costu..., costumes, costumer..., custom..., custo..., custom...
fishma..., fisher..., Fishma..., fishme..., fish..., fishme...

Based on an analysis of student responses, the following language skills and content
were given priority in this class:
listening to longer and longer stretches of material
understanding how English handles plurals (count and non-count nouns),
creates compounds and changes parts of speech:
fish-fishing-fisherman (instead of "fishman"),
Fish or Fisher (last name).

Examples from concordances can illustrate this. We ran concordances using the
British National Corpus for the words above, selected relevant examples and modified a
few of the examples to make them more comprehensible to ELLs.

Eating vegetables and fish is very healthy
The menu offers grilled fresh fish and seafood, steak, and ribs
Did you read the new book by Professor Fish? I have not.
Are you fishing for your supper? Good luck!
Forrest Gump bought a fishing boat to catch shrimp
Last year, those fishermen caught a lot of fish!
For National Public Radio, I'm Sophie Fisher in Geneva.

Corpora and concordancing in the literature and composition class

A lesson we developed for university faculty teaching introductory literature and
composition courses is "Computer-assisted searches for context and meaning: a
concordancing tutorial." Sections of this activity have been adapted for introductory
courses in literary analysis and composition, and for an advanced high school English
course. The following lesson was initially developed for use with students enrolled in an
English MA class at UNC Charlotte:

We're used to how a browser uses key words to search for web pages, to locate a
page or site with the information we want. A concordancer is a tool that is used
in a similar way, to search for words in a text or a collection of texts. The
concordance that is produced will let us review how the word "changes
meaning" in different contexts.

1. Introduction: an overview of how concordances display word-in-context:
Click here (http://sara.natcorp.ox.ac.uk/lookup.html) to look at the word
'ferocious':
...smack down counters on backgammon boards with ferocious violence.
...campaign had turned into a more adult and less ferocious criticism.
...he was in a ferocious temper
...put up a ferocious but unsuccessful fight...
...keeping ferocious dogs is not exactly new
...the ferocious teeth of a predatory
Although we may first think of ferocious as applying only to animals (and the
acts committed by their teeth), we see an extension of that meaning to the teeth,
to criticism, to temper, and to small violent actions in a game.

2. Using Blake's poetry: Go to http://www.dundee.ac.uk/english/wics/wics.htm,
select Blake, locate the concordance, click on D, and choose the word dark
DARK 5
And so Tom awoke and we rose in the dark (The Chimney Sweep, Innocence)
The night was dark no father was there (The Little Boy Lost, Innocence)
Dark, benighted travel-worn (A Dream, Innocence)
And his dark secret love (The Sick Rose, Experience)
Dark disputes & artful teazing. (The Voice of the Ancient Bard, Experience)
What do you notice, and what are the implications for teaching this text or this author?

3. For an advanced discussion of Dickens' Bleak House: enter the word 'dark' at
http://humwww.ucsc.edu/dickens/searchworks/searchworksindex.html
Chapter 4
...to tell us that as the road to Bleak House would have been very long, dark, and
tedious on such an evening, and as we had been travelling already, Mr....
...more children on the way up, whom it was difficult to avoid treading on in the
dark; and as we came into Mrs. Jellyby's presence, one of the poor little...
Chapter 11
...A touch on the lawyer's wrinkled hand as he stands in the dark room,
irresolute, makes him start and say, "What's that?" "It's me," returns the old
man...
...say," observes a dark young man on the other side of the bed. "Air you in the
maydickle prayfession yourself, sir?" inquires the first. The dark young...
...The dark young surgeon passes the candle across and across the face and
carefully examines the law-writer, who has established his pretensions to his...
Chapter 48
...the Dedlocks of the present rattle in their fire-eyed carriages through the
darkness of the night, and the Dedlock Mercuries, with ashes (or...
...The pretty face is checked in its flush of pleasure by the dark expression on the
handsome face before it. It looks timidly for an explanation. "And if...
...that motherly touch of the famous ironmaster night, lays her hand upon her
dark hair and gently keeps it there. "I told you, Rosa, that I wished you to..."
What do you notice, and what are the implications for teaching this text or this author?

Corpora and concordancing developed for art class

Project MORE sponsors mini-grants for Arts and Science faculty at UNC Charlotte. To
be eligible, faculty must have 50% or more teacher licensure candidates in their courses.
They must also revise course curricula and assignments to include materials from the
CNCC and support the goals of Project MORE. One example of the work done by
mini-grant recipients is Dr. Susannah Brown's use of narrative in her Art
Education class. Brown's students connect their own personal narratives and those of
their classmates, community members, and professional artists to the CNCC by running
a concordance using themes. They do this to understand the role of narrative in
different aspects of art. This assignment makes prospective teachers more aware of their
future students' needs, particularly those of non-native English speaking students. It
also helps prospective teachers to understand what a personal narrative is and how to
use personal narratives in art education, especially in lesson planning and production of
artwork.

Conclusion

Corpora and concordancing have multiple uses in today's classroom. Whether a learner
is a native speaker or a nonnative speaker of English, whether the classroom is
elementary, middle school, high school, or college level, the educator can use these tools
as an aid in teaching students in a new and innovative way.

Works Cited

American National Corpus. 27 Oct. 2003. Department of Computer Science, Vassar
College. 31 Oct. 2003 <http://americannationalcorpus.org/>.
British National Corpus. 12 July 2002. Oxford University Computing Services. 31
Oct. 2003 <http://www.natcorp.ox.ac.uk/index.html>.
Charlotte Narrative and Conversation Collection. J. Murrey Atkins Library &
Information Services, UNC Charlotte. 31 Oct. 2003
<http://newsouthvoices.uncc.edu/cncc.htm>.
Conrad, Susan. "Corpus Linguistic Approaches for Discourse Analysis."
Annual Review of Applied Linguistics 22 (2002): 75-95.
Crystal, David. The Cambridge Encyclopedia of Language. New York: Cambridge
University Press, 1987.
McEnery, Tony and Andrew Wilson. Corpus Linguistics: An Introduction. Edinburgh:
Edinburgh University Press, 2001.
New South Voices. 30 Oct. 2003. J. Murrey Atkins Library, Special Collections, UNC
Charlotte. 30 Oct. 2003 <http://newsouthvoices.uncc.edu/cncc.jsp>.
Project MORE. 30 Oct. 2003. Department of English, UNC Charlotte. 31 Oct. 2003
<http://education.uncc.edu/more/Default.htm>.

The University of North Carolina at Charlotte
