Consultant Advice

Mike McCarthy

Why Corpora are Important for Language Teaching
Not everyone would necessarily agree that corpora are important in language teaching, but I hope
to convince you in this short article that they are. Basically, there have always been three big
preoccupations in language teaching: the 'what' (the content of our teaching), the 'how' (the
methodology) and the 'who' (teachers, learners, curriculum designers, policy makers). It is my
belief that corpora can assist us in all those areas.
Let us just remind ourselves what a corpus is (plural: corpora). A corpus is a database of texts held
on a computer, along with information (metadata) about those texts, for example, who wrote them
or spoke them, when, where, etc. The texts could be written texts from newspapers or magazines,
websites, student essays, etc. or they could be transcripts of spoken conversations, business
meetings, oral exams, lectures and so on.
By using a well-constructed corpus we can get a unique window into a language, how people use it
in their everyday lives and, in the case of learners, how they develop their knowledge of the target
language and the uses they typically put it to. We can see how hundreds of thousands of different
speakers and writers have used the language. We no longer need to just examine our own intuition
or try to recall things we were taught about the language because we have evidence of how the
language is really employed by its users, whether native users, expert non-native users or learners.
We can find out immediately which words are most common, or which words are rare - words that
we perhaps don't need to concentrate on in our teaching. In Figure 1, we see some relatively
common grammatical structures that are used in everyday spoken language, alongside some rare
ones that are typically only used in formal writing. We can also see how people interact, how they
start and finish conversations, how they interrupt one another and so on. For example, people often
say 'Anyway' when they want to signal that they think a conversation is coming to a close. So we
can learn a lot about how language is used in everyday communication by ordinary human beings.
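To make the idea of a frequency list concrete, here is a minimal sketch in Python of how the most common words in a set of texts might be counted. The three sample utterances are invented for illustration and do not come from any real corpus. The tokenizer (runs of letters and apostrophes) and the function name `word_frequencies` are also assumptions; real corpus tools handle tokenization far more carefully.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: invented utterances standing in for real transcripts.
SAMPLE_TEXTS = [
    "Anyway, I think we should go now.",
    "You know, it was really good, you know.",
    "Anyway, thanks for the call.",
]

def word_frequencies(texts):
    """Count lower-cased word forms across all texts."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

freqs = word_frequencies(SAMPLE_TEXTS)

# Print the five most frequent word forms with their counts.
for word, count in freqs.most_common(5):
    print(word, count)
```

With a corpus of millions of words rather than three sentences, the same kind of count is what lets us say which words deserve priority in teaching and which are rare.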
If we have a feeling that there may be cultural differences between the L1 and the target language,
we can use the evidence of corpora to find out if these differences really exist or if they are perhaps
just deep-seated preconceived notions that have little or no basis in reality. For example, do speakers
of British English say 'please' and 'sorry' more often than speakers of North American English?
Do Japanese speakers remain silent longer before they answer questions when speaking English
than do English native speakers? Well-designed corpora can give us objective evidence to answer
these kinds of questions.
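A question such as the 'please' and 'sorry' one is usually answered by comparing normalised frequencies, because two corpora are rarely the same size. The sketch below, again in Python, works out occurrences per million words for a target word in two samples. Both samples are invented and say nothing about real British or North American usage. The function name `per_million` and the simple tokenizer are assumptions made for the example.

```python
import re

def per_million(texts, target):
    """Occurrences of `target` per million word tokens in `texts`."""
    tokens = []
    for text in texts:
        # Simple tokenizer (an assumption): runs of letters and apostrophes.
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    if not tokens:
        return 0.0
    return tokens.count(target.lower()) / len(tokens) * 1_000_000

# Invented samples for illustration only; NOT real corpus data.
SAMPLE_A = ["Sorry, could you pass the salt, please?", "Oh sorry, after you."]
SAMPLE_B = ["Could you pass the salt?", "Sorry, go ahead."]

# Compare the normalised rate of each word in the two samples.
for word in ("please", "sorry"):
    rate_a = per_million(SAMPLE_A, word)
    rate_b = per_million(SAMPLE_B, word)
    print(word, round(rate_a), round(rate_b))
```

Dividing by the total number of words is what makes the two figures comparable. Samples this small cannot, of course, tell us anything reliable, which is why well-designed, large corpora matter.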
This kind of evidence will be very useful for us as language teachers mainly because we are not
very good at knowing how we use the language; whether we are native speakers or non-native
speakers, our intuition is not good. Most people, when they think about how they use their
language, will look back on what they were taught at school and just give the set of rules that they
were given. A corpus enables us to be objective about the language; it enables us to see exactly
how it is used and to get rid of some of the prejudices that exist. For example, educated native
speakers of English will often deny that they use expressions such as 'you know' and 'or whatever'
or 'and stuff', condemning such usage as street language or lazy teenage language, and yet will
often unconsciously use those very expressions themselves when criticising others. We need a
corpus in order to achieve objectivity about the language. And as well as native-speaker and non-
native expert user corpora, we need corpora that show us how learners speak and write, both in
terms of how their knowledge grows and what difficulties they encounter on the way.
So, most obviously, corpora tell us a lot about the 'what' of language teaching, while classroom
corpora, corpora of learner exams and learner data in general can tell us much about the 'who' and
the contexts in which teachers and students use the language. But when we see how people
interact (especially in everyday conversation), it often also changes our way of thinking about the
'how', our methods and techniques for teaching. For example, if we are interested in grammatical
differences between speaking and writing, it is very difficult to be objective about how we speak
when we are in the middle of a conversation, either as a speaker or as a listener, so developing
language awareness and noticing skills alongside teaching the forms of grammar is the best way to
train learners to become good observers. Incorporating a noticing and/or inductive element into
grammar teaching involves changing how we teach grammar, not just what aspects of grammar we
choose to teach. Such a methodology is incompatible with traditional behaviourist grammar drilling
or simply learning rules by heart and applying them. By learning to notice in this way, our students
will go on being good language learners after they have left the classroom. Another example of how
corpora influence methodology would be the fact that spoken conversational corpora show us that
listeners are active and show their comprehension by the way they respond. This suggests a change
in methodology from separating the speaking and listening skills to combining them in a single
approach, where listening activities are incomplete without appropriate responses. So, not only the
'what' of the conversation class changes, but also the 'how'. For that reason, knowledge and awareness about
corpora should be part of every teacher education programme and any teacher education
programme that ignores the corpus revolution is missing something very important.
Corpora nowadays play an important role in the design of dictionaries, grammars, vocabulary
learning materials and course books, as well as in approaches to assessment and, in a way, many
of the arguments in this article have already been won. What is needed now is greater access to
corpora, not just for academics and experts but for teachers and learners in the wider language
teaching community. If you want to know more about corpus design, types of corpora and how they
are being used, read the chapters in O'Keeffe and McCarthy (2010), and on ways of applying
corpus insights to our understanding and teaching of grammar, vocabulary, speaking skills and so
on, see O'Keeffe et al. (2007). If you want to see how corpora can influence the content and
methodology of language courses, have a look at the Touchstone and Viewpoint adult course
series published by Cambridge University Press. If you feel you are ready to try out a bit of corpus
analysis, good online resources with open access include the American National Corpus,
the British National Corpus and the MICASE corpus of spoken academic English.
You can access a number of learner corpora via the listings

O'Keeffe, A., McCarthy, M. J. and Carter, R. A. (2007) From Corpus to Classroom. Cambridge:
Cambridge University Press.

O'Keeffe, A. and McCarthy, M. J. (eds.) (2010) The Routledge Handbook of Corpus Linguistics.
Abingdon, Oxon. and New York: Routledge.

Figure 1

Frequent
It-clefts: It amazes me how quickly things can arrive when you order them on the internet.
Reported speech with past continuous reporting verb: Alan was saying the village hall nearly caught fire last night.

Infrequent
Negative past subjunctive: The minister insisted that it not be included in the public
Verb + the + most + -ly adverb: Surfaces that cool the most rapidly should be treated first.