
LESSON 1. WHAT IS A CORPUS?

Introduction to Corpus Linguistics

Technology has become extremely valuable in research. It is indispensable for collecting and
recording large amounts of data; hence, storing data digitally has become an important method in
qualitative language research. In a webinar, Dita (2020) argues that in a corpus-inspired pedagogy,
we can work on independent projects and on topics that are relevant to us.

A corpus is defined as a collection of texts used for linguistic analysis, usually stored in an electronic
database so that the data can be easily accessed by computer (Butler, 1997; Sinclair, 2004).
To explain how to build a corpus, Meyer (2002) defines it as a collection of texts or parts of texts
upon which some general linguistic analysis can be conducted. Corpora usually consist of thousands,
millions, or even billions of words of naturally occurring spoken and written language, produced by
linguists or by native speakers.
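To make the idea of "a collection of texts accessed by computer" concrete, here is a minimal Python sketch. The two texts, their ids, and the "mode" label are invented for illustration, and the simple regular-expression tokenizer is an assumption, not a standard used by any real corpus. It shows how, once texts are stored electronically, a program can count the words in the whole collection and how often each word form occurs.

```python
import re
from collections import Counter

# A toy corpus: a hypothetical collection of short texts, each tagged as
# spoken or written. Real corpora hold thousands to billions of words.
corpus = [
    {"id": "spoken_01", "mode": "spoken", "text": "Well, I think the class starts at nine."},
    {"id": "written_01", "mode": "written", "text": "The class starts at nine o'clock sharp."},
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_count(corpus):
    """Total number of word tokens across all texts in the corpus."""
    return sum(len(tokenize(doc["text"])) for doc in corpus)

def frequencies(corpus):
    """Frequency of each word form across the whole corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc["text"]))
    return counts

if __name__ == "__main__":
    print(word_count(corpus))
    print(frequencies(corpus).most_common(3))
```

The same functions would work unchanged on a much larger list of texts, which is the practical advantage of storing a corpus digitally.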
A corpus addresses the need in descriptive linguistics for natural language data. For example, to study
phenomena in English, teachers and scholars may use the following corpora (the plural form of
corpus):
a. A Representative Corpus of Historical English Registers (ARCHER) is a collection of
written texts in Early Modern and Late Modern British and American English.
b. The British National Corpus (BNC)
c. The Brown University Corpus (Brown) of American English
d. The Lancaster-Oslo/Bergen Corpus (LOB) of British English
e. The Freiburg-LOB Corpus of British English (FLOB)
f. The Freiburg-Brown Corpus of American English (Frown)
g. The Survey of English Usage (British English)
h. News on the Web (NOW)

LESSON 2. Two major types of corpora (Meyer, 2002)

1. Multipurpose and special-purpose corpora in relation to genre

-Multipurpose corpora are intended for a variety of different purposes; hence, they contain a broad range of
genres.
-Special-purpose corpora are restricted in scope and directed toward a specific purpose. For instance,
the Michigan Corpus of Academic Spoken English (MICASE) was created to analyze the types of speech
used by individuals conversing in an academic setting.

2. Synchronic and diachronic corpora in relation to time frame

-Synchronic corpora contain samples of present-day spoken and written language produced within a relatively
narrow time frame.
-Diachronic corpora contain samples of texts used to study historical periods of a language. The time frame
for the texts is somewhat easier to determine, given that the historical periods of a language are fairly well defined.

Most present-day corpora are composed of texts collected according to specific sampling principles, such as
coverage of different genres, registers, or styles of English. These principles rely on language-external rather
than language-internal criteria. Texts are not selected for a corpus because they contain a high number
of relative clauses, but because they are instances of a predefined text type (e.g. broadcast English in a
hypothetical corpus of Australian English).
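The sampling principle of selecting by text type rather than by linguistic content can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names variety and text_type, the helper select_texts, and the candidate texts are all hypothetical and do not come from any real corpus project. The filter looks only at each text's external metadata and never inspects its grammar.

```python
from dataclasses import dataclass

@dataclass
class CandidateText:
    """A candidate text plus its language-external metadata (hypothetical fields)."""
    title: str
    variety: str    # e.g. "Australian English"
    text_type: str  # e.g. "broadcast", "fiction"
    content: str

def select_texts(candidates, variety, text_type):
    """Keep texts whose metadata matches the predefined variety and text type.

    Only external criteria are checked; the content itself (for example,
    how many relative clauses it contains) plays no part in the decision.
    """
    return [t for t in candidates if t.variety == variety and t.text_type == text_type]

# Hypothetical candidates for an imagined corpus of Australian English.
candidates = [
    CandidateText("Evening news", "Australian English", "broadcast", "Good evening, here is the news."),
    CandidateText("Short story", "Australian English", "fiction", "The man lived by the sea."),
    CandidateText("Radio talk show", "Australian English", "broadcast", "Welcome back to the show."),
    CandidateText("BBC bulletin", "British English", "broadcast", "This is the news from London."),
]

broadcast_sample = select_texts(candidates, "Australian English", "broadcast")
print([t.title for t in broadcast_sample])
```

Note that the filter would accept a broadcast text containing no relative clauses at all, which is exactly what sampling by language-external criteria means.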
