
CORPUS LINGUISTICS

PREPARED BY : ELONA BARDHI



Corpus linguistics is a methodology in linguistics that involves computer-based empirical analyses (both quantitative and qualitative) of actual
patterns of language use by employing electronically available, large collections of naturally occurring spoken and written texts, so-called corpora.
Corpus-based and other types of empirical linguistic research have shown that speakers' intuitions oftentimes provide only limited access to the
open-ended nature of language, which can cause problems when examining unexpected or infrequent linguistic structures, e.g. as regards lexical co-
occurrence patterns, patterns of variation between grammatical constructions, word meaning, or idioms and metaphorical language.
The factors that condition the choice between competing grammatical variants are one topic that features prominently in research and students'
projects at Mainz University. While grammar books lead us to believe that, for example, yet triggers the present perfect, we can observe U.S. election
campaigns featuring the sentence "Did you vote yet?". While standard reference works used by school teachers advise pupils to use the synthetic
comparative -er with monosyllabic adjectives, we observe that native speakers use more apt and more proud rather than apter and prouder in the majority of
cases. And while the 's-genitive is described as being used with persons and the of-genitive with things, linguists who do
research on actual language use find a marked discrepancy between what is taught and what is done. Thus, such variation cannot be
stigmatized as an exception or marked as incorrect. The issue of variation poses an intriguing challenge for English teachers and researchers.
While to some the task of bringing schoolbook knowledge into line with actual language use seems insurmountable, English Linguistics at
Mainz University tries to offer ways out of the dilemma.
METHODS

Corpus linguistics has generated a number of research methods, which attempt to trace a path from data to theory. Wallis and Nelson first introduced what
they called the 3A perspective: Annotation, Abstraction and Analysis.
• Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous
other representations.
• Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically
includes linguist-directed search but may also include, for example, rule-learning for parsers.
• Analysis consists of statistically probing, manipulating and generalizing from the dataset. Analysis might include statistical evaluations, optimization of
rule-bases or knowledge discovery methods.
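The three stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three hand-tagged sentences, the tag names (AUX, VERB, and so on), and the function names `abstract` and `analyse` are hypothetical and are not taken from Wallis and Nelson.

```python
from collections import Counter

# Annotation: a tiny hand-tagged corpus (hypothetical data and tag names).
annotated = [
    [("Did", "AUX"), ("you", "PRON"), ("vote", "VERB"), ("yet", "ADV")],
    [("She", "PRON"), ("has", "AUX"), ("voted", "VERB"), ("already", "ADV")],
    [("They", "PRON"), ("vote", "VERB"), ("early", "ADV")],
]


def abstract(sentences):
    """Abstraction: map each tagged sentence onto a record in a small dataset."""
    records = []
    for sentence in sentences:
        tags = [tag for _, tag in sentence]
        records.append({
            "length": len(sentence),          # number of tokens
            "has_auxiliary": "AUX" in tags,   # feature of interest
        })
    return records


def analyse(records):
    """Analysis: summarise the dataset, here with a simple frequency count."""
    return Counter(record["has_auxiliary"] for record in records)


dataset = abstract(annotated)
summary = analyse(dataset)
print(summary)
```

In real work the annotation would come from a tagger or a published annotated corpus, and the analysis step would usually involve a statistical test rather than a raw count.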
Most lexical corpora today are part-of-speech-tagged (POS-tagged). However, even corpus linguists who work with 'unannotated plain text' inevitably apply
some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search.
The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with
other interests and differing perspectives than the originators' can exploit this work. By sharing data, corpus linguists are able to treat the corpus as a locus of
linguistic debate, rather than as an exhaustive fount of knowledge.
Recent studies have suggested that treatment outcome in adolescents with social anxiety disorder can also be assessed by analyzing language by means of corpus
linguistics.
THE HISTORY OF CORPUS LINGUISTICS

The modern field of corpus linguistics – based around the computer-aided analysis of extremely large
databases of text – is largely a phenomenon of the late 1950s onwards. Its early history was marked by
opposition from, in particular, Noam Chomsky, who favored a rationalist view over the empiricism
associated with corpus-based approaches. However, corpora have been shown to be highly useful in a
range of areas of linguistics (but perhaps most notably lexicography and grammatical description).
Modern corpus linguistics was formed in the context of work on English, though it is now applied to
many different languages; it was in this context that techniques such as corpus annotation, and important
concepts such as collocation, emerged. Alongside this history of corpus linguistics considered as a
methodology stands the history of an alternative approach, sometimes called neo-Firthian, within which
the study of words, phraseology and collocation in corpora is the keystone of linguistic theory.
Broadly, corpus linguistics looks at what patterns are associated with lexical and grammatical features. Searching
corpora provides answers to questions like these:
1. What are the most frequent words and phrases in English?
2. What are the differences between spoken and written English?
3. What tenses do people use most frequently?
4. What prepositions follow particular verbs?
5. How do people use words like can, may and might?
6. Which words are used in more formal situations and which are used in more informal ones?
7. How often do people use idiomatic expressions?
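Two of the questions above (the most frequent words, and which prepositions follow particular verbs) can be approximated even on untagged text. The sketch below is only an illustration: the sample text, the simple regular-expression tokenizer, and the small `VERBS` and `PREPOSITIONS` sets are all assumptions, and a real study would use a large corpus and proper part-of-speech tags.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a corpus.
text = (
    "People depend on corpora. Teachers rely on textbooks, "
    "and learners depend on teachers. Some rely upon intuition."
)

# Very rough tokenization: lowercase runs of letters and apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Question 1: which words are most frequent?
word_freq = Counter(tokens)

# Question 4: which prepositions follow particular verbs?
# The word lists are illustrative, not exhaustive.
VERBS = {"depend", "rely"}
PREPOSITIONS = {"on", "upon"}

following = Counter(
    (first, second)
    for first, second in zip(tokens, tokens[1:])
    if first in VERBS and second in PREPOSITIONS
)

print(word_freq.most_common(3))
print(following.most_common())
```

Counting adjacent word pairs only catches prepositions that immediately follow the verb; patterns with intervening words would need a wider window or a parsed corpus.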
THE CORPUS APPROACH COMPRISES 4 MAJOR CHARACTERISTICS:
1. It is empirical, analyzing the actual patterns of language use in natural texts. The key to this feature of the
Corpus Approach is authentic language, drawn from sources such as textbooks, fiction, nonfiction, magazines,
academic papers, world literature, newspapers, etc.
2. It utilizes a large and principled collection of natural texts as the basis for analysis. This refers to the corpus
itself. You may work with a written corpus, a spoken corpus, an academic spoken corpus, etc.
3. It makes extensive use of computers for analysis. Not only do computers hold corpora, they help analyze
the language in a corpus. A corpus is accessed and analyzed by a concordance program. In short, you
cannot effectively utilize corpora, or employ the corpus approach, without a computer.
4. It depends on both quantitative and qualitative analytical techniques. This characteristic highlights the
importance of our intuition as expert users of a language.
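To give a sense of what a concordance program does, here is a minimal keyword-in-context (KWIC) sketch. It is an illustration only: the function name `concordance`, the `width` parameter, and the sample sentence are assumptions, and dedicated concordancers offer sorting, regular-expression search and much more.

```python
def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample, already split into tokens on whitespace.
sample = "Did you vote yet ? I have not voted yet but I will vote soon".split()

for line in concordance(sample, "vote"):
    print(line)
```

The counts and contexts come from the computer, but deciding which hits are relevant, and what they show, still rests on the analyst's qualitative judgment.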
• https://www.youtube.com/watch?v=YJTM3i5HxsQ
Thank you!
