
Data-Driven South Asian Language Learning

SALRC Pedagogy Workshop


June 8, 2005

J. Scott Payne
Penn State University
jspayne@psu.edu
Corpus-Based Approaches to L2
Instruction

 Traditional Paradigm:
present > practice > produce
 Data-Driven Learning:
observe > hypothesize > experiment
What is a corpus?
 "A corpus is a body of text assembled according
to explicit design criteria for a specific purpose"
(Atkins & Clear, 1992: p.5).
 John Sinclair (1991) on the development of the
field of Corpus Linguistics:
"Thirty years ago when this research started it was
considered impossible to process texts of several million
words in length. Twenty years ago it was considered
marginally possible but lunatic. Ten years ago it was
considered quite possible, but still lunatic. Today it is
very popular."
Comparing Genre

 Different genres of corpora:


 Journalistic
 Conversational (formal, informal, etc.)
 Literary
 Scientific
 Academic
 Second language learner
Analyzing Corpora

 Qualitative techniques:
 Concordances or keyword-in-context

 Quantitative techniques:
 Frequency analysis of individual words and
collocations
 Lexical diversity
 Lexical density
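The techniques listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular concordancer: the toy corpus, the function-word list, and all function names are assumptions made for the example. Tokens are split on whitespace rather than with a word-character pattern, because such patterns can break words apart in scripts that use combining vowel signs (e.g. Devanagari).

```python
from collections import Counter
import string

# Hypothetical toy corpus; a real corpus would be read from text files.
CORPUS = "The cat sat on the mat . The dog sat on the log ."

# Illustrative function-word list (an assumption; each language needs its own).
FUNCTION_WORDS = {"the", "on", "a", "of", "in"}


def tokenize(text):
    """Split on whitespace, lowercase, and strip surrounding punctuation."""
    strip_chars = string.punctuation + "\u0964"  # also strip the Devanagari danda
    tokens = (tok.strip(strip_chars).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def kwic(tokens, keyword, width=2):
    """Keyword-in-context: each hit with `width` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def bigram_counts(tokens):
    """Count adjacent word pairs, a crude first look at collocations."""
    return Counter(zip(tokens, tokens[1:]))


def lexical_diversity(tokens):
    """Type-token ratio: distinct word forms divided by running words."""
    return len(set(tokens)) / len(tokens)


def lexical_density(tokens, function_words):
    """Share of tokens that are not in the function-word list."""
    content = [tok for tok in tokens if tok not in function_words]
    return len(content) / len(tokens)


tokens = tokenize(CORPUS)
word_counts = Counter(tokens)
concordance = kwic(tokens, "sat")
pairs = bigram_counts(tokens)
diversity = lexical_diversity(tokens)
density = lexical_density(tokens, FUNCTION_WORDS)
```

Printing `concordance` line by line gives the familiar concordance view of every occurrence of "sat". Note that the type-token ratio depends heavily on text length, so it is only comparable across samples of similar size.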
What is Data-driven Learning
(DDL)?
 Application of tools (concordancers) and
techniques from corpus linguistics in the service
of language learning.
 Concordances as a tool for developing
instructional exercises.
 Places “raw” linguistic primary source material in
the hands of learners - learners as “researchers”.
 Learners have the opportunity to discover
language rules by themselves.
DDL Examples

 Tim Johns's DDL website


Research on DDL
 Vocabulary acquisition
 Vocabulary acquisition improved through the use of
concordances (Stevens, 1991; Cobb, 1997).
 Horst and Cobb (2001) found that, of four tools supplied to
students for learning vocabulary (traditional bilingual and
monolingual dictionaries, online dictionaries, and concordances),
use of concordances was second only to use of monolingual
dictionaries as an indicator of learning gains.

 Writing instruction
 Cobb (2004) used Lextutor to help students correct their own
writing errors. Only 8% of students indicated that the
concordance had helped them.
Corpus Tools for Data-Driven
Learning

 KWICionary - http://conic.la.psu.edu/kwic/
 OCAT - http://conic.la.psu.edu/ocat/
Activity

 Construct a corpus in your language by selecting
texts from the Web.
 Explore the corpus using the concordance query.
 Generate at least one activity that you could use
in the classroom this summer.
Learner Corpora

 Criteria for Learner Corpora:


 Continuous stretches of discourse, not isolated
sentences or words, containing both erroneous and
correct uses of the language.
 Resulting from authentic activity – classroom or
naturalistic interactions.
 Explicit design criteria – meta-data about the
learners’ background, setting, level of proficiency,
etc.
Learner Corpus Typology

 Monolingual <> Bilingual


 General <> Technical
 Synchronic <> Diachronic
 Written <> Spoken
Analyzing Learner Corpora

 Contrastive Interlanguage Analysis:


 Compares NS and NNS language
 Can highlight features of non-nativeness in learner
writing and speech (e.g., under- and
overrepresentation of words, phrases, and structures)
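One rough way to surface under- and overrepresentation is to compare normalized word frequencies in a native-speaker (NS) sample and a learner (NNS) sample. The sketch below is an assumption-laden simplification: the toy samples, the function names, and the two-to-one threshold are all illustrative, and published contrastive studies usually rely on a significance test rather than a fixed ratio.

```python
from collections import Counter


def relative_freq(tokens, per=1000):
    """Frequency of each word per `per` running words."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}


def representation(ns_tokens, nns_tokens, threshold=2.0):
    """Flag words whose learner rate differs from the NS rate by `threshold`x."""
    ns = relative_freq(ns_tokens)
    nns = relative_freq(nns_tokens)
    report = {}
    for word in set(ns) | set(nns):
        ns_rate = ns.get(word, 0.0)
        nns_rate = nns.get(word, 0.0)
        if nns_rate > 0 and nns_rate >= threshold * ns_rate:
            report[word] = "over"   # learners use it far more than NS do
        elif ns_rate >= threshold * nns_rate:
            report[word] = "under"  # learners use it far less than NS do
    return report


# Hypothetical toy samples standing in for NS and learner corpora.
ns_sample = "very good really good quite good".split()
nns_sample = "very good very good very nice".split()
report = representation(ns_sample, nns_sample)
```

Here "very" is flagged as overrepresented and "really" as underrepresented in the learner sample, while "good" is used at a similar enough rate in both to go unflagged.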
Corpus-Based Assessment of
Language Development
 Goal: to construct an evolving performance-based
linguistic profile for individual learners.
 Elements of a learner linguistic profile:
 A detailed diagnostic analysis of linguistic features: lexical
inventory, morphology, syntax, and a variety of discourse
properties, including coherence and cohesion devices.
 It could encompass a comparison between learner performance
and native speaker usage, and a comparison of learner
performance with that of other learners.
 It would allow for a genuinely longitudinal approach to
language assessment, because it would enable teachers and
researchers to track individual learner development over time
for any relevant linguistic feature.
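A longitudinal profile of this kind could be approximated by storing dated samples for each learner and computing a chosen measure for each one in chronological order. The sketch below is hypothetical throughout: the `Sample` record, the `profile` function, and the use of type-token ratio as the tracked feature are assumptions for illustration, and any other linguistic measure could be passed in instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sample:
    """One dated piece of learner production (hypothetical record layout)."""
    learner_id: str
    collected: date
    text: str


def type_token_ratio(text):
    """Distinct word forms divided by running words (0.0 for empty text)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def profile(samples, learner_id, measure=type_token_ratio):
    """Return (date, score) pairs for one learner, oldest sample first."""
    own = sorted(
        (s for s in samples if s.learner_id == learner_id),
        key=lambda s: s.collected,
    )
    return [(s.collected, measure(s.text)) for s in own]


# Hypothetical samples, deliberately stored out of chronological order.
samples = [
    Sample("A", date(2005, 7, 1), "i like tea and she likes coffee"),
    Sample("B", date(2005, 6, 1), "tea tea tea"),
    Sample("A", date(2005, 6, 1), "i like tea i like tea"),
]
history = profile(samples, "A")
```

Because the measure is a parameter, the same structure could track morphological or discourse features once functions for them are available.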
