
Available online at www.sciencedirect.com

Journal of Second Language Writing 21 (2012) 239–263

A dynamic usage based perspective on L2 writing§


Marjolijn Verspoor a,b,*, Monika S. Schmid a, Xiaoyan Xu a
a University of Groningen, The Netherlands
b University of the Free State, South Africa

Abstract
The goal of this study was to explore the contribution that a dynamic usage based (DUB) perspective can bring to the
establishment of objective measures to assess L2 learners’ written texts and at the same time to gain insight into the dynamic process
of language development. Four hundred and thirty seven texts written by Dutch learners of English as an L2 with similar
backgrounds were holistically coded for proficiency level, which ranged from beginner to intermediate (A1.1 to B1.2 according to
the Common European Framework of Reference). Each text was hand coded for 64 variables as distilled from the literature at
sentence, phrase, and word level. Statistical analyses showed that broad, frequently occurring, measures known to distinguish
between proficiency levels of writing expertise did so in this corpus too: sentence length, the Guiraud index, all dependent clauses
combined, all chunks combined, all errors combined, and the use of present and past tense. However, almost all specific
constructions showed non-linear development, variation, and changing relationships among the variables as one would expect from
a dynamic usage based perspective. Between levels 1 and 2 mainly lexical changes took place, between levels 2 and 3 mainly
syntactic changes occurred, and between levels 3 and 4 both lexical and syntactic changes appeared. The transition between levels 4
and 5 was characterized by lexical changes only: particles, compounds, and fixed phrases. The study shows that even short writing
samples can be useful in assessing general proficiency at the lower levels of L2 proficiency and that a cross-sectional study of
samples at different proficiency levels can give worthwhile insights into dynamic L2 developmental patterns.
© 2012 Elsevier Inc. All rights reserved.

Keywords: L2 writing; Developmental variables; Usage based; Dynamic; Objective measures; Complexity; Accuracy

Introduction

One useful way to measure general proficiency in a second language (L2) is to assess writing samples. Rather than
testing passive knowledge, as do traditional exam-style tasks, written texts show active language use on the part of the
L2 user in all its facets, including the use of vocabulary, idioms, verb tenses, sentence constructions, and errors. In writing,
rather than in speaking, the learner can also show better what he or she is capable of doing in and with the L2 because
writing allows for more reflection and is therefore usually somewhat more complex linguistically than speaking. An
added bonus is that writing data is easier to collect and assess than spoken data.

§ We hereby would like to acknowledge the European Platform and the Dutch Network of bilingual education for funding the OTTO project and
the diligent work of our MA students who each wrote on a different topic addressed in this paper: Amalia Sofogianni (sentence measures), Ngoc Lan
Nguyen (vocabulary measures), Nicoleta Anastasiou (verb phrases), Theodora Antoniou (chunks), and Felicity de Vries (errors).
* Corresponding author. Tel.: +31 628838914; fax: +31 503635821.
E-mail address: M.H.Verspoor@rug.nl (M. Verspoor).

1060-3743/$ – see front matter © 2012 Elsevier Inc. All rights reserved.
doi:10.1016/j.jslw.2012.03.007

However, to be able to use writing samples to measure general L2 proficiency, the characteristics of the L2 at each
proficiency level have to be clearly defined. For example, the Common European Framework of Reference (CEFR;
Council of Europe, 2001) has developed descriptors for each level. For the A1 level (beginner) it states with respect to
writing: ‘‘Has a very basic range of simple expressions about personal details and needs of a concrete type, has a basic
vocabulary repertoire or isolated words and phrases related to particular concrete situations, shows only limited
control of a few simple grammatical structures and sentence patterns in a learnt repertoire, and can write simple
isolated phrases and sentences.’’ The problem with such descriptors is that they are still rather subjective and, as
pointed out by, for example, Jarvis, Grant, Bikowski, and Ferris (2003), not all learners of the same level behave the
same. For instance, some may have more advanced vocabulary and others more advanced sentence construction.
Furthermore, as Larsen-Freeman (2006) points out, labels such as beginner, intermediate, advanced, or native-like
are too subjective, especially where comparisons across studies are concerned. She argues that the field still needs a
‘‘common yardstick’’ for proficiency in writing products. However, the search for such objective measures has been
difficult because many factors may affect the characteristics of writing products of L2 learners. For example, Wolfe-
Quintero, Inagaki, and Kim (1998) and Ortega (2003) point out that variation in writing products across learners may
occur when writers are compared across different tasks, in addition to the fact that learners from different first
languages may have different problems with their L2. Moreover, individual differences, especially language aptitude,
are known to have a strong effect on L2 development (Sparks, Patton, Ganschow, & Humbach, 2009).
Further factors, which have come to light recently through studies from a dynamic systems perspective, point to the
need to recognize differences in developmental trajectories at the individual level, both within the individual and across
groups. Among these factors are individual differences in development, referred to as ‘‘variability’’ for differences within
an individual and ‘‘variation’’ for differences among individuals (cf. Verspoor & van Dijk, in press). Variability occurs
because the same learner may not behave in the same way all the time. Beginner and intermediate learners in particular
may show a great deal of variability from day to day; they may use a past tense verb correctly in one sentence and
incorrectly in the next because that sub-system of the language has not been mastered yet. Also, they may not be
able to focus on all aspects of the language at the same time while learning, so in one situation they may focus more on
complexity issues and in another on accuracy (cf. Dijk, Verspoor, & Lowie, 2011). Two learners may therefore both be considered
beginner or intermediate, but still produce texts that have different characteristics (Jarvis et al., 2003).
The search for more objective measures to establish proficiency levels in L2 writing samples will benefit from more
detailed insights into not only what learners at certain stages have in common but also where they are likely to differ. In
order to be able to focus on such variation in L2 stages, it is important to exclude other factors known to cause variation
such as L1, age, natural versus instructed contexts, task effects, and aptitude. Therefore, our study focuses on one
largely homogeneous group of L2 learners: Dutch students aged 12–15 in their first and third year of high school with
similar scholastic aptitude. However, to be able to obtain L2 writing samples with a wide range of proficiency levels,
we included participants in two different types of instructional contexts, those with a traditional approach of two hours
of instruction per week and those in a semi-immersion environment with 15 hours of exposure to English per week.
The present study investigates indices of language development that have previously been discussed in L2 writing
research. The starting point was the three-dimensional L2 proficiency model encompassing Complexity, Accuracy,
and Fluency, or CAF (Skehan, 1989). All of these measures have generally been shown to improve as proficiency
increases (see Norris & Ortega, 2009; Ortega, 2003; Polio, 2001; Wolfe-Quintero et al., 1998, for reviews). However,
as Norris and Ortega (2009) argue, there is also a need for an ‘‘organic’’ approach that employs multivariate research
designs. They strongly advise measuring complexity not only in terms of general measures such as sentence length but
also with specific ‘‘distinct and complementary’’ (p. 562) complexity measures such as subclausal complexity,
complexity via subordination and coordination in addition to variety, sophistication, and acquisitional timing of forms
produced (pp. 561–562). Finally, as Leki, Cumming, and Silva (2008, chap. 14) point out, there are many other useful
variables that may distinguish among L2 proficiency levels and have been investigated in L2 writing, such as parts of
speech, sentence elements, sentence processes, sentence qualities, and mechanics.
Accordingly, the present study investigates 64 separate variables involving sentence constructions, clause
constructions, verb phrase constructions, chunks, the lexicon, and accuracy measures across five different stages of
writing development, operationalized as proficiency levels from 1 to 5, from beginner to intermediate. The aim is to
establish (1) which text features (constructions, words, and errors) occur at each stage, (2) which text features
discriminate most strongly between stages, and (3) what the changes in sub-systems across the stages, as found in this
cross-sectional study, might indicate about the L2 developmental process.
We will first set out the theoretical approach on which our view of L2 development is based. Then we will discuss in
detail all the sub-systems separately. In our discussion, after a summary of the main findings, we will discuss the
contribution this study makes to identifying measures that can be used objectively to discriminate between proficiency
levels in writing samples, and the insights it provides into L2 writing development from a dynamic usage based
perspective, complementing recent attempts in the same direction by, for example, Verspoor, Lowie, and van Dijk
(2008) and de Angelis and Jessner (in press).

Theoretical approach

The present study proceeds from a dynamic usage based (DUB) approach¹ (cf. Langacker, 2008; Robinson & Ellis,
2008; Verspoor & Behrens, 2011), which holds that there is no innate faculty for the acquisition of grammar
determining the linguistic developmental path. Instead, each learner is assumed to discover the regularities and
patterns of an L2 through exposure and experience with the language. Frequency of input, as Larsen-Freeman (1976)
claimed, is seen as one of the main factors driving acquisition. Within a DUB perspective, one would expect
differences among L2 learners, resulting in diverse individual trajectories and plenty of trials and errors along the way.
The assumption is that many predictor variables such as L1, age, intelligence, verbal aptitude, motivation, type of
exposure, or context, will interact in complex ways to determine L2 acquisition. The term ‘‘dynamic’’ implies that the
current level of development depends critically on the previous one and that therefore ‘‘initial conditions’’ are
important. Moreover, it is argued that all elements and sub-systems of an organism are interconnected and continually
affect each other in development, with each new ‘‘step’’ emerging from all previous steps (cf. de Bot, Lowie, &
Verspoor, 2007; van Geert, 1991).
As far as L2 development is concerned, the DUB approach assumes that there is much more to learn in an L2 than
its morphology and syntax. In fact, there is no real division between morphology, lexicon, collocations, formulaic
phrases, and constructions. They are all seen as constructions in a linguistic continuum with no clear division between
them. Therefore, in order to understand stages in language development, as many (overlapping) sub-systems as
possible should be examined, not only to see how each of these develops over time but also to see how they may
interact. Moreover, if each learner has to find his or her own way to detect and discover the repeated patterns, we might
also expect variation among individuals and variability within individuals.
As van Dijk and van Geert (2007), Larsen-Freeman (2006), Verspoor et al. (2008), and Spoelman and Verspoor
(2010) report, both free and systematic variability will be relatively high during development when the system is
reorganizing and they will be low in a more stable system. Therefore, the degree and pattern of variability at the
individual level is informative with respect to the developmental process and longitudinal studies should show the
variability that may occur at the individual level and how strategies may change over time. However, Siegler (2006)
argues that, despite individual trajectories and variability, there are also general developmental patterns to be found. In
a study that combined a micro-genetic and cross-sectional component (Siegler & Svetina, 2002), the patterning of
changes and non-changes proved to be quite comparable. The longitudinal and cross-sectional groups matched on 10
of the 11 indices of change that were examined. In contrast to the cross-sectional study, however, the longitudinal
micro-genetic study yielded data that could show how the changes had taken place.
A synthesis of longitudinal, micro-genetic studies has also led to general principles for the study of L2
development. In his summary of twenty major investigations of children’s language learning in micro-genetic and
longitudinal approaches, Siegler (2006) points out that (1) the discovery of new approaches or strategies tends to be the
beginning of learning rather than the end, and when new strategies are used they are generally used inconsistently, (2)
learning reflects addition of new strategies, greater reliance on relatively advanced strategies that are already being
used, improved choices among strategies, and improved execution of strategies, and (3) even though there is variation
and variability in the process of learning, learning tends to progress through a regular sequence of stages.
In the study to be reported in this paper, we hope to illustrate these general principles of learning with cross-
sectional data, text samples representing different proficiency levels from beginning to intermediate, which could be
considered idealized ‘‘knowledge states’’ or stages in the developmental writing process.

1 This term is derived from the combination of Dynamic Systems Theory and Usage Based linguistics. Langacker has applied a dynamic systems
approach in quite a few of his publications, and one of his papers (Langacker, 2000) is entitled ‘‘A dynamic usage based model’’.

Dynamic principles in development

The description of L2 development as a dynamic usage based system is based on the following two general
assumptions (cf. van Geert, 2009): First, development is defined as the growth or increase of more developmentally
advanced or complex variables and the decline or decrease of less developmentally advanced variables. Second,
growth, or change, depends on the availability of resources such as the amount of input or motivation, which are
limited. For instance, a resource factor for lexical growth is not only the language spoken in the environment, but also
the learner’s language aptitude or motivation.
The minimal developmental system we could study would consist of one single variable or component, for
example, a learner’s lexical knowledge. Such a single-variable system is then automatically assumed to be part of an
environment (in the abstract sense of the word), which consists of any other component that may affect this lexical
knowledge, such as the learner’s syntactic knowledge or the language of instruction in his L2 class, but none of these
would be addressed explicitly in the data analysis. However, for a description of development it would be more
meaningful to see how various components interact over time. For instance, it is relatively trivial that a learner’s
vocabulary acquisition may be related to acquiring new words that are less frequent or using a greater variety of words
within a text. We may also assume that the learner needs to know more difficult words in order to be able to produce
longer sentences with more complex structures. In other words, we want to describe the relationships that can hold for
any two components and their subcomponents, traditionally considered ‘‘growers’’ in the DST literature, but referred
to as ‘‘variables’’ in this study.
Several studies so far have looked at possible interactions between variables. Bassano and van Geert (2007)
illustrate the process of the emergence of syntactic complexity by looking at how one word (W1), two word (W2),
three word (W3), and four or more word (W4+) utterances developed over time in L1 acquisition. They showed that the
less advanced developmental structures had to pass a certain threshold before more complex ones could occur.
Robinson and Mervis (1998) report a similar interaction between the lexicon and syntax in L1 development. This study
shows that when multiword sentences start to emerge, lexical growth starts to decline. Verspoor et al. (2008) looked at
the development of an advanced Dutch learner of English. They also examined the relationship between the learner’s
lexicon and syntax in academic texts and found that there seems to be a trade-off between more varied word use and
longer sentences at different stages in the developmental process.
Spoelman and Verspoor (2010) examined the development of different complexity measures of a beginning Dutch
learner acquiring Finnish. They examined the development of complexity at the word, noun phrase, and sentence level.
They found that as word complexity increases, both noun phrase and sentence complexity increase as well, so these
variables develop simultaneously, but noun phrase complexity and sentence complexity alternate in developing and
compete with each other in use. Finally, in a longitudinal Dynamic Systems Theory (DST) study on L2 writing
development, Caspi (2010) traced four variables – lexical complexity, lexical accuracy, syntactic complexity, and
syntactic accuracy – in four different advanced learners over about 10 months. She found that learners first made their
words more complex before they used them accurately; they then made their sentences more complex, and syntactic
accuracy came last.

Method

The present study is not longitudinal but cross-sectional. Sets of writing samples representing five different
proficiency levels (1–5) are assumed to represent stages in L2 writing development as most L2 learners will proceed
through these proficiency levels consecutively (as shown in Verspoor et al., 2011). To do justice to the inherent
variation among learners that a DUB approach assumes, we will look not only at what the texts at certain levels have in
common, but also where they differ.
A DST perspective assumes that there is always variation among learners even at the same ‘‘stage’’ of development.
To be able to show this, we controlled for most known causes of variation within groups of learners as much as
possible. We selected participants with highly similar backgrounds and asked them to write short texts under similar
circumstances. The texts were first holistically graded to obtain a general language proficiency rating ranging from 0 to
5, resulting in groups of texts at about the same stage of L2 writing development. On the assumption that a holistic
quality score by a team of experienced teachers can indeed identify an L2 writing developmental stage (e.g. beginner,
intermediate, or advanced) and that the measures found can be used as indexes of learner traits, our next analysis was
meant to operationalize and objectify the measures. The stage or proficiency level established for each text is
of course a product of how the raters rated it, and the holistic rating is therefore expected to correlate with the linguistic
features that the raters evaluated.
The texts from 489 participants (comprising ca. 40,000 words) at each of the developmental stages determined by
the proficiency ratings were coded for 64 variables; texts from level 0 were excluded, as these contained too few words
and were more Dutch than English. By making use of traditional statistical methods, we expect to be able to show:

(1) which text features (constructions, words, and errors) occur at each stage,
(2) which text features discriminate between stages, and finally
(3) what the changes in sub-systems across the stages, as found in this cross-sectional study, might indicate
about the L2 developmental process.

Participants

The writing samples used in this study were collected in September and October 2008 from 489 Dutch pupils from
six different schools in the Netherlands, all of them offering a high academic, pre-university program of secondary
school. Students are admitted to this type of program based on the recommendation of their primary schools and their
scores on the CITO (Dutch testing agency) test, a national scholastic aptitude test taken in the last year of primary
education. Therefore, all participants had a very high CITO score (447 average with a highest possible score of 450),
which in the Dutch educational context translates as the ‘cream of the crop’.² The participants investigated in the
present study therefore all had a relatively high level of scholastic aptitude. Reynolds (2002) points out that in studying
adolescent second language writers, cognitive and age-related development affects general language development.
To control for as many extraneous variables as possible, but still obtain a wide range of proficiency levels within a
relatively homogenous group of participants, writing samples were collected from two groups of students in the first
(12–13 years old) and third (14–15 years old) year of secondary education. To collect samples with as wide a range of
proficiency levels as possible, learners were selected from two different instructional conditions: those that follow a
regular stream with about two to three hours of English instruction per week and those that follow a Content and
Language Integrated Learning (CLIL) program, that is, a semi-immersion program with about 15 hours per week of
instruction in English and about English. The learners with more input are significantly more proficient (cf. Verspoor
et al., 2011). In total, there were 22 classes: 14 from the regular stream and 8 classes from the CLIL stream.³

Writing task

As participants were expected to have a proficiency level ranging from an absolute beginner to intermediate, the
writing task involved only simple, personalized topics. The first year students, directly from primary school and with
little to no prior English writing instruction, were asked to write about their new school. The third year students, with
two years of English either in the form of regular or CLIL streams, were asked to write about their previous vacation.
The topics were controlled for task because they did not require the use of specialized language. The students had
unlimited time to write their brief narratives directly on the computer; for technical reasons the limit was 1000
characters, which turned out to be about 200 words. Students wrote samples ranging from about 25 to 200 words,
which was assumed to be a long enough text to be able to carry out the linguistic analyses required in this study.

2 For more detail on the effect of the Cito score and amount of input on the same population, please see Verspoor, de Bot, and Xu (2011).
3 In several other studies (among which Smiskova & Verspoor, in press) we looked at differences between the two conditions; however, this study
will focus only on differences between proficiency levels, regardless of instructional condition.

Holistic proficiency scores

First, the writing samples were holistically assessed for general proficiency, because Ortega (2003, p. 502) pointed out that
studies that operationalize proficiency in terms of holistic ratings yielded more homogenous observations for the measures she
studied, as reflected in smaller standard deviations and narrower ranges, than studies that operationalize proficiency in
terms of naturally occurring classes or groups. Our procedure was carefully controlled to ensure high
interrater reliability. A group of eight experienced ESL teachers (three native speakers of English, two of Dutch, and
one each of Chinese, Portuguese, and Spanish) established scoring criteria as follows: Each rater first judged a set of
six samples to see which they thought were the strongest and which ones the weakest in general English proficiency.
These orders were discussed among the raters. From the discussions amongst the raters a range of factors emerged that
are very much in line with general CAF measures: text length, sentence length, sentence complexity, use of different
types of clauses, use of tense, aspect, voice and mood, vocabulary range, use of L1, use of idiomatic language, and
accuracy. However, many texts were difficult to rank. Some had no errors but contained only very simple
constructions; others had a great many errors but contained different complex constructions. Some samples were
discussed at length before agreement was reached. After the group reached agreement on the rank order, the texts were
tentatively classified into proficiency levels. Following this procedure, the raters worked together with some 100
samples until they had settled on six proficiency levels (0–5). Assessment criteria were then established, which
included the main characteristics of each level that had emerged from the discussions to help the raters classify the
remainder of the samples. Then the raters were divided into two groups of four, first assessing the samples individually
and then comparing the scores. For each text, the majority score (at least 3 out of 4) was taken and if there was no
majority, differences were resolved through discussion. If a group was unable to reach consensus, the other group of
raters was consulted. After all texts had been scored, they were sorted according to the assigned proficiency score. To
ensure consistency, they were examined again by two of the raters, who in consultation with each other moved 5 of the
489 texts (i.e. about 1%) up or down one proficiency level.
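The majority rule used within each group of four raters can be sketched as follows. This is only a minimal illustration of the rule as described above, not the raters' actual procedure: the function name `majority_score` and the use of `None` to flag a text that has to be resolved through discussion are assumptions introduced here.

```python
from collections import Counter

def majority_score(scores, threshold=3):
    """Return the score awarded by at least `threshold` raters, else None.

    None is a hypothetical flag meaning "no majority": the text would then
    be settled through discussion, as described in the procedure above.
    """
    score, votes = Counter(scores).most_common(1)[0]
    return score if votes >= threshold else None

# Invented ratings from four raters for two texts.
print(majority_score([3, 3, 3, 4]))  # 3
print(majority_score([2, 3, 3, 2]))  # None
```

Returning a flag rather than, say, the mean keeps the unresolved cases visible, which matches the description that such cases went to discussion and, if necessary, to the other group of raters.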
Finally, samples that had been rated 0, which meant the text contained more Dutch words and constructions than
English ones, and those with fewer than 25 words were removed from the corpus, leaving us with 437 samples. In order
to allow our readers to compare our proficiency scores with a common standard, we calibrated the levels identified by
the raters with the linguistic indicators for writing in the CEFR (Common European Framework of Reference) levels, resulting in a
rough comparison as follows: 1 = A1.1, 2 = A1.2, 3 = A2, 4 = B1.1, and 5 = B1.2 (Kops Hagedoorn, 2009).

Coding the writing data

Each text was coded for as many developmental variables as we could deduce from the literature and from our own
observations. The general areas were sentence constructions, clause constructions, verb phrase constructions, the
lexicon, chunks, and types of errors. In all, we discuss 64 variables. To ease reading, we will discuss the exact
operationalization of each cluster of sub-categories in the result sections.
All writing samples were converted to the transcription conventions of the Codes for the Human Analysis of
Transcripts (CHAT). The 64 variables under investigation were hand-coded and analyzed by means of Computerized
Language Analysis (CLAN).⁴ To ensure privacy and consistent counting for unique words and average word length, all
personal names were replaced by ‘name’ and all numbers were replaced by ‘numb’ (both four-letter words). The coding was
done in several stages. First all first-year texts were initially changed to CHAT format by one researcher and all third-
year texts were similarly transformed by another researcher, who then checked each other’s work and resolved
differences in discussion. These data were used in the analyses at the sentence level.
For the analysis of further developmental variables, comprising features pertaining to verb phrase constructions, the
lexicon, chunks, and errors, another researcher carefully checked all the codings again and resolved any questions with
the first two coders. This researcher then coded the more fine-grained distinctions, which were checked by one of the
two first researchers.

4 CHAT and CLAN were developed as part of the Child Language Data Exchange System (CHILDES) project (MacWhinney, 2000; MacWhinney
and Snow, 1990).

Most measures are standard and well known in the literature and will not be discussed in detail here. However, to
ascertain lexical sophistication, we adapted a standard measure to our own corpus. Laufer and Nation (1995) originally
developed a lexical sophistication measure by extracting the percentage of words belonging to the academic word list.
However, this index is very sensitive to the genre and writing topic and is not fine grained enough for learners at lower
proficiency levels, especially when writing on a non-academic topic. Sophistication can also be defined as the
proportion of relatively unusual or advanced words in the learner’s text (Read, 2000), and can be calculated by dividing
the number of sophisticated words by the total number of words. Considering the fact that we had a great number of
participants who wrote on the same topics, we decided to define lexical sophistication as the originality of the
vocabulary in relation to the corpus at large. In order to do this, all items were lemmatized, spelling errors were
corrected where necessary, and the frequency of each lemmatized item in the overall corpus was determined. The
resulting set of word families was divided into five frequency bands, ranging from the 20% most frequent to
the 20% least frequent items in the corpus. The two least frequent categories were then assumed to contain the most
sophisticated words in the corpus. It was then determined for each individual text what percentage of the lexical items
(both tokens and types) fell into each frequency band.
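The corpus-based sophistication profile just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented in the study: it assumes the input is already lemmatized and spelling-corrected, it splits the ranked list of lemmas into five bands of equal size, and it breaks frequency ties alphabetically. The function names `assign_bands` and `band_profile` are hypothetical.

```python
from collections import Counter

def assign_bands(corpus_lemmas, n_bands=5):
    """Map each lemma to a band: 0 = most frequent fifth, 4 = least frequent fifth."""
    freq = Counter(corpus_lemmas)
    # Rank by descending corpus frequency; ties are broken alphabetically
    # (an assumption, the source does not say how ties were handled).
    ranked = sorted(freq, key=lambda lemma: (-freq[lemma], lemma))
    return {lemma: rank * n_bands // len(ranked) for rank, lemma in enumerate(ranked)}

def band_profile(text_lemmas, bands, n_bands=5):
    """Percentage of a text's tokens and of its types that fall into each band."""
    def percentages(items):
        counts = Counter(bands[item] for item in items)
        return [100 * counts[b] / len(items) for b in range(n_bands)]
    return {"tokens": percentages(text_lemmas),
            "types": percentages(sorted(set(text_lemmas)))}

# Invented toy corpus with ten lemma types, so each band holds two lemmas.
corpus = (["the"] * 5 + ["i"] * 4 + ["school"] * 3 + ["like"] * 3 +
          ["go"] * 2 + ["big"] * 2 + ["teacher", "holiday", "camping", "museum"])
bands = assign_bands(corpus)
print(band_profile(["i", "like", "the", "museum", "the"], bands))
```

In this sketch the share of a text in bands 3 and 4 would correspond to the two least frequent categories. Note that lemmas with the same corpus frequency can land in different bands when a band boundary cuts through a tie, so a real analysis would need an explicit rule for such cases.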

Recalculating the measures for statistical purposes

For measures that are complementary, such as sentence constructions (e.g. a sentence is either simple, compound,
complex, or compound complex), the absolute number of occurrences in each text was converted to percentages. For
example, if a text contained 10 sentences, 5 of which were simple, 3 compound, and 2 complex, the relative
percentages would be 50% simple, 30% compound, and 20% complex. Such relative percentages were used for
dependent clause types, verb measures (tense, voice, aspect and mood), and the customized lexical sophistication
profile frequency bands. For measures that were not complementary, the frequency was assessed on the basis of overall
text measures. Chunks, which can be of varying length and different types, were assessed on the basis of the number of
utterances which contained them, while the relative frequency of errors was assessed on the basis of the total number of
tokens in the text.
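The two kinds of recalculation can be illustrated with a short sketch. It reproduces the worked example above (10 sentences: 5 simple, 3 compound, 2 complex) and adds a rate for non-complementary measures; the function names and the error and token counts are invented for illustration.

```python
def relative_percentages(counts):
    """Convert counts of mutually exclusive categories into percentages of their total."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100 * n / total for category, n in counts.items()}

def rate_per_unit(occurrences, units):
    """Relative frequency of a non-complementary measure, e.g. errors per token
    or chunks per utterance."""
    return occurrences / units if units else 0.0

sentence_types = {"simple": 5, "compound": 3, "complex": 2, "compound-complex": 0}
print(relative_percentages(sentence_types))
# {'simple': 50.0, 'compound': 30.0, 'complex': 20.0, 'compound-complex': 0.0}

# Invented counts: 6 errors in a text of 120 tokens.
print(rate_per_unit(6, 120))
```

Normalizing in this way keeps short and long texts comparable, which matters here because the samples ranged from about 25 to 200 words.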

Statistical analyses

To find out which variables correlated most strongly with each proficiency level as determined by the holistic
quality score, we present a preliminary correlational analysis with effect scores to explain the variance in the holistic
ratings, followed by a hierarchical multiple regression analysis, in Appendix A.
The analyses we will present here tap into the exact same underlying relationships as in the regression analysis, but
they show in more detail which individual variables discriminate among levels of proficiency.
In order to detect differences between proficiency levels for the variables we investigated, either a one-way
between-groups analysis of variance (ANOVA) or a one-way between-groups multivariate analysis of variance
(MANOVA) was conducted. ANOVA was used in the case of individual factors which were independent of other
dependent variables, for example ‘words per T-unit’, while MANOVAs were carried out when two or more dependent
variables were closely related, like sentence types (simple, compound, complex, and compound-complex).
Preliminary analyses (correlations) were conducted to check if conceptually clustered variables were also statistically
related, showing that most grouped variables interacted at a certain level, with the exception of the four variables
pertaining to aspect, modality, and voice. MANOVA was still used in this case because these variables can be
considered conceptually clustered as they all concern verb phrases; moreover, ANOVAs for each individual variable
revealed the same results. The level of statistical significance was set at 0.05 and for each pairwise comparison, the
alpha level was adjusted using the Bonferroni correction. For all MANOVAs used in this study, preliminary
assumption testing was carried out and no serious violations were noticed. To detect for which variables the variation
among learners was the greatest, we calculated the coefficient of variance for many of the variables.
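
Two of the quantities mentioned above can be made concrete in a brief Python sketch. It rests on assumptions that the text does not state: that the Bonferroni correction divides the alpha level by the number of pairwise comparisons, that all pairs of the five proficiency levels are compared, and that the coefficient of variance is the sample standard deviation expressed as a percentage of the mean. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha level under the Bonferroni correction."""
    return alpha / n_comparisons


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (assumed definition)."""
    average = mean(values)
    if average == 0:
        raise ValueError("undefined when the mean is zero")
    return 100 * stdev(values) / average


# Assumption: every pair of the five proficiency levels is compared.
level_pairs = list(combinations([1, 2, 3, 4, 5], 2))
adjusted_alpha = bonferroni_alpha(0.05, len(level_pairs))
```

If fewer comparisons were actually tested, the adjusted alpha level would be correspondingly less strict.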

Operationalizing change

Our study has five groups of texts, each representing a level of proficiency, the averages of which might be
considered idealized stages in the L2 developmental writing process. We do not expect all texts at one particular
proficiency level to be the same in all respects, but we do assume that each group of texts has commonalities (for
example, most texts at level 1 will contain verb phrases in mainly the simple present tense) and variation (some texts at
level 1 will contain many spelling errors, others will not).
When we try to project the findings of this cross-sectional study to a longitudinal one we will reason as follows:
When there is a significant difference in a variable between texts of two consecutive proficiency levels, for example
level 1 and 2, we assume that most L2 learners who progress from level 1 to 2 will show a change in the use of that
particular variable. Because most learners are prone to change with respect to this variable, it would be considered a
rather likely change between two stages in the idealized developmental writing process.
To categorize our measures, we will use the following terms:

- Strong discriminator (SD): a significant difference between two consecutive levels (1–2, 2–3, 3–4, or 4–5)
- Medium discriminator (MD): a significant difference between two levels separated by one intermediate level (1–3, 2–4, or 3–5)
- Weak discriminator (WD): a significant difference between two levels separated by two or more intermediate levels (1–4, 1–5, or 2–5).
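
The three terms can be expressed as a simple rule on the distance between the two levels that differ significantly. The Python sketch below is an illustration only; the function name discriminator_type is hypothetical, and the significance test itself is assumed to have been carried out beforehand.

```python
def discriminator_type(level_a, level_b):
    """Label a significant difference between two proficiency levels (1 to 5) as SD, MD, or WD."""
    gap = abs(level_a - level_b)
    if gap == 0:
        raise ValueError("the two levels must differ")
    if gap == 1:
        return "SD"  # consecutive levels, e.g. 2-3
    if gap == 2:
        return "MD"  # one level in between, e.g. 1-3
    return "WD"      # two or more levels in between, e.g. 1-4 or 1-5


# Hypothetical set of level pairs with a significant difference for one variable.
significant_pairs = [(2, 3), (1, 3), (2, 5)]
labels = [discriminator_type(a, b) for a, b in significant_pairs]
```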

Results

The results for the five general categories (sentences and clauses, verb phrases, chunks, lexicon, and errors) and
their sub-categories are presented separately below. Each section is organized as a small study on its own. It will
include a brief overview of the operationalization of each category and sub categories followed by the detailed results,
a summary of the main findings, and an inference on what the findings may indicate about development.
The general discussion then treats the variables together to show which clusters of variables may best distinguish
between proficiency levels, which clusters of variables change most, and how these clusters may interact in L2 writing
development.

Changes at sentence level

At the sentence level, we coded the general complexity measure of T-unit length, one of the most robust complexity
measures according to Wolfe-Quintero et al. (1998, pp. 97–98). However, as we wanted to see at what stage different
types of specific constructions occur, we also coded for types of sentence and types of dependent clause (finite vs. non-
finite). The following table gives the labels, with definitions based on Verspoor and Sauter (2000), and examples from
our corpus (Table 1).
Average T-unit length changes at a moderate rate and is a medium discriminator: There were significant differences
between levels 2–4, and 3–5.
Fig. 1 shows that the distribution of the different types of sentences changes across the proficiency levels. All four
types of sentence constructions occurred from the beginning, but beginners used mainly simple and compound

Table 1
Sentence level measures.
Label | Definition | Examples from data
T-unit measures
Words per T-unit | Total number of words divided by total number of T-units | My teachers are friendly. (4)
Sentences/utterance
Fragment | Utterance without main clause | because we buyed all things the same!
Simple | One main clause | My teachers are friendly.
Compound | Two complete main clauses (each with its own subject and finite verb) | I have very much homework and I have enough to do.
Complex | One main clause and one or more finite dependent clauses | It was very nice and funny, because we buyed all things the same!
Compound-complex | A combination of compound and complex elements | Now I don’t know what to talk about anymore so I just say a lot of things that don’t make sense.
Dependent clauses
Adverbial | Finite clause functioning as adverbial | It was very nice and funny, because we buyed all things the same!
Nominal | Finite clause functioning as subject, object, or other nominal | I said I haven’t saw them before.
Relative | Finite clause functioning as post-modifier of a noun | The most nice thing I’ve did was mountain biking.
Non-finite | A non-finite construction functioning as adverbial, nominal, or post-modifier | In de back of the boat were dolphins jumping in our waves. / Well, I met a boy, named Name. / He walked to the field and let the spider go away.

[Graph: Sentence types. Horizontal axis: proficiency levels 1–5; vertical axis: 0–80. Series: Simple, Compound, Complex, Compound-complex.]

Fig. 1. Distribution of sentence types at five levels of proficiency. SD between 2–3 for simple and complex sentences as indicated by stars. MD
between 1–3, 2–4, and 3–5 for simple, complex, and compound complex. No D for compound sentences.

sentences. (Note that because we did not include fragments (incomplete sentences) and phrases, not all measures add
up to 100%.) Simple sentences were the most frequent. Simple sentences decreased fast across proficiency levels, with
a concomitant increase of complex sentences, suggesting somewhat of a ‘‘spurt’’ of complex sentences between levels
2–3, and a trade off with the simple sentence constructions. The proportion of compound sentences remained stable
across the levels; there were no significant differences between any of the levels. Compound-complex sentences
increased steadily but slowly across proficiency levels. At level 5, a balanced distribution of sentence types has been
reached.
Complex and compound-complex sentences may contain different types of finite and non-finite dependent clauses.
The measure ‘‘sentences containing any type of dependent clause’’ discriminates strongly between all consecutive
levels (see Fig. 13), a finding consistent with Wolfe-Quintero et al. (1998).
Fig. 2 shows that different types of clauses occurred at different levels. (As we did not include the category ‘‘no
dependent clause’’, the numbers do not add to 100%.) At level 1, all the types of dependent clauses occurred in a
balanced mix. Nominal and adverbial clauses were the first to increase. The average proportion of nominal and
adverbial clauses switched at level 3 and at level 5. Finite relative clauses increased moderately at first but strongly
between levels 3–4 (trend). Non-finite clauses also increased moderately.
Fig. 3 shows that even though very few dependent clauses were used at the lower levels, the degree of variation was
by far the greatest at these lower levels, suggesting that there were great differences among beginners’ texts in using
these constructions. The texts of more proficient learners (at levels 4 and 5) showed less variation, suggesting that more
proficient learners resembled each other in the use of dependent constructions.
To summarize, the total length of utterances increases at a moderate rate across proficiency levels. The proportion
of simple vs. complex sentences is moderately useful in distinguishing proficiency levels, but the total number of
dependent clauses per text is a strong discriminator. As far as variation is concerned, we may conclude that among
beginners there is more variation for these measures than among more advanced learners.

[Graph: Dependent clauses. Horizontal axis: proficiency levels 1–5; vertical axis: 0–16. Series: Finite nominal, Finite adverbial, Finite relative, Non-finite (all).]

Fig. 2. Distribution of types of dependent clauses at five levels of proficiency. SD between 2–3 for finite adverbial and non-finite clauses and
between 3–4 for finite relative clauses as indicated by stars. MD for all variables between remaining levels.

[Graph: CoV Dependent clauses. Horizontal axis: proficiency levels 1–5; vertical axis: 0–450. Series: Dependent clause, Finite nominal, Finite adverbial, Non-finite, Finite relative.]

Fig. 3. Coefficient of variance of types of dependent clauses.

If we project these findings on the writings of learners who might go through these stages consecutively, we should
expect different types of clauses to occur at different stages and rapid change for some to occur between levels 2–3 and
3–4, suggesting some syntactic reorganization especially between these levels. The fact that adverbial and nominal
clauses occur at the lower levels may be due to their comparatively simple structure, requiring only the addition of a
conjunction before a clause. Relative clauses are more intricate to form because a noun is replaced with a pronoun,
which may also involve word order differences. A spurt of non-finite clauses between levels 2–3 could be related to the
fact that mental verbs such as decide become more frequent at that level.

Change in verb phrase constructions

Based on data from language learners’ L2 narratives, Bardovi-Harlig (2000) and Bayley (1999) present evidence
for developmental paths in the acquisition of the form-meaning mappings required to express tense and aspect.
Therefore, we wanted to find out what types of verb phrases were used at what proficiency level. Each verb phrase,
regardless of whether it was correctly spelled or used, was coded for tense (simple present, simple past, and present
perfect), and for progressive aspect, passive voice, and conditional (a cover term to indicate the use of a modal, semi-
modal, or marginal modal verb or a past (perfect) tense used to express contrary to fact). All verb errors were coded
under the category ‘‘errors’’ and will be discussed in the error section (Table 2).
Both present and past tenses were used far more than any of the other tenses and occurred from the beginning. The
simple present tense and the past tense showed the same pattern, but in different directions: both the present and past
tenses discriminated strongly between 1–2, 2–3, and 3–4 (see Fig. 13 for the present tense).
Fig. 4 shows that while modals, perfect tense, progressive aspect and passive voice all occurred at level 1, they were
used sparingly by the beginners. Note that for both the perfect and progressive there is an increase until level 3 and then
a decrease, and for the passive a weak peak at level 4, suggesting degrees of overuse at these particular levels.
Fig. 5 shows that at the lower levels there is the greatest amount of variation, especially for the past perfect tense and
passive voice, suggesting that only some beginners used a past perfect tense or passive voice. At level 1, there is

Table 2
Verb phrase measures.
Label | Example
Simple present tense | is, walks
Simple past tense | was, walked
Present perfect | has gone
Past perfect | had gone
Conditional | will go, can go, is able to go, might, would, could go, could have gone, went (in: if he went) etc.
Passive | is written, was written
Progressive | is walking, was walking

[Graph: Mood, aspect and voice. Horizontal axis: proficiency levels 1–5; vertical axis: 0–10. Series: Conditional, Progressive, Present perfect, Passive, Past perfect.]

Fig. 4. Distribution of present perfect, past perfect, conditional, progressive, and passive at five levels of proficiency. MD between 1–3 for present
perfect and 3–5 for conditional. WD between 1–5 for conditional and between 1–4 for passive.

[Graph: CoV Verb phrases. Horizontal axis: proficiency levels 1–5; vertical axis: 0–1200. Series: Past perfect, Passive, Present perfect, Simple past, Progressive, Conditional, Simple present.]

Fig. 5. Coefficient of variance for verb phrases.

however little variation in the number of present tenses used. At level 5, the degree of variation is relatively low for all
verb phrases, and the lowest for the simple past tense.
To summarize, in discriminating among proficiency levels, the proportion of present tense (or its reverse past tense)
is quite useful.5 The other types of verb phrase do not consistently discriminate between levels. As far as variation is
concerned, we may conclude that beginners show relatively more variation than more advanced learners.
If we project these findings on the writings of learners who might go through these stages consecutively, Fig. 4
suggests that learners at level 3 will overuse the perfect and progressive and at level 4 the passive. This idea is partially
complemented by the finding that there seems to be a peak of verb use errors at proficiency level 3.

Chunks

Few studies have previously looked at the use of different types of chunks across proficiency levels. However,
Hinkel (2002) showed that L2 writers’ texts had fewer collocations than those from L1 writers, and Hu, Brown, and
Brown (1982) and Sonomura (1996) showed that L2 writers had more collocation errors than L1 writers. Grant and
Ginther (2000) and Jarvis et al. (2003) showed that there was an increase across L2 proficiency levels in terms of
overall use of complementation. Smiskova and Verspoor (in press) provide a typology for chunk use in L2 language and
show that the more L2 input learners receive, the more, and longer, chunks they use.6
Chunks are notoriously difficult to identify reliably (cf. Smiskova & Verspoor, in press). To operationalize
chunks we have chosen to include the full range of types previously identified. We used a list presented by Moon

5 As a reviewer pointed out, we must concede that this may be partially due to task effect. First year participants, who were more likely to be
beginners, were asked to write about their new school and third year participants, who were more likely to be more proficient were asked to write
about their vacation. However, quite a few first year students scored at level 3–5 and those learners did use past tenses in their writings and the third
year participants who scored lower than 3 were more likely to use a present tense.
6 The typology of chunks in Smiskova and Verspoor (in press) based initially on the typology presented here is not quite the same as in this paper.

Table 3
Types of chunks identified.
Label | Definition | Examples
Partially schematic chunks
Structures | A fixed part and slot-fillers (here underlined) | better and better, it is easy to do, have work to do, find it nice, so happy that, warm enough, some minutes before, some meters deep, etc.
Complements | Verbs with infinitives, gerunds, nominal sentences, or reflexives as complement | decide to, be able to, stop working, I don’t know what/who/where, etc.
Fixed chunks
Compounds | Fixed combinations of nouns, adjectives, prepositions, or particles | sunbathing, dressing rooms, deep blue, forest fire, Caribbean Sea, overall, two-week holiday, etc.
Particles | Verbs or nouns that receive prepositions or particles, phrasal verbs | depend on, go on holiday, make up a story, in the evening, a group of, because of, by train, etc.
Collocations | Collocating nouns, adjectives, verbs and also adverbs, prepositions, pronouns | the sun goes down, take a dive, strong current, real close, went wrong, hurt badly, crack of dawn, etc.
Fixed phrases | Highly institutionalized chunks with referential, often idiomatic, consisting mainly of more than two words | lots of fun, have a wonderful time, what a pity, at once, of course, all of a sudden, go home, etc.
Discourse | Chunks with discourse function | why don’t we, in other words, anyway, guess what, etc.

(1997, pp. 44–47) and added two further types identified in other studies. To be able to compare the development of
productive to non-productive chunks, we also made a distinction between partially schematic chunks and fully
specific and fixed chunks. Partially schematic chunks allow slot-filling, and new ones can be produced on the basis of a
schematic pattern. Each fully specific and fixed chunk is unique and, just like a lexical item, it will have to be learned
separately.
As mentioned in the methods section, chunks were assessed on the basis of the number of utterances which
contained them (Table 3).
When all types of chunks are added, this measure is a very strong discriminator as it distinguishes between all
consecutive levels (see Fig. 13).
Fig. 6 shows the development of partially schematic and fully fixed chunks. Note that partially schematic chunks
changed early on and the fully fixed chunks early on and at the end. Within the cluster of partially schematic chunks,
complement constructions changed between 1–3 and 3–5 and discriminated (strong trend) between 2–3. Structures
changed between 1–3, 2–5, and 3–5.
Fig. 7 shows that among the fixed chunks, particles showed the most change. Collocations were next. Compounds
are quite interesting in that they showed no change until the end. Discourse chunks were used very little. Fixed phrases
showed medium change at the early stages and large change later on.

[Graph: Partially schematic versus fully fixed chunks. Horizontal axis: proficiency levels 1–5. Series: partially schematic, fully fixed.]

Fig. 6. Distribution of partially schematic versus fully fixed chunks at five levels of proficiency. SD between 1–2 and 2–3 for partially schematic
chunks and between 1–2 and 4–5 for fully fixed chunks as indicated by stars. MD between 3–5 for partially schematic chunks.

[Graph: Fixed chunks. Horizontal axis: proficiency levels 1–5; vertical axis: 0–3. Series: particle, compound, collocation, fixed phrase, discourse.]

Fig. 7. Distribution of types of fixed chunks at five levels of proficiency. SD between 1–2 (strong trend) and 4–5 for particles, between 4–5 for
compounds, and between 3–4 and 4–5 for fixed phrases as indicated by stars. MD between 3–5 for particles, between 1–3, 2–4, and 3–5 for
collocations, and between 1–3 and 2–4 for fixed phrases. WD between 2–5 for particles and between 1–5 for discourse chunks.

[Graph: CoV Chunks. Horizontal axis: proficiency levels 1–5; vertical axis: 0–700. Series: Discourse, Complement, Structure, Collocation, Fixed phrase, Particle, Compound, Total.]

Fig. 8. Coefficient of variance for chunks.

Fig. 8 shows the degree of variation in the use of chunks. Again the degree of variation is the greatest among
beginners. Interestingly, across proficiency levels, the discourse marker showed the most variation; as it also had the
lowest degree of occurrence we may conclude that only a few students used this type of chunk.
To summarize, in discriminating between proficiency levels, the total count of chunks is very useful as it shows
differences between all levels. Even though this finding may not be surprising – more advanced learners will use more
words with target-like collocations – it is quite a novel finding in that no other study has looked at chunks across
proficiency levels as systematically before. As far as variation is concerned, we may conclude that beginners show
relatively more variation than more advanced learners.
If we project these findings on the writings of learners who might go through these stages consecutively, we should
see different patterns for partially schematic and fully fixed chunks. The differences are probably related to the fact
that once a partially schematic chunk is learned, it can become productive, whereas each fixed chunk has to be learned
separately. The spurt of fully fixed chunks at the end suggests that at higher proficiency levels, differences can be found
not so much in syntax and grammar but in the lexis, a finding supported by the Guiraud index (Fig. 13) and to some
degree by the customized lexical frequency profile (Fig. 9). The findings suggest more syntactic reorganization early
on and lexical change later on.
The different rates of change in the fully fixed chunks can be partially explained by the relative occurrence of these
constructions in the language, but what is interesting is that unlike most other measures we looked at, two of these
lexically based constructions exhibited a spurt at the very end between levels 4–5, again suggesting that between these
levels more lexical than syntactic changes take place.

Lexical sophistication and variation analysis

Numerous studies have shown that lexical measures change as proficiency increases (for an overview see Leki et al.,
2008), where lexical learning is investigated in terms of sophistication, originality, and diversification. To calculate
lexical sophistication we used two measures: the average word length and a Customized Lexical Frequency Profile
(CLFP) (see section Methods above). Average word length was investigated by Jarvis et al. (2003), who found that in

[Graph: Customized Lexical Frequency Profile: Types. Horizontal axis: proficiency levels 1–5; vertical axis: 0–30. Series (frequency bands): 1-20, 21-40, 41-60, 61-80, 81-100.]

Fig. 9. Distribution of frequency bands of types at five levels of proficiency. SD between 1–2 for the first band (decrease). MD between 3–5 for the
first band (decrease), between 1–3, 2–4, and 3–5 for the second band (decrease), between 1–3 for the fourth band (increase), and between 3–5 for the
last band. WD between 2–5 for fourth band (increase) and between 1–4 and 2–5 for the last band (increase). No growth for the middle band.

their cluster analysis of twenty-one linguistic features related to complexity, it was one of the seven which were good
predictors of essay complexity. Word length is also a good approximation of what has been called sophisticated words
(Wolfe-Quintero et al., 1998), because low frequency words tend to be longer than many high-frequency words.7
To calculate lexical diversification we used the Guiraud index, a Type Token ratio adjusted for text length, which is
one of the most robust type token measures according to van Hout and Vermeer (2007). The results for each of these
measures – average word length for sophistication, customized lexical frequency profile for originality, and Guiraud
for diversification – are given below.
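
For concreteness, the Guiraud index is the number of types divided by the square root of the number of tokens. The Python sketch below computes it together with average word length; it assumes that a text has already been split into a list of word tokens in a consistent form (for example lowercased), which is not necessarily how the texts were prepared in this study. The function names are hypothetical.

```python
import math


def guiraud_index(tokens):
    """Guiraud index: number of distinct types divided by the square root of the number of tokens."""
    if not tokens:
        raise ValueError("a text must contain at least one token")
    return len(set(tokens)) / math.sqrt(len(tokens))


def average_word_length(tokens):
    """Mean number of characters per token."""
    if not tokens:
        raise ValueError("a text must contain at least one token")
    return sum(len(token) for token in tokens) / len(tokens)


# A four-token example taken from the corpus sentence "My teachers are friendly."
example_tokens = ["my", "teachers", "are", "friendly"]
example_guiraud = guiraud_index(example_tokens)
example_length = average_word_length(example_tokens)
```

Dividing by the square root of the token count, rather than by the token count itself, is what makes the measure less sensitive to text length than a plain type-token ratio.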
Word length was not a good discriminator. Even though there seems to be U-shaped development, there were no
significant differences between any of the levels, most likely because of the variation among the learners.
CLFP for tokens was not a strong discriminator, but indicates subtle changes in vocabulary use across proficiency
levels. At all proficiency levels, words from all five frequency bands are used. What changes is their relative
distribution, with frequent words decreasing and less frequent words increasing. The most frequent words (1–20%)
decrease at a medium rate between levels 2–4, and the second band of frequent words (21–40%) does so between levels
1–3. The least frequent words (81–100%) show a small increase at an early stage, between levels 1–4 and 2–5, and a
medium increase at a later stage, between levels 3–5.
Fig. 9 shows the CLFP according to types. Visual inspection suggests that types rather than tokens indicate what
may be happening to the most frequent words, which are most likely to include function words such as articles,
common verbs such as to be, and the like.
The Guiraud (the type token ratio adjusted for text length) is a strong discriminator (see Fig. 13). There is a
significant difference between all adjacent levels.
To summarize, in distinguishing between proficiency levels, word length (sophistication) and the CLFP
(originality) were not very useful in this corpus. However, diversification as measured by the Guiraud index is very
useful as it discriminates clearly between all levels, a finding consistent with the literature. One of the reasons word
length was not a good discriminator between levels was that it showed U-shaped behavior (words were longer at level 1
than at level 2). Considering the fact that word-length has been a good indicator of general lexical sophistication in
other studies, this fact is surprising. This measure may have failed to discriminate in this corpus for two reasons: (1) the
everyday topic does not educe more sophisticated vocabulary or (2) the measure does not discriminate well among
lower proficiency ranges.
If we project these findings on the writings of learners who might go through these stages consecutively, we would
expect the lexicon to change rather subtly and slowly across proficiency levels except perhaps in the use of the most
frequent words at the beginning and the least frequent words at the end; it is not so much the relative frequency or
sophistication of separate lexical items that changes, but the different combination of words as suggested by the
Guiraud index in this section and the use of chunks in the previous section.

7 Note that this measure may not discriminate well for languages other than English, e.g. in cases where there is extensive use of compounding.

Errors

All learners make errors, and these errors have been investigated extensively, operationalized for example as the number of
error free T-units (e.g. Leki et al., 2008; Wolfe-Quintero et al., 1998), and all studies show that the more advanced
learner makes fewer errors. To determine whether there were particular types of errors that were typical for different
proficiency levels, we coded, categorized, and sub-categorized errors at different linguistic levels: from sentence level
and word order to lexical and spelling errors. In our analyses we looked at the more fine-grained differences in several
errors as given in Table 4, but the numbers were too small and there was too much variation to provide any meaningful

Table 4
Types of errors identified.
Error type | Examples | Transfer
Lexical
1 Dutch word | wegenwacht (breakdown assistance), ik (I), kinderen (children) | Yes
2 Literal translation of L1 word | a long (=> tall) boy | Yes
3 Wrong preposition based on L1 | I’ve been in Spain, I’m on this school now | Yes
4 The use of an incorrect pronoun, based on L1 | it are my new shoes | Yes
5 Incorrect use of semantically related word | It’s very happy there, but during I was shooting. . . | No
6 Wrong word based on L1 | it’s well funny, he grabbed himself together, the tent flew on fire | Yes
7 Wrong word not based on L1 | a much (instead of a lot) | Yes
8 Odd construction, not based on L1 | my front final name | No
9 Half English, half Dutch | biken, to same, myn | Yes
10 Other | A (I) like, the school light (lies) | No
Spelling
1 Half Dutch, half English | swemming, shijning, musik | Yes
2 Phonetically spelled | Franse (France), to hef (have), piepel (people) | Yes
3 Similar words | to/too, see/sea, there/their | No
4 Tricky words | awfull, allways, know/now | No
5 Difficult words | depend, teacher; exciting | No
6 Other | heelo, specail, holidiay | No
Mechanics
1 Space errors | olivetree, strawberryriver | Yes
2 Capitalisation | Biology, france, i | No
Grammar
1 Wrong use of apostrophe to make plural or 3rd p -s | weve, do’nt | No
2 Incorrect use of sg/pl | a very cool teachers | No
3 Dutch word order or confusion be/have based on L1 | I have not a good view, I like it not, I have not a friend, I am nice teachere | Yes
4 Incorrect word form | helping very good | No
5 Dutch constructions | I found a lot of the lottery, a shark was escaped, the belly of the bear | Yes
Word order
1 Odd word order not based on L1 | I go on the train to school | No
2 Odd word order based on L1 | I have new friends made | Yes
Punctuation
1 Comma splice | I went on holiday with my whole family, we went to a camping and slept in a tent | No
2 Fused sentences | the school is big I like free hours of food | No
3 Fragment | but at the end, when we went back | No
Other errors | I was very happy with my to see my class, I looked | No
Verb phrase errors
Verb form | he go to school | No
Verb use | he has gone to school yesterday | No

[Graph: Types of errors made. Horizontal axis: proficiency levels 1–5; vertical axis: 0–7. Series: Spelling, Lexical, Mechanical, Grammar, Punctuation, Word order.]

Fig. 10. Distribution of types of errors at five levels of proficiency. All measures decrease. SD between 2–3 for spelling and between 1–2 for lexical
errors as indicated by stars. MD between 1–3 and 2–4 for spelling, between 1–3 for lexical errors, between 1–3 for grammar and punctuation errors,
and between 1–3 and 2–4 (trend) for word order errors. WD between 1–5 for mechanical errors.

[Graph: Verb errors. Horizontal axis: proficiency levels 1–5; vertical axis: 0–12. Series: Verb form, Verb use.]

Fig. 11. Distribution of verb errors at five levels of proficiency. SD between 3–4 for verb use errors (decrease) as indicated by star. MD between 1–3
for verb use errors (increase). WD between 1–4 and 2–5 for verb form errors (decrease).

results.8 However, because our hypothesis was that beginners will rely more on the L1 as a resource than more
advanced learners, we wanted to distinguish transfer and non-transfer errors, which are indicated in the fourth column
of Table 4. We made no distinction between errors and mistakes because it is not always possible to determine the
difference and we felt that both may be signs of some competition in attention. As mentioned in the methods section,
the errors in the graphs are based on the relative values of errors (number of errors divided by the number of tokens)
made by each learner.
When all errors are added up, there are moderate differences among the proficiency levels (see Fig. 13). There was
a decrease of errors between 1–3 and 2–4 and then between 3–4. There was no difference between 4–5.
Fig. 10 shows the distribution of the errors across the levels. Some errors occur most at the early stages. Both
spelling errors and lexical errors showed significant decreases. All other errors also decreased across the different
levels at different rates. Spelling and lexical errors remain relatively high. Note that the relative number of grammar
errors is low.
Fig. 11 shows verb errors separately because it is useful to distinguish between errors in verb form and verb use.
Verb form errors steadily decrease across proficiency levels. In contrast, verb use errors actually increased between
1–3 and then decreased between 3–4. Together with the results on the use of the perfect and progressive aspect (see
Fig. 4), we may assume that these tenses are overused at level 3, resulting in use errors.

8 We include the whole graph with all sub-categories to show how transfer errors and non-transfer errors were operationalized.

[Graph: CoV Errors. Horizontal axis: proficiency levels 1–5; vertical axis: 0–300. Series: Total, Spelling, Grammar, Mechanical, Word order, Lexical, Punctuation.]

Fig. 12. Coefficient of variance in number of errors.

Zooming in on transfer errors and non-transfer errors (see Table 4), we calculated the total of Dutch words, blends
of English and Dutch words, incorrect word or preposition related to the L1, spelling errors related to the L1, literal
translation of Dutch constructions, and word order errors related to L1 constructions. At level 1 we see quite a bit of
transfer, suggesting that the beginner relies heavily on the L1 to try to communicate. However, we see a decrease
between levels 1–2 and between 1–3. Non-transfer errors show differences between 1–3 and 1–4.
Fig. 12 shows that the total number of errors showed the least amount of inter-learner variation at all levels. Next are
spelling and grammar errors with a relatively low and steady coefficient of variance, suggesting that most learners
made such errors. Lexical errors showed substantial variation at level 1, but after that the degree of variation remained
relatively stable. A relatively high degree of variation was observed for punctuation errors (comma splices and fused
sentences), but the relative amount of variation did not show great differences at the different levels, suggesting that at
every level there were some learners who made these errors and others that did not. Word order errors showed the
greatest amount of variation, actually increasing across the proficiency levels, suggesting that there is much variation
among learners in this category. Note that unlike all the CoV’s of the other measures we have looked at thus far, the
level of variation does not go down per level and actually remains quite high at all proficiency levels.
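As a rough check on the pattern described above, the coefficient of variance can be recomputed from the means and standard deviations listed in Appendix A. The sketch assumes the conventional definition (standard deviation as a percentage of the mean); the text does not spell out the exact formula used for Fig. 12, so the numbers are indicative only.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CoV is undefined for a mean of zero")
    return 100.0 * sd / mean


# (mean, SD) per proficiency level 1-5, copied from Appendix A.
errors = {
    "Total": [(36.31, 21.3), (30.61, 20.3), (28.53, 18.1), (18.74, 12.5), (12.08, 9.4)],
    "Word order": [(1.22, 1.7), (0.81, 1.3), (0.44, 0.8), (0.22, 0.4), (0.27, 0.7)],
}

cov = {
    name: [coefficient_of_variation(m, s) for m, s in levels]
    for name, levels in errors.items()
}

for name, values in cov.items():
    print(name, [round(v, 1) for v in values])
```

Under this assumption, word order errors vary more than total errors at every level, and their variation is higher at level 5 than at level 1, which matches the description of Fig. 12.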
To summarize, in discriminating between proficiency levels, the relative number of errors per text is only
moderately useful, as it does not discriminate consistently. Only transfer errors show a significant decrease between
levels 1 and 2, and verb use errors between levels 3 and 4. This finding is not wholly consistent with the literature, as
error-free clauses have been shown to be quite useful. The reason for these conflicting findings may again be that we investigated
only beginner and intermediate learners. As far as variation is concerned, we may conclude that there is not much
difference between beginners and more advanced learners.
If we project these findings on the writings of learners who might go through these stages consecutively, we would
expect the type of errors to change across proficiency levels with beginning learners making more transfer errors than
the more proficient ones. The large drop in verb use errors after level 3 suggests that in learning to use new verb forms
the learner may overuse them and therefore use them inappropriately before using the forms in a more target like
manner. Furthermore, the high level of variation among learners of all levels suggests that at all proficiency levels there
will be some learners who make more errors than others at about the same level, even at the higher proficiency levels.

Discussion

The goal of this study was to explore the contribution that a DUB perspective can bring to the establishment of
objective measures to assess L2 learners’ written texts and at the same time to gain insight into the dynamic process of
L2 writing development.
Objective measures that can help ascertain a proficiency level and discriminate consistently among different
proficiency levels have been notoriously difficult to establish, and one of the reasons may have been that studies looked
at different groups of writers with for example different L1s or very heterogeneous groups, resulting in high degrees of
variation among learners (Ortega, 2003). Because one of the tenets of a DUB approach is that variability and variation
are inherent in any change, and therefore in any developmental process, we wanted to show that there is variation even in a
corpus that was controlled for as many factors as possible.
To limit variation caused by predictors such as L1, age, aptitude, and task as much as possible, the present corpus
was controlled for these factors. Our corpus consisted of 437 writing samples from Dutch learners of English as a
foreign language between the ages of 11 and 14 with similar scholastic aptitude scores.
Each of these texts received a holistic rating for proficiency level, resulting in groups of texts at five levels from
beginner to intermediate. Each of these texts was coded for 64 variables. The effects we found of course reflect how the
raters have rated the texts.
As for measures that could be used to discriminate consistently between proficiency levels, we found several
variables that distinguished well. (These also tended to show the strongest correlations with the holistic ratings; a
multiple regression analysis was conducted on the strongest correlations. See Appendix A for both.) Most of these
have also been commonly recognized in the literature. A new one that we established is the use of chunks. What is
noticeable about these measures is that they are all rather ‘‘broad’’ with large numbers, either because they are rather
general (Guiraud index), clustered (all dependent clauses combined, all chunks combined, all errors combined), or
frequently occurring constructions in the language (simple present versus simple past). In other words, they are more
likely to be statistically significant because they involve frequently occurring phenomena. Fig. 13 shows the
differences across the proficiency levels of the most robust measures.
To be able to compare rates of development, Fig. 13 shows normalized data. The solid lines represent the measures
that show significant differences between most levels: the increase in sentences with dependent clauses (all levels), the
decrease in simple present tense (all levels except 4–5), the increase in the Guiraud index (all levels), and the increase
in chunks (all levels). The dotted lines show measures that do not show significant differences between each pair of
consecutive levels but do so between non-consecutive levels. Total errors decrease between 1–3, 2–4, 3–5 and between 3–4. Most
frequent word types decrease between 1–2 and between the next levels (1–3, 2–4, and 3–5).
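The text does not say how the data in Fig. 13 were normalized. One common way to put measures with different units on a shared scale is min-max scaling of the level means. The sketch below applies that assumed method to three rows of means from Appendix A, purely to illustrate how rates of development become comparable; it is not a reconstruction of the figure.

```python
def min_max(values):
    """Rescale a list of numbers linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Means per proficiency level 1-5, copied from Appendix A.
means = {
    "Sentences with dependent clauses": [8.04, 17.72, 30.63, 42.07, 52.88],
    "Simple present": [93.15, 74.40, 35.07, 17.33, 16.41],
    "Guiraud": [4.08, 4.53, 4.98, 5.53, 6.00],
}

# Assumed normalization: min-max scaling per measure.
normalized = {name: min_max(values) for name, values in means.items()}

for name, values in normalized.items():
    print(name, [round(v, 2) for v in values])
```

After rescaling, a steadily rising measure such as the Guiraud index runs from 0 to 1, while the simple present runs from 1 down to 0, so the steepness of each step can be compared across measures.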
In our effort to gain more insight into the dynamic complexities of L2 writing development across the levels of
proficiency, we showed that when we zoomed in on specific constructions, we found non-linear development, variation
and changing relationships among the variables as one would expect from a DUB perspective.
Our data confirm our working hypothesis that learners move from the simplest and most frequent constructions to
more complex and less frequent ones. The picture that emerges is that beginners use simple sentences, very few
dependent clauses, and mostly the simple present tense. From one level to the next the language becomes a bit more
complex as all complexity, lexical, and accuracy variables increase. We also confirmed that at the earliest stages,
writers rely on their strongest resource, their L1, and transfer negatively. At some proficiency levels there is overuse of
different constructions. At the beginning all the simpler constructions are overused, but at level 3 we also see some
other signs of overuse. For example, the present perfect and progressive show a noticeable rise in the chart,
accompanied by a peak in verb use errors. Finite adverbial clauses are used relatively more than relative clauses or non-
finite clauses at level 4.
At the higher proficiency levels all measures we looked at improved: more complex constructions at all levels
emerged and fewer errors occurred. The total number of dependent clauses, total number of chunks, number of present
or past finite tenses, and the type token ratio were the strongest discriminators. Especially the robustness of the chunks
and Guiraud measure is remarkable: If we look at the lexicon measured through the customized lexical frequency
profile or average word length, we see subtle changes and redistribution in word use. In other words, new words are
used rather sparingly. We may therefore surmise that lexical changes take place more in the relative use of these words
and the way these words are combined.

Fig. 13. The development of robust measures across proficiency levels.
The significant differences between proficiency levels we find in the broad measures conceal the variation we may
find in single variables. Despite the fact that our corpus was carefully controlled for L1, age, aptitude, and tasks, there
is a great deal of variation among learners. Still most variables do show change, but at different rates. Measures such as
the lexical frequency bands changed very little across proficiency levels and showed significant differences only
between levels 1 and 5; other measures, such as adverbial clauses, showed a significant difference between two
consecutive levels, suggesting that not all variables change at the same rate and that there may be different kinds of
relations between variables such as conditional support, support, or competition.
If we project these findings on the writings of learners who might go through these stages consecutively, we would
expect the following: Assuming that a significant difference between two consecutive levels is a sign of a sudden
change or a ‘‘spurt’’, then looking at these spurts may give us some insight into how a learner’s language system may
reorganize over time. Between levels 1–2 we see sudden changes in six variables (schematic chunks, fixed chunks,
particles, most frequent words, lexical errors, and mechanical errors), five of which are lexical in nature. Between
levels 2–3 there is a change in seven variables (decrease in simple sentences and increase in complex sentences,
adverbial clauses, non-finite clauses, partially schematic chunks, in particular complement constructions, and
spelling), five of which are syntactic in character, suggesting that between levels 2 and 3 there might especially be a
syntactic reorganization taking place, apparently with an overuse of some tenses, which are corrected between the next
two levels, 3 and 4. Between levels 3 and 4 both syntactic measures such as finite relative clauses keep changing, but
also lexical or semantic changes take place, as can be seen with the increase in fixed phrases and fewer verb use errors.
The transition between levels 4 and 5 is characterized by lexical changes only: particles, compounds, and fixed
phrases. The category of fixed phrases is especially interesting because they usually concern longer chunks.
Apparently between levels 4–5 the language changes especially with respect to the lexicon, with more chunks, in
particular particles and compounds.
To summarize, the data suggest that learners who go from level 1 to 2 are especially busy learning words; after a
certain threshold of vocabulary has been reached, the learners seem to focus more on syntactic complexity between
levels 2 and 3, which continues a bit between levels 3 and 4, but there it is mixed with lexical measures. After most
syntactic constructions are in place, there is a focus again on lexical matters between levels 4 and 5.

Variables that show significant differences between two consecutive levels
Lexically based variables in italics; syntactically based variables in SMALL CAPS.

Levels 1–2            Levels 2–3                 Levels 3–4            Levels 4–5
-                     SIMPLE SENTENCES           -                     -
-                     COMPLEX SENTENCES          -                     -
-                     ADVERBIAL CLAUSES          RELATIVE CLAUSES      -
-                     NON-FINITE CLAUSES         -                     -
SCHEMATIC CHUNKS      SCHEMATIC CHUNKS           -                     -
Fixed chunks          COMPLEMENT CONSTRUCTIONS   -                     -
Particle              -                          -                     Particle
-                     -                          -                     Compounds
-                     -                          Fixed phrases         Fixed phrases
Most frequent words   -                          -                     -
Lexical errors        -                          Errors in verb use    -
-                     Spelling                   -                     -
Transfer errors       -                          -                     -

These findings are very much in line with findings in the DST literature, where different subsystems develop at
different rates and may have a changing relationship to each other over time. Therefore, it would be very useful to
follow similar learners over time to check whether these patterns indeed occur as suggested by this cross-sectional
study.
As far as variation is concerned, we hypothesized that beginners on the whole will show more variation among each
other than more proficient learners, who move more towards a norm. And indeed, visual inspection of almost all our
charts with coefficients of variance confirms this hypothesis. This applies to sentence constructions, types of
dependent clauses, chunks, and verb phrases. One exception is in errors, where the variance remains rather high. This
may be explained in two different ways: It is possible that some learners will always make mistakes, even if they are more
advanced and the variation found would also remain at more advanced levels. The other possibility is that at even more
advanced levels than the ones presented here, learners will show less variance. The explanation could be that when
learners try something new they will go through a trial and error phase. This view is supported by Caspi (2010), who
traced four variables – lexical complexity, lexical accuracy, syntactic complexity, and syntactic accuracy – in four
advanced learners and found that only after a sub-system (lexical or syntactic) had developed, they became more
accurate in that sub-system.
Having shown both the implications for objective measures and the non-linear change, variation and different
relations among the variables, we will now go back to the general principles suggested by Siegler (2006), substituting
‘‘strategies’’ with ‘‘constructions’’:

(1) The discovery of new constructions tends to be the beginning of learning rather than the end, and when new
constructions are used they are often used inconsistently or incorrectly.

Our data has shown indeed that level 1 writers use almost all types of constructions at all linguistic levels (sentence,
clause, verb phrase, chunks, lexicon, vocabulary) that the level 5 writers use. However, they use them in very small
quantities with relatively the most errors.

(2) Language development reflects addition of new constructions, greater reliance on relatively advanced
constructions that are already being used, improved choices among constructions, and improved execution of
constructions.

Our data shows that more proficient writers use some new constructions (e.g. more advanced vocabulary and
chunks), but they especially rely more and more on the more complex constructions that they already used previously,
and as proficiency increases, they make fewer errors.

(3) Even though there is variation and variability in the process of language development, language development tends
to progress through regular stages.

Even though the broader measures do seem to discriminate clearly between proficiency levels on the whole, the
more detailed constructions do not directly support the idea of clear stages. Looking at the figures, we see more of a
continuous waning and waxing of constructions (cf. Larsen-Freeman, 2006, p. 590), with simpler ones gradually
disappearing and more complex ones appearing. On the other hand, we also see that beginners will use the present
tense and simple sentences predominantly and learners at level 3 mainly seem to be reorganizing their syntactic
system. We can assume that they can only do so because at level 2 they are focusing on acquiring enough words and
chunks to be able to form more complex constructions. At level 4 the main syntactic patterns seem to be in place, after
which at level 5 the focus is more on vocabulary and chunks again.

Conclusion

To conclude, this investigation has given us insight into the L2 writing patterns of one very particular group of
learners, and no a priori claims should be made about the generalizability of this study. However, it has shown why
a common yardstick to measure proficiency in writing products has been so difficult to ascertain: Language
develops in so many dimensions simultaneously and there is such a great deal of variation in the way learners
behave that we might have to replace the yardstick metaphor with a broccoli one; rather than looking for length we
should look at change in all directions, and measure the size of the head, making sure that all sides have developed
equally.
It would be necessary to follow learners longitudinally to see if indeed the predictions we made about L2 writing
development occur for individual learners over time. It would also be interesting to use this study as a benchmark and
compare other groups such as younger learners, older learners, learners with other backgrounds, and learners with
different aptitude levels. To what extent do we find similar or different patterns in L2 writing samples across different
proficiency levels? The findings can also help develop tools to objectify proficiency assessments in written texts as the
study has revealed a few variables that may be amenable to automatic coding such as the Guiraud index and average
sentence length. Moreover the present corpus can be used to train such a tool.
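To illustrate why these two measures lend themselves to automatic coding, here is a minimal sketch. It assumes the standard definition of the Guiraud index (word types divided by the square root of word tokens) and uses a deliberately naive tokenizer and sentence splitter. The function names and the sample text are hypothetical, and the study's own coding conventions (for example lemmatization or the treatment of misspellings) may well differ.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def guiraud_index(text: str) -> float:
    """Number of distinct word types divided by the square root of the token count."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / math.sqrt(len(tokens))


def average_sentence_length(text: str) -> float:
    """Mean number of tokens per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


# Hypothetical learner text (not taken from the corpus).
sample = "I have a dog. The dog is big. I like the dog very much!"
print(round(guiraud_index(sample), 2), round(average_sentence_length(sample), 2))
```

Both functions need only raw text as input, which is what makes them candidates for automation; measures such as chunks or transfer errors require human judgment or a trained model.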

Appendix A
Means with standard deviations of all investigated variables at different proficiency levels and overall analyses results
Proficiency levels 1 2 3 4 5
No. of students N = 117 N = 89 N = 111 N = 65 N = 55
Growth at sentence level
T-unit measures
Words per T-unit 7.18 (3.6) 7.43 (2.5) 8.18 (2.1) 9.33 (3.1) 10.22 (3.0)
ANOVA F(4, 432) = 14.21 p < .001 eta2 = .12
Sentence types
Simple 70.23 (24.3) 63.32 (22.8) 51.24 (24.1) 42.96 (21.5) 37.11 (18.5)
Compound 18.23 (21.7) 17.64 (17.1) 20.05 (17.8) 21.04 (16.4) 17.61 (14.5)
Complex 4.75 (10.1) 8.85 (13.3) 16.53 (16.2) 21.62 (14.1) 26.95 (16.0)
Compound-complex 1.22 (6.0) 5.77 (10.9) 8.52 (11.7) 11.98 (16.8) 16.27 (13.3)
MANOVA F(16, 1710) = 13.26 p < .001 Hotelling’s Trace = .50 eta2 = .10
Sentences containing dependent clauses
8.04 (13.9) 17.72 (20.0) 30.63 (23.0) 42.07 (24.6) 52.88 (21.8)
ANOVA F(4, 432) = 61.49 p < .001 eta2 = .36
Dependent clauses
Finite nominal 2.79 (6.4) 6.62 (10.5) 7.39 (11.8) 10.26 (11.4) 14.60 (10.7)
Finite adverbial 1.80 (5.9) 5.54 (11.0) 10.09 (13.0) 12.91 (12.4) 13.59 (10.6)
Finite relative 1.23 (5.3) 2.75 (8.0) 5.01 (8.4) 9.31 (10.8) 10.81 (10.9)
Non-finite 2.22 (7.4) 2.82 (5.3) 8.13 (9.4) 9.60 (10.8) 13.87 (11.4)
MANOVA F(16, 1710) = 17.16 p < .001 Hotelling’s Trace = .67 eta2 = .14

Growth in verb phrase constructions


Use of tense
Simple present 93.15 (14.3) 74.40 (32.0) 35.07 (33.8) 17.33 (24.1) 16.41 (21.5)
Simple past 4.94 (11.9) 22.44 (32.3) 60.14 (34.8) 79.93 (25.3) 80.63 (22.1)
MANOVA F(8, 860) = 76.20 p < .001 Hotelling’s Trace = 1.42 eta2 = .42
Aspect, modality, voice
Conditional 3.94 (7.6) 5.78 (8.0) 5.49 (7.2) 6.82 (6.6) 9.21 (7.4)
Progressive 3.51 (7.3) 3.17 (5.8) 4.67 (7.4) 2.83 (5.3) 4.97 (6.9)
Passive 0.47 (2.8) 1.10 (3.2) 1.59 (4.0) 2.78 (5.6) 2.01 (3.7)
Present perfect 1.62 (4.8) 3.03 (6.3) 4.13 (8.2) 1.76 (4.0) 1.36 (2.7)
Past perfect 0.28 (3.1) 0.12 (0.9) 0.66 (2.2) 0.98 (3.0) 1.60 (3.0)
MANOVA F(20, 1706) = 3.57 p < .001 Hotelling’s Trace = .17 eta2 = .04

Chunks
Total 1.66 (1.9) 3.36 (3.4) 4.52 (3.1) 5.98 (3.3) 9.45 (5.3)
ANOVA F(4, 432) = 58.67 p < .001 eta2 = .35
Partially schematic 0.21 (0.5) 0.76 (1.1) 1.47 (1.6) 1.91 (1.7) 2.62 (2.0)
Fully fixed 1.45 (1.7) 2.60 (2.7) 3.05 (2.3) 4.08 (2.4) 6.84 (4.1)
MANOVA F(8, 860) = 31.15 p < .001 Hotelling’s Trace = .58 eta2 = .23
Complement 0.10 (0.4) 0.35 (0.8) 0.85 (1.2) 1.14 (1.2) 1.53 (1.6)
Structure 0.10 (0.4) 0.42 (0.7) 0.62 (0.9) 0.77 (1.0) 1.09 (1.0)
MANOVA F(8, 860) = 18.43 p < .001 Hotelling’s Trace = .34 eta2 = .15
Particle 0.53 (0.9) 1.20 (1.7) 1.28 (1.2) 1.54 (1.5) 2.60 (2.1)
Compound 0.54 (0.8) 0.54 (0.9) 0.53 (1.0) 0.65 (1.0) 1.35 (1.5)
Collocation 0.21 (0.7) 0.36 (0.5) 0.60 (1.0) 0.80 (0.9) 1.11 (1.4)
Fixed phrase 0.15 (0.4) 0.40 (0.8) 0.56 (0.7) 0.97 (1.2) 1.53 (1.5)
Discourse 0.03 (0.2) 0.09 (0.3) 0.08 (0.3) 0.12 (0.3) 0.25 (0.6)
MANOVA F(20, 1706) = 10.50 p < .001 Hotelling’s Trace = .49 eta2 = .11

Lexical sophistication and variation


Average word length 3.48 (0.3) 3.44 (0.3) 3.46 (0.2) 3.47 (0.2) 3.52 (0.2)
ANOVA F(4, 432) = 0.93 p = .448 eta2 = .01
Customized Lexical profile: Tokens
0–20 23.42 (6.0) 22.77 (5.5) 21.54 (4.5) 20.35 (4.2) 20.07 (4.5)
20–40 22.78 (7.2) 20.92 (6.5) 19.97 (5.0) 20.43 (4.4) 18.77 (3.9)
40–60 19.92 (7.8) 20.64 (6.2) 20.55 (5.1) 20.21 (5.2) 18.41 (4.1)
60–80 18.66 (7.6) 18.77 (6.2) 20.13 (6.0) 19.75 (5.0) 20.15 (4.8)
80–100 15.21 (8.4) 16.91 (8.23) 17.80 (6.8) 19.26 (6.1) 22.60 (6.9)
MANOVA F(20, 1706) = 4.22 p < .001 Hotelling’s Trace = .16 eta2 = .04
Customized Lexical profile: Types
0–20 18.32 (6.4) 15.71 (6.9) 16.17 (6.3) 13.85 (4.1) 11.24 (3.0)
20–40 24.77 (7.9) 23.54 (6.5) 21.53 (5.7) 20.04 (4.1) 17.37 (3.4)
40–60 20.88 (6.4) 21.15 (6.1) 21.03 (6.1) 21.98 (5.0) 20.97 (4.8)
60–80 18.35 (7.4) 20.09 (7.0) 21.05 (5.7) 21.06 (6.4) 24.11 (5.9)
80–100 17.68 (9.4) 19.50 (9.4) 20.21 (9.0) 23.08 (6.7) 26.31 (8.1)
MANOVA F(20, 1706) = 6.17 p < .001 Hotelling’s Trace = .23 eta2 = .06
Guiraud 4.08 (0.6) 4.53 (0.6) 4.98 (0.7) 5.53 (0.6) 6.00 (0.6)
ANOVA F(4, 432) = 111.28 p < .001 eta2 = .51

Errors
Total 36.31 (21.3) 30.61 (20.3) 28.53 (18.1) 18.74 (12.5) 12.08 (9.4)
ANOVA F(4, 432) = 21.70 p < .001 eta2 = .17
Types of errors
Spelling 6.24 (5.3) 4.97 (4.3) 3.25 (3.0) 2.89 (2.7) 2.78 (3.2)
Lexical 6.36 (9.9) 3.49 (3.0) 2.92 (2.5) 2.47 (2.4) 1.45 (1.6)
Mechanical 3.51 (4.0) 2.15 (3.0) 2.95 (3.6) 2.64 (3.5) 1.77 (3.1)
Grammar 2.68 (2.5) 2.06 (2.1) 1.57 (1.4) 1.40 (1.6) 1.34 (1.7)
Punctuation 1.90 (3.2) 1.51 (2.3) 0.97 (1.4) 0.78 (1.2) 1.08 (2.0)
Word order 1.22 (1.7) 0.81 (1.3) 0.44 (0.8) 0.22 (0.4) 0.27 (0.7)
MANOVA F(24, 1702) = 6.97 p < .001 Hotelling’s Trace = .39 eta2 = .09
Verb errors
Verb form 7.63 (11.5) 5.72 (6.5) 4.77 (6.7) 2.89 (4.8) 1.29 (2.7)
Verb use 6.61 (9.7) 9.83 (14.1) 11.65 (13.7) 5.45 (8.9) 2.07 (4.3)
MANOVA F(8, 860) = 7.87 p < .001 Hotelling’s Trace = .15 eta2 = .07
Transfer errors 15.89 (35.4) 7.23 (5.2) 5.26 (3.8) 4.74 (3.3) 2.92 (2.9)
Non-transfer errors 10.71 (7.1) 9.00 (6.8) 7.54 (6.0) 6.61 (5.7) 5.95 (5.8)
MANOVA F(8, 860) = 7.13 p < .001 Hotelling’s Trace = .13 eta2 = .06
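The eta-squared values reported for the univariate ANOVAs above can be cross-checked from the F statistics and their degrees of freedom. The sketch assumes a one-way design, where eta squared equals F × df_effect / (F × df_effect + df_error). It does not apply to the MANOVA rows, whose effect sizes derive from Hotelling's Trace.

```python
def eta_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported eta squared) for the ANOVA rows of Appendix A; all use F(4, 432).
anova_rows = {
    "Words per T-unit": (14.21, 0.12),
    "Sentences containing dependent clauses": (61.49, 0.36),
    "Chunks total": (58.67, 0.35),
    "Average word length": (0.93, 0.01),
    "Guiraud": (111.28, 0.51),
    "Total errors": (21.70, 0.17),
}

recomputed = {
    name: eta_squared_from_f(f_value, df_effect=4, df_error=432)
    for name, (f_value, _) in anova_rows.items()
}

for name, (_, reported) in anova_rows.items():
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.3f}")
```

A recomputed value that rounds to the reported one suggests that the F statistic, the degrees of freedom, and the effect size in a given row are mutually consistent.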

Correlations between holistic quality of writings with the text measures


Holistic quality of writings
Growth at sentence level
Words per T-unit .332**
Simple .460**
Compound .025
Complex .487**
Compound-complex .392**
Sentences containing a dependent clause .507**
Finite nominal .336**
Finite adverbial .380**
Finite relative .374**
Non-finite .412**
Growth in verb phrase constructions
Simple present .730**
Simple past .728**
Conditional .197**
Progressive .049
Passive .177**
Present perfect .003
Past perfect .167**
Chunks
Chunks total .580**
Partially schematic .502**
Fully fixed .517**
Complement .432**
Structure .373**
Particle .375**
Compound .183**
Collocation .310**
Fixed phrase .429**
Discourse .180**
Lexical sophistication and variation
Average word length .032
Customized lexical profile types 0–20 .237**
Customized lexical profile types 20–40 .195**
Customized lexical profile types 40–60 .046
Customized lexical profile types 60–80 .086
Customized lexical profile types 80–100 .279**
Customized lexical profile tokens 0–20 .335**
Customized lexical profile tokens 20–40 .366**
Customized lexical profile tokens 40–60 .028
Customized lexical profile tokens 60–80 .244**
Customized lexical profile tokens 80–100 .291**
Guiraud .712**
Errors
Total errors .399**
Spelling errors .311**
Lexical errors .273**
Mechanical errors .114*
Grammar errors .248**
Punctuation errors .160**
Word order errors .302**
Verb form errors .261**
Verb use errors .100*
Transfer errors .440**
Non-transfer errors .484**
* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).

A hierarchical multiple regression was conducted to find out to what extent the linguistic measures that had
already shown strong relations with the holistic scores in the preliminary correlational analysis could explain the
variance in the holistic ratings. The linguistic measures with the strongest correlation coefficient (printed in bold) from
each category were included in the model, except for the category of errors, where the total error was selected because it
was considered the most representative and had a strong relation as well.

Hierarchical multiple regression of predictors with large correlation coefficients on holistic quality
                                            R2     Adjusted R2   R2 change   B      β
Model 1                                     .257   .255          .257**
  Sentences containing a dependent clause                                    .006   .093**
Model 2                                     .572   .571          .316**
  Simple past                                                                .015   .445**
Model 3                                     .671   .668          .057**
  Chunks total                                                               .042   .128**
Model 4                                     .727   .725          .014
  Guiraud                                                                    .449   .338**
Model 5                                     .758   .755          .030**
  Total errors                                                               .012   .182**
** p < .01.

References

Bardovi-Harlig, K. (2000). Tense and aspect in second language acquisition: Form, meaning, and use. Malden, MA: Blackwell.
Bassano, D., & van Geert, P. (2007). Modeling continuity and discontinuity in utterance length: A quantitative approach to changes, transitions and
intra-individual variability in early grammatical development. Developmental Science, 10, 588–612.
Bayley, R. (1999). The primacy of aspect hypothesis revisited: Evidence from language shift. Southwest Journal of Linguistics, 18(2), 1–22.
Caspi, T. (2010). A dynamic perspective on second language development. Doctoral dissertation. Retrieved from: http://irs.ub.rug.nl/ppn/329338412.
Council of Europe. (2001). Common European Framework of Reference for languages: Learning, teaching, assessment. Cambridge, UK:
Cambridge University Press.
de Angelis, G., & Jessner, U. (in press). Writing across languages in a bilingual context: A dynamic systems theory perspective. In R. Manchon (Ed.),
L2 writing development: Multiple perspectives. Berlin/New York: Mouton de Gruyter.
de Bot, K., Lowie, W., & Verspoor, M. (2007). A dynamic systems theory approach to second language acquisition. Bilingualism, Language and
Cognition, 10, 7–21.
Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing,
9, 123–145.
Hinkel, E. (2002). Second language writers’ text: Linguistic and rhetorical features. Mahwah, NJ: Lawrence Erlbaum.
Hu, Z., Brown, D. F., & Brown, L. B. (1982). Some linguistic differences in the written English of Chinese and Australian students. Language
Learning and Communication, 1(1), 39–49.
Jarvis, S., Grant, L., Bikowski, D., & Ferris, D. (2003). Exploring multiple profiles of highly rated learner compositions. Journal of Second Language
Writing, 12, 377–403.
Kops Hagedoorn, S. (2009). Yes or no: Do you know? Paul Meara’s English as a foreign language vocabulary test as a predictor of general foreign
language proficiency. Master’s thesis, University of Groningen, Groningen, NL.
Langacker, R. W. (2000). A dynamic usage-based model. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language (pp. 1–63). Palo Alto,
CA: CSLI.
Langacker, R. W. (2008). Cognitive grammar as a basis for language instruction. In P. Robinson & N. C. Ellis (Eds.), Handbook of cognitive
linguistics and second language acquisition (pp. 66–88). New York, NY: Routledge.
Larsen-Freeman, D. (1976). An explanation for the morpheme acquisition order of second language learners. Language Learning, 26, 125–134.
Larsen-Freeman, D. (2006). The emergence of complexity, fluency and accuracy in the oral and written production of five Chinese learners of
English. Applied Linguistics, 27, 590–619.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16, 307–322.
Leki, I., Cumming, A. H., & Silva, T. (2008). A synthesis of research on second language writing in English. New York, NY: Routledge.
MacWhinney, B., & Snow, C. (1990). The child language data exchange system: An update. Journal of Child Language, 17, 457–472.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Moon, R. (1997). Vocabulary connections: Multi-word items in English. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition
and pedagogy (pp. 40–63). Cambridge, UK: Cambridge University Press.
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied
Linguistics, 30, 555–578.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied
Linguistics, 24, 492–518.
Polio, C. (2001). Research methodology in second language writing research: The case of text-based studies. In T. Silva & P. K. Matsuda (Eds.), On
second language writing (pp. 91–116). Mahwah, NJ: Lawrence Erlbaum.
Read, J. (2000). Assessing vocabulary. Cambridge, UK: Cambridge University Press.
Reynolds, D. W. (2002). Linguistic and cognitive development in the writing of middle-grade English language learners. Southwest Journal of
Linguistics. Linguistic Association of the Southwest. HighBeam Research. http://www.highbeam.com Accessed 12.10.11.
Robinson, B. F., & Mervis, C. B. (1998). Disentangling early language development: Modeling lexical and grammatical acquisition using an
extension of case-study methodology. Developmental Psychology, 34, 363–375.
Robinson, P., & Ellis, N. C. (2008). Handbook of cognitive linguistics and second language acquisition. New York, NY: Routledge.
Siegler, R. S. (2006). Microgenetic analyses of learning. In D. Kuhn & R. S. Siegler (Eds.), Handbook of child psychology, volume 2: Cognition,
perception, and language (6th ed., pp. 464–510). Hoboken, NJ: Wiley.
Siegler, R. S., & Svetina, M. (2002). A microgenetic/cross-sectional study of matrix completion: Comparing short-term and long-term change. Child
Development, 73, 793–809.
Skehan, P. (1989). Individual differences in second language learning. London: Edward Arnold.
Smiskova, H., & Verspoor, M. (in press). Development of chunks in Dutch L2 learners of English. In J. Evers-Vermeul, L. Rasier & E. Tribushinina
(Eds.), Usage-based approaches to language acquisition and language teaching. Berlin, GE: Mouton de Gruyter.
Sonomura, M. O. (1996). Idiomaticity in the basic writing of American English: Formulas and idioms in the writing of multilingual and
Creole-speaking community college students in Hawaii. New York, NY: Peter Lang.
Sparks, R. L., Patton, J., Ganschow, L., & Humbach, N. (2009). Long-term relationships among early first language skills, second language aptitude,
second language affect, and later second language proficiency. Applied Psycholinguistics, 30, 725–755.
Spoelman, M., & Verspoor, M. (2010). Dynamic patterns in development of accuracy and complexity: A longitudinal case study in the acquisition of
Finnish. Applied Linguistics, 31, 532–553.
van Dijk, M., & van Geert, P. (2007). Wobbles, humps and sudden jumps: A case study of continuity, discontinuity and variability in early language
development. Infant and Child Development, 16, 7–33.
van Dijk, M., Verspoor, M., & Lowie, W. (2011). Variability and DST. In M. Verspoor, K. de Bot, & W. Lowie (Eds.), A dynamic approach to second
language development: Methods and techniques (pp. 55–84). Amsterdam: John Benjamins.
van Geert, P. (1991). A dynamic systems model of cognitive and language growth. Psychological Review, 98, 3–53.
van Geert, P. (2009). A comprehensive dynamic systems theory of language development. In K. de Bot & R. W. Schrauf (Eds.), Language
development over the lifespan (pp. 60–104). New York, NY: Routledge.
van Hout, R., & Vermeer, A. (2007). Comparing measures of lexical richness. In H. Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and
assessing vocabulary knowledge (pp. 93–115). Cambridge, UK: Cambridge University Press.
Verspoor, M., & Behrens, H. (2011). Dynamic systems theory and a usage-based approach to second language development. In M. Verspoor, K. de
Bot, & W. Lowie (Eds.), A dynamic approach to second language development: Methods and techniques (pp. 25–38). Amsterdam: John
Benjamins.
Verspoor, M., de Bot, K., & Xu, X. (2011). The role of input and scholastic aptitude in second language development. TTWiA (Toegepaste
taalwetenschap in artikelen), 86(2), 47–60.
Verspoor, M., Lowie, W., & van Dijk, M. (2008). Variability in second language development from a dynamic systems perspective. The Modern
Language Journal, 92, 214–231.
Verspoor, M., & Sauter, K. (2000). English sentence analysis: An introductory course. Amsterdam, NL: John Benjamins.
Verspoor, M., & van Dijk, M. (in press). Variability in a dynamic systems approach to second language acquisition. In C. A. Chapelle (Ed.), The
Encyclopedia of Applied Linguistics. Oxford, UK: Wiley-Blackwell.
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity.
Honolulu, HI: University of Hawai‘i, Second Language Teaching and Curriculum Center.

Marjolijn Verspoor is Associate Professor of Applied Linguistics. Her research focuses on second/foreign language acquisition from a dynamic
usage based perspective.

Monika S. Schmid is Full Professor of English Language. Her research focuses on language attrition.

Xiaoyan Xu recently obtained her PhD at the University of Groningen on the attrition and retention of English in Chinese and Dutch university
students.
