
QUANTITATIVE METHODS FOR

SECOND LANGUAGE RESEARCH

Quantitative Methods for Second Language Research introduces approaches to and
techniques for quantitative data analysis in second language research, with a
primary focus on second language learning and assessment research. It takes a
conceptual, problem-solving approach by emphasizing the understanding of sta-
tistical theory and its application to research problems while paying less attention
to the mathematical side of statistical analysis. The text discusses a range of com-
mon statistical analysis techniques, presented and illustrated through applications
of the IBM Statistical Package for Social Sciences (SPSS) program. These include
tools for descriptive analysis (e.g., means and percentages) as well as inferential
analysis (e.g., correlational analysis, t-tests, and analysis of variance [ANOVA]).
The text provides conceptual explanations of quantitative methods through the
use of examples, cases, and published studies in the field. In addition, a companion
website to the book hosts slides, review exercises, and answer keys for each chapter
as well as SPSS files. Practical and lucid, this book is the ideal resource for data
analysis for graduate students and researchers in applied linguistics.

Carsten Roever is Associate Professor in Applied Linguistics in the School of Languages and Linguistics at the University of Melbourne, Australia.

Aek Phakiti is Associate Professor in TESOL in the Sydney School of Education and Social Work at the University of Sydney, Australia.
QUANTITATIVE
METHODS FOR
SECOND LANGUAGE
RESEARCH
A Problem-Solving Approach

Carsten Roever and Aek Phakiti


First published 2018
by Routledge
711 Third Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2018 Taylor & Francis
The right of Carsten Roever and Aek Phakiti to be identified as authors
of this work has been asserted by them in accordance with sections 77 and
78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced
or utilised in any form or by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying and recording,
or in any information storage or retrieval system, without permission in
writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
Every effort has been made to contact copyright-holders. Please advise
the publisher of any errors or omissions, and these will be corrected in
subsequent editions.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book has been requested
ISBN: 978-0-415-81401-0 (hbk)
ISBN: 978-0-415-81402-7 (pbk)
ISBN: 978-0-203-06765-9 (ebk)

Typeset in Bembo
by Apex CoVantage, LLC

Visit the Companion Website: www.routledge.com/cw/roever


CONTENTS

List of Illustrations vii


Foreword xv
Preface xvii
Acknowledgments xxii

1 Quantification 1

2 Introduction to SPSS 14

3 Descriptive Statistics 28

4 Descriptive Statistics in SPSS 44

5 Correlational Analysis 60

6 Basics of Inferential Statistics 81

7 T-Tests 92

8 Mann-Whitney U and Wilcoxon Signed-Rank Tests 106

9 One-Way Analysis of Variance (ANOVA) 117

10 Analysis of Covariance (ANCOVA) 135

11 Repeated-Measures ANOVA 154



12 Two-Way Mixed-Design ANOVA 166

13 Chi-Square Test 182

14 Multiple Regression 200

15 Reliability Analysis 219

Epilogue 246

References 250
Key Research Terms in Quantitative Methods 255
Index 263
ILLUSTRATIONS

Figures
2.1 New SPSS spreadsheet 16
2.2 SPSS Variable View 17
2.3 Type Column 18
2.4 Variable Type dialog 18
2.5 Label Column 18
2.6 Creating student and score variables for the Data View 19
2.7 Adding variables named ‘placement’ and ‘campus’ 19
2.8 The SPSS spreadsheet in Data View mode 19
2.9 Accessing Case Summaries in the SPSS menus 20
2.10 Summarize Cases dialog 21
2.11 SPSS output based on the variables set in the Summarize
Cases dialog 21
2.12 SPSS menu to open and import data 23
2.13 SPSS dialog to open a data file in SPSS 23
2.14 Illustrated example of an Excel data file to be imported into SPSS 24
2.15 SPSS dialog when opening an Excel data source 24
2.16 The personal factor questionnaire on demographic information 25
2.17 SPSS spreadsheet that shows the demographic data of
Phakiti et al. (2013) 25
2.18 The questionnaires and types of scales and descriptors in
Phakiti et al. (2013) 26
2.19 SPSS spreadsheet that shows questionnaire items of
Phakiti et al. (2013) 26
3.1 A pie chart based on gender 34
3.2 A pie chart based on a 10-point score range 34
3.3 A bar chart based on a 10-point score range 35
3.4 An example of questionnaire items using a Likert-type scale 40
3.5 The positively skewed distribution of length of residence 41
3.6 The negatively skewed distribution of speech act scores 42
3.7 The low skewed distribution of implicature scores 42
4.1 Ch4TEP.sav (Data View) 45
4.2 Ch4TEP.sav (Variable View) 45
4.3 Defining gender in the Value Labels dialog 46
4.4 Defining selfrate (self-rating of proficiency) in the Value Labels dialog 47
4.5 Defining missing values 48
4.6 SPSS menu for computing descriptive statistics 49
4.7 Frequencies dialog 50
4.8 Frequencies: Statistics dialog 50
4.9 Frequencies: Charts dialog 51
4.10 A histogram of the self-rating of proficiency variable with a normal
curve 53
4.11 SPSS Descriptives options 54
4.12 SPSS graphical options 55
4.13 SPSS bar option 55
4.14 SPSS pie option 56
4.15 SPSS histogram option 57
4.16 The histogram for the total score variable 58
5.1 A scatterplot displaying the values of two variables
with a perfect positive correlation of 1 64
5.2 A scatterplot displaying the values of two variables with a
correlation coefficient of 0.90 64
5.3 A scatterplot displaying the values of two variables with a
correlation coefficient of 0.33 65
5.4 A scatterplot displaying the values of two variables with a perfect
negative correlation coefficient of –1 66
5.5 A scatterplot displaying the values of two variables with a low
correlation coefficient of 0.06 67
5.6 SPSS output displaying the Pearson product moment correlation
between two subsections of a grammar test 71
5.7 A view of Ch5correlation.sav 72
5.8 SPSS graphs menu with Scatter/Dot option 74
5.9 Simple scatterplot options 74
5.10 A scatterplot displaying the values of the listening and
grammar scores 75
5.11 Adding the fit line in a scatterplot 76
5.12 A scatterplot displaying the values of the listening and
grammar scores with a line of best fit added 77
5.13 SPSS Bivariate Correlations dialog 77
6.1 A normally distributed data set 85
7.1 Accessing the SPSS menu to perform the
independent-samples t-test 98
7.2 SPSS dialog for the independent-samples t-test 99
7.3 Lee Becker’s effect size calculators 101
7.4 Accessing the SPSS menu to perform the
paired-samples t-test 102
7.5 Paired-Samples T Test dialog 103
8.1 SPSS menu to perform the Mann-Whitney U test 109
8.2 SPSS dialog to perform the Mann-Whitney U test 110
8.3 SPSS menu to perform the Wilcoxon Signed-rank test 113
8.4 SPSS dialog to perform the Wilcoxon Signed-rank test 113
9.1 SPSS menu to launch a one-way ANOVA 123
9.2 Univariate dialog for a one-way ANOVA 123
9.3 Options for post hoc tests 124
9.4 Options dialog for ANOVA 125
9.5 SPSS menu to launch the Kruskal-Wallis test 129
9.6 Setup for the Kruskal-Wallis test 130
9.7 Variable entry for the Kruskal-Wallis test 131
9.8 Analysis settings for the Kruskal-Wallis test 131
9.9 Kruskal-Wallis test results 132
9.10 Model Viewer window for the Kruskal-Wallis test 132
9.11 Viewing pairwise comparisons 133
9.12 Pairwise comparisons in the Kruskal-Wallis test 133
10.1 Accessing the SPSS menu to launch the Compute
Variable dialog 137
10.2 Compute Variable dialog 137
10.3 Checking ANCOVA assumption of independence of
covariate and independent variable 141
10.4 Accessing the SPSS menu to select Cases for analysis 143
10.5 Select Cases dialog 144
10.6 Defining case selection conditions 144
10.7 Data View with cases selected out 145
10.8 Accessing the SPSS menu to launch ANCOVA 146
10.9 Univariate dialog for choosing a model to examine an
interaction among factors and covariates 147
10.10 Univariate: Model dialog for defining the interaction term to
check the homogeneity of regression slopes 147
10.11 Changing the analysis setup back to the original setup 149
10.12 Options in the Univariate dialog 149
11.1 A pretest, posttest, and delayed posttest design 154
11.2 Accessing the SPSS menu to launch a repeated-measures ANOVA 159
11.3 Repeated Measures Define Factors dialog 159
11.4 Repeated Measures dialog 160
11.5 Repeated Measures: Options dialog 161
12.1 Diagram of a pretest-posttest control-group design 167
12.2 Changes across time points among the five groups 169
12.3 The Repeated Measures dialog 170
12.4 Repeated Measures: Profile Plots dialog 171
12.5 Repeated Measures: Profile Plots dialog with colres∗section shown 172
12.6 Repeated Measures: Post Hoc Multiple Comparisons for Observed
Means dialog 172
12.7 Repeated Measures: Options dialog 173
12.8 Estimated marginal means of MEASURE_1 180
13.1 Accessing the SPSS menu to launch the two-dimensional
chi-square test 191
13.2 Crosstabs dialog 192
13.3 Crosstabs: Statistics settings 193
13.4 Crosstabs: Cell Display dialog 193
13.5 VassarStats website’s chi-square calculator
(http://vassarstats.net/newcs.html) 196
13.6 Contingency table for two rows and two columns 197
13.7 Contingency table for two rows and two columns with data
entered 197
13.8 Chi-square test results from VassarStats 198
14.1 A scatterplot of the relationship between chocolate consumption
and vocabulary recall success 201
14.2 Accessing the SPSS menu to launch multiple regression 207
14.3 Linear Regression dialog 207
14.4 Linear Regression: Statistics dialog 208
14.5 Linear Regression: Options dialog 208
14.6 Linear Regression dialog for a hierarchical regression (Block 1 of 1) 213
14.7 Linear Regression dialog for a hierarchical regression (Block 2 of 2) 214
14.8 Linear Regression dialog for a hierarchical regression (Block 3 of 3) 214
15.1 Accessing the SPSS menu to launch Cronbach’s alpha analysis 224
15.2 Reliability Analysis dialog for Cronbach’s alpha analysis 224
15.3 Reliability Analysis: Statistics dialog 225
15.4 A selection from Ch15analyticrater.sav (Data View) 228
15.5 Excerpt from Ch15raters.sav (Data View) 231
15.6 Accessing the SPSS menu to launch Reliability Analysis 232
15.7 Reliability Analysis dialog for the split-half analysis 233
15.8 Excerpt from Ch15kappa.sav 235
15.9 Accessing the SPSS menu to launch Crosstabs for kappa analysis 236
15.10 Crosstabs dialog 237
15.11 Crosstabs: Statistics dialog for choosing kappa 237

15.12 Reliability Analysis dialog for raters’ totals as selected variables 240
15.13 Reliability Analysis: Statistics dialog for intraclass correlation analysis 241

Tables
1.1 Examples of learners and their scores 4
1.2 An example of learners’ scores converted into percentages 4
1.3 How learners are rated and ranked 5
1.4 How learners are scored on the basis of performance descriptors 6
1.5 How learners are scored on a different set of performance descriptors 6
1.6 Nominal data and their numerical codes 8
1.7 Essay types chosen by students 8
1.8 The three placement levels taught at three different locations 9
1.9 The students’ test scores, placement levels, and campuses 9
1.10 The students’ placement levels and campuses 10
1.11 The students’ campuses 11
1.12 Downward transformation of scales 11
3.1 IDs, gender, self-rated proficiency, and test score of the first 50
participants 29
3.2 Frequency counts based on gender 31
3.3 Frequency counts based on test takers’ self-assessment of
their English proficiency 31
3.4 Frequency counts based on test takers’ test scores 32
3.5 Frequency counts based on test takers’ test score ranges 32
3.6 Test score ranges based on quartiles 33
3.7 Imaginary test taker sample with an outlier 36
4.1 SPSS output on the descriptive statistics 51
4.2 SPSS frequency table for gender 52
4.3 SPSS frequency table for the selfrate variable
(self-rating of proficiency) 52
4.4 Taxonomy of the questionnaire and Cronbach’s alpha (N = 51) 59
4.5 Example of item-level descriptive statistics (N = 51) 59
5.1 Descriptive statistics of the listening, grammar, vocabulary, and
reading scores (N = 50) 73
5.2 Pearson product moment correlation between the listening
scores and grammar scores 78
5.3 Spearman correlation between the listening scores and
grammar scores 78
6.1 Correlation between verb tenses and prepositions in a
grammar test 84
6.2 Explanations of the relationship between the sample size and the
effect 88
6.3 The null hypothesis versus alternative hypothesis 89
7.1 Mean and standard deviation of error counts for generation 1.5 learners and L1 writers 93
7.2 Mean and standard deviations of ratios of error-free clauses in the
cartoon description task for both modalities 95
7.3 Means and standard deviations of the two groups 99
7.4 Levene’s test 100
7.5 The independent-samples t-test results 100
7.6 Means and standard deviations of the two means 104
7.7 Correlation coefficient between the two means 104
7.8 Paired-samples t-test results 104
8.1 Mann-Whitney U test results 107
8.2 Descriptive statistics (N = 46) 110
8.3 Mean ranks in the Mann-Whitney U test (N = 46) 110
8.4 Mann-Whitney U test statistics (N = 46) 111
8.5 Descriptive statistics (N = 46) 114
8.6 Ranks statistics in the Wilcoxon Signed-rank test (N = 46) 115
8.7 Wilcoxon Signed-rank test statistics (N = 46) 115
9.1 Immediate posttest 118
9.2 Descriptives for proficiency in TEP 125
9.3 Levene’s statistic 126
9.4 Tests of between-subjects effects as the ANOVA result 126
9.5 Scheffé post hoc test for multiple comparisons 127
10.1 ANOVA for the independent variable and covariate
(test between-subjects effects) 141
10.2 Post hoc tests for independence of covariate and independent
variable (multiple comparisons) 142
10.3 Post hoc tests for the independence of covariate and independent
variable 142
10.4 Output of homogeneity of regression slopes check
(tests of between-subjects effects) 148
10.5 Descriptive statistics of the routines scores between the two
residence groups 150
10.6 Levene’s test 150
10.7 ANCOVA analysis 151
10.8 Estimated means after adjustment for the covariate 151
10.9 Group comparisons 152
11.1 Six different tests with 10 vocabulary items 155
11.2 The within-subjects factors 162
11.3 Descriptive statistics 162
11.4 The multivariate test output 162
11.5 Mauchly’s Test of Sphericity 162
11.6 Results from tests of within-subjects effects 163
11.7 Estimates 163
11.8 Pairwise comparisons 164
12.1 Descriptive statistics of the percentage scores for correct use
for the five treatment conditions by three tasks 168
12.2 The within-subjects factors 174
12.3 The between-subjects factors 174
12.4 Descriptive statistics 175
12.5 Mauchly’s Test of Sphericity 175
12.6 Results from tests of within-subjects effects 176
12.7 Levene’s test 176
12.8 The between-subjects effects 176
12.9 Descriptive statistics for ‘residence’ 177
12.10 Pairwise comparisons on collapsed residence 177
12.11 Univariate tests 177
12.12 Post hoc test 178
12.13 Descriptive statistics for sections 179
12.14 Pairwise comparisons on test sections 179
13.1 Frequency of phrasal verb use in five registers 183
13.2 Chi-square observed and expected counts and residuals 184
13.3 Frequency counts of language-related episodes (LREs) by
accuracy of recall 185
13.4 Marginal totals, expected frequencies, and residuals for recall by
type of LREs 186
13.5 Collocation use by proficiency level 188
13.6 Marginal totals, expected frequencies, and residuals for
collocation type and proficiency level 188
13.7 SPSS summary of the two-dimensional chi-square analysis 194
13.8 Cross-tabulation output based on gender and collapsed residence 194
13.9 Outputs of the two-dimensional chi-square test 195
13.10 Symmetric measures for the two-dimensional chi-square test 195
14.1 Three hierarchical regression models 204
14.2 Descriptive statistics 209
14.3 Correlations among the outcome and predictor variables 209
14.4 Variables entered/removed 210
14.5 Model summary 210
14.6 The ANOVA result 211
14.7 Model coefficients output: Unstandardized and standardized Beta
coefficients 211
14.8 Model coefficients output: Correlations and collinearity statistics 212
14.9 Model summary 215
14.10 ANOVA results 216
14.11 Model coefficients output: Unstandardized and standardized Beta
coefficients 216
14.12 Model coefficients output: Correlations and collinearity statistics 217
14.13 Excluded variables 217
15.1 A simple (simulated) data matrix for a course feedback
questionnaire (N = 10) 222
15.2 The reliability for the 12-item implicature section of the TEP 222
15.3 Item-total statistics of the 12-item implicature section of the TEP 223
15.4 The case processing summary for items ‘imp1sc’ to ‘imp12sc’ 226
15.5 The overall reliability statistics 226
15.6 The item statistics 226
15.7 The summary item statistics 227
15.8 The item-total statistics 227
15.9 The scale statistics 227
15.10 The Spearman-Brown coefficient 233
15.11 Cross-tabulation of pass-fail ratings for 25 ESL learners 234
15.12 Cross-tabulation of pass-fail ratings by raters 1 and 2 238
15.13 Case processing summary for raters 1 and 2 238
15.14 Measure of agreement (kappa value) 238
15.15 Simulated data set for two raters (rater 1 and rater 2) 239
15.16 The case processing summary output 242
15.17 The reliability estimate output 242
15.18 The item statistics output 242
15.19 The intraclass correlation coefficient 243
FOREWORD

There is a certain degree of confidence or credibility that often accompanies statistical evidence. “The numbers don’t lie”, we often hear in casual conversation. As consumers of information, whether in the news or in published second
language (L2) research, we tend to associate statistical evidence with objectivity
and, consequently, truth. The road that leads to statistical evidence, however, is
often long, winding, and full of decisions (even detours!) that the researcher has
taken. In the case of L2 research, examples of such choices might include deciding
(a) whether to collect speech samples using a more open-ended versus a controlled
task, (b) whether certain items in a questionnaire—or individuals in a sample—
should be removed from analysis based on aberrant observations, and (c) how to
score learner production that is only partially correct. Each of these choices may
influence a study’s outcomes in one direction or another, and it is critical that we
recognize the centrality of researcher judgment in all that we read and produce. As
Huff (1954) stated in his now-classic introduction to pitfalls that both researchers
and consumers succumb to, How to Lie With Statistics, “Statistics is as much an art
as it is a science” (p. 120).
A second point I offer as you enter into the wonders of quantitative research
is that nearly all of the objects we measure and quantify are actually qualitative
in nature. It may seem odd to point this out in the foreword of a text like this,
but it is true! And although quantitative techniques are valuable in helping us to
organize data and to conduct the many systematic and insight-producing analyses
described throughout this book, they almost necessarily involve abstractions from
our initial interests. Imagine, for example, a study of the effects of two instruc-
tional treatments on learners’ ability to speak accurately and fluently. En route to
addressing that issue we would likely transcribe participants’ speech samples and
then code or score them for a given set of features. Next, we would summarize
those scores across the sample, the results of which would be subject to one or
more statistical tests for subsequent interpretation. In each of these procedures, we
have made abstractions, tiny steps away from learner knowledge.
I realize these comments might make me appear skeptical of quantitative research.
Of course I am! Likewise, we should all approach the task of conducting, report-
ing, and understanding empirical research with a critical eye. And thankfully, that
is precisely what this very timely and well-crafted book will enable you to do,
thereby advancing our collective ability both to conduct and evaluate research.
The text, in my view, manages to balance on the one hand a conceptual grounding
that enlightens without overwhelming and, on the other, the need for a hands-
on tutorial—in other words, precisely the knowledge and skills needed to make
and justify your own decisions throughout the process of producing rigorous and
meaningful studies. I look forward to reading them!
Luke Plonsky
Georgetown University
PREFACE

In the field of L2 research, the quantitative approach is one of the predominant
methodologies (see e.g., Norris, Ross & Schoonen, 2015; Plonsky, 2013, 2014;
Plonsky & Gass, 2011; Purpura, 2011). Quantitative research uses numbers, quan-
tification, and statistics to answer research questions. It involves the measurement
and quantification of language and language-related features of interest, such as
language proficiency, language skills, aptitudes, and motivation. The data collected
are then analyzed using statistical tools, the results of which are used to produce
research findings. In practice, however, the use of statistical tools and the way that
the results of quantitative research are reported leave much to be desired.
In 2013, Plonsky conducted a systematic review of 606 second language acqui-
sition (SLA) studies in regard to study design, analysis, and reporting practices.
Several weaknesses in those practices were found, including a lack of basic statistical information, such as means, standard deviations, and probability values. Plonsky
and Gass (2011), and Plonsky (2013, 2014) call for a reform of the data analysis
and reporting practices used in L2 research. According to Plonsky and Gass (2011),
these shortcomings could be a reflection of inadequate methodological and statis-
tical concept training, as well as insufficient coverage in research methods courses
in graduate programs of how researchers should report statistical findings.
The dearth of adequate training in quantitative research has potentially seri-
ous repercussions for the field. Certain areas in L2 research cannot be adequately
addressed if there is a lack of appropriate training in statistical methods and if suf-
ficient resources are inaccessible to new researchers and experienced researchers
new to quantitative methods. Quantitative methods, particularly inferential statis-
tics, can be technical and difficult to learn because they require an understanding
of not only the logic underpinning the statistical approaches taken, but also the
technical procedures that need to be followed to produce statistical outcomes. In
addition, researchers need to be able to interpret outcomes of statistical analyses and draw conclusions from them to answer research questions.
Since researchers in applied linguistics frequently come from an arts, humani-
ties, education, and/or social sciences background, they often have little familiarity
with mathematical and statistical concepts and procedures, and perceive statistics
as a foreign language, feeling apprehensive at the prospect of grappling with quan-
titative concepts and developing statistical skills. This may lead them to choose a
qualitative research approach, despite a quantitative one being more suitable to
answer a particular research question.
Not only can a lack of familiarity with quantitative procedures close off major
avenues of research to students, but it can also prevent new researchers from under-
standing and critically evaluating existing studies that use quantitative methods: if
readers do not understand the use of statistics in a paper, they are forced to take
the author’s interpretation of statistical outcomes on faith, rather than being able
to critically evaluate it. In the current market, there are a number of books that
deal with quantitative methods (e.g., Bachman, 2004; Bachman & Kunnan, 2005;
Larson-Hall, 2010, 2016), but these can be highly technical, mathematical, and
lengthy in their statistical treatments, as such books are often written for a particular
audience (e.g., advanced doctoral students, or experienced researchers). By contrast,
the current book assumes no prior experience in quantitative research and is writ-
ten for students and researchers new to quantitative methods.

The Aims and Scope of This Book


This book aims to introduce approaches to and techniques for quantitative data
analysis in L2 research, with a primary focus on L2 learning and assessment
research. It takes a conceptual, problem-solving approach, emphasizing the understanding of statistical theory and its application to research problems while paying less attention to the mathematical side of statistical analysis.
This book is, therefore, intended as a practical academic resource and a starting
point for new researchers in their quest to learn about data analysis. It provides con-
ceptual explanations of quantitative methods through the use of examples, cases, and
published studies in the field. Statistical analysis is presented and illustrated through
applications of the IBM Statistical Package for Social Sciences (SPSS) program.
Formulae that can easily be computed manually will be presented in this book. More involved formulae associated with the complex statistical procedures introduced will not be presented, for several reasons. First, this book is
intended to nurture a conceptual understanding of statistical tests at an intro-
ductory level. Second, applied linguistics researchers rarely calculate inferential
statistics such as those presented in this book manually because there are numerous
statistical programs and online tools that are able to perform the required compu-
tations. Finally, there are many books on statistics that present statistical formulae
that the reader can consult if desired.
This book presents and discusses a range of common statistical analysis techniques that can be employed in L2 research. These include tools for
descriptive analysis, such as means and percentages, as well as inferential analy-
sis, such as correlational analysis, t-tests, and analysis of variance (ANOVA). An
understanding of statistics for L2 research at this level will lay the foundation on
which readers can further their learning of more complex statistics not covered
in this book (e.g., factor analysis, multivariate analysis of variance, Rasch analysis,
generalizability theory, multilevel modeling, and structural equation modeling).

Overview of the Book


The book begins with the basics of the quantification process, then moves on to
more sophisticated statistical tools. The book comprises a preface, 15 chapters, an
epilogue, references, key research terms in quantitative methods, and an index.
However, readers may choose to skip some chapters and focus on those chapters
relevant to their particular interest or research need. The chapters in this book
include specific examples and cases in quantitative research in language acquisition
and assessment, as well as analysis of unpublished data collected by the authors.
Most chapters illustrate how to use SPSS to perform the statistical analysis related
to the focus of the chapter.
Chapter 1 (Quantification) introduces the concept of quantification and dis-
cusses its benefits and limitations, and how data that are not initially quantitative
may become quantitative through coding and frequency counts. It also introduces
different scales of measurement (interval/ratio, ordinal, and nominal scales).
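The coding idea can be sketched briefly outside SPSS. The following minimal Python example (not from the book, which performs its analyses in SPSS; the campus labels and codes are invented for illustration) shows why numeric codes for nominal data are labels rather than quantities, while frequency counts remain meaningful:

```python
from collections import Counter

# Hypothetical nominal variable: campus attended, coded numerically.
campus_codes = {"City": 1, "Suburb": 2, "Online": 3}
responses = ["City", "Online", "City", "Suburb"]

coded = [campus_codes[r] for r in responses]
print(coded)               # [1, 3, 1, 2] -- codes only identify categories
print(Counter(responses))  # frequency counts are meaningful for nominal data
```

A mean of `coded` would be computable but meaningless, which is why the scale of measurement must be identified before a statistic is chosen.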
Chapter 2 (Introduction to SPSS) presents the interface of the SPSS program, the appearance of an SPSS data sheet, and how to prepare a data file for quantitative data entry.
Chapter 3 (Descriptive Statistics) describes ways of representing data sets,
including graphical displays, frequency counts, and descriptive statistics. It also
foreshadows some of the statistical conditions that must be met to use some of the
statistical tests described later in the book.
Chapter 4 (Descriptive Statistics in SPSS) shows how to compute descriptive
statistics in SPSS, and how to create simple graphs or displays of data.
Chapter 5 (Correlational Analysis) introduces the first two types of inferential
statistics, Pearson and Spearman correlations. The rationale behind correlations
and how to interpret a correlation coefficient are discussed.
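For readers curious about what lies behind the coefficient, Pearson’s r can be computed by hand. The sketch below (invented listening and grammar scores, not data from the book) uses only the Python standard library:

```python
import math

# Invented scores for five learners on two test sections.
listening = [12, 15, 9, 18, 14]
grammar = [40, 46, 33, 52, 45]

n = len(listening)
mean_l, mean_g = sum(listening) / n, sum(grammar) / n

# Pearson r: co-variation of the two score sets divided by the
# product of their individual spreads.
cov = sum((l - mean_l) * (g - mean_g) for l, g in zip(listening, grammar))
r = cov / math.sqrt(sum((l - mean_l) ** 2 for l in listening)
                    * sum((g - mean_g) ** 2 for g in grammar))
print(round(r, 3))  # prints 0.996: high listening scores go with high grammar scores
```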
Chapter 6 (Basics of Inferential Statistics) discusses the distinction between a
population and a sample, the logic of hypothesis testing, the normal distribution,
and the concept of probability. The concept of significance is also discussed. The
relationships among significance level, effect size, and sample size are highlighted.
Chapter 7 (T-Tests) presents inferential statistics for detecting differences
between groups (the independent-samples t-test), and between repeated measure-
ment instances from the same group of participants (the paired-samples t-test).
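As a taste of Chapter 7, the independent-samples t statistic for two equal-sized groups can be computed in a few lines. This Python sketch uses invented scores and illustrates the pooled-variance formula, not the book’s SPSS procedure:

```python
import math
import statistics

# Invented test scores for two independent, equal-sized groups of learners.
group_a = [14, 16, 15, 18, 17, 15]
group_b = [11, 13, 12, 14, 12, 13]

n = len(group_a)  # both groups have n = 6
var_a = statistics.variance(group_a)  # sample variances
var_b = statistics.variance(group_b)

# Pooled-variance t for equal group sizes:
# mean difference divided by its standard error.
standard_error = math.sqrt((var_a + var_b) / n)
t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
print(round(t, 2))  # prints 4.52; a large |t| suggests a difference beyond chance
```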
Chapter 8 (Mann-Whitney U and Wilcoxon Signed-Rank Tests) presents the two
nonparametric versions of the t-tests presented in Chapter 7. These two tests are
useful for the analysis of nonnormally distributed and ordinal data.
Chapter 9 (One-Way Analysis of Variance [ANOVA]) extends between-group
comparisons as performed in the independent-samples t-test to three or more groups.
It discusses the principles of the one-way ANOVA and effect size considerations.
Chapter 10 (One-Way Analysis of Covariance [ANCOVA]) presents an extended
version of the one-way ANOVA that is used when there are preexisting differ-
ences between groups, which can distort outcomes.
Chapter 11 (Repeated-Measures ANOVA) is an extension of the paired-samples t-test to more than two measurement points. The repeated-measures ANOVA can
analyze whether there are differences among several measures of the same group.
This chapter covers the procedures that must be followed when using the repeated-
measures ANOVA, and discusses the types of research questions for which this
procedure is useful.
Chapter 12 (Two-Way Mixed-Design ANOVA) presents an inferential statistic
that combines a repeated-measures ANOVA (Chapter 11) with a between-groups
ANOVA (Chapter 9). Such a combination has the advantage of not only evaluat-
ing whether group differences affect performance outcomes, but also of being able
to simultaneously analyze the influences of time or task factors on performance
outcomes.
Chapter 13 (Chi-Square Test) demonstrates the use of the chi-square test in
L2 research and compares it with the use of Pearson and Spearman correlations.
Chapter 14 (Multiple Regression) presents simple regression and multiple regres-
sion analyses, which are used for assessing the relative impact of language learning
and test performance variables. Multiple regression allows researchers to examine
the relative contributions of predictor variables to an outcome variable.
Chapter 15 (Reliability Analysis) demonstrates an extension and application of
correlational analysis to examine the reliability of research instruments.
The Epilogue at the end of the book suggests resources for further reading in
quantitative methods.

Quantitative Research Abilities


By the end of this book, readers will have developed the following abilities:

• to understand and use suitable quantitative research analyses and approaches in a specific research area and context;
• to critically read and evaluate quantitative research reports (e.g., journal arti-
cles, theses, or dissertations), including the claims made by researchers;
• to apply statistical concepts to their own research contexts. This ability goes
beyond understanding the specific research examples and statistical proce-
Preface xxi

dures presented in this book; it means that researchers will be enabled to


conduct analysis on their own data to answer research questions; and,
• to independently extend their statistical knowledge beyond what has been
covered in this book. Numerous advanced statistical analyses, such as Rasch
analysis or structural equation modeling, are not included in this book. They
are, however, important methods for L2 research.

Companion Website
A Companion Website hosted by the publisher provides up-to-date online
materials such as exercises and activities: www.routledge.com/cw/roever

Comments/Suggestions
The authors would be grateful to hear comments and suggestions regarding this
book. Please contact Carsten Roever at carsten@unimelb.edu.au or Aek Phakiti
at aek.phakiti@sydney.edu.au.
ACKNOWLEDGMENTS

In preparing and writing this book, we have benefitted greatly from the support of
many friends, colleagues, and students. First and foremost, we wish to acknowledge
Tim McNamara, whose brilliant pedagogical design of the course Quantitative
Methods in Language Studies at the University of Melbourne inspired us to write an
introductory statistical methods book that focuses on conceptual understanding
rather than mathematical intricacies. In addition, several colleagues, mentors, and
friends have helped us shape the book structure and content through invaluable
feedback and engaging discussion: Mike Baynham, Janette Bobis, Andrew Cohen,
Talia Isaacs, Antony Kunnan, Susy Macqueen, Lourdes Ortega, Brian Paltridge,
Luke Plonsky, Jim Purpura, and Jack Richards. We would like to thank Guy
Middleton for his exceptional work on editing the book chapter drafts. We also
greatly appreciate the feedback from Master of Arts (Applied Linguistics) students
at the University of Melbourne and Master of Education (TESOL) students at
the University of Sydney on an early draft. We would like to thank the staff at
Routledge for their assistance during this book project: Kathrene Binag, Rebecca
Novack, and the copy editors.
The support of our institutions and departments has allowed us time to con-
centrate on completing this book. The School of Languages and Linguistics at the
University of Melbourne supported Carsten with a sabbatical semester, which he
spent in the stimulating environment of the Teachers College, Columbia Uni-
versity. The Sydney School of Education and Social Work (formerly the Faculty
of Education and Social Work) supported Aek with a sabbatical semester at the
University of Bristol to complete this book project. Finally, Kevin Yang and Damir
Jambrek deserve our gratitude for their unflagging support while we worked on
this project over several years.
1
QUANTIFICATION

Introduction
Quantification is the use of numbers to represent facts about the world. It is used to
inform the decision-making process in countless situations. For example, a doctor
might prescribe some form of treatment if a patient’s blood pressure is too high.
Similarly, a university may accept the application of a student who has attained the
minimum required grades. In both these cases, numbers are used to inform deci-
sions. In L2 research, quantification is also used. For example,

• researchers in SLA might investigate the effect of feedback on students’ writing by comparing the writing scores of a group of students that received feedback with the scores of a group that did not. They may then draw conclusions regarding the effect of that feedback;
• researchers in cross-cultural pragmatics might code requests made by people
from different cultures as direct or indirect and then use the codings to com-
pare those cultures; and
• researchers may be interested in the effect of a study-abroad program on stu-
dents’ language proficiency level. In this case, they may administer a language
proficiency test prior to the program, and another following the program.
Analysis of the test scores can then be carried out to determine whether it is
worthwhile for students to attend such programs.

This chapter introduces fundamental concepts related to quantitative research, such as the nature of variables, measurement scales, and research topics in L2 research that can be addressed through quantitative methods.

Quantitative Research
Quantitative researchers aim to draw conclusions from their research that can be
generalized beyond the sample participants used in their research. To do this, they
must generate theories that describe and explain their research results. When a
theory is in the process of being tested, several aspects of the theory are referred to
as hypotheses. This testing process involves analyzing data collected from, for exam-
ple, research participants or databases. In language assessment research, researchers
may be interested in the interrelationships among test performances across various
language skills (e.g., reading, listening, speaking, and writing). Researchers may
hypothesize that there are positive relationships among these skills because there
are common linguistic aspects underlying each skill (e.g., vocabulary and syntac-
tic knowledge). To test this hypothesis, researchers may ask participants to take a
test for each of the skills. They may then perform statistical analysis to investigate
whether their hypothesis is supported by the collected data.

Variables, Constructs, and Data


In quantitative research, the term variable is used to describe a feature that can
vary in degree, value, or quantity. Values of a variable may be obtained directly
from research participants with a high degree of certainty (e.g., their ages or first
language), or may have to be inferred from data collected using observation or
measurements of behavior. In quantitative research, the term construct is used to
refer to a feature of interest that is not apparent to the naked eye. Often constructs
are internal to individuals, for example, L2 constructs include language profi-
ciency, motivation, anxiety, and beliefs. Researchers may use a research instrument
(e.g., a language proficiency test or questionnaire) to collect data regarding these
constructs. For example, if researchers are interested in the vocabulary knowledge
of a group of students, then vocabulary knowledge is the construct of interest.
Researchers can ask students to demonstrate their knowledge by taking a vocab-
ulary test. Here, students’ performance on the test is treated as a variable that
represents their vocabulary knowledge. The test scores are the data, which will
enable researchers to infer the students’ vocabulary knowledge. The term data is
used to refer to the values that a variable may take on. The term data is, therefore,
used as a plural noun (e.g., ‘data are’ and ‘data were analyzed’).

Issues in Quantification
For the results of a piece of quantitative research to be believable, a minimum number
of research participants is required, which will depend on the research question under
analysis, and, in particular, the expected effect size (to be discussed in Chapter 6).

In most cases, researchers need to use some type of instrument (e.g., a lan-
guage test, a rating scale, or a Likert-type scale questionnaire) to help them
quantify a construct that cannot be directly seen or observed (e.g., writing abil-
ity, reading skills, motivation, and anxiety). When researchers try to quantify
how well a student can write, it is not a matter of simply counting. Rather, it
involves the conversion of observations into numbers, for example, by applying a
scoring rubric that contains criteria which allow researchers to assign an overall
score to a piece of writing. That score then becomes the data used for further
analyses.

Measurement Scales
Different types of data contain different levels of information. These differences
are reflected in the concept of measurement scales. What is measured and how it is
measured determines the kind of data that results. Raw data may be interpreted
differently on different measurement scales. For example, suppose Heather and
Tom took the same language test. The results of the test may be interpreted in
different ways according to the measurement scale adopted. It may be said that
Heather got three more items correct than Tom, or that Heather performed better
than Tom. Alternatively, it may simply be said that their performances were not
identical. The amount of information in these statements about the relative abili-
ties of Heather and Tom is quite different and affects what kinds of conclusion can
be drawn about their abilities. The three statements about Heather and Tom relate
directly to the three types of quantitative data that are introduced in this chapter:
interval, ordinal, and nominal/categorical data.

Interval and Ratio Data


Interval data allow the difference between data values to be calculated. Test scores
are a typical kind of interval data. For example, if Heather scored 19 points on
a test, and Tom scored 16 points, it is clear that Heather got three points more
than Tom. A ratio scale is an interval scale with the additional property that it
has a well-defined true zero, which an interval scale does not. Examples of ratio
data include age, period of time, height, and weight. In practice, interval data and
ratio data are treated exactly the same way, so the difference between them has no
statistical consequences, and researchers generally just refer to “interval data” or
sometimes “interval/ratio data”.
It is the precision and information richness of interval data that makes it the
preferred type of data for statistical analyses. For example, consider the test that
Heather and Tom (and some other students) took. Suppose that the test was com-
posed of 20 questions. The full results of the test appear in Table 1.1.

TABLE 1.1 Examples of learners and their scores

Learner Score (out of 20)

Heather 19
Tom 16
Phil 16
Jack 11
Mary 8

TABLE 1.2 An example of learners’ scores converted into percentages

Learner Score (out of 20) Percentage correct

Heather 19 95%
Tom 16 80%
Phil 16 80%
Jack 11 55%
Mary 8 40%

According to Table 1.1, it can be said that:

• Heather got more questions right than Tom, and also that she got three more
right than Tom did;
• Tom got twice as many questions right as the lowest scorer, Mary; and,
• the difference between Heather and Jack’s scores was the same as the differ-
ence between Tom and Mary’s scores, namely eight points in each case.

Interval data contain a large amount of detailed information and they tell us exactly
how large the interval is between individual learners’ scores. They therefore lend them-
selves to conversion to percentages. Table 1.2 shows the learners’ scores in percentages.
Percentages allow researchers to compare results from tests with different maxi-
mum scores (via a transformation to a common scale). For example, if the next
test consists of only 15 items, and Tom gets 11 of them right, his percentage score
will have declined (as 11 out of 15 is 73%), even though in both cases he got
four questions wrong. In addition to allowing conversion to percentages, interval
data can also be used for a wide range of statistical computations (e.g., calculating
means) and analyses.
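The percentage conversion and the computation of a mean described above can be illustrated outside SPSS. The following sketch is not from the book; Python is used here purely as an illustration, with the learner names and scores from Table 1.1:

```python
# Raw scores (interval data) from Table 1.1, out of 20 items.
scores = {"Heather": 19, "Tom": 16, "Phil": 16, "Jack": 11, "Mary": 8}
max_score = 20

# Converting raw scores to percentages puts them on a common scale,
# allowing comparison across tests with different maximum scores.
percentages = {name: score * 100 / max_score for name, score in scores.items()}
print(percentages)  # {'Heather': 95.0, 'Tom': 80.0, 'Phil': 80.0, 'Jack': 55.0, 'Mary': 40.0}

# Because interval data support arithmetic, a mean can be computed directly.
mean_score = sum(scores.values()) / len(scores)
print(mean_score)  # 14.0
```

The same two steps (a transformation to percentages and a descriptive statistic) are what SPSS performs behind its menus.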
Typical real-world examples of interval data include age, annual income, weekly
expenditure, and the time it takes to run a marathon. In L2 research, interval data
include age, number of years learning the target language, and raw scores on lan-
guage tests. Scaled test scores on a language proficiency test, such as the Test of
English as a Foreign Language (TOEFL), International English Language Testing
System (IELTS), and Test of English for International Communication (TOEIC)
are also normally considered interval data.

Ordinal Data
For statistical purposes, ratio and interval data are normally considered desirable
because they are rich in information. Nonetheless, not all data can be classified as
interval data, and some data contain less precise information. Ordinal data contain
information about relative ranking but not about the precise size of a difference.
If the data in Tables 1.1 and 1.2 regarding students’ test scores were expressed as
ordinal data (i.e., they were on an ordinal scale of measurement), they would tell
the researchers that Heather performed better than Tom, but they would not indi-
cate by how much Heather outperformed Tom. Ordinal data are obtained when
participants are rated or ranked according to their test performances or levels of
some trait. For example, when language testers score learners’ written production
holistically using a scoring rubric that describes characteristics of performance,
they are assigning ratings to texts such as ‘excellent’, ‘good’, ‘adequate’, ‘support
needed’, or ‘major support needed’. Table 1.3 is an example of how the learners
discussed earlier are rated and ranked.
According to Table 1.3, it can be said that

• Heather scored better than all of the other students;
• Phil and Tom scored the same, and each scored more highly than Jack and
Mary; and
• Mary scored the lowest of all the students.

While ordinal data contain useful information about the relative standings of
test takers, they do not show precisely how large the differences between test tak-
ers are. Phil and Tom performed better than Mary did, but it is unknown how
much better than her they performed. Consequently, with the data in Table 1.3,
it is impossible to see that Phil and Tom scored twice as high as Mary. Although
it could be said that Phil and Tom are two score levels above Mary, that is rather
vague.
Ordinal data can be used to put learners in order of ability, but they do little
beyond establishing that order. In other words, they do not give researchers as
much information about the extent of the differences between individual learn-
ers as interval data do. Ratings of students’ writing or speaking performance are

TABLE 1.3 How learners are rated and ranked

Learner Rating Rank

Heather Excellent 1
Tom Good 2
Phil Good 2
Jack Adequate 3
Mary Support Needed 4

often expressed numerically; however, that does not mean that they are interval
data. For example, numerical values can be assigned to descriptors as follows:
Excellent (5), Good (4), Adequate (3), Support Needed (2), and Major Support
Needed (1). Table 1.4 presents how the learners are rated on the basis of perfor-
mance descriptors.
The numerical scores in Table 1.4 may look like interval data, but they are not.
They are only numbers that represent the descriptor, so it would not make sense
to say that Tom scored twice as high as Mary did. It makes sense to say only that
his score is two levels higher than Mary’s. This becomes even clearer if the rating
scales are changed as follows: Excellent (8), Good (6), Adequate (4), Support Needed
(2), and Major Support Needed (0). That would give the information in Table 1.5.
As can been seen in Tables 1.4 and 1.5, the descriptors do not change, but
the numerical scores do. Tom and Phil’s scores are still two levels higher than
Mary’s, but now their numerical scores are three times as high as Mary’s score.
This illustration makes it clear that numerical representations of descriptors are
only symbols that say nothing about the size of the intervals between adjacent
levels. They indicate that Heather is a better writer than Tom, but since they are
not based on counts, they cannot indicate precisely how much of a better writer
Heather is than Tom.
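This point can be made concrete with a small sketch (illustrative only, not part of the book’s SPSS-based treatment): relabeling the descriptors changes the apparent ratios between learners’ numerical codes, but leaves their rank order untouched.

```python
# Two arbitrary numeric codings of the same ordinal descriptors,
# corresponding to Tables 1.4 and 1.5.
coding_a = {"Excellent": 5, "Good": 4, "Adequate": 3, "Support Needed": 2}
coding_b = {"Excellent": 8, "Good": 6, "Adequate": 4, "Support Needed": 2}

ratings = {"Heather": "Excellent", "Tom": "Good", "Phil": "Good",
           "Jack": "Adequate", "Mary": "Support Needed"}

# The ratio between Tom's and Mary's codes depends entirely on the coding...
ratio_a = coding_a[ratings["Tom"]] / coding_a[ratings["Mary"]]  # 4 / 2 = 2.0
ratio_b = coding_b[ratings["Tom"]] / coding_b[ratings["Mary"]]  # 6 / 2 = 3.0

# ...but the rank order of the learners is identical under both codings,
# which is all that ordinal data can legitimately tell us.
order_a = sorted(ratings, key=lambda name: coding_a[ratings[name]], reverse=True)
order_b = sorted(ratings, key=lambda name: coding_b[ratings[name]], reverse=True)
print(order_a == order_b)  # True
```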
In L2 research, rating scale data are an example of ordinal data. These are
commonly collected in relation to productive tasks (e.g., writing and speaking).
Whenever there are band levels, such as A1, A2, and B1, as in the Common European
Framework of Reference for Languages (see Council of Europe, 2001), or bands

TABLE 1.4 How learners are scored on the basis of performance descriptors

Learner Descriptor Numerical score

Heather Excellent 5
Tom Good 4
Phil Good 4
Jack Adequate 3
Mary Support Needed 2

TABLE 1.5 How learners are scored on a different set of performance descriptors

Learner Descriptor Numerical score

Heather Excellent 8
Tom Good 6
Phil Good 6
Jack Adequate 4
Mary Support Needed 2
1–9, as in the IELTS, researchers are dealing with ordinal data, rather than interval
data. Data collected by putting learners into ordered categories, such as ‘beginner’,
‘intermediate’, or ‘advanced’ are another case of ordinal data. Finally, ordinal data
occur when researchers rank learners relative to each other. For example, researchers
may say that in reference to a particular feature, Heather is the best, Tom and Phil
share second place, Jack is behind them, and Mary is the weakest. This ranking indi-
cates only that the first learner is better (e.g., stronger, faster, more capable) than the
second learner, but not by how much. Ordinal data can only provide information
about the relative strengths of the test takers in regard to the feature in question. The
final data type often used in L2 research (i.e., nominal or categorical data) does not
contain information about the strengths of learners, but rather about their attributes.

Nominal or Categorical Data


Nominal data (i.e., named data, also called categorical data) are concerned only
with sameness or difference, rather than size or strength. Gender, native language,
country of origin, experimental treatment group, and test version taken are typical
examples of nominal data (i.e., data on a nominal scale of measurement). In the
example of Heather, Tom, Phil, Jack, and Mary, the nominal variable of gender has
two levels (male and female), and there are three males and two females. In research,
nominal variables are often used as independent variables; in other words, variables
that are expected to affect an outcome. Independent variables, such as teaching
methods and types of corrective feedback on performance, can be hypothesized to
affect learning outcomes or behaviors, which are then treated as dependent variables,
as they depend on the independent variables. It should be noted that dependent
and independent variables are related to research design. The nominal variable
‘study-abroad experience’, with the levels ‘has studied abroad’ (Yes = coded 1) or
‘has not studied abroad’ (No = coded 0), can be used to split a sample of learn-
ers into two groups in order to compare the scores of learners with study-abroad
experience with the scores of learners without study-abroad experience.
Nominal data are often coded numerically to facilitate the use of spreadsheets.
Table 1.6 presents an example of how nominal data can be coded numerically.
As can be seen in Table 1.6, it does not matter which numbers are assigned to the
nominal data because the idea that one number is better than another is meaningless
in this case. Also, the numerical codes do not have a mathematical value in the way
that ratio, interval and ordinal data do. For example, it cannot be said that females
are better than males merely because the code assigned to females is 2 and the code
for males is 1. However, frequency counts of nominal variables can be made, which
do have mathematical values. For instance, for the variable ‘gender’, there are three
males and two females (i.e., 40% of the participants are female and 60% are male in
the data set).
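As a brief illustration (a sketch, not the book’s SPSS procedure; the genders follow the chapter’s example, with Heather and Mary as the two females), frequency counts and percentages for a nominal variable can be computed as follows:

```python
from collections import Counter

# Nominal data: gender coded numerically (1 = male, 2 = female).
# The codes are labels only; their numeric values carry no magnitude.
gender = {"Heather": 2, "Tom": 1, "Phil": 1, "Jack": 1, "Mary": 2}

# Frequency counts are the meaningful statistic for nominal data.
counts = Counter(gender.values())
print(counts[1], counts[2])  # 3 males, 2 females

# Percentages follow directly from the counts.
pct_female = counts[2] * 100 / len(gender)
print(pct_female)  # 40.0
```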
Nominal data are sometimes called categorical data because objects of inter-
est can be sorted into categories (e.g., men versus women; Form A versus Form

TABLE 1.6 Nominal data and their numerical codes

Nominal variables Numerical codes

Gender Male (coded 1), female (coded 2)
Native or nonnative speaker Native (coded 1), nonnative (coded 2)
Pass or fail Pass (coded 1), fail (coded 0)
Test form Form A (coded 1), Form B (coded 2), Form C (coded 3)
Nationality American (coded 1), Canadian (coded 2),
British (coded 3), Singaporean (coded 4), Australian
(coded 5), and New Zealander (coded 6)
First language English (coded 1), Mandarin (coded 2),
Spanish (coded 3), French (coded 4), Japanese (coded 5)
Experimental groups Treatment A group (coded 1), Treatment B group
(coded 2), Control group (coded 3)
Proficiency level groups Beginner (coded 1), Intermediate (coded 2),
High Intermediate (coded 3), Advanced (coded 4)

TABLE 1.7 Essay types chosen by students

Learner Type Coded

Tom Personal experience 1
Mary Argumentative essay 2
Heather Personal experience 1
Jack Process description 3
Phil Process description 3

B versus Form C). When a variable can only have two possible values (pass/fail, international student/domestic student, correct/incorrect), this type of data is sometimes called dichotomous data. As a further example of nominal data, students may be asked to complete a free writing task in which they are limited to three types of essays: personal experience (coded 1), argumentative essay (coded 2), and description of a process (coded 3). Table 1.7 shows which student chose which type.
The data in the Type column do not provide any information about one learner
being more capable than another. It only shows which learners chose which essay
type, from which frequency counts can be made. That is, the process description
and personal experience types were chosen two times each, and the argumenta-
tive essay was chosen once. How nominal data are used in statistical analysis for
research purposes will be addressed in the next few chapters.

Transforming Data in a Real-Life Context


In a real-life situation, raw data need to be transformed for a variety of reasons.
Take the common situation in which new students entering a language program

TABLE 1.8 The three placement levels taught at three different locations

Test score Placement level Location

0–20 Beginner City Campus
21–40 Intermediate Eastern Campus
41–60 Advanced Ocean Campus

TABLE 1.9 The students’ test scores, placement levels, and campuses

Student Test score Placement level Campus

Heather 51 Advanced Ocean
Tom 38 Intermediate Eastern
Phil 21 Intermediate Eastern
Jack 17 Beginner City
Mary 11 Beginner City

take a placement test consisting of, say, 60 multiple-choice questions assessing their
listening, reading, and grammar skills. Based on the test scores, the students are
placed in one of three levels: beginner, intermediate, or advanced. In addition, the
three levels are taught at three different locations, as presented in Table 1.8.
Table 1.9 presents the scores and placements of the five students introduced earlier.
The test scores are measured on an interval measurement scale that is based on
the count of correct answers in the placement test and provides detailed informa-
tion. It can be said that:

• Heather’s score is in the advanced range since her score is 11 points above the
cut-off, and her score is much higher than Tom’s, whose score was 13 points
lower than hers;
• Tom’s score is in the intermediate range, but it is close to the cut-off for the
advanced range, missing it by just three points;
• Tom’s score is far higher than Phil’s, with a difference of 17 points, yet both
scores are in the intermediate range;
• Phil’s score is just one point above the cut-off for the intermediate level, and
is only four points higher than Jack’s score. Despite the small difference in
their scores, Jack was placed in the beginner level and Phil was placed in the
intermediate level; and,
• Mary’s score is in the middle of the beginner level.

Because the information is detailed, the placement test can be evaluated criti-
cally. For example, Phil and Tom’s scores are 17 points apart whereas Phil and
Jack’s are only four points apart. Phil’s proficiency level is arguably closer to Jack’s
than to Tom’s. Yet, Phil and Tom are both classified as intermediate, but Jack is
classified in the beginner level. This is known as the contiguity problem, and it is

TABLE 1.10 The students’ placement levels and campuses

Student Placement level Campus

Heather Advanced Ocean
Tom Intermediate Eastern
Phil Intermediate Eastern
Jack Beginner City
Mary Beginner City

common whenever cut-off points are set arbitrarily: students close to each other
but on different sides of the cut-off point can be more similar to each other than
to people further away from each other but on the same side of the cut-off point.
Now imagine that there are no interval-level test-score data, but instead just the
ordinal-level placement levels data and the campus data, as in Table 1.10.
As can be seen in Table 1.10, the differences between Tom and Phil and the
problematic nature of the classification that were so apparent before are no longer
visible. The information about the size of the differences between learners has
been lost and all that can be deduced now is that some students are more profi-
cient than others. Tom and Phil have the same level of proficiency and Jack is
clearly different from both of them. This demonstrates why ordinal data are not as
precise as interval data. Information is lost, and the differences between the learn-
ers seen earlier are no longer as clear.
Highly informative interval data are often transformed into less informative
ordinal data to reduce the number of categories the data must be split into. No
language program can run with classes at 60 different proficiency levels; moreover,
some small differences are not meaningful, so it does not make sense to group
learners into such a large number of levels. However, setting the cut-off points is
often a problematic issue in practice.
While the ordinal proficiency level data are less informative than the interval
test-score data, they can be scaled down even further, namely to the nominal cam-
pus data (see Table 1.11).
If this is all that can be seen, it is impossible to know how campus assignment
is related to proficiency level. However, it can be said that:

• Tom and Phil are on the same campus;
• Mary and Jack are on the same campus; and
• Heather is the only one at the Ocean campus.

This information does not indicate who is more proficient since nominal data
do not contain information about the size or direction of differences. They indi-
cate only whether differences exist or not.
Transformation of types of data can happen downwards only, rather than
upwards, in the sense that interval data can be transformed into ordinal data and

TABLE 1.11 The students’ campuses

Student Campus

Tom Eastern
Mary City
Heather Ocean
Jack City
Phil Eastern

TABLE 1.12 Downward transformation of scales

Student Test score ⇒ Placement level ⇒ Campus

Heather 51 ⇒ Advanced ⇒ Ocean
Jack 17 ⇒ Beginner ⇒ City
Mary 11 ⇒ Beginner ⇒ City
Phil 21 ⇒ Intermediate ⇒ Eastern
Tom 38 ⇒ Intermediate ⇒ Eastern

ordinal data can be transformed into nominal data (e.g., by using test scores to
place learners in classes based on proficiency levels and then by assigning classes to
campus locations). Table 1.12 illustrates the downward transformation of scales.
Transformation does not work the other way around. That is, if it is known
which campus a learner studies at, it is impossible to predict that learner’s profi-
ciency level. Similarly, if a learner’s proficiency level is known, it is impossible to
predict that learner’s exact test score.
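The two downward steps in Table 1.12 can be sketched as a pair of mapping functions (an illustration in Python, not part of the book, using the cut-offs and campuses from Table 1.8); no inverse functions exist, because each step discards information:

```python
def placement_level(score):
    """Interval -> ordinal: map a test score (0-60) to a placement level."""
    if score <= 20:
        return "Beginner"
    elif score <= 40:
        return "Intermediate"
    return "Advanced"

def campus(level):
    """Ordinal -> nominal: map a placement level to a campus location."""
    return {"Beginner": "City", "Intermediate": "Eastern", "Advanced": "Ocean"}[level]

scores = {"Heather": 51, "Tom": 38, "Phil": 21, "Jack": 17, "Mary": 11}
for name, score in scores.items():
    print(name, score, placement_level(score), campus(placement_level(score)))

# The contiguity problem is visible here: Phil (21) and Jack (17) are only four
# points apart yet are placed at different levels and campuses, while Phil and
# Tom (38) share a level despite a 17-point gap.
```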

Topics in L2 Research
It is useful to introduce some of the key topics in L2 research that can be examined
using a quantitative research methodology. Here, areas of research interests in SLA,
and language testing and assessment (LTA) research are presented.

SLA Research
There is a wide range of topics in SLA research that can be investigated using
quantitative methods, although the nature of SLA itself is qualitative. SLA research
aims to examine the nature of language learning and interlanguage processes (e.g.,
sequences of language acquisition; the order of morpheme acquisition; charac-
teristics of language errors and their sources; language use avoidance; cognitive
processes; and language accuracy, fluency, and complexity). SLA research also
aims to understand the factors that affect language learning and success. Such
factors may be internal or individual factors (e.g., age, first language or cross-
linguistic influences, language aptitude, motivation, anxiety, and self-regulation), or
external or social factors (e.g., language exposure and interactions, language and
socialization, language community attitude, feedback, and scaffolding). There are
several texts that provide further details of the scope of SLA research (e.g., Ellis,
2015; Gass with Behney & Plonsky, 2013; Lightbown & Spada, 2013; Macaro,
2010; Ortega, 2009; Pawlak & Aronin, 2014).

Topics in LTA Research


LTA research primarily focuses on the quality and usefulness of language tests and
assessments, and issues surrounding test development and use (e.g., test validity,
impact, use and fairness; see Purpura, 2016, or Read, 2015, for an overview). Like
SLA research, LTA research focuses on the measurement of language skills and
communicative abilities in a variety of contexts (e.g., academic language purposes
such as achievement tests, proficiency tests, and screening tests, and occupational
purposes such as tests for medical professions, aviation, or tourist guides). The
term assessment is used to cover more than the use of tests to elicit language perfor-
mance. For example, assessment may be informally carried out by teachers in the
classroom. There are several books on LTA that consider the key issues: Bachman
and Palmer, 2010; Carr, 2011; Coombe, Davidson, O’Sullivan and Stoynoff, 2012;
Douglas, 2010; Fulcher, 2010; Green, 2014; Kunnan, 2014; Weir, 2003. While
there has been an increase in qualitative and mixed methods approaches in LTA,
quantitative methods remain predominant in LTA research. This is mainly because
tests and assessments involve the measurement and evaluation of language ability.
Like SLA researchers, LTA researchers are interested in understanding the internal
factors (e.g., language knowledge, cognitive processes, and affective factors), and
external factors (e.g., characteristics of test tasks such as text characteristics, test
techniques, and the task demands and roles of raters) that affect test performance
variation. SLA and LTA research are related to each other in that SLA research
focuses on developing an understanding of the processes of language learning,
whereas LTA research measures the products of language learning processes.

A Sample Study
Khang (2014) will be used to further illustrate how L2 researchers apply the prin-
ciples of scales of measurement in their research. Khang (2014) investigated the
fluency of spoken English of 31 Korean English as a Foreign Language (EFL)
learners compared to that of 15 native English (L1) speakers. The research partici-
pants included high and low proficiency learners. Khang conducted a stimulated
recall study with a subset of this population (eight high proficiency learners and
nine low proficiency learners). This study exemplifies all three measurement scales.
The status of a learner as native or nonnative speaker of English was used as a
nominal variable. ‘Native’ was not in any way better or worse than ‘nonnative’; it
was just different. The only statistic applied to this variable was a frequency count
(15 native speakers and 31 nonnative speakers). Khang used this variable to estab-
lish groups for comparison. Proficiency level was used as an ordinal variable in
this study. High proficiency learners were assumed to have greater target language
competence than low proficiency learners had, but the degree of the difference
was not relevant. The researcher was interested only in comparing the issues that
high and low proficiency learners struggled with. Khang’s other measures were
interval variables (e.g., averaged syllable duration, number of corrections per min-
ute, and number of silent pauses per minute, which can all be precisely quantified).
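The division of Khang's variables into the three scale types determines which statistics are legitimate for each. As a minimal illustration (not part of Khang's study, and not the SPSS workflow used in this book), the following Python sketch uses invented values to show the contrast: nominal variables support only frequency counts, ordinal variables support ranking and grouping but not arithmetic on the level codes, and interval variables support arithmetic such as means.

```python
from collections import Counter
from statistics import mean

# Hypothetical records loosely modeled on Khang's (2014) variables;
# the values below are invented for illustration only.
speakers = [
    {"status": "native",    "proficiency": None,   "pauses_per_min": 3.1},
    {"status": "nonnative", "proficiency": "high", "pauses_per_min": 6.4},
    {"status": "nonnative", "proficiency": "low",  "pauses_per_min": 9.8},
    {"status": "nonnative", "proficiency": "high", "pauses_per_min": 5.7},
]

# Nominal: categories are merely different, so only counting is meaningful.
status_counts = Counter(s["status"] for s in speakers)

# Ordinal: levels can be ranked and grouped, but the distance between
# 'high' and 'low' is undefined, so averaging level codes would be invalid.
by_level = {
    level: [s for s in speakers if s["proficiency"] == level]
    for level in ("high", "low")
}

# Interval: precise quantities, so arithmetic such as a mean is legitimate.
mean_pauses = mean(s["pauses_per_min"] for s in speakers)

print(status_counts)                            # frequency count only
print({k: len(v) for k, v in by_level.items()}) # group sizes per level
print(round(mean_pauses, 2))                    # mean of an interval variable
```

The same logic underlies SPSS's requirement, discussed in the next chapter, that each variable be declared as nominal, ordinal, or scale before analysis.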

Summary
It is essential that quantitative researchers consider the types of data and levels of
measurement that they use (i.e., the nature of the numbers used to measure the
variables). In this chapter, issues of quantification and measurement in L2 research,
particularly the types of data and scales associated with them, have been discussed.
The next chapter will turn to a practical concern: how to manage quantitative data
with the help of a statistical analysis program, namely the IBM Statistical Package
for Social Sciences (SPSS). The concept of measurement scales will be revisited
through SPSS in the next chapter.

Review Exercises
To download review questions and SPSS exercises for this chapter, visit the Com-
panion Website: www.routledge.com/cw/roever.

References
Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.
American Psychological Association (APA). (2010). Publication manual of the American
Psychological Association (6th ed.). Washington, DC: American Psychological Association.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge
University Press.
Bachman, L. F., & Kunnan, A. J. (2005). Statistical analyses for language assessment
workbook and CD ROM. Cambridge: Cambridge University Press.
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. Oxford: Oxford
University Press.
Bell, N. (2012). Comparing playful and nonplayful incidental attention to form. Language
Learning, 62(1), 236–265.
Blair, E., & Blair, J. (2015). Applied survey sampling. Thousand Oaks, CA: Sage.
Brown, J. D. (2005). Testing in language programs. New York: McGraw Hill.
Brown, J. D. (2011). Likert items and scales of measurement. SHIKEN: JALT Testing &
Evaluation SIG Newsletter, 15(1), 10–14.
Brown, J. D. (2014). Classical theory reliability. In A. J. Kunnan (Ed.), Companion to
language assessment (pp. 1165–1181). Oxford, UK: John Wiley & Sons.
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions,
persistent myths and urban legends about Likert scales and Likert response formats and their
antidotes. Journal of Social Sciences, 3(3), 106–116.
Carr, N. (2011). Designing and analysing language tests. Oxford: Oxford University Press.
Chapelle, C. A., Enright, M. K., & Jamieson, J. (Eds.). (2008). Building a validity argument
for the Test of English as a Foreign Language. London: Routledge.
Cho, Y., & Bridgeman, B. (2012). Relationship of TOEFL iBT scores to academic
performance: Some evidence from American universities. Language Testing, 29(3), 421–442.
Clark, L. A., & Watson, D. B. (1995). Constructing validity: Basic issues in objective scale
development. Psychological Assessment, 7, 309–319.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Newbury Park, CA:
Sage.
Cook, R. D., & Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression.
Biometrika, 70(1), 1–10.
Coombe, C. A., Davidson, P., O'Sullivan, B., & Stoynoff, S. (Eds.). (2012). Cambridge
guide to second language assessment. Cambridge: Cambridge University Press.
Corder, G. W., & Foreman, D. I. (2009). Non-parametric statistics for non-statisticians.
Hoboken, NJ: John Wiley.
Council of Europe. (2001). Common European framework of reference for languages:
Learning, teaching, assessment. Cambridge: Cambridge University Press.
Crossley, S. A., Cobb, T., & McNamara, D. S. (2013). Comparing count-based and band-
based indices of word frequency: Implications for active vocabulary research and
pedagogical applications. System, 41(4), 965–982.
Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1
groups: A 7-year study. Language Learning, 63(2), 163–185.
Di Silvio, F., Donovan, A., & Malone, M. E. (2014). The effect of study abroad homestay
placements: Participant perspectives and oral proficiency gains. Foreign Language Annals,
47(1), 168–188.
Doolan, S. M., & Miller, D. (2012). Generation 1.5 written error patterns: A comparative
study. Journal of Second Language Writing, 21(1), 1–22.
Dörnyei, Z., & Taguchi, T. (2010). Questionnaires in second language research. London:
Routledge.
Douglas, D. (2010). Understanding language testing. London: Hodder Education.
Eisenhauer, J. G. (2008). Degrees of freedom. Teaching Statistics, 30(3), 75–78.
Elder, C., Knoch, U., & Zhang, R. (2009). Diagnosing the support needs of second language
writers: Does the time allowance matter? TESOL Quarterly, 43(2), 351–360.
Ellis, R. (2015). Understanding second language acquisition. Oxford: Oxford University
Press.
Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Los Angeles:
Sage.
Fulcher, G. (2010). Practical language testing. London: Hodder Education.
Furr, R. M. (2010). Yates correction. In N. J. Salkind (Ed.), Encyclopedia of research design
(Vol. 3, pp. 1645–1648). Los Angeles: Sage.
Fushino, K. (2010). Causal relationships between communication confidence, beliefs about
group work, and willingness to communicate in foreign language group work. TESOL
Quarterly, 44(4), 700–724.
Gass, S. M., with Behney, J., & Plonsky, L. (2013). Second language acquisition: An
introductory course (4th ed.). New York and London: Routledge.
Gass, S., Svetics, I., & Lemelin, S. (2003). Differential effects of attention. Language
Learning, 53(3), 497–545.
Green, A. (2014). Exploring language assessment and testing: Language in action. New
York: Routledge.
Greenhouse, S. (1990). Yates's correction for continuity and the analysis of 2×2 contingency
tables: Comment. Statistics in Medicine, 9(4), 371–372.
Guo, Y., & Roehrig, A. D. (2011). Roles of general versus second language (L2) knowledge
in L2 reading comprehension. Reading in a Foreign Language, 23(1), 42–64.
Haviland, M. G. (1990). Yates's correction for continuity and the analysis of 2×2 contingency
tables. Statistics in Medicine, 9(4), 363–367.
House, J. (1996). Developing pragmatic fluency in English as a foreign language: Routines
and metapragmatic awareness. Studies in Second Language Acquisition, 18(2), 225–252.
Hudson, T., & Llosa, L. (2015). Design issues and inference in experimental L2 research.
Language Learning, 65(S1), 76–96.
Huff, D. (1954). How to lie with statistics. New York: Norton.
Isaacs, T., Trofimovich, P., Yu, G., & Munoz, B. M. (2015). Examining the linguistic
aspects of speech that most efficiently discriminate between upper levels of the revised
IELTS Pronunciation scale. IELTS Research Report, 4, 1–48.
Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12),
1212–1218.
Jia, F., Gottardo, A., Koh, P. W., Chen, X., & Pasquarella, A. (2014). The role of
acculturation in reading a second language: Its relation to English literacy skills in immigrant
Chinese adolescents. Reading Research Quarterly, 49(2), 251–261.
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp.
17–64). Westport, CT: Greenwood Publishing.
Keith, T. Z. (2003). Validity of automated essay scoring systems. In M. D. Shermis & J.
Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 147–167).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Khang, J. (2014). Exploring utterance and cognitive fluency of L1 and L2 English speakers:
Temporal measures and stimulated recall. Language Learning, 64(4), 809–854.
Ko, M. H. (2012). Glossing and second language vocabulary learning. TESOL Quarterly,
46(1), 56–79.
Kormos, J., & Trebits, A. (2012). The role of task complexity, modality and aptitude in
narrative task performance. Language Learning, 62(2), 439–472.
Kunnan, A. J. (Ed.). (2014). The companion to language assessment. Oxford, UK: John
Wiley & Sons.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A
practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.
Larson-Hall, J. (2010). A guide to doing statistics in second language research using SPSS.
New York: Routledge.
Larson-Hall, J. (2016). A guide to doing research in second language acquisition with SPSS
and R (2nd ed.). New York: Routledge.
Laufer, B., & Rozovski-Roitblat, B. (2011). Incidental vocabulary acquisition: The effects of
task type, word occurrence and their combination. Language Teaching Research, 15(4),
391–411.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A
corpus analysis of learners' English. Language Learning, 61(2), 647–672.
Lee, C. H., & Kalyuga, S. (2011). Effectiveness of different pinyin presentation formats in
learning Chinese characters: A cognitive load perspective. Language Learning, 61(4),
1099–1118.
Lightbown, P. M., & Spada, N. (2013). How languages are learned (4th ed.). Oxford: Oxford
University Press.
Liu, D. (2011). The most frequently used English phrasal verbs in American and British
English: A multicorpus examination. TESOL Quarterly, 45(4), 661–688.
Macaro, E. (2010). Continuum companion to second language acquisition. London:
Continuum.
Mackey, A., & Gass, S. M. (2015). Second language research: Methodology and design (2nd
ed.). London: Routledge.
Mantel, N. (1990). Yates's correction for continuity and the analysis of 2×2 contingency tables:
Comment. Statistics in Medicine, 9(4), 369–370.
Matsumoto, M. (2011). Second language learners' motivation and their perception of their
teachers as an affecting factor. New Zealand Studies in Applied Linguistics, 17(2), 37–52.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp.
13–103). New York: Macmillan.
Miller, G. A., & Chapman, J. P. (2001). Misunderstanding analysis of covariance.
Journal of Abnormal Psychology, 110(1), 40–48.
Mora, J. C., & Valls-Ferrer, M. (2012). Oral fluency, accuracy, and complexity in formal
instruction and study abroad learning contexts. TESOL Quarterly, 46(4), 610–641.
Norris, J. M., Ross, S. J., & Schoonen, R. (Eds.). (2015). Improving and extending
quantitative reasoning in second language research. Language Learning, 65(S1), v–vi, 1–264.
Ockey, G. J., Koyama, D., Setoguchi, E., & Sun, A. (2015). The extent to which TOEFL iBT
speaking scores are associated with performance on oral ability components for Japanese
university students. Language Testing, 32(1), 39–62.
Ortega, L. (2009). Understanding second language acquisition. London: Hodder.
Paltridge, B., & Phakiti, A. (Eds.). (2015). Research methods in applied linguistics: A
practical resource. London: Bloomsbury.
Pawlak, M., & Aronin, L. (2014). Essential topics in applied linguistics and multilingualism:
Studies in honor of David Singleton. New York, NY: Springer.
Phakiti, A. (2006). Modeling cognitive and metacognitive strategies and their relationships to
EFL reading test performance. Melbourne Papers in Language Testing, 1(1), 53–96.
Phakiti, A. (2014). Experimental research methods in language learning. London:
Bloomsbury.
Phakiti, A., Hirsh, D., & Woodrow, L. (2013). It's not only English: Effects of other individual
factors on English language learning and academic learning of ESL international students in
Australia. Journal of Research in International Education, 12(3), 239–258.
doi:10.1177/1475240913513520
Phakiti, A., & Li, L. (2011). General academic difficulties and reading and writing difficulties
among Asian ESL postgraduate students in TESOL at an Australian university. RELC
Journal, 42(3), 227–264.
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting
practices in quantitative L2 research. Studies in Second Language Acquisition, 35(4),
655–687.
Plonsky, L. (2014). Study quality in quantitative L2 research (1990–2010): A methodological
synthesis and call for reform. The Modern Language Journal, 98(1), 450–470.
Plonsky, L., & Gass, S. (2011). Quantitative research methods, study quality, and outcomes:
The case of interaction research. Language Learning, 61(2), 325–366.
Plonsky, L., & Oswald, F. L. (2014). How big is big? Interpreting effect sizes in L2 research.
Language Learning, 64(4), 878–912.
Purpura, J. E. (2011). Quantitative research methods in assessment and testing. In E. Hinkel
(Ed.), Handbook of research in second language teaching and learning Vol. 2 (pp. 731–751).
London: Routledge.
Purpura, J. E. (2016). Second and foreign language assessment. The Modern Language
Journal, 100(S), 190–208.
Qian, D. (2002). Investigating the relationship between vocabulary knowledge and academic
reading performance: An assessment perspective. Language Learning, 52(3), 513–536.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Read, J. (2015). Researching language testing and assessment. In B. Paltridge & A. Phakiti
(Eds.), Research methods in applied linguistics: A practical resource (pp. 471–486). London:
Bloomsbury.
Roever, C. (1995). Routine formulae in acquiring English as a foreign language. Unpublished
raw data.
Roever, C. (2005). Testing ESL pragmatics. Frankfurt: Peter Lang.
Roever, C. (2006). Validation of a web-based test of ESL pragmalinguistics. Language
Testing, 23(2), 229–256.
Roever, C. (2012). What learners get for free: Learning of routine formulae in ESL and EFL
environments. ELT Journal, 66(1), 10–21.
Rutherford, A. (2011). ANOVA and ANCOVA: A GLM approach. Oxford: John Wiley & Sons.
Scheaffer, R. L., Mendenhall, W., Ott, R. L., & Gerow, K. G. (2012). Elementary survey
sampling. Boston: Brooks/Cole.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-
experimental designs for generalized causal inference. Boston: Houghton, Mifflin.
Shintani, N., Ellis, R., & Suzuki, W. (2014). Effects of written feedback and revision on
learners' accuracy in using two English grammatical structures. Language Learning, 64(1),
103–131.
Stevens, J. P. (2012). Applied multivariate statistics for the social sciences (5th ed.). New
York: Routledge.
Tabachnick, B., & Fidell, L. (2012). Using multivariate statistics. Boston: Pearson.
Taguchi, N., & Roever, C. (2017). Second language pragmatics. Oxford: Oxford University
Press.
Weir, C. J. (2003). Language testing and validation: An evidence-based approach. New York,
NY: Macmillan.
Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of
automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13.
Yang, Y., Buckendahl, C. W., Juszkewicz, P. J., & Bhola, D. S. (2002). A review of
strategies for validating computer-automated scoring. Applied Measurement in Education,
15(4), 391–412.
