
Teachers of English to Speakers of Other Languages, Inc.

(TESOL)

Statistics as a Foreign Language: Part 1: What to Look for in Reading Statistical Language Studies
Author(s): James Dean Brown
Source: TESOL Quarterly, Vol. 25, No. 4 (Winter, 1991), pp. 569-586
Published by: Teachers of English to Speakers of Other Languages, Inc. (TESOL)
Stable URL: http://www.jstor.org/stable/3587077
Accessed: 25/02/2013 21:53


This content downloaded on Mon, 25 Feb 2013 21:53:27 PM


All use subject to JSTOR Terms and Conditions
TESOL QUARTERLY, Vol. 25, No. 4, Winter 1991

Statistics as a Foreign Language:
Part 1: What to Look for in Reading
Statistical Language Studies*
JAMES DEAN BROWN
University of Hawaii at Manoa

This article is addressed to those practicing EFL/ESL teachers who currently avoid statistical studies. In particular, it is designed to provide teachers with strategies that can help them gain access to statistical studies on language learning and teaching so that they can use the information found in such articles to better serve their students. To that end, five attack strategies are advocated and discussed: (a) use the abstract to decide if the study has value for you; (b) let the conventional organization of the paper help you; (c) examine the statistical reasoning involved in the study; (d) evaluate what you have read in relation to your professional experience; and (e) learn more about statistics and research design. Each of these strategies is discussed, and examples are drawn from the article following this one in this issue of the TESOL Quarterly.

The TESOL Quarterly is currently the research journal of the organization, Teachers of English to Speakers of Other Languages. Ironically, many of the statistical studies on language learning and teaching that are found in the Quarterly may be incomprehensible to the very EFL/ESL teachers who make up the intended audience. Rather than bemoaning this situation (either by berating teachers for not knowing more about statistics or by criticizing researchers for producing articles that are frequently not accessible to teachers), this article will begin by accepting statistical language studies for what they are: legitimate investigations into phenomena in human language learning/teaching which include the use and systematic manipulation of numbers as part of their argument.
Notice that I purposely avoid terms such as empirical and experimental in referring to these statistical language studies. I am doing so for several reasons. First, there are other, nonstatistical studies that could be called empirical (e.g., ethnographies, case studies, etc.) since, by definition, empirical studies are those based on data (but not necessarily quantitative data). Second, there are statistical studies that are not exactly experimental in the technical sense of that word (e.g., quasi-experimental studies, posttest-only designs, etc.). Third, there are statistical studies that have little or nothing to do with experimentation (e.g., demonstrations, survey research, etc.).

* Part 2, scheduled to appear in Volume 26, discusses more advanced statistical procedures.
Regardless of what studies are called, when confronted with statistics, many readers will either skip an article entirely or take a rather cursory route through the paper. Such a route might include skimming the abstract and the Introduction section, then skipping over the Method and Results sections (with their tables, figures, and statistics) to the Conclusions (and/or Discussion) section, where they look to find out what the study was all about. If this strategy sounds similar to one that you use, you may be missing an opportunity. Statistical reasoning is just a form of argumentation; by skipping the Method and Results sections, readers not only miss the heart of the study, but also buy the authors' argument without critical evaluation. Most of us would not surrender so easily if the form of argument were expressed in words rather than numbers. We would read a prose article carefully and critically. We should not have to surrender professional skepticism just because the form of argument may be a bit alien, that is, numerical.
The purpose of this article is to provide some attack strategies for teachers to use in gaining access to statistical studies and in understanding them better. In the process, examples will be drawn from a study reported in the second article in this issue of the TESOL Quarterly.

ATTACK STRATEGIES FOR STATISTICAL STUDIES


With the following strategies in hand, you may not understand every word of a statistical study, but you will be able to gain access to such studies and will have a purposeful way of grappling with the content.

Use the Abstract to Decide if the Study has Value for You


Let us begin with the familiar and work toward the less familiar. The portion of a statistical report that is probably most often read is the abstract. An abstract typically contains about 150 words in the TESOL Quarterly. Other journals may have somewhat longer or shorter abstracts. Regardless of their length, these handy summaries should contain enough information for the reader to know what the study was about, how it was conducted, and the general trend of the results. In other words, an abstract should tell you in a nutshell what is presented in the study and allow you to determine if an article is pertinent enough to your professional life and teaching situation to be interesting and worthy of your time.
Indeed, there is an overwhelming and increasing amount of information competing for our professional attention. Along with the TESOL Journal and the TESOL Quarterly, EFL/ESL teachers choose among other journals, such as the ELT Journal, the ESP Journal, Language, Language Learning, Language Testing, Modern Language Journal, Studies in Second Language Acquisition, and TESL Canada Journal/Revue TESL du Canada, to name just a few in the United Kingdom and North America, as well as Cross Currents and the JALT Journal (Japan), Prospect (Australia), RELC Journal (Singapore), System (Sweden), and many others.
Because of this plethora of journals, it is essential to use the abstracts to advantage. Consider the abstract associated with the example article that follows this one. Is there sufficient information in that abstract for you to decide whether the article is of interest to you?

Let the Conventional Organization of the Paper Help You


The TESOL Quarterly and many other journals in our field generally follow the format and organization described in the Publication Manual of the American Psychological Association (APA) (American Psychological Association, 1983). That manual advocates using the following general sections and subsections in reporting a statistical study:
Introduction
    Introduction to the problem
    Background
    Statement of purpose
Method
    Subjects
    Materials (or Apparatus)
    Procedures
Results
Discussion (and/or Conclusions)
References

Typically in our journals, there are no headings for the Introduction or its subparts. However, beginning with the Method section (including any Subjects, Materials or Apparatus, and Procedures subsections) through the Results, Discussion, Conclusion, and References sections, you will generally find clear headings and subheadings. Since the general purpose of the headings and subheadings is "to help readers find specific information" (American Psychological Association, 1983, pp. 25-26), you should use them to help you find and organize the information that you need in order to understand the study. There is not space here to provide details about what each of these sections should contain. Indeed, such details are not necessary here because existing sources give ample information on this topic (e.g., Brown, 1988, pp. 43-62; Hatch & Farhady, 1982, pp. 33-38; Hatch & Lazaraton, 1991, pp. 107-126).
Nevertheless, there are a number of questions that you might want to ask yourself as you read through a statistical study. These questions should help focus the information contained in these key sections and help readers critically evaluate a study. Notice that section and subsection headings are listed in parentheses after each of the questions below. These are meant to suggest where you would typically find the information that would answer each question.
1. What literature is reviewed? Is the review current and complete? Where does the study fit into the field? (Introduction section)
2. What is the purpose of the study? (Introduction section, especially the Statement of Purpose subsection)
3. Who was studied and how were they selected? Were there enough people in the study to make the results meaningful? (Subjects subsection)
4. What tests, questionnaires, rating scales, etc., were used? What do they look like? And, are they reliable and valid for the purposes of the study? (Materials subsection)
5. What actually happened to the subjects during the data gathering process? (Procedures subsection)
6. How were the data organized and analyzed? (Results section)
7. Is there enough information provided to replicate the study? (throughout the Method section, including the Subjects, Materials, and Procedures subsections)
8. What were the descriptive results? What other statistical results came out of the study? (Results section)
9. What were the answers to the research questions and what do the answers indicate? (Discussion section)
10. What are the implications of the results, and how do they relate to the field as a whole? (Discussion or Conclusions section)
11. Which conclusions follow directly from the results and which ones are more speculative? (Discussion or Conclusions section)
12. What questions arose in the course of doing the study that might be useful for future research? (Discussion or Conclusions section)
Since answers to these questions are important in understanding any study, you can use the conventional organization of statistical studies as represented by the sections discussed here to find your way around published research articles.
However, be warned that, even though the APA format and organization are well known, some authors do not use exactly the sections and headings listed above. Sometimes there are good reasons for such deviations. For example, in the example article, there is a separate Procedures subsection as advocated in the APA manual. However, in other studies, the same author has chosen to use a combined Materials and Procedures subsection because the two issues were so inextricably intertwined that they made little sense if explained separately. Regardless of the specific sections and headings used in a particular study, you should find sufficient information somewhere in any statistical study to answer the twelve sets of questions listed above.
At this point, you may wish to turn to the example article. Without reading every word, try jumping from section to section while letting the above questions guide what you have to read in order to answer them. In the process, notice that everything is found just about where the APA manual suggests. Notice also that the organization and headings help you to find the information that you need and to generally understand the study.

Examine the Statistical Reasoning Involved in the Study
In order to understand the results of a statistical study, it is necessary to understand the statistical reasoning that underlies all such studies. There are several key concepts that are necessary parts of much statistical reasoning: (a) descriptive statistics, (b) statistical differences, (c) probability levels, (d) statistical tests, and (e) significance versus meaningfulness. These five concepts will be discussed in turn.

Descriptive statistics. Part of the content of any statistical paper describes what happened in the study. As mentioned above, such description is partially accomplished within the various parts of the Method section. However, description also occurs in the Results section, which describes what happened statistically. Descriptive statistics (a phrase used in contrast to inferential statistics) describe or summarize a data set but, by themselves, cannot tell us the extent to which they represent a larger population or other, similar sets.
The descriptive statistics that are most often used are indicators of the central tendency and dispersion. The central tendency (which can be viewed as a typical value for a set of numbers) is commonly reported in terms of a statistic called the mean. (A statistic is any number that can be computed based on the observed data.) The mean is usually exactly the same as the arithmetic average that we use in daily life. Dispersion (which can be viewed as the variation of the numerical values away from the central tendency) is usually reported in terms of a statistic called the standard deviation. The standard deviation summarizes how much the numbers vary away from the mean, or how much they are spread out around the mean. If the data being described are a set of test scores, for instance, their standard deviation can be defined as "a sort of average of the differences of all scores from the mean" (Brown, 1988, p. 69). Ordinarily, we would expect most scores to fall within one standard deviation of the mean: for a normal distribution, about 68% of the scores fall within one standard deviation of the mean, and about 95% fall within two standard deviations.
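As a minimal sketch (in Python, which the article itself does not use), the two most common descriptive statistics can be computed with the standard library. The sketch assumes the sample standard deviation, which divides by n - 1; some reports divide by n instead, which gives a slightly smaller value. The scores are the EFL Test A column from the six-student data set shown later in this article, and `describe` and `share_within` are hypothetical helper names.

```python
import statistics
from statistics import NormalDist

def describe(scores):
    """Return n, the mean, and the sample standard deviation (n - 1 in the denominator)."""
    return len(scores), statistics.mean(scores), statistics.stdev(scores)

def share_within(k):
    """Proportion of a normal distribution that lies within k SDs of its mean."""
    z = NormalDist()  # standard normal curve: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

# EFL Test A scores from the six-student data set shown later in this article
test_a = [100, 98, 87, 82, 77, 55]
n, m, sd = describe(test_a)
print(f"n = {n}, mean = {m:.2f}, SD = {sd:.2f}")
print(f"within 1 SD: {share_within(1):.3f}; within 2 SD: {share_within(2):.3f}")
```

For a normal distribution, `share_within(1)` and `share_within(2)` come out at roughly 0.68 and 0.95, which is where the "about 68%" and "about 95%" figures come from.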
You now have enough basic information about descriptive statistics to consider the central tendency and dispersion in a real study. Such descriptive statistics most often take the form of a table. Table 2 in the example study is typical. Notice that it describes the results in terms of the various groupings involved in the study, that is, for each group and/or subgroup in the study. The first column labels groups that were created by considering the raters in two faculties (English and ESL) separately as well as combined. Across the top of the table, you will also find labels for the groups that were created by considering the different types of students separately (English and ESL composition students) and combined. The most important thing to note is that the data being grouped and described are the ratings for two types of students (English and ESL) as rated by two types of instructors (from English and ESL faculties).
Notice that for each possible combination of Student Type and Rater Faculty, the table provides three statistics: the number of students involved in the group (n), the mean (m), and the standard deviation (SD). For instance, the descriptive statistics for the English students' compositions as rated by the English faculty indicate that there were 112 compositions involved, that the mean was 2.46 (on a 0-to-5-point scale), and that the standard deviation was 1.11. This information is interesting in itself because it indicates that the test is fairly well centered (i.e., the mean is almost exactly halfway between the lowest possible score of zero and the highest possible score of five), and the scores are spread out to a reasonable degree (i.e., there is room for 2 standard deviations above and below the mean within the range of possible scores from 0 to 5).
However, the information is also useful for comparing groups to each other. Consider the fact that the means for all of the groupings are very similar in this table. That may indicate that there were no major differences among the groups of compositions as produced by the two types of students and rated by the two types of teachers. Note, though, that the standard deviations for the compositions written by the English students are generally higher than those for the essays written by the ESL students. This indicates that the scores for the English composition students were more spread out than those for ESL students, that is, there was greater dispersion in the writing scores of the English course students.
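The two observations about the English faculty's ratings of the English students (mean 2.46, SD 1.11, 0-to-5 scale) are simple arithmetic, and the short sketch below checks them. `check_spread` is a hypothetical helper; it only tests whether the mean plus or minus 2 SD stays inside the possible score range and how far the mean sits from the scale's midpoint.

```python
def check_spread(mean, sd, low, high, k=2):
    """Return (fits, (lower, upper), off_center):
    whether mean +/- k*SD lies inside [low, high], that interval,
    and the distance of the mean from the midpoint of the scale."""
    lower, upper = mean - k * sd, mean + k * sd
    fits = low <= lower and upper <= high
    off_center = abs(mean - (low + high) / 2)
    return fits, (lower, upper), off_center

# English students' compositions rated by English faculty (Table 2 values quoted above)
fits, (lower, upper), off_center = check_spread(2.46, 1.11, 0, 5)
print(f"mean +/- 2 SD: {lower:.2f} to {upper:.2f}; fits the 0-5 scale: {fits}")
print(f"distance of the mean from the midpoint 2.5: {off_center:.2f}")
```

Both limits (about 0.24 and 4.68) fall inside the 0-to-5 range, and the mean sits only 0.04 points below the midpoint, which is what "fairly well centered" amounts to here.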
Statistical differences. As useful and informative as descriptive statistics can be, they are often not enough. There is a type of statistical reasoning that takes over at this stage in most studies: Inferential statistics investigate the extent to which descriptive statistics represent a larger population or other, similar data sets. This mode of reasoning hinges on the concept of significant differences. The significant differences most often of interest in statistical studies are the differences observed in comparing means, comparing frequencies, or comparing correlation coefficients to zero.
In comparing means (i.e., arithmetic averages), it is possible that any observed differences are purely accidental. After all, if you give a test to a group of students on several occasions, you would expect the means to be slightly different because human beings simply do not perform exactly the same on every occasion (e.g., some students' scores might have been affected by the fact that they were sick, tired, depressed, etc., on one of the occasions). Indeed, it would be very surprising if the test results turned out to be exactly the same on successive occasions. The issue that researchers must grapple with is whether the differences that they observe between means are just such chance variations or are due to some other more systematic factor. The question being posed by the researcher and answered through statistical tests is whether or not there is a significant difference between means.
For example, consider a hypothetical study in which the average number of correctly defined words on a vocabulary test is compared for two groups: one group that received lessons using language teaching Method X for 6 weeks, and the control group that received 6 weeks of instruction based on Method Y. The problem is that the two means will naturally vary to some degree by chance alone. The question that the researchers must resolve is whether there is a significant difference between the means (i.e., whether the observed difference between the means is systematic or occurred by chance alone).
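One way to make "by chance alone" concrete for the hypothetical Method X/Method Y study is a permutation test, sketched below in Python. It is not the test used in the example study (which, as discussed below, uses the F test); it is swapped in here because it needs no distribution tables. The vocabulary scores are invented for illustration, and `permutation_test_means` is a hypothetical name.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test_means(group_x, group_y, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference between two means.

    If the teaching method made no difference, the group labels are
    arbitrary, so we reshuffle them many times and count how often chance
    alone produces a mean difference at least as large as the observed one.
    """
    observed = abs(mean(group_x) - mean(group_y))
    pooled = list(group_x) + list(group_y)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(group_x)]) - mean(pooled[len(group_x):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            at_least_as_large += 1
    # Add-one correction so the estimated p value is never exactly zero
    p = (at_least_as_large + 1) / (n_shuffles + 1)
    return observed, p

# Hypothetical vocabulary scores (invented for illustration only)
method_x = [24, 27, 22, 29, 26, 25, 28, 23]
method_y = [21, 23, 20, 24, 22, 19, 25, 21]
diff, p = permutation_test_means(method_x, method_y)
print(f"difference in means = {diff:.2f}, permutation p = {p:.4f}")
```

The returned p estimates the proportion of random relabelings that produce a difference at least as large as the one actually observed; a small p says that chance alone rarely produces such a gap.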
If there is a significant difference, the researcher can say with a certain amount of confidence that the observed difference between the two means was not just accidental. This is an important issue because, if the group learning vocabulary under Method X has a higher mean than the other group, the higher number of vocabulary words learned can probably be attributed to the effects of Method X (provided the experiment was conducted properly). This would constitute an argument in favor of Method X, which might be interesting to other language educators responsible for teaching vocabulary. As described here, it is an argument based entirely on comparing the mean performances of the groups involved.
In comparing frequencies, it is also possible that any observed differences are due to chance variations. After all, if we tally the numbers of Taiwanese and Korean students in an ESL class on successive days, we would expect the resulting frequencies (also known as tallies) to be slightly different on different days. It might turn out that there are 7 Taiwanese and 11 Koreans on the first day, 6 Taiwanese and 12 Koreans on the second day, 7 Taiwanese and 9 Koreans on the third day, etc. Indeed, it would be surprising if the frequencies turned out to be exactly the same every day. Similarly, by chance alone, one would expect the numbers of Taiwanese and Koreans to vary from group to group because of chance variations in the proportions of different nationalities. The issue that researchers must come to grips with is whether any difference that is observed between frequencies is a chance variation or is instead due to some other more systematic factor. In short, the question being posed by a researcher who is comparing frequencies is whether or not there is a significant difference between the observed frequencies and the frequencies that would be expected by chance alone.
An example of this type of study might occur in performing a needs analysis for a language course. Perhaps the researcher is interested in the frequency, or tally, of people who are interested in studying pronunciation as compared to those interested in studying grammar. The question that a researcher might pose is whether there is a significant difference between the existing frequencies and the fifty-fifty split that would be expected by chance. If the researcher can state that there is a significant difference, it will indicate that the observed difference is due to factors other than the chance fluctuations that would occur naturally.
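The needs-analysis question can be sketched as a chi-square goodness-of-fit test against a fifty-fifty split. This is a minimal illustration assuming two response categories (hence 1 degree of freedom), invented tallies, and expected counts large enough (a common rule of thumb is at least 5 per category) for the chi-square approximation to be reasonable; `chi_square_fifty_fifty` is a hypothetical helper.

```python
import math

def chi_square_fifty_fifty(count_a, count_b):
    """Chi-square goodness-of-fit test of two tallies against an even split.

    Two categories give 1 degree of freedom, and for 1 df the upper-tail
    probability of a chi-square value x equals erfc(sqrt(x / 2)).
    """
    expected = (count_a + count_b) / 2  # fifty-fifty split expected by chance
    chi2 = sum((obs - expected) ** 2 / expected for obs in (count_a, count_b))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical needs-analysis tallies: (pronunciation, grammar)
for pron, gram in [(70, 30), (52, 48)]:
    chi2, p = chi_square_fifty_fifty(pron, gram)
    print(f"pronunciation = {pron}, grammar = {gram}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

For 70 versus 30, chi2 is 16 and p is far below .01; for 52 versus 48, chi2 is only 0.16 and p is well above .05, the kind of imbalance chance produces routinely.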
Similar reasoning may be used in comparing correlation coefficients to zero. Correlation coefficients are indexes that represent the degree of relationship between two sets of numbers. Correlation coefficients can range from 0.00 (if there is no relationship) to 1.00 (if there is a very strong relationship). Consider the following data set:

Students    EFL Test A    EFL Test B    Last EFL Study (years since)    Age
Maria          100            97                  0.0                  26
Jaime           98            97                  0.5                  28
Carla           87            84                  2.0                  26
Jose            82            85                  1.5                  28
Juanita         77            74                  4.0                  27
Jimmy           55            52                  3.5                  27

EFL Tests A and B appear to be highly related in the sense that as one set of numbers goes up so does the other. The resulting correlation coefficient turned out to be a very high .99. If two sets of numbers are not related at all, the expected correlation coefficient is 0.00. This is the case for Age and Test B in the above example data, where the correlation coefficient turned out to be very close to zero at 0.01.
It is important to note that correlation coefficients can also take on negative values anywhere between 0.00 (if there is no relationship) and -1.00 (if there is a strong but opposite relationship). In such cases, the two sets of numbers are related but in opposite directions. For example, Test B and Last EFL Study in the above example are fairly strongly related but in opposite directions. In other words, as one set of numbers goes up, the other set goes down. The result in this case is a high but negative correlation coefficient of -.86. In short, correlation coefficients can range from -1.00 (for strong, but opposite relationships) to 0.00 (for no relationship) to +1.00 (for strong relationships in the same direction).
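The three correlation coefficients quoted above can be recomputed from the six students' data with the usual Pearson product-moment formula. The helper `pearson_r` is a hypothetical name, written out by hand so that each step of the formula is visible.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # spread of x
    syy = sum((b - my) ** 2 for b in y)                   # spread of y
    return sxy / math.sqrt(sxx * syy)

# Columns of the table above (Maria, Jaime, Carla, Jose, Juanita, Jimmy)
test_a = [100, 98, 87, 82, 77, 55]
test_b = [97, 97, 84, 85, 74, 52]
last_study = [0.0, 0.5, 2.0, 1.5, 4.0, 3.5]
age = [26, 28, 26, 28, 27, 27]

print(f"Test A vs Test B:         r = {pearson_r(test_a, test_b):.2f}")
print(f"Age vs Test B:            r = {pearson_r(age, test_b):.2f}")
print(f"Last EFL Study vs Test B: r = {pearson_r(last_study, test_b):.2f}")
```

Rounded to two places, the three lines give .99, .01, and -.86, matching the values reported in the text.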
One problem with correlation coefficients is that even for two sets of random numbers some degree of correlation may be found by chance alone. For instance, on successive sets of randomly selected numbers, correlation coefficients of .12, .07, -.17, -.01, etc., might be found by chance alone. Indeed, it would be surprising if a correlation coefficient of exactly 0.00 were found every time. The issue that researchers must deal with is whether the correlation coefficients that occur in a study are just such random (or chance) variations around zero, or rather are due to some more systematic relationship between the sets of numbers. The question being posed by the researcher is whether there is a significant relationship, i.e., a significant difference between the correlation coefficient that was observed in a study and a zero (or chance) correlation. If the researcher can state that there is a significant difference, the correlation coefficient observed in the study probably varied from zero for other than chance reasons. In simpler terms, it indicates that the relationship between the sets of numbers is probably systematic, not just a chance relationship.
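To see what "a chance relationship" means for the small data set above, one can keep one column fixed, pair it with every possible reordering of the other column, and count how often these chance pairings produce a correlation at least as strong as the one observed. With only six students there are 6! = 720 orderings, so the count can be exact. This exhaustive permutation test is an illustrative substitute, not the procedure behind the example study's Table 1; the helper names are hypothetical.

```python
import itertools
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def exact_permutation_p(x, y):
    """Two-sided exact permutation p value for testing r against zero.

    Under pure chance, every reordering of y is an equally likely pairing
    with x; p is the share of pairings whose |r| is at least the observed |r|.
    """
    observed = abs(pearson_r(x, y))
    hits = total = 0
    for pairing in itertools.permutations(y):
        total += 1
        if abs(pearson_r(x, pairing)) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / total

test_a = [100, 98, 87, 82, 77, 55]
test_b = [97, 97, 84, 85, 74, 52]
age = [26, 28, 26, 28, 27, 27]
print(f"Test A vs Test B: p = {exact_permutation_p(test_a, test_b):.4f}")
print(f"Age vs Test B:    p = {exact_permutation_p(age, test_b):.4f}")
```

For Test A and Test B, only a handful of the 720 pairings match or beat the observed r, so p falls well below .05; for Age and Test B, virtually every pairing does, so p is near 1.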
Probabilities. Since there is always at least some possibility that differences are due to chance, researchers use statistical tests to express a particular significant difference in terms of the probability that the observed difference would occur by chance alone. In other words, when a researcher states that there is a significant difference (between two means, between observed and expected frequencies, or between a correlation coefficient and zero), these results will always be stated in terms of the probability that a difference this large would arise from chance fluctuations alone.
These probabilities are usually expressed as p values in statistical studies. They will normally be written as p < .01, or p < .05, or as exact figures, e.g., p = .9681. The p stands for probability. In straightforward terms, p is the probability of obtaining a difference at least as large as the one observed (whether between means, between observed and expected frequencies, or between a correlation coefficient and zero) if, in fact, there is no difference. Thus, if p < .01, the probability (assuming chance alone) is less than 1% that the observed difference would be so large, giving strong evidence against pure chance. Similarly, if p < .05, the probability is less than 5% that an observed difference this large could have occurred by chance alone.
The probability levels of .05 and .01 (also referred to as alpha levels) are used by convention in most social science research to define the threshold of statistical significance. The choice between the .01 and .05 values is governed by how strict the researcher wants to be with regard to the conclusions that are drawn from a study. When a study is about an important medicine, we want to be very sure that it will not hurt patients. Thus a conservative .01 value might be selected, so that a result is declared significant only if chance alone would produce it less than 1% of the time. If the study is about a new way of teaching reduced forms, the decision is perhaps not quite so crucial, and therefore, we can accept the .05 value, which indicates that we are willing to accept a less restrictive 5% risk of mistaking a chance result for a real difference.
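The meaning of an alpha level can be checked by simulation: if there is truly no difference and we run many studies, roughly an alpha share of them should still come out "significant" by chance. The sketch below simulates needs-analysis surveys of 100 hypothetical respondents who choose pronunciation or grammar at random (probability .5 each), tests each survey against a fifty-fifty split, and reports the share declared significant. The settings (100 respondents, 2,000 studies, the seed) and the helper names are arbitrary assumptions.

```python
import math
import random

def chi_square_fifty_fifty_p(count_a, count_b):
    """p value of a chi-square test of two tallies against an even split (1 df)."""
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def false_alarm_rate(alpha, n_people=100, n_studies=2000, seed=1):
    """Share of simulated chance-only studies that reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Each respondent picks pronunciation with probability 0.5 (pure chance)
        pron = sum(rng.random() < 0.5 for _ in range(n_people))
        if chi_square_fifty_fifty_p(pron, n_people - pron) < alpha:
            hits += 1
    return hits / n_studies

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {false_alarm_rate(alpha):.3f} of chance-only studies 'significant'")
```

Because tallies are whole numbers, the simulated rates land only approximately at .05 and .01, but the stricter level clearly produces fewer false alarms.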
The determination of significant differences and their associated probabilities takes many forms, but the most commonly reported types are the three that have to do with means, frequencies, and correlation coefficients. It is important to note that a study seldom compares only two means, or contrasts only two frequencies, or examines one correlation coefficient to see if it varies from zero. More commonly, there are a number of means involved, or a number of frequencies, or a number of correlation coefficients to complicate the picture. Nevertheless, the underlying processes of checking significant differences and determining probabilities are the same.
Statistical tests. The process of determining statistical significance as described in the Statistical Differences and Probabilities subsections above is referred to as performing a statistical test. The three most commonly reported types of statistical tests are used in the example study: mean comparisons, comparisons of frequencies, and comparisons of correlation coefficients to zero.
Example mean comparisons are discussed in the example study:
    In short, the small differences among the means shown in Table 2 can only be interpreted as chance fluctuations, which are not attributable to systematic differences based on the variables used in this study.
In the above quote, the statistical reasoning for comparing pairs of means is explained. In this case, there were three comparisons of interest: (a) the difference between the mean scores for the two types of students, English composition or ESL composition students; (b) the difference between the mean scores assigned by the English faculty and ESL faculty raters; and (c) the difference between the mean scores that resulted from the two different orders in which compositions were rated (first or second). In all three comparisons, it turned out that there were no significant differences between the two types of students, the two types of raters, and the two types of orders.
The statistical test being used is the F test, the results of which are based on the F statistic reported in the second column from the right in Table 3 of the example study. The essential information is found in the column on the far right, where the probabilities are given in the column labeled p. Notice that each of these p values has an asterisk next to it and that the asterisks refer to the statement at the bottom of the table indicating that p > .05. This is read as "the probability is greater than .05" and indicates that random chance alone could produce results like these more than 5% of the time; this is not convincing evidence against chance. Thus, the researcher would be wrong in stating that there was a significant difference. (In fact, the p values reported in the table are much higher than .05, so they indicate that the observed results are quite consistent with random chance.) The first, second, and fourth p values are associated with one F ratio each for the main factors in Table 3: Student Type, Rater Faculty, and Order. The quote cited above discusses what these three statistical tests indicate.
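For readers curious about what an F ratio measures, here is a one-way version: the variation between group means divided by the variation within groups. Table 3 of the example study involves several factors at once, so this single-factor sketch is a simplification; `one_way_f` is a hypothetical helper and the ratings in the usage lines are invented. Turning an F ratio into a p value requires the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f.sf` or published tables.

```python
def one_way_f(*groups):
    """F ratio for a one-way ANOVA: between-groups mean square / within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    k, n = len(groups), len(all_scores)
    grand_mean = sum(all_scores) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Invented ratings on a 0-5 scale from two hypothetical rater groups
ratings_a = [2, 3, 2, 4, 3]
ratings_b = [3, 2, 3, 3, 2]
print(f"F = {one_way_f(ratings_a, ratings_b):.2f}")
```

An F ratio near 1 means the group means differ about as much as the scores within each group would lead one to expect by chance; large F ratios point toward a systematic difference.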
Example frequency comparisons are shown in Tables 4 through 7 of the example study. The statistical test being used is known as the chi-square test, or simply χ². You will notice that the asterisks in each table refer to the statement at the bottom of the table that p < .05. In this case, the statement would be read as "the p value is less than .05" and indicates that the researcher was justified (in those cases marked with an asterisk) in stating that there was a significant difference between the relative frequencies of English faculty and ESL raters who chose a particular feature, that is, the observed differences are unlikely to have occurred due to chance alone. Hence, we conclude that there are systematic differences between English faculty and ESL raters.
Notice that there are two steps involved in interpreting Table 4. First, there is the overall χ² value to consider. This value, located in the bottom row, is found to be significant at p < .05 (as indicated in the line just below the table). This result simply suggests that one or more of the frequencies in the table differed from what would be expected, and more detailed analyses are justified. In order to investigate which of the specific pairs (English faculty and ESL raters) of relative frequencies might be contributing to the overall significant difference, the χ² values for pairs were also calculated. These χ² values are reported in the column furthest to the right. They indicate that there was a significant difference (at p < .05) for the English faculty raters and ESL raters on Cohesion, Organization, and Syntax (i.e., the frequencies observed for the English and ESL raters on Cohesion, Organization, and Syntax were significantly different). In contrast, the frequencies of response for English and ESL faculty were not significantly different from expectations for the rating categories of Content, Mechanics, and Vocabulary. Similar two-step interpretations can be drawn from each of the other frequency tables (Tables 5-7).
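Each pairwise comparison described above can be pictured as a 2 x 2 table: two rater groups by whether or not they chose a given feature. The sketch below computes Pearson's chi-square for such a table without the Yates continuity correction; the table layout, the invented counts, and the helper name `chi_square_2x2` are assumptions, since the example study's exact calculation is not reproduced here.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table (no Yates correction).

    table[r][c] holds observed counts, e.g., rows = rater groups and
    columns = chose the feature / did not choose it. With 1 degree of
    freedom the p value is erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand  # count expected if independent
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Invented counts for one feature: [chose it, did not choose it]
english_faculty = [30, 10]
esl_raters = [15, 25]
chi2, p = chi_square_2x2([english_faculty, esl_raters])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A p below .05 here would correspond to one of the asterisked rows in a table like Table 4; identical proportions in the two rows give a chi-square of zero and a p of 1.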

Example comparisons of correlation coefficients to zero are shown in Table 1. The statistical test results are based on the Pearson product-moment correlation coefficient, or simply r. The asterisks in the table once again refer to the statement at the bottom of the table that p < .05. The statement would be read as "the p value is less than .05," which, in this case, indicates that the researcher was justified in stating that each correlation coefficient was significantly different from zero. More specifically, if there were no real association between the two sets of numbers, random chance would produce correlations this strong less than 5% of the time. For instance, the results in Table 1 show that, even though it is relatively low in magnitude, the correlation coefficient of .37 between Groups A and B of the English faculty raters differs from zero for reasons other than chance. The same is true for all of the other correlation coefficients in this table. Such is not always the case. Other studies may well find correlation coefficients that are not significant, indicating only chance differences from zero and no systematic association between the sets of numbers involved.
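The idea that chance "produces such strong correlations less than 5% of the time" can be made concrete with a permutation test, sketched below in Python. This is a swapped-in technique chosen for illustration; the studies summarized in Table 1 presumably used the usual parametric test of r, and the scores below are hypothetical. Shuffling one rater's scores destroys any real association between the two sets while keeping the scores themselves, so the share of shuffles that produce a correlation at least as strong as the observed one approximates the p value.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, trials=5000, seed=1):
    """Two-sided permutation p value: the share of random re-pairings of
    the scores whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Counting the observed pairing itself keeps p from ever being exactly 0.
    return (hits + 1) / (trials + 1)


# Hypothetical essay scores assigned by two raters to the same ten essays.
rater_a = [3, 4, 2, 5, 4, 3, 5, 2, 4, 3]
rater_b = [2, 4, 3, 5, 3, 3, 4, 2, 5, 2]
r = pearson_r(rater_a, rater_b)
p = permutation_p(rater_a, rater_b)
print(f"r = {r:.2f}, permutation p = {p:.3f}, significant at .05: {p < .05}")
```

A fixed seed makes the shuffles repeatable; with more trials the estimated p value settles down, but the logic of asking "how often would chance alone do this well?" stays the same.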
Significance versus meaningfulness. It is important to recognize that a statistically significant difference is just that, and no more. Significant differences, whether one is working with means, frequencies, or correlation coefficients, simply indicate that we have concluded that the observed differences are due to other than chance factors. In other words, the differences are systematic in some way. They do not indicate that the differences are necessarily interesting or meaningful. In fact, a difference can be statistically significant, yet be so small that it is not at all meaningful or interesting.
For instance, the correlation coefficient of .37 was found to be statistically significant at .05 (i.e., the correlation coefficient is probably different from 0.00 for other than chance reasons), but the meaningfulness of the relationship between the two sets of numbers is a separate issue. In this case, the numbers are scores assigned by two raters, and the low correlation coefficient indicates that there was some association between the two sets of scores but that there are other important factors that are still not accounted for. The weakness of agreement found here is worrisome because it indicates that the scores may not be very reliable. Thus, this is an example of a correlation coefficient that is statistically significant, but not very meaningful in magnitude.
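Two quick calculations show why a significant r of .37 can still be weak. Squaring r gives the proportion of variance the two sets of scores have in common, so r = .37 means the raters' scores overlap by only about 14%. Meanwhile, the usual t statistic for testing r against zero, t = r√(n − 2) / √(1 − r²) with df = n − 2, grows with the number of scores n while r² does not change. The sketch below assumes that standard formula; the sample sizes in it are hypothetical.

```python
import math


def shared_variance(r):
    """Proportion of variance two sets of scores have in common (r squared)."""
    return r ** 2


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.37
print(f"shared variance: {shared_variance(r):.1%}")  # about 13.7%
for n in (20, 30, 100, 500):  # hypothetical numbers of essays rated
    print(f"n = {n:>3}: t = {t_for_r(r, n):.2f} with df = {n - 2}")
```

For example, with n = 20 the t of about 1.69 falls short of the two-tailed .05 critical value of 2.101 for df = 18, whereas with n = 30 a t of about 2.11 exceeds the critical value of 2.048 for df = 28. The 13.7% overlap is the same in both cases.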
Similarly, if two means are statistically different at p < .05, yet only differ by two points out of 100, then the result might not be at
all meaningful. Likewise, if a set of observed frequencies differs from the expected frequencies at p < .01, yet differs only to an uninteresting degree, the results may not be meaningful. Thus, it is always important to examine the descriptive statistics in any study and to think about any statistical tests in terms of those descriptive statistics so that you can determine whether any significant differences are also meaningful.
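The two-points-out-of-100 case can be checked with a few lines of Python. The sketch below assumes two groups of equal size with a common standard deviation, and it uses a large-sample normal (z) approximation rather than an exact t test; the means, standard deviation, and sample sizes are all hypothetical. Alongside the test it reports Cohen's d, the mean difference expressed in standard-deviation units, as one descriptive measure of how large the difference actually is.

```python
import math
from statistics import NormalDist


def compare_means(mean_a, mean_b, sd, n_per_group):
    """Return (z, two-sided p, Cohen's d) for two equal-sized groups that
    share the same standard deviation, using a normal approximation."""
    difference = mean_a - mean_b
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = difference / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    d = difference / sd  # size of the difference in SD units; ignores n
    return z, p, d


# Hypothetical test scores out of 100: a 2-point difference, SD = 10.
for n in (20, 500):
    z, p, d = compare_means(52, 50, 10, n)
    print(f"n = {n:>3} per group: z = {z:.2f}, p = {p:.4f}, d = {d:.2f}")
```

Only the sample size changes between the two lines, yet the p value drops from about .53 to well under .05 while d stays at 0.20. Whether a 2-point difference matters is a question about the teaching situation, not about p.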
The important thing to remember, then, is that meaningfulness is a separate issue from statistical significance and that meaningfulness will depend on all of the factors involved in the situation in which the study was conducted. When reading a statistical study, you might want to check that the researcher has kept the two issues of significance (i.e., can we rule out chance?) and meaningfulness (i.e., is the difference large enough to be interesting?) separate and has interpreted each of them clearly.
It is important to remember that statistical studies are no more likely to be infallible than any other form of argumentation. Authors make errors, and computers make errors. However, if a study is properly carried out and the results are adequately described and systematically explained, such studies can help us to view the important issues in our field in new and useful ways.

Evaluate What You Have Read in Relation to Your Professional Experience
So far, you have used the abstract to decide if you wanted to read the study, used the organization of the paper to help you understand how the study was conducted, and used some basic concepts to interpret the statistical reasoning of a study. You are probably now at a point where it makes sense to pull away from the study a bit and think about it more critically. There are six types of questions that may prove useful in thinking about the article after having read it. These questions will enable you to know, comprehend, analyze, apply, synthesize, and evaluate what you have read. (These six categories are taken from Bloom's [1956] taxonomy. Note that they are presented here in a slightly different order from the original.) In short, after reading an article, try to recall basic information about the article by asking yourself Questions 1-3 below; then try to relate the article to your professional life by asking yourself Questions 4-6:
1. Know: Who wrote the article? When? In what journal? (Useful for identifying the study when referring to it)
2. Comprehend: In a sentence, what was the article about? (Useful for briefly summarizing the study)
3. Analyze: What sections was the article divided into? (Useful for recalling the overall structure of the study)
4. Apply: How can you apply what you learned in the article to your professional EFL/ESL teaching situation? (Useful for determining whether the article is applicable to your teaching experience)
5. Synthesize: How does the article relate to other professional books or papers that you have read? (Useful for seeing how the study fits into the professional literature)
6. Evaluate: How good was the quality of the article internally (in terms of style, organization, reasoning, etc.)? How good was it externally (i.e., in terms of everything else you know about the profession)? (Useful for evaluating the overall quality of the article)
Going through these questions (or similar ones) will help you to remember which article you read, comprehend its essential message, analyze the constituent parts of the article, apply what was learned in the article to your professional situation, synthesize what you found in the article with other points of view in the profession, and evaluate the quality of the article (both internally and externally).

Learn More about Statistics and Research Design
Having gone this far in the process of understanding statistical studies, you may now be intrigued by the prospect of learning more. For instance, you may have heard about ANOVAs, regression analyses, factor analyses, and other analyses not directly covered in this article. It is only by learning more that you will be able to understand some of these more complex analyses. In fact, it is only by learning more that you will be able to decide whether the author of a given study chose the correct statistical tests at all, or whether the assumptions that are required for many statistical tests were met.
There are a number of ways to learn more about statistics and research design. In addition to Part 2 of this discussion, there are books specifically designed to help language teachers do statistical research: Butler (1985), Hatch and Farhady (1982), Hatch and Lazaraton (1991), Seliger and Shohamy (1989), Woods, Fletcher, and Hughes (1986), etc. Another book, Brown (1988), is designed to help readers who are only interested in reading (rather than doing) statistical research. If the topics that interest you are more closely related to
the statistics and research in the area of language testing, it may be more appropriate to read references such as Bachman (1990) and Henning (1987). If you have no idea which book to choose, it might be useful to read Hamp-Lyons' (1989, 1990) book reviews, which describe a number of the volumes listed above.
I am not advocating that every EFL/ESL teacher read and internalize all of the knowledge in these books. However, I am suggesting that a number of strategies are available to teachers: (a) for some teachers, a thorough reading of one or two of the books listed above may be just what is needed; (b) for other teachers, it may prove useful to use several of the books listed above as references to explore topics in more depth as the need arises in reading statistical studies; (c) still other teachers may be more comfortable with the structure provided by taking an organized course in basic research design and statistics at a local college or university.
Regardless of the strategy that is used, learning more about statistical research can help not only in understanding the statistical studies in the professional literature but also in grappling with the research that is reported in the lay media, much of which is done in the same statistical research paradigm that is used in our field. (For an excellent and easy-to-read treatment of how numbers, figures, and tables are used to fool the general public, you may want to read a book appropriately titled How to Lie with Statistics, Huff & Geis, 1954. Yes, it is still in print.) Armed with such knowledge, teachers can then defend themselves against the misuse of numbers and understand the reasoning that surrounds their use.

CONCLUSION
This article set out to provide attack strategies for EFL/ESL teachers to use in gaining access to statistical studies. These strategies include using the abstract and conventional organization of statistical papers to guide reading, examining the statistical reasoning, critically evaluating what the results signify to each reader, and learning more about statistical studies. There are a number of reasons why I hope that some readers will find these suggestions useful. First, if the studies that appear in the TESOL Quarterly have a larger informed readership, such studies will have greater impact on the field. All of us must use all available information about language learning and teaching to improve the ways that we serve our EFL and ESL students. Second, it is only by having an informed readership that the quality of the statistical studies in the TESOL Quarterly can be assured. Though the review process for selection of articles is thorough and fair, there are no guarantees that the
articles that appear in print are 100% correct or uncontroversial. It is therefore our responsibility to read any articles that interest us as carefully and critically as we can so that the interface between teaching and research can be strengthened.

ACKNOWLEDGMENTS
I would like to thank Kathleen Bailey, Graham Crookes, Thom Hudson, Andrew F. Seigel, and Ann Wennerstrom for their insightful comments and suggestions on an earlier version of this paper.

THE AUTHOR
J. D. Brown is on the faculty of the Department of ESL at the University of Hawaii at Manoa. He has published numerous articles on language testing and curriculum development, and a book on critically reading statistical studies (Understanding Research in Second Language Learning, 1988, Cambridge University Press).

REFERENCES
American Psychological Association. (1983). Publication manual of the American Psychological Association. Washington, DC: Author.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bloom, B. (Ed.). (1956). Taxonomy of educational objectives: Handbook 1. Cognitive domain. London: Longman.
Brown, J. D. (1988). Understanding research in second language learning: A teacher's guide to statistics and research design. Cambridge: Cambridge University Press.
Butler, C. (1985). Statistics in linguistics. Oxford: Blackwell.
Hamp-Lyons, L. (1989). Recent publications on statistics, language testing, and quantitative research methods: I. TESOL Quarterly, 23(1), pp. 127-132.
Hamp-Lyons, L. (1990). Recent publications on statistics, language testing, and quantitative research methods: II. TESOL Quarterly, 24(2), pp. 293-300.
Hatch, E., & Farhady, H. (1982). Research design and statistics for applied linguistics. Rowley, MA: Newbury House.
Hatch, E., & Lazaraton, A. (1991). The research manual: Design and statistics for applied linguistics. Rowley, MA: Newbury House.
Henning, G. (1987). A guide to language testing. Rowley, MA: Newbury House.
Huff, D., & Geis, I. (1954). How to lie with statistics. New York: Norton.
Seliger, H. W., & Shohamy, E. (1989). Second language research methods. Oxford: Oxford University Press.
Woods, A., Fletcher, P., & Hughes, A. (1986). Statistics in language studies. Cambridge: Cambridge University Press.