The purpose of this study was to empirically examine the dimensionality of language ability for young children (4–8 years) from prekindergarten to third grade (n = 915), theorizing that measures of vocabulary and grammar ability will represent a unitary trait across these ages, and to determine whether discourse skills represent an additional source of variance in language ability. Results demonstrated emergent dimensionality of language across development with distinct factors of vocabulary, grammar, and discourse skills by third grade, confirming that discourse skills are an important source of variance in children's language ability and represent an important additional dimension to be accounted for in studying growth in language skills over the course of childhood.
This paper was prepared by a Task Force of the Language and Reading Research Consortium (LARRC) consisting of Jill Pentimonti (Convener), Ann O'Connell, Laura Justice, and Kate Cain. LARRC project sites and investigators are as follows: Ohio State University (Columbus, OH)—Laura M. Justice (Site PI), Richard Lomax, Ann O'Connell, Jill Pentimonti, Stephen A. Petrill, and Shayne B. Piasta; Arizona State University (Tempe, AZ)—Shelley Gray (Site PI) and Maria Adelaida Restrepo; Lancaster University (Lancaster, UK)—Kate Cain (Site PI); University of Kansas (Lawrence, KS)—Hugh Catts (Site PI), Mindy Bridges, and Diane Nielsen; University of Nebraska-Lincoln (Lincoln, NE)—Tiffany Hogan (Site PI), Jim Bovaird, and J. Ron Nelson. Jill Pentimonti is now at American Institutes for Research. Stephen A. Petrill was a LARRC coinvestigator from 2010 to 2013. Hugh Catts is now at Florida State University. Tiffany Hogan is now at MGH Institute of Health Professions. J. Ron Nelson was a LARRC coinvestigator from 2010 to 2012.

This work was supported by Grant R305F100002 of the Institute of Education Sciences' Reading for Understanding Initiative. We are grateful to the numerous staff members, research associates, school administrators, teachers, children, and families who participated. Key personnel at study sites include: Garey Berry, Beau Bevens, Jennifer Bostic, Shara Brinkley, Lori Chleborad, Dawn Davis, Jaclyn Dynia, Michel Eltschinger, Tamarine Foreman, Rashaun Geter, Sara Gilliam, Miki Herman, Jaime Kubik, Trudy Kuo, Gustavo Lujan, Carol Mesa, Denise Meyer, Maria Moratto, Kimberly Murphy, Marcie Mutters, Trevor Rey, and Stephanie Williams.

The views presented in this work do not represent those of the federal government, nor do they endorse any products or findings presented herein.

Correspondence concerning this work should be sent to Jill Pentimonti, 1000 Thomas Jefferson St., Washington, DC 20007. Electronic mail may be sent to jpentimonti@air.org.

© 2015 The Authors. Child Development published by Wiley Periodicals, Inc. on behalf of Society for Research in Child Development. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. All rights reserved. 0009-3920/2015/8606-0020. DOI: 10.1111/cdev.12450

Scholars within a number of disciplines (e.g., speech–hearing sciences, developmental psychology, and linguistics) tend to view language ability as a complex system consisting of multiple dimensions. As a result, textbooks and discussion on language development and disorders are often organized by different dimensions of language (e.g., Bishop, 1997; Schwartz, 2009). The presumed multidimensionality of language ability has practical import as well. For instance, the diagnosis of disordered language in children is based on evidence of symptoms (delays or deficits) across one or more of the dimensions of vocabulary, grammar, narrative, discourse, or pragmatics (American Psychiatric Association, 2012). Such practices further reinforce the perspective that language ability comprises multiple dimensions and that these dimensions represent independent abilities. Furthermore, it has been shown that early weaknesses in vocabulary, grammar, and discourse comprehension and production are related to later reading comprehension difficulties (Catts, Fey, Zhang, & Tomblin, 1999; Nation, Cocksey, Taylor, & Bishop, 2010). Thus, our understanding of the structure of language extends beyond oral language to literacy and educational attainment.

A premise relevant to the present study is that such conceptualizations of language ability are largely theoretical in nature. To date, there has been limited empirical assessment of the extent to which the various theorized dimensions of language do indeed represent latent abilities at a given point in a child's development. Importantly, there is limited empirical justification for measuring language ability (and diagnosing language disorders) as a set of distinct psychological dimensions (using subtests for interpretation) or as a unitary psychological trait (using the composite for interpretation; see Tomblin
Dimensionality of Language 1949
& Zhang, 2006). This study empirically examined the dimensionality of language ability for consecutive age groups from 4 to 8 years. Using a cross-sectional sample and measures of the language dimensions of vocabulary, grammar, and discourse, study outcomes help to advance our fundamental understanding of the nature of language ability among children and inform practices that depend on this understanding.

Furthermore, grammatical development depends on a critical lexical base, and different phases of grammar development are linked to different phases of lexical growth (Marchman & Bates, 1994). These two strands of evidence lead some to argue that the apparent dimensionality of different language systems might be an emergent, rather than innate, property of a developing language system (Elsabbagh & Karmiloff-Smith, 2006).
Table 1
Descriptive Statistics (M, SD) by Measure and Grade

[Table body not recoverable from this copy; columns report means and standard deviations by measure for PK, K, G1, G2, and G3.]

Note. Word Classes 1 was administered to children in PK to G2 (scores ranged from 0 to 21 for the receptive subtests and from 0 to 18 for the expressive subtest) and Word Classes 2 was administered to children in G3 (scores ranged from 0 to 19 for the receptive subtests and from 0 to 13 for the expressive subtest). Standard scores from the PPVT and EVT are available from the first author. EVT = Expressive Vocabulary Test; PK = prekindergarten; K = kindergarten; G1 = Grade 1; G2 = Grade 2; G3 = Grade 3; CELF = Clinical Evaluation of Language Fundamentals; PPVT = Peabody Picture Vocabulary Test; TEGI = Test of Early Grammatical Impairment; TEGS = third-person singular probe; TRG = Test for Reception of Grammar; TROG = Test for Reception of Grammar–Version 2. aPostscored measure.
Peabody Picture Vocabulary Test. The Peabody Picture Vocabulary Test, 4th ed. (Dunn & Dunn, 2007), specifically Form A, assessed children's comprehension of single words. The assessor asked the child to point to the one picture of four that matched a verbally presented stimulus word. The total raw score was utilized in analyses.

Expressive Vocabulary Test. The Expressive Vocabulary Test, 2nd ed. (Williams, 2007), Form A, assessed children's productive vocabularies. The assessor asked the child to provide a single word label for a picture or to provide a single word synonym for a target word. The total raw score was utilized in analyses.

Word Classes subtest (CFWCr and CFWCe). The Word Classes subtests of the CELF, 4th ed. (Semel, Wiig, & Secord, 2003) assessed children's abilities to understand relationships among words related to semantic class features and produce these similarities and differences verbally. Word Classes 1 was administered to children in PK through G2, and Word Classes 2 was administered to children in G3. To administer this task, the assessor read a set of words, which were also represented pictorially for the younger children assessed with Word Classes 1. To assess comprehension of words, the child was asked to select the two words that went together. For the productive portion, the child was asked to verbalize how the two words were related. The total number of correct responses on both portions was tallied to compute the raw score utilized in analyses.

Grammar

Six standardized measures of grammar were administered. Different measures were appropriate for different age ranges; thus, not all grade cohorts received the same measures, as shown in Table 1. Among the grammar measures utilized across all grades, all sample-specific reliabilities exceeded .70 with the exception of the CELF Word Structure subtest for the G3 sample (with α = .63).

Word Structure subtest (CFWS). The Word Structure subtest of the CELF (Semel et al., 2003) assessed children's abilities to apply word structure rules to indicate inflections, derivations, and comparison and to select appropriate pronouns to refer to people, objects, and possessive relationships. The child listened to a model sentence and then
supplied a missing word, using appropriate inflectional morphology, in a second sentence that mirrored the first. All children completed this subtest, although a stop rule of eight incorrect responses was utilized for PK children. The total number of correct responses was tallied as the raw score.

Recalling Sentences subtest (CFRS). The Recalling Sentences subtest of the CELF (Semel et al., 2003) assessed children's abilities to listen to spoken sentences of increasing length and complexity and repeat these sentences without changing word meanings, inflections, derivations or comparisons, or sentence structure. All children completed this subtest, although two items were added to better accommodate PK children. Specifically, the first two items from the Recalling Sentences subtest of the CELF: Preschool, II edition (Wiig, Secord, & Semel, 2004) were administered as the first two test items, followed by the test items on the Recalling Sentences subtest of the CELF in the designated order. For each item, the child was asked to listen to a sentence read by the assessor and to repeat it verbatim. Total points were summed to create the raw scores utilized in analyses.

Past tense probe (TEGT). The past tense probe of the Rice/Wexler Test of Early Grammatical Impairment (TEGI; Rice & Wexler, 2001) assessed children's production of regular and irregular past tense verbs, and was administered to PK and K children only. The child was presented with a picture and verbal description of a boy or girl completing an action using the present participle verb form. The child was then asked to respond as to what the boy or girl did, requiring use of the past tense form of the verb. The total raw score was utilized in analyses.

Third-person singular probe. The third-person singular probe of the TEGI (Rice & Wexler, 2001) assessed children's abilities to produce /-s/ or /-z/ in present tense verb forms with singular subjects. This probe was administered to PK and K children only. The child was presented with a picture and name of a person with a specific job (e.g., teacher) and asked what that person does (e.g., teaches). The total raw score was utilized in analyses.

Test for Reception of Grammar. The Test for Reception of Grammar–Version 2 (Bishop, 2003) assessed children's comprehension of English grammatical contrasts marked by inflections, function words, and word order. Test items are arranged in 20 blocks of four, each block assessing knowledge of the same grammatical contrast. For each item, the child was shown four pictures and the assessor read the accompanying sentence and asked the child to point to the picture for that sentence. For each sentence, one picture represented the target meaning and the other three pictures represented grammatical or lexical foils. The child was awarded 1 point if all item responses within a block were correct; the total number of correct blocks was used for analyses.

Morphological Derivation Task. A Morphological Derivation Task described by Wagner (n.d.) assessed children's knowledge of derivational morphology. Only children in G1–3 completed this measure. The assessor presented the child with a base word (e.g., farm) and an incomplete sentence for which he or she provided a derived form of the base word (e.g., My uncle is a ______). Responses were tallied to provide the raw score utilized in analyses.

Discourse

Five measures of discourse skills were administered to examine comprehension monitoring, text structure knowledge (specific to narratives), and inference. Measures given differed slightly by grade, as shown in Table 1. We provide reliability data for these researcher-designed measures.

Comprehension monitoring—Knowledge Violations Task. A researcher-developed measure based on work of Cain and Oakhill (Cain & Oakhill, 2006; Oakhill & Cain, 2012) was used to assess comprehension monitoring in PK and K children. The child listened to short stories that were either entirely consistent or included inconsistent information, as in a story about a rabbit in a vegetable patch who "especially liked the ice cream that grew there." After each story, the child was asked whether the story made sense and, if not, what was wrong with the story. The task included five practice stories and seven test stories, five of which included inconsistent information. Children received 1 point for each inconsistent story for which they correctly answered the sense question and correctly identified the incorrect information in the story. These points were tallied to create the raw score utilized in analyses (possible range = 0–10). Internal consistency measures (Cronbach's alphas) in the present sample ranged from .80 to .81 across grades.

Comprehension monitoring—Detecting Inconsistencies Task (DI). A researcher-developed measure based on the work of Cain and Oakhill (Cain & Oakhill, 2006; Oakhill & Cain, 2012) was used to assess comprehension monitoring for children in G1–G3. The tasks included 5 practice stories and 12 test stories that were either entirely consistent or
included inconsistent information; 8 test stories were inconsistent. Similar to the Knowledge Violations Task (KVT), the child listened to short stories and was asked whether the story makes sense and, if not, what was wrong with the story. The task was more difficult than the KVT, because identification of inconsistent information required integration of two elements across two separate sentences (e.g., On her way she lost her purse. When she got to the store, she took out her purse and bought her favorite candy). Children received a point for each inconsistent story for which they correctly identified the incorrect information within the story. These points were tallied to create the raw score utilized in analyses (possible range = 0–8). Internal consistency measures (Cronbach's alphas) in the present sample were .70 for G1, .73 for G2, and .54 for G3.

Narrative structure—Picture Arrangement Task. The first 13 items of the Picture Arrangement Test from the Wechsler Intelligence Scale for Children, 3rd ed. (Wechsler & Kort, 2005) were adapted to create a measure that assessed children's text structure knowledge, specific to ordering narrative events into a causally and temporally coherent sequence. This task was administered to children in PK through G2. The child was told that she would view some pictures that tell a story, but the story is out of order. The assessor then showed the child a set of three to five picture cards in a fixed order and read a sentence that described each. The child was asked to rearrange the pictures to put them in the correct sequence. One practice item and 12 test items were administered; a ceiling rule of five incorrect stories was applied. Internal consistency measures (Cronbach's alphas) in the present sample ranged from .76 to .88 across grades.

Narrative structure—Sentence Arrangement Task. A researcher-developed measure based on work by Oakhill and Cain (2012) and Stein and Glenn (1982) assessed older children's text structure knowledge, specific to ordering narrative events into a causally and temporally coherent sequence. This measure was administered to children in G2 and G3. The child was told that she would read some sentences that tell a story, but the story is out of order. The assessor then showed the child a set of 6–12 cards, with one sentence typed on each card, in a fixed order and read each sentence aloud to the child. The child was asked to rearrange the sentences to put them in the correct sequence. This measure consists of one practice story and four test stories. Internal consistency measures (Cronbach's alphas) in the present sample were .67–.69.

Inference (InfBK and InfInt). A researcher-developed measure based on work by Cain and Oakhill (2006) and Oakhill and Cain (2012) was used to assess children's ability to generate two types of inferences from short narrative texts: inferences that require integration of two premises, and inferences that require integration of information in the text with background knowledge to fill in missing details. Following administration of a practice story, children listened to two stories, after which the assessor asked eight questions, reflecting four questions per inference type. The total number of correct responses for the two types of inference were summed and averaged to produce two scores, one for background knowledge and one for integration (possible range = 0–2 for each). Internal consistency measures (Cronbach's alphas) for the averaged score across all items in the present sample ranged from .64 to .78 across grades.

Results

Means and standard deviations for the primary study measures are shown in Table 1. As expected, mean scores tended to increase across grades. Through inspection of skewness and kurtosis criteria, histograms, and boxplots of the data, the majority of variables showed adequate distribution with no severe departures from normality. Furthermore, no extreme outliers were identified within the data across any of the grades.

Two interests were addressed in the primary analyses. The first interest was to examine dimensionality of language ability with respect to vocabulary, grammar, and discourse skills as a unitary construct or as two distinct dimensions consisting of lower level language (LLL) skills (vocabulary and grammar) versus discourse skills. The second interest was to examine if discourse, vocabulary, and grammar skills represent three distinct dimensions of language ability. Measures indicating each of the three language dimensions assessed are identified in Table 1. Both interests were addressed simultaneously by examining three models to determine the best conceptualization of language dimensionality across the grades PK to G3. Specifically, we fitted a taxonomy of latent variable models allowing for competing configurations of dimensionality. These confirmatory factor analyses (CFA) were conducted in Mplus v7.11 (Muthén & Muthén, 1993–2012) using the clustering option and maximum likelihood estimation with robust standard errors to adjust for nonindependence within classrooms and slight non-normality of the data. Three models were compared at each grade for quality of fit: a unidimensional model of language, a two-dimensional model differentiating a LLL factor (vocabulary and grammar) from a factor representing discourse, and a three-dimensional model separating vocabulary, grammar, and discourse ability. Each lower dimension model can be obtained from the higher dimension models by fixing the corresponding factor correlations to 1.0; as such, these three models form a set of nested models and relative model fit can be assessed through the chi-square difference test. In each confirmatory model we allowed for methods effects for measures from the same instrument (e.g., for both the comprehension and production parts of the Word Classes, and also for the One-Word Picture Vocabulary Tests), but no other modifications were made. A graphical depiction of these confirmatory models for the PK data can be found in Figure 1; similar models were constructed for corresponding measures in K and each of G1 through G3.

Recent recommendations on model fit and assessment of model quality have called for a consideration of several fit criteria rather than strict reliance on any single criterion, and this is the strategy through which we evaluated our models (e.g., Hancock & Mueller, 2010, 2013; Hu & Bentler, 1995, 1999; Kline, 2013; Lomax, 2013; McCoach, Black, & O'Connell, 2007; Moran, Marsh, & Nagengast, 2013; Tomarken & Waller, 2003, 2005). Correspondingly, we used several approaches in our assessment of language dimensionality for each grade. First, we considered a combination of absolute, parsimony correction, and comparative indices of model fit (Brown, 2006; Byrne, 2012). These included the root mean square error of approximation (RMSEA), which ranges from 0 to 1; values < 0.08 or, more conservatively, 0.05, suggest acceptable model fit (Browne & Cudeck, 1993; MacCallum & Austin, 2000). We report 90% confidence intervals for the RMSEA and results of the closeness of fit test (Browne & Cudeck, 1993), which tests the null hypothesis that RMSEA ≤ 0.05; this test should be ns. We also used the comparative fit index (CFI), an incremental measure that compares the fit of the theoretical model to a null model which assumes the indicator variables in the model are uncorrelated (i.e., an independence model). CFI values > 0.90 indicate a good fitting model (Hu & Bentler, 1999; Lomax, 2013; Schumacker & Lomax, 2004). Comparison of nested models based on the CFI can provide reasonable support for the more parsimonious model if the incremental change in CFI < 0.01 (Moran et al., 2013). The standardized root mean square residual (SRMR) is an absolute measure of model fit and is the standardized difference between the observed and predicted correlations; a model that fits well to the data should have an SRMR < 0.05 (Byrne, 2012). Model fit was also evaluated using Akaike's information criterion (AIC), a log-likelihood measure of fit useful for comparing competing (typically non-nested) models and for which smaller AICs indicate better fit (Kline, 2013).
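The nested taxonomy described above, in which each lower dimension model is obtained by fixing factor correlations to 1.0, can be sketched in lavaan-style model syntax (as consumed by, e.g., the Python package semopy). The indicator names below are hypothetical placeholders, not the study's variable names, and the sketch assumes factor variances are fixed to 1 so that fixed covariances act as fixed correlations:

```python
# Lavaan-style specifications for the three competing CFA models.
# Indicator names (ppvt, evt, ...) are hypothetical stand-ins.
three_dim = """
V =~ ppvt + evt + wc_recep + wc_expr
G =~ word_struct + recall_sent + trog
D =~ kvt + pic_arrange + inf_bk + inf_int
"""

# Two-dimensional model: vocabulary and grammar collapse into one
# lower-level-language factor, which is equivalent to fixing their
# correlation to 1.0 (with standardized latent variables).
two_dim = three_dim + "V ~~ 1*G\n"

# Unidimensional model: additionally fix the correlations with
# discourse to 1.0, collapsing everything onto a single factor.
uni_dim = two_dim + "V ~~ 1*D\nG ~~ 1*D\n"
```

Because each model differs from the next only by correlations fixed at 1.0, the three specifications form a nested set, which is what licenses the scaled chi-square difference tests reported in Table 2.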
Figure 1. Graphical depiction of confirmatory models for the prekindergarten data. L = language; LLL = lower level language;
D = discourse; G = grammar; V = vocabulary. The figure is an exemplar of confirmatory models, as some measures change across
grade (see Table 1 for list of measures).
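As a concrete illustration of the fit indices just described, the RMSEA point estimate can be recovered from a model's chi-square, degrees of freedom, and sample size. A minimal sketch using the conventional formula sqrt(max(χ² − df, 0) / (df(N − 1))), which reproduces the prekindergarten values in Table 2 to three decimals:

```python
import math

def rmsea(chi_sq, df, n):
    """Point estimate of the root mean square error of approximation."""
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))

# Prekindergarten models (n = 420), chi-square and df from Table 2:
print(round(rmsea(170.84, 62, 420), 3))  # unidimensional -> 0.065
print(round(rmsea(159.70, 61, 420), 3))  # two-dimensional -> 0.062
print(round(rmsea(157.40, 59, 420), 3))  # three-dimensional -> 0.063
```

The max(·, 0) term is why a model whose chi-square falls below its degrees of freedom (as in the kindergarten sample) yields an RMSEA of exactly 0.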
In addition to examination of the model fit indices, we used the Satorra–Bentler rescaled chi-square difference test for statistically comparing nested models (S–B χ²; Satorra & Bentler, 1994). If the scaled chi-square difference is not statistically significant, the more parsimonious (i.e., the nested or more restrictive) model is retained as having model fit no worse than the more complex model. If the difference is statistically significant, we are not able to retain the more parsimonious model.

In evaluating model quality, we note that descriptive fit statistics as well as results of statistical comparisons of nested models are susceptible to sample size and other design features (McCoach et al., 2007; Tomarken & Waller, 2005). Thus, as advocated in the literature (e.g., Lomax, 2013; Mueller & Hancock, 2010), we viewed the collection of model fit indices as a guide rather than relying on a single indicator or test as indicative of the "best" representation of language. We also considered substantive interpretation of the CFAs, particularly regarding discrimination between latent factors for models with increasing dimensionality.

Accordingly, in our final approach to examining support for increasing dimensionality across grades, we considered the extent to which the latent factors in the multidimensional models represented distinct components of language. For each competing model by grade, we reviewed the factor correlations. We also calculated the amount of variance in a measure that is explained by a factor by squaring the standardized factor loadings and then averaged these to estimate the average variance extracted (AVE; Hair, Black, Babin, & Anderson, 2010; Netemeyer, Bearden, & Sharma, 2003). Comparing the degree to which measure variance is captured by a factor (AVE) relative to the shared variance between constructs (squared factor correlations [SFCs]) for the two- and three-dimensional models provides information about how well the constructs are distinct within each model. If the proportion of variance extracted by a factor is less than the proportion of variance shared between those factors, there is little support for discrimination between the factors (Fornell & Larcker, 1981; Hair et al., 2010; Netemeyer et al., 2003). Consequently, our decisions on the best dimensional representation of language for each grade were based on the overall constellation of model fit criteria, consideration of discriminability between latent factors, and tests of nested model comparisons.

Model fit statistics and results for the adjusted chi-square comparison tests across models within each grade are provided in Table 2. Table 3 contains information on the AVE for the set of measures theoretically presumed to relate to each factor in the models, as well as the SFCs for the two- and three-dimensional models. In what follows, we present results for each of the five grades examined in this study. We provide the greatest depth of interpretation for the PK results, as details provided therein can then be extrapolated for the other grades. Construct reliability (coefficient H) for the final model we selected within each grade is also reported (Hancock & Mueller, 2001).

Prekindergarten: Dimensionality of Language Ability

Model Fit Criteria

The constellation of fit statistics (see Table 2) was very similar across the three models of dimensionality being compared. For example, RMSEA was lowest for the two-dimensional model at 0.062, but the confidence intervals for all three models were similar, and the RMSEA estimates for the unidimensional and three-dimensional models were similar at 0.065 and 0.063, respectively. Corresponding tests for closeness of fit (i.e., RMSEA ≤ 0.05) were not significant for any of the three models, with p-close values of .019, .045, and .036, respectively. Nonstatistically significant values for the "closeness" test indicate acceptable fit, although Brown (2006) mentions that there is a tendency among methodologists to prefer a p-close value > .50 in order to claim that the RMSEA is sufficiently small. None of these models achieved that boundary. Among the three models for PK, the CFI and SRMR values were the same. The work of Moran et al. (2013) suggests that the more parsimonious of these models can reasonably be supported given the lack of incremental change in the CFI. Overall, however, this collection of fit indices did not point to unequivocal support for one model of language dimensionality at PK over another.

Nested Model Comparisons

Next, we examined the (adjusted) S–B χ² difference tests for each pair of models (see right-hand section of Table 2, column titled Adjusted Δχ²). This test assesses whether the fit of the simpler (lower dimensioned) model is appreciably worse than that of the more complex (higher dimensioned) model. Results indicated that the one-dimensional model provided slightly
Table 2
Results of Confirmatory Models by Grade

Model | χ² | df | p | Scaling | RMSEA (p-close) [90% CI] | CFI | SRMR | AIC | Adjusted Δχ² | Factor correlations

Prekindergarten (n = 420)
Unidimensional | 170.84 | 62 | < .001 | 1.02 | 0.065, p = .019 [0.053, 0.076] | 0.96 | 0.04 | 26,387.03 | — | —
Two-dimensional | 159.70 | 61 | < .001 | 1.02 | 0.062, p = .045 [0.050, 0.074] | 0.96 | 0.04 | 26,376.68 | 9.13a,** | rLLL-D = .91
Three-dimensional | 157.40 | 59 | < .001 | 1.02 | 0.063, p = .036 [0.051, 0.075] | 0.96 | 0.04 | 26,379.11 | 13.34b,**; 1.88c | rG-D = .90; rV-D = .94; rG-V = .99

Kindergarten (n = 124)
Unidimensional | 60.53 | 62 | .529 | 1.01 | 0.000, p = .939 [0.000, 0.052] | 1.00 | 0.04 | 7,727.97 | — | —
Two-dimensional | 59.73 | 61 | .522 | 1.00 | 0.000, p = .935 [0.000, 0.053] | 1.00 | 0.04 | 7,728.24 | 0.89a | rLLL-D = .95*
Three-dimensional | 56.58 | 59 | .565 | 1.00 | 0.000, p = .944 [0.000, 0.051] | 1.00 | 0.04 | 7,729.30 | 3.75b; 3.29c | rG-D = .92*; rV-D = .98*; rG-V = .98*

G1 (n = 125)
Unidimensional | 85.25 | 52 | .002 | 0.99 | 0.072, p = .101 [0.043, 0.098] | 0.96 | 0.05 | 6,844.67 | — | —
Two-dimensional | 71.86 | 51 | .029 | 1.01 | 0.057, p = .332 [0.018, 0.086] | 0.98 | 0.05 | 6,834.08 | 17.82a,** | rLLL-D = .85*
Three-dimensional | 68.03 | 49 | .037 | 1.00 | 0.056, p = .362 [0.014, 0.086] | 0.98 | 0.05 | 6,833.88 | 18.43b,**; 3.83c | rG-D = .82*; rV-D = .95*; rG-V = 1.06*,d

G2 (n = 123)
Unidimensional | 87.66 | 52 | .001 | 0.90 | 0.080, p = .073 [0.046, 0.101] | 0.95 | 0.06 | 6,489.01 | — | —
Two-dimensional | 68.75 | 51 | .049 | 0.90 | 0.053, p = .412 [0.003, 0.083] | 0.98 | 0.05 | 6,473.85 | 17.97a,** | rLLL-D = .80*
Three-dimensional | 67.21 | 49 | .043 | 0.90 | 0.055, p = .379 [0.010, 0.085] | 0.98 | 0.05 | 6,476.52 | 20.45b,**; 1.51c | rG-D = .79*; rV-D = .86*; rG-V = 1.03*,d

G3 (n = 122)
Unidimensional | 87.74 | 52 | .001 | 1.02 | 0.075, p = .071 [0.047, 0.102] | 0.94 | 0.06 | 6,484.43 | — | —
Two-dimensional | 79.33 | 51 | .010 | 1.02 | 0.067, p = .158 [0.036, 0.095] | 0.96 | 0.06 | 6,477.61 | 7.51a,** | rLLL-D = .78*
Three-dimensional | 71.33 | 49 | .020 | 0.99 | 0.061, p = .265 [0.025, 0.090] | 0.97 | 0.05 | 6,471.82 | 12.87b,**; 6.19c,* | rG-D = .80*; rV-D = .70*; rG-V = .90*

Note. MLR = robust maximum likelihood; RMSEA = root mean square error of approximation; CFI = comparative fit index; SRMR = standardized root mean square residual; AIC = Akaike's information criterion; LLL = lower level language.
aUnidimensional versus two-dimensional models (df = 1). bUnidimensional versus three-dimensional models (df = 3). cTwo-dimensional versus three-dimensional models (df = 2). dInadmissible parameter estimate.
*p < .05. **p < .01.
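The adjusted Δχ² column rests on the Satorra–Bentler scaled difference test, which combines the two models' scaled chi-squares with their scaling correction factors. A minimal sketch of that computation; note that the two-decimal scaling values shown in Table 2 are rounded, so plugging them in only approximates the published adjusted statistics:

```python
def sb_scaled_diff(t_nested, df_nested, c_nested, t_full, df_full, c_full):
    """Satorra-Bentler (2001) scaled chi-square difference test.
    t_*  : robust (scaled) chi-square statistics as reported by the software
    df_* : degrees of freedom
    c_*  : scaling correction factors
    Returns the scaled difference statistic and its degrees of freedom."""
    df_diff = df_nested - df_full
    # Scaling correction for the difference test
    cd = (df_nested * c_nested - df_full * c_full) / df_diff
    # Recover the uncorrected statistics, take their difference, rescale
    trd = (t_nested * c_nested - t_full * c_full) / cd
    return trd, df_diff

# PK unidimensional (nested) vs. two-dimensional model, Table 2 values:
trd, df_diff = sb_scaled_diff(170.84, 62, 1.02, 159.70, 61, 1.02)
# With identical rounded scaling factors this reduces to the raw
# difference (11.14 on 1 df); the published adjusted value of 9.13
# presumably reflects the unrounded corrections.
```

The resulting statistic is referred to a chi-square distribution on df_diff degrees of freedom (critical value 3.84 at α = .05 for 1 df), which is how the significance stars in Table 2 are assigned.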
worse fit relative to the two-dimensional and three-dimensional models, χ²(1) = 9.13, p < .01 and χ²(3) = 13.34, p < .01, respectively, and there was no statistical difference between the two- and three-dimensional models, χ²(2) = 1.88, p > .05. Thus, statistically speaking, the two-dimensional model would be supported. Given the known problems with distribution and sensitivity to sample size of the chi-square difference test (Tomarken & Waller, 2003), we next examined the factor correlations and discriminability for the multidimensional solutions.

Discrimination Between Factors

For the two-factor solution, the correlation between the constructs of LLL (consisting of grammar and vocabulary measures) and discourse skill (D) was very high at .91. Brown (2006) recommends factor correlations lower than .80 or .85 for good discrimination of latent constructs. Thus, discrimination between the two constructs can be characterized as poor, and it is not clear that a two-dimensional representation of language ability provided a substantive improvement over that of the unidimensional model. We found similar results for the factors in the three-dimensional model representing grammar (G), vocabulary (V), and discourse (D). All factor correlations were very high at .90 or larger for each pair of latent variables, suggesting that discrimination among these constructs was weak for the three-dimensional model as well.
Table 3
Grade-Specific Average Variance Extracted (AVE) by Each Factor, and Squared Factor Correlations (SFCs) for the One-, Two-, and Three-Dimensional Models of Language

         One-dim   Two-dimensional model               Three-dimensional model
Grade    AVE       AVE(LLL)  AVE(D)  SFC(LLL–D)       AVE(G)  AVE(V)  AVE(D)  SFC(G–D)  SFC(V–D)  SFC(G–V)
PK       45.4      48.9      43.2    83.4             47.4    50.7    43.2    80.6      87.6      99.6
K        48.7      52.6      43.6    90.6             52.4    54.6    43.7    83.9      95.5      95.3
G1       46.8      53.5      43.0    72.3             61.6    41.0    43.0    67.7      90.6      > 100.0a
G2       44.6      51.6      43.7    64.5             57.9    43.2    43.8    61.6      73.6      > 100.0a
G3       43.2      55.3      29.8    61.3             56.4    62.6    29.8    65.3      48.9      80.6

Note. PK = prekindergarten; K = kindergarten; G1 = Grade 1; G2 = Grade 2; G3 = Grade 3; LLL = lower level language; D = discourse; G = grammar; V = vocabulary.
aInadmissible solution.
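The quantities in Table 3, and the coefficient H construct reliabilities reported in the grade-level summaries, are all simple functions of the standardized loadings and factor correlations. A minimal sketch follows; the loading values are hypothetical, chosen only to illustrate the formulas from Fornell and Larcker (1981) and Hancock and Mueller (2001), not the article's estimates:

```python
def ave(loadings):
    """Average variance extracted (Fornell & Larcker, 1981):
    mean squared standardized loading, as a percentage."""
    return 100 * sum(l ** 2 for l in loadings) / len(loadings)

def sfc(r):
    """Squared factor correlation, as a percentage."""
    return 100 * r ** 2

def coefficient_h(loadings):
    """Construct reliability H (Hancock & Mueller, 2001):
    H = 1 / (1 + 1 / sum(l^2 / (1 - l^2)))."""
    s = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return 1 / (1 + 1 / s)

# Hypothetical standardized loadings for a single factor:
strong = [0.80, 0.75, 0.72, 0.70]

# Discriminant-validity check: a factor's AVE should exceed the
# variance it shares with other factors (the SFC); with r = .80
# between two factors, shared variance is already 64%:
print(ave(strong), sfc(0.80))

# Each indicator contributes a nonnegative term to H, so adding a
# weak indicator (e.g., loading .40) cannot lower construct reliability:
print(coefficient_h(strong), coefficient_h(strong + [0.40]))
```

The last line mirrors the point made in the Limitations discussion: unlike Cronbach's alpha, coefficient H is never diminished by adding a weak indicator to a set of strong ones.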
two-dimensional models is decreasing, and the amount of variance extracted by each factor (AVEs) in these models is somewhat increasing.

Summary: G1 Results

Considering the model fit criteria and the chi-square difference test, the two-dimensional model of language ability at G1 is preferred. Construct reliabilities are moderate to high; for LLL, H = .92, and for D, H = .76. There is considerable overlap between the constructs of LLL and D skills (r² = 72.3%), but despite this overlap we begin to see an emergence of language dimensionality specific to LLL and D skills at first grade.

G2: Dimensionality of Language Ability

Model Fit Criteria

Similar to the G1 results, the three-factor model can be discounted due to inadmissible values for one of the factor correlations (r(G–V) = 1.04 between the latent factors for G and V in the three-dimensional solution). Comparing across the uni- and two-dimensional results (Table 2), the model fit criteria were better for the two-dimensional model. The upper CI boundary on the RMSEA was slightly beyond 0.08, but all other criteria were within acceptable limits. Thus, based strictly on these criteria, there is support for the two-factor solution.

Nested Model Comparisons

A statistical comparison of the one- versus two-dimensional models of language (Table 2, far-right column) yielded preference for the two-factor solution, χ²(1) = 17.97, p < .01.

Discrimination Between Factors

As we saw with the G1 sample, the correlation between the two factors of LLL and D was high, r(LLL–D) = .80, but within an acceptable range for discriminability between the factors. Standardized factor loadings in all three models of language dimensionality (not shown) were statistically significant, and the majority of loadings in each CFA were > 0.70. Two measures had loadings below 0.50 in the uni- and two-dimensional models, and only one was below this criterion in the three-dimensional model. The AVE results (Table 3) indicate that the AVEs for each of the two factors in the two-dimensional model (51.6% and 43.7%, respectively) were lower than the variance shared between the two factors (64.5%). This discrepancy was not what we would desire for distinct constructs of LLL versus D language skills, but there was continuing evidence across grades that the amount of shared variance (SFCs) in the two-dimensional models was gradually decreasing and the strength of these language skills as independent dimensions of language (AVE) was improving.

Summary: G2 Results

Based on the model fit criteria and the chi-square difference test, the two-dimensional model best represents language ability at G2. Construct reliabilities were moderate to high; for LLL, H = .92, and for D, H = .77. We note again the considerable overlap between the constructs of LLL and D skills (r² = 64.5%) at second grade, but continue to detect evidence for the emergence of differentiation between these two constructs.

G3: Dimensionality of Language Ability

Model Fit Criteria

The final section of Table 2 contains model fit statistics for the three confirmatory factor models for the G3 sample. The three-dimensional solution is interpretable here, with admissible parameter estimates for the factor correlations. All of the model fit criteria were best for the three-dimensional solution; we note, however, that the upper bound on the RMSEA confidence interval was outside the recommended 0.08 cutoff.

Nested Model Comparisons

The two-dimensional and three-dimensional models yielded better fit compared to the one-dimensional model, χ²(1) = 7.51, p < .01 and χ²(3) = 12.87, p < .01, respectively (see Table 2), but there was also a statistical difference between the two- and three-dimensional models, χ²(2) = 6.19, p < .05. Based on these results, the representation of language in the three-dimensional model that separates G, V, and D language skills fit the data significantly better than either lower order solution.

Discrimination Between Factors

Factor correlations, however, were high in both the two- and three-dimensional solutions, suggesting that while there may be a statistical preference
Dimensionality of Language 1961
for a three-dimensional model that isolates the language skills of G, V, and D, there were also relatively strong correlations between these constructs. In particular, the strongest correlation in this set was between the constructs of G and V in the three-dimensional model (r(G–V) = .90), formerly part of the same construct in the two-dimensional models supported in G1 and G2. Standardized factor loadings in all three models of language dimensionality (not shown) were statistically significant, and the majority of loadings in each CFA were > 0.70. Two measures had loadings below 0.50 in the unidimensional model, and only one was below this criterion in the two- and three-dimensional models. The AVE and SFC results provided in Table 3 indicate that the AVEs for the three factors in the three-dimensional model (56.4%, 62.6%, and 29.8%, respectively) were somewhat close to, but did not exceed, the variance shared between the three factors; there was also a very large 80.6% estimate of variance shared between the latent variables of G and V in the three-dimensional model. Thus, despite statistical preference for the three-dimensional solution, overlap among at least two factors is large. However, in the three-factor composition these three separate components of language skill (namely, G, V, and D) seem to be emerging as independent dimensions of language in older children. We note, however, that the average variance explained by D in the three-dimensional model is lower here than was obtained in the previous grades, suggesting that one or more of the indicators of D may not adequately represent this construct in G3.

Summary: G3 Results

Based on the model fit criteria and the chi-square difference test, the three-dimensional model was preferred for third grade, with a cautionary note on the overlap, particularly between the G and V factors, in the three-dimensional model. Construct reliabilities were moderate; for G, H = .85; for V, H = .87; and for D, H = .65. In G3, differentiation of G and V as skills separate from D ability appeared to occur, although these skills were correlated.

Discussion

This examination of the dimensionality of language ability in 4- to 8-year-old children significantly extends our understanding of language development during this critical period of early childhood in several important ways. First, we found that when discourse skills were added to the study of language skills, there was some ambiguity in the identification of dimensionality among the youngest children in our study (children in prekindergarten). By kindergarten, our data showed that the constructs we were investigating, namely grammar, vocabulary, and discourse, formed a single construct. Second, we demonstrate a potentially emergent dimensional structure of language, resulting in the separation of vocabulary and grammar, which hang together, from discourse at first and second grades, followed by differentiation of vocabulary, grammar, and discourse by 8 years of age.

Concerning the first major finding, this work built upon an earlier examination of the dimensionality of language in vocabulary and grammar (see Tomblin & Zhang, 2006). That work indicated that language ability is initially unidimensional but becomes increasingly multidimensional as children move across the primary grades and into middle school. Complementing those initial findings, the present work looked intensively at the dimensionality of language ability for children who were 4–8 years of age, corresponding to the time period in which children typically enter formal schooling (at prekindergarten) and transition from learning to read to reading to learn (third grade). Our results help us to understand more explicitly the nature of children's language ability during these formative years of schooling and reading development. For the youngest children in our sample, language dimensionality was found to be somewhat unclear. Specifically, for children in prekindergarten, a two-dimensional representation of language emerged, whereas for kindergarten children, language was observed as a single dimension of development. Although our results can be considered more robust than previous work in this area due to the latent approach utilized in the analyses, we acknowledge these findings are tentative given the small size of our kindergarten sample. Considering the instability of language observed for the youngest children in our study, future research, particularly work that is longitudinal in nature, is warranted to disentangle our mixed findings regarding the possibility that discourse-level skills may in fact be separable from other language constructs at very early ages. For the older children, language ability was best represented as two or even three related dimensions. In fact, by third grade, we see that vocabulary, grammar, and discourse appear to be emerging as distinct skills, and that discourse language skills, such as inference
and comprehension monitoring, are a distinguishable dimension of language.

These findings of a unitary dimension for vocabulary and grammar until third grade are not discrepant with those of the earlier work. Although Tomblin and Zhang (2006) preferred the two-factor solution only for children in eighth grade, the CFI values for their two-factor model for the earlier grades all indicated a good fit. Together, these two studies with different samples and test instruments similarly show the emergence of separate constructs of vocabulary and grammar. Thus, although these aspects of language are often grouped together as LLL skills, the two studies indicate that this level of description may be relevant only for younger children. Furthermore, this finding fits with the theoretical conceptualizations of vocabulary and grammar as lower level or foundational skills that support later reading comprehension (Lepola et al., 2012).

The emergent dimensionality seen among young children in their development of language makes it clear that language is a complex construct. Our data support the viewpoint that there are dimensions within this construct that are, to some extent, separable. However, why are these dimensions not wholly apparent until later in children's development? One explanation for our finding of emergent dimensionality is that a hierarchy of language exists, from words to sentences to discourse (Tomblin & Zhang, 2006). As the dimensions lower down in the hierarchy become consolidated and automatized in use, as occurs early in development, cognitive resources are freed up to support more complex processing, as would be required for comprehension monitoring and inference. This may arise because a critical base of vocabulary is needed for grammar and, similarly, a critical base of grammar is needed to understand discourse. As noted in our Introduction, different phases of grammar development have been linked to lexical knowledge during the first 2 years of life (Marchman & Bates, 1994).

We do not believe that this is an adequate explanation, however, because there is evidence for the discourse language skills that we discuss (comprehension monitoring, knowledge of narrative, and inference) in the first 3–4 years of life (Skarakis-Doyle & Dempsey, 2008; Tomasello et al., 2007). An alternative explanation as to why this emergent dimensionality may occur at a particular time point is that language abilities are likely affected by children's literacy experiences within schooling and the increasing demand to apply linguistic resources to higher level interactions with texts (oral or read) as a major part of reading instruction. To this point, consider that during the course of early language development, the written texts that children experience show gradual increases in the complexity of the morphological structure of words, the syntactic sophistication of sentences, and the number of propositions that have to be integrated across the text as a whole. Children's interactions with these increasingly complex written texts during the first few years of literacy instruction may shape the structure of language dimensions that we find. Consistent with this argument, the increasing challenge of texts in the early school years provides one explanation for the identification of a group of poor readers whose problems only become apparent in fourth grade, as these children struggle to comprehend what they read (Catts, Compton, Tomblin, & Bridges, 2012). Similarly, it has been shown that interactions with literacy drive phonemic awareness (Castles & Coltheart, 2004); thus, schooling impacts the structure and refinement of oral as well as written language. This noted, the results here do not imply that discourse language skills, such as comprehension monitoring and inference, are not observed among young children, as we have already discussed; rather, it appears to be the case that around first grade these begin to clearly represent a dimension of language ability separable from lower level skills.

The second major finding of this work concerns the clear emergence of discourse-level language ability as a specific dimension of language ability among the older children in this sample. At first grade, we found two distinct dimensions of language ability: one representing the lower level skills of vocabulary and grammar, and the other representing discourse-level skills. This distinctiveness held through third grade, albeit with separation of the lower level skills (vocabulary and grammar) observed for the third graders. This is a unique finding because no previous studies of language dimensionality have included assessment of skills essential to discourse.

Limitations of this work warrant note, and we highlight three that are most salient. The first concerns the measures used to represent discourse skill. Several of these measures have never before been used with children as young as those in this study, and thus further examination of their validity with the younger groups is needed. In addition, reliability measures for 3 of the 20 measures of discourse used across all grades were found to be below commonly accepted cutoffs. Specifically, the alpha for
the experimental discourse measure, detecting inconsistencies (DI), was found to be particularly low (Cronbach's alpha = .54 in third grade). However, in each of the models where discourse skill formed its own dimension, the overall construct reliabilities for the discourse factor as measured through coefficient H (Hancock & Mueller, 2001) were acceptable, at .76 and .77 for first and second grades, respectively, although somewhat lower for third grade at .65. While additional measurement work should be undertaken to improve the DI measure, construct reliability is not diminished by the addition of a weak indicator to a measurement model that includes a collection of strong indicators (Hancock & Mueller, 2001). Nonetheless, the measures reflecting discourse skill overall could be improved. The second limitation concerns the cross-sectional nature of the data. Prior work examining the dimensionality of language ability has used longitudinal methods rather than the cross-sectional approach used here (Tomblin & Zhang, 2006). A limitation of the cross-sectional design is that it essentially provides only a "snapshot" of a phenomenon at a given point in time. However, given that some of the measures varied across grades, examining the models cross-sectionally is a reasonable approach to understanding dimensionality at each grade. While we may infer that differences seen in the dimensionality of language ability would be borne out with longitudinal assessment, this requires additional empirical investigation.

A third limitation is that different theoretical explanations could be considered. It could be argued that the measures of discourse are simply "more difficult," perhaps because they place a greater demand on cognitive resources, such as working memory, and that if more challenging vocabulary or grammar assessments were included, the three-dimensional split would not arise. There is some evidence for this viewpoint. A recent study of children with SLI found that performance on subtests of standardized oral language assessments did not neatly load onto separate factors of semantics and syntax, or of comprehension versus production; instead, a factor structure that reflected the different language processing requirements of the tasks was apparent (Hoffman, Loeb, Brandel, & Gillam, 2011).

While this work is largely meant to be theoretical and descriptive in nature, designed to advance understanding of the nature of language ability among young children, the results may have implications for practice. First, from an assessment perspective, this study does not support reliance on subtests designed to assess specific dimensions of LLL ability (i.e., vocabulary and grammar) for prekindergarteners through second graders, as these do not appear to reflect specific, distinguishable abilities. Here, we are echoing a point made by Tomblin and Zhang (2006) for older children. Rather, for children at these ages, vocabulary and grammar reflect a single underlying trait, and it seems best to rely on omnibus assessments of language skill to explore individual differences among children and, potentially, to identify children who have underdeveloped language skills warranting attention. Second, with respect to instruction, the study supports the notion that for first- through third-grade children, lower level and discourse-level language skills reflect distinguishable underlying traits. While there is increasing awareness of the importance of discourse-level language skills to reading development, it is not clear that educators often seek to identify children who may have limitations in this dimension of language ability. Research showing that discourse skills appear underdeveloped in children who have comprehension-specific reading deficits in the later primary grades (e.g., Adlof, Catts, & Little, 2006) argues for the importance of monitoring these skills early in development.

References

Adlof, S., Catts, H., & Little, T. (2006). Should the simple view of reading include a fluency component? Reading and Writing, 19, 933–958. doi:10.1007/s11145-006-9024-z
American Psychiatric Association. (2012). Diagnostic and statistical manual of mental disorders. Arlington, VA: Author.
Bates, E., Dale, P., & Thal, D. (1995). Individual differences and their implications for theories of language development. In P. Fletcher & B. MacWhinney (Eds.), Handbook of child language (pp. 96–151). Oxford, UK: Blackwell.
Berninger, V. W., & Abbott, R. D. (2010). Listening comprehension, oral expression, reading comprehension, and written expression: Related yet unique language systems in Grades 1, 3, 5, and 7. Journal of Educational Psychology, 102, 635–651. doi:10.1037/a0019319
Bishop, D. V. M. (1997). Uncommon understanding: Development and disorders of language comprehension in children. Hove, UK: Psychology Press.
Bishop, D. V. (2003). Test for Reception of Grammar (2nd ed.). Minneapolis, MN: Pearson.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: Guilford.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.),
Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Byrne, B. M. (2012). Structural equation modeling with Mplus. New York, NY: Routledge.
Cain, K., & Oakhill, J. (2006). Profiles of children with specific reading comprehension difficulties. British Journal of Educational Psychology, 76, 683–696. doi:10.1348/000709905X67610
Cain, K., Oakhill, J., & Bryant, P. (2004). Children's reading comprehension ability: Concurrent prediction by working memory, verbal ability, and component skills. Journal of Educational Psychology, 96, 31–42. doi:10.1037/0022-0663.96.1.31
Castles, A., & Coltheart, M. (2004). Is there a causal link from phonological awareness to success in learning to read? Cognition, 91, 77–111.
Catts, H. W., Compton, D., Tomblin, J. B., & Bridges, M. S. (2012). Prevalence and nature of late-emerging poor readers. Journal of Educational Psychology, 104, 166–181. doi:10.1037/a0025323
Catts, H. W., Fey, M. E., Zhang, X., & Tomblin, J. B. (1999). Language basis of reading and reading disabilities: Evidence from a longitudinal investigation. Scientific Studies of Reading, 3, 331–361. doi:10.1207/s1532799xssr0304_2
Catts, H. W., Hogan, T. P., & Fey, M. E. (2003). Subgrouping poor readers on the basis of individual differences in reading-related abilities. Journal of Learning Disabilities, 36, 151–164. doi:10.1177/002221940303600208
Dickinson, D., Golinkoff, R., & Hirsh-Pasek, K. (2010). Speaking out for language: Why language is central to reading development. Educational Researcher, 39, 305–310. doi:10.3102/0013189X10370204
Dunn, L. M., & Dunn, D. M. (2007). Peabody Picture Vocabulary Test (4th ed.). Minneapolis, MN: Pearson.
Elsabbagh, M., & Karmiloff-Smith, A. (2006). Modularity of mind and language. In K. Brown (Ed.), Encyclopaedia of language and linguistics (pp. 218–224). Amsterdam, Netherlands: Elsevier Science.
Fornell, C., & Larcker, D. F. (1981). Structural equation models with unobservable variables and measurement error: Algebra and statistics. Journal of Marketing Research, 18, 382–388. doi:10.2307/3150980
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis: A global perspective (7th ed.). Upper Saddle River, NJ: Pearson.
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In S. Du Toit, R. Cudeck, & D. Sorbom (Eds.), Structural equation modeling: Present and future (pp. 195–216). Chicago, IL: Scientific Software International.
Hancock, G. R., & Mueller, R. O. (2010). Structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), The reviewer's guide to quantitative methods in the social sciences (pp. 371–384). New York, NY: Routledge.
Hancock, G. R., & Mueller, R. O. (Eds.). (2013). Structural equation modeling: A second course (2nd ed.). Charlotte, NC: IAP.
Hoffman, L. M., Loeb, D. F., Brandel, J., & Gillam, R. B. (2011). Concurrent and construct validity of oral language measures with school-age children with specific language impairment. Journal of Speech, Language, and Hearing Research, 54, 1597–1608. doi:10.1044/1092-4388(2011/10-0213)
Hogan, T. P., Cain, K., & Bridges, M. S. (2012). Young children's oral language abilities and later reading comprehension. In T. Shanahan & C. J. Lonigan (Eds.), Early childhood literacy: The National Early Literacy Panel and beyond (pp. 217–232). Baltimore, MD: Brookes.
Hu, L. T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications (pp. 76–99). Thousand Oaks, CA: Sage.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Kim, Y. S. (2014). Language and cognitive predictors of text comprehension: Evidence from multivariate analysis. Child Development. Advance online publication. doi:10.1111/cdev.12293
Kline, R. (2013). Exploratory and confirmatory factor analysis. In Y. Petscher, C. Schatschneider, & D. Compton (Eds.), Applied quantitative analysis in education and the social sciences (pp. 171–207). New York, NY: Routledge.
Lepola, J., Lynch, J., Laakkonen, E., Silven, M., & Niemi, P. (2012). The role of inference making and other language skills in the development of narrative listening comprehension in 4–6-year-old children. Reading Research Quarterly, 47, 259–282. doi:10.1002/rrq.020
Lomax, R. (2013). Introduction to structural equation modeling. In Y. Petscher, C. Schatschneider, & D. Compton (Eds.), Applied quantitative analysis in education and the social sciences (pp. 245–264). New York, NY: Routledge.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226. doi:10.1146/annurev.psych.51.1.201
Marchman, V. A., & Bates, E. (1994). Continuity in lexical and morphological development: A test of the critical mass hypothesis. Journal of Child Language, 21, 339–366. doi:10.1017/S0305000900009302
McCoach, D. B., Black, A. C., & O'Connell, A. A. (2007). Errors of inference in structural equation modeling. Psychology in the Schools, 44, 461–470. doi:10.1002/pits.20238
Mills, D. L., Coffey-Corina, S., & Neville, H. J. (1997). Language comprehension and cerebral specialization from 13 to 20 months. Developmental Neuropsychology, 13, 397–445. doi:10.1080/87565649709540685
Moran, A. J. S., Marsh, H. W., & Nagengast, B. (2013). Exploratory structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 1–10). Charlotte, NC: IAP.
Mueller, R. O., & Hancock, G. R. (2010). Structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), The reviewer's guide to quantitative methods in the social sciences (pp. 371–383). New York, NY: Routledge.
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user's guide (7th ed.). Los Angeles, CA: Author.
Nation, K., Cocksey, J., Taylor, J. S. H., & Bishop, D. V. M. (2010). A longitudinal investigation of early reading and language skills in children with poor reading comprehension. Journal of Child Psychology and Psychiatry, 51, 1031–1039. doi:10.1111/j.1469-7610.2010.02254.x
Netemeyer, R. G., Bearden, W. O., & Sharma, S. (2003). Scaling procedures: Issues and applications. Thousand Oaks, CA: Sage.
Oakhill, J. V., & Cain, K. (2012). The precursors of reading ability in young readers: Evidence from a four-year longitudinal study. Scientific Studies of Reading, 16, 91–121. doi:10.1080/10888438.2010.529219
Pinker, S. (1998). Words and rules. Lingua, 106, 219–242.
Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.
Rice, M. L., & Wexler, K. (2001). Rice/Wexler Test of Early Grammatical Impairment. San Antonio, TX: Psychological Corporation.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.
Schumacker, R. E., & Lomax, R. G. (2004). A beginner's guide to structural equation modeling. New York, NY: Psychology Press.
Schwartz, R. G. (2009). Handbook of child language disorders. New York, NY: Psychology Press.
Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical Evaluation of Language Fundamentals (4th ed.). Bloomington, MN: Pearson.
Silva, M. T., & Cain, K. (2015). The relations between lower- and higher-level oral language skills and their role in prediction of early reading comprehension. Journal of Educational Psychology, 107, 321–331. doi:10.1037/a0037769
Skarakis-Doyle, E., & Dempsey, L. (2008). The detection and monitoring of comprehension errors by preschool children with and without language impairment. Journal of Speech, Language, and Hearing Research, 51, 1227–1243. doi:10.1044/1092-4388(2008/07-0136)
Stein, N. L., & Glenn, C. G. (1982). Children's concept of time: The development of a story schema. In W. J. Friedman & H. Belin (Eds.), The developmental psychology of time (pp. 255–282). Waltham, MA: Academic Press.
Tomarken, A. J., & Waller, N. G. (2003). Potential problems with "well fitting" models. Journal of Abnormal Psychology, 112, 578–598.
Tomarken, A. J., & Waller, N. G. (2005). Structural equation modeling: Strengths, limitations, and misconceptions. Annual Review of Clinical Psychology, 1, 31–65. doi:10.1146/annurev.clinpsy.1.102803.144239
Tomasello, M., Carpenter, M., & Liszkowski, U. (2007). A new look at infant pointing. Child Development, 78, 705–722. doi:10.1111/j.1467-8624.2007.01025.x
Tomblin, J. B., & Zhang, X. (2006). The dimensionality of language ability in school-age children. Journal of Speech, Language, and Hearing Research, 49, 1193–1208.
Wagner, R. K. (n.d.). Morphological Derivation Task. Tallahassee: Florida State University.
Wechsler, D., & Kort, W. (2005). WISC-III NL: Wechsler Intelligence Scale for Children. San Diego, CA: Harcourt Assessment.
Wiig, E. H., Secord, W. A., & Semel, E. (2004). Clinical Evaluation of Language Fundamentals Preschool (2nd ed.). San Antonio, TX: Harcourt Assessment.
Williams, K. T. (2007). Expressive Vocabulary Test (2nd ed.). Bloomington, MN: Pearson.