SCIENTIFIC STUDIES OF READING, 10(3), 301–322 Copyright © 2006, Lawrence Erlbaum Associates, Inc.

Measures of Reading Comprehension: A Latent Variable Analysis of the Diagnostic Assessment of Reading Comprehension
David J. Francis
University of Houston

Catherine E. Snow
Harvard University

Diane August
Center for Applied Linguistics, Washington, DC

Coleen D. Carlson
University of Houston

Jon Miller
University of Wisconsin-Madison

Aquiles Iglesias
Temple University

This study compares 2 measures of reading comprehension: (a) the Woodcock–Johnson Passage Comprehension test, a standard in reading research, and (b) the Diagnostic Assessment of Reading Comprehension (DARC), an innovative measure. Data from 192 Grade 3 Spanish-speaking English language learners (ELLs) were used to fit a series of latent variable analyses designed to explicitly test the discriminant validity and differential determinants of the 2 measures. Findings indicated that the 2 measures are related (r = .61) but distinct, and influenced by different factors. The DARC is less strongly related to word-level skills and more strongly related to measures of narrative language production and memory. Both tests are equally influenced by measures of nonverbal reasoning. These differential patterns of relations, which cannot be explained on the basis of differential reliabilities, reflect true differences in the processing demands of the tests for 3rd-grade ELLs.

Correspondence should be sent to David J. Francis, Texas Institute for Measurement, Evaluation, and Statistics, 100 TLCC Annex, University of Houston, Houston, TX 77204–6022. E-mail: dfrancis@uh.edu

Assessing reading comprehension is challenging, because it is a complex and multiply determined outcome (RAND Reading Study Group, 2002). Thus, students’ success in comprehending a text may be disrupted by difficulties with any of several precursor skill domains: print skills, reflected in measures of phonological awareness, word reading and/or nonword reading accuracy, and word reading efficiency (Adams, 1990; Gough & Tunmer, 1986; Perfetti, 1985; Vellutino, 1979, 1987); oral language skills, reflected in assessments of vocabulary, linguistic memory, and language processing (Bradley & Bryant, 1983; Gathercole & Pickering, 2000; Hulme, Muter, Snowling, & Stevenson, 2004); and extended discourse skills, reflected in measures of narrative production (Tabors, Snow, & Dickinson, 2001). In addition, students may have difficulties with “pure” comprehension skills—retaining information from the text, accessing relevant information in memory, making inferences that incorporate both those sources of information, and developing adjusted knowledge schemas that take them into account (Dixon, LeFevre, & Twilley, 1988; Engle, Nations, & Cantor, 1990; Haenggi & Perfetti, 1994; Palmer et al., 1985). Distinguishing among these different sources of failure in reading is crucial if we are to tailor instruction and intervention appropriately. It makes little sense to focus instruction exclusively on strategies for comprehension with students whose word reading skills are deficient or who have inadequate knowledge of the meaning of the words used in the text. Alternately, it makes little sense to focus time and instructional attention on comprehension strategies with students who are already strategic readers but whose comprehension is hampered by failures of fluency or word knowledge. 
In assessing the reading skills of English language learners (ELLs), it is particularly important to pinpoint sources of difficulty, because ELLs can have extremely uneven profiles of skills, for example, good word reading skills but very limited English vocabulary (Lesaux & Siegel, 2003) or good comprehension strategies but limited relevant background knowledge. The most widely used comprehension assessments fail to distinguish among the various sources of poor comprehension—which is appropriate, as they are designed to identify level of functioning rather than to provide diagnostic information. Our ultimate goal is to develop a reading comprehension assessment that, in conjunction with other targeted measures of precursor skills, could inform instruction by identifying students’ particular profiles of weakness and strength.

In this article, we take a first step toward a deeper understanding of reading comprehension assessments by exploring the relationship of precursor skills to comprehension on two distinct types of comprehension tasks: (a) one widely used, reliable, standardized portmanteau measure, the Woodcock–Johnson Language Proficiency Battery–Revised (WLPB–R) Passage Comprehension subtest (PC), and (b) one experimental, innovative, analytic measure, the Diagnostic Assessment of Reading Comprehension (DARC; August, Francis, Hsu, & Snow, in press).

THE DARC

The DARC was designed on the basis of previous test-development work by Potts and Peterson (1985) and Hannon and Daneman (2001). Potts and Peterson’s test isolated four processes hypothesized to occur during successful reading comprehension (Dixon et al., 1988; Engle et al., 1990; Haenggi & Perfetti, 1994; Palmer et al., 1985): (a) recalling from memory new information presented in the text, which we call text memory; (b) making novel inferences based on information provided in the text, called text inferencing; (c) accessing relevant prior knowledge from long-term memory, called knowledge access; and (d) integrating accessed prior knowledge with new text information, called knowledge integration. Potts and Peterson validated their test by showing predictive relationships from total scores to performance on a general measure of reading comprehension and from scores reflecting the four components to other, independent tests of those components.

The Potts and Peterson (1985) assessment used reading passages consisting of three sentences that described relations among a set of real and artificial terms, for example, “A JAL is larger than a TOC,” “A TOC is larger than a PONY,” and “A BEAVER is larger than a CAZ.” Combining the information in the text with world knowledge would in principle allow the construction of a five-item linear ordering (JAL > TOC > PONY > BEAVER > CAZ).
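The logic of this passage design can be made concrete with a short Python sketch. It is an illustration only, not part of the Potts and Peterson materials: the three passage sentences and one assumed piece of world knowledge (ponies are larger than beavers) are encoded as ordered pairs, the five-item ordering is derived by transitivity, and a true statement is labeled by the minimal source of information that supports it, using the four process names introduced above. The names TEXT_FACTS, WORLD_FACTS, classify, and linear_order are hypothetical.

```python
from itertools import product

# Each pair (a, b) means "a is larger than b".
# Facts stated in the three-sentence passage.
TEXT_FACTS = {("JAL", "TOC"), ("TOC", "PONY"), ("BEAVER", "CAZ")}
# World knowledge assumed of the reader (an assumption of this sketch).
WORLD_FACTS = {("PONY", "BEAVER")}


def transitive_closure(facts):
    """Return all 'larger than' pairs derivable by chaining the given facts."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def classify(statement):
    """Label a statement by the minimal information needed to verify it."""
    if statement in TEXT_FACTS:
        return "text memory"
    if statement in transitive_closure(TEXT_FACTS):
        return "text inferencing"
    if statement in transitive_closure(WORLD_FACTS):
        return "knowledge access"
    if statement in transitive_closure(TEXT_FACTS | WORLD_FACTS):
        return "knowledge integration"
    return "not derivable"


def linear_order():
    """Order the terms from largest to smallest using text plus world knowledge."""
    closure = transitive_closure(TEXT_FACTS | WORLD_FACTS)
    terms = {term for pair in closure for term in pair}
    # A term that is larger than more of the other terms comes earlier.
    return sorted(terms, key=lambda t: -sum(1 for a, _ in closure if a == t))
```

For instance, classify(("TOC", "BEAVER")) returns "knowledge integration", because the statement follows only when the text fact TOC > PONY is chained with the world fact PONY > BEAVER, and linear_order() returns ["JAL", "TOC", "PONY", "BEAVER", "CAZ"].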
Participants read and studied the paragraph and then responded to true–false statements of four types. Text memory statements (e.g., “A JAL is larger than a TOC”) tested information explicitly mentioned in the paragraph. Text inferencing statements (e.g., “A JAL is larger than a PONY”) required integrating information across propositions in the text (i.e., “A JAL is larger than a TOC”; “A TOC is larger than a PONY”); no prior knowledge was required. Knowledge access statements (e.g., “A PONY is larger than a BEAVER”) could be answered by accessing prior knowledge; no information from the text was required. Knowledge integration statements (e.g., “A TOC is larger than a BEAVER”) required integrating prior knowledge (ponies are larger than beavers) with a text-based fact (i.e., “A TOC is larger than a PONY”). Potts and Peterson (1985) found that knowledge integration correlated with the two text-based constructs—text memory and text inferencing—as well as with
knowledge access. However, knowledge access was not strongly correlated with the text-based constructs, suggesting that the ability to remember new information and the tendency to use world knowledge are separable. Hannon and Daneman (2001) confirmed the conclusions from Potts and Peterson’s work, using a version of the test that had more complex texts for use with university students. Both the correlations of total score with a global, standardized test of reading comprehension ability (the Nelson–Denny test of reading comprehension) and the correlation of individual construct scores with specific tests of those constructs proved reliable. August et al. (in press) built on these studies by piloting the DARC, a test designed specifically to minimize the impact of word reading accuracy or speed and vocabulary on comprehension. Their goal was to evaluate among ELLs the feasibility and utility of a reading assessment in which the passages used simple, regular, high-frequency words and in which the impact of variation in background knowledge was minimized by limiting topics to very familiar ones (e.g., pets, bicycles) and by introducing nonce words for novel concepts. Items were constructed as true–false statements referring to information presented in familiar narrative-style passages, like the following: Nan has four pets. One pet is a cat. Nan’s cat is fast. Nan has a pet culp. Nan’s pet culp is like her cat. But Nan’s pet culp is faster than her cat. August et al. (in press) demonstrated with three different sets of pilot participants that the DARC is feasible for use with children as young as kindergarteners, that simple yes–no responses were adequate to reflect children’s comprehension processing, and that different aspects of the comprehension process (text memory, text inferencing, background knowledge, and knowledge integration) could be measured independently. 
Crucially, the pilot results showed wide variation in performance among ELLs who all scored low on a general comprehension measure. Some children who scored poorly on measures with a higher vocabulary load and greater syntactic complexity, such as the Stanford–9 or the WLPB–PC measures, performed well on the DARC. In this article, we explore further the differential functioning of the DARC and the more widely used WLPB–PC, by contrasting how print-related, language-related, and narrative skills predict outcomes on these two measures within one group of Latino third-grade ELLs. The WLPB–PC is typically strongly affected by print-related skills; we ask whether the DARC reflects its design by showing a weaker relation. On the other hand, the relation of both WLPB–PC and DARC performance to oral language measures might be expected to be strong (Hulme et al., 2004), given the central importance of language processing in reading comprehension. Skill in producing narratives has been shown to relate to word reading in Latino ELLs (Miller et al., 2006), but the relation of narrative production skill to comprehension measures remains open to speculation. It is possible that the task
demands posed by the DARC will privilege verbal-processing and verbal-reasoning skills, whereas the WLPB–PC will be more affected by traditional language-proficiency measures, such as vocabulary. We have a number of central research questions: (a) Can the DARC and the WLPB be differentiated as measures of reading comprehension? If so, (b) how well do print skills predict performance on the WLPB–PC measure versus the DARC? (c) Do other factors, such as participants’ oral language skills and narrative production, differentially relate to the WLPB–PC and the DARC? (d) More generally, is there evidence that the DARC operates in a way that is distinctively different from the WLPB–PC measure, thus confirming its promise as a novel, informative measure of reading comprehension? In the process of addressing these questions, we further hope to provide a template to other reading researchers for empirically investigating measures of reading comprehension.

METHOD

Participants

The sample comprised 192 third-grade Spanish-speaking ELLs in 33 transitional bilingual education classrooms in nine schools in two different Texas school districts. The two districts were demographically distinct: a large, densely populated metropolitan area in southeastern Texas and a semiurban area in the Rio Grande Valley. Approximately 65% (n = 125) of the sample came from the latter site. Criteria for including schools in the sample were that more than 40% of the school population were Latino, that at least 30% of the kindergarteners were considered limited English proficient, that the schools were performing adequately on their state accountability assessments, and that they were implementing a transitional bilingual education model. The final sample was evenly divided between boys (n = 94) and girls (n = 92), with 6 cases missing information on gender.

This sample is derived from a larger study of kindergarten to Grade 3 students focused on developing and validating assessments for use with Spanish-speaking ELLs (Francis, Carlson, et al., 2005). The total sample consisted of 1,644 students across kindergarten to Grade 3, of which a random sample participated in testing with the DARC and the oral narrative procedure. All students included in analyses described were in Grade 3 and had completed the DARC in English. Of the 401 students in Grade 3 in the larger sample, 214 were given the opportunity to complete the DARC in English and in Spanish, and 198 completed it in English. Most of these students (n = 192) also completed an oral narrative production task in English (Miller et al., 2006). The resulting sample of 192 students performed better as a group on standardized measures of English language than the remainder of the Grade 3 sample. However, as can be seen in Table 1, they tended to perform well below normative expectations as a group, with means on standardized language measures ranging from 70 to 81.

Measures

Students were administered a battery of language and literacy assessments in both Spanish and English in sessions separated by about 2 weeks. We focus here only on the English assessments. The total testing time was about 3 hr. If students were unable to complete testing in a single setting, additional sessions were allowed. The battery was designed to measure key skills related to the development of literacy and oral language proficiency: decoding accuracy and fluency, phonological awareness, vocabulary, syntax, listening comprehension, and reading comprehension.

All tests were administered using standard administration procedures as prescribed by the test developer/publisher where such procedures were available, with the following exceptions. To increase students’ chances for completing assessments in English, we first provided instructions and example items in English according to standard administration guidelines. However, if students were unable to complete the practice items in English, the examiner administered instructions in Spanish and then repeated the practice items. Students still unable to complete the practice items in English then were given them in Spanish. If the student was still unable to complete the practice items, then testing was discontinued for that subtest. However, if the student was able to complete the practice items in Spanish, the English practice items were readministered. If the student was successful on the English practice items, testing continued in English. If the student did not complete the practice items in English, then testing was discontinued for that subtest.
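The practice-item procedure just described is a short decision sequence, and it may be easier to follow as code. The Python function below is only a reading aid under our interpretation of the text: the function name, argument names, and returned labels are hypothetical, and each argument records whether the student completed the practice items at that step.

```python
def practice_decision(english_first, english_after_spanish_instructions,
                      spanish_practice, english_readministered):
    """Return the outcome of the practice-item sequence for one subtest.

    Each argument is True if the student completed the practice items
    at that step, in the order the steps are described in the text.
    """
    # Step 1: instructions and practice items in English.
    if english_first:
        return "test in English"
    # Step 2: instructions given in Spanish, English practice items repeated.
    if english_after_spanish_instructions:
        return "test in English"
    # Step 3: practice items given in Spanish.
    if not spanish_practice:
        return "discontinue subtest"
    # Step 4: English practice items readministered.
    if english_readministered:
        return "test in English"
    return "discontinue subtest"
```

Under this reading, a subtest is administered only if the student eventually completes the English practice items; success on the Spanish practice items alone earns a second attempt in English rather than administration of the subtest.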

Phonological awareness. The Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte, 1999), considered the gold standard for assessing phonological awareness skills, was administered. CTOPP subtests measure phonological awareness, phonological memory, and rapid naming. For a complete description of the individual subtests, the reader is referred to Wagner et al. (1999) or Schatschneider et al. (1998). In this study, we used the subtests measuring phonological awareness, including First Sound Comparison (Cronbach’s α = .67), Final Sound Comparison (α = .73), Blending Phonemes into Words (α = .85), Blending Phonemes into Non-Words (α = .87), Segmenting Words into Phonemes (α = .93), Segmenting Non-Words into Phonemes (α = .93), and Phoneme Elision (α = .91). All reported alphas are based on the sample of third-grade students in this study. The somewhat lower alpha for First Sound Comparison is attributable to ceiling effects. Total scores for the phonological subtests were summed to form a composite measure of phonological awareness (α = .85) based on prior research indicating that these tasks are unidimensional (Schatschneider et al., 1998). Intersubtest correlations in this sample ranged from .14 to .79 (Mdn r = .44), with the smallest correlations involving the First Sound Comparison (M = 9.5, with a maximum of 10).
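For readers who want to see how an internal consistency estimate of this kind is obtained, the following Python sketch applies the standard formula for Cronbach’s α, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It is a generic illustration, not the authors’ analysis code; the function name cronbach_alpha and the toy score matrices in the usage note are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across examinees.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the examinees' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

With three perfectly consistent items, for example cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]), the estimate is 1.0 (up to floating-point rounding); with two uncorrelated items, as in cronbach_alpha([[1, 0], [0, 1], [1, 1], [0, 0]]), it is 0.0. The same function applies to a composite if each "item" is a subtest total.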

Decoding accuracy and fluency. To measure decoding accuracy, we used two subtests from the WLPB (Woodcock, 1991). The Letter Word Identification subtest measures real-word decoding by presenting individual words that the student reads out loud. The Word Attack subtest uses pseudoword decoding to assess the examinee’s knowledge of the rules for decoding words phonetically in English. The two subtests consist of 57 and 30 items, respectively. To limit overtesting, both subtests use a ceiling rule: Testing continues until the examinee misses the six highest numbered items on a page. Internal consistency reliability estimates in the current sample were .89 for each subtest, whose scale scores also correlate .89 with one another.

To measure fluency of decontextualized word reading, we used the Test of Word Reading Efficiency (TOWRE; Torgesen, Wagner, & Rashotte, 1999). The TOWRE requires the examinee to read aloud as quickly as possible a list of words ordered by difficulty. The score is the number of words read correctly in 45 sec. We converted this into number of words read correctly per minute of reading time and used the raw score, because all students were in the same grade. Students were randomly assigned to receive either Form A or Form B of the TOWRE real-word reading test. In addition to the TOWRE, we created an experimental word reading efficiency form for use in the project. This form used only words taken from Grade 1 texts; thus the words were not graded in difficulty. The experimental form correlated .82 with Form A and .85 with Form B of the TOWRE. For the sake of analysis, we combined the two scores for each student into a single estimate of decontextualized word reading efficiency.

Oral language proficiency—Standardized. To measure oral language proficiency, we used several subtests of the WLPB in English (Woodcock, 1991; Woodcock & Muñoz-Sandoval, 1995). The WLPB is a highly regarded battery of tests with high internal consistency and test–retest reliability values as well as extensive validity data (Woodcock, 1991). The test development, scaling, and norming process for the assessment is described in detail in the WLPB manual (Woodcock, 1991). All subtests use ceiling rules to terminate assessment following a specified number of errors. The WLPB allows for various scale score metrics. In this study, we used the age-based standard scores. The Listening Comprehension subtest has 38 test items in which the examinee listens to a brief passage that omits one word. The examinee completes the statement by providing a single word that is consistent with the preceding information. In this sample, the Listening Comprehension subtest had an internal consistency reliability of .88.

The Memory for Sentences subtest targets semantics and syntax. The examinee is asked to repeat precisely what is said by the examiner (Items 1–15) or presented on audiotape (Items 16–32); items go up to sentences of roughly 20 words and multiple clauses. Single-word items receive 1 point if correct, and multiword items are scored 0, 1, or 2 points (2 points for exact reproduction). In the current sample, internal consistency reliability was estimated at α = .74.

The Picture Vocabulary subtest of the WLPB begins with multiple-choice items on which the student points to the picture that matches a vocabulary word provided orally by the examiner. At Item 8, the test becomes a confrontation naming test, that is, one where children are shown a picture and asked to provide a word that describes the picture or a targeted subpart of the picture. On the Verbal Analogies subtest of the WLPB, the examinee is required to complete items of the form “A is to B, as C is to … .” Internal consistency reliability was estimated at α = .81 for both the Verbal Analogies and Picture Vocabulary subtests in the current sample.

Oral language proficiency—Narrative procedure. In addition to the standardized language measures, we collected an oral narrative to reflect more natural use of language. Students retold a story based on one of the wordless picture books of Mercer Mayer. The exact procedure is described in detail in Miller et al. (2006). In brief, the examinee looked through the pictures as the examiner told the story following a flexible script. The student then retold the story while still looking at the book. The examiner, who sat opposite the student, offered reminders that only the student could see the pictures to reduce opportunities for the student to point to the pictures and use nonspecific referents to elements in the story. The retelling was recorded on digital minidisk and transcribed into computer text files for subsequent analysis using the Structured Analysis of Language Transcripts (Miller & Iglesias, 2003). The Structured Analysis of Language Transcripts provides a variety of measures, including vocabulary diversity, fluency, and syntax. In addition, the narratives were scored by hand for narrative structure. To assess reliability, a random sample of 20 narratives was scored by multiple raters for protocol accuracy (98%–100%), for transcription accuracy (90%–98%), for the narrative structure score (Krippendorff’s α = .74), and for the subordination index (α = .96; Krippendorff, 1980).

Measures of language proficiency taken from the narrative included Mean Length of Utterance in Words (MLUW), Number of Different Words (NDW), Number of Total Words (NTW), Subordination Index (SI), Words Per Minute (WPM), and Narrative Structure Score (NSS). The rules for segmenting speech into utterances are given in Miller et al. (in press) and Loban (1976). NDW is calculated as the number of different word roots without inflections in the child’s retelling. NTW gives the total transcript length in number of words and is related to vocabulary. WPM (obtained by dividing NTW by total seconds in the retell, then multiplying by 60) provides a measure of verbal fluency. SI is the average number of dependent clauses in an utterance, a measure of syntactic complexity. Finally, the NSS, a measure of coherent story structure, was obtained by scoring the narrative holistically on a 6-point scale (0–5) on each of seven story grammar elements: Introduction, Character Development, Mental States, Character Referencing, Conflict/Resolution, Cohesion, and Conclusion. These are discussed in greater detail in Miller et al. (2006).
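The count-based narrative measures can be illustrated with a few lines of Python. This is a simplified sketch and not the Structured Analysis of Language Transcripts software: it assumes the retelling has already been segmented into utterances, and it approximates NDW by counting distinct lowercase word forms rather than word roots, which would require morphological analysis. The function names and the sample utterances are hypothetical.

```python
def words_per_minute(word_count, seconds):
    """Rate conversion: words divided by seconds, multiplied by 60."""
    return word_count / seconds * 60


def narrative_measures(utterances, seconds):
    """Compute NTW, NDW (approximate), MLUW, and WPM for a segmented retelling."""
    words = [w.lower() for u in utterances for w in u.split()]
    ntw = len(words)                 # Number of Total Words
    ndw = len(set(words))            # distinct surface forms, not word roots
    mluw = ntw / len(utterances)     # Mean Length of Utterance in Words
    return {
        "NTW": ntw,
        "NDW": ndw,
        "MLUW": mluw,
        "WPM": words_per_minute(ntw, seconds),
    }
```

For the two hypothetical utterances "the frog jumped out" and "the boy looked for the frog" produced in 20 seconds, the sketch gives NTW = 10, NDW = 7, MLUW = 5.0, and WPM = 30.0. The same rate conversion applies to a timed word list: 60 words read correctly in 45 seconds corresponds to 80 words per minute.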

Reading comprehension. The study used two measures of reading comprehension: (a) the PC subtest from the WLPB (Woodcock, 1991) and (b) the total score and subtest scores from the DARC (August et al., in press). The WLPB–PC uses a cloze procedure to examine the ability to understand information read silently. The examinee reads a sentence or short passage from which individual words have been omitted, then provides the most appropriate word to fill in the blank given the meaning of the sentence or passage. The PC subtest is used extensively in reading research because of its high reliability and validity. In this sample, reliability was estimated to be .81 based on internal consistency.

The DARC requires children to read a passage and answer 30 true–false questions about the information provided in the story. The questions are designed to assess students’ background knowledge, memory for the text, ability to form inferences based on information provided in the text, and ability to form inferences that require integration of information presented in the text with information known from background knowledge. When these data were collected, two stories had been developed, each in English and adapted into Spanish using a back-translation method (i.e., translation into Spanish and then back into English by independent translators). Each student read one story in English and one story in Spanish, with the pairing of language and story determined at random. We report here on the total correct score on the story read in English and on scores measuring Text Memory, Text Inferencing, Knowledge Integration, and Background Knowledge. For Story 1, internal consistency for the total score was estimated at .75, whereas for Story 2 it was estimated at .68. Reliability for the subtests was generally in the .5 to .6 range. For the purposes of the current analyses, subtest scores were used in factor analytic models that take into account their respective reliabilities.
In all cases, scores from the two stories were standardized to a common mean and standard deviation, so that performance for students reading Story 1 was “equated” to performance for students reading Story 2 in computing the subtest scores and total scores.
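One simple way to place scores from two forms on a common scale is to standardize within each form and then rescale. The Python sketch below illustrates that idea under stated assumptions: the text does not report the target mean and standard deviation or whether sample or population standard deviations were used, so target_mean, target_sd, the use of the population standard deviation, and the function name equate_forms are all choices made for this illustration.

```python
from statistics import mean, pstdev


def equate_forms(scores_by_form, target_mean, target_sd):
    """Rescale each form's scores to a common mean and standard deviation."""
    equated = {}
    for form, scores in scores_by_form.items():
        form_mean = mean(scores)
        form_sd = pstdev(scores)  # assumption: population SD within form
        equated[form] = [
            target_mean + target_sd * (x - form_mean) / form_sd for x in scores
        ]
    return equated
```

After this transformation, a student at the mean of Story 1 and a student at the mean of Story 2 receive the same score, so the two groups can be pooled in a single analysis. The approach treats the two randomly assigned groups as equivalent in ability, which is a plausible but untested assumption here.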
RESULTS

Means, standard deviations, and minima and maxima for each measure are provided in Table 1. The table presents the reading comprehension measures first, followed by WLPB measures of word reading, the narrative language measures, the WLPB language measures, phonological awareness, word reading efficiency, memory, and nonverbal reasoning (Raven’s Colored Progressive Matrices; Raven, Raven, & Court, 1998). Table 1 shows that 178 cases (93%) have complete data. We standardized each of the subtest scores for the two DARC forms to perform common analyses. Because the DARC is an experimental measure, scores reported in Table 1 have limited value in indicating the level of sample performance on comprehension. However, the means in Table 1 show relatively good performance on word reading skills and the WLPB reading comprehension measure. The standardized language measures (Picture Vocabulary, Verbal Analogies, Listening Comprehension, and Memory for Sentences) from the WLPB show, however, that these students are scoring substantially more poorly on language proficiency than they are on the WLPB measures of reading. In fact, performance on the reading measures is almost 1 standard deviation above average, whereas the measures of language proficiency range from 0.5 to almost 2 standard deviations below average. The measures of narrative production paint a somewhat less bleak picture than the WLPB language proficiency measures, but caution is imposed by the limited normative information on this task.

TABLE 1
Descriptive Statistics for English Assessments With Grade 3 English Language Learner Students

Measure                                                 N        M       SD      Min      Max
DARC Total                                            192    23.33     3.92    10.00    30.00
DARC Text Memory                                      192    14.82     2.58     6.10    18.68
DARC Text Inferencing                                 192    15.30     3.02     7.44    19.00
DARC Knowledge Integration                            192    15.51     2.82     7.46    20.47
DARC Background Knowledge                             191    15.58     2.91     3.02    18.50
WLPB Passage Comprehension                            183    97.32    16.87    50.00   144.00
WLPB Letter Word Scale Score                          183   112.10    28.38    34.00   198.00
WLPB Word Attack Scale Score                          182   113.00    25.74    57.00   161.00
Narrative Mean Length of Utterance                    192     7.24     1.05     4.48    10.71
Narrative Subordination Index                         192     1.15     0.12     0.67     1.48
Narrative Number of Different Words                   192   101.50    24.98    25.00   159.00
Narrative Fluency (words per minute)                  192    98.36    26.02    24.22   158.80
WLPB Verbal Analogies                                 183    92.64    16.07    47.00   162.00
WLPB Picture Vocabulary                               183    71.51    24.23    12.00   132.00
WLPB Listening Comprehension                          180    72.73    19.73     9.00   141.00
WLPB Memory for Sentences                             183    81.69    19.43    34.00   131.00
CTOPP Phonological Awareness                          192    73.79    21.06    10.00   117.00
Word Reading Efficiency (words per second)            192     1.46     0.33     0.66     2.43
CTOPP Memory for Digits                               192    10.81     2.51     5.00    20.00
Raven’s Colored Progressive Matrices (no. correct)    192    27.23     4.15    12.00    36.00

Note. Min = minimum; Max = maximum; DARC = Diagnostic Assessment of Reading Comprehension; WLPB = Woodcock Language Proficiency Battery–Revised; CTOPP = Comprehensive Test of Phonological Processing.

To investigate the discriminant validity of the DARC and WLPB–PC measures, we fit a series of latent variable models to the data. These models were developed specifically to investigate the relations among the reading comprehension measures when examined alone and together with the other measures in Table 1 that are demonstrated precursors of reading comprehension. Table 2 presents correlations between each of the predictors and each of the measures of reading comprehension, including the WLPB–PC, the four subtest scores from the DARC, and the total DARC score. These bivariate correlations show small to moderate correlations between the DARC and WLPB–PC. The DARC tends to correlate more highly with measures of language from the WLPB than with measures of word-level reading skills. However, WLPB–PC (PC) shows a similar pattern in this sample.
TABLE 2
Correlations of Language and Literacy Measures to Woodcock Language Proficiency Battery–Revised (WLPB) Passage Comprehension and Diagnostic Assessment of Reading Comprehension (DARC) Measures of Reading Comprehension

                                                  DARC
Measure                                 WLPB–PC    Total     TM     TI     KI     BK
WLPB Passage Comprehension                           .46    .24    .29    .48    .30
WLPB Letter Word                            .66      .34    .11    .28    .35    .22
WLPB Word Attack                            .64      .28    .05    .31    .30    .14
Narrative Mean Length of Utterance          .40      .22    .15    .13    .23    .17
Narrative Number of Different Words         .47      .49    .32    .28    .34    .43
Narrative Fluency                           .35      .31    .26    .13    .32    .22
Narrative Subordination Index               .36      .28    .23    .14    .24    .23
WLPB Verbal Analogies                       .78      .45    .29    .29    .44    .25
WLPB Picture Vocabulary                     .76      .57    .33    .37    .55    .37
WLPB Listening Comprehension                .72      .51    .30    .36    .45    .36
WLPB Memory for Sentences                   .75      .54    .25    .39    .54    .35
CTOPP Phonological Awareness                .60      .40    .19    .36    .32    .28
Word Reading Efficiency                     .56      .31    .13    .16    .34    .26
CTOPP Memory for Digits                     .34      .33    .09    .26    .31    .26
Raven’s Colored Progressive Matrices        .44      .36    .19    .27    .25    .27

Note. | r | > .15 is statistically significant at p < .05. | r | > .24 is statistically significant at p < .001. PC = Passage Comprehension; TM = Text Memory; TI = Text Inferencing; KI = Knowledge Integration; BK = Background Knowledge; CTOPP = Comprehensive Test of Phonological Processing.

To investigate the discriminant validity of the DARC and PC, we used confirmatory factor analysis to estimate and test a series of latent variable models designed to test explicit hypotheses about the two sets of measures. This approach to testing models of discriminant validity was described in greater detail in Francis, Fletcher, Catts, and Tomblin (2005). In the present context, we fit a series of four latent variable models:

• Model 1–RC considers only the measures of reading comprehension and their relations to one another.
• Model 2–PR considers only the predictors and their relations to one another.
• Model 3–RCPR simply combines the results of Model 1 and Model 2 to explore relations between predictors and comprehension.
• Model 4–2RCPR tests explicitly the discriminant validity of PC and the DARC measures by introducing separate factors into Model 3.

Fit statistics for the four models are presented in Table 3. Fit statistics for Models 1 and 2 cannot be compared statistically to one another, or to Models 3 and 4, as these models are not nested. Model 3 is, however, nested in Model 4, and thus these models can be explicitly compared using the information in Table 3. All models were fit using data from the subset of 178 cases with complete data.

We first fit a single-factor model (Model 1–RC) to the five reading comprehension measures (PC, Text Memory, Text Inferencing, Knowledge Integration, and Background Knowledge) without regard for word-level reading skills (accuracy and fluency), language proficiency, phonological awareness, memory, or verbal reasoning. As shown in Table 3, this model provided an exceptionally good fit to the data. The chi-square test was not statistically significant, χ²(5, N = 178) = 5.62, p < .35. Descriptive indices of fit were also strong, including the root mean square error of approximation (RMSEA) of .024 and the standardized root mean square residual (SRMSR) of .033. Both of these measures indicate a well-fitting model when they fall below .05. Thus, on balance, the information in Table 3 suggests that the reading comprehension measures intercorrelate in a way consistent with a single underlying dimension. We would conclude from Model 1 that PC and the four DARC reading measures reflect a single factor of Reading Comprehension. However, the test of unidimensionality (i.e., single-factoredness) afforded by Model 1 is relatively low powered because of the limited number of measures included in the model.
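The comparison of nested models can be worked through from the Table 3 values. The text here says only that Models 3 and 4 can be compared statistically; the Python sketch below assumes the conventional chi-square difference test, in which the difference between the two model chi-squares is referred to a chi-square distribution with degrees of freedom equal to the difference in model degrees of freedom. The function names are hypothetical, and the tail probability uses the standard closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal density at chi
    if df % 2 == 1:
        # Odd df: 2 * (1 - Phi(chi)) plus a finite series in odd powers of chi.
        tail = math.erfc(chi / math.sqrt(2))
        term, total = chi, 0.0
        for r in range(1, (df - 1) // 2 + 1):
            total += term
            term *= x / (2 * r + 1)
        return tail + 2 * density * total
    # Even df: exp(-x/2) times a finite series in powers of x/2.
    term, total = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        total += term
    return math.sqrt(2 * math.pi) * density * total


def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference test for a restricted model nested in a fuller one."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Table 3 values: Model 3 (single comprehension factor) vs. Model 4 (two factors).
delta_chi2, delta_df, p_value = chi2_difference(246.26, 125, 202.47, 118)
```

With the Table 3 values, the difference is 43.79 on 7 degrees of freedom, which is significant well beyond the .001 level, so under this test the two-factor Model 4 fits better than the single-factor Model 3. As a check on the helper, chi2_sf(5.62, 5) is approximately .345, consistent with the nonsignificant test reported for Model 1.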
TABLE 3
Fit Statistics for Latent Variable Models of Comprehension

Model            χ2       df    RMSEA   SRMSR   GFI   AGFI
1 – RC           5.62     5     .024    .033    .99   .96
2 – PR           113.62   57    .076    .049    .92   .84
3 – RCPR         246.26   125   .076    .060    .87   .80
4 – 2RCPR        202.47   118   .062    .050    .89   .83
5 – M4R          202.47   118   .062    .050    .89   .83
5 – Restricted   206.20   124   .060    .051    .89   .83

Note. Model 1–RC: reading comprehension measures only; single-factor model with WLPB–PC and four DARC indicators (TM, TI, KI, and BK). Model 2–PR: predictors only; factors are Decoding (DE), Narrative Language (NL), Standardized Language (SL), Phonological Awareness (PA), Fluency (FL), Memory (ME), and Nonverbal IQ (NV). PA, FL, and NV are measured by single indicators in all models. ME is measured by WLPB MS and CTOPP MD. DE is measured by LW and WA from the WLPB; NL is measured by all narrative measures (MLUW, NDW, SI, WPM); SL is measured by all WLPB language measures (PV, VA, LC, and MS). These relations do not change across subsequent models. Model 3–RCPR: combines models RC and PR; single Reading Comprehension factor. All factors are allowed to correlate freely. Model 4–2RCPR: identical to Model 3 but splits the Reading Comprehension factor into two factors, one measured only by WLPB PC and the other measured by the four DARC measures (TM, TI, KI, and BK). Models 3 and 4 are nested and can be compared statistically. Model 5–M4R: reparameterization of Model 4 to account for relations of Reading Comprehension and Predictor Factors through factor-on-factor regressions. Model 5 is equivalent to Model 4. Model 5–Restricted: constrains to 0.0 nonsignificant regressions between the Reading Comprehension factors and the Predictor Factors. PA did not contribute uniquely to either Reading Comprehension factor. Regression of the Reading Comprehension factors onto the restricted set of Predictor Factors (see Table 5) fully accounted for the correlation between the two Reading Comprehension factors. Model 5–Restricted is nested in Model 5.
WLPB = Woodcock Language Proficiency Battery–Revised; PC = Passage Comprehension; DARC = Diagnostic Assessment of Reading Comprehension; TM = Text Memory; TI = Text Inferencing; KI = Knowledge Integration; BK = Background Knowledge; MS = Memory for Sentences; CTOPP = Comprehensive Test of Phonological Processing; MD = Memory for Digits; LW = Letter Word Scale Score; WA = Word Attack Scale Score; MLUW = Mean Length of Utterance in Words; NDW = Number of Different Words; SI = Subordination Index; WPM = Words Per Minute; PV = Picture Vocabulary; VA = Verbal Analogies; LC = Listening Comprehension; RMSEA = root mean square error of approximation; SRMSR = standardized root mean square residual; GFI = goodness-of-fit index; AGFI = adjusted goodness-of-fit index.

Expanding the set of measures in the model increases the power of the model to discriminate between PC and the measures of the DARC. To introduce the other measures of Table 1 into the model in a meaningful way, we first fit a series of models that examined only those measures. We began with a model that included factors for Decoding (Letter Word scale score and Word Attack scale score from the WLPB), Narrative Language Production (MLUW, SI, NDW, and WPM from the retelling), Standardized Language Proficiency (Verbal Analogies, Listening Comprehension, Picture Vocabulary, and Memory for Sentences from the WLPB), Phonological Awareness (PA; the CTOPP total), Fluency (Word Reading Efficiency), Memory (Memory for Digits from the CTOPP), and Nonverbal Intelligence (Raven’s Colored Progressive Matrices). We subsequently revised the model based on information about lack of fit, in particular relying on the modification indices to introduce two changes. The first change was to allow for a test-specific correlation between the two narrative production measures SI and MLUW; the second change was to allow Memory for Sentences to load on the Memory factor along with Memory for Digits. The final model, the fit statistics for which are presented in Table 3 under Model 2–PR, yielded a reasonable fit to the data for the 14 predictor measures in Table 1. The overall chi-square for Model 2 is
statistically significant, χ2(57, N = 178) = 113.62, p < .001, which suggests a lack of fit of the model to the data, but the other information in Table 3 suggests a reasonably good-fitting model. Other information, not presented in Table 3, also suggests a good-fitting model. Specifically, the expected cross-validation index (ECVI) for the model of 1.19 was equal to the ECVI for a saturated model, and the model AIC (Akaike’s Information Criterion) and saturated model AIC were virtually identical (210.77 and 210.00, respectively), whereas the model CAIC (consistent AIC) of 411.49 was smaller than the saturated model CAIC of 640.09. Thus, the model appears to do a reasonably good job of describing the relations among the 14 predictors. It should be noted that the correlation between the two language factors in the final version of Model 2 was estimated to be .72. Thus, although the two language factors are highly correlated, the correlation is different from 1.0, indicating that the narrative and standardized measures are tapping somewhat different aspects of language functioning.

Model 3–RCPR combined the single Reading Comprehension factor of Model 1 with the seven predictor factors of Model 2. Information about model fit can be found in Table 3. The goodness-of-fit index has dropped below .90, and the SRMSR has increased to above .05. In both of the two models that were combined to produce Model 3, the SRMSR was below .05. The increase in SRMSR is due to the combination of the comprehension measures and the predictors in the same model and suggests that the lack of fit is due to the model’s inability to reproduce the correlations among the comprehension measures and the predictor measures. In addition, the ECVI for Model 3 was 2.17, just slightly larger than the ECVI for a saturated model (ECVI = 2.15). AIC for Model 3 was 383.89, compared with 380.00 for a saturated model, whereas CAIC was 655.71 for Model 3, compared with 1174.54 for a saturated model.
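The saturated-model benchmarks quoted above depend only on the number of observed measures and the sample size, so they can be recomputed as a check. The sketch assumes definitions commonly printed by SEM software (AIC = χ2 + 2t, CAIC = χ2 + (ln N + 1)t, ECVI = AIC/(N − 1), where t is the number of free parameters and χ2 = 0 for a saturated model). The source does not state which formulas or program were used, so these definitions and the function names are assumptions.

```python
import math

N = 178  # complete cases used for all models


def saturated_indices(n_measures: int, n: int = N) -> dict:
    """Information criteria for a saturated model (assumed definitions)."""
    # A saturated model estimates every distinct variance and covariance.
    t = n_measures * (n_measures + 1) // 2
    aic = 2 * t  # the chi-square of a saturated model is 0
    caic = (math.log(n) + 1) * t
    ecvi = aic / (n - 1)
    return {"t": t, "AIC": aic, "CAIC": round(caic, 2), "ECVI": round(ecvi, 2)}


def ecvi_from_aic(aic: float, n: int = N) -> float:
    """ECVI implied by a reported model AIC (assumed definition)."""
    return round(aic / (n - 1), 2)


print(saturated_indices(14))  # the 14 predictors of Model 2
print(saturated_indices(19))  # 14 predictors plus 5 comprehension measures
print(ecvi_from_aic(210.77), ecvi_from_aic(383.89))
```

Under these assumed definitions the sketch reproduces the saturated AIC values (210 and 380), the saturated ECVI values (1.19 and 2.15), the model ECVI values (1.19 and 2.17), and the saturated CAIC for the 19-measure model (1174.54). For the 14-measure model it gives a saturated CAIC of 649.09 instead of the printed 640.09; the source offers no way to tell whether that reflects a different formula or a misprint.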
Thus, the fit, although not terrible, has deteriorated somewhat relative to that of Models 1 and 2.

Model 4–2RCPR explicitly tests the extent to which the lack of fit in Model 3 is attributable to the fact that the measures of reading comprehension are not unidimensional. In particular, Model 4 splits the Reading Comprehension factor of Model 1 into two factors: one for WLPB–PC and one for the four measures of the DARC. As mentioned previously, Model 3 is nested in Model 4, and thus we can explicitly test whether Model 4 offers a statistically significant improvement over Model 3. As can be seen in Table 3, a substantial portion of the lack of fit in Model 3 is attributable to misspecification of the Reading Comprehension factor. By treating the two sets of reading comprehension measures as separate factors, the chi-square statistic drops from 246.26 to 202.47, a difference of 43.79 on 7 degrees of freedom. This difference is statistically significant at p < .0001 and represents roughly 18% of the lack of fit in Model 3, which, it must be recalled, was created by joining two reasonably well-fitting models. Splitting the reading comprehension measures into two factors yields an ECVI of 1.93, an AIC of 342.43, and a CAIC of 643.52. Statistics for the saturated model are unchanged as these are dependent only on the data.
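The nested-model comparison above is a chi-square difference test, and its arithmetic can be checked directly from Table 3. The sketch below is illustrative only: the helper names are hypothetical, and the upper-tail probability is computed with a standard recurrence for integer degrees of freedom, not with whatever software the authors used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: exp(-y) for df = 2, erfc(sqrt(y)) for df = 1.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_difference(chisq_restricted, df_restricted, chisq_general, df_general):
    """Difference test for a restricted model nested in a more general one."""
    delta = chisq_restricted - chisq_general
    delta_df = df_restricted - df_general
    return delta, delta_df, chi2_sf(delta, delta_df)


# Model 3 (one comprehension factor) vs. Model 4 (two factors), from Table 3.
delta, delta_df, p = chi2_difference(246.26, 125, 202.47, 118)
print(f"delta chi2 = {delta:.2f} on {delta_df} df, p = {p:.1e}, "
      f"share of Model 3 misfit = {delta / 246.26:.0%}")

# Model 5-Restricted vs. Model 5-M4R, also from Table 3.
delta, delta_df, p = chi2_difference(206.20, 124, 202.47, 118)
print(f"delta chi2 = {delta:.2f} on {delta_df} df, p = {p:.2f}")
```

The first comparison gives a difference of 43.79 on 7 degrees of freedom with a p value far below .0001, about 18% of the Model 3 chi-square. The second gives 3.73 on 6 degrees of freedom with p of roughly .71, in line with the nonsignificant difference reported for the restricted model.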

TABLE 4
Estimated Factor Correlations From Models 3 and 4

Factor                                    1     2     3     4     5     6     7     8     9     10
1. Reading Comprehension (Model 3)        –                 .73   .66   .99   .66   .61   .48   .49
2. WLPB–Passage Comprehension (Model 4)         –
3. DARC Reading Comprehension (Model 4)         .61   –
4. Decoding                                     .68   .43   –     .35   .60   .54   .64   .33   .39
5. Narrative Language                           .54   .64   .35   –     .72   .46   .51   .38   .24
6. Standardized Language                        .86   .77   .60   .72   –     .66   .49   .54   .38
7. Phonological Awareness                       .58   .48   .54   .46   .66   –     .43   .59   .41
8. Fluency                                      .55   .41   .64   .51   .49   .43   –     .31   .33
9. Memory                                       .38   .52   .33   .38   .55   .59   .31   –     .35
10. Nonverbal IQ                                .42   .40   .39   .24   .38   .41   .33   .35   –

Note. Correlations from Model 3 are given above the diagonal; correlations from Model 4 are given below the diagonal. Model 4 splits the Reading Comprehension factor of Model 3 into two factors: one for the Woodcock–Johnson Language Proficiency Battery–Revised (WLPB) Passage Comprehension measure and one for the four measures from the Diagnostic Assessment of Reading Comprehension (DARC). Note that correlations among the seven predictor factors are unchanged.
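One way to read columns 2 and 3 of Table 4 is to ask whether the PC and DARC correlations differ only by a constant scaling, as they would if the two factors differed only in how reliably they were measured. The sketch below assumes the classical attenuation model, under which unreliability multiplies every correlation of a factor by the same constant. It is a descriptive check offered for illustration, not an analysis performed in the source, and the variable names are hypothetical.

```python
# Correlations of each predictor factor with the PC factor (column 2) and
# the DARC factor (column 3) of Table 4, Model 4.
pc_r = {
    "Decoding": 0.68, "Narrative Language": 0.54, "Standardized Language": 0.86,
    "Phonological Awareness": 0.58, "Fluency": 0.55, "Memory": 0.38,
    "Nonverbal IQ": 0.42,
}
darc_r = {
    "Decoding": 0.43, "Narrative Language": 0.64, "Standardized Language": 0.77,
    "Phonological Awareness": 0.48, "Fluency": 0.41, "Memory": 0.52,
    "Nonverbal IQ": 0.40,
}

# Assumption: a pure reliability difference would make every PC/DARC ratio
# roughly the same constant across predictors.
ratios = {name: pc_r[name] / darc_r[name] for name in pc_r}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:24s} {ratio:.2f}")

print(f"range of ratios: {min(ratios.values()):.2f} to {max(ratios.values()):.2f}")
```

The ratios run from about 0.73 (Memory) to about 1.58 (Decoding) and fall on both sides of 1, so no single scaling constant maps one column onto the other.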

Table 4 presents factor correlations from Models 3 (above the diagonal) and 4 (below the diagonal). The correlations among the factors measured by the predictors are unchanged in the two models. The correlations in Table 4 show that the Reading Comprehension factor of Model 3 is indistinguishable from the Standardized Language factor (r = .99) and is highly correlated with Decoding (r = .73; see row 1 of Table 4). In contrast, when the Reading Comprehension factor is split into two factors in Model 4, the two factors show somewhat different relations to the set of predictors (see the columns labeled 2 and 3 in Table 4). In particular, the PC factor is more highly correlated than the DARC factor with Decoding (r = .68 vs. .43), Phonological Awareness (r = .58 vs. .48), Fluency (r = .55 vs. .41), and the Standardized Language factor (r = .86 vs. .77), whereas the DARC factor is more highly correlated than the PC factor with the Narrative Language Production factor (r = .64 vs. .54) and the Memory factor (r = .52 vs. .38). Both factors correlate about equally with Nonverbal IQ. This differential pattern of correlations cannot be explained as a simple difference in the reliabilities of the PC and DARC factors. If one factor were simply measured more reliably than the other, then we would observe the same pattern of correlations but a difference in magnitude. Instead, this differential pattern of relations indicates that the factors potentially tap somewhat different aspects of the Reading Comprehension construct.

To understand how these predictor factors might account for variation in PC and DARC performance, we reparameterized Model 4 so that the predictor factors were allowed to freely intercorrelate, but the relations between the seven predictor
factors and the two reading comprehension factors were accounted for through regression of the two Reading Comprehension factors on the seven predictor factors. Allowing the two Reading Comprehension factors to have correlated disturbances yields a model with the same fit, albeit with different parameters. See the line for Model 5–M4R in Table 3. In this alternate parameterization of Model 4, it is possible to examine the unique contribution of the seven predictor factors to each of the two Reading Comprehension factors. In reducing Model 5, we constrained to 0.0 the direct regression of the Reading Comprehension factors on those predictor factors that did not make a statistically significant contribution to them. The final reduced set of factor-on-factor regression coefficients is presented in Table 5. The fit statistics in Table 3 show that this reduced set of factor-on-factor regressions fits the data almost as well as Model 5 (i.e., Model 4), which provides the upper limit on how well the reduced model can fit the data. The difference in chi-square statistics between the two models is not statistically significant, and all remaining fit statistics are essentially equivalent between the two models.

TABLE 5
Estimated Regression Coefficients for Predicting Reading Comprehension Factors From Predictors in Model 4

                         Comprehension Factor
                         WLPB–PCa                DARCb
Predictor                β       SE      t       β       SE      t
Decoding                 0.15    0.07    2.09
Narrative Language       –0.31   0.16    –1.98   0.17    0.10    1.61
Standardized Language    1.09    0.15    7.49    0.29    0.08    3.45
PA
Fluency                  0.13    0.06    2.16
Memory                   –0.20   0.10    –1.93
Nonverbal IQ             0.08    0.05    1.66    0.06    0.04    1.69

Note. Coefficients left blank are constrained to be 0 in the model. PA = Phonological Awareness; WLPB–PC = Woodcock Language Proficiency Battery–Passage Comprehension; DARC = Diagnostic Assessment of Reading Comprehension. aR2 = .81. bR2 = .62.

The coefficients in Table 5 show that the Decoding and Fluency factors contribute only to the Reading Comprehension factor measured by WLPB–PC and do not contribute to the DARC factor once the two language and nonverbal reasoning factors have been accounted for. Furthermore, the two language factors (viz., the Standardized Language factor and the Narrative Language Production factor) make relatively equal contributions in predicting the DARC factor, but these two language factors have opposite signs in predicting the PC factor. Similarly, the Memory factor has a negatively signed coefficient in predicting the PC factor. Insofar as none of the correlations among the factors were negative, these sign reversals in the coefficients should be interpreted with some caution given the relatively small sample size available for this study (n = 178).

FIGURE 1 Scatterplot of Woodcock Language Proficiency Battery–Revised (WLPB–R) Passage Comprehension and Diagnostic Assessment of Reading Comprehension (DARC) Total Reading scores. Plotted symbol is proportional in size to the score on WLPB–Decoding. The WLPB–Decoding score is formed here by averaging standard scores for WLPB–Letter Word and WLPB–Word Attack. DARC Total Reading has been standardized to a mean of 100 and standard deviation of 15 prior to plotting.

As a final examination of the role of decoding in the PC and DARC factors, we provide a scatter plot of the DARC total scores (standardized to a mean of 100 and standard deviation of 15) and the WLPB–PC (see Figure 1). The plotting symbol is designed to be proportional to the decoding score, which was calculated by averaging the Letter Word scale score and Word Attack scale score from the WLPB. The plot in Figure 1 shows fairly clearly that higher decoding scores tend to coincide with higher comprehension scores on both tests, but this pattern is somewhat more marked for PC. Note the preponderance of larger circles toward the right side of the figure (e.g., above a score on the horizontal axis of 105 or 110). Contrast that with the number of smaller circles in the left-hand side of the figure that tend to range from the bottom of the figure to the top of the figure, that is, they run throughout the extent of the score range of the DARC. It is certainly the case that
higher decoding scores lead to better performance on the DARC, just as they do on all measures of reading comprehension, but this tendency is less pronounced, as both Figure 1 and the correlations in Table 4 indicate.

DISCUSSION

The results of these analyses show striking differences in the relations of various predictors to outcomes on the two criterion measures of reading comprehension. First, WLPB–PC is much more strongly related to print skills than is DARC performance, confirming that we have achieved some degree of success in designing the DARC to minimize the effects of variation in word reading ability on variation in DARC scores. Particularly striking is that print-related skills of decoding and fluency make significant unique contributions to the prediction of WLPB–PC but not to the DARC once contributions from language and reasoning factors have been accounted for. That the factor regressions of the restricted version of Model 5 fully account for the relation between the two Reading Comprehension factors suggests that the basis of their relation is in language and not in print-level skills. Note that these findings do not mean that the DARC is unrelated to print-level skills. Indeed, the correlations between the DARC and print-level skills factors of Model 4 are moderately large, namely, .43 and .41 for the Decoding and Fluency factors, respectively. However, these correlations do indicate that we have achieved a measure of success in reducing the role of print skills in the DARC measure of comprehension while increasing somewhat the role of verbal processing. Second, oral language skills are relatively much more important in explaining variance in the DARC than in the WLPB–PC outcomes.
Although the absolute level of variance explained on the DARC (R2 = .62) is much lower than on the WLPB–PC (R2 = .81), reflecting its lower reliability (which places an upper limit on predictability), it is clear that DARC performance is not so overdetermined by print skills that the visible contribution of other domains is restricted. Third, the narrative production measures show a possibly negative relation to performance on the WLPB–PC assessment in the restricted factor regression model (reduced version of Model 5) but a significant and positive relation to DARC performance. In particular, overall narrative skill reflected in the Narrative Language factor (the SI, MLUW, and NDW used in producing the oral narrative; measures of its length and sophistication; and the WPM, a measure of oral fluency) was uniquely related to DARC performance. These results help to establish the value of the DARC as a measure of reading comprehension on which performance is determined by abilities we might think of as central to comprehension itself: memory for the text read, integration of new information with information stored in memory, making connections across those
sources of knowledge, and structuring the integrated information into narrative forms that might promote longer term retention. In short, the DARC, like the narrative production task, places a premium on verbal processing of information and not simply on verbal knowledge as reflected in measures of vocabulary. Although these aspects of verbal skill are certainly related, they are distinct, and it stands to reason that their contributions to the processing of text can be differentiated with an appropriately crafted measure of comprehension. We certainly do not underestimate the importance of word-reading skills as a determinant of comprehension success; we do suggest, though, that comprehension measures that reflect other domains are important in guiding instruction and in informing our understanding about the complexity of achieving successful comprehension of a text. The analyses reported here made use of latent variable models to explicate possible psychometric properties, such as convergent and discriminant validity, of two sets of reading comprehension measures, one a popular and heavily researched test and the other a novel test with roots in experimental psychology. That preliminary models showed the two sets of measures to converge on a single construct highlights the importance of explicit testing of psychometric models in both simple and complex contexts. Other models certainly could have been considered and tested as potential explanations of the relations among the 19 measures studied here. The extent to which these findings will hold up on replication remains to be determined. Although there is reason for optimism given the consistency of these findings with other research on the WLPB–PC (Francis, Fletcher, et al., 2005) and with the theory underlying the development of the DARC, the sample size used in this study is small by latent variable modeling standards, and the models are relatively complex. 
Thus, caution in interpreting the results is advised, especially those involving the relative contributions of different predictors to the two comprehension factors.

Limitations

Of course, the study reported here is subject to many limitations. First, the version of the DARC used was a preliminary one, and the test itself falls short of desired levels of reliability. This limits the amount of variance that can be explained and contrasts with the much more reliable subtests of the WLPB. Second, we report data on a particular group of third-grade ELLs, all of whom received literacy instruction in a pair of school districts in Texas. It is thus impossible to estimate the degree to which the results reported here would generalize to a more heterogeneous sample of readers. Third, although the DARC was designed to minimize the impact on performance of vocabulary knowledge, we have evidently not fully achieved this goal, despite the very limited lexical range in the DARC passages. Performance on the DARC shows a significant correlation with language factors, and future versions of
the DARC will have to be refined to control and/or manipulate this relationship more precisely. Of course, whether the relations with language currently shared by the DARC reflect knowledge of semantics, such as vocabulary knowledge, or verbal reasoning skills that can be differentiated from knowledge of semantics, awaits further research. This research will likely require greater refinement of the language constructs in our models as well as the ability to more precisely measure the specific processing demands of the comprehension measures. We neither included models with a general language factor nor attempted to isolate possible methods factors in the models investigated here. Models with general language factors have been found in prior research using standardized language measures (Fletcher et al., 1996) but have not been applied to narrative production and standardized language measures in ELL populations, to our knowledge. Similarly, Mehta, Foorman, Branum-Martin, and Taylor (2005) found a single latent factor to account for variability across language and literacy measures, although their study used only a very limited set of language measures. It is clear from the current analyses that a single language factor could not account for the covariances among the language measures in this sample, as evidenced by the magnitude of the correlation between the two language factors in all of the models (2–5). However, it is possible that a general language factor with specific factors designed to capture method variance, or possibly more specific aspects of language processing, could provide a better fit to the data than the current models. The extent to which the conclusions reached about the DARC and PC Reading Comprehension factors would hold up under competing models for the language factors and other predictors must await future research. 
Fourth, although we would like to argue for the value of assessing reading comprehension using measures that minimize the impact of print skills and of vocabulary knowledge for all students, in fact we have so far tested the DARC only with second-language speakers of English, and its value with monolingual English readers remains undemonstrated. In making this claim, it is important to keep in mind what is meant by minimizing print skills. Although we do not deny the importance of print skills in comprehension generally, in assessing comprehension there is some value to isolating the relative roles of print skills from the various forms of cognitive processing that take place during comprehension of complex text so that assessment can effectively be used to guide instruction. So long as performance on comprehension assessments is determined by many factors whose relative contributions are undifferentiated in the scores obtained on the assessments, the goal of using assessment to guide instruction will remain elusive.

We have proposed here one approach to isolating the contributions of various important components to reading comprehension. Specifically, we have shown that it may be possible to constrain the decoding and vocabulary demands of the text while increasing the processing demands of understanding the text. At the same time, we have shown how latent variable models can play an important role in evaluating the success of our efforts to isolate and measure these important processes. Although reading comprehension in its natural state is dependent on many skills and abilities, its assessment may be better served by measures like the DARC that attempt to isolate these processes from one another for the purpose of diagnosis and guiding instruction.

ACKNOWLEDGMENTS

This research was supported in part by grants HD39521, “Oracy/Literacy Development of Spanish-speaking Children,” and R305U010001, “Biological and Behavioral Variation in the Language Development of Spanish-speaking Children,” both of which were jointly funded by the National Institute of Child Health and Human Development and the Institute of Education Sciences. The findings and conclusions reported herein are those of the authors and do not necessarily reflect the views of the agencies or the federal government, either expressed or implied.

REFERENCES
Adams, M. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
August, D., Francis, D., Hsu, H.-Y., & Snow, C. (in press). Assessing reading comprehension in bilinguals. Elementary School Journal.
Bradley, L., & Bryant, P. E. (1983). Categorizing sounds and learning to read: A causal connection. Nature, 301, 419–421.
Dixon, P., LeFevre, J. A., & Twilley, L. C. (1988). Word knowledge and working memory as predictors of reading skill. Journal of Educational Psychology, 80, 465–472.
Engle, R. W., Nations, J. K., & Cantor, J. (1990). Is “working memory capacity” just another name for word knowledge? Journal of Educational Psychology, 82, 799–804.
Fletcher, J. M., Stuebing, K. K., Shaywitz, B. A., Brandt, M. E., Francis, D. J., & Shaywitz, S. E. (1996). Measurement issues in the interpretation of behavior–brain relationships. In R. Thatcher, N. Krasnegor, & G. R. Lyon (Eds.), Developmental neuroimaging: Mapping the development of brain and behavior (pp. 255–262). New York: Academic.
Francis, D., Carlson, C., Fletcher, J., Foorman, B., Goldenberg, C., Vaughn, S., et al. (2005). Oracy/literacy development of Spanish-speaking children: A multi-level program of research on language minority children and the instruction, school and community contexts, and interventions that influence their academic outcomes. Perspectives, pp. 8–12.
Francis, D. J., Fletcher, J. M., Catts, H., & Tomblin, B. (2005). Dimensions affecting the assessment of reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Children’s reading comprehension and assessment (pp. 369–394). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Gathercole, S. E., & Pickering, S. J. (2000). Working memory deficits in children with low achievements in the National Curriculum at seven years. British Journal of Educational Psychology, 70, 177–194.
Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7(1), 6–10.
Haenggi, D., & Perfetti, C. A. (1994). Processing components of college-level reading comprehensions. Discourse Processes, 17, 83–104.
Hannon, B., & Daneman, M. (2001). A new tool for understanding individual differences in the component processes of reading comprehension. Journal of Educational Psychology, 93, 103–128.
Hulme, C., Muter, V., Snowling, M., & Stevenson, J. (2004). Phonemes, rimes, vocabulary, and grammatical skills as foundations of early reading development: Evidence from a longitudinal study. Developmental Psychology, 40, 665–681.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage.
Lesaux, N., & Siegel, L. (2003). The development of reading in children who speak English as a second language. Developmental Psychology, 39, 1005–1019.
Loban, W. (1976). Language development: Kindergarten through grade twelve (Research Rep. No. 18). Urbana, IL: National Council of Teachers of English.
Mehta, P. D., Foorman, B. R., Branum-Martin, L., & Taylor, P. W. (2005). Literacy as a unidimensional multilevel construct: Validation, sources of influence, and implications in a longitudinal study in grades 1 to 4. Scientific Studies of Reading, 9, 85–116.
Miller, J., & Iglesias, A. (2003). Systematic analysis of English and Spanish language transcripts. Madison, WI.
Miller, J. F., Iglesias, A., Heilmann, J., Fabiano, L., Nockerts, A., & Francis, D. (2006). Oral language and reading in bilingual children. Learning Disabilities Research & Practice, 21, 30–43.
Palmer, J., MacLeod, C. M., Hunt, E., & Davidson, J. E. (1985). Information processing correlates of reading. Journal of Memory and Language, 24, 59–88.
Perfetti, C. A. (1985). Reading ability. New York: Oxford Press.
Potts, G. R., & Peterson, S. B. (1985). Incorporation versus compartmentalization in memory for discourse. Journal of Memory and Language, 24, 107–118.
RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading comprehension. Washington, DC: RAND Education.
Raven, J., Raven, J. C., & Court, J. H. (1998). Coloured Progressive Matrices 1998 edition. Oxford, England: Oxford Psychologists Press.
Schatschneider, C., Francis, D. J., Foorman, B. R., Fletcher, J. M., & Mehta, P. (1998). The dimensionality of phonological awareness: An application of item response theory. Journal of Educational Psychology, 91, 439–449.
Tabors, P. O., Snow, C. E., & Dickinson, D. K. (2001). Homes and schools together: Supporting language and literacy development. In D. K. Dickinson & P. O. Tabors (Eds.), Beginning literacy with language (pp. 313–334). Baltimore: Brookes.
Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (1999). Test of Word Reading Efficiency. Austin, TX: PRO-ED.
Vellutino, F. R. (1979). Dyslexia: Theory and research. Cambridge, MA: MIT Press.
Vellutino, F. R. (1987, March). Dyslexia. Scientific American, 34–41.
Wagner, R., Torgesen, J., & Rashotte, C. (1999). Comprehensive Test of Phonological Processing. Austin, TX: PRO-ED.
Woodcock, R. W. (1991). Woodcock Language Proficiency Battery–Revised (English form). Chicago: Riverside.
Woodcock, R. W., & Muñoz-Sandoval, A. F. (1995). Woodcock Language Proficiency Battery–Revised (Spanish form). Chicago: Riverside.
