Creating a Brief Rating Scale for the Assessment of Learning Disabilities Using Reliability and True Score Estimates of the Scale's Items Based on the Rasch Model

Journal of Learning Disabilities, 46(2), 115–132
© Hammill Institute on Disabilities 2011
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0022219411407924
journaloflearningdisabilities.sagepub.com
Georgios Sideridis, PhD1 and Susana Padeliadu, PhD2

1University of Crete, Rethymno, Greece
2University of Thessaly, Volos, Greece

Corresponding Author: Georgios Sideridis, University of Crete, Galou, Rethymno 74100, Greece. Email: sideridis@psy.soc.uoc.gr

Abstract
The purpose of the present studies was to provide the means to create brief versions of instruments that can aid the
diagnosis and classification of students with learning disabilities and comorbid disorders (e.g., attention-deficit/hyperactivity
disorder). A sample of 1,108 students with and without a diagnosis of learning disabilities took part in Study 1. Using
information from modern theory methods (i.e., the Rasch model), a scale was created that included fewer than one third of
the original battery items designed to assess reading skills. This best item synthesis was then evaluated for its predictive and
criterion validity with a valid external reading battery (Study 2). Using a sample of 232 students with and without learning
disabilities, results indicated that the brief version of the scale was as effective as the original scale in predicting reading
achievement. Analysis of the content of the brief scale indicated that the best item synthesis involved items from cognition,
motivation, strategy use, and advanced reading skills. It is suggested that multiple psychometric criteria be employed
in evaluating the psychometric adequacy of scales used for the assessment and identification of learning disabilities and
comorbid disorders.

Keywords
Rasch model, psychometrics, learning disabilities, rating scale, item analysis, classical test theory, modern theory methods,
item response theory

Assessing students with learning disabilities (LD) is an important challenge for the field of LD, particularly more so given the fact that recent identification criteria (i.e., the responsiveness-to-intervention model; Fuchs & Deshler, 2007; Swanson, 2008; Vaughn & Fuchs, 2003) rely heavily on the use of curriculum-based measures (Deno, 1989) compared to normative assessments (as the severe discrepancy model posited; see Stanovich, 1991; Swanson, 1991; see Note 1). That is, during implementation of the model that depended on a significant discrepancy between students' potential and achievement, the gold standard came from standardized normative assessments. With the newly formed response to intervention (RTI), however, teachers need to construct frequent tests based on the curriculum to monitor responsiveness to a valid treatment (Compton, 2006). In fact, the whole RTI concept is based on frequent and valid assessments (Burns, Dean, & Klar, 2004). Thus, the need to have reliable and valid measurements of students' competencies at various skills is imperative, whereas their absence will contribute to confusion, false positive and negative cases, psychological distress, and despair. As Ysseldyke, Burns, Scholin, and Parker (in press) stated, "Dramatic reforms in assessment practices within special education have occurred over the past 30 years, but the practice has not yet caught up to research." One purpose of the present studies is to provide a psychometric technology that will lead to valid assessment in LD.

Importance of Valid Assessments in LD

Teacher ratings are often used for the assessment and identification of LD (Goodman & Webb, 2006; Ritter, 1989).

Teachers have proven to be good and consistent judges of students' academic skills and competencies (Podell & Soodak, 1993; Rescorla et al., 2007). Nevertheless, teachers are only as good judges as the instruments they use. That is, use of invalid instruments would nonetheless lead to false positive or false negative classifications or diagnoses because of the potential error contributed by the testing conditions, the person's preconditions, and biases on the part of the teachers. Particularly with regard to the latter, Cassidy (2008) argued that teacher subjectivity can potentially contribute to valid assessments and differentiated it from invalid subjectivity. Thus, as much as the field relies on teacher ratings for valid identification of LD, more so do rating scales need to be accurate and reflective of true student abilities, rather than measurement error. In the present studies, a rating scale that provides a screening for LD is evaluated for its validity using a small portion of its full potential. The importance of this endeavor is discussed below.

Importance of Creating Brief Scales for Assessing Skills and Competencies

There are several reasons that necessitate the need for brief assessments in LD. First, fatigue is an issue that should be taken into consideration, as students with LD may allocate unnecessary (oftentimes inappropriate) resources when putting forth effort relating to a skill. In fact, studies have pointed to the fact that students with LD tire easily (Kripke, Lynn, Madsen, & Gay, 1982; Morgan, 1977). Furthermore, fatigue during a task will likely be associated with effort withdrawal and poor achievement outcomes.

A second reason for brief assessments lies with the fact that students with LD oftentimes possess comorbid characteristics of attention deficit and/or attention-deficit/hyperactivity disorder (ADHD; Mayes, Calhoun, & Crowell, 2000). In fact, the comorbidity of the two disorders ranges between 20% and 30% based on U.S. estimates (National Institute of Mental Health, 2003) or between 35% and 46% based on epidemiological studies (Karakas, Turgut, & Bakar, 2008). Thus, we hypothesize that the prolonged attention required by lengthy tests will likely lead to failure, regardless of levels of ability, because of the moderating role of poor attention. Brief tests will certainly "correct" achievement levels for the parameter "fatigue."

A third consideration relates to the above two in that fatigue and lack of attention may relate to a lack of motivation and consequently effort withdrawal (either at the antecedent level or as a consequence). For example, Kane (2008) pointed to the fact that the effort provided by students with LD and ADHD during the assessment process is of very poor quality. Furthermore, several studies have associated effort withdrawal with the employment of maladaptive goals by students with LD (Sideridis, 2005) or ADHD (Barron, Evans, Baranik, Serpell, & Buvinger, 2006) and consequently poor achievement outcomes (Barron et al., 2007; Sideridis, 2007). Furthermore, Morgan, Fuchs, Compton, Cordray, and Fuchs (2008) linked early failure in reading to motivational deficits, a fact that has been evidenced in recent meta-analyses of reading (Morgan & Sideridis, 2006).

A last important reason is the role of valid assessments within the RTI framework. RTI requires constant use of assessments to monitor and evaluate growth. This evaluation will form the basis for subsequent decision making. Thus, students will be placed in certain environments or will be given (or withdrawn from) interventions based on these assessments. To make things worse, research that assesses the reliability of those assessments has produced worrisome findings. For example, Burns, Scholin, Kosciolek, and Livingston (2010) reported that the reliability of two similar decision-making models that were based on frequent assessments within RTI was only .29, and that estimate reflected only consistency. It is possible that besides being inconsistent, neither one of these models represented valid assessments of students' potential and actual performance. As things are today in the field of LD, the success of our current practices (RTI) relies on valid inferences of students' abilities. Fortunately, advances in statistics and measurement provide us with the hope that valid assessments are possible at minimum effort and cost. This is particularly more so with the use of contemporary test theory methods, such as the Rasch model (Rasch, 1980).

The Contribution of the Rasch Model

The Rasch model (Rasch, 1980) has been particularly useful in evaluating the psychometric adequacy of scales (Smith & Smith, 2004; Wright & Stone, 1979). Recently, Dimitrov (2003) provided a series of useful criteria for evaluating scales using information about reliability and population estimates of variance. The present work relies heavily on the psychometric criteria developed by Dimitrov (2003) and extends it by providing a 12-index taxonomy that can be used for evaluating the psychometric adequacy of scales used for research or practice.

Thus, the purpose of the present studies was to provide the means to create brief instruments for the assessment of LD based on information from classical test theory and modern theory analytic methods. Initially, a brief reading rating scale was created based on psychometric standards and contributions from both classical test theory and modern methods (i.e., the Rasch model). Then, the brief reading scale was evaluated for its validity against the full version of the reading rating scale (Study 1) and an objective "gold standard" of reading ability (Study 2).

Study 1

Method

Participants and Procedures. There were 1,105 students, 763 with a diagnosis of LD from state multidisciplinary teams and 342 typical students (351 girls, 595 boys; data on gender were missing for 159 students). Students were tested for the presence of LD by state multidisciplinary teams of experts whenever they exhibited a significant discrepancy between potential and achievement. In all, 381 students were receiving supplementary services in special education (segregated) settings, and the rest were educated in general education settings. The participating students came from Grades 3 to 9 as follows: Grade 3 = 108, Grade 4 = 149, Grade 5 = 15, Grade 6 = 119, Grade 7 = 134, Grade 8 = 118, and Grade 9 = 150; data were missing for 162 students. School selection followed the demographic characteristics of the country in terms of school distribution across geographical areas (rural, urban, and suburban). Student ratings were completed by 798 teachers during out-of-school hours.

Measures

Teachers who were well acquainted with students' behaviors, academic work, and achievement completed the Learning Disabilities Screening Scale for Teachers (Padeliadu & Sideridis, 2008), which is a scale that is based on the U.S. version of the Learning Disabilities Diagnostic Inventory developed by Hammill (1995; also see Hammill & Bryant, 1998). Teachers were asked to provide ratings of identified students during the last 2 months of the academic year (spring 2008). They were asked to rate each item of the scale in terms of frequency and occurrence of specific academic behaviors (rated between 1 = always and 9 = never). For the purpose of the present study, only the 20-item Reading subscale was used. Furthermore, the nine-response-option system was substituted with a dichotomous 0–1 system to have a yes–no system with regard to the presence of an LD (rather than a probabilistic scheme). Sample items were (a) "reads text silently very slowly" and (b) "has difficulty decoding two-digit words when reading" (also see Table 1 for the full scale). The internal consistency (reliability) estimate of the Reading subscale using Cronbach's alpha was .949 for the full sample. When estimating alpha separately for students with and without LD, the estimates of alpha were .939 and .933, respectively.

Data Analyses. We employed the Rasch model to evaluate the adequacy of the full rating scale and to derive the brief form. All analyses were conducted using Winsteps 6.7 (Linacre, 1999). Appendix A provides the code for creating the brief form of the scale.

Rasch model description. The Rasch model for dichotomous items posits that an item can be evaluated for its quality based on two parameters: (a) the ability of the person (termed β or b) and (b) the difficulty of the item (termed δ or d). The mathematical expression that estimates the probability of person n solving item i correctly is as follows (Bond & Fox, 2001; Rasch, 1980; Wright & Stone, 1979),

P_ni(x_ni = 1 | B_n, D_i) = e^(B_n − D_i) / [1 + e^(B_n − D_i)]   (1)

with P_ni(x_ni = 1 | B_n, D_i) being the probability of person n getting item i correct given a person's level of ability B and an item's difficulty level D. The term e = 2.71828. Thus, the probability of getting an item correct involves the natural logarithmic transformation of a person's ability (B_n) and an item's level of difficulty (D_i). The way the Rasch model works, once we know the ability level of a person and the difficulty level of the item, it is easy to estimate the probability of correct responding (Wilson, 2005). This is also called the conditional independence assumption. Let's try to understand how Equation 1 works for an item of above average difficulty (i.e., 1 on the logit scale) and a person of above average ability (i.e., also 1 on the logit scale, which is conceptually similar to the standard deviation metric). Equation 1 then becomes,

P_ni(x_ni = 1 | 1.0, 1.0) = e^(1−1) / [1 + e^(1−1)] = e^0 / (1 + e^0) = 1 / (1 + 1) = 0.50

and the probability of a person of above average ability solving an above average difficulty item (1) equals 50%. If we apply Equation 1 to the same item but for a person of higher ability (i.e., 2.0 on the logit scale), then the probability of solving that item equals 73%. Obviously, when the ability B of the person (e.g., 2 in the above case) exceeds the difficulty D of the item (e.g., 1), then the probability of getting the item correct is greater than 50%. All Rasch models across Studies 1 and 2 were run with ample levels of power, as the recommended number of participants for a one-parameter model (N = 100; Morizot, Ainsworth, & Reise, 2007) was greatly exceeded in the present studies (N sizes were 1,108 and 232 in Studies 1 and 2, respectively).

Psychometric criteria for creating a brief rating scale for the assessment of LD. In psychometrics (e.g., Anastasi & Urbina, 1997) there are several criteria that are indicators of valid assessments. For the purpose of the present study we employed criteria from modern theory methods and the Rasch model (Rasch, 1980). Furthermore, we employed newly developed criteria for expected true-score measures and reliability, relying heavily on the pioneering mathematical work of Dimitrov (2003). Thus, the following 12 criteria guided our development of psychometric technology.
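Before turning to the item-level results in Table 1, Equation 1 can be checked numerically. The short sketch below is an illustration added for this purpose (Python; not part of the original analyses) and simply reproduces the two worked probabilities from the text.

import math

def rasch_probability(ability_b, difficulty_d):
    """Equation 1: probability of a correct response under the Rasch model."""
    return math.exp(ability_b - difficulty_d) / (1 + math.exp(ability_b - difficulty_d))

# The two worked examples from the text (values are on the logit scale)
print(round(rasch_probability(1.0, 1.0), 2))  # 0.5: ability equals item difficulty
print(round(rasch_probability(2.0, 1.0), 2))  # 0.73: ability exceeds difficulty by one logit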
Table 1. Salient Parameters That Are Indicative of Adequate Psychometric Properties at the Item Level.

Item of Reading Scale    Infit-ms    Outfit-ms    Point Biserial    DIF    Discrim.    LEV    HER    LETV    Test–Retest(a)
  1.  Reads text out loud very slowly 1.32 1.74 .61 No 0.34 .1670 .1485 .0291 .931
  2.  Reads text silently very slowly 1.36 1.97 .58 No 0.23 .1844 .1603 .0352 .934
  3.  Substitutes, reverses, adds, or omits letters when reading 1.06 1.09 .63 No 0.90 .1901 .1644 .0374 .928
  4.  Substitutes, reverses, adds, or omits syllables when reading 1.02 0.92 .61 No 0.99 .1989 .1707 .0409 .908
  5.  Has difficulty decoding double digit words when reading 0.94 0.86 .68 Yes 1.13 .2007 .1721 .0417 .908
  6.  Substitutes phonologically similar words when reading 1.06 0.92 .60 No 0.94 .2010 .1723 .0418 .912
  7.  Commits errors when reading unknown words 1.06 0.97 .74 No 0.93 .2027 .1736 .0426 .901
  8.  Has difficulties to understand abstract meanings when reading text 0.97 0.96 .70 No 1.04 .2048 .1751 .0435 .910
  9.  Has difficulties in understanding text when reading compared to when listening to it 1.20 1.34 .69 No 0.65 .2053 .1755 .0437 .922
10.  Has difficulties in identifying the main idea of a text 0.92 0.85 .74 No 1.15 .2059 .1759 .0440 .908
11.  Has difficulties in answering questions that require deep processing of information 0.91 0.80 .70 No 1.19 .2057 .1758 .0439 .915
12.  Has difficulties in predicting the content/plot of a text 0.89 0.78 .71 No 1.22 .2051 .1753 .0436 .918
13.  Has difficulties to distinguish salient and important features of a text (from unimportant ones) 0.85 0.71 .75 Yes 1.29 .2047 .1750 .0434 .927
14.  Has difficulties in processing information that will aid text comprehension 0.83 0.71 .78 No 1.28 .2041 .1746 .0432 .934
15.  Has difficulties to memorize information from a text 0.86 0.79 .75 No 1.25 .2031 .1739 .0427 .928
16.  Gives up easily his/her efforts when reading text 1.02 0.99 .71 No 0.96 .2025 .1734 .0425 .931
17.  Has difficulties in implementing strategies that would aid understanding of the text 1.01 1.01 .77 No 0.99 .1986 .1705 .0408 .923
18.  Uses poor and inefficient strategies that do not belong to his/her age group 0.89 0.80 .73 No 1.22 .1918 .1655 .0380 .927
19.  Uses appropriate strategies in inefficient ways 0.93 0.92 .69 No 1.12 .1743 .1534 .0316 .900
20.  Has difficulties in implementing a plan of action when reading 0.89 0.83 .75 No 1.20 .1670 .1485 .0291 .926
Note: Shaded items are those selected in the brief reading rating form (i.e., Items 7, 8, 13, 15, 16, and 17). DIF = differential item functioning for two groups of students, those with a diagnosis of learning disabilities and typical students; LEV = low expected variance; HER = high expected reliability; LETV = low expected true variance.
(a) Test–retest estimates of difficulty parameters across items are shown in Figure 1.

1. Acceptable infit mean square statistic. The formula used for that assessment came from Wright and Masters (1981),

InfitMS_i = Σ_{n=1}^{N} Y_{in}^2 / Σ_{n=1}^{N} W_{in}   (2)

which is the ratio between the observed squared residuals Σ_{n=1}^{N} Y_{in}^2 and the average expected residuals Σ_{n=1}^{N} W_{in} / N (also see Wilson, 2005). The expected value is around unity, and deviations of 0.5 from that value are indicative of distorted items (i.e., items that did not perform at the level of difficulty that was expected of them). Furthermore, values that deviate more than one unit from 1 suggest severe distortion, or in other words items that do not provide constructive information to the scale of interest.

2. Acceptable outfit mean square statistic based on z score estimates. The following formula was used for that estimation,

OutfitMS_i = Σ_n Z_{ni}^2 / N   (3)

which represents the average of the standardized residual variance across individuals and items (i.e., Z_ni). This unweighted estimate gives more impact to responses that are far from a person's (or item's) measures (Bond & Fox, 2001; Linacre & Wright, 1994). The expected value of this parameter is also around unity.

3. High biserial correlation between an item and the scale's total score. As Krus and Ney (1978) pointed out, when point biserial correlations are positive and high, this is indicative of high convergent validity for that specific item. The formula to transform point biserial correlations into biserial correlations (because of the presence of normality in the sample) was the following,

R_Biserial = PB_s * sqrt(p(1 − p))   (4)

with PB_s being the point biserial correlation and p being the proportion of data with Y = 1.

4. Presence of differential item functioning (DIF) between children with a diagnosis of LD versus typical students. As Linacre (1999) pointed out, the difference in item difficulties between two groups must exceed .50 in logit units to be meaningful. Paek (2002) added that difficulty difference estimates less than .42 are negligible, those between .43 and .64 are of medium size, and those exceeding .64 are large. Thus, only DIF estimates that met the criterion of medium effect counted in favor of specific items (see Note 2). The formula applied in Winsteps is the following,

χ^2 = Σ_{j=1}^{L} (D_j^2 / SE_j^2) − [Σ_{j=1}^{L} (D_j / SE_j^2)]^2 / Σ_{j=1}^{L} (1 / SE_j^2)   (5)

which tests the hypothesis that all difficulty parameters D with standard errors SE are equivalent across items L.

5. Presence of nonuniform DIF. As Linacre (1999) pointed out, the presence of nonuniform DIF is indicative of discriminant validity. Nonuniform DIF was run with the two ability groups (typical and LD) to test the hypothesis that the slopes of the items are equivalent in shape (i.e., parallel) across ability groups.

6. Discrimination parameter close to the expected value of 1. As Linacre (1999) pointed out, when a discrimination parameter value exceeds unity, then that specific item discriminates high and low ability groups more than expected (for the item's difficulty level). On the contrary, when a discrimination value is less than 1, then the item discriminates high from low ability groups less than expected. Thus, the closer the discrimination values were to 1, the more likely was the item to behave according to expectations (i.e., the Rasch model's expectations).
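Two of the checks above lend themselves to compact numeric sketches. The code below (Python; an added illustration with invented numbers, not code from the study) computes the infit and outfit mean squares of Equations 2 and 3 for one dichotomous item, assuming person abilities and the item difficulty are already available from a Rasch calibration, and the between-group DIF chi-square of Equation 5.

import math

def item_fit(responses, abilities, difficulty):
    """Infit (Equation 2) and outfit (Equation 3) mean squares for one dichotomous item."""
    expected = [math.exp(b - difficulty) / (1 + math.exp(b - difficulty)) for b in abilities]
    variances = [p * (1 - p) for p in expected]                  # model variances W_ni
    sq_residuals = [(x - p) ** 2 for x, p in zip(responses, expected)]
    infit = sum(sq_residuals) / sum(variances)                   # information-weighted fit
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(responses)  # mean z^2
    return infit, outfit

def dif_chi_square(difficulties, standard_errors):
    """Equation 5: tests whether group-specific difficulty estimates of an item are equal."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    weighted_sq = sum(w * d * d for w, d in zip(weights, difficulties))
    weighted_sum = sum(w * d for w, d in zip(weights, difficulties))
    return weighted_sq - weighted_sum ** 2 / sum(weights)        # chi-square with L - 1 df

# Invented numbers purely for illustration
print(item_fit([1, 0, 1, 1, 0], [0.3, -1.2, 0.8, 1.5, -0.4], 0.2))
print(round(dif_chi_square([0.95, 0.40], [0.12, 0.10]), 2))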

7. Low expected error variance. Based on Dimitrov (2003), the lower the error of measurement, the higher the probability that a test is valid. The formula for the estimation of the amount of error variance is as follows,

σ^2_e = Σ_{i=1}^{n} σ^2(e_i)   (6)

which represents the accuracy of correct versus incorrect scorings.

8. High expected reliability. As error variance goes down, reliability goes up. Based on the work of Dimitrov (2003), the following was assessed,

ρ_xx′ = σ^2_τ / σ^2_x = σ^2_τ / (σ^2_τ + σ^2_e)   (7)

which represents the ratio of true score variance to total (i.e., observed) variance.

9. Low expected true variance. Again following the pioneering work of Dimitrov (2003), expected true variance was estimated using his conventions,

σ^2_τ = Σ_{i=1}^{n} Σ_{j=1}^{n} sqrt{[π_i(1 − π_i) − σ^2(e_i)][π_j(1 − π_j) − σ^2(e_j)]}   (8)

which relates to the items' means π_i and their error variances σ^2(e_i).

10. Test–retest reliability of person and difficulty estimates. The repeated measurement of a scale is the best index of its reliability. For this purpose, all items were subjected to test–retest reliability using a subsample of 185 participants. Two types of reliability were estimated: correlations between participants' scores across two time points and correlations between difficulty estimates (b) across the two time points (Luppescu, 1991), a fact that is particularly important for item calibration (Lunz & Bergstrom, 1995).

11. LZ statistic that is indicative of person misfit. When comparing two scales, the proportion of participants in one sample (one version of the scale) who misfit the Rasch model is contrasted to the proportion of participants in the other version of the scale. The estimation of misfitting participants was conducted using the LZ statistic (Drasgow, Levine, & Williams, 1985). This statistic is distributed normally, it is standardized, and values in the software reflect the probability that the person fits the Rasch model. Thus, estimates below 5% reflect misfitting participants. The LZ statistic was computed as follows,

LZ_i = [L_0 − E(L_0)] / [Var(L_0)]^(1/2)   (9)

with L_0 being the log peak of the likelihood function, E(L_0) being the expectation of L_0, and Var(L_0) being the variance of L_0. The analysis was run with software developed by Liang, Han, and Hambleton (2008). Further comparisons between full and brief forms implemented the general linear model.

12. Content balance. As Chang (2004) has nicely pointed out, "The set of items selected for each examinee must satisfy certain non-statistical constraints such as content balance" (p. 122). This was the final criterion in determining inclusion of an item in the brief form of the scale in terms of adequately capturing the latent construct of reading. In other words, items that represented various aspects of reading (i.e., fluency, comprehension, etc.) were better candidates for inclusion compared to items representing one area only (e.g., only comprehension). Appendix B displays an SPSS syntax file for running some of the commands.
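The three Dimitrov-based quantities above can be assembled from per-item expected means and error variances (for example, from the approximations in Appendix B). The sketch below (Python; an added illustration that follows Equations 6 through 8 as reconstructed here, so the square-root step in the true-variance sum should be read as an assumption) shows the arithmetic, with invented input values.

def expected_true_score_summary(item_means, item_error_vars):
    """Scale-level error variance (Eq. 6), expected true variance (Eq. 8), and reliability (Eq. 7)."""
    error_var = sum(item_error_vars)
    # Per-item true-score standard deviations, floored at zero as in the Appendix B syntax
    true_sd = [max(p * (1 - p) - ve, 0.0) ** 0.5 for p, ve in zip(item_means, item_error_vars)]
    true_var = sum(si * sj for si in true_sd for sj in true_sd)   # double sum over item pairs
    reliability = true_var / (true_var + error_var)
    return error_var, true_var, reliability

# Hypothetical expected means and error variances for a three-item scale
print(expected_true_score_summary([0.45, 0.60, 0.30], [0.12, 0.10, 0.15]))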
Results of Study 1

Using the criteria described above, this section provides comparative information with regard to the psychometric adequacy of the full scale compared to the brief scale that was composed using fewer than one third of the total items.

Construction of Brief Rating Scale. Table 1 provides item information regarding the evaluative criteria of good items. Based on that evaluation, a best synthesis of six items was selected that met high standards of psychometric quality and content balance. Given the exploratory nature of the present studies, no attempt was made to differentially weigh the criteria. Instead, an additive model was used in which the items that met most of the above criteria were included in the brief form (in an effort to involve as few items as possible, without losing content validity). As shown in Table 1, the six-item synthesis included items with low measurement error and high internal consistency, discrimination, item–total correlation, and test–retest reliability (e.g., of the difficulty parameters; see Figure 1). Below, the full and brief versions of the reading rating scale are compared across various estimates of psychometric adequacy.
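The additive selection rule just described can be pictured as a simple tally. The sketch below (Python) is purely hypothetical: the per-item counts and the size limit are invented, since the article does not publish the exact scoring of each criterion; content balance would then be checked on the winning set.

def select_items(criteria_met, max_items=6):
    """Keep the items that satisfy the largest number of the 12 criteria."""
    ranked = sorted(criteria_met, key=criteria_met.get, reverse=True)
    return ranked[:max_items]

# Invented tallies of criteria met per item
counts = {"item7": 10, "item8": 10, "item13": 9, "item15": 9, "item16": 8, "item17": 8, "item1": 4}
print(select_items(counts))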
Comparisons Between Full and Brief Scales

Dimensionality indices between full and brief rating scales. Using a principal components analysis of the residuals, results indicated that one dimension accounted for 35% of the total variance (explained by items and participants). The respective amount for the brief scale was 24.2%. Certainly the heterogeneity of the items that composed the brief scale accounted for the fact that the items contained much more information compared to a single reading dimension. Nevertheless, the amount of variance explained was significant.

Internal consistency estimates. Using the Kuder–Richardson formula for dichotomous items, we assessed the internal consistency of both forms of the scale. Estimates for the full scale were equal to .949 and for the brief, six-item form equal to .879. Item–total correlations ranged between .52 and .74 for the full scale and between .61 and .74 for the brief scale.

Test–retest reliability estimates. Using an interval of 1 week, the rating scale was subjected to another testing for 185 participating teachers. Results indicated that the test–retest correlation coefficient was .974 for the full reading scale and .960 for the brief rating scale.

Test characteristic curves (TCC). As Crocker and Algina (1986) have stated, TCCs reflect regression curves for predicting observed scores (X) from latent trait scores (θ). Figure 2 shows the TCCs of the full and brief scales. At the medium reading trait level (θ), it is obvious that the two tests are equivalent (having equal thresholds and location on the latent trait). However, the slope of the full scale is steeper compared to that of the brief scale, suggesting that higher ability levels are required to achieve a high probability of success (different levels of discrimination, i.e., the alpha parameter). That is, the difference between the two tests is on discrimination but not in location, suggesting that the full scale is better at discriminating individuals in the latent trait range around the point of inflection (Morizot et al., 2007).

Test information function (TIF). The TIF is extremely important in test construction because it provides information regarding the "precision" of a scale across levels of the latent trait (Morizot et al., 2007). Ideally a researcher would want to develop a scale that would have equal precision across the trait range, unless the purpose of the test is to identify subgroups of individuals with specific levels of performance on the latent trait. A normative test, however, should always aim to describe individuals at all levels of ability to be potentially unbiased with regard to any level of ability. Furthermore, the concept of "information" in Rasch modeling is very important because it relates to the standard error of measurement (SEM), which represents the inverse square root of information at each and every point along the trait continuum (Robins, Fraley, & Krueger, 2007; Simms & Watson, 2007),

SE(θ) = 1 / sqrt(I(θ))   (10)

with I(θ) representing the TIF and SE(θ) representing the SEM.

Figure 1. Regression of difficulty parameters between Time 1 and Time 2 measurements for full scale.

Figure 2. Test characteristic curves between brief and full scales using Rasch model estimates.

Figure 3. Test information function between brief and full scales using Rasch model estimates.

Figure 4. Standard error curves for the two scales (full with 20 items and brief with 6 items).
Scales were placed on the same metric to be comparable.

Conversely, the information function of an item at θ is estimated using the formula by Wilson (2005),

Inf(θ) = 1 / sem(θ)^2   (11)

and when adding the individual information functions we can estimate the TIF, which represents the sum of information for the whole scale (Lord, 1980):

Test-Inf(θ) = Σ_{i=1}^{I} Inf_i(θ)   (12)
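Equations 10 through 12 can be tied together in a few lines. The sketch below (Python; an added illustration, not code from the study) uses the standard Rasch result that an item's information at θ equals P(θ)[1 − P(θ)], sums over items to obtain the test information, and inverts its square root for the SEM, which is what produces the bell-shaped information curves and U-shaped error curves in Figures 3 and 4. The item difficulties are hypothetical.

import math

def test_information(theta, item_difficulties):
    """Equation 12: test information as the sum of Rasch item informations at theta."""
    total = 0.0
    for d in item_difficulties:
        p = math.exp(theta - d) / (1 + math.exp(theta - d))
        total += p * (1 - p)          # item information for a dichotomous Rasch item
    return total

def standard_error(theta, item_difficulties):
    """Equation 10: the SEM is the inverse square root of test information."""
    return 1 / math.sqrt(test_information(theta, item_difficulties))

# Hypothetical six-item difficulties; precision is highest near the middle of the trait range
difficulties = [-1.5, -0.8, 0.0, 0.4, 0.9, 1.6]
for theta in (-4, -2, 0, 2, 4):
    print(theta, round(standard_error(theta, difficulties), 2))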
As shown in Figure 3, the test information curves looked bell shaped across both forms of the scale, suggesting that most information for each version of the test was provided at mean levels of ability. This fact strengthens the normative function of the tests and the lack of bias for any group of examinees having different levels of ability (see Note 3). Figure 3 indicates that both forms of the scale provide information across a wide range of ability (–4 to +4 logits), with the brief form of the scale actually being more sensitive to lower (and higher) ability groups. This fact may be particularly more important when evaluating students with learning problems, who represent the left tail of a normal distribution.

Figure 5. Frequency distributions between typical students and students with a learning disability on the full reading rating scale.
Indices of sensitivity and specificity are shown for each level of the rating scale.

SEM. The SEM was estimated using Equation 10 and is low when scales provide large amounts of information (have high information functions). As shown in Figure 4, the standard error curves were parallel between the full and brief versions of the scale, suggesting that the two scales had the ability to discriminate individuals at most of the latent trait range except the tails (which is expected). Obviously, for both scales the greatest precision is evident at theta values that range between –2 and +2 (see Figure 3) or between raw scores of approximately +4 and +18 (see Figure 4). Although low discrimination is expected in the tails of the distribution, Sykes and Hou (2003) suggested that there may be some guessing involved at that range of ability, which explains that fact (see the work of Karabatsos, 2003, for types of aberrant responding).

Discriminant validity. Discriminant validity was evaluated by estimating the ability of the brief and full forms of the scale to discriminate students with and without identified LD by use of receiver operating characteristic (ROC) curves (Grilo, Becker, Anez, & McGlashan, 2004; Hanley & McNeil, 1982; Hsu, 2002). For this purpose the distributions of the two groups of students are plotted for the full scale (Figure 5) and the brief scale (Figure 6), and indices of sensitivity and specificity are displayed for each score level (dark and light lines). Then ROC curves were constructed to test the ability of each scale (full and brief forms) to identify students having identified LD (Figure 7). The discriminant validity of each scale was evaluated first, and then their areas under the curve (AUCs) were compared using z difference tests (Hanley & McNeil, 1983).

Figure 6. Frequency distributions between typical students and students with a learning disability on the brief reading rating scale.
Indices of sensitivity and specificity are shown for each level of the rating scale.

When looking at the full scale (Figure 5), it is apparent that most of the typical student group had scores around zero (dark bars above the X line), whereas students with LD had a wide range of scores beyond zero, suggesting the presence of various behaviors that define reading disabilities. The area under the curve when trying to predict group membership from the full rating scale was .828, which was significant (z = 25.654, p < .001). Specifically, the full scale had a sensitivity index of 78.9 (i.e., the probability of a test result being positive given the presence of a condition) and a specificity index of 79.7 (i.e., the probability of a test result being negative when the condition is not present). The respective estimates for the brief form of the rating scale were as follows: AUC = .811 (z = 22.985, p < .001), sensitivity = 72.5, and specificity = 85.7. These values were all indicative of discriminant validation.

When comparing the AUCs (Figure 7) between the two forms of the scale, results indicated that there were no significant differences between the two areas (z = 0.859, p = .3905). This finding supports the null hypothesis that both forms of the scale were equally effective in discriminating students with LD from typical students.

Figure 7. Receiver operating characteristic curves showing the discriminant ability of the full and brief scales in identifying students with a learning disability.
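The sensitivity, specificity, and AUC indices reported above can be reproduced from raw scores with very little code. The sketch below (Python; an added illustration with invented scores, under the assumption that higher rating-scale totals indicate more reading difficulty) computes the two indices at a given cut score and a rank-based AUC.

def sensitivity_specificity(ld_scores, typical_scores, cut):
    """Sensitivity: LD cases at or above the cut score; specificity: typical cases below it."""
    sensitivity = sum(s >= cut for s in ld_scores) / len(ld_scores)
    specificity = sum(s < cut for s in typical_scores) / len(typical_scores)
    return sensitivity, specificity

def auc(ld_scores, typical_scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pairs = len(ld_scores) * len(typical_scores)
    wins = sum((x > y) + 0.5 * (x == y) for x in ld_scores for y in typical_scores)
    return wins / pairs

# Hypothetical brief-scale totals (0-6) for a few students in each group
print(sensitivity_specificity([5, 6, 4, 3, 6], [0, 1, 0, 2, 1], cut=3))
print(auc([5, 6, 4, 3, 6], [0, 1, 0, 2, 1]))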

Figure 8. Probability of misfit for each participant in the full scale versus brief scale.
Values were estimated using the LZ statistic.

Person fitting comparison. Using the LZ test described above, participants were evaluated for their personal contribution to model fit. Figure 8 displays the probability values for each participant (small values represent misfitting participants, and large values a substantial contribution of the participant to the Rasch model's theses). As shown in Figure 8, more participants in the full scale appear to be on the right of the distribution compared to the brief scale. This finding was confirmed by use of a means analysis, for which the mean probability of misfit was equal to .531 for the full scale participants and .423 for the participants in the brief scale. When this difference was tested using Student's t test, results supported the alternative hypothesis that there was a significantly smaller number of misfitting participants in the brief scale compared to the full scale, t(1960) = 9.830, p < .001.
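For readers who want to see what the person-fit index does, the sketch below (Python; an added illustration — the published analyses used the ResidPlots-2 software of Liang, Han, and Hambleton, 2008) computes the standardized log-likelihood statistic of Equation 9 from one person's responses and the model probabilities of those responses. Large negative values flag misfitting response patterns.

import math

def lz_person_fit(responses, probabilities):
    """Standardized log-likelihood person-fit statistic (Equation 9; Drasgow et al., 1985)."""
    loglik = expected = variance = 0.0
    for x, p in zip(responses, probabilities):
        loglik += x * math.log(p) + (1 - x) * math.log(1 - p)
        expected += p * math.log(p) + (1 - p) * math.log(1 - p)
        variance += p * (1 - p) * math.log(p / (1 - p)) ** 2
    return (loglik - expected) / math.sqrt(variance)

# Hypothetical responses and model probabilities for one examinee on six items
print(round(lz_person_fit([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]), 2))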
Item fitting comparison. Items were evaluated for model fit by estimating the discrepancy between their empirical estimate and that posited by the Rasch model (theory). Discrepancies were assessed using chi-square tests. These findings are presented in Figure 9 for both the brief form of the scale and the full scale. As shown in Figure 9, the full scale seems to have items with large discrepancies between empirical estimations and the expectations of the model. Visually speaking, the discrepancies appear to be smaller with regard to the brief scale items. The ICCs for the final brief six-item scale are shown in Figure 10.

Brief Discussion of Study 1

Study 1 was designed to provide a taxonomy with which one can construct brief forms of a scale, with an application in LD. This taxonomy was provided using information from classical and modern theory methods and is primarily proposed for LD, for which brief scales may be particularly more suitable, appropriate, and necessary. Results indicated that the brief form possessed highly desirable psychometric properties and was psychometrically comparable to the full version of the reading rating scale (at times even more so). Study 2 sought to test the validity of the brief scale further (compared to the full form) using external evaluative criteria, as suggested by Simms and Watson (2007). Thus, using a normative reading scale, both full and brief forms were tested for the presence of external criterion validity and discriminant validity.

Figure 9. Chi-square values for each item reflecting discrepancies between empirical estimates and theoretical postulates.
Large values are indicative of item misfit.

Study 2

Method

Participants and Procedures. There were 232 students, with 22 of them having a diagnosis of LD from state multidisciplinary teams. The remaining 210 students were typical and were educated in general education settings (122 boys, 108 girls; data on gender were missing for 2 students). Students were identified as having LD as in Study 1 (i.e., using multidisciplinary team criteria). The distribution of students per grade was as follows: Grade 3 = 33, Grade 4 = 39, Grade 5 = 35, Grade 6 = 36, Grade 7 = 35, Grade 8 = 28, and Grade 9 = 17; data on grade were missing for 4 students. Schools were selected for inclusion as in Study 1 using stratified random sampling with region being the stratum.

Learning Disabilities Screening Scale for Teachers. The same scale used in Study 1 was also implemented in Study 2. The internal consistency estimate of the Reading subscale was .929.

Learning Disabilities Reading Inventory. This normative scale was developed by Padeliadu and Antoniou (2008) and assesses four aspects of reading (i.e., decoding, fluency, morphology and syntax, and comprehension). Internal consistency estimates ranged between .605 and .853 using the present data.

Data Analyses. Latent variable modeling was implemented using EQS 6.1 (Bentler, 2006). The purpose of this modeling was twofold: (a) to correlate the latent "reading" trait with objective measurements of reading skills and (b) to compare groups of individuals (with and without LD) on the latent reading trait (latent means model). Evidence in favor of (a) would be manifested with significant correlations between the latent trait and objective reading measures, and evidence for (b) would be manifested with a significant coefficient linking a dummy (grouping) variable to the latent reading trait. For the purposes of (a) and (b), model-fit indices (e.g., chi-square values and fit indices such as the comparative fit index) were not relevant, as overall fit was not an objective of either goal. Instead, the structural coefficients and their respective significance were of interest.

Results and Discussion of Study 2

External criterion validity of full and brief rating scales. As mentioned above, Model 1 tested the hypothesis that the rating scales would correlate significantly with objective indices of reading achievement (i.e., the Learning Disabilities Reading Inventory).

Figure 10. Item characteristic curves for the brief six-item rating scale.

As shown in Figure 11, there were significant correlations between the latent teacher rating scale of reading and decoding (r = .21, p < .05), fluency (r = -.28, p < .05), and comprehension (r = .12, p < .05, on a one-tailed test), but not morphology and syntax (r = .03, ns) for the brief scale. These coefficients were very similar to those for the full scale (estimates in parentheses of Figure 11).

Figure 11. Correlations between reading latent construct (rating scale) and the subscales of the gold standard.
The sign of the estimates was reversed for clarity (because learning disability ratings were reversely coded).

Furthermore, between-correlation-coefficient tests failed to reveal differences across scales (full form and brief) in their relation with the external criterion. Interestingly, fluency correlated negatively with the latent rating of reading, most likely because of an emphasis on reading comprehension and the fact that reading fast may be associated with a lack of deep processing of information from text.

Discriminant validity of full and brief rating scales: Latent means model. A latent means model was fit to the data to test the hypothesis that the latent reading rating would be different across students with and without LD. Results from that modeling confirmed this hypothesis (see Figure 12). The effect was significant, suggesting that typical students had, on average, a .69 higher reading score compared to students with LD. The respective estimate for the full scale was .64, very similar to that for the brief scale.

ROC analyses. When comparing the AUCs between the two forms of the scale, results indicated that both scales provided exceptional discrimination (AUC_Full Scale = .913, AUC_Brief Scale = .919). Furthermore, there were no significant differences between the two areas (z = 0.240, p = .810), a finding that supports the null hypothesis that both forms of the scale were equally effective at discriminating students with LD from typical students.

General Discussion

The purpose of the present studies was to provide the means to create brief instruments for the assessment and screening of LD based on information from classical test theory and modern theory analytic methods. Using a large sample of students with and without LD, a subscale that comprised one third of the items of the total scale was created in Study 1 based on multiple and rigorous psychometric criteria. Study 2 attempted to validate the findings of Study 1 using an objective measurement of reading achievement. Results indicated that the brief rating scale significantly discriminated between students with and without LD. In fact, the discriminant ability of the brief rating scale was even better, compared to the full item scale, across various psychometric indices.

The most important finding of the present study was that the use of multiple criteria proved to be extremely useful for identifying effective items and creating a short form of the rating scale. Using multiple criteria, results showed that the brief scale, composed of fewer than one third of the total scale's items, possessed no inferior psychometric properties compared to the full scale. At first, this is very encouraging given the fact that brief scales are needed for assessing skills and competencies in LD. The second important finding was that the use of an external gold standard further substantiated the predictive and criterion validity of the brief scale, which is very important for scale evaluation (Messe, Crano, Messe, & Rice, 1979). This fact was evidenced even more so when comparing the brief scale to the full scale.

Figure 12. Latent means model between students with and without learning disabilities on latent reading construct (rating scale).
Group was coded as 0 = learning disabilities, 1 = typical.

Interestingly, the brief reading scale involved a combination of items related to reading. For example, Item 7 was about fluency, Item 8 was about comprehension, Item 13 was about deep versus surface processing (cognitive strategy use), Item 15 was about memorization (cognitive ability), Item 16 was about motivation and the ability to not give up easily, and Item 17 was about strategy implementation. In other words, the brief form contained a mixture of factors that are accelerators of reading, without belonging to one unified aspect of reading (e.g., strategy use only, reading comprehension only, motivation, etc.). The unified theme was reading, but the content of the items belonged to conceptually different variables.

Limitations of the Study

The present study is also limited for several reasons. First, the criteria used by the multidisciplinary teams to identify students with LD in Greece vary. Thus, we cannot judge the validity of these ratings beyond the fact that these students were identified as having LD by the designated authority. Second, the identified students did not have a specific diagnosis of reading disabilities. Thus, the absence of this information and the heterogeneity observed in the LD population with regard to various language characteristics, even within the population, partly explain the modest relationship found between the rating scale scores and the objective (gold) reading criterion. Third, the equal weight applied to all criteria has been a crude approximation of the actual "quality" of each criterion. However, given the exploratory nature of the present work, our knowledge did not allow us to differentially weigh the respective criteria. Also, the large sample of identified students with LD provides strong support with regard to the viability and usefulness of the identifying criteria for creating a brief rating scale. Certainly, attempting to replicate the present criteria in the future will aid the generalizability of the present psychometric attempt.

Future Directions

The present study provides the means to create brief scales for the assessment and identification of LD. It is strongly recommended that researchers in the field of LD carefully design their scales to meet the psychometric criteria presented herein to examine their validity and usefulness. Only valid assessments will aid the field in evaluating effective practices that will subsequently guide appropriate policies. This is particularly more so given the frequency and variety of curriculum-based assessments required by the recent RTI model (see Keogh, 2005, for a discussion) and the presence of concerns regarding the use of current assessment practices (Mashburn & Henry, 2004).

Appendix A

Winsteps 3.67 (Linacre, 2006) Code for Estimating Various Parameters of the Brief (Six-Item) Reading Rating Scale

&INST
TITLE = "Creating A Brief Rating Scale in LD"
NI = 6
ITEM1 = 2
NAME1 = 1
xWIDE = 1
NAMLEN = 1
DIF = $S1W1
PERSON = STD
ITEM = Reading item
STBIAS = Y
PFILE = PF.txt
IFILE = IF.txt
CODES = 01
TOTAL = YES
CHART = YES
CLFILE = *
*
&END
READ7
READ8
READ13
READ15
READ16
READ17
END LABELS
2110000 507
2001111 508
11110112020
11100112022

Appendix B

This appendix provides an SPSS syntax from which one can derive estimates of expected error variance, expected item mean, and expected true variance for a 10-item scale. This syntax has been modified from the original (Dimitrov, 2003), which also provides estimates of reliabilities (the same variable names are used for correspondence). For the original SPSS and SAS codes, the reader is advised to read the original work of Dimitrov (2003). For the present syntax file to work, the reader needs to provide a column with difficulty estimates "b" from the Rasch model. The actual Rasch difficulty estimates need to be in an SPSS data file column labeled "b." After running the syntax, the results are shown in a new SPSS data file and need to be saved with a new name.

SPSS Syntax File Code:
COMPUTE ve = .011 + .195*exp(-.5*((b/1.797)**2)).
ELSE.
COMPUTE ve = .0023 + .171*exp(-.5*((b/2.023)**2)).
END IF.
COMPUTE p = -.0114 + 1.0228/(1 + exp(b/1.226)).
COMPUTE vt = p*(1 - p) - ve.
IF(vt < 0) vt = 0.
FLIP VARIABLES b ve p vt.
VECTOR V = VAR001 TO VAR010.
LOOP #I = 1 TO 10.
LOOP #J = 1 TO 10.
END LOOP.
END LOOP.
FLIP VAR001 TO VAR010.
VARIABLE LABELS b "Item difficulty."
VARIABLE LABELS ve "Expected error variance."
VARIABLE LABELS p "Expected item mean."
VARIABLE LABELS vt "Expected true variance."
RENAME VARIABLES (CASE_LBL = ITEM).
EXECUTE.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The authors received financial support for conducting the study from the Greek Ministry of Education (EPEAEK Programme). We are grateful to Professor Porpodas for putting the research proposal together.

Notes
1. This model for some researchers has outlived its usefulness, given its implementation since 1975 (Fletcher et al., 1998; Fletcher, Francis, Morris, & Lyon, 2005; Francis et al., 2005).
2. Counted in favor because significant differential item functioning would point to the presence of discriminant validation.
3. This is important for scales designed to evaluate students with learning problems because a subgroup of students should be evaluated against normative criteria. Thus, a scale must be sensitive to the mean of the normative group for the scale to have a valid assessment of the reference group to which the target group will be compared.

References
Anastasi, A., & Urbina, S. (1997). Psychological testing. Upper Saddle River, NJ: Prentice Hall.
Barron, K. E., Evans, S. W., Baranik, L. E., Serpell, Z. N., & Buvinger, E. (2006). Achievement goals of students with ADHD. Learning Disability Quarterly, 29, 137–158.
Bentler, P. M. (2005). EQS 6.1: Structural equations program manual. Encino, CA: Multivariate Software.

Bond, T., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
Burns, M., Dean, V., & Klar, S. (2004). Using curriculum-based assessment in the responsiveness to intervention diagnostic model for learning disabilities. Assessment for Effective Intervention, 29, 47–56.
Burns, M., Scholin, S., Kosciolek, S., & Livingston, J. (2010). Reliability of decision making frameworks for response to intervention for reading. Journal of Psychoeducational Assessment, 28, 102–114.
Cassidy, S. (2008). Subjectivity and the valid assessment of pre-registration student nurse clinical learning outcomes: Implications for mentors. Nurse Education Today, 29, 33–39.
Chang, H. H. (2004). Understanding computerized adaptive testing: From Robbins-Monro to Lord and beyond. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 117–131). Thousand Oaks, CA: Sage.
Compton, D. L. (2006). How should "unresponsiveness" to secondary intervention be operationalized? It is all about the nudge. Journal of Learning Disabilities, 39, 170–173.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Philadelphia, PA: Harcourt Brace Jovanovich.
Deno, S. L. (1989). Curriculum based measurement and special education services: A fundamental and direct relationship. New York, NY: Guilford.
Dimitrov, M. D. (2003). Reliability and true-score measures of binary items as a function of their Rasch difficulty parameters. Journal of Applied Measurement, 4, 222–233.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polytomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86.
Fletcher, J., Francis, D., Morris, R., & Lyon, R. (2005). Evidence-based assessment of learning disabilities in children and adolescents. Journal of Clinical Child and Adolescent Psychology, 34, 506–522.
Fletcher, J., Francis, D., Shaywitz, S., Lyon, R., Foorman, B., Stuebing, K., & Shaywitz, B. (1998). Intelligence testing and the discrepancy model for children with learning disabilities. Learning Disabilities Research & Practice, 13, 186–203.
Francis, D., Fletcher, J., Stuebing, K., Lyon, R., Shaywitz, B., & Shaywitz, S. (2005). Psychometric approaches to the identification of LD: IQ and achievement scores are not sufficient. Journal of Learning Disabilities, 38, 98–108.
Fuchs, D., & Deshler, D. (2007). What we need to know about responsiveness to intervention (and shouldn't be afraid to ask). Learning Disabilities Research & Practice, 22, 129–136.
Goodman, G., & Webb, M. M. (2006). Reading disability referrals: Teacher bias and other factors that impact response to intervention. Learning Disabilities: A Contemporary Journal, 4, 59–70.
Grilo, C. M., Becker, D. F., Anez, L. M., & McGlashan, T. H. (2004). Diagnostic efficiency of DSM-IV criteria for borderline personality disorder: An evaluation in Hispanic men and women with substance use disorders. Journal of Consulting and Clinical Psychology, 72, 126–131.
Hammill, D. D. (1995). The Learning Disability Diagnostic Inventory (LDDI). Austin, TX: Pro-Ed.
Hammill, D. D., & Bryant, B. R. (1998). Learning Disabilities Diagnostic Inventory examiner's manual. Austin, TX: Pro-Ed.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 39–46.
Hanley, J. A., & McNeil, B. J. (1983). A method for comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148, 839–843.
Hsu, L. M. (2002). Diagnostic validity statistics and the MCMI-III. Psychological Assessment, 14, 410–422.
Kane, S. T. (2008). Minimizing malingering and poor effort in the LD/ADHD evaluation process. ADHD Report, 16, 5–9.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of 36 person fit statistics. Applied Measurement in Education, 16, 277–298.
Karakas, S., Turgut, S., & Bakar, E. (2008). Neuropsychometric comparison of children with pure learning disabilities, pure ADHD, comorbid ADHD with learning disability and normal controls using the Mangina test. International Journal of Psychophysiology, 69, 147–148.
Keogh, B. K. (2005). Revisiting classification and identification. Learning Disability Quarterly, 28, 100–102.
Kripke, B., Lynn, R., Madsen, J., & Gay, P. (1982). Familial learning disability, easy fatigue, and maladroitness: Preliminary trial of monosodium glutamate in adults. Developmental Medicine & Child Neurology, 24, 745–751.
Krus, D. J., & Ney, R. G. (1978). Convergent and discriminant validity in item analysis. Educational and Psychological Measurement, 38, 135–137.
Liang, T., Han, K. T., & Hambleton, R. K. (2008). User's guide for ResidPlots-2: Computer software for IRT graphical residual analyses, Version 2.0 (Center for Educational Assessment Research Rep. No. 688). Amherst: University of Massachusetts, Center for Educational Assessment.
Linacre, J. M. (1999). A user's guide and manual to Winsteps. Chicago, IL: Mesa Press.
Linacre, J. M., & Wright, B. D. (1994). Chi-square fit statistics. Rasch Measurement Transactions, 8, 360–361.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lunz, M. E., & Bergstrom, B. A. (1995). Item recalibration and ability estimate stability. Rasch Measurement Transactions, 8, 396–397.
Luppescu, S. (1991). Graphical diagnosis. Rasch Measurement Transactions, 5, 136.

Mashburn, A. J., & Henry, G. T. (2004). Assessing school readiness: Validity and bias in preschool and kindergarten teachers' ratings. Educational Measurement: Issues and Practice, 23, 16–30.
Mayes, S. D., Calhoun, S. L., & Crowell, E. W. (2000). Learning disabilities and ADHD: Overlapping spectrum disorders. Journal of Learning Disabilities, 33, 417–424.
Messe, L. A., Crano, W. D., Messe, S. R., & Rice, W. (1979). Evaluation of the predictive validity of tests of mental ability for classroom performance in elementary grades. Journal of Educational Psychology, 71, 233–241.
Morgan, P. L. (1977). The differential effects of visual background and fatigue on automatized task performance in learning disabled and normal children. Dissertation Abstracts International, 37, 4695–4696.
Morgan, P. L., Fuchs, D., Compton, D., Cordray, D., & Fuchs, L. (2008). Does early reading failure decrease children's reading motivation? Journal of Learning Disabilities, 41, 387–404.
Morgan, P. L., & Sideridis, G. D. (2006). Contrasting the effectiveness of fluency interventions for students with or at risk for learning disabilities: A multilevel random coefficient modeling meta-analysis. Learning Disabilities: Research and Practice, 21, 191–210.
Morizot, J., Ainsworth, A. T., & Reise, S. P. (2007). Toward modern psychometrics: Application of item response theory models in personality research. In R. W. Robbins, C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 407–423). New York, NY: Guilford.
National Institute of Mental Health. (2003). Attention deficit hyperactivity disorder. Bethesda, MD: Department of Health and Human Services, National Institutes of Health.
Padeliadu, S., & Antoniou, F. (2008). Learning Disabilities Reading Inventory. Athens: YPEPTH, EPEAEK.
Padeliadu, S., & Sideridis, G. D. (2008). Learning disabilities screening for teachers. Athens: YPEPTH, EPEAEK.
Paek, I. (2002). Investigations of differential item functioning: Comparisons among approaches, and extension to a multidimensional context. Unpublished doctoral dissertation, University of California, Berkeley.
Podell, D., & Soodak, L. (1993). Teacher efficacy and bias in special education referrals. Journal of Educational Research, 86, 247–253.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago, IL: University of Chicago Press.
Rescorla, L. A., Achenbach, T. M., Ginzburg, S., Ivanova, M., Dumenci, L., Almqvist, F., . . . Verhulst, F. (2007). Consistency of teacher-reported problems for students in 21 countries. School Psychology Review, 36, 91–110.
Ritter, D. (1989). Teachers' perceptions of problem behavior in general and special education. Exceptional Children, 55, 559–564.
Robins, R. W., Fraley, C. R., & Krueger, R. F. (2007). Handbook of research methods in personality psychology. New York, NY: Guilford.
Sideridis, G. D. (2005). Performance approach-avoidance motivation and planned behavior theory: Model stability with Greek students with and without learning disabilities. Reading and Writing Quarterly, 21, 331–359.
Sideridis, G. D. (2007). Why are students with learning disabilities depressed? A goal orientation model of depression vulnerability. Journal of Learning Disabilities, 40, 526–539.
Simms, L. J., & Watson, D. (2007). The construct validation approach to personality scale construction. In R. W. Robbins, C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 240–258). New York, NY: Guilford.
Smith, E. V., & Smith, R. M. (2004). Introduction to Rasch measurement: Theory, models and applications. Maple Grove, MN: JAM Press.
Stanovich, K. E. (1991). Discrepancy definition of reading disability: Has intelligence led us astray? Reading Research Quarterly, 26, 7–29.
Swanson, L. (1991). Operational definitions and learning disabilities: An overview. Learning Disability Quarterly, 14, 242–254.
Swanson, L. (2008). Neuroscience and RTI: A complementary role. In E. Fletcher-Janzen & C. R. Reynolds (Eds.), Neuropsychological perspectives on learning disabilities in the era of RTI: Recommendations for diagnosis and intervention (pp. 28–53). Hoboken, NJ: John Wiley.
Sykes, R. C., & Hou, L. (2003). Weighting constructed-response items in IRT-based exams. Applied Measurement in Education, 16, 257–275.
Vaughn, S., & Fuchs, L. (2003). Redefining learning disabilities as inadequate response to instruction: The promise and potential problems. Learning Disabilities Research & Practice, 18, 137–146.
Wilson, M. (2005). Constructing measures: An item response modelling approach. Mahwah, NJ: Lawrence Erlbaum.
Wright, B. D., & Masters, G. N. (1981). Rating scale analysis. Chicago, IL: Mesa Press.
Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago, IL: Mesa Press.
Ysseldyke, J., Burns, M., Scholin, S., & Parker, D. (2010). Instructionally valid assessment within response to intervention. Teaching Exceptional Children, 42(4), 54–62.
