Adhd RS
Makransky and Bilenberg. Assessment. DOI: 10.1177/1073191114535242
Abstract
Attention deficit/hyperactivity disorder (ADHD) is one of the most common psychiatric disorders in childhood and
adolescence. Rating the severity of psychopathology and symptom load is essential in daily clinical practice and in research.
The parent and teacher ADHD-Rating Scale (ADHD-RS) includes inattention and hyperactivity/impulsivity subscales and is
one of the most frequently used scales in treatment evaluation of children with ADHD. An extended version, mADHD-RS,
also includes an oppositional defiant disorder subscale. The partial credit Rasch model, which is based on item response
theory, was used to test the psychometric properties of this scale in a sample of 566 Danish school children between 6 and
16 years of age. The results indicated that parents and teachers had different frames of reference when rating symptoms
in the mADHD-RS. There was support for the unidimensionality of the three subscales when parent and teacher ratings
were analyzed independently. Nonetheless, evidence for differential item functioning was found across gender and age for
specific items within each of the subscales. The findings expand existing psychometric information about the mADHD-RS
and support its use as a valid and reliable measure of symptom severity when used in age- and gender-stratified materials.
Keywords
ADHD-RS, Rasch model, validation, invariance, differential item functioning
Myers, 2003). These scales are useful in the diagnostic process and in treatment evaluation of ADHD patients. Some of the most widely used examples are the Vanderbilt ADHD Parent Rating Scale (Wolraich et al., 2003), the Swanson, Nolan, and Pelham Scale IV (Swanson et al., 2001), and the ADHD Rating Scale-IV (ADHD-RS-IV; DuPaul, Power, Anastopoulos, & Reid, 1998). The latter is the most frequently used.
The ADHD-RS was conceived as a 14-item rating scale containing the DSM-III criteria for ADHD rated by parents and teachers (DuPaul, 1991). As the diagnostic criteria changed with DSM-III-R and later DSM-IV, the scale was revised to an 18-item scale with nine hyperactivity/impulsivity criteria and nine inattention criteria. The revised version was named ADHD-RS-IV (DuPaul, 1998; DuPaul et al., 1998). Barkley, Gwenyth, and Arthur (1999) modified the ADHD-RS-IV to a 26-item version by adding eight specific ODD items to the original 18 items. This modified version fulfills the needs consistent with ICD-10 classification, where ODD in combination with HKD constitutes a specific diagnostic category, hyperkinetic conduct disorder (World Health Organization, 1992). In DSM-oriented settings the 18 items are unchanged in DSM-5 (American Psychiatric Association, 2013), and the modified version adds valuable information about comorbid ODD. The modified ADHD-RS (mADHD-RS) is the focus of the current study.
To date, the psychometric properties of the ADHD-RS and the mADHD-RS have been subjected to evaluation using classical test theory (CTT; Lord & Novick, 1968) methods. These studies have generally supported the construct and predictive validity of the measures, including findings that confirm a two- and three-factor model for the ADHD-RS and the mADHD-RS, respectively (e.g., Magnusson, Smari, Gretarsdottir, & Prandardottir, 1999; Szomlaiski et al., 2009; Zhang, Faries, Vowles, & Michelson, 2005). The subscales have been shown to discriminate patients with ADHD from children within the general population and from clinical controls. Interrater agreement between parents and teachers, nevertheless, has only been moderate, which is not necessarily due to rater reliability, but may instead reflect real differences in behavior across different settings. Gender and age trends have also indicated more symptoms in boys than in girls and higher scores in younger children. Although there is existing research that has investigated the reliability between informants, as well as differences between gender and age groups when using the mADHD-RS, little is currently known about how the scales function across these groups. This can be investigated by assessing the measurement invariance (MI) of a scale across different demographic or informant groups. MI refers to the statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, MI can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or age groups, or when parents or teachers are used as informants.
In standard Child and Adolescent Mental Health Services (CAMHS), the clinician uses the mADHD-RS total score to assess severity, and if the patient is treated, to assess changes over time (remission/relapse of symptoms). Therefore, more information is needed about the dimensionality of the scale and whether the scale can be used as a single unidimensional indicator of pathological severity, or if it is necessary to interpret the results based on the three subscales of inattention, hyperactivity/impulsivity, and ODD. It is also essential for the clinician to be familiar with how a score actually reflects symptom load and severity and how to interpret scores and changes in scores from different informant reports or across gender and age groups.
Several researchers have suggested that future research use methods such as item response theory (IRT) to investigate the validity of scales (Embretson & Reise, 2000; Hays, Morales, & Reise, 2000). These methods provide a more thorough assessment of the measurement properties of a scale and test for specific properties such as MI and the unidimensionality of a scale (Embretson & Reise, 2000).
There is existing literature that has examined the psychometric properties of other ADHD scales using the IRT approach (e.g., Gomez, 2008a, 2008b, 2012, 2013; Gomez, Vance, & Gomez, 2010). These studies have shown that virtually all items were effective in discriminating children with and without ADHD both in parent and teacher ratings in the Disruptive Behavior Rating Scale (Barkley & Murphy, 1998). Conversely, to our knowledge, there is no research that has investigated the ADHD-RS-IV or the mADHD-RS from an IRT perspective. Furthermore, little is known about the MI of the scale across raters or across demographic groups.
The Rasch model (Rasch, 1960), also known as the 1-PL model within the framework of IRT, describes the association between a person’s level of an underlying trait and the probability of a specific item response on a measure. This association places the individual’s level of the underlying trait and the item difficulty on the same metric. Observed data are tested against the assumptions of the model, and if these are met, the raw score of a scale can be said to reflect the severity of the underlying trait on an interval scale of measurement (Tennant & Conaghan, 2007). An interval level of measurement is essential when measuring change based on the effects of clinical interventions, as is often the case with ADHD rating scales (e.g., Sonuga-Barke et al., 2013; Swanson et al., 2001).
An extension of the Rasch model to items with more than two response options (polytomous items), the partial credit model (PCM; Masters, 1982), is applied in this study. This model was selected because when fit with the PCM is
696 Assessment 21(6)
obtained raw scores represent a sufficient statistic (e.g., (0-3), where 0 represents never or rarely, 1 is sometimes, 2
Rasch, 1960). That is to say, the person total score contains is often, and 3 is very often. A total score (range 0 to 78)
all information available within the specified context about and three subscores, namely the inattentive scale (range 0 to 27), the
the individual, and the item total score contains all informa- hyperactive/impulsive scale (range 0 to 27), and the conduct scale
tion with respect to the item, with regard to the relevant latent (range 0 to 24), can all be calculated from the item scores.
trait. This is important in most applied settings where raw The Danish version of the questionnaire was used in this
total scores are used to support diagnostic assignment and study. The questionnaire was translated from the original
in clinical decision making. Using the PCM also results in a U.S. version and back-translated into English. If there was
more comprehensive evaluation of the validity of a scale discrepancy in item phrasing after back-translation, the
because fit to the PCM requires that the data fulfill a num- Danish item was rephrased in order to assess the diagnostic
ber of rigorous conditions. Therefore, assessing the validity essence of the given symptom, all according to standard
of mADHD-RS with the PCM is an important further step procedures. Existing evidence of validity and reliability of
in establishing the validity of these scales. Furthermore, the Danish version of the measure using CTT is documented
assessing the validity of the mADHD-RS scale from a in Szomlaiski et al. (2009).
Rasch or IRT perspective is important because the mADHD-
RS includes the additional ODD/CD scale that is not
included in other ADHD measures.
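As a minimal sketch of the model referred to above, the following Python function computes the response category probabilities of a single polytomous item under the PCM in its standard form (Masters, 1982). The threshold values and the function name are hypothetical and chosen only for illustration; they are not estimates from this study, and the analyses reported here were run in RUMM2030 rather than with this code.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person location on the latent trait (logits).
    thresholds: step parameters delta_1..delta_m (hypothetical values here).
    Returns probabilities for scores 0..m.
    """
    # Cumulative sums of (theta - delta_k); the sum for score 0 is empty (0.0).
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

# Illustrative 4-category item (scores 0-3) with made-up thresholds.
thresholds = [-1.0, 0.5, 2.0]
probs = pcm_probabilities(0.0, thresholds)
expected_score = sum(score * p for score, p in enumerate(probs))
```

In this parameterization each threshold is the trait level at which two adjacent response categories are equally likely, which is why the person and item parameters can be read on a common logit metric.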
Statistical Analysis
The main objectives of the article are to use the PCM to The research questions presented above were investigated
investigate the following key research questions: by assessing if the scale(s) in the mADHD-RS fit the GPC
Rasch model with the RUMM2030 program (Andrich,
Research Question 1: Does the mADHD-RS make up a Sheridan, & Luo, 2010). There are four fundamental evalu-
single unidimensional scale or is it a multidimensional ation criteria for PCM, namely, unidimensionality, item fit,
scale with three individual subscales (inattention-, item invariance, and general fit of the data to the model
hyperactivity/impulsivity- and ODD)? (Smith, Wright, Selby, & Velikova, 2007). These are
Research Question 2: Do the scale(s) function similarly described below.
when teachers or parents are used as informants?
Research Question 3: Do the items in the scale(s) func- Unidimensionality. A fundamental assumption when using a
tion similarly across gender and age groups? scale for clinical purposes is that the items in the scale mea-
sure only one underlying trait, which is known as unidimen-
sionality. The unidimensionality assumption can be tested
Method in the PCM by testing for local independence of the items
Sample and Procedures (Wright, 1996). Local dependence (LD) between items
occurs when items are redundant or linked in some way,
The participants in the study consisted of 566 children, 296 such that the response on one item will determine the
(52%) boys and 270 (48%) girls, ranging from 6 to 16 years response on another. LD can be assessed by examining the
of age (mean = 10.98). Children were recruited from repre- residual correlation matrix. Item pairs with residual correlations above
sentative schools from inner city, suburban, and rural areas. 0.2 are typically labeled as being locally dependent. Unidi-
The sample was representative of the Danish school child mensionality can also be assessed with a formal test pro-
population in terms of family, social status, and IQ posed by Smith (2002). This test uses the first residual
(Szomlaiski et al., 2009). Participants mirror the Danish factor in a principal components analysis (of residuals) to
child population, including the whole spectrum of children determine two groups of items: those with positive and
within this age, with the exception of severely intellectually those with negative loadings. Each set of items is then used
disabled individuals. Data were only included in the analy- to make an independent trait estimate for each person in the
sis when complete ratings on all items from both parents sample. Given that the items form a unidimensional scale, it
and teachers were available. is expected that there should not be much difference
between the person estimates from the two item subsets. An
independent samples t test is used to determine whether
Measures
there is a significant difference between the two estimates.
The ADHD-RS modified by Barkley (Barkley et al., 1999) is a If the proportion of significant tests does not exceed the expected 5%, then the
26-item questionnaire including the 18 original ADHD- conclusion can be made that the scale is unidimensional.
RS-IV items supplemented with 8 conduct problem items.
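The final step of the unidimensionality test can be sketched as follows. The sketch assumes that the two person estimates and their standard errors have already been obtained from the positively and negatively loading item subsets, and it uses a per-person t statistic with a normal-approximation cutoff of 1.96; both the cutoff and the function name are assumptions made for illustration and may differ from the exact procedure implemented in RUMM2030. All values below are invented.

```python
import math

def proportion_significant(est_a, se_a, est_b, se_b, critical=1.96):
    """Share of persons whose two trait estimates differ significantly.

    est_a, est_b: person estimates from the two item subsets.
    se_a, se_b: the corresponding standard errors.
    critical: assumed two-sided 5% cutoff (normal approximation).
    """
    significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    return significant / len(est_a)

# Invented estimates for four persons; only the last pair differs markedly.
est_a = [0.0, 1.0, -0.5, 2.0]
est_b = [0.1, 0.9, -0.4, -1.0]
se_a = [0.5, 0.5, 0.5, 0.5]
se_b = [0.5, 0.5, 0.5, 0.5]
share = proportion_significant(est_a, se_a, est_b, se_b)
```

A share at or below 0.05 would be read as support for unidimensionality under the criterion described above.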
The questionnaire is used in two settings (identical ver- Item Fit. Item fit is investigated in this study in order to
sions), with parents and teachers, respectively, as infor- determine whether all the symptoms addressed in the
mants. All items are rated on a 4-point Likert-type scale mADHD-RS are equally important for assessing the severity
Makransky and Bilenberg 697
Table 1. Means, Standard Deviations, and Reliability for Each Scale in the mADHD-RS.

                                                      Reliability
Scale                                 Mean     SD     PSI    Alpha
Full scale parents                   10.68   9.10    0.84    0.92
Full scale teachers                   7.57  10.84    0.85    0.95
Inattention parents                   4.25   3.91    0.70    0.86
Inattention teachers                  3.90   5.37    0.86    0.94
Hyperactivity/impulsivity parents     3.70   3.59    0.62    0.82
Hyperactivity/impulsivity teachers    2.28   4.26    0.72    0.93
ODD problems parents                  2.73   3.16    0.61    0.85
ODD problems teachers                 1.39   3.17    0.70    0.93

Note. mADHD-RS = modified Attention Deficit/Hyperactivity Disorder-Rating Scale; PSI = Person Separation Index; ODD = oppositional defiant disorder.

of the diagnosis. Fit is achieved when the individual items measure the latent trait similarly to the other items in the scale. Over-fit is obtained when an item discriminates (between individuals who are high and those who are low on the latent trait) more than what is expected by the model. Similarly, under-fit is obtained when the item does not discriminate as well as expected. Items with significant chi-square statistics at the .05 level of significance (with a Bonferroni correction) are classified as not fitting the PCM in this study.

Item Invariance. Item invariance requires that item estimation is independent of the subgroups of individuals completing the measure (Bond & Fox, 2001). Items not demonstrating invariance are commonly referred to as exhibiting differential item functioning (DIF; Makransky & Glas,
2013). For example, DIF occurs when different subgroups
within the sample (e.g., boys vs. girls) have different scores
on specific items, despite equal levels of latent ADHD trait chi-squared probability of the general fit of the scale to the
(e.g., inattention). model in the first column. The second column reports the
number of significant t tests based on Smith’s (2002) unidi-
General Model Fit. The chi-squared statistic is commonly mensionality test where fewer than 5% of the t tests should
used to assess the general fit of the data to the model includ- be significant for the scale to be unidimensional. The third
ing the property of invariance across the trait. A significant column reports the combination of items with significant
chi square indicates that the hierarchical ordering of the LD residuals. The fourth column reports the items that did
items varies across the trait, which violates the require- not fit the PCM. The final two columns report the items that
ments of the PCM (Pallant & Tennant, 2007). A significance displayed DIF by gender and age within each scale.
value of .05 with a Bonferroni adjustment to account for the
number of hypotheses tested was used in this study. Analysis of the ADHD-RS as a Single Unidimensional Scale
In addition to these four criteria that assess the construct
validity of the measure, reliability is reported with
Cronbach’s alpha in addition to a Person Separation Index The first set of analyses investigated the possibility of con-
(PSI). Similar interpretations can be used when using the sidering the entire 26-item mADHD-RS as a unidimensional
two measures (Tennant & Conaghan, 2007); however, the scale. The results of the PCM analysis show lack of fit for
PSI takes an IRT perspective where standard error may vary the parent and the teacher scale. There is clear evidence of
at different points of the latent construct depending on the multidimensionality for the parent and teacher scales with
item information that is available at that point. Therefore, 13.07% and 16.25% significant t tests, respectively, which
the PSI is directly related to the targeting of the items and is fall outside the nominal level of 5% (see Table 2).
important in clinical settings because poorly targeted mea- Furthermore, LD was identified with a large number of the
sures often result in floor or ceiling effects. Targeting can be item pairs, and there were many items that did not fit the
displayed by reporting a confidence interval (CI) based on scale. Therefore, the remaining analyses in this study will
the standard error (SE) of the trait estimate conditionally at investigate the mADHD-RS as a multidimensional scale that
each point on the latent trait range. measures the three scales of inattention, hyperactivity/
impulsivity, and ODD. This formulation is consistent with
the way the scale was devised.
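To make the two reliability indices concrete, the sketch below computes Cronbach's alpha from raw item scores and a PSI from person estimates and their standard errors. The PSI is written in a commonly stated form, the share of observed variance in the person estimates that is not error variance; this form, the function names, and all data values are assumptions made for illustration and are not taken from the RUMM2030 output of this study.

```python
from statistics import mean, variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per respondent, one column per item."""
    k = len(item_scores[0])
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def person_separation_index(estimates, standard_errors):
    """PSI as (observed variance - mean squared standard error) / observed variance."""
    observed = variance(estimates)
    error = mean([se ** 2 for se in standard_errors])
    return (observed - error) / observed

# Invented ratings on a 0-3 scale: four respondents, three items.
ratings = [[0, 1, 0], [1, 1, 2], [3, 2, 3], [2, 3, 2]]
alpha = cronbach_alpha(ratings)

# Invented person estimates (logits) with equal standard errors.
psi = person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

Because the PSI depends on the standard errors of the person estimates, it drops when many respondents sit where the items provide little information, which is one way alpha and the PSI can diverge for the same scale.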
Results
The means and standard deviations as well as person sepa-
ration and alpha reliability for the scales investigated in this
Analysis Combining Parent and Teacher Ratings
study are presented in Table 1. Total ADHD-RS mean The next set of analyses compared the teacher and parent
scores and mean subscores are consistent with Danish norm scales. Table 1 indicates that the mean ratings were higher for
scores (Poulsen, Jørgensen, Dalsgaard, & Bilenberg, 2009). parents than teachers for all three scales, meaning that teach-
Also, scores reflect the same age and gender effects seen in ers were more lenient in their ratings of the children than par-
all previous community samples. The results of fit to the ents. Similarly the SD was lower for parents than for teachers
PCM are presented in Table 2. Table 2 reports the in all three scales. Although the mADHD-RS is typically
Table 2. Results of Fit to the PCM in the Original Analysis for Each Scale.
Note. nr = not reported because too many items did not fit the model.
Inattention items: 1 = Fails to give close attention, 2 = Difficulty sustaining attention, 3 = Does not listen, 4 = Does not follow instructions, 5 = Difficulty organizing tasks, 6 = Avoids tasks, 7 = Loses necessary things, 8 = Easily distracted, 9 = Is forgetful.
Hyp/imp items: 10 = Fidgets or squirms, 11 = Leaves seat, 12 = Runs about or climbs excessively, 13 = Difficulty playing quietly, 14 = On the go, 15 = Talks excessively, 16 = Blurts out answers, 17 = Difficulty awaiting turn, 18 = Interrupts.
ODD items: 19 = Loses temper, 20 = Argues with adults, 21 = Disobeys parents and teachers, 22 = Deliberately annoys people, 23 = Blames others, 24 = Easily annoyed, 25 = Is angry and resentful, 26 = Is spiteful or vindictive.
used independently for parents and teachers, the analysis was designed to investigate whether the ratings from the different frames of reference could be combined to fit a single PCM. Fit to the model would mean that both frames of reference could be combined to create a single total score scale.
The results indicated that parent and teacher ratings combined did not fit the PCM for any of the scales. There was also no support for the unidimensionality assumption because 17.84%, 11.48%, and 10.78% of the t tests were significant for the inattention, hyperactivity/impulsivity, and ODD scales, respectively. Furthermore, there were many items that did not fit the model, and item pairs with LD. This is an indication that the frame of reference used by parents and teachers is different. The results also indicated that the PSI and alpha reliability in the mADHD-RS were consistently higher for teachers than for parents (see Table 1). Therefore, based on these results and because the scales are used independently in practice, the remaining analyses investigated the ratings from parents and teachers independently.

Results for Individual Scales
Figure 1 displays the item parameters of the individual scales in relation to the distribution of respondents. The left frames in the figure represent the inattention, hyperactivity/impulsivity, and ODD scales when parents are used as informants, and the right frames represent the three scales when teachers are used as informants. It is clear from this figure that the items target a wide range of the latent trait. It is also clear that most respondents score low on the scales, which is expected because the sample represents a general school population. Figure 2 displays the targeting of the scale in terms of measurement precision as a function of the estimate of the latent trait value reported in the PCM, and the total score for the individual scales. A 95% confidence interval around the estimated score is displayed in the graph to show the measurement precision of the estimates conditionally at each trait level. The graphs illustrate that there is good targeting for each scale and confidence intervals only become large for very high and very low scores.

Parent Inattention Scale
In general, the nine-item parent inattention scale fit the PCM (χ2[63] = 75.03, p = .14). There was support for the unidimensionality of the scale (4.95% significant tests; Smith, 2002), and there was no redundancy in the items (no residuals over 0.2, indicating no LD). Furthermore, there were no items with significant chi-squared statistics, indicating that all the items were good measures of the latent inattention construct. In relation to invariance, no items displayed DIF by gender; conversely, Item 1 (Fails to give close attention to details or makes careless mistakes in schoolwork) and Item 3 (Does not seem to listen when spoken to directly) displayed DIF by age. Cronbach’s alpha of the scale was 0.86 and the PSI was 0.70, indicating acceptable reliability for the scale. In general, the results support the construct validity and reliability of the parent inattention scale.

Teacher Inattention Scale
The nine-item teacher inattention scale also had generally good fit to the PCM (χ2[36] = 55.12, p = .02). There was support for the unidimensionality of the scale (2.83% significant tests), and there was no redundancy in the items within the scale. Furthermore, all items fit the PCM with the exception of Item 2 (Has difficulty sustaining attention in
tasks or play activities). This item had a negative fit residual, indicating that the item discriminated slightly better than the PCM predicted (χ2[5] = 16.75, p = .005). Nonetheless, the item was retained because the item did not display LD with any other items, the fit of the scale did not improve when eliminating it, and because the general fit of the model was good. In relation to invariance, no items displayed DIF by gender; conversely, Item 7 (Loses things necessary for tasks or activities) and Item 8 (Is easily distracted) displayed DIF by age. Cronbach’s alpha of the scale was 0.94 and the PSI was 0.86, indicating good reliability for the scale. In general, the results support the construct validity and reliability of the teacher inattention scale.

Parent Hyperactivity/Impulsivity Scale
The nine-item hyperactivity/impulsivity scale did not fit the PCM (χ2[45] = 99.35, p < .0005). However, there was
[Figure 2 panels: Parents (left) and Teachers (right) for the Inattention, Hyperactivity/impulsivity, and ODD problems scales; axis ticks from -5 to 5 and from 0 to 27 (0 to 24 for ODD problems).]
Figure 2. A display of the relationship between the estimate of the latent trait value reported in the PCM and the total score, with a
95% confidence interval around the estimated score.
support for the unidimensionality of the scale (2.30% significant tests), and there was no redundancy in the items within the scale. Furthermore, all items with the exception of Item 11 (Leaves seat in classroom or in other situations in which remaining seated is expected) fit the PCM. This item had a negative fit residual, showing that the item discriminated better than the PCM predicted (χ2[5] = 20.27, p = .001). In relation to invariance, no items displayed DIF by age; conversely, Item 11 and Item 15 (Talks excessively) displayed DIF by gender. Specifically, parents of boys were more likely to endorse Item 11 and parents of girls were more likely to endorse Item 15, despite children having the same level of the latent trait of hyperactivity/impulsivity. Cronbach’s alpha of the scale was 0.82, indicating good reliability; however, the PSI was 0.62, indicating low reliability for the scale. A more detailed description of the dissimilar results is provided in the Discussion section.
It is important to investigate sources of misfit when acceptable fit to the PCM is not obtained. Although little evidence was uncovered to determine the source of misfit, a number of follow-up analyses were conducted to assess if a revised version of the scale could fit the PCM. These included the elimination of Items 11 and 15 and analyses pooling different combinations of items. Although the fit of the model was improved slightly, acceptable fit was not obtained. Nevertheless, a follow-up analysis for the boys and girls samples independently indicated that the scale fit the PCM for each of these samples. Therefore, the results
support the construct validity of the gender-stratified parent hyperactivity/impulsivity scale.

Teacher Hyperactivity/Impulsivity Scale
The nine-item hyperactivity/impulsivity scale did not fit the PCM (χ2[36] = 73.11, p < .0005). There was support for the unidimensionality of the scale (1.94% significant tests). However, positive residuals over 0.2, indicating LD or redundancy, were found between Item 11 (Leaves seat in classroom or in other situations in which remaining seated is expected) and Item 12 (Runs about or climbs excessively in situations in which remaining seated is expected); and between Item 16 (Blurts out answers before questions have been completed) and Item 17 (Has difficulty awaiting turn). Furthermore, Items 12 (χ2[4] = 14.28, p = .006) and 16 (χ2[4] = 16.54, p = .002) had negative fit residuals, showing that the items discriminated better than the PCM predicted, which could be a direct consequence of the LD reported above. All other items fit the PCM. In relation to invariance, no items displayed DIF by age; conversely, Item 10 (Fidgets with hands and feet or squirms in seat), Item 12, and Item 15 (Talks excessively) displayed DIF by gender. Specifically, teachers of boys were more likely to endorse Items 10 and 12 and teachers of girls were more likely to endorse Item 15, despite children having the same level of the latent trait of hyperactivity/impulsivity. Cronbach’s alpha of the scale was 0.93 and the PSI was 0.72, indicating acceptable reliability for the scale.
Since direct evidence was uncovered indicating the sources of misfit to the PCM, follow-up analyses were performed to investigate if acceptable fit to the PCM could be obtained. Response dependence between items is an indication that items that are treated as independent have redundancy that should be modeled; alternatively, one of the pair of items can be eliminated. Item dependence was dealt with in this study by combining Items 11 and 12 and Items 16 and 17 into two composite items. The follow-up analysis after combining these items indicated that the scale fit the PCM (χ2[35] = 47.44, p = .08). Therefore, the results support the construct validity and reliability of the revised teacher hyperactivity/impulsivity scale.

Parent ODD Scale
The eight-item parent ODD scale had generally good fit to the PCM (χ2[40] = 55.28, p = .06). There was support for the unidimensionality of the scale (2.74% significant tests), and there was no redundancy in the items within the scale. There were also no items with significant chi-squared statistics, indicating that all the items were good measures of the latent ODD construct. In relation to invariance, no items displayed DIF by gender; however, Item 25 (Is angry and resentful) displayed DIF by age. Cronbach’s alpha of the scale was 0.85, indicating good reliability; however, the PSI was 0.61, indicating low reliability for the scale. In general, the results support the construct validity of the parent conduct problems scale.

Teacher ODD Scale
The eight-item teacher ODD scale also had generally good fit to the PCM (χ2[24] = 35.30, p = .06). There was support for the unidimensionality of the scale (1.24% significant tests). However, a positive residual over 0.2, indicating LD or redundancy, between Item 20 (Argues with adults) and Item 21 (Actively defies or refuses to comply with adults’ requests or rules) was detected. In relation to invariance, no items displayed DIF by age; however, Item 25 (Is angry and resentful) displayed DIF by gender. Specifically, teachers were more likely to endorse the item in girls compared with boys. Cronbach’s alpha of the scale was 0.93 and the PSI was 0.70, indicating acceptable reliability for the scale. Although the results support the construct validity and reliability of the teacher conduct problems scale, a follow-up analysis was conducted to deal with the LD between Items 20 and 21. The results of the new analysis indicated excellent fit to the PCM (χ2[21] = 25.66, p = .22).

Discussion
ADHD/HKD symptom levels span a continuum from normal to severely disordered children, all within the school population. The DSM and ICD diagnoses are constructs based on clinical case reports and empirical data. Rating the severity of psychopathology is essential in daily clinical practice and in research. Therefore, reliable measures are needed that in the most valid way reflect the patient’s clinical picture, and change in symptom load when patients are assessed on two or more occasions within an intervention. The PCM, within the framework of IRT, was used to test the psychometric properties of the mADHD-RS in this study. More precisely, we investigated whether the mADHD-RS makes up a single unidimensional scale as it is sometimes used by clinicians, or if it is a multidimensional scale with three individual subscales (inattention, hyperactivity/impulsivity, and ODD). Furthermore, we investigated the MI of the scales. Specifically, we investigated whether the scale(s) function similarly when teachers or parents are used as informants, and whether the items in the scale(s) function similarly across gender and age groups.
The results of the study showed clear evidence of multidimensionality for the parent and teacher mADHD-RS. Therefore, the mADHD-RS total score should not be used as a single severity measure of ADHD psychopathology as it does not measure a single unidimensional clinical representation (diagnosis or trait). The current analyses support the DSM (fourth and fifth editions) classification systems
where ADHD patients are characterized either as primarily
inattentive or primarily hyperactive/impulsive, or as a combined
type with severe deficits in both areas. When assessing
symptom load, clinicians need to address the two
problem areas, inattention and hyperactivity/impulsivity,
separately. ODD symptoms also form a separate category
and should be rated and interpreted independently.

An analysis of whether the parent and teacher ratings for
each child could be combined to get a global scale of ADHD
indicated that the scales could not be combined because
parents and teachers had very different frames of reference
in applying the mADHD-RS. Teachers gave lower average
ratings, indicating greater leniency. Teachers were
also better at differentiating between children with low and
high levels of the ADHD trait, which was evident in the
larger SD for all scales compared with the parent raters.
Additionally, Cronbach’s alpha and the PSI were higher for
teacher ratings than for parent ratings. The greater
differentiation may arise because teachers interact
with large groups of age-matched children of both genders,
and therefore have a more explicit “reference,” whereas
parents typically have only a couple of siblings to compare
with when rating the index child.

In general, the results of the study supported the validity
of the mADHD-RS when used independently for parents
and teachers. The results indicated that the parent and teacher
inattention and ODD scales had good fit to the
PCM. Conversely, the parent and teacher hyperactivity/
impulsivity scale in its current format did not have good fit
to the PCM. However, acceptable fit was achieved with
slight adjustments, including the combination of item pairs
that displayed LD, and stratifying by gender for the teacher
and parent hyperactivity/impulsivity scales, respectively.

The results of the study showed some redundancy in the
form of LD between items in the teacher hyperactivity/
impulsivity and ODD scales. LD was identified between
Item 11 (Leaves seat in classroom or in other situations in
which remaining seated is expected) and Item 12 (Runs
about or climbs excessively in situations in which remaining
seated is expected) and between Item 16 (Blurts out
answers before questions have been completed) and Item
17 (Has difficulty awaiting turn) in the teacher hyperactivity/impulsivity
scale. It is clear from the content of these
items that they overlap considerably. Items 11 and 12 assess
very similar behaviors and differ only slightly in that teachers
are asked if children leave their seat (Item 11) or run and
climb (Item 12) in “situations in which remaining seated is
expected.” Similarly, Items 16 and 17 both assess children’s
ability to wait until it is appropriate to act. LD was also
identified between Item 20 (Argues with adults) and Item
21 (Actively defies or refuses to comply with adults’
requests or rules) in the teacher ODD scale. Again it is clear
that these two items are quite similar in that they both assess
students’ failure to comply with adults by arguing or refusing
to comply with their rules.

There are several options when deciding on how to
improve the scale based on redundant items. The first is to
eliminate one item from each pair. This solution has complications
with the mADHD-RS because the scale was developed
to assess the diagnosis in the DSM-IV and DSM-5, and the
elimination of the items would decrease the content validity
of the scale. A second option is to revise the
items with the objective of differentiating the behavior that
is assessed in each one to limit redundancy. Although this
option sounds appealing, revising a widely used questionnaire
often comes with a number of practical challenges, and
it often takes a considerable period of time before data can
be collected to assess how well new items are functioning.
A final option is to model the LD between the items by
combining them into single combination items (e.g., Kreiner
& Christensen, 2011). Although future research could
assess one of the other options if the results from this study
are replicated, LD was successfully dealt with in this study
by combining the sets of items with LD into single items
and reestimating item parameters.

There were also several items that did not fit the PCM.
This was true for Item 2 (Difficulty sustaining attention) in
the teacher inattention scale, Item 11 (Leaves seat) in the
parent hyperactivity/impulsivity scale, and Items 12 (Runs
about or climbs excessively) and 16 (Blurts out answers) in
the teacher hyperactivity/impulsivity scale. All items had
negative fit residuals, showing that the items discriminated
better than the PCM predicted. One cause for higher levels
of discrimination is redundancy between items, which typically
leads to inflated discrimination estimates. This was
true for Items 12 and 16 in the teacher hyperactivity/impulsivity
scale, which showed LD as discussed above.
Therefore, lack of fit for these items was likely a direct consequence
of LD. The lack of fit for the other two items was
not seen as problematic, as the discrimination parameters
were higher than expected by the model, and there was no
LD identified.

In relation to MI across gender, DIF was identified in
Items 11 (Leaves seat in classroom or in other situations in
which remaining seated is expected) and 15 (Talks excessively)
in the parent hyperactivity/impulsivity scale; Items 10 (Fidgets with
hands and feet or squirms in seat), 12 (Runs about or climbs
excessively in situations in which remaining seated is
expected), and 15 (Talks excessively) in the teacher hyperactivity/impulsivity
scale; and Item 25 (Is angry and resentful) in the teacher
ODD scale. Parents of boys were more likely to endorse
Item 11 and parents of girls were more likely to endorse Item
15, despite children having the same level of the latent trait
of hyperactivity/impulsivity. Similarly, teachers of boys
were more likely to endorse Items 10 and 12, and teachers of
girls were more likely to endorse Items 15 and 25, despite
children having the same level of the latent traits of hyperactivity/impulsivity
and ODD, respectively. These results are
congruent with previous findings (Biederman, Faraone, &
Monuteaux, 2002) that boys and girls with high levels of
ADHD have different ways of expressing this behavior. The
results indicated that boys tend to act up by squirming and
fidgeting and leaving their seats, whereas girls tend to talk
excessively, or act angry and resentful.

Lack of MI across age in the form of DIF was also
detected for Items 1 (Fails to give close attention to details or
makes careless mistakes in schoolwork) and 3 (Does not
seem to listen when spoken to directly) in the parent inattention
scale. Specifically, for Item 1, parents of the oldest group
of children (14 to 16 years of age) were more likely to
endorse the item compared with the parents of the younger
children (6 to 9 years of age) despite children having the
same level of the latent trait of inattention (the middle age
group scored between the two). This finding is in line with
clinical observations because parents tend to have higher
expectations of older children. For Item 3, parents of younger
children (6 to 9 years) were more likely to endorse the item
compared with the parents of the older two groups of children
(10 to 16 years). This result is also consistent with clinical
observations that children between the ages of 6 and 9
years have a shorter attention span than their older peers.
Furthermore, Items 7 (Loses things necessary for tasks or
activities) and 8 (Is easily distracted) exhibited DIF by age in
the teacher inattention scale. For Item 7, teachers of the middle
age group (10 to 13 years) were more likely to endorse
the item compared with the younger group (6 to 9 years)
despite children having the same level of the latent trait of
inattention (the oldest age group scored between the two).
This result may be explained by higher expectations from
teachers of the middle age group, and acceptance that
preadolescents or pubertal children become more forgetful.
For Item 8, teachers of younger children (6 to 9 years) were
more likely to endorse the item, followed by the middle
group (10 to 13 years), and the oldest group (14 to 16 years)
despite having the same trait level, which is consistent with
clinical observations that younger children are more easily
distracted in school activities. Finally, there was DIF by age
for Item 25 (Is angry and resentful) in the parent ODD scale.
Specifically, parents of the middle age group (10 to 13
years) were more likely to endorse the item, followed by the
younger group (6 to 9 years), and finally the oldest group (14
to 16 years) despite having the same trait level.

There are several options available for dealing with lack
of MI in the form of DIF. There are statistical procedures
that can account for the DIF in items by using group-specific
item parameters (e.g., Kreiner & Christensen, 2011;
Makransky & Glas, 2013). These procedures correct for the
differences at the item level by estimating ability with different
item parameters for each demographic group. This
produces scores that have different trait-level values
depending on the demographic group the child belongs to
(e.g., gender, age). The use of such a scoring scheme would
have the advantage that children would be compared to a
general standard because the model would account for the
real differences that exist between demographic groups.
However, the method has the disadvantage that it introduces
added complexity and the total scores can no longer be used
on their own as summary scores. A simpler practice is
to use gender- and age-specific norms and cutoff scores
when making clinical decisions about individuals with the
ADHD-RS. This procedure is more appropriate in the current
setting where total scores are typically used for making
clinical decisions. Such reference scores appear to differ
considerably across cultures.
Therefore, nation-specific standardizations of the ADHD-RS
(mADHD-RS) should be used. When assessing a specific
child, one should transform raw scores to t scores (z scores),
which indicate how far from the age- and gender-stratified
mean the child has scored (Szomlaiski et al., 2009).
Treatment effect can then be measured as reduction in
t score, and normalization can be defined as a t score below
60 (mean score for a normal child plus 1 SD).

One advantage of assessing the validity of measurement
scales within the framework of IRT is that instead of assessing
reliability in general for the entire scale, IRT modeling
provides a standard error band around the ability estimate
at each point on the latent trait continuum. Good test targeting
is obtained when the items within a scale accurately
assess the sample of respondents who are administered the
scale. Clinically, this information can be useful in assessing
floor and ceiling effects. An instrument is needed that
can reliably measure the latent trait span from normal to
severely disordered children, since measurement of treatment
outcomes goes from subclinical symptom levels all
the way to normalization. The results of this study indicated
that the mADHD-RS subscales have good targeting
for the general sample of school children between 6 and 16
years of age, with the exception of students with raw scores
below 2 and those with scores higher than 3 SD above the
mean. The difference between Cronbach’s alpha and PSI
highlights the difference between assessing reliability with
CTT and the PCM. Whereas Cronbach’s alpha does not take
targeting into account, the PSI does. Cronbach’s alpha was
considerably higher than the PSI in all of the scales in the
mADHD-RS. This was the case because the sample used in
this study was made up of children from the general population,
which resulted in highly positively skewed distributions,
so between 38% and 82% of the sample had raw
scores below 3 on the different mADHD-RS scales.
However, acceptable precision was obtained at the midrange
of the scale (range 9-18), which is important because
clinicians typically want to treat referred children so that
they go from scores in the upper third (range 18-27) to the
lower third (range 0-9) of the scales.
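The norm-referenced scoring described above (raw score converted to a t score against age- and gender-stratified norms, with normalization defined as a t score below 60) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' scoring procedure: it assumes the conventional definition T = 50 + 10z, and the names Norm, NORMS, t_score, and is_normalized, as well as every mean and SD in the table, are hypothetical placeholders rather than published Danish norms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """Mean and SD of raw subscale scores in one normative stratum."""
    mean: float
    sd: float


# HYPOTHETICAL norms keyed by (gender, age band).
# Placeholder values for illustration only, not published norms.
NORMS = {
    ("boy", "6-9"): Norm(mean=8.0, sd=6.0),
    ("girl", "6-9"): Norm(mean=5.0, sd=5.0),
}


def t_score(raw: float, norm: Norm) -> float:
    """Convert a raw score to a T score (mean 50, SD 10 within the stratum)."""
    if norm.sd <= 0:
        raise ValueError("norm SD must be positive")
    z = (raw - norm.mean) / norm.sd  # distance from the stratified mean in SD units
    return 50.0 + 10.0 * z


def is_normalized(t: float, cutoff: float = 60.0) -> bool:
    """True when the T score is below the cutoff (normative mean plus 1 SD)."""
    return t < cutoff


norm = NORMS[("boy", "6-9")]
baseline = t_score(20, norm)   # z = 2.0
follow_up = t_score(11, norm)  # z = 0.5
# Treatment effect is the reduction in T score between the two assessments.
print(baseline, follow_up, baseline - follow_up, is_normalized(follow_up))
# prints: 70.0 55.0 15.0 True
```

Because the same raw score maps to different t scores in different strata, a real application would need the nationally standardized norm table for the relevant informant (parent or teacher) and subscale.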
Our results are important for clinicians assessing and
treating children and youth with ADHD, because they
underline the fact that core ADHD symptoms can be rated
validly by use of multi-informant rating scales. There is
limited information available regarding the MI of the
ADHD-RS or other similar instruments; however, MI is a
fundamental assumption when making decisions about
results obtained from different rating sources, or when comparing
different demographic groups. Our results provide
evidence that the ADHD-RS total score within each of the
subscales is a sufficient and valid measure of the severity of
ADHD symptoms when teacher and parent ratings are
used separately. Furthermore, when compared with age- and
gender-stratified norm distributions, it is possible to calibrate
the severity of symptoms and measure outcomes.
Nevertheless, it is essential to remember that symptoms are only one
component of a clinical diagnosis; suffering or functional
impairment is often of primary concern for patients and
relatives. Consequently, global functional impairment must
always be assessed as part of baseline evaluation and when
measuring treatment response.

Future Research
Although the mADHD-RS functioned well in a general
school children sample, it would be beneficial to investigate
the generalizability of the results to other populations in
other cultural contexts. Furthermore, this study and other
existing studies that have assessed the validity of different
ADHD scales from an IRT or Rasch perspective have used
general population samples (e.g., Gomez, 2008a, 2008b,
2012, 2013; Gomez et al., 2010). Although ADHD/HKD
represents latent traits that span a continuum from normal to
severely disordered children, it would be beneficial to
investigate the validity of the scales within a clinical population.
The use of the scales in clinical samples may illuminate
specific characteristics of the scales that do not appear
when the scales are used in general samples. Future research
could assess the validity of these scales by using the PCM
to provide more detailed information about how the scales
function in these populations.

In general, the results showed that very slight adjustments
were needed to obtain good fit to the PCM; however,
future research should be conducted to investigate if the
results of the current analyses can be replicated. Depending
on these results, changes may be required to improve the
validity and reliability of the mADHD-RS. Consistency of
the results may even influence future revisions of the classification
systems if redundancies of the diagnostic criteria
are evident. Specific criteria may be excluded from the
diagnostic assessment if one or more symptoms are overlapping
so that the presence of one criterion is trivial in the
presence of another.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect
to the research, authorship, and/or publication of this article.

Funding
The authors received no financial support for the research, authorship,
and/or publication of this article.

References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th edition; DSM-IV). Washington, DC: Author.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th edition; DSM-5). Washington, DC: Author.
Andrich, D., Sheridan, B., & Luo, G. (2010). Rasch models for measurement: RUMM2030. Perth, Australia: RUMM Laboratory.
Barkley, R., Gwenyth, E. H., & Arthur, L. R. (1999). Defiant teens. A clinician’s manual for assessment and family intervention. New York, NY: Guilford Press.
Barkley, R. A., & Murphy, K. R. (1998). Attention-deficit hyperactivity disorder: A clinical workbook (2nd ed.). New York, NY: Guilford Press.
Biederman, J., Faraone, S. V., & Monuteaux, M. C. (2002). Differential effect of environmental adversity by gender: Rutter’s index of adversity in a group of boys and girls with and without ADHD. American Journal of Psychiatry, 159(1), 36-42.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Erlbaum.
Collett, B. R., Ohan, J. L., & Myers, K. M. (2003). Ten-year review of rating scales. V: Scales assessing attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 42, 1015-1037.
DuPaul, G. J. (1991). Parent and teacher ratings of ADHD symptoms: Psychometric properties in a community-based sample. Journal of Child Psychology, 20, 245-253.
DuPaul, G. J. (1998). Parent ratings of attention-deficit/hyperactivity disorder symptoms: Factor structure and normative data. Journal of Psychopathology and Behavioral Assessment, 20, 83-102.
DuPaul, G. J., Power, T. J., Anastopoulos, A., & Reid, R. (1998). ADHD rating scale–IV. New York, NY: Guilford Press.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Gomez, R. (2008a). Parent ratings of the ADHD items of the Disruptive Behavior Rating Scale: Analyses of their IRT properties based on the generalized partial credit model. Personality and Individual Differences, 45, 181-186.
Gomez, R. (2008b). Item response theory analyses of the parent and teacher ratings of the DSM-IV ADHD Rating Scale. Journal of Abnormal Child Psychology, 36, 865-885.
Gomez, R. (2012). Item response theory analyses of adolescent self-ratings of the ADHD symptoms in the Disruptive Behavior Rating Scale. Personality and Individual Differences, 53, 963-968.
Gomez, R. (2013). DSM-IV ADHD symptoms self-ratings by adolescents: Test of invariance across gender. Journal of Attention Disorders, 17(1), 3-10.
Gomez, R., Vance, A., & Gomez, A. (2010). Item response theory analyses of parent and teacher ratings of the ADHD symptoms for recoded dichotomous scores. Journal of Attention Disorders, 15, 269-285.
Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, 38(9 Suppl.), II28-II42.
Kreiner, S., & Christensen, K. B. (2011). Item screening in graphical loglinear Rasch models. Psychometrika, 76, 228-256.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Oxford, England: Addison-Wesley.
Magnusson, P., Smari, J., Gretarsdottir, H., & Prandardottir, H. (1999). Attention-deficit/hyperactivity symptoms in Icelandic schoolchildren: Assessment with the attention deficit/hyperactivity rating scale-IV. Scandinavian Journal of Psychology, 40, 301-306.
Makransky, G., & Glas, C. A. W. (2013). Modeling differential item functioning with group-specific item parameters: A computerized adaptive testing application. Measurement, 46, 3228-3237. Retrieved from http://dx.doi.org/10.1016/j.measurement.2013.06.020
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Pallant, J. F., & Tennant, A. (2007). An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology, 46, 1-18.
Poulsen, L., Jørgensen, S. L., Dalsgaard, S., & Bilenberg, N. (2009). Danish standardization of the ADHD rating scale. Ugeskr Laeger, 171, 1500-1504.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Smith, A. B., Wright, P., Selby, P., & Velikova, G. (2007). Measuring social difficulties in routine patient-centred assessment: A Rasch analysis of the social difficulties inventory. Quality of Life Research, 16, 823-831.
Smith, E. V. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3, 205-231.
Sonuga-Barke, E. J., Brandeis, D., Cortese, S., Daley, D., Ferrin, M., Holtmann, M., . . . Sergeant, J. (2013). Nonpharmacological interventions for ADHD: Systematic review and meta-analyses of randomized controlled trials of dietary and psychological treatments. American Journal of Psychiatry, 170, 275-289.
Swanson, J. M., Kraemer, H. C., Hinshaw, S. P., Arnold, L. E., Conners, C. K., & Abikoff, H. B. (2001). Clinical relevance of the primary findings of the MTA: Success rates based on severity of ADHD and ODD symptoms at the end of treatment. Journal of the American Academy of Child & Adolescent Psychiatry, 40, 168-179.
Szomlaiski, N., Dyrborg, J., Rasmussen, H., Schumann, T., Koch, S. V., & Bilenberg, N. (2009). Validity and clinical feasibility of the ADHD rating scale (ADHD-RS): A Danish nationwide multicenter study. Acta Paediatrica, 98, 397-402.
Tennant, A., & Conaghan, P. G. (2007). The Rasch measurement model in rheumatology: What is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis & Rheumatism, 57, 1358-1362.
Wolraich, M. L., Lambert, W., Doffing, M. A., Bickman, L., Simmons, T., & Worley, K. (2003). Psychometric properties of the Vanderbilt ADHD diagnostic parent rating scale in a referred population. Journal of Pediatric Psychology, 28, 559-567.
World Health Organization. (1992). The ICD-10 classification of mental and behavioural disorders. Clinical descriptions and diagnostic guidelines. Geneva, Switzerland: Author.
Wright, B. D. (1996). Local dependency, correlations and principal components. Rasch Measurement Transactions, 10, 509-511.
Zhang, S., Faries, D. E., Vowles, M., & Michelson, D. (2005). ADHD rating scale IV: Psychometric properties from a multinational study as a clinician-administered instrument. International Journal of Methods in Psychiatric Research, 14, 186-201.