Volume 17 | Issue 1, Article 15
6-26-2018

Internal Consistency Reliability in Measurement: Aggregate and Multilevel Approaches

Georgios Sideridis
Boston Children's Hospital, Harvard Medical School
Abdullah Saddaawi
King Saud University, alsadaawi@gmail.com
Khaleel Al-Harbi
Taibah University, k.harbi@qiyas.org
Recommended Citation
Sideridis, Georgios; Saddaawi, Abdullah; and Al-Harbi, Khaleel (2018) "Internal Consistency Reliability in Measurement: Aggregate
and Multilevel Approaches," Journal of Modern Applied Statistical Methods: Vol. 17: Iss. 1, Article 15.
DOI: 10.22237/jmasm/1530027194
Available at: https://digitalcommons.wayne.edu/jmasm/vol17/iss1/15
Internal Consistency Reliability in Measurement: Aggregate and
Multilevel Approaches
Cover Page Footnote
Address correspondence to Georgios D. Sideridis, Ph.D., Boston Children’s Hospital, Harvard Medical
School, 300 Longwood Avenue, Boston, MA 02115. Email: georgios.sideridis@childrens.harvard.edu or
Georgios.sideridis@gmail.com.
Journal of Modern Applied Statistical Methods
May 2018, Vol. 17, No. 1, eP2560. Copyright © 2018 JMASM, Inc.
doi: 10.22237/jmasm/1530027194. ISSN 1538-9472
EMERGING SCHOLARS
Internal Consistency Reliability in
Measurement: Aggregate and Multilevel
Approaches
Georgios Sideridis Abdullah Saddaawi Khaleel Al-Harbi
Harvard Medical School King Saud University Taibah University
Boston, MA Riyadh, Saudi Arabia Medina, Saudi Arabia
The purpose of this study was to evaluate the internal consistency reliability of the General
Teacher Test assuming clustered and non-clustered data using commercial software
(Mplus). Participants were 2,000 testees who were selected using random sampling from a
larger pool of examinees (more than 65,000). The measure involved four factors, namely: (a)
planning for learning, (b) promoting learning, (c) supporting learning, and (d) professional
responsibilities, and was hypothesized to comprise a unidimensional instrument assessing
generalized skills and competencies. Intra-class correlation coefficients and variance ratio
statistics suggested the need to incorporate a clustering variable (i.e., university) when
evaluating the factor structure of the measure. Results indicated that single level reliability
estimation significantly overestimated the reliability observed across persons and
underestimated the reliability at the clustering variable (university). One-level reliability
was also, at times, lower than the lowest acceptable levels, leading to a conclusion of
unreliability, whereas multilevel reliability was low at the between-person level but
excellent at the between-university level. It was concluded that ignoring nesting is
associated with distorted and erroneous estimates of internal consistency reliability of an
ability measure and that the use of MCFA is imperative to account for dependencies
between levels of analysis.
doi: 10.22237/jmasm/1530027194 | Accepted: May 15, 2017; Published: June 26, 2018.
Correspondence: Georgios Sideridis, georgios.sideridis@childrens.harvard.edu

Introduction

Inconsistent measurement is undoubtedly one of the biggest threats to the internal
validity of studies and has attracted the interest of researchers since the early 1900s
(Spearman, 1904, 1910). For that purpose, several indices of internal consistency
reliability have been developed to properly capture this important measurement
characteristic (Anderson & Gerbing, 1982). In the relevant literature there have
been diverse opinions regarding internal consistency and reliability with several
authors pointing to diverse operational definitions (Hattie, 1985). In the present
study the term internal consistency reliability is used and relates to the earlier use
of internal consistency. It represents a domain-sampling approach, as true reliability
should involve the presence of two measurement points (Guttman, 1945). Amongst
indices, Cronbach’s alpha coefficient (Cronbach, 1951) has been one of the most
widely used indices, with more than 250,000 hits in the Google Scholar database; see
also Hogan, Benjamin, and Brezinski (2000). This is particularly interesting despite
noticeable shortcomings and challenges regarding computation and interpretation
(Boyle, 1991; Cortina, 1993; Hayashi, & Kamata, 2005; Henson, 2001; Kopalle,
1997; Liu, Wu, & Zumbo, 2010; Raykov, 2001; Shevlin, Miles, Davies, & Walker,
2000; Streiner, 2003a). Other commonly-used indices involve omega reliability
(Raykov, 1997) and maximal reliability H (Li, 1997). The purpose of this study is
to illustrate, using an example from a National Examination, the measurement of
internal consistency reliability under the lenses of multilevel modeling as a means
to properly assess the amount of error that the latent trait contains across different
levels in the analysis. Here, internal consistency reliability refers to true-score
variance. A secondary purpose is to illustrate the estimation of various indices using
widely known software.
Inevitably, when assessing internal consistency of a measure one must
consider the context in which individuals are located. For example, when students
take a test, their scores and performance may be more similar to students within
their class compared to students from other classes, schools, or neighborhoods
(Opdenakker & Van Damme, 2000). This apparent relationship will likely be
reflected with a correlational structure that will account for those dependencies (e.g.,
an autocorrelation structure if students are nested within time). In the above
example, student scores will likely be more strongly correlated when tested within
their class (and the mean of their class), compared to across classes (and the grand
mean). Ignoring that dependency will likely result in estimates of internal
consistency that confound true within and between estimates of reliability as the
aggregate term will ignore the true score and error variance estimates at each level,
placing them all under a single residual term (Geldof, Preacher, & Zyphur, 2014).
As a problem in educational research, it was first described by Robinson (1950) as
the ecological fallacy phenomenon, which refers to the implicit assumption that
estimates at one level generalize to another (Morin, Marsh, Nagengast, & Scalas,
2014) with several applications supporting opposite claims (e.g., Marsh, 2007;
Schwartz, 1994). The basic justification for estimating multilevel reliability is that
true score variance may be “captured to a different degree at each level” (Geldof et
al., 2014, p. 75).
Within the Structural Equation Modeling (SEM) framework, when ignoring
nesting, a factor loading reflects the expected value of change in an indicator when
the factor changes by one standardized unit (Pornprasertmanit, Lee, & Preacher,
2014). This relationship, however, between an item and a latent factor, should not
presumably be the same if the unit of analysis is the person in relation to his/her
cluster’s mean (e.g., when students are nested within their class – as in group-mean
centering) compared to the person being seen in relation to the whole group
(aggregate data, ignoring nesting – as in grand mean centering). The next section
describes the estimation of various internal consistency reliability indices.
Cronbach’s Alpha
Based on Classical Test Theory (Nunnally, 1978), a measured item/construct’s X
score is comprised of two components: a true score T plus some form of error e (i.e.,
X = T + e), with the expectation that error is random rather than systematic. Since
we rarely measure constructs with a single item, unidimensionally-measured phenomena
are often described with a single factor model in which items contribute stochastic
information plus white noise. Using a three-item instrument, the one-factor model is
expressed as follows (Wang & Wang, 2012):
\[
\begin{aligned}
Y_1 &= \lambda_{11}\xi_1 + \delta_1 \quad (\text{Item 1})\\
Y_2 &= \lambda_{21}\xi_1 + \delta_2 \quad (\text{Item 2})\\
Y_3 &= \lambda_{31}\xi_1 + \delta_3 \quad (\text{Item 3})
\end{aligned}
\]
with each of items Y1, Y2, and Y3 being linked to the latent structure ξ1 stochastically
(with λ being the correlation between the item and the latent dimension) and δ a
form of random error as the items are likely imperfect estimates of the true trait.
Wang and Wang (2012) emphasized that no matter how carefully the procedures
have been implemented or how refined a measure is, error of measurement is
sizable and hopefully reflects random rather than systematic variations; for an
excellent discussion see Streiner (2003b) and Judd, Smith and Kidder (1991). Based
on the above single factor model and earlier work (Guttman, 1945), Cronbach
proposed the alpha statistic as a measure of internal consistency assuming that all
items contribute to the measurement of a construct and that contribution is reflected
in the intercorrelations between items (i.e., \(\bar{r}_i\)) as follows:

\[
\alpha_{\text{Cronbach}} = \frac{k\bar{r}_i}{1 + (k-1)\bar{r}_i} \tag{1}
\]

Thus, the term \(\bar{r}_i\) reflects the mean intercorrelation between items \(i_1, i_2, \ldots, i_k\) and k
is the number of items. The above formula was used for presentation only as it is
the easiest to conceptualize; for estimation we employed the alternative formula,
which has wider use:
\[
\alpha_{\text{Cronbach}} = \frac{n^2 \bar{\sigma}_{ij}}{\sigma_X^2} \tag{2}
\]

with n representing the number of items, \(\bar{\sigma}_{ij}\) the average covariance between items,
and \(\sigma_X^2\) the sum of all item variances plus 2 times the sum of the covariances
between items (i.e., scale’s variance, see Geldof et al., 2014). Later Cronbach
corrected the positive bias that the number of items exerts on the coefficient by
adopting the Spearman-Brown formula (J. Brown, 1996) and proposed alternative
formulations (Cronbach & Gleser, 1964). Obviously the magnitude of the interitem
correlation and the number of items are positive contributors to alpha with larger
correlations and lengthy instruments being associated with higher estimates of
internal consistency reliability. As several researchers noted, however, Cronbach’s
coefficient alpha is a lower bound to the true reliability when items are tau
equivalent (Lord & Novick, 1968). Consequently, it may seriously underestimate
the internal consistency of a measure (Osburn, 2000; Thompson, Green, & Yang,
2010). For that purpose, an alternative to the original formula was implemented
which involved a correction for sample size (Kristof, 1963):
\[
\alpha_K = \frac{2}{N-1} + \frac{N-3}{N-1}\,\alpha \tag{3}
\]
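To make the computations concrete, Equations (1)-(3) can be sketched in Python with NumPy. The function and variable names are ours, and the Kristof correction follows Equation (3) as reconstructed here:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Covariance form of Equation (2): n^2 times the average interitem
    covariance, divided by the total scale variance (the sum of all item
    variances plus 2 times the summed covariances)."""
    n = items.shape[1]
    cov = np.cov(items, rowvar=False)
    mean_cov = cov[~np.eye(n, dtype=bool)].mean()  # average off-diagonal covariance
    return n ** 2 * mean_cov / cov.sum()

def alpha_from_mean_r(mean_r: float, k: int) -> float:
    """Mean-intercorrelation form of Equation (1)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def alpha_kristof(alpha: float, n_persons: float) -> float:
    """Kristof's (1963) sample-size correction, Equation (3) as reconstructed."""
    return 2 / (n_persons - 1) + (n_persons - 3) / (n_persons - 1) * alpha
```

For example, four items with a mean intercorrelation of 0.50 yield alpha = 4(0.5) / (1 + 3(0.5)) = 0.80.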
Omega Reliability

The omega coefficient (Raykov, 1997) is estimated from the parameters of the single
factor model as:

\[
\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \operatorname{Var}(\varepsilon_i)} \tag{4}
\]
With \(\lambda_i\) being the factor loading of item i and \(\operatorname{Var}(\varepsilon_i)\) the respective error
variance of item i. This formula ignores the likelihood that a correlated structure
in the residuals is present, in which case reliability needs to be adjusted accordingly
(Westfall, Henning, & Howell, 2012). In instances that correlated errors reflect
measurement artifacts (Wang & Wang, 2012) such as presence of a single stem
across all items (e.g., “It is important to me to…”) or the presence of a third latent
aptitude trait (e.g., language, complex terminology) that is a prerequisite to
comprehending the content of some items, one needs to adjust the coefficient for
collinearity in the residuals as following:
\[
\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \operatorname{Var}(\varepsilon_i) + 2\sum_i\sum_j \operatorname{Cov}(\varepsilon_i, \varepsilon_j)} \tag{5}
\]

With the term \(2\sum_i\sum_j \operatorname{Cov}(\varepsilon_i, \varepsilon_j)\) being two times the sum of the covariances between
the error terms, which add to the scale's variance. More recently, Zinbarg, Revelle, Yovel,
and Li (2005) provided an extension of omega through estimating a lower bound
estimate of internal consistency reliability. However, in the present study the
intercorrelations of residual estimates were negligible around zero and, thus, this
formula was not implemented further.
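A minimal sketch of Equations (4) and (5) follows (our naming); the error-covariance term defaults to zero, which recovers Equation (4):

```python
import numpy as np

def omega_reliability(loadings, error_vars, error_cov_sum=0.0):
    """Omega (Equations 4-5): the squared sum of loadings over itself plus
    the summed error variances, plus twice the summed error covariances
    when residuals are correlated."""
    common = np.asarray(loadings, dtype=float).sum() ** 2
    return common / (common + np.sum(error_vars) + 2.0 * error_cov_sum)
```

With four standardized loadings of 0.70 (error variances 1 − 0.49 = 0.51), omega is 2.8² / (2.8² + 4 × 0.51) ≈ 0.794; positive error covariances lower the estimate.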
Maximal Reliability H
The H coefficient termed maximal reliability H (Bentler, 2007) was assessed as a
means of estimating reliability using an optimally weighted composite using the
standardized factor loadings as follows (Li, 1997; Raykov, 2004):
\[
H = \frac{\displaystyle\sum_i \frac{\lambda_i^2}{1-\lambda_i^2}}{1 + \displaystyle\sum_i \frac{\lambda_i^2}{1-\lambda_i^2}} \tag{6}
\]
With \(\lambda_i^2\) being the squared standardized factor loading of item i (Hancock &
Mueller, 2001). The advantage of the maximal reliability coefficient compared to
omega lies in the fact that negative factor loadings now offer meaningful variance
that is modeled properly. Also, the H statistic uses a weighted estimate by squaring
the individual factor loadings (Hancock & Mueller, 2001) and the estimated
reliability can never be less than the reliability of the best measured item. Last, the
weighting procedure downgrades less informative items which load
weakly on the factor (Geldof et al., 2014).
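Equation (6) is short enough to sketch directly (our naming); squaring the loadings makes their sign irrelevant, and H cannot fall below the reliability of the best single item:

```python
def maximal_h(std_loadings):
    """Maximal reliability H (Equation 6) from standardized loadings."""
    s = sum(l ** 2 / (1.0 - l ** 2) for l in std_loadings)
    return s / (1.0 + s)
```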
Additional indices have been proposed that serve as lower bound estimates of reliability
(e.g., alpha; Feldt, 2002). The idea that governs those indices is that the covariances
between items represent true information whereas the variances of the items contain
both true and unique variability. The interested reader can consult the works of
Jackson and Agunwamba (1977) for excellent reviews. One such index is αpc (ten
Berge & Hofstee, 1999), which employs the eigenvalue of the first principal
component, in the case of unidimensional structures. The coefficient is estimated
as follows:
\[
\alpha_{pc} = \frac{n}{n-1}\left(1 - \frac{1}{\lambda_1}\right) \tag{7}
\]
With λ1 being the eigenvalue of the first principal component from a PCA analysis
using commercial software; see also Raykov and Pohl (2012) for an alternative
conceptualization through modeling common factor variance.
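As a sketch (our naming), Equation (7) can be computed from the eigendecomposition of the interitem correlation matrix rather than commercial software:

```python
import numpy as np

def alpha_pc(items: np.ndarray) -> float:
    """alpha_pc (Equation 7): based on the first (largest) eigenvalue of
    the interitem correlation matrix (ten Berge & Hofstee, 1999)."""
    n = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)
    lam1 = np.linalg.eigvalsh(corr)[-1]  # eigvalsh returns ascending eigenvalues
    return n / (n - 1) * (1.0 - 1.0 / lam1)
```

When all items correlate perfectly, the first eigenvalue equals n and the coefficient reaches unity.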
A second index is Guttman's Lambda 1 coefficient, estimated as one minus the
ratio of the sum of the items' error variances to the instrument's total variance \(\sigma_X^2\):

\[
\lambda_1 = 1 - \frac{\sum_i \sigma_{ii}^2}{\sigma_X^2} \tag{8}
\]
with the term \(\sum_i \sigma_{ii}^2\) representing the sum of the item error variances and \(\sigma_X^2\) the sum
of all item variances plus 2 times the sum of the covariances between items. The
idea behind the coefficient is that all of items’ information represents measurement
error except the interitem covariances, which reflect true variance.
A last index is Guttman's Lambda 2 coefficient, which equals coefficient alpha
when the items are tau equivalent. It adds to lambda 1 the square root of n / (n − 1)
times the sum of the squared interitem covariances, divided by the sum of all item
variances plus 2 times the sum of the covariances between items (i.e., the scale's
total variance estimate):

\[
\lambda_2 = \lambda_1 + \frac{\sqrt{\dfrac{n}{n-1}\, C}}{\sigma_X^2} \tag{9}
\]
with C being the sum of squares of the off-diagonal elements of the covariance
matrix. It is considered an improved estimate over Cronbach's alpha (Osburn, 2000)
and is very similar to the \(\mu_2\) estimate of ten Berge and Zegers (1978).
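Both Guttman coefficients, Equations (8) and (9), can be computed from the sample covariance matrix; this sketch uses our naming:

```python
import numpy as np

def guttman_lambda12(items: np.ndarray):
    """Guttman's lambda1 (Equation 8) and lambda2 (Equation 9); C is the
    sum of squared off-diagonal covariances and the denominator is the
    total scale variance."""
    n = items.shape[1]
    cov = np.cov(items, rowvar=False)
    total = cov.sum()  # item variances + 2 * covariances
    lam1 = 1.0 - np.trace(cov) / total
    C = (cov ** 2)[~np.eye(n, dtype=bool)].sum()
    lam2 = lam1 + np.sqrt(n / (n - 1) * C) / total
    return lam1, lam2
```

Because the correction term is non-negative, lambda 2 is always at least as large as lambda 1.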
Confidence intervals for the reliability estimates were obtained using the logit
transformation (Raykov, Marcoulides, & Akaeze, 2016), with limits given by:

\[
\hat{z} \pm z_{\alpha/2}\, \mathrm{S.E.}(\hat{z}) \tag{10}
\]
With \(z_{\alpha/2}\) being the two-sided critical value for a given alpha level. The logit
transformation of omega is given by:
\[
\hat{z} = \ln\!\left(\frac{\hat{\omega}}{1-\hat{\omega}}\right) \tag{11}
\]
9
INTERNAL CONSISTENCY RELIABILITY IN MEASUREMENT
Table 1. Indices of internal consistency reliability for the aggregate scale and its bifurcation into within and between levels

| Level of analysis | Cronbach's alpha | Alpha K-corrected | Omega CR | Guttman's λ1 | Guttman's λ2 | αpc† | Maximal H |
|---|---|---|---|---|---|---|---|
| Aggregate scale (single level) | 0.689 (0.668-0.711) | 0.692 (0.671-0.714) | 0.693 (0.672-0.714) | 0.516 (0.501-0.532) | 0.690 (0.669-0.712) | 0.537 | 0.745 (0.726-0.765) |
| Within level (person) | 0.673 (0.631-0.718) | 0.674 (0.632-0.719) | 0.677 (0.648-0.707) | 0.505 (0.475-0.537) | 0.675 (0.633-0.720) | 0.527 | 0.731 (0.709-0.755) |
| Between level (university) | 0.981 (0.431-1.00) | 0.981 (0.431-1.00) | 0.969 (0.950-0.989) | 0.735 (0.322-1.00) | 0.987 (0.432-1.00) | 0.744 | 0.970 (0.951-0.990) |

Note: Estimates in parentheses are 95% confidence intervals based on the logit transformation (Raykov, Marcoulides, & Akaeze, 2016).
† A standard error for this estimate could not be computed because it was based on an estimate of an eigenvalue for which an error term was not available. Without knowing the distribution of eigenvalues, we decided not to attempt to estimate the error terms around those estimates for both the within and between levels in the analysis.
Alpha K-corrected involves Kristof's correction for sample size.
Estimates of Guttman's lambda 1 and lambda 2 coefficients were cross-validated using other commercial software for which routines were readily available.
Upper bound estimates were constrained to unity, when exceeding the theoretical min-max, for ease of interpretation.
\[
\mathrm{S.E.}(\hat{z}) = \frac{\mathrm{S.E.}(\hat{\omega})}{\hat{\omega}\,(1-\hat{\omega})} \tag{12}
\]
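Equations (10)-(12) combine into a short sketch (our naming) that forms the interval on the logit scale and back-transforms the limits, which keeps them inside (0, 1):

```python
import math

def reliability_logit_ci(est: float, se: float, z: float = 1.96):
    """CI via the logit transformation: Eq. (11) transforms the estimate,
    Eq. (12) its standard error, Eq. (10) forms the interval, and the
    inverse logit maps the limits back to the (0, 1) scale."""
    z_hat = math.log(est / (1.0 - est))           # Equation (11)
    se_z = se / (est * (1.0 - est))               # Equation (12)
    lo, hi = z_hat - z * se_z, z_hat + z * se_z   # Equation (10)
    inv = lambda t: 1.0 / (1.0 + math.exp(-t))    # back-transform
    return inv(lo), inv(hi)
```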
Ignoring clustering would also distort estimates of
power and sample sizes, as the magnitude of the correlated structure (intraclass
correlation) will not be accounted for. In the presence of unreliable measurement
and large standard errors, the needed sample sizes will likely be prohibitive. The
purpose of the present study was to evaluate and illustrate, using commercially
available software (Mplus Version 7.4), within and between level internal
consistency estimation with a national General Teacher test in Saudi Arabia,
applying the methodologies outlined above to the various internal consistency
reliability indices under both unilevel and multilevel structures.
Prior attempts to capture multilevel internal consistency estimation were conducted
to evaluate the consistency of means within classes (Raykov & Penev, 2010) rather
than the internal consistency of scales; see also Wilhelm and Schoebi (2007) and
Huang and Weng (2012) for alternative conceptualizations.
Method
Data Analysis
Data were analyzed using Mplus (see supplemental content), modeling the above
indices of internal consistency reliability for unilevel (aggregate) and multilevel
data (Muthén, 1989, 1990, 1994; Yuan & Bentler, 2007). The university comprised
the clustering variable, with students nested within universities, as it would be
important to test how reliable an aptitude test’s scores are at both the person and
the university levels provided the medium of education is through universities.
Thus, assessing ability at the university level will allow proper evaluation and
comparison between universities and their respective departments, knowing that
rankings and ratings are oftentimes conducted at the university and/or the
department level. Federal and state agencies may make use of such data. For
example, K. Woodhouse (2015) reported that most governmental funding in
2013 was directed to community colleges and small universities, leaving research
institutions to seek funding from other sources. Such decisions need to be grounded
in hard evidence about the qualities of universities and departments; thus,
accurate estimation of these attributes at that level is essential. Consequently, the
estimation of internal consistency reliability at the person level would suggest the
precision in which ability can be estimated for each individual (person level),
whereas estimation of internal consistency reliability at the university level would
reflect aggregate estimates for groups of individuals who belong to a university and
would point to the accuracy of estimating aptitude at the organization level
(university), so that direct comparisons across universities can be accomplished.
Multilevel Structural Equation Modeling (MSEM) evaluates measurement
and structural models at more than one level in the analysis when nesting is in place
(Geldof et al., 2014; Heck & Thomas, 2015). The primary purpose of modeling
data at two or more levels is to avoid the violation of the independence of
observation assumption which is introduced when ignoring the clustering effects
(e.g., the effects a school administration, teacher, school culture, or classroom
climate exerts on all students, causing a baseline between-person correlation that
reflects a systematic source of measurement error) (Julian, 2001). Some
background information and a description of the models utilized in the present
study are presented below.
For the measurement model, and assuming a unidimensional structure, the
relations between the items and the latent factor can be expressed in scalar
form as follows:
\[
Y_i = \Lambda \eta_i + \varepsilon_i \tag{13}
\]

With \(Y_i\) being the observed items for person i, \(\Lambda\) being the matrix of factor loadings
on latent variables ηi, and ε the error terms. In order to accommodate predictors at
the latent variable level, structural models can be expressed as follows:
\[
\eta = \alpha + B\eta + \zeta \tag{14}
\]
\[
\eta = \alpha + B\eta + \Gamma X + \zeta \tag{15}
\]
\[
\eta = \alpha + \begin{bmatrix} B_{11} & B_{12} & B_{13} & B_{14} \end{bmatrix}\eta + \Gamma_1(\text{predictor}) + \zeta
\]
With both errors of the measurement and structural models being distributed
approximately multivariate normal:
\[
\zeta_j \sim \mathrm{MVN}(0, \Psi), \qquad \varepsilon_j \sim \mathrm{MVN}(0, \Theta)
\]
Using the multilevel confirmatory factor analysis framework and matrix notation,
a one-factor model was fit to the data at both the person and university levels (Figure 2).
Results
Prerequisite Analyses
Initially, the necessity to model the university as a random effect was tested by
inspecting the Intra Class Correlations (ICCs), the variance ratio statistic, along
with the design effect estimate (see Table 2, Kish, 1965; Preacher & Selig, 2012;
Werts, Linn, & Jöreskog, 1974). Results justified the use of multilevel
modeling (nonzero ICCs, large between-to-within variance ratio statistics, and
design effect values greater than 2). Furthermore, the confidence intervals of those
estimates did not contain zero, suggesting the absence of negligible effects.
Table 2. Intra-class correlation coefficients and variance ratio statistics of the general test
with 95% confidence interval estimates

| Construct | ICC | ICC CI | Variance ratio test† | Ratio test CI | DEFF |
|---|---|---|---|---|---|
| Planning for Learning | 5.40% | 2.7-10.7% | 33.60% | 17.3-64.9% | 5.665 |
| Promoting Learning | 4.30% | 2.0-9.33% | 32.00% | 20.7-49.4% | 4.713 |
| Supporting Learning | 3.60% | 1.5-8.6% | 32.50% | 20.7-51.1% | 4.105 |
| Professional Responsibilities | 1.30% | 0.3-4.3% | 25.30% | 14.3-44.9% | 2.138 |
Note: Each of the four factors represents an item for the purposes of the present study.
† Refers to the Raykov et al. (2016) ratio statistic of the between to within variance estimate as a supplement to the ICC. Confidence intervals are at 95%.
The DEFF estimate refers to the design effect, for which values greater than 2.0 suggest that the clustering variable contains information that needs to be modeled via multilevel modeling techniques (Maas & Hox, 2005; Muthén & Satorra, 1995). It is estimated using the formula Design Effect = 1 + (Average Cluster Size − 1) * ICC.
Confidence intervals of the ICCs were estimated using the ci.r function from Raykov and Marcoulides (2012). A slightly improved function has been put forth by Raykov et al. (2016).
The ICC was estimated as the ratio of the between-level variance to the sum of the between and within variance: \(s_b^2 / (s_b^2 + s_w^2)\).
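The ICC, variance ratio, and design effect reported in Table 2 can be approximated with a short sketch (our naming; a simple method-of-moments estimator rather than the ci.r routine the authors used):

```python
import numpy as np

def icc_deff(scores: np.ndarray, clusters: np.ndarray):
    """ICC = s_b^2 / (s_b^2 + s_w^2), the between/within variance ratio,
    and DEFF = 1 + (average cluster size - 1) * ICC."""
    labels = np.unique(clusters)
    groups = [scores[clusters == c] for c in labels]
    n_bar = np.mean([g.size for g in groups])
    s_w = np.mean([g.var(ddof=1) for g in groups])  # pooled within variance
    means = np.array([g.mean() for g in groups])
    # method-of-moments between variance (equal-cluster-size approximation)
    s_b = max(means.var(ddof=1) - s_w / n_bar, 0.0)
    icc = s_b / (s_b + s_w)
    deff = 1.0 + (n_bar - 1.0) * icc
    return icc, s_b / s_w, deff
```

Note how even a small ICC yields a large design effect when clusters are big, which is why DEFF > 2 is taken to mandate multilevel modeling.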
Figure 1. One-factor model for the measurement of general ability using aggregate
scores (sum of items) from each of four general ability factors; unstandardized factor
loadings are shown
Figure 2. One-factor model at both levels in the analysis (person and university) using
unstandardized estimates
Figure 3. Distributions of internal consistency estimates at the within and between levels
in the analysis; coefficients that overlap are not shown. As shown above, the distribution
of those estimates is approximately normal; thus, there was no need to apply a log-odds
transformation prior to the simulation.
Internal consistency reliability estimates at the within and between levels were as
follows: (a) Omega Within = 0.677 [C.I. = 0.648-0.707], Omega Between = 0.969
[C.I. = 0.950-0.989], (b) Maximal reliability H Within = 0.731 [C.I. = 0.709-0.755],
H Between = 0.970 [C.I. = 0.951-0.990] (c) αpc Within = 0.527, αpc
Between = 0.744, (d) Cronbach’s alpha Within = 0.673 [C.I. = 0.631-0.718],
Cronbach’s alpha Between = 0.981 [C.I. = 0.431-1.00], (e) Cronbach’s alpha with
Kristof’s correction, Within = 0.674 [C.I. = 0.632-0.719], Between = 0.981
[C.I. = 0.431-1.00], (f) lambda 1 Within = 0.505 [C.I. = 0.475-0.537],
Between = 0.735 [C.I. = 0.322-1.00], and (g) lambda 2 Within = 0.675
[C.I. = 0.633-0.720], Between = 0.987 [C.I. = 0.432-1.00]. The obvious conclusion
was that the estimates of the within level (person) were similar to the aggregate
estimates but the point estimates of the between level were much higher, but with
much less precision, likely because of the relatively small number of the level-2
units (universities in the present case). Figure 3 displays the distribution of within
and between level coefficients following 1,000 replications using estimates of
means and variances from the original dataset.
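The replication scheme behind Figure 3 can be mimicked with a parametric bootstrap, a sketch under our assumptions rather than the authors' Mplus routine: draw repeated samples from a normal model implied by the observed moments and recompute the coefficient each time:

```python
import numpy as np

def alpha_bootstrap(cov: np.ndarray, n_persons: int, reps: int = 1000, seed: int = 0):
    """Draw `reps` samples from MVN(0, cov) and recompute Cronbach's alpha
    (standard k/(k-1) * (1 - trace/total) form) to approximate its
    sampling distribution."""
    rng = np.random.default_rng(seed)
    k = cov.shape[0]
    out = np.empty(reps)
    for r in range(reps):
        x = rng.multivariate_normal(np.zeros(k), cov, size=n_persons)
        c = np.cov(x, rowvar=False)
        out[r] = k / (k - 1) * (1.0 - np.trace(c) / c.sum())
    return out
```

Inspecting the resulting distribution for approximate normality is what motivates skipping the log-odds transformation in the text above.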
Conclusion
The purpose of this study was to evaluate within and between level internal
consistency estimates for a General Teacher test using various indices such as alpha,
omega, maximal H, lambda 1, lambda 2, αpc, and a few corrections on them, as past
research has indicated that ignoring nesting can be detrimental to parameter
estimation, standard error estimation, and, consequently, reliability estimation
(Pornprasertmanit et al., 2014). The paper attempted to involve a wide variety of
internal consistency indices including the early lower bound indices (Cronbach’s
alpha and its variants). Several important findings emerged in relation to measuring
internal consistency reliability in multilevel versus aggregate structures.
The most important finding related to the measurement of reliability:
differences in reliability at each level of the analysis suggest different levels of
precision of the measured instrument. The present study included alpha reliability,
composite reliability, maximal reliability, the αpc statistic, and two of Guttman's
popular lambda indices, namely λ1 and λ2. All suggested that the measurement of
general competencies was more accurate and consistent at the university level
(between-university level of analysis, Level-2) compared to the between-person
level of analysis (Level-1). These findings suggest, again, that the aggregate
measurement of reliability when ignoring nested structures can lead to misleading
estimates regarding a measure’s internal consistency estimate as the aggregate
terms average the true reliabilities at each level. This would hinder the true
recommended that these large numbers compensate, to an extent, for the limited
number of clusters, as they were found to be associated with smaller standard errors
(Cohen, 1998; Hox & Maas, 2001; Snijders & Bosker, 1993). Second, our
methodology for computing confidence intervals is only one among several
possibilities; Padilla and Divers (2013) presented six different methodologies. Third,
the estimates of internal consistency reliability used in the present study represent
only a small fraction of the available estimates (Schmidt, Le, & Ilies, 2003). One
motivation for including these indices was the ease of modeling them in
Mplus compared to more cumbersome indices, such as split-half estimates, for
which the number of possible splits increases exponentially with the number of
items. Last, in the present study we ignored the influence of correlated
errors, which may be more detrimental for some coefficients than others; e.g., for
alpha, correlated errors result in inflation (Komaroff, 1997).
It is important to assess how the above reliability coefficients behave when
the data are dichotomous or polytomous (Dimitrov, 2002, 2003a, 2003b; Padilla &
Divers, 2013) and not continuous as in the present study; see also Yang and Green
(2014). The need to include modeling at various levels when the data are clustered
is nevertheless imperative in light of the recent findings which show that ignoring
clustering is associated with high Type-I error rates when assessing non-invariance
(Kim et al., 2012) or the underestimation of standard errors (and, thus, Type-I error
inflation) when covariates are modeled at the between level (Finch & French, 2011).
Furthermore, it will be important to evaluate reliability in light of the properties of
the measure (e.g., congeneric, tau equivalent, etc.) with the goal of selecting the
most appropriate estimate for the data, given evidence that reliability analyses are
often conducted improperly (Aiken et al., 1990). Last, corrective actions may need to be taken so
that measurement error would be accounted for at each level in the analysis, prior
to moving to more complex structural models using either Bayes’ priors or
information from past research; see G. Woodhouse et al. (1996) for a correction for
unreliability. Other recommended approaches involve parcels to correct for
unreliability (Coffman & MacCallum, 2005).
References
Aiken, L. S., West, S. G., Sechrest, L., Reno, R. R., Roediger III, H. L.,
Scarr, S.,…& Sherman, S. J. (1990). Graduate training in statistics, methodology,
and measurement in psychology: A survey of PhD programs in North America.
American Psychologist, 45(6), 721-734. doi: 10.1037/0003-066x.45.6.721
Dunn, T. J., Baguley, T., & Brunsden, V. (2013). From alpha to omega: A
practical solution to the pervasive problem of internal consistency estimation.
British Journal of Psychology, 105(3), 399-412. doi: 10.1111/bjop.12046
Feldt, L. S. (2002). Estimating the internal consistency reliability of tests
composed of testlets varying in length. Applied Measurement in Education, 15(1),
33-48. doi: 10.1207/s15324818ame1501_03
Fife, D. A., Mendoza, J. L., & Terry, R. (2012). The assessment of
reliability under range restriction. A comparison of α, ω, and test-retest reliability
for dichotomous data. Educational and Psychological Measurement, 72(5), 862-
888. doi: 10.1177/0013164411430225
Finch, W. H., & French, B. F. (2011). Estimation of MIMIC model
parameters with multilevel data. Structural Equation Modeling, 18(2), 229-252.
doi: 10.1080/10705511.2011.557338
Geldof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation
in a multilevel confirmatory factor analysis framework. Psychological Methods,
19(1), 72-91. doi: 10.1037/a0032138
Goldstein, H. (2003). Multilevel Statistical Models (3rd ed.). London, UK:
Arnold.
Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates
of score reliability: What they are and how to use them. Educational and
Psychological Measurement, 66(6), 930-944. doi: 10.1177/0013164406288165
Guttman, L. (1945). A basis for analyzing test-retest reliability.
Psychometrika, 10(4), 255-282. doi: 10.1007/bf02288892
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability
within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.),
Structural equation modeling: Present and future – A festschrift in honor of Karl
Jöreskog (pp. 195-216). Lincolnwood, IL: Scientific Software International.
Hattie, J. A. (1985). Methodology review: Assessing unidimensionality of
tests and items. Applied Psychological Measurement, 9(2), 139-164. doi:
10.1177/014662168500900204
Hayashi, K., & Kamata, A. (2005). A note on the estimator of the alpha
coefficient for standardized variables under normality. Psychometrika, 70(3), 579-
586. doi: 10.1007/s11336-001-0888-1
Heck, R. H. (1999). Multilevel modeling with SEM. In S. L. Thomas &
R.H. Heck (Eds.), Introduction to multilevel modeling techniques (pp. 89-127).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's α,
Revelle's β, and McDonald's ωh: Their relations with each other and two
alternative conceptualizations of reliability. Psychometrika, 70(1), 123-133. doi:
10.1007/s11336-003-0974-7