COMMUNICATION METHODS AND MEASURES
https://doi.org/10.1080/19312458.2020.1718629

Use Omega Rather than Cronbach's Alpha for Estimating Reliability. But…

Andrew F. Hayes^a and Jacob J. Coutts^b

^a Department of Psychology and School of Communication, Ohio State University, Columbus, Ohio, USA; ^b Department of Psychology, Ohio State University, Columbus, Ohio, USA

ABSTRACT
Cronbach's alpha (α) is a widely used measure of reliability that quantifies the
amount of random measurement error in a sum score or average
generated by a multi-item measurement scale. Yet methodologists have
warned that α is not an optimal measure of reliability relative to its more
general form, McDonald's omega (ω). Among other reasons, the facts that the
computation of ω is not available as an option in many popular statistics programs and
that it requires item loadings from a confirmatory factor analysis (CFA) have prob-
ably hindered more widespread adoption. After a brief discussion of α versus
ω, we illustrate the computation of ω using two structural equation modeling
programs (Mplus and AMOS) and the MBESS package for R. We then describe
a macro for SPSS and SAS (OMEGA) that calculates ω in two ways without
relying on the estimation of loadings or error variances using CFA. We show
that it produces estimates of ω that are nearly identical to those based on CFA
estimates of item loadings and error variances. We also discuss the use
of the OMEGA macro for certain forms of item analysis and brief form con-
struction based on the removal of items from a longer scale.

Communication scholars, and indeed researchers in all fields that study human behavior, have devised
a number of clever approaches to measuring variables that are in the domain of the field’s topic of
inquiry. Of the many measurement approaches being used in empirical behavioral science research, by
far the most ubiquitous is the multi-item measurement scale (MIMS). A MIMS consists of a set of k items
or indicators of some unobservable construct of interest (e.g., shyness, communication competence,
traumatic stress), the responses to which are numerically aggregated in some fashion (usually summing
or averaging) to generate a single score or measurement of that construct for each unit being measured.
These “responses” are typically exactly that – a person’s self-reported responses to a set of questions or
statements, with the Likert-type response format being common. But they need not be responses from
the person being measured. The responses could be provided by a peer or parent of the person. Or when
the unit of measurement is not a person, the “responses” could be from existing archives of data on the
indicators, as in Vandello and Cohen’s (1999) state-level measure of individualism and collectivism, or
from experts’ judgments, such as in various country-level indices of democratic values or press freedom
(Becker, Vlad, & Nusser, 2007).
Many empirical journals publish research on the development of MIMSs. Some recent examples
in communication research include a measure of trust in news media (Prochazka & Schweiger, 2019)
and a measure of warranting values (DeAndrea & Carpenter, 2018). And entire books exist (e.g.,
Robinson, Shaver, & Wrightsman, 1991, 1999; Rubin, Palmgreen, & Sypher, 1994) that are filled with
various MIMSs and research that has been conducted on them (as opposed to with them). When
a MIMS is developed and then published, the goal of publication is to convey to the scientific


community that the MIMS exists and that it yields data that have satisfying psychometric properties,
such as high reliability and validity. When a MIMS is constructed ad hoc for a particular study, an
extremely common practice, the research article describing the study should include some discussion
of the reliability and validity of the data it generates.
It is reliability and how it is commonly quantified by developers and users of MIMSs that is the
topic of this paper. A property of the data generated by the instrument when applied to a specific
population rather than a property of the scale itself (Raykov, 2007; Streiner, 2003), reliability
quantifies the amount of random measurement error or “noise” that exists in a set of measurements.
The data can be said to be highly reliable if there is high correspondence between respondents’
observed scores – the scores generated by the MIMS – and their actual level or amount of the
attribute being measured. The higher the reliability, the more we can trust that differences between
people on the scores the MIMS generates are an accurate reflection of actual individual differences
on what the MIMS is measuring. If an instrument generates data with low reliability, the observed
scores contain a lot of random measurement error. The consequences of low reliability can be
negative and severe, such as, on the one hand, reduced power to detect real effects (a false negative)
or, on the other hand, an increase in the likelihood of claiming effects observed in a study are real
when they aren’t (a false positive). But, importantly, high reliability says next to nothing about what
a MIMS is actually measuring – its construct validity – meaning whether it is measuring what you
claim it is measuring or are using it to measure. We do not discuss validity in this paper, with the
exception of a brief mention at the end.
In this paper, we take exception to the common practice of relying on a popular measure of
reliability reported by users of a MIMS: Cronbach’s alpha (α). After discussing what α is and what it
is not, we advocate, as many methodologists before us have, a shift to the use of McDonald’s omega
(ω) as a related but better alternative. Recognizing that shifting a field’s statistical practice is next to
impossible without offering some guidance on computation, we then offer a tutorial on the calculation of ω using confirmatory factor analysis in Mplus and AMOS as well as in R. Recognizing as well
that researchers may resist change when new computational requirements add new burdens to their
research lives, such as having to learn a new piece of software, we introduce an approach to the
computation of omega built into an easy-to-use computational tool (the OMEGA macro) available
for the popular SPSS and SAS programs, the use of which requires no knowledge of programming or
confirmatory factor analysis. We also discuss a feature in the OMEGA macro useful for item analysis
and scale shortening. We end acknowledging some additional reasons for resisting change and
counter with arguments for why switching to ω is sensible and the right thing to do.

Cronbach’s Alpha versus McDonald’s Omega


Classical test theory is the underpinning of many measures of reliability that have been proposed
over the decades and used by behavioral scientists. Applied to a MIMS, classical test theory
postulates that the response a person offers to item i in a MIMS, X_i, in a set of k items measuring
a construct is a function of the respondent's "true score" T and a random error in measurement,

\[ X_i = \mu_i + \lambda_i T + e_i \tag{1} \]

where μ_i is a constant, λ_i is item i's factor loading, and e_i is an error in estimation of that response to
item i. This model is graphically depicted in Figure 1 (without the constants), which is
a unidimensional factor model of latent variable T using k indicators of T (i.e., the X_i items).
Although not a requirement of all of the mathematics that follows, we are assuming that each of the
k items is measured on a common response scale, which is typically the case when a researcher uses
a MIMS to measure a specific construct.
Because T can rarely be known – it is latent rather than directly observed – the sum of a person’s
responses to the k items

Figure 1. A unidimensional factor model of latent variable T with k observed indicators (X_i).

\[ O = \sum_{i=1}^{k} X_i \tag{2} \]

is frequently used as a proxy for T. But because each X_i contains some random error in measurement (from Equation 1), so too does O. Reliability is defined as the proportion of the variance (V) in the observed scores O attributable to actual variance in T: V(T)/V(O). We want this proportion to be high, meaning that most of the variation in O is attributable to actual variation on the construct being measured (T).
Under the assumption that ei is uncorrelated with T for all k items, ei is uncorrelated with the
error in estimation of item j, for all i ≠ j, and all factor loadings are equal to a common value λ, the
reliability of O can be estimated fairly accurately using Cronbach’s α (Cronbach, 1951), defined as
\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} V(X_i)}{V(O)}\right) \tag{3} \]

where V(X_i) is the variance of responses to item i and V(O) is the variance of the observed sum.
Cronbach's α estimates the squared correlation between T and O, which is the proportion of the
variance in observed scores attributable to true variation in the dimension measured. Thus, √α
estimates the correlation between O and T. It has several other useful interpretations (see McNeish,
2018). As Nunnally (1978) stated in his famous psychometrics book, Cronbach’s α “is so pregnant
with meaning that it should routinely be applied to all new tests.” (p. 214).
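Because Equation (3) requires only item and total score variances, α is easy to compute directly from raw data. As a minimal sketch in R (our own illustration, not from the original article; it assumes the items sit in a data frame, here the blirt8.csv file introduced later in this paper):

# Cronbach's alpha from Equation 3, computed from the item responses.
# `dat` holds one column per item (the blirt8.csv file described later).
dat <- read.table("c:\\omega\\blirt8.csv", sep = ",", header = FALSE)
k <- ncol(dat)                       # number of items
item_vars <- apply(dat, 2, var)      # V(X_i) for each item
total_var <- var(rowSums(dat))       # V(O), variance of the sum score
(k / (k - 1)) * (1 - sum(item_vars) / total_var)   # about .780 (see Table 2)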
Behavioral scientists appear to have taken Nunnally’s advice seriously. Cronbach’s 1951
Psychometrika paper has been cited more than some of the most influential research in the sciences
(Dunn et al., 2014; Sijtsma, 2009), and Cronbach’s α is the dominant measure of reliability reported
in studies that rely on a MIMS. For example, Flake, Pek, and Hehman (2017) found that when
a MIMS was used in articles published in the Journal of Personality and Social Psychology in 2014,
Cronbach’s α was the measure reported nearly three quarters of the time. And in a more extensive
content analysis of psychology journals, McNeish (2018) found that Cronbach’s α was used in over
90% of articles in which reporting a measure of reliability of a MIMS would be relevant.
Communication scholars are just as reliant on Cronbach’s α as researchers in other fields. We
examined the 2017 and 2018 volumes of Communication Monographs, Communication Research,
Human Communication Research, Journal of Communication, and Journal of Computer-Mediated
Communication and found 187 articles that relied on at least one MIMS in the study or collection of
studies reported. Of these, the majority (173, 93%) reported Cronbach’s α. The vast majority of the
remaining articles reported no reliability information at all. In other words, when reliability was
reported for the data generated by a MIMS, usually Cronbach’s α was the measure reported.
Though widely used, Cronbach’s α is also misunderstood. Cortina (1993) and Schmitt (1996)
addressed several of these misunderstandings decades ago, but they persist (for more recent discus-
sions, see Sijtsma, 2009; Streiner, 2003). Perhaps the most pervasive is the belief that a high value of α
justifies the summation of item scores as a measurement of the construct on the grounds that high α
implies unidimensionality of the item set. But the meaningfulness of Cronbach’s α as an estimate of
reliability assumes unidimensionality (Graham, 2006). High α is not evidence of unidimensionality.
Alpha can be high even if a set of items measures more than one construct. Thus, α should not be
reported as a measure of reliability of a set of observed scores without having first established that
the items are measuring a single construct.
Cronbach’s α is also sometimes misinterpreted as a measure of the internal consistency or within-
person homogeneity of responses to the items. That is, a higher α is interpreted to reflect stronger
correlations between responses across the items. Although it is true that all other things being equal,
α is larger the higher the average correlation between item responses, that average may consist of
correlations that vary dramatically in size and could include correlations that are zero or near zero.
Furthermore, α is influenced by the number of items. Holding the average inter-item correlation
constant, α increases as the number of items k increases. If the number of items is sufficiently large, α
could be large even though the intercorrelation between the items is generally quite small. So α is not
directly a measure of internal consistency or homogeneity of item responses, even though it is often
described and taught as an “internal consistency” measure of reliability.
Regardless of proper interpretation, a casual observer of the literature or newcomer to the
behavioral science enterprise could easily come to the conclusion that Cronbach’s α is the only
measure of reliability available for data generated by a MIMS or that it must be the best measure.
Neither is so. Although there are many alternatives that are sensitive to different ways of
conceptualizing reliability, a serious competitor to α is attributed to McDonald (1999) and so is
sometimes called McDonald’s omega (ω). Like α, it can be calculated from a single administration
of a set of items as
\[ \omega = \frac{\left(\sum \lambda_i\right)^2}{\left(\sum \lambda_i\right)^2 + \sum V(e_i)} \tag{4} \]

where V(e_i) is the variance of the errors in estimation of item i from Equation (1) (also see Figure 1)
and the summation is over all i = 1 to k items. Under the assumption of uncorrelated errors across
the k items (i.e., unidimensionality), Equation (4) is equivalent to

\[ \omega = \frac{\left(\sum \lambda_i\right)^2}{V(O)} \tag{5} \]
(McDonald, 1999, p. 89). McDonald (1999, pp. 91–92) also discusses that under the assumptions
listed earlier, Cronbach’s α (Equation 3) is a special case of Equations (4) and (5) with all k loadings
λ_i fixed to a common value λ. This assumption of equal factor loadings (along with some other
conditions), often unknown to or neglected by users of Cronbach’s α, is called essential tau-
equivalence, although there are alternative ways of expressing this assumption without relying on
the factor loadings. So ω is a more general estimator of reliability than α because it does not assume
essential tau-equivalence yet reduces to α under the assumption of essential tau-equivalence.
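This reduction is easy to see numerically. The simulation sketch below is our own construction (the loading and error standard deviation are arbitrary choices, not values from the article): when every item has the same loading λ, the sample α from Equation (3) converges on the population ω from Equation (4).

# Alpha and omega coincide under essential tau-equivalence (equal loadings).
set.seed(1)
n <- 100000; k <- 8
lambda <- 0.6; err_sd <- 0.8
T_score <- rnorm(n)                                       # latent true scores
items <- sapply(1:k, function(i) lambda * T_score + rnorm(n, sd = err_sd))
alpha <- (k / (k - 1)) *
  (1 - sum(apply(items, 2, var)) / var(rowSums(items)))   # Equation 3
omega <- (k * lambda)^2 / ((k * lambda)^2 + k * err_sd^2) # Equation 4
c(alpha = alpha, omega = omega)                           # both near .818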
When the k items are measured on the same scale (the typical situation for most MIMS used by
researchers), essential tau-equivalence means that a given difference in the latent variable T corresponds
to the same difference in response to all items. Alternatively, it means that each item measures latent
variable T with the same degree of precision. For example, consider the items “I find it hard to be around
people,” and “I consider myself to be an introvert” as indicators of a latent variable we might call
“shyness.” Essential tau-equivalence would be satisfied for these two items if two people who differ by
a given amount in shyness differ from each other by the same amount in their response to the two items.
Or if the variance in responses to the two items was the same, essential tau-equivalence means that the
correlations between T and responses to the two items are the same. This seems like a strong assumption
even for a two-item scale such as this. But essential tau-equivalence applies to all pairs of items in the
scale, not just a single pair.
As others have stated, essential tau-equivalence is likely the exception rather than the norm
for most scales, and research has shown that when essential tau-equivalence is violated (meaning
a set of items or a scale is congeneric), α estimates reliability less accurately than does ω (Green &
Yang, 2009; Raykov, 1998; Trizano-Hermosilla & Alvarado, 2016). For this reason, and others, there
is strong sentiment among at least some psychometricians and others who study measurement for
a living that α is not the best measure of reliability or even one that should be preferred like it clearly
has been over the years. Although there are disagreements about what should replace it, McDonald’s
ω or closely related measures that don’t assume essential tau-equivalence have emerged as recom-
mended alternatives (Dunn, Baguley, & Brunsden, 2014; Graham, 2006; Green & Yang, 2009;
McNeish, 2018; Raykov, 1998). This is the position we take as well. To facilitate the adoption of
this recommendation, we dedicate much of the rest of this paper to illustrating the computation of ω
in five data analysis programs that researchers will likely be familiar with to some degree.

Computation of Omega Using Statistical Software


Cronbach’s α may be the dominant measure of reliability used by behavioral scientists in part
because its computation requires only item and total score variances, making it easy to teach in
research methods and statistics classes and illustrate with simple hand computations. Further
facilitating its widespread adoption in the age of computers, it was long ago implemented in popular data analysis programs such as SPSS and SAS, which have historically dominated and still dominate the desktop
computers of behavioral scientists. However, with varying degrees of ease, ω can be computed with
the aid of statistical software that can conduct confirmatory factor analysis. In this section we show
how to compute ω first using two structural equation modeling (SEM) programs (Mplus and
AMOS) and then using R. However, many without familiarity or comfort with using these programs
will find their use frustrating, so such a tutorial may have little impact on practice for many. To
make the transition to ω easier, we then introduce a computational tool for SPSS (a similar version is
also available for SAS) that calculates ω without the use of confirmatory factor analysis and that
requires only a single line of syntax or a drop-down dialog box that can be installed into the SPSS
system. This, we believe, will eliminate the last remaining excuse (though we introduce what could
potentially be a new one later) for relying on α rather than ω as a measure of reliability.

Our example is based on data collected from 211 undergraduate students who completed
the eight-item Blirtatiousness scale, a MIMS designed to measure “how quickly, frequently,
and effusively people respond to their partners” (Swann & Rentfrow, 2001, p. 1160). The item
wordings can be found in Table 1. Responses were offered by selecting a box on a 5-point
Likert-type response scale, anchored with the terms “strongly disagree,” “disagree,” “neither
agree nor disagree,” “agree,” and “strongly agree,” scored 1 through 5, respectively. Items 2, 3,
5, and 7 are reverse keyed and so were reverse scored by subtracting the response from 6
prior to analysis, resulting in higher item responses (and the sum of responses) reflecting
higher blirtatiousness. The data are held in variables named blirt1, blirt2r, blirt3r, blirt4,
blirt5r, blirt6, blirt7r, and blirt8. The data file (with reverse scoring already implemented for
variables ending with “r” in the name) can be downloaded from www.afhayes.com and is
available as an SPSS data file (blirt8.sav) as well as a comma-delimited text file (blirt8.csv)
without variable name headings. The variables in the csv file are in the same order as the
variable names listed above.

Structural Equation Modeling: Mplus and AMOS


The computation of ω as we have discussed it thus far requires factor loadings and error variances [λ_i and V(e_i) in Equation (4)] when the model diagramed in Figure 1 is estimated
using CFA. CFA is typically conducted in stand-alone SEM software (such as Mplus, AMOS,
LISREL, EQS), or in more general statistical packages that have built in SEM routines (such as
PROC CALIS in SAS, the SEM command in STATA, or the lavaan package in R). Green and Yang (2009) and Yang and Green (2010) discuss SEM approaches to the estimation of reliability (also
see Graham, 2006; Raykov, 1997b) and favor SEM because it offers the analyst considerable
flexibility in how the measurement model is specified. Here we illustrate the computation of ω
using maximum likelihood estimation of factor loadings and error variances in Mplus and
AMOS. Other programs would generate similar results, with any discrepancies resulting from
minor variations in default estimation routines, convergence criteria, model specification, and so
forth.

Table 1. Factor loadings from confirmatory (CFA) and exploratory (EFA) maximum likelihood factor analysis and the Hancock-An algorithm (HA), discrepancies from CFA loadings, and estimates of omega from 211 participants who completed the blirtatiousness scale.

                                                                                    Loadings (λ)
Item                                                                             CFA     EFA-ML   HA
If I have something to say, I don't hesitate to say it (BLIRT1)                  0.662   0.663   0.654
It often takes me a while to figure out how to express myself (BLIRT2R)          0.640   0.642   0.677
If I disagree with someone, I tend to wait until later to say something (BLIRT3R) 0.574  0.576   0.583
I always say what is on my mind (BLIRT4)                                         0.709   0.711   0.711
Sometimes I just don't know what to say to people (BLIRT5R)                      0.548   0.549   0.549
I never have a problem saying what I think (BLIRT6)                              0.641   0.643   0.636
When emotions are involved, it's difficult for me to argue my opinion (BLIRT7R)  0.376   0.376   0.380
I speak my mind as soon as a thought enters my head (BLIRT8)                     0.397   0.398   0.344

Discrepancy from CFA loadings
  Bias                                                                             –     0.001   0.002
  RMSE                                                                             –     0.002   0.023

Omega reliability
  Mplus                                                                          0.785     –       –
  AMOS                                                                           0.785     –       –
  MBESS in R                                                                     0.785     –       –
  OMEGA macro                                                                      –     0.785   0.779

Test of equality of CFA loadings (essential tau-equivalence): χ²(7) = 22.382, p = .002

Mplus
Mplus is a versatile data analysis program capable of doing confirmatory factor analysis, larger SEM
problems, and most anything else that can be parameterized in the form of a set of equations or models.
The Mplus program in the box below reads the comma-delimited data file and specifies which variables in the data (or constructed in the program) are used in the model (under DATA and VARIABLE), specifies
the equations that define the model (under MODEL), and calculates new statistics that are expressed in
the form of functions of statistics estimated by the model equations (under MODEL CONSTRAINT). In
this code, we have told Mplus to freely estimate all the factor loadings (with the star following blirt1;
otherwise, by default, it will fix the loading for the first indicator following “by” to 1). In order to identify
the model, we fixed the variance of the latent BLIRT variable to 1 (with the code “blirt@1”). The labels in
parentheses following each indicator allow us to refer to the parameter estimates (the loadings and error
variances) in the MODEL CONSTRAINT section, where ω is calculated.
DATA:
file is c:\omega\blirt8.csv;
VARIABLE:
NAMES are blirt1 blirt2r blirt3r blirt4 blirt5r blirt6 blirt7r blirt8;
USEVARIABLES are blirt1 blirt2r blirt3r blirt4 blirt5r blirt6
blirt7r blirt8;
MODEL:
blirt by blirt1* (b1)
blirt2r (b2)
blirt3r (b3)
blirt4 (b4)
blirt5r (b5)
blirt6 (b6)
blirt7r (b7)
blirt8 (b8);
blirt1 (e1); blirt2r (e2); blirt3r (e3);blirt4 (e4);blirt5r (e5);
blirt6 (e6); blirt7r (e7); blirt8 (e8);
blirt@1;
MODEL CONSTRAINT:
new sumload2 sumevar omega;
sumload2=(b1+b2+b3+b4+b5+b6+b7+b8)**2;
sumevar=e1+e2+e3+e4+e5+e6+e7+e8;
omega=sumload2/(sumload2+sumevar);

An excerpt of the output generated by this code can be found in Figure 2. The factor loadings are
found at the top under “BLIRT BY”, and the error variances are found toward the bottom under the
heading “Residual Variances.” Applying Equation (4),

\[ \left(\sum \lambda_i\right)^2 = (0.662 + 0.640 + 0.574 + 0.709 + 0.548 + 0.641 + 0.376 + 0.397)^2 = 20.675 \]

\[ \sum V(e_i) = 0.581 + 0.660 + 0.541 + 0.494 + 0.841 + 0.547 + 1.156 + 0.843 = 5.663 \]

and so

\[ \omega = \frac{\left(\sum \lambda_i\right)^2}{\left(\sum \lambda_i\right)^2 + \sum V(e_i)} = \frac{20.675}{20.675 + 5.663} = 0.785 \]
However, these hand computations are not necessary, as the code under the MODEL
CONSTRAINT heading constructs the numerator and denominator of Equation (4) and gen-
erates ω in the output, as can be seen toward the bottom of Figure 2. Mplus also generates

Two-Tailed
Estimate S.E. Est./S.E. P-Value

BLIRT BY
BLIRT1 0.662 0.068 9.687 0.000
BLIRT2R 0.640 0.071 8.957 0.000
BLIRT3R 0.574 0.064 8.941 0.000
BLIRT4 0.709 0.066 10.688 0.000
BLIRT5R 0.548 0.077 7.140 0.000
BLIRT6 0.641 0.066 9.679 0.000
BLIRT7R 0.376 0.085 4.434 0.000
BLIRT8 0.397 0.074 5.379 0.000

Intercepts
BLIRT1 3.365 0.069 48.432 0.000
BLIRT2R 3.379 0.071 47.462 0.000
BLIRT3R 3.156 0.064 49.119 0.000
BLIRT4 3.133 0.069 45.581 0.000
BLIRT5R 2.976 0.074 40.463 0.000
BLIRT6 3.237 0.067 48.037 0.000
BLIRT7R 3.379 0.078 43.101 0.000
BLIRT8 2.664 0.069 38.679 0.000

Variances
BLIRT 1.000 0.000 999.000 999.000

Residual Variances
BLIRT1 0.581 0.069 8.441 0.000
BLIRT2R 0.660 0.076 8.677 0.000
BLIRT3R 0.541 0.062 8.774 0.000
BLIRT4 0.494 0.064 7.760 0.000
BLIRT5R 0.841 0.090 9.322 0.000
BLIRT6 0.547 0.065 8.443 0.000
BLIRT7R 1.156 0.116 9.974 0.000
BLIRT8 0.843 0.086 9.787 0.000

New/Additional Parameters
SUMLOAD2 20.680 2.569 8.051 0.000
SUMEVAR 5.662 0.214 26.449 0.000
OMEGA 0.785 0.022 35.185 0.000

Figure 2. Excerpt of Mplus output from a confirmatory factor analysis of the blirtatiousness scale.

a standard error for ω, which can be used to construct a confidence interval in the usual way.
Alternatively, a bootstrap confidence interval can be constructed by bootstrapping the distribu-
tion of ω (see Padilla & Divers, 2013; Raykov, 1998, for a discussion of bootstrap inference for
reliability). In Mplus, this is accomplished by adding

ANALYSIS:
bootstrap=10000;
OUTPUT:
cinterval(bootstrap);

to the code above. When we did so, the 95% confidence interval generated was (0.737, 0.824).
Mplus can also be used to conduct a test of essential tau-equivalence by reestimating the model
but fixing the factor loadings to be equal. For this example, this is accomplished by adding the line

b1=b2;b1=b3;b1=b4;b1=b5;b1=b6;b1=b7;b1=b8;

under the MODEL CONSTRAINT command in the code above. This tells Mplus to estimate the
model under the constraint that the factor loading for items 2 through 8 should be constrained
to be the same as the factor loading for item 1 (which is equivalent to saying all 8 are fixed to be
the same value). A likelihood ratio test of the difference in fit of this model compared to the
model without this constraint serves as a test of the reasonableness of the equality constraint
imposed on the factor loadings. This likelihood ratio test revealed that the model constraining
the loadings to be equal fit the data significantly worse compared to when the factor loadings
were allowed to differ, χ2(7) = 22.382, p < .01, verifying that ω is a more appropriate measure of
reliability than Cronbach’s α.

AMOS
Like Mplus, AMOS is a structural equation modeling system that can estimate, among other things,
confirmatory factor analysis models. Although it can be programmed much like Mplus by writing
code corresponding to the model (using its AMOS Basic language), it has enjoyed wide use in part
because of its friendly graphical user interface that allows the user to bypass the writing of code by
drawing the model on the computer screen.
The AMOS Graphics file that sets up a single factor CFA model for the Blirtatiousness scale and
generates an estimate of ω can be found in Figure 3, along with the factor loadings and error
variances. On the left side of the model, with the latent BLIRT variable sending effects to the eight
items, is a traditional single factor CFA model. We have fixed the variance of the latent variable to
one, allowing for the free estimation of all the factor loadings. Notice that the loadings and error
variances are identical to those generated by Mplus and so will produce the same estimate of ω if the
computations are done by hand.
But as with Mplus, hand computation of ω is not necessary. The section of the model on the
right of the diagram sends unit weighted effects toward the “SUM” variable, with an error
variance set to 0. This represents the sum of the eight items as a latent variable. In the implied
covariance matrix toward the bottom of Figure 3, requested in the output by selecting
“Standardized estimates” and “All implied moments” in the Analysis Properties option, is the
correlation between the latent BLIRT variable and the latent SUM of the eight items (Graham,
2006). This correlation is the square root of ω, so squaring this correlation produces ω, as noted
in Figure 3: ω = 0.886² = 0.785.

The MBESS Package in R


R is a freely available open source statistical and programming language that is growing in popularity
throughout the sciences. One of its major disadvantages is that many find it hard to learn, and
without practice and attention, what is learned is easily forgotten. But there are a few packages
available for R that calculate various measures of reliability such as Cronbach’s α and ω. McNeish
(2018) provides a discussion of some of these packages and illustrates the computation of ω using
each of them. We use the MBESS package (Kelley, 2007) in our illustration here.

Figure 3. AMOS input and output excerpt from a confirmatory factor analysis of the blirtatiousness scale. The implied correlation (generated by choosing the "Standardized estimates" and "All implied moments" options in Analysis Properties) between the latent variable (BLIRT) and the "latent" sum of the eight items (SUM) is the square root of ω. Square this correlation to generate ω: 0.886² = 0.785.

The ci.reliability function in MBESS estimates the factor loadings using the lavaan package and
then uses these loadings in the computation of ω. After installing the MBESS package, the R code
below was applied to the blirtatiousness data, stored in a comma-delimited file:

require(MBESS)
dat <- read.table("c:\\omega\\blirt8.csv", sep=",", header=FALSE)
ci.reliability(dat, type="omega")

The resulting output is ω,


$est
[1] 0.7850364

which agrees with the estimate generated by Mplus and AMOS. The ci.reliability function in MBESS
can also be used to construct a confidence interval for ω using either the estimated standard error or
bootstrap approaches. See Dunn et al. (2014) or McNeish (2018) for a tutorial.
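For instance, a percentile bootstrap confidence interval can be requested through the interval.type and B arguments of ci.reliability. The argument names below follow the MBESS documentation; check the documentation for the version you have installed:

# 95% percentile bootstrap confidence interval for omega using MBESS.
ci.reliability(dat, type = "omega", interval.type = "perc", B = 10000)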

SPSS and SAS


SPSS and SAS are popular data analysis packages used widely in communication, the behavioral
sciences more broadly, business and marketing research, and various other research communities.
But neither has a procedure that computes ω, and this may have hindered the adoption of ω more
widely. The only reliability options we are aware of in SPSS and SAS are Cronbach's α, various split-half approaches, and a few others (e.g., Guttman's lambda coefficients; Guttman, 1945) little known to most researchers.
Here, we introduce a computational tool – a macro – that brings ω to SPSS and SAS users while
also easing its computation relative to SEM approaches by reducing the work down to writing and
executing a single line of code. The OMEGA macro is freely available as an SPSS syntax file defining
the macro that can be downloaded from www.afhayes.com. For SPSS users who prefer working with
the graphical user interface, we have also created a custom dialog file that can be installed in the
SPSS menu system, allowing the user to set up the analysis using familiar point-and-click interface.
A SAS program defining the macro is also available for SAS with the same features we describe here
and a similar syntax structure. We focus on the SPSS version below. See the documentation for
additional detail and the syntax for the SAS version.
The example computations of ω just presented in AMOS, Mplus, and R rely on the iterative
computation of factor loadings and error variances using CFA and maximum likelihood estimation.
SPSS has no built in procedures for confirmatory factor analysis. However, we have found that the
FACTOR routine, typically used for exploratory factor analysis (EFA), when using the maximum
likelihood extraction method generates factor loadings that correspond closely to what a single-
factor CFA routine generates (we justify this claim below) when a single-factor solution is requested.
An alternative for the computation of ω built into the OMEGA macro relies on the closed form
approximate approach described in Hancock and An (in press). This closed-form approach also
facilitates the production of output useful for some forms of item analysis and the production of
brief forms of longer scales. We next describe the use of each of these options built in to the OMEGA
macro and then, in the next section, discuss its use for item analysis and scale shortening.

Using EFA Factor Loadings


After activation of the OMEGA macro by running the macro definition syntax file, the SPSS
command below executes the computation of ω using loadings generated by its factor analysis
routine:

omega items=blirt1 blirt2r blirt3r blirt4 blirt5r blirt6 blirt7r blirt8.

When this line of code is executed, a maximum likelihood factor analysis of the correlation matrix
of the eight blirtatiousness items is initiated, forcing a single-factor solution. In regular SPSS syntax,
this OMEGA command conducts the equivalent of the following FACTOR command built into SPSS

factor variables= blirt1 blirt2r blirt3r blirt4 blirt5r blirt6 blirt7r blirt8
/extraction=ml/criteria factors(1).

The OMEGA macro will generate the output from this command automatically in the output
window. Once the factor analysis is conducted, the macro then extracts the factor loadings from
the output, multiplies them by the standard deviations of the items to place them on the original
response scale of the items (standardized loadings are also available as an output option) and feeds
them into a computational routine that generates an estimate of ω using Equation (4).
The resulting output can be found in Figure 4, panel A. The estimate of ω is 0.785, the same as
generated using loadings from a CFA conducted using AMOS, Mplus, and the MBESS package for
R. And as can be seen, the OMEGA macro also generates the factor loadings as well as the error variances

Panel A:
This estimate of omega is based on the factor loadings of a forced single-factor
maximum likelihood factor analysis using SPSS's built in FACTOR procedure.

Reliability:
Omega
.785

Item means, standard deviations, and estimated loadings:


Mean SD Loading ErrorVar
blirt1 3.365 1.012 .663 .583
blirt2r 3.379 1.037 .642 .663
blirt3r 3.156 .936 .576 .544
blirt4 3.133 1.001 .711 .496
blirt5r 2.976 1.071 .549 .845
blirt6 3.237 .981 .643 .549
blirt7r 3.379 1.142 .376 1.161
blirt8 2.664 1.003 .398 .847

Panel B:
This estimate of omega is based on the approximate and closed-form solution
to the computation of loadings described in Hancock, G. R., and An, J. (2020).
A closed-form alternative for estimating omega reliability under unidimensionality.
Measurement: Interdisciplinary Research and Perspectives.

Reliability:
Omega BootSE BootLL95 BootUL95
.779 .025 .721 .819

Item means, standard deviations, and estimated loadings:


Mean SD Loading
blirt1 3.365 1.012 .654
blirt2r 3.379 1.037 .677
blirt3r 3.156 .936 .583
blirt4 3.133 1.001 .711
blirt5r 2.976 1.071 .549
blirt6 3.237 .981 .636
blirt7r 3.379 1.142 .380
blirt8 2.664 1.003 .344

Figure 4. Output from the OMEGA macro for SPSS using maximum likelihood factor analysis (panel A) or the HA algorithm (panel B) for estimating factor loadings.

(defined as the variance of the item multiplied by the quantity 1 minus the squared standardized factor
loading for the item). These loadings can be found in Table 1 to facilitate a side-by-side comparison with
the loadings generated by AMOS and Mplus. The loadings are nearly identical.
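To make the rescaling concrete, consider blirt1 in Figure 4, panel A. The standardized ML loading is not printed in the macro output; the value below is one we back-computed for illustration (0.663/1.012 ≈ 0.655):

# Sketch of the macro's rescaling for blirt1 (values from Figure 4, panel A).
sd1 <- 1.012                    # SD of blirt1
std_loading <- 0.655            # standardized loading (back-computed by us)
std_loading * sd1               # ~0.663, the unstandardized loading reported
sd1^2 * (1 - std_loading^2)     # ~0.585 vs. 0.583 reported (rounding error)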
Table 1 also shows two objective measures of correspondence between the CFA and EFA-derived
loadings. "Bias" is the mean discrepancy between the k = 8 OMEGA macro loadings (λ_i(EFA)) and the loadings generated through a single-factor CFA in Mplus or AMOS (λ_i(CFA)), defined as

\[ \text{Bias} = \frac{1}{k}\sum_{i=1}^{k}\left[\lambda_{i(EFA)} - \lambda_{i(CFA)}\right] \tag{6} \]

A value of zero means that the EFA loadings are, on average across the items, the same as the CFA
loadings. The second discrepancy measure is “RMSE”, the root mean squared error,

\[ \text{RMSE} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left[\lambda_{i(EFA)} - \lambda_{i(CFA)}\right]^2} \tag{7} \]

which is similar to a standard deviation and captures roughly the average difference between the
CFA and EFA loadings, irrespective of the sign of the difference. RMSE can only be positive, with
a value closer to zero representing greater similarity between the pairs of loadings. Regardless of
which measure is considered, the small discrepancy values seen in Table 1 reflect little difference
between these two sets of loadings, Bias = 0.001 and RMSE = 0.002. The EFA loadings are close
approximations of the CFA loadings.
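These discrepancy measures are simple to verify from the Table 1 loadings; a quick check in R:

# Bias (Equation 6) and RMSE (Equation 7) for the EFA-ML loadings in Table 1.
cfa <- c(0.662, 0.640, 0.574, 0.709, 0.548, 0.641, 0.376, 0.397)
efa <- c(0.663, 0.642, 0.576, 0.711, 0.549, 0.643, 0.376, 0.398)
mean(efa - cfa)               # Bias = 0.001
sqrt(mean((efa - cfa)^2))     # RMSE = 0.002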
This similarity in loadings and the estimate of ω between the EFA and CFA approaches is not specific to this one example set of participants responding to this one scale. We have tested the OMEGA
macro on datasets we had available that included participants’ responses to 17 different scales, including
the approach and avoidance subscales of the Argumentativeness Scale (Infante & Rancer, 1982), Fear of
Social Isolation (Hayes, Matthes, & Eveland, 2013), Willingness to Self-Censor (Hayes, Glynn, &
Shanahan, 2005), Self-Esteem (Rosenberg, 1965), Shyness (Cheek & Buss, 1981), the Big 5 personality
factors (John & Srivastava, 1999), the conformity and conversation subscales of the Family
Communication Patterns Scale (Ritchie & Fitzpatrick, 1990), Fear of Negative Evaluation (Leary,
1983), Public Self-Consciousness (PSC), and the Social Anxiety subscale of the PSC (Fenigstein,
Scheier, & Buss, 1975). As is typical practice, these were all administered using a Likert response format
with 5-point ordinal response options (e.g., strongly disagree to strongly agree), and using samples of
various sizes ranging from 86 to over 2,000. For all these scales and datasets, Table 2 contains the estimate
of ω from the OMEGA macro using the EFA loadings, the Mplus code described earlier, and the MBESS
package for R, and the two measures of discrepancy between the EFA and the CFA loadings (Equations 6
and 7). Also provided in Table 2 is a likelihood ratio test of equality of the factor loadings using CFA. As
can be seen, the differences between ω and the loadings using CFA versus EFA are consistently and
generally negligible. It appears to make little difference at all whether ω is computed exactly using a CFA
routine such as in Mplus, AMOS, or the MBESS package for R, or using the OMEGA macro that relies on
SPSS’s exploratory factor analysis routine. Notice as well that for every scale we examined, the essential
tau-equivalence assumption was violated.

Hancock and An’s Closed-Form Approximation of Loadings


The OMEGA macro can also estimate ω using an alternative, closed-form (i.e., non-iterative) approach that computes the loadings without the iterative estimation used in EFA and CFA. This approach is described by Hancock and An (in press), which we refer to as the "HA" method here, and involves estimating the loading for item i from sums of covariances and products of covariances of sets of three items i, j, and q. The estimate
of item i’s loading is calculated as
\[ \lambda_{i(HA)} = \sqrt{\frac{\sum_{\forall j<q}\left(COV_{ij}\,COV_{iq}\right)}{\sum_{\forall j<q} COV_{jq}}} \tag{8} \]

(Equation 8 in Hancock & An, in press), where COV is a covariance between the responses to the two items subscripted and ∀j < q denotes "for all combinations of items j and q when j is less than q." No
factor analysis is needed. All that is needed to compute ω are the covariances of responses to the k items
and the variance of the sum of the k items (the OMEGA macro uses Equation 5 for the computation of ω
when the HA option is requested), all readily calculated using the data available. Hancock and An (in
press) compared the performance of the HA algorithm in estimating ω with that of CFA in an extensive
Monte Carlo simulation, varying sample size, number of items, and the sizes and spread of the loadings.
They found HA to be a serious competitor to CFA estimation except when loadings were small and/or
the scale consisted of a very small number of items.
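Because Equation (8) involves nothing more than sums of covariances, the HA estimator is straightforward to program. The function below is a minimal R sketch of our own (the name ha_omega is ours; it is not part of the OMEGA macro) that combines Equation (8) with Equation (5); applied to the blirtatiousness data it should return a value near the 0.779 shown in Table 1:

# Hancock-An closed-form loadings (Equation 8) and omega via Equation 5.
ha_omega <- function(items) {
  S <- cov(items)                     # item covariance matrix
  k <- ncol(items)
  lambda <- numeric(k)
  for (i in 1:k) {
    num <- 0; den <- 0
    for (j in 1:(k - 1)) {
      for (q in (j + 1):k) {
        if (j != i && q != i) {       # all pairs j < q that exclude item i
          num <- num + S[i, j] * S[i, q]
          den <- den + S[j, q]
        }
      }
    }
    lambda[i] <- sqrt(num / den)      # NaN if the ratio is negative
  }
  omega <- sum(lambda)^2 / var(rowSums(items))  # Equation 5: (sum λ)^2 / V(O)
  list(loadings = lambda, omega = omega)
}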
Table 2. Scale names and characteristics, estimates of reliability (Cronbach's α and McDonald's ω), and discrepancies from CFA loadings. Bias/RMSE columns give the discrepancy of the EFA-ML or HA loadings from the CFA loadings (Equations 6 and 7).

Scale Name                                   Items    n    ω(CFA)  ω(EFA-ML)  Bias/RMSE     ω(HA)  Bias/RMSE      Test of Tau-Equivalence      Cronbach's α
Argumentativeness – Approach                   10    86    .837    .837      0.003/0.003   .829   0.002/0.024    χ²(9) = 38.768, p < .001        .828
Argumentativeness – Avoidance                  10    87    .839    .839      0.004/0.004   .834   0.006/0.049    χ²(9) = 41.310, p < .001        .831
Big 5 – Openness                               10   322    .780    .780      0.001/0.001   .777   0.002/0.026    χ²(9) = 25.969, p = .002        .780
Big 5 – Conscientiousness                       9   322    .763    .763      0.000/0.002   .764   0.004/0.032    χ²(8) = 22.070, p = .005        .766
Big 5 – Extraversion                            8   325    .874    .874      0.001/0.001   .870   −0.002/0.012   χ²(7) = 86.650, p < .001        .867
Big 5 – Agreeableness                           9   322    .787    .787      0.001/0.001   .784   0.002/0.024    χ²(8) = 29.486, p < .001        .787
Big 5 – Neuroticism                             8   325    .803    .803      0.001/0.001   .799   0.001/0.018    χ²(7) = 27.643, p < .001        .800
Blirtatiousness                                 8   211    .785    .785      0.001/0.002   .779   −0.002/0.023   χ²(7) = 22.276, p = .002        .780
Family Communication Pattern – Conversation    15   329    .914    .914      0.001/0.001   .913   0.001/0.010    χ²(14) = 63.472, p < .001       .913
Family Communication Pattern – Conformity      11   329    .830    .830      0.001/0.001   .823   −0.003/0.018   χ²(10) = 89.804, p < .001       .824
Fear of Negative Evaluation                    11   240    .902    .902      0.001/0.001   .902   0.008/0.044    χ²(10) = 52.804, p < .001       .903
Fear of Social Isolation                        5   233    .864    .864      0.002/0.002   .864   0.002/0.009    χ²(4) = 16.125, p = .003        .862
Public Self-Consciousness                       7   415    .826    .826      0.001/0.001   .823   −0.001/0.012   χ²(6) = 44.528, p < .001        .823
Social Anxiety                                  6   236    .810    .810      0.001/0.001   .807   −0.001/0.018   χ²(5) = 39.607, p < .001        .800
Self-Esteem                                    10   234    .906    .906      0.001/0.001   .904   0.001/0.007    χ²(9) = 80.755, p < .001        .902
Shyness                                        13  1611    .897    .897      0.000/0.001   .897   0.001/0.010    χ²(12) = 460.393, p < .001      .895
Willingness to Self-Censor                      8  2470    .811    .811      0.000/0.000   .810   −0.001/0.004   χ²(7) = 112.242, p < .001       .809

One advantage of the HA approach relative to EFA and CFA is that it is fast. In addition, it is
quite easy to program a routine that applies this algorithm repeatedly to different subsets of items
without requiring additional work from the user, facilitating the generation of item analysis statistics
that we discuss later, as well as the construction of a bootstrap confidence interval for ω.
A disadvantage is that the resulting loadings sometimes produce estimates of ω larger than 1, and
sometimes a loading can't be constructed at all, as when the operand under the radical in Equation (8) is negative. When either of these occurs, the OMEGA macro returns −999 for ω or a loading, and
the corresponding output should not be interpreted.
The HA algorithm is activated with the use of the ha=1 argument in the OMEGA command. In this example, we also request a 95% bootstrap confidence interval for ω (generated using the percentile method and 10,000 bootstrap samples) by adding boot=10000 to the command, as below:

omega items=blirt1 blirt2r blirt3r blirt4 blirt5r blirt6 blirt7r blirt8/ha=1
  /boot=10000.

The resulting output can be found in Figure 4, panel B. Using this approximation of the loadings, ω = 0.779
(95% bootstrap confidence interval from 0.722 to 0.818, with a bootstrap estimate of the standard error of
0.025), which is close to the 0.785 produced by CFA and EFA approaches. The macro also displays the item
means and standard deviations and the estimated factor loadings. For ease of comparison of the loadings
relative to those generated with a CFA, the loadings are found in Table 1 for the blirtatiousness scale
example, along with the corresponding discrepancy measures compared to the CFA loadings. Notice the
loadings are similar, though not as close as when using EFA.
As we did for the EFA approach implemented in the macro, we have applied the HA algorithm
to data collected using the 17 scales described earlier. Table 2 contains ω estimated with this
approximation for each of the scales. They are similar to ω calculated in other ways. Furthermore,
as can be seen, the discrepancies between the loadings generated using this approach and the CFA
loadings are quite small (using Equations 6 and 7, substituting λ_i(HA) for λ_i(EFA)), but clearly much larger than when using SPSS's ML-based EFA routine.

Item Analysis and Brief Form Construction Using OMEGA and the HA Algorithm
The HA algorithm programmed in the OMEGA macro makes it easy to conduct some forms of item
analysis that focus on how one or more items contribute to the estimate of reliability. In this section,
we discuss the limitations of the popular leave-one-out approach to item analysis and offer an
approach implemented in the OMEGA macro that relies on the construction of all possible “subset
scales” that can be constructed from a set of k items. We show how this subsets approach can be
used to select items to remove from a scale that don’t contribute positively to its reliability or to
assist in the construction of brief forms of existing measurement scales.

Item Analysis
When constructing or evaluating a MIMS, it is common to do an item analysis, examining how each
item in the scale contributes to the psychometric qualities of the resulting data. Exploratory or
confirmatory factor analysis is frequently used to assess the dimensionality of a set of items, which
items measure which factors if the item set is multidimensional, and to examine which items fail to
load on any factor. Once a set of items (either the entire set being considered or a subset of the items
that load on a common factor) is accepted as unidimensional, the investigator might then calculate
how much an estimate of reliability increases or decreases when an item is removed from the scale as
an additional approach to eliminating items that do not contribute to or perhaps even hurt
reliability. Some statistics programs such as SPSS and SAS will generate all possible "leave-one-out" (LOO) estimates of Cronbach's α by showing, for each item, what happens to estimated
reliability of the set when the item is excluded. Scale developers using this procedure typically do
so iteratively, the first time using all k items in a scale or subscale, the second time throwing out the
item that lowers reliability the most, and so forth, purging one item from the scale at each iteration
until the reliability of the scale can no longer be improved by throwing out items.
However, the LOO approach to item analysis is problematic. The change in a measure of
reliability when an item is removed is dependent on the presence of the other items in the scale,
some of which may themselves be worthy of exclusion. The reliability of a set of items at
iteration t will depend on which of the k items was discarded at iteration t – 1. Consider item 1
in a 5-item set. Using the LOO approach, reliability is estimated for the scale with all 5 items as
well as a four-item scale that excludes item 1. Suppose reliability is higher when item 1 is
removed first, suggesting that its presence hurts reliability, and so the analyst decides to discard
item 1. However, suppose that item 2 is also a problematic item, and when removed first rather
than item 1, one might find in the next iteration that discarding item 1 from the now 4-item
scale may actually hurt reliability, suggesting that item 1 should be retained rather than
discarded.
More generally, for any item i, an iterative LOO approach implemented by discarding an item
that lowers reliability by its inclusion in the set will generate the reliability of, at most, k – 1 scales
containing item i. As discussed below, this is typically only a small fraction of the possible scales one
could create from the set of k items that include item i and that could be useful in evaluating the
value of that item to the measurement scale. The result is likely to be some doubt as to whether items
purged or retained have produced the best scale (by a reliability standard) one could generate from
these k items (cf., Morris, 1978b).
The OMEGA macro implements this LOO approach to item analysis (using either ω or
Cronbach’s α as the reliability index) by adding loo=1 to the OMEGA command illustrated earlier (see the
resulting output at the top of Figure 5). But it has an additional feature that helps to overcome
this shortcoming of the LOO approach. If an item lowers the reliability of the data generated by a set
of k items, then you would expect that any scale constructed from some subset of the k items that
includes that item would generate less reliable data on average than would a scale that excludes it.
For example, suppose one is open to having as few as two items on the final version of the scale.
With k items, there are 2^k − k − 1 possible scales with at least two items (including the scale with all k items). Of these, 2^(k−1) − 1 contain item i and 2^(k−1) − k do not. Item i would be a strong candidate for exclusion from the scale if, on average, reliability is nontrivially smaller in the set of scales that include item i than in the set that excludes it.
The subsets option in the OMEGA macro estimates the average gain in reliability when an item is
included in the scale by calculating the reliability for all possible scales from the set of k items that
can be constructed with and without each item and calculating the difference in average reliability
between the two sets. The result is a more comprehensive assessment of the contribution that item
i makes to the reliability of a set of items and therefore a more informed decision as to whether or not
the item should be retained or excluded from the set.
This option is available in the OMEGA macro when using the HA algorithm to estimate ω and the factor loadings. Recall that the HA algorithm relies on the covariances of sets of three items to estimate ω. So the OMEGA macro will generate all 2^k − 0.5(k² + k + 2) possible scales with at least three items when calculating the average gain in reliability attributable to item i, 2^(k−1) − k of which contain item i and 2^(k−1) − 0.5(k² − k + 2) of which do not.
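For the eight-item blirtatiousness scale, these counts work out as follows (a quick arithmetic check in R):

# Subset counts for k = 8 items when each scale must have at least 3 items.
k <- 8
total   <- 2^k - (k^2 + k + 2) / 2          # 219 subset scales in all
with_i  <- 2^(k - 1) - k                    # 120 scales contain a given item i
without <- 2^(k - 1) - (k^2 - k + 2) / 2    # 99 scales exclude item i
c(total, with_i, without)                   # 219 120 99; note 120 + 99 = 219

These are the 219 rows in Figure 5 and the 120 "With" and 99 "Without" subsets averaged in the item analysis discussed below.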
Returning to the blirtatiousness scale, Figure 5 contains the output that results when using the
subsets option, requested by adding subsets=1 to the OMEGA command, as in

omega items=blirt1 blirt2r blirt3r blirt4 blirt5r blirt6 blirt7r blirt8/ha=1
  /loo=1/boot=10000/subsets=1.

Reliability:
Omega BootSE BootLL95 BootUL95
.779 .024 .722 .818

Item means, standard deviations, estimated loadings, and reliability if item deleted:
Mean SD Loading Omega
blirt1 3.365 1.011 .654 .742
blirt2r 3.379 .973 .677 .741
blirt3r 3.156 .971 .583 .745
blirt4 3.133 1.029 .711 .733
blirt5r 2.976 1.104 .549 .764
blirt6 3.237 1.001 .636 .741
blirt7r 3.379 1.051 .380 .787
blirt8 2.664 .903 .344 .781

Reliability and subscale-full scale correlations (r) for all possible
subscales with 3 or more items (1=item included, 0=not included):

     blirt1 blirt2r blirt3r blirt4 blirt5r blirt6 blirt7r blirt8  Omega      r
1 .000 .000 .000 1.000 1.000 .000 .000 1.000 .931 .859
2 .000 1.000 .000 .000 1.000 .000 .000 1.000 .827 .852
3 1.000 1.000 1.000 1.000 1.000 1.000 .000 .000 .792 .962
4 1.000 1.000 1.000 1.000 1.000 1.000 .000 1.000 .787 .977
5 1.000 1.000 1.000 1.000 .000 1.000 .000 .000 .782 .945
6 1.000 1.000 1.000 1.000 .000 1.000 .000 1.000 .782 .952
7 1.000 1.000 1.000 1.000 1.000 1.000 1.000 .000 .781 .983
8 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 .779 1.000
9 1.000 1.000 1.000 1.000 .000 1.000 1.000 1.000 .764 .983
10 1.000 .000 1.000 1.000 .000 1.000 .000 .000 .761 .908
11 1.000 .000 1.000 1.000 1.000 1.000 .000 .000 .761 .941
12 .000 1.000 1.000 1.000 1.000 1.000 .000 .000 .761 .940
13 1.000 1.000 1.000 1.000 .000 1.000 1.000 .000 .760 .971
14 1.000 .000 1.000 1.000 .000 1.000 .000 1.000 .759 .914
15 .000 .000 .000 .000 1.000 1.000 .000 1.000 .759 .858
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
28 1.000 1.000 .000 1.000 1.000 1.000 1.000 .000 .744 .972
29 1.000 1.000 1.000 1.000 .000 .000 .000 .000 .742 .923
30 1.000 .000 1.000 1.000 1.000 1.000 1.000 .000 .742 .967
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
216 1.000 .000 .000 .000 1.000 .000 1.000 .000 .471 .854
217 1.000 .000 .000 .000 1.000 .000 1.000 1.000 .437 .910
218 .000 .000 1.000 .000 .000 .000 1.000 1.000 .408 .823
219 .000 .000 .000 .000 1.000 .000 1.000 1.000 .349 .821

Mean reliability of all possible subset scales with and without item:
Without With Gain
blirt1 .637 .682 .045
blirt2r .635 .683 .048
blirt3r .643 .677 .034
blirt4 .624 .692 .067
blirt5r .651 .669 .018
blirt6 .637 .681 .045
blirt7r .692 .636 -.057
blirt8 .669 .655 -.014

Figure 5. OMEGA macro output when using the loo and subsets options.

This section of output at the bottom of Figure 5 contains a small table with eight rows, one for each
of the items in the blirtatiousness scale. In the columns are the mean ω generated from the 99 possible scales with at least three of the eight blirtatiousness items but that exclude the item in that row ("Without"), the mean ω of the 120 subsets with at least three items that include that item
(“With”), and the average gain in reliability when the item is included (“Gain”). When the gain is
positive, including the item enhances reliability on average, and when it is negative, the item reduces
reliability on average.

Notice that the subsets procedure identifies items 7 and 8 as potentially problematic items, as the
LOO approach (toward the top of the output, generated with the loo=1 option in the code) does. However, the LOO approach suggests only a tiny reduction in reliability when item 7 is included (0.779 for the full eight-item version) compared to the seven-item scale that excludes it (0.787). But when considering all possible scales with three or more items, the inclusion of item 7 reduces reliability by much more – 0.057 on average. Likewise, the LOO approach suggests only a tiny reduction in reliability of 0.002 attributable to the inclusion of item 8. The actual reduction in reliability attributable to item 8 when considering all possible scales from the set is a much larger 0.014.
It is tempting to conclude that if items 7 and 8 were excluded from the blirtatiousness scale, the
resulting reliability estimate would be 0.057 + 0.014 = 0.071 higher. Not true. In the next section, we
will see another feature of the subsets option that shows that when both items 7 and 8 are excluded
while the other 6 items are retained, reliability increases by only 0.013, from 0.779 to 0.792. We will
also see that a four item version of the blirtatiousness scale could be justified, at least by a reliability
standard, constructed by throwing out items 5, 6, 7, and 8, even though items 5 and 6 appear to
increase rather than decrease reliability.

Brief Form Construction


Many measurement scales used by behavioral scientists contain some, often even substantial,
redundancy in item content. A large number of items might be helpful for achieving content
validity, but excessive items can come at a price. Longer surveys can be more expensive to
administer. And longer instruments raise the potential that respondents will become fatigued,
lose interest in responding thoughtfully or truthfully, or break off the interview or cease to answer
all the questions, resulting in missing data or error-prone responses.
With these considerations in mind, many short or “brief forms” of scales exist that are designed,
in part, to reduce respondent fatigue and lower data collection costs relative to when the full scale is
used (e.g., Cacioppo, Petty, & Kao, 1984; Leary, 1983; Stephenson, Hoyle, Palmgreen, & Slater, 2003;
Zhu & Yzer, 2019). The subsets option in the OMEGA macro can assist in the development of such
brief forms or, as in item analysis, otherwise winnow from a set of items those that add redundancy
without improving, and perhaps even harming, the psychometric qualities of the instrument. It does
so by helping to identify a subset of m items from the original set of k that
yields data that are sufficiently reliable and that generates scale scores that are highly correlated with
measurements obtained if all k items were administered. If such a shorter version of the scale exists,
and typically several do, then the k – m remaining items in the full scale probably add little to the
psychometric qualities of the scale and can be abandoned for the sake of measurement efficiency.
Why spend the effort and cost administering a k-item scale when a smaller set of m of those items
will suffice from a psychometric standpoint?
More formally, define rk as reliability using the full k-item measure, rm as the reliability using
a specific subset of m of the k items, m < k, and rmin as the minimum reliability of an m-item version
of the full scale that the analyst or research community will find acceptable. Furthermore, define cm
as the correlation between the scores from the full k-item scale and an m-item version, and cmin as
the minimum desired correlation between scores from an m-item version of the scale and full scale
scores. The output from the subsets option contains all possible m-item scales that can be con-
structed from the k items, m = 2 (for α) or 3 (for ω) to k, as well as rm and cm for each of these scales.
Using this section of the output, you scan these possible subscales with the goal of finding an m-item
version of the scale with
rm ≥ rmin
and
cm ≥ cmin

but without sacrificing face or content validity. You choose the values of rmin and cmin that are
acceptable given the purpose of the research and uses of the MIMS. Most would not want to use rmin
less than 0.7 (higher would be better, such as 0.90 or greater for measures used to make consequen-
tial decisions about people; Nunnally & Bernstein, 1994, p. 265; see Lance, Butts, & Michels, 2006,
for a discussion of beliefs about conventional cutoffs for “acceptable” reliability). And we recom-
mend cmin of at least 0.90, given that the goal is to produce a concise scale that is functionally
equivalent to a longer version including all k items.
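
Continuing the sketch from the item analysis discussion (again illustrating the logic under the same assumptions rather than reproducing the macro), this screen reduces to a filter over the enumerated subsets, with r_min and c_min set to the values recommended above:

# blirt, subsets, and omega_est() are defined in the earlier sketch
full_scores <- rowSums(blirt)   # scores from the full k-item scale
r_min <- 0.70                   # minimum acceptable reliability
c_min <- 0.90                   # minimum subset-full scale correlation

candidates <- Filter(function(s) {
  omega_est(blirt[, s]) >= r_min &&
    cor(rowSums(blirt[, s]), full_scores) >= c_min   # complete data assumed
}, subsets)

length(candidates)   # number of brief forms passing both screens

Each element of candidates is a subset of item names meeting both statistical criteria; face and content validity still have to be judged by the researcher.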
For any value of m, several good candidate brief forms may exist, and there may be several good
candidates with different values of m. From among these choices, it is better to settle on the brief form
with rm not much smaller than rk (and preferably higher, if possible), as well as cm close to 1. Such
a scale would generate data about as reliable as the full scale, if not more so, while generating scores
that are functionally equivalent to those generated when using the full set of k items.
Figure 5 contains an excerpt of the output generated when the subsets option is applied to the
data from the blirtatiousness scale. Each subset occupies a row in this table, with a 1 or 0 in each
item’s column indicating whether (1) or not (0) that item is included in the subset. The final two
columns contain the estimated reliability (ω) of the sum generated with those items included (rm)
as well as the correlation between scores generated with the subset scale containing only those
items and the scores generated using all k items (cm).
By default, the rows are sorted by decreasing value of rm.7 The scale with all k = 8 blirtatiousness
items is in the eighth row, meaning that seven sets with fewer than eight items exist with rm > rk. The
top row is an m = 3-item version containing only items 4, 5, and 8, with rm = 0.931. Blirtatiousness
scores generated by adding these three items correlate cm = 0.853 with the scores generated by adding
up all 8 items. Notice that row 3 in the table corresponds to the m = 6-item version that excludes
items 7 and 8, two items that, on average, lower reliability when they are included, as revealed in the
prior discussion of the subsets option output. Reliability of the scores generated with these 6 items is
rm = 0.792, higher than when all items are used, while generating scores correlated nearly perfectly
(cm = 0.962) with scores generated using all eight items. Another interesting candidate for a brief
form of the blirtatiousness scale is found in row 29. This m = 4-item version contains items 1, 2, 3,
and 4, with only slightly lower reliability than when all 8 items are used (rm = 0.742), but it generates
scores highly correlated with scores from the full set of 8 items (cm = 0.923).

Some Cautions and Caveats regarding the Subsets Option


The subsets option in the OMEGA macro allows for an examination of reliability as estimated with ω
(or Cronbach’s α) for all possible scales that can be constructed from a set of items. In our example
using the eight-item blirtatiousness scale, only a couple hundred scales can be constructed. However,
as the number of items increases, the number of possible scales increases dramatically. For a set of 10
items, there are 968 possible scales with at least three items. With 15 items, this increases to 32,647,
and with 20, there are over a million possible such scales. The OMEGA macro attempts to generate
them all. This can take time and computational memory to store the results, perhaps more than you
have available on your computer or (in extreme cases) in your lifetime. Limiting our concern about
this, however, is our skepticism that any measurement scale with more than 12 or so items is truly
unidimensional. Unidimensionality is an assumption underlying the meaningfulness of both a
reliability estimate and a composite score constructed as the sum or mean of a set of items, so we
can’t recommend the use of the OMEGA macro with more than 12 to 15 items unless the user is
confident that the unidimensionality assumption of ω or α is satisfied.
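(As a check on these counts: the number of distinct scales with at least three items that can be formed from k items is 2^k − 1 − k − k(k − 1)/2, that is, all nonempty subsets minus the singletons and the pairs. For k = 10 this is 1,024 − 56 = 968, and for k = 20 it is 1,048,576 − 211 = 1,048,365.)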
Second, use of the subsets option as we have described it is data mining. It is exploratory in nature.
When a MIMS is constructed by examining which potential items do and do not perform well
psychometrically and choosing the scale based on the subset of items with the highest reliability,
some overfitting is likely, just as when using stepwise selection procedures in regression analysis to
choose a set of predictors to include in a model (see Kopalle & Lehmann, 1997; their discussion occurs
in the context of Cronbach’s α but should apply to ω as well; also see Raykov, 2007). The potential is
high that the brief form chosen through this exploratory procedure will not generate data as reliable
when applied to samples not used to make the choice. This concern generalizes to the
use of the correlation between subset and full scale scores as a criterion for the choice. We strongly
recommend crossvalidation and replication before accepting that the chosen set of items is perform-
ing as intended in data not used to develop the scale in the first place.
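
A minimal way to act on this recommendation, sketched below under the same assumptions as the earlier code (the blirt data frame plus the omega_est() helper and subsets list defined there), is a split-sample check: choose the brief form on one random half of the data, then re-estimate its reliability and its correlation with the full scale on the other half.

set.seed(1234)                   # arbitrary seed, for reproducibility
half  <- sample(nrow(blirt), nrow(blirt) %/% 2)
train <- blirt[half, ]
test  <- blirt[-half, ]

# choose the brief form with the highest omega in the training half ...
best <- subsets[[which.max(sapply(subsets, function(s) omega_est(train[, s])))]]

# ... then judge it by its performance in the holdout half
omega_est(test[, best])                      # holdout reliability
cor(rowSums(test[, best]), rowSums(test))    # holdout correlation with full scale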
Third, and relatedly, although the subsets option in the OMEGA macro can assist in reducing the
number of items on a measurement instrument being developed or already in existence, it is important to
remember that good measurement is about far more than establishing high reliability (see Flake & Fried,
2019; Fried & Flake, 2018). We cannot just willy-nilly modify instruments that may be based on decades
of research and assume that a brief form is also a valid measure of the same construct. A shortened
version of an existing established measurement instrument should go through a rigorous process of
validation just as the original (hopefully) did. At a minimum, a researcher using a brief form developed
with the help of the OMEGA macro should attempt to make the argument empirically that the
measurement instrument used is yielding high-quality measurement of the intended construct. Indeed,
a danger of overly focusing on reliability as quantified with α or ω when constructing a brief form of an
existing MIMS is a reduction in content validity that can result by increasing the homogeneity of item
content, thereby narrowing the scope of what the scale is actually measuring. The final decision as to
which of the many possible m-item brief forms is ideal should include a consideration of both content
validity and relevant statistical evidence.

Out of Excuses or Making Something Out of Nothing?


In this paper, we have taken the position that communication scholars and those throughout the
behavioral sciences using a MIMS in their research should transition away from Cronbach’s α as their
favored measure of reliability toward its parent measure, McDonald’s ω. Cronbach’s α is a special case of
ω that requires a restrictive assumption that is unlikely to be met in many measurement situations. To
facilitate adoption of this recommendation, we have illustrated methods of computing ω, from more
complicated SEM approaches to an easy-to-use implementation in SPSS and SAS.
We can certainly understand some researchers’ reluctance to adopt an alternative measure of
reliability that is less familiar, not widely discussed in classrooms or research methods books, and
appears to be harder to calculate. However, with its implementation in open source software such as
R and the OMEGA macro for SPSS and SAS introduced here, anyone can calculate ω without having
to do CFA or even knowing what CFA is. Thus, clinging to α because it is easier to calculate is an
untenable position to take.
However, anyone who adopts this recommendation will eventually discover something that we
should be honest and open about. While the methodology literature has demonstrated that ω is clearly
a superior and preferred measure of reliability relative to α, in our experience, the two measures typically
produce similar estimates of reliability when applied to real data. In our example analysis using the data
generated by the blirtatiousness scale, ω computed most precisely using CFA loadings is 0.785. But α is
just a tad smaller at 0.780. The similarity in estimates is not specific to this one dataset or scale. The last
column in Table 2 provides Cronbach’s α for the 17 scales and datasets that we used to examine the
similarity in results produced by different approaches to calculating ω. Notice that there is consistently
little difference between α and ω, even though in every case we found that the essential tau-equivalence
assumption α makes is violated. So why abandon α if it tends to be similar to ω?
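
Readers can check the size of the gap between α and ω in their own data before deciding. The sketch below reuses omega_est() from the earlier code and adds alpha_est(), a helper of our own implementing the familiar Cronbach formula; a nontrivial gap is one signal that the essential tau-equivalence assumption matters for the items at hand.

# alpha_est() is our own helper; complete data in blirt assumed
alpha_est <- function(df) {
  k <- ncol(df)
  (k / (k - 1)) * (1 - sum(apply(df, 2, var)) / var(rowSums(df)))
}

omega_est(blirt)   # 0.785 in our data (CFA-based)
alpha_est(blirt)   # 0.780 in our data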
In our opinion, using these results to defend the status quo is an excuse, not a defense. First,
although the data collection and analysis practices, as well as the measurement scales we used here,
are similar to those commonly used by behavioral scientists, we cannot say that the discrepancy
between α and ω will never be larger in some other data set, collected with some measurement
scale that exists or is someday invented. Research has shown that the difference between α and ω
can be nontrivial in some circumstances (see McNeish, 2018, for a review).

Second, the scales we used here have already gone through a process of refinement and evalua-
tion. Poor items, presumably with low factor loadings relative to others, had been purged from the
scales before they were finalized, validated, and published. So long as the number of items is not too
small, research (as well as common sense and intuition) suggests that α will tend to deviate less from
alternative measures of reliability such as ω when loadings are uniformly larger and the scale
therefore more likely to be closer to essentially tau-equivalent (Raykov, 1997a; Raykov &
Marcoulides, 2019; Yang & Green, 2011). But when developing new measures or when measures
are constructed ad hoc for use in a specific study, items with smaller loadings or that perhaps more
extensive research would exclude from the scale are more likely to be used, increasing the likelihood
that α and ω will diverge.
Third, notice in Table 2 that α tends to be smaller than ω (when estimated with CFA or the
exploratory factor analysis approach implemented in our macro), consistent with the conventional
wisdom that α represents a lower bound on reliability. In other words, some might argue that α is
a conservative estimate, so if α is judged to be acceptably high, then that suggests that actual
reliability more accurately estimated is even higher. Although it is true that α tends to underestimate
reliability when the essential tau-equivalence assumption is violated, research shows that in some
circumstances, such as when the random measurement errors are correlated across items, α can
overestimate reliability (see McNeish, 2018; Raykov, 1997a, 2001). So it simply isn’t true that α is
necessarily a conservative underestimator of reliability.
Finally, many of the assumptions built into the statistical methods social scientists use are there
because simplifying assumptions made computations easier in the days before computers domi-
nated research laboratories and scientists’ offices. Given that assumptions can never be proven to
be true, using a method that makes an assumption (such as essential tau-equivalence) doesn’t
make sense when a method exists that is at least as good but that doesn’t require that assumption.
Robustness to assumption violations is a nice property of a statistical method in a world with no
better alternatives just as easy to implement. But with respect to the estimation of reliability, this is
a world we do not inhabit.
We conclude with two last points. First, although ω is preferable to α, neither is necessarily the
best measure of reliability out there. Both are fallible and can produce poor estimates in a variety of
situations you may confront, such as when responses to many of the items are skewed. For
a discussion of some alternatives, see McNeish (2018), Raykov and Marcoulides (2019), Sijtsma
(2009), and Trizano-Hermosilla and Alvarado (2016). Second, debating too vigorously the merits of
different measures of reliability is perhaps akin to laboring over the choice between granite and
marble countertops in your kitchen renovation when your house is on fire. Behavioral scientists,
communication scholars included, dedicate far less time than they should to developing good
measurement instruments and justifying that the instruments they are using are in fact good.
Questionable measurement practices abound in the behavioral sciences, such as ad hoc instrument
construction without validation, deleting or modifying items from validated scales, and changing the
number or anchor of response options from those used when the scale was originally developed
(Flake & Fried, 2019; Fried & Flake, 2018). This certainly contributes to a messy literature filled with
inconsistencies in findings, false negatives and positives, and other phenomena produced by
researcher degrees of freedom when analyzing data. High reliability is not good enough. Indeed, it
is only marginally relevant to whether the instrument being used is actually measuring what the user
intends to be measuring.
It is high time to make the switch to ω. It may not be the best of all possible replacements, but it is
quite easy to calculate, can be computed in several ways, is (now) available in some popular statistical
packages, and its properties and performance have led many before us to recommend its use rather
than α. But while making the transition, let’s not forget that good measurement requires much more
than establishing acceptable reliability.

Notes
1. When the response scales for the k indicators are the same, the arithmetic average of the k indicators is also
frequently used as a proxy for T. The use of the average does not change the argument we make here or the
estimate of reliability that results, but it will change a little bit of the math that we describe.
2. As Raykov and Marcoulides (2019) point out, this definition is problematic as a general definition of reliability.
It is possible for data to be highly reliable in circumstances in which there is no variation in T, yet reliability
would be 0 (or undefined, depending on the variance in O) by this definition. They argue that this definition of
reliability should be conditioned on V(T) > 0.
3. McDonald (1999) refers to Equation 3 as Guttman-Cronbach α. Although Cronbach popularized Equation 3,
Guttman (1945) invented this measure of reliability before Cronbach’s influential paper was published.
4. The ML extraction method does not allow for the factor analysis of a covariance matrix.
5. The output from the factor routine generated by the OMEGA macro should be examined to make sure that the
factor analysis converged and generated a solution. The rest of the output should not be interpreted if an error
is generated by the factor command.
6. The subsets option is also available in the OMEGA macro when using Cronbach’s α as the measure of
reliability. Note that we are not the first either to suggest maximizing reliability through selective item deletion
or to provide software that implements such an all subsets approach. See Morris (1978a, 1978b) and Serlin and
Kaiser (1976) for some earlier treatments of this topic that focus on maximizing Cronbach’s α.
7. The sort option can be used to change the sorting of the rows of this table from reliability, rm (the default, sort = 0),
to sorting by cm (sort = 1) or the number of items m (sort = 2). This table can also be saved as a data file with the use of
the save option. See the documentation for details.

Disclosure statement
No potential conflict of interest was reported by the authors.

References
Becker, L. B., Vlad, T., & Nusser, N. (2007). An evaluation of press freedom indicators. The International
Communication Gazette, 69, 5–28. doi:10.1177/1748048507072774
Cacioppo, J. T., Petty, R. E., & Kao, C. F. (1984). The efficient assessment of need for cognition. Journal of Personality
Assessment, 48, 306–307. doi:10.1207/s15327752jpa4803_13
Cheek, J. M., & Buss, A. H. (1981). Shyness and sociability. Journal of Personality and Social Psychology, 41, 330–339.
doi:10.1037/0022-3514.41.2.330
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied
Psychology, 78, 98–104. doi:10.1037/0021-9010.78.1.98
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334. doi:10.1007/
BF02310555
DeAndrea, D. C., & Carpenter, C. (2018). Measuring the construct of warranting value and testing warranting theory.
Communication Research, 45, 1193–1215. doi:10.1177/0093650216644022
Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of
internal consistency estimation. British Journal of Psychology, 105, 399–412. doi:10.1111/bjop.12046
Fenigstein, A., Scheier, M. F., & Buss, A. H. (1975). Public and private self-consciousness: Assessment and theory.
Journal of Consulting and Clinical Psychology, 43, 522–527. doi:10.1037/h0076760
Flake, J. K., & Fried, E. I. (2019, January 17). Measurement schmeasurement: Questionable measurement practices and
how to avoid them. doi:10.31234/osf.io/hs7wm
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and
recommendations. Social Psychological and Personality Science, 8, 370–378. doi:10.1177/1948550617693063
Fried, E. I., & Flake, J. K. (2018). Measurement matters. APS Observer, 31(3). Retrieved from https://www.psychologicalscience.org/observer/measurement-matters
Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how
to use them. Educational and Psychological Measurement, 66, 930–944. doi:10.1177/0013164406288165
Green, S. B., & Yang, Y. (2009). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74, 121–135.
doi:10.1007/s11336-008-9098-4
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282. doi:10.1007/BF02288892
Hancock, G. R., & An, J. (in press). A closed-form alternative for estimating omega reliability under unidimensionality.
Measurement: Interdisciplinary Research and Perspectives.

Hayes, A. F., Glynn, C. J., & Shanahan, J. (2005). Willingness to self-censor: A construct and measurement tool for
public opinion research. International Journal of Public Opinion Research, 17, 298–323. doi:10.1093/ijpor/edh073
Hayes, A. F., Matthes, J., & Eveland, W. P. (2013). Stimulating the quasi-statistical organ: Fear of social isolation
motivates the quest for knowledge of the opinion climate. Communication Research, 40, 439–462. doi:10.1177/
0093650211428608
Infante, D. A., & Rancer, A. S. (1982). A conceptualization and measure of argumentativeness. Journal of Personality
Assessment, 46, 72–80. doi:10.1207/s15327752jpa4601_13
John, O. P., & Srivastava, S. (1999). The big five trait taxonomy: History, measurement, and theoretical perspectives. In
L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (pp. 102–138). New York, NY: Guilford
Press.
Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research
Methods, 39, 979–984. doi:10.3758/BF03192993
Kopalle, P. K., & Lehmann, D. R. (1997). Alpha inflation? The impact of eliminating scale items on Cronbach’s alpha.
Organizational Behavior and Human Decision Processes, 70, 189–197. doi:10.1006/obhd.1997.2702
Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did
they really say? Organizational Research Methods, 9, 202–220. doi:10.1177/1094428105284919
Leary, M. R. (1983). A brief version of the fear of negative evaluation scale. Personality and Social Psychology Bulletin,
9, 371–375. doi:10.1177/0146167283093007
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23, 412–433. doi:10.1037/
met0000144
Morris, J. D. (1978a). A comparison of three algorithms for item analysis to maximize coefficient alpha. Educational
and Psychological Measurement, 38, 801–804. doi:10.1177/001316447803800321
Morris, J. D. (1978b). Maximizing coefficient alpha reliability while maintaining validity. Behavior Research Methods
and Instrumentation, 19, 733–734. doi:10.3758/BF03205385
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
Padilla, M. A., & Divers, J. (2013). Bootstrap interval estimation of reliability via coefficient omega. Journal of Modern
Applied Statistical Methods, 12, 78–89. doi:10.22237/jmasm/1367381520
Prochazka, F., & Schweiger, W. (2019). How to measure generalized trust in news media? An adaptation and test of
scales. Communication Methods and Measures, 13, 26–42. doi:10.1080/19312458.2018.1506021
Raykov, T. (1997a). Scale reliability, Cronbach’s coefficient alpha, and violations of essential tau-equivalence with fixed
congeneric components. Multivariate Behavioral Research, 32, 329–353. doi:10.1207/s15327906mbr3204_2
Raykov, T. (1997b). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement,
21, 173–184. doi:10.1177/01466216970212006
Raykov, T. (1998). A method for obtaining standard errors and confidence intervals of composite reliability for
congeneric items. Applied Psychological Measurement, 22, 369–374. doi:10.1177/014662169802200406
Raykov, T. (2001). Bias of coefficient α for fixed congeneric measures with correlated errors. Applied Psychological
Measurement, 25, 69–76. doi:10.1177/01466216010251005
Raykov, T. (2007). Reliability if deleted, not ‘alpha if deleted’: Evaluation of scale reliability following component
deletion. British Journal of Mathematical and Statistical Psychology, 60, 201–216. doi:10.1348/000711006X115954
Raykov, T., & Marcoulides, G. A. (2019). Thanks coefficient alpha, we still need you! Educational and Psychological
Measurement, 79, 200–210. doi:10.1177/0013164417725127
Ritchie, L. D., & Fitzpatrick, M. A. (1990). Family communication patterns: Measuring intrapersonal perceptions of
interpersonal relationships. Communication Research, 17, 523–544. doi:10.1177/009365090017004007
Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (1991). Measures of personality and social psychological attitudes.
San Diego, CA: Academic Press.
Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (1999). Measures of political attitudes. San Diego, CA: Academic
Press.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.
Rubin, R. B., Palmgreen, P., & Sypher, H. E. (1994). Communication research measures: A sourcebook. New York, NY:
The Guilford Press.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353. doi:10.1037/1040-
3590.8.4.350
Serlin, R. C., & Kaiser, H. F. (1976). A computer program for item selection based on maximizing internal consistency.
Educational and Psychological Measurement, 36, 757–759. doi:10.1177/001316447603600328
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74,
107–112. doi:10.1007/s11336-008-9101-0
Stephenson, M. T., Hoyle, R. H., Palmgreen, P., & Slater, M. D. (2003). Brief measures of sensation seeking for
screening and large scale surveys. Drug and Alcohol Dependence, 72, 279–286. doi:10.1016/j.drugalcdep.2003.08.003

Streiner, D. L. (2003). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal
of Personality Assessment, 80, 99–103. doi:10.1207/S15327752JPA8001_18
Swann, W. B., & Rentfrow, P. J. (2001). Blirtatiousness: Cognitive, behavioral, and physiological consequences of rapid
responding. Journal of Personality and Social Psychology, 81, 1160–1175. doi:10.1037/0022-3514.81.6.1160
Trizano-Hermosilla, I., & Alvarado, J. M. (2016). Best alternatives to Cronbach’s alpha reliability in realistic condi-
tions: Congeneric and asymmetrical measurements. Frontiers in Psychology, 7, Article 769. doi:10.3389/
fpsyg.2016.00769
Vandello, J. A., & Cohen, D. (1999). Patterns of individualism and collectivism across the United States. Journal of
Personality and Social Psychology, 77, 279–292. doi:10.1037/0022-3514.77.2.279
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of reliability. Structural Equation
Modeling, 17, 66–81. doi:10.1080/10705510903438963
Yang, Y., & Green, S. B. (2011). Coefficient alpha: A reliability coefficient for the 21st century? Journal of
Psychoeducational Assessment, 29, 377–392. doi:10.1177/0734282911406668
Zhu, X., & Yzer, M. (2019). Testing a brief scale format self-affirmation induction for use in health communication
research and practice. Communication Methods and Measures, 13, 178–197. doi:10.1080/19312458.2019.1572084
