Professional Documents
Culture Documents
Coefficient Alpha
ROBERT A. PETERSON*
Despite some limitations, Cronbach's coefficient alpha remains the most widely
used measure of scale reliability. The purpose of this article was to empirically
document the magnitudes of alpha coefficients obtained in behavioral research,
compare these obtained values with guidelines and recommendations set forth by
individuals such as Nunnally (1967, 1978), and provide insights into research design
characteristics that may influence the size of coefficient alpha. Average reported
alpha coefficients ranged from .70 for values and beliefs to .82 for job satisfaction.
With few exceptions, there were no subs~tive relationships between the magnitude
of coefficient alpha and the research design characteristics investigated.
(k ~ 1)(1 - j~ (17/(1~)
245-246) Preliminary research .7
Basic research .8 a =
Applied research .9-.95
or
ki'
the research of Churchill and Peter (1984) and bears 1 + r(k - 1) ,
many similarities to their work (see also Peter and
Churchill 1986). Even so, this research differs from that where k is the number of items in the scale, (17 is the
of Churchill and Peter in several regards. First, there is variance of item i, (1~ is the variance of the scale, and i'
a subtle difference in the objectives, and consequently is the average interitem correlation.
the scope, of the two endeavors. Although both studies Whether by acclamation (see, e.g., Churchill 1979;
can be described as meta-analyses, Churchill and Peter Gerbing and Anderson 1988; Peter 1979) or citation,
focused on investigating "the effects of research design coefficient alpha has effectively become the measure of
on reliability estimates" (p. 360), whereas the current choice for estimating the reliability of a multi-item scale.
focus was somewhat broader in that it attempted to Indeed, coefficient alpha has become one of the foun-
provide base rate comparison standards at a construct dations of measurement theory. Because it is a gener-
level. alized intraclass correlation coefficient, coefficient alpha
The difference in objectives resulted in three notable can be derived from the theory of true and error scores,
study differences. First, Churchill and Peter included as well as from the domain sampling model. According
six different reliability coefficients in their analyses. Be- to the Social Science Citation Index, Cronbach's 1951
cause of the admonitions of Guilford (1965, chap. 17) article has been referenced in more than 2,200 articles
that different reliability coefficients are not conceptually in the last 20 years. Not only is coefficient alpha the
or numerically comparable, the present investigation most widely used estimator of reliability, but also it has
focused on only one reliability coefficient, Cronbach's been the subject of considerable methodological and
(1951) coefficient alpha. Second, the present study was analytical attention (see, e.g., Cortina 1993). Therefore,
significantly larger in scope in terms of the journals re- focusing on coefficient alpha should not detract from
viewed, the years covered, and the number of reliability the generality of the research. Instead, it should improve
coefficients analyzed. For example, Churchill and Peter the usefulness of the research, because there is no het-
limited their investigation to marketing-related journals erogeneity in the data due to the presence of other re-
over 12 years, whereas the present research included liability coefficients.
both marketing and psychology journals over 33 years.
This resulted in nearly 30 times the number of reliability METHODOLOGY
coefficients analyzed by Churchill and Peter. Third, the
two studies differed in their sampling approaches. To obtain a large number and wide representation
Churchill and Peter followed what might be termed the of alpha coefficients, an extensive literature review was
Mansfield and Busse sampling approach, whereas the undertaken. A census of eight psychology- and mar-
present research followed a traditional Glassian sam- keting-related journals was conducted, beginning with
COEFFICIENT ALPHA 383
the year 1960 (the year in which citations ofCronbach's from this category must take this caveat into consid-
work began to appear) and ending with 1992. This eration.
meant that every article published in these journals was More than 33,000 articles, proceedings papers, and
systematically and individually examined to locate al- rejected manuscripts were individually examined during
pha coefficients. In addition, a convenience sample of the course of data collection. From these, alpha coef-
selected issues of 16 other journals (e.g., Journal ofAd- ficients were obtained from 832 different articles, pro-
vertising Research, Journal of Business Research, Ed- ceedings papers, and manuscripts reporting data from
ucational and Psychological Measurement, Journal of 1,030 samples consisting of more than 300,000 indi-
Educational Psychology, and others) and two confer- viduals. Thus, on average there were 5.1 alpha coeffi-
ence proceedings (American Marketing Association, cients per reviewed publication or manuscript and 4.1
Association for Consumer Research) were examined for alpha coefficients per sample. To obtain a conceptually
alpha coefficients. Finally, a sample of unpublished coherent pool of alpha coefficients, only those alpha
manuscripts (manuscripts that had been rejected at least coefficients reported for rating scales designed to mea-
once for publication) were examined for alpha coeffi- sure individual difference constructs such as personality,
cients. The "harvesting" of alpha coefficients and the attitude, and opinion in nonspecial populations were
coding of individual difference constructs and research included in the analysis. Excluded were alpha coeffi-
design characteristics were done by the author and two cients reported for forced-choice scales (e.g., constant-
experienced research assistants, with the bulk of the sum scales), scales used to measure interrater agree-
coding being done by the latter two individuals after ment, and scales developed or designed for special
an extensive pretest and a training session. A compar- populations (e.g., institutionalized individuals).
ison of the research assistants' coding consistency for a
sample of research design characteristics produced an Constructs Measured
interrater agreement coefficient of .92 (Perreault and
Leigh 1989). Hence, coding was judged to be consis- All harvested alpha coefficients were categorized ac-
tently done (as would be expected given the relatively cording to the underlying construct being measured. In
straightforward nature of the task). most instances this was accomplished by simply ac-
Table 2 contains the sources ofthe alpha coefficients cepting the construct designation employed in a study.
However, in certain instances it was necessary to infer
analyzed in the study, as well as the years they were
the appropriate construct category on the basis of no-
reviewed, the type of search undertaken, the number
menclature and terminology used.
of coefficients obtained from each, and the mean and After an extensive review of the behavioral literature
median alpha coefficient observed for each source. and a preliminary analysis, 42 categories of constructs
Across the sources, 4,286 alpha coefficients were har- were initially constructed for classification purposes.
vested. All alpha coefficients were independent in the Because of overlap among the categories and the small
sense that they were estimated from distinct sets of numbers of alpha coefficients in some of the categories,
items. For example, if alpha coefficients were reported the number of categories was ultimately collapsed to
for both an "overall" or composite scale and one or 20, which included a miscellaneous category. As might
more subscales of that scale, only the subscale alpha be expected, the categories varied in terms of their level
coefficients were retained for analysis. Eliminating the of generality and the number of scales they contained.
composite scale from the meta-analysis minimized the The specific construct categories used are reported in
possibility of interdependencies among the alpha coef- Table 3. The largest category consisted of attitude con-
ficients. structs, with 699 alpha coefficients. The smallest cate-
Several aspects of Table 2 merit brief mention. First, gory contained constructs relating to expectation, with
as is apparent from the table, it was not possible to 37 alpha coefficients.
conduct a census of all journals back to 1960 because
some, such as the Journal of Consumer Research, did
not begin publication until after that date. Second, as Research Design Characteristics
previously noted, journals were purposely selected to For each alpha coefficient harvested, information was
provide a representative array of journals, research do- sought on 12 research design characteristics in addition
mains, and alpha coefficients. Finally, unpublished to source and year of publication. The research design
manuscripts consisted of manuscripts submitted to the characteristics examined in this article have been pos-
journals and conference proceedings listed in Table 2. ited for more than 65 years as influencing the size of a
Manuscripts classified as unpublished can only be so reliability coefficient (see e.g., Symonds 1928). Each is
defined for the submissions evaluated. Some of the briefly described below.
manuscripts classified as unpublished may well have Although measurement theory does not consider the
been previously rejected or subsequently published in effect of sample size on the magnitude of an alpha coef-
an outlet unknown to the present researcher. Therefore, ficient, Churchill and Peter (1984) observed a negative
any interpretation of the alpha coefficients obtained relationship between the two. Because they were not
384 JOURNAL OF CONSUMER RESEARCH
TABLE 2
Source Time period covered Data collection Number of a'S Mean a Median a
able to explain their finding, the present research re- the literature on this subject reveals that the reliability
studied the relationship. of a scale is expected to increase with an increase in the
Type of sample was operationalized as college stu- number of items only under certain conditions that re-
dent, consumer, businessperson, "mixed" (more than late to the homogeneity of individual item variances.
one type), or "cannot tell" (sample unclassifiable). Contrary to popular belief, it is not clear that simply
Churchill and Peter (1984) hypothesized that college increasing the number of items in a scale will guarantee
student samples should evince higher scale reliabilities that its reliability will also increase. Consequently, the
than should noncollege student samples because stu- present research again addressed the issue from the per-
dents should be more experienced in completing ques- spective of a large body of alpha coefficients.
tionnaires and perhaps more educated. However, their Analogous to the Churchill and Peter research, the
hypothesis was not supported. Given their finding and present study investigated the effect of scale type, for-
a lack of conceptual support for the hypothesis of dif- mat, and nature on the magnitude of an alpha coeffi-
ferent alpha coefficients for different types of samples, cient. Type of scale was operationalized by whether the
type of sample was not expected to influence the size scale consisted of traditional Likert items (i.e., declar-
of an alpha coefficient. ative statements with a five-category "agree-disagree"
Two research design characteristics have been studied response format) or semantic differential items (i.e.,
extensively for their effects on the magnitudes of reli- seven-category bipolar items). Scale format was oper-
ability coefficients-the number of categories, points, ationalized as the specific labeling of scale-item cate-
or intervals in a scale item, and the number of items in gories (i.e., whether only endpoints were labeled,
a scale. The effect of the number of categories on the whether numerical or verbal labels were used on inner
size of a reliability coefficient has long been debated in categories, or whether it was impossible to label cate-
the literature, with, for example, Bendig (1954) and Ja- gories). The nature of the scale was operationalized by
coby and Matell (1971) concluding that the magnitude whether there was an odd or even number of scale-item
ofa reliability coefficient is independent of the number categories or whether it was impossible to tell how many
of scale categories and Komorita and Graham (1965) categories there were. Given the findings obtained by
and Lissitz and Green (1975) concluding the opposite. Churchill and Peter, no differences in the magnitudes
With the exception of the Churchill and Peter (1984) of alpha coefficients were expected as a function of scale
research, which found a positive relationship between format or scale nature. However, it was anticipated that
the number of scale categories and the size of a reliability scales consisting of semantic differential items would
coefficient, prior research on the number-of-categories exhibit larger alpha coefficients than would scales con-
issue either relied on relatively small samples or used sisting of Likert items if a relationship exists between
a simulation approach. It was anticipated that the pres- the number of categories in a scale item and the mag-
ent research would resolve these conflicting findings nitude of an alpha coefficient (simply because of the
through its synthesis of a large body of alpha coefficients. difference in the number of item categories).
The formula for coefficient alpha implies that the The final characteristic studied that Churchill and
larger the number of items in a scale, the greater its Peter also investigated was mode of scale administra-
reliability. This relationship is generally taken for tion. Although Churchill and Peter did not offer a hy-
granted in the literature, and Churchill and Peter's pothesis regarding mode of scale administration and
findings corroborated it. Even so, careful inspection of the magnitude of a reliability coefficient, in the present
COEFFICIENT ALPHA 385
TABLE 3
Quartile
95% Confidence interval
Construct N Mean a Median a for a. First Third
"Includes such constructs as loyalty, innovativeness, and importance, each with fewer than 30 alpha coefficients.
study it was expected that self-administered scales would plied, or whether this could not be determined. (Al-
exhibit larger alpha coefficients than would scales that though Churchill and Peter did study a design charac-
were not self-administered because of the likelihood of teristic they termed "source of scale," it is not directly
there being less ambiguity and confusion associated with comparable to the design characteristic investigated
scale items that an individual could physically view and here.) If the scale was developed, a fourth research de-
complete at his or her own pace. In their analysis, sign characteristic was investigated. Specifically, an
Churchill and Peter did not find a relationship between analysis of developed scales was conducted to determine
the administration mode and the magnitude of the re- whether there were scale items deleted during the de-
liability coefficients they analyzed. velopment process and, if so, whether the number of
Four research design characteristics not directly in- items deleted from a scale influenced the size of its alpha
vestigated by Churchill and Peter were also studied. One coefficient.
such characteristic was scale orientation, which indi- It is unfortunate that, because of reporting practices,
cated whether a scale was stimulus- or respondent-cen- it was not always possible to capture the desired research
tered or both. Because respondent-centered scales have design characteristics of every study reviewed. Hence,
been the focus of more measurement attention than not all alpha coefficients harvested were included in ev-
stimulus-centered scales (Cox 1980), it was anticipated ery analysis.
that the former would exhibit larger alpha coefficients
than the latter. A second research design characteristic RESULTS
studied was the nature of the construct represented by
a scale, whether it was designated primarily as a depen- Figure 1 illustrates the distribution of the 4,286 alpha
dent or an independent variable or as both or whether coefficients harvested. The coefficients ranged from .06
it was impossible to designate it at all. It was postulated to .99 with a mean of.77 and a median of .79. As can
that, because of the likelihood that a greater emphasis be observed from the figure, the coefficients were rela-
would be placed on a dependent variable during the tively tightly grouped (SE = .002), and there was a slight
conceptual and operational development of a study, negative skew (sk = -1.15) to the distribution.
larger alpha coefficients would be exhibited by depen- Seventy-five percent of the observed alpha coefficients
dent variables than by independent variables. were.70 or greater, 49 percent were .80 or greater, and
The third research design characteristic not directly 14 percent were .90 or greater. These three values cor-
investigated by Churchill and Peter was the type of re- respond to Nunnally's 1978 recommendations (pp.
search, whether a scale was developed specifically for 245-246) for minimally acceptable reliability levels for
the reported research, whether it was simply being ap- preliminary, basic, and applied research, respectively.
386 JOURNAL OF CONSUMER RESEARCH
Quartile
95% Confidence interval
Construct N Mean a Median a for a. First Third
Sample size:"
<100 1,028 .76 .80 ±.009 .68 .87
100-199 1,169 .78 .80 ±.007 .71 .87
200-299 696 .78 .80 ±.008 .72 .86
300 or more 1,265 .75 .77 ±.007 .68 .84
Not given 128 .76 .80 ±.003 .67 .86
Type of sample:"
College students 1,741 .77 .80 ±.007 .70 .87
Consumers 879 .74 .76 ±.009 .66 .84
Businesspersons 1,130 .77 .80 ±.007 .70 .86
Mixed 450 .78 .79 ±.010 .72 .86
Cannot tell 86 .76 .77 ±.025 .69 .87
Number of scale categories:"
Not given 667 .76 .78 ±.009 .68 .. 85
2 221 .70 .74 ±.022 .63 .82
3 158 .78 .80 ±.020 .69 .88
4 305 .76 .78 ±.014 .69 .86
5 1,319 .77 .79 ±.007 .71 .86
6 249 .75 .77 ±.016 .68 .84
7 991 .78 .82 ±.009 .72 .88
8 or more 376 .77 .81 ±.015 .70 .88
Number of items:"
Not given 342 .77 .79 ±.015 .71 .86
2 307 .73 .75 ±.016 .63 .83
3 519 .73 .75 ±.012 .65 .84
4 503 .76 .78 ±.011 .68 .85
5 441 .78 .79 ±.011 .71 .86
6 377 .75 .77 ±.013 .69 .84
7 210 .76 .78 ±.016 .69 .85
8 183 .73 .77 ±.025 .65 .85
9 117 .80 .83 ±.017 .74 .89
10 311 .74 .78 ±.015 .65 .85
11 or more 976 .81 .83 ±.007 .76 .89
Scale type:"
Likert 828 .76 .79 ±.009 .70 .86
Semantic differential 372 .80 .82 ±.013 .74 .89
Scale format:
Only endpoints labeled 811 .77 .80 ±.010 .69 .87
Numerical values on inner categories 553 .77 .80 ±.011 .69 .86
Verbal values on inner categories 1,869 .77 .80 ±.006 .71 .86
Cannot tell 1,053 .76 .78 ±.008 .68 .85
Nature of scale:"
Odd number of item categories 2,756 .78 .80 ±.005 .70 .86
Even number of item categories 863 .74 .76 ±.010 .70 .86
Cannot tell 667 .76 .78 ±.009 .68 .85
Administration mode:"
Self 4,064 .77 .79 ±.004 .70 .86
Interviewer 153 .72 .75 ±.028 .65 .85
Not given 69 .77 .78 ±.029 .70 .87
Scale orientation:"
Respondent-centered 3,017 .76 .79 ±.002 .69 .86
Stimulus-centered 1,206 .79 .81 ±.004 .73 .88
Both 63 .79 .78 ±.018 .72 .87
Nature of construct:"
Dependent 919 .79 .82 ±.008 .73 .89
Independent 1,897 .77 .79 ±.006 .70 .86
Cannot tell/both 1,470 .75 .78 ±.007 .67 .85
Type of research:
Scale development 1,270 .77 .79 ±.007 .70 .86
Scale application 2,978 .77 .79 ±.005 .70 .86
Cannot tell 38 .84 .84 ±.029 .78 .91
FIGURE 2 TABLE 5
RELATIONSHIP BETWEEN COEFFICIENT ALPHA AND NUMBER EFFECT OF ELIMINATING SCALE ITEMS DURING SCALE
OF SCALE ITEMS AND ITEM CATEGORIES DEVELOPMENT ON COEFFICIENT ALPHA
Number of scale items
Number of items eliminated" N Mean a
20r3 4 or more
None 234 .70
1-3 64 .79
4-10 69 .80
2 a = .62 -a= .71 11-30 63 .77
Number of More than 30 63 .87
categories
in item (n = 23) (n = 186) "Relationship significant at p < .001.
TABLE 6
SUMMARY COMPARISON OF CHURCHILL AND PETER (1984) AND PRESENT RESEARCH FINDINGS
their originating scales were more likely to consist of between the mode of administration and the magnitude
more items with more categories, to be derived from of coefficient alpha in the Churchill and Peter study
smaller samples, to be stimulus-centered, and to have may have been a consequence of their sample size and/
been developed to measure constructs employed as de- or operationalization of the design characteristic. The
pendent variables. Succinctly stated, there appears to finding in the present research that self-administered
be a systematic relationship between the origin of an scales exhibited larger alpha coefficients than inter-
alpha coefficient and whether it is "sufficiently high" viewer-administered scales is intuitively logical. Hence,
(in excess of .90) to warrant being deemed acceptable the mode of administration warrants attention in future
for applied research. research as an influencer of coefficient alpha.
Second, the present research generally corroborated Furthermore, the present research confirms previous
the findings of Churchill and Peter (1984; Peter and findings from simulation studies (see, e.g., Jenkins and
Churchill 1986) to the extent that it documented that Taber 1977; Lissitz and Green 1975) and empirical
coefficient alpha is relatively robust and is not subject demonstrations (see, e.g., Bendig 1954; Jacoby and
to dramatic fluctuations as a consequence of research Matell 1971) that, with the exception of scale items
design characteristics. As summarized in Table 6, de- possessing only two response categories, the number of
spite their underlying differences, the Churchill and Pe- categories in a scale item is essentially unrelated to the
ter research and the present research obtained similar magnitude of coefficient alpha. Thus, the research adds
results for six of the eight design characteristics inves- to the body of knowledge relating to the "optimal"
tigated in common. The two studies differed in their number of scale categories (see, e.g., Cox 1980) in that
findings regarding the impact of sample size and the it shows that to increase reliability (as estimated by coef-
mode of administration on the magnitude of coefficient ficient alpha) it is necessary to assume a strategy other
alpha, with Churchill and Peter finding a relationship than simply increasing the number of scale-item cate-
for the former but not for the latter. Because Churchill gories.
and Peter had no a priori hypothesis regarding the re- Perhaps the most interesting relationship studied was
lationship between sample size and coefficient alpha and that between the number of items in a scale and the
were unable to explain the obtained relationship post magnitude of coefficient alpha. Theoretically, the larger
hoc, it may have simply been an anomaly, as the present the number of items in a scale, the more reliable will
research suggests. The lack of a significant relationship be the scale (see, e.g., Nunnally 1978, p. 243). However,
390 JOURNAL OF CONSUMER RESEARCH
Gerbing, David W. and James C. Anderson (1988), "An Up- Murphy, Kevin R. and Charles O. Davidshofer (1988), Psy-
dated Paradigm for Scale· Development Incorporating chological Testing: Principles and Applications, Engle-
Unidimensionality and Its Assessment," Journal of wood Cliffs, NJ: Prentice-Hall.
Marketing Research, 25 (May), 186-192. Nunnally, Jum C. (1967), Psychometric Theory, 1st ed., New
Guilford, J. P. (1965), Fundamental Statistics in Psychol- York: McGraw-Hill.
ogy and Education, 4th ed., New York: McGraw- - - - (1978), Psychometric Theory, 2d ed., New York:
Hill. McGraw-Hill.
Jacoby, Jacob and Michael S. Matell (1971), "Three-Point Perreault, William D., Jr. and Lawrence E. Leigh (1989),
Likert Scales Are Good Enough," Journal of Marketing "Reliability of Nominal Data Based on Qualitative
Research, 8 (November), 495-500. Judgments," Journal of Marketing Research, 26 (May),
Jenkins, G. Douglas, Jr. and Thomas D. Taber (1977), "A 135-148.
Monte Carlo Study of Factors Affecting Three Indices of Peter, J. Paul (1979), "Reliability: A Review of Psychometric
Composite Scale Reliability," Journal of Applied Psy- Basics and Recent Marketing Practices," Journal of
chology, 62 (November), 392-398. Marketing Research, 16 (February), 6-17.
Kaplan, Robert W. and Dennis P. Saccuzzo (1982), Psycho- - - - and Gilbert A. Churchill, Jr. (1986), "Relationships
logical Testing: Principles, Applications, and Issues, among Research Design Choices and Psychometric
Monterey, CA: Brooks/Cole. Properties of Rating Scales: A Meta-analysis," Journal
Komorita, Samuel S. and William K. Graham (1965), of Marketing Research, 23 (February), 1-10.
"Number of Scale Points and the Reliability of Scales," Peterson, Robert A., Gerald Albaum, and Richard F. Beltra-
Educational and Psychological Measurement, 25 (No- mini (1984), "A Meta-analysis of Effect Sizes in Con-
vember), 987-995. sumer Behavior Experiments," Journal ofConsumer Re-
Lissitz, Robert W. and Samuel B. Green (1975), "Effect of search, 12 (June), 97-103.
the Number of Scale Points on Reliability: A Monte Carlo Symonds, Percival M. (1928), "Factors Influencing Test Re-
Approach," Journal of Applied Psychology, 60 (Febru- liability," Journal of Educational Psychology, 19 (Feb-
ary), 10-13. ruary), 73-87.