Methodology, 2008, Vol. 4(2), 73–79
DOI 10.1027/1614-2241.4.2.73
© 2008 Hogrefe & Huber Publishers

Effect of the Number of Response Categories on the Reliability and Validity of Rating Scales
Luis M. Lozano¹, Eduardo García-Cueto², and José Muñiz²
¹University of Jaén, Spain; ²University of Oviedo, Spain

Abstract. The Likert-type format is one of the most widely used in all types of scales in the field of social sciences. Nevertheless, there
is no definitive agreement on the number of response categories that optimizes the psychometric properties of the scales. The aim of the
present work is to determine in a systematic fashion the number of response alternatives that maximizes the fundamental psychometric
properties of a scale: reliability and validity. The study is carried out with data simulated using the Monte Carlo method. We simulate
responses to 30 items with correlations between them ranging from 0.2 to 0.9. We also manipulate sample size, analyzing four different
sizes: 50, 100, 200, and 500 cases. The number of response options employed ranges from two to nine. The results show that as the
number of response alternatives increases, both reliability and validity improve. The optimum number of alternatives is between four
and seven. With fewer than four alternatives the reliability and validity decrease, and from seven alternatives onwards the psychometric
properties of the scale scarcely improve further. Some applied implications of the results are discussed.

Keywords: Likert scale, reliability, validity, number of choices

In practice, the number of alternatives most frequently found in Likert-type scales is five, which is actually what Likert (1932) himself proposed on presenting his method, though he never explained the precise psychometric reasoning behind the number he chose. There is a good deal of research on how the psychometric properties of a scale are affected by the number of response alternatives (Ferrando, 1995, 2000; Hernández, Muñiz, & García-Cueto, 2000; Laveault & Grégoire, 1997; Moreno, Martínez, & Muñiz, 2004; Muñiz, 2003; Sancerni, Meliá, & González-Romá, 1990; Tomás & Oliver, 1998); these studies found that increasing the number of choices improves both the reliability (usually Cronbach's α) and the factorial validity of the scale; however, no clear answer has emerged on the right number for maximizing these properties (Muñiz, García-Cueto, & Lozano, 2000).

The majority of studies that attempt to discover the optimum number of alternatives take the reliability of the scale as their criterion. Such studies examine whether different numbers of alternatives affect reliability, and, if so, which number maximizes it. Some authors (Aiken, 1983; Boote, 1981; Brown, Widing, & Coulter, 1991; McCallum, Keith, & Wiebe, 1988; Weng, 2004) maintain that although reliability tends to increase as the number of response options rises, when the number of alternatives exceeds five or six, reliability hardly increases further. The clearest conclusion from such research is that the minimum number of categories for ensuring an appropriate level of reliability is four. As regards the maximum number, some authors argue that reliability is maximized with seven response alternatives (Cicchetti, Showalter, & Tyrer, 1985; McKelvie, 1978; Nunnally, 1970; Ramsay, 1973), while others prefer five (Jenkins & Taber, 1977; Lissitz & Green, 1975; Neumann, 1979).

As far as factorial validity is concerned, there are several studies on how it is affected as the number of response options changes (Babakus, Ferguson, & Jöreskog, 1987; DiStefano, 2002; Dolan, 1994; Green, Akey, Fleming, Hershberger, & Marquis, 1997; Hutchinson & Olmos, 1998). Although these studies focus on confirmatory factor analysis, their central idea, that coarser categorization implies a greater loss of information and, in turn, a greater attenuation of the relationships between items, also applies to exploratory factor analysis. In those studies that compare the dichotomous format with seven response alternatives (Bernstein & Teng, 1989; Comrey & Montag, 1982; King, King, & Klockars, 1983; Oswald & Velicer, 1980; Velicer, DiClemente, & Corriveau, 1984), the data indicate better factorial validity with seven options than with two. The fact that the validity of the questionnaire changes according to the number of response options is a matter of considerable importance, potentially calling into question the construct validity of the variable measured (MacCallum, Zhang, Preacher, & Rucker, 2002; Muñiz, García-Cueto, & Lozano, 2005). A short discussion of the difference between articles using real data and those using simulated data can be found in Bandalos and Enders (1996).

Given the lack, in the literature reviewed, of a comprehensive study relating the number of options of scales with their psychometric properties, the principal objective of the present work is to examine in a systematic fashion the influence of the number of response alternatives of Likert-type scales on the psychometric properties of the scale (reliability and factorial validity). In addition to the number of response options, we take into account sample size and the correlations between the items making up the questionnaire or test.


Table 1. Coefficient of variation of the scale scores (N: sample size; rij: correlation between the items of the scale)
rij N Number of response categories
2 3 4 5 6 7 8 9
0.9 50 28.15 25.89 27.97 28.03 28.88 29.21 29.64 30.22
100 27.46 23.68 25.00 25.76 26.37 26.79 27.31 27.63
200 27.52 22.46 23.55 24.27 24.91 25.335 25.74 26.10
500 27.17 20.67 21.91 22.38 22.87 23.20 23.69 23.88
0.8 50 25.52 22.90 24.30 25.05 25.75 26.20 26.83 27.00
100 25.43 21.38 22.70 23.46 24.15 24.67 25.06 25.35
200 25.49 19.84 21.37 21.91 22.53 22.93 23.34 23.62
500 25.81 20.62 20.01 19.86 20.95 21.35 21.70 22.04
0.7 50 23.79 20.64 22.00 22.68 23.48 24.34 24.42 24.51
100 23.38 18.92 20.45 21.16 21.75 22.18 22.54 22.82
200 23.15 17.63 19.22 19.88 20.45 20.80 21.19 21.41
500 23.69 16.09 18.11 18.54 19.03 19.38 19.74 19.93
0.6 50 21.38 18.32 19.61 20.43 21.04 21.45 21.77 21.94
100 21.40 16.94 18.39 19.06 19.73 20.08 20.41 20.56
200 21.44 16.02 17.66 18.29 18.82 19.19 19.49 19.68
500 21.36 14.51 16.51 18.39 17.42 17.80 18.10 18.21
0.5 50 19.35 16.26 17.52 18.25 18.77 19.19 19.40 19.69
100 20.09 15.11 16.67 17.24 17.72 18.96 19.40 18.53
200 19.65 14.34 16.08 16.67 17.14 17.49 17.75 18.02
500 20.02 13.20 15.97 15.84 16.24 16.56 16.89 16.59
0.4 50 17.68 14.55 15.77 16.43 16.82 17.10 17.48 17.69
100 17.61 13.57 15.06 15.53 15.99 16.34 16.58 16.85
200 17.23 12.68 14.54 14.71 15.29 15.41 15.66 16.31
500 17.51 11.68 13.65 14.13 14.65 14.68 14.94 15.05
0.3 50 15.35 12.80 14.46 14.42 14.87 14.95 15.41 15.59
100 15.05 11.63 12.97 13.40 13.55 13.96 14.20 14.36
200 15.52 11.78 12.63 13.14 13.38 13.62 13.85 14.04
500 10.55 10.23 11.92 12.23 12.54 12.86 13.06 13.08
0.2 50 13.17 10.73 11.67 12.15 13.29 12.69 12.91 13.02
100 12.92 9.83 10.92 11.32 12.54 11.71 11.98 12.09
200 12.85 9.19 10.46 10.64 10.97 11.15 11.32 11.40
500 13.07 8.44 9.96 10.16 10.43 10.62 10.79 10.88

Method

Data Simulation

By means of PRELIS (Jöreskog & Sörbom, 1993), we simulated the responses to 30 items following a normal distribution (0,1), with correlations between them of 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, and 0.2. The number of response categories of the items ranged from 2 to 9. Four sample sizes were used: 50, 100, 200, and 500 cases. In sum, the design crossed four sample sizes (50, 100, 200, and 500), eight sizes of correlation between the items (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, and 0.2), and eight levels of response categories (9, 8, 7, 6, 5, 4, 3, and 2). Data were simulated to represent a one-factor model. In total we generated 256 (4 × 8 × 8) simulation conditions. For each of these conditions 100 samples were simulated, giving a total of 25,600 simulated samples.

With this design we aimed to cover in a systematic fashion the entire possible range of correlations between items and the whole range of numbers of response categories of the scales.
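To make one cell of this design concrete, the sketch below simulates correlated normal item scores and discretizes them into ordered categories. It is a minimal Python illustration, not the authors' procedure: the original data were generated with PRELIS, and both the compound-symmetry correlation structure and the equal-width threshold placement assumed here are not specified in the article.

```python
import numpy as np

def simulate_scale(n_cases, n_items=30, rho=0.5, n_categories=5, seed=None):
    """Simulate one sample: n_cases respondents answering n_items items
    that share a common inter-item correlation rho (an assumption)."""
    rng = np.random.default_rng(seed)
    # Compound-symmetry correlation matrix: 1 on the diagonal, rho elsewhere.
    corr = np.full((n_items, n_items), rho)
    np.fill_diagonal(corr, 1.0)
    # Latent continuous responses ~ N(0, 1) with the given correlations.
    z = rng.multivariate_normal(np.zeros(n_items), corr, size=n_cases)
    # Discretize into ordered categories; equal-width cuts over roughly +/-3 SD
    # (the exact thresholds used in the original PRELIS runs are not reported).
    cuts = np.linspace(-3.0, 3.0, n_categories + 1)[1:-1]
    return np.digitize(z, cuts) + 1  # item scores coded 1..n_categories

# One cell of the 4 x 8 x 8 design, replicated 100 times:
replications = [simulate_scale(200, rho=0.4, n_categories=7, seed=i)
                for i in range(100)]
```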


Table 2. Coefficient α of the scales (N: sample size; rij: correlation between the items of the scale)
rij N Number of response categories
2 3 4 5 6 7 8 9
0.9 50 .987 .988 .991 .993 .994 .994 .995 .995
100 .986 .988 .990 .992 .993 .994 .994 .994
200 .987 .987 .990 .991 .993 .993 .994 .995
500 .987 .986 .989 .991 .992 .993 .994 .994
0.8 50 .977 .980 .984 .986 .988 .989 .989 .990
100 .977 .979 .983 .986 .988 .988 .989 .990
200 .977 .977 .982 .985 .987 .988 .989 .990
500 .977 .976 .981 .983 .986 .988 .989 .989
0.7 50 .967 .970 .976 .979 .981 .982 .982 .983
100 .967 .968 .975 .979 .981 .982 .983 .984
200 .966 .962 .974 .978 .980 .982 .983 .983
500 .966 .964 .973 .977 .979 .981 .982 .983
0.6 50 .953 .957 .965 .970 .972 .974 .974 .975
100 .952 .954 .964 .969 .972 .973 .974 .975
200 .953 .954 .964 .969 .972 .973 .974 .975
500 .953 .950 .962 .967 .970 .973 .974 .974
0.5 50 .938 .943 .953 .959 .961 .963 .964 .965
100 .936 .939 .951 .956 .959 .961 .962 .961
200 .936 .936 .950 .956 .960 .962 .963 .964
500 .937 .933 .949 .952 .960 .961 .963 .964
0.4 50 .910 .917 .931 .938 .941 .943 .945 .946
100 .915 .917 .933 .939 .944 .946 .947 .948
200 .913 .913 .930 .938 .942 .944 .946 .941
500 .913 .910 .929 .937 .941 .944 .946 .947
0.3 50 .863 .883 .898 .911 .915 .918 .919 .921
100 .872 .876 .898 .907 .912 .915 .917 .918
200 .877 .881 .901 .911 .917 .916 .921 .922
500 .877 .871 .898 .908 .914 .918 .920 .922
0.2 50 .812 .823 .849 .860 .866 .871 .873 .873
100 .812 .814 .845 .858 .864 .868 .869 .875
200 .816 .812 .847 .857 .866 .869 .871 .874
500 .814 .801 .841 .855 .863 .868 .872 .873

As regards sample sizes, we selected those most commonly used by psychologists and teachers. As far as the items with just two alternatives are concerned, these are clearly not Likert-type, but we have included items of this type since it may be interesting to compare this binary response format (customary in tests and questionnaires) with the true Likert-type scale format.

Data Analysis

For the statistical analysis we used the SPSS 12.5 package. First of all we calculated the descriptive statistics of the samples. Reliability was assessed by means of the α coefficient, and validity was evaluated through an indicator of factorial validity, the percentage of variance explained by the first factor. The extraction method used in the factor analysis was maximum likelihood, requesting a single factor, since the data were simulated according to a one-factor structure.
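For readers who want to reproduce these two indicators, here is a minimal Python sketch under stated assumptions: Cronbach's α follows its standard formula, while the first-factor share uses the largest eigenvalue of the item correlation matrix, a principal-components stand-in for the maximum-likelihood extraction the authors ran in SPSS. The function names, and the simulate_scale helper they pair with, are hypothetical.

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for a cases x items score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def first_factor_share(x):
    """Percentage of variance associated with the first factor, approximated
    by the largest eigenvalue of the inter-item correlation matrix (a
    principal-components shortcut, not the authors' ML extraction)."""
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))
    return 100 * eigenvalues[-1] / x.shape[1]  # eigvalsh sorts ascending
```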
Results

Comparing the variability of groups with clearly different means is problematic, since larger means usually go with larger standard deviations. In this research we therefore used the coefficient of variation, that is, the standard deviation of the total scale scores divided by their mean and expressed as a percentage, to compare the variability of the different groups. The results shown here are the averages, over the 100 replications per cell, of the different statistics.
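As a one-function illustration, hedged the same way as the sketches above, with x standing for a hypothetical cases × items score matrix such as the one simulate_scale returns:

```python
def variation_coefficient(x):
    """Coefficient of variation of the total scale scores, in percent."""
    totals = x.sum(axis=1)  # total score per respondent
    return 100 * totals.std(ddof=1) / totals.mean()
```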


Table 3. Number of significant differences between numbers of choices (N: sample size; rij: correlation between the items of the scale)
rij N
50 100 200 500
0.9 11 16 21 24
0.8 9 13 19 22
0.7 7 12 18 21
0.6 7 11 16 21
0.5 4 10 15 20
0.4 2 9 11 18
0.3 4 8 11 17
0.2 0 6 10 16
Figure 1. Coefficient α of the scales.

Table 4. Percentage of variance explained by the first factor (N: sample size; rij: correlation between the items of the scale)
rij N Number of alternatives
2 3 4 5 6 7 8 9
0.9 50 72.89 75.54 79.73 82.74 84.44 86.04 86.95 87.78
100 71.54 73.79 77.93 81.14 83.28 84.88 86.05 86.95
200 72.23 72.39 76.88 80.02 82.47 84.19 85.56 86.46
500 72.04 71.53 75.68 79.03 81.56 83.48 84.92 85.93
0.8 50 59.83 63.97 68.83 72.08 74.43 75.75 76.92 77.48
100 60.31 62.26 67.48 71.11 73.78 75.40 76.53 77.32
200 60.40 60.70 66.38 69.93 72.98 74.69 76.01 76.87
500 59.99 58.69 64.71 68.72 71.62 73.89 75.33 76.34
0.7 50 52.24 54.18 59.67 62.93 65.20 66.52 67.48 68.10
100 51.65 52.38 58.76 62.35 64.70 66.30 67.64 68.09
200 50.76 50.96 57.01 60.97 63.72 65.40 66.56 67.38
500 50.77 49.06 55.77 59.76 62.79 64.71 66.01 67.09
0.6 50 43.50 45.73 50.80 54.15 55.81 57.27 58.02 58.75
100 42.37 43.90 49.58 53.06 55.09 56.50 57.55 58.06
200 42.75 43.01 48.72 52.69 55.04 56.62 57.60 58.32
500 42.57 41.02 47.68 51.50 54.06 55.74 56.92 57.70
0.5 50 37.00 38.64 43.61 46.61 48.06 49.48 49.91 50.23
100 35.24 36.55 41.73 44.58 46.39 47.58 48.31 48.73
200 35.41 35.47 41.19 44.35 46.23 47.66 48.47 49.08
500 33.37 31.95 38.37 42.06 43.63 45.16 46.50 46.86
0.4 50 29.40 30.74 34.58 36.95 38.07 38.86 39.49 39.92
100 29.56 32.80 34.42 36.96 38.63 39.50 40.20 40.75
200 28.68 28.68 34.02 35.99 37.21 38.61 39.44 39.94
500 28.38 27.45 33.15 35.51 37.89 38.36 39.02 39.58
0.3 50 23.18 24.47 27.69 29.40 30.47 31.07 31.27 31.81
100 22.14 22.75 26.12 27.80 28.93 29.45 30.13 30.39
200 22.23 22.59 26.23 28.23 29.39 30.17 30.77 31.12
500 22.09 21.17 25.47 27.42 28.82 30.35 30.29 30.72
0.2 50 14.76 15.64 17.81 19.03 19.62 20.10 20.45 20.55
100 13.72 13.93 16.72 17.90 18.45 19.03 19.43 19.59
200 13.39 13.22 16.08 17.10 18.21 18.51 18.78 19.20
500 15.75 15.01 18.03 19.44 20.31 20.91 21.08 21.60


Table 1 shows the coefficient of variation of the scales according to the number of categories, the correlations between the items, and sample size. As can be seen, as the number of alternatives is reduced, the coefficient of variation also decreases systematically. It should be noted that there is an exception to this systematic increase in variability with the number of response categories: on moving from two to three categories the increase does not occur; indeed, in many cases the variability with three response categories is lower than with two. The table also shows that the coefficient of variation increases as the correlation between the items increases.

Table 2 shows the reliability coefficients of the scales according to the number of response categories, the correlations between the items, and sample size. As can be seen, in general the reliability increases as the number of response options rises. It should be noted that in the sample of 500 cases this trend undergoes a reversal on moving from three to two categories; this change in trend already begins to appear in the sample of 200 cases. On the other hand, as expected, reliability falls as the correlation between the items decreases.

If we analyze the reliability of the scale regardless of the correlation between the items, but taking into account sample size and number of alternatives (Figure 1), we find the same trend as before: reliability increases as the number of alternatives offered increases, even though the variation is very slight.

The differences in reliability between the numbers of choices, holding the remaining parameters (correlation and sample size) constant, were tested using the Feldt test (Hakstian & Whalen, 1976; Alsawalmeh & Feldt, 1999; Feldt & Kim, 2006); the differences can be explained by the sample size and the correlation between the items. As can be seen (Table 3), when the sample size and/or the correlation increase, the number of significant differences increases. For the correlations most frequent in real tests (0.2, 0.3), differences are only found between the extreme alternatives.
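A sketch of the two-sample version of this test, assuming the commonly cited approximation in which W = (1 − α̂₁)/(1 − α̂₂) is referred to an F distribution with (N₁ − 1, N₂ − 1) degrees of freedom; the authors' exact computation (including the K-sample Hakstian-Whalen procedure) may differ:

```python
from scipy import stats

def feldt_test(alpha1, n1, alpha2, n2):
    """Two-sample Feldt test: under H0 (equal reliabilities),
    W = (1 - alpha1) / (1 - alpha2) is approximately F(n1 - 1, n2 - 1)."""
    w = (1 - alpha1) / (1 - alpha2)
    df1, df2 = n1 - 1, n2 - 1
    # Two-sided p value from the F reference distribution.
    p = 2 * min(stats.f.cdf(w, df1, df2), stats.f.sf(w, df1, df2))
    return w, p

# Example: compare alpha for 2 vs. 9 categories at rij = 0.5, N = 200 (Table 2).
w, p = feldt_test(0.936, 200, 0.964, 200)
```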
On observing the behavior of the percentage of variance explained by the first factor (Table 4), we find that it tends to fall as the number of alternatives decreases. It is worthy of note that once more there is a reversal of this trend in the samples of 500 cases with two response options: in all the situations studied the percentage of explained variance falls as the number of response alternatives decreases, but in the samples with 500 cases the minimum is always obtained with three alternatives, not with two.

If we consider the variance explained by the first factor as a function of the number of alternatives (Figure 2), regardless of the correlation between the variables, we find that the percentage decreases as the number of response options falls, even though there is a reversal of this trend in the sample of 500 cases when there are just two response alternatives. This change in trend already begins to appear in the sample of 200 cases.

Figure 2. Percentage of variance explained by the first factor.

Discussion and Conclusions

The results presented show that the reliability of Likert-type scales, assessed using the α coefficient, decreases as the number of response options is reduced. When we reduce the number of choices we also reduce the variability of the scale, and so the reliability decreases. This trend holds for the different sample sizes used and for the different sizes of correlations between the items of the scale. This general rule has an interesting exception, and one with clear applied implications: The reliability of scales with two categories is slightly higher than that obtained with three categories. This appears more clearly when the samples are large, especially those with 500 cases. The explanation for this intriguing anomaly lies in the behavior of the variability when the scales have three response categories. In this case the variability decreases, since the majority of cases are concentrated in the central category; this reduction affects all the statistics, including the reliability coefficient, which, as is well known, decreases as the variability of the sample in which it is calculated decreases. An applied corollary of this result is the inadvisability of using three response categories in Likert-type scales, though it is not uncommon to find such scales in practice.

As regards factorial validity, assessed through the percentage of variance explained by the first factor, we found that it decreases as the number of response options is reduced, regardless of the correlations between the items and sample size. In the large samples (200 and 500 cases) we find the same effect as for reliability in the case of scales with three response alternatives. The explanation is the same as that given for reliability, namely, that variability falls in the case of three response categories.


It should be noted that the fit to the proposed unidimensional model (the data were simulated in accordance with a one-dimensional structure) improves as the number of response options of the scale increases. When sample size and number of response options are kept constant, so that only the correlations between the items are modified, it can be seen that as these fall, both the value of the reliability coefficient and the percentage of variance explained by the first factor decrease, as expected. The highest variability corresponds to the sample with 50 cases, followed by those with 100, 200, and 500, respectively. This is an instrumental artifact of the data simulation technique employed. Although the program used for the simulation (PRELIS) theoretically simulates data between minus infinity and plus infinity, in practice it imposes limits and simulates between approximately +6 and −6. Given these limits, regardless of sample size, a sample of 50 cases generates a greater dispersion than a simulation with a sample of 100 cases, whose dispersion will in turn be greater than with one of 200 cases, and so on. This effect means that in the sample of 50 cases we systematically obtain the highest dispersions in all the analyses carried out.

Our results permit the conclusion that, considering criteria of reliability and validity, the minimum number of response categories for items with Likert-type format should be at least four. As regards the ideal number, the data indicate that from seven categories onwards the gains are scarce from a psychometric point of view, suggesting the use of between four and seven. In practice, in determining the maximum number of response alternatives, it is advisable to complement the psychometric criterion with consideration of the particular characteristics of the sample in question. Thus, it should be ensured that the number of response options does not exceed the discriminative capacity of the subjects. If too many alternatives are offered and the subjects have problems discriminating between them, new measurement errors are more likely to be introduced. Given that the poorest results are obtained with two or three response alternatives, from a psychometric perspective it is advisable for questionnaires to avoid such formats. In addition to the psychometric reasons, it should be borne in mind that respondents prefer formats with a larger number of response alternatives, as this permits them to express their point of view more clearly (Muñiz, García-Cueto, & Lozano, 2005).

Acknowledgments

This work was financed by project IB205–027 from the Regional Government of Asturias, and by project SEJ2005–08924 from the Spanish Government.


References

Aiken, L.R. (1983). Number of response categories and statistics on a teacher rating scale. Educational and Psychological Measurement, 43, 397–401.
Alsawalmeh, Y.M., & Feldt, L.S. (1999). Testing the equality of two independent α coefficients adjusted by the Spearman-Brown formula. Applied Psychological Measurement, 23, 363–370.
Babakus, E., Ferguson, C.E., & Jöreskog, K.G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of Marketing Research, 24, 222–228.
Bandalos, D.L., & Enders, C.K. (1996). The effect of nonnormality and number of response categories on reliability. Applied Measurement in Education, 9, 151–160.
Bernstein, I.H., & Teng, H. (1989). Factoring items and factoring scales are different: Spurious evidence for multidimensionality due to item categorization. Psychological Bulletin, 105, 467–477.
Boote, A.S. (1981). Reliability testing of psychographic scales: Five-point or seven-point? Anchored or labeled? Journal of Advertising Research, 21, 53–60.
Brown, G., Widing, R.E., & Coulter, R.L. (1991). Customer evaluation of retail salespeople utilizing the SOCO scale: A replication, extension, and application. Journal of the Academy of Marketing Science, 19, 347–351.
Cicchetti, D.V., Showalter, D., & Tyrer, P.J. (1985). The effect of number of rating scale categories on levels of interrater reliability: A Monte Carlo investigation. Applied Psychological Measurement, 9, 31–36.
Comrey, A.L., & Montag, I. (1982). Comparison of factor analytic results with two-choice and seven-choice personality item formats. Applied Psychological Measurement, 6, 285–289.
DiStefano, C. (2002). The impact of categorization with confirmatory factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 9, 327–346.
Dolan, C.V. (1994). Factor analysis of variables with 2, 3, 5 and 7 response categories: A comparison of categorical variable estimators using simulated data. British Journal of Mathematical and Statistical Psychology, 47, 309–326.
Feldt, L.S., & Kim, S. (2006). Testing the difference between two α coefficients with small samples of subjects and raters. Educational and Psychological Measurement, 66, 589–600.
Ferrando, P.J. (1995). Equivalencia entre los formatos Likert y continuo en ítems de personalidad: Un estudio empírico [Equivalence between Likert and continuous formats in personality items: An empirical study]. Psicológica, 16, 417–428.
Ferrando, P.J. (2000). Testing the equivalence among different item response formats in personality measurement: A structural equation modeling approach. Structural Equation Modeling, 7, 271–286.
Green, S.B., Akey, T.M., Fleming, K.K., Hershberger, S.L., & Marquis, J.G. (1997). Effect of the number of scale points on χ² fit indices in confirmatory factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 4, 108–120.
Hakstian, A.R., & Whalen, T.E. (1976). A k-sample significance test for independent α coefficients. Psychometrika, 41, 219–231.
Hernández, A., Muñiz, J., & García-Cueto, E. (2000). Comportamiento del modelo de respuesta graduada en función del número de categorías de la escala [Behavior of the graded response model as a function of the number of response categories of the scale]. Psicothema, 12(Suppl. 2), 288–291.
Hutchinson, S.R., & Olmos, A. (1998). Behavior of descriptive fit indexes in confirmatory factor analysis using ordered categorical data. Structural Equation Modeling: A Multidisciplinary Journal, 5, 344–364.
Jenkins, C.D., & Taber, T.A. (1977). A Monte Carlo study of factors affecting three indices of composite scale reliability. Journal of Applied Psychology, 62, 392–398.
Jöreskog, K.G., & Sörbom, D. (1993). PRELIS 2 user's reference guide. Chicago: Scientific Software.
King, L.A., King, D., & Klockars, A.J. (1983). Dichotomous and multipoint scales using bipolar adjectives. Applied Psychological Measurement, 7, 173–180.
Laveault, D., & Grégoire, J. (1997). Introduction aux théories des tests en sciences humaines [Introduction to test theories in the human sciences]. Bruxelles: De Boeck Université.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 44–53.
Lissitz, R.W., & Green, S.B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60, 10–13.
MacCallum, R.C., Zhang, S., Preacher, K.J., & Rucker, D.D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40.
McCallum, L.S., Keith, B.R., & Wiebe, D.T. (1988). Comparison of response formats for multidimensional health locus of control scales: Six levels versus two levels. Journal of Personality Assessment, 52, 732–736.
McKelvie, S.J. (1978). Graphic rating scales: How many categories? British Journal of Psychology, 69, 185–202.
Moreno, R., Martínez, R.J., & Muñiz, J. (2004). Directrices para la construcción de ítems de elección múltiple [Guidelines for the construction of multiple-choice items]. Psicothema, 16, 490–497.
Muñiz, J. (2003). Teoría clásica de los tests [Classical test theory]. Madrid: Pirámide.
Muñiz, J., García-Cueto, E., & Lozano, L.M. (2000). The influence of the number of categories of the items on the psychometric properties of the scale. Paper presented at the XXVII International Congress of Psychology, Stockholm, Sweden.
Muñiz, J., García-Cueto, E., & Lozano, L.M. (2005). Item format and the psychometric properties of the Eysenck Personality Questionnaire. Personality and Individual Differences, 38, 61–69.
Neumann, L. (1979). Effects of categorization on relationships in bivariate distributions and applications to rating scales. Dissertation Abstracts International, 40, 2262-B.
Nunnally, J.C. (1970). Psychometric theory. New York: McGraw-Hill.
Oswald, I., & Velicer, W.F. (1980). Item format and the structure of the EPI: A replication. Journal of Personality Assessment, 44, 283–288.
Ramsay, J.O. (1973). Effects of number of categories in rating scales on precision of estimation of scale values. Psychometrika, 38, 513–532.
Sancerni, M.D., Meliá, J.L., & González-Romá, V. (1990). Formato de respuesta, fiabilidad y validez en la medición de conflicto de rol [Response format, reliability, and validity in the measurement of role conflict]. Psicológica, 11, 167–175.
Tomás, J.M., & Oliver, A. (1998). Efectos de formato de respuesta y método de estimación en el análisis factorial confirmatorio [Effects of response format and estimation method in confirmatory factor analysis]. Psicothema, 10, 197–208.
Velicer, W.F., DiClemente, C.C., & Corriveau, D.P. (1984). Item format and the structure of the Personal Orientation Inventory. Applied Psychological Measurement, 8, 409–419.
Weng, L.J. (2004). Impact of the number of response categories and anchor labels on coefficient α and test-retest reliability. Educational and Psychological Measurement, 64, 956–972.

Luis M. Lozano
Facultad de Psicología
Universidad de Jaén
Campus Las Lagunillas
E-23071 Jaén
Spain
E-mail: lmlozano@ujaen.es