Neuendorf
For example, a student's thesis examined American students' exposure to foreign films
(Ying, 2009). Her survey questionnaire included a checklist roster of many international
movies, and respondents checked all they had seen. For example:
Please check the films that you've seen.
____A Very Long Engagement (France)
____About My Mother (Spain)
____Amelie (France)
____Amores Perros (Mexico)
____Antonia's Line (Holland)
____Asterix et Obelix: Mission Cleopatra (France)
____Au Revoir, Les Enfants (France)
____Autumn Tale (France)
____Avenue Montaigne (France)
____Babette's Feast (Denmark)
____Bad Education (Spain)
____Belle Epoque (Spain)
____Betty Blue (France)
The main scale created from these multiple indicators was a straight addition of the
number of foreign films seen (i.e., each item was coded as 0 = not checked, 1 = checked).
There was no reason to expect strong intercorrelations among the items. That is, just
because a respondent had seen Amelie, we would not expect them to have seen Bad
Education as well. The entire logic of this type of scale is one of counting activities,
thoughts, or the like.
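The scoring logic described above can be sketched in a few lines of Python. The film titles are drawn from the roster above, but the respondent data and variable names are hypothetical, invented purely for illustration:

```python
# Hypothetical sketch: scoring a checklist index like the foreign-film roster.
# Each item is coded 0 = not checked, 1 = checked; the scale score is a
# straight sum of the checked items, with no assumption of intercorrelation.

films = ["A Very Long Engagement", "Amelie", "Amores Perros", "Babette's Feast"]

# One respondent's (invented) checklist: the films they reported seeing.
checked = {"Amelie", "Amores Perros"}

# Code each item 0/1, then sum to get the index score.
item_codes = [1 if film in checked else 0 for film in films]
index_score = sum(item_codes)

print(item_codes)   # [0, 1, 1, 0]
print(index_score)  # 2
```

Because the score is a simple count, each checked film raises the index by one regardless of which other films were seen.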
One more example (Measurement, 2001) might be a sexism scale in which participants
were asked to report the extent to which they had experienced a number of different
sexist situations. "We would not necessarily expect the experience of one event to be
related to experiencing another event. In a case such as this, the [internal consistency]
reliability would be somewhat low, yet we may still want to sum the scores to give us an
indication of how many events they experienced" (p. 56).
portion of the construct has been measured repeatedly! As noted by Clark and Watson
(1995), "maximizing internal consistency almost invariably produces a scale that is quite
narrow in content; if the scale is narrower than the target construct, its validity is
compromised" (p. 316).
Additionally, highly redundant measures in a questionnaire or interview can frustrate the
respondent ("Didn't you just ask me that?"). This can clearly lead to poor measurement,
as respondents tune out or get angry at the researchers.
Criteria/Rules of Thumb
As Clark and Watson (1995) note, the issue of internal consistency reliability assessment
"is complicated by the fact that there are no longer any clear standards regarding what
level . . . is considered acceptable" for Cronbach's alpha (p. 315); past criteria have
ranged from .80 or .90 alpha coefficients, down to .60 or .70 alphas.
As noted above, some scholars find Cronbach's alpha to be too sensitive to the number of
measures/items, and prefer the use of the raw mean interitem correlation as a statistical
marker of internal consistency. For this, a rule of thumb is offered by Briggs and Cheek
(1986): "The optimal level of homogeneity occurs when the mean interitem correlation
is in the .2 to .4 range" (p. 114). Clark and Watson (1995) offer: "we recommend that the
average interitem correlation fall in the range of .15-.50 . . . if one is measuring a broad
higher order construct such as extraversion, a mean correlation as low as .15-.20 probably
is desirable; by contrast, for a valid measure of a narrower construct such as
talkativeness, a much higher mean intercorrelation (perhaps in the .40-.50 range) is needed"
(p. 316).
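Both statistics discussed above can be computed directly from an item-by-respondent score matrix. The sketch below uses only the Python standard library and entirely invented data; the function names and the five respondents' scores are hypothetical, not drawn from any study cited here:

```python
# Hypothetical sketch: Cronbach's alpha and the mean interitem correlation
# for a small set of items, each scored by the same respondents.
from itertools import combinations
from statistics import mean, pvariance

def pearson(x, y):
    # Pearson correlation between two equal-length score lists.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(items):
    # items: one score list per item, respondents in the same order.
    # alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

def mean_interitem_r(items):
    # Average Pearson correlation over all pairs of items.
    return mean(pearson(a, b) for a, b in combinations(items, 2))

# Three items, five respondents (invented Likert-type scores).
items = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 1, 5, 2, 1],
]
print(round(cronbach_alpha(items), 2))   # 0.96
print(round(mean_interitem_r(items), 2)) # 0.9
```

For these (deliberately redundant) items, the mean interitem correlation of about .90 falls well above the .2 to .4 range Briggs and Cheek suggest, illustrating a scale that is likely narrower than its target construct.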
References:
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural
equation perspective. Psychological Bulletin, 110, 305-314.
Briggs, S. R., & Cheek, J. M. (1986). The role of factor analysis in the evaluation of
personality scales. Journal of Personality, 54, 106-148.
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale
development. Psychological Assessment, 7, 309-319.
Measurement. (2001). Journal of Consumer Psychology, 10(1&2), 55-69.
Streiner, D. L. (2003). Being inconsistent about consistency: When coefficient alpha does
and doesn't matter. Journal of Personality Assessment, 80(3), 217-222.
Ying, L. (2009). ?????.