
Content-Analysis Research: An Examination of Applications with Directives for Improving Research Reliability and Objectivity

RICHARD H. KOLBE
MELISSA S. BURNETT*
This article provides an empirical review and synthesis of published studies that have used content-analysis methods. Harold Kassarjian's critical guidelines for content-analysis research were used to examine the methods employed in 128 studies. The guidelines were expanded by providing an empirical investigation of multiple dimensions of objectivity. Reliability issues were also assessed by examining factors central to the replication and interjudge coefficient calculations. The findings indicate a general need for improvement in the application of content-analysis methods. Suggestions for calculating reliability coefficients and for improving the objectivity and reliability of research are offered.

A major topic in consumer behavior research concerns consumer communications. The breadth of communications that exist in the consumer arena is extensive and involves media advertising, printed materials, and sundry verbal and nonverbal messages created by a host of sources. Included in the many methods used to study consumer communications is content analysis. Content analysis has become widely used for evaluating various communication forms relevant to consumer behavior scholars (Yale and Gilly 1988).

A catalyst for such research use was Kassarjian's (1977) Journal of Consumer Research article entitled "Content Analysis in Consumer Research." Undoubtedly, one reason that Kassarjian wrote this article was his observation that previous content analyses frequently did not meet accepted methodological standards. Consequently, Kassarjian offered directives for improving content analyses in the areas of objectivity, systematization, quantification, sampling, and reliability. While other sources describe accepted methods of content analysis,¹ the availability and specificity of Kassarjian's work make it a methodological benchmark for the field of consumer behavior and a guide for the current review.

The present study extends Kassarjian's criteria by operationalizing his stated directives for the purpose of examining the methods and applications of content analysis in consumer research since 1977. An empirical methodological review of this kind integrates and systematically critiques past research (Cooper 1989). Such a review also helps current and future content analysts to examine their methodological decisions closely. Given the increased use of other qualitative methods aligned with content analysis (e.g., protocol analysis, process analysis, integrative literature reviews), the relevance of this review and critique of research methods extends to a variety of consumer research activities.

THE ROLE OF CONTENT ANALYSIS IN CONSUMER BEHAVIOR RESEARCH
Content analysis is an observational research method
that is used to systematically evaluate the symbolic
content of all forms of recorded communications. These
communications can also be analyzed at many levels
(image, word, roles, etc.), thereby creating a realm of
research opportunities.

*Richard H. Kolbe is assistant professor of marketing at Washington State University, College of Business and Economics, Pullman,
WA 99164. Melissa S. Burnett is assistant professor of marketing at
Southwest Missouri State University, College of Business Administration, Springfield, MO 65804. The authors thank F. Robert Dwyer,
U. N. Umesh, and John Mowen for their helpful comments and encouragement on earlier versions of this manuscript, and Peter V.
Raven and Michelle M. McCann for their assistance in data collection.

¹These include Berelson (1952), Budd, Thorp, and Donohew (1967), Holsti (1969), Kerlinger (1986), Krippendorff (1980), Rosengren (1981), and Weber (1985).

© 1991 by JOURNAL OF CONSUMER RESEARCH, Inc. • Vol. 18 • September 1991
All rights reserved. 0093-5301/92/1802-0010$2.00


Content analysis offers a number of benefits to consumer researchers. First, content analysis allows for an
unobtrusive appraisal of communications. This unobtrusiveness is particularly valuable in situations in which
direct methods of inquiry might yield biased responses.
Second, content analysis can assess the effects of environmental variables (e.g., regulatory, economic, and
cultural) and source characteristics (attractiveness,
credibility, and likability) on message content, in addition to the effects (cognitive, affective, and behavioral)
of different kinds of message content on receiver responses. Knowledge of message-content effects and receiver responses is of considerable interest to consumer
researchers.
Third, content analysis provides an empirical starting
point for generating new research evidence about the
nature and effect of specific communications. For example, content analyses of female roles in advertising
have aided experimental studies of role effects on various audiences.
Further, content analysis has potential as a companion research method in multimethod studies (Brewer
and Hunter 1989). Multimethod research uses divergent
methods to enhance the validity of results by mitigating
method biases. For example, attitudinal self-report
measures could be compared with content analysis
findings. Here, content analysis could be used to classify
an individual's possessions to assess convergence between attitudes and actual behaviors in areas such as
materialism, social responsiveness, and gender stereotyping.
While the potential benefit of using content analysis
in consumer research seems extensive, some consideration also needs to be given to its inherent weaknesses.
For instance, this method is quite susceptible to the
effects of researcher biases, which, in turn, can affect
decisions made in the collection, analysis, and interpretation of data. Given that researchers wish to draw
inferential conclusions from data, the existence of these
biases can affect a study's contribution to knowledge.
Content analyses are constrained in their potential
in that they often are limited to reporting specific elements in communications. This type of exploratory approach makes it difficult to consider theoretical perspectives. In addition, content analyses often yield
categorical data. Although these data are rich in descriptive, classificatory, and identification powers, they
may be less sensitive to subtleties in communications
than are data obtained from higher-order scales or from
other research methods.
Given the potential methodological problems associated with content analysis, it is useful to assess the
nature of past applications. Using the directives that
Kassarjian offered for content-analysis research, the
current study investigated (1) whether researchers conducted and reported studies in accordance with the
critical method areas for content analysis, (2) whether

there were any trends in content-analysis procedures


used during the period of this study that reflected adherence to or rejection of Kassarjian's concepts, and
(3) whether method quality differed by topic area, publication year, publication outlet, media analyzed, and
whether Kassarjian (1977) was cited.

METHOD

Sample
Content-analysis articles were identified by a search of consumer behavior/marketing (Journal of Marketing, Journal of Marketing Research, Journal of Consumer Research), advertising (Journal of Advertising, Journal of Advertising Research), and communication (Journal of Broadcasting, Journal of Communication, Journalism Quarterly) journals, and conference proceedings (American Academy of Advertising, Association for Consumer Research, American Marketing Association) that were published between 1978 and mid-1989. Other studies were obtained from computerized reference searches,² a review of Communication Abstracts and journal indices, and systematic perusal of individual journals and proceedings to find articles not referenced elsewhere.
Studies that used content-analysis-related techniques
to evaluate written and verbal protocols and author
productivity issues were excluded. Research that did
not use content analysis as the primary method was
excluded, since these studies were likely to omit detailed
methodological information.
The resulting sample contained 128 articles that were
obtained from 28 journals, three proceedings, and one
anthology. The content of articles ranged from research
on ad information content to materialism themes and
gender roles to ACR presidential addresses.³ The diversity of journal articles not only indicates the extent
of consumer-related content-analysis research, but also
the breadth of outlets used by marketing and consumer
behavior scholars. Many articles from nonmainstream
journals were retained because of the high incidence of
marketing-affiliated authorship.
The major sources for content-analysis studies were

Journal of Advertising (n = 15), Journalism Quarterly


(n = 15), Advances in Consumer Research (n = 11),
AMA Proceedings (n = 9), and Journal of Consumer
Research (n = 9). The content analyses primarily evaluated magazine (n = 56) and television (n = 39) ads,
which together represented nearly three-quarters
(74.2 percent) of the sample studies. No other sampling element received more than five applications. The
²Data bases searched were Management Contents, ABI/Inform, Sociological Abstracts, and Psychinfo (key words were "content analysis" and "advertising").
³A bibliography and detailed summary table of the reviewed articles are available from R.H.K.


distribution of sampled articles by year is shown in Table 1.
While the emphasis of the research in the sample was on advertising, the largest number of authors had marketing-department affiliations (41.8 percent). Only 12.2 percent of the authors were from communications, advertising, and journalism schools or departments combined. Numerous other affiliations accounted for the remainder of the authorship, including psychology (14.3 percent), sociology (12.2 percent), and industry (6.6 percent).

Coding Issues
The operational definitions and categories (objectivity, systematization, quantification,⁴ sampling, reliability) used to code the studies are detailed in the Results and Discussion section of this article.⁵
The authors independently analyzed all of the articles. Disagreements in codings were resolved by discussing key terms and jointly reviewing the articles until a consensus was reached. Two trained judges, working independently, analyzed all the articles to provide reliability and accuracy checks of the authors' codings. Perreault and Leigh's (1989) method was used to calculate interjudge reliabilities, as reported in Table 2.

RESULTS AND DISCUSSION

Objectivity
Objectivity refers to the process by which analytical categories are developed and used. Precise operational definitions and detailed rules and procedures for coding are needed to facilitate an accurate and reliable coding process. Detailed rules and procedures reduce judges' subjective biases and allow replication by others (a check on researcher biases). Using multiple, trained, independent judges also enhances objectivity.
Objectivity was measured by whether (1) rules and procedures were reported, (2) judge training was reported, (3) pretesting of measures was reported, (4) judges were independent of the authors, and (5) judges worked independent of one another. The independence of judges in measure four means that the coder and author were not one and the same person. While more stringent standards of independence could be offered, the use of coders other than the authors is a primary and measurable component of objectivity. The number of judges used in the study was also coded.
⁴Kassarjian (1977) primarily discusses the debate over what constitutes quantification. Thus, only one measure, "the highest level of data collected in the study," was used to assess quantification, with the following results: 112 studies had categorical, three ordinal, 11 interval, and two ratio data.
⁵Copies of the coding sheet and operational definitions are available from R.H.K.

TABLE 1
DISTRIBUTION OF SAMPLED ARTICLES BY YEAR

Year                  No. of articles    Percentage of sample
1978                         2                   1.6
1979                         6                   4.7
1980                         7                   5.5
1981                         7                   5.5
1982                         6                   4.7
1983                        18                  14.1
1984                        13                  10.2
1985                        12                   9.4
1986                        20                  15.6
1987                        18                  14.1
1988                        15                  11.7
First half of 1989           4                   3.1

Rules and Procedures. Descriptions of rules and


procedures are necessary for the validation of research
findings and future replication. About 71 percent (n
= 91) of the articles provided details of categories and
operational definitions. Another 20 articles (15.6 percent) cited previous research as the source for rules and
procedures (most were extensions of the referenced
work). Seventeen of the studies (13.3 percent) failed to
provide details of rules and procedures. These results
suggest that authors have done a reasonable job in reporting categories and definitions. Although replication
may not be executable in practice, the presence of categories and coding rules is an essential first step for replication.
Judge Training. Training judges is important to
objectivity because it increases the coders' familiarity
with the coding scheme and operational definitions,
thereby improving interjudge and intrajudge coding reliability. Judge training was reported in 52 studies (40.6
percent). Over 48 percent of the studies did not report
any details of judge training (n = 62). Studies in which
the authors served as judges (n = 14; 10.9 percent) were
classified as "no training" studies. It would be difficult
to envision coders not receiving some form of training.
Thus, the finding that only 40.6 percent of the studies
had training may simply reflect a failure to report procedures.
Measure Pretesting. Pretesting categories and definitions checks the reliability of the coding process.
Ninety studies (70.3 percent) did not report measure
pretesting in judge training sessions or elsewhere.
Thirty-eight articles (29.7 percent) did report pretesting.
In general, the absence of references to judge training
or measure pretesting indicates a reporting gap and,
perhaps, weaknesses in research execution.
Judge Independence. The independence of judges
assesses their freedom to make autonomous judgments
without inputs from the researcher(s) or other judges.

TABLE 2
CODING CATEGORIES AND INTERJUDGE RELIABILITIES

                                                    Interjudge reliabilitiesᵃ
Content-analysis dimensions                 Authors vs.    Authors vs.    Judge 1 vs.
                                              judge 1        judge 2        judge 2
Objectivity items:
  Were rules and procedures given?             .963           .877           .911
  Were judges trained?                         .937           .907           .853
  Was a pretest of unit measures conducted?    .951           .951           .951
  How many judges were used in coding?         .894           .884           .853
  Were independent judges used?                .972           .884           .831
  Did judges code stimuli independently?       .937           .884           .869
Systematization items:
  Method of inquiry                            .965           .853           .829
  Was a theoretical perspective examined?      .982           .944           .954
  Data collection orientation                  .993           .891           .877
Sampling methods:
  Sampling method                              .968           .923           .929
  Sample size                                  .944           .934           .924
  Effective sample size                        .982           .982           .982
  Time span of sampling period                 .948           .920           .880
Reliability:
  Reliability coefficient                      .969           .916           .937
  Method of reporting reliabilities            .963           .884           .893
Other variables:
  Year of publication                         1.000          1.000          1.000
  Topical issues                               .967           .903           .921
  Subcategories of issues                      .967           .876           .886
  Sampling element                             .979           .979           .979
  Media used for data collection               .982           .982           .982
  Type of data                                 .969           .963           .957
  Publication reference                       1.000          1.000          1.000
  Kassarjian (1977) cited in article          1.000          1.000          1.000

ᵃInterjudge reliabilities by Perreault and Leigh's (1989) method.

Sixty-two studies (48.4 percent) clearly used independent judges. Many authors referred to judges as "independent," although the meaning of the term may differ from the definition used here. Judge independence
could not be determined in 52 studies (40.6 percent).
In 14 studies (10.9 percent) the authors served as judges;
thus, independence was not possible.
About 33 percent of the studies reported individual
viewing of the stimuli (n = 43). Ten studies (7.8 percent)
had judges work in pairs or group settings. The remainder of the studies had either a single judge (n = 2)
or did not report judge independence (n = 73; 57.0 percent).
Number of Judges. As shown in the tabulation below, two coders were most
frequently used. This finding is subject to some caveats.
The coding scheme counted all judges, including those
who judged only a subset of the sample. Our coding
indicates the total number of judges involved, which
may overstate involvement and multiple judgeships.
Further, in 10 of the 49 two-judge studies, the authors
served as coders. Consequently, the objectivity of the

studies, based on the number of coders and judge independence, is suspect.

                             Frequency    Percentage
Single judge                      2           1.6
Two judges                       49          38.3
Three or more judges             39          30.4
Not reported in article          38          29.7

Objectivity Index. The multiple measures used to examine objectivity point to the need for an overall assessment.⁶ The five objectivity issues, excluding the number of judges, were dichotomized (1 = yes; 0 = no or not reported) and summed for each study. Thus, the range for this scale was 0-5, where 5 represented adherence to all objectivity items. The objectivity index was measured by variables for the year of publication, topical issues, media analyzed, publication outlet, and whether Kassarjian (1977) was referenced.

⁶These measures include rules and procedures, measure pretesting, judge training, judge independence, and judge coding autonomy.


Analyses of variance were computed for the five independent variables. The only significant difference was found on the dimension of whether Kassarjian was cited (F(1,126) = 13.51, p < .001). Studies that cited Kassarjian had higher levels of objectivity (X̄ = 2.96) than those that did not (X̄ = 2.06), providing evidence of the contribution of Kassarjian's article and its positive effect on methodological rigor.
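For illustration, the index construction can be sketched in a few lines of code. The records below are hypothetical (the study's raw codings are not reproduced here); the sketch simply dichotomizes the five objectivity items, sums them to a 0-5 score per article, and compares mean scores for articles that do and do not cite Kassarjian (1977).

```python
# Sketch of the objectivity index: five dichotomous items (1 = yes/reported,
# 0 = no or not reported) summed to a 0-5 score per article.
# The article records are hypothetical and only illustrate the computation.

ITEMS = ["rules_procedures", "judge_training", "pretest",
         "independent_judges", "independent_coding"]

articles = [
    {"cites_kassarjian": True,  "rules_procedures": 1, "judge_training": 1,
     "pretest": 0, "independent_judges": 1, "independent_coding": 1},
    {"cites_kassarjian": False, "rules_procedures": 1, "judge_training": 0,
     "pretest": 0, "independent_judges": 0, "independent_coding": 1},
]

def objectivity_index(article):
    """Sum of the five dichotomized objectivity items (range 0-5)."""
    return sum(article[item] for item in ITEMS)

def mean_index(records):
    return sum(objectivity_index(a) for a in records) / len(records)

citing = [a for a in articles if a["cites_kassarjian"]]
not_citing = [a for a in articles if not a["cites_kassarjian"]]
print("Mean objectivity index, citing Kassarjian:", mean_index(citing))
print("Mean objectivity index, not citing:       ", mean_index(not_citing))
```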
Objectivity Overview. Objectivity is a fundamental
component of content analysis because it encompasses
details that directly affect the overall quality of the
judging process. Nevertheless, the majority of past content-analysis research either has failed to perform or
report measurements of objectivity or has not made
available the objectivity-related procedures that were
followed. While the absence of such detailed reporting
does not necessarily mean that these steps were omitted,
concern still exists for judging precision and the capacity
to adequately replicate and extend past studies. Although objectivity procedures require greater time for
the analyst and space in journals, such action is critical.

Systematization
Kassarjian (1977) states (citing Berelson [1952] and
Holsti [1969]) that systematization requires research
procedures to (1) eliminate biased selection of communications or classification categories to suit the analyst's thesis and (2) examine scientific problems or hypotheses.
Appraisal of the first requirement demands expertise in all applications of content analysis. Because such a background cannot reasonably be assumed, we focus on hypothesis and theory testing and on research designs.
Hypothesis Testing. Articles with formal statements
of predicted relationships between two variables (Kerlinger 1986), supported by research or theory, were
coded as having hypotheses. Statements in the form of
questions, suppositions, and general predictions without
the specificity of a hypothesis were classified as research
questions. Approximately 48 percent of the studies (n
= 62) stated research questions as the basis for empirical
investigation. Sixteen studies (12.5 percent) tested hypotheses. Thirty-nine percent (n = 50) had neither hypotheses nor research questions.
Theory Testing. Theory testing or theoretical perspectives were observed infrequently in the studies (n
= 7; 5.5 percent). While minimally present, content
analysis does have a role in theory-testing research. A
discussion of this role is presented in the overview of
systematization.
Data Collection Designs. Systematization was also examined by the three data collection designs offered by Holsti (1969).⁷ The first design describes the characteristics of communications. These studies address the "who," "what," and "to whom" questions of the communication process. This design was used in 96 of the reviewed articles (75.0 percent). The second design makes inferences about the antecedents of communications. Here the analyst ascertains who the source was (when unknown) and why the communication was encoded. This design was used in 22 studies (17.2 percent).
Finally, the third design makes inferences about the
effects of the communication. This level of data collection attempts to infer the receiver's decoding process.
Comparisons between the sender's and the receiver's
message content, or behavioral response, facilitates inference generation. Ten studies (7.8 percent) were
judged as making inferences about communications effects.
Differences in the use of these three orientations were
assessed by chi-square tests. No significant differences
in the designs were found by publication year, topic,
media analyzed, publication outlet, or citing of Kassarjian (1977).
Systematization Overview. More hypothesis-testing
research would enhance the perceived value of content
analysis to consumer research. The number of descriptive designs was not unexpected and relates to Holbrook's (1977) assessment of their prevalence in the
field vis-à-vis the other designs. Conversely, environmental-influence and communication-effect studies
represent largely untapped research approaches.
It is myopic to suggest that atheoretical content-analysis studies have no value. Kassarjian (1977, p. 9)
states that data need only to be "linked" by "some form
of theory." He also implies that formal theory does not
need to be present, but that the investigator's "theory"
should be represented in models, research questions,
scientific problems, or basic trend analysis, thereby giving the systematization needed for a meaningful contribution. Since the linkage between content analysis
and theory testing is not clear, we offer some structure
to this issue. Borrowing from Lijphart's (1971) categorizations for case study research, we propose five roles
for content analysis in theory development.
First, content analysis is valuable in collecting data
about communications when there are no theoretical
underpinnings. Such atheoretical content analyses are
useful in fostering future research and theory-building
efforts because they collect information about a communication form.
Second, when scholars use a theoretical perspective
as the basis for collecting data, without intending to
make generalizations to a larger population (i.e., they
are merely attempting to describe or explain data), such

⁷A consumer behavior perspective is offered by Holbrook (1977).



TABLE 3
RANGES OF SAMPLE SIZES

Effective sample size      No. of articles    Percentage of sample
1-200                            33                  25.8
201-300                          15                  11.7
301-600                          22                  17.2
601-1,000                        20                  15.7
1,001-2,000                      15                  11.7
Greater than 2,000               15                  11.7
Multiple samples used             5                   3.9
Not discernible                   3                   2.3

research would be called "interpretative content analyses."


Third, in cases where researchers make inconclusive
predictions about a phenomenon, the exploratory value
of content analysis may be used to provide evidence for
specific hypotheses. Such studies would be called "hypothesis-generating content analyses."
Fourth, theory-confirming content analyses would
examine what is predicted by established theories,
thereby confirming or invalidating the theoretical position. These content analyses appraise the presence of
the predicted content.
Last, deviant-results content analyses would examine
those stimuli that fail to comply with the balance of the
sample. Such examination may help to explain unexpected variations.
The proposed categories reflect the supportive role
that content analysis has in theory development. The
largest contribution that this method can make is to
embellish, augment, accumulate, and describe information. The need for systematic study and information
acquisition, part of the initial steps in theory development, can be readily provided by content-analysis research.

Sampling Methods
Sampling addresses the issues of randomization,
manageability of sample size, and generalizability.
Randomization and generalizability were assessed by
examining sampling methods. Sample size was used to
evaluate the manageability issue.
The majority of samples were classified as convenience (n = 103; 80.5 percent). Probability samples (e.g., simple random, multistage, systematic, proportional, stratified) were obtained in 18.8 percent of the studies.⁸ Significant differences were found for sample types on the basis of publication outlet (χ² = 23.82, p < .001). Random samples were found more frequently in the Journal of Consumer Research and the Journal of Marketing than in other publications.

⁸In one study the sampling method could not be determined.

TABLE 4
TYPES OF RELIABILITY INDICES USED

Reliability coefficient                      No. of articles    Percentage of sample
Coefficient of agreement                           41                  32.0
Krippendorff alpha                                  9                   7.0
Holsti                                              4                   3.1
Others (three or fewer uses)                       10                   7.8
Method of calculating not discernible              24                  18.8
No reliability coefficient reported                40                  31.3

Given the additional time, effort, and resources necessary to obtain random samples, this finding is not surprising.
The range of sample sizes obtained in the studies is
shown in Table 3. As shown, the majority of samples
had 600 or fewer units. In television studies, half of the
samples had 280 or fewer units, with the top third of
the samples exceeding 367 units. Conversely, the bottom 50 percent of magazine samples had 660 or fewer
units, and the top third contained 1,300 or more units.
The differences by media likely reflect the relative ease
of sample collection for magazines as compared with
television.

Reliability
Reliability in content analysis includes categorical
and interjudge reliabilities. The limited information
provided in methodology sections of other articles has
hindered evaluation of categorical reliability. Thus, this
discussion centers on interjudge reliability.
Interjudge reliabilities are largely influenced by the
procedural issues that we previously addressed in the
Objectivity section. However, two important issues remain: the calculation and reporting of reliabilities.
Interjudge reliability is often perceived as the standard measure of research quality. High levels of disagreement among judges suggest weaknesses in research
methods, including the possibility of poor operational
definitions, categories, and judge training.
Reliability Index Use. The most frequently used
reliability index was the coefficient of agreement (the
total number of agreements divided by the total number
of coding decisions; see Table 4). Often authors referred
to interjudge agreement or interjudge reliability without
specifying the calculation method. However, most
noteworthy and troublesome is the finding that over 30
percent of the articles did not report any reliability coefficient. The absence of reliability figures does not allow
a thorough appraisal of the analyst's work and raises
questions about the credibility of the findings.
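For reference, the coefficient of agreement noted above is simply the proportion of coding decisions on which two judges assign the same category; a minimal sketch, with hypothetical codings:

```python
# Coefficient of agreement: total agreements divided by total coding decisions.
# The two judges' category assignments are hypothetical.

judge1 = ["A", "B", "A", "C", "B", "A", "A", "C"]
judge2 = ["A", "B", "B", "C", "B", "A", "C", "C"]

def coefficient_of_agreement(codes1, codes2):
    """Proportion of units that both judges assign to the same category."""
    assert len(codes1) == len(codes2)
    agreements = sum(c1 == c2 for c1, c2 in zip(codes1, codes2))
    return agreements / len(codes1)

print(round(coefficient_of_agreement(judge1, judge2), 3))  # 6 of 8 units agree: 0.75
```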
Reliability Index Reporting. Forty-six of the articles
(35.9 percent) reported one "overall reliability" for the


study. Thirty-one studies (24.2 percent) reported reliabilities on individual measures, and 11 (8.6 percent)
reported ranges of reliabilities. These findings are of
particular concern since the "overall reliability approach" can yield misleading results. While agreement
may be high in the aggregate, low ratings on individual
measures may be hidden by pooled results. Consequently, ranges and individual reliabilities, in particular,
are superior reporting methods.
An analysis of reporting procedures by the major dependent measures (publication year, topical issue, and
so on) found only one significant chi-square calculation.
The reporting procedures differed when Kassarjian
(1977) was cited (χ² = 20.48, p < .001). Studies that
did not cite Kassarjian (1977) reported more incidents
of "no reliabilities" than did those that cited the article.
Kassarjian's discussion of reporting requirements may
have encouraged detailed reliability reporting.
Reliability Overview. Reliance on the coefficient-of-agreement index suggests that attention needs to be
directed toward reliability calculation issues, given the
potential for biased scores (Scott 1955). One weakness
of the coefficient of agreement is the impact of the
number of coding decisions on the reliability score. As
the number of categories decreases, the probability of
interjudge agreement by chance increases. For example,
one would expect greater agreement with only two categories than with five categories because of the higher
probability of chance agreements.
Agreement also can be inflated by adding categories
that seldom are present in a communication. When one
calculates reliabilities including these categories, the
"agreements" on "nonrelevant" categories compensate
for disagreements on other classification categories.
While this applies to other indices of reliability, the
coefficient of agreement is especially subject to these
two problems. These factors limit the establishment of
standards with which to compare calculated reliabilities;
they also hinder comparisons across content studies
(Perreault and Leigh 1989).
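These two problems can be made concrete with a small numerical illustration (the codings below are hypothetical): with k equally likely categories, two judges coding at random agree with probability 1/k, and adding a category that is almost never present inflates observed agreement because both judges routinely "agree" that it is absent.

```python
# Expected chance agreement for two judges assigning k equally likely
# categories is 1/k, so it rises as the number of categories falls.
for k in (2, 5):
    print(f"k = {k}: chance agreement = {1 / k:.2f}")  # 0.50 vs. 0.20

# Hypothetical codings of a rarely present attribute: both judges mark it
# "absent" on 18 of 20 units, so observed agreement is .95 even though the
# judges disagree on half of the units where the attribute appears.
judge1 = ["absent"] * 18 + ["present", "present"]
judge2 = ["absent"] * 18 + ["present", "absent"]
agreement = sum(a == b for a, b in zip(judge1, judge2)) / len(judge1)
print(f"Observed agreement: {agreement:.2f}")
```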
Cohen's kappa (Cohen 1960), which has received extensive use in judgment-based coding procedures, was
used in only three of the studies examined. Kappa was
developed to remove the impact of chance agreements
among judges.⁹ However, the use of kappa is difficult
in content analysis because a key value, the number of
chance agreements for any particular category, is generally unknown. For nominal data, Cohen's kappa and
its derivations (Brennan and Prediger 1981; Umesh,
Peterson, and Sauber 1989) are viable methods for calculating reliabilities when the number of chance agreements is known or can be reasonably approximated.
⁹Cohen (1960) calculates kappa (κ) as follows: κ = (F_o − F_c)/(N − F_c), where N is the total number of judgments made by each judge, F_o is the number of judgments on which the judges agree, and F_c is the number of judgments for which agreement is expected by chance.
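A minimal sketch of the kappa calculation in footnote 9 follows. The codings are hypothetical, and F_c is estimated here from the judges' marginal category proportions (a common convention; the footnote itself leaves the estimation of chance agreement open).

```python
from collections import Counter

# Cohen's kappa per footnote 9: kappa = (F_o - F_c) / (N - F_c), where N is
# the number of judgments, F_o the number of agreements, and F_c the number
# of agreements expected by chance (estimated from the judges' marginals).

def cohens_kappa(codes1, codes2):
    n = len(codes1)
    f_o = sum(c1 == c2 for c1, c2 in zip(codes1, codes2))
    m1, m2 = Counter(codes1), Counter(codes2)
    f_c = sum(m1[c] * m2[c] for c in set(codes1) | set(codes2)) / n
    return (f_o - f_c) / (n - f_c)

judge1 = ["A", "B", "A", "C", "B", "A", "A", "C"]  # hypothetical codings
judge2 = ["A", "B", "B", "C", "B", "A", "C", "C"]
print(round(cohens_kappa(judge1, judge2), 3))  # 0.636 for these data
```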

Perreault and Leigh (1989) propose a reliability index for use in judgment-based research.¹⁰ This index accounts for differences in reliabilities due to the number of categories, focuses the issue of reliability on the whole coding process, and is sensitive to coding weaknesses.
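A minimal sketch of this index, implementing the formula given in the authors' footnote 10, I_r = {[(F_o/N) − (1/k)][k/(k − 1)]}^.5 for F_o/N > 1/k; the codings are hypothetical, and k is taken from the coding scheme rather than from the observed data.

```python
import math

# Perreault and Leigh (1989) index per footnote 10:
# I_r = sqrt([(F_o / N) - (1 / k)] * [k / (k - 1)]), defined for F_o / N > 1/k,
# where F_o is the observed frequency of agreement, N the total number of
# judgments, and k the number of coding categories.

def perreault_leigh(codes1, codes2, k):
    n = len(codes1)
    f_o = sum(c1 == c2 for c1, c2 in zip(codes1, codes2))
    p = f_o / n
    if p <= 1 / k:        # agreement at or below chance level; treated as 0 here
        return 0.0
    return math.sqrt((p - 1 / k) * (k / (k - 1)))

judge1 = ["A", "B", "A", "C", "B", "A", "A", "C"]  # hypothetical codings
judge2 = ["A", "B", "B", "C", "B", "A", "C", "C"]
print(round(perreault_leigh(judge1, judge2, k=3), 3))  # ≈ 0.791 for these data
```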
Whether researchers use Perreault and Leigh's index of reliability, kappa (or its derivations), the analysis of variance estimate of reliability for interval-based judgments (Winer 1971), Scott's pi (Scott 1955), or another reliability coefficient (see Hughes and Garrett 1990; Suen and Ary 1989), consideration still needs to be given to the merits of the selected reliability index. Acceptance of the coefficient of agreement as the index of choice does little to reflect the reliability needed in content analysis.
The absence of many objectivity criteria in the studies
that we examined demonstrates the need to attend to
the process by which reliability occurs. Reliability does
not occur simply because the agreement coefficient exceeds .80. The use of appropriate reliability calculations
has obvious importance. However, if judges are making
consonant, but incorrect, judgments about the stimuli,
little is accomplished. Research will be improved more
by focusing on operational procedures that increase interjudge reliability than by quibbling about the quality
of the agreement index.

CONCLUSION
This methodological review of content-analysis research identified weaknesses in the methods used and
reported. Of primary interest in this study was adherence to Kassarjian's (1977) standards. The results indicate that there are a number of gaps in the methods
and procedures used by analysts in the areas of objectivity and reliability. Since these two areas are at the
heart of content-analysis research and directly affect research quality, the seriousness of the problem becomes
evident.
Most factors pertaining to objectivity were either unreported or unattended by authors. Problems with reliability reporting and coefficient selection were also
present. When the reliability of content-analysis research is in question, either because of an inability to
replicate the study or ineffectual or unreliable coding,
the value of the research is minimized.
It is interesting that there was no significant change
in the procedures used in content analysis over the
sampling time span. The only variable that accounted for differences in overall objectivity and reliability reporting measures was whether Kassarjian's article was cited. These findings empirically demonstrate the contribution of Kassarjian's article and the need to fully consider his guidelines for content-analysis research.

¹⁰The Perreault and Leigh (1989) reliability index is as follows: I_r = {[(F_o/N) − (1/k)][k/(k − 1)]}^.5, for F_o/N > 1/k, where F_o is the observed frequency of agreement between judges, N is the total number of judgments, and k is the number of categories.


It is important to note that the results of the current
study are limited to the details of the procedures and
methods that were reported by authors. Consequently,
while researchers may have followed procedures that
would have increased levels of objectivity, reliability,
and so on, these activities could not be measured by
the procedures employed here because the methodological details were not reported in the original studies.
Thus, one must be cautious in criticizing content-analysis practitioners. Nevertheless, this result
suggests that methodology reporting is critical for discerning the quality and usefulness of content-analysis
studies as well as for allowing replication.
We also need to keep in mind that the criteria used
in this study are idealistic benchmarks of how to execute
a study. Clearly, research decisions are constrained by
various factors. The question this study raises is: To
what level and degree should these criteria be adopted
to assure minimum standards of validity? While we do
not wish to devalue research standards for the sake of
research parsimony, there would seem to be some middle ground between methodological ideals and practical
research decisions. It is recommended that researchers
emphasize objectivity and reliability issues as discussed
here. It has been demonstrated that the quality of content analyses in consumer research will improve as a
result.
Given the myriad research techniques available today, one might question whether content-analysis research has become an outdated technique of limited
value. Quite the opposite would seem to be true. Content analysis is an important and (re-)emerging method
for facilitating many other types of analyses. Potential
contributions also exist in the role that content analysis
can play in theory development. In summary, if attention is directed toward the problems associated with the
content-analysis methods identified here, this research
technique can offer a substantial contribution to consumer behavior research activities.
[Received February 1990. Revised February 1991.]

REFERENCES
Berelson, Bernard (1952), Content Analysis in Communication Research, Glencoe, IL: Free Press.
Brennan, Robert L. and Dale J. Prediger (1981), "Coefficient
Kappa: Some Uses, Misuses, and Alternatives," Educa-

tional and Psychological Measurement, 41 (Autumn),


687-699.
Brewer, John and Albert Hunter (1989), Multimethod Research. A Synthesis of Styles, Newbury Park, CA: Sage.
Budd, Richard W., Robert K. Thorp, and Lewis Donohew
(1967), Content Analysis of Communications, New York:
Macmillan.
Cohen, Jacob (1960), "A Coefficient of Agreement for Nominal Scales," Educational and Psychological Measurement, 20 (Spring), 37-46.
Cooper, Harris M. (1989), Integrating Research: A Guide for
Literature Reviews, Newbury Park, CA: Sage.
Holbrook, Morris B. (1977), "More on Content Analysis in
Consumer Research," Journal of Consumer Research, 4
(December), 176-177.
Holsti, Ole R. (1969), Content Analysis for the Social Sciences
and Humanities, Reading, MA: Addison-Wesley.
Hughes, Marie Adele and Dennis E. Garrett (1990), "Intercoder Reliability Estimation Approaches in Marketing:
A Generalizability Theory Framework for Quantitative
Data," Journal of Marketing Research, 27 (May), 185195.
Kassarjian, Harold H. (1977), "Content Analysis in Consumer
Research," Journal of Consumer Research, 4 (June), 818.
Kerlinger, Fred N. (1986), Foundations of Behavioral Research, New York: Holt, Rinehart & Winston.
Krippendorff, Klaus (1980), Content Analysis. An Introduction to Its Methodology, Beverly Hills, CA: Sage.
Lijphart, Arend (1971), "Comparative Politics and the Comparative Method," American Political Science Review,
65 (September), 682-693.
Perreault, William D., Jr. and Laurence E. Leigh (1989),
"Reliability of Nominal Data Based on Qualitative
Judgments," Journal of Marketing Research, 26 (May),
135-148.
Rosengren, Karl E. (1981), Advances in Content Analysis,
Beverly Hills, CA: Sage.
Scott, William A. (1955), "Reliability of Content Analysis:
The Case of Nominal Scale Coding," Public Opinion
Quarterly, 19 (Fall), 321-325.
Suen, Hoi K. and Donald Ary (1989), Analyzing Quantitative
Behavioral Observation Data, Hillsdale, NJ: Erlbaum.
Umesh, U. N., Robert A. Peterson, and Matthew H. Sauber
(1989), "Interjudge Agreement and the Maximum Value
of Kappa," Educational and Psychological Measurement,
49 (Winter), 835-850.
Weber, Robert P. (1985), Basic Content Analysis, Beverly
Hills, CA: Sage.
Winer, B. J. (1971), Statistical Principles in Experimental
Design, New York: McGraw-Hill.
Yale, Laura and Mary C. Gilly (1988), "Trends in Advertising
Research: A Look at the Content of Marketing-oriented
Journals from 1976 to 1985," Journal of Advertising, 17
(1), 12-22.
