Content-Analysis Research: An Examination of Applications with Directives for Improving Research Reliability and Objectivity

RICHARD H. KOLBE
MELISSA S. BURNETT*
This article provides an empirical review and synthesis of published studies that have used content-analysis methods. Harold Kassarjian's critical guidelines for content-analysis research were used to examine the methods employed in 128 studies. The guidelines were expanded by providing an empirical investigation of multiple dimensions of objectivity. Reliability issues were also assessed by examining factors central to the replication and interjudge coefficient calculations. The findings indicate a general need for improvement in the application of content-analysis methods. Suggestions for calculating reliability coefficients and for improving the objectivity and reliability of research are offered.
A major topic in consumer behavior research concerns consumer communications. The breadth of
communications that exist in the consumer arena is
extensive and involves media advertising, printed materials, and sundry verbal and nonverbal messages
created by a host of sources. Included in the many
methods used to study consumer communications is
content analysis. Content analysis has become widely
used for evaluating various communication forms relevant to consumer behavior scholars (Yale and Gilly
1988).
A catalyst for such research use was Kassarjian's
(1977) Journal of Consumer Research article entitled
"Content Analysis in Consumer Research." Undoubtedly, one reason that Kassarjian wrote this article was
his observation that previous content analyses frequently did not meet accepted methodological standards. Consequently, Kassarjian offered directives for
improving content analyses in the areas of objectivity,
systematization, quantification, sampling, and reliability. While other sources describe accepted methods of
*Richard H. Kolbe is assistant professor of marketing at Washington State University, College of Business and Economics, Pullman,
WA 99164. Melissa S. Burnett is assistant professor of marketing at
Southwest Missouri State University, College of Business Administration, Springfield, MO 65804. The authors thank F. Robert Dwyer,
U. N. Umesh, and John Mowen for their helpful comments and encouragement on earlier versions of this manuscript, and Peter V.
Raven and Michelle M. McCann for their assistance in data collection.
Content analysis offers a number of benefits to consumer researchers. First, content analysis allows for an
unobtrusive appraisal of communications. This unobtrusiveness is particularly valuable in situations in which
direct methods of inquiry might yield biased responses.
Second, content analysis can assess the effects of environmental variables (e.g., regulatory, economic, and
cultural) and source characteristics (attractiveness,
credibility, and likability) on message content, in addition to the effects (cognitive, affective, and behavioral)
of different kinds of message content on receiver responses. Knowledge of message-content effects and receiver responses is of considerable interest to consumer
researchers.
Third, content analysis provides an empirical starting
point for generating new research evidence about the
nature and effect of specific communications. For example, content analyses of female roles in advertising
have aided experimental studies of role effects on various audiences.
Further, content analysis has potential as a companion research method in multimethod studies (Brewer
and Hunter 1989). Multimethod research uses divergent
methods to enhance the validity of results by mitigating
method biases. For example, attitudinal self-report
measures could be compared with content analysis
findings. Here, content analysis could be used to classify
an individual's possessions to assess convergence between attitudes and actual behaviors in areas such as
materialism, social responsiveness, and gender stereotyping.
While the potential benefit of using content analysis
in consumer research seems extensive, some consideration also needs to be given to its inherent weaknesses.
For instance, this method is quite susceptible to the
effects of researcher biases, which, in turn, can affect
decisions made in the collection, analysis, and interpretation of data. Given that researchers wish to draw
inferential conclusions from data, the existence of these
biases can affect a study's contribution to knowledge.
Content analyses are constrained in their potential
in that they often are limited to reporting specific elements in communications. This type of exploratory approach makes it difficult to consider theoretical perspectives. In addition, content analyses often yield
categorical data. Although these data are rich in descriptive, classificatory, and identification powers, they
may be less sensitive to subtleties in communications
than are data obtained from higher-order scales or from
other research methods.
Given the potential methodological problems associated with content analysis, it is useful to assess the
nature of past applications. Using the directives that
Kassarjian offered for content-analysis research, the
current study investigated (1) whether researchers conducted and reported studies in accordance with the
critical method areas for content analysis, (2) whether
METHOD
Sample
Content-analysis articles were identified by a search
of consumer behavior/marketing (Journal of Market-
Coding Issues
The operational definitions and categories (objectivity, systematization, quantification,4 sampling, reliability) used to code the studies are detailed in the Results and Discussion section of this article.5
The authors independently analyzed all of the articles. Disagreements in codings were resolved by discussing key terms and jointly reviewing the articles until a consensus was reached. Two trained judges, working independently, analyzed all the articles to provide reliability and accuracy checks of the authors' codings. Perreault and Leigh's (1989) method was used to calculate interjudge reliabilities, as reported in Table 2.
Objectivity
Objectivity refers to the process by which analytical categories are developed and used. Precise operational definitions and detailed rules and procedures for coding are needed to facilitate an accurate and reliable coding process. Detailed rules and procedures reduce judges' subjective biases and allow replication by others (a check on researcher biases). Using multiple, trained, independent judges also enhances objectivity.
Objectivity was measured by whether (1) rules and procedures were reported, (2) judge training was reported, (3) pretesting of measures was reported, (4) judges were independent of the authors, and (5) judges worked independently of one another. The independence of judges in measure four means that the coder and author were not one and the same person. While more stringent standards of independence could be offered, the use of coders other than the authors is a primary and measurable component of objectivity. The number of judges used in the study was also coded.
4 Kassarjian (1977) primarily discusses the debate over what constitutes quantification. Thus, only one measure, "the highest level of data collected in the study," was used to assess quantification, with the following results: 112 studies had categorical, three ordinal, 11 interval, and two ratio data.
5 Copies of the coding sheet and operational definitions are available from R.H.K.
TABLE 1
DISTRIBUTION OF SAMPLED ARTICLES BY YEAR

Year                  No. of articles    Percentage of sample
1978                   2                  1.6
1979                   6                  4.7
1980                   7                  5.5
1981                   7                  5.5
1982                   6                  4.7
1983                  18                 14.1
1984                  13                 10.2
1985                  12                  9.4
1986                  20                 15.6
1987                  18                 14.1
1988                  15                 11.7
First half of 1989     4                  3.1
TABLE 2
CONTENT-ANALYSIS DIMENSIONS AND INTERJUDGE RELIABILITIESa

Interjudge reliabilities were computed for three judge pairs (Authors vs. judge 1, Authors vs. judge 2, and Judge 1 vs. judge 2) and ranged from .829 to 1.000 across the following dimensions:

Objectivity items:
  Were rules and procedures given?
  Were judges trained?
  Was a pretest of unit measures conducted?
  How many judges were used in coding?
  Were independent judges used?
  Did judges code stimuli independently?
Systematization items:
  Method of inquiry
  Was a theoretical perspective examined?
  Data collection orientation
Sampling methods:
  Sampling method
  Sample size
  Effective sample size
  Time span of sampling period
Reliability:
  Reliability coefficient
  Method of reporting reliabilities
Other variables:
  Year of publication
  Topical issues
  Subcategories of issues
  Sampling element
  Media used for data collection
  Type of data
  Publication reference
  Kassarjian (1977) cited in article

a Interjudge reliabilities by Perreault and Leigh's (1989) method.
Sixty-two studies (48.4 percent) clearly used independent judges. Many authors referred to judges as "independent," although the meaning of the term may differ from the definition used here. Judge independence
could not be determined in 52 studies (40.6 percent).
In 14 studies (10.9 percent) the authors served as judges;
thus, independence was not possible.
About 33 percent of the studies reported individual
viewing of the stimuli (n = 43). Ten studies (7.8 percent)
had judges work in pairs or group settings. The remainder of the studies had either a single judge (n = 2)
or did not report judge independence (n = 73; 57.0 percent).
Number of Judges. As the tabulation below shows, two coders were most
frequently used. This finding is subject to some caveats.
The coding scheme counted all judges, including those
who judged only a subset of the sample. Our coding
indicates the total number of judges involved, which may therefore overstate the extent of multiple-judge involvement.
Further, in 10 of the 49 two-judge studies, the authors
served as coders. Consequently, the objectivity of the
Number of judges          Frequency    Percentage
Single judge                  2           1.6
Two judges                   49          38.3
Three or more judges         39          30.4
Not reported in article      38          29.7
Systematization
Kassarjian (1977) states (citing Berelson [1952] and
Holsti [1969]) that systematization requires research
procedures to (1) eliminate biased selection of communications or classification categories to suit the analyst's thesis and (2) examine scientific problems or hypotheses.
Appraisal of the first requirement demands expertise in all applications of content analysis. Because such a background cannot reasonably be expected, we focus on hypothesis and theory testing and on research designs.
Hypothesis Testing. Articles with formal statements
of predicted relationships between two variables (Kerlinger 1986), supported by research or theory, were
coded as having hypotheses. Statements in the form of
questions, suppositions, and general predictions without
the specificity of a hypothesis were classified as research
questions. Approximately 48 percent of the studies (n
= 62) stated research questions as the basis for empirical
investigation. Sixteen studies (12.5 percent) tested hypotheses. Thirty-nine percent (n = 50) had neither hypotheses nor research questions.
Theory Testing. Theory testing or theoretical perspectives were observed infrequently in the studies (n
= 7; 5.5 percent). While minimally present, content
analysis does have a role in theory-testing research. A
discussion of this role is presented in the overview of
systematization.
Data Collection Designs. Systematization was also
examined by the three data collection designs offered
by Holsti (1969).7 The first design describes the characteristics of communications. These studies address
the "who," ''what,"9and "to whom" questions of the
communication process. This design was used in 96 of
the reviewed articles (75.0 percent). The second design
makes inferences about the antecedent of communications. Here the analyst ascertains who the source was
(when unknown) and why the communication was encoded. This design was used in 22 studies (17.2 percent).
Finally, the third design makes inferences about the
effects of the communication. This level of data collection attempts to infer the receiver's decoding process.
Comparisons between the sender's and the receiver's
message content, or behavioral response, facilitate inference generation. Ten studies (7.8 percent) were
judged as making inferences about communications effects.
Differences in the use of these three orientations were
assessed by chi-square tests. No significant differences
in the designs were found by publication year, topic,
media analyzed, publication outlet, or citing of Kassarjian (1977).
Systematization Overview. More hypothesis-testing
research would enhance the perceived value of content
analysis to consumer research. The number of descriptive designs was not unexpected and relates to Holbrook's (1977) assessment of their prevalence in the
field vis-à-vis the other designs. Conversely, environmental-influence and communication-effect studies
represent largely untapped research approaches.
It is myopic to suggest that atheoretical content-analysis studies have no value. Kassarjian (1977, p. 9)
states that data need only to be "linked" by "some form
of theory." He also implies that formal theory does not
need to be present, but that the investigator's "theory"
should be represented in models, research questions,
scientific problems, or basic trend analysis, thereby giving the systematization needed for a meaningful contribution. Since the linkage between content analysis
and theory testing is not clear, we offer some structure
to this issue. Borrowing from Lijphart's (1971) categorizations for case study research, we propose five roles
for content analysis in theory development.
First, content analysis is valuable in collecting data
about communications when there are no theoretical
underpinnings. Such atheoretical content analyses are
useful in fostering future research and theory-building
efforts because they collect information about a communication form.
Second, when scholars use a theoretical perspective
as the basis for collecting data, without intending to
make generalizations to a larger population (i.e., they
are merely attempting to describe or explain data), such
TABLE 4

Sample size              No. of articles    Percentage of sample
1-200                         33                 25.8
201-300                       15                 11.7
301-600                       22                 17.2
601-1,000                     20                 15.7
1,001-2,000                   15                 11.7
Greater than 2,000            15                 11.7
Multiple samples used          5                  3.9
Not discernible                3                  2.3
Sampling Methods
Sampling addresses the issues of randomization,
manageability of sample size, and generalizability.
Randomization and generalizability were assessed by
examining sampling methods. Sample size was used to
evaluate the manageability issue.
The majority of samples were classified as convenience (n = 103; 80.5 percent). Probability samples (e.g.,
simple random, multistage, systematic, proportional,
stratified) were obtained in 18.8 percent of the studies.8
Significant differences were found for sample types on
the basis of publication outlet (χ² = 23.82, p < .001). Random samples were found more frequently in the Journal of Consumer Research and the Journal of Mar-

8 In one study the sampling method could not be determined.
Reliability coefficient                      No. of articles    Percentage of sample
Coefficient of agreement                          41                 32.0
Krippendorff alpha                                 9                  7.0
Holsti                                             4                  3.1
Others (three or fewer uses)                      10                  7.8
Method of calculating not discernible             24                 18.8
No reliability coefficient reported               40                 31.3
Reliability
Reliability in content analysis includes categorical
and interjudge reliabilities. The limited information provided in the methodology sections of the reviewed articles hindered evaluation of categorical reliability. Thus, this
discussion centers on interjudge reliability.
Interjudge reliabilities are largely influenced by the
procedural issues that we previously addressed in the
Objectivity section. However, two important issues remain: the calculation and reporting of reliabilities.
Interjudge reliability is often perceived as the standard measure of research quality. High levels of disagreement among judges suggest weaknesses in research
methods, including the possibility of poor operational
definitions, categories, and judge training.
Reliability Index Use. The most frequently used
reliability index was the coefficient of agreement (the
total number of agreements divided by the total number
of coding decisions; see Table 4). Often authors referred
to interjudge agreement or interjudge reliability without
specifying the calculation method. However, most
noteworthy and troublesome is the finding that over 30
percent of the articles did not report any reliability coefficient. The absence of reliability figures does not allow
a thorough appraisal of the analyst's work and raises
questions about the credibility of the findings.
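As a hypothetical illustration (the figures are invented for exposition and are not drawn from the reviewed studies): if two judges agree on 115 of 128 coding decisions, the coefficient of agreement is 115/128 ≈ .90. The index, however, says nothing about how much of that agreement would be expected by chance alone.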
Reliability Index Reporting. Forty-six of the articles
(35.9 percent) reported one "overall reliability" for the
study. Thirty-one studies (24.2 percent) reported reliabilities on individual measures, and 11 (8.6 percent)
reported ranges of reliabilities. These findings are of
particular concern since the "overall reliability approach" can yield misleading results. While agreement
may be high in the aggregate, low ratings on individual
measures may be hidden by pooled results. Consequently, ranges and individual reliabilities, in particular,
are superior reporting methods.
An analysis of reporting procedures by the major dependent measures (publication year, topical issue, and
so on) found only one significant chi-square calculation.
The reporting procedures differed when Kassarjian
(1977) was cited (χ² = 20.48, p < .001). Studies that
did not cite Kassarjian (1977) reported more incidents
of "no reliabilities" than did those that cited the article.
Kassarjian's discussion of reporting requirements may
have encouraged detailed reliability reporting.
Reliability Overview. Reliance on the coefficient-of-agreement index suggests that attention needs to be
directed toward reliability calculation issues, given the
potential for biased scores (Scott 1955). One weakness
of the coefficient of agreement is the impact of the
number of coding decisions on the reliability score. As
the number of categories decreases, the probability of
interjudge agreement by chance increases. For example,
one would expect greater agreement with only two categories than with five categories because of the higher
probability of chance agreements.
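To make the point concrete with a worked calculation (an illustration under an equal-likelihood assumption, not a result from the reviewed studies): if two judges assign codes independently and at random across k equally likely categories, the probability that they agree on any single decision is the sum over categories of (1/k)(1/k), which equals 1/k. With k = 2 the expected chance agreement is .50; with k = 5 it is only .20, so a raw agreement score of, say, .75 means something quite different in the two cases.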
Agreement also can be inflated by adding categories
that seldom are present in a communication. When one
calculates reliabilities including these categories, the
"agreements" on "nonrelevant" categories compensate
for disagreements on other classification categories.
While this applies to other indices of reliability, the
coefficient of agreement is especially subject to these
two problems. These factors limit the establishment of
standards with which to compare calculated reliabilities;
they also hinder comparisons across content studies
(Perreault and Leigh 1989).
Cohen's kappa (Cohen 1960), which has received extensive use in judgment-based coding procedures, was
used in only three of the studies examined. Kappa was
developed to remove the impact of chance agreements
among judges.9 However, the use of kappa is difficult
in content analysis because a key value, the number of
chance agreements for any particular category, is generally unknown. For nominal data, Cohen's kappa and
its derivations (Brennan and Prediger 1981; Umesh,
Peterson, and Sauber 1989) are viable methods for calculating reliabilities when the number of chance agreements is known or can be reasonably approximated.
9 Cohen (1960) calculates kappa as κ = (F_o − F_c)/(N − F_c), where N is the total number of judgments made by each judge, F_o is the number of judgments on which the judges agree, and F_c is the number of judgments for which agreement is expected by chance.
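As a minimal sketch of how the formula in footnote 9 can be applied in practice (the function name and the two example coding vectors are illustrative assumptions, not data from the studies reviewed here), kappa can be computed from two judges' nominal codings as follows:

from collections import Counter

def cohens_kappa(codes_judge1, codes_judge2):
    """Cohen's (1960) kappa for two judges' nominal codings.

    kappa = (F_o - F_c) / (N - F_c), where N is the number of judgments,
    F_o the observed agreements, and F_c the agreements expected by chance
    (estimated here from each judge's marginal category frequencies).
    """
    assert len(codes_judge1) == len(codes_judge2)
    n = len(codes_judge1)
    f_o = sum(a == b for a, b in zip(codes_judge1, codes_judge2))
    marg1 = Counter(codes_judge1)
    marg2 = Counter(codes_judge2)
    # Expected chance agreements: sum over categories of the product of marginals, divided by N.
    f_c = sum(marg1[c] * marg2[c] for c in marg1) / n
    return (f_o - f_c) / (n - f_c)

# Hypothetical codings of ten ads into three categories.
judge1 = ["info", "image", "info", "info", "image", "other", "info", "image", "info", "other"]
judge2 = ["info", "image", "info", "image", "image", "other", "info", "info", "info", "other"]
print(round(cohens_kappa(judge1, judge2), 3))  # 0.677 for this invented example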
CONCLUSION
This methodological review of content-analysis research identified weaknesses in the methods used and
reported. Of primary interest in this study was adherence to Kassarjian's (1977) standards. The results indicate that there are a number of gaps in the methods
and procedures used by analysts in the areas of objectivity and reliability. Since these two areas are at the
heart of content-analysis research and directly affect research quality, the seriousness of the problem becomes
evident.
Most factors pertaining to objectivity were either unreported or unattended by authors. Problems with reliability reporting and coefficient selection were also
present. When the reliability of content-analysis research is in question, either because the study cannot be replicated or because the coding is ineffectual or unreliable, the value of the research is minimized.
It is interesting that there was no significant change
in the procedures used in content analysis over the
sampling time span. The only variable that accounted
for differences in overall objectivity and reliability reporting measures was whether Kassarjian's article was
cited. These findings empirically demonstrate the con-

10 The Perreault and Leigh (1989) reliability index is I_r = {[(F_o/N) − (1/k)][k/(k − 1)]}^0.5, for F_o/N > 1/k, where F_o is the observed frequency of agreement between judges, N is the total number of judgments, and k is the number of categories.
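A minimal sketch of this index in code (the function name and the example inputs are assumptions for illustration, not values reported in the article):

def perreault_leigh_index(f_o, n, k):
    """Perreault and Leigh's (1989) reliability index I_r.

    I_r = {[(F_o / N) - (1 / k)] * [k / (k - 1)]} ** 0.5  for F_o/N > 1/k,
    where F_o is the observed frequency of agreement between judges,
    N the total number of judgments, and k the number of categories.
    Returns 0.0 when observed agreement does not exceed chance (1/k).
    """
    p_agree = f_o / n
    if p_agree <= 1.0 / k:
        return 0.0
    return ((p_agree - 1.0 / k) * (k / (k - 1.0))) ** 0.5

# Hypothetical example: judges agree on 115 of 128 decisions over 5 categories.
print(round(perreault_leigh_index(115, 128, 5), 3))  # about 0.934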
REFERENCES
Berelson, Bernard (1952), Content Analysis in Communication Research, Glencoe, IL: Free Press.
Brennan, Robert L. and Dale J. Prediger (1981), "Coefficient
Kappa: Some Uses, Misuses, and Alternatives," Educa-