INTRODUCTION
As social media and the Web environment continue to
grow and evolve, with large numbers of users joining social
networking sites, these platforms have become media through
which millions of users share their views about various domains.
Content analysis is one of the most powerful methodologies for
studying this content. Today, a variety of content analysis
measurement techniques are available to support content
analysis tasks within different disciplinary contexts in distinct
ways. Communication researchers have heavily relied on both
human- and computer-based content analyses to examine
symbols of communication and to make valid inferences about
communication. However, the trade-offs between human and
computer coding in terms of validity, reliability, and large-scale
data processing often mar researchers' abilities to identify
characteristics within the text and draw inferences from
mediated messages. Human coding methods maximize the
validity of measurement, but are often limited in their ability to
deal with large databases. Computer-coding methods maximize
reliability and can efficiently handle large collections of data,
but have traditionally been criticized for their limited ability to
understand the subtle latent meanings of opinion expression in
the same way as human coders. This inherent tension between
human- and computer-based coding is nothing new, but it is
only beginning to be adequately addressed by communication
scholars. The main purpose of this study is to argue that
communication scholars need to rely on emerging tools for
content analysis that capitalize on the strengths of both human
coding and intelligent algorithms. As one exemplar, we use a
supervised, learning-based hybrid method developed by
Hopkins and King (2010) (referred to henceforth as HK), one of
several similar software programs currently available. The
hybrid approach advocated for here preserves semantic validity,
a strength of human-based coding, while also being applicable
to large quantities of data. Using nuclear power and
nanotechnology as our focal issues, we tracked opinion
expression on Twitter before and after the Fukushima Daiichi
disaster using the HK content analysis method. The
demonstrated hybrid method is a supervised machine learning
technique relying on a software-based algorithm. This study
discusses the various advantages of the hybrid supervised
learning technique as compared with traditional content
analysis tools.
CHAPTER 2
METHODOLOGICAL ADVANCEMENTS
In the Web 2.0 environment, researchers are now able to
retrieve a seemingly infinite amount of digital content for
analysis. Not surprisingly, this has resulted in increased interest
in sentiment analysis, or opinion mining, a specific form of
content analysis that identifies how sentiments, opinions, and
emotions are expressed about a given subject in text-based
documents, such as social media messages. Importantly, the
creation of data in the new media environment has greatly
outpaced the capacity of conventional sentiment content
analysis approaches. Here, we first review the key
characteristics and challenges of human- and computer-based
content analysis before explaining why combining the merits of
these two approaches is a necessity in the Web 2.0
environment. The comparisons focus on three key areas:
reliability, validity, and efficiency.
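To make the conventional baseline concrete, the simplest computer-based form of sentiment analysis (the dictionary approach discussed in Chapter 4) can be sketched as follows. The word lists below are illustrative assumptions, not a validated lexicon:

```python
# Minimal dictionary-based sentiment scorer.
# POSITIVE/NEGATIVE word lists are illustrative assumptions only.
POSITIVE = {"safe", "clean", "hope", "progress"}
NEGATIVE = {"danger", "disaster", "fear", "risk"}

def sentiment_score(text):
    """Return (# positive matches) - (# negative matches) for a message."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

msgs = [
    "nuclear power is safe and clean",
    "fukushima disaster spreads fear of radiation risk",
]
scores = [sentiment_score(m) for m in msgs]
print(scores)  # -> [2, -3]
```

This illustrates both the strength and the weakness noted above: the method is perfectly reliable and fast, but a purely lexical match cannot recognize negation, irony, or novel expressions the way a human coder can.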
CHAPTER 3
Comparisons of human- and computer-
based content analysis
3.1 Reliability
3.2 Validity
3.3 Efficiency
CHAPTER 4
CONVENTIONAL METHODS
4.1 Dictionary Approach
4.2 Statistical Approach
CHAPTER 5
An example of hybrid content analysis
method: The HK method
The HK method (the specific hybrid method employed in our
study) is based on a two-step process. The first stage involves
intensive human input in reading and classifying a sample of
social media corpora that are randomly extracted on the basis
of a set of user-determined keywords. At the second stage, the
HK algorithm learns from the subsample of online texts labeled
by human coders as being representative of particular types of
opinion categories. The trained classifier is then used to derive
the aggregate distribution of classifications of all unread
documents using an automated analysis. To estimate the
aggregate proportion of opinion classifications across a set of
population documents, the individual documents (e.g., tweets)
are first decomposed into sets of word stems that can then be
represented by lists of stemmed unigram vectors to create
word stem profiles for each document (see Hopkins & King,
2010 for further details). Mathematically, the HK algorithm can
be expressed by the following formula:

P(D) = P(S|D)^(-1) P(S)    (1)

The goal of the HK method is to obtain an estimate of P(D), the
multinomial frequency distribution of opinion across all
population documents that fall into each possible opinion
category, where D denotes the collection of all possible opinion
categories among population documents. The word stem
profiles, expressed by S, consist of a set of variables that
summarize all the word stems that appear in each document.
The multinomial frequency distribution of word stems, P(S), can
be estimated directly by tabulating all the population
documents. However, the conditional probability of each word
stem profile occurring in the population in each opinion
category, denoted P(S|D), cannot be directly observed.
Consequently, the HK method makes a key theoretical
assumption: that P(S|D) in the population is the same as the
conditional frequency distribution of word stem profiles in the
hand-coded training sample, Ph(S|D). In other words, the texts
of the human-coded training set are assumed to be
homogeneous with the population set of documents, which
implies that the conditional distribution P(S|D) can be estimated
by referring to the human-labeled training set of texts. We can
then rewrite Equation (1) as the following formula, which uses
estimates of P(S) and Ph(S|D) to derive the target estimate of
P(D):

P(D) = Ph(S|D)^(-1) P(S)    (2)

Beyond its ability to provide reliable and efficient automated
textual analysis, a major advantage of the HK method is its
ability to perform human-based, computer-aided content
analysis. Human coders are better prepared than computers to
make sense of ambiguous materials with thwarted or negated
expressions, to spot irony in posts, and to recognize alternative
forms of expression (e.g., abbreviations and neologisms).
Another major characteristic of the HK method is that it
provides population-level estimates of the aggregate proportion
of content (e.g., tweets, blogs, or Facebook posts) within each
opinion category, rather than conducting individual-level
classification of single documents. That is, the method does not
classify individual tweets or posts, which oftentimes may
express multiple sentiments, but instead estimates the
proportion of content across the categories of interest.
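The two stages above can be sketched in code. This is a minimal illustration of the estimation logic behind Equation (2), not Hopkins and King's actual implementation: it uses toy hand-coded tweets, whitespace tokens in place of real word stems, and an unconstrained least-squares solve projected back onto the probability simplex, all of which are simplifying assumptions.

```python
import numpy as np

def stem_profile(text, vocab):
    """Binary presence vector over the stem vocabulary (toy stand-in
    for the stemmed unigram word stem profiles described above)."""
    words = set(text.lower().split())
    return np.array([1.0 if s in words else 0.0 for s in vocab])

# Stage 1: hand-coded training sample (text, opinion category).
train = [
    ("nuclear power is safe and clean", "positive"),
    ("clean energy future with nuclear power", "positive"),
    ("fukushima disaster shows nuclear danger", "negative"),
    ("nuclear danger and radiation fear", "negative"),
]
# Stage 2: unread population documents to be characterized in aggregate.
population = [
    "nuclear power is clean",
    "radiation fear after the fukushima disaster",
    "nuclear danger is real",
    "safe and clean nuclear energy",
]

vocab = sorted({w for text, _ in train for w in text.lower().split()})
cats = sorted({c for _, c in train})

# P(S): observed frequency of each stem across all population documents.
p_s = np.mean([stem_profile(t, vocab) for t in population], axis=0)

# Ph(S|D): stem frequencies conditional on category, from the hand-coded set.
ph_s_given_d = np.column_stack([
    np.mean([stem_profile(t, vocab) for t, c in train if c == cat], axis=0)
    for cat in cats
])

# Solve P(S) = Ph(S|D) P(D) for P(D) by least squares, then clip and
# renormalize so the estimate is a valid probability distribution.
p_d, *_ = np.linalg.lstsq(ph_s_given_d, p_s, rcond=None)
p_d = np.clip(p_d, 0.0, None)
p_d = p_d / p_d.sum()

for cat, share in zip(cats, p_d):
    print(f"{cat}: {share:.2f}")
```

Note that the output is only the aggregate proportion per opinion category, mirroring the population-level character of the HK method: no individual tweet receives a label.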
CHAPTER 6
Bayesian Network and Recursive
Partitioning Analysis on Health and
Environment Keyword
Because keywords in the health and environment topics directly
concern the safety of the human body and the environment,
these topics received much more public attention than the other
topics, such as location and nuclear power. We now examine
the mention rate classified by keywords via Bayesian networks.
In this graphical model, nodes represent random variables and
arrows represent the probabilistic dependencies among nodes,
which are identified by recursive partitioning analysis, a
statistical method for multivariable analysis on each topic. This
model helps us understand a sparse set of keywords with direct
dependencies.
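The node-and-arrow structure described above can be illustrated with a small sketch. The study identifies dependencies via recursive partitioning (a tree-based method); as a simpler stand-in, the code below screens pairwise conditional probabilities of keyword co-mention and proposes an arrow wherever mentioning one keyword substantially shifts the probability of mentioning another. The keyword indicators, data, and threshold are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical per-message keyword indicators (1 = keyword mentioned).
# Keywords and counts are illustrative, not the study's actual corpus.
records = [
    {"radiation": 1, "health": 1, "food": 1},
    {"radiation": 1, "health": 1, "food": 0},
    {"radiation": 1, "health": 0, "food": 1},
    {"radiation": 0, "health": 0, "food": 0},
    {"radiation": 0, "health": 1, "food": 0},
    {"radiation": 0, "health": 0, "food": 0},
]

def cond_prob(records, child, parent, parent_val):
    """Estimate P(child = 1 | parent = parent_val) from counts."""
    rows = [r for r in records if r[parent] == parent_val]
    if not rows:
        return 0.0
    return sum(r[child] for r in rows) / len(rows)

# A large gap between P(child | parent = 1) and P(child | parent = 0)
# indicates a probabilistic dependency: a candidate arrow parent -> child.
edges = []
for parent, child in combinations(records[0], 2):
    gap = (cond_prob(records, child, parent, 1)
           - cond_prob(records, child, parent, 0))
    if abs(gap) >= 0.3:  # illustrative threshold
        edges.append((parent, child, round(gap, 2)))

for parent, child, gap in edges:
    print(f"{parent} -> {child} (dependence gap {gap:+.2f})")
```

Keeping only the strong dependencies yields the sparse set of directly dependent keywords the graphical model is meant to expose; pairs whose conditional probabilities barely differ contribute no arrow.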
REFERENCES
Leona Yi-Fan Su, Department of Life Sciences Communication,
University of Wisconsin-Madison, Madison, WI, USA (2016).