
Ideogram Based Sentiment Analysis in Japanese Text

Tyler Thornblade

1. INTRODUCTION

In recent years a growing body of literature has applied techniques for sentiment analysis to languages other than English. While many of these works apply similar techniques across differing languages, a technique notable for its specificity to ideographic languages has arisen: rather than just considering sentiment at the word, sentence, and document level, sentiment is also analyzed at the level of the constituent characters of a word.

This technique may be of unique value to ideographic languages for two reasons. First, all multi-character words consist of characters that themselves have distinct meanings (or sets of meanings) (Henshall, 1988). Thus, information on individual characters can inform judgments made on composite words. Second, the concept of a word is not well defined in some ideographic languages, and many errors may be introduced by the “word segmentation” step in processing (Zagibalov & Carroll, 2008). Moving to a character-based approach can avoid the need for segmentation.

Two recent papers apply this technique to the Chinese language. In this work, we apply the same approach to Japanese. We theorize that techniques for assigning sentiment scores to individual ideograms in order to develop sentiment scores for words will not be as effective in Japanese as they are in Chinese without some significant modifications.

2. RELATED WORKS

2.1. Ku, Liang, and Chen

Ku, Liang and Chen (2006) introduce a character-based technique using a simple approach. They first construct a sentiment dictionary at the word level. In their paper, the base dictionary is constructed from the General Inquirer (Stone et al. 1966) and the Chinese Network Sentiment Dictionary corpora. They then expand the dictionary using two thesauri, tong2yi4ci2ci2lin2 (Mei et al. 1982) and the Academia Sinica Bilingual Ontological Wordnet (Huang et al. 2008).

To develop character sentiment scores, they mine this dictionary, seeking counts of the number of positive and negative words in which each character appears. fp_ci and fn_ci denote the frequencies of a character ci in the positive and negative words, respectively; n and m denote the total number of unique characters in positive and negative words, respectively (Ku et al. 2006).

The score for a character is:

The score for a word is simply the average of the characters in the word:

Thus words with strong positive sentiment will have values much greater than zero, and negative words will likewise have values much less than zero. Words with weak sentiment scores will be found near zero. They use this technique to expand their small seed dictionary to be able to handle novel words that are composed of known characters.
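The two score equations referenced in section 2.1 were not preserved in this copy. A plausible reconstruction, following the formulation published in Ku et al. (2006) (the exact normalization used here is our assumption), is:

```latex
% Normalized frequencies of character c_i over positive / negative words
P_{c_i} = \frac{fp_{c_i}}{\sum_{j=1}^{n} fp_{c_j}}, \qquad
N_{c_i} = \frac{fn_{c_i}}{\sum_{j=1}^{m} fn_{c_j}}

% Sentiment score of character c_i
S_{c_i} = \frac{P_{c_i} - N_{c_i}}{P_{c_i} + N_{c_i}}

% Sentiment score of a word w = c_1 c_2 \dots c_p: average over its characters
S_w = \frac{1}{p} \sum_{j=1}^{p} S_{c_j}
```

Note that unnormalized count differences would also fit the qualitative description given here; the character values reported later in this paper suggest the scores were not confined to [-1, 1].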
2.2. Huang, Pan and Sun

Huang, Pan and Sun (2007) use a very similar approach. As their base dictionary they use the NTCIR emotion dictionary (Kando 2008) and also utilize tong2yi4ci2ci2lin2 as their thesaurus. Their approach is otherwise identical, and they cite Ku et al. for this technique.

2.3. Results

Both works in this section report reasonable system results for the sentiment analysis task at the sentence or document level, yet neither paper reports results at the word or phrase level. Therefore, it is hard to establish a baseline for the performance of this phase.

The technique of Ku et al. at the word level can be considered analogous to the “bag of words” approach to sentential understanding. Much as “bag of words” ignores syntactic information encoded in the sentence, this simple approach ignores information contained in the composition of multi-character words. We theorize that system performance in Japanese will be limited by this lack of compositional knowledge.

3.1. Linguistic foundation

The Japanese language uses three different writing systems: Chinese characters and two phonetic scripts (Jorden and Noda, 1998). A significant portion of the Japanese vocabulary consists of loanwords from Chinese that are written with the same or similar characters. It is compounds of this nature that seem most amenable to the technique of Ku et al.

Additionally, there are words which combine Chinese characters with script (referred to as okurigana) and words which are written entirely in script (referred to as kana). These latter words seem less relevant because, unlike Chinese characters, the individual script characters do not have independent meanings. Since Chinese does not have any equivalent script, this is a class of words on which performance in Japanese is likely to suffer compared to that of Chinese.

In Japanese there are five canonical ways in which characters can be combined to make two-character compounds (Yamamoto, 2008). These are:

1. Both characters have the same meaning.
2. The characters have opposite meanings.
3. The top character modifies the bottom character.
4. The bottom character is the target, direct object, or complement of the top character.
5. The top character negates (“flips”) the meaning of the bottom character.

The first two classes would not seem to present a problem for the “bag of words” approach, as the sentiment values will either reinforce or cancel each other out, respectively. However, the remaining three classes may present a problem, as each of these classes can result in a word whose meaning is not simply the sum of its parts.

3.2. Experiment

In order to evaluate the efficacy of this technique in Japanese, we formulate a simple experiment as a proof of concept. Using the Japanese sentiment dictionary of Kaji and Kitsuregawa (2007), we apply the approach of Ku et al.

The Kaji and Kitsuregawa data is not intended to be a word-based dictionary and contains a significant number of bigrams and trigrams. Since these multiword entries are amenable to syntactic analysis, they are not the target of this experiment, and the first step is to clean the data and remove them. This reduces the number of usable entries from 10,000 to 2,386.

Next, we extract the unique characters and generate sentiment scores for every Chinese character in the corpus using the same approach as Ku et al. We do not generate sentiment scores for script characters for the reason cited in section 3.1. This results in a total of 954 scored characters.

Finally, we generate sentiment scores for each word in the corpus using the character scores. We evaluate the results by comparing the sign of the score from the sentiment dictionary to the sign of the score by our method, ignoring intensity.

It should be stressed that this is considered at best a proof of concept in applying this technique to Japanese and is not intended to be a rigorous experiment. By training and testing on the same corpus we effectively simulate perfect knowledge of the sentiment scores of the words. Thus, the results should be interpreted more as an upper bound on the performance of the approach than as results for comparison to other sentiment analysis systems.
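The pipeline described above can be sketched as follows. This is a minimal illustration, not the author's code: the tiny inline seed dictionary is hypothetical, and the Ku et al. (2006)-style normalized character score is an assumption about the exact formula used.

```python
# Sketch of the character-based scoring experiment. Assumed details: the toy
# seed dictionary and the normalized (P - N)/(P + N) character score.

def char_counts(word_polarity):
    """Count how often each character appears in positive / negative words."""
    fp, fn = {}, {}
    for word, polarity in word_polarity.items():
        for ch in word:
            if polarity > 0:
                fp[ch] = fp.get(ch, 0) + 1
            elif polarity < 0:
                fn[ch] = fn.get(ch, 0) + 1
    return fp, fn

def char_scores(fp, fn):
    """S_ci = (P_ci - N_ci) / (P_ci + N_ci), with frequencies normalized
    over all characters seen in positive / negative words."""
    total_p, total_n = sum(fp.values()), sum(fn.values())
    scores = {}
    for ch in set(fp) | set(fn):
        p = fp.get(ch, 0) / total_p if total_p else 0.0
        n = fn.get(ch, 0) / total_n if total_n else 0.0
        scores[ch] = (p - n) / (p + n) if (p + n) else 0.0
    return scores

def word_score(word, scores):
    """Word score: average of the scores of the word's characters."""
    vals = [scores.get(ch, 0.0) for ch in word]
    return sum(vals) / len(vals) if vals else 0.0

# Hypothetical word-level seed entries: +1 positive, -1 negative.
seed = {"安全": 1, "安心": 1, "危険": -1, "危害": -1}
fp, fn = char_counts(seed)
scores = char_scores(fp, fn)

# Evaluate by sign agreement with the dictionary, ignoring intensity,
# training and testing on the same data as in the experiment above.
agree = sum(
    1 for w, pol in seed.items()
    if (word_score(w, scores) > 0) == (pol > 0)
)
print(agree, len(seed))  # → 4 4
```

Because the same corpus supplies both the character counts and the test words, perfect agreement on this toy data mirrors the "upper bound" caveat above.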


4.1. Overall results

Overall results for this approach are surprisingly
high, with the precision scores being fairly impressive
for both negative and positive sentiment. Although we
cannot draw conclusions about whether this level of
performance is achievable in a real-world system, it
does show rather emphatically that despite its
simplicity the technique does not suffer from structural
limitations that would prevent it from being effective
given a reliable source for word-level sentiment data.

4.2. Detailed results for character sentiment

In order to show that the system is indeed characterizing character sentiment in a manner similar to that of a human, we present the top 10 most positive and negative characters along with their most common English meaning or set of meanings. We believe the results are intuitive.

5. ERROR ANALYSIS

In this section we examine the output of the system in an attempt to identify systematic errors. Due to time constraints, it was not possible to analyze all errors. Instead, we attempt to choose a representative subset. Of the 475 errors that occurred, 100 were selected for analysis: 50 from the false positives and 50 from the false negatives. This list was further pruned to eliminate bigrams, words that contained script characters as part of the compound, and words that consisted of only a single Chinese character. The motivation for this pruning was that we are interested most in errors due to lack of compositional knowledge, and the above classes of words cannot by their nature demonstrate such errors.
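The pruning just described relies on telling script (kana) characters apart from Chinese characters. A minimal sketch using Unicode block ranges (the function names and sample words are our own illustration, not from the paper):

```python
def is_kana(ch):
    """Hiragana (U+3040-309F) or Katakana (U+30A0-30FF)."""
    return '\u3040' <= ch <= '\u30ff'

def is_kanji(ch):
    """CJK Unified Ideographs, basic block (U+4E00-9FFF)."""
    return '\u4e00' <= ch <= '\u9fff'

def word_class(word):
    """Classify a word as all-script, mixed (kanji + okurigana), or all-kanji."""
    has_kana = any(is_kana(ch) for ch in word)
    has_kanji = any(is_kanji(ch) for ch in word)
    if has_kana and has_kanji:
        return "mixed"
    if has_kana:
        return "script"
    return "kanji"

print(word_class("すごい"))  # "script": no character scores can be assigned
print(word_class("無難"))    # "kanji": a scoreable two-character compound
print(word_class("高い"))    # "mixed": kanji plus okurigana
```

Only words classified as "kanji" compounds of two or more characters survive the pruning described above.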
After pruning we arrive at 15 false positives and 31 false negatives for analysis. We examine each word to see if a compositional explanation exists for the error. E.g., for composition class 5, where the first character negates the second, we look for cases where the sentiment of the second character is opposite in sign to the true sentiment of the word.

5.1 Errors due to missing compositional knowledge

In this section, references are made to the compositional classes introduced in section 3.1.

Of the false positives, 33% were found to be explained by a lack of compositional knowledge. There were two classes of such errors. The first was composition class 5, or compounds where the first character negates (or flips) the meaning of the second character. An example is the word “無難” (safety), which is formed of the characters “not” and “difficulty, hardship”. Although both characters have negative sentiment scores, the overall score should be the reversed value of the second character. These errors represented 6.7% of the total.

The second is composition class 3, specifically compounds where the first character is emphasizing a positive character. An example is “重宝” (priceless), consisting of “heavy” and “treasure”. The first character is associated with a negative sentiment score, yet in this compound it serves to enhance a positive character, and thus the score should be positive. These errors represented 27% of the total.

Of the false negatives there were again two classes of errors, and 54.8% of the errors can be explained by them. The first class is again composition class 5, representing 3.2% of the errors.

The second class is not one of the five compositional classes introduced in section 3.1. It is a novel group making up 51.6% of the errors and consisting of words that include the character “的”. This character has no meaning in itself for the words in this class; it simply changes the part of speech of the word. Therefore, it should not influence the word’s sentiment score, but since it has a large negative value it results in many false negatives.

If the results of this sample were representative across the corpus, it would indicate that roughly 27% of all errors are due to a lack of compositional knowledge.

5.2 Script characters

In section 3.1 we introduced the fact that there are two major classes of words containing script characters: those entirely composed of script, and those combining both script and Chinese characters.

For the former, 150/475 or 31.6% of positive sentiment errors were due to the fact that we could not develop any kind of sentiment score for a word consisting entirely of script. This class made up 201/536 or 37.5% of the negative sentiment errors.

It is difficult to gauge the impact of script characters on the latter class, although it is likely nonzero. As a result, we can conclude that at least 34.7%, and possibly more, of all errors were due to script characters.

5.3 Problems with source data

The simple methods used to prune bigrams and trigrams from the source data were not completely effective. Examination of the output shows that approximately 4-5% of the entries in the training corpus are bigrams, which could introduce noise into the data.

Finally, although for the purpose of this paper the sentiment dictionary of Kaji & Kitsuregawa was considered the gold standard, we must point out that there are several items in the error analysis that call into question their sentiment scores.

Some specific examples are 無用 and 不用, both of which mean “useless” yet quizzically received high positive sentiment scores and showed up as false negatives for positive sentiment in these results.

5.4 Detailed Error Analysis

The first figure below shows the full list of multi-character false positives. The column headings are, respectively, the word itself, the score as calculated by Kaji & Kitsuregawa, the score for the word via the method of this paper, and finally the scores for the individual characters in the word (Kaji & Kitsuregawa, 2007).

The second figure below repeats this analysis for the false negatives. The detailed analysis was not done for all words because their sentiment value was dominated by the single character “的”, which has a very high negative value of -24.


6.1 Conclusions

We hypothesized in section 3 that the technique of Ku et al. would be less effective on Japanese due to characteristics of the language. After performing the experiment, the results are not exactly what we had supposed – yet in many ways are more interesting than what we had expected.

First, the overall results were quite good. F1 scores above 0.7, while perhaps not state of the art, are certainly not poor. Although this was unexpected, it has the benefit that it is easier to draw conclusions from good results than from bad, and we can at least conclude that this general method does appear to hold promise.

Second, despite the surprise of the overall results, the hypothesis that there would be structural issues with Japanese was shown to be accurate. As much as 60% of the errors can be explained between the compositional rules and the system’s inability to process script characters.

6.2 Future work

The overall results encourage further work in adapting this technique to Japanese. The first step should be to apply the technique with more rigor. A simple way to do this would be to do an annotation study of novel content and apply the character sentiment data developed in this work to it. From this, meaningful conclusions could be drawn about the efficacy of the approach. A more complex experiment would be to develop an independent Japanese sentiment dictionary and then perform the experiment; the pruned sentiment dictionary of Kaji & Kitsuregawa discards a large amount of data, and it would be ideal to start from a more robust corpus.

Next, it would be interesting to experiment with compositional features. Preliminary results seem to indicate that approximately 27% of the errors could be addressed with compositional knowledge and appropriate logic. A potential barrier to this is that we are not aware of any current lexical resources that could be used to implement it.

Finally, a large percentage of the errors (approximately 35%) are due to script characters. Performance cannot be improved on these words using the techniques of this paper, since the script characters are not sentiment bearing. A hybrid approach could, however, be applied that combines character- and word-based sentiment analysis to improve performance.
REFERENCES

[1] Rui-hong Huang, Le Sun, and Long-xi Pan. "SCAS in Opinion Analysis Pilot Task: Experiments with sentimental dictionary based classifier and CRF model." Proceedings of the Sixth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, 2007.

[2] L. W. Ku, Y. T. Liang, and H. H. Chen. "Opinion extraction, summarization and tracking in news and blog corpora." Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.

[3] Kaji, N. and Kitsuregawa, M. "Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents." Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

[4] Kim, S. and Hovy, E. "Determining the sentiment of opinions." Proceedings of the 20th International Conference on Computational Linguistics (Geneva, Switzerland, August 23-27, 2004). Association for Computational Linguistics, Morristown, NJ, 1367.

[5] Yamamoto, Natsuhiko. "漢字検定2級体験記 熟語の構成." 山本夏彦の本 つかぬことを言う. 20 October 2008.

[6] Henshall, Kenneth. A Guide to Remembering Japanese Characters. Tokyo: Tuttle Publishing, 1988.

[7] Zagibalov, T. and Carroll, J. "Automatic seed word selection for unsupervised sentiment classification of Chinese text." Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK, 2008.

[8] Mei, J., Zhu, Y., Gao, Y., and Yin, H. tong2yi4ci2ci2lin2. Shanghai Dictionary Press, 1982.

[9] Huang, Chu-Ren, Ru-Yng Chang, and Shiang-bin Li. "Sinica BOW: A bilingual ontological wordnet." To appear in: Chu-Ren Huang et al., eds. Ontologies and Lexical Resources for Natural Language Processing. Cambridge Studies in Natural Language Processing. Cambridge: Cambridge University Press, 2008.

[10] Stone, Philip J., Smith, Marshall S., Ogilvie, Daniel M., and Dunphy, Dexter C. The General Inquirer: A Computer Approach to Content Analysis. MIT Press, 1966.

[11] Kando, Noriko. "Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access." The 6th NTCIR Workshop (2006/2007). 01 December 2008. <ws6/ws-en.html>.

[12] Jorden, Eleanor Harz and Noda, Mari. Japanese: The Written Language. Boston: Cheng & Tsui Company, 1998.