

Opinion Mining and Social Networks:
a Promising Match

Krzysztof Jędrzejewski, Mikołaj Morzy


Institute of Computing Science
Poznan University of Technology, Poznan, Poland
Krzysztof.Jedrzejewski@cs.put.poznan.pl, Mikolaj.Morzy@put.poznan.pl

Abstract—In this paper we discuss the role and importance of social networks as preferred environments for opinion mining and sentiment analysis in particular. We begin by briefly describing selected properties of social networks that are relevant with respect to opinion mining and we outline the general relationships between the two disciplines. We present the related work and provide basic definitions used in opinion mining. Then, we introduce our original method of opinion classification and we test the presented algorithm on real-world datasets acquired from popular Polish social networks, reporting on the results. The results are promising and soundly support the main thesis of the paper, namely, that social networks exhibit properties that make them very suitable for opinion mining activities.

Keywords: opinion mining, sentiment analysis, social computing, social networks

I. INTRODUCTION

Graphs and networks certainly rank among the most popular data representation models due to their universal applicability to various application domains. The need to analyze and mine interesting knowledge from graph and network structures has long been recognized, but only recently have advances in information systems enabled the analysis of graph structures at huge scales. The analysis of graph and network structures gained new momentum with the advent of social networks. While the analysis of social networks has been a field of intensive research, particularly in the domains of social sciences and psychology, economy or chemistry, it is the emergence of huge social networking services over the Web that spawned the research into large-scale structural properties of social networks. Social networks exhibit a very clear community structure. Such community structure partially stems from objective limitations (e.g., the internal organizational structure of a company can be closely represented by the ties within a particular social network) or, to some extent, may result from subjective user actions and activities (e.g., bonding with other people who share one's interests and hobbies). Unveiling the true structure of a social network and understanding the communities forming within the network is the key factor in understanding what the future structure of the network will be.

The main goal of social network analysis is the study of structural properties of networks. Structural analysis of a social network investigates the properties of individual vertices and the global properties of the network as a whole. It answers two basic classes of questions about the network: what is the structural position of any given individual node, and what can be said about the groups (communities) forming within the network. The main measurement of a node's social power (also called the member's prestige) is centrality, which allows one to determine a node's relative and absolute importance in the network. There are several methods to determine a node's centrality, such as the degree centrality (the number of links that connect to a given node), the betweenness centrality (the number of shortest paths between any pair of nodes in the network that traverse a given node) or the closeness centrality (the mean of the shortest-path lengths to the other nodes in the network).
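As a side illustration (not part of the original paper), the three centrality measures listed above can be computed with an off-the-shelf graph library; the sketch below assumes the Python networkx package and a toy graph of our own choosing.

    # Minimal sketch: the three centrality measures named above, computed with networkx
    # on an invented toy graph.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")])

    degree = nx.degree_centrality(G)            # normalized number of incident links
    betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through a node
    closeness = nx.closeness_centrality(G)      # reciprocal of the mean shortest-path distance

    for node in sorted(G.nodes()):
        print(node, round(degree[node], 2), round(betweenness[node], 2), round(closeness[node], 2))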
From the point of view of opinion mining, the ability to assess a node's prestige is essential, as it allows one to differentiate between the opinions of different individuals. More specifically, a node's prestige allows one to assign different weights to opinions and to associate more importance with opinions expressed by prominent individuals. Another factor that is often considered in opinion mining is the identification of influential individuals. An influential individual does not necessarily have to be characterized by a high degree centrality in order to influence the average opinion within the network. Usually, such individuals are characterized by a high betweenness centrality, impacting the dissemination of an opinion rather than the forming of the opinion. For instance, an individual with high betweenness centrality can stop a negative opinion from spreading through the network, or, on the other hand, she can amplify the opinion.

For psychological reasons, humans tend to form their opinions in such a way that the opinions conform with the norm established within a given social group. Thus, when mining opinions one has to take into consideration the influence of the context in which the opinion is formed, i.e. the social milieu of an individual. Social networks are highly effective in bolstering the formation of groups of similar individuals. Groups of nodes that share common properties tend to get connected in the social network. Communities are densely inter-connected and have fewer connections to nodes from outside the group. When provided with information on the group membership of an individual, an opinion mining algorithm can utilize this knowledge to improve the accuracy of opinion prediction. This improvement stems from the fact that the inclusion of community information in the opinion mining algorithm allows group-specific behaviour and norms to be accounted for when trying to assess, e.g., the semantic orientation of an opinion.

Opinion mining is the domain of natural language processing and text analytics that aims at the discovery and extraction of subjective qualities from textual sources. Opinion mining tasks can generally be classified into three types. The first task is referred to as sentiment analysis and aims at establishing the polarity of a given source text (e.g., distinguishing between negative, neutral and positive opinions). The second task consists in identifying the degree of objectivity and subjectivity of a text (i.e., the identification of factual data as opposed to opinions). This task is sometimes referred to as opinion extraction. The third task aims at the discovery and/or summarisation of explicit opinions on selected features of the assessed product. Some authors refer to this task as sentiment analysis as well. All three classes of opinion mining tasks can greatly benefit from additional data that may be provided by the social network. Added knowledge may include: a node's centrality indexes, a node's group membership, the nomenclature utilized within the group, the average group opinion on selected products, the group's coherence and cohesion, etc. All these variables enrich opinion mining algorithms and provide additional explanatory capabilities to the constructed models.

This paper is organized as follows. In Section II we present related work on opinion mining. Section III introduces basic concepts used in opinion mining. In Section IV we present an original algorithm for discovering opinion polarity. Section V describes the datasets gathered from popular Polish social networks, and the results of the conducted experiments are reported in Section VI. The paper concludes in Section VII with a brief summary and a future work agenda.
e.g. excellent and poor. The pointwise mutual information
II. RELATED WORK of the term t and the word w is defined as
Literature related to social network analysis is extremely abundant and rich. The first proposals to perform social network analysis originated in the domains of social sciences and psychology [12] or economy [13]. Interestingly, much of this research rephrased what had previously been discussed in physics within the context of complex systems [14]. The most thorough summary of social network analysis topics, models and algorithms can be found in [17].

Opinion mining is a relatively new domain spanning the fields of data mining, machine learning and natural language processing. Sentiment analysis methods can be regarded as supervised [1][5] or unsupervised learning methods [6][15], as well as information retrieval methods [16][18]. Many works concerning opinion mining present approaches based on dealing with text documents modelled as sets of words [1] or as vectors, where the dimensions represent words and the values are the weights of the words in the document [2].

In the vast majority of sentiment analysis methods, information about the connotation of a word with the positive or the negative class is used to calculate the document's semantic orientation

\gamma(d) = \begin{cases} c_{+}, & \text{if } score(d) > 0 \\ c_{-}, & \text{if } score(d) < 0 \end{cases}    (1)

where

score(d) = \frac{\sum_{t_i \in d} score(t_i)}{|d|}    (2)

or

score(d) = \sum_{t_i \in d} score(t_i)    (3)

where t_i is the i-th term of the document d, |d| is the number of terms appearing in the document d, c_{+} and c_{-} are the positive and negative classes, respectively, and score(t_i) is a function that assigns positive or negative values to terms, depending on their relationship with the respective class.
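To make formulas (1)-(3) concrete, the short sketch below (ours, not taken from the paper) aggregates hand-made per-term scores into a document orientation; the term_scores dictionary is a hypothetical stand-in for any of the term-scoring functions discussed below.

    # Sketch of formulas (1)-(3): aggregating per-term scores into a document orientation.
    # term_scores is a hypothetical hand-made dictionary standing in for score(t).
    term_scores = {"great": 2.0, "awful": -3.0, "movie": 0.1}

    def score_avg(doc_terms, scores):
        # formula (2): sum of term scores divided by the number of terms |d|
        return sum(scores.get(t, 0.0) for t in doc_terms) / len(doc_terms)

    def score_sum(doc_terms, scores):
        # formula (3): plain sum of term scores
        return sum(scores.get(t, 0.0) for t in doc_terms)

    def orientation(doc_terms, scores, aggregate=score_avg):
        # formula (1): positive class for a score above zero, negative below zero
        s = aggregate(doc_terms, scores)
        return "c+" if s > 0 else ("c-" if s < 0 else "undecided")

    print(orientation(["great", "movie"], term_scores))  # c+
    print(orientation(["awful", "movie"], term_scores))  # c-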
Semantic orientations of individual terms are aggregated using a dictionary method [5]. This method uses two small sets of manually identified positive and negative adjectives, which serve as seed sets. New terms are subsequently added to these sets if they are linked to them by semantically loaded conjunctions such as "and", "but", "however", etc.

Some opinion mining algorithms use the pointwise mutual information measure to determine the semantic orientation of a term [3][4][6]. In this case the semantic orientation of a term is inferred from the association between the term and a word (or a set of words) assigned unambiguously to only one class (positive or negative), e.g. excellent and poor. The pointwise mutual information of the term t and the word w is defined as

PMI(t, w) = \log \frac{P(t, w)}{P(t) P(w)}    (4)

where P(t, w) is the joint probability of the term t and the word w occurring together, while P(t) and P(w) are the probabilities from the individual distributions of t and w, assuming their independence.

The semantic orientation of the term t is defined as

SO(t) = PMI(t, w_{+}) - PMI(t, w_{-})    (5)

where PMI(t, w_{+}) and PMI(t, w_{-}) are values calculated in accordance with formula (4) for the positive and negative classes, respectively.

As the individual probabilities of terms are difficult to compute, a heuristic is sometimes employed. This heuristic considers the number of documents in the database where the term t is placed near semantically loaded words. The pointwise mutual information then becomes

PMI(t) = \log \frac{hits(t, w_{+}) \cdot hits(w_{-})}{hits(t, w_{-}) \cdot hits(w_{+})}    (6)

where hits(t, w_{+}) and hits(t, w_{-}) denote the numbers of documents in which the term t occurs close to at least one word representing the positive and the negative class, respectively, while hits(w_{+}) and hits(w_{-}) represent the numbers of documents in which these words occur.
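A minimal sketch of the hits-based approximation (6) follows; the document counts are invented, and the small smoothing constant that guards against empty counts is our own addition rather than part of the cited formulation.

    # Sketch of the hits-based approximation of formula (6). All document counts are
    # invented; the smoothing constant avoiding division by zero is an assumption of
    # this sketch, not something prescribed by the cited works.
    import math

    def pmi_hits(hits_t_pos, hits_t_neg, hits_pos, hits_neg, smooth=0.01):
        numerator = (hits_t_pos + smooth) * (hits_neg + smooth)
        denominator = (hits_t_neg + smooth) * (hits_pos + smooth)
        return math.log(numerator / denominator)

    # A term seen near positive seed words in 40 documents and near negative ones in 5,
    # with the seed words themselves occurring in 1000 and 800 documents, respectively.
    print(round(pmi_hits(hits_t_pos=40, hits_t_neg=5, hits_pos=1000, hits_neg=800), 2))  # 1.85, i.e. positive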
Among the methods based on the concept of supervised learning, similar assumptions are made by the score classification algorithm [1]. In this case, the term scoring function has the following form:

score(t) = \frac{P(t|c_{+}) - P(t|c_{-})}{P(t|c_{+}) + P(t|c_{-})}    (7)

where P(t|c_{+}) and P(t|c_{-}) are the conditional probabilities of the occurrence of the term t in the positive and the negative class, respectively. These probabilities may be approximated by term occurrence frequencies in the training set.
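A sketch of the scoring function (7), with P(t|c+) and P(t|c-) approximated by relative frequencies as described above; it also illustrates the saturation effect that motivates the modification proposed in Section IV.

    # Sketch of formula (7): the score of a term as the normalized difference between its
    # relative frequencies in positive and negative training documents.
    def score_term(count_pos, total_pos_terms, count_neg, total_neg_terms):
        p_pos = count_pos / total_pos_terms  # approximation of P(t | c+)
        p_neg = count_neg / total_neg_terms  # approximation of P(t | c-)
        if p_pos + p_neg == 0.0:
            return 0.0                       # term unseen in the training set
        return (p_pos - p_neg) / (p_pos + p_neg)

    # A term occurring exclusively in the positive class always gets the maximum score of 1,
    # regardless of how rarely it occurs; this is the drawback addressed in Section IV.
    print(score_term(1, 10000, 0, 10000))    # 1.0
    print(score_term(500, 10000, 0, 10000))  # 1.0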
Other popular concepts described in the literature on sentiment analysis are families of methods based on the use of the Naive Bayes classifier, e.g. [20], and Support Vector Machines [1][21]. For other popular models and algorithms, and for other opinion mining tasks, e.g. opinion extraction, we refer the reader to [19].

III. BASIC CONCEPTS

Opinion mining algorithms make heavy use of the available text mining techniques. In order to clarify the description of our approach and the description of the conducted experiments, we introduce some basic notions from the domain of text processing and mining.

Lemmatisation is the process of identifying the lemma of a word. Algorithms performing this operation typically use dictionaries, where they look up the primary form of the word. Lemmatisation may find several different lemmas for a given word, if the word is an inflected form of several different lemmas. The use of lemmatisation reduces the number of terms present in the corpus and allows the matching of words in documents, even if the words occur in different grammatical forms. However, in the case of text classification problems, the use of lemmatisation may result in a deterioration of the classification accuracy, due to the possible occurrence of words in different forms derived from one lemma, depending on the document's affiliation to one of the classes.

Stemming is a process similar to lemmatisation. It aims to extract the core of the word, referred to as the stem, from the inflectional word forms. Stemming typically involves the removal and replacement of prefixes and suffixes. The result of stemming does not need to be, and often is not, a proper lemma. The best known stemming algorithm is the Porter stemmer [7].

A stop-list is a set of words that should be removed at an early stage of text processing. In most cases, these are conjunctions and other words which do not contribute additional information to the content of the sentence. Often, stop-list words are present in a sentence solely due to the requirements of the language's grammar. In many cases the use of stop-lists improves the accuracy and performance of text document processing.
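For illustration, a compact pre-processing sketch covering lower-casing, stop-word removal and stemming follows; the tiny English stop-list and NLTK's Porter stemmer are stand-ins chosen for brevity, since the Polish tools actually used in the experiments (Stempel, morfologik-stemming) are separate Java libraries.

    # Sketch of the pre-processing steps described above: lower-casing, stop-word removal
    # and stemming. The small English stop-list and NLTK's Porter stemmer are stand-ins;
    # the experiments in this paper rely on Polish tools (Stempel, morfologik-stemming).
    from nltk.stem import PorterStemmer

    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "are"}
    stemmer = PorterStemmer()

    def preprocess(text):
        tokens = text.lower().split()
        tokens = [t for t in tokens if t not in STOP_WORDS]
        return [stemmer.stem(t) for t in tokens]

    print(preprocess("The reviewers are grading movies using a scale from 1 to 5"))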
A term is a token generated from the document. It may be a word, a lemma, a stem or an n-gram. An n-gram is a sequence of n letters appearing in the document's content, e.g. the character string "opinion mining" may be split into the following 8-grams: "opinion_", "pinion_m", "inion_mi", "nion_min", "ion_mini", "on_minin", "n_mining". The n-gram representation of documents is often used as an alternative to the term representation. N-grams are lossless, because the text may be rebuilt, e.g. with the use of algorithms for DNA sequencing [8]. The n-gram representation allows the same operations on document collections as the term representation, but in addition offers extended functionality (e.g., spelling corrections).
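Generating such character n-grams takes one line; the sketch below reproduces the 8-gram example above, printing spaces as underscores.

    # Sketch: splitting a string into overlapping character n-grams, reproducing the
    # 8-gram example above (spaces printed as underscores for readability).
    def char_ngrams(text, n):
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    print([g.replace(" ", "_") for g in char_ngrams("opinion mining", 8)])
    # ['opinion_', 'pinion_m', 'inion_mi', 'nion_min', 'ion_mini', 'on_minin', 'n_mining']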
IV. OUR APPROACH

The method proposed in this paper for determining a term's semantic orientation is a variant of the method used in [1]. The drawback of the original method is that it assigns the maximum or the minimum value to all terms that occur in only one class, regardless of the number of occurrences. Therefore, we have proposed an alternative way of calculating the semantic orientation of a term. Our method is based on the ratio of term occurrence frequencies in documents assigned to the positive and negative classes. According to our approach, the scoring function assigning positive and negative scores to terms becomes

score(t) = \begin{cases} p_t - 1, & \text{if } p_t \geq 1 \\ -\left(\frac{1}{p_t} - 1\right), & \text{if } p_t < 1 \end{cases}    (8)

where

p_t = \frac{P(t|c_{+}) + \varepsilon}{P(t|c_{-}) + \varepsilon}    (9)

and where p_t is the raw semantic orientation of the term t, P(t|c_{+}) and P(t|c_{-}) are the conditional probabilities of occurrences of the term t in documents from the positive and negative classes, respectively, and ε is a small positive value controlling for terms that appear in only one class. In our experiments we have used the reciprocal of |C*| as the value of ε, where C* denotes the majority class. We refer to the original method presented in [1] as the score method, and we refer to the modification described above as the proportional method.

Example: Let us compute the token polarity evaluation in the way presented above. Assume that the training set contains 1000 positive and 200 negative examples, and that the token T occurred 9 times in positive examples and 3 times in negative examples. Then ε = 1/1000 = 0.001, P(T|c+) = 9/1000 = 0.009 and P(T|c-) = 3/200 = 0.015, so p_T = (0.009 + 0.001)/(0.015 + 0.001) = 0.625, and thus score(T) = -(1/0.625 - 1) = -0.6.

The score value of a term determined as above increases or decreases with the changing frequency of term occurrences in the positive or negative class, even if the term occurs in only one class. Similarly to the score method, the disadvantage of the proportional method is the noise resulting from an insufficient number of instances of a term in the training set. However, when the proportional method is used, the influence of the noise is limited in comparison to the score method. This limitation results from the use of the scaling value ε. The score value assigned to a term which occurs only once in the training set is limited by the ratio of the cardinalities of the classes, whereas the semantic orientation of terms characteristic of positive or negative documents is often orders of magnitude greater. To further reduce the impact of the noise on the effectiveness of the algorithm, we propose to add filtering by removing from the dictionary the terms that occur in fewer than β documents, where

\beta = \frac{|C^{*}|}{|C^{\#}|} + 2    (10)

and where C* denotes the majority class and C# denotes the minority class in the training set. Setting the threshold β of term occurrences in the training set allows us to eliminate terms that are not characteristic of any of the document classes, i.e. the terms for which the conditional probabilities of term occurrences are similar for both classes, but which occurred too rarely in the training set for their evaluation to be determined as equal or close to zero.
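A compact sketch of the proportional method, combining formulas (8)-(10), follows; it reproduces the worked example above (1000 positive and 200 negative training documents, a token seen 9 and 3 times, respectively). Treating raw occurrence counts as document frequencies for the β filter is a simplification of this sketch.

    # Sketch of the proportional method: formulas (8) and (9), plus the beta filter of (10).
    # Raw occurrence counts are used as a stand-in for document frequencies in the filter.
    def proportional_score(count_pos, n_pos_docs, count_neg, n_neg_docs):
        majority = max(n_pos_docs, n_neg_docs)
        minority = min(n_pos_docs, n_neg_docs)
        beta = majority / minority + 2                 # formula (10)
        if count_pos + count_neg < beta:
            return None                                # too rare, dropped from the dictionary
        eps = 1.0 / majority                           # epsilon = 1 / |C*|
        p_t = (count_pos / n_pos_docs + eps) / (count_neg / n_neg_docs + eps)  # formula (9)
        return p_t - 1 if p_t >= 1 else -(1 / p_t - 1)                         # formula (8)

    # Worked example from the text: 1000 positive and 200 negative training documents,
    # token T seen 9 times in positive and 3 times in negative examples.
    print(round(proportional_score(9, 1000, 3, 200), 3))  # -0.6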
V. EXPERIMENTS

A. Test sets

The main objective of the experiments was to test the accuracy of the classification algorithm proposed in Section IV. We used collections of opinions harvested from the e-commerce site Merlin and from two social networks, Znany lekarz and Ceneo. The first dataset is a collection of movie reviews from the Merlin website. The reviewers were grading movies on a scale from 1 to 5, where the reviews with grades 1 or 2 are considered negative, and the reviews with grades 4 or 5 are considered positive. We have discarded the neutral reviews with a grade equal to 3. The dataset consists of 1055 negative reviews and 9068 positive reviews.

The second dataset contains opinions on consumer products aggregated by the website Ceneo. Among the reviews graded from 0 to 5, we have chosen 16 674 positive reviews graded 4 or 5, and 793 negative reviews with grades 0 or 1. Again, we have discarded all neutral reviews.

The third dataset comes from the website Znany lekarz, which gathers opinions about physicians. We have assumed that opinions associated with grades 1 and 2 on a scale of 1-6 are negative, and that opinions with grades 5 and 6 indicate positive feedback. The dataset contains 2380 negative opinions and 11 764 positive opinions. In addition, we have performed tests using an aggregated dataset created by merging the three datasets. The aggregated dataset contains 4228 negative opinions and 37 506 positive opinions.

B. Performance measures

To evaluate the effectiveness of the classification we have used two measures: the classification accuracy (A) and the binary classification quality (Q). The latter measure is similar to the F1 measure, but takes into account the precision and recall achieved in both classes. These measures are expressed by the equations:

A = \frac{tp + tn}{tp + fp + tn + fn}    (11)

Q = \begin{cases} \frac{4}{\frac{1}{rec_P} + \frac{1}{rec_N} + \frac{1}{prec_P} + \frac{1}{prec_N}}, & \text{if } 0 \notin \{rec_P, rec_N, prec_P, prec_N\} \\ 0, & \text{if } 0 \in \{rec_P, rec_N, prec_P, prec_N\} \end{cases}    (12)

where

rec_P = \frac{tp}{tp + fn}, \quad rec_N = \frac{tn}{tn + fp}    (13, 14)

prec_P = \frac{tp}{tp + fp}, \quad prec_N = \frac{tn}{tn + fn}    (15, 16)

and where tp and fp are the true positives and false positives (i.e., the numbers of positive examples from the test set classified correctly and incorrectly), and tn and fn are the true negatives and false negatives (i.e., the numbers of negative examples from the test set classified correctly and incorrectly).

The binary classification quality measure Q, in contrast to the classification accuracy A, is not vulnerable to a considerable disproportion between the sizes of the positive and negative classes. This vulnerability is noticeable when the classifier prefers the majority class, for which the number of examples in the test set is many times greater than the number of examples of the minority class.
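Both measures follow directly from the confusion-matrix counts; the sketch below also demonstrates the imbalance effect described above: a classifier that always predicts the majority class scores high on A but drops to zero on Q.

    # Sketch of the evaluation measures: accuracy A, formula (11), and the binary
    # classification quality Q, formulas (12)-(16).
    def accuracy(tp, fp, tn, fn):
        return (tp + tn) / (tp + fp + tn + fn)

    def quality(tp, fp, tn, fn):
        rec_p = tp / (tp + fn) if tp + fn else 0.0
        rec_n = tn / (tn + fp) if tn + fp else 0.0
        prec_p = tp / (tp + fp) if tp + fp else 0.0
        prec_n = tn / (tn + fn) if tn + fn else 0.0
        if 0.0 in (rec_p, rec_n, prec_p, prec_n):
            return 0.0                               # degenerate case of formula (12)
        return 4.0 / (1 / rec_p + 1 / rec_n + 1 / prec_p + 1 / prec_n)

    # A classifier that labels everything as positive on an imbalanced test set:
    # accuracy looks good, while Q collapses to zero.
    print(accuracy(tp=900, fp=100, tn=0, fn=0))  # 0.9
    print(quality(tp=900, fp=100, tn=0, fn=0))   # 0.0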
C. Experiment setup

We have performed 10-fold cross-validation experiments with both document representations: the one based on terms and the one based on n-grams. As the baseline we have used the score method. In the case of the term-based representation we have tested the classification performance both with lemmatisation or stemming and without any text pre-processing. For stemming we have used Stempel [10], and for lemmatisation we have used morfologik-stemming [9]. We have also tested the impact of dictionary filtering and of the removal of stop-words. The stop-word lists were created based on the Polish Wikipedia [11]. In the case of the n-gram representation we have tested the dependency of the classification performance on the n-gram length and the impact of dictionary filtering. We have not considered the impact of the character case used in the original text, i.e., in all experiments all letters in all documents were converted to lower case. When the assignment of a document to either of the classes was not possible due to the lack of available tokens, we assumed that the document had been erroneously assigned.
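For concreteness, the evaluation protocol can be written as a stratified 10-fold cross-validation loop; the scikit-learn splitter and the placeholder build_scores/classify functions below are our own assumptions, not the implementation used in the experiments.

    # Sketch of the 10-fold cross-validation protocol. Documents are assumed to be
    # lower-cased and tokenised already; build_scores() and classify() are hypothetical
    # placeholders for the dictionary construction and document classification steps.
    from sklearn.model_selection import StratifiedKFold

    def cross_validate(docs, labels, build_scores, classify, n_splits=10):
        folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
        correct, total = 0, 0
        for train_idx, test_idx in folds.split(docs, labels):
            scores = build_scores([docs[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
            for i in test_idx:
                correct += int(classify(docs[i], scores) == labels[i])
                total += 1
        return correct / total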
VI. RESULTS

In this Section we present the results obtained by running all combinations of the tests described in Section V.

Table 1: Sums of ranks from descending rankings of algorithm configurations, calculated independently for each dataset based on the values of the quality Q and the accuracy A. The value in each row is the sum of all ranking positions assigned to the algorithm configurations.

Conf. element value        Q    A
No pre-processing          330  283
Stemming                   379  383
Lemmatization              491  534
Proportional method        184  170
Prop. method + filtering   286  262
Score method               359  292
Score method + filtering   371  476
with stop-words            637  629
w/o stop-words             563  571

A. Experiments on term representation

Table 1 shows the best classification performance when no stemming or lemmatization was performed. We believe that the reasons for this behavior stem from the properties of the Polish language, where the vocabulary used to express positive and negative opinions strongly overlaps, but the frequency of specific grammatical forms may vary. For example, the word używać (to use) may occur more frequently in a negative context (Nie będę więcej używać tego produktu – I will never use this product again) than in a positive context (Używam tego produktu i jestem zadowolony – I am using this product and I am satisfied). The influence of the removal of stop words is minimal and ambiguous.

Figure 1: Maximal and average values of accuracy and quality achieved during the tests using a document representation based on terms. P - proportional method, S - score method, PF, SF - filtering of the dictionary created using the proportional and the score method, respectively.

The results show a better performance of the proportional method over the score method. The removal of rare tokens tends to lower the quality and accuracy, which is possibly caused by an insufficient number of tokens in the documents.

Figure 2: Average values of accuracy and quality achieved during the tests using a document representation based on n-grams.

B. Experiments on n-gram representation

The length of the n-grams for which the best quality and accuracy is attained is 7 or 8. The maximum accuracy and quality are almost independent of the parameter n, but the average results strongly favour long n-grams. Interestingly, among the n-grams with the highest values of semantic orientation there are n-grams spanning word boundaries, e.g., czo odr (stanowczo odradzam - strongly discourage) or dzo pol (bardzo polecam - highly recommend).
Figure 3: Maximal values of accuracy and quality achieved during the tests using a document representation based on n-grams.

VII. CONCLUSIONS

In this paper we have discussed the possibility and the benefit of using the social network environment for opinion mining. We believe that social networks present a perfect solution to the problem of opinion acquisition and dissemination and may be perceived as natural enablers for opinion mining applications.

We have presented, as a proof of concept, examples of analyses that aim at gathering user opinions in two different application areas. Both experiments suggest that the social networks fuelling the websites in question provide a relevant context for opinion mining. We are aware of the fact that we have not utilized the information from the social network directly in the opinion mining algorithm. We have merely tested the ability to attain high accuracy and quality of sentiment prediction using the data harvested from a social network site.

Our future work agenda includes the analysis of the users' reception of opinions contained in the text and further improvements of the presented algorithm. We expect to attain an improvement of the classification performance due to the utilization of more contextual information derived from the social networks, namely, the information on relationships and connections between users. We also intend to develop an active learning strategy for this type of classification task.

VIII. BIBLIOGRAPHY

[1] K. Dave, S. Lawrence, and D. M. Pennock. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on World Wide Web, WWW '03, pages 519-528, New York, NY, USA, 2003. ACM.
[2] R. F. Xu, K. F. Wong, and Y. Q. Xia. Coarse-Fine Opinion Mining – WIA in NTCIR-7 MOAT Task. In Proceedings of NTCIR-7, Japan, 2008.
[3] G. Wang and K. Araki. Modifying SO-PMI for Japanese Weblog Opinion Mining by Using a Balancing Factor and Detecting Neutral Expressions. In Proceedings of NAACL HLT 2007, Companion Volume, pages 189-192, Rochester, NY, 2007.
[4] G. Wang and K. Araki. A Graphic Reputation Analysis System for Mining Japanese Weblog Based on both Unstructured and Structured Information. In Advanced Information Networking and Applications - Workshops (AINAW 2008), 22nd International Conference on, pages 1240-1245, Okinawa, 2008.
[5] V. Hatzivassiloglou and K. R. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the ACL, pages 174-181, New Brunswick, NJ: Association for Computational Linguistics, 1997.
[6] P. D. Turney and M. L. Littman. Unsupervised learning of semantic orientation from a hundred-billion-word corpus, 2002. [Online]. Available: http://arxiv.org/abs/cs/0212012
[7] M. F. Porter. An algorithm for suffix stripping. Program, vol. 14, no. 3, pp. 130-137, 1980. [Online]. Available: http://portal.acm.org/citation.cfm?id=275705
[8] J. Blazewicz, M. Kasprzak, and W. Kuroczycki. Hybrid Genetic Algorithm for DNA Sequencing with Errors. Journal of Heuristics, vol. 8, no. 5, pp. 495-502, 2002. [Online]. Available: http://www.springerlink.com/content/c7y26bnn1mrvkx1d
[9] D. Weiss and M. Miłkowski. morfologik-stemming. [Online]. Available: http://morfologik.blogspot.com/, 2010.
[10] A. Białecki. Stempel - Algorithmic Stemmer for Polish Language. [Online]. Available: http://www.getopt.org/stempel/, 2004.
[11] Stop listy - Wikipedia, wolna encyklopedia. [Online]. Available: http://pl.wikipedia.org/wiki/Stop_listy, 2010.
[12] J. Moreno and H. Jennings. Statistics of social configurations. Sociometry, pp. 342-374, 1938.
[13] S. D. Berkovitz. Markets and market-areas: Some preliminary formulations. In B. Wellman and S. D. Berkovitz (Eds.), Social Structures: A Network Approach, pp. 261-303. Cambridge, England: Cambridge University Press, 1988.
[14] S. Bocaletti, V. Latora, Y. Moreno, and M. Chavez. Complex networks: structure and dynamics. Physics Reports (424), pp. 175-308, 2006.
[15] M. Hu and B. Liu. Mining opinion features in customer reviews. In AAAI'04: Proceedings of the 19th National Conference on Artificial Intelligence, pp. 755-760, 2004.
[16] W. Zhang, C. Yu, and W. Meng. Opinion retrieval from blogs. In CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 831-840, 2007.
[17] S. Wassermann and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[18] A. M. Popescu and O. Etzioni. Extracting Product Features and Opinions from Reviews. In Natural Language Processing and Text Mining, pp. 9-28, 2007.
[19] B. Pang and L. Lee. Opinion Mining and Sentiment Analysis. Now Publishers Inc., 2008.
[20] H. Yu and V. Hatzivassiloglou. Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 129-136, 2003.
[21] M. Gamon. Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of the International Conference on Computational Linguistics (COLING), 2004.
