
Generating a Concept Hierarchy for Sentiment Analysis

Bin Shi
S. Rajaratnam School of International Studies
Nanyang Technological University
Singapore, 639798
isshibin@ntu.edu.sg

Kuiyu Chang
School of Computer Engineering
Nanyang Technological University
Singapore, 639798
kuiyu.chang@mosuma.com

Abstract—In this paper, we propose an unsupervised machine learning method to automatically construct a hierarchical product concept model from the on-line reviews of a product. Our method starts by representing each candidate noun using a feature context vector, which is simply a vector of all its co-occurring neighbors excluding itself. We then apply bisection clustering to hierarchically cluster the context vectors into a cluster hierarchy. Lastly, we propose and evaluate two methods to label each intermediate cluster node with the most representative member context feature vector. Experiments conducted on 3 sets of on-line reviews (in both Chinese and English), benchmarked qualitatively and quantitatively against a well-known existing approach, demonstrate the effectiveness and robustness of our approach.

I. INTRODUCTION

In this paper, we study the problem of automatically creating a product concept hierarchy based on textual information contained in on-line customer reviews. In particular, we propose to use top-down bisection clustering to group similar or conceptually related nouns together. To this end, we propose a novel way to measure the conceptual similarity between two nouns: by counting the co-occurrences of neighboring words. In other words, a noun is represented solely by its neighboring words; two nouns are more similar if they tend to occur in sentences sharing a lot of common words. This new similarity measure thus estimates the conceptual distance between two words based on word usage. To control the clustering procedure, an intra-cluster similarity threshold needs to be experimentally determined, with no other manual intervention. After clustering at each level, we then propose a unique projection method to identify the best hypernym to label each cluster node in the hierarchy. The final clustering hierarchy thus resembles a tree structure. Our approach can therefore be used to automatically construct a rough, initial concept hierarchy for products and services. It can be summarized into three steps:

1) Identify nouns from the review data set.
2) Create a contextual representation vector for each noun.
3) Cluster the nouns top-down to create a concept hierarchy.

II. RELATED WORK

In [1], snippets returned by search engines are used as representative vectors of the query keyword. The vectors are then clustered using bottom-up agglomerative clustering to generate a binary tree resembling a concept hierarchy.

In [2], Caraballo proposed a novel method to automatically construct a hypernym-labeled noun hierarchy from the Wall Street Journal corpus, hypothesizing that "nouns in conjunction or appositives tend to be semantically related". Each noun is represented by a weighted vector of other nouns, where each weight denotes the number of times another noun appears in conjunction with, or in an appositive relationship to, it. The similarity between two nouns is simply the cosine of their angle in vector space. Similarly, Jickels and Kondrak [3] made use of subject and appositive dependency relationships among nouns to build noun clusters, labeling them with hypernyms extracted from each cluster.

In [4], noun-noun relationships are classified into 4 types. Cederberg and Widdows [5] search for patterns like "x such as y", "y and other x", and "x, including y", where y is treated as a hyponym of x; they then apply Latent Semantic Analysis to further improve the extracted hypernyms. Rydin [6] proposed five hypernym-hyponym constructs to mine hypernym pairs and create the hierarchy. Nanas et al. [7] construct the hierarchy using three types of term-dependence relations.

All of the above approaches rely on some sort of known language pattern to mine hypernyms and create a hierarchy. There are other statistical approaches that are language-agnostic, such as the seminal work by Sanderson and Croft [8], which uses co-occurrences of words and phrases to create a concept hierarchy, a process termed "subsumption". The relationship between words or phrases is determined by their mutual conditional probabilities estimated from the corpus. Lawrie et al. [9] also utilized term co-occurrences and subsumption to extract nouns and construct a concept hierarchy.

Improving upon Sanderson's approach, [10] and [11] used term mutual conditional probabilities to build a concept hierarchy. In particular, [10] added two types of classification, namely subsumption and equivalence, to the mutually inclusive probabilities, with each type handled differently in the subsumption process. In [11], an additional threshold is used to extract the subsumption term pairs precisely.

Glover et al. [12] proposed using a noun's category-specific document frequency (DF) vis-a-vis the corpus DF to predict

1-4244-2384-2/08/$20.00 © 2008 IEEE
their roles as either parent, child, or self. In this approach, however, the category or group membership must be known.

One major disadvantage of most existing approaches is the reliance on language-specific grammatical or syntactic patterns such as conjunction and apposition. On-line reviews are typically very colloquial, short, and mostly not grammar-conforming, so most of the classical methods will not work well on them.

III. METHODOLOGY

Most on-line review sentences tend to be opinion sentences, and their nouns are most likely related to a product feature. In other words, the nouns in on-line reviews are usually the subjects of a sentence, and their frequently co-occurring verbs are typically linking verbs like "is", "are", and "be", which makes noun-verb co-occurrence based approaches too specific for on-line reviews. Thus in our case we adopted a free-for-all approach, counting all co-occurring words.

A. Collecting Candidate Nouns

Low-level semantic word analysis is typically performed at the sentence resolution, as the context usually changes across sentence boundaries. This means that we have to be able to segment text into proper sentences ending with a terminating punctuation mark such as a period or colon. Unfortunately, owing to the colloquial nature of on-line user reviews, punctuation is widely abused, making automatic sentence boundary detection non-trivial. We therefore go a step further by analyzing at the sentence segment level. A sentence segment is a contiguous string of words between two adjacent punctuation marks (end of line inclusive). The disadvantage of resorting to sentence segment resolution is the possible loss of higher-level contextual information. However, for the vast majority of sentences the effect is minimal. For example, a sentence like "This product is OK, except for the high price." will likely yield one positive and one negative sentiment when its two segments are analyzed separately, but only one negative sentiment when the whole sentence is analyzed, in which case the sentence segment analysis result is the more accurate one!

The candidate noun set comprises all POS-tagged nouns. We adopt the simple Segment Frequency (SF) criterion, an analogue of document frequency, to pick important nouns from the candidate noun set for analysis. A noun's SF is the number of segments in the corpus containing it; a noun with a low SF is thus rare and not representative. Assuming that frequently mentioned nouns have a higher probability of being a product feature, we pick nouns with an SF value of at least 7%-10%. Keep in mind that this simple assumption will inadvertently prune infrequent feature nouns and include a whole lot of frequent non-feature nouns.

B. Context Vector for Representing a Noun

Unlike the vast majority of existing approaches, a noun is represented as a vector comprising entirely of its common neighbors, excluding itself. In effect, the noun is defined by all of its neighbors, constructed in two steps:

1) Extract a set of N term vectors S = {s_n}, n = 1, ..., N, one for each review segment in the corpus of on-line reviews. Assuming there are D terms in the lexicon, then s_n ∈ R^D. A binary representation is used here, since there is little difference between binary and term-frequency representations for segments, which are typically short (fewer than 20 words).

2) For a candidate feature noun w_d ∈ V, where V = {w_1, ..., w_D} is the set of all words in the lexicon, let T be the subset of binary term vectors from S containing a non-zero weight for candidate noun w_d, i.e., T = {t : (t ∈ S) ∧ (t_d = 1)}. Define the context vector u_d for noun w_d as:

u_d = Σ_{t ∈ T} U_d t    (1)

where U_d is a diagonal matrix of all ones except for a single zero at the d-th position:

U_d = [u_ij]_{D×D},  u_ij = 1 if i = j ≠ d, and 0 otherwise.    (2)

Intuitively, each context vector u_d is simply an average of all word vectors used in the context of w_d. Enumerating all possible neighbors of a noun over all sentence segments containing the noun ensures that each noun is represented by a unique context vector, which has several advantages:

1) A rich representation of the noun is created, which also approximates the relative frequency of each neighboring word used in the context of that noun.
2) We avoid the problem of a noun having different contextual representations from various sentence segments, which might end up in different clusters, making interpretation difficult.
3) The influence of frequently occurring neighbors is enhanced, while the effect of outlier neighbors is diminished.

C. Top-Down Clustering of Features

To find the hypernymic relationships between different feature nouns, we apply top-down binary clustering to all feature context vectors. We use the cosine similarity to measure the degree of similarity between two feature context vectors u and v:

sim(u, v) = u^T v / (||u|| ||v||)    (3)

We use CLUTO [13] to perform top-down bisection clustering. In general, determining the optimal number of clusters k for a dataset remains an open research problem. Thus, instead of specifying k, an intra-cluster similarity threshold is used to determine when to stop splitting a cluster, defined as:

IS = Σ_{u ≠ v} sim(u, v) / (n(n − 1))    (4)

where the sum runs over all pairs of distinct vectors u, v in the cluster and n is the number of vectors in the cluster.

The algorithm starts by splitting all feature context vectors into two clusters. The intra-similarity metric of the data in each of the two clusters is then computed. If a cluster's metric exceeds the threshold, it is labeled a final cluster and clustered no further. Otherwise, bisection clustering is applied to the sub-cluster, and the procedure continues iteratively until all clusters attain intra-similarities exceeding the preset threshold.

Bisection clustering creates a binary tree structure with the final clusters as leaf nodes. Meaningful words can be assigned to label each node in the tree, thereby transforming it into a useful concept hierarchy. Since we are clustering feature nouns, we can naturally select the most representative feature context vector (hypernym) under each node as its text label.

We have tried various approaches to find the most representative vector, including the vector nearest to the mean. In the process we found two methods that work well for our specific task. The first assumes that a noun sharing the largest number of words with other nouns is most likely a broader term. The second computes z-scores.

In the first method, for each node we compute the feature overlap vector o over all members underneath it as follows:

o_d = 1 if at least a fraction p of the member vectors u satisfy u_d > 0; 0 otherwise.

The overlap vector can be visualized as a binary projection of all features onto an empty vector, as shown in Figure 1, where p = 0.5. We then pick the vector with the highest amount of binary overlap with the overlap vector as the most representative noun. In case of a tie, we pick the vector with the higher total weight. For example, in Figure 1, although noun vectors 1 and 2 both have 3 non-zero positions overlapping with the overlap vector, vector 2 is picked over vector 1 because it has the larger sum of weights over its overlapping terms, i.e., 7 versus 4.

Fig. 1. Projection Based Hypernym Selection. Here the density vector captures only features with more than 50% overlap.

The second method of selecting the best representative vector is based on the normalized internal (z_k^I) and external (z_k^E) z-scores of a vector u_i in the k-th cluster, defined as follows [13]:

z_k^I(u_i) = (s_ik^I − μ_k^I) / σ_k^I    (5)

z_k^E(u_i) = (s_ik^E − μ_k^E) / σ_k^E    (6)

where μ_k and σ_k denote the mean and standard deviation, respectively. For the N_k vectors in cluster set C_k we have

s_ik^I = (1/N_k) Σ_{v ∈ C_k} sim(u_i, v)    (7)

s_ik^E = (1/N_k) Σ_{v ∉ C_k} sim(u_i, v)    (8)

Intuitively, vectors with large internal z-scores and low external z-scores tend to lie near the centroid of a cluster. Thus, the most representative vector is the one with the largest internal and smallest external z-scores. In case of multiple candidates, the vector with the largest internal z-score and/or smallest external z-score is preferred.

IV. EXPERIMENTS

A. Datasets

Three on-line review datasets were collected:

1) Hotel. This data set contains 2000 Chinese hotel reviews randomly selected from 14929 reviews on 423 different hotels in Shanghai. The 14929 reviews were crawled from www.ctrip.com, the largest on-line travel portal in China.
2) Cellphone. This data set contains 2000 Chinese cellphone reviews randomly selected from the 313102 Chinese cellphone reviews crawled from four major Chinese websites.
3) Mp3player. This is a smaller public dataset comprising 95 English reviews of Creative's Nomad Jukebox Zen Xtra 40GB Mp3 player collected from www.amazon.com. This dataset has been previously evaluated and made available by Hu and Liu [14].

The three datasets cover different domains and languages, each with its own characteristics. For example, both the hotel and cellphone reviews are short, as they largely recount real personal experiences. The Mp3player reviews, on the other hand, are longer and contain less noise (fewer spam reviews).

The only natural language processing tool required by our approach is a Part-Of-Speech (POS) tagger for identifying nouns. For Chinese, we use ICTCLAS [15] to segment words and perform POS tagging. For English, Brill's tagger [16] was used. Preprocessing such as stemming (for English), punctuation filtering, and pruning of questioning segments was also performed. Table I shows the distribution of the three review datasets after processing, where the "Avg. Length" row shows the average word count per segment.

TABLE I
DISTRIBUTION OF THE THREE REVIEW DATASETS.

                 Hotel   Cellphone   Mp3player
# reviews         2000        2000          95
# segments       15628        6703        3081
Avg. Length       5.27        4.96         8.9
# unique nouns    1813         695        1117
# unique words    5986        3496        2431

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

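As a concrete sketch of the context-vector construction in Section III-B (a minimal NumPy illustration; the function name and the toy segments are our own, not from the paper):

```python
import numpy as np

def context_vectors(segments, lexicon):
    """Build one context vector per lexicon word, following Eqs. (1)-(2).

    segments: list of token lists, one per sentence segment.
    lexicon:  list of D words; returns a D x D array whose d-th row is u_d.
    """
    index = {w: i for i, w in enumerate(lexicon)}
    D = len(lexicon)
    # Binary segment-term vectors s_n: 1 if the word occurs in the segment.
    S = np.zeros((len(segments), D))
    for n, seg in enumerate(segments):
        for w in set(seg):          # binary: presence only, not counts
            if w in index:
                S[n, index[w]] = 1
    U = np.zeros((D, D))
    for d in range(D):
        T = S[S[:, d] == 1]         # segments containing word d (the set T)
        u = T.sum(axis=0)           # sum of segment vectors over T
        u[d] = 0                    # U_d zeroes out the word itself
        U[d] = u
    return U

segs = [["screen", "bright", "nice"],
        ["screen", "nice"],
        ["battery", "nice"]]
lex = ["screen", "bright", "nice", "battery"]
U = context_vectors(segs, lex)
# u_screen aggregates neighbors over the two "screen" segments:
# "bright" once, "nice" twice, "screen" itself zeroed out.
```

Because scale is irrelevant under cosine similarity (Eq. 3), summing over T rather than dividing by |T| yields the same downstream clustering.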
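The stopping rule of Eqs. (3)-(4) can be sketched as follows. Note that the paper uses CLUTO for the actual bisection; the farthest-pair seeding below is a simple stand-in for illustration, not CLUTO's partitioning criterion:

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def intra_sim(X):
    """Average pairwise cosine similarity, Eq. (4): IS = sum_{u!=v} sim / n(n-1)."""
    n = len(X)
    if n < 2:
        return 1.0
    total = sum(cos(X[i], X[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def bisect_cluster(X, ids, threshold):
    """Recursively split until every cluster's IS meets the threshold."""
    if len(ids) < 2 or intra_sim(X[ids]) >= threshold:
        return [ids]                      # final cluster: no further splitting
    # Seed the two halves with the least similar pair, assign the rest by cosine.
    pairs = [(cos(X[i], X[j]), i, j) for i in ids for j in ids if i < j]
    _, a, b = min(pairs)
    left = [i for i in ids if cos(X[i], X[a]) >= cos(X[i], X[b])]
    right = [i for i in ids if i not in left]
    if not left or not right:
        return [ids]
    return (bisect_cluster(X, left, threshold) +
            bisect_cluster(X, right, threshold))

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
clusters = bisect_cluster(X, list(range(4)), threshold=0.9)
# recovers the two tight groups {0, 1} and {2, 3}
```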
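The two node-labeling schemes can be sketched as follows (the helper names are our own; the z-score variant ranks members by z_int − z_ext, a simplification of the paper's "largest internal and smallest external" rule):

```python
import numpy as np

def overlap_label(vectors, p=0.5):
    """Projection method: the overlap vector o marks features present
    (weight > 0) in at least a fraction p of member vectors; the member
    sharing the most marked positions wins, with ties broken by the total
    weight on those positions."""
    V = np.asarray(vectors, dtype=float)
    o = (np.count_nonzero(V > 0, axis=0) >= p * len(V)).astype(int)
    overlaps = ((V > 0) & (o > 0)).sum(axis=1)   # binary overlap with o
    weights = (V * o).sum(axis=1)                # tie-breaker: sum of weights
    return max(range(len(V)), key=lambda i: (overlaps[i], weights[i]))

def zscore_label(sims, members):
    """Z-score method, Eqs. (5)-(8): given a full pairwise similarity matrix
    and the member indices of one cluster, pick the most central member."""
    sims = np.asarray(sims, dtype=float)
    others = [j for j in range(len(sims)) if j not in members]
    Nk = len(members)
    s_int = sims[np.ix_(members, members)].sum(axis=1) / Nk   # Eq. (7)
    s_ext = (sims[np.ix_(members, others)].sum(axis=1) / Nk
             if others else np.zeros(Nk))                      # Eq. (8)
    z_int = (s_int - s_int.mean()) / (s_int.std() + 1e-12)     # Eq. (5)
    z_ext = (s_ext - s_ext.mean()) / (s_ext.std() + 1e-12)     # Eq. (6)
    return members[int(np.argmax(z_int - z_ext))]
```

For example, with members [[2,1,1,0], [3,0,2,2], [0,5,0,0]] and p = 0.5, the overlap vector is [1,1,1,0] and the first member wins with 3 overlapping positions.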
product concept hierarchy should group similar nouns or concepts at the leaf nodes, with higher-level nodes describing broader concepts.

Naturally, one way of evaluating a hierarchy is by determining the correctness of each parent-child link, but how should one quantify or qualify "correctness"? To this end, we adopt the five relationship definitions of Sanderson [8].

To evaluate our hierarchy, we benchmark it against Sanderson's subsumption approach [8], which focuses on deriving non-hierarchical, disconnected relationships. As the latter approach is quite different from our projection method of deriving a single hierarchy, to ensure a fair comparison we ensure that both graphs have an approximately equal number of unique edges/links. Unique edges are links that connect two non-identical nouns, and they make up the majority of the edges in both methods. Since our projection method does not enforce a unique hypernym at each node, non-unique edges may be present.

We selected the 3 clustering results with the lowest reduction in IS (emphasized in bold in Table III) to benchmark our projection hypernym extraction. Note that the three selected hierarchies may be locally optimal in terms of noun intra-similarity, but could otherwise be sub-optimal with respect to the best hierarchy. Sanderson's method, on the other hand, is not restricted to finding a hierarchy.

Table V lists the number of unique edges and nouns in the graphs/trees obtained by both methods. Both methods were given the same set of candidate nouns and reviews. Sanderson's mutually-inclusive threshold value was adjusted so that it yielded approximately the same number of unique edges as our projection results. From Table V, Sanderson's results include far fewer nouns at the same link density as our projection method.

TABLE V
STATISTICS FOR Projection AND SANDERSON.

             # Unique edges            # Nouns captured
             Projection  Sanderson     Projection  Sanderson
Hotel               176        154            175        120
Cellphone           163        163            161        126
Mp3player           167        169            163        125

We manually classified each unique edge in both graphs into one of the 5 types listed in Table VI. Sanderson's method clearly benefits from this simplistic evaluation methodology. This is because our hierarchies are top-down binary, which restricts hypernyms at each intermediate node to only two children, thus limiting the number of edges per noun. Further, the hierarchical nature of our tree means that only the first two relationships should be considered correct in the strictest sense, whereas each edge in Sanderson's graph is considered "correct" as long as any of the first 4 relationships applies.

TABLE VI
RESULTS OF NOUN PAIR EVALUATION.

                     Hotel            Cellphone        Mp3player
                     Proj.   San.     Proj.   San.     Proj.   San.
1) Aspect of (%)      30.7   42.9      24.5   27        38.3   27.2
2) Type of (%)         6.8    0         4.3    1.2       7.2    4.1
3) Same (%)           19.3    3.3       5.5    4.9       7.2    3.0
4) Opposite (%)        0      2.0       1.8    4.3       0      0
5) Unknown (%)        43.2   52        63.8   62.6      47.3   65.7
Sum (1-3) (%)         56.8   46.2      34.3   33.1      52.7   34.3
Sum (1-4) (%)         56.8   48.2      36.1   37.4      52.7   34.3

As indicated in [8], only the first three types of relationships, "Aspect of", "Type of", and "Same", should be considered "correct" for our projection concept hierarchy. Even if the fourth relationship holds true for some pairs within the hierarchy, it should be considered "wrong". Thus, if we consider the sum of correct type 1 to 3 noun pairs, our projection approach is 10.6%, 1.2%, and 18.4% better than Sanderson's approach on the 3 datasets, respectively, which averages to about a 10% across-the-board improvement. Note that both methods performed poorly on the cellphone dataset, which contains many irrelevant posts.

As the complete tree and graph for each dataset are quite large, we only show parts of them here. To facilitate objective comparison of the two graphs, sub-hierarchies consisting of the same nouns are extracted and shown: a sub-tree from the "projection" hierarchy is first selected, then a sub-graph from Sanderson's hierarchy containing the same nouns is extracted. Sub-hierarchies and sub-graphs for the three datasets are shown in Figures 2 to 4.

Fig. 2. Sub-hierarchies of Hotel. (a) Projection. (b) Sanderson.

The projection method of labeling each node goes hand-in-hand with the top-down clustering procedure; it therefore does not remove any candidate noun from the set even after the noun has been selected as a node label (hypernym). As a result, two anomalies may arise:

1) The same node label may subsequently be assigned to one or more descendants, resulting in edges such as "zhiliang-zhiliang" ("quality-quality") in Figure 3 and "track-track" in Figure 4.
2) Edges with reversed relationships may be created, e.g., "track-album" in Figure 4.

To keep the comparison with Sanderson's method fair, both cases are treated as class 5, i.e., "Unknown", and do not contribute to the scores of our projection method in Table VI.

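As a quick sanity check, the "Sum (1-3)" margins quoted above (10.6%, 1.2%, 18.4%) follow directly from the per-type rows of Table VI:

```python
table6 = {  # per-type percentages as (Projection, Sanderson) per dataset
    "Hotel":     {"aspect": (30.7, 42.9), "type": (6.8, 0.0),
                  "same": (19.3, 3.3)},
    "Cellphone": {"aspect": (24.5, 27.0), "type": (4.3, 1.2),
                  "same": (5.5, 4.9)},
    "Mp3player": {"aspect": (38.3, 27.2), "type": (7.2, 4.1),
                  "same": (7.2, 3.0)},
}

def sum13(rows, col):
    # Sum of the three "correct" relationship types for one method column.
    return round(sum(v[col] for v in rows.values()), 1)

# Projection's margin over Sanderson on correct (type 1-3) noun pairs.
margins = {name: round(sum13(rows, 0) - sum13(rows, 1), 1)
           for name, rows in table6.items()}
# margins == {"Hotel": 10.6, "Cellphone": 1.2, "Mp3player": 18.4}
```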
From the comparison of sub-hierarchies between our method and Sanderson's, as shown in Figures 2, 3, and 4, we see that our method has the following qualitative advantages in addition to the quantitative advantages presented in Table VI:

1) Our method produces a neat hierarchical structure that is intuitive to end users, whereas Sanderson's method generates fragments of disconnected graphs with no clear hierarchical ordering.
2) Our method can identify synonyms and meronyms effectively as descendants of a node, whereas in Sanderson's graph synonyms may be disconnected and shown in different graphs.

Fig. 3. Sub-hierarchies of Cellphone. (a) Projection. (b) Sanderson.

Fig. 4. Sub-hierarchies of Mp3player. (a) Projection. (b) Sanderson.

V. CONCLUSIONS

As demonstrated in both the quantitative benchmark results and the qualitative sub-hierarchy comparison with Sanderson's method, our approach can automatically group similar and related nouns into an intuitive hierarchy that is both user-friendly and up-to-date, in contrast to the fragments of disconnected graphs with no clear hierarchical ordering generated by Sanderson's method. Further, our approach can identify synonyms and meronyms effectively as descendants of a node, whereas in Sanderson's graph synonyms may be disconnected and shown in different graphs.

There are some limitations of our approach that will require further work. First, the current method of collecting candidate nouns for a node label relies solely on a noun's Segment Frequency (SF); more sophisticated statistics such as the burstiness of words [17] could be used. Second, a multi-way split would yield a more sophisticated tree with fewer levels. Third, with the incorporation of a multi-way split, we could remove a noun from the label candidate pool once it has been selected as the label of a parent node, in contrast to the current approach, which accepts duplicate node labels.

Despite these limitations, our current approach can already be used to automatically generate a concept hierarchy from the latest review content. The concept hierarchy can then be reviewed by humans to uncover the latest buzz and feature words, which can in turn be fed back into an opinion mining system as candidate feature nouns.

REFERENCES

[1] S.-L. Chuang and L.-F. Chien, "A practical web-based approach to generating topic hierarchy for text segments," in ACM CIKM, Washington, D.C., USA, November 2004.
[2] S. A. Caraballo, "Automatic construction of a hypernym-labeled noun hierarchy from text," in Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, Maryland, 1999.
[3] T. Jickels and G. Kondrak, "Unsupervised labeling of noun clusters," in Proceedings of the 19th Canadian Conference on Artificial Intelligence, Quebec City, June 2006, pp. 278-287.
[4] M. Degeratu and V. Hatzivassiloglou, "Building automatically a business registration ontology," in Proceedings of the 2002 Annual National Conference on Digital Government Research, Los Angeles, California, May 2002, pp. 1-7.
[5] S. Cederberg and D. Widdows, "Using LSA and noun coordination information to improve the precision and recall of automatic hyponymy extraction," in Conference on Natural Language Learning (CoNLL), Edmonton, Canada, 2003.
[6] S. Rydin, "Building a hyponymy lexicon with hierarchical structure," in Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition, Philadelphia, Pennsylvania, July 2002, pp. 26-33.
[7] N. Nanas, V. Uren, and A. D. Roeck, "Building and applying a concept hierarchy representation of a user profile," in ACM SIGIR, Toronto, Canada, 2003.
[8] M. Sanderson and B. Croft, "Deriving concept hierarchies from text," in ACM SIGIR, 1999.
[9] D. Lawrie, W. B. Croft, and A. Rosenberg, "Finding topic words for hierarchical summarization," in ACM SIGIR, New Orleans, Louisiana, United States, September 2001, pp. 349-357.
[10] W. Dakka, P. G. Ipeirotis, and K. R. Wood, "Automatic construction of multifaceted browsing interfaces," in ACM CIKM, Bremen, Germany, 2005.
[11] X. Chen and Y.-F. B. Wu, "Web mining from competitors' websites," in ACM SIGKDD, Chicago, Illinois, USA, August 2005.
[12] E. Glover, D. M. Pennock, S. Lawrence, and R. Krovetz, "Inferring hierarchical descriptions," in Proceedings of the 11th International Conference on Information and Knowledge Management, McLean, Virginia, USA, November 2002.
[13] "CLUTO & Manual," http://glaros.dtc.umn.edu/gkhome/views/cluto/.
[14] M. Hu and B. Liu, "Mining and summarizing customer reviews," in ACM SIGKDD, Seattle, Washington, USA, August 2004.
[15] "ICTCLAS Linux version," http://www.nlp.org.cn.
[16] "RBT," http://www.cs.jhu.edu/~brill/.
[17] Q. He, K. Chang, and E.-P. Lim, "Using burstiness to improve clustering of topics in news streams," in ACM SIGIR, Amsterdam, Netherlands, 2007, pp. 207-214.

