
Automatic Labeling of Topic Models Using Text Summaries

Xiaojun Wan and Tianming Wang


Institute of Computer Science and Technology, The MOE Key Laboratory of Computational
Linguistics, Peking University, Beijing 100871, China
{wanxiaojun, wangtm}@pku.edu.cn

Abstract

Labeling topics learned by topic models is a challenging problem. Previous studies have used words, phrases and images to label topics. In this paper, we propose to use text summaries for topic labeling. Several sentences are extracted from the most related documents to form the summary for each topic. In order to obtain summaries with high relevance, coverage and discrimination for all the topics, we propose an algorithm based on submodular optimization. Both automatic and manual analyses have been conducted on two real document collections, and we find that 1) the summaries extracted by our proposed algorithm are superior to the summaries extracted by existing popular summarization methods; and 2) the use of summaries as labels has obvious advantages over the use of words and phrases.

1 Introduction

Statistical topic modelling plays very important roles in many research areas, such as text mining, natural language processing and information retrieval. Popular topic modeling techniques include Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and Probabilistic Latent Semantic Analysis (pLSA) (Hofmann, 1999). These techniques can automatically discover the abstract "topics" that occur in a collection of documents. They model the documents as a mixture of topics, and each topic is modeled as a probability distribution over words.

Although the discovered topics' word distributions are sometimes intuitively meaningful, a major challenge shared by all such topic models is to accurately interpret the meaning of each topic (Mei et al., 2007). The interpretation of each topic is very important when people want to browse, understand and leverage the topic. However, it is usually very hard for a user to understand the discovered topics based only on the multinomial distribution of words. For example, here are the top terms for a discovered topic: {fire miles area north southern people coast homes south damage northern river state friday central water rain high california weather}. It is not easy for a user to fully understand this topic if the user is not very familiar with the document collection. The situation may become worse when the user faces a number of discovered topics whose sets of top terms overlap with each other, as is often the case on practical document collections.

In order to address the above challenge, a few previous studies have proposed to use phrases, concepts and even images for labeling the discovered topics (Mei et al., 2007; Lau et al., 2011; Hulpus et al., 2013; Aletras and Stevenson, 2013). For example, we may automatically extract the phrase "southern california" to represent the example topic mentioned earlier. These topic labels can help the user to understand the topics to some extent. However, the use of phrases or concepts as topic labels is not very satisfactory in practice, because the phrases or concepts are still very short, and the information expressed in these short labels is not adequate for the user's understanding. The case becomes worse when an ambiguous phrase is used, or when multiple discrete phrases with poor coherence are used for a topic. To address the drawbacks of these short labels, we need to provide more contextual information and consider using long text descriptions to represent the topics. The long text descriptions can be used independently or as a beneficial complement to the short labels. For example, below is part of the summary label produced by our proposed method; it provides much more contextual information for understanding the topic.

    Showers and thunderstorms developed in parched areas of the southeast , from western north carolina into south central alabama , north central and northeast texas and the central and southern gulf coast . …

    The quake was felt over a large area , extending from santa rosa , about 60 miles north of san francisco , to the santa cruz area 70 miles to the south …. Fourteen homes were destroyed in baldwin park 20 miles northeast of downtown los angeles and five were damaged along with five commercial buildings when 75 mph gusts snapped power lines , igniting a fire at allan paper co. , fire officials said . …

The contributions of this paper are summarized as follows:

1) We are the first to investigate using text summaries for topic labeling;

2) We propose a summarization algorithm based on submodular optimization to extract summaries with high relevance, coverage and discrimination for all topics;

3) Automatic and manual analyses reveal the usefulness and advantages of the summaries produced by our algorithm.

2 Related Work

2.1 Topic Labeling

After topics are discovered by topic modeling techniques, these topics are conventionally represented by their top N words or terms (Blei et al., 2003; Griffiths and Steyvers, 2004). The words or terms in a topic are ranked based on the conditional probability p(w_i | t_j) in that topic. It is sometimes not easy for users to understand each topic based on the terms. Sometimes topics are presented with manual labels for exploring research publications (Wang and McCallum, 2006; Mei et al., 2006), but the labeling process is time consuming.

In order to make the topic representations more interpretable and make the topics easier to understand, a few studies have proposed to automatically find phrases, concepts or even images for topic labeling. Mei et al. (2007) proposed to use phrases (chunks or ngrams) for topic labeling, casting the labeling problem as an optimization problem that minimizes the Kullback-Leibler (KL) divergence between word distributions while maximizing the mutual information between a label and a topic model. Lau et al. (2011) also used phrases as topic labels and proposed supervised learning techniques for ranking candidate labels; in their work, candidate labels include the top-5 topic terms and a few noun chunks extracted from related Wikipedia articles. Mao et al. (2012) proposed two effective algorithms that automatically assign concise labels to each topic in a hierarchy by exploiting sibling and parent-child relations among topics. Kou et al. (2015) proposed to map topics and candidate labels (phrases) to word vectors and letter trigram vectors in order to find which candidate label is most semantically related to a topic. Hulpus et al. (2013) took a new approach to topic labelling based on graph centrality measures, making use of structured data exposed by DBpedia. Different from the above works, Aletras and Stevenson (2013) proposed to use images for representing topics, where candidate images for each topic are retrieved from the web and the most suitable image is selected by using a graph-based algorithm. In a very recent study (Aletras et al., 2015), three different topic representations (lists of terms, textual phrase labels and image labels) were compared in a document retrieval task, and the results show that textual phrase labels are easier for users to interpret than term lists and image labels.

The phrase-based labels in the above works are still very short and are sometimes not adequate for interpreting the topics. Unfortunately, none of the previous works has investigated using textual summaries for representing topics.

2.2 Document Summarization

The task of document summarization aims to produce a summary with a length limit for a given document or document set. The task has been extensively investigated in the natural language processing and information retrieval fields, and most previous works focus on directly extracting sentences from a news document or collection to form the summary. The summary can be used to help users quickly browse and understand a document or document collection.

Typical multi-document summarization methods include the centroid-based method (Radev et al., 2004), integer linear programming (ILP) (Gillick et al., 2008), sentence-based LDA (Chang and Chien, 2009), submodular function maximization (Lin and Bilmes, 2010; Lin and Bilmes, 2011), graph-based methods (Erkan and Radev, 2004; Wan et al., 2007; Wan and Yang, 2008), and supervised learning based methods (Ouyang et al., 2007; Shen et al., 2007). Though various summarization methods have been proposed in recent years, submodular function maximization remains one of the state-of-the-art summarization methods. Moreover, the method is easy to follow and its framework is very flexible: one can design specific submodular functions for addressing special summarization tasks without altering the overall greedy selection framework.

Though various summarization methods have been proposed, none of the existing works has investigated or tried to adapt document summarization techniques for the task of automatic labeling of topic models.

3 Problem Formulation

Given a set of latent topics extracted from a text collection, where each topic is represented by a multinomial distribution over words, our goal is to produce understandable text summaries as labels for interpreting all the topics. We now give two useful definitions for later use.

Topic: Each topic θ is a probability distribution over words {p_θ(w)}_{w∈V}, where V is the vocabulary set and Σ_{w∈V} p_θ(w) = 1.

Topic Summary: In this study, a summary for each topic θ is a set of sentences extracted from the document collection that can be used as a label to represent the latent meaning of θ. Typically, the length of the summary is limited to 250 words, as defined in recent DUC and TAC conferences.

Like the criteria for the topic labels in (Mei et al., 2007), the topic summary for each topic needs to meet the following two criteria:

High Relevance: The summary needs to be semantically relevant to the topic, i.e., the summary needs to be closely relevant to all representative documents of the topic. The higher the relevance is, the better the summary is. This criterion is intuitive because we do not expect to obtain a summary unrelated to the topic.

High Coverage: The summary needs to cover as much semantic information of the topic as possible. The summary usually consists of several sentences, and we do not expect all the sentences to focus on the same piece of semantic information. A summary with high coverage will certainly not contain redundant information. This criterion is very similar to the diversity requirement of multi-document summarization.

Since we usually produce a set of summaries for all the topics discovered in a document collection, in order to help users understand all the topics, the summaries need to meet the following additional criterion:

High Discrimination: The summaries for different topics need to have inter-topic discrimination. If the summaries for two or more topics are very similar to each other, users can hardly understand each topic appropriately. The higher the inter-topic discrimination is, the better the summaries are.

4 Our Method

Our proposed method is based on submodular optimization, and it can extract summaries with high relevance, coverage and discrimination for all topics. We choose the framework of submodular optimization because it is very flexible and different objectives can easily be incorporated into it. The overall framework of our method consists of two phases: candidate sentence selection and topic summary extraction. The two phases are described in the next two subsections, respectively.

4.1 Candidate Sentence Selection

There are usually many thousands of sentences in a document collection for topic modelling, and all the sentences are more or less correlated with each topic. If we used all the sentences for summary extraction, the summarization efficiency would be very low. Moreover, many sentences are not suitable for summarization because of their low relevance to the topic. Therefore, we filter out the large number of unrelated sentences and treat the remaining sentences as candidates for summary extraction.

For each topic θ, we compute the Kullback-Leibler (KL) divergence between the word distributions of the topic and each sentence s in the whole document collection as follows:

KL(θ, s) = Σ_{w∈TW∪SW} p_θ(w) * log( p_θ(w) / (tf(w,s)/len(s)) )

where p_θ(w) is the probability of word w in topic θ, TW denotes the set of top 500 words in topic θ according to the probability distribution, SW denotes the set of words in sentence s after removing stop words, tf(w,s) denotes the frequency of word w in sentence s, and len(s) denotes the length of sentence s after removing stop words. For a word w which does not appear in SW, we set tf(w,s)/len(s) to a very small value (0.00001 in this study).

Then we rank the sentences in increasing order of the divergence scores and keep the top 500 sentences, which are the most related to the topic. These 500 sentences are treated as candidate sentences for the subsequent summarization step for the topic. Note that different topics have different candidate sentence sets.
sociated with the candidate sentence set V, our

4.2 Topic Summary Extraction

Our method for topic summary extraction is based on submodular optimization. For each topic θ associated with the candidate sentence set V, our method aims to find an optimal summary Ẽ from all possible summaries by maximizing a score function under a budget constraint:

Ẽ = argmax_{E⊆V} f(E)
s.t. len(E) ≤ L

where len(E) denotes the length of summary E (E is also used to denote the set of sentences in the summary) and L is a predefined length limit, i.e., 250 words in this study.

f(E) is the score function that evaluates the overall quality of summary E. Usually, f(E) is required to be a submodular function, so that we can use a simple greedy algorithm to find a near-optimal summary with a theoretical guarantee. Formally, for any A ⊆ B ⊆ V∖{v}, we have

f(A ∪ {v}) − f(A) ≥ f(B ∪ {v}) − f(B)

which means that the incremental "value" of v decreases as the context in which v is considered grows from A to B.

In this study, the score function f(E) is decomposed into three parts, and each part evaluates one aspect of the summary:

f(E) = REL(E) + COV(E) + DIS(E)

where REL(E), COV(E) and DIS(E) evaluate the relevance, coverage and discrimination of summary E, respectively. We describe them in detail below.

4.2.1 Relevance Function

Instead of measuring the relevance between the summary and the topic directly via the KL divergence between their word distributions, we measure the relevance of summary E for topic θ by the relevance of the sentences in the summary to all the candidate sentences for the topic, as follows:

REL(E) = Σ_{s′∈V} min{ Σ_{s∈E} sim(s′, s), α Σ_{s∈V} sim(s′, s) }

where V represents the candidate sentence set for topic θ, and E is used to represent the sentence set of the summary. sim(s′, s) is the standard cosine similarity between sentences s′ and s, and α ∈ [0,1] is a threshold co-efficient.

The above function is a monotone submodular function because f(x) = min(x, a), where a ≥ 0, is a concave non-decreasing function. Σ_{s∈E} sim(s′, s) measures how similar E is to sentence s′, and Σ_{s∈V} sim(s′, s) is the largest value that Σ_{s∈E} sim(s′, s) can achieve. Therefore, s′ is saturated by E when Σ_{s∈E} sim(s′, s) ≥ α Σ_{s∈V} sim(s′, s). When s′ is already saturated by E in this way, any new sentence very similar to s′ cannot further improve the overall relevance of E, and such a sentence is less likely to be added to the summary.

4.2.2 Coverage Function

We want the summary to cover as many topic words as possible and to contain as many different sentences as possible. The coverage function is thus defined as follows:

COV(E) = β * Σ_{w∈TW} p_θ(w) * sqrt( Σ_{s∈E} tf(w,s) )

where β ≥ 0 is a combination co-efficient.

The above function is a monotone submodular function, and it encourages the summary E to contain many different words rather than a small set of words. Because f(x) = sqrt(x), where x ≥ 0, is a concave non-decreasing function, we have f(x+y) ≤ f(x) + f(y). The value of the function is larger when we use x and y to represent the frequency values of two different words than when we use (x+y) to represent the frequency value of a single word. Therefore, the use of this function encourages the coverage of more different words in the summary; in other words, the diversity of the summary is enhanced.

4.2.3 Discrimination Function

The function for measuring the discrimination between the summary E of topic θ and all other topics {θ′} is defined as follows:

DIS(E) = −γ Σ_{θ′} Σ_{s∈E} Σ_{w∈TW} p_{θ′}(w) * tf(w,s)

where γ ≥ 0 is a combination co-efficient.

The above function is still a monotone submodular function. The negative sign indicates that the summary E of topic θ needs to be as irrelevant to every other topic as possible, thus making the different topic summaries differ substantially from each other.
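
Read literally, the three component functions translate into code as below. This is our own sketch under simplifying assumptions: sentences are stop-word-filtered token lists, sim is cosine similarity over term-frequency vectors, and the inner TW in DIS(E) is read as the current topic's top-word set (the formula leaves this ambiguous). The default α, β and γ are the values reported in Section 5.1.

import math
from collections import Counter

def cosine(s1, s2):
    # standard cosine similarity between two token lists
    c1, c2 = Counter(s1), Counter(s2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def rel(E, V, alpha=0.05):
    # REL(E): each candidate s' contributes at most an alpha-fraction of
    # its total similarity mass, i.e. it saturates.
    total = 0.0
    for sp in V:
        covered = sum(cosine(sp, s) for s in E)
        cap = alpha * sum(cosine(sp, s) for s in V)
        total += min(covered, cap)
    return total

def cov(E, topic_dist, top_words, beta=250.0):
    # COV(E): the concave square root rewards covering many distinct
    # topic words over repeating a few.
    return beta * sum(topic_dist[w] * math.sqrt(sum(s.count(w) for s in E))
                      for w in top_words)

def dis(E, other_dists, top_words, gamma=300.0):
    # DIS(E): penalize summary words that are probable under the other
    # topics' distributions (other_dists).
    penalty = sum(d.get(w, 0.0) * s.count(w)
                  for d in other_dists for s in E for w in top_words)
    return -gamma * penalty

def f(E, V, topic_dist, top_words, other_dists):
    # the combined submodular objective f(E) = REL(E) + COV(E) + DIS(E)
    return (rel(E, V) + cov(E, topic_dist, top_words)
            + dis(E, other_dists, top_words))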

4.2.4 Greedy Selection

Since REL(E), COV(E) and DIS(E) are all submodular functions, f(E) is also a submodular function. In order to find a good approximation to the optimal summary, we use a greedy algorithm similar to (Lin and Bilmes, 2010) to select sentences one by one and produce the final summary, as shown in Algorithm 1.

Algorithm 1  Greedy algorithm for summary extraction
1: E ← ∅
2: U ← V
3: while U ≠ ∅ do
4:   ŝ ← argmax_{s∈U} [f(E ∪ {s}) − f(E)] / len(s)^ε
5:   E ← E ∪ {ŝ} if Σ_{s∈E} len(s) + len(ŝ) ≤ L and f(E ∪ {ŝ}) − f(E) ≥ 0
6:   U ← U ∖ {ŝ}
7: end while
8: return E

In the algorithm, len(s) denotes the length of sentence s and ε > 0 is a scaling factor. At each iteration, the sentence with the largest ratio of objective function gain to scaled cost is found in step 4; if adding the sentence increases the objective function value without violating the length constraint, it is selected into the summary, and otherwise it is bypassed.
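
Algorithm 1 transcribes almost line for line into Python. The sketch below assumes the objective f and candidate set from the earlier sketches, with sentence length measured in (stop-word-filtered) words; it recomputes f from scratch for clarity, whereas an implementation at scale would cache marginal gains.

def greedy_summary(V, f, L=250, epsilon=0.15):
    # Cost-scaled greedy selection (Algorithm 1); f maps a sentence list
    # to its objective value, already bound to one topic's data.
    E, U = [], list(V)
    while U:
        # step 4: sentence with the largest gain-to-scaled-cost ratio
        s_hat = max(U, key=lambda s: (f(E + [s]) - f(E)) / (len(s) ** epsilon))
        gain = f(E + [s_hat]) - f(E)
        # step 5: add s_hat only if the gain is non-negative and the
        # length budget L is not violated
        if gain >= 0 and sum(len(s) for s in E) + len(s_hat) <= L:
            E.append(s_hat)
        U.remove(s_hat)  # step 6: each sentence is considered once
    return E

For one topic, f would be bound to that topic's data before the call, e.g. greedy_summary(V, lambda E: f(E, V, topic_dist, top_words, other_dists)).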


5 Evaluation and Results

5.1 Evaluation Setup

We used two document collections as evaluation datasets, as in (Mei et al., 2007): AP news and SIGMOD proceedings. The AP news dataset contains a set of 2250 AP news articles provided by TREC; there is a total of 43803 sentences in the AP news dataset, and the vocabulary size is 37547 (after removing stop words). The SIGMOD proceeding dataset contains a set of 2128 abstracts of SIGMOD proceedings between the years 1976 and 2015, downloaded from the ACM digital library; there is a total of 15211 sentences in the SIGMOD proceeding dataset, and the vocabulary size is 13688.

For topic modeling, we adopted the most popular technique, LDA, to discover topics in the two datasets, respectively. In particular, we used the LDA module implemented in the MALLET toolkit (http://mallet.cs.umass.edu/). Without loss of generality, we extracted 25 topics from the AP news dataset and 25 topics from the SIGMOD proceeding dataset.
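
The paper's experiments use MALLET; as a self-contained Python stand-in, a comparable setup can be sketched with gensim (our substitution, not the authors' tooling), which also yields the per-topic word distributions consumed by the sketches in Section 4.

from gensim import corpora, models

# texts: one stop-word-filtered token list per document in the collection;
# the two lists below are a toy stand-in for the real corpora
texts = [["fire", "miles", "north", "coast"],
         ["data", "analysis", "integration"]]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=25, id2word=dictionary)

# topic_dist in the dict form used by the earlier sketches (top 500 words)
topic_dist = dict(lda.show_topic(0, topn=500))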
The parameter values of our proposed summarization method are either directly borrowed from previous works or empirically set as follows: α = 0.05, β = 250, γ = 300 and ε = 0.15.

We have two goals in the evaluation: comparison of different summarization methods for topic labeling, and comparison of different kinds of labels (summaries, words, and phrases).

In particular, we compare our proposed summarization method (denoted as Our Method) with the following typical summarization methods, all of which extract summaries from the same candidate sentence set for each topic:

MEAD: It uses a heuristic way to obtain each sentence's score by summing scores based on different features (Radev et al., 2004): centroid-based weight, position, and similarity with the first sentence.

LexRank: It constructs a graph based on the sentences and their similarity relationships and then applies the PageRank algorithm for sentence ranking (Erkan and Radev, 2004).

TopicLexRank: It is an improved version of LexRank that takes the probability distribution of the top 500 words in a topic as a prior vector and then applies the topic-sensitive PageRank algorithm for sentence ranking, similar to (Wan, 2008).

Submodular(REL): It is based on submodular function maximization, but only the relevance function is considered.

Submodular(REL+COV): It is based on submodular function maximization and combines two functions: the relevance function and the coverage function.

We also compare the following three different kinds of labels:

Word label: It shows ten topic words as the label for each topic, which is the most intuitive interpretation of the topic.

Phrase label: It uses three phrases as the label for each topic; the phrase labels are extracted by the method proposed in (Mei et al., 2007), which is very closely related to our work and is considered a strong baseline in this study.

Summary label: It uses a topic summary with a length of 250 words to label each topic; the summary is produced by our proposed method.

5.2 Evaluation Results

5.2.1 Automatic Comparison of Summarization Methods

In this section, we compare the different summarization methods with the following automatic measures:

KL divergence between word distributions of summary and topic: For each summarization method, we compute the KL divergence between the word distributions of each topic and the summary for the topic, then average the KL divergence across all topics. Table 1 shows the results. We can see that our method and Submodular(REL+COV) have the lowest KL divergence with the topic, which means our method can produce summaries relevant to the topic representation.

                       AP        SIGMOD
  MEAD                 0.832503  1.470307
  LexRank              0.420137  1.153163
  TopicLexRank         0.377587  1.112623
  Submodular(REL)      0.43264   1.002964
  Submodular(REL+COV)  0.349807  0.991071
  Our Method           0.360306  0.907193

Table 1. Comparison of KL divergence between word distributions of summary and topic.

Topic word coverage: For each summarization method, we compute the ratio of the words covered by the summary out of the top 20 words for each topic, and then average the ratio across all topics. We use the top 20 words instead of 500 words because we want to focus on the most important words. The results are shown in Table 2. We can see that our method has almost the best coverage ratio, and the produced summaries can cover most of the important words in a topic.

                       AP        SIGMOD
  MEAD                 0.422246  0.611355
  LexRank              0.651217  0.681728
  TopicLexRank         0.678515  0.692066
  Submodular(REL)      0.62815   0.713159
  Submodular(REL+COV)  0.683998  0.723228
  Our Method           0.673585  0.74572

Table 2. Comparison of the ratio of the covered words out of the top 20 topic words.

Similarity between topic summaries: For each summarization method, we compute the cosine similarity between the summaries of any two topics, and then obtain the average and the maximum similarity. As seen from Table 3, the topic summaries produced by our method have the lowest average and maximum similarity with each other, and thus the summaries for different topics differ substantially.

                       AP                   SIGMOD
                       average   max        average   max
  MEAD                 0.026961  0.546618   0.078826  0.580055
  LexRank              0.019466  0.252074   0.05635   0.357491
  TopicLexRank         0.022548  0.283742   0.062034  0.536886
  Submodular(REL)      0.028035  0.47012    0.07522   0.52629
  Submodular(REL+COV)  0.023206  0.362795   0.048872  0.524863
  Our Method           0.010304  0.093017   0.024551  0.116905

Table 3. Comparison of the average and max similarity between different topic summaries.
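
For reproducibility, the second and third measures are simple to restate in code; the KL measure is the kl_divergence sketch from Section 4.1 applied to the concatenated summary instead of a single sentence. As before, this is our own sketch, with summaries represented as token lists.

import math
from collections import Counter
from itertools import combinations

def top_word_coverage(topic_dist, summary, n=20):
    # fraction of the topic's top-n words that appear in the summary
    top = sorted(topic_dist, key=topic_dist.get, reverse=True)[:n]
    present = set(summary)
    return sum(1 for w in top if w in present) / float(n)

def summary_similarities(summaries):
    # average and maximum cosine similarity over all pairs of topic summaries
    def cosine(s1, s2):
        c1, c2 = Counter(s1), Counter(s2)
        dot = sum(c1[w] * c2[w] for w in c1)
        norm = (math.sqrt(sum(v * v for v in c1.values()))
                * math.sqrt(sum(v * v for v in c2.values())))
        return dot / norm if norm else 0.0
    sims = [cosine(a, b) for a, b in combinations(summaries, 2)]
    return sum(sims) / len(sims), max(sims)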

5.2.2 Manual Comparison of Summarization Methods

In this section, we compare our summarization method with three typical summarization methods (MEAD, TopicLexRank and Submodular(REL)) manually. We employed three human judges to read and rank the four summaries produced for each topic by the four methods in three aspects: the relevance between the summary and the topic (with the corresponding sentence set), the content coverage (or diversity) of the summary, and the discrimination between different summaries. The human judges were encouraged to read a few closely related documents to better understand each topic. Note that the judges did not know which summary was generated by our method and which summaries were generated by the baseline methods. The rank k for each summary ranges from 1 to 4 (1 means the best and 4 means the worst; we allow equal ranks), and the score is thus (4−k). We average the scores across all summaries and all judges; the results on the two datasets are shown in Tables 4 and 5, respectively. In the tables, the higher the score is, the better the corresponding summaries are. We can see that our proposed method outperforms all three baselines on almost all metrics.

                    relevance  coverage  discrimination
  MEAD              1.03       0.8       1.13
  TopicLexRank      1.9        1.6       1.83
  Submodular(REL)   2.23       2         2.07
  Our Method        2.33       2.4       2.33

Table 4. Manual comparison of different summarization methods on the AP news dataset.

                    relevance  coverage  discrimination
  MEAD              1.6        1.4       1.83
  TopicLexRank      1.77       2.1       2.1
  Submodular(REL)   2.07       2.1       2.03
  Our Method        2.43       2.17      2.1

Table 5. Manual comparison of different summarization methods on the SIGMOD proceeding dataset.

5.2.3 Manual Comparison of Different Kinds of Labels

In this section, we manually compare the three kinds of labels mentioned in Section 5.1: words, phrases and summaries. Similarly, the three human judges were asked to read and rank the three kinds of labels in the same three aspects: the relevance between the label and the topic (with the corresponding sentence set), the content coverage (or diversity) of the label, and the discrimination between different labels. The rank k for each kind of label ranges from 1 to 3 (1 means the best and 3 means the worst; we allow equal ranks), and the score is thus (3−k). We average the scores across all labels and all judges; the results on the two datasets are shown in Tables 6 and 7, respectively. It is clear that the summary labels produced by our proposed method have obvious advantages over the conventional word labels and phrase labels: the summary labels have better evaluation results on relevance, coverage and discrimination.

                 relevance  coverage  discrimination
  Word label     0.67       0.67      1.11
  Phrase label   1          0.87      1.4
  Summary label  1.83       1.87      1.9

Table 6. Manual comparison of different kinds of labels on the AP news dataset.

                 relevance  coverage  discrimination
  Word label     0.87       0.877     1.27
  Phrase label   1.4        1.53      1.43
  Summary label  1.8        1.97      1.9

Table 7. Manual comparison of different kinds of labels on the SIGMOD proceeding dataset.

5.2.4 Example Analysis

In this section, we demonstrate some running examples on the SIGMOD proceeding dataset. Two topics and the three kinds of labels are shown below. For brevity, we only show the first 100 words of the summaries to users unless they want to see more. We can see that the word labels are very confusing, and the phrase labels for the two topics totally overlap with each other and have no discrimination. Therefore, it is hard to understand the two topics by looking at the word or phrase labels. Fortunately, by carefully reading the topic summaries, we can understand what the two topics are really about: in this example, the first topic is about data analysis and data integration, while the second topic is about data privacy. Though the summary labels are much longer than the word labels or phrase labels, users can obtain more reliable information after reading the summary labels, and the summaries can help users better understand each topic and also recognize the differences between topics.

In practice, the different kinds of labels can be used together to allow users to browse topic models in a level-wise manner, as described in the next section.

Topic 1 on SIGMOD proceeding dataset:

  word label: data analysis scientific set process analyze tool insight interest scenario

  phrase label: data analysis ; data integration ; data set

  summary label: The field of data analysis seek to extract value from data for either business or scientific benefit . … Nowadays data analytic application are accessing more and more data from distributed data store , creating a large amount of data traffic on the network . … these service will access data from different data source type and potentially need to aggregate data from different data source type with different data format …. Various data model will be discussed , including relational data , xml data , graph-structured data , data stream , and workflow ….

Topic 2 on SIGMOD proceeding dataset:

  word label: user information attribute model privacy quality record result individual provide

  phrase label: data set ; data analysis ; data integration

  summary label: An essential element for privacy metric is the measure of how much adversaries can know about an individual ' sensitive attribute ( sa ) if they know the individual ' quasi-identifier ( qi ) …. We present an automated solution that elicit user preference on attribute and value , employing different disambiguation technique ranging from simple keyword matching , to more sophisticated probabilistic model …. Privgene need significantly less perturbation than previous method , and it achieve higher overall result quality , even for model fitting task where ga is not the first choice without privacy consideration ….

5.2.5 Discussion of Practical Use

Although the summary labels produced by our method have higher relevance, coverage and discrimination than the word labels and the phrase labels, they have one obvious shortcoming: they consume more of the user's reading time, because the summaries are much longer than the words and phrases. The feedback from the human judges also reveals this problem; all three judges said they needed more than five times longer to read the summaries. Therefore, we want to find a better way to make use of the summary label in practice.

In order to combine the shorter reading time of the phrase labels with the better quality of the summary labels, we can use the two kinds of labels together in the following hierarchical way: for each topic, we first present only the phrase label to users; if they can easily understand the topic after reading the phrase label, the summary label is not shown to them. If, however, users cannot understand the topic well based on the phrase label, or they need more information about the topic, they may choose to read the summary label for a better understanding of the topic. Only the first 100 words of the summary label are shown to users, and the remaining words are shown upon request. In this way, the summary label is used as an important complement to the phrase label, and the burden of reading the longer summary label can be greatly alleviated.

6 Conclusions and Future Work

In this study, we addressed the problem of topic labeling by using text summaries. We proposed a summarization algorithm based on submodular optimization to extract representative summaries for all the topics. Evaluation results demonstrate that the summaries produced by our proposed algorithm have high relevance, coverage and discrimination, and that the use of summaries as labels has obvious advantages over the use of words and phrases.

In future work, we will explore making use of all three kinds of labels together to improve the users' experience when they want to browse, understand and leverage the topics.

In this study, we did not consider the coherence of the topic summaries, because it is very challenging to obtain a coherent summary by extracting different sentences from a large set of different documents. In future work, we will try to make the summary label more coherent by considering the discourse structure of the summary and leveraging sentence ordering techniques.

Acknowledgments

The work was supported by the National Natural Science Foundation of China (61331011), the National Hi-Tech Research and Development Program (863 Program) of China (2015AA015403) and the IBM Global Faculty Award Program. We thank the anonymous reviewers and mentor for their helpful comments.

References

Nikolaos Aletras and Mark Stevenson. 2013. Representing topics using images. In HLT-NAACL.

Nikolaos Aletras, Timothy Baldwin, Jey Han Lau, and Mark Stevenson. 2015. Evaluating topic representations for exploring document collections. Journal of the Association for Information Science and Technology.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3: 993-1022.

Ying-Lang Chang and Jen-Tzung Chien. 2009. Latent Dirichlet learning for document summarization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009).

Güneş Erkan and Dragomir R. Radev. 2004. LexPageRank: Prestige in multi-document text summarization. In Proceedings of EMNLP.

Dan Gillick, Benoit Favre, and Dilek Hakkani-Tur. 2008. The ICSI summarization system at TAC 2008. In Proceedings of the Text Understanding Conference.

Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.

Ioana Hulpus, Conor Hayes, Marcel Karnstedt, and Derek Greene. 2013. Unsupervised graph-based topic labelling using DBpedia. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM.

Wanqiu Kou, Fang Li, and Timothy Baldwin. 2015. Automatic labelling of topic models using word vectors and letter trigram vectors. In Information Retrieval Technology, pages 253-264. Springer International Publishing.

Jey Han Lau, Karl Grieser, David Newman, and Timothy Baldwin. 2011. Automatic labelling of topic models. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics.

Hui Lin and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics.

Hui Lin and Jeff Bilmes. 2011. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics.

Qiaozhu Mei, Chao Liu, Hang Su, and ChengXiang Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of the 15th International Conference on World Wide Web, pages 533-542. ACM.

Qiaozhu Mei, Xuehua Shen, and ChengXiang Zhai. 2007. Automatic labeling of multinomial topic models. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.

Xian-Ling Mao, Zhao-Yan Ming, Zheng-Jun Zha, Tat-Seng Chua, Hongfei Yan, and Xiaoming Li. 2012. Automatic labeling hierarchical topics. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 2383-2386. ACM.

You Ouyang, Sujian Li, and Wenjie Li. 2007. Developing learning strategies for topic-based summarization. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. ACM.

Dragomir R. Radev, Hongyan Jing, Małgorzata Styś, and Daniel Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management 40(6): 919-938.

Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, and Zheng Chen. 2007. Document summarization using Conditional Random Fields. In IJCAI, vol. 7, pages 2862-2867.

Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl 1): 5228-5235.

Xiaojun Wan. 2008. Using only cross-document relationships for both generic and topic-focused multi-document summarizations. Information Retrieval 11(1): 25-49.

Xiaojun Wan, Jianwu Yang, and Jianguo Xiao. 2007. Manifold-ranking based topic-focused multi-document summarization. In IJCAI, vol. 7, pages 2903-2908.

Xiaojun Wan and Jianwu Yang. 2008. Multi-document summarization using cluster-based link analysis. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.

Xuerui Wang and Andrew McCallum. 2006. Topics over time: a non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.

