Affinity-Preserving Random Walk for Multi-Document Summarization
Kexiang Wang, Tianyu Liu, Zhifang Sui and Baobao Chang
Key Laboratory of Computational Linguistics, Ministry of Education
School of Electronics Engineering and Computer Science, Peking University
Collaborative Innovation Center for Language Ability, Xuzhou 221009 China
{wke, tianyud421, szf, chbb}@pku.edu.cn
Abstract
Multi-document summarization provides
users with a short text that summarizes
the information in a set of related doc-
uments. This paper introduces affinity-
preserving random walk to the summa-
rization task, which preserves the affin-
ity relations of sentences by an absorb-
ing random walk model. Meanwhile, we
put forward adjustable affinity-preserving
random walk to enforce the diversity con-
straint of summarization in the random
walk process. The ROUGE evaluations on
DUC 2003 topic-focused summarization
task and DUC 2004 generic summariza-
tion task show the good performance of
our method, which has the best ROUGE-
2 recall among the graph-based ranking
methods.
1 Introduction
Multi-document summarization provides users
with a summary that reflects the main information
in a set of given documents. The documents are
often related and talk about more than one top-
ic. Generic multi-document summarization and
topic-focused multi-document summarization are
two common settings of the task. Two goals are
generally pursued. The first one is importance:
the summary should reflect the salient information of the docu-
ment cluster. The sentences with little informa-
tion about the document cluster should not be in-
cluded in the summary. The second one is di-
versity. The information overlap between sum-
mary sentences should be as minimal as possi-
ble due to the length limit of summary. In other
words, the information coverage of summary is
a determinant, which requires that the summary
sentences should cover diverse aspects of infor-
mation. Besides the two goals, there is another
goal for the topic-focused summarization and that
is relevancy. It requires that the summary sen-
tences be relevant to the topic description. A se-
ries of conferences and workshops on automatic
text summarization (e.g. NTCIR, DUC), special
topic sessions in ACL, EMNLP and SIGIR have
advanced the techniques to achieve these goals and
many approaches have been proposed so far.
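As background for the graph-based ranking family this paper builds on, the sketch below (an illustration, not the paper's affinity-preserving walk; the toy affinity values are invented) scores sentences by the stationary distribution of a damped random walk on a row-normalized sentence affinity matrix, in the style of earlier LexRank/TextRank methods:

```python
def rank_sentences(affinity, damping=0.85, tol=1e-10, max_iter=500):
    """Return stationary probabilities of a damped random walk on the
    row-normalized sentence affinity matrix (LexRank/TextRank-style)."""
    n = len(affinity)
    # Row-normalize affinities into transition probabilities.
    trans = []
    for row in affinity:
        s = sum(row)
        trans.append([v / s if s else 1.0 / n for v in row])
    p = [1.0 / n] * n  # uniform start distribution
    for _ in range(max_iter):
        # Damped update: follow the graph with prob. `damping`,
        # otherwise jump to a uniformly random sentence.
        p_next = [
            (1 - damping) / n
            + damping * sum(p[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(p, p_next)) < tol:
            break
        p = p_next
    return p

# Toy symmetric affinities (e.g. cosine similarities) for 3 sentences.
aff = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
scores = rank_sentences(aff)
order = sorted(range(3), key=scores.__getitem__, reverse=True)
```

Sentences with higher stationary probability are treated as more salient; a summary is then typically assembled from the top-ranked sentences under a length limit, with some mechanism (such as the diversity constraint discussed above) to avoid redundancy.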
In this paper, we focus on the extractive summa-
rization methods, which extract the summary sen-
tences from the input document cluster. We pro-
pose affinity-preserving random walk for multi-
document summarization. The method is a graph-
based ranking method, which takes into account
the global information collectively computed from
the entire sentence affinity graph. Different from
the previous graph-based ranking methods, our
method adopts "global normalization" to trans-