StatSnowball: a Statistical Approach to Extracting Entity Relationships
Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, 100084 P.R. China
Microsoft Research Asia, No. 49 Zhichun Road, Beijing, 100080 P.R. China
Dept. of EEIS, University of Sci. & Tech. of China, Hefei, 230027 P.R. China
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limits their ability to generalize and thus yields low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called StatSnowball, which is a bootstrapping system and can perform both traditional relation extraction and Open IE.
StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. An MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an ℓ1-norm penalized maximum likelihood estimation problem, which enjoys well-founded theories and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve a significantly higher recall without sacrificing the high precision during iterations with a small number of seeds, and that the joint inference of MLN can improve the performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
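To make the idea of penalized pattern selection concrete, the sketch below shows how a sparsity-inducing penalty can drive the weights of uninformative patterns to exactly zero. It is only an illustration under stated assumptions: it assumes the penalty is the ℓ1 norm, and it swaps in plain logistic regression fitted by proximal gradient descent (ISTA) in place of the Markov logic network learner. The pattern strings, the toy data, and the names `fit_l1_logistic`, `lam`, and `step` are hypothetical and do not come from the paper.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.15, step=0.5, iters=500):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA).

    No bias term is used, to keep the sketch short.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the mean logistic loss at the current weights.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step followed by soft-thresholding (the l1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical candidate patterns; each column of X says whether the
# pattern fired for a candidate entity pair.
patterns = ["PERSON married PERSON", "PERSON and PERSON", "PERSON visited PERSON"]
X = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0],  # pairs that hold the relation
    [0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0],  # pairs that do not
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_l1_logistic(X, y)
selected = [name for name, wj in zip(patterns, weights) if wj != 0.0]
print(selected)
```

In this toy data the first pattern fires only on positive pairs, the second fires equally often on both classes, and the third never fires, so only the first pattern keeps a nonzero weight. Patterns whose weight is exactly zero can be discarded, which is the sense in which a penalized estimate performs selection rather than relying on hand-tuned thresholds.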
Categories and Subject Descriptors
]: Models - Statistical
This work was done when Jun Zhu and Xiaojiang Liu were visiting Microsoft Research Asia.
Jun and Bo are also with the State Key Laboratory of Intelligent Technology and Systems; and the Tsinghua National Laboratory for Information Science and Technology.
Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
April 20–24, 2009, Madrid, Spain. ACM 978-1-60558-487-4/09/04.
Relation extraction, statistical model, Markov logic networks
The World Wide Web has been growing rapidly as a huge information repository, which contains various kinds of valuable semantic information about real-world entities, such as people, organizations, and locations. We have been working on object-level search engines, which automatically extract and integrate the semantic information about entities and return a list of ranked entities instead of webpages to answer user queries. In object-level search engines, it is particularly important to mine entity relations from the Web to automatically build an entity relationship graph that links all the extracted information together. With the entity relationship graph, users will be able to easily navigate through the entities they are interested in, just as they navigate through hyperlinked webpages. This paper focuses on entity relation mining from the Web.
1.1 Motivating Example
We have deployed an entity relationship search engine in the Chinese search market called Renlifang. Renlifang is a different kind of search engine we have been building, one that explores relationships between entities. In Renlifang, users can query the system about people, locations, and organizations and explore their relationships. Currently, Renlifang serves only the Chinese-language domain. These entities and their relationships are automatically mined from billions of crawled Chinese webpages. For each crawled webpage in Renlifang, the system extracts entity information and detects relationships, covering a spectrum of everyday individuals and well-known people, locations, or organizations. Below we list the key features of Renlifang:
Entity Relationship Mining and Navigation. Renlifang enables users to explore highly relevant information during searches to discover interesting relationships about entities associated with their queries.
Renlifang can return a ranked list of people known for dancing or any other topic.
Renlifang detects the popularity of an entity and enables users to browse entities in different categories ranked by their prominence on the Web during a given time period.
WWW 2009 MADRID! Track: Data Mining / Session: Statistical Methods