
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS

TECHNICAL REPORT OF IEICE.


Analyzing Topics of Blogs based on Wikipedia as a Multilingual Knowledge Source
Kensaku MAKITA, Daisuke YOKOMOTO, Hiroko SUZUKI, Takehito UTSURO, Yasuhide KAWADA, and Tomohiro FUKUHARA
Grad. Sch. of Systems and Information Engineering, University of Tsukuba, Tsukuba, 305-8573, Japan
Navix Co., Ltd. 8-3-6 Nishi-Gotanda, Shinagawa-Ku Tokyo 141-0031, Japan
Center for Service Research, National Institute of Advanced Industrial Science and Technology,
Tokyo, 135-0064, Japan
Abstract Given a search query, most existing search engines simply return a ranked list of results.
Those results, however, often mix documents that relate to several different sub-topics. The same
holds for our previously developed framework for retrieving blog posts closely related to a given
topic. In this paper, we propose a framework that categorizes blog posts by sub-topic, where the
blog posts are collected automatically from the blogosphere for a given search query. In our
framework, the sub-topic of each blog post is identified by using Wikipedia entries as a knowledge
source, and each Wikipedia entry title serves as a sub-topic label. This paper presents examples of
applying the proposed framework to the Japanese, Korean, and English blogospheres. These examples
show that the framework makes it much easier to get a quick overview of how sub-topics are
distributed across all blog posts collected for a given search query.
Key words blog analysis, topic, Wikipedia, sub-topic categorization, facets

1.

Fukushima I Nuclear Power Plant

[1] [2, 3] [4]

2.
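3.

4.

4. 1 Wikipedia

4. 2

For a Wikipedia entry e, let R(e) = {r_1, ..., r_n} be its set of terms. Each term r in R(e) is weighted by its inverse document frequency (idf) w(r) (footnote 4), which gives the idf vector of e:

\[ I(e) = \langle w(r_1), \ldots, w(r_n) \rangle \]

With freq(d, r) denoting the frequency of r in a blog post d, the term frequency vector of d with respect to e is:

\[ G(d, e) = \langle \mathrm{freq}(d, r_1), \ldots, \mathrm{freq}(d, r_n) \rangle \]

The similarity Sim(e, d) between the Wikipedia entry e and the blog post d is the inner product of the two vectors:

\[ \mathrm{Sim}(e, d) = I(e) \cdot G(d, e) = \sum_{r \in R(e)} w(r)\, \mathrm{freq}(d, r) \]

Each blog post d in D(t_0) is assigned the facet f in F(t_0) that maximizes this similarity:

\[ f = \operatorname*{argmax}_{f' \in F(t_0)} \mathrm{Sim}(f', d) \]

4 w(r) = idf(r) = log …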
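The similarity Sim(e, d) = Σ w(r) freq(d, r) over r in R(e), and the rule f = argmax Sim(f', d) over f' in F(t0), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy facets and weights, and the whitespace tokenization are hypothetical, every term r is treated as a single token, and `idf` uses the textbook form log(N / df) because the exact formula in footnote 4 did not survive extraction.

```python
import math
from collections import Counter


def idf(num_docs: int, doc_freq: int) -> float:
    """Textbook idf, log(N / df). Assumption: the source's own formula is truncated."""
    return math.log(num_docs / doc_freq)


def sim(weights: dict, doc_tokens: list) -> float:
    """Sim(e, d): sum over r in R(e) of w(r) * freq(d, r).

    `weights` maps each term r of an entry e to its weight w(r).
    """
    freq = Counter(doc_tokens)  # freq[r] is 0 for terms absent from d
    return sum(w * freq[r] for r, w in weights.items())


def assign_facet(facets: dict, doc_tokens: list) -> str:
    """Return the facet f in F(t0) with the highest Sim(f, d).

    On ties, the facet listed first in `facets` wins.
    """
    return max(facets, key=lambda f: sim(facets[f], doc_tokens))


# Hypothetical toy data: two facets with hand-picked term weights.
facets = {
    "Sievert": {"sievert": 2.0, "dose": 1.0},
    "Reactor vessel": {"vessel": 2.0, "pressure": 1.5},
}
post = "the dose was reported in sievert and sievert per hour".split()
best = assign_facet(facets, post)
```

In practice the weights would come from `idf` computed over a document collection, and multi-word terms would need phrase matching rather than token counts.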

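5.

5. 1

Japanese blog posts on the topic t0 were collected in 2010 with the Yahoo! Japan API (footnote 6) from 8 blog hosts (footnote 7) (see [2, 3]). Korean blog posts on t0 were collected in 2010 with the Naver Open API (footnote 8) from 4 blog hosts (footnote 9) (see [4]).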

6 http://www.yahoo.co.jp/
7 fc2.com, yahoo.co.jp, yaplog.jp, ameblo.jp, goo.ne.jp, livedoor.jp, Seesaa.net, hatena.ne.jp
8 http://dev.naver.com/openapi/
9 blog.naver.com, blog.daum.net, blog.cyworld.com, blog.paran.com

[Figure: example sub-topics (Wikipedia entry titles) with blog posts dated from 2011-03-12 to 2011-04-30. Labels shown: Nuclear power plant, Fukushima I Nuclear Power Plant, Nuclear meltdown, Reactor vessel, Radioactivity, Sievert, Becquerel, Nuclear weapons testing, RBMK.]
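Once each collected blog post carries a sub-topic label, the overview of the sub-topic distribution described in the abstract reduces to counting posts per label. The sketch below is illustrative only: the function name, the post ids, and the assignments are hypothetical, and the labels are borrowed from the example sub-topics such as Sievert and Becquerel.

```python
from collections import Counter


def facet_distribution(assigned):
    """Summarize (post_id, facet) pairs as (facet, count, share) rows.

    Rows are sorted by descending count; facets with equal counts keep
    the order in which they first appear in `assigned`.
    """
    counts = Counter(facet for _, facet in assigned)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(facet, n, n / total) for facet, n in counts.most_common()]


# Hypothetical assignments for four blog posts.
assigned = [
    ("post-1", "Sievert"),
    ("post-2", "Becquerel"),
    ("post-3", "Sievert"),
    ("post-4", "Nuclear meltdown"),
]
rows = facet_distribution(assigned)
for facet, n, share in rows:
    print(f"{facet}: {n} posts ({share:.0%})")
```

Sorting by count puts the dominant sub-topics first, which is the quick overview a ranked result list alone does not give.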

5. 2

English blog posts on the topic t0 were collected in 2011 with Yahoo! Search BOSS (footnote 10) from 4 blog hosts (footnote 11), giving 1,000 URLs for t0.

10 http://developer.yahoo.com/search/boss/
11 blogspot.com, wordpress.com, typepad.com, multiply.com
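Restricting search results to the blog hosts named in footnotes 7, 9, and 11 can be sketched as a host filter. This is an assumption-laden illustration: the source does not describe how the restriction was applied, so the `BLOG_HOSTS` table, the language keys, and `is_blog_url` are hypothetical, and the grouping of host lists by language is inferred from the host names and the search APIs involved.

```python
from urllib.parse import urlparse

# Host lists copied from footnotes 7, 9, and 11; the language keys are assumptions.
BLOG_HOSTS = {
    "ja": ("fc2.com", "yahoo.co.jp", "yaplog.jp", "ameblo.jp",
           "goo.ne.jp", "livedoor.jp", "seesaa.net", "hatena.ne.jp"),
    "ko": ("blog.naver.com", "blog.daum.net", "blog.cyworld.com", "blog.paran.com"),
    "en": ("blogspot.com", "wordpress.com", "typepad.com", "multiply.com"),
}


def is_blog_url(url: str, lang: str) -> bool:
    """True if the URL's host equals a listed blog host or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOG_HOSTS[lang])


# Hypothetical URLs for illustration.
urls = [
    "http://example.blogspot.com/2011/05/post.html",
    "http://www.yahoo.com/news/story",
]
english_blog_urls = [u for u in urls if is_blog_url(u, "en")]
```

Matching on the dot-delimited suffix keeps look-alike hosts such as `notameblo.jp` out, but it also accepts every subdomain of a listed domain, which may be broader than the original setup.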

6.

TREC-2009 blog track [5]; [6]; [7, 8]

[1] D. Tunkelang. Faceted Search. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers, 2009.
[2] … Wikipedia …. 3 … DEIM …, 2011.
[3] D. Yokomoto, K. Makita, Y. Kawada, T. Utsuro, and T. Fukuhara. Utilizing Wikipedia in categorizing topic related blogs into facets. In Proc. 12th PACLING, 2011.
[4] D. Lim, D. Yokomoto, K. Makita, T. Utsuro, and T. Fukuhara. Utilizing Wikipedia as a knowledge source in categorizing topic related Korean blogs into facets. 17 …, pp. 876-879, 2011.
[5] C. Macdonald, I. Ounis, and I. Soboroff. Overview of the TREC-2009 blog track. In Proc. TREC-2009, 2009.
[6] … PLSI …. 16 …, pp. 118-121, 2010.
[7] …, Vol. 46, No. SIG 13 (TOD 27), pp. 40-52, 2005.
[8] …, Vol. 50, No. 4, pp. 1399-1409, 2009.
[9] … BLOGRANGER …, OIS2005-92, pp. 19-24, 2006.
[10] C. Li, N. Yan, S. B. Roy, L. Lisham, and G. Das. Facetedpedia: Dynamic generation of query-dependent faceted interfaces for Wikipedia. In Proc. 19th WWW, pp. 651-660, 2010.
[11] …. 3 … DEIM …, 2011.

[9] Wikipedia
[10]
[11]

7.
