
min.song@yonsei.ac.kr

*
1. Background
2. Related Work & Literature Review
3. Technique Method
4. Results
5. Conclusions

1. Background

1. Background
Research Questions
RQ: ,
?
RQ1. ,
?
RQ2. ,
-
?
RQ3. - ,

?

1. Background
The Goal of this study


, ,

6
, , , , 5

,
,

Clustering ,
Classification
( - )

Classification

2. Related Work & Literature Review


(2010)

2010 10 4 23 10
1

1


:
( , 2008)

( , 2006)

( 2002)

2. Related Work & Literature Review


( 2010)
, , , , ,

( ), (
), ( )

LED
, ,
( 2011)

Opinion analysis (Balahur 2009), (Pollak 2011)

3. Technique Method


Lucene Korean KLT


, PNNC
MALLET Package


Clustering
Classification

3. Technique Method

: KINDS (Korea Integrated News Database System)

: 2008 2 ~ 2012 5 KINDS


: 3,026

: URL
URL HTML


, ,

3. Data Preprocessing

1.

, ,

10

: 4
2.

2008.12.29- 2009.01.29

2012.04.25- 2012.05.25
:
3.
2012.04.25- 2012.05.25

4.
2012.03.15- 2012.04.15
:
5.
2012.04.25- 2012.05.25

6.
2012.04.25- 2012.05.25

3. Data Preprocessing Continued

HTML,
Lucene Korean (
)
Java ,
KLT


3. Data Preprocessing Continued


(,,,,,.,(,),,|,,,,,,,,,)


3. Text Mining Techniques


Clustering
/ PNNC ( 2006)


3. Technique Method

Classification:
MALLET Package
Naive Bayes
(Precision), (Recall), F-value

70:30
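As a rough illustration of the setup named on this slide (a Naive Bayes classifier evaluated on a 70:30 train/test split), the sketch below uses plain Python rather than the MALLET package the study relied on. The names `split_70_30` and `NaiveBayes`, the add-one smoothing, and the toy documents are all assumptions made for illustration and are not taken from the slides.

```python
import math
import random
from collections import Counter, defaultdict


def split_70_30(items, seed=0):
    """Shuffle the items and return (train, test) in a 70:30 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs):
        # docs is a list of (tokens, label) pairs
        self.total_docs = len(docs)
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        for tokens, label in docs:
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus summed log likelihoods of the tokens
            score = math.log(n_docs / self.total_docs)
            for w in tokens:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy documents, already tokenized
train = [
    (["growth", "jobs", "benefit"], "positive"),
    (["benefit", "support"], "positive"),
    (["damage", "waste", "protest"], "negative"),
    (["protest", "damage"], "negative"),
]
model = NaiveBayes().fit(train)
prediction = model.predict(["benefit", "jobs"])

# A sample of 150 items splits into 105 for training and 45 for testing
train_part, test_part = split_70_30(range(150))
```

In practice the split would be applied to the labeled articles of each issue before fitting, and the held-out 30% would be used to compute precision, recall and F-value.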

( )

750
1400
350
150
2000
150

( )

150
150
100
50
150
50

(Caldas and Soibelman, 2003)


( )
( )
( )
( )
(4 )
( )


4. Results

10

( )

:
:


0.47





Precision    Recall       F-value
0.479339     0.386667     0.428044
0.460674     0.546667     0.5
0.483444     0.486667     0.48505
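Each consecutive triple of values above is consistent with the F-value being the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes the F-value from the first two numbers of each triple; the function name `f_value` and the tolerance are illustrative choices, not something stated on the slides.

```python
def f_value(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-value) as listed on this slide
rows = [
    (0.479339, 0.386667, 0.428044),
    (0.460674, 0.546667, 0.5),
    (0.483444, 0.486667, 0.48505),
]

# Absolute difference between the recomputed and the reported F-value
differences = [abs(f_value(p, r) - reported) for p, r, reported in rows]
```

The same relation holds for the precision, recall and F-value triples on the later result slides.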

4. Results

10

( )

: 6


0.7
, F-value 0.5 ->



0.18, F-value 0.23 ->

Precision    Recall       F-value
0.401515     0.706667     0.512077
0.346154     0.18         0.236842
0.425926     0.306667     0.356589

4. Results
( )


Clustering

Clustering

4. Results

10

( )

:
,


0.59

Precision    Recall       F-value
0.384615     0.5          0.434783
0.511628     0.44         0.473118
0.595238     0.5          0.543478

4. Results

10

( )



:
, ,

, :


Precision    Recall       F-value
0.430233     0.37         0.397849
0.5125       0.41         0.455556
0.395522     0.53         0.452991

4. Results

: ,

10

( )

:
:


: 0.6

0.84, F-value 0.81,
0.8



Precision    Recall       F-value
0.847826     0.78         0.8125
0.655738     0.8          0.720721
0.627907     0.54         0.580645

4. Results
( )


4. Results

: , ,

, ,

10

(4 )




Precision    Recall       F-value
0.435897     0.34         0.382022
0.349398     0.58         0.43609
0.642857     0.36         0.461538

4. Results
(4 )


4. Results
(4 )
4 ,
150
50
3 , F-value 0.6
, 0.7


Precision    Recall       F-value
0.672414     0.78         0.722222
0.62         0.738095     0.673913
0.672414     0.78         0.722222

4. Results
(4 )
4
2012.04.08- 2012.07.08
: 4
: 97 / : 85 / : 47


F-value 72%

67%
70%


4. Results
(4 )
4
4 70%
7% , 3%


5. Conclusions

4

,

Clustering


Classification , 70%

5. Conclusions

3,000

clustering

Classification

Classification 0.7 :


5. Conclusions


: Topic Modeling

: , , ,
2010 11
1 2012 10 31

(3,928)

(8,110)

(4,182)

(1,450)

(3,244)

(1,794)

(2,008)

(4,359)

(2,351)

1,880

2,048

2,213

1,969

685

765

937

857

960

1,048

1,304

1,047

Topic Modeling Technique

Dirichlet ,

. ,
,
topic
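The slide names only a Dirichlet-based topic model. Assuming this refers to latent Dirichlet allocation (LDA), the sketch below shows one common way to fit it, a collapsed Gibbs sampler, in plain Python. The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count and the toy documents are all assumptions for illustration. The tool and settings used in the study cannot be recovered from the slide.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic, per doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # tokens per topic overall
    assignments = []

    # Start from a random topic assignment for every token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, topic, n=3):
    """Most frequent words currently assigned to a topic."""
    return [w for w, c in topic_word[topic].most_common(n) if c > 0]


# Hypothetical toy corpus, already tokenized
docs = [
    ["river", "dam", "water", "river"],
    ["election", "party", "vote"],
    ["water", "dam", "budget"],
    ["vote", "party", "election", "budget"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iters=50)
```

Calling `top_words(topic_word, 0)` lists the words that dominate the first topic. On a real corpus the number of topics and the number of iterations would need tuning.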

Topic Modeling

10 topic .
7 topic

Topic 1.

Topic 2.

Topic 3.

Topic 4.

Topic 5.

Topic 6.

Topic 7.

Topic 1.

, ,
, , ,

, , ,

, ,

, , ,

, ,

, ,

, ,

, ,

, ,

, ,

, ,

, , , ,
, , , ,

Topic 1.

, , , ,
, , ,

, , ,

, , , ,
, , , ,
, , , , , ,
, , ,

, ,

, , , ,
,

, ,

Topic 1.

, , , ,
, , , , , ,
, , ,

, ,

, , , , ,
, , , , ,
, , , , ,
, , ,

, ,

* References
, , :
, , 41 (2008), 232~267.
. . Accessed 2012.04.12,
<http://www.mediatoday.co.kr/news/articleView.html?idxno=91565>.
, , .
2003 , 2003 11 , . 574~580.
, , : , ,
, , 34 (2006), 132~162.
, , , 40
3 (2006a), 191~214.
, , , 23 4
(2006b), 215~231.
, , , , 2005.
, , , 46 4 (2002), 314~348.
, , , , , 17 4
(2011), 227~240.
, , , 2010.

Carlos H. Caldas and Lucio Soibelman, Automating hierarchical document classification for construction management information systems, Automation in Construction, Vol.12 (2003), 395~406.
Pollak, Senja, Roel Coesemans, Walter Daelemans and Nada Lavrač, Detecting Contrast Patterns in Newspaper Articles by Combining Discourse Analysis and Text Mining, Pragmatics, Vol.21, No.4 (2011), 647~683.
Balahur, A., and R. Steinberger, Rethinking sentiment analysis in the news: From theory to practice and back, In Proceedings of the 1st Workshop on Opinion Mining and Sentiment Analysis, Satellite to CAEPIA 2009 (2009).


1.

1)

0.13967

13472

0.08543

8610

( /
)

0.40472

26825

0.25105

26598

( )

0.11217

13680

0.10259

15239

0.05549

10756

0.21487

16733

0.24956

17256

0.25473

20971

0.17779

17851

0.21742

18208

2)

0.16248

11974

0.16728

13883

0.09057

8870

0.12418

12505

0.43681

35593

0.25515

27772

0.05842

6672

0.06781

6904
