
DATA MINING

TEXT MINING ASSIGNMENT

Syafrida Maulina Hartanti 17081010066

LECTURER:
INTAN YUNIAR PURBASARI, S.KOM, M.SC.

INFORMATICS ENGINEERING STUDY PROGRAM
FACULTY OF COMPUTER SCIENCE
UNIVERSITAS PEMBANGUNAN NASIONAL “VETERAN”
JAWA TIMUR
2020
Given a query of “W4 W5” and a collection of the following three documents:

• Document 1: <W1 W2 W3 W4 W5>
• Document 2: <W6 W7 W4 W5>
• Document 3: <W8 W3 W9 W4 W10>

Use the Vector Space Model, TF/IDF weighting scheme, and Cosine vector similarity measure
to find the most relevant document(s) to the query.

Answer:

ID   Word   Document Frequency
1    W1     1
2    W2     1
3    W3     2
4    W4     3
5    W5     2
6    W6     1
7    W7     1
8    W8     1
9    W9     1
10   W10    1
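The document-frequency table can be reproduced with a short sketch. The names `docs` and `df` are illustrative, and the snippet assumes each document is simply a list of its words.

```python
from collections import Counter

# The three documents from the exercise, as lists of terms.
docs = {
    "D1": ["W1", "W2", "W3", "W4", "W5"],
    "D2": ["W6", "W7", "W4", "W5"],
    "D3": ["W8", "W3", "W9", "W4", "W10"],
}

# Document frequency counts a term once per document,
# so set() is used to drop repeats within a document.
df = Counter(term for terms in docs.values() for term in set(terms))

for i in range(1, 11):
    word = f"W{i}"
    print(i, word, df[word])
```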

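1. The Vector Space Model

Each document is represented as a vector over the vocabulary (W1, ..., W10): 1 if the word occurs in the document, 0 otherwise.

• Document 1: <W1 W2 W3 W4 W5>
( 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 )

• Document 2: <W6 W7 W4 W5>
( 0, 0, 0, 1, 1, 1, 1, 0, 0, 0 )

• Document 3: <W8 W3 W9 W4 W10>
( 0, 0, 1, 1, 0, 0, 0, 1, 1, 1 )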
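As a rough illustration, the vectors for the three documents can be built as follows. The helper name `to_vector` is hypothetical, and the vocabulary order W1 to W10 is assumed to match the ID column of the table.

```python
# Vocabulary in ID order: W1, W2, ..., W10.
vocab = [f"W{i}" for i in range(1, 11)]

docs = {
    "D1": ["W1", "W2", "W3", "W4", "W5"],
    "D2": ["W6", "W7", "W4", "W5"],
    "D3": ["W8", "W3", "W9", "W4", "W10"],
}

def to_vector(terms, vocab):
    """Return 1 for each vocabulary word present in the document, else 0."""
    present = set(terms)
    return [1 if word in present else 0 for word in vocab]

vectors = {name: to_vector(terms, vocab) for name, terms in docs.items()}

for name, vec in vectors.items():
    print(name, vec)
```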
2. TF/IDF weighting scheme

IDF = log10(N / Document Frequency), with N = 3 documents.

ID   Word   Document Frequency   IDF
1    W1     1                    0.477
2    W2     1                    0.477
3    W3     2                    0.176
4    W4     3                    0
5    W5     2                    0.176
6    W6     1                    0.477
7    W7     1                    0.477
8    W8     1                    0.477
9    W9     1                    0.477
10   W10    1                    0.477
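The IDF column is consistent with a base-10 logarithm of N divided by the document frequency, with N = 3. A minimal sketch under that assumption (the dictionary names are illustrative):

```python
import math

N = 3  # number of documents in the collection

# Document frequencies taken from the table.
df = {"W1": 1, "W2": 1, "W3": 2, "W4": 3, "W5": 2,
      "W6": 1, "W7": 1, "W8": 1, "W9": 1, "W10": 1}

# IDF = log10(N / df); a word found in every document gets weight 0.
idf = {word: math.log10(N / count) for word, count in df.items()}

for word, value in idf.items():
    print(word, round(value, 3))
```

Because W4 appears in all three documents, its IDF is 0 and it cannot help distinguish one document from another.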

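For each document, the first vector holds the term frequencies and the second holds the TF × IDF weights.

• Document 1: <W1 W2 W3 W4 W5>
TF      ( 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 )
TF-IDF  ( 0.477, 0.477, 0.176, 0, 0.176, 0, 0, 0, 0, 0 )

• Document 2: <W6 W7 W4 W5>
TF      ( 0, 0, 0, 1, 1, 1, 1, 0, 0, 0 )
TF-IDF  ( 0, 0, 0, 0, 0.176, 0.477, 0.477, 0, 0, 0 )

• Document 3: <W8 W3 W9 W4 W10>
TF      ( 0, 0, 1, 1, 0, 0, 0, 1, 1, 1 )
TF-IDF  ( 0, 0, 0.176, 0, 0, 0, 0, 0.477, 0.477, 0.477 )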
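The weighting step is an element-wise product of each term-frequency vector with the IDF values. The sketch below uses the rounded IDF values from the table, so its output matches the three-decimal figures shown here; the variable names are illustrative.

```python
# Rounded IDF values for W1..W10, in ID order.
idf = [0.477, 0.477, 0.176, 0.0, 0.176, 0.477, 0.477, 0.477, 0.477, 0.477]

# Term-frequency vectors of the three documents.
tf = {
    "D1": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "D2": [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    "D3": [0, 0, 1, 1, 0, 0, 0, 1, 1, 1],
}

# TF-IDF weight of a word = its term frequency times its IDF.
tfidf = {name: [t * w for t, w in zip(vec, idf)] for name, vec in tf.items()}

for name, vec in tfidf.items():
    print(name, vec)
```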

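Normalization

Each TF-IDF vector is divided by its Euclidean length so that it has length 1.

• Document 1: <W1 W2 W3 W4 W5>
TF          ( 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 )
TF-IDF      ( 0.477, 0.477, 0.176, 0, 0.176, 0, 0, 0, 0, 0 )
Length      sqrt(2 × 0.477² + 2 × 0.176²) ≈ 0.719
Normalized  ( 0.663, 0.663, 0.245, 0, 0.245, 0, 0, 0, 0, 0 )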


• Document 2: <W6 W7 W4 W5>
TF          ( 0, 0, 0, 1, 1, 1, 1, 0, 0, 0 )
TF-IDF      ( 0, 0, 0, 0, 0.176, 0.477, 0.477, 0, 0, 0 )
Length      sqrt(0.176² + 2 × 0.477²) ≈ 0.697
Normalized  ( 0, 0, 0, 0, 0.252, 0.684, 0.684, 0, 0, 0 )

• Document 3: <W8 W3 W9 W4 W10>
TF          ( 0, 0, 1, 1, 0, 0, 0, 1, 1, 1 )
TF-IDF      ( 0, 0, 0.176, 0, 0, 0, 0, 0.477, 0.477, 0.477 )
Length      sqrt(0.176² + 3 × 0.477²) ≈ 0.845
Normalized  ( 0, 0, 0.208, 0, 0, 0, 0, 0.565, 0.565, 0.565 )
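The normalization step can be checked with a small sketch. It assumes Euclidean (L2) normalization, which is what the Document 1 figures correspond to; `l2_normalize` is a hypothetical helper name.

```python
import math

def l2_normalize(vec):
    """Divide each component by the vector's Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    # A zero vector has no direction; return it unchanged rather than divide by 0.
    return [x / norm for x in vec] if norm else list(vec)

# TF-IDF vectors of the three documents (rounded weights).
tfidf = {
    "D1": [0.477, 0.477, 0.176, 0, 0.176, 0, 0, 0, 0, 0],
    "D2": [0, 0, 0, 0, 0.176, 0.477, 0.477, 0, 0, 0],
    "D3": [0, 0, 0.176, 0, 0, 0, 0, 0.477, 0.477, 0.477],
}

normalized = {name: l2_normalize(vec) for name, vec in tfidf.items()}

for name, vec in normalized.items():
    print(name, [round(x, 3) for x in vec])
```

A quick sanity check on any normalized vector is that the sum of its squared components equals 1.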

3. Cosine vector similarity

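The query “W4 W5” is weighted with the same TF × IDF scheme. W4 has an IDF of 0, so only W5 contributes:

Q = ( 0, 0, 0, 0, 0.176, 0, 0, 0, 0, 0 )

Cosine (D, Q) = (D · Q) / (|D| × |Q|), which always lies between 0 and 1 for non-negative weights.

Cosine (D1,Q) = 0.245
Cosine (D2,Q) = 0.252
Cosine (D3,Q) = 0

Document 2 is the most relevant to the query, followed closely by Document 1. Document 3 shares no weighted term with the query.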
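The cosine step can be recomputed with a short sketch. It assumes the query is weighted with the same TF × IDF scheme as the documents, which gives 0.176 for W5 and 0 for W4; the function and variable names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat similarity with a zero vector as 0 instead of dividing by 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# TF-IDF vectors of the three documents (rounded weights).
docs = {
    "D1": [0.477, 0.477, 0.176, 0, 0.176, 0, 0, 0, 0, 0],
    "D2": [0, 0, 0, 0, 0.176, 0.477, 0.477, 0, 0, 0],
    "D3": [0, 0, 0.176, 0, 0, 0, 0, 0.477, 0.477, 0.477],
}

# Assumed query vector for "W4 W5": W4 has IDF 0, W5 has IDF 0.176.
query = [0, 0, 0, 0, 0.176, 0, 0, 0, 0, 0]

scores = {name: cosine(vec, query) for name, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

Cosine similarity does not depend on the length of either vector, so scaling the query weight changes none of the scores. For non-negative weights, a score above 1 indicates an arithmetic slip.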
