
DATA MINING AND

KNOWLEDGE MANAGEMENT

Semester 8

Assignment 1 Data Mining

By:

Grup E
Name NIM
Bagus Anom Tri Atmojo 004201705042
Eka Rudi Irawan 004201805041
Nurchayadin 004201705065
Steven Tricahayadi 004201905xxx
Viona Kacaribu 004201805079

Group Leader : Eka Rudi Irawan

WhatsApp No : 089666238990

Date : 10/02/2021
List of Contributions

Name Contributions
Bagus Anom Tri Atmojo No. 2 A
Eka Rudi Irawan No. 3 A, B, C, D, E
Nurchayadin No. 2 C
Steven Tricahayadi No. 2 D
Viona Kacaribu No. 2 B
Eka Rudi Irawan (004201805041)

No.3
Question

3. Here, we further explore the cosine and correlation measures.


a. What is the range of values that are possible for the cosine measure?
b. If two objects have a cosine measure of 1, are they identical? Explain.
c. What is the relationship of the cosine measure to correlation, if any? (Hint: Look at
statistical measures such as mean and standard deviation in cases where cosine and
correlation are the same and different.)
d. Figure (a) below shows the relationship of the cosine measure to Euclidean distance for
100,000 randomly generated points that have been normalized to have an L2 length of 1.
What general observation can you make about the relationship between Euclidean
distance and cosine similarity when vectors have an L2 norm of 1?
e. Figure (b) shows the relationship of correlation to Euclidean distance for 100,000
randomly generated points that have been standardized to have a mean of 0 and a
standard deviation of 1. What general observation can you make about the relationship
between Euclidean distance and correlation when the vectors have been standardized to
have a mean of 0 and a standard deviation of 1?
Answer :

a) [−1, 1]. Many times the data has only positive entries, and in that case the range is [0, 1].
b) Not necessarily. All we know is that the values of their attributes differ by a positive constant factor. For example, x = (1, 1, 1, 1) and y = (2, 2, 2, 2) have a cosine of 1 but are not identical.
c) For two vectors x and y that both have a mean of 0, corr(x, y) = cos(x, y).
d) Since all 100,000 points fall on the curve, there is a functional relationship between Euclidean distance and cosine similarity for normalized data. More specifically, there is an inverse relationship between cosine similarity and Euclidean distance. For example, if two data points are identical, their cosine similarity is 1 and their Euclidean distance is 0, but if two data points have a high Euclidean distance, their cosine value is close to 0. Note that all the sample data points were from the positive quadrant, i.e., had only positive values. This means that all cosine values (and all correlation values) will be positive.
e) Same as answer (d), but with correlation substituted for cosine.
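The relationships in answers (c), (d), and (e) can be checked numerically. The following is a minimal sketch, not part of the original answer: the helper names (`cosine`, `correlation`, `euclidean`, `center`, `unit`, `standardize`) are hypothetical, the random vectors are arbitrary test data, and the standard deviation is assumed to use the n − 1 denominator, as in the worked answers of this assignment. Under those assumptions it checks that correlation equals the cosine of the mean-centered vectors, that d² = 2(1 − cos) for vectors with an L2 norm of 1, and that d² = 2(n − 1)(1 − corr) for standardized vectors.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def center(x):
    # Subtract the mean so the vector has mean 0.
    m = sum(x) / len(x)
    return [a - m for a in x]


def correlation(x, y):
    # Pearson correlation; the 1/(n-1) factors cancel in the ratio.
    cx, cy = center(x), center(y)
    return dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))


def unit(x):
    # Scale to an L2 norm of 1.
    norm = math.sqrt(dot(x, x))
    return [a / norm for a in x]


def standardize(x):
    # Mean 0 and sample standard deviation 1 (assumed n-1 denominator).
    c = center(x)
    s = math.sqrt(dot(c, c) / (len(x) - 1))
    return [a / s for a in c]


rng = random.Random(0)  # fixed seed, arbitrary test data
n = 10
x = [rng.random() for _ in range(n)]
y = [rng.random() for _ in range(n)]

# (c) correlation is the cosine of the mean-centered vectors
gap_c = abs(correlation(x, y) - cosine(center(x), center(y)))

# (d) for unit-length vectors, d^2 = 2 * (1 - cos)
ux, uy = unit(x), unit(y)
gap_d = abs(euclidean(ux, uy) ** 2 - 2 * (1 - cosine(ux, uy)))

# (e) for standardized vectors, d^2 = 2 * (n - 1) * (1 - corr)
zx, zy = standardize(x), standardize(y)
gap_e = abs(euclidean(zx, zy) ** 2 - 2 * (n - 1) * (1 - correlation(x, y)))

print(gap_c, gap_d, gap_e)
```

All three gaps should be zero up to floating-point rounding, which is why the points in both figures fall on a single curve.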
No.2
Question : For the following vectors, x and y, calculate the indicated similarity or distance
measures.
a. x = (2, 2, 2, 2), y = (3, 3, 3, 3) cosine, correlation, Euclidean, Extended Jaccard
b. x = (0, 1, 0, 1), y = (1, 0, 1, 0) cosine, correlation, Euclidean, SMC, Jaccard
c. x = (0,−1, 0, 1), y = (1, 0,−1, 0) cosine, correlation, Euclidean
d. x = (1, 1, 0, 1, 0, 1), y = (1, 1, 1, 0, 0, 1) cosine, correlation, SMC, Jaccard
e. x = (2,−1, 0, 2, 0,−3), y = (−1, 1,−1, 0, 0,−1) cosine, correlation, Euclidean

Answer :

Nurchayadin ( 004201705065)
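A) x = (1, 1, 1, 1), y = (2, 2, 2, 2)

Cosine
cos(x, y) = (x . y) / (||x|| * ||y||)
(x . y) / (||x|| * ||y||) = (2 + 2 + 2 + 2) / (√(1 + 1 + 1 + 1) * √(4 + 4 + 4 + 4)) = 8 / (2 * 4)
= 8 / 8 = 1
Correlation
Correlation of x and y
corr(x, y) = s_xy / (s_x * s_y) = 0 / 0 (not defined)
s_xy = 1/(n−1) * Σ(k=1..n) (x_k − mean(x)) (y_k − mean(y))
mean(x) = (1/n) * Σ(k=1..n) x_k = (1 + 1 + 1 + 1) / 4 = 1
mean(y) = (2 + 2 + 2 + 2) / 4 = 2, so s_xy = (1/3) * 0 = 0
s_x = √(1/(n−1) * Σ(k=1..n) (x_k − mean(x))²) = 0
s_y = √(1/(n−1) * Σ(k=1..n) (y_k − mean(y))²) = 0
Euclidean
d(x, y) = √(Σ(k=1..n) (x_k − y_k)²)
= √((1 − 2)² + (1 − 2)² + (1 − 2)² + (1 − 2)²) = √4 = 2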
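The arithmetic for x = (1, 1, 1, 1), y = (2, 2, 2, 2) can be cross-checked with a short script. This is a sketch under stated assumptions rather than part of the submitted answer: `sample_correlation` is a hypothetical helper that returns `None` for the 0/0 case, and the Extended Jaccard value (requested in the question but not worked out above) assumes the usual Tanimoto form EJ(x, y) = (x . y) / (||x||² + ||y||² − x . y).

```python
import math

x = [1, 1, 1, 1]
y = [2, 2, 2, 2]

dot_xy = sum(a * b for a, b in zip(x, y))
norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(b * b for b in y))

cos_xy = dot_xy / (norm_x * norm_y)
dist_xy = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Extended Jaccard (Tanimoto) coefficient, assumed definition.
ej_xy = dot_xy / (norm_x ** 2 + norm_y ** 2 - dot_xy)


def sample_correlation(u, v):
    """Pearson correlation with n-1 denominators; None when it is 0/0."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)
    s_u = math.sqrt(sum((a - mean_u) ** 2 for a in u) / (n - 1))
    s_v = math.sqrt(sum((b - mean_v) ** 2 for b in v) / (n - 1))
    if s_u == 0 or s_v == 0:
        return None  # constant vector: correlation is not defined
    return s_uv / (s_u * s_v)


corr_xy = sample_correlation(x, y)
print(cos_xy, dist_xy, ej_xy, corr_xy)
```

Returning `None` for a constant vector is one possible design choice; raising an exception would be equally reasonable.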

Viona Kacaribu (004201805079 )


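B) x = (0, 1, 0, 1), y = (1, 0, 1, 0)
Cosine
cos(x, y) = (x . y) / (||x|| * ||y||) = (0 + 0 + 0 + 0) / (√2 * √2) = 0 / 2 = 0
Correlation
Correlation of x and y
corr(x, y) = s_xy / (s_x * s_y) = (−1/3) / (√(1/3) * √(1/3)) = −1
s_xy = 1/(n−1) * Σ(k=1..n) (x_k − mean(x)) (y_k − mean(y))
mean(x) = (1/n) * Σ(k=1..n) x_k = 2/4 = 1/2
mean(y) = (1/n) * Σ(k=1..n) y_k = 1/2
s_xy = (1/3) * (−1/4 − 1/4 − 1/4 − 1/4) = −1/3
s_x = √(1/(n−1) * Σ(k=1..n) (x_k − mean(x))²) = √((1/3) * (1/4 + 1/4 + 1/4 + 1/4)) = √(1/3)
s_y = √(1/(n−1) * Σ(k=1..n) (y_k − mean(y))²) = √(1/3)
Euclidean
d(x, y) = √(Σ(k=1..n) (x_k − y_k)²) = √(1 + 1 + 1 + 1) = √4 = 2

Jaccard
J = f11 / (f01 + f10 + f11) = 0 / (2 + 2 + 0) = 0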
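For binary vectors, SMC and Jaccard both follow from the four match counts f00, f01, f10, and f11. The sketch below is illustrative only: `binary_counts`, `smc`, and `jaccard` are hypothetical helper names, and returning 0.0 when the Jaccard denominator is zero is an assumed convention. It also evaluates SMC for part B, which the question requests but the answer above does not work out.

```python
def binary_counts(x, y):
    """Count attribute pairs: f_ab = number of positions with x = a and y = b."""
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    return f00, f01, f10, f11


def smc(x, y):
    # Simple matching coefficient: matches over all attributes.
    f00, f01, f10, f11 = binary_counts(x, y)
    return (f11 + f00) / (f00 + f01 + f10 + f11)


def jaccard(x, y):
    # Jaccard coefficient: 1-1 matches over attributes that are not 0-0.
    f00, f01, f10, f11 = binary_counts(x, y)
    denom = f01 + f10 + f11
    return f11 / denom if denom else 0.0  # assumed convention for 0/0


x = [0, 1, 0, 1]
y = [1, 0, 1, 0]
print(binary_counts(x, y), smc(x, y), jaccard(x, y))
```

Because x and y disagree in every position here, both measures come out as 0.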

Bagus Anom Tri Atmojo (004201705042)


D. x = (1, 1, 0, 1, 0, 1), y = (1, 1, 1, 0, 0, 1)
• Cosine
(x . y) = (1 * 1) + (1 * 1) + (0 * 1) + (1 * 0) + (0 * 0) + (1 * 1) = 3

||x|| = √((1 ∗ 1) + (1 ∗ 1) + (0 ∗ 0) + (1 ∗ 1) + (0 ∗ 0) + (1 ∗ 1)) = √4 = 2

||y|| = √((1 ∗ 1) + (1 ∗ 1) + (1 ∗ 1) + (0 ∗ 0) + (0 ∗ 0) + (1 ∗ 1)) = √4 = 2

Cos(x,y) = (x . y) / (||x||*||y||) = 3 / (2*2) = 0.75

• Correlation

Mean(x) = (1 + 1 + 0 + 1 + 0 + 1)/6 = 0.666

Mean(y) = (1 + 1 + 1 + 0 + 0 + 1)/6 = 0.666

Cov(x,y) = 1/(6−1) * [(1-0.666)(1-0.666) + (1-0.666)(1-0.666) + (0-0.666)(1-0.666) + (1-0.666)(0-0.666) + (0-0.666)(0-0.666) + (1-0.666)(1-0.666)]

Cov(x,y) = 1/5 * [(0.111556) + (0.111556) + (-0.222444) + (-0.222444) + (0.443556) + (0.111556)]

Cov(x,y) = 1/5 * 0.333336 = 0.06667

s(x) = √(1/(6−1) ∗ [(1 − 0.666)² + (1 − 0.666)² + (0 − 0.666)² + (1 − 0.666)² + (0 − 0.666)² + (1 − 0.666)²]) = √(1.333336/5) = 0.5164

s(y) = √(1/(6−1) ∗ [(1 − 0.666)² + (1 − 0.666)² + (1 − 0.666)² + (0 − 0.666)² + (0 − 0.666)² + (1 − 0.666)²]) = √(1.333336/5) = 0.5164

Corr(x,y) = Cov(x,y) / (s(x) ∗ s(y)) = 0.06667 / (0.5164 ∗ 0.5164) = 0.25

• Euclidean

d(x,y) = √((1 − 1)² + (1 − 1)² + (0 − 1)² + (1 − 0)² + (0 − 0)² + (1 − 1)²)
d(x,y) = √2 = 1.414
• SMC
x = 110101
y = 111001
f01 = 1
f10 = 1
f00 = 1
f11 = 3
SMC = (f11 + f00) / (f01 + f10 + f11 + f00)

= (3 + 1) / (1 + 1 + 3 + 1) = 0.666
• Jaccard
J = f11 / (f01 + f10 + f11)
= 3 / (1 + 1 + 3) = 0.6
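Of the part D measures, the correlation involves the most arithmetic, so it is worth an independent check. The sketch below is a hedged illustration, not the group's own method: it assumes the sample (n − 1) definitions of covariance and standard deviation used in this assignment, and relies on `statistics.stdev` from the Python standard library, which uses the same n − 1 denominator.

```python
import statistics

x = [1, 1, 0, 1, 0, 1]
y = [1, 1, 1, 0, 0, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance with the n-1 denominator.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# statistics.stdev is the sample standard deviation (n-1 denominator).
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

corr_xy = cov_xy / (s_x * s_y)
print(cov_xy, s_x, s_y, corr_xy)
```

Working with unrounded means avoids the small drift that comes from truncating 2/3 to 0.666 in the hand calculation.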

Steven Tricahyadi
E. x = (2, -1, 0, 2, 0, -3), y = (-1, 1, -1, 0, 0, -1)

• Cosine
(x . y) = (2 * -1) + (-1 * 1) + (0 * -1) + (2 * 0) + (0 * 0) + (-3 * -1) = -2 - 1 + 0 + 0 + 0 + 3 = 0
||x|| = √((2 ∗ 2) + (−1 ∗ −1) + (0 ∗ 0) + (2 ∗ 2) + (0 ∗ 0) + (−3 ∗ −3))
= √18 = 4.2426
||y|| = √((−1 ∗ −1) + (1 ∗ 1) + (−1 ∗ −1) + (0 ∗ 0) + (0 ∗ 0) + (−1 ∗ −1))
= √4 = 2

Cos(x,y) = (x . y) / (||x||*||y||) = 0 / (4.2426*2)

Cos(x,y) = 0

• Correlation

Mean(x) = (2 - 1 + 0 + 2 + 0 - 3)/6 = 0
Mean(y) = (-1 + 1 - 1 + 0 + 0 - 1)/6 = -0.3333

Cov(x,y) = 1/(6−1) * [(2-0)(-1+0.3333) + (-1-0)(1+0.3333) + (0-0)(-1+0.3333) + (2-0)(0+0.3333) + (0-0)(0+0.3333) + (-3-0)(-1+0.3333)]

Cov(x,y) = 1/5 * [-1.3334 - 1.3333 + 0 + 0.6666 + 0 + 2.0001]
Cov(x,y) = 1/5 * 0 = 0

s(x) = √(1/(6−1) ∗ [(2 − 0)² + (−1 − 0)² + (0 − 0)² + (2 − 0)² + (0 − 0)² + (−3 − 0)²])

= √(18/5) = √3.6 = 1.8974
s(y) = √(1/(6−1) ∗ [(−1 + 0.3333)² + (1 + 0.3333)² + (−1 + 0.3333)² + (0 + 0.3333)² + (0 + 0.3333)² + (−1 + 0.3333)²])

= √(3.3333/5) = 0.8165
Corr(x,y) = Cov(x,y) / (s(x) ∗ s(y)) = 0 / (1.8974 ∗ 0.8165) = 0
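The part E values can be verified in a few lines, including the Euclidean distance, which the question requests but the answer above does not work out. This is a sketch only, under the same n − 1 covariance assumption as the rest of the assignment; the variable names are illustrative.

```python
import math

x = [2, -1, 0, 2, 0, -3]
y = [-1, 1, -1, 0, 0, -1]

dot_xy = sum(a * b for a, b in zip(x, y))
norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(b * b for b in y))
cos_xy = dot_xy / (norm_x * norm_y)

# Sample covariance; its sign and zero-ness decide the correlation.
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Euclidean distance between x and y.
dist_xy = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(dot_xy, norm_x, norm_y, cos_xy, cov_xy, dist_xy)
```

Since the dot product is 0 and x has mean 0, both the cosine and the covariance (hence the correlation) vanish, up to floating-point rounding in the covariance.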
