
DATA MINING AND KNOWLEDGE MANAGEMENT

Assignment 1 Data Mining

By:

Eka Rudi Irawan - 004201805041

Viona Kacaribu -

Steven Tricahayadi -

Nurchayadin - 004201705065

Bagus Anom Tri Atmojo - 004201705042

Industrial Engineering Study Program

28 January 2020
Member contributions:

Eka Rudi Irawan = No. 3 A, B, C, D, E

Viona Kacaribu = No. 2 B

Steven Tricahayadi = No. 3 D

Nurcahayadin = No. 2 C

Bagus Anom Tri Atmojo = No. 2 A

Manager Group :

Name : Eka Rudi Irawan


Whatsapp : 089666238990
E-mail : rudii8185@gmail.com

Member Group :

1. Eka Rudi Irawan - 004201805041
2. Viona Kacaribu -
3. Steven Tricahayadi -
4. Nurcahayadin - 004201705065
5. Bagus Anom Tri Atmojo - 004201705042

 Contribution of Group member :

Eka Rudi Irawan (004201805041)


No.3
Question

3. Here, we further explore the cosine and correlation measures.


a. What is the range of values that are possible for the cosine measure?
b. If two objects have a cosine measure of 1, are they identical? Explain.
c. What is the relationship of the cosine measure to correlation, if any? (Hint: Look at
statistical measures such as mean and standard deviation in cases where cosine and
correlation are the same and different.)
d. Figure (a) below shows the relationship of the cosine measure to Euclidean distance for
100,000 randomly generated points that have been normalized to have an L2 length of 1.
What general observation can you make about the relationship between Euclidean
distance and cosine similarity when vectors have an L2 norm of 1?
e. Figure (b) shows the relationship of correlation to Euclidean distance for 100,000
randomly generated points that have been standardized to have a mean of 0 and a
standard deviation of 1. What general observation can you make about the relationship
between Euclidean distance and correlation when the vectors have been standardized to
have a mean of 0 and a standard deviation of 1?
Answer :

a) [-1, 1]. When the data has only non-negative entries, as is often the case, the range is [0, 1].
b) Not necessarily. All we know is that the values of their attributes differ by a constant positive factor, i.e. the two vectors point in the same direction but may have different lengths.
c) For two vectors x and y that each have a mean of 0, corr(x, y) = cos(x, y). In general, correlation is the cosine of the two vectors after each has been centered by subtracting its own mean.
d) Since all 100,000 points fall on the curve, there is a functional relationship between
Euclidean distance and cosine similarity for normalized data. More specifically, there is an
inverse relationship between cosine similarity and Euclidean distance. For example, if two
data points are identical, their cosine similarity is 1 and their Euclidean distance is 0, but
if two data points have a high Euclidean distance, their cosine value is close to 0. Note
that all the sample data points were from the positive quadrant, i.e. had only positive values.
This means that all cosine values and all correlation values will be positive.
e) Same as answer (d), but with correlation substituted for cosine.
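
The answers to (b), (c) and (d) can be checked numerically. The block below is a minimal sketch using only the Python standard library; the helper names (`cosine`, `center`, `correlation`, `unit`) and the sample vectors are illustrative assumptions, not part of the assignment. It relies on two identities: correlation equals the cosine of the mean-centered vectors, and for unit-length vectors the squared Euclidean distance equals 2 * (1 - cos).

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def cosine(x, y):
    return dot(x, y) / (norm(x) * norm(y))


def center(x):
    """Subtract the mean from every component."""
    m = sum(x) / len(x)
    return [a - m for a in x]


def correlation(x, y):
    # Pearson correlation written as the cosine of the centered vectors;
    # the 1/(n-1) factors in covariance and standard deviations cancel.
    return cosine(center(x), center(y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def unit(x):
    """Scale x to L2 length 1."""
    n = norm(x)
    return [a / n for a in x]


# (b) Scaling by a positive constant keeps the cosine at 1,
#     although the two vectors are not identical.
x = [1.0, 2.0, 3.0]
y = [2.0 * a for a in x]
cos_scaled = cosine(x, y)

# (c) Cosine and correlation differ on raw vectors but agree once both
#     vectors have mean 0.
raw_u = [4.0, 1.0, 7.0, 2.0]
raw_v = [3.0, 5.0, 8.0, 1.0]
gap_raw = abs(cosine(raw_u, raw_v) - correlation(raw_u, raw_v))
u, v = center(raw_u), center(raw_v)
gap_centered = abs(cosine(u, v) - correlation(u, v))

# (d) For unit vectors, d^2 = 2 * (1 - cos), so distance falls as cosine rises.
rng = random.Random(0)
p = unit([rng.random() for _ in range(5)])
q = unit([rng.random() for _ in range(5)])
gap_d = abs(euclidean(p, q) ** 2 - 2.0 * (1.0 - cosine(p, q)))

print(cos_scaled, gap_raw, gap_centered, gap_d)
```

The identity in (d) follows from expanding ||p - q||^2 = ||p||^2 + ||q||^2 - 2 (p . q) with ||p|| = ||q|| = 1.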
No.2
Question : For the following vectors, x and y, calculate the indicated similarity or distance
measures.
a. x = (2, 2, 2, 2), y = (3, 3, 3, 3) cosine, correlation, Euclidean, Extended Jaccard
b. x = (0, 1, 0, 1), y = (1, 0, 1, 0) cosine, correlation, Euclidean, SMC, Jaccard
c. x = (0,−1, 0, 1), y = (1, 0,−1, 0) cosine, correlation, Euclidean
d. x = (1, 1, 0, 1, 0, 1), y = (1, 1, 1, 0, 0, 1) cosine, correlation, SMC, Jaccard
e. x = (2,−1, 0, 2, 0,−3), y = (−1, 1,−1, 0, 0,−1) cosine, correlation, Euclidean

Answer :

Nurchayadin ( 004201705065)
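A) x = (1, 1, 1, 1), y = (2, 2, 2, 2)

Cosine
$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{2 + 2 + 2 + 2}{\sqrt{1 + 1 + 1 + 1} \, \sqrt{4 + 4 + 4 + 4}} = \frac{8}{2 \cdot 4} = 1$
Correlation
$\operatorname{corr}(x, y) = \frac{s_{xy}}{s_x s_y} = \frac{0}{0}$ (not defined)
$s_{xy} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})$
$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k = \frac{1 + 1 + 1 + 1}{4} = 1$
$\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k = \frac{2 + 2 + 2 + 2}{4} = 2$
$s_{xy} = \frac{1}{3} (0) = 0$
$s_x = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})^2} = 0$
$s_y = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (y_k - \bar{y})^2} = 0$
Euclidean
$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} = \sqrt{(1-2)^2 + (1-2)^2 + (1-2)^2 + (1-2)^2} = \sqrt{4} = 2$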
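
As a check on the three results above, here is a minimal Python sketch (standard library only; the function names are illustrative assumptions). It evaluates the vectors used in this answer, x = (1, 1, 1, 1) and y = (2, 2, 2, 2), and also the vectors as printed in the question, x = (2, 2, 2, 2) and y = (3, 3, 3, 3). Returning `None` for an undefined correlation is a design choice made for this sketch.

```python
import math


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation(x, y):
    """Sample Pearson correlation, or None when it is not defined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        return None  # 0/0: a constant vector has no spread
    return s_xy / (s_x * s_y)


pairs = [
    ((1, 1, 1, 1), (2, 2, 2, 2)),  # vectors used in the worked answer
    ((2, 2, 2, 2), (3, 3, 3, 3)),  # vectors as printed in the question
]
for x, y in pairs:
    print(x, y, cosine(x, y), correlation(x, y), euclidean(x, y))
```

Both pairs give the same cosine, the same undefined correlation and the same Euclidean distance, because each vector is a positive multiple of (1, 1, 1, 1) and the two vectors in each pair differ by 1 in every component.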

Viona Kacaribu
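B) x = (0, 1, 0, 1), y = (1, 0, 1, 0)
Cosine
$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{0 + 0 + 0 + 0}{\sqrt{2} \, \sqrt{2}} = \frac{0}{2} = 0$
Correlation
$\operatorname{corr}(x, y) = \frac{s_{xy}}{s_x s_y} = \frac{-\frac{1}{3}}{\sqrt{\frac{1}{3}} \, \sqrt{\frac{1}{3}}} = -1$
$s_{xy} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})$
$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k = \frac{2}{4} = \frac{1}{2}$
$\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k = \frac{2}{4} = \frac{1}{2}$
$s_{xy} = \frac{1}{3} \left( -\frac{1}{4} - \frac{1}{4} - \frac{1}{4} - \frac{1}{4} \right) = -\frac{1}{3}$
$s_x = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})^2} = \sqrt{\frac{1}{3} \left( \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} \right)} = \sqrt{\frac{1}{3}}$
$s_y = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (y_k - \bar{y})^2} = \sqrt{\frac{1}{3}}$
Euclidean
$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} = \sqrt{1 + 1 + 1 + 1} = \sqrt{4} = 2$

Jaccard
$J = \frac{f_{11}}{f_{01} + f_{10} + f_{11}} = \frac{0}{2 + 2 + 0} = 0$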
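
For binary vectors, SMC and Jaccard both come from the four match counts. The following is a minimal sketch, assuming f01 counts positions where x is 0 and y is 1, and f10 the reverse; the function names and the choice to return 0.0 when the Jaccard denominator is zero are assumptions made for this illustration.

```python
def match_counts(x, y):
    """Return (f00, f01, f10, f11) for two equal-length 0/1 vectors."""
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    return f00, f01, f10, f11


def smc(x, y):
    """Simple matching coefficient: matches over all attributes."""
    f00, f01, f10, f11 = match_counts(x, y)
    return (f11 + f00) / (f00 + f01 + f10 + f11)


def jaccard(x, y):
    """Jaccard coefficient: 1-1 matches over attributes that are not 0-0."""
    f00, f01, f10, f11 = match_counts(x, y)
    denominator = f01 + f10 + f11
    # Assumption: report 0.0 when neither vector contains a 1.
    return f11 / denominator if denominator else 0.0


x = (0, 1, 0, 1)
y = (1, 0, 1, 0)
print(match_counts(x, y), smc(x, y), jaccard(x, y))
```

Because x and y disagree in every position, there are no matches of either kind, so both coefficients come out as 0.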

Bagus Anom Tri Atmojo (004201705042)


D. x = (1, 1, 0, 1, 0, 1), y = (1, 1, 1, 0, 0, 1)
 Cosine
(x . y) = (1 * 1) + (1 * 1) + (0 * 1) + (1 * 0) + (0 * 0) + (1 * 1) = 3

||x|| = √((1*1) + (1*1) + (0*0) + (1*1) + (0*0) + (1*1)) = √4 = 2

||y|| = √((1*1) + (1*1) + (1*1) + (0*0) + (0*0) + (1*1)) = √4 = 2

Cos(x,y) = (x . y) / (||x||*||y||) = 3 / (2*2) = 0.75

 Correlation

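Mean(x) = (1 + 1 + 0 + 1 + 0 + 1)/6 = 0.6667

Mean(y) = (1 + 1 + 1 + 0 + 0 + 1)/6 = 0.6667

Cov(x,y) = 1/(6−1) * [(1-0.6667)(1-0.6667) + (1-0.6667)(1-0.6667) + (0-0.6667)(1-0.6667)
+ (1-0.6667)(0-0.6667) + (0-0.6667)(0-0.6667) + (1-0.6667)(1-0.6667)]
Cov(x,y) = 1/5 * [(0.1111) + (0.1111) + (-0.2222) + (-0.2222) + (0.4444) + (0.1111)]
Cov(x,y) = 1/5 * 0.3333 = 0.06667

s(x) = √(1/(6−1) * [(1-0.6667)^2 + (1-0.6667)^2 + (0-0.6667)^2 + (1-0.6667)^2 + (0-0.6667)^2 + (1-0.6667)^2])
= √(1.3333/5) = 0.5164

s(y) = √(1/(6−1) * [(1-0.6667)^2 + (1-0.6667)^2 + (1-0.6667)^2 + (0-0.6667)^2 + (0-0.6667)^2 + (1-0.6667)^2])
= √(1.3333/5) = 0.5164

Corr(x,y) = Cov(x,y) / (s(x)*s(y)) = 0.06667 / (0.5164*0.5164) = 0.25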

 Euclidean

d(x,y) = √((1−1)^2 + (1−1)^2 + (0−1)^2 + (1−0)^2 + (0−0)^2 + (1−1)^2)

d(x,y) = √2 = 1.414
 SMC
x = 110101
y = 111001
f01 = 1
f10 = 1
f00 = 1
f11 = 3

SMC = (f11 + f00) / (f01 + f10 + f11 + f00)
= (3 + 1) / (1 + 1 + 3 + 1) = 0.667
 Jaccard
J = f11 / (f01 + f10 + f11)
= 3 / (1 + 1 + 3) = 0.6
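
Rounding the means to a few decimals makes the correlation for part D easy to get wrong, so it can help to redo it with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the variable names are illustrative, and the sample (n - 1) denominator is assumed for both covariance and variance.

```python
import math
from fractions import Fraction

x = [1, 1, 0, 1, 0, 1]
y = [1, 1, 1, 0, 0, 1]
n = len(x)

# Exact means: both are 4/6 = 2/3.
mean_x = Fraction(sum(x), n)
mean_y = Fraction(sum(y), n)

# Sample covariance and variances, kept as exact fractions.
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)

# The square root is the only step that needs floating point.
corr = float(cov) / math.sqrt(float(var_x * var_y))

print(cov, var_x, var_y, corr)
```

With exact arithmetic the covariance is 1/15 and both variances are 4/15, so the correlation is (1/15) / (4/15) = 1/4.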

Steven Tricahyadi
E. x = (2, -1, 0, 2, 0, -3), y = (-1, 1, -1, 0, 0, -1)

 Cosine
(x . y) = (2 * -1) + (-1 * 1) + (0 * -1) + (2 * 0) + (0 * 0) + (-3 * -1) = -2 - 1 + 0 + 0 + 0 + 3 = 0
||x|| = √((2*2) + (−1*−1) + (0*0) + (2*2) + (0*0) + (−3*−3))
= √18 = 4.2426
||y|| = √((−1*−1) + (1*1) + (−1*−1) + (0*0) + (0*0) + (−1*−1))
= √4 = 2

Cos(x,y) = (x . y) / (||x||*||y||) = 0 / (4.2426*2)

Cos(x,y) = 0

 Correlation

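Mean(x) = (2 - 1 + 0 + 2 + 0 - 3)/6 = 0
Mean(y) = (-1 + 1 - 1 + 0 + 0 - 1)/6 = -0.3333

Cov(x,y) = 1/(6−1) * [(2-0)(-1+0.3333) + (-1-0)(1+0.3333) + (0-0)(-1+0.3333)
+ (2-0)(0+0.3333) + (0-0)(0+0.3333) + (-3-0)(-1+0.3333)]
Cov(x,y) = 1/5 * [(-1.3333) + (-1.3333) + (0) + (0.6667) + (0) + (2.0000)]
Cov(x,y) = 1/5 * 0 = 0

s(x) = √(1/(6−1) * [(2-0)^2 + (-1-0)^2 + (0-0)^2 + (2-0)^2 + (0-0)^2 + (-3-0)^2])
= √(18/5) = √3.6 = 1.8974

s(y) = √(1/(6−1) * [(-1+0.3333)^2 + (1+0.3333)^2 + (-1+0.3333)^2 + (0+0.3333)^2 + (0+0.3333)^2 + (-1+0.3333)^2])
= √(3.3333/5) = 0.8165

Corr(x,y) = Cov(x,y) / (s(x)*s(y)) = 0 / (1.8974*0.8165) = 0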
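
The values for part E can be cross-checked with a short script. This is a minimal sketch using only the standard library, with illustrative variable names. It also evaluates the Euclidean distance that the question lists for part E.

```python
import math

x = [2, -1, 0, 2, 0, -3]
y = [-1, 1, -1, 0, 0, -1]
n = len(x)

# Cosine: dot product over the product of the L2 norms.
dot = sum(a * b for a, b in zip(x, y))
norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(b * b for b in y))
cos_xy = dot / (norm_x * norm_y)

# Correlation: sample covariance over the product of sample standard deviations.
mean_x = sum(x) / n
mean_y = sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
corr = cov / (s_x * s_y)

# Euclidean distance between x and y.
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(dot, norm_x, norm_y, cos_xy)
print(cov, s_x, s_y, corr)
print(dist)
```

Since x has mean 0, the covariance reduces to (x . y) / (n - 1), so a zero dot product forces both the cosine and the correlation to 0, up to floating-point rounding in the correlation.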
