Anil Neerukonda Institute of Technology & Sciences (Autonomous)
(Permanent Affiliation by Andhra University & Approved by AICTE,
Accredited by NBA (ECE, EEE, CSE, IT, Mech., Civil & Chemical) & NAAC)
Sangivalasa-531162, Bheemunipatnam Mandal, Visakhapatnam District
Phone: 08933-225083/84/87  Fax: 226395
Website: www.anits.edu.in  Email: principal@anits.edu.in
3-1 IT Mid-II (SET-II) Scheme & Key
Subject: IT314 DATA WAREHOUSE & DATA MINING
Scheme
1. With support threshold = 50% and confidence = 60%, construct a conditional FP tree to find the association rules. FP tree construction – 6M, identification of frequent items – 4M
2. Mine the frequent itemsets using the vertical data format of the transaction dataset D. Allow a minimum support count of either 2 or 3. Vertical process – 6M and frequent itemset list – 4M
3. Apply a decision tree to the dataset and consider the classification attribute as the target variable (identify only the root node). Calculation of entropy and information gain for each attribute (a1, a2, a3) – 3*4 = 12M and identification of the root node – 3M
4. Naïve Bayes classifier algorithm – 5M, and apply Naïve Bayesian classification on the dataset in Question 3 and predict the classification label yes/no for the test sample <True, cool, Normal> – 10M
5a. k-means clustering algorithm – 3M. Apply k-means to cluster the following data {2, 3, 4, 10, 11, 12, 20, 25, 30} into two clusters, assuming mean1 = 4 and mean2 = 12 – 7M
5b. Write any two techniques to improve the classification accuracy – 5M
6a. Define agglomerative clustering – 2M, apply agglomerative single-link clustering to the data {18, 22, 25, 42, 27, 43} – 6M, and build the dendrogram – 2M
6b. Write any two types of clustering – 5M
Answer Key
1 ans) Support threshold = 50% => 0.5 * 6 = 3 => min_sup = 3
1. Count of each item
Item  Count
I1    4
I2    5
I3    4
I4    4
I5    2
2. Sort the items in descending order of count (I5, with count 2 < min_sup = 3, is removed).
Item  Count
I2    5
I1    4
I3    4
I4    4
3. Build the FP tree.
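The Q1 transaction table and the FP-tree figure are not reproduced in this key, so the Python sketch below only re-derives steps 1 and 2 from the count table above (min_sup and the descending frequency order) and shows how transactions would be inserted into an FP tree. The helper names (`FPNode`, `frequent_order`, `build_fp_tree`) are illustrative, ties in count are assumed to be broken by item name, and header-table node links and conditional-pattern-base mining are omitted.

```python
import math


class FPNode:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def frequent_order(counts, n_transactions, threshold):
    """Return min_sup and the frequent items in descending-support order."""
    min_sup = math.ceil(threshold * n_transactions)
    frequent = [item for item, c in counts.items() if c >= min_sup]
    # Ties are broken by item name so the order is deterministic (assumption).
    return min_sup, sorted(frequent, key=lambda item: (-counts[item], item))


def build_fp_tree(transactions, order):
    """Insert each transaction (frequent items only, in `order`) into a prefix tree."""
    rank = {item: i for i, item in enumerate(order)}
    root = FPNode(None, None)
    for transaction in transactions:
        # Drop infrequent items and sort the rest by the global frequency order.
        items = sorted((i for i in transaction if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes only increment existing counts
            node = child
    return root


# Item counts from step 1 of the answer (6 transactions, threshold 50%).
counts = {"I1": 4, "I2": 5, "I3": 4, "I4": 4, "I5": 2}
min_sup, order = frequent_order(counts, n_transactions=6, threshold=0.5)
print(min_sup, order)  # 3 ['I2', 'I1', 'I3', 'I4']
```

For example, `build_fp_tree([["I1", "I2"], ["I2", "I3"]], order)` (hypothetical transactions, not the Q1 data) gives a root with a single `I2` child of count 2, because both transactions share the prefix I2 once they are reordered.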
2 ans) Consider minimum support count = 2 or 3.
Itemset  TID_set
I1       {T100, T400, T500, T700, T800, T900}
I2       {T100, T200, T300, T400, T600, T800, T900}
I3       {T300, T500, T600, T700, T800, T900}
I4       {T200, T400}
I5       {T100, T800}
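For support count = 2:
2-itemsets in vertical data format (each TID_set is the intersection of the two items' TID_sets)
Itemset   TID_set
{I1, I2}  {T100, T400, T800, T900}
{I1, I3}  {T500, T700, T800, T900}
{I1, I4}  {T400}
{I1, I5}  {T100, T800}
{I2, I3}  {T300, T600, T800, T900}
{I2, I4}  {T200, T400}
{I2, I5}  {T100, T800}
{I3, I5}  {T800}
{I1, I4} and {I3, I5} have support count 1 < 2, so they are not frequent.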
3-itemsets in vertical data format
Itemset       TID_set
{I1, I2, I3}  {T800, T900}
{I1, I2, I5}  {T100, T800}
Repeat the same procedure for support count = 3.
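The vertical-format procedure amounts to intersecting TID sets level by level. A minimal Python sketch using the TID sets from the table above; the function name `mine_vertical` is illustrative, and the Apriori-style subset pruning is an added assumption (the key lists all 2-itemsets before filtering).

```python
from itertools import combinations

# Vertical (item -> TID set) layout of dataset D, copied from the table above.
tidsets = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    "I3": {"T300", "T500", "T600", "T700", "T800", "T900"},
    "I4": {"T200", "T400"},
    "I5": {"T100", "T800"},
}


def mine_vertical(tidsets, min_sup):
    """Level-wise mining: the support of an itemset is the size of its TID-set intersection."""
    level = {frozenset([i]): set(t) for i, t in tidsets.items() if len(t) >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        next_level = {}
        for a, b in combinations(sorted(level, key=sorted), 2):
            candidate = a | b
            if len(candidate) != k or candidate in next_level:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in level for s in combinations(candidate, k - 1)):
                continue
            tids = level[a] & level[b]  # intersecting TID sets gives the support
            if len(tids) >= min_sup:
                next_level[candidate] = tids
        frequent.update(next_level)
        level = next_level
        k += 1
    return frequent


for min_sup in (2, 3):
    result = mine_vertical(tidsets, min_sup)
    print(f"min_sup = {min_sup}")
    for itemset, tids in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("  ", sorted(itemset), sorted(tids))
```

With `min_sup = 2` the sketch reports the two 3-itemsets {I1, I2, I3} and {I1, I2, I5}; with `min_sup = 3` nothing larger than a 2-itemset survives, because {I1, I2, I3} occurs only in T800 and T900.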
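3 ans) Decision tree
Attribute: a1
Values(a1) = True, False
S = [6+, 4-]
$$\text{Entropy}(S) = -\tfrac{6}{10}\log_2\tfrac{6}{10} - \tfrac{4}{10}\log_2\tfrac{4}{10} = 0.9709$$
S_True = [1+, 4-]
$$\text{Entropy}(S_{\text{True}}) = -\tfrac{1}{5}\log_2\tfrac{1}{5} - \tfrac{4}{5}\log_2\tfrac{4}{5} = 0.7219$$
S_False = [5+, 0-]
$$\text{Entropy}(S_{\text{False}}) = 0$$
$$\text{Gain}(S, a1) = \text{Entropy}(S) - \tfrac{5}{10}\,\text{Entropy}(S_{\text{True}}) - \tfrac{5}{10}\,\text{Entropy}(S_{\text{False}}) = 0.9709 - \tfrac{5}{10}(0.7219) - \tfrac{5}{10}(0) = 0.6099$$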
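Values(a2) = Hot, cool
S = [6+, 4-]
$$\text{Entropy}(S) = -\tfrac{6}{10}\log_2\tfrac{6}{10} - \tfrac{4}{10}\log_2\tfrac{4}{10} = 0.9709$$
S_Hot = [2+, 3-]
$$\text{Entropy}(S_{\text{Hot}}) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.9709$$
S_cool = [4+, 1-]
$$\text{Entropy}(S_{\text{cool}}) = -\tfrac{4}{5}\log_2\tfrac{4}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5} = 0.7219$$
$$\text{Gain}(S, a2) = \text{Entropy}(S) - \tfrac{5}{10}\,\text{Entropy}(S_{\text{Hot}}) - \tfrac{5}{10}\,\text{Entropy}(S_{\text{cool}}) = 0.9709 - \tfrac{5}{10}(0.9709) - \tfrac{5}{10}(0.7219) = 0.1245$$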
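Values(a3) = High, Normal
S = [6+, 4-]
$$\text{Entropy}(S) = -\tfrac{6}{10}\log_2\tfrac{6}{10} - \tfrac{4}{10}\log_2\tfrac{4}{10} = 0.9709$$
S_High = [2+, 4-]
$$\text{Entropy}(S_{\text{High}}) = -\tfrac{2}{6}\log_2\tfrac{2}{6} - \tfrac{4}{6}\log_2\tfrac{4}{6} = 0.9183$$
S_Normal = [4+, 0-]
$$\text{Entropy}(S_{\text{Normal}}) = 0$$
$$\text{Gain}(S, a3) = \text{Entropy}(S) - \tfrac{6}{10}\,\text{Entropy}(S_{\text{High}}) - \tfrac{4}{10}\,\text{Entropy}(S_{\text{Normal}}) = 0.9709 - \tfrac{6}{10}(0.9183) - \tfrac{4}{10}(0) = 0.4199$$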
Maximum information gain = 0.6099 (attribute a1).
Hence the root node is a1.
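The information-gain arithmetic can be checked with a short Python sketch. It takes the [yes, no] class counts for each attribute value from the calculations above (the Question 3 dataset itself is not reproduced here); `entropy` and `info_gain` are illustrative helper names. Because intermediate values are not rounded, the fourth decimal may differ slightly from the hand calculation.

```python
from math import log2


def entropy(pos, neg):
    """Base-2 entropy of a node holding `pos` positive and `neg` negative samples."""
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)


def info_gain(parent, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in partitions)


# [yes, no] counts per attribute value, as listed in the answer above.
partitions = {
    "a1": [(1, 4), (5, 0)],  # True, False
    "a2": [(2, 3), (4, 1)],  # Hot, cool
    "a3": [(2, 4), (4, 0)],  # High, Normal
}
gains = {attr: info_gain((6, 4), parts) for attr, parts in partitions.items()}
for attr, g in gains.items():
    print(f"Gain(S, {attr}) = {g:.4f}")
print("root node:", max(gains, key=gains.get))
```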
4 ans) The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to each class.
o Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.
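o The formula for Bayes' theorem is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of hypothesis A before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.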
Likelihoods P(value | class) from the dataset in Question 3 (6 Yes samples, 4 No samples):
a1     Yes  No
True   1/6  4/4
False  5/6  0/4

a2     Yes  No
Hot    2/6  3/4
cool   4/6  1/4

a3      Yes  No
High    2/6  4/4
normal  4/6  0/4
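Given sample <True, cool, normal, ?>
Assuming the attributes are conditionally independent given the class (the naïve assumption), each class score is its prior multiplied by the likelihoods of the sample's attribute values:
$$P_{\text{yes}} = P(\text{Yes})\,P(\text{True}\mid\text{Yes})\,P(\text{cool}\mid\text{Yes})\,P(\text{normal}\mid\text{Yes}) = \tfrac{6}{10}\cdot\tfrac{1}{6}\cdot\tfrac{4}{6}\cdot\tfrac{4}{6} = \tfrac{8}{180} = \tfrac{2}{45} \approx 0.044$$
$$P_{\text{no}} = P(\text{No})\,P(\text{True}\mid\text{No})\,P(\text{cool}\mid\text{No})\,P(\text{normal}\mid\text{No}) = \tfrac{4}{10}\cdot\tfrac{4}{4}\cdot\tfrac{1}{4}\cdot\tfrac{0}{4} = 0$$
P_yes > P_no; hence we predict the sample as <True, cool, normal, Yes>.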
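The Naïve Bayes prediction can be reproduced with exact fractions. This Python sketch hard-codes the priors and likelihood tables listed above; the dictionary layout and the name `naive_bayes_scores` are illustrative. Like the hand calculation, it applies no smoothing, so a zero likelihood such as P(normal | No) = 0/4 forces that class score to zero; Laplace smoothing is a common remedy when that is undesirable.

```python
from fractions import Fraction as F

# Priors and per-class likelihood tables copied from the answer above.
priors = {"Yes": F(6, 10), "No": F(4, 10)}
likelihood = {
    "Yes": {"a1": {"True": F(1, 6), "False": F(5, 6)},
            "a2": {"Hot": F(2, 6), "cool": F(4, 6)},
            "a3": {"High": F(2, 6), "normal": F(4, 6)}},
    "No": {"a1": {"True": F(4, 4), "False": F(0, 4)},
           "a2": {"Hot": F(3, 4), "cool": F(1, 4)},
           "a3": {"High": F(4, 4), "normal": F(0, 4)}},
}


def naive_bayes_scores(sample):
    """Unnormalised posterior P(C) * product of P(value | C) for each class C."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for attr, value in sample.items():
            score *= likelihood[label][attr][value]
        scores[label] = score
    return scores


sample = {"a1": "True", "a2": "cool", "a3": "normal"}
scores = naive_bayes_scores(sample)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```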
5a ans) K-means algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not be points from the input dataset).
Step-3: Assign each data point to its closest centroid; this forms the predefined K clusters.
Step-4: Recompute the centroid of each cluster as the mean of the points assigned to it.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid.
Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
Iteration 1:
Item  C1 = 4     C2 = 12      Cluster number
2     |4-2| = 2  |12-2| = 10  1
3     1          9            1
4     0          8            1
10    6          2            2
11    7          1            2
12    8          0            2
20    16         8            2
25    21         13           2
30    26         18           2
Mean1 = (2+3+4)/3 = 3; Mean2 = (10+11+12+20+25+30)/6 = 18
Iteration 2:
Item  C1 = 3     C2 = 18      Cluster number
2     |3-2| = 1  |18-2| = 16  1
3     0          15           1
4     1          14           1
10    7          8            1
11    8          7            2
12    9          6            2
20    17         2            2
25    22         7            2
30    27         12           2
Mean1 = (2+3+4+10)/4 = 19/4 = 4.75; Mean2 = (11+12+20+25+30)/5 = 98/5 = 19.6
Iteration 3:
Item  C1 = 4.75          C2 = 19.6            Cluster number
2     |4.75-2| = 2.75    |19.6-2| = 17.6      1
3     1.75               16.6                 1
4     0.75               15.6                 1
10    5.25               9.6                  1
11    6.25               8.6                  1
12    7.25               7.6                  1
20    15.25              0.4                  2
25    20.25              5.4                  2
30    25.25              10.4                 2
Mean1 = (2+3+4+10+11+12)/6 = 7; Mean2 = (20+25+30)/3 = 25
Iteration 4:
Item  C1 = 7     C2 = 25      Cluster number
2     |7-2| = 5  |25-2| = 23  1
3     4          22           1
4     3          21           1
10    3          15           1
11    4          14           1
12    5          13           1
20    13         5            2
25    18         0            2
30    23         5            2
Mean1 = (2+3+4+10+11+12)/6 = 7; Mean2 = (20+25+30)/3 = 25
No point changes cluster in iteration 4 and the means stay the same, so the algorithm stops.
Final clusters: c1 = {2, 3, 4, 10, 11, 12}, c2 = {20, 25, 30}
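The iterations above can be replayed with a small Python sketch for one-dimensional data. `kmeans_1d` is an illustrative name; it assumes absolute distance, breaks distance ties toward the first centroid, and keeps a centroid unchanged if its cluster becomes empty (neither case arises for this data).

```python
def kmeans_1d(points, means, max_iter=100):
    """1-D k-means: assign to nearest mean, recompute means, stop when no point moves."""
    means = list(means)
    assignment = None
    history = []  # the means used in each iteration
    for _ in range(max_iter):
        history.append(tuple(means))
        # Step 3/5: each point goes to its closest centroid (ties go to the lower index).
        new_assignment = [min(range(len(means)), key=lambda j: abs(x - means[j]))
                          for x in points]
        if new_assignment == assignment:
            break  # Step 6: no reassignment, so the model is ready
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its members (kept if the cluster is empty).
        for j in range(len(means)):
            members = [x for x, c in zip(points, assignment) if c == j]
            if members:
                means[j] = sum(members) / len(members)
    clusters = [[x for x, c in zip(points, assignment) if c == j] for j in range(len(means))]
    return clusters, history


data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
clusters, history = kmeans_1d(data, means=(4, 12))
for i, (m1, m2) in enumerate(history, start=1):
    print(f"Iteration {i}: mean1 = {m1}, mean2 = {m2}")
print("Final clusters:", clusters)
```

Here `history` holds the means used in each iteration, (4, 12), (3, 18), (4.75, 19.6) and (7, 25), matching the four tables above.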
5b ans) In boosting, another model is created and predictions are made on the dataset; this model tries to correct the errors of the previous model. Similarly, multiple models are created, each correcting the errors of the previous model. The final model (strong learner) is the weighted mean of all the models (weak learners).
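The paragraph above describes boosting. AdaBoost is one well-known boosting algorithm, and the Python sketch below implements a minimal version with one-dimensional decision stumps as weak learners; the stump learner, the helper names and the toy data are assumptions for illustration, not part of the original answer. Each round re-weights the training points so the next stump concentrates on the previous mistakes, and the strong learner takes the sign of the alpha-weighted vote of all stumps.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` when x < threshold, otherwise -polarity."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = ([values[0] - 1]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1])
    best = None
    for t in thresholds:
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict(x, t, s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, rounds=10):
    """Train up to `rounds` stumps; each round up-weights the points the last stump got wrong."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []  # (alpha, threshold, polarity) per weak learner
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, weights)
        if err >= 0.5:
            break  # no stump beats chance on the re-weighted data
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, s))
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, s))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Strong learner: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in model)
    return 1 if score >= 0 else -1


# Hypothetical toy data: the label changes sign twice, so no single stump fits it.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=20)
print([predict(model, x) for x in xs])
```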
6a ans) Step 1: distance matrix
      18  22  25  27  42  43
18    0
22    4   0
25    7   3   0
27    9   5   2   0
42    24  20  17  15  0
43    25  21  18  16  1   0
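Step 2: merge 42 and 43 (distance 1). Under single link, the distance to a merged cluster is the minimum distance to any of its members.
         18  22  25  27  {42,43}
18       0
22       4   0
25       7   3   0
27       9   5   2   0
{42,43}  24  20  17  15  0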
Step 3: merge 25 and 27 (distance 2).
         18  22  {25,27}  {42,43}
18       0
22       4   0
{25,27}  7   3   0
{42,43}  24  20  15       0
Step 4: merge 22 with {25,27} (distance 3).
            18  {22,25,27}  {42,43}
18          0
{22,25,27}  4   0
{42,43}     24  15          0
Step 5: merge 18 with {22,25,27} (distance 4).
               {18,22,25,27}  {42,43}
{18,22,25,27}  0
{42,43}        15             0
Step 6: merge the remaining two clusters (distance 15).
                     {18,22,25,27,42,43}
{18,22,25,27,42,43}  0
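The single-link steps can be checked with a short Python sketch that repeatedly merges the two closest clusters, where the distance between clusters is the minimum distance between their members. `single_link` is an illustrative name; the recorded merge distances are the heights at which the dendrogram joins each pair.

```python
from itertools import combinations


def single_link(points):
    """Agglomerative clustering with single linkage; returns the merge sequence."""
    clusters = [frozenset([p]) for p in points]
    merges = []

    def dist(a, b):
        # Single link: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        d = dist(a, b)
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((sorted(a | b), d))
    return merges


for members, d in single_link([18, 22, 25, 42, 27, 43]):
    print(f"merge at distance {d}: {members}")
```

For plotting, SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` (on the data reshaped to one column) followed by `scipy.cluster.hierarchy.dendrogram` performs the same computation and draws the tree.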
6b ans) Types of clustering:
Partitioning Clustering: It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm. In this type, the dataset is divided into a set of K groups, where K defines the number of pre-defined groups. The cluster centres are placed so that each data point is closer to the centroid of its own cluster than to any other cluster centroid.
Hierarchical Clustering: It can be used as an alternative to partitioned clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram. The observations, or any number of clusters, can be selected by cutting the tree at the correct level. The most common example of this method is the Agglomerative Hierarchical algorithm.
Density-based clustering: It connects highly dense areas into clusters, and arbitrarily shaped distributions are formed as long as the dense region can be connected. The algorithm identifies the different clusters in the dataset and connects the areas of high density into clusters. The dense areas in data space are separated from each other by sparser areas.
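The density-based description matches DBSCAN, the best-known algorithm of this kind, although the answer does not name a specific algorithm. A minimal Python sketch for one-dimensional data, assuming the usual DBSCAN parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself); the function name and the example data are hypothetical.

```python
def dbscan_1d(points, eps, min_pts):
    """Minimal DBSCAN on 1-D data. Returns one label per point; -1 marks noise."""
    labels = {}  # point index -> cluster id (or -1 for noise)

    def neighbours(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster_id  # former noise reachable from a core point: border
                continue
            if j in labels:
                continue
            labels[j] = cluster_id
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is also core: keep expanding the dense region
        cluster_id += 1
    return [labels[i] for i in range(len(points))]


print(dbscan_1d([1, 2, 3, 10, 11, 12, 50], eps=1.5, min_pts=2))
```

On this hypothetical data the sketch returns `[0, 0, 0, 1, 1, 1, -1]`: two dense groups separated by a sparse gap, and one isolated noise point.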