
URaNAM Rayna

BE.B
265
LTCE

Machine Learning

ASSIGNMENT-2

Explain the cross-validation technique.

Cross-validation is a statistical method used to estimate the performance (or accuracy) of ML models.

It is used to protect against overfitting in a predictive model, particularly in a case where the amount of data may be limited. In cross-validation, you make a fixed number of folds (or partitions) of the data, run the analysis on each fold, and then average the overall error estimate.

When dealing with an ML task, you have to properly identify the problem so that you can pick the most suitable algorithm, which can give the best score.

This model cannot be tested on the training data. Our main objective is to make sure that the model is able to work well on real-world data. So, to know the real score of the model, it should be tested on data that it has never seen before, and this set of data is usually called the testing set.

Cross-validation is a technique in which we train our model using a subset of the dataset and then evaluate it using the complementary subset of the dataset.
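STEPS IN CROSS-VALIDATION
1. Reserve some portion of the sample dataset.
2. Using the rest of the dataset, train the model.
3. Test the model using the reserved portion of the dataset.

[Figure: the dataset is split into a training portion, which is used to build the model, and a testing portion.]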
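
The three steps can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper names (`k_fold_indices`, `cross_validate`), the toy "predict the mean" model and the toy data are all hypothetical and do not come from the assignment.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, error):
    """Train on k-1 folds, test on the reserved fold, and average the errors."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors.append(error(model,
                            [xs[i] for i in test_idx],
                            [ys[i] for i in test_idx]))
    return sum(errors) / k


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mean_abs_error(model, test_x, test_y):
    return sum(abs(y - model) for y in test_y) / len(test_y)


xs = list(range(10))
ys = [2.0 * x for x in xs]
folds = k_fold_indices(10, 3)
score = cross_validate(xs, ys, 5, fit_mean, mean_abs_error)
print(folds)
print(score)
```

Every sample is used for testing exactly once, so the averaged error is less dependent on one lucky or unlucky split.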

Differentiate between the Random Forest and Boosting techniques.


A)
RANDOM FOREST (BAGGING) vs BOOSTING

1. Bagging: The simplest way of combining predictions that belong to the same type.
   Boosting: A way of combining predictions that belong to different types.

2. Bagging: Aims to decrease variance, not bias.
   Boosting: Aims to decrease bias, not variance.

3. Bagging: Each model receives equal weight.
   Boosting: Models are weighted according to their performance.

4. Bagging: Each model is built independently.
   Boosting: New models are influenced by the performance of previously built models.

5. Bagging: It tries to solve the over-fitting problem.
   Boosting: It tries to reduce bias.

6. Bagging: If the classifier is unstable (high variance), then apply bagging.
   Boosting: If the classifier is stable and simple (high bias), then apply boosting.

7. Bagging: In this, base classifiers are trained in parallel.
   Boosting: In this, base classifiers are trained sequentially.
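
The difference between point 3 on each side (equal weight versus performance-based weight) can be illustrated with a small sketch. The weighting rule used here is the AdaBoost-style formula 0.5 * ln((1 - error) / error), which is one common choice and is an assumption, not something stated in the notes. The predictions and error rates are hypothetical.

```python
import math


def bagging_vote(predictions):
    """Equal-weight majority vote over labels in {-1, +1}."""
    return 1 if sum(predictions) >= 0 else -1


def boosting_weight(error_rate):
    """AdaBoost-style weight: a model with lower error gets a larger say."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def boosting_vote(predictions, error_rates):
    """Weighted vote: each prediction is scaled by its model's weight."""
    total = sum(boosting_weight(e) * p
                for p, e in zip(predictions, error_rates))
    return 1 if total >= 0 else -1


# Hypothetical outputs of three base classifiers for one sample,
# together with hypothetical training error rates.
preds = [+1, -1, -1]
errors = [0.05, 0.40, 0.45]

bagged = bagging_vote(preds)            # two weak models outvote the strong one
boosted = boosting_vote(preds, errors)  # the accurate model dominates
print(bagged, boosted)
```

With these numbers the equal-weight vote returns -1, while the weighted vote returns +1 because the first model is far more accurate than the other two.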

Explain SVM. Explain the concept of margin and hyperplane.

SVM stands for Support Vector Machine. This technique has its roots in statistical learning theory.

As a task of classification, it searches for the optimal hyperplane (i.e. decision boundary) separating the tuples of one class from another.
SVM works well with higher dimensional data and thus avoids the dimensionality problem.

Although SVM-based classification (i.e. training time) is extremely slow, the result is, however, highly accurate. Further, testing on unknown data is very fast.

SVM is less prone to over-fitting than other methods. It also facilitates compact models for classification.
TYPES:
1. Linear SVM
2. Non-Linear SVM

MARGIN: It is the distance between the hyperplane and the observations closest to the hyperplane (the support vectors). In SVM, a large margin is considered a good margin.
There are 2 types of margin: hard margin and soft margin.

[Figure: optimal hyperplane with the maximum margin and the support vectors on either side.]

HYPERPLANE: Suppose we have a dataset having two classes and we want to classify a new data point as either 1 or 0. To classify these points, we can have many decision boundaries. In multiple dimensions (i.e. more than 2D) the boundary is called a hyperplane. The best hyperplane is the plane that has the maximum distance from both classes.
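
As a rough illustration of the margin, the distance from a point x to the hyperplane w.x + b = 0 is |w.x + b| / ||w||, and the margin is the smallest such distance over the data. The points and the hyperplane in this sketch are hypothetical values chosen only for the example.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def margin(w, b, points):
    """Margin = distance of the closest observation to the hyperplane."""
    return min(distance_to_hyperplane(w, b, p) for p in points)


# Hypothetical 2-D points and an assumed hyperplane x1 = 2 (w = (1, 0), b = -2).
points = [(1, 0), (3, 1), (3, -1), (6, 1)]
w, b = (1.0, 0.0), -2.0

m = margin(w, b, points)
print(m)  # the closest points are at distance 1 from the line x1 = 2
```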



Explain the types of Clustering.


Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to other data points in the same group than to those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.

TYPES OF CLUSTERING: Broadly speaking, clustering can be divided into two subgroups:

HARD CLUSTERING: In hard clustering, each data point either belongs to a cluster completely or not. For e.g. in the example below, each customer is put into exactly one group out of the 10 groups.

CASE STUDY: There is a retail store for books. The customers like certain books. We could look at the details of each customer and devise a unique strategy for each, but that is not efficient. So we can cluster the customers into 10 groups based on their revenue.

SOFT CLUSTERING: In soft clustering, instead of putting each data point into a separate cluster, a probability or likelihood of that data point being in each of those clusters is assigned.
For e.g. in the above scenario, each customer is assigned a probability of being in each of the 10 clusters of the retail store.

Given the following data: 234, 123, 456, 23, 34, 56, 78, 96, 150, 116, 117, 118, 199.
For k=2, use C1(80) and C2(250) as the initial cluster centres. Using the k-means algorithm, find the final clusters.
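A)
k = 2 (given)
Initially, C1 = 80, C2 = 250

Iteration 1: C1 = 80, C2 = 250
K1 = {123, 23, 34, 56, 78, 96, 150, 116, 117, 118}    K2 = {234, 456, 199}

New centres: C1 = 911/10 = 91.1, C2 = 889/3 = 296.33

Iteration 2: C1 = 91.1, C2 = 296.33
K1 = {123, 23, 34, 56, 78, 96, 150, 116, 117, 118}    K2 = {234, 456, 199}

Since K1 and K2 are unchanged from the previous iteration, we stop.
Therefore, the final clusters are: K1 = {123, 23, 34, 56, 78, 96, 150, 116, 117, 118}
K2 = {234, 456, 199}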
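
The iterations can be checked with a short one-dimensional k-means sketch. The function name `k_means_1d` is hypothetical, and the data list is an assumed reading of the question: the values 78 and 117 are partly illegible in the source and were chosen because they agree with the stated centre C1 = 91.1.

```python
def k_means_1d(data, centres, max_iter=100):
    """Plain k-means on scalars: assign to nearest centre, then recompute means."""
    centres = list(centres)
    clusters = []
    for _ in range(max_iter):
        new_clusters = [[] for _ in centres]
        for x in data:
            nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
            new_clusters[nearest].append(x)
        if new_clusters == clusters:      # assignments unchanged: converged
            break
        clusters = new_clusters
        # An empty cluster keeps its old centre.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters, centres


# Assumed reading of the data in the question (78 and 117 are reconstructed).
data = [234, 123, 456, 23, 34, 56, 78, 96, 150, 116, 117, 118, 199]
clusters, centres = k_means_1d(data, [80, 250])
print(clusters)
print(centres)
```

With these values the assignments stop changing after the second pass, matching the hand calculation.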

Using the Minimum Spanning Tree approach, create clusters for K=3.

[Figure: weighted undirected graph given in the question.]

Minimum Spanning Tree (MST): The spanning tree of a graph with the minimum possible sum of the edge weights.

The possible spanning tree of the above problem:


(a) Select a random vertex and add it to the empty graph T.

(b) Repeat until all nodes are added to T, each time selecting the edge with minimum weight.

[Figure: resulting minimum spanning tree.] Weight = 11

Remove the (k-1) edges with the highest weight. Here, k = 3 (given).

We prune the 2 edges with the highest weight, which leaves us with 3 clusters.

The clusters are:
{0, 3, 4} > Cluster I
Cluster II
Cluster III
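
The procedure (grow an MST, then remove the k-1 heaviest edges) can be sketched as below. The graph in the question is not recoverable from the source, so the edge list here is purely hypothetical, as are the function names. Prim's algorithm is used to match steps (a) and (b).

```python
import heapq


def prim_mst(n, edges, start=0):
    """Prim's algorithm on vertices 0..n-1; edges are (u, v, weight) triples."""
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    mst = []
    while heap and len(in_tree) < n:
        w, u, v = heapq.heappop(heap)   # cheapest edge leaving the tree
        if v in in_tree:
            continue
        in_tree.add(v)
        mst.append((u, v, w))
        for item in adj[v]:
            if item[2] not in in_tree:
                heapq.heappush(heap, item)
    return mst


def mst_clusters(n, edges, k):
    """Drop the k-1 heaviest MST edges; the connected components are the clusters."""
    mst = sorted(prim_mst(n, edges), key=lambda e: e[2])
    kept = mst[:len(mst) - (k - 1)]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _ in kept:
        parent[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical weighted graph (NOT the one from the question).
edges = [(0, 1, 1), (1, 2, 2), (2, 3, 6), (3, 4, 1),
         (4, 5, 5), (0, 2, 4), (3, 5, 7)]
mst = prim_mst(6, edges)
clusters = mst_clusters(6, edges, k=3)
print(sum(w for _, _, w in mst), clusters)
```

For this example graph the MST has total weight 15, and removing its two heaviest edges (weights 6 and 5) leaves three components.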

Explain the DBSCAN clustering technique.


DBSCAN: Density-Based Spatial Clustering of Applications with Noise

It was proposed by Ester, Kriegel, Sander and Xu (KDD '96).


Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points.
Discovers clusters of arbitrary shape in spatial databases with noise.
Basic idea: clusters are dense regions in the data space, separated by regions of lower object density.

DBSCAN ALGORITHM
Input: the dataset D

Parameters: Eps, MinPts
For each object p in D
    if p is a core object and not processed then
        C = retrieve all objects density-reachable from p
        mark all objects in C as processed
        report C as a cluster
    else mark p as outlier
    end if
End For

1. Arbitrarily select a point p.
2. Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
3. If p is a core point, a cluster is formed.
4. If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.
5. Continue the process until all the points have been processed.
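
A minimal runnable sketch of the steps above, in plain Python. It assumes Euclidean distance and counts a point as its own neighbour when comparing against MinPts. The function names and the sample points are hypothetical. A point first marked as noise can later be relabelled as a border point of a cluster, which is the usual refinement of the "else mark p as outlier" line.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:       # not a core point
            labels[i] = NOISE
            continue
        labels[i] = cluster_id              # core point: start a new cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:          # noise reachable from a core point
                labels[j] = cluster_id      # becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:   # j is also a core point: expand
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels


# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```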



1. DBSCAN has resistance to noise.
2. DBSCAN can handle clusters of different shapes and sizes.
3. DBSCAN cannot handle varying densities.
4. DBSCAN is sensitive to its parameters.
Co6
Explain the importance of Dimensionality reduction in Machine Learning.

In Machine Learning, in order to build a good learning model, we try to pass on those features in the dataset that are significant to one another.

Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables.

PROBLEMS WITH HIGH DIMENSIONAL DATA
1. High computation cost
2. Over-fitting
3. Correlated data
4. Distance between the nearest and farthest data points
NEED OF DIMENSIONALITY REDUCTION (IMPORTANCE)
1. Dimensionality reduction helps in solving the above problems while trying to preserve most of the relevant information in the data needed to learn accurate, predictive models.
2. It reduces the time and storage space required.
3. It helps remove multi-collinearity, which improves the interpretation of the parameters of the ML model.
4. It becomes easier to visualize the data when it is reduced to very low dimensions such as 2D or 3D.
5. It avoids the curse of dimensionality and removes irrelevant features from the data.

Describe the steps to reduce dimensionality using the PCA method. Solve for the given data X and Y.

A)
STEPS FOR PCA ALGORITHM

1. Getting the dataset: take the input dataset and divide it into two subparts X and Y, where X = training set, Y = testing set.
2. Representing data as a structure: represent the data as a 2D matrix of the independent variable X.
3. Standardizing the data.
4. Compute the mean vector (μ).
5. Subtract the mean from the given data.
6. Calculate the covariance matrix.
7. Calculate the eigenvectors and eigenvalues of the covariance matrix.
8. Choosing components and forming a feature vector.
9. Deriving the new dataset.

PROBLEM: Given X and Y.

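Step 1: Given feature vectors x_1 = (1, -1), x_2 = (0, 1), x_3 = (-1, 0)

Step 2: Calculate the mean vector (μ):

\[ \mu = \left( \frac{1 + 0 - 1}{3},\ \frac{-1 + 1 + 0}{3} \right) = (0, 0) \]

Step 3: Subtract the mean vector from the feature vectors. Since μ = (0, 0), we get

\[ x_1 - \mu = (1, -1), \quad x_2 - \mu = (0, 1), \quad x_3 - \mu = (-1, 0) \]

Step 4: Calculate the covariance matrix, i.e. the average of m_i = (x_i - μ)(x_i - μ)^T:

\[ m_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \]
\[ m_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \]
\[ m_3 = \begin{pmatrix} -1 \\ 0 \end{pmatrix} \begin{pmatrix} -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \]

\[ \text{Covariance matrix} = \frac{m_1 + m_2 + m_3}{3} = \frac{1}{3} \begin{pmatrix} 1+0+1 & -1+0+0 \\ -1+0+0 & 1+1+0 \end{pmatrix} = \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix} \]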

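Step 5: Calculate the eigenvalues and eigenvectors.

λ is an eigenvalue of a matrix M if it is a solution of the characteristic equation |M - λI| = 0.

\[ \left| \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| = \begin{vmatrix} 2/3 - \lambda & -1/3 \\ -1/3 & 2/3 - \lambda \end{vmatrix} = 0 \]

\[ \text{i.e. } \left( \tfrac{2}{3} - \lambda \right)^2 - \tfrac{1}{9} = 0 \;\Rightarrow\; 3\lambda^2 - 4\lambda + 1 = 0 \]

On solving, we get λ_1 = 1 and λ_2 = 1/3.
We discard (ignore) the smaller eigenvalue.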

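Step 6: To find the eigenvector we use MX = λX with λ = 1:

\[ \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = 1 \cdot \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \]

\[ 2X_1 - X_2 = 3X_1, \qquad -X_1 + 2X_2 = 3X_2 \]

On solving we get X_1 = -X_2.

\[ \text{Eigenvector} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad \text{(principal component)} \]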
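
The worked example can be cross-checked numerically. The sketch below is an illustration only: the helper names are hypothetical, the covariance divides by n (as the hand calculation does, rather than n-1), and the eigen-decomposition uses the closed form for a symmetric 2x2 matrix instead of a general solver.

```python
import math


def covariance_2d(points):
    """Covariance matrix of 2-D points, dividing by n as in the worked steps."""
    n = len(points)
    mu = [sum(p[d] for p in points) / n for d in range(2)]
    centred = [(p[0] - mu[0], p[1] - mu[1]) for p in points]
    sxx = sum(x * x for x, _ in centred) / n
    sxy = sum(x * y for x, y in centred) / n
    syy = sum(y * y for _, y in centred) / n
    return [[sxx, sxy], [sxy, syy]]


def eigen_symmetric_2x2(m):
    """Eigenvalues (largest first) and the unit eigenvector of the largest one."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    half_trace = (a + d) / 2
    root = math.sqrt(((a - d) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + root, half_trace - root
    if b != 0:
        v = (b, lam1 - a)                 # satisfies (M - lam1*I) v = 0
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam1, lam2, (v[0] / norm, v[1] / norm)


features = [(1, -1), (0, 1), (-1, 0)]      # feature vectors from Step 1
cov = covariance_2d(features)
lam1, lam2, pc = eigen_symmetric_2x2(cov)
# Project each (already zero-mean) sample onto the principal component.
projected = [x * pc[0] + y * pc[1] for x, y in features]
print(cov, lam1, lam2, pc, projected)
```

The sign of the eigenvector is arbitrary, so the code may return a multiple of (-1, 1) rather than (1, -1); both describe the same principal direction X_1 = -X_2.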

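Find the singular value decomposition for A.

A) Formula for SVD: A = U Σ V^T

Step 1:

\[ A^T A = M = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix} \]

Step 2: Calculate the eigenvalues using |M - λI| = 0

\[ \begin{vmatrix} 5 - \lambda & 3 \\ 3 & 5 - \lambda \end{vmatrix} = 0 \]

\[ (5 - \lambda)^2 - 9 = 0 \]
\[ 25 + \lambda^2 - 10\lambda - 9 = 0 \]
\[ \lambda^2 - 10\lambda + 16 = 0 \;\Rightarrow\; \lambda^2 - 2\lambda - 8\lambda + 16 = 0 \;\Rightarrow\; \lambda(\lambda - 2) - 8(\lambda - 2) = 0 \]
\[ \lambda = 2 \ \text{or} \ \lambda = 8 \]

Step 3: Eigenvectors

For λ = 2:

\[ \begin{pmatrix} 5 - 2 & 3 \\ 3 & 5 - 2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix} = 0 \;\Rightarrow\; 3V_1 + 3V_2 = 0 \;\Rightarrow\; V_1 = -V_2, \ \text{i.e. } \begin{pmatrix} 1 \\ -1 \end{pmatrix} \]

For λ = 8:

\[ \begin{pmatrix} 5 - 8 & 3 \\ 3 & 5 - 8 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix} = 0 \;\Rightarrow\; -3V_1 + 3V_2 = 0 \;\Rightarrow\; V_1 = V_2, \ \text{i.e. } \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]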

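For eigenvector (1, 1), length = \sqrt{1^2 + 1^2} = \sqrt{2}

On normalising,

\[ \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \]

For eigenvector (1, -1), length = \sqrt{2}

On normalising,

\[ \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix} \]

\[ V = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sqrt{8} & 0 \\ 0 & \sqrt{2} \end{pmatrix} = \begin{pmatrix} 2.828 & 0 \\ 0 & 1.414 \end{pmatrix} \]

Step 4: Find U using AV = UΣ

\[ AV = U \begin{pmatrix} 2.828 & 0 \\ 0 & 1.414 \end{pmatrix} \]

Step 5: The SVD is written as

\[ A = U \Sigma V^T = U \begin{pmatrix} 2.828 & 0 \\ 0 & 1.414 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} \]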
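
The steps can be checked end to end in code. The matrix A itself is not legible in the source, so the sketch below ASSUMES A = [[2, 2], [1, -1]], which is one matrix whose A^T A equals the M used above. Other matrices share the same M and would give a different U. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# ASSUMED matrix: chosen only because A^T A = [[5, 3], [3, 5]].
A = [[2.0, 2.0], [1.0, -1.0]]

# Step 1: M = A^T A
M = matmul(transpose(A), A)

# Step 2: eigenvalues of the symmetric 2x2 matrix M (closed form)
a, b, d = M[0][0], M[0][1], M[1][1]
half_trace = (a + d) / 2
root = math.sqrt(((a - d) / 2) ** 2 + b * b)
eigenvalues = [half_trace + root, half_trace - root]     # 8 and 2
singular_values = [math.sqrt(lam) for lam in eigenvalues]

# Step 3: normalised eigenvectors as columns (for 8, then for 2), as derived above
s = 1 / math.sqrt(2)
V = [[s, s], [s, -s]]

# Step 4: AV = U * Sigma, so each column of AV is divided by its singular value
AV = matmul(A, V)
U = [[AV[i][j] / singular_values[j] for j in range(2)] for i in range(2)]

# Step 5: rebuild A from U * Sigma * V^T as a consistency check
Sigma = [[singular_values[0], 0.0], [0.0, singular_values[1]]]
reconstructed = matmul(matmul(U, Sigma), transpose(V))
print(M, singular_values, U, reconstructed)
```

Under this assumed A, U comes out as the identity matrix and the product U Σ V^T reproduces A.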

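Find the optimal hyperplane for the set of data points {(3,1), (3,-1), (6,1), (6,-1)} (class +1) and {(1,0), (0,1), (0,-1), (-1,0)} (class -1).

A) 1. Plot the graph.

[Figure: plot of the eight data points.]

Support vectors: S_1 = (1, 0), S_2 = (3, 1), S_3 = (3, -1)

2. Bias augment: append 1 to each support vector:

\[ \tilde{S}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad \tilde{S}_2 = \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}, \quad \tilde{S}_3 = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix} \]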

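3. Find the α_i from

\[ \alpha_1 \tilde{S}_1 \cdot \tilde{S}_1 + \alpha_2 \tilde{S}_2 \cdot \tilde{S}_1 + \alpha_3 \tilde{S}_3 \cdot \tilde{S}_1 = -1 \quad \text{(negative class plane)} \]
\[ \alpha_1 \tilde{S}_1 \cdot \tilde{S}_2 + \alpha_2 \tilde{S}_2 \cdot \tilde{S}_2 + \alpha_3 \tilde{S}_3 \cdot \tilde{S}_2 = +1 \quad \text{(positive class plane)} \]
\[ \alpha_1 \tilde{S}_1 \cdot \tilde{S}_3 + \alpha_2 \tilde{S}_2 \cdot \tilde{S}_3 + \alpha_3 \tilde{S}_3 \cdot \tilde{S}_3 = +1 \quad \text{(positive class plane)} \]

Substituting the dot products, we get the 3 equations:

\[ 2\alpha_1 + 4\alpha_2 + 4\alpha_3 = -1 \]
\[ 4\alpha_1 + 11\alpha_2 + 9\alpha_3 = +1 \]
\[ 4\alpha_1 + 9\alpha_2 + 11\alpha_3 = +1 \]

On solving: α_1 = -3.5, α_2 = 0.75 and α_3 = 0.75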


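4. Compute the augmented weight vector:

\[ \tilde{w} = \sum_i \alpha_i \tilde{S}_i = -3.5 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + 0.75 \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} + 0.75 \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -3.5 \\ 0 \\ -3.5 \end{pmatrix} + \begin{pmatrix} 2.25 \\ 0.75 \\ 0.75 \end{pmatrix} + \begin{pmatrix} 2.25 \\ -0.75 \\ 0.75 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix} \]

The hyperplane equation is y = wx + b:

\[ w = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad b = -2, \quad \text{i.e. the hyperplane is } x - 2 = 0 \]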
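
The linear system for the α values can be verified with a few lines of plain Python. This is a sketch under the reading of the support vectors used above, namely (1, 0) with label -1 and (3, 1), (3, -1) with label +1. The Gauss-Jordan helper is a hypothetical utility with no handling for singular systems.

```python
def solve_linear_system(a, b):
    """Gauss-Jordan elimination with partial pivoting (assumes a unique solution)."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Support vectors and labels as read from the worked solution.
support = [((1, 0), -1), ((3, 1), +1), ((3, -1), +1)]
augmented = [(x[0], x[1], 1) for x, _ in support]   # bias-augmented vectors
labels = [y for _, y in support]

# Row i of the system: sum_j alpha_j * (S_j . S_i) = y_i
gram = [[dot(si, sj) for sj in augmented] for si in augmented]
alphas = solve_linear_system(gram, labels)

# Augmented weight vector: the last component is the bias b.
w_tilde = [sum(a * s[d] for a, s in zip(alphas, augmented)) for d in range(3)]
w, b = w_tilde[:2], w_tilde[2]
print(gram, alphas, w, b)
```

The result w = (1, 0), b = -2 corresponds to the vertical line x = 2, midway between the closest points of the two classes.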

What type of clustering is given by the EM algorithm? Explain the expectation and maximization algorithm.

The Expectation-Maximization algorithm can be used for latent variables (i.e. variables that are not directly observable and are actually inferred from the values of the other observed variables) too, in order to predict their values, with the condition that the general form of the probability distribution governing those latent variables is known to us.

It is used to find the local maximum likelihood parameters of a statistical model in the case where latent variables are involved and the data is incomplete or missing.

ALGORITHM
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E-step): Using the observed available data of the dataset, estimate (guess) the values of the missing data.
3. Maximization step (M-step): The complete data generated after the Expectation (E) step is used in order to update the parameters.
4. Repeat steps 2 and 3 until convergence.

[Flowchart: START -> INITIAL VALUES -> EXPECTATION STEP -> MAXIMIZATION STEP -> CONVERGED? If NO, return to the expectation step; if YES, stop.]
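
The loop in the flowchart can be sketched for one common case: a mixture of two one-dimensional Gaussians, where the latent variable is which component generated each point. That model, the function names, the data and the starting values are all assumptions made for this illustration. The responsibilities computed in the E-step are soft (probabilistic) cluster assignments.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50, tol=1e-9):
    """EM for a 1-D Gaussian mixture; returns updated (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    prev_ll = None
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp, log_likelihood = [], 0.0
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            log_likelihood += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)   # floor guards against collapse
            weights[j] = nj / len(data)
        # Converged when the log-likelihood stops improving.
        if prev_ll is not None and abs(log_likelihood - prev_ll) < tol:
            break
        prev_ll = log_likelihood
    return means, variances, weights


# Hypothetical data drawn around 1 and around 5, with assumed starting values.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

As the notes say, EM finds a local maximum of the likelihood, so different starting values can lead to a different answer.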

USAGE
1. Can be used to fill in the missing data in a sample.
2. Used as a basis of unsupervised learning of clusters.
3. Can be used to estimate the parameters of an HMM (Hidden Markov Model).
4. Can be used for discovering the values of latent variables.
