
AIML ANSWERS
UNIT 3
Q1. What is Data? Explain different types of Data.
Data is distinct pieces of information, usually formatted in a special way. Data is measured, collected, reported, and analyzed, whereupon it is often visualized using graphs, images or other analysis tools. Raw data ("unprocessed data") may be a collection of numbers or characters before it has been "cleaned" and corrected by researchers.
Types of data:
Generally data is divided into two types -
1. Qualitative data -
Qualitative data describes qualities or characteristics. It is collected using questionnaires, interviews, or observation, and frequently appears in narrative form. For example, it could be notes taken during a focus group on the quality of the food at Cafe Mac, or responses from an open-ended questionnaire. It is also called categorical data.
It is further subdivided into 3 types:
a. Binary data -
Binary data is numerically represented by a combination of zeros and ones. Binary data is the only category of data that can be directly understood and executed by a computer. Data analysts use binary data to create statistical models that predict how often the study subject is likely to be positive or negative, up or down, right or wrong, based on a zero-or-one scale.

b. Nominal data -
Nominal data, also referred to as "named, labeled data" or "nominal scaled data," is any type of data used to label something without giving it a numerical value. Data analysts use nominal data to determine statistically significant differences between sets of qualitative data. For example, a multiple-choice test to profile participants' skills in a study.

c. Ordinal data -
Ordinal data is qualitative data categorized in a particular order or on a ranging scale. When researchers use ordinal data, the order of the qualitative information matters more than the difference between each category. Data analysts might use ordinal data when creating charts, while researchers might use it to classify groups, such as age, gender, or class. For example, a Net Promoter Score (NPS) survey has results that are on a 0-10 satisfaction scale.

2. Quantitative data -
Quantitative data are the result of counting or measuring attributes of a population. Quantitative data is always a number. Hence it is also called numerical data.
It is further subdivided into 2 types:
a. Discrete Data:
Discrete data takes discrete (countable) numerical values, for example Number of Children, Defects per Hour, etc.
b. Continuous Data:
Continuous data takes continuous numerical values, for example Weight, Voltage, etc.
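The data types above can be sketched in pandas. This is only an illustration: the column names and values are invented, and mapping binary, nominal and ordinal data onto pandas categorical columns is one possible choice, not the only one.

```python
import pandas as pd

# Hypothetical survey records, invented purely for illustration.
df = pd.DataFrame({
    "passed": [1, 0, 1, 1],                              # binary (zeros and ones)
    "department": ["IT", "HR", "IT", "Sales"],           # nominal (labels only)
    "satisfaction": ["low", "high", "medium", "high"],   # ordinal (ordered labels)
    "children": [0, 2, 1, 3],                            # discrete (counted)
    "weight_kg": [61.5, 72.3, 58.0, 80.1],               # continuous (measured)
})

# Nominal data: categories without any order.
df["department"] = df["department"].astype("category")

# Ordinal data: categories with an explicit order.
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["low", "medium", "high"], ordered=True
)

print(df.dtypes)
print(df["satisfaction"].max())   # order-aware comparison works for ordinal data
print(df["children"].sum(), df["weight_kg"].mean())
```

Only the ordinal column supports order-based operations such as `max()`, while arithmetic such as `sum()` and `mean()` is meaningful only for the quantitative columns.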
Q2. Give in detail the difference between qualitative and quantitative data.

Q3. The following data gives the number (in thousands) of applicants registered with an Employment Exchange during 1995-2000. Construct an appropriate graph using given data.

Q4. Explain with example Time series graph, Exponential graph, and logarithmic graph.

1. Time series graph -
A time series graph is a line graph that shows data such as measurements, sales or frequencies over a given time period. It can be used to show a pattern or trend in the data and is useful for making predictions about the future, such as weather forecasting or financial growth.
For example, a time series graph can show the temperature of a town recorded over two years at three-monthly periods known as quarters.
2. Exponential Graphs -
Exponential graphs are the representation of exponential functions using a table of values and plotting the points on graph paper. It should be noted that exponential functions are the inverse of logarithmic functions. In the case of exponential charts, the graph can be an increasing or decreasing type of curve based on the function.
For example, the graph of $y = 3^x$ is an increasing one while the graph of $y = 3^{-x}$ is a decreasing one.
[Figures: graph of $y = 3^x$; graph of $y = 3^{-x}$]

3. Logarithmic Graphs -
Logarithmic functions are the inverse of exponential functions and the methods of plotting them are similar. To plot logarithmic graphs, it is required to make a table of values and then plot the points accordingly on graph paper. The graph of any log function will be the inverse of an exponential function.
For example, the inverse graph of $y = 3^x$ will be $y = \log_3 x$.
[Figure: graph of $y = \log_3 x$]
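The tables of values used to plot these graphs can be generated with a few lines of Python. This is a minimal sketch using only the standard library. The helper names `exp3` and `log3` and the chosen x values are assumptions made for illustration.

```python
import math

def exp3(x):
    """Exponential function y = 3^x."""
    return 3 ** x

def log3(y):
    """Logarithmic function x = log_3(y), the inverse of exp3."""
    return math.log(y, 3)

xs = [-2, -1, 0, 1, 2]

# Table of values for the increasing curve y = 3^x.
table_increasing = [(x, exp3(x)) for x in xs]

# Table of values for the decreasing curve y = 3^(-x).
table_decreasing = [(x, exp3(-x)) for x in xs]

# Swapping the columns of the first table gives points on y = log_3(x).
table_log = [(y, log3(y)) for _, y in table_increasing]

for row in table_increasing:
    print(row)
```

Because the log table is obtained by swapping the x and y columns of the exponential table, its graph is the mirror image of $y = 3^x$ about the line $y = x$.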
UNIT 4
Q1. Define Exploratory Data Analysis (EDA)? Also explain need for EDA?
Exploratory Data Analysis (EDA) is an approach to analyzing data using visual techniques. It is used to discover trends and patterns, or to check assumptions and how valuable the extracted insights are, with the help of statistical summaries and graphical representations.
Need for EDA -
● Exploratory Data Analysis is a crucial step before you jump to machine learning or modeling of your data. By doing this you can get to know whether the selected features are good enough to model, whether all the features are required, and whether there are any correlations, based on which we can either go back to the Data Pre-processing step or move on to modeling.
● Identifying the most important variables/features in your dataset.
● Testing a hypothesis or checking assumptions related to the dataset.
● To check the quality of data for further processing and cleaning.
● Deliver data-driven insights to business stakeholders.
● Verify that expected relationships actually exist in the data.
● To find unexpected structure or insights in the data.
● Getting a better understanding of the problem statement. It helps you gather insights and make better sense of the data, and removes irregularities and unnecessary values from data.
● Helps you prepare your dataset for analysis.
● Allows a machine learning model to predict our dataset better.
● Gives you more accurate results.
● It also helps us to choose a better machine learning model.

Q2. Describe Data Science Process. Also Importance of EDA.
Data Science Process involves the following steps:
● Discovery
To begin with, it is important to understand the different specifications, prerequisites, needs and required budget related to the project. You must be able to ask the right questions, such as whether you have the required resources. These resources can be in terms of people, technology, time and data. In this stage, you also need to frame the business problem and define initial hypotheses (IH) to test.
● Data Preparation
In this stage, you explore, pre-process and condition data for modeling. You perform data cleaning, transformation, and visualization. This helps you spot the outliers and establish relationships between the variables. Once you have cleaned and prepared the data, it is time to do exploratory analytics on it.
● Model Planning
Here, you decide the methods and techniques to draw the relationships between variables. These relationships set the base for the algorithms which you implement in the following stage. You apply Exploratory Data Analytics (EDA) using different statistical formulas and visualization tools.
● Model Building
In this stage, you create datasets for training and testing purposes. You analyze different learning techniques like classification, association, and clustering, and finally implement the best-fitting technique to build the model.
● Operationalize
In this stage, you deliver the final briefings, code, and technical reports. In addition, a pilot project is sometimes also implemented in a real-time production environment. This gives you a clear picture of the performance and other related constraints.
● Communicate Results
Now it is important to evaluate whether the goal has been achieved. So, in the final stage, you identify all the key findings, communicate them to the stakeholders and decide whether the results of the project are a success or a failure based on the criteria developed in Stage 1.

Importance of EDA -
● Exploratory Data Analysis is a crucial step before you jump to machine learning or modeling of your data. By doing this you can get to know whether the selected features are good enough to model, whether all the features are required, and whether there are any correlations, based on which we can either go back to the Data Pre-processing step or move on to modeling.
● Identifying the most important variables/features in your dataset.
● Testing a hypothesis or checking assumptions related to the dataset.
● To check the quality of data for further processing and cleaning.
● Deliver data-driven insights to business stakeholders.
● Verify that expected relationships actually exist in the data.
● To find unexpected structure or insights in the data.
● Getting a better understanding of the problem statement. It helps you gather insights and make better sense of the data, and removes irregularities and unnecessary values from data.
● Helps you prepare your dataset for analysis.
● Allows a machine learning model to predict our dataset better.
● Gives you more accurate results.
● It also helps us to choose a better machine learning model.
Q3. Steps Involved in Exploratory Data Analysis with commands in python for each step.
1. Data Collection
Data collection is an essential part of exploratory data analysis. It refers to the process of finding and loading data into our system. Good, reliable data can be found on various public sites or bought from private organizations. Some reliable sites for data collection are Kaggle, Github, Machine Learning Repository, etc.
2. Getting insights about the dataset
The first and foremost step of any data analysis, after loading the data file, should be checking a few introductory details such as the number of columns, the number of rows, the types of features (categorical or numerical), and the data types of the column entries.
● df.info()
● df.shape
● df.head(10)
● df.tail(15)
3. Description of data / Statistical Insight
We need to know the different kinds of data and other statistics of our data before we can move on to the other steps. This step should be performed to get details about various statistical data like Mean, Standard Deviation, Median, Max Value, Min Value.
● data.describe()
4. Data Cleaning
● This is the most important step in EDA. It involves removing duplicate rows/columns, filling the void entries with values like the mean/median of the data, dropping various values, and removing null entries.
● For example:
To check duplicates, use:
data.duplicated().sum() - returns the total number of duplicate entries
To remove duplicates, use:
data.drop_duplicates(inplace=True)
5. Data Visualization
● Data visualization is the method of converting raw data into a visual form, such as a map or graph, to make data easier for us to understand and extract useful insights.
● Various types of visualization analysis are:
a. Uni-Variate analysis:
This shows every observation/distribution in the data for a single variable. It can be shown with the help of various plots like Scatter Plot, Line plot, Histogram (summary) plot, box plots, violin plot, etc.
b. Bi-Variate analysis:
Bivariate analysis displays are used to reveal the relationship between two data variables. It can also be shown with the help of Scatter plots, histograms, Heat Maps, Box Plots, Violin Plots, etc.
c. Multi-Variate analysis:
Multivariate analysis displays, as the name suggests, are used to reveal the relationship between more than two data variables. Scatter plots, Histograms, box plots, and violin plots can be used for Multivariate Analysis.
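The inspection and cleaning commands from steps 2 to 4 can be tried end to end on a small DataFrame. The sketch below assumes pandas is installed. The employee records are invented for illustration, and a real analysis would instead load a file, for example with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical employee records; the last row duplicates the first one.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Asha"],
    "team": ["Sales", "IT", "IT", "Sales"],
    "salary": [42000, 55000, 61000, 42000],
})

# Step 2: basic insights about the dataset.
df.info()             # column names, non-null counts and dtypes
print(df.shape)       # (number of rows, number of columns)
print(df.head(10))    # first rows (all of them here, since there are only 4)

# Step 3: statistical description of the numerical columns.
print(df.describe())

# Step 4: count and remove duplicate rows.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates()
print(n_duplicates, df.shape)
```

Assigning the result of `drop_duplicates()` back to `df` has the same effect as passing `inplace=True`, and it keeps each cleaning step visible when commands are chained.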

Q4. Apply exploratory data analysis to Student/Employee datasets and provide interpretations via relevant visualization.

Q5. How is missing data handled in EDA? Explain with the help of python commands.
Handling missing data:
a. Data in the real world are rarely clean and homogeneous. Data can go missing during data extraction or collection due to several reasons.
b. Missing values need to be handled carefully because they reduce the quality of any of our performance metrics.
c. It can also lead to wrong prediction or classification and can also cause a high bias for any given model being used.
d. There are several options for handling missing values. However, the choice of what should be done is largely dependent on the nature of our data and the missing values. Below are some of the techniques:

● Checking Null entries
data.isnull().sum() - gives the number of missing values for each variable
df.isnull().sum()
● Removing Null Entries
data.dropna(axis=0, inplace=True) - if null entries are there
df = df.dropna(axis=0, how='any')
print(df.isnull().sum())
df.shape
● Filling values in place of Null Entries (if Numerical feature)
Values can be the mean, the median, or any chosen number. The commands here fill with the mode (most frequent value) instead.
mode = df['Senior Management'].mode().values[0]
df['Senior Management'] = df['Senior Management'].replace(np.nan, mode)
df.isnull().sum()
● Checking Duplicates
data.duplicated().sum() - returns the total number of duplicate entries
● Removing Duplicates
data.drop_duplicates(inplace=True)
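The techniques listed above can be combined into one runnable sketch. It assumes pandas and NumPy are installed. The 'Salary' column and all values are invented for illustration, and filling with `fillna` rather than `replace(np.nan, ...)` is a stylistic choice that gives the same result here.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries in both columns of row 1.
df = pd.DataFrame({
    "Salary": [50000.0, np.nan, 62000.0, 58000.0],
    "Senior Management": ["Yes", np.nan, "Yes", "No"],
})

# Checking null entries: number of missing values per column.
missing = df.isnull().sum()
print(missing)

# Option 1: remove every row that contains at least one null entry.
dropped = df.dropna(axis=0, how="any")

# Option 2: fill the null entries instead of dropping rows.
filled = df.copy()
# Numerical feature: fill with the median of the observed values.
filled["Salary"] = filled["Salary"].fillna(filled["Salary"].median())
# Categorical feature: fill with the mode (most frequent value).
mode = filled["Senior Management"].mode().values[0]
filled["Senior Management"] = filled["Senior Management"].fillna(mode)

print(dropped.shape)
print(filled.isnull().sum())
```

Dropping rows is the simplest option but loses data, while filling keeps every row at the cost of introducing estimated values, which is why the choice depends on the nature of the data.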
UNIT 5
Q1. With the help of local and global minimizer explain constrained optimization with function.

Q2. Using Newton's method find minimizer of: [function not recovered]

Q3. Summarize the steepest descent algorithm where point starting from x(k).

Q4. Evaluate [expression not recovered] using a penalty function.

Q5. Write a short note on the false position method with example.

The convergence process in the bisection method is very slow. It depends only on the choice of the end points of the interval [a, b]. The function f(x) does not have any role in finding the point c (which is just the mid-point of a and b). It is used only to decide the next smaller interval [a, c] or [c, b]. A better approximation to c can be obtained by taking the point where the straight line L joining the points (a, f(a)) and (b, f(b)) intersects the x-axis. To obtain the value of c we can equate the two expressions for the slope m of the line L.

$$m = \frac{f(b) - f(a)}{b - a} = \frac{0 - f(b)}{c - b}$$

$$\Rightarrow (c - b)\,(f(b) - f(a)) = -(b - a)\,f(b)$$

$$c = b - \frac{f(b)\,(b - a)}{f(b) - f(a)}$$

Now the next smaller interval which brackets the root can be obtained by checking:

f(a) * f(c) < 0 then b = c
f(a) * f(c) > 0 then a = c
f(a) * f(c) = 0 then c is the root.

Selecting c by the above expression is called the Regula-Falsi method or False position method.

Algorithm - False Position Scheme

Given a function f(x) continuous on an interval [a, b] such that f(a) * f(b) < 0
Do
    c = (a*f(b) - b*f(a)) / (f(b) - f(a))
    if f(a) * f(c) < 0 then b = c
    else a = c
while (none of the convergence criteria C1, C2 or C3 is satisfied)

The false position method is again bound to converge because it brackets the root throughout its convergence process.
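As a worked example, the scheme above can be sketched in Python. The notes do not define the convergence criteria C1, C2 and C3, so this sketch assumes two simple ones: stop when |f(c)| falls below a tolerance or when a maximum number of iterations is reached. The test function x^3 - x - 2 on [1, 2] is also chosen only for illustration.

```python
def false_position(f, a, b, tol=1e-12, max_iter=200):
    """Find a root of f in [a, b] by the false position (Regula-Falsi) method."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")

    c = a
    for _ in range(max_iter):
        # x-intercept of the straight line through (a, f(a)) and (b, f(b)).
        c = (a * fb - b * fa) / (fb - fa)
        fc = f(c)
        # Assumed stopping rule: the function value is close enough to zero.
        if abs(fc) < tol:
            break
        # Keep the sub-interval that still brackets the root.
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c


# Example: f(1) = -2 and f(2) = 4 have opposite signs, so [1, 2] brackets a root.
root = false_position(lambda x: x**3 - x - 2, 1.0, 2.0)
print(root)
```

Each iteration reuses the stored values `fa` and `fb`, so the function is evaluated only once per step, just as in bisection.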

UNIT 6
Q1. What is dimensionality reduction? Discuss the issues with high dimensional data.
Dimensionality reduction simply refers to the process of reducing the number of attributes in a dataset while keeping as much of the variation in the original dataset as possible. It is a data preprocessing step, meaning that we perform dimensionality reduction before training the model.
Issues with high dimensional data are: -
● decreases the overall performance of machine learning algorithms
● the problem of overfitting
● difficult for data visualization
● issue of multicollinearity
● noise in the data
● unable to transform non-linear data

Q2. How can we extract principal components from a given dataset? Explain the process.
Principal component analysis is a technique for reducing the dimensionality of datasets, increasing interpretability but at the same time minimizing information loss.
To extract principal components from a given dataset we have to do the following operations: -
1. Standardization
2. Covariance matrix computation
3. Eigenvectors and eigenvalues
4. Feature vector

1. Standardization -
The main aim of this step is to standardize the range of the attributes so that each one of them lies within similar boundaries. This process involves removing the mean from the variable values and scaling the data with respect to the standard deviation.
$$Z = \frac{\text{variable value} - \text{mean}}{\text{standard deviation}}$$
2. Covariance matrix computation -
The covariance matrix is used to express the correlation between any two or more attributes in a multidimensional dataset. Its entries are the variances and covariances of the attribute values: variance is denoted by "var" and covariance by "cov". For two attributes x and y, the covariance matrix is:
$$\begin{bmatrix} \operatorname{var}(x) & \operatorname{cov}(x, y) \\ \operatorname{cov}(x, y) & \operatorname{var}(y) \end{bmatrix}$$
3. Eigenvectors and eigenvalues -
Eigenvalues and eigenvectors are the mathematical values that are extracted from the covariance matrix. They are responsible for generating a new set of variables, which in turn leads to the construction of the principal components.

4. Feature vectors -
The feature vector is simply a matrix that has as its columns the eigenvectors of the components that we decide to keep. Here, we decide whether to keep or disregard the less significant principal components generated in the above steps.
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
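The four operations above can be followed step by step with NumPy. This is a minimal sketch, not a full PCA implementation. The synthetic data, the random seed and the choice to keep k = 2 components are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (assumption): 200 samples, 3 attributes, two of them correlated.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# 1. Standardization: subtract the mean, divide by the standard deviation.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized attributes.
cov = np.cov(z, rowvar=False)

# 3. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Feature vector: keep the eigenvectors of the k most significant components.
k = 2
feature_vector = eigvecs[:, :k]

# Project the standardized data onto the kept principal components.
projected = z @ feature_vector
explained = eigvals / eigvals.sum()
print(explained)
```

The `explained` array gives the share of total variance carried by each component, which is one common basis for deciding how many components to keep in step 4.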
Q3. Suppose a manifold dataset has a nonlinear relationship among its features. Which algorithm would you prefer for dimensionality reduction? List out the steps for the same.
For the above example we can use Multidimensional Scaling (MDS).
Multidimensional scaling is a visual representation of distances or dissimilarities between sets of objects. "Objects" can be colors, faces, map coordinates, political persuasion, or any kind of real or conceptual stimuli. Objects that are more similar (or have shorter distances) are closer together on the graph than objects that are less similar (or have longer distances). As well as interpreting dissimilarities as distances on a graph, MDS can also serve as a dimension reduction technique for high-dimensional data.
Basic steps:

1. Assign a number of points to coordinates in n-dimensional space. N-dimensional space could be 2-dimensional, 3-dimensional, or higher spaces (at least, theoretically, because 4-dimensional spaces and above are difficult to model). The orientation of the coordinate axes is arbitrary and is mostly the researcher's choice. For maps, axes that represent north/south and east/west make the most sense.
2. Calculate Euclidean distances for all pairs of points. The Euclidean distance is the "as the crow flies" straight-line distance between two points x and y in Euclidean space. It is calculated using the Pythagorean theorem ($c^2 = a^2 + b^2$), although it becomes somewhat more complicated for n-dimensional space. This results in the similarity matrix.
3. Compare the similarity matrix with the original input matrix by evaluating the stress function. Stress is a goodness-of-fit measure, based on differences between predicted and actual distances. In his original 1964 MDS paper, Kruskal wrote that fits close to zero are excellent, while anything over .2 should be considered "poor". More recent authors suggest evaluating stress based on the quality of the distance matrix and how many objects are in that matrix.
4. Adjust coordinates, if necessary, to minimize stress.
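The four steps can be sketched with NumPy as a small iterative procedure. Several choices here are assumptions rather than part of the notes: random starting coordinates, plain gradient descent on the sum of squared distance differences as the way to "adjust coordinates", the step size, and the five example points. Library implementations typically use more refined optimizers.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distances between all pairs of rows (step 2)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kruskal_stress(d_embed, d_target):
    """Stress: goodness of fit between embedded and input distances (step 3)."""
    iu = np.triu_indices_from(d_target, k=1)
    num = ((d_embed[iu] - d_target[iu]) ** 2).sum()
    return float(np.sqrt(num / (d_embed[iu] ** 2).sum()))

def simple_mds(d_target, n_dims=2, n_iter=500, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: assign points to (random) coordinates in n-dimensional space.
    points = rng.normal(size=(d_target.shape[0], n_dims))
    start = kruskal_stress(pairwise_distances(points), d_target)
    for _ in range(n_iter):
        diff = points[:, None, :] - points[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(d, 1.0)               # avoid dividing by zero
        coef = (d - d_target) / d
        np.fill_diagonal(coef, 0.0)            # a point exerts no pull on itself
        # Step 4: move points along the negative gradient of the squared error.
        points = points - lr * 2.0 * (coef[:, :, None] * diff).sum(axis=1)
    end = kruskal_stress(pairwise_distances(points), d_target)
    return points, start, end

# Hypothetical input: distances between five points on a plane.
true_points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.5, 2.0]])
d_input = pairwise_distances(true_points)
embedding, stress_start, stress_end = simple_mds(d_input)
print(stress_start, stress_end)
```

The recovered coordinates may be rotated, reflected or shifted relative to the original points, which is expected because only the distances are used and the orientation of the axes is arbitrary.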

Q4. What do you mean by latent variable? How would you find latent variables from a dataset?
A latent variable is a variable that cannot be observed. The presence of latent variables, however, can be detected by their effects on variables that are observable. Most constructs in research are latent variables. Consider the psychological construct of anxiety, for example. Any single observable measure of anxiety, whether it is a self-report measure or an observational scale, cannot provide a pure measure of anxiety.
To find latent variables from a given dataset we have to follow the following steps:
1. Correlation matrix
o Generate a correlation matrix for all variables.
o Identify variables not related to other variables.
o If the correlations between variables are small, it is unlikely that they share common factors.
o Think of correlations in absolute value.
o Correlation coefficients greater than 0.3 in absolute value are indicative of acceptable correlations.
o Examine visually the appropriateness of the factor model.
2. Factor extraction -
The primary objective of this stage is to determine the factors. Initial decisions can be made here about the number of factors underlying a set of measured variables. Estimates of initial factors are obtained using principal components analysis, which is the most commonly used extraction method. Other factor extraction methods include:
● Maximum likelihood method
● Principal axis factoring
● Alpha method
● Unweighted least squares method
● Generalized least squares method
● Image factoring.
3. Factor rotation -
In this step, factors are rotated. Un-rotated factors are typically not very interpretable (most factors are correlated with many variables). Factors are rotated to make them more meaningful and easier to interpret (each variable is associated with a minimal number of factors). Different rotation methods may result in the identification of somewhat different factors.
4. Making final decisions -
The final decision about the number of factors to choose is the number of factors for the rotated solution that is most interpretable. To identify factors, group variables that have large loadings for the same factor. Plots of loadings provide a visual for variable clusters. Interpret factors according to the meaning of the variables.
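The first two steps (correlation matrix and factor extraction by principal components) can be sketched with NumPy. The data are simulated under an assumption: one hidden factor drives three observed variables and a fourth variable is unrelated. Rotation and the final decision are left out of this sketch.

```python
import numpy as np

# Simulated data (assumption): one latent factor behind three observed variables.
rng = np.random.default_rng(1)
n = 500
latent = rng.normal(size=n)                   # never observed directly
observed = np.column_stack([
    latent + 0.3 * rng.normal(size=n),
    latent + 0.3 * rng.normal(size=n),
    latent + 0.3 * rng.normal(size=n),
    rng.normal(size=n),                       # unrelated to the latent factor
])

# Step 1: correlation matrix, judged in absolute value with the 0.3 threshold.
corr = np.corrcoef(observed, rowvar=False)
off_diagonal = np.abs(corr - np.eye(corr.shape[0]))
related = (off_diagonal > 0.3).any(axis=1)    # False marks an unrelated variable

# Step 2: factor extraction by principal components of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings of each observed variable on the first extracted factor.
loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
print(related)
print(np.round(np.abs(loadings), 2))
```

Variables with large loadings on the same factor are grouped together, so in this simulation the first three variables point to a single shared latent variable.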
