
Page No.

Unit 2. Date

Data Analytics

Define the terms: Data, Data Frame, Dataset.

1. Data: Data refers to any information or facts that are collected, stored, and processed for analysis or reference.
- It can be in various forms, such as numbers, text, images, or multimedia.

2. Data Frame: A data frame is a two-dimensional labeled data structure in Pandas, a popular Python library for data manipulation and analysis.

3. Dataset: A dataset is a collection of data, typically organized in a tabular format.
- It can be any structured collection of data used for analysis or for training machine learning models.

2. Give the importance of EDA in Data Analytics.

1. Understanding the Data
2. Data Quality Assessment
3. Feature Selection and Engineering
4. Model Assumptions
5. Insight Generation
6. Communication

1. Understanding the Data: EDA helps analysts understand the data, including its structure, patterns, and relationships between variables.
2. Data Quality Assessment: EDA helps to identify missing values, outliers, and inconsistencies in the data, helping ensure data quality before further analysis.
3. Feature Selection and Engineering: EDA helps in identifying important features and relationships that can be used for predictive modeling or other analysis tasks.
4. Model Assumptions: EDA helps analysts check the assumptions of analytical models, such as normality, linearity, and independence of variables.
5. Insight Generation: EDA helps generate insights and hypotheses about the data, which can guide further analysis and decision-making.
6. Communication: EDA results can be visually communicated to stakeholders, making complex data more interpretable and facilitating decision-making.

3. Explain Interval Estimation and Hypothesis Testing in terms of Quantitative EDA.

Interval Estimation:
- Interval estimation involves estimating a range of plausible values for a population parameter based on sample data.
- This range, known as a confidence interval, provides information about the precision of the estimate and helps quantify the uncertainty associated with the sample estimate.
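
As a rough illustration of interval estimation, the sketch below computes a 95% confidence interval for a population mean from a small sample. The sample values and the function name `mean_confidence_interval` are made up for this example, and it assumes a normal approximation; for samples this small a t-distribution critical value would usually be preferred.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the population mean, using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, based on the sample standard deviation (n - 1).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample of 10 purchase amounts (illustrative data only).
purchases = [42.0, 55.5, 38.0, 61.0, 47.5, 52.0, 44.0, 58.5, 49.0, 50.5]
low, high = mean_confidence_interval(purchases)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

A higher confidence level (for example `confidence=0.99`) gives a wider interval, which is the trade-off between confidence and precision described above.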

Hypothesis Testing:
- Hypothesis testing in quantitative EDA involves making statistical inferences about population parameters based on sample data.
- It typically involves testing a null hypothesis, which represents a specific claim or assumption about the population parameter, against an alternative hypothesis.
- By comparing sample statistics to the null hypothesis, hypothesis testing helps analysts assess whether observed differences or relationships in the data are statistically significant or simply due to random variation.
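
One way to make the idea concrete is an exact permutation test, sketched below. It is only one of many possible tests and is not named in these notes; the two groups and the function name `permutation_test` are hypothetical. The null hypothesis is that both groups come from the same population, so the group labels are interchangeable.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    total = sum(pooled)
    n_a, n_b = len(group_a), len(group_b)
    extreme = 0
    count = 0
    # Under the null hypothesis every way of picking n_a values
    # for "group A" out of the pooled data is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    # p-value: share of relabelings at least as extreme as the observed one.
    return extreme / count


# Hypothetical measurements for two groups (illustrative data only).
group_a = [12, 14, 15, 16, 18]
group_b = [22, 24, 25, 27, 29]
p_value = permutation_test(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the observed difference is unlikely to be due to random variation alone, so the null hypothesis is rejected.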

4. What is Data Analytics? Explain the process of Data Analytics in detail.

- Data analytics is the process of analyzing, interpreting, and deriving insights from data to inform decision-making, optimize processes, and solve complex problems.
- It involves the application of various statistical, mathematical, and computational techniques to extract meaningful patterns and relationships from data.

The process steps of Data Analytics are:

1. Define Objectives
2. Data Collection
3. Data Preprocessing
4. Exploratory Data Analysis (EDA)
5. Modeling and Analysis
6. Interpretation and Insight Generation
7. Visualization and Reporting
8. Validation and Iteration
10. Continuous Improvement
5. Explain the steps involved in EDA (Exploratory Data Analysis).

1. Define your questions and goals:
- What are you trying to learn from the data?
- What specific insights are you looking for?
- Align your questions and goals with the overall project objectives.

2. Get familiar with the data:
- Understand the data sources, collection methods, and potential biases.
- Explore the data structure, including data types, missing values, and outliers.
- Get an initial sense of the data distribution and range for each variable.

3. Clean and prepare the data:
- Address missing values through imputation or removal, based on the context.
- Handle outliers and inconsistencies appropriately.
- Convert data types as needed for analysis.
- Create new features or combine existing ones for better insights.

4. Analyze individual variables (univariate analysis):
- Analyze individual variables using descriptive statistics (mean, median, mode, etc.).
- Visualize the distribution of each variable with histograms, boxplots, or density plots.
- Identify potential patterns, trends, and anomalies within each variable.

5. Conduct bivariate analysis:
- Explore the relationship between pairs of variables using scatter plots and correlation coefficients.
- Look for patterns, trends, and potential correlations between the variables.
- Identify potential explanatory and target variables for further analysis.

6. Consider multivariate analysis:
- For data sets with many variables, explore relationships between groups of variables using techniques like dimensionality reduction or clustering.
- This can help uncover hidden patterns and relationships that might not be evident in bivariate analysis.

7. Visualize and communicate your findings:
- Use clear and informative charts, graphs, and tables to present your findings effectively.
- Tailor your visualizations to your audience and ensure they understand the key insights.
- Document your process and assumptions for transparency and reproducibility.

8. Iterate and refine:
- As you learn from the data, refine your initial questions and goals.
- Conduct further analysis based on your findings and iterate through the above steps as needed.
- This iterative approach ensures you extract the most value from your data.
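
Steps 3 and 4 above can be sketched with the Python standard library alone. The ages below are hypothetical, median imputation is just one possible choice, and the 1.5 × IQR rule is one common convention for flagging outliers rather than the only one.

```python
import statistics

# Hypothetical column of customer ages; None marks a missing value.
ages = [23, 25, None, 31, 29, 27, None, 30, 26, 95]

# Step 3: impute missing values with the median of the observed values.
observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
cleaned = [median_age if a is None else a for a in ages]

# Step 3: flag outliers with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(cleaned, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in cleaned if a < low or a > high]

# Step 4: descriptive statistics for the single variable.
summary = {
    "mean": statistics.mean(cleaned),
    "median": statistics.median(cleaned),
    "stdev": statistics.stdev(cleaned),
}
print("outliers:", outliers)
print("summary:", summary)
```

Whether a flagged value such as 95 is removed, corrected, or kept depends on the context, as noted in step 3.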

6. Explain the k-means clustering algorithm. What is its major drawback?

k-means clustering algorithm:

- It is used for clustering data points into groups based on their similarity.
- It aims to minimize the within-cluster variance, meaning data points within a cluster should be closer to each other than to points in other clusters.

1. Specify the number of clusters (k).
2. Initialize centroids.
3. Assign data points to clusters.
4. Update centroids.
5. Repeat steps 3 and 4.
6. Final clusters.

Major drawback: Sensitivity to initial conditions.

- k-means suffers from a significant drawback: its sensitivity to the initial placement of centroids.
- Since the algorithm iteratively refines these initial positions, different starting points can lead to different final clusters, even with the same data.
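
The steps and the drawback can both be seen in a minimal one-dimensional sketch. The data, the helper names `kmeans_1d` and `within_cluster_ss`, and the two sets of starting centroids are all hypothetical; real data would normally have more dimensions and use a library implementation.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 4: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Step 5: repeat until the centroids stop moving.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def within_cluster_ss(centroids, clusters):
    """Within-cluster sum of squares, the quantity k-means tries to minimize."""
    return sum((p - c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)


points = [0, 1, 10, 11, 20, 21]  # hypothetical data, k = 3

good = kmeans_1d(points, centroids=[0, 10, 20])  # well-spread start
bad = kmeans_1d(points, centroids=[0, 1, 15])    # two centroids start close together

print(good[1], within_cluster_ss(*good))
print(bad[1], within_cluster_ss(*bad))
```

With the same data, the second start settles on a clearly worse grouping (two single-point clusters and one cluster holding four points), which is the sensitivity to initial conditions described above.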

7. List out various libraries of Python used for data analytics.

Libraries used in data analytics:

1. NumPy
2. Pandas
3. Matplotlib
4. Seaborn
5. SciPy
6. Scikit-learn
7. Statsmodels
8. Plotly
9. PyTorch
10. TensorFlow

1) NumPy:
- Fundamental numerical computing library.
- Provides efficient array structures for data manipulation.
- Offers mathematical functions and linear algebra tools.

2) Pandas:
- High-level data analysis and manipulation library for tabular data.
- Enables data loading, cleaning, transformation, merging, indexing, and visualization.

3) Matplotlib:
- Comprehensive library for creating various static, animated, and interactive visualizations.
- Generates line plots, scatter plots, bar charts, histograms, heatmaps, 3D plots, and more.
- Offers extensive customization options.

4) Seaborn:
- Built on top of Matplotlib, providing a higher-level interface for statistical data visualization.
- Simplifies creating informative and aesthetically pleasing plots.
- Specializes in plot types like violin plots, joint plots, and heatmaps.

5) SciPy:
- Collection of functions for scientific computing and mathematical computations.
- Extends NumPy's capabilities with integration, interpolation, and more.
- Provides specific functions for engineering computations.

6) Scikit-learn:
- Versatile machine learning library with a wide range of algorithms.
- Includes tools for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
- User-friendly API and extensive documentation.

7) Statsmodels:
- Focuses on statistical modeling and analysis.
- Provides tools for statistical tests, estimation, inference, and exploratory data analysis.
- Includes linear regression models, generalized linear models, time series analysis, and more.

8) Plotly:
- Interactive visualization library for creating web-based plots and dashboards.
- Generates highly customizable and dynamic visualizations for interactive data exploration.
- Supports 2D plots and other charts.

9) PyTorch:
- Powerful open-source machine learning framework for deep learning.
- Used for building and training neural networks, emphasizing flexibility and speed.
- Popular for computer vision, natural language processing, and other advanced AI tasks.

10) TensorFlow:
- Another popular open-source platform for machine learning and deep learning.
- Offers a comprehensive ecosystem of tools and libraries for research and development.
8. Explain the concept of EDA by giving a suitable example.

- EDA means Exploratory Data Analysis.
- It is the initial stage of any data analysis process, where you dive into your data to understand its characteristics.
- It's like being a detective gathering clues before solving a mystery.

Example: A clothing store owner
- Imagine you are a clothing store owner who wants to understand your customer base better to improve sales.
- You have a dataset containing information about your customers, including:
  1. Age
  2. Gender
  3. Location
  4. Purchase history

1. Describe the data:
- What are the age ranges and gender distribution of your customers?
- Where are they located geographically?
- What are the typical purchase amounts?

2. Visualize the data:
- Create charts and graphs to see how different variables relate to each other.
- Are there any trends in purchase behavior based on age or location?
- Do certain items tend to be bought together?

3. Identify anomalies:
- Are there any outliers in the data, like customers with unusually high spending or specific item preferences?
- Investigate these anomalies to understand their potential cause and impact.

4. Formulate questions:
- Based on your initial exploration, what questions do you have about your customers?
- Which customer groups can you target with marketing campaigns?
- Can you identify potential product recommendations based on purchase history?

1. Enhanced Pattern Recognition:
- Visualization makes it easier to identify patterns, trends, and anomalies compared to raw data tables.

2. Faster Communication and Collaboration:
- Visualizations are a universal language, making it easier to communicate insights to stakeholders with varying technical backgrounds.

3. Efficient Hypothesis Generation:
- By visually exploring the data, researchers can quickly form hypotheses about potential relationships and test them.

4. Reduced Cognitive Load:
- This reduced cognitive load helps focus on key insights.

5. Identification of Outliers:
- Visualizations can easily highlight data points that deviate significantly from the pattern.

6. Discovery of Hidden Relationships:
- This can lead to new discoveries and innovative solutions that might not have been apparent from traditional analysis methods.
- This can lead to better decision-making based on data-driven insights.

10. What do you understand by Predictive Analytics?

Predictive analytics:
- It is the process of using historical data and statistical modeling techniques to predict future outcomes or trends.
- It's like having a crystal ball, but instead of magic it relies on data science and machine learning to make informed guesses about what might happen next.

a) Harnessing the power of data:
- Historical data is analyzed to identify patterns and relationships between variables.

b) Statistical modeling techniques:
- Algorithms like regression and decision trees are used to build predictive models.
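
As a small illustration of the regression technique mentioned above, the sketch below fits a straight line to past values by ordinary least squares and uses it to predict the next value. The monthly sales figures and the function name `fit_line` are hypothetical, and a single straight line is the simplest possible model, not a recommendation for real forecasting.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month number vs. units sold.
months = [1, 2, 3, 4, 5, 6]
units = [110, 125, 138, 152, 161, 178]

slope, intercept = fit_line(months, units)
forecast = slope * 7 + intercept  # predicted units for month 7
print(f"month 7 forecast: {forecast:.1f}")
```

The fitted slope summarizes the historical trend, and the forecast is an informed guess that extends that trend one step into the future.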

c) Beyond just numbers:
- Predictive analytics isn't just about numbers; it turns those numbers into actionable insights.

Examples in Action:
- Imagine an e-commerce store predicting customer churn based on their purchase history, allowing them to intervene and retain valuable customers.

Benefits of Predictive Analytics:
- Improved decision-making
- Increased efficiency
- Reduced risk
- Enhanced customer experience

11. Explain any one application of data analytics.

Application: Predicting Loan Default Risk with Machine Learning

Scenario: A bank wants to assess the risk of potential borrowers defaulting.

Process:

Data collection:
- Gather historical transaction data, including amount, location, time, merchant category, cardholder information, and fraud labels.

Feature engineering:
- Create relevant features from the data that might indicate fraud, such as transaction amount compared to historical spending patterns, location deviation from the usual spending location, merchant category, etc.

Unsupervised anomaly detection:
- Since labeled fraudulent transactions are often a small fraction of the data, employ unsupervised anomaly detection techniques like Isolation Forest, Local Outlier Factor, or One-Class Support Vector Machines.

Model evaluation and tuning:
- Evaluate the model's performance on a hold-out set using metrics like precision, recall, and F1 score.

Deployment:
- Deploy the model to a real-time system to analyze new transactions as they occur and flag potential fraud in real time.

Additional considerations:
- Data security, false positives and negatives, explainability.
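
The evaluation metrics named above can be computed by hand, as in this minimal sketch. The labels and predictions are hypothetical (1 marks a fraudulent transaction), and the function name `precision_recall_f1` is chosen only for this example.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: share of flagged transactions that really were fraud.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual fraud cases that the model caught.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical hold-out labels and model predictions.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Low precision means many false positives (legitimate transactions flagged), while low recall means many false negatives (fraud missed), which ties in with the additional considerations listed above.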

12. Differentiate between Data Analytics and Data Science.

| Data Analytics | Data Science |
|---|---|
| Analyzing past data; making data-driven decisions | Predictive modeling, machine learning, AI; solving complex problems and predicting future outcomes |
| Descriptive and diagnostic analytics | Predictive and prescriptive analytics |
| Tools are statistical methods and visualization tools | Tools are programming languages (Python, R, etc.) and machine learning libraries |
| Strong analytical skills, domain knowledge | Programming, statistics, machine learning, domain knowledge |
| Limited to analyzing existing data | Broader scope, including data analysis and machine learning |
| Example: analyzing sales data to identify trends | Example: building a system for an e-commerce platform |

13. Explain the types of Exploratory Data Analysis.

- EDA deals with data to understand its characteristics, uncover patterns, and formulate initial hypotheses.
- There are three main types, based on the number of variables involved:

1. Univariate Analysis
2. Bivariate Analysis
3. Multivariate Analysis

1. Univariate Analysis:
- This focuses on individual variables to understand their distribution, central tendency, variability, and potential outliers.
- Techniques: Descriptive statistics, histograms, boxplots, density plots, frequency tables.
- Example: Analyzing the distribution of customer ages in a dataset using a histogram.
2. Bivariate Analysis:
- This explores relationships between two variables, looking for correlations or differences.
- Techniques: Scatter plots, correlation coefficients, cross-tabulations, box plots with different groups.
- Example: Examining the relationship between purchase amount and customer age using a scatter plot.
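
Alongside a scatter plot, the strength of a linear relationship can be summarized with the Pearson correlation coefficient. The sketch below computes it directly from its definition; the ages, purchase amounts, and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical customer ages and purchase amounts.
ages = [22, 25, 31, 38, 45, 52]
amounts = [35.0, 40.0, 52.0, 60.0, 71.0, 80.0]

r = pearson_r(ages, amounts)
print(f"correlation between age and purchase amount: {r:.3f}")
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one; a correlation by itself does not show that one variable causes the other.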

3. Multivariate Analysis:
- This deals with three or more variables, uncovering complex relationships and hidden patterns.
- Techniques: Dimensionality reduction, clustering algorithms, network analysis.
- Example: Grouping customers based on their purchase history, demographics, and engagement metrics.
