
Data Analytics


Introduction to Data Analytics
Syllabus
Introduction to data analytics: sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to big data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics.

Data Analytics Lifecycle: need, key roles for a successful analytic project, various phases of the data analytics lifecycle: discovery, data preparation, model planning, model building, communicating results, operationalization.

Introduction to Data Analytics

Sources and nature of data

Data
Data is a set of facts or figures obtained from experiments or surveys and used as a basis for making calculations or drawing conclusions. It is stored in a form that is suitable for storage in or processing by a computer, for example text and images.
Sources of data
There are two types of sources of data available:

1. Primary source of data

2. Secondary source of data

Primary source of data

Data that is raw, original and extracted directly from the original sources is known as primary data.
This type of data is collected directly by techniques such as questionnaires, interviews and surveys.

The data collected must match the demands and requirements of the target audience on which the analysis is performed; otherwise it becomes a burden in data processing.

Methods for collecting primary data
1. Interview method
2. Survey method
3. Observation method
4. Experimental method
Experimental method: The experimental method is the process of collecting data by performing experiments, research and investigation.
The most frequently used experimental methods are CRD, RBD, LSD and FD.

CRD - Completely Randomized Design

CRD is a simple experimental design used in data analytics which is based on randomization and replication. It is mostly used for comparing experiments.
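The randomization and replication that CRD relies on can be sketched as follows. This is only an illustration: the plot names, the treatment labels and the helper `completely_randomized_design` are assumptions made for the example, not part of the notes.

```python
import random
from collections import Counter


def completely_randomized_design(units, treatments, seed=0):
    """Randomly assign treatments to units, replicating each treatment equally."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    replications = len(units) // len(treatments)
    labels = list(treatments) * replications   # replication
    rng = random.Random(seed)                  # fixed seed only for repeatability
    rng.shuffle(labels)                        # randomization
    return dict(zip(units, labels))


# Hypothetical experiment: 12 plots and 3 treatments, so 4 replications each.
plots = [f"plot{i}" for i in range(1, 13)]
assignment = completely_randomized_design(plots, ["A", "B", "C"])
replication_counts = Counter(assignment.values())
```

Every plot has the same chance of receiving any treatment, and `replication_counts` confirms that each treatment is applied the same number of times.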
RBD - Randomized Block Design
RBD is an experimental design in which the experiment is divided into small units called blocks. Random experiments are performed on each of the blocks, and results are drawn using a technique known as Analysis of Variance (ANOVA). RBD originated in the agriculture sector.

LSD - Latin Square Design

LSD is an experimental design that is similar to CRD and RBD but contains rows and columns. It is an N x N arrangement with an equal number of rows and columns, containing letters that each occur only once in a row and once in a column. Hence the differences can be found easily, with fewer errors in the experiments. A Sudoku puzzle is an example of a Latin square design.
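To make the row and column property concrete, here is a minimal sketch that builds an N x N Latin square by cyclic shifting and checks it. The cyclic construction and the function names are assumptions chosen for illustration; the notes do not prescribe a construction method.

```python
def latin_square(symbols):
    """Build an N x N Latin square by shifting the symbol list one place per row."""
    n = len(symbols)
    return [[symbols[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    n = len(square)
    expected = set(square[0])
    if len(expected) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == expected for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == expected for c in range(n))
    return rows_ok and cols_ok


square = latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
```

In an experiment, the rows and columns would stand for two blocking factors and the letters for the treatments.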

FD - Factorial Design
FD is an experimental design in which each experiment has two factors, each with its possible values; on performing trials, the other combinations of factors are derived.

Secondary source of data

Secondary data is data which has already been collected and is reused for some valid purpose.
This type of data is previously recorded from primary data, and it has two types of sources, named internal source and external source.

Internal source
These types of data can easily be found within the organization, such as market records, sales records, transactions, customer data, accounting resources, etc. The cost and time consumption is less in obtaining data from internal sources.
External source
Data which cannot be found within the organization and is obtained through external third-party resources is external source data.
The cost and time consumption is higher because this contains a huge amount of data.
Examples: government publications, news publications, syndicate services and other non-governmental publications.
Other sources of data
Sensor data: With the advancement of IoT devices, the sensors of these devices collect data which can be used for sensor data analytics to track the performance and usage of products.
Satellite data: Satellites collect a lot of images and data, in terabytes, on a daily basis through surveillance cameras, which can be used to collect useful information.
Web traffic: Due to fast and cheap internet facilities, data in many formats that is uploaded by users on different platforms can be predicted and collected, with their permission, for data analysis. The search engines also provide their data through the keywords and queries searched most often.

[Figure: classification of data collection sources. Data collection splits into primary data (e.g. interview) and secondary data (internal data from the organization, external data such as government sources).]

Nature of data
The nature of data is classified into four categories:
1. Nominal data
2. Ordinal data
3. Interval data
4. Ratio data
Nominal data
A nominal scale is used for assigning labels for the identification of individual units. For example, a classification of journals according to the discipline they belong to may be considered nominal data. If numbers are assigned to describe the categories, the numbers represent only the name of the category.

Ordinal data

It indicates the ordered or graded relationship among the numbers assigned to the observations made. These numbers denote ranks of different categories having a relationship in a definite order.
Example: to study the responsiveness of library staff, a researcher may assign '1' to indicate poor, '2' to indicate average, '3' to indicate good and '4' to indicate excellent. The numbers 1, 2, 3, 4 in this case are a set of ordinal data.
Ordinal data show the direction of the difference and not the exact amount of difference.

Interval data

Interval data are ordered categories of data in which the differences between the various categories are of equal measurement.

For example, we can measure the IQ of 70 children. After assigning numerical values to the IQ of each child, the data can be grouped with the intervals 0 to 10, 10 to 20 and so on.
Ratio data
Ratio data are the quantitative measurement of a variable in terms of magnitude. In ratio data, we can say that one thing is twice or thrice another.

For example, measurements involving weight, distance, price, etc.
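The difference between these scales shows up in which operations make sense on the values. The sketch below is an illustration under assumed sample values (the ratings, scores and weights are made up); it ranks ordinal codes, groups scores into equal-width intervals such as 0-10 and 10-20, and forms a ratio.

```python
from collections import Counter

# Ordinal: the codes carry an order, but the gaps between them are not measured.
RATING_LABELS = {1: "poor", 2: "average", 3: "good", 4: "excellent"}
ratings = [3, 1, 4, 2, 3, 3]            # hypothetical survey answers
best_rating = RATING_LABELS[max(ratings)]


def interval_label(value, width=10):
    """Return the equal-width interval that a value falls into, e.g. 12 -> '10-20'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width}"


# Interval: values are grouped into classes of equal width.
scores = [4, 12, 17, 25, 9, 10]         # hypothetical scores
groups = Counter(interval_label(s) for s in scores)

# Ratio: a true zero exists, so "twice as heavy" is a meaningful statement.
weight_a_kg, weight_b_kg = 20.0, 10.0
times_heavier = weight_a_kg / weight_b_kg
```

Taking `max` of the ratings is valid because order is meaningful for ordinal data, whereas averaging the codes would assume equal spacing that an ordinal scale does not guarantee.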

Classification of data
Data is information stored in a particular form in which it is represented.
The data are classified into three forms:
1. Structured form
2. Unstructured form
3. Semi-structured form

Structured form: Any form of relational database in which a relation between attributes is possible. That is, there exists a relation between rows and columns in the database, with a table structure.

E.g. using database programming languages (SQL, Oracle, MySQL, etc.).

Unstructured form: Any form of data that does not have a predefined structure is represented as the unstructured form of data. E.g. video, images, comments, posts, and a few websites such as blogs and Wikipedia.
Semi-structured data: It does not have a formal data model like an RDBMS, but a predefined organized format is available. E.g. CSV, XML, JSON, text files with a tab separator, etc.
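A small sketch can contrast the three forms. All table names, field names and sample values below are assumptions for illustration: an in-memory SQLite table stands in for structured data, a JSON document for semi-structured data, and a free-text comment for unstructured data.

```python
import json
import sqlite3

# Structured: a fixed table schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Delhi"), (2, "Ravi", "Pune")],
)
cities = [row[0] for row in conn.execute("SELECT city FROM student ORDER BY id")]
conn.close()

# Semi-structured: self-describing keys, but fields may nest or be missing.
json_text = '{"id": 3, "name": "Meera", "tags": ["student"], "address": {"city": "Agra"}}'
record = json.loads(json_text)
nested_city = record["address"]["city"]

# Unstructured: free text with no predefined fields; only generic processing applies.
comment = "Great post, thanks for sharing!"
word_count = len(comment.split())
```

The structured query depends on the schema declared up front, while the JSON record can be read without one, and the comment has to be treated as plain text.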
Characteristics of data
1. Accuracy
2. Completeness
3. Reliability
4. Relevance
5. Timeliness

Accuracy: Data accuracy means that the recorded values are correct, so that the records can be used as a reliable source of information.

Completeness: Data completeness refers to the comprehensiveness or wholeness of data. There should be no gaps or missing information for data to be truly complete.

Reliability: Data reliability means that data is complete and accurate, and it is a crucial foundation for building trust in data.

Relevance: Data relevance assesses whether the information can serve its purpose in a particular context.

Timeliness: Data timeliness refers to how up to date the information is.

Introduction to Big Data Platform

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

Big data may be structured, semi-structured or unstructured.

Characteristics
The five characteristics of big data are:
1. Volume
2. Variety
3. Veracity
4. Velocity
5. Value

Volume
* The name 'big data' itself is related to a size which is enormous.

* Volume is a huge amount of data.

* To determine the value of data, the size of the data plays a very crucial role. If the volume of data is very large, then it is actually considered 'big data'. This means that whether a particular data set can actually be considered big data or not depends on the volume of data.

* Example: In the year 2016, the estimated global mobile traffic was 6.2 exabytes (6.2 billion GB) per month. Also, by 2020 we will have almost 40,000 exabytes of data.

Velocity
* Velocity refers to the high speed of accumulation of data.
* In big data velocity, data flows in from sources like machines, networks, social media, mobile phones, etc.
* There is a massive and continuous flow of data. This determines the potential of data: how fast the data is generated and processed to meet the demands.

* Sampling data can help in dealing with the issue of velocity.

* Example: There are more than 3.5 billion searches per day made on Google. Also, Facebook users are increasing by 22% (approx.) year by year.
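One way to sample a fast, continuous flow without storing it is reservoir sampling. The notes only say that sampling helps with velocity; the choice of reservoir sampling, the function name and the simulated event stream below are assumptions made for this sketch.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)       # fixed seed only to make the sketch repeatable
    sample = []
    for index, item in enumerate(stream):
        if index < k:
            sample.append(item)     # fill the reservoir first
        else:
            # Replace a kept item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                sample[slot] = item
    return sample


# Simulated high-velocity source: a generator, so events are never held in memory.
events = (f"search-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5)
```

Memory use stays fixed at k items however many events arrive, which is why this style of sampling suits data that is generated faster than it can be stored.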
Variety
* It refers to the nature of data, that is, structured, semi-structured and unstructured data.
* It also refers to heterogeneous sources.
* Variety is basically the arrival of data from new sources that are both inside and outside of an enterprise. It can be structured, semi-structured and unstructured.

Veracity (Truthful)
* It refers to inconsistencies and uncertainty in data; that is, the data which is available can sometimes get messy, and quality and accuracy are difficult to control.
* Big data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
* Example: Data in bulk could create confusion, whereas a smaller amount of data could convey half or incomplete information.

Value
* A bulk of data having no value is of no good to the company unless you turn it into something useful.
* Data in itself is of no use or importance, but it needs to be converted into something valuable to extract information.

Advantages of big data

1. Opportunities to make better decisions
2. Increasing productivity and efficiency
3. Reducing costs
4. Improving customer service and customer experience
5. Fraud and anomaly detection
6. Greater agility and speed to market

Disadvantages of big data
1. Questionable data quality
2. Heightened security risks
3. Compliance headaches
4. Cost and infrastructure issues
5. Big data skills shortage

Data Analytics
Data analytics is the science of analyzing raw data in order to make conclusions about that information.
This information can be used to optimize processes to increase the overall efficiency of a business or system.


[Figure: collect data, analyse data, data analytics.]
Types of Data Analytics
1. Descriptive analytics
2. Predictive analytics
3. Prescriptive analytics
4. Diagnostic analytics
Descriptive analytics: Descriptive analytics describes the result, that is, what has happened in the data. It may deal with the probability among a number of options, where each option has an equal chance of probability.

E.g. observation, case study, survey.
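Describing what has happened usually starts with summary statistics. The sketch below uses Python's standard `statistics` module on assumed monthly sales figures; the numbers and the `summary` layout are illustrative only.

```python
import statistics

# Hypothetical monthly sales figures collected by observation.
monthly_sales = [120, 135, 128, 150, 135, 142]

# Descriptive analytics: summarize what has already happened.
summary = {
    "count": len(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "mode": statistics.mode(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here predicts or recommends anything; the summary only reports past data.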

Predictive analytics: This type of analytics deals with prediction from past data, to make decisions based on certain algorithms. In the case of a doctor, the doctor questions the patient about the past in order to treat the illness through already existing procedures.

E.g. healthcare, sports, weather, insurance, social media analysis.

Prescriptive analytics: Prescriptive analytics works with predictive analytics, which uses data to determine near-term outcomes.

Prescriptive analytics makes use of machine learning to help businesses decide a course of action based on a computer program's predictions.
E.g. healthcare, banking.

Diagnostic analytics: This focuses more on why something happened.

This involves more diverse data inputs and a bit of hypothesizing.

Need of Data Analytics

1. Gather hidden insights
2. Generate reports
3. Perform market analysis
4. Improve business requirements

Gather hidden insights: Hidden insights are gathered from data and then analyzed with respect to business requirements.
Generate reports: Reports are generated from the data and are passed on to the respective teams and individuals, to deal with further actions for a high rise in business.
Perform market analysis: Market analysis can be performed to understand the strengths and weaknesses of competitors.
Improve business requirements: Analysis of data allows improving the business with respect to customer requirements and experience.

Evolution of Analytic Scalability

Scalability: The ability of a system to handle an increasing amount of work required to perform its task.

Data storage ability has grown in recent years to meet the need of big data.

Traditional Analytic Architecture

We had to pull all data together into a separate analytics environment to do analysis.

[Figure: Database 1, Database 2, Database 3 and Database 4 feed an analytic server. The heavy processing occurs in the analytic environment.]
Modern In-Database Architecture

The processing stays in the database where the data has been consolidated.

[Figure: Database 1, Database 2, Database 3 and Database 4 are consolidated into an enterprise data warehouse. The analytic server or PC just submits the request.]

Massively Parallel Processing (MPP)

An MPP database breaks the data into independent chunks with independent disk and CPU.

[Figure: a single overloaded server versus multiple lightly loaded servers (shared nothing).]

A traditional database will query a one-terabyte table one row at a time. An MPP system splits the table into chunks and runs 10 simultaneous 100-gigabyte queries.

* An MPP system allows the different sets of CPU and disk to run the process concurrently.

* An MPP system breaks the job into pieces.

* MPP systems build in redundancy to make recovery easy.

* MPP systems have resource management tools to manage the CPU and disk space.
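The chunk-and-combine idea behind MPP can be imitated on a single machine. In this sketch a thread pool stands in for the independent CPU and disk units, and the table contents, chunk count and function names are all assumptions; it shows the pattern only, not real parallel hardware or any particular MPP product.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous chunks of near-equal size."""
    size = max(1, -(-len(rows) // n_chunks))   # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def query_chunk(chunk):
    """Run the same query on one chunk: total of all amounts above 50."""
    return sum(amount for amount in chunk if amount > 50)


rows = list(range(1, 101))                     # hypothetical table of 100 amounts
chunks = split_into_chunks(rows, 10)

# Each worker handles its own chunk; the partial answers are combined at the end.
with ThreadPoolExecutor(max_workers=10) as pool:
    partial_totals = list(pool.map(query_chunk, chunks))

total = sum(partial_totals)
```

Because no chunk depends on another (shared nothing), the final answer is simply the sum of the partial answers, the same result a single pass over the whole table would give.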

Cloud Computing
McKinsey and Company paper from 2009:
- Mask the underlying infrastructure from the user
- Be elastic to scale on demand
- On a pay-per-use basis

National Institute of Standards and Technology (NIST):
- On-demand self-service
- Resource pooling
- Rapid elasticity

Two types of cloud environment
1. Public cloud
- The services and infrastructure are provided off-site over the internet
- Greatest level of efficiency in shared resources
- Less secured and more vulnerable than private clouds
2. Private cloud
- Infrastructure operated solely for a single organization
- The same features as a public cloud
- Offers the greatest level of security and control
- Necessary to purchase and own the entire cloud infrastructure

Grid Computing
The federation of computer resources to reach a common goal.

MapReduce
A parallel programming framework that provides:
- Fault tolerance
- Data distribution
- Load balancing
Map function: Processes a key/value pair to generate a set of intermediate key/value pairs.

Reduce function: Merges all intermediate values associated with the same intermediate key.

How MapReduce works

Let us assume there are 20 terabytes of data and 20 MapReduce server nodes for a project.
1. Distribute a terabyte to each of the 20 nodes using a simple file copy process.
2. Submit two programs (Map, Reduce) to the scheduler.
3. The map program finds the data on disk and executes the logic it contains.
4. The results of the map step are then passed to the reduce process to summarize and aggregate the final answers.
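The map and reduce steps above can be imitated in a few lines on one machine. This is a toy word count with no scheduler or cluster; the sample records and the helper names (`map_function`, `shuffle`, `reduce_function`, `map_reduce`) are assumptions for illustration.

```python
from collections import defaultdict


def map_function(_key, line):
    """Map: turn one input record into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all intermediate values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_function(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    intermediate = []
    for key, value in records:                 # on a cluster, each node maps its own data
        intermediate.extend(map_function(key, value))
    grouped = shuffle(intermediate)
    return dict(reduce_function(k, v) for k, v in grouped.items())


# Hypothetical input: (line number, text) pairs.
records = [(0, "big data needs big tools"), (1, "data tools scale")]
counts = map_reduce(records)
```

On a real cluster the map calls would run on the nodes that hold the data, and the framework would handle the grouping, distribution and recovery that this sketch does in a single process.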

[Figure: map function.]



Analytic process and analytic tools

Data analytics process is a process of collecting, transforming, cleaning and modelling data with the goal of discovering the required information.

The results so obtained are communicated, suggesting conclusions and supporting decision making.
The data analysis process consists of the following phases, which are iterative in nature:
1. Data requirements specification
2. Data collection
3. Data processing
4. Data cleaning
5. Data analysis

[Figure: flow from data requirement to data collection, data processing and data cleaning.]


Analytic tools
1. Microsoft Excel

2. Databases (e.g. column-oriented)
3. Programming languages: R and Python
4. Self-service data visualization: Tableau, Power BI, AWS QuickSight

Big data tools
- Data lakes
- Apache Spark
- AWS
- Google Cloud
- Microsoft Azure
Analysis vs Reporting

Parameter   Reporting                                Analytics
Purpose     Focus on what is happening               Explain why it is happening
Tasks       Organizing, formatting, summarizing      Questioning, interpreting, exploring
Result      Results are pushed to users for review   Users pull results to answer questions
Value       Transforms data into information         Transforms data and information into insights and recommendations to drive actions

Applications of data analytics

1. City planning
2. Transportation
3. Fraud and risk detection
4. Manage risks
5. Proper spending
6. Customer interactions
7. Security
8. Healthcare
9. Travel
10. Energy management
11. Internet web search
12. Digital advertisement
Data Analytics Lifecycle
Need of phases of the data analytics lifecycle
The data analytics lifecycle is designed for big data problems and data science projects. The cycle is iterative, to represent a real project.
The phases involved in the data analytics lifecycle are:

1. Discovery
2. Data preparation
3. Model planning
4. Model building
5. Communication of results
6. Operationalize

Discovery

- The data science team learns and investigates the problem.
- Develop context and understanding.
- Come to know about the data sources needed and available for the project.
- The team formulates initial hypotheses that can be later tested with data.


Data preparation

- Steps to explore, preprocess and condition data prior to modeling and analysis.
- It requires the presence of an analytic sandbox; the team executes extract, load and transform to get data into the sandbox.
- Data preparation tasks are likely to be performed multiple times and not in a predefined order.

- Several tools commonly used for this phase are Hadoop, Alpine Miner, OpenRefine, etc.
Model planning
- The team explores data to learn about relationships between variables, and subsequently selects key variables and the most suitable models.
- In this phase, the data science team develops data sets for training, testing and production purposes.
- The team builds and executes models based on the work done in the model planning phase.
- Several tools commonly used for this phase are MATLAB and STATISTICA.
Model building
- The team develops datasets for testing, training and production purposes.
- The team also considers whether its existing tools will suffice for running the models, or whether it needs a more robust environment for executing models.
- Free or open-source tools: R and PL/R, Octave, WEKA.

- Commercial tools: MATLAB, STATISTICA.

Communication of results
- After executing the model, the team needs to compare the outcomes of modeling to the criteria established for success and failure.
- The team considers how best to articulate findings and outcomes to the various team members and stakeholders, taking into account warnings and assumptions.
- The team should identify key findings, quantify business value, and develop a narrative to summarize and convey findings to stakeholders.

Operationalize

- The team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening the work to a full enterprise of users.
- This approach enables the team to learn about the performance and related constraints of the model in a production environment on a small scale, and to make adjustments before full deployment.
- The team delivers final reports, briefings and code.
- Free or open-source tools: Octave, WEKA, SQL, MADlib.


[Figure: data analytics lifecycle, including the model planning and communication of results phases.]

Key roles for a successful analytic project

1. Business User
The business user is the one who understands the main area of the project and also basically benefits from the results.

This user gives advice to and consults the team working on the project about the value of the results obtained and how the operations on the outputs are done.

A business manager, line manager, or deep subject matter expert in the project domain fulfils this role.
Ho pejact mains full:ls
2. Project Sponsor
The project sponsor is the one who is responsible for initiating the project.
The project sponsor provides the actual requirements for the project and presents the basic business issue.

He generally provides the funds and measures the degree of value from the final output of the team working on the project.
This person introduces the prime concern and frames the desired output.

3. Project Manager
This person ensures that the key milestones and purpose of the project are met on time and with the expected quality.

4. Business Intelligence Analyst

The business intelligence analyst provides business domain expertise based on a detailed and deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting point of view.
This person generally creates dashboards and reports and knows about the data feeds and sources.

5. Database Administrator (DBA)

The DBA facilitates and arranges the database environment to support the analytics needs of the team working on a project.
The responsibilities may include providing permission to key databases or tables and making sure that the appropriate security levels are in their correct places for the data repositories.

6. Data Engineer
The data engineer has deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data intake into the analytic sandbox.
The data engineer works jointly with the data scientist to help build data in the correct ways for analysis.
7. Data Scientist
The data scientist provides subject matter expertise for analytical techniques, data modelling, and applying the correct analytical techniques for a given business issue.
He ensures that the overall analytical objectives are met.
Data scientists outline and apply analytical methods and approaches to the data available for the concerned project.
