(Kes-05)
Unit
Introduction to Data Analytics
Syllabus
Introduction to data analytics: sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to big data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs. reporting, modern data analytic tools, applications of data analytics.
Data
Data is a set of facts and figures obtained from experiments, measurements and calculations, and used as a basis for making decisions. It is stored in a form that is suitable for storage or processing by a computer, for example text and images.
Sources of Data
There are two types of sources of data available.
Factorial Design
Factorial design is an experimental design in which the experiment has two or more factors, each with possible values. Facts are derived by performing trials over all combinations of these values.
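The "all combinations" step of a full factorial design can be sketched in Python. This is a minimal illustration. The factor names and levels are hypothetical, and `full_factorial` is an illustrative helper, not a standard API.

```python
from itertools import product

# Hypothetical factors and levels for an illustrative experiment (assumptions).
factors = {
    "temperature": [20, 30],
    "pressure": ["low", "high"],
    "catalyst": ["A", "B"],
}

def full_factorial(factors):
    """Return one trial (a dict of factor -> level) for every combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

trials = full_factorial(factors)
print(len(trials))  # 2 * 2 * 2 = 8 trials
print(trials[0])    # {'temperature': 20, 'pressure': 'low', 'catalyst': 'A'}
```

The number of trials is the product of the number of levels of each factor. This is why full factorial designs grow quickly as factors are added.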
Internal Source
These types of data can easily be found within the organization, such as market records, sales records, transactions, customer data and accounting data. The cost and time consumption in obtaining internal data is less.
External Source
Data which cannot be found within the organization and is gained through external third-party resources is external source data. The cost and time consumption is more, because this contains a huge amount of data.
Examples: Government publications, news publications and other non-governmental publications.
Other Sources of Data
Sensor data: With the advancement of IoT devices, the sensors of these devices collect data, which can be used for sensor data analytics to track the performance and usage of products.
Satellite data: Satellites collect a lot of images and data, in terabytes, on a daily basis through surveillance cameras. This can be used to collect useful information.
Web traffic: Due to fast and cheap internet, many formats of data uploaded by users on different platforms can be collected, with their permission, for data analysis. Search engines also provide their data, mostly through the keywords and queries searched.
[Figure: Primary data (survey, interview) and secondary data (internal data from the organization, external data from the government).]
Quantitative data expresses an amount or a count, for example measurements involving weight, distance and height.
Classification of Data
The information stored in a particular format is represented as data. Data is classified into three forms:
Structured form
Unstructured form
Semi-structured form
Structured form: Any form of relational database structure in which a relation between attributes is possible. There exists a relation between the rows and columns of the database, which has a table structure.
Unstructured form: Any form of data that does not have a predefined structure is represented as unstructured data. E.g. videos, images, comments, posts, and some websites such as blogs and Wikipedia.
Semi-structured form: Data that does not have the form of fully structured data like an RDBMS, but for which predefined organized formats are available. E.g. CSV, XML, JSON, text files with a tab separator, etc.
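A small Python sketch shows how the three forms look in practice, using only the standard library. The records are hypothetical. An in-memory `sqlite3` table stands in for a relational database. Following the classification above, tab-separated text and JSON are treated as semi-structured.

```python
import csv
import io
import json
import sqlite3

# Structured: a relational table with a fixed schema (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Asha", "Delhi"), (2, "Ravi", "Pune")])
table_rows = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()

# Semi-structured: tab-separated text. A header names the fields, but nothing enforces types.
tsv_text = "id\tname\tcity\n1\tAsha\tDelhi\n2\tRavi\tPune\n"
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Semi-structured: JSON records carry their own keys, which may differ per record.
json_records = json.loads('[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "name": "Ravi"}]')

# Unstructured: free text with no predefined fields.
comment = "Great service at the Pune store, but delivery was slow."

def distinct_key_sets(records):
    """Return the distinct sets of field names seen across records."""
    return {frozenset(r) for r in records}

print(table_rows)                            # typed rows from a fixed schema
print(tsv_rows[0]["id"])                     # '1' (a string: no type enforcement)
print(len(distinct_key_sets(json_records)))  # 2 different shapes of record
```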
Characteristics of Data
> Accuracy
> Completeness
> Reliability
> Relevance
> Timeliness
Accuracy > Data should provide accurate information. Accurate data is a crucial foundation for decision-making in an organization.
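Some of these characteristics can be checked mechanically. Below is a minimal Python sketch of a completeness measure and a simple range check that flags possibly inaccurate values. The records and the valid age range of 0 to 120 are assumptions for illustration. A range check can only flag implausible values; it cannot prove that the remaining values are accurate.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Delhi"},
    {"id": 2, "age": None, "city": "Pune"},
    {"id": 3, "age": 212, "city": None},
    {"id": 4, "age": 27, "city": "Agra"},
]

def completeness(records, field):
    """Fraction of records in which `field` is present (not None)."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def out_of_range(records, field, low, high):
    """IDs of records whose value lies outside [low, high] (possible accuracy problems)."""
    return [r["id"] for r in records
            if r.get(field) is not None and not (low <= r[field] <= high)]

print(completeness(records, "age"))          # 0.75
print(out_of_range(records, "age", 0, 120))  # [3]
```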
Characteristics of Big Data
The characteristics of big data are as follows:
Volume
Variety
Velocity
Value
Veracity
Volume
* The name 'big data' itself is related to a size which is enormous.
* The size of data plays a very crucial role in determining the value of data.
* If the volume of data is very large, it is considered 'big data'. Whether a particular dataset can be considered big data or not depends upon its volume.
Velocity
* Velocity refers to the high speed of accumulation of data.
* In big data, data flows in from sources like machines, networks, social media and mobile phones.
* There is a massive and continuous flow of data. Velocity determines the potential of data, that is, how fast the data is generated and processed to meet demands.
Variety
* It refers to the nature of data, that is, structured, semi-structured and unstructured data.
* It also refers to heterogeneous sources.
* Variety is basically the arrival of data from new sources.
Veracity (Truthfulness)
* It refers to inconsistencies and uncertainty in data. The data available can sometimes get messy, and its quality and accuracy are difficult to control.
* Big data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
* Example: Data in bulk could create confusion, whereas a smaller amount of data could convey only half or incomplete information.
Value
* Bulk data having no value is of no good to the company unless you turn it into something useful.
* Data in itself is of no use or importance. It needs to be converted into something valuable to extract information.
Disadvantages of Big Data
Questionable data quality
Heightened security risks
Compliance headaches
Cost and infrastructure issues
Big data skills shortage
Data Analytics
Data analytics is the science of analyzing raw data in order to make conclusions about that information. This information can be used to optimize processes and increase the overall efficiency of a business or system.
[Figure: Data analytics: collect data, analyse data.]
Types of Data Analytics
1. Descriptive analytics
2. Predictive analytics
3. Prescriptive analytics
4. Diagnostic analytics
Data analysis can be used to:
Perform market analysis: Market analysis can be performed to understand the strengths and weaknesses of competitors.
Improve business requirements: Analysis of data allows a business to improve how it meets customer requirements and improves customer experience.
Data storage capability has increased greatly in recent years, which has led to the need for big data.
[Figure: Traditional analytic architecture. Data from Database 1 to Database 4 is extracted to a separate analytic server.]
The heavy processing occurs in the analytic environment.
Modern In-Database Architecture
The processing stays in the database, where the data has been consolidated.
[Figure: Database 1 to Database 4 are consolidated into one database, and requests are submitted to it directly.]
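The contrast between the two architectures can be sketched with Python's built-in `sqlite3`. The `sales` table and its rows are hypothetical, and SQLite stands in for an enterprise database. In the traditional style, every row is pulled out and aggregated in the analytic environment. In the in-database style, the `GROUP BY` runs inside the database, and only the summary rows leave it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 100.0), ("south", 50.0), ("north", 25.0), ("south", 75.0)])

def traditional_totals(conn):
    """Extract every row to the analytic environment, then aggregate there."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)  # result and number of rows moved out of the database

def in_database_totals(conn):
    """Push the aggregation into the database; only the summary rows leave it."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    return dict(rows), len(rows)

print(traditional_totals(conn))
print(in_database_totals(conn))
```

Both functions return the same totals. The traditional version moves all 4 rows, while the in-database version moves only 2 summary rows. On a real table with millions of rows, this difference is what makes in-database processing attractive.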
[Figure: A one-terabyte table is split into ten 100-gigabyte chunks. A traditional database runs one 1-terabyte query, while an MPP system runs 10 simultaneous 100-gigabyte queries.]
* An MPP (massively parallel processing) system allows different CPUs and disks to run in parallel (concurrently).
* MPP breaks a job into pieces.
[Figure: Single-threaded process vs. parallel process.]
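The split, process and combine pattern behind MPP can be sketched in Python. This sketch is an assumption-laden stand-in, not how an MPP database is implemented:
- A list of numbers plays the role of a large table.
- `partial_sum` plays the role of the query each node runs on its own chunk.
- `ThreadPoolExecutor` runs the chunks concurrently.

Because of CPython's global interpreter lock, threads do not speed up CPU-bound work. `ProcessPoolExecutor` can be swapped in for true CPU parallelism when run as a script.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

def partial_sum(chunk):
    """The work each 'node' performs on its own chunk of data."""
    return sum(chunk)

def parallel_sum(rows, n_chunks=10):
    """Run the same work on every chunk concurrently, then combine the partial results."""
    chunks = split_into_chunks(rows, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)

rows = list(range(1, 1001))  # stand-in for a large table
print(parallel_sum(rows))    # 500500, the same as sum(rows)
```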
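Cloud Computing
A paper by McKinsey and Company lists the following features of a cloud:
- Mask the underlying infrastructure from the user
- Be elastic to scale on demand
- On a pay-per-use basis
The National Institute of Standards and Technology (NIST) lists characteristics including:
- On-demand self-service
- Resource pooling
- Rapid elasticity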
Two Types of Cloud Environment
1. Public cloud
- The services and infrastructure are provided off-site over the internet.
- Greatest level of efficiency in shared resources.
- Less secure and more vulnerable than a private cloud.
2. Private cloud
- Infrastructure operated solely for a single organization.
- Offers the same features as a public cloud.
MapReduce
A parallel programming framework. It provides:
- Parallelization
- Fault tolerance
- Data distribution
- Load balancing
Steps:
1. Copy the programs (Map, Reduce) to Hadoop.
2. Submit the job.
3. The map program finds the data on disk and executes the logic it contains.
4. The results of the map step are then passed to the reduce process, which summarizes and aggregates them into the final answers.
[Figure: MapReduce flow: map function, scheduler, results.]
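The map and reduce steps can be illustrated with the classic word-count example in plain Python. This is a single-machine sketch of the programming model only. It has no cluster, scheduler or fault tolerance. The `shuffle` function imitates the grouping by key that a MapReduce framework performs between the map and reduce steps.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: summarize all values for one key into a final answer."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

docs = ["big data needs big tools", "data tools"]
print(word_count(docs))  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

In a real framework, each document (input split) would be mapped on a different node, and each key group would be reduced independently. That independence is what lets the work run in parallel.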
The results so obtained are communicated, and the information is interpreted to support decision-making by suggesting conclusions.
The data analysis process consists of the following phases, which are iterative in nature:
Data requirement specification
Data collection
Data processing
Data cleaning
Data analysis
Communication
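The phases can be sketched as a chain of small Python functions, one per phase. Everything here is hypothetical:
- The requirement is assumed to be the average temperature per city.
- `collect` returns hard-coded lines instead of reading a real source.
- Cleaning simply drops records with missing values.

In practice the phases are iterative, so results from a later phase often send the analyst back to an earlier one.

```python
import statistics

# Data requirement specification (assumed): average temperature per city.

def collect():
    """Data collection: a hard-coded list stands in for a real source."""
    return ["Delhi,31.5", "Pune,27.0", "Delhi,", "Pune,29.0", "Delhi,33.5"]

def process(raw_lines):
    """Data processing: parse text lines into (city, value) records."""
    records = []
    for line in raw_lines:
        city, value = line.split(",")
        records.append((city.strip(), value.strip()))
    return records

def clean(records):
    """Data cleaning: drop records with missing values and convert values to float."""
    return [(city, float(value)) for city, value in records if value]

def analyze(records):
    """Data analysis: mean value per city."""
    groups = {}
    for city, value in records:
        groups.setdefault(city, []).append(value)
    return {city: statistics.mean(values) for city, values in groups.items()}

def communicate(results):
    """Communication: format the results as readable report lines."""
    return [f"{city}: {value:.1f}" for city, value in sorted(results.items())]

report = communicate(analyze(clean(process(collect()))))
print("\n".join(report))  # Delhi: 32.5, then Pune: 28.0
```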
1. Spreadsheet: Microsoft Excel
2. Database: relational, column, document, graph
3. Programming languages: R and Python
4. Visualization: Tableau, Power BI
Phases of the data analytics lifecycle:
1. Discovery
2. Data preparation
3. Model planning
4. Model building
5. Communicate results
6. Operationalize
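Data Preparation
This phase requires the presence of an analytic sandbox, in which the team can work with data and perform analysis. The team executes extract, load and transform (ELT) processes to get data into the sandbox. Data preparation tasks are likely to be performed multiple times and not in a predefined order.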
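The extract, load and transform order can be sketched with Python's built-in `sqlite3`, where an in-memory database stands in for the analytic sandbox. The source rows and table names are hypothetical. The key ELT idea is that raw data is loaded unchanged and transformed afterwards inside the sandbox. The original `raw_sales` table stays available, so the transform step can be rerun or revised, which matches the repeated, unordered nature of data preparation.

```python
import sqlite3

def extract():
    """Extract: pull raw records from a source (hard-coded stand-in here)."""
    return [("2024-01-03", "north", "120"), ("2024-01-03", "south", " 80 "),
            ("2024-01-04", "north", "n/a"), ("2024-01-04", "south", "95")]

def load(sandbox, rows):
    """Load: copy the raw records into the sandbox unchanged."""
    sandbox.execute("CREATE TABLE raw_sales (day TEXT, region TEXT, amount TEXT)")
    sandbox.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)

def transform(sandbox):
    """Transform: clean and type the data inside the sandbox using SQL."""
    sandbox.execute("""
        CREATE TABLE sales AS
        SELECT day, region, CAST(TRIM(amount) AS REAL) AS amount
        FROM raw_sales
        WHERE TRIM(amount) GLOB '[0-9]*'
    """)

sandbox = sqlite3.connect(":memory:")  # in-memory stand-in for an analytic sandbox
load(sandbox, extract())
transform(sandbox)
print(sandbox.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
```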
Matlab, STA
Model Building
The team develops datasets for testing, training and production purposes. The team also considers whether its existing tools will suffice for running the models, or whether it needs a more robust environment for executing the models.
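Preparing separate training and testing datasets can be sketched in Python. This is a minimal illustration. The record list, the 20% test fraction and the fixed seed are assumptions, and `train_test_split` here is a hand-written helper, not a library function. The fixed seed makes the split reproducible, which helps when comparing models built in different iterations.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into training and testing sets."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for labeled records
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```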
> Free or open-source tools: R and PL/R, Octave, WEKA.
Communicate Results
After executing the model, the team needs to compare the outcomes of the modeling against its criteria for success.
[Figure: Data analytics lifecycle: discovery, data preparation, model planning, model building, communicate results, operationalize.]
Business User
The business user is the one who understands the main domain area of the project and basically benefits from the results.
Project Manager
This person ensures that the key milestones and purposes of the project are met on time and at the expected quality.
Data Engineer
The data engineer leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox.
The data engineer works jointly with the data scientist to help shape data in the correct ways for analysis.
Data Scientist
The data scientist provides subject matter expertise for analytical techniques and data modeling, and applies valid analytical techniques to given business problems.
He ensures that the overall analytical objectives are met.
He designs and applies analytical methods and approaches suited to the data available to the concerned project.