
Big Data Analytics

Data Governance

Data governance initiatives improve data quality by assigning a team responsible for data's accuracy, completeness, consistency, timeliness, validity, and uniqueness.

Data governance is the process of managing the availability, usability, integrity, and security of the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused.

A well-designed data governance program typically includes a governance team, a steering committee that acts as the governing body, and a group of data stewards. They work together to create the standards and policies for governing data, as well as implementation and enforcement procedures that are primarily carried out by the data stewards.

Why Does Data Governance Matter?

Without effective data governance, data inconsistencies in different systems across an organization might not get resolved.

Why organizations need to govern data:
- To avoid inconsistent data silos in different departments and business units.
- To agree on common data definitions and build a shared understanding of data.
- To improve data quality through processes that identify and fix errors in data sets.
- To increase analytics accuracy and drive better decision-making.
- To provide reliable information, and to implement and enforce policies that help prevent data errors and misuse.
- To help ensure compliance with data privacy laws and other regulations.

Key Participants in Data Governance

i) Chief Data Officer (CDO): CDOs often have overall responsibility and accountability for the data governance program within organizations.
ii) Data Governance Team: a data governance manager heads the team, which may also include data architects and governance specialists.
iii) Data Governance Council/Committee: typically made up of executives from all business units; it sets data policies and standards and resolves issues.
iv) Data Stewards: stewards oversee data sets and are in charge of implementing governance policies and monitoring compliance with them.
v) Data Quality Analysts and Engineers: they work with the governance team and data stewards to fix data errors and track data quality metrics.


[Figure: areas of data governance. Data Governance at the centre, linked to Data Quality, Metadata Management (technical metadata, business metadata, links), Data Stewardship (ownership), Data Policies, Data Lineage (traceability), Accountability (guidelines), Data Standards, Data Security (protection), Data Cataloguing (access management), and Business Glossary.]
Data governance refers to how an organization manages its data: it helps organizations make decisions about how they collect, use, and store data. It improves data quality and privacy and promotes data-driven decision-making.

Data governance relates to the people, processes, and technology an organization employs to manage data, making sure that the right people have the right level of access to data.
Evolution of Data Governance

1.0 Data Governance: business definitions and responsibilities; focused on specific applications.
2.0 Data Governance: operational data catalog; business-driven; data integrity; moves from reactive to proactive DG.
3.0 Intelligent Data Governance: centralized data; measure and monitor data quality; automation at scale; protection.
In the past 5 years there have been a number of, to some extent disruptive, realizations that have changed the perception of data quality maturity, namely:
- In many environments, data cleansing and correction tools are used to correct data, not to ensure that the data is valid in the first place.
- More organizational stakeholders depend on the same data.
- Data repurposing: data sets created for one functional purpose are used multiple times within the enterprise, for different purposes and in different contexts, particularly for analytics.
Ensuring the usability of data for all of these purposes requires more comprehensive oversight, and such oversight should be incorporated into the development life cycle as monitored controls.
These realizations led to a discipline called Data Governance. The objective of data governance is to institute the right levels of control to achieve one of three outcomes:
1) Alert: identify data issues that might have negative business impact.
2) Triage: prioritize those issues in relation to their corresponding business value drivers.
3) Remediate: have data stewards take the proper action when alerted to the existence of those issues.
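A minimal sketch of the alert, triage, and remediate flow, assuming issues have already been detected and each carries a hypothetical `business_value` score. The data set names, steward names, and scores are invented for illustration and are not part of any governance standard.

```python
# Hypothetical issues raised by monitoring (the "alert" outcome); values are invented.
issues = [
    {"id": "DQ-1", "dataset": "customers", "problem": "missing emails", "business_value": 3},
    {"id": "DQ-2", "dataset": "orders", "problem": "duplicate order ids", "business_value": 9},
    {"id": "DQ-3", "dataset": "web_logs", "problem": "late-arriving records", "business_value": 5},
]
# Hypothetical ownership map: which data steward is responsible for each data set.
stewards = {"customers": "steward-crm", "orders": "steward-sales", "web_logs": "steward-web"}


def triage(found):
    """Triage: order issues by the business value they put at risk, highest first."""
    return sorted(found, key=lambda issue: issue["business_value"], reverse=True)


def remediate(queue):
    """Remediate: route each prioritized issue to the steward who owns its data set."""
    return [(issue["id"], stewards[issue["dataset"]]) for issue in queue]


for assignment in remediate(triage(issues)):
    print(assignment)
# ('DQ-2', 'steward-sales')
# ('DQ-3', 'steward-web')
# ('DQ-1', 'steward-crm')
```

Sorting by a single score keeps the example small; a real program would weigh several business value drivers.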

Data Governance Components

A data governance program has 4 components: purpose, structure, operations, and policies & processes.

i) Purpose: establishes why data is governed and codifies the intended outcomes of data governance for stakeholders. Example outcomes might include policies and procedures for governing data, improving data quality, increasing stakeholders' understanding of data, and increasing data use, aligned with the organization's strategic priorities.

ii) Structure: includes establishing roles, responsibilities, and relationships in data governance. A clear structure explains how everyone contributes to data governance, identifies who makes different kinds of decisions, and enables seamless collaboration. The DG structure should include at least 2 groups:
- An executive leadership group, which helps ensure that the program has the organizational authority, sustained management support, and resources required for implementing it.
- A data stewardship group, which addresses data quality issues and develops data policies and processes to improve data.
iii) Operations: describe how data governance will work. They define how each data governance group will operate, including the decision-making process and how governance will escalate and resolve issues.
iv) Policies and processes: describe how to manage data consistently and purposefully. They include data access, data requests, data collection, and data release.
Big Data and Data Governance

When it comes to big data, should organizations adopt traditional approaches to governance? The key to answering this is an examination of data quality characteristics, because big data analytics are not universally adaptable to conventional approaches to data quality and data governance.

For example, in a traditional approach to data quality, levels of data usability are measured based on the idea of data quality dimensions, such as:
- Accuracy, referring to the degree to which data values are correct.
- Completeness, which specifies the data elements that must have values.
- Consistency of related data values across different data instances.
- Currency, which looks at the freshness of data and whether the values are up to date or not.
- Uniqueness, which specifies that each real-world item is represented once and only once within the data set.

These types of measures are generally intended to validate data using defined rules, catch any errors when the input does not conform to those rules, and correct recognized errors when the situation allows it.
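The rule-based validation described above can be sketched in Python. This assumes hypothetical customer records with `id`, `email`, `age`, and `updated` fields; the rules and the 365-day staleness threshold are illustrative choices. The sketch covers completeness, accuracy (as a range check), currency, and uniqueness, but not consistency.

```python
from datetime import date

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "age": 34, "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "age": 29, "updated": date(2021, 1, 9)},
    {"id": 2, "email": "b@example.com", "age": -5, "updated": date(2024, 6, 2)},
]


def check_quality(rows, today, max_age_days=365):
    """Return (record id, dimension, message) for every rule a record violates."""
    issues = []
    seen_ids = set()
    for r in rows:
        # Completeness: required elements must have values.
        if r["email"] is None:
            issues.append((r["id"], "completeness", "email is missing"))
        # Accuracy (approximated by a range rule): values must be plausible.
        if not 0 <= r["age"] <= 130:
            issues.append((r["id"], "accuracy", f"age {r['age']} out of range"))
        # Currency: values must be recent enough to be trusted.
        if (today - r["updated"]).days > max_age_days:
            issues.append((r["id"], "currency", "record is stale"))
        # Uniqueness: each real-world item appears once and only once.
        if r["id"] in seen_ids:
            issues.append((r["id"], "uniqueness", "duplicate id"))
        seen_ids.add(r["id"])
    return issues


for issue in check_quality(records, today=date(2024, 7, 1)):
    print(issue)
```

Each rule only reports a problem; whether it is then corrected automatically depends on whether the situation allows it.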
This approach typically targets moderately sized data sets from known sources, with structured data and a relatively small set of rules. Operational and analytical applications of limited size can integrate data quality controls, alerts, and corrections, and those corrections will reduce the downstream negative impacts.

The need for data governance in Big Data is driven by the need of businesses to obtain accurate, reliable, and actionable insight into their existing data.
Benefits of Data Governance

1) Consistency in compliance: data governance ensures that all data-related activities follow the same compliance requirements.
2) Improved data quality: strong governance makes data quality a priority at the point of data creation and handling. This leads to an overall improvement in data quality within the organization.
3) Improved data management: the code of conduct and rules established by governance make data management easier, and they make it possible for management to handle data security and legal compliance.
4) Better decision-making: well-governed data is easier for the relevant parties to discover, which also means decisions will be based on the right data, making it easier to find valuable insights with greater accuracy.
5) Operational efficiency: good data is incredibly valuable and should be treated as an asset in a data-driven business. Consider a manufacturing business: for its physical assets, it will, for example, ensure that its production line machinery undergoes regular inspections and maintenance with limited downtime, so that the line operates smoothly. The same approach should apply to data.
6) Increased revenue.
Big Data Tools & Techniques

1. HDFS (Hadoop Distributed File System)
Used by Facebook.
[Figure: HDFS architecture. A client node communicates with the HDFS NameNode; several DataNodes store data blocks on their local disks.]

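- Block size: by default HDFS partitions files into 128 MB blocks, whereas a block in a normal computer system is about 4 KB.
- Replication: HDFS works on replication, storing copies of the same data in different locations.
- Rack awareness: a rack is a collection of various nodes, and HDFS takes racks into account when placing the copies.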
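A small calculation showing how the 128 MB block size and replication translate into blocks and disk usage. The replication factor of 3 is an assumption (the common default of the `dfs.replication` setting), and `hdfs_footprint` is a hypothetical helper, not a Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128   # HDFS default block size noted above
REPLICATION = 3       # assumption: the usual default of the dfs.replication setting


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of HDFS blocks, raw disk used across the cluster in MB)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only holds the data left over, so raw usage is the
    # file size times the replication factor, not blocks * 128 MB.
    raw_mb = file_size_mb * replication
    return blocks, raw_mb


print(hdfs_footprint(500))        # (4, 1500)
# With a 4 KB block size the same 500 MB file would need 128000 blocks,
# each of which would have to be tracked as metadata.
print(math.ceil(500 * 1024 / 4))  # 128000
```

Large blocks keep the number of blocks, and therefore the NameNode's metadata, small.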

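NameNode functions:
- Records the metadata of all the files stored in the cluster: location of stored blocks, file size, permissions, and hierarchy.
- FsImage: an image of the whole file system namespace.
- EditLog: a log of the actions performed since the FsImage was written.
Secondary NameNode: periodically merges the changes recorded in the EditLog into the FsImage on behalf of the NameNode (checkpointing).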
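A toy model of the FsImage/EditLog relationship, assuming the namespace is a plain dictionary of paths to metadata and the EditLog is a list of operations. Real Hadoop uses its own binary formats; `checkpoint` and the operation names here are hypothetical, and only the idea of replaying edits onto the image matches what the Secondary NameNode does.

```python
# Toy namespace (hypothetical format): path -> metadata.
fsimage = {
    "/data/a.txt": {"size_mb": 200, "permission": "rw-r--r--", "blocks": ["blk_1", "blk_2"]},
}
# Toy edit log: operations recorded after the FsImage was written.
edit_log = [
    ("create", "/data/b.txt", {"size_mb": 64, "permission": "rw-------", "blocks": ["blk_3"]}),
    ("chmod", "/data/a.txt", "rw-rw-r--"),
    ("delete", "/data/b.txt", None),
]


def checkpoint(image, edits):
    """Replay the edit log onto a copy of the image and return the new image."""
    merged = {path: dict(meta) for path, meta in image.items()}
    for op, path, arg in edits:
        if op == "create":
            merged[path] = dict(arg)
        elif op == "chmod":
            merged[path]["permission"] = arg
        elif op == "delete":
            merged.pop(path, None)
    return merged  # after this, the old edit log entries are no longer needed


new_image = checkpoint(fsimage, edit_log)
print(sorted(new_image))                       # ['/data/a.txt']
print(new_image["/data/a.txt"]["permission"])  # rw-rw-r--
```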
2. MapReduce
MapReduce performs the processing of large data sets in a distributed and parallel manner. It has two tasks, Map and Reduce:
- Map: the Mapper divides the task into parts and converts them into key-value pairs (the intermediate result).
- Reduce: the Reducer then processes these pairs and combines them into the final result.
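A single-process word count that mirrors the Map, shuffle, and Reduce phases. It runs in plain Python to show the data flow only; on a cluster, Hadoop distributes the input splits, performs the shuffle, and runs these functions on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit an intermediate (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (Hadoop does this between Map and Reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into the final result."""
    return key, sum(values)


splits = ["Big data needs big tools", "data tools"]
intermediate = [pair for line in splits for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```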

The background jobs (daemons) are:
- JobTracker (one, master): handles job scheduling and monitoring, and assigns tasks to the TaskTrackers.
- TaskTracker (slaves): run the Map and Reduce tasks on the DataNodes.

[Figure: MapReduce on HDFS. The master runs the JobTracker and the HDFS NameNode; each slave runs a TaskTracker and a DataNode and executes Map and Reduce tasks.]
3. YARN (Yet Another Resource Negotiator)
In YARN, the JobTracker's responsibilities are split between a Resource Manager and an Application Master. Its components are:
- Client: submits MapReduce jobs.
- Resource Manager: manages resource allocation in the cluster. It has two components: the Scheduler and the Application Manager.
- Container: a bundle of resources, including RAM, CPU, and HDD, in which an application-specific process executes.
- Application Master: manages the resource needs of an individual application.
- Node Manager: manages a single node, including resource allocation on that node.
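A toy simulation of the Resource Manager's scheduler granting containers on Node Managers, assuming containers are described only by memory and virtual cores. The first-fit policy is a simplification chosen for illustration; YARN's real schedulers (such as the Capacity and Fair schedulers) are more involved, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Free resources on one node (hypothetical toy model, not the YARN API)."""
    name: str
    free_mem_mb: int
    free_vcores: int


def allocate(nodes, mem_mb, vcores):
    """First-fit scheduler sketch: grant a container on the first node that has room."""
    for node in nodes:
        if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
            node.free_mem_mb -= mem_mb
            node.free_vcores -= vcores
            return {"node": node.name, "mem_mb": mem_mb, "vcores": vcores}
    return None  # no node has room; the request waits until resources are released


cluster = [NodeManager("node1", 4096, 2), NodeManager("node2", 8192, 4)]
# An Application Master requesting three containers of 3 GB and 1 vcore each.
containers = [allocate(cluster, 3072, 1) for _ in range(3)]
print([c["node"] if c else None for c in containers])  # ['node1', 'node2', 'node2']
```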

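4. Hive
Used in Instagram. Hive is used for querying and analyzing data stored in files, and for summarizing data. It uses HQL (Hive Query Language, HiveQL).
- The user submits SQL-like HiveQL queries to the Hadoop cluster.
- The queries are converted into MapReduce jobs.
- The master divides the job and assigns parts to the slaves.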


5. HBase
HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project. Its data model is similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data.
We can store data in HDFS either directly or through HBase. HBase sits on top of the Hadoop file system and provides read and write access to the data, enabling random, real-time read/write access and data management.
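An in-memory sketch of the HBase data model (row key, column family, column qualifier, timestamped versions) to illustrate random reads of the latest version of a cell. `ToyHTable` is hypothetical and is not the HBase client API; column families are fixed at creation, as in HBase, while qualifiers can be added freely.

```python
import time
from collections import defaultdict


class ToyHTable:
    """Hypothetical model: row key -> (family, qualifier) -> [(timestamp, value), ...]."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, family, qualifier, value, ts=None):
        """Write a new version of one cell; older versions are kept."""
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        versions = self.rows[row][(family, qualifier)]
        versions.append((ts if ts is not None else time.time_ns(), value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row, family, qualifier):
        """Random read: return the latest version of one cell, or None."""
        versions = self.rows.get(row, {}).get((family, qualifier))
        return versions[0][1] if versions else None


table = ToyHTable(["info"])
table.put("user#42", "info", "city", "Paris", ts=1)
table.put("user#42", "info", "city", "Oslo", ts=2)
print(table.get("user#42", "info", "city"))  # Oslo
print(table.get("user#99", "info", "city"))  # None
```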
6. ZooKeeper
ZooKeeper is a distributed coordination service: a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. It provides mutual exclusion operations between processes.
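A simulation of the lock (mutual exclusion) recipe described in the ZooKeeper documentation: each client creates a sequential node under a lock path, and the client whose node has the lowest sequence number holds the lock. `ToyZooKeeper` is a hypothetical in-memory stand-in, not a ZooKeeper client library, and it omits watches and sessions.

```python
import itertools


class ToyZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble."""

    def __init__(self):
        self._seq = itertools.count()
        self.children = {}  # znode name -> owning client

    def create_sequential(self, prefix, owner):
        # Real ZooKeeper appends a 10-digit, zero-padded counter to sequential nodes.
        name = f"{prefix}{next(self._seq):010d}"
        self.children[name] = owner
        return name

    def delete(self, name):
        self.children.pop(name, None)


def lock_holder(zk):
    """Lock recipe: the client owning the lowest sequence number holds the lock."""
    return zk.children[min(zk.children)] if zk.children else None


zk = ToyZooKeeper()
a = zk.create_sequential("/lock/lock-", "client-A")
b = zk.create_sequential("/lock/lock-", "client-B")
print(lock_holder(zk))  # client-A
zk.delete(a)            # A releases the lock (in ZooKeeper the node is ephemeral, so it also vanishes if A dies)
print(lock_holder(zk))  # client-B
```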

7. Pig
Pig uses the Pig Latin language and is used to process large datasets. Pig scripts get converted into MapReduce jobs, which get executed on data stored in HDFS. Pig handles structured, semi-structured, and unstructured data.

8. Mahout
Mahout provides machine learning algorithms and techniques for implementing them, such as classification and clustering, as Java analytical libraries. Examples:
1. Logistic regression
2. K-means clustering
3. Naive Bayes
Used in LinkedIn.
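Mahout's own implementations are Java libraries; the sketch below is a plain Python version of k-means (Lloyd's algorithm), not Mahout's API, to show what the clustering step computes on made-up 2-D points.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```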
Big Data Analysis: Stepwise Approach

[Figure: analysis flow. Define why you need data analysis → collect data from all sources → clean unnecessary data → analyze the data → interpret the results.]

1. Define why you need data analysis
The need typically stems from a business problem or question, such as:
- How can we reduce production costs without sacrificing quality?
- Do customers see our brand positively?
- What are the ways to improve sales?

2. Collect data sets
For further data analysis, we have to collect data. Data can come from data creation within the organization (primary data) and also from external sources.

3. Clean unnecessary data
Clean the duplicate and anomalous data that could skew the analysis or generate inaccurate results.

4. Perform data analysis
This process analyzes and manipulates the data. It can be performed through data mining.

5. Interpret the results
Interpret the results of the data analysis. This step is essential because it is how a business gains actual value from the previous steps.
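A sketch of steps 3 to 5 on made-up daily sales figures: exact duplicates are dropped, outliers are removed with a median-absolute-deviation rule (an assumed choice; the notes do not prescribe a method), and a simple average stands in for the analysis.

```python
from statistics import median

# Hypothetical (date, sales) rows gathered in step 2.
collected = [
    ("2024-01-01", 120.0), ("2024-01-02", 130.0), ("2024-01-02", 130.0),  # duplicate row
    ("2024-01-03", 125.0), ("2024-01-04", 9999.0),                        # likely entry error
    ("2024-01-05", 135.0),
]


def clean(rows, tolerance=3.0):
    """Step 3: drop exact duplicates, then drop values far from the median."""
    unique = list(dict.fromkeys(rows))  # keeps the first occurrence, preserves order
    mid = median(v for _, v in unique)
    # Median absolute deviation: a spread estimate that the outlier itself barely affects.
    mad = median(abs(v - mid) for _, v in unique)
    return [(d, v) for d, v in unique if abs(v - mid) <= tolerance * mad]


cleaned = clean(collected)
# Step 4 (analysis) on the cleaned data; step 5 is reading what the number means.
average = sum(v for _, v in cleaned) / len(cleaned)
print(len(cleaned), average)  # 4 127.5
```

Without the cleaning step the duplicate and the 9999.0 entry would push the average far above every real day's sales.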

Big Data Failures

85% of Big Data projects fail. Common reasons:
1. Not having the right data
2. Lack of business objectives
3. Leadership troubles
4. Lack of skills
5. Lack of infrastructure
Issues in Big Data

Legalities (legal issues):
1. Consumer privacy
2. Security of personal information
3. Control over data
4. Copyright infringement
