
NATIONAL OPEN UNIVERSITY OF NIGERIA

COURSE CODE: MBA 758


COURSE TITLE:
DATABASE MANAGEMENT SYSTEM
MBA 758
DATABASE MANAGEMENT SYSTEM
Course Writer Gerald C. Okereke
Eco Communications Inc.
Lagos Ikeja
Course Editor Mr. E. Eseyin
National Open University of Nigeria
Programme Leader Dr. O. J. Onwe
National Open University of Nigeria
Course Coordinator Bimbola, E.U. Adegbola
National Open University of Nigeria
NATIONAL OPEN UNIVERSITY OF NIGERIA
COURSE
GUIDE
National Open University of Nigeria
Headquarters
14/16 Ahmadu Bello Way
Victoria Island
Lagos
Abuja Office
5, Dar es Salaam Street
Off Aminu Kano Crescent
Wuse II, Abuja
Nigeria
e-mail: centralinfo@nou.edu.ng
URL: www.nou.edu.ng
Published by
National Open University of Nigeria
Printed 2009
ISBN: 978-058-331-9
All Rights Reserved
CONTENTS PAGE
Introduction
Course Aim
Course Objectives
Course Materials
Study Units
Assignment File
Assessment
Credit Units
Presentation Schedule
Course Overview
MBA 758 DATABASE MANAGEMENT SYSTEM
Introduction
This course, Database Management System (DBMS), is designed for students pursuing a Master's degree in business, finance, marketing and related fields of study. It can also be studied by Postgraduate Diploma students in business, sciences and education.
This course is relevant to students of business because information and data form the foundation of any business enterprise. A thorough understanding of how to design, manipulate and manage databases is therefore essential.
This course is intended primarily for students who are already graduates or postgraduates in any field of study. Students who have not had exposure to computer science in their first degrees need to put in extra effort to grasp this course properly.
This course guide takes you through the nature of the course, the materials you are going to use and how to use those materials to your maximum benefit. You are expected to devote at least two hours to the study of each course unit. Each unit has assessments in the form of tutor-marked assignments. You are advised to carry out the exercises immediately after studying the unit.
Tutorial lectures will be organized for this course. They serve as an avenue to interact with course instructors, who will communicate more clearly with you regarding the course. You are advised to attend the tutorial lectures because they will enhance your understanding of the course. Note that it is also through these tutorial lectures that you will submit your tutor-marked assignments and be assessed accordingly.
Course Aim
The aim of this course is to teach you how to design, manipulate and manage databases. Course participants are exposed to the various forms, types and models of database systems to enable them to make viable choices. Supportive and complementary concepts of managing data and documents are thoroughly examined to give a wholesome view of data and information management. The ultimate aim is to encourage the use of database management systems for effective data management.
Course Objectives
The following are the major objectives of this course:
define a Database Management System
give a description of the Database Management structure
define a Database
define basic foundational terms of Database
understand the applications of Databases
know the advantages and disadvantages of the different models
compare the relational model with the Structured Query Language (SQL)
know the constraints and controversies associated with the relational database model
know the rules guiding transactions (ACID)
identify the major types of relational database management systems
compare and contrast the types of RDBMS based on several criteria
understand the concept of data planning and Database design
know the steps in the development of Databases
trace the history and development process of SQL
know the scope and extension of SQL
differentiate Discretionary and Mandatory Access Control Policies
know the proposed OODBMS Security Models
identify the various functions of the Database Administrator
trace the history and development process of the data warehouse
list various benefits of the data warehouse
compare and contrast document management systems and content management systems
know the basic components of document management systems
Course Materials
1. Course Guide
2. Study Units
3. Textbooks
4. Assignment File
5. Tutorials
Study Units
This course consists of thirteen (13) units, divided into 3 modules. Each module deals with a major aspect of the course.
Module 1
Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS
Module 2
Unit 1 Development and Design of Database
Unit 2 Structured Query Languages (SQL)
Unit 3 Database and Information Systems Security
Unit 4 Database Administrator and Administration
Module 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System
In studying the units, you are expected to spend a minimum of 2 hours on each. Start by going through the unit objectives so that you know what you need to learn in the course of studying the unit. At the end of the unit, evaluate yourself to find out whether you have achieved its objectives. If not, go through the unit again.
To help you ascertain how well you have understood the course, there will be exercises, mainly in the form of tutor-marked assignments, at the end of each unit. At the first attempt, try to answer the questions without going through the unit. However, if you cannot proffer solutions offhand, go through the unit to answer the questions.
Assignment File
For each unit, you will find one (1) or two (2) tutor-marked assignments. These assignments serve two purposes:
1. Self Evaluation: The tutor-marked assignment will help you to go through each unit thoroughly, because you are advised to attempt the questions immediately after studying each unit. The questions are designed in such a way that at least one question serves as a typical self-assessment test.
2. Obtain Valuable Marks: The tutor-marked assignment is also a valid means of obtaining marks that will form part of your total score in this course. It constitutes 30% of the total marks obtainable in this course.
You are advised to go through the units thoroughly so that you can proffer correct solutions to the tutor-marked assignments.
Assessment
You will be assessed and graded in this course through tutor-marked assignments and a formal written examination. The allocation of marks is as indicated below.
Assignments: 30%
Examination: 70%
Final examination and grading
The final examination will consist of two (2) sections:
1. Section 1: This is compulsory and weighs 40 marks.
2. Section 2: This consists of six (6) questions, out of which you are to answer four (4). It weighs 60 marks.
The duration of the examination will be 3 hours.
Credit Units
This course attracts 3 credit units only.
Presentation Schedule
This covers the scheduled dates and venues for tutorial classes, as well as how and when to submit assignments. All this will be communicated to you in due course.
Course Overview
This shows the units, topics and issues to be studied each week. It also includes the duration of the course, the revision week and the examination week. The details are as provided below:
Unit | Title of Work | Week's Activity | Assessment (end of unit)
Course Guide
Module 1
1 | Overview | 1 | TMA
2 | Database | 2 | TMA
3 | Database Concepts | 3 | TMA
4 | Database Models 1 | 4 | TMA
5 | Database Models: Relational Model | 5 | TMA
6 | Basic Components of DBMS | 6 | TMA
Module 2
1 | Development and Design of Database | 7 | TMA
2 | Structured Query Languages (SQL) | 8 | TMA
3 | Database and Information Systems Security | 9 | TMA
4 | Database Administrator and Administration | 10 | TMA
Module 3
1 | Relational Database Management Systems | 11 | TMA
2 | Data Warehouse | 12 | TMA
3 | Document Management System | 13 | TMA
Revision and Examination | 14
CONTENTS PAGE
Module 1
Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS
Module 2
Unit 1 Development and Design of Database
Unit 2 Structured Query Languages (SQL)
Unit 3 Database and Information Systems Security
Unit 4 Database Administrator and Administration
Module 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System
MODULE 1
Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS
UNIT 1 OVERVIEW
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Description
3.2 DBMS Benefits
3.3 Features and Capabilities of DBMS
3.4 Uses of DBMS
3.5 Models
3.6 List of Database Management Systems Software
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
A Database Management System (DBMS) is computer software designed for the purpose of managing databases based on a variety of data models.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
define a Database Management System
give a description of the Database Management Structure
enumerate the benefits of a Database Management System
describe the features and capabilities of a typical DBMS
identify and differentiate the different types and models of DBMS.
3.0 MAIN CONTENT
3.1 Description
A DBMS is a complex set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. A DBMS is sometimes also known as a Database Manager. It is a set of prewritten programs that are used to store, update and retrieve a database. A DBMS includes:
A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model.
The four most common types of organization are the hierarchical, network, relational and object models. Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).
The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.
Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).
A database query language and report writer to allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data. It also controls the security of the database.
Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.
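In SQL-based products a subschema is commonly implemented as a view that exposes only some columns of a table, with access to each view granted per user group. The minimal sketch below uses Python's standard sqlite3 module, which is an assumption (the course does not name a product), and shows only the view part: SQLite has no user accounts or GRANT statement, so the per-group permissions a server DBMS would enforce are not shown. All table, column and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table holding all data about each employee.
conn.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL,
    work_history TEXT, medical_notes TEXT)""")
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 250000, 'Clerk 2015-2020', 'None')")

# Each "subschema" is a view exposing only the columns one user group may see.
conn.execute("CREATE VIEW payroll_view AS SELECT emp_id, name, salary FROM employee")
conn.execute("""CREATE VIEW hr_view AS
    SELECT emp_id, name, work_history, medical_notes FROM employee""")

def visible_columns(view_name):
    """Return the column names a user of the given view can see (view names are trusted here)."""
    cur = conn.execute(f"SELECT * FROM {view_name}")
    return [col[0] for col in cur.description]

print(visible_columns("payroll_view"))  # ['emp_id', 'name', 'salary']
print(visible_columns("hr_view"))       # no salary column
```

Because the views are defined inside the DBMS, every application that queries them sees the same restricted subset without duplicating the filtering logic.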
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs is customized for each data entry and updating function.
A transaction mechanism that ideally guarantees the ACID properties (atomicity, consistency, isolation, durability), in order to ensure data integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance).
It also maintains the integrity of the data in the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database.
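A minimal sketch of both mechanisms, assuming Python's built-in sqlite3 module (table and column names are hypothetical): a UNIQUE constraint on the key field rejects a duplicate customer number, and because each batch of inserts runs as one transaction, the valid row in a failed batch is rolled back too (atomicity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_no is the key field; UNIQUE makes the DBMS reject duplicates.
conn.execute("CREATE TABLE customer (customer_no INTEGER UNIQUE NOT NULL, name TEXT)")

def add_customers(rows):
    """Insert all rows as one transaction: either every row is stored or none is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False

print(add_customers([(101, "Okafor"), (102, "Bello")]))  # True
print(add_customers([(103, "Eze"), (101, "Duplicate")]))  # False: 101 already exists
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 2, row 103 was rolled back
```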
The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.
When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.
Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators.
Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments.
DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.
3.2 DBMS Benefits
Improved strategic use of corporate data
Reduced complexity of the organization's information systems environment
Reduced data redundancy and inconsistency
Enhanced data integrity
Application-data independence
Improved security
Reduced application development and maintenance costs
Improved flexibility of information systems
Increased access and availability of data and information
Logical and physical data independence
Control of concurrent access anomalies
Solution to the atomicity problem
Central control of the system through the DBA
Figure 1: An example of a database management approach in a banking information system.
Note how the savings, checking, and installment loan programs use a database management system to share a customer database. Note also that the DBMS allows a user to make a direct, ad hoc interrogation of the database without using application programs.
3.3 Features and Capabilities of DBMS
A DBMS can be characterized as an "attribute management system" where attributes are small chunks of information that describe something. For example, "colour" is an attribute of a car. The value of the attribute may be a colour such as "red", "blue" or "silver".
Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher-arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.
Throughout recent history, specialized databases have existed for scientific, geospatial, imaging, document storage and similar uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well. However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.
Thus, the DBMSs of today roll together frequently needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:
Query Ability
Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?"
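The example question can be written as a single declarative SQL query. This sketch assumes Python's built-in sqlite3 module and a hypothetical car table with made-up sample rows; the point is that the DBMS, not the application, works out how to find the matching records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (reg_no TEXT PRIMARY KEY, doors INTEGER, state TEXT, colour TEXT)")
conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?)", [
    ("TX-001", 2, "Texas", "green"),
    ("TX-002", 4, "Texas", "green"),
    ("TX-003", 2, "Texas", "red"),
    ("CA-001", 2, "California", "green"),
    ("TX-004", 2, "Texas", "green"),
])

# "How many 2-door cars in Texas are green?"
count = conn.execute(
    "SELECT COUNT(*) FROM car WHERE doors = ? AND state = ? AND colour = ?",
    (2, "Texas", "green"),
).fetchone()[0]
print(count)  # 2
```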
Backup and Replication
Copies of attributes need to be made regularly in case primary disks or other equipment fail. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets.
When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.
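As one concrete example of a backup utility, Python's sqlite3 module exposes SQLite's online backup facility as `Connection.backup` (Python 3.7 and later). The sketch copies a small database into a second one; in practice the target would normally be a file on separate storage. It illustrates a periodic copy only; replication between servers with replication transparency relies on server DBMS features not shown here. Table contents are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE attribute (name TEXT, value TEXT)")
source.execute("INSERT INTO attribute VALUES ('colour', 'red')")
source.commit()

# In practice the target might be sqlite3.connect("backup.db") on another disk.
replica = sqlite3.connect(":memory:")
source.backup(replica)  # copies every page of the source database into the target

print(replica.execute("SELECT name, value FROM attribute").fetchall())  # [('colour', 'red')]
```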
Rule Enforcement
Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally, such rules should be able to be added and removed as needed without significant data layout redesign.
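A minimal sketch of the engine rule, assuming Python's sqlite3 module and a hypothetical engine table. The rule is stored in the DBMS as a trigger rather than in the table layout, so it can later be dropped for hybrid cars without redesigning the table. Other products express such rules with their own trigger or constraint syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE engine (engine_no TEXT PRIMARY KEY, car_id INTEGER NOT NULL)")

# The business rule lives in the DBMS as a trigger: a car may have only one engine.
conn.execute("""
CREATE TRIGGER one_engine_per_car
BEFORE INSERT ON engine
WHEN EXISTS (SELECT 1 FROM engine WHERE car_id = NEW.car_id)
BEGIN
    SELECT RAISE(ABORT, 'car already has an engine');
END""")

conn.execute("INSERT INTO engine VALUES ('ENG-1', 7)")
try:
    conn.execute("INSERT INTO engine VALUES ('ENG-2', 7)")
    rejected = False
except sqlite3.DatabaseError as err:  # SQLite aborts the statement with the trigger's message
    rejected = True
    print("Rejected:", err)

# For hybrid cars the rule can be removed without redesigning the table.
conn.execute("DROP TRIGGER one_engine_per_car")
conn.execute("INSERT INTO engine VALUES ('ENG-2', 7)")
engines_for_car_7 = conn.execute("SELECT COUNT(*) FROM engine WHERE car_id = 7").fetchone()[0]
print(rejected, engines_for_car_7)  # True 2
```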
Security
Often it is desirable to limit who can see or change a given attribute or group of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.
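The most elaborate arrangement above, users assigned to roles and roles granted entitlements, can be sketched in plain Python. All user, role and attribute-group names are hypothetical. In a server DBMS the same idea is normally expressed with SQL statements such as GRANT (and, in many products, CREATE ROLE), with syntax that varies by product.

```python
# Hypothetical role-based model: users -> roles -> (action, attribute group) entitlements.
ROLE_ENTITLEMENTS = {
    "payroll_clerk": {("read", "payroll"), ("update", "payroll")},
    "hr_officer": {("read", "work_history"), ("read", "medical")},
    "auditor": {("read", "payroll"), ("read", "work_history")},
}
USER_ROLES = {
    "amaka": {"payroll_clerk"},
    "tunde": {"hr_officer", "auditor"},
}

def is_allowed(user: str, action: str, attribute_group: str) -> bool:
    """Grant access if any of the user's roles holds the requested entitlement."""
    return any(
        (action, attribute_group) in ROLE_ENTITLEMENTS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

print(is_allowed("amaka", "update", "payroll"))  # True
print(is_allowed("amaka", "read", "medical"))    # False
print(is_allowed("tunde", "read", "payroll"))    # True, via the auditor role
```

Granting entitlements to roles rather than to each individual means that moving a user between jobs only requires changing that user's role assignments.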
Computation
There are common computations requested on attributes, such as counting, summing, averaging, sorting, grouping and cross-referencing. Rather than have each computer application implement these from scratch, applications can rely on the DBMS to supply such calculations. Any arithmetical work performed by a computer is called a computation.
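A short sketch of delegating counting, summing, averaging, grouping and sorting to the DBMS with one SQL query, assuming Python's sqlite3 module and a hypothetical sale table with made-up amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (branch TEXT, amount REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)", [
    ("Lagos", 100.0), ("Lagos", 300.0),
    ("Abuja", 50.0), ("Abuja", 150.0), ("Abuja", 100.0),
])

# Count, sum and average per branch, sorted by total sales (largest first).
rows = conn.execute("""
    SELECT branch, COUNT(*), SUM(amount), AVG(amount)
    FROM sale
    GROUP BY branch
    ORDER BY SUM(amount) DESC
""").fetchall()
for row in rows:
    print(row)
# ('Lagos', 2, 400.0, 200.0)
# ('Abuja', 3, 300.0, 100.0)
```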
Change and Access Logging
Often one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.
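One common way to log changes inside the DBMS is a trigger that writes to an audit table. This is a minimal sketch assuming Python's sqlite3 module and hypothetical table names; SQLite has no notion of a logged-in user, so recording *who* made a change would need the application or a server DBMS's auditing features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
conn.execute("""CREATE TABLE audit_log (
    acc_no INTEGER, old_balance REAL, new_balance REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

# The trigger records every balance change automatically; applications need no logging code.
conn.execute("""
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit_log (acc_no, old_balance, new_balance)
    VALUES (OLD.acc_no, OLD.balance, NEW.balance);
END""")

conn.execute("INSERT INTO account VALUES (1, 500.0)")
conn.execute("UPDATE account SET balance = 450.0 WHERE acc_no = 1")
log = conn.execute("SELECT acc_no, old_balance, new_balance FROM audit_log").fetchall()
print(log)  # [(1, 500.0, 450.0)]
```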
Automated Optimization
If there are frequently occurring usage patterns or requests, some DBMSs can adjust themselves to improve the speed of those interactions. In other cases the DBMS merely provides tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.
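The second case, a human expert tuning the database after inspecting what the DBMS reports, can be sketched with SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 module. The table and index names are hypothetical, and the exact wording of the plan text differs between SQLite versions: before the index the plan reports a full table scan, and afterwards a search using the new index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(i, f"Customer {i}", "Lagos" if i % 2 else "Kano") for i in range(1000)],
)

def plan(sql):
    """Return the text of SQLite's query plan (the last column of each plan row)."""
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM customer WHERE city = 'Kano'"
before = plan(query)  # without an index SQLite must scan the whole table
conn.execute("CREATE INDEX idx_customer_city ON customer(city)")  # the expert's adjustment
after = plan(query)   # the planner can now search using the new index
print(before)
print(after)
```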
3.4 Uses of Database Management Systems
The four major uses of database management systems are:
1. Database Development
2. Database Interrogation
3. Database Maintenance
4. Application Development
Database Development
Database packages such as Microsoft Access and Lotus Approach allow end users to develop the databases they need. However, large organizations with client/server or mainframe-based systems usually place control of enterprise-wide database development in the hands of database administrators and other database specialists. This improves the integrity and security of the organizational database. Database developers use the data definition language (DDL) in database management systems such as Oracle 9i or IBM's DB2 to develop and specify the data contents, relationships and structure of each database, and to modify these database specifications, which are catalogued in a data dictionary.
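A small illustration of DDL and a built-in catalogue, assuming Python's sqlite3 module and hypothetical table names. In SQLite the `sqlite_master` table plays the data-dictionary role by storing every schema definition; other products keep their own catalogues (for example, the INFORMATION_SCHEMA views offered by many SQL products).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements define the data contents, structure and relationships.
conn.execute("CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
conn.execute("""CREATE TABLE account (
    acc_no INTEGER PRIMARY KEY,
    branch_id INTEGER REFERENCES branch(branch_id),
    balance REAL)""")

# SQLite records these definitions in its built-in catalogue table, sqlite_master.
table_names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)  # ['account', 'branch']

# Column-level details are available through PRAGMA table_info (column 1 is the name).
account_columns = [col[1] for col in conn.execute("PRAGMA table_info(account)")]
print(account_columns)  # ['acc_no', 'branch_id', 'balance']
```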
Figure 2: The Four Major Uses of DBMS (diagram labels: Database, Data Dictionary, Operating System, Database Management Systems, Application Programs; uses: Database Development, Database Interrogation, Database Maintenance, Application Development)
Database Interrogation
The database interrogation capability is a major use of a database management system. End users can interrogate a database management system by asking for information from a database using a query language or a report generator. They can receive an immediate response in the form of video displays or printed reports. No difficult programming is required.
Database Maintenance
The databases of organizations need to be updated continually to reflect new business transactions and other events. Other miscellaneous changes must also be made to ensure the accuracy of the data in the database. This database maintenance process is accomplished by transaction processing programs and other end-user application packages, with the support of the database management system. End users and information specialists can also employ various utilities provided by a DBMS for database maintenance.
Application Development
Database management system packages play major roles in application development. End users, systems analysts and other application developers can use the fourth-generation programming languages (4GL) and built-in software development tools provided by many DBMS packages to develop custom application programs. For example, you can use a DBMS to easily develop the data entry screens, forms, reports, or web pages of a business application. A database management system also makes the job of application programmers easier, since they do not have to develop detailed data handling procedures using a conventional programming language every time they write a program.
3.5 Models
The various models of database management systems are:
1. Hierarchical
2. Network
3. Object-oriented
4. Associative
5. Column-oriented
6. Navigational
7. Distributed
8. Real-time relational
9. SQL
These models will be discussed in detail in subsequent units of this course.
3.6 List of Database Management Systems Software
Examples of DBMSs include:
Oracle
DB2
Sybase Adaptive Server Enterprise
FileMaker
Firebird
Ingres
Informix
Microsoft Access
Microsoft SQL Server
Microsoft Visual FoxPro
MySQL
PostgreSQL
Progress
SQLite
Teradata
CSQL
OpenLink Virtuoso
4.0 CONCLUSION
Database management systems have continued to make data arrangement and storage much easier than they used to be. With the emergence of the relational model of database management systems, many of the big challenges in handling large databases have been reduced. More database management products are likely to become available on the market as the already existing ones continue to improve.
5.0 SUMMARY
A Database Management System (DBMS) is computer software designed for the purpose of managing databases based on a variety of data models.
A DBMS is a complex set of software programs that controls the organization, storage, management, and retrieval of data in a database.
When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.
Often it is desirable to limit who can see or change which attributes or groups of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.
A DBMS can be characterized as an "attribute management system" where attributes are small chunks of information that describe something. For example, "colour" is an attribute of a car. The value of the attribute may be a colour such as "red", "blue" or "silver".
Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?"
As computers grew in capability, this trade-off became increasingly unnecessary and a number of general-purpose database systems emerged; by the mid-1960s a number of such systems were in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, IDS, founded the Database Task Group within CODASYL.
6.0 TUTOR-MARKED ASSIGNMENT
1. Name ten database management system software products.
2. Briefly describe the backup and replication ability of database management systems.
7.0 REFERENCES/FURTHER READINGS
Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM, 13(6), 377-387.
O'Brien, James A. (2003). Introduction to Information Systems (11th ed.). McGraw-Hill.
UNIT 2 DATABASE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Foundations of Database Terms
3.2 History
3.3 Database Types
3.4 Database Storage Structures
3.5 Database Servers
3.6 Database Replication
3.7 Relational Database
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
A database is a structured collection of data that is managed to meet the needs of a community of users. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships (see below for an explanation of the various database models).
A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
define a database
define basic foundational terms of database
know a little of the history of the development of databases
know and differentiate the different types of database
describe the structure of a database.
3.0 MAIN CONTENT
3.1 Foundations of Database Terms
File
A file is an ordered arrangement of records in which each record is stored in a uniquely identifiable location. The sequence of the records is then the means by which a record will be located. In most computer systems, the sequence of records is either alphabetic or numeric, based on a field common to all records such as name or number.
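Because the records are kept in sequence on a common field, a record can be located by its position in that sequence rather than by reading every record. A minimal sketch, assuming a hypothetical in-memory "file" ordered by customer number and using binary search from Python's standard bisect module:

```python
from bisect import bisect_left

# A hypothetical file of records kept in ascending order of a key field (customer number).
records = [
    (1001, "Adamu"), (1004, "Bisi"), (1010, "Chidi"), (1025, "Dupe"), (1040, "Emeka"),
]
keys = [key for key, _ in records]  # the sequence field common to all records

def locate(customer_no):
    """Find a record by its position in the key sequence (binary search)."""
    i = bisect_left(keys, customer_no)
    if i < len(keys) and keys[i] == customer_no:
        return i, records[i]
    return None

print(locate(1010))  # (2, (1010, 'Chidi'))
print(locate(1011))  # None
```

Binary search only works because the file is ordered; on an unordered file every record would have to be examined.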
Records
A record or tuple is a complete set of related fields. For example, Table 1 below shows a set of related fields, which is a record. In other words, if this were part of a table, we would call it a row of data. Therefore, a row of data is also a record.
Table 1
Sr No | Icode | Ord No | Ord Date | Qty
1 | RKSK-T | 0083/99 | 3/3/2008 | 120
Field
A field is a property or characteristic that holds some piece of information about an entity. It is also a category of information within a set of records; for example, the first names, addresses or phone numbers of people listed in an address book.
Relations
In the relational data model, the data in a database is organized in relations. A relation is synonymous with a 'table'. A table consists of columns and rows, which are referred to as fields and records in DBMS terms, and as attributes and tuples in relational DBMS terms.
Attributes
An attribute is a property or characteristic that holds some information about an entity. A 'Customer', for example, has attributes such as a name and an address.
Table 2: DBMS and Relational DBMS Terms in Comparison
Common Term | DBMS Terminology | RDBMS Terminology
Database | Table | Database
Table | Table | Relation
Column | Field | Attribute
Row | Record | Tuple
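The terms in Table 2 can be seen side by side in a running database. This sketch assumes Python's sqlite3 module and a hypothetical customer relation: the column names reported for a query are the attributes (fields), and each returned row is a tuple (record).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation (table) named customer with attributes customer_no, name and address.
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ngozi Obi', '5 Marina, Lagos')")
conn.execute("INSERT INTO customer VALUES (2, 'Yusuf Bala', '12 Zoo Road, Kano')")

cur = conn.execute("SELECT * FROM customer ORDER BY customer_no")
attributes = [col[0] for col in cur.description]  # column / field / attribute names
tuples = cur.fetchall()                           # rows / records / tuples

print(attributes)  # ['customer_no', 'name', 'address']
print(tuples[0])   # (1, 'Ngozi Obi', '5 Marina, Lagos')
```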
3.2 History
The earliest known use of the term database was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product. While IMS and the CODASYL IDMS were the big, high-visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today. The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL products (IDMS) and hierarchical model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980.
During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data.
In the 2000s, the fashionable area for innovation has been the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products.
3.3 Database Types
Developments in information technology and business applications have resulted in the evolution of several major types of databases. Figure 1 illustrates several major conceptual categories of databases that may be found in many organizations.
Operational Databases
These databases store detailed data needed to support the business processes and operations of the e-business enterprise. They are also called subject area databases (SADB), transaction databases and production databases. Examples are a customer database, human resources databases, inventory databases, and other databases containing data generated by business operations. This includes databases on Internet and e-commerce activity, such as click stream data describing the online behaviour of customers or visitors to a company website.
Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on any other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security. Ensuring that all of the data in an organization's distributed databases are consistently and currently updated is a major challenge of distributed database management.
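[Figure 1: Examples of the major types of databases used by organizations and end users. Components shown: Client PC or NC; End User Databases; Network Server; Operational Databases of the Organization; Distributed Databases on Intranets and other Networks; External Databases on the Internet and online services; Data Warehouse; Data Marts.]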
External Databases
Access to a wealth of information from external databases is available for a fee from conventional online services, and with or without charges from many sources on the Internet, especially the World Wide Web. Websites provide an endless variety of hyperlinked pages of multimedia documents in hypermedia databases for you to access. Data are available in the form of statistics on economic and demographic activity from statistical data banks. Or you can view or download abstracts or complete copies of newspapers, magazines, newsletters, research papers, and other periodicals and published material from bibliographic and full text databases.
3.4 Database Storage Structures
Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages, discussed in this section. The most commonly used are B+ trees and ISAM.
Methods
Flat Files
A flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file.
A flat file is a file that contains records, and in which each record is specified in a single line. Fields from each record may simply have a fixed width with padding, or may be delimited by whitespace, tabs, commas (CSV) or other characters. Extra formatting may be needed to avoid delimiter collision. There are no structural relationships. The data are "flat" as in a sheet of paper, in contrast to more complex models such as a relational database.
The classic example of a flat file database is a basic name-and-address list, where the database consists of a small, fixed number of fields: Name, Address, and Phone Number. Another example is a simple HTML table, consisting of rows and columns. This type of database is routinely encountered, although often not expressly recognized as a database.
Implementation: It is possible to write out by hand, on a sheet of paper, a list of names, addresses, and phone numbers; this is a flat file database. This can also be done with any typewriter or word processor. But many pieces of computer software are designed to implement flat file databases.
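As a minimal sketch, assuming a comma-separated (CSV) layout, the name-and-address list above could be kept as a flat file and read back with Python's standard csv module. The names, addresses and phone numbers are invented for illustration. Quoting the Address field avoids delimiter collision with the comma inside it, and, with no structure or index, a lookup is a linear scan.

```python
import csv
import io

# A hypothetical name-and-address list kept as a CSV flat file:
# one record per line, fields separated by commas.
FLAT_FILE = """Name,Address,Phone Number
Ada Obi,"12 Marina Road, Lagos",0801-000-0001
Bola Ade,4 Wuse Street,0802-000-0002
"""

def read_records(text):
    """Parse each line into a dict keyed by the header line's field names."""
    return list(csv.DictReader(io.StringIO(text)))

def find_by_name(records, name):
    """No index and no structure: lookup is a linear scan over every record."""
    return [r for r in records if r["Name"] == name]

records = read_records(FLAT_FILE)
print(len(records))                                          # 2
print(records[0]["Address"])                                 # 12 Marina Road, Lagos
print(find_by_name(records, "Bola Ade")[0]["Phone Number"])  # 0802-000-0002
```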
Unordered storage typically stores the records in the order they are inserted. This gives good insertion efficiency, and although it may seem that retrieval would then be inefficient, this is usually not the case, because most databases use indexes on the primary keys, resulting in efficient retrieval times.
Ordered or linked list storage typically stores the records in order and may have to rearrange records or increase the file size when a record is inserted, which is very inefficient. However, it is better for retrieval, because the records are pre-sorted and can be binary-searched in O(log n) time (binary search needs random access to the records, which a plain linked list does not provide).
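The trade-off can be sketched in Python, assuming records are (key, value) pairs held in memory rather than in disk pages; the class names are hypothetical. The unordered file appends in insertion order and must search linearly, while the ordered file pays for each insertion by shifting later records but can binary-search with the standard bisect module.

```python
import bisect

class UnorderedFile:
    """Records kept in insertion order: cheap append, linear-time search."""

    def __init__(self):
        self.records = []

    def insert(self, key, value):
        self.records.append((key, value))       # no rearranging needed

    def search(self, key):
        for k, v in self.records:               # O(n): examine records one by one
            if k == key:
                return v
        return None

class OrderedFile:
    """Records kept sorted by key: costly insert, O(log n) binary search."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self.keys, key)  # find the slot ...
        self.keys.insert(i, key)                # ... then shift every later record
        self.values.insert(i, value)

    def search(self, key):
        i = bisect.bisect_left(self.keys, key)  # binary search over sorted keys
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

u, o = UnorderedFile(), OrderedFile()
for key in [42, 7, 19, 3]:
    u.insert(key, f"record {key}")
    o.insert(key, f"record {key}")
print(o.keys)          # [3, 7, 19, 42]
print(o.search(19))    # record 19
```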
Structured Files
Heaps
- simplest and most basic method
- insert efficient: records are added at the end of the file, in 'chronological' order
- retrieval inefficient, as searching has to be linear
- deletion: deleted records are marked
- requires periodic reorganization if the file is very volatile
advantages
- good for bulk loading data
- good for relatively small relations, as indexing overheads are avoided
- good when retrievals involve a large proportion of records
disadvantages
- not efficient for selective retrieval using key values, especially if large
- sorting may be time-consuming
- not suitable for 'volatile' tables
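A minimal sketch of the heap method, assuming records are Python dicts and the "file" is an in-memory list of slots (HeapFile is a hypothetical name). Inserts go to the end, retrieval scans every slot, deletion only marks a slot, and reorganize() performs the periodic compaction.

```python
class HeapFile:
    """A toy heap file: an unordered list of slots, new records at the end."""

    def __init__(self):
        self.slots = []                  # a record dict, or None once deleted

    def insert(self, record):
        self.slots.append(record)        # efficient: always append at the end
        return len(self.slots) - 1       # slot number reflects arrival order

    def scan(self, predicate):
        # Retrieval is linear: every live slot must be examined.
        return [r for r in self.slots if r is not None and predicate(r)]

    def delete(self, predicate):
        # Deletion only marks matching slots; their space is not reclaimed yet.
        marked = 0
        for i, r in enumerate(self.slots):
            if r is not None and predicate(r):
                self.slots[i] = None
                marked += 1
        return marked

    def reorganize(self):
        # Periodic reorganization compacts the file by dropping marked slots.
        self.slots = [r for r in self.slots if r is not None]

heap = HeapFile()
for name in ["Ada", "Bola", "Chi"]:
    heap.insert({"name": name})
heap.delete(lambda r: r["name"] == "Bola")
print(len(heap.slots))                                 # 3: the slot is still occupied
heap.reorganize()
print([r["name"] for r in heap.scan(lambda r: True)])  # ['Ada', 'Chi']
```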
Hash Buckets
Hash functions calculate the address of the page in which the record is to be stored, based on one or more fields in the record.
- hashing functions are chosen to ensure that addresses are spread evenly across the address space
- 'occupancy' is generally 40% to 60% of total file size
- a unique address is not guaranteed, so collision detection and collision resolution mechanisms are required:
  open addressing
  chained/unchained overflow
pros and cons
- efficient for exact matches on the key field
- not suitable for range retrieval, which requires sequential storage
- calculates where the record is stored based on fields in the record
- hash functions ensure an even spread of data
- collisions are possible, so collision detection and resolution are required
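The following sketch shows a hash file with chained overflow. It assumes integer keys, eight buckets and pages holding two records each, all of them illustrative parameters. The hash function maps the key field to a bucket address, and when a bucket's last page is full, a new overflow page is chained after it. Python's built-in hash() stands in for the hashing function. A range query would have to read every bucket, which is why hashing does not suit range retrieval.

```python
class HashFile:
    """A toy hash file: buckets of fixed-size pages with chained overflow."""

    def __init__(self, num_buckets=8, page_capacity=2):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # Each bucket is a chain of pages; pages after the first are overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _address(self, key):
        # The hash function turns the key field into a bucket (page) address.
        return hash(key) % self.num_buckets

    def insert(self, key, record):
        pages = self.buckets[self._address(key)]
        if len(pages[-1]) >= self.page_capacity:   # last page full: collision
            pages.append([])                       # resolve by chaining an overflow page
        pages[-1].append((key, record))

    def lookup(self, key):
        # Exact-match lookup reads only one bucket and its overflow chain.
        for page in self.buckets[self._address(key)]:
            for k, record in page:
                if k == key:
                    return record
        return None

hf = HashFile()
for key in range(20):
    hf.insert(key, {"id": key})
print(hf.lookup(13))          # {'id': 13}
print(len(hf.buckets[0]))     # 2: keys 0, 8 and 16 needed an overflow page
```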
B+ Trees
These are the most used in practice.
- the time taken to access any tuple is the same, because the same number of nodes is searched
- the index is a full index, so the data file does not have to be ordered
Pros and cons
- versatile data structure: sequential as well as random access
- access is fast
- supports exact, range, part key and pattern matches efficiently
- 'volatile' files are handled efficiently because the index is dynamic: it expands and contracts as the table grows and shrinks
- less well suited to relatively stable files; in this case, ISAM is more efficient
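Here is a simplified in-memory B+ tree in Python to make the points above concrete. It is a sketch under stated assumptions, not a disk-based implementation: nodes hold at most max_keys keys (3 here, an arbitrary choice), deletion is omitted, and leaves are linked left to right so that a range query can continue sequentially after one random-access descent. The tree only grows at the root, so all leaves stay at the same depth, which is why every lookup visits the same number of nodes.

```python
import bisect

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal node) or record values (leaf)
        self.next = None     # link to the right-hand sibling leaf

class BPlusTree:
    """Simplified in-memory B+ tree: insert, exact search and range search."""

    def __init__(self, max_keys=3):
        self.max_keys = max_keys
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node = self.root
        while not node.leaf:   # one node per level, so every lookup has the same cost
            node = node.children[bisect.bisect_right(node.keys, key)]
        return node

    def search(self, key):
        leaf = self._find_leaf(key)
        i = bisect.bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.children[i]
        return None

    def range_search(self, low, high):
        """All (key, value) pairs with low <= key <= high, in key order."""
        leaf, out = self._find_leaf(low), []
        while leaf is not None:              # walk the leaf chain sequentially
            for k, v in zip(leaf.keys, leaf.children):
                if k > high:
                    return out
                if k >= low:
                    out.append((k, v))
            leaf = leaf.next
        return out

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split is not None:                # the root itself split: grow a new root
            sep, right = split
            root = Node(leaf=False)
            root.keys, root.children = [sep], [self.root, right]
            self.root = root

    def _insert(self, node, key, value):
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.children[i] = value     # existing key: overwrite the value
                return None
            node.keys.insert(i, key)
            node.children.insert(i, value)
        else:
            i = bisect.bisect_right(node.keys, key)
            split = self._insert(node.children[i], key, value)
            if split is None:
                return None
            sep, right = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        return self._split(node) if len(node.keys) > self.max_keys else None

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:                        # leaf split: the separator is copied up
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.children, node.children = node.children[mid:], node.children[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        sep = node.keys[mid]                 # internal split: the separator moves up
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return sep, right

tree = BPlusTree()
for k in [(i * 37) % 100 for i in range(100)]:   # keys 0..99 in scrambled order
    tree.insert(k, f"row {k}")
print(tree.search(42))                            # row 42
print([k for k, _ in tree.range_search(10, 15)])  # [10, 11, 12, 13, 14, 15]
```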
3.5 Database Servers
A database server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model. The term may also refer to a computer dedicated to running such a program. Database management systems frequently provide database server functionality, and some DBMSs (e.g., MySQL) rely exclusively on the client-server model for database access.
In a master-slave model, database master servers are central and primary locations of data, while database slave servers are synchronized backups of the master acting as proxies.
3.6 Database Replication
Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.
Multi-master replication, where updates can be submitted to any database node and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity, which may make it impractical in some situations. The most common challenge in multi-master replication is transactional conflict prevention or resolution. Most synchronous (eager) replication solutions do conflict prevention, while asynchronous (lazy) solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization.
Database replication becomes difficult when it scales up. Usually, the scale-up goes along two dimensions, horizontal and vertical: horizontal scale-up has more data replicas, while vertical scale-up has data replicas located farther away in distance. Problems raised by horizontal scale-up can be alleviated by a multi-layer multi-view access protocol. Vertical scale-up runs into less trouble as Internet reliability and performance improve.
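The master/slave scheme described at the start of this section can be sketched in Python, assuming an in-memory key-value store. Master, Slave and the link_up callback, which simulates an unreachable slave, are hypothetical names. Each slave applies log entries strictly in order and acknowledges the next sequence number it expects, so the master keeps re-sending from the last acknowledgement until every entry has been applied. Multi-master conflict handling is deliberately left out.

```python
class Slave:
    """A replica that applies the master's log entries strictly in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0                # how many log entries have been applied

    def receive(self, seq, key, value):
        if seq == self.applied:         # the next expected entry: apply it
            self.data[key] = value
            self.applied += 1
        # Acknowledge with the next sequence number expected; a duplicate or
        # out-of-order entry is ignored and the master re-sends from here.
        return self.applied

class Master:
    """Logs every update and ripples the log through to its slaves."""

    def __init__(self, slaves):
        self.data = {}
        self.log = []                   # ordered log of (key, value) updates
        self.slaves = slaves
        self.acked = [0] * len(slaves)  # last acknowledgement from each slave

    def update(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self, link_up=lambda i: True):
        for i, slave in enumerate(self.slaves):
            # Send unacknowledged entries while the link to slave i is up;
            # anything not yet acknowledged is simply sent again next round.
            while self.acked[i] < len(self.log) and link_up(i):
                seq = self.acked[i]
                key, value = self.log[seq]
                self.acked[i] = slave.receive(seq, key, value)

s1, s2 = Slave(), Slave()
master = Master([s1, s2])
master.update("x", 1)
master.update("y", 2)
master.replicate(link_up=lambda i: i == 0)   # slave 2 is unreachable this round
print(s2.data)                               # {}
master.replicate()                           # next round: slave 2 catches up
print(s2.data)                               # {'x': 1, 'y': 2}
```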
3.7 Relational Database
A relational database is a database that conforms to the relational model, and refers to a database's data and schema (the database's structure of how those data are arranged). The term "relational database" is sometimes informally used to refer to a relational database management system, which is the software that is used to create and use a relational database.
The term relational database was originally defined and coined by Edgar Codd at IBM's San Jose Research Laboratory (the forerunner of the IBM Almaden Research Center) in 1970.
Strictly, a relational database is a collection of relations (frequently called tables). Other items are frequently considered part of the database, as they help to organize and structure the data, in addition to forcing the database to conform to a set of requirements.
Terminology
Relational database theory uses a different set of mathematically based terms, which are equivalent, or roughly equivalent, to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.
Relational term | SQL equivalent
relation, base relvar | table
derived relvar | view, query result, result set
tuple | row
attribute | column
Relations or Tables
A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints.
The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.
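As an illustration, the operations named above map onto SQL roughly as follows. This sketch uses Python's built-in sqlite3 module with an in-memory database, and the Student and Dept tables are invented for the example. Relational select corresponds to the SQL WHERE clause (not to the SELECT keyword), project corresponds to the column list, and join corresponds to JOIN. Because tuples have no inherent order, each query asks for ORDER BY explicitly whenever a predictable order is wanted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Dept    (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO Dept VALUES (1, 'Business'), (2, 'Science');
    INSERT INTO Student VALUES (10, 'Ada', 1), (11, 'Bola', 2), (12, 'Chi', 1);
""")

# select (restriction): choose the tuples that satisfy a condition
rows = con.execute(
    "SELECT * FROM Student WHERE dept_id = 1 ORDER BY student_id").fetchall()

# project: choose attributes; DISTINCT keeps the result a set of tuples
names = con.execute("SELECT DISTINCT name FROM Student ORDER BY name").fetchall()

# join: combine two relations on matching attribute values
joined = con.execute("""
    SELECT s.name, d.dept_name
    FROM Student s JOIN Dept d ON s.dept_id = d.dept_id
    ORDER BY s.name""").fetchall()

# modification: a query (the WHERE clause) identifies the tuples affected
con.execute("INSERT INTO Student VALUES (13, 'Dayo', 2)")
con.execute("UPDATE Student SET dept_id = 1 WHERE name = 'Dayo'")
con.execute("DELETE FROM Student WHERE student_id = 11")
remaining = con.execute("SELECT student_id FROM Student ORDER BY student_id").fetchall()

print(joined)       # [('Ada', 'Business'), ('Bola', 'Science'), ('Chi', 'Business')]
print(remaining)    # [(10,), (12,), (13,)]
```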
Base and Derived Relations
In a relational database, all data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that, though they may grab information from several relations, they act as a single relation. Also, derived relations can be used as an abstraction layer.
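A minimal sketch of a derived relation, again using Python's sqlite3 module with hypothetical Customer and Payment tables. The view CustomerTotal stores no data of its own, combines two base relations, and is queried as if it were a single table. Its result changes as soon as a base relation changes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, cname TEXT);
    CREATE TABLE Payment  (customer_id INTEGER, amount REAL);
    INSERT INTO Customer VALUES (1, 'Ada'), (2, 'Bola');
    INSERT INTO Payment  VALUES (1, 100.0), (1, 50.0), (2, 20.0);
    -- A derived relation: computed from two base relations, stores no rows itself
    CREATE VIEW CustomerTotal AS
        SELECT c.cname, SUM(p.amount) AS total
        FROM Customer c JOIN Payment p ON c.customer_id = p.customer_id
        GROUP BY c.cname;
""")

before = con.execute("SELECT * FROM CustomerTotal ORDER BY cname").fetchall()
con.execute("INSERT INTO Payment VALUES (2, 30.0)")   # change a base relation only
after = con.execute("SELECT * FROM CustomerTotal ORDER BY cname").fetchall()
print(before)   # [('Ada', 150.0), ('Bola', 20.0)]
print(after)    # [('Ada', 150.0), ('Bola', 50.0)]
```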
Keys
A unique key is a kind of constraint that ensures that an object, or critical information about the object, occurs in at most one tuple in a given relation. For example, a school might want each student to have a separate locker. To ensure this, the database designer creates a key on the locker attribute of the student relation. Keys can include more than one attribute; for example, a nation may impose a restriction that no province can have two cities with the same name. The key would include province and city name. This would still allow two different provinces to have a town called Springfield, because their province is different. A key over more than one attribute is called a compound key.
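The province-and-city rule can be declared as a compound unique key. This sketch uses Python's sqlite3 module; the table name City and the province names are hypothetical. The same city name is accepted in two different provinces, but a row repeating an existing (province, city_name) pair is rejected with sqlite3.IntegrityError.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Compound key: the pair (province, city_name) must be unique; neither column is unique alone.
con.execute("""
    CREATE TABLE City (
        province  TEXT NOT NULL,
        city_name TEXT NOT NULL,
        UNIQUE (province, city_name)
    )
""")

def add_city(province, city_name):
    """Insert a city; return False if the compound key would be duplicated."""
    try:
        con.execute("INSERT INTO City VALUES (?, ?)", (province, city_name))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_city("North Province", "Springfield"))   # True
print(add_city("South Province", "Springfield"))   # True: different province
print(add_city("North Province", "Springfield"))   # False: same province and name
```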
Foreign Keys
A foreign key is a reference to a key in another relation, meaning that the referencing tuple has, as one of its attributes, the values of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation. Foreign keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation.
A foreign key could be described formally as: "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."
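The formal condition translates almost word for word into a small Python check, assuming each relation is given as a list of dicts; satisfies_foreign_key and the sample data are hypothetical. The check projects the referenced relation over its key attributes, then requires every referencing tuple, projected over its foreign key attributes, to appear in that projection. NULLs are not modelled here. The sample also shows that foreign key values need not be unique, because customer 1 appears in two orders.

```python
def satisfies_foreign_key(referencing, fk_attrs, referenced, key_attrs):
    """Formal foreign key test on relations given as lists of dicts."""
    # Project the referenced relation over its key attributes.
    referenced_keys = {tuple(t[a] for a in key_attrs) for t in referenced}
    # Every referencing tuple, projected over the foreign key, must match one of them.
    return all(tuple(t[a] for a in fk_attrs) in referenced_keys for t in referencing)

customers = [{"customer_id": 1, "cname": "Ada"}, {"customer_id": 2, "cname": "Bola"}]
orders = [{"order_no": 7, "cust": 1}, {"order_no": 8, "cust": 1}, {"order_no": 9, "cust": 2}]

print(satisfies_foreign_key(orders, ["cust"], customers, ["customer_id"]))   # True
orders.append({"order_no": 10, "cust": 3})                                  # no customer 3
print(satisfies_foreign_key(orders, ["cust"], customers, ["customer_id"]))   # False
```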
4.0 CONCLUSION
Database applications are used to store and manipulate data. A database application can be used in many business functions, including sales and inventory tracking, accounting, employee benefits, payroll, production and more. Database programs for personal computers come in various shapes and sizes. A database remains fundamental to the implementation of any database management system.
5.0 SUMMARY
- A database is a structured collection of data that is managed to meet the needs of a community of users. The structure is achieved by organizing the data according to a database model.
- The earliest known use of the term database was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base.
- Developments in information technology and business applications have resulted in the evolution of several major types of databases.
- Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees.
- A database server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model.
- Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies.
- A relational database is a database that conforms to the relational model, and refers to a database's data and schema.
6.0 TUTOR-MARKED ASSIGNMENT
1. Define the terms: Field, Record, Relation and Attribute.
2. Briefly describe a flat file.
7.0 REFERENCES/FURTHER READINGS
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
O'Brien, James A. (2003). Introduction to Information Systems (11th Edition). McGraw-Hill.
UNIT 3 DATABASE CONCEPTS
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Create, Read, Update and Delete
3.2 ACID
3.3 Keys
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
There are basic and standard concepts associated with all databases, and these are what we will discuss in detail in this unit. These include the concept of Creating, Reading, Updating and Deleting (CRUD) data, ACID (Atomicity, Consistency, Isolation, Durability), and keys of different kinds.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
know the meaning of the acronym CRUD
understand the applications of databases
know the meaning of the acronym ACID and how each member of ACID differs from the others
understand the structure of a database
know the types of keys associated with databases.
3.0 MAIN CONTENT
3.1 Create, Read, Update and Delete
Create, read, update and delete (CRUD) are the four basic functions of persistent storage, a major part of nearly all computer software. Sometimes CRUD is expanded with the word retrieve instead of read, or destroy instead of delete. It is also sometimes used to describe user interface conventions that facilitate viewing, searching, and changing information, often using computer-based forms and reports.
Alternate terms for CRUD (one initialism and three acronyms):
ABCD: add, browse, change, delete
ACID: add, change, inquire, delete (though this can be confused with the transactional use of the acronym ACID)
BREAD: browse, read, edit, add, delete
VADE(R): view, add, delete, edit (and restore, for systems supporting transaction processing)
Database Applications
The acronym CRUD refers to all of the major functions that need to be implemented in a relational database application for it to be considered complete. Each letter in the acronym can be mapped to a standard SQL statement:
Operation | SQL
Create | INSERT
Read (Retrieve) | SELECT
Update | UPDATE
Delete (Destroy) | DELETE
Although a relational database is a common persistence layer in software applications, there are numerous others. CRUD can be implemented with an object database, an XML database, flat text files, custom file formats, tape, or card, for example.
Google Scholar lists the first reference to create-read-update-delete as by Kilov in 1990. The concept seems to be also described in more detail in Kilov's 1998 book.
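A minimal sketch of the four CRUD operations, using Python's sqlite3 module. The Contact table and the create/read/update/delete helper names are hypothetical. Each helper issues exactly one of the four SQL statements listed in the CRUD table, with ? placeholders so that values are passed as parameters rather than pasted into the SQL text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Contact (contact_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

def create(name, phone):                       # C -> INSERT
    cur = con.execute("INSERT INTO Contact (name, phone) VALUES (?, ?)", (name, phone))
    return cur.lastrowid                       # id assigned to the new row

def read(contact_id):                          # R -> SELECT
    return con.execute("SELECT name, phone FROM Contact WHERE contact_id = ?",
                       (contact_id,)).fetchone()

def update(contact_id, phone):                 # U -> UPDATE
    con.execute("UPDATE Contact SET phone = ? WHERE contact_id = ?", (phone, contact_id))

def delete(contact_id):                        # D -> DELETE
    con.execute("DELETE FROM Contact WHERE contact_id = ?", (contact_id,))

cid = create("Ada Obi", "0801-000-0001")
update(cid, "0809-999-0000")
print(read(cid))     # ('Ada Obi', '0809-999-0000')
delete(cid)
print(read(cid))     # None
```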
User Interface
CRUD is also relevant at the user interface level of most applications. For example, in address book software, the basic storage unit is an individual contact entry. As a bare minimum, the software must allow the user to:
Create or add new entries
Read, retrieve, search, or view existing entries
Update or edit existing entries
Delete existing entries
Without at least these four operations, the software cannot be considered complete. Because these operations are so fundamental, they are often documented and described under one comprehensive heading, such as "contact management" or "contact maintenance" (or "document management" in general, depending on the basic storage unit for the particular application).
3.2 ACID
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.
An example of a transaction is a transfer of funds from one account to another, even though it might consist of multiple individual operations (such as debiting one account and crediting another).
Atomicity
Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. For example, the transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited. Atomicity states that database modifications must follow an "all or nothing" rule. Each transaction is said to be "atomic". If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.
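The funds-transfer example can be sketched with Python's sqlite3 module. The Account table and the fail_midway flag, which simulates a failure between the debit and the credit, are assumptions made for illustration. Both updates run inside one explicit transaction, so the ROLLBACK undoes the debit and neither balance changes.

```python
import sqlite3

# isolation_level=None: the module opens no transactions by itself,
# so BEGIN/COMMIT/ROLLBACK below are under explicit control.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE Account (acct TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
con.execute("INSERT INTO Account VALUES ('A', 100), ('B', 0)")

def transfer(src, dst, amount, fail_midway=False):
    con.execute("BEGIN")
    try:
        con.execute("UPDATE Account SET balance = balance - ? WHERE acct = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure after the debit")
        con.execute("UPDATE Account SET balance = balance + ? WHERE acct = ?", (amount, dst))
        con.execute("COMMIT")
    except Exception:
        con.execute("ROLLBACK")     # all or nothing: the debit is undone as well
        raise

def balances():
    return dict(con.execute("SELECT acct, balance FROM Account ORDER BY acct"))

try:
    transfer("A", "B", 30, fail_midway=True)
except RuntimeError:
    pass
print(balances())    # {'A': 100, 'B': 0}
transfer("A", "B", 30)
print(balances())    # {'A': 70, 'B': 30}
```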
Consistency
The consistency property ensures that the database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not).
Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database's consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.
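Here is a sketch of a consistency rule enforced by the DBMS, using Python's sqlite3 module. The rule itself (no balance may go negative) and the table are assumptions for the example. The CHECK constraint makes the over-large transfer fail, and the rollback also discards the credit that had already been applied, so the database returns to its last consistent state.

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)   # explicit BEGIN/COMMIT below
# Consistency rule assumed for this sketch: a balance may never become negative.
con.execute("""CREATE TABLE Account (
                   acct    TEXT PRIMARY KEY,
                   balance INTEGER NOT NULL CHECK (balance >= 0))""")
con.execute("INSERT INTO Account VALUES ('A', 100), ('B', 0)")

def transfer(src, dst, amount):
    """Move money between accounts; return False if the rule would be broken."""
    con.execute("BEGIN")
    try:
        # Credit first, so the rollback visibly has something to undo.
        con.execute("UPDATE Account SET balance = balance + ? WHERE acct = ?", (amount, dst))
        con.execute("UPDATE Account SET balance = balance - ? WHERE acct = ?", (amount, src))
        con.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:     # raised when the CHECK constraint fails
        con.execute("ROLLBACK")        # back to the last consistent state
        return False

print(transfer("A", "B", 500))   # False: A would drop to -400
print(transfer("A", "B", 40))    # True
```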
Durability
Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a log that can be played back to recreate the system state right before the failure. A transaction can only be deemed committed after it is safely in the log.
Implementation
Implementing the ACID properties correctly is not simple. Processing a transaction often requires a number of small changes to be made, including updating indices that are used by the system to speed up searches. This sequence of operations is subject to failure for a number of reasons; for instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time.
ACID suggests that the database be able to perform all of these operations at once. In fact this is difficult to arrange. There are two popular families of techniques: write-ahead logging and shadow paging. In both cases, locks must be acquired on all information that is updated and, depending on the implementation, on all data that is being read. In write-ahead logging, atomicity is guaranteed by ensuring that information about all changes is written to a log before it is written to the database. That allows the database to return to a consistent state in the event of a crash. In shadow paging, updates are applied to a copy of the database, and the new copy is activated when the transaction commits. The copy refers to unchanged parts of the old version of the database, rather than being an entire duplicate.
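The write-ahead rule can be sketched in Python as a toy key-value store. This is a simplified, redo-only illustration under stated assumptions, not how any particular DBMS implements logging: Python lists and dicts stand in for the log file and the database pages, and recovery rebuilds the data purely from the log. Each change is logged before it is applied, and after a crash only transactions whose COMMIT record reached the log are redone.

```python
class WALDatabase:
    """Toy key-value store with write-ahead logging (in-memory stand-ins for disk)."""

    def __init__(self):
        self.log = []      # stands in for the durable log file
        self.data = {}     # stands in for the database pages

    def run(self, txid, writes, crash_before_apply=False):
        # 1. Write every change, then a COMMIT record, to the log first.
        for key, value in writes.items():
            self.log.append(("WRITE", txid, key, value))
        self.log.append(("COMMIT", txid))
        # The transaction counts as committed once its COMMIT record is logged.
        if crash_before_apply:
            return
        # 2. Only then apply the changes to the database itself.
        self.data.update(writes)

    def recover(self):
        # Redo only transactions whose COMMIT record reached the log.
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        self.data = {}
        for rec in self.log:
            if rec[0] == "WRITE" and rec[1] in committed:
                self.data[rec[2]] = rec[3]

db = WALDatabase()
db.run(1, {"x": 1})
db.run(2, {"y": 2}, crash_before_apply=True)   # committed in the log, never applied
db.log.append(("WRITE", 3, "z", 3))            # crash while logging tx 3: no COMMIT
db.recover()
print(db.data)    # {'x': 1, 'y': 2}
```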
Until recently almost all databases relied upon locking to provide ACID capabilities. This means that a lock must always be acquired before processing data in a database, even on read operations. Maintaining a large number of locks, however, results in substantial overhead as well as hurting concurrency. If user A is running a transaction that has read a row of data that user B wants to modify, for example, user B must wait until user A's transaction is finished.
An alternative to locking is multiversion concurrency control, in which the database maintains separate copies of any data that is modified. This allows users to read data without acquiring any locks. Going back to the example of user A and user B, when user A's transaction gets to data that user B has modified, the database is able to retrieve the exact version of that data that existed when user A started their transaction. This ensures that user A gets a consistent view of the database even if other users are changing data that user A needs to read. A natural implementation of this idea results in a relaxation of the isolation property, namely snapshot isolation.
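A minimal sketch of the multiversion idea in Python, assuming a single-threaded in-memory store with logical timestamps. MVCCStore and its methods are hypothetical names. Writers append new versions instead of overwriting, and a reader holding an older snapshot keeps seeing the version that existed when its transaction started, without taking any locks. Write-write conflict detection, which snapshot isolation also needs, is left out.

```python
class MVCCStore:
    """Multiversion store: writers add versions, readers use snapshots, no locks."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_timestamp, value), oldest first
        self.now = 0         # logical clock, advanced by every commit

    def begin(self):
        """Start a reading transaction; its snapshot is the current timestamp."""
        return self.now

    def read(self, snapshot, key):
        """Return the newest version committed at or before the snapshot."""
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot:
                return value
        return None

    def commit_write(self, key, value):
        """Commit a new version of `key` instead of overwriting the old one."""
        self.now += 1
        self.versions.setdefault(key, []).append((self.now, value))

store = MVCCStore()
store.commit_write("row", "original")
snap_a = store.begin()                     # user A starts a transaction
store.commit_write("row", "changed by B")  # user B modifies the row afterwards
print(store.read(snap_a, "row"))           # original
print(store.read(store.begin(), "row"))    # changed by B
```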
It is difficult to guarantee ACID properties in a network environment. Network connections might fail, or two users might want to use the same part of the database at the same time.
Two-phase commit is typically applied in distributed transactions to ensure that each participant in the transaction agrees on whether the transaction should be committed or not.
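The two phases can be sketched in Python, assuming in-memory participants and a coordinator that never fails. Participant and two_phase_commit are hypothetical names. In the first phase every participant votes. The global decision is commit only if all vote yes, and in the second phase every participant applies that same decision. Real protocols also write log records and handle timeouts and coordinator crashes, which are omitted here.

```python
class Participant:
    """One resource manager (for example, one database) in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # stands in for "my local work succeeded"
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A "yes" is a promise to commit if the coordinator says so.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    decision = all(votes)
    for p in participants:                         # phase 2: broadcast the decision
        p.finish(decision)
    return decision

good = [Participant("db-1"), Participant("db-2")]
print(two_phase_commit(good))                      # True
mixed = [Participant("db-1"), Participant("db-2", can_commit=False)]
print(two_phase_commit(mixed))                     # False
print([p.state for p in mixed])                    # ['aborted', 'aborted']
```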
Care must be taken when running transactions in parallel. Two-phase locking is typically applied to guarantee full isolation.
3.3 Keys
3.3.1 Foreign Key
In the context of relational databases, a foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The referenced columns must form the primary key or another candidate key of the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except potentially NULL). In this way, references can be made to link information together, and this is an essential part of database normalization. Multiple rows in the referencing table may refer to the same row in the referenced table. Most of the time, this reflects a one-to-many relationship: one row in the master (referenced) table to many rows in the child (referencing) table.
The referencing and referenced table may be the same table, i.e. the foreign key refers back to the same table. Such a foreign key is known in SQL:2003 as a self-referencing or recursive foreign key.
A table may have multiple foreign keys, and each foreign key can have a different referenced table. Each foreign key is enforced independently by the database system. Therefore, cascading relationships between tables can be established using foreign keys.
Improper foreign key/primary key relationships, or not enforcing those relationships, are often the source of many database and data modeling problems.
Referential Actions
Because the DBMS enforces referential constraints, it must ensure data integrity if rows in a referenced table are to be deleted (or updated). If dependent rows in referencing tables still exist, those references have to be considered. SQL:2003 specifies 5 different referential actions that shall take place in such occurrences:
CASCADE
RESTRICT
NO ACTION
SET NULL
SET DEFAULT
CASCADE
Whenever rows in the master (referenced) table are deleted, the respective rows of the child (referencing) table with a matching foreign key column will get deleted as well. A foreign key with a cascade delete means that if a record in the parent table is deleted, then the corresponding records in the child table will automatically be deleted. This is called a cascade delete.
Example tables: Customer(customer_id, cname, caddress) and Order(customer_id, products, payment).
Customer is the master table and Order is the child table, where 'customer_id' is the foreign key in Order and represents the customer who placed the order. When a row of Customer is deleted, any Order row matching the deleted Customer's customer_id will also be deleted. In other words, deleting one row in the parent table automatically deletes every matching row in the child table.
RESTRICT
A row in the referenced table cannot be updated or deleted if dependent rows still exist. In that case, no data change is even attempted.
NO ACTION
The UPDATE or DELETE SQL statement is executed on the referenced table. At the end of the statement's execution, the DBMS verifies that none of the referential relationships are violated. The major difference from RESTRICT is that triggers or the statement semantics itself may give a result in which no foreign key relationship is violated; in that case, the statement can be executed successfully.
SET NULL
The foreign key values in the referencing row are set to NULL when the referenced row is updated or deleted. This is only possible if the respective columns in the referencing table are nullable. Due to the semantics of NULL, a referencing row with NULLs in the foreign key columns does not require a referenced row.
SET DEFAULT
Similarly to SET NULL, the foreign key values in the referencing row are set to the column default when the referenced row is updated or deleted.
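The CASCADE example above, together with SET NULL and RESTRICT, can be tried with Python's sqlite3 module. Customer and Order follow the example tables (Order is quoted because ORDER is an SQL keyword). Ticket and Loan are hypothetical extra child tables, added only to show the other two actions. SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
con.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, cname TEXT, caddress TEXT);
    -- "Order" is quoted because ORDER is an SQL keyword.
    CREATE TABLE "Order" (
        customer_id INTEGER REFERENCES Customer(customer_id) ON DELETE CASCADE,
        products    TEXT,
        payment     REAL);
    -- Hypothetical child table: rows survive, but lose their reference.
    CREATE TABLE Ticket (
        ticket_no   INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES Customer(customer_id) ON DELETE SET NULL);
    -- Hypothetical child table: blocks deletion of a customer it refers to.
    CREATE TABLE Loan (
        loan_no     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES Customer(customer_id) ON DELETE RESTRICT);
    INSERT INTO Customer VALUES (1, 'Ada', 'Lagos'), (2, 'Bola', 'Abuja');
    INSERT INTO "Order" VALUES (1, 'laptop', 500.0), (1, 'mouse', 20.0), (2, 'desk', 80.0);
    INSERT INTO Ticket VALUES (100, 1);
    INSERT INTO Loan VALUES (200, 2);
""")

# Deleting customer 1: CASCADE removes their orders, SET NULL clears the ticket.
con.execute("DELETE FROM Customer WHERE customer_id = 1")
orders = con.execute('SELECT customer_id, products FROM "Order"').fetchall()
tickets = con.execute("SELECT ticket_no, customer_id FROM Ticket").fetchall()
print(orders)      # [(2, 'desk')]
print(tickets)     # [(100, None)]

# Deleting customer 2 is refused because a Loan row still depends on it (RESTRICT).
try:
    con.execute("DELETE FROM Customer WHERE customer_id = 2")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True
print(restricted)  # True
```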
3.3.2 Candidate Key
In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar such that (1) at all times it holds in the relation assigned to that variable that there are no two distinct tuples with the same values for these attributes, and (2) there is not a proper subset of this set of attributes for which (1) holds.
Since a superkey is defined as a set of attributes for which (1) holds, we can also define a candidate key as a minimal superkey, i.e. a superkey of which no proper subset is also a superkey.
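The two definitions can be turned into a small Python search over one sample instance, assuming the rows are distinct dicts; is_superkey, candidate_keys and the Employee data are hypothetical. An attribute set is treated as a superkey when its projection has as many distinct values as there are rows. A set is kept as a candidate key only if no smaller key already found is a subset of it. One instance can only show that a set is unique so far. It cannot prove the "at all times" condition of the definition.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if, in this instance, no two distinct rows agree on all of `attrs`."""
    projected = {tuple(row[a] for a in attrs) for row in rows}
    return len(projected) == len(rows)      # projection keeps the cardinality

def candidate_keys(rows, attributes):
    """Minimal superkeys of the instance, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue                    # contains a key already: not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

employees = [   # a hypothetical instance of Employee(Name, Address, Dept)
    {"Name": "Ada",  "Address": "Lagos", "Dept": "Sales"},
    {"Name": "Ada",  "Address": "Abuja", "Dept": "Sales"},
    {"Name": "Bola", "Address": "Lagos", "Dept": "Audit"},
    {"Name": "Chi",  "Address": "Lagos", "Dept": "Sales"},
]
print(candidate_keys(employees, ["Name", "Address", "Dept"]))   # [('Name', 'Address')]
```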
The importance of candidate keys is that they tell us how we can identify individual tuples in a relation. As such they are one of the most important types of database constraint that should be specified when designing a database schema. Since a relation is a set (no duplicate elements), it holds that every relation will have at least one candidate key (because the entire heading is always a superkey). Since in some RDBMSs tables may also represent multisets (which strictly means these DBMSs are not relational), it is an important design rule to specify explicitly at least one candidate key for each relation. For practical reasons, RDBMSs usually require that for each relation one of its candidate keys is declared as the primary key, which means that it is considered the preferred way to identify individual tuples. Foreign keys, for example, are usually required to reference such a primary key and not any of the other candidate keys.
Determining Candidate Keys
The definition above explains what a candidate key is, but not how candidate keys are determined in practice. Since most relations have a large number or even infinitely many instances, it would be impossible to determine all the sets of attributes with the uniqueness property for each instance. Instead, it is easier to consider the sets of real-world entities that are represented by the relation and determine which attributes of the entities uniquely identify them. For example, a relation Employee(Name, Address, Dept) probably represents employees, and these are likely to be uniquely identified by a combination of Name and Address, which is therefore a superkey; unless the same holds for only Name or only Address, this combination is also a candidate key.
In order to determine the candidate keys correctly, it is important to determine all superkeys, which is especially difficult if the relation represents a set of relationships rather than a set of entities.
3.3.3 Unique Key
In relational database design, a unique key or primary key is a candidate key used to uniquely identify each row in a table. A unique key or primary key comprises a single column or a set of columns. No two distinct rows in a table can have the same value (or combination of values) in those columns. Depending on its design, a table may have arbitrarily many unique keys but at most one primary key.
A unique key must uniquely identify all possible rows that could exist in a table, not only the rows that currently exist. Examples of unique keys are Social Security numbers (associated with a specific person) and ISBNs (associated with a specific book). Telephone books and dictionaries cannot use names, words or Dewey Decimal system numbers as candidate keys because these do not uniquely identify telephone numbers or words.
A primary key is a special case of a unique key. The major difference is that the implicit NOT NULL constraint is automatically enforced for primary keys but not for unique keys. Thus, the values in a unique key column may or may not be NULL. Another difference is that primary keys are declared with their own syntax (PRIMARY KEY rather than UNIQUE in SQL).
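The NULL difference between unique and primary keys can be observed with a small sqlite3 sketch. The book table, its columns and the sample values are invented for illustration. One caveat applies: SQLite, unlike the SQL standard, does not imply NOT NULL for every PRIMARY KEY column, so the sketch declares it explicitly. In a standard-conforming DBMS, the PRIMARY KEY declaration alone would be enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book ("
    " catalogue_no TEXT PRIMARY KEY NOT NULL,"  # primary key: never NULL, never repeated
    " isbn TEXT UNIQUE,"                        # unique key: never repeated, NULL allowed
    " title TEXT)"
)


def attempt(row):
    """Try to insert one (catalogue_no, isbn, title) row and report the outcome."""
    try:
        conn.execute("INSERT INTO book VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    attempt(("C1", None, "Draft A")),     # NULL in the unique key
    attempt(("C2", None, "Draft B")),     # a second NULL is still allowed
    attempt(("C3", "ISBN-X", "Book X")),
    attempt(("C4", "ISBN-X", "Book Y")),  # duplicate unique key value
    attempt((None, "ISBN-Z", "Book Z")),  # NULL primary key
    attempt(("C1", "ISBN-W", "Book W")),  # duplicate primary key value
]
print(outcomes)
# ['accepted', 'accepted', 'accepted', 'rejected', 'rejected', 'rejected']
```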
The relational model, as expressed through relational calculus and relational algebra, does not distinguish between primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the application programmer.
Unique keys as well as primary keys can be referenced by foreign keys.
3.3.4 Superkey
In the relational model of database organization, a superkey is defined as a set of attributes of a relation variable (relvar) for which the following holds: in all relations assigned to that variable, no two distinct tuples (rows) have the same values for the attributes in this set. Equivalently, a superkey can be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent. Note that if attribute set K is a superkey of relvar R, then at all times the projection of R over K has the same cardinality as R itself.
Informally, a superkey is a set of columns within a table whose values can be used to uniquely identify a row. A candidate key is a minimal set of columns necessary to identify a row; this is also called a minimal superkey. For example, consider an employee table consisting of the columns employeeID, name, job and departmentID. We could use employeeID in combination with any or all of the other columns of this table to uniquely identify a row. Examples of superkeys in this table would be {employeeID, name}, {employeeID, name, job} and {employeeID, name, job, departmentID}.
In a real database, we don't need values for all of those columns to identify a row. In our example, we only need the set {employeeID}. This is a minimal superkey, that is, a minimal set of columns that can be used to identify a single row. So employeeID is a candidate key.
Example
English Monarchs
Monarch Name    Monarch Number    Royal House
Edward          II                Plantagenet
Edward          III               Plantagenet
Richard         II                Plantagenet
Henry           IV                Lancaster
In this example, the possible superkeys are:
{Monarch Name, Monarch Number}
{Monarch Name, Monarch Number, Royal House}
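The superkey definition suggests a mechanical check: project the table onto a set of attributes and see whether any rows collapse together. The sketch below applies that projection test to every attribute combination of the English Monarchs table and keeps the minimal combinations as candidate keys. The function names are hypothetical. Bear in mind the caution given under Determining Candidate Keys: a sample instance can prove that a set is not a key, but it cannot prove that the set will stay unique for all future rows.

```python
from itertools import combinations

COLUMNS = ("Monarch Name", "Monarch Number", "Royal House")
ROWS = [
    ("Edward", "II", "Plantagenet"),
    ("Edward", "III", "Plantagenet"),
    ("Richard", "II", "Plantagenet"),
    ("Henry", "IV", "Lancaster"),
]


def is_superkey(rows, attrs):
    """Projection test: no rows collapse when the instance is projected onto attrs."""
    idx = [COLUMNS.index(a) for a in attrs]
    projection = {tuple(row[i] for i in idx) for row in rows}
    return len(projection) == len(set(rows))


def superkeys(rows):
    """Every attribute set that behaves as a superkey on this instance."""
    found = []
    for size in range(1, len(COLUMNS) + 1):
        for attrs in combinations(COLUMNS, size):
            if is_superkey(rows, attrs):
                found.append(frozenset(attrs))
    return found


def candidate_keys(rows):
    """Minimal superkeys: no proper subset is itself a superkey."""
    keys = superkeys(rows)
    return [k for k in keys if not any(other < k for other in keys)]


print(superkeys(ROWS))
print(candidate_keys(ROWS))
```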
3.3.5 Surrogate Key
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.
Definition
There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate (2):
Surrogate (1)
This definition is based on the one given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application.
Surrogate (2)
This definition is based on the one given by Wieringa and de Jung (1991). Here a surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application.
We shall adopt the surrogate (1) definition throughout this unit, largely because it is oriented more towards the data model than towards the storage model. See Date (1998).
An important distinction exists between a surrogate and a primary key, depending on whether the database is a current database or a temporal database. A current database stores only currently valid data. Therefore, there is a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database. In this case the surrogate may be used as a primary key, which gives rise to the term surrogate key. In a temporal database, however, there is a many-to-one relationship between primary keys and the surrogate. Since several objects in the database may correspond to a single surrogate, we cannot use the surrogate as a primary key. Another attribute is required, in addition to the surrogate, to uniquely identify each object.
Although Hall et al. (1976) say nothing about this, other authors have argued that a surrogate should satisfy the following constraints:
the value is unique system-wide, hence never reused;
the value is system generated;
the value is not manipulable by the user or application;
the value contains no semantic meaning;
the value is not visible to the user or application;
the value is not composed of several values from different domains.
Surrogates in Practice
In a current database, the surrogate key can be the primary key. It is generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible for the surrogate key to exist in addition to the database-generated UUID, for example an HR number for each employee besides the UUID of each employee.
A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server "identity column", a PostgreSQL serial, an Oracle SEQUENCE or a column defined with AUTO_INCREMENT in MySQL), but it does not have to be. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.
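Here is a minimal sketch of a system-generated sequential surrogate key, using SQLite's INTEGER PRIMARY KEY AUTOINCREMENT. This is SQLite's counterpart to the identity, serial, sequence and AUTO_INCREMENT features named above. The employee table, its hr_number column and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# emp_id is a surrogate key: generated by the DBMS and carrying no meaning.
# hr_number is application data that the business may change later.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " hr_number TEXT,"
    " name TEXT NOT NULL)"
)


def hire(hr_number, name):
    """Insert an employee and return the surrogate key the DBMS assigned."""
    cur = conn.execute(
        "INSERT INTO employee (hr_number, name) VALUES (?, ?)", (hr_number, name)
    )
    return cur.lastrowid


first = hire("HR-001", "Ada")
second = hire("HR-002", "Bola")

# Renumbering the HR code changes only application data. The surrogate key,
# and any foreign keys pointing at it, stay the same.
conn.execute("UPDATE employee SET hr_number = 'HR-900' WHERE emp_id = ?", (first,))
print(first, second)  # 1 2
```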
In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, while the surrogate key identifies the unique entity in the modelled world; these two keys are not the same. For example, table Staff may contain two rows for "John Smith". One row covers the period when he was employed between 1990 and 1999, and another covers the period when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows, but the primary key will be unique.
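For the temporal case, one common approach is to make each row unique with a composite primary key made of the surrogate key plus the start of the validity period. This approach is an assumption here and is not prescribed by the text. The staff table, its column names and the surrogate value 7 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# staff_sk identifies the person (surrogate key). The primary key combines it
# with the start of each employment period so that every row is unique.
conn.execute(
    "CREATE TABLE staff ("
    " staff_sk INTEGER NOT NULL,"
    " name TEXT NOT NULL,"
    " employed_from INTEGER NOT NULL,"
    " employed_to INTEGER NOT NULL,"
    " PRIMARY KEY (staff_sk, employed_from))"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(7, "John Smith", 1990, 1999), (7, "John Smith", 2001, 2006)],
)

rows = conn.execute(
    "SELECT staff_sk, employed_from, employed_to FROM staff ORDER BY employed_from"
).fetchall()
print(rows)  # [(7, 1990, 1999), (7, 2001, 2006)]
```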
Some database designers use surrogate keys religiously, regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.
A surrogate may also be called a
surrogate key,
entity identifier,
system-generated key,
database sequence number,
synthetic key,
technical key, or
arbitrary unique identifier.
Some of these terms describe the way new surrogate values are generated rather than the nature of the surrogate concept.
4.0 CONCLUSION
The fundamental concepts that guide the operation of a database, namely CRUD and ACID, remain the same irrespective of the types and models of databases that emerge by the day. However, one cannot rule out the possibility of other concepts emerging in the near future.
5.0 SUMMARY
Create, read, update and delete (CRUD) are the four basic functions of persistent storage, which is a major part of nearly all computer software.
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.
In the context of relational databases, a foreign key is a referential constraint between two tables.
In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar with two properties. First, at all times, no two distinct tuples in the relation assigned to that variable have the same values for these attributes. Second, no proper subset of these attributes has that property.
In relational database design, a unique key or primary key is a candidate key used to uniquely identify each row in a table.
Superkey: In the relational model of database organization, a superkey is defined as a set of attributes of a relation variable (relvar) such that, in all relations assigned to that variable, no two distinct tuples (rows) have the same values for the attributes in this set.
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database.
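Atomicity, the A in ACID, can be seen in a small sqlite3 sketch of a money transfer. The account table, the CHECK constraint and the amounts are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception escapes. A failed transfer therefore leaves no half-finished update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acct TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acct = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("A", "B", 30)       # succeeds
failed = transfer("A", "B", 500)  # debit breaks the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT acct, balance FROM account ORDER BY acct"))
print(ok, failed, balances)  # True False {'A': 70, 'B': 80}
```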
6.0 TUTOR-MARKED ASSIGNMENT
1. What are the meanings of the acronyms CRUD and ACID?
2. What are the constraints associated with surrogate keys?
7.0 REFERENCES/FURTHER READINGS
Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co. ISBN 0-7204-0459-2.
Engles, R.W. (1972). A Tutorial on Data-Base Organization. Annual Review in Automatic Programming, Vol. 7, Part 1, Pergamon Press, Oxford, pp. 1-64.
Langefors, B. (1968). Elementary Files and Elementary File Records. Proceedings of File 68, an IFIP/IAG International Seminar on File Organisation, Amsterdam, November, pp. 89-96.
Wieringa and de Jung (1991). The Identification of Objects and Roles: Object Identifiers Revisited.
Date, C.J. (1998). Relational Database Writings 1994-1997, Chapters 11 and 12.
Carter, Breck. "Intelligent Versus Surrogate Keys". Retrieved on 2006-12-03.
Richardson, Lee. "Create Data Disaster: Avoid Unique Indexes (Mistake 3 of 10)".
Berkus, Josh. "Database Soup: Primary Keyvil, Part I".
Gray, Jim (September 1981). "The Transaction Concept: Virtues and Limitations". Proceedings of the 7th International Conference on Very Large Data Bases, pp. 144-154. 19333 Vallco Parkway, Cupertino CA 95014: Tandem Computers.
Gray, Jim & Reuter, Andreas (1993). Distributed Transaction Processing: Concepts and Techniques. Morgan Kaufmann. ISBN 1558601902.
Date, Christopher (2003). "5: Integrity". An Introduction to Database Systems. Addison-Wesley, pp. 268-276. ISBN 978-0321189561.
UNIT 4 DATABASE MODELS 1
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Hierarchical Model
3.2 Network Model
3.3 Object-Relational Database
3.4 Object Database
3.5 Associative Model of Data
3.6 Column-Oriented DBMS
3.7 Navigational Database
3.8 Distributed Database
3.9 Real Time Database
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
Several models have evolved in the course of the development of databases and database management systems. As a result, users deploy several kinds of models depending on their needs and understanding. In this unit we begin to examine these models closely, and we conclude the discussion in the subsequent unit.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
know and define the different types of database models
differentiate the database models from each other
sketch the framework of the hierarchical and network models
understand the concepts behind each model
know the advantages and disadvantages of the different models.
3.0 MAIN CONTENT
3.1 Hierarchical Model
In a hierarchical model, data is organized into an inverted tree-like structure. Each node has multiple downward links to describe the nesting, and a sort field keeps the records in a particular order within each same-level list. This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files. Each unit in the model is a record, which is also known as a node. In such a model, each record on one level can be related to multiple records on the next lower level. A record that has subsidiary records is called a parent, and the subsidiary records are called children. Data elements in this model are well suited for one-to-many relationships with other data elements in the database.
Figure 1: A Hierarchical Structure
[Figure labels: Department; Project A, Project B; Employee 1, Employee B (each box is a data element)]
This model is advantageous when the data elements are inherently hierarchical. The disadvantage is that, in order to prepare the database, it becomes necessary to identify the requisite groups of files that are to be logically integrated. Hence, a hierarchical data model may not always be flexible enough to accommodate the dynamic needs of an organization.
Example
An example of a hierarchical data model would be an organization that keeps records of employees in a table (entity type) called "Employees". In the table there would be attributes/columns such as First Name, Last Name, Job Name and Wage. The company also keeps data about each employee's children in a separate table called "Children", with attributes such as First Name, Last Name and date of birth. The Employee table represents a parent segment and the Children table represents a child segment. These two segments form a hierarchy in which an employee may have many children, but each child may have only one parent.
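The Employees/Children hierarchy can be sketched with Python dataclasses. Each Child object is stored inside exactly one Employee, which mirrors the rule that a child segment has a single parent. The class names, fields and sample records are hypothetical, and this is an in-memory illustration rather than a hierarchical DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    first_name: str
    last_name: str
    date_of_birth: str


@dataclass
class Employee:
    first_name: str
    last_name: str
    job_name: str
    wage: int
    # Child segments are stored under exactly one parent segment.
    children: list = field(default_factory=list)


employees = [
    Employee("Ada", "Obi", "Accountant", 5000,
             [Child("Chidi", "Obi", "2001-04-02"), Child("Ngozi", "Obi", "2004-09-15")]),
    Employee("Tunde", "Bello", "Clerk", 3000,
             [Child("Kemi", "Bello", "2010-01-20")]),
]


def children_of(last_name):
    """Hierarchical access: find the parent first, then descend to its children."""
    for emp in employees:
        if emp.last_name == last_name:
            return [child.first_name for child in emp.children]
    return []


print(children_of("Obi"))  # ['Chidi', 'Ngozi']
```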
Consider the following structure:
EmpNo    Designation       ReportsTo
10       Director
20       Senior Manager    10
30       Typist            20
40       Programmer        20
Here the "child" is of the same type as the "parent". The hierarchy (EmpNo 10 is the boss of 20, and 30 and 40 each report to 20) is represented by the "ReportsTo" column. In relational database terms, the ReportsTo column is a foreign key referencing the EmpNo column. If the "child" data type were different, it would be in a different table, but there would still be a foreign key referencing the EmpNo column of the employees table.
This simple model is commonly known as the adjacency list model. It was introduced by Dr. Edgar F. Codd after initial criticisms surfaced that the relational model could not model hierarchical data.
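The adjacency list table above can be queried with a recursive common table expression to walk down the hierarchy. The sketch uses SQLite through Python's sqlite3 module. The table name employee is an assumption; the columns and rows follow the EmpNo/Designation/ReportsTo example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " EmpNo INTEGER PRIMARY KEY,"
    " Designation TEXT NOT NULL,"
    " ReportsTo INTEGER REFERENCES employee(EmpNo))"  # self-referencing foreign key
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(10, "Director", None), (20, "Senior Manager", 10),
     (30, "Typist", 20), (40, "Programmer", 20)],
)

# Start at one employee, then repeatedly add everyone who reports to a row found so far.
SUBORDINATES = """
WITH RECURSIVE chain(EmpNo, Designation, depth) AS (
    SELECT EmpNo, Designation, 0 FROM employee WHERE EmpNo = ?
    UNION ALL
    SELECT e.EmpNo, e.Designation, c.depth + 1
    FROM employee AS e JOIN chain AS c ON e.ReportsTo = c.EmpNo
)
SELECT EmpNo, Designation, depth FROM chain ORDER BY depth, EmpNo
"""

rows = conn.execute(SUBORDINATES, (10,)).fetchall()
for emp_no, designation, depth in rows:
    print("  " * depth + f"{emp_no} {designation}")  # indentation shows the level
```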
3.2 Network Model
In the network model, records can participate in any number of named relationships. Each relationship associates a record of one type (called the owner) with multiple records of another type (called the member). These relationships are (somewhat confusingly) called sets. For example, a student might be a member of one set whose owner is the course they are studying, and a member of another set whose owner is the college they belong to. At the same time, the student might be the owner of a set of email addresses, and the owner of another set containing phone numbers.
The main difference between the network model and the hierarchical model is that in a network model a child can have a number of parents, whereas in a hierarchical model a child can have only one parent. The hierarchical model is therefore a special case of the network model.
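Here is a toy sketch of named owner/member sets, using the student example above. The NetworkDB class, the set names (written in a CODASYL-like style) and the records are hypothetical. The point is only that one member record can belong to several sets and therefore have several owners.

```python
from collections import defaultdict


class NetworkDB:
    """Toy store of named sets, each linking one owner record to many members."""

    def __init__(self):
        self.members = defaultdict(list)  # (set name, owner) -> list of members
        self.owner_of = {}                # (set name, member) -> its owner in that set

    def connect(self, set_name, owner, member):
        """Make member part of the set occurrence owned by owner."""
        self.members[(set_name, owner)].append(member)
        self.owner_of[(set_name, member)] = owner

    def owners(self, member):
        """All (set name, owner) pairs for a record: it may have several parents."""
        return sorted((s, o) for (s, m), o in self.owner_of.items() if m == member)


db = NetworkDB()
db.connect("COURSE-STUDENT", "Course MBA 758", "Student Amaka")
db.connect("COLLEGE-STUDENT", "College of Management", "Student Amaka")
db.connect("STUDENT-EMAIL", "Student Amaka", "amaka@example.org")  # student as owner

print(db.owners("Student Amaka"))
# [('COLLEGE-STUDENT', 'College of Management'), ('COURSE-STUDENT', 'Course MBA 758')]
```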
Figure 2: Network Structure
[Figure labels: Department A, Department B; Student A, Student B, Student C; Project A, Project B]


Programmatic access to network databases is traditionally by means of a navigational data manipulation language. In such a language, programmers navigate from a current record to other related records using verbs such as find owner, find next and find prior. The most common example of such an interface is the COBOL-based Data Manipulation Language defined by CODASYL.
Network databases are traditionally implemented by using chains of pointers between related records. These pointers can be node numbers or disk addresses.
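The pointer chains behind verbs such as find next, find prior and find owner can be imitated with object references. The sketch below uses a simplified, hypothetical structure in which each member holds next, prior and owner pointers. Real CODASYL implementations also link the owner to its first member and close the chain into a ring.

```python
class Record:
    """A record that carries navigation pointers, as in a network DBMS."""

    def __init__(self, data):
        self.data = data
        self.next = None   # pointer to the following member in the set
        self.prior = None  # pointer to the preceding member in the set
        self.owner = None  # pointer back to the owner record of the set


def build_set(owner, member_values):
    """Chain the members of one set occurrence and return the first member."""
    first = prev = None
    for value in member_values:
        rec = Record(value)
        rec.owner = owner
        if prev is None:
            first = rec
        else:
            prev.next = rec
            rec.prior = prev
        prev = rec
    return first


dept = Record("Department A")
current = build_set(dept, ["Student A", "Student B", "Student C"])

current = current.next     # roughly FIND NEXT
print(current.data)        # Student B
current = current.prior    # roughly FIND PRIOR
print(current.data)        # Student A
print(current.owner.data)  # roughly FIND OWNER: Department A
```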
The network model became popular because it provided considerable flexibility in modelling complex data relationships. It also offered high performance, because the access verbs used by programmers mapped directly to pointer-following in the implementation.
The network model has an advantage over the hierarchical model in that it promotes greater flexibility and data accessibility, since records at a lower level can be accessed without accessing the records above them. This model is more efficient than the hierarchical model, easier to understand, and can be applied to many real-world problems that require routine transactions. Its disadvantages are as follows: designing and developing a network database is a complex process; the database has to be refined frequently; the relationships among all the records must be defined before development starts, and changes often demand major programming effort; and operation and maintenance of the network model is expensive and time consuming.
Examples of database engines that have network model capabilities are RDM Embedded and RDM Server.
However, the model had several disadvantages. Network programming proved error-prone as data models became more complex, and small changes to the data structure could require changes to many programs. Also, because of the use of physical pointers, operations such as database loading and restructuring could be very time-consuming.
Concept and History: The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium. The hierarchical model structures data as a tree of records, with each record having one parent record and many children. The network model, by contrast, allows each record to have multiple parent and child records, forming a lattice structure.
The chief argument in favour of the network model, in comparison to the hierarchical model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model, with semi-network extensions, in its established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface. Until the early 1980s, the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications. As hardware became faster, however, the extra productivity and flexibility of the relational model led to the gradual obsolescence of the network model in corporate enterprise usage.
3.3 Object-Relational Database
An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data types and methods.
One aim for this type of system is to bridge the gap between two groups. The first is conceptual data modeling techniques such as entity-relationship diagrams (ERD) and object-relational mapping (ORM), which often use classes and inheritance. The second is relational databases, which do not directly support them.
Another, related, aim is to bridge the gap between relational databases and the object-oriented modeling techniques used in programming languages such as Java, C++ or C#. However, a more popular alternative for achieving such a bridge is to use a standard relational database system with some form of ORM software.
Traditional RDBMS or SQL-DBMS products focused on the efficient management of data drawn from a limited set of data types (defined by the relevant language standards). An object-relational DBMS, by contrast, allows software developers to integrate their own types, and the methods that apply to them, into the DBMS. ORDBMS technology aims to allow developers to raise the level of abstraction at which they view the problem domain. This goal is not universally shared; proponents of relational databases often argue that object-oriented specification lowers the abstraction level.
An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language. At the other extreme are OODBMSes, in which the database is essentially a persistent object store for software written in an object-oriented programming language. Such a store offers a programming API for storing and retrieving objects and little or no specific support for querying.
Many SQL ORDBMSs on the market today are extensible with user-defined types (UDT) and custom-written functions (e.g. stored procedures). Some (e.g. SQL Server) allow such functions to be written in object-oriented programming languages, but this by itself doesn't make them object-oriented databases; in an object-oriented database, object orientation is a feature of the data model.
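SQLite is not an ORDBMS, but Python's sqlite3 module offers a rough analogy to extending a database with a user-defined type and a function that operates on it. The register_adapter and register_converter functions control how a Python type is stored and read back, and create_function makes a Python function callable from SQL. The Money type, its text encoding and the naira function are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Money:
    """Hypothetical user-defined type: an amount in kobo plus a currency code."""
    kobo: int
    currency: str


def adapt_money(m):
    # How a Money value is stored: as text such as "150050:NGN".
    return f"{m.kobo}:{m.currency}"


def convert_money(raw):
    # How the stored bytes are turned back into a Money object on SELECT.
    amount, currency = raw.decode().split(":")
    return Money(int(amount), currency)


sqlite3.register_adapter(Money, adapt_money)
sqlite3.register_converter("MONEY", convert_money)

# PARSE_DECLTYPES applies converters by the column's declared type name.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, total MONEY)")
conn.execute("INSERT INTO invoice (total) VALUES (?)", (Money(150050, "NGN"),))


def naira(stored):
    # A "method" on the type, exposed to SQL; it receives the stored text.
    amount, _currency = stored.split(":")
    return int(amount) / 100


conn.create_function("naira", 1, naira)

total = conn.execute("SELECT total FROM invoice").fetchone()[0]
as_naira = conn.execute("SELECT naira(total) FROM invoice").fetchone()[0]
print(total)     # Money(kobo=150050, currency='NGN')
print(as_naira)  # 1500.5
```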
3.4 Object Database
In an object database (also called an object-oriented database), information is represented in the form of objects as used in object-oriented programming. When database capabilities are combined with object programming language capabilities, the result is an object database management system (ODBMS). An ODBMS makes database objects appear as programming language objects in one or more object programming languages. An ODBMS extends the programming language with transparently persistent data, concurrency control, data recovery, associative queries and other capabilities.
Some object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk. Others have their own programming languages. ODBMSs use exactly the same model as object-oriented programming languages.
Object databases are generally recommended when there is a business need for high-performance processing on complex data.
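Python's standard shelve module gives a feel for the object database idea: a whole graph of programming-language objects is stored and later retrieved as the same kind of objects, with no mapping to tables. However, shelve is only a simple persistent object store. It has none of the concurrency control, recovery or query support listed above for an ODBMS. The Department and Project classes are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Project:
    title: str
    members: list = field(default_factory=list)


@dataclass
class Department:
    name: str
    projects: list = field(default_factory=list)  # direct object references


path = os.path.join(tempfile.mkdtemp(), "objects")

# Store a whole object graph under one key: no tables or SQL mapping needed.
with shelve.open(path) as db:
    db["finance"] = Department("Finance", [Project("Audit", ["Ada", "Bola"])])

# Reopen the store (as a later program run would) and get ordinary objects back.
with shelve.open(path) as db:
    dept = db["finance"]

print(dept.projects[0].members)  # ['Ada', 'Bola']
```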
Adoption of Object Databases
Object databases based on persistent programming acquired a niche in application areas such as engineering and spatial databases, telecommunications, and scientific areas such as high energy physics and molecular biology. They have made little impact on mainstream commercial data processing, though there is some usage in specialized areas of financial services. It is also worth noting that object databases have held two records. One was the world's largest database: it was the first to hold over 1000 terabytes, at the Stanford Linear Accelerator Center (see "Lessons Learned From Managing A Petabyte"). The other was the highest ingest rate ever recorded for a commercial database, at over one terabyte per hour.
Another group of object databases focuses on embedded use in devices, packaged software and real-time systems.
Advantages and Disadvantages
Benchmarks between ODBMSs and RDBMSs have shown that an ODBMS can be clearly superior for certain kinds of tasks. The main reason is that many operations are performed using navigational rather than declarative interfaces, and navigational access to data is usually implemented very efficiently by following pointers.
Critics of navigational database-based technologies like ODBMS suggest that pointer-based techniques are optimized for very specific "search routes" or viewpoints. For general-purpose queries on the same information, however, pointer-based techniques will tend to be slower and more difficult to formulate than relational queries. Thus, navigation appears to simplify specific known uses at the expense of general, unforeseen and varied future uses. However, with suitable language support, direct object references may be maintained in addition to normalised, indexed aggregations, allowing both kinds of access. Furthermore, a persistent language may index aggregations on whatever is returned by some arbitrary object access method, rather than only on attribute value, which can simplify some queries.
Another factor that seems to work against an ODBMS is its lack of interoperability with a great number of tools/features that are taken for granted in the SQL world. These include, but are not limited to, industry-standard connectivity, reporting tools, OLAP tools, and backup and recovery standards. Additionally, object databases lack a formal mathematical foundation, unlike the relational model, and this in turn leads to weaknesses in their query support. However, this objection is offset by the fact that some ODBMSs fully support SQL in addition to navigational access, e.g. Objectivity/SQL++, Matisse and InterSystems Caché. Effective use may require compromises to keep both paradigms in sync.
In fact, there is an intrinsic tension between the notion of encapsulation, which hides data and makes it available only through a published set of interface methods, and the assumption underlying much database technology, which is that data should be accessible to queries based on data content rather than predefined access paths. Database-centric thinking tends to view the world through a declarative and attribute-driven viewpoint. OOP, on the other hand, tends to view the world through a behavioral viewpoint, maintaining entity identity independently of changing attributes. This is one of the many impedance mismatch issues surrounding OOP and databases.
Although some commentators have written off object database technology as a failure, the essential arguments in its favor remain valid. Attempts to integrate database functionality more closely into object programming languages continue in both the research and the industrial communities.
3.5 Associative Model of Data
The associative model of data is an alternative data model for database systems. Other data models, such as the relational model and the object data model, are record-based. These models gather the attributes of a thing, such as a car, into a record structure. Such attributes might be registration, colour, make, model, etc. In the associative model, everything that has "discrete independent existence" is modeled as an entity, and relationships between entities are modeled as associations. The granularity at which data is represented is similar to the schemes presented by Chen (Entity-relationship model); Bracchi, Paolini and Pelagatti (Binary Relations); and Senko (The Entity Set Model).
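Here is a minimal, hypothetical sketch of the associative idea. Things with independent existence become entities with their own identifiers, and every fact is an association of the form (source, verb, target). Because associations also get identifiers, an association can itself be the source of another one. The function names and the car data are invented for illustration; real associative-model products are considerably richer.

```python
entities = {}      # name -> identifier for each thing with independent existence
associations = []  # (source, verb, target) triples; position + 1 gives the id


def entity(name):
    """Register an entity once and return its identifier."""
    return entities.setdefault(name, f"e{len(entities) + 1}")


def associate(source, verb, target):
    """Record an association and return its id so it can be a source too."""
    associations.append((source, verb, target))
    return f"a{len(associations)}"


def facts_about(source):
    """List (verb, target) pairs for a source, showing entity names where known."""
    names = {ident: name for name, ident in entities.items()}
    return [(verb, names.get(target, target))
            for src, verb, target in associations if src == source]


car = entity("Car KJA-123")
registered = associate(car, "is registered in", entity("Lagos"))
associate(car, "has colour", entity("Red"))
associate(registered, "since", entity("2015"))  # an association about an association

print(facts_about(car))         # [('is registered in', 'Lagos'), ('has colour', 'Red')]
print(facts_about(registered))  # [('since', '2015')]
```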
3.6 Column-Oriented DBMS
A column-oriented DBMS is a database management system (DBMS) that stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items.
Benefits
Comparisons between row-oriented and column-oriented systems are typically concerned with the efficiency of hard-disk access for a given workload, because seek time is very long compared to the other delays in computers. Further, because seek time is improving slowly relative to CPU power (see Moore's Law), this focus will likely continue for systems that rely on hard disks for storage. The following over-simplified observations attempt to paint a picture of the trade-offs between column- and row-oriented organizations.
1. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
2. Column-oriented systems are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
3. Row-oriented systems are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek.
4. Row-oriented systems are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.
In practice, row-oriented architectures are well suited for OLTP-like workloads, which are more heavily loaded with interactive transactions. Column stores are well suited for OLAP-like workloads (e.g. data warehouses), which typically involve a smaller number of highly complex queries over all data (possibly terabytes).
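The trade-offs listed above come from the physical layout of data on disk. As an in-memory analogy only, the sketch stores the same hypothetical table row-wise and column-wise. It then computes an aggregate over one column and fetches one complete record from each layout.

```python
# The same small table stored two ways (values are made up for illustration).
rows = [
    (1, "Lagos", 120.0),
    (2, "Abuja", 80.5),
    (3, "Kano", 99.5),
]

# Row store: each record's fields sit together.
row_store = list(rows)

# Column store: each column's values sit together.
column_store = {
    "id": [r[0] for r in rows],
    "city": [r[1] for r in rows],
    "sales": [r[2] for r in rows],
}

# Aggregate over one column: the column store reads just that one list,
# while the row store has to visit every complete record.
total_from_columns = sum(column_store["sales"])
total_from_rows = sum(record[2] for record in row_store)

# Fetch one whole record: the row store reads one tuple, while the column
# store has to gather one value from every column list.
record_from_rows = row_store[1]
record_from_columns = tuple(column_store[c][1] for c in ("id", "city", "sales"))

print(total_from_columns, record_from_columns)  # 300.0 (2, 'Abuja', 80.5)
```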
Storage Efficiency vs. Random Access
Column data is of uniform type. Therefore, column-oriented data offers some opportunities for storage size optimizations that are not available in row-oriented data. For example, many popular modern compression schemes, such as LZW, make use of the similarity of adjacent data to compress. While the same techniques may be used on row-oriented data, a typical implementation will achieve less effective results. Further, this behavior becomes more dramatic when a large percentage of adjacent column data is either the same or not present, as in a sparse column (similar to a sparse matrix). The opposing trade-off is random access. Retrieving all data from a single row is more efficient when that data is located in a single location, as in a row-oriented architecture. Further, the greater the adjacent compression achieved, the more difficult random access may become, as data might need to be uncompressed to be read.
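The text names LZW. The sketch below swaps in run-length encoding instead, because it is short enough to show in full and exploits adjacent repeats in the same way. It also shows the random-access cost: finding one position means walking the runs. The state column data is hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column: consecutive repeats become (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_value_at(encoded, index):
    """Random access: walk the runs until reaching the requested position."""
    position = 0
    for value, count in encoded:
        if index < position + count:
            return value
        position += count
    raise IndexError(index)


# A sorted, low-cardinality column with missing values compresses well.
state_column = ["Lagos"] * 4 + ["Ogun"] * 3 + [None] * 5
encoded = rle_encode(state_column)
print(encoded)                   # [('Lagos', 4), ('Ogun', 3), (None, 5)]
print(rle_value_at(encoded, 6))  # Ogun
```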
Implementations
For many years, only the Sybase IQ product was commonly available in the column-oriented DBMS class. However, that has changed rapidly in the last few years, with many open source and commercial implementations.
3.7 Navigational Database
Navigational databases are characterized by the fact that objects in the database are found primarily by following references from other objects. Traditionally, navigational interfaces are procedural, though one could characterize some modern systems like XPath as being simultaneously navigational and declarative.
Navigational access is traditionally associated with the network model and the hierarchical model of database interfaces, and it has evolved into set-oriented systems. Navigational techniques use "pointers" and "paths" to navigate among data records (also known as "nodes"). This is in contrast to the relational model (implemented in relational databases), which strives to use "declarative" or logic programming techniques in which you ask the system for what you want instead of how to navigate to it.
For example, to give directions to a house, the navigational approach would resemble something like, "Get on highway 25 for 8 miles, turn onto Horse Road, left at the red barn, then stop at the 3rd house down the road". The declarative approach, by contrast, would resemble, "Visit the green house(s) within the following coordinates....".
Hierarchical models are also considered navigational because one "goes" up (to the parent) and down (to the leaves), and there are "paths", such as the familiar file/folder paths in hierarchical file systems. In general, navigational systems will use combinations of paths and prepositions such as "next", "previous", "first", "last", "up", "down", etc.
Some also suggest that navigational database engines are easier to build and take up less memory (RAM) than relational equivalents. However, some relational or relational-based products of the late 1980s possessed small engines (by today's standards) because they did not use SQL, which suggests this is not necessarily the case. Whatever the reason, navigational techniques are still the preferred way to handle smaller-scale structures.
A current example of navigational structuring can be found in the Document Object Model (DOM), which is often used in web browsers and is closely associated with JavaScript. The DOM "engine" is essentially a lightweight navigational database. The World Wide Web itself and Wikipedia could even be considered forms of navigational databases. (On a large scale, the Web is a network model, and on smaller or local scales, such as domain and URL partitioning, it uses hierarchies.)
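The DOM comparison can be tried directly with Python's built-in xml.dom.minidom, whose nodes expose navigational references such as firstChild, nextSibling and parentNode. The small library document is hypothetical.

```python
from xml.dom.minidom import parseString

# No whitespace between tags, so every child node here is an element or its text.
doc = parseString(
    "<library><shelf><book>Database Systems</book>"
    "<book>Transaction Processing</book></shelf></library>"
)

# Navigational access: start at one node and follow references to others.
library = doc.documentElement         # the root node
shelf = library.firstChild            # down to the first child
first_book = shelf.firstChild
second_book = first_book.nextSibling  # across to the next sibling
back_up = second_book.parentNode      # up to the parent

print(second_book.firstChild.data)    # Transaction Processing
print(back_up.tagName)                # shelf
```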
,.8 D!tr$ut"d D't'$'!"
( d!tr$ut"d d't'$'!" is a data&ase t/at is under t/e control o a
central data&ase management system B"%M2C in $/ic/ storage devices
are not all attac/ed to a common C!U. It may &e stored in multiple
computers located in t/e same p/ysical location' or may &e dispersed
over a net$ork o interconnected computers.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database is divided into separate partitions/fragments. Each partition/fragment of a distributed database may be replicated (i.e. redundant fail-overs, RAID-like).
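Here is a rough sketch of fragmentation combined with replication. It assumes hash-based horizontal fragmentation and a simple round-robin placement of two copies per fragment. The text prescribes neither scheme, and real systems use many allocation strategies. The code assigns each row to a fragment and stores that fragment on two hypothetical sites, so losing any single site still leaves a copy of every row.

```python
import zlib
from collections import defaultdict

SITES = ["lagos", "abuja", "kano"]  # hypothetical sites
N_FRAGMENTS = 3
REPLICAS = 2                        # copies kept of each fragment


def fragment_of(key: str) -> int:
    # crc32 gives the same answer on every run (Python's str hash() is salted).
    return zlib.crc32(key.encode()) % N_FRAGMENTS


def sites_for(fragment: int) -> list:
    # Place REPLICAS copies on consecutive sites (round-robin).
    return [SITES[(fragment + i) % len(SITES)] for i in range(REPLICAS)]


def distribute(rows):
    storage = defaultdict(list)     # site -> rows held at that site
    for row in rows:
        for site in sites_for(fragment_of(row["id"])):
            storage[site].append(row)
    return storage


rows = [{"id": f"cust{i}", "balance": 100 * i} for i in range(6)]
storage = distribute(rows)

# Simulate losing one site: every row is still held somewhere else.
survivors = [r for s in SITES if s != "lagos" for r in storage[s]]
print(all(r in survivors for r in rows))   # True
```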
Besides replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database. It therefore also depends on the price the business is willing to pay to ensure data security, consistency and integrity.
Important considerations
When working with a distributed database, care must be taken to ensure the following:
• The distribution is transparent – users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, amongst other things.
• Transactions are transparent – each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.
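The text does not say how subtransactions are coordinated. One widely used protocol is two-phase commit, sketched minimally below. The sketch assumes in-memory sites that never crash mid-protocol; real implementations also log each step durably and handle timeouts. The Participant class and the two_phase_commit function are hypothetical names for illustration. Every site first votes on whether it can apply its subtransaction, and the changes are committed everywhere only if all votes are yes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)          # eq=False keeps instances hashable (usable as dict keys)
class Participant:
    """A hypothetical local database system running one subtransaction."""
    name: str
    data: dict = field(default_factory=dict)
    can_commit: bool = True   # set False to simulate a site that must refuse
    pending: Optional[dict] = None

    def prepare(self, changes: dict) -> bool:
        """Phase 1: stage the changes and vote yes (True) or no (False)."""
        if not self.can_commit:
            return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        """Phase 2, all voted yes: apply the staged changes."""
        self.data.update(self.pending or {})
        self.pending = None

    def abort(self) -> None:
        """Phase 2, someone voted no: discard the staged changes."""
        self.pending = None


def two_phase_commit(subtransactions: dict) -> bool:
    """subtransactions maps each Participant to the changes for its site."""
    votes = [p.prepare(changes) for p, changes in subtransactions.items()]
    if all(votes):
        for p in subtransactions:
            p.commit()
        return True
    for p in subtransactions:
        p.abort()
    return False


# Transfer 100 between accounts held at two different sites.
lagos = Participant("lagos", {"acct_1": 500})
abuja = Participant("abuja", {"acct_2": 100})
ok = two_phase_commit({lagos: {"acct_1": 400}, abuja: {"acct_2": 200}})
print(ok, lagos.data, abuja.data)  # True {'acct_1': 400} {'acct_2': 200}

# If one site refuses, neither site changes.
abuja.can_commit = False
ok = two_phase_commit({lagos: {"acct_1": 300}, abuja: {"acct_2": 300}})
print(ok, lagos.data, abuja.data)  # False {'acct_1': 400} {'acct_2': 200}
```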
Advantages of Distributed Databases
• Reflects organizational structure – database fragments are located in the departments they relate to.
• Local autonomy – a department can control the data about it (as it is the one most familiar with that data).
• Improved availability – a fault in one database system will only affect one fragment, instead of the entire database.
• Improved performance – data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
• Economics – it costs less to create a network of smaller computers with the power of a single large computer.
• Modularity – systems can be modified, added and removed from the distributed database without affecting other modules (systems).
Disadvantages of Distributed Databases
• Complexity – extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
• Economics – increased complexity and a more extensive infrastructure mean extra labour costs.
• Security – remote database fragments must be secured, and because they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).
• Difficult to maintain integrity – in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
• Inexperience – distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
• Lack of standards – there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
• Database design more complex – besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites and data replication.
3.9 Real-Time Database
A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases, which contain persistent data that is mostly unaffected by time. For example, a stock market changes very rapidly and is dynamic. The graphs of the different markets appear to be very unstable, and yet a database has to keep track of current values for all of the markets of the New York Stock Exchange (Kanitkar). Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away (Capron). Real-time databases are useful for accounting, banking, law, medical records, multi-media, process control, reservation systems, and scientific data analysis (Snodgrass). As computers increase in power and can store more data, they are becoming integrated into our society and are employed in many applications.
Overview
Real-time databases are traditional databases that use an extension to give them the additional power to yield reliable responses. They use timing constraints that represent a certain range of values for which the data are valid. This range is called temporal validity. A conventional database cannot work under these circumstances, because the inconsistencies between the real-world objects and the data that represent them are too severe for simple modifications. An effective system needs to be able to handle time-sensitive queries, return only temporally valid data, and support priority scheduling. To enter the data in the records, a sensor or an input device often monitors the state of the physical system and updates the database with new information so that it reflects the physical system more accurately (Abbot). When designing a real-time database system, one should consider how to represent valid time and how facts are associated with the real-time system. One should also consider how to represent attribute values in the database so that transaction processing and data consistency are not violated (Abbot).
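Here is a minimal sketch of temporal validity. It assumes each stored reading carries the time it was taken and a per-sensor validity interval. The class and field names are hypothetical. A query returns a value only while it is still temporally valid; stale data yields None instead of a misleading answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    sensor: str
    value: float
    taken_at: float    # time the value was sampled, in seconds
    valid_for: float   # temporal validity interval in seconds (assumed per sensor)

    def is_valid(self, now: float) -> bool:
        return self.taken_at <= now <= self.taken_at + self.valid_for


class RealTimeStore:
    """Hypothetical store that keeps only the latest reading per sensor."""

    def __init__(self) -> None:
        self.latest = {}

    def update(self, reading: Reading) -> None:
        # A sensor or input device pushes the new state of the physical system.
        self.latest[reading.sensor] = reading

    def query(self, sensor: str, now: float) -> Optional[float]:
        """Return the value only while it is temporally valid, else None."""
        reading = self.latest.get(sensor)
        if reading is None or not reading.is_valid(now):
            return None
        return reading.value


store = RealTimeStore()
store.update(Reading("altitude", 10500.0, taken_at=100.0, valid_for=2.0))
store.update(Reading("fuel", 3200.0, taken_at=90.0, valid_for=5.0))
print(store.query("altitude", now=101.0))  # 10500.0
print(store.query("fuel", now=101.0))      # None: the fuel reading is stale
```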
When designing a system, it is important to consider what the system should do when deadlines are not met. For example, an air-traffic control system constantly monitors hundreds of aircraft, makes decisions about incoming flight paths, and determines the order in which aircraft should land based on data such as fuel, altitude, and speed. If any of this information is late, the result could be devastating (Sivasankaran). To address the problem of obsolete data, timestamps can support transactions by providing clear time references (Sivasankaran).
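Priority scheduling with deadlines is often done with earliest-deadline-first (EDF), the family of algorithms covered in the Stankovic et al. reference. The sketch below is a simplified, non-preemptive EDF dispatcher. The following are assumptions made for illustration:
• the transaction tuples and their names;
• the policy of dropping a transaction that can no longer meet its deadline (a "firm deadline" policy);
• the durations, which stand in for service times a real system would have to estimate.

```python
import heapq


def edf_schedule(transactions, start=0.0):
    """Non-preemptive earliest-deadline-first dispatching.

    transactions: iterable of (name, arrival, duration, deadline), times in seconds.
    A transaction that can no longer finish by its deadline is dropped
    (a 'firm deadline' policy, assumed here for illustration).
    Returns (completed names in order, missed names in order).
    """
    pending = sorted(transactions, key=lambda t: t[1])   # by arrival time
    ready, clock, i = [], start, 0
    done, missed = [], []
    while i < len(pending) or ready:
        # Admit every transaction that has arrived by the current time.
        while i < len(pending) and pending[i][1] <= clock:
            name, _arrival, duration, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, duration))
            i += 1
        if not ready:                 # idle until the next arrival
            clock = pending[i][1]
            continue
        deadline, name, duration = heapq.heappop(ready)   # earliest deadline
        if clock + duration > deadline:
            missed.append(name)       # would finish late: drop it
            continue
        clock += duration
        done.append(name)
    return done, missed


done, missed = edf_schedule([
    ("fuel_check", 0, 2, 10),
    ("landing_order", 0, 3, 4),
    ("altitude_update", 1, 1, 3),
])
print(done)    # ['landing_order', 'fuel_check']
print(missed)  # ['altitude_update']
```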
SQL DBMS
IBM started working on a prototype system, System R, loosely based on Codd's concepts, in the early 1970s. Unfortunately, System R was conceived as a way of proving Codd's ideas unimplementable. The project was therefore given to a group of programmers who were not under Codd's supervision; they never fully understood his ideas and ended up violating several fundamentals of the relational model. The first "quickie" version was ready in 1974/5. Work then started on multi-table systems, in which the data could be broken down so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl. This pushed IBM to develop a true production version of System R, known as SQL/DS, and later Database 2 (DB2).
Many of the people involved with INGRES became convinced of the future commercial success of such systems. They formed their own companies to commercialize the work, but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots of the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus of INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and it beat IBM to market when the first version was released in 1978.
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is primarily used for global mission-critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).
In Sweden, Codd's paper was also read, and Mimer SQL was developed from the mid-1970s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.
4.0 CONCLUSION
The evolution of database models will continue until an ideal model emerges that meets all the requirements of end users. This sounds impossible, because there can never be a system that is completely fault-free, so we can expect to see still more database models. The flat and hierarchical models set the tone for the models that followed.
5.0 SUMMARY
• In a hierarchical model, data is organized into an inverted tree-like structure. Each node has multiple downward links that describe the nesting, and a sort field keeps the records in a particular order in each same-level list.
• In the network model, records can participate in any number of named relationships. Each relationship associates a record of one type (called the owner) with multiple records of another type (called the member).
• An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language.
• In an object database (also object-oriented database), information is represented in the form of objects as used in object-oriented programming.
• The associative model of data is an alternative data model for database systems. Other data models, such as the relational model and the object data model, are record-based.
• A column-oriented DBMS is a database management system (DBMS) that stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items.
• Navigational databases are characterized by the fact that objects in the database are found primarily by following references from other objects.
• A distributed database is a database under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common CPU.
• A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases, which contain persistent data that is mostly unaffected by time.
6.0 TUTOR-MARKED ASSIGNMENT
1. Mention 5 models of databases.
2. Briefly discuss the advantages and disadvantages of distributed databases.
7.0 REFERENCES/FURTHER READINGS
Bachman, Charles W. The Programmer as Navigator. ACM Turing Award Lecture, Communications of the ACM, Volume 16, Issue 11, 1973, pp. 653-658. ISSN 0001-0782, doi: 10.1145/355611.362534.
Stonebraker, Michael, with Moore, Dorothy. Object-Relational DBMSs: The Next Great Wave. Morgan Kaufmann Publishers, 1996. ISBN 1-55860-397-2.
(There was, at the time, some dispute over whether the term was coined by Michael Stonebraker of Illustra or Won Kim of UniSQL.)
Kim, Won. Introduction to Object-Oriented Databases. The MIT Press, 1990. ISBN 0-262-11124-1.
Bancilhon, François; Delobel, Claude; and Kanellakis, Paris. Building an Object-Oriented Database System: The Story of O2. Morgan Kaufmann Publishers, 1992. ISBN 1-55860-169-4.
Stonebraker, Michael, et al. C-Store: A Column-oriented DBMS. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
Błażewicz, Jacek; Królikowski, Zbyszko; Morzy, Tadeusz. Handbook on Data Management in Information Systems. Springer, 2003, p. 18. ISBN 3540438939.
Özsu, M. T., and Valduriez, P. Principles of Distributed Database Systems (2nd edition). Prentice-Hall. ISBN 0-13-659707-6.
Federal Standard 1037C.
Elmasri and Navathe. Fundamentals of Database Systems (3rd edition). Addison-Wesley Longman. ISBN 0-201-54263-3.
Abbot, Robert K., and Garcia-Molina, Hector. Scheduling Real-Time Transactions: a Performance Evaluation. Stanford University and Digital Equipment Corp. ACM, 1992. 13 Dec. 2006.
Buchmann, A. "Real Time Database Systems." Encyclopedia of Database Technologies and Applications. Ed. Laura C. Rivero, Jorge H. Doorn, and Viviana E. Ferraggine. Idea Group, 2005.
Stankovic, John A., Spuri, Marco, Ramamritham, Krithi, and Buttazzo, Giorgio C. Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Springer, 1998.
UNIT 5 DATABASE MODELS: RELATIONAL MODEL
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 The Model
3.2 Interpretation
3.3 Application to Databases
3.4 Alternatives to the Relational Model
3.5 History
3.6 SQL and the Relational Model
3.7 Implementation
3.8 Controversies
3.9 Design
3.10 Set-Theoretic Formulation
3.11 Key Constraints and Functional Dependencies
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
The relational model for database management is a database model based on first-order predicate logic. It was first formulated and proposed in 1969 by Edgar Codd.
Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite model (in the sense of logic) of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.
The purpose of the relational model is to provide a declarative method for specifying data and queries. We directly state what information the database contains and what information we want from it. The database management system software then takes care of describing data structures for storing the data and retrieval procedures for answering queries.
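Here is a small illustration using Python's built-in sqlite3 module; the table and data are made up. The query states only what is wanted. Adding an index changes how SQLite may find the rows, but neither the query text nor its answer changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT, deptno TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ada", "A"), (2, "Bola", "B"), (3, "Chidi", "A")],
)

# Declarative: say what is wanted, not how to fetch it.
QUERY = "SELECT ename FROM employee WHERE deptno = 'A' ORDER BY ename"
before = conn.execute(QUERY).fetchall()

# Change the physical access path; the query and its answer stay the same.
conn.execute("CREATE INDEX employee_deptno ON employee (deptno)")
after = conn.execute(QUERY).fetchall()

print(before)            # [('Ada',), ('Chidi',)]
print(before == after)   # True
```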
IBM implemented Codd's ideas with the DB2 database management system, which introduced the SQL data definition and query language. Other relational database management systems followed, most of them using SQL as well. A table in an SQL database schema corresponds to a predicate variable, and the contents of a table correspond to a relation. Key constraints, other constraints and SQL queries correspond to predicates. However, it must be noted that SQL databases, including DB2, deviate from the relational model in many details. Codd fiercely argued against deviations that compromise the original principles.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
• define the relational model of database
• understand and explain the concept behind relational models
• answer the question of how to interpret a relational database model
• know the various applications of relational databases
• compare the relational model with the structured query language (SQL)
• know the constraints and controversies associated with the relational database model.
Figure 1: Relational Structure

Department Table
Deptno | Dname | Dloc | Dmgr
Dept A |       |      |
Dept B |       |      |
Dept C |       |      |

Employee Table
Empno | Ename | Etitle | Esalary | Deptno
Emp 1 |       |        |         | Dept A
Emp 2 |       |        |         | Dept B
Emp 3 |       |        |         | Dept C
Emp 4 |       |        |         | Dept D
Emp 5 |       |        |         | Dept E
Emp 6 |       |        |         | Dept F
3.0 MAIN CONTENT
3.1 The Model
The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic. This means there are two possible evaluations for each proposition: either true or false. In particular, there is no third value such as unknown or not applicable, either of which is often associated with the concept of NULL. Some think two-valued logic is an important part of the relational model, whereas others think a system that uses a form of three-valued logic can still be considered relational.
Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power.
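To make relational algebra concrete, here is a minimal sketch of three of its operators. Relations are represented as Python sets of tuples, where each tuple is an unordered set of (attribute, value) pairs. The representation and the function names are assumptions for illustration. The relational calculus would express the same query as a predicate instead of a sequence of operators.

```python
def rel(*rows):
    """A relation body: a set of tuples, each an unordered set of (attribute, value) pairs."""
    return frozenset(frozenset(r.items()) for r in rows)


def restrict(r, pred):
    """Selection: keep the tuples for which pred(tuple_as_dict) is true."""
    return frozenset(t for t in r if pred(dict(t)))


def project(r, attrs):
    """Projection: keep only the named attributes; duplicates vanish because the result is a set."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)


def natural_join(r, s):
    """Join tuples that agree on every attribute name they have in common."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)


dept = rel({"deptno": "A", "dname": "Sales"}, {"deptno": "B", "dname": "Audit"})
emp = rel({"empno": 1, "deptno": "A"}, {"empno": 2, "deptno": "A"}, {"empno": 3, "deptno": "B"})

# "Employee numbers of everyone in the Sales department"
sales = project(restrict(natural_join(emp, dept), lambda t: t["dname"] == "Sales"), {"empno"})
print(sorted(dict(t)["empno"] for t in sales))   # [1, 2]
```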
The relational model of data permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization, whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs, in which performance tuning often requires changes to the logical model.
The basic relational building block is the domain or data type, nowadays usually abbreviated to type. A tuple is an unordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute; this can be either a scalar value or a more complex type.
A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order on their data. In mathematics, a tuple has an order and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. One of Codd's later great insights was that using attribute names instead of an ordering would be much more convenient (in general) in a computer language based on relations. This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the Cartesian product becomes commutative.
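The consequence for the Cartesian product can be checked with a short sketch. It assumes the two relations share no attribute names. With positional tuples, r × s and s × r contain differently ordered tuples. When attribute values are identified by name, the two products are the same set. The function names are hypothetical.

```python
from itertools import product


def positional_product(r, s):
    """Mathematical tuples: values are identified by position."""
    return {a + b for a, b in product(r, s)}


def named_product(r, s):
    """Relational tuples: values identified by attribute name (assumes no shared names)."""
    return {t | u for t, u in product(r, s)}


r_pos, s_pos = {(1,)}, {("x",)}
print(positional_product(r_pos, s_pos))   # {(1, 'x')}
print(positional_product(s_pos, r_pos))   # {('x', 1)}: a different set

r_named = {frozenset({("n", 1)})}
s_named = {frozenset({("c", "x")})}
print(named_product(r_named, s_named) == named_product(s_named, r_named))   # True
```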
A table is an accepted visual representation of a relation, and a tuple is similar to the concept of a row. Note, however, that in the database language SQL the columns and the rows of a table are ordered.
A relvar is a named variable of some specific relation type. At all times some relation of that type is assigned to it, though the relation may contain zero tuples.
The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In accordance with this Principle, a relational database is a set of relvars, and the result of every query is presented as a relation.
The consistency of a relational database is enforced by constraints rather than by rules built into the applications that use it. These constraints are declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are candidate key (really, superkey) and foreign key constraints.
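Here is a sketch of both shorthands using Python's built-in sqlite3; the table names are hypothetical. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON. The final query restates the foreign key as the "is subset of" constraint: the set of department numbers used in emp, minus those in dept, must be empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dept (deptno TEXT PRIMARY KEY, dname TEXT)")
conn.execute("""CREATE TABLE emp (
                  empno  INTEGER PRIMARY KEY,   -- candidate key
                  ename  TEXT,
                  deptno TEXT REFERENCES dept(deptno))""")   # foreign key
conn.execute("INSERT INTO dept VALUES ('A', 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 'Ada', 'A')")


def rejected(sql):
    """True if the DBMS refuses the statement because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_key = rejected("INSERT INTO emp VALUES (1, 'Bola', 'A')")   # repeats key value 1
bad_fk = rejected("INSERT INTO emp VALUES (2, 'Chidi', 'Z')")   # no department 'Z'
print(dup_key, bad_fk)   # True True

# The foreign key read as a subset constraint: emp{deptno} ⊆ dept{deptno}
orphans = conn.execute("SELECT deptno FROM emp EXCEPT SELECT deptno FROM dept").fetchall()
print(orphans)           # []
```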
3.2 Interpretation
To fully appreciate the relational model of data, it is essential to understand the intended interpretation of a relation.
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate. That extension is the set of true propositions that can be formed by replacing each free variable in the predicate by a name (a term that designates something).
There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values that instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed to be true because the tuple appears in the relation body. Conversely, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption.
For a formal exposition of these ideas, see section 3.10, Set-Theoretic Formulation, below.
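Here is a minimal sketch of the closed world assumption. It assumes the predicate "Employee ENAME works in department DEPTNO". Each tuple is represented as an unordered set of (attribute, value) pairs, and the helper names are hypothetical. A proposition is true exactly when its tuple appears in the body; every other tuple with the same heading counts as false.

```python
def tup(**attrs):
    """A tuple as an unordered set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


# Body of a relation for the (assumed) predicate
# "Employee ENAME works in department DEPTNO".
WORKS_IN = {tup(ename="Ada", deptno="A"), tup(ename="Bola", deptno="B")}


def holds(**attrs):
    """Closed world assumption: true iff the tuple appears in the body."""
    return tup(**attrs) in WORKS_IN


print(holds(ename="Ada", deptno="A"))    # True
print(holds(deptno="A", ename="Ada"))    # True: attribute order is irrelevant
print(holds(ename="Ada", deptno="B"))    # False: absent, so deemed false
```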
3.3 Application to Databases
A type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported. Indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system.
Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35".
A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".
A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading, and the data appearing in it is the body, a set of rows. A database relvar (relation variable) is commonly known as a base table. At any time, the heading of its assigned value is as specified in the table declaration. Its body is the one most recently assigned to it by invoking some update operator (typically INSERT, UPDATE or DELETE).
The heading and body of the table resulting from the evaluation of a query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because a column can have no name and two or more columns can have the same name. Also, the body is not always a set of rows, because in SQL the same row can appear more than once in the same body.)
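The notes on SQL above can be observed directly in SQLite through Python's sqlite3 module, as in this sketch (the table and column names are made up). A table without a key accepts the same row twice, and a query may produce two columns with the same name. Other SQL products may behave differently in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (name TEXT, city TEXT)")   # no key declared
conn.executemany("INSERT INTO visit VALUES (?, ?)", [("Ada", "Lagos"), ("Ada", "Lagos")])

# The same row appears twice in the body: a bag, not a set.
count = conn.execute("SELECT COUNT(*) FROM visit").fetchone()[0]
print(count)      # 2

# Two result columns may share a name, so the heading is not a set either.
cur = conn.execute("SELECT name AS x, city AS x FROM visit")
heading = [col[0] for col in cur.description]
print(heading)    # ['x', 'x']

# DISTINCT restores set semantics for the rows.
distinct = conn.execute("SELECT DISTINCT name, city FROM visit").fetchall()
print(distinct)   # [('Ada', 'Lagos')]
```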
3.4 Alternatives to the Relational Model
Other models are the hierarchical model and the network model. Some systems using these older architectures are still in use today. They are found in data centres with high data-volume needs, or where existing systems are so complex and abstract that migrating to systems employing the relational model would be cost-prohibitive. Also of note are newer object-oriented databases, even though many of them are DBMS-construction kits rather than proper DBMSs.
A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling (ORM), RDF/Notation 3 (N3) and Gellish English.
The relational model was the first formal database model. After it was defined, informal models were made to describe hierarchical databases (the hierarchical model) and network databases (the network model). Hierarchical and network databases existed before relational databases, but were only described as models after the relational model was defined, in order to establish a basis for comparison.
3.5 History
The relational model was invented by E.F. (Ted) Codd as a general model of data, and was subsequently maintained and developed by Chris Date and Hugh Darwen, among others. In The Third Manifesto (first published in 1995), Date and Darwen show how the relational model can accommodate certain desired object-oriented features.
3.6 SQL and the Relational Model
SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features.
The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard, and in particular they do not allow some of these deviations. For example, NULL is nearly ubiquitous, but allowing duplicate column names within a table or anonymous columns is uncommon.
Duplicate Rows
The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation.
Anonymous Columns
A column in an SQL table can be unnamed and thus cannot be referenced in expressions. The relational model requires every attribute to be named and referenceable.
Duplicate Column Names
Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, because of the obvious ambiguity. The relational model requires every attribute to be referenceable.
Column Order Significance
The order of columns in an SQL table is defined and significant. One consequence is that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires that there be no significance to any ordering of the attributes of a relation.
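SQLite, used here through Python's sqlite3, shows both effects in a small sketch; the tables x and y are hypothetical. UNION pairs columns by position rather than by name. Swapping the operands of a product changes the column order of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION pairs columns by position: (b=2, a=1) is NOT treated as the same row as (a=1, b=2).
rows = conn.execute("SELECT 1 AS a, 2 AS b UNION SELECT 2 AS b, 1 AS a ORDER BY 1").fetchall()
print(rows)   # [(1, 2), (2, 1)]

# The product's column order depends on operand order.
conn.execute("CREATE TABLE x (p INTEGER)")
conn.execute("CREATE TABLE y (q INTEGER)")
conn.execute("INSERT INTO x VALUES (1)")
conn.execute("INSERT INTO y VALUES (2)")

xy = conn.execute("SELECT * FROM x, y")
xy_cols, xy_rows = [c[0] for c in xy.description], xy.fetchall()
yx = conn.execute("SELECT * FROM y, x")
yx_cols, yx_rows = [c[0] for c in yx.description], yx.fetchall()
print(xy_cols, xy_rows)   # ['p', 'q'] [(1, 2)]
print(yx_cols, yx_rows)   # ['q', 'p'] [(2, 1)]
```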
Views without CHECK OPTION
Updates to a view defined without CHECK OPTION can be accepted, but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view. Similarly, an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.
Columnless Tables Unrecognized
SQL requires every table to have at least one column. However, there are two relations of degree zero (of cardinality one and zero), and they are needed to represent extensions of predicates that contain no free variables.
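The two degree-zero relations are often called TABLE_DEE (one tuple, the empty tuple) and TABLE_DUM (no tuples). These names come from Date and Darwen's writings, not from this text. The sketch below represents relations as (heading, body) pairs and uses a hypothetical project helper. It shows how projecting a relation onto no attributes answers a yes/no question such as "is there any employee?".

```python
# A relation as (heading, body): heading is a frozenset of attribute names,
# body is a frozenset of tuples, each a frozenset of (name, value) pairs.
TABLE_DEE = (frozenset(), frozenset({frozenset()}))   # degree 0, one (empty) tuple
TABLE_DUM = (frozenset(), frozenset())                # degree 0, no tuples


def project(relation, attrs):
    """Project a relation onto the attribute names in attrs."""
    _heading, body = relation
    keep = frozenset(attrs)
    return keep, frozenset(frozenset((a, v) for a, v in t if a in keep) for t in body)


emp = (frozenset({"empno", "deptno"}),
       frozenset({frozenset({("empno", 1), ("deptno", "A")})}))
no_emp = (emp[0], frozenset())

# "There is at least one employee" has no free variables; projecting onto no
# attributes yields DEE when it is true and DUM when it is false.
print(project(emp, []) == TABLE_DEE)      # True
print(project(no_emp, []) == TABLE_DUM)   # True
```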
NULL
This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises because SQL implements this ad hoc concept with three-valued logic. Under that logic, comparing NULL with itself does not yield true but instead yields the third truth value, unknown. Similarly, comparing NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle, under which anything that is not true is false and anything that is not false is true. It also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some, if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic. However, his proposal was based on the observation that there are two distinct reasons why one might want to use a special mark in place of a value. This led opponents of such logics to discover more distinct reasons: at least 19 have been noted, which would require a 21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero; the average of the empty set is NULL, meaning undefined; and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".
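The behaviours just described can be observed in SQLite via Python's sqlite3 module. This is a sketch with made-up tables, and other SQL products may differ in details. For example, TOTAL is a SQLite-specific aggregate that returns 0.0 for an empty set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")                 # left empty on purpose
conn.execute("CREATE TABLE dept (deptno TEXT)")
conn.execute("CREATE TABLE emp (ename TEXT, deptno TEXT)")
conn.executemany("INSERT INTO dept VALUES (?)", [("A",), ("B",)])
conn.execute("INSERT INTO emp VALUES ('Ada', 'A')")


def one(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


print(one("SELECT NULL = NULL"))       # None: unknown, not true
print(one("SELECT NULL = 1"))          # None: unknown, not false
print(one("SELECT NULL IS NULL"))      # 1: the explicit test is needed
print(one("SELECT SUM(x) FROM t"))     # None for the empty set, not 0
print(one("SELECT AVG(x) FROM t"))     # None: undefined
print(one("SELECT TOTAL(x) FROM t"))   # 0.0 (SQLite-specific aggregate)

# NULL from an outer join: no matching employee in department B.
joined = conn.execute(
    "SELECT d.deptno, e.ename FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno "
    "ORDER BY d.deptno").fetchall()
print(joined)                          # [('A', 'Ada'), ('B', None)]
```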
Concepts
SQL uses the concepts "table", "column" and "row" instead of "relvar", "attribute" and "tuple". These are not merely differences in terminology. For example, a "table" may contain duplicate rows, whereas the same tuple cannot appear more than once in a relation.
3.7 Implementation
There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none has been a popular success so far. Rel is one of the more recent attempts to do this.
3.8 Controversies
Some years after publishing his 1970 model, Codd himself proposed a three-valued logic version of it (True, False, Missing or NULL) in order to deal with missing information. In The Relational Model for Database Management Version 2 (1990), he went a step further with a four-valued logic version (True, False, Missing but Applicable, Missing but Inapplicable). These have never been implemented, presumably because of the attendant complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but it fell short of that because of logical errors in the standard and in its implementations.
3.9 Design
Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design. This trades off transactional performance for space efficiency.
There are two commonly used systems of diagramming to aid in the visual representation of the relational model. One is the entity-relationship diagram (ERD). The other is the related IDEF diagram used in the IDEF1X method, which was created by the U.S. Air Force based on ERDs.
The tree structure of data may enforce hierarchical model organization, with a parent-child relationship table.
3.10 Set-Theoretic Formulation
Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name", and we will usually use the variables r, s, t, … and a, b, c to range over them. Another basic notion is the set of atomic values, which contains values such as numbers and strings.
Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:
Tuple
A tuple is a partial function from attribute names to atomic values.
Header
A header is a finite set of attribute names.
Projection
The projection of a tuple t on a finite set of attributes A is
$$ t[A] = \{ (a, v) : (a, v) \in t,\ a \in A \} $$
The next definition defines a relation, which formalizes the contents of a table as it is defined in the relational model.
Relation
A relation is a tuple (H, B), with H the header and B the body, a set of tuples that all have the domain H.
Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic, except that here we identify the places in the predicate with attribute names. Usually in the relational model, a database schema is said to consist of a set of relation names, the headers associated with these names, and the constraints that should hold for every instance of the database schema.
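The definitions of this section translate almost word for word into Python. This sketch is an illustration built on assumptions, not a standard API:
• a tuple is a dict, i.e. a finite partial function from attribute names to atomic values;
• a header is a frozenset of names;
• projection implements t[A];
• is_relation checks that every tuple in the body has exactly the header H as its domain.
The body is held in a list only because dicts are not hashable; in the theory it is a set.

```python
from typing import Any, FrozenSet, Iterable, Mapping

RTuple = Mapping[str, Any]   # a tuple: partial function from attribute names to atomic values
Header = FrozenSet[str]      # a header: finite set of attribute names


def projection(t: RTuple, attrs: Iterable[str]) -> dict:
    """t[A] = {(a, v) : (a, v) in t, a in A}"""
    keep = set(attrs)
    return {a: v for a, v in t.items() if a in keep}


def is_relation(header: Header, body: Iterable[RTuple]) -> bool:
    """(H, B) is a relation when every tuple in B has domain exactly H."""
    return all(set(t.keys()) == set(header) for t in body)


person_h = frozenset({"name", "age"})
person_b = [{"name": "Ada", "age": 35}, {"name": "Bola", "age": 28}]

print(is_relation(person_h, person_b))                   # True
print(is_relation(person_h, [{"name": "Chidi"}]))        # False: domain is not H
print(projection({"name": "Ada", "age": 35}, {"name"}))  # {'name': 'Ada'}
```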
3.11 Key Constraints and Functional Dependencies
One of the simplest and most important types of relation constraint is the key constraint. It tells us that, in every instance of a certain relational schema, the tuples can be identified by their values for certain attributes.
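A key constraint is a special case of a functional dependency X → Y, which holds when any two tuples that agree on X also agree on Y. The sketch below uses hypothetical helper names and represents tuples as dicts. It checks a functional dependency and treats K as a key when K → H holds for the whole header H, i.e. when no two distinct tuples share their K-values. It checks uniqueness only; a candidate key must in addition be minimal.

```python
from itertools import combinations


def satisfies_fd(body, lhs, rhs):
    """X -> Y holds if any two tuples that agree on X also agree on Y."""
    for t, u in combinations(body, 2):
        if all(t[a] == u[a] for a in lhs) and any(t[b] != u[b] for b in rhs):
            return False
    return True


def is_key(body, header, key):
    """Key constraint as the dependency key -> header (uniqueness only, not minimality)."""
    return satisfies_fd(body, key, header)


header = {"empno", "ename", "deptno", "dname"}
body = [
    {"empno": 1, "ename": "Ada", "deptno": "A", "dname": "Sales"},
    {"empno": 2, "ename": "Bola", "deptno": "A", "dname": "Sales"},
    {"empno": 3, "ename": "Chidi", "deptno": "B", "dname": "Audit"},
]

print(is_key(body, header, {"empno"}))            # True
print(is_key(body, header, {"deptno"}))           # False: two employees share 'A'
print(satisfies_fd(body, {"deptno"}, {"dname"}))  # True: deptno -> dname
```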
4.0 CONCLUSION
The evolution of the relational model and of relational database management systems is significant in the history and development of databases and database management systems. This concept, pioneered by Edgar Codd, brought an entirely new and much more efficient way of storing and retrieving data, especially for large databases. It emphasized the use of tables and relating those tables to one another through common key values that queries can match. Most of today's database management systems implement the relational model.
5.0 SUMMARY
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar Codd.
The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains.
To fully appreciate the relational model of data, it is essential to understand the intended interpretation of a relation.
A type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on.
Other models are the hierarchical model and the network model. Some systems using these older architectures are still in use today in data centers.
The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others.
SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places.
There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far.
Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design.
Basic notions in the relational model are relation names and attribute names.
One of the simplest and most important types of relation constraints is the key constraint.
6.0 TUTOR-MARKED ASSIGNMENT
1. Briefly discuss interpretation in the relational model.
2. Mention five ways in which the relational model differs from SQL.
7.0 REFERENCES/FURTHER READINGS
Codd, E.F. (1969). "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks". IBM Research Report.
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387. doi:10.1145/362384.362685.
White, Colin. "In the Beginning: An RDBMS History". Teradata Magazine Online, September 2004 edition. URL: http://www.teradata.com/t/page/127057
Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition. Addison-Wesley Professional. ISBN 0-201-70928-7.
Date, C. J. (2003). Introduction to Database Systems, 8th edition. Addison-Wesley. ISBN 0-321-19784-4.
UNIT 6 BASIC COMPONENTS OF DBMS
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Concurrency Controls
3.2 Java Database Connectivity
3.3 Query Optimizer
3.4 Open Database Connectivity
3.5 Data Dictionary
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
This unit discusses the basic components of any database management system. These components ensure proper control of data, access to data and querying of data, and they provide standard methods of connecting to database management systems.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
state the ACID rules that govern transactions
explain what concurrency control in databases is
list the different methods of concurrency control
define and interpret the acronym JDBC
describe the types of JDBC drivers
define the query optimizer and describe how it is implemented and how it estimates costs
3.0 MAIN CONTENT
3.1 Concurrency Controls
In databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.
Concurrency Control in Databases
Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without the concurrency violating the data integrity of a database. Executed transactions should follow the ACID rules, as described below. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed), recoverable schedules are generated. It also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database.
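The guarantee that aborted transactions leave no effect is easy to observe with Python's built-in sqlite3 module. In this sketch, the `account` table and the `transfer` helper are hypothetical. A transfer credits one account and then debits another. When the debit violates a CHECK constraint, the whole transaction is rolled back, so the earlier credit disappears too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates persist, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed on the debit; the credit to dst was undone


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

The second call shows atomicity: the credit to B had already run when the debit failed, yet B's balance is unchanged.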
Transaction ACID Rules
Atomicity: Either the effects of all or none of its operations remain when a transaction is completed; in other words, to the outside world the transaction appears to be indivisible, atomic.
Consistency: Every transaction must leave the database in a consistent state.
Isolation: Transactions cannot interfere with each other. Providing isolation is the main goal of concurrency control.
Durability: Successful transactions must persist through crashes.
Concurrency Control Mechanisms
The main categories of concurrency control mechanisms are:
Optimistic: Delay the synchronization for a transaction until its end without blocking (read, write) operations, and then abort transactions that violate desired synchronization rules.
Pessimistic: Block operations of a transaction that would cause violation of synchronization rules.
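Optimistic control is often implemented with a version number that is checked only at commit time. The Python class below is a hypothetical, single-threaded sketch of that idea, not a DBMS component. Reads never block, and a commit is rejected if another transaction has committed a newer version in the meantime.

```python
class VersionedStore:
    """Hypothetical store for optimistic concurrency control: reads return a version,
    and a write is validated at commit time against the version that was read."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def commit(self, key, new_value, read_version):
        _, current = self._data.get(key, (None, 0))
        if current != read_version:
            return False  # another transaction committed first: abort, caller may retry
        self._data[key] = (new_value, current + 1)
        return True


store = VersionedStore()
store.commit("seats", 10, 0)

# Two transactions read the same version without blocking each other...
v1, ver1 = store.read("seats")
v2, ver2 = store.read("seats")
# ...the first to commit wins; the second fails validation and is aborted.
print(store.commit("seats", v1 - 1, ver1))  # True
print(store.commit("seats", v2 - 1, ver2))  # False
print(store.read("seats"))                  # (9, 2)
```

In SQL the same validation is commonly written as an `UPDATE ... WHERE version = ?` whose update count is checked. A pessimistic scheme would instead lock the row before the first read, so the second transaction would wait rather than abort.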
There are several methods for concurrency control. Among them:
Two-phase locking
Strict two-phase locking
Conservative two-phase locking
Index locking
Multiple granularity locking
A lock is a database system object associated with a database object (typically a data item) that prevents undesired (typically synchronization-rule-violating) operations of other transactions by blocking them. Database system operations check for lock existence, and halt when noticing a lock type that is intended to block them.
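The two-phase rule behind the first three methods says that a transaction acquires locks in a growing phase and releases them in a shrinking phase. It never acquires a new lock after its first release. The Python sketch below enforces that rule using exclusive locks only. The class names are hypothetical, and instead of making a transaction wait, the sketch simply reports that the lock is unavailable.

```python
class LockManager:
    """Exclusive locks only (a simplification): one owner per data item."""

    def __init__(self):
        self.owner = {}

    def acquire(self, txn, item):
        holder = self.owner.get(item)
        if holder is not None and holder != txn.name:
            return False  # conflicting lock: a real DBMS would make txn wait
        self.owner[item] = txn.name
        return True

    def release(self, txn, item):
        if self.owner.get(item) == txn.name:
            del self.owner[item]


class Transaction:
    """Enforces the two-phase rule: no lock may be acquired after any lock is released."""

    def __init__(self, name, lm):
        self.name, self.lm = name, lm
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation, lock after unlock")
        ok = self.lm.acquire(self, item)
        if ok:
            self.held.add(item)
        return ok

    def unlock(self, item):
        self.shrinking = True  # the first release ends the growing phase
        self.lm.release(self, item)
        self.held.discard(item)

    def commit(self):
        # Strict 2PL would hold every lock until this point and release them all here.
        for item in list(self.held):
            self.unlock(item)


lm = LockManager()
t1, t2 = Transaction("T1", lm), Transaction("T2", lm)
print(t1.lock("x"), t1.lock("y"))  # True True (growing phase)
print(t2.lock("x"))                # False (blocked by T1's lock)
t1.unlock("x")                     # T1's shrinking phase begins
print(t2.lock("x"))                # True
try:
    t1.lock("z")
except RuntimeError as e:
    print(e)                       # T1: 2PL violation, lock after unlock
```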
There are also non-lock concurrency control methods, among them:
Conflict (serializability, precedence) graph checking
Timestamp ordering
Commitment ordering
Optimistic concurrency control methods also typically do not use locks.
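Timestamp ordering can be illustrated in a few lines. This is a simplified Python sketch of basic timestamp ordering, without refinements such as the Thomas write rule. Each data item records the largest timestamps that have read and written it. An operation from an older transaction that arrives too late is rejected, so that transaction can be aborted and restarted.

```python
class TimestampOrdering:
    """Basic timestamp-ordering scheduler: each item remembers the largest
    timestamps that have read and written it; out-of-order operations abort."""

    def __init__(self):
        self.read_ts = {}
        self.write_ts = {}

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return "abort"  # a younger transaction already overwrote the item
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return "ok"

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return "abort"  # a younger transaction already read or wrote it
        self.write_ts[item] = ts
        return "ok"


sched = TimestampOrdering()
print(sched.read(1, "x"))   # ok
print(sched.write(2, "x"))  # ok
print(sched.write(1, "x"))  # abort: T1 is older than the last writer T2
print(sched.read(1, "x"))   # abort
```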
Almost all currently implemented lock-based and non-lock-based concurrency control mechanisms guarantee schedules that are conflict serializable (unless relaxed forms of serializability are needed). However, many research texts encourage view-serializable schedules for possible gains in performance, especially when not too many conflicts exist (and not too many aborts of completely executed transactions occur), because such schedules reduce the considerable overhead of blocking mechanisms.
Concurrency Control in Operating Systems
Operating systems, especially real-time operating systems, need to maintain the illusion that many tasks are all running at the same time. Such multitasking is fairly simple when all tasks are independent of each other. However, when several tasks try to use the same resource, or when tasks try to share information, it can lead to confusion and inconsistency. The task of concurrent computing is to solve that problem. Some solutions involve "locks" similar to the locks used in databases, but they risk causing problems of their own, such as deadlock. Other solutions are lock-free and wait-free algorithms.
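The lock-based solution mentioned above looks like this in Python's standard threading module. Four threads increment a shared counter, and the lock makes each read-modify-write step mutually exclusive. The counter and thread counts are arbitrary choices for illustration.

```python
import threading

counter = 0
lock = threading.Lock()


def worker(n):
    global counter
    for _ in range(n):
        with lock:        # only one thread at a time may update the shared counter
            counter += 1  # read-modify-write, which is not atomic without the lock


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Deadlock becomes possible as soon as threads hold more than one lock at a time. A common precaution is to always acquire locks in the same global order.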
3.2 Java Database Connectivity
Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.
Overview
JDBC has been part of the Java Standard Edition since the release of JDK 1.1. The JDBC classes are contained in the Java package java.sql. Starting with version 3.0, JDBC has been developed under the Java Community Process. JSR 54 specifies JDBC 3.0 (included in J2SE 1.4), JSR 114 specifies the JDBC Rowset additions, and JSR 221 is the specification of JDBC 4.0 (included in Java SE 6).
JDBC allows multiple implementations to exist and be used by the same application. The API provides a mechanism for dynamically loading the correct Java packages and registering them with the JDBC Driver Manager. The Driver Manager is used as a connection factory for creating JDBC connections.
JDBC connections support creating and executing statements. These may be update statements such as SQL's CREATE, INSERT, UPDATE and DELETE, or they may be query statements such as SELECT. Additionally, stored procedures may be invoked through a JDBC connection. JDBC represents statements using one of the following classes:
Statement: the statement is sent to the database server each and every time.
PreparedStatement: the statement is cached and the execution path is predetermined on the database server, allowing it to be executed multiple times in an efficient manner.
CallableStatement: used for executing stored procedures on the database.
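JDBC itself is a Java API. To keep all examples in this unit in one language, the sketch below shows the analogous pattern in Python's DB-API using the standard sqlite3 module. It is an analogy, not JDBC. In Java the corresponding objects would come from `Connection.createStatement()`, `Connection.prepareStatement(...)` and `Connection.prepareCall(...)`. SQLite has no stored procedures, so there is no CallableStatement counterpart here. The `product` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Like a JDBC Statement: a complete SQL string sent as-is.
cur.execute("CREATE TABLE product (code TEXT PRIMARY KEY, price REAL)")

# Like a JDBC PreparedStatement: one SQL text with '?' placeholders, bound to
# different parameter values. sqlite3 keeps a cache of compiled statements, so
# reusing the same text avoids re-parsing it.
insert_sql = "INSERT INTO product (code, price) VALUES (?, ?)"
cur.executemany(insert_sql, [("P-100", 2.50), ("P-200", 4.75), ("P-300", 1.25)])
conn.commit()

# Parameters travel separately from the SQL text and are never spliced into it.
cur.execute("SELECT price FROM product WHERE code = ?", ("P-200",))
print(cur.fetchone()[0])  # 4.75
```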
Update statements such as INSERT, UPDATE and DELETE return an update count that indicates how many rows were affected in the database. These statements do not return any other information.
Query statements return a JDBC row result set. The row result set is used to walk over the result set. Individual columns in a row are retrieved either by name or by column number. There may be any number of rows in the result set. The row result set has metadata that describes the names of the columns and their types.
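The same Python DB-API analogy (sqlite3, with a hypothetical `student` table) shows the two kinds of result. An update reports only how many rows it affected, while a query returns rows plus column metadata. Note that sqlite3's `cursor.description` supplies column names but leaves the type fields as None, whereas JDBC's ResultSetMetaData also reports column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (matric_no TEXT, name TEXT, level INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("N001", "Ade", 100), ("N002", "Chika", 200), ("N003", "Musa", 200)])

# An UPDATE returns only a count of affected rows (JDBC: executeUpdate's int result).
cur = conn.execute("UPDATE student SET level = 300 WHERE level = 200")
print(cur.rowcount)  # 2

# A query returns rows plus metadata describing the columns
# (JDBC: ResultSet and ResultSetMetaData).
cur = conn.execute("SELECT matric_no, name FROM student ORDER BY matric_no")
print([col[0] for col in cur.description])  # ['matric_no', 'name']
for matric_no, name in cur:                  # walk over the result set row by row
    print(matric_no, name)
```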
There is an extension to the basic JDBC API in the javax.sql package that adds, among other things, DataSource objects, connection pooling and RowSets. Scrollable result sets themselves are defined by java.sql.ResultSet.
JDBC Drivers
JDBC drivers are client-side adaptors (they are installed on the client machine, not on the server) that convert requests from Java programs to a protocol that the DBMS can understand.
Types: There are commercial and free drivers available for most relational database servers. These drivers fall into one of the following types:
Type 1, the JDBC-ODBC bridge
Type 2, the Native-API driver
Type 3, the network-protocol driver
Type 4, the native-protocol driver
Internal JDBC driver: a driver embedded with the JRE in Java-enabled SQL databases, used for Java stored procedures. This does not belong to the above classification, although it would likely be either a type 2 or type 4 driver (depending on whether the database itself is implemented in Java or not). An example of this is the KPRB driver supplied with Oracle RDBMS. "jdbc:default:connection" is a relatively standard way of making such a connection (at least Oracle and Apache Derby support it). The distinction here is that the JDBC client is actually running as part of the database being accessed, so access can be made directly rather than through network protocols.
Sources
SQLSummit.com publishes a list of drivers, including JDBC drivers and vendors.
Sun Microsystems provides a list of some JDBC drivers and vendors.
Simba Technologies ships an SDK for building custom JDBC drivers for any custom/proprietary relational data source.
DataDirect Technologies provides a comprehensive suite of fast Type 4 JDBC drivers for all major databases.
IDS Software provides a Type 3 JDBC driver for concurrent access to all major databases. Supported features include resultset caching, SSL encryption, custom data source and dbShield.
i-net software provides fast Type 4 JDBC drivers for all major databases.
OpenLink Software ships JDBC drivers for a variety of databases, including bridges to other data access mechanisms (e.g. ODBC, JDBC) which can provide more functionality than the targeted mechanism.
JDBaccess is a Java persistence library for MySQL and Oracle which defines major database access operations in an easily usable API above JDBC.
JNetDirect provides a suite of fully Sun J2EE certified, high-performance JDBC drivers.
HSQLDB is an RDBMS with a JDBC driver and is available under a BSD license.
3.3 Query Optimizer
The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient. Cost-based query optimizers assign an estimated "cost" to each possible query plan, and choose the plan with the smallest cost.
Costs are used to estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required, the CPU requirements, and other factors determined from the data dictionary. The set of query plans examined is formed by examining the possible access paths (e.g. index scan, sequential scan) and join algorithms (e.g. sort-merge join, hash join, nested loops). The search space can become quite large depending on the complexity of the SQL query.
The query optimizer cannot be accessed directly by users. Instead, once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs.
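A cost-based choice between access paths can be sketched with a toy cost model. Everything numeric below is an assumption made for illustration: the page counts, the row counts, the weights that convert I/O and CPU work into one cost figure, and the pessimistic rule that an index scan fetches one page per matching row. Real optimizers use far richer models built from catalog statistics.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    io_pages: float  # estimated page reads
    cpu_rows: float  # estimated rows processed

    def cost(self, io_weight=1.0, cpu_weight=0.01):
        # Hypothetical weights: one page read costs as much as 100 rows of CPU work.
        return io_weight * self.io_pages + cpu_weight * self.cpu_rows


def choose_plan(table_pages, table_rows, selectivity, index_height=3):
    matching = table_rows * selectivity
    candidates = [
        # Sequential scan: read every page and test every row.
        Plan("sequential scan", io_pages=table_pages, cpu_rows=table_rows),
        # Index scan: descend the index, then assume one page fetch per matching row.
        Plan("index scan", io_pages=index_height + matching, cpu_rows=matching),
    ]
    return min(candidates, key=lambda p: p.cost())


print(choose_plan(table_pages=1_000, table_rows=100_000, selectivity=0.001).name)  # index scan
print(choose_plan(table_pages=1_000, table_rows=100_000, selectivity=0.2).name)    # sequential scan
```

The cheaper plan flips as the predicate becomes less selective. This is why accurate selectivity estimates matter so much.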
Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes; those are nodes whose output is fed as input to the parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an index scan or a sequential scan.
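The plan-tree idea can be sketched with one small Python class per plan node. Each class exposes `rows()` and pulls rows from its children, so intermediate results flow from the leaves up to the root. The node classes and sample tables are hypothetical. The hash join builds a hash table on its left input and probes it with its right input.

```python
class SeqScan:
    """Leaf node: produces rows by scanning a stored table (a list here)."""
    def __init__(self, table):
        self.table = table

    def rows(self):
        yield from self.table


class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def rows(self):
        return (r for r in self.child.rows() if self.predicate(r))


class HashJoin:
    """Two children: build a hash table on the left input, probe with the right."""
    def __init__(self, left, right, left_key, right_key):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key

    def rows(self):
        table = {}
        for r in self.left.rows():
            table.setdefault(r[self.left_key], []).append(r)
        for s in self.right.rows():
            for r in table.get(s[self.right_key], []):
                yield {**r, **s}


class Sort:
    """One child: the input to be sorted."""
    def __init__(self, child, key):
        self.child, self.key = child, key

    def rows(self):
        return iter(sorted(self.child.rows(), key=lambda r: r[self.key]))


dept = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "Audit"}]
emp = [{"name": "Musa", "dept_id": 2, "salary": 90},
       {"name": "Ada", "dept_id": 1, "salary": 120},
       {"name": "Bisi", "dept_id": 1, "salary": 70}]

# Plan for: SELECT ... FROM emp JOIN dept USING (dept_id) WHERE salary > 80 ORDER BY name
plan = Sort(HashJoin(SeqScan(dept), Filter(SeqScan(emp), lambda r: r["salary"] > 80),
                     "dept_id", "dept_id"), key="name")
for row in plan.rows():
    print(row["name"], row["dept"])
# Ada Sales
# Musa Audit
```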
Cost Estimation
One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the selection factor of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms. This technique works well for estimating the selectivities of individual predicates. However, many queries have conjunctions of predicates, such as select count(*) from R where R.make='Honda' and R.model='Accord'. Query predicates are often highly correlated (for example, model='Accord' implies make='Honda'), and it is very hard to estimate the selectivity of the conjunct in general. Poor cardinality estimates and uncaught correlation are among the main reasons why query optimizers pick poor query plans. This is one reason why a DBA should regularly update the database statistics, especially after major data loads/unloads.
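The Honda/Accord example can be reproduced with a small, invented sample of 100 cars. Multiplying the per-column selectivities, which is the usual independence assumption, underestimates the result, because every Accord is also a Honda.

```python
cars = (
    [{"make": "Honda", "model": "Accord"}] * 20
    + [{"make": "Honda", "model": "Civic"}] * 30
    + [{"make": "Toyota", "model": "Corolla"}] * 50
)


def selectivity(rows, pred):
    return sum(1 for r in rows if pred(r)) / len(rows)


s_make = selectivity(cars, lambda r: r["make"] == "Honda")     # 0.5
s_model = selectivity(cars, lambda r: r["model"] == "Accord")  # 0.2

# Independence assumption: multiply the per-column selectivities.
estimated_rows = len(cars) * s_make * s_model
# Reality: model='Accord' implies make='Honda', so the conjunct is as selective as model alone.
actual_rows = sum(1 for r in cars if r["make"] == "Honda" and r["model"] == "Accord")
print(round(estimated_rows), actual_rows)  # 10 20
```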
3.4 Open Database Connectivity
In computing, Open Database Connectivity (ODBC) provides a standard software API method for using database management systems (DBMS). The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems.
Overview
The ODBC specification offers a procedural API for using SQL queries to access data. An implementation of ODBC will contain one or more applications, a core ODBC "Driver Manager" library, and one or more "database drivers". The Driver Manager, independent of the applications and DBMS, acts as an "interpreter" between the applications and the database drivers, whereas the database drivers contain the DBMS-specific details. Thus a programmer can write applications that use standard types and features without concern for the specifics of each DBMS that the applications may encounter. Likewise, database driver implementors need only know how to attach to the core library. This makes ODBC modular.
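The driver-manager architecture can be pictured as a registry of drivers behind one common interface. The Python sketch below is a hypothetical illustration of that design, not the ODBC API. The `Driver`, `CsvDriver` and `DriverManager` classes and the `DRIVER=name` connection-string format are our own inventions.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Standard interface that every (hypothetical) driver implements."""
    @abstractmethod
    def execute(self, sql: str) -> list: ...


class CsvDriver(Driver):
    def __init__(self, rows):
        self.rows = rows

    def execute(self, sql):
        # Data-source-specific details live here; this toy driver understands one query.
        if sql.strip().upper() == "SELECT * FROM DATA":
            return list(self.rows)
        raise NotImplementedError(sql)


class DriverManager:
    """Sits between applications and drivers, choosing a driver by name."""
    _drivers = {}

    @classmethod
    def register(cls, name, factory):
        cls._drivers[name] = factory

    @classmethod
    def connect(cls, conn_str):
        # Hypothetical connection-string format: "DRIVER=name;"
        settings = dict(part.split("=", 1) for part in conn_str.split(";") if part)
        return cls._drivers[settings["DRIVER"]]()


DriverManager.register("csv", lambda: CsvDriver([("Lagos", 1), ("Abuja", 2)]))

# Application code depends only on the Driver interface, not on the data source.
db = DriverManager.connect("DRIVER=csv;")
print(db.execute("SELECT * FROM data"))  # [('Lagos', 1), ('Abuja', 2)]
```

Adding support for a new data source means registering another driver. The application code does not change, which is the modularity the paragraph describes.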
Writing ODBC code that exploits DBMS-specific features requires more advanced programming: an application must use introspection, calling ODBC metadata functions that return information about supported features, available types, syntax, limits, isolation levels, driver capabilities and more. Even when programmers use adaptive techniques, however, ODBC may not provide some advanced DBMS features. The ODBC 3.x API operates well with traditional SQL applications such as OLTP, but it has not evolved to support richer types introduced by SQL:1999 and SQL:2003.
ODBC provides the standard of ubiquitous data access because hundreds of ODBC drivers exist for a large variety of data sources. ODBC operates with a variety of operating systems, and drivers exist for non-relational data such as spreadsheets, text and XML files. Because ODBC dates back to 1992, it offers connectivity to a wider variety of data sources than other data-access APIs. More drivers exist for ODBC than drivers or providers exist for newer APIs such as OLE DB, JDBC, and ADO.NET.
Despite the benefits of ubiquitous connectivity and platform independence, systems designers may perceive ODBC as having certain drawbacks. Administering a large number of client machines can involve a diversity of drivers and DLLs. This complexity can increase system-administration overhead. Large organizations with thousands of PCs have often turned to ODBC server technology (also known as "Multi-Tier ODBC Drivers") to simplify the administration problems.
Differences between drivers and driver maturity can also raise important issues. Newer ODBC drivers do not always have the stability of drivers already deployed for years; years of testing and deployment mean a driver may contain fewer bugs.
Developers needing features or types not accessible with ODBC can use other SQL APIs. When not aiming for platform independence, developers can use proprietary APIs, whether DBMS-specific (such as Transact-SQL) or language-specific (for example, JDBC for Java applications).
Bridging configurations
JDBC-ODBC Bridges
A JDBC-ODBC bridge consists of a JDBC driver which employs an ODBC driver to connect to a target database. This driver translates JDBC method calls into ODBC function calls. Programmers usually use such a bridge when a particular database lacks a JDBC driver. Sun Microsystems included one such bridge in the JVM, but viewed it as a stop-gap measure while few JDBC drivers existed. Sun never intended its bridge for production environments and generally recommended against its use; the built-in bridge was removed in Java SE 8. Independent data-access vendors deliver JDBC-ODBC bridges which support current standards for both mechanisms, and which far outperformed the bridge built into the JVM.
ODBC-JDBC Bridges
An ODBC-JDBC bridge consists of an ODBC driver which uses the services of a JDBC driver to connect to a database. This driver translates ODBC function calls into JDBC method calls. Programmers usually use such a bridge when they lack an ODBC driver for a particular database but have access to a JDBC driver.
Implementations
ODBC implementations run on many operating systems, including Microsoft Windows, Unix, Linux, OS/2, OS/400, IBM i5/OS, and Mac OS X. Hundreds of ODBC drivers exist, including drivers for Oracle, DB2, Microsoft SQL Server, Sybase, Pervasive SQL, IBM Lotus Domino, MySQL, PostgreSQL, and desktop database products such as FileMaker and Microsoft Access.
3.5 Data Dictionary
A data dictionary, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):
a document describing a database or collection of databases
an integral component of a DBMS that is required to determine its structure
a piece of middleware that extends or supplants the native data dictionary of a DBMS
Data Dictionary Documentation
Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of various tables and fields in each database, plus additional details, like the type and length of each data element. There is no universal standard as to the level of detail in such a document, but it is primarily a distillation of metadata about database structure, not the data itself. A data dictionary document also may include further information describing how data elements are encoded. One of the advantages of well-designed data dictionary documentation is that it helps to establish consistency throughout a complex database, or across a large collection of federated databases.
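Much of a data dictionary document can be generated from the DBMS's own catalog. This Python sketch uses SQLite, whose catalog is exposed through the `sqlite_master` table and the `PRAGMA table_info` statement. The `customer` and `invoice` tables are hypothetical. Other DBMSs expose similar information through catalog views such as INFORMATION_SCHEMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
CREATE TABLE invoice (inv_no INTEGER PRIMARY KEY,
                      cust_id INTEGER REFERENCES customer(cust_id),
                      amount NUMERIC(10,2));
""")


def data_dictionary(conn):
    """Return (table, column, declared type, is primary key, declared NOT NULL)
    for every column, read from SQLite's own catalog."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({table})"):
            entries.append((table, column, col_type, bool(pk), bool(notnull)))
    return entries


for entry in data_dictionary(conn):
    print(*entry)
```

Descriptions and encoding notes, which the catalog does not hold, would still have to be added by hand.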
Data Dictionary Middleware
In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. Extensions to the data dictionary also can assist in query optimization against distributed databases.
Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex JOINs. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types.
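A high-level data dictionary can drive code generation. The sketch below is hypothetical and is not the API of PHPLens/ADOdb, RADICORE or Base One. It keeps table definitions in DBMS-independent logical types and maps them to a target DBMS only when generating CREATE TABLE statements. The type mappings shown are illustrative, not exhaustive.

```python
import sqlite3

# Hypothetical application-level dictionary: logical types, independent of any DBMS.
DICTIONARY = {
    "supplier": [("supplier_id", "id", "PRIMARY KEY"), ("name", "text", "NOT NULL")],
    "product": [("product_id", "id", "PRIMARY KEY"),
                ("supplier_id", "id", "REFERENCES supplier(supplier_id)"),
                ("price", "money", "")],
}

# Per-DBMS type mappings (illustrative, not exhaustive).
TYPE_MAP = {
    "sqlite": {"id": "INTEGER", "text": "TEXT", "money": "NUMERIC"},
    "postgresql": {"id": "INTEGER", "text": "VARCHAR(255)", "money": "NUMERIC(12,2)"},
}


def generate_ddl(dictionary, dialect):
    statements = []
    for table, columns in dictionary.items():
        cols = ", ".join(
            f"{name} {TYPE_MAP[dialect][logical]} {opts}".strip()
            for name, logical, opts in columns)
        statements.append(f"CREATE TABLE {table} ({cols})")
    return statements


ddl = generate_ddl(DICTIONARY, "sqlite")
print(ddl[0])  # CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL)

conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)  # the same dictionary could also drive forms or validation rules
```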
4.0 CONCLUSION
The basic components of a database management system serve to ensure the availability of data as well as efficiency in accessing it. As this unit has shown, they include concurrency control, the query optimizer, the data dictionary, and connectivity interfaces such as JDBC and ODBC.
5.0 SUMMARY
In databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.
Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.
The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient.
In computing, Open Database Connectivity (ODBC) provides a standard software API method for using database management systems (DBMS). The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems.
A data dictionary, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format".
In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary.
6.0 TUTOR-MARKED ASSIGNMENT
1. Define the transaction ACID rules.
2. List and define the types of JDBC driver.
7.0 REFERENCES/FURTHER READINGS
ACM, IBM Dictionary of Computing, 10th edition, 1993.
TechTarget, SearchSOA, What is a Data Dictionary?
AHIMA Practice Brief, "Guidelines for Developing a Data Dictionary", Journal of AHIMA 77, no. 2 (February 2006): 64A-D.
U.S. Patent 4774661, Database management system with active data dictionary, 11/19/1985, AT&T.
U.S. Patent 4769772, Automated Query Optimization Method using both Global and Parallel Local Optimizations for Materialization Access Planning for Distributed Databases, 02/28/1985, Honeywell Bull.
PHPLens, ADOdb Data Dictionary Library for PHP.
RADICORE, What is a Data Dictionary?
Base One International Corp., Base One Data Dictionary.
Chaudhuri, Surajit (1998). "An Overview of Query Optimization in Relational Systems". Proceedings of the ACM Symposium on Principles of Database Systems: pages 34-43. doi:10.1145/275487.275492.
Ioannidis, Yannis (March 1996). "Query optimization". ACM Computing Surveys 28 (1): 121-123. doi:10.1145/234313.234367.
Selinger, Patricia, et al. (1979). "Access Path Selection in a Relational Database Management System". Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data: 23-34. doi:10.1145/582095.582099.
Parkes, Clara H. (April 1996). "Power to the People". DBMS Magazine, Miller Freeman, Inc.
MODULE 2
Unit 1 Development and Design of Database
Unit 2 Structured Query Language (SQL)
Unit 3 Database and Information Relational Systems
Unit 4 Database Administrator and Administration
UNIT 1 DEVELOPMENT AND DESIGN OF DATABASE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Database Development
3.1.1 Data Planning and Database Design
3.2 Design of Database
3.2.1 Database Normalization
3.2.2 History
3.3 Normal Forms
3.4 Denormalization
3.5 Non-first normal form (NF² or N1NF)
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
Database design is the process of deciding how to organize data into record types and how the record types will relate to each other. The DBMS should mirror the organization's data structure and process transactions efficiently.
Developing small, personal databases is relatively easy using microcomputer DBMS packages or wizards. However, developing a large database of complex data types can be a complex task. In many companies, developing and managing large corporate databases is the primary responsibility of the database administrator and database design analysts. They work with end users and systems analysts to model business processes and the data required. Then they determine:
1. What data definitions should be included in the databases?
2. What structures or relationships should exist among the data elements?
2.0 OBJECTIVES
At the end of this unit, you should be able to:
understand the concept of data planning and database design
know the steps in the development of databases
identify the functions of each step of the design process
define database normalization
know the problems addressed by normalization
define the normal forms from 1st to 6th
define and understand the term denormalization
3.0 MAIN CONTENT
3.1 Database Development
3.1.1 Data Planning and Database Design
As figure 1 illustrates, database development may start with a top-down data planning process. Database administrators and designers work with corporate and end-user management to develop an enterprise model that defines the basic business processes of the enterprise. Then they define the information needs of end users in a business process, such as the purchasing/receiving process that every business has.
Next, end users must identify the key data elements that are needed to perform the specific business activities. This frequently involves developing entity relationship diagrams (ERDs) that model the relationships among the many entities involved in the business processes. End users and database designers could use the resulting ERDs to identify what supplier and product data are required to support their purchasing/receiving and other business processes using enterprise resource planning (ERP) or supply chain management (SCM) software.
Such users' views are a major part of a data modeling process, in which the relationships between data elements are identified. Each data model defines the logical relationships among the data elements needed to support a basic business process. For example, can a supplier provide more than one type of product to us? Can a customer have more than one type of account with us? Can an employee have several pay rates or be assigned to several projects or workgroups?
Answering such questions will identify data relationships that have to be represented in a data model that supports a business process. These data models then serve as logical frameworks (called schemas and subschemas) on which to base the physical design of databases and the development of application programs to support the business processes of the organization. A schema is an overall logical view of the relationships among the data elements in a database, while a subschema is a logical view of the data relationships needed to support specific end-user application programs that will access that database.
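The schema/subschema distinction maps naturally onto base tables and views. In this SQLite sketch, run through Python's sqlite3 module, all table, view and column names are hypothetical. The schema lets one customer hold several accounts, answering one of the questions above. A view acts as the subschema for a savings application that should not see credit limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the overall logical view. A customer may hold many accounts (one-to-many),
# so "Can a customer have more than one type of account with us?" is answered yes.
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, credit_limit NUMERIC);
CREATE TABLE account (acct_no INTEGER PRIMARY KEY,
                      cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
                      acct_type TEXT CHECK (acct_type IN ('checking', 'savings', 'loan')),
                      balance NUMERIC);
INSERT INTO customer VALUES (1, 'Emeka', 5000);
INSERT INTO account VALUES (10, 1, 'checking', 250), (11, 1, 'savings', 900);

-- Subschema: only the data a savings application needs; credit limits stay hidden.
CREATE VIEW savings_view AS
  SELECT c.name, a.acct_no, a.balance
  FROM customer c JOIN account a ON a.cust_id = c.cust_id
  WHERE a.acct_type = 'savings';
""")
print(conn.execute("SELECT * FROM savings_view").fetchall())  # [('Emeka', 11, 900)]
```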
Remember that data models represent logical views of the data and relationships of the database. Physical database design takes a physical view of the data (also called the internal view) that describes how data are to be physically stored and accessed on the storage devices of a computer system. For example, figure 2 illustrates these different views and the software interface of a bank database processing system. In this example, checking, savings and installment lending are the business processes whose data models are part of a banking services data model that serves as a logical data framework for all bank services.
3.2 Design of Database
3.2.1 Database Normalization
Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.
Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an automated teller machine), while less normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).
Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in Third Normal Form (3NF), for example, is consequently in Second Normal Form (2NF) as well; but the reverse is not necessarily the case.
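The duplication problem that normalization addresses can be shown with two versions of a hypothetical employee-skills table in SQLite, accessed through Python's sqlite3 module. The addresses are invented. In the flat table, an update that touches only one copy of an address leaves the data inconsistent. After decomposition the address is stored once, so it can only be updated in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized: the employee's address is repeated on every skill row.
conn.executescript("""
CREATE TABLE emp_skill_flat (emp_id INTEGER, address TEXT, skill TEXT);
INSERT INTO emp_skill_flat VALUES
  (519, '94 Chestnut Street', 'Typing'),
  (519, '94 Chestnut Street', 'Shorthand');
""")
# A careless update touches only one of the copies...
conn.execute("UPDATE emp_skill_flat SET address = '96 Walnut Avenue' "
             "WHERE emp_id = 519 AND skill = 'Typing'")
# ...and employee 519 now appears to have two different addresses.
print(conn.execute("SELECT DISTINCT address FROM emp_skill_flat WHERE emp_id = 519").fetchall())

# Normalized: the address is stored once, keyed by emp_id; skills live in their own table.
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE emp_skill (emp_id INTEGER REFERENCES employee(emp_id), skill TEXT,
                        PRIMARY KEY (emp_id, skill));
INSERT INTO employee VALUES (519, '94 Chestnut Street');
INSERT INTO emp_skill VALUES (519, 'Typing'), (519, 'Shorthand');
UPDATE employee SET address = '96 Walnut Avenue' WHERE emp_id = 519;
""")
print(conn.execute("SELECT address FROM employee WHERE emp_id = 519").fetchall())
```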
Figure 1: Database Development Structure
1. Data Planning: develops a model of the business process. Output: enterprise models of business processes, with storage documentation.
2. Requirement Specification: defines the information needs of end users in a business process. Output: a description of user needs, which may be represented in natural language or using the tools of a particular design methodology.
3. Conceptual Design: expresses all information requirements in the form of a high-level model. Output: a conceptual data model, often expressed as entity relationship models.
4. Logical Design: translates the conceptual models into the data model of a DBMS. Output: logical data models, e.g. relational, network, hierarchical, multidimensional or object-oriented models.
5. Physical Design: determines the data structures and process methods. Output: physical data models (storage representation and access methods).
;;
M%( ;1< "(@(%(2E M(N(GEMEN@ 2A2@EM
Note: Database development involves data planning and database design activities. Data models that support business processes are used to develop databases that meet the information needs of users.
Figure 2: Examples of the logical and physical database views and the software interface of a banking service information system.
[Figure contents: Logical user views: the Checking and Savings Data Model (used by the savings and checking applications) and the Installment Loan Data Model (used by the installment loan application), i.e. the data elements and relations (the subschemas) needed for checking, savings, or installment loan processing. Banking Service Data Model: the data elements and relationships (the schema) needed to support all banking services. Software interface: the database management system provides access to the bank's databases. Physical data views: the organization and location of data on the storage media (the bank databases).]

Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.
Problems addressed by normalization
An Update Anomaly. Employee 519 is shown as having different addresses on different records.
An Insertion Anomaly. Until the new faculty member is assigned to teach at least one course, his details cannot be recorded.
A Deletion Anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:
The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.
There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly.
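The update and insertion anomalies described above can be reproduced in a few lines. The sketch below assumes SQLite through Python's standard sqlite3 module; the table and column names, the employee number, the addresses and the faculty record are illustrative, not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "Employees' Skills" table: the address is repeated on every
# skill row, so one fact (employee 519's address) is stored three times.
conn.execute(
    "CREATE TABLE employees_skills ("
    " employee_id INTEGER NOT NULL, employee_address TEXT,"
    " skill TEXT NOT NULL, PRIMARY KEY (employee_id, skill))"
)
conn.executemany(
    "INSERT INTO employees_skills VALUES (?, ?, ?)",
    [
        (519, "12 Main St", "Typing"),
        (519, "12 Main St", "Shorthand"),
        (519, "12 Main St", "Bookkeeping"),
    ],
)

# Update anomaly: the change of address reaches only one of the three rows.
conn.execute(
    "UPDATE employees_skills SET employee_address = '7 Park Ave' "
    "WHERE employee_id = 519 AND skill = 'Typing'"
)
addresses = sorted(
    address
    for (address,) in conn.execute(
        "SELECT DISTINCT employee_address FROM employees_skills "
        "WHERE employee_id = 519"
    )
)
print(addresses)  # ['12 Main St', '7 Park Ave']: two answers to one question

# Insertion anomaly: a hypothetical "Faculty and Their Courses" table keyed on
# (faculty_id, course_code) cannot hold a new hire who has no course yet.
conn.execute(
    "CREATE TABLE faculty_courses ("
    " faculty_id INTEGER NOT NULL, faculty_name TEXT, hire_date TEXT,"
    " course_code TEXT NOT NULL, PRIMARY KEY (faculty_id, course_code))"
)
try:
    conn.execute(
        "INSERT INTO faculty_courses VALUES (?, ?, ?, ?)",
        (424, "Dr. Newhire", "2008-03-01", None),
    )
    insertion_blocked = False
except sqlite3.IntegrityError:  # raised by the NOT NULL constraint on course_code
    insertion_blocked = True
print(insertion_blocked)  # True
```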
Ideally, a relational database table should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is called normalization. Removal of redundancies from the tables will lead to several tables, with referential integrity restrictions between them.
Normalization typically involves decomposing an unnormalized table into two or more tables that, were they to be combined (joined), would convey exactly the same information as the original table.
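As a minimal sketch of such a decomposition (again assuming SQLite via Python's sqlite3 module, with hypothetical table names and sample rows), the "Employees' Skills" table is split into an employees table and a skills table, and joining them back on Employee ID yields exactly the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized: the address is repeated for every skill of an employee.
CREATE TABLE employees_skills (
    employee_id INTEGER, employee_address TEXT, skill TEXT,
    PRIMARY KEY (employee_id, skill));
INSERT INTO employees_skills VALUES
    (426, '87 Sycamore Grove', 'Typing'),
    (426, '87 Sycamore Grove', 'Shorthand'),
    (519, '94 Chestnut Street', 'Public Speaking'),
    (519, '94 Chestnut Street', 'Carpentry');

-- Decomposition (CREATE TABLE ... AS SELECT copies data but not keys;
-- a real design would declare them). Each address is stored once ...
CREATE TABLE employees AS
    SELECT DISTINCT employee_id, employee_address FROM employees_skills;
-- ... and each skill once per employee.
CREATE TABLE skills AS
    SELECT employee_id, skill FROM employees_skills;
""")

original = set(conn.execute(
    "SELECT employee_id, employee_address, skill FROM employees_skills"))
rejoined = set(conn.execute(
    "SELECT e.employee_id, e.employee_address, s.skill "
    "FROM employees AS e JOIN skills AS s ON e.employee_id = s.employee_id"))

print(original == rejoined)  # True: the join conveys the same information
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 2
```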
Background to normalization: definitions
Functional Dependency: Attribute B has a functional dependency on attribute A (i.e. A → B) if, for each value of attribute A, there is exactly one value of attribute B. If a value of A is repeated in tuples, then the value of B will also be repeated. In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several employees could live at the same address and therefore one Employee Address value could correspond to more than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.) An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain. For example, an Employer may require certain employees to split their time between two locations, such as New York City and London, and therefore want to allow Employees to have more than one Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID.
Trivial Functional Dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.
Full Functional Dependency: An attribute is fully functionally dependent on a set of attributes X if it is
- functionally dependent on X, and
- not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.
Transitive Dependency: A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.
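Multivalued Dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows. More precisely, X ↠ Y holds in a table if, whenever two rows agree on X, the table also contains a row that takes its X and Y values from the first row and its remaining values from the second.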
Join Dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.
Candidate Key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.
Non-Prime Attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.
Primary Key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.
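The definitions above can be tried out on sample data. The sketch below is a small set of hypothetical Python helpers (the function names are our own, not a library API) applied to illustrative "Employees' Skills" rows. A sample of rows can only show that a dependency is violated; as the text stresses, which dependencies really hold is a fact about the problem domain, so results computed from data are evidence, not proof.

```python
from itertools import combinations

ATTRIBUTES = ("employee_id", "employee_address", "skill")

# Illustrative rows; two employees share an address, as the text notes is possible.
rows = [
    {"employee_id": 426, "employee_address": "87 Sycamore Grove", "skill": "Typing"},
    {"employee_id": 426, "employee_address": "87 Sycamore Grove", "skill": "Shorthand"},
    {"employee_id": 519, "employee_address": "94 Chestnut Street", "skill": "Typing"},
    {"employee_id": 519, "employee_address": "94 Chestnut Street", "skill": "Carpentry"},
    {"employee_id": 601, "employee_address": "94 Chestnut Street", "skill": "Typing"},
]


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs is not violated: rows that agree on lhs agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs holds, but no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return not any(
        fd_holds(rows, subset, rhs)
        for size in range(len(lhs))
        for subset in combinations(lhs, size)
    )


def is_superkey(rows, attrs):
    """No two (distinct) rows share the same values for attrs."""
    return len({tuple(r[a] for a in attrs) for r in rows}) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal superkeys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if is_superkey(rows, combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys


keys = candidate_keys(rows, ATTRIBUTES)
non_prime = [a for a in ATTRIBUTES if not any(a in k for k in keys)]
print(keys)       # [('employee_id', 'skill')]
print(non_prime)  # ['employee_address']
print(fd_holds(rows, ["employee_id"], ["employee_address"]))  # True
print(is_full_fd(rows, ("employee_id", "skill"), ("employee_address",)))  # False
```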
3.2.2 History
Edgar F. Codd first proposed the process of normalization and what came to be known as the first normal form:
There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition non-simple domains are replaced by "domains whose elements are atomic (non-decomposable) values."
(Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks)
In his paper, Edgar F. Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.
3.3 Normal Forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.
First normal form: A table is in first normal form (1NF) if and only if it represents a relation. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.
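Under the definition just given, a unique key plus NOT NULL columns is enough to keep a table in 1NF. The sketch below assumes SQLite via Python's sqlite3 module (the person_color table is hypothetical) and shows the DBMS rejecting a duplicate row and a null while accepting a new fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a unique key (no duplicate rows) and no nullable columns.
conn.execute(
    "CREATE TABLE person_color ("
    " person TEXT NOT NULL,"
    " color  TEXT NOT NULL,"
    " PRIMARY KEY (person, color))"
)
conn.execute("INSERT INTO person_color VALUES ('Bob', 'blue')")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO person_color VALUES (?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(("Bob", "blue")),    # duplicate row: violates the key
    try_insert(("Jane", None)),     # null: violates NOT NULL
    try_insert(("Jane", "green")),  # a new fact
]
print(results)  # ['rejected', 'rejected', 'accepted']
```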
Second normal form: The criteria for second normal form (2NF) are:
The table must be in 1NF.
None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies. For example, consider an "Employees' Skills" table whose attributes are Employee ID, Employee Name, and Skill; and suppose that the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Name depends on only one of those attributes (namely, Employee ID), the table is not in 2NF.
In simple terms, a table is in 2NF if it is in 1NF and all fields are dependent on the whole of the primary key; equivalently, a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.
Note that if none of a 1NF table's candidate keys are composite (i.e. every candidate key consists of just one attribute), then we can say immediately that the table is in 2NF.
All columns must be a fact about the entire key, and not a subset of the key.
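The 2NF test on the Employee ID / Employee Name / Skill example can be expressed as a search for non-prime attributes that depend on a proper part of the candidate key. This is a hypothetical Python sketch over illustrative rows (the helper names and data are our own); as with any check on data, it can reveal a violation but cannot prove that a dependency holds in general.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_dependencies(rows, key, non_prime):
    """(subset, attribute) pairs in which a non-prime attribute depends on a
    proper, non-empty subset of the candidate key: the 2NF violations."""
    found = []
    for attr in non_prime:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Illustrative rows of the "Employees' Skills" table from the 2NF example,
# with candidate key (employee_id, skill).
rows = [
    {"employee_id": 1, "employee_name": "Jones", "skill": "Typing"},
    {"employee_id": 1, "employee_name": "Jones", "skill": "Shorthand"},
    {"employee_id": 2, "employee_name": "Okafor", "skill": "Typing"},
]
violations = partial_dependencies(rows, ("employee_id", "skill"), ["employee_name"])
print(violations)  # [(('employee_id',), 'employee_name')]
```

Removing the reported violation means moving Employee Name into a separate table keyed by Employee ID alone, leaving a table of (Employee ID, Skill) pairs.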

Third Normal Form: The criteria for third normal form (3NF) are:
The table must be in 2NF.
Transitive dependencies must be eliminated. All attributes must rely only on the primary key. So, if a database has a table with columns Student ID, Student, Company, and Company Phone Number, it is not in 3NF. This is because the Phone Number relies on the Company. So, for it to be in 3NF, there must be a second table with Company and Company Phone Number columns; the Phone Number column in the first table would be removed.
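A sketch of the 3NF design for the Student/Company example, assuming SQLite via Python's sqlite3 module; the company names, phone numbers and student names are made up. Company Phone Number moves into its own table keyed by Company, so a change of number is a one-row update, and a join still reproduces the original columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
-- 3NF design: Company Phone Number depends on Company, so it lives in
-- its own table keyed by Company instead of on every student row.
CREATE TABLE company (
    company       TEXT PRIMARY KEY,
    company_phone TEXT NOT NULL
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    student    TEXT NOT NULL,
    company    TEXT NOT NULL REFERENCES company(company)
);
INSERT INTO company VALUES ('Acme Ltd', '555-0100'), ('Globex', '555-0199');
INSERT INTO student VALUES (1, 'Ada', 'Acme Ltd'), (2, 'Chidi', 'Acme Ltd'),
                           (3, 'Ngozi', 'Globex');
""")

# A new phone number for Acme is recorded in exactly one row.
updated = conn.execute(
    "UPDATE company SET company_phone = '555-0111' WHERE company = 'Acme Ltd'"
).rowcount
print(updated)  # 1

# Joining the two tables reproduces the original wide table.
wide = conn.execute(
    "SELECT s.student_id, s.student, s.company, c.company_phone "
    "FROM student AS s JOIN company AS c ON s.company = c.company "
    "ORDER BY s.student_id"
).fetchall()
print(wide)
```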
Fourth normal form: A table is in fourth normal form (4NF) if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey (that is, X is either a candidate key or a superset thereof). For example, if a person can have several phone numbers and several email addresses that vary independently of each other, the phone numbers and email addresses should not be stored in the same table.
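To see why independent multivalued facts should be separated, the sketch below (assuming SQLite via Python's sqlite3; the person, phone numbers and email addresses are hypothetical) stores them first in one table and then in two. Keeping the single table consistent forces every phone to be paired with every email, while the two-table version stores each fact once and still rejoins to the same rows.

```python
import sqlite3
from itertools import product

# Hypothetical data: one person, three phone numbers and three email
# addresses that vary independently (person ->> phone, person ->> email).
phones = ["555-0101", "555-0102", "555-0103"]
emails = ["ada@example.com", "ada@example.org", "ada@example.net"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact      (person TEXT, phone TEXT, email TEXT,
                           PRIMARY KEY (person, phone, email));
CREATE TABLE person_phone (person TEXT, phone TEXT, PRIMARY KEY (person, phone));
CREATE TABLE person_email (person TEXT, email TEXT, PRIMARY KEY (person, email));
""")

# Not in 4NF: every phone must be paired with every email, so the single
# table needs len(phones) * len(emails) rows.
conn.executemany("INSERT INTO contact VALUES ('Ada', ?, ?)", product(phones, emails))

# 4NF decomposition: one table per independent multivalued fact.
conn.executemany("INSERT INTO person_phone VALUES ('Ada', ?)", [(p,) for p in phones])
conn.executemany("INSERT INTO person_email VALUES ('Ada', ?)", [(e,) for e in emails])


def count(table):
    """Row count of one of the fixed tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


original = set(conn.execute("SELECT person, phone, email FROM contact"))
rejoined = set(conn.execute(
    "SELECT p.person, p.phone, e.email "
    "FROM person_phone AS p JOIN person_email AS e ON p.person = e.person"))
print(count("contact"), count("person_phone") + count("person_email"))  # 9 6
print(original == rejoined)  # True
```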
Fifth normal form: The criteria for fifth normal form (5NF, also known as PJ/NF) are:
The table must be in 4NF.
There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.
Domain/Key Normal Form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.
Sixth Normal Form: According to the definition by Christopher J. Date and others, who extended database theory to take account of temporal and other interval data, a table is in sixth normal form (6NF) if and only if it satisfies no non-trivial (in the formal sense) join dependencies at all, meaning that the fifth normal form is also satisfied. When referring to "join" in this context it should be noted that Date et al. additionally use generalized definitions of relational operators that also take account of interval data (e.g. from-date to-date) by conceptually breaking them down ("unpacking" them) into atomic units (e.g. individual days), with defined rules for joining interval data, for instance.
3.4 Denormalization
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. It has never been proven that this denormalization itself provides any increase in performance, or whether the concurrent removal of data constraints is what increases the performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers, as in computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.
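As a rough illustration of the star-versus-snowflake contrast described above, the sketch below (assuming SQLite via Python's sqlite3; all table names and figures are hypothetical) copies a category name onto each product row during a load step, so the report query joins the fact table to a single denormalized dimension. It illustrates the shape of the schema only and says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (snowflake-style) dimension: product -> category.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       category_id INTEGER REFERENCES category(category_id));
CREATE TABLE sales    (product_id INTEGER, amount REAL);

INSERT INTO category VALUES (1, 'Beverages'), (2, 'Bakery');
INSERT INTO product  VALUES (10, 'Tea', 1), (11, 'Coffee', 1), (20, 'Bread', 2);
INSERT INTO sales    VALUES (10, 2.5), (11, 4.0), (20, 1.5), (11, 4.0);

-- Denormalized (star-style) dimension built during the ETL step:
-- the category name is copied onto every product row.
CREATE TABLE dim_product AS
    SELECT p.product_id, p.product_name, c.category_name
    FROM product AS p JOIN category AS c ON p.category_id = c.category_id;
""")

# The report needs one join instead of two.
report = conn.execute("""
    SELECT d.category_name, SUM(s.amount)
    FROM sales AS s JOIN dim_product AS d ON s.product_id = d.product_id
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
print(report)  # [('Bakery', 1.5), ('Beverages', 10.5)]
```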
3.5 Non-first normal form (NF² or N1NF)
In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension is a (non-optimal) way of implementing hierarchies in relations. Some academics have dubbed this practitioner-developed method "First Ab-normal Form": Codd defined a relational database as using relations, so any table not in 1NF could not be considered to be relational.
Consider the following table:
Non-First Normal Form
Person | Favorite Colors
Bob | blue, red
Jane | green, yellow, red
Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table.
To transform this NF² table into 1NF, an "unnest" operator is required, which extends the relational algebra of the higher normal forms. The reverse operator is called "nest"; it is not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse of "nest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).
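The unnest and nest operators can be sketched in Python on the favorite-colors table, representing each color set as a frozenset. This is an illustrative model of the operators, not a DBMS feature: unnest(nest(r)) returns r for any flat table, whereas nest(unnest(s)) can lose information when the same person appears on more than one nested row, which is the kind of situation the PNF condition is meant to exclude.

```python
def unnest(nf2_rows):
    """NF2 -> 1NF: one (person, color) row per element of each color set."""
    return {(person, color) for person, colors in nf2_rows for color in colors}


def nest(flat_rows):
    """1NF -> NF2: group the colors of each person into one set."""
    grouped = {}
    for person, color in flat_rows:
        grouped.setdefault(person, set()).add(color)
    return {(person, frozenset(colors)) for person, colors in grouped.items()}


# The Person / Favorite Colors table from the text (frozensets keep rows
# hashable, so the relation itself can be a Python set).
favorites = {
    ("Bob", frozenset({"blue", "red"})),
    ("Jane", frozenset({"green", "yellow", "red"})),
}
flat = unnest(favorites)
print(sorted(flat))

# unnest undoes nest for any flat table ...
print(unnest(nest(flat)) == flat)  # True
# ... but nest does not always undo unnest: a hypothetical NF2 table that
# lists Bob on two rows is merged into one row and cannot be recovered.
split = {("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))}
print(nest(unnest(split)) == split)  # False
```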
4.0 CONCLUSION
In the design and development of database management systems, organizations may use one kind of DBMS for daily transactions, and then move the detail onto another computer that uses another DBMS better suited for inquiries and analysis. Overall systems design decisions are performed by database administrators. The three most common organizations are the hierarchical, network and relational models. A DBMS may provide one, two or all three models in designing database management systems.
5.0 SUMMARY
Database design is the process of deciding how to organize data into record types and how the record types will relate to each other.
Database development may start with a top-down data planning process. Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise.
Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.
Edgar F. Codd first proposed the process of normalization and what came to be known as the first normal form.
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies.
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter.
In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains".

6.0 TUTOR-MARKED ASSIGNMENT
1. Mention the 5 phases in the development of a database.
2. Identify the criteria for the second normal form (2NF).
7.0 REFERENCES/FURTHER READINGS
Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.
Date, C.J. "What First Normal Form Really Means". In Date on Database: Writings 2000-2006 (Springer-Verlag, 2006), p. 128.
Codd, E.F. "Is Your DBMS Really Relational?". Computerworld, October 14, 1985.
Coles, M. (2007). "Sic Semper Null". SQL Server Central. Redgate Software.
Kent, William. "A Simple Guide to Five Normal Forms in Relational Database Theory". Communications of the ACM 26 (2), Feb. 1983, pp. 120-125.
Codd, E.F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
Codd, E.F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23rd, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
Fagin, Ronald (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases". ACM Transactions on Database Systems 2 (1): 267. doi:10.1145/320557.320571.
Date, Chris J.; Darwen, Hugh; Lorentzos, Nikos A. (January 2003). "Chapter 10 Database Design, Section 10.4: Sixth Normal Form". In Temporal Data and the Relational Model: A Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database Management. Oxford: Elsevier LTD, p. 176. ISBN 1558608559.
O'Brien, James A. (2003). Introduction to Information Systems (11th Edition). McGraw-Hill.
Zimányi, E. (June 2006). "Temporal Aggregates and Temporal Universal Quantification in Standard SQL". ACM SIGMOD Record, volume 35, number 2. ACM.
UNIT 2 STRUCTURED QUERY LANGUAGE (SQL)
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Standardization
3.3 Scope and Extensions
3.4 Language Elements
3.5 Criticisms of SQL
3.6 Alternatives to SQL
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management.
SQL is a standard interactive and programming language for querying and modifying data and managing databases. Although SQL is both an ANSI and an ISO standard, many database products support SQL with proprietary extensions to the standard language. The core of SQL is formed by a command language that allows the retrieval, insertion, updating, and deletion of data, and performing management and administrative functions. SQL also includes a Call Level Interface (SQL/CLI) for accessing and managing data and databases remotely.
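The core command language can be illustrated with one statement of each kind. This sketch assumes SQLite through Python's sqlite3 module; the course table and the second course code are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema creation
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL)")
# Insertion
conn.execute("INSERT INTO course VALUES ('MBA 758', 'Database Management System')")
conn.execute("INSERT INTO course VALUES ('XYZ 100', 'Placeholder Course')")
# Updating
conn.execute("UPDATE course SET title = 'Renamed Course' WHERE code = 'XYZ 100'")
# Deletion
conn.execute("DELETE FROM course WHERE code = 'XYZ 100'")
# Retrieval
rows = conn.execute("SELECT code, title FROM course ORDER BY code").fetchall()
print(rows)  # [('MBA 758', 'Database Management System')]
```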
The first version of SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL, was designed to manipulate and retrieve data stored in IBM's original relational database product, System R. The SQL language was later formally standardized by the American National Standards Institute (ANSI) in 1986. Subsequent versions of the SQL standard have been released as International Organization for Standardization (ISO) standards.
Originally designed as a declarative query and data manipulation language, variations of SQL have been created by SQL database management system (DBMS) vendors that add procedural constructs, control-of-flow statements, user-defined data types, and various other language extensions. With the release of the SQL:1999 standard, many such extensions were formally adopted as part of the SQL language via the SQL Persistent Stored Modules (SQL/PSM) portion of the standard.
Common criticisms of SQL include a perceived lack of cross-platform portability between vendors, inappropriate handling of missing data (nulls), and unnecessarily complex and occasionally ambiguous language grammar and semantics.
SQL
Paradigm | Multi-paradigm
Appeared in | 1974
Designed by | Donald D. Chamberlin and Raymond F. Boyce
Developer | IBM
Latest release | SQL:2006 / 2006
Typing discipline | static, strong
Major implementations | Many
Dialects | SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006
Influenced by | Datalog
Influenced | CQL, LINQ, Windows PowerShell
OS | Cross-platform
2.0 OBJECTIVES
At the end of this unit, you should be able to:
define Structured Query Language (SQL)
trace the history and development process of SQL
know the scope and extensions of SQL
identify the vital indices of SQL
know what the language elements are
know some of the criticisms of SQL
answer the question of alternatives to SQL.
3.0 MAIN CONTENT
3.1 History
During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, A Relational Model of Data for Large Shared Data Banks. Donald D. Chamberlin and Raymond F. Boyce of IBM subsequently created the Structured English Query Language (SEQUEL) to manipulate and manage data stored in System R. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.
The first non-commercial non-SQL RDBMS, Ingres, was developed in 1974 at U.C. Berkeley. Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL.
In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce and developed their own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, CIA, and other government agencies. In the summer of 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat IBM's release of the System/38 RDBMS to market by a few weeks.
After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM began developing commercial products based on their System R prototype, including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981, and 1983, respectively.
3.2 Standardization
SQL was adopted as a standard by ANSI in 1986 and ISO in 1987. In the original SQL standard, ANSI declared that the official pronunciation for SQL is "es queue el". However, many English-speaking database professionals still use the nonstandard pronunciation /ˈsiːkwəl/ (like the word "sequel"). SEQUEL was an earlier IBM database language, a predecessor to the SQL language.
Until 1996, the National Institute of Standards and Technology (NIST) data management standards program was tasked with certifying SQL DBMS compliance with the SQL standard. In 1996, however, the NIST data management standards program was dissolved, and vendors are now relied upon to self-certify their products for compliance.
The SQL standard has gone through a number of revisions, as shown below:
Year | Name | Alias | Comments
1986 | SQL-86 | SQL-87 | First published by ANSI. Ratified by ISO in 1987.
1989 | SQL-89 | FIPS 127-1 | Minor revision, adopted as FIPS 127-1.
1992 | SQL-92 | SQL2, FIPS 127-2 | Major revision (ISO 9075); Entry Level SQL-92 adopted as FIPS 127-2.
1999 | SQL:1999 | SQL3 | Added regular expression matching, recursive queries, triggers, support for procedural and control-of-flow statements, non-scalar types, and some object-oriented features.
2003 | SQL:2003 | | Introduced XML-related features, window functions, standardized sequences, and columns with auto-generated values (including identity columns).
2006 | SQL:2006 | | ISO/IEC 9075-14:2006 defines ways in which SQL can be used in conjunction with XML. It defines ways of importing and storing XML data in an SQL database, manipulating it within the database and publishing both XML and conventional SQL-data in XML form. In addition, it provides facilities that permit applications to integrate into their SQL code the use of XQuery, the XML Query Language published by the World Wide Web Consortium (W3C), to concurrently access ordinary SQL-data and XML documents.
The SQL standard is not freely available. SQL:2003 and SQL:2006 may be purchased from ISO or ANSI. A late draft of SQL:2003 is freely available as a zip archive, however, from Whitemarsh Information Systems Corporation. The zip archive contains a number of PDF files that define the parts of the SQL:2003 specification.
3.3 Scope and Extensions
Procedural Extensions
SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative query language, not an imperative language such as C or BASIC. However, there are extensions to Standard SQL which add procedural programming language functionality, such as control-of-flow constructs. These are:
Source | Common Name | Full Name
ANSI/ISO Standard | SQL/PSM | SQL/Persistent Stored Modules
IBM | SQL PL | SQL Procedural Language (implements SQL/PSM)
Microsoft/Sybase | T-SQL | Transact-SQL
MySQL | SQL/PSM | SQL/Persistent Stored Module (as in ISO SQL:2003)
Oracle | PL/SQL | Procedural Language/SQL (based on Ada)
PostgreSQL | PL/pgSQL | Procedural Language/PostgreSQL Structured Query Language (based on Oracle PL/SQL)
PostgreSQL | PL/PSM | Procedural Language/Persistent Stored Modules (implements SQL/PSM)
In addition to the standard SQL/PSM extensions and proprietary SQL extensions, procedural and object-oriented programmability is available on many SQL platforms via DBMS integration with other languages. The SQL standard defines SQL/JRT extensions (SQL Routines and Types for the Java Programming Language) to support Java code in SQL databases. SQL Server 2005 uses the SQLCLR (SQL Server Common Language Runtime) to host managed .NET assemblies in the database, while prior versions of SQL Server were restricted to using unmanaged extended stored procedures which were primarily written in C. Other database platforms, like MySQL and Postgres, allow functions to be written in a wide variety of languages including Perl, Python, Tcl, and C.
Additional Extensions
SQL:2003 also defines several additional extensions to the standard to increase SQL functionality overall. These extensions include:
The SQL/CLI, or Call-Level Interface, extension is defined in ISO/IEC 9075-3:2003. This extension defines common interfacing components (structures and procedures) that can be used to execute SQL statements from applications written in other programming languages. The SQL/CLI extension is defined in such a way that SQL statements and SQL/CLI procedure calls are treated as separate from the calling application's source code.
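Python's standard sqlite3 module follows the Python DB-API (PEP 249), a call-level interface in the same spirit as SQL/CLI: the SQL text is passed to the driver as data and values are bound separately. This is an analogy only, not an implementation of ISO/IEC 9075-3; the staff table and the names_in_dept helper are hypothetical.

```python
import sqlite3

# The statement is plain text handed to the driver at run time; the host
# language never parses it, and values travel separately as parameters.
QUERY = "SELECT name FROM staff WHERE dept = ? ORDER BY name"


def names_in_dept(conn, dept):
    """Hypothetical helper: execute with a bound parameter, then fetch."""
    cur = conn.cursor()          # loosely comparable to a statement handle
    cur.execute(QUERY, (dept,))  # bind the parameter and execute
    return [name for (name,) in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ada", "IT"), ("Bola", "HR"), ("Chidi", "IT")],
)
print(names_in_dept(conn, "IT"))             # ['Ada', 'Chidi']
print(names_in_dept(conn, "IT' OR '1'='1"))  # []
```

Because the department is bound as a parameter, a value such as IT' OR '1'='1 is compared as a literal string rather than spliced into the statement text.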
The SQL/MED, or Management of External Data, extension is defined by ISO/IEC 9075-9:2003. SQL/MED provides extensions to SQL that define foreign-data wrappers and datalink types to allow SQL to manage external data. External data is data that is accessible to, but not managed by, an SQL-based DBMS.
The SQL/OLB, or Object Language Bindings, extension is defined by ISO/IEC 9075-10:2003. SQL/OLB defines the syntax and semantics of SQLJ, which is SQL embedded in Java. The standard also describes mechanisms to ensure binary portability of SQLJ applications, and specifies various Java packages and their contained classes.
The SQL/Schemata, or Information and Definition Schemas, extension is defined by ISO/IEC 9075-11:2003. SQL/Schemata defines the Information Schema and Definition Schema, providing a common set of tools to make SQL databases and objects self-describing. These tools include the SQL object identifier, structure and integrity constraints, security and authorization specifications, features and packages of ISO/IEC 9075, support of features provided by SQL-based DBMS implementations, SQL-based DBMS implementation information and sizing items, and the values supported by the DBMS implementations.
The SQL/JRT, or SQL Routines and Types for the Java Programming Language, extension is defined by ISO/IEC 9075-13:2003. SQL/JRT specifies the ability to invoke static Java methods as routines from within SQL applications. It also calls for the ability to use Java classes as SQL structured user-defined types.
The SQL/XML, or XML-Related Specifications, extension is defined by ISO/IEC 9075-14:2003. SQL/XML specifies SQL-based extensions for using XML in conjunction with SQL. The XML data type is introduced, as well as several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in an SQL database.
The SQL/PSM, or Persistent Stored Modules, extension is defined by ISO/IEC 9075-4:2003. SQL/PSM standardizes procedural extensions for SQL, including flow of control, condition handling, statement condition signals and resignals, cursors and local variables, and assignment of expressions to variables and parameters. In addition, SQL/PSM formalizes declaration and maintenance of persistent database language routines (e.g. "stored procedures").
3.4 Language Elements
A single SQL statement is composed of several of the language elements described below.
The SQL language is sub-divided into several language elements, including:
Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
Queries, which retrieve data based on specific criteria.
Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used to limit the effects of statements and queries, or to change program flow.
Clauses, which are in some cases optional, constituent components of statements and queries.
Whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
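Three-valued logic is easiest to see with a null. In the sketch below (assuming SQLite via Python's sqlite3, which reports UNKNOWN as None; the table t is hypothetical), a comparison with NULL yields UNKNOWN, and because a WHERE clause keeps only rows whose predicate is TRUE, the row holding a null is excluded by both a predicate and its negation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical one-column table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# TRUE, FALSE and UNKNOWN: SQLite returns 1, 0 and NULL (None in Python).
truth = conn.execute("SELECT 1 = 1, 1 = 2, NULL = NULL").fetchone()
print(truth)  # (1, 0, None)


def count(where):
    """Count rows of t satisfying a fixed predicate written in this script."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]


# WHERE keeps only rows whose predicate is TRUE, so the NULL row is dropped
# by both "x = 1" and its negation; IS NULL is needed to find it.
eq, ne, nul = count("x = 1"), count("NOT (x = 1)"), count("x IS NULL")
print(eq, ne, nul)  # 1 1 1
```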
Queries
The most common operation in SQL databases is the query, which is performed with the declarative SELECT keyword. SELECT retrieves data from a specified table, or multiple related tables, in a database. While often grouped with Data Manipulation Language (DML) statements, the standard SELECT query is considered separate from SQL DML, as it has no persistent effects on the data stored in a database. Note that there are some platform-specific variations of SELECT that can persist their effects in a database, such as the SELECT INTO syntax that exists in some databases.
2FL *ueries allo$ t/e user to speciy a description o t/e desired result
set' &ut it is let to t/e devices o t/e data&ase management system
B"%M2C to plan' optimiEe' and perorm t/e p/ysical operations
:,
M%( ;1< "(@(%(2E M(N(GEMEN@ 2A2@EM
necessary to produce t/at result set in as eicient a manner as possi&le.
An SQL query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used as a "wildcard" indicator to specify that all available columns of a table (or multiple tables) are to be returned. SELECT is the most complex statement in SQL, with several optional keywords and clauses, including:
• The FROM clause, which indicates the source table or tables from which the data is to be retrieved. The FROM clause can include optional JOIN clauses to join related tables to one another based on user-specified criteria.
• The WHERE clause, which includes a comparison predicate used to restrict the number of rows returned by the query. The WHERE clause is applied before the GROUP BY clause. It eliminates all rows from the result set for which the comparison predicate does not evaluate to True.
• The GROUP BY clause, which is used to combine, or group, rows with related values into elements of a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregate functions or to eliminate duplicate rows from a result set.
• The HAVING clause, which includes a comparison predicate used to eliminate rows after the GROUP BY clause is applied to the result set. Because it acts on the results of the GROUP BY clause, aggregate functions can be used in the HAVING clause predicate.
• The ORDER BY clause, which identifies the columns used to sort the resulting data and the order in which they should be sorted (options are ascending or descending). The order of rows returned by an SQL query is never guaranteed unless an ORDER BY clause is specified.
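The clauses above can be seen working together in one query. This is a minimal sketch that uses Python's sqlite3 module with two hypothetical tables, dept and emp; the table names, columns and figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical related tables: departments and employees.
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT,
                   dept_id INTEGER REFERENCES dept(dept_id), salary INTEGER);
INSERT INTO dept VALUES (1, 'Finance'), (2, 'Marketing'), (3, 'IT');
INSERT INTO emp VALUES
  (1, 'Ada', 1, 500), (2, 'Bola', 1, 700), (3, 'Chidi', 2, 400),
  (4, 'Dayo', 3, 650), (5, 'Efe', 3, 300), (6, 'Funmi', 3, 600);
""")

query = """
SELECT d.dept_name, COUNT(*) AS staff, SUM(e.salary) AS payroll
FROM emp AS e
JOIN dept AS d ON d.dept_id = e.dept_id   -- FROM with a JOIN condition
WHERE e.salary >= 400                     -- filters rows before grouping
GROUP BY d.dept_name                      -- one output row per department
HAVING COUNT(*) >= 2                      -- filters groups after grouping
ORDER BY payroll DESC                     -- only ORDER BY fixes row order
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

WHERE removes Efe before grouping, so IT counts two staff rather than three. HAVING then removes Marketing, which has only one qualifying employee.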
Data Definition
The second group of keywords is the Data Definition Language (DDL). DDL allows the user to define new tables and associated elements. Most commercial SQL databases have proprietary extensions in their DDL, which allow control over nonstandard features of the database system.
The most basic items of DDL are the CREATE, ALTER, RENAME, TRUNCATE and DROP statements:
• CREATE causes an object (a table, for example) to be created within the database.
• DROP causes an existing object within the database to be deleted, usually irretrievably.
• TRUNCATE deletes all data from a table. It was long a non-standard but common SQL statement, and it was added to the standard in SQL:2008.
• ALTER permits the user to modify an existing object in various ways, for example, adding a column to an existing table.
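The DDL statements above can be tried out in a minimal sketch using Python's sqlite3 module; the table names are hypothetical. DDL dialects differ between products, and two differences matter here:
• SQLite performs renaming through ALTER TABLE rather than through a separate RENAME statement.
• SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: a new table object is created within the database.
conn.execute("CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO supplier (name) VALUES ('Acme'), ('Globex')")

# ALTER: modify an existing object, here by adding a column ...
conn.execute("ALTER TABLE supplier ADD COLUMN phone TEXT")
# ... and, in SQLite's dialect, renaming is also done through ALTER TABLE.
conn.execute("ALTER TABLE supplier RENAME TO vendor")
columns = [info[1] for info in conn.execute("PRAGMA table_info(vendor)")]

# TRUNCATE: SQLite has no TRUNCATE statement; an unqualified DELETE
# empties the table but keeps its definition.
conn.execute("DELETE FROM vendor")
row_count = conn.execute("SELECT COUNT(*) FROM vendor").fetchone()[0]

# DROP: the table object itself is removed.
conn.execute("DROP TABLE vendor")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

print(columns, row_count, tables)
```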
Data Control
The third group of SQL keywords is the Data Control Language (DCL). DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database. Its two main keywords are:
• GRANT authorizes one or more users to perform an operation or a set of operations on an object.
• REVOKE removes or restricts the capability of a user to perform an operation or a set of operations.
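SQLite, used for the other sketches in this unit, has no GRANT or REVOKE statements, so the following Python sketch only models the idea. A privilege table maps (user, object) pairs to the operations allowed. GRANT adds entries and REVOKE removes them.

The class and method names are hypothetical, not a real DBMS API. The comments show the standard SQL form each call corresponds to.

```python
from collections import defaultdict


class PrivilegeTable:
    """Hypothetical model of DCL: which user may do what on which object."""

    def __init__(self):
        self._grants = defaultdict(set)  # (user, obj) -> {"SELECT", ...}

    def grant(self, privileges, obj, users):
        # Models: GRANT SELECT, UPDATE ON payroll TO alice, bob;
        for user in users:
            self._grants[(user, obj)].update(privileges)

    def revoke(self, privileges, obj, users):
        # Models: REVOKE UPDATE ON payroll FROM bob;
        for user in users:
            self._grants[(user, obj)].difference_update(privileges)

    def allowed(self, user, privilege, obj):
        return privilege in self._grants.get((user, obj), set())


acl = PrivilegeTable()
acl.grant({"SELECT", "UPDATE"}, "payroll", ["alice", "bob"])
acl.revoke({"UPDATE"}, "payroll", ["bob"])
print(acl.allowed("alice", "UPDATE", "payroll"),  # still granted
      acl.allowed("bob", "UPDATE", "payroll"),    # revoked
      acl.allowed("bob", "SELECT", "payroll"))    # kept
```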
3.5 Criticisms of SQL
Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but violated, the relational model for database management and its tuple calculus realization. Recent extensions to SQL achieved relational completeness, but have worsened the violations, as documented in The Third Manifesto.
In addition, there are also some criticisms about the practical use of SQL:
• Implementations are inconsistent and, usually, incompatible between vendors. In particular, date and time syntax, string concatenation, nulls, and comparison case sensitivity often vary from vendor to vendor.
• The language makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted. SQL-92 introduced the CROSS JOIN keyword that allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax.
• It is also possible to misconstruct a WHERE clause on an UPDATE or DELETE, thereby affecting more rows in a table than desired.
• The grammar of SQL is perhaps unnecessarily complex, borrowing a COBOL-like keyword approach, when a function-influenced syntax could result in more re-use of fewer grammar and syntax rules. This is perhaps due to IBM's early goal of making the language more English-like so that it is more approachable to those without a mathematical or programming background. (Predecessors to SQL were more mathematical.)
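The run-away Cartesian join and the misconstructed UPDATE can both be reproduced in a few lines. This is a minimal sketch using Python's sqlite3 module; the customer and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, name TEXT);
CREATE TABLE orders   (order_id INTEGER, cust_id INTEGER);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bola'), (3, 'Chidi');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Intended join: each order paired with its own customer (4 rows).
joined = scalar("SELECT COUNT(*) FROM customer c, orders o "
                "WHERE c.cust_id = o.cust_id")
# WHERE clause forgotten: the comma join silently yields 3 x 4 = 12 rows.
comma_join = scalar("SELECT COUNT(*) FROM customer c, orders o")
# CROSS JOIN returns the same product but makes the intent explicit.
cross_join = scalar("SELECT COUNT(*) FROM customer CROSS JOIN orders")

# An UPDATE meant for one order, issued without its WHERE clause,
# changes every row; rollback() undoes the open transaction.
updated_rows = conn.execute("UPDATE orders SET cust_id = 2").rowcount
conn.rollback()
still_cust_1 = scalar("SELECT COUNT(*) FROM orders WHERE cust_id = 1")

print(joined, comma_join, cross_join, updated_rows, still_cust_1)
```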
Reasons for lack of portability
Popular implementations of SQL commonly omit support for basic features of Standard SQL, such as the DATE or TIME data types, preferring variations of their own. As a result, SQL code can rarely be ported between database systems without modifications.
There are several reasons for this lack of portability between database systems:
• The complexity and size of the SQL standard means that most databases do not implement the entire standard.
• The standard does not specify database behavior in several important areas (e.g. indexes, file storage), leaving it up to implementations of the database to decide how to behave.
• The SQL standard precisely specifies the syntax that a conforming database system must implement. However, the standard's specification of the semantics of language constructs is less well-defined, leading to areas of ambiguity.
• Many database vendors have large existing customer bases; where the SQL standard conflicts with the prior behavior of the vendor's database, the vendor may be unwilling to break backward compatibility.
3.6 Alternatives to SQL
A distinction should be made between alternatives to relational query languages and alternatives to SQL. The list below contains proposed alternatives to SQL that are still (nominally) relational. For alternatives to the relational model itself, see navigational databases.
• IBM Business System 12 (IBM BS12)
• Tutorial D
• Hibernate Query Language (HQL): a Java-based tool that uses modified SQL
• QUEL, introduced in 1974 by the U.C. Berkeley Ingres project
• Object Query Language
• Datalog
• .QL: object-oriented Datalog
• LINQ
• QLC: Query Interface to Mnesia, ETS, Dets, etc. (Erlang programming language)
• 4D Query Language (4D QL)
• QBE (Query By Example), created by Moshé Zloof, IBM 1977
• Aldat Relational Algebra and Domain algebra
4.0 CONCLUSION
The Structured Query Language (SQL) has become the official and dominant language for working with database management systems. It differs from conventional computer languages because it is not necessarily procedural. An SQL statement is not really a command to the computer; rather, it is a description of some of the data contained in a database. SQL is non-procedural because it does not give step-by-step commands to the computer or database. It describes data and sometimes instructs the database to do something with the data. Despite its success, SQL has attracted its own share of criticism.
5.0 SUMMARY
• SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management.
• During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, A Relational Model of Data for Large Shared Data Banks.
• SQL was adopted as a standard by ANSI in 1986 and ISO in 1987. In the original SQL standard, ANSI declared that the official pronunciation for SQL is "es queue el".
• SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative query language, not an imperative language such as C or BASIC.
• A single SQL statement is composed of several language elements: statements, queries, expressions, predicates and clauses.
• Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but violated, the relational model for database management and its tuple calculus realization.
• A distinction should be made between alternatives to relational query languages and alternatives to SQL.
6.0 TUTOR-MARKED ASSIGNMENT
List and discuss the sub-divisions of the Structured Query Language (SQL).
7.0 REFERENCES/FURTHER READINGS
Chapple, Mike. "SQL Fundamentals" (HTML). About.com: Databases. About.com.
"Structured Query Language (SQL)" (HTML). International Business Machines (October 27, 2006).
Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (No. 6): pp. 377–387. Association for Computing Machinery. doi:10.1145/362384.362685.
Chamberlin, Donald D.; Boyce, Raymond F. (1974). "SEQUEL: A Structured English Query Language". Proceedings of the 1974 ACM SIGFIDET Workshop on Data Description, Access and Control: pp. 249–264. Association for Computing Machinery.
Oppel, Andy (March 1, 2004). Databases Demystified. San Francisco, CA: McGraw-Hill Osborne Media, pp. 90–91. ISBN 0-07-225364-9.
"History of IBM, 1978" (HTML). IBM Archives. IBM.
Chapple, Mike (n.d.). "SQL Fundamentals" (HTML). About.com. About.com, A New York Times Company. Retrieved on 2007-08-30.
Melton, Jim; Simon, Alan R. (1993). Understanding the New SQL: A Complete Guide. Morgan Kaufmann, 536. ISBN 1558602453. "Chapter 1.2 What is SQL? SQL (correctly pronounced "ess cue ell," instead of the somewhat common "sequel"), is a..."
"Understand SQL". www.faqs.org/docs/.
Doll, Shelley (June 19, 2002). "Is SQL a Standard Anymore?" (HTML). TechRepublic's Builder.com. TechRepublic. Retrieved on 2007-06-09.
ISO/IEC 9075-11:2003: Information and Definition Schemas (SQL/Schemata), 2003, p. 1.
ANSI/ISO/IEC International Standard (IS). Database Language SQL, Part 2: Foundation (SQL/Foundation). 1999.
"INTO Clause (Transact-SQL)" (HTML). SQL Server 2005 Books Online. Microsoft (2007). Retrieved on 2007-06-17.
Negri, M.; Pelagatti, G.; Sbattella, L. (1989). Semantics and problems of universal quantification in SQL.
Fratarcangeli, Claudio (1991). Technique for universal quantification in SQL.
Kawash, Jalal (2004). "Complex quantification in Structured Query Language (SQL): a Tutorial Using Relational Calculus". Journal of Computers in Mathematics and Science Teaching, Volume 23, Issue 2. AACE, Norfolk, VA. ISSN 0731-9258.
UNIT 3 DATABASE AND INFORMATION SYSTEMS SECURITY
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Basic Principles
3.2 Database Security
3.3 Relational DBMS Security
3.4 Proposed OODBMS Security Models
3.5 Security Classification for Information
3.6 Cryptography
3.7 Disaster Recovery Planning
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. Thus data security helps to ensure privacy. It also helps in protecting personal data.
Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. The terms information security, computer security and information assurance are frequently used interchangeably. These fields are interrelated and share the common goals of protecting the confidentiality, integrity and availability of information; however, there are some subtle differences between them. These differences lie primarily in the approach to the subject, the methodologies used, and the areas of concentration. Information security is concerned with the confidentiality, integrity and availability of data regardless of the form the data may take: electronic, print, or other forms.
Governments, military, financial institutions, hospitals, and private businesses amass a great deal of confidential information about their employees, customers, products, research, and financial status. Most of this information is now collected, processed and stored on electronic computers and transmitted across networks to other computers. Should confidential information about a business's customers or finances or new product line fall into the hands of a competitor, such a breach of security could lead to lost business, lawsuits or even bankruptcy of the business. Protecting confidential information is a business requirement, and in many cases also an ethical and legal requirement. For the individual, information security has a significant effect on privacy, which is viewed very differently in different cultures.
The field of information security has grown and evolved significantly in recent years. As a career choice there are many ways of gaining entry into the field. It offers many areas for specialization including Information Systems Auditing, Business Continuity Planning and Digital Forensics Science, to name a few.
2.0 OBJECTIVES
At the end of the unit, you should be able to:
• understand the concepts of the CIA Triad in respect of information systems security
• know the components of the Donn Parker model for the classic Triad
• identify the different types of information access control and how they differ from each other
• differentiate Discretionary and Mandatory Access Control Policies
• know the Proposed OODBMS Security Models
• differentiate between the OODBMS models
• define appropriate procedures and protection requirements for information security
• define cryptography and know its applications in data security.
3.0 MAIN CONTENT
3.1 Basic Principles
3.1.1 Key Concepts
For over twenty years information security has held that confidentiality, integrity and availability (known as the CIA Triad) are the core principles of information system security.
Confidentiality
Confidentiality is the property of preventing disclosure of information to unauthorized individuals or systems. For example, a credit card transaction on the Internet requires the credit card number to be transmitted from the buyer to the merchant and from the merchant to a transaction processing network. The system attempts to enforce confidentiality in three ways:
• by encrypting the card number during transmission;
• by limiting the places where it might appear (in databases, log files, backups, printed receipts, and so on);
• by restricting access to the places where it is stored.
If an unauthorized party obtains the card number in any way, a breach of confidentiality has occurred.
Breaches of confidentiality take many forms. Permitting someone to look over your shoulder at your computer screen while you have confidential data displayed on it could be a breach of confidentiality. If a laptop computer containing sensitive information about a company's employees is stolen or sold, it could result in a breach of confidentiality. Giving out confidential information over the telephone is a breach of confidentiality if the caller is not authorized to have the information.
Confidentiality is necessary (but not sufficient) for maintaining the privacy of the people whose personal information a system holds.
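One way to limit the places where a card number appears, such as a printed receipt, is to show only a masked form. The function below is a minimal, hypothetical helper that keeps only the last four digits; it is not taken from any payment standard.

Real systems also encrypt or tokenize the full number wherever it is stored. This sketch does not attempt that.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Return the card number with all but the last `visible` digits hidden."""
    digits = [ch for ch in card_number if ch.isdigit()]
    if len(digits) <= visible:
        raise ValueError("card number too short to mask")
    hidden = len(digits) - visible
    return "*" * hidden + "".join(digits[hidden:])


print(mask_card_number("4111 1111 1111 1234"))
```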
Integrity
In information security, integrity means that data cannot be modified without authorization. (This is not the same thing as referential integrity in databases.) Integrity is violated in cases such as these:
• an employee (accidentally or with malicious intent) deletes important data files;
• a computer virus infects a computer;
• an employee is able to modify his own salary in a payroll database;
• an unauthorized user vandalizes a web site;
• someone is able to cast a very large number of votes in an online poll.
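A common technique for detecting unauthorized modification is to store a keyed hash (HMAC) alongside each record. The minimal sketch below uses Python's standard hmac and hashlib modules. The key and record format are hypothetical.

The technique detects tampering rather than preventing it. Anyone who changes the record without the key cannot produce a matching tag, so the change shows up on verification. Preventing the change still depends on access controls.

```python
import hashlib
import hmac

# Hypothetical secret key held by the payroll application, not by users.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def seal(record: bytes) -> str:
    """Compute an HMAC-SHA256 tag to store alongside the record."""
    return hmac.new(SECRET_KEY, record, hashlib.sha256).hexdigest()


def is_intact(record: bytes, tag: str) -> bool:
    """Compare tags in constant time; False means the record was altered."""
    return hmac.compare_digest(seal(record), tag)


original = b"emp_id=42;salary=150000"
tag = seal(original)
tampered = b"emp_id=42;salary=950000"  # an employee edits his own salary
print(is_intact(original, tag), is_intact(tampered, tag))
```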
Availability
For any information system to serve its purpose, the information must be available when it is needed. This means that the computing systems used to store and process the information, the security controls used to protect it, and the communication channels used to access it must be functioning correctly. High availability systems aim to remain available at all times, preventing service disruptions due to power outages, hardware failures, and system upgrades. Ensuring availability also involves preventing denial-of-service attacks.
In 2002, Donn Parker proposed an alternative model for the classic CIA triad that he called the six atomic elements of information. The elements are confidentiality, possession, integrity, authenticity, availability, and utility. The merits of the Parkerian hexad are a subject of debate amongst security professionals.
3.1.2 Authenticity
In computing, e-Business and information security it is necessary to ensure that the data, transactions, communications or documents (electronic or physical) are genuine (i.e. they have not been forged or fabricated).
3.1.3 Non-Repudiation
In law, non-repudiation implies one's intention to fulfil one's obligations under a contract. It also implies that one party to a transaction cannot deny having received a transaction, nor can the other party deny having sent a transaction.
Electronic commerce uses technology such as digital signatures and encryption to establish authenticity and non-repudiation.
3.1.4 Risk Management
[Figure: "Security is everyone's responsibility", security awareness poster, U.S. Department of Commerce/Office of Security]
A comprehensive treatment of the topic of risk management is beyond the scope of this unit. We will, however, provide a useful definition of risk management, outline a commonly used process for risk management, and define some basic terminology.
The CISA Review Manual 2006 provides the following definition of risk management: "Risk management is the process of identifying vulnerabilities and threats to the information resources used by an organization in achieving business objectives, and deciding what countermeasures, if any, to take in reducing risk to an acceptable level, based on the value of the information resource to the organization."
Two things in this definition may need some clarification. First, risk management is an ongoing, iterative process that must be repeated indefinitely, because the business environment is constantly changing and new threats and vulnerabilities emerge every day. Second, the choice of countermeasures (controls) used to manage risks must strike a balance between productivity, cost, effectiveness of the countermeasure, and the value of the informational asset being protected.
Risk is the likelihood that something bad will happen that causes harm to an informational asset (or the loss of the asset). A vulnerability is a weakness that could be used to endanger or cause harm to an informational asset. A threat is anything (man-made or act of nature) that has the potential to cause harm.
The likelihood that a threat will use a vulnerability to cause harm creates a risk. When a threat does use a vulnerability to inflict harm, it has an impact. In the context of information security, the impact is a loss of availability, integrity, and confidentiality, and possibly other losses (lost income, loss of life, loss of real property). It should be pointed out that it is not possible to identify all risks, nor is it possible to eliminate all risk. The remaining risk is called residual risk.
A risk assessment is carried out by a team of people who have knowledge of specific areas of the business. Membership of the team may vary over time as different parts of the business are assessed. The assessment may use a subjective qualitative analysis based on informed opinion, or, where reliable dollar figures and historical information are available, a quantitative analysis.
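A quantitative analysis is often expressed with two formulas from general risk-analysis practice. Neither formula appears in the text or in the CISA definition above:
• single loss expectancy: SLE = asset value × exposure factor;
• annualized loss expectancy: ALE = SLE × annualized rate of occurrence.

The sketch below applies them to invented figures.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE: expected loss from one occurrence (exposure factor in 0..1)."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure factor must be between 0 and 1")
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle: float, annual_rate: float) -> float:
    """ALE: SLE multiplied by the expected number of occurrences per year."""
    return sle * annual_rate


# Hypothetical figures: a customer database valued at 2,000,000 that would
# lose 25% of its value per breach, with one breach expected every 4 years.
sle = single_loss_expectancy(2_000_000, 0.25)
ale = annualized_loss_expectancy(sle, 1 / 4)
print(sle, ale)
```

A countermeasure that costs more per year than the reduction in ALE it achieves would usually be hard to justify. This mirrors the balance between cost and asset value described above.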
3.1.5 Controls
When management chooses to mitigate a risk, it will do so by implementing one or more of three different types of controls.
Administrative
Administrative controls (also called procedural controls) consist of approved written policies, procedures, standards and guidelines. Administrative controls form the framework for running the business and managing people. They inform people on how the business is to be run and how day-to-day operations are to be conducted. Laws and regulations created by government bodies are also a type of administrative control because they inform the business. Some industry sectors have policies, procedures, standards and guidelines that must be followed; the Payment Card Industry (PCI) Data Security Standard required by Visa and MasterCard is such an example. Other examples of administrative controls include the corporate security policy, password policy, hiring policies, and disciplinary policies.
Administrative controls form the basis for the selection and implementation of logical and physical controls. Logical and physical controls are manifestations of administrative controls. Administrative controls are of paramount importance.
Logical
Logical controls (also called technical controls) use software and data to monitor and control access to information and computing systems. For example, passwords, network and host-based firewalls, network intrusion detection systems, access control lists, and data encryption are logical controls.
An important logical control that is frequently overlooked is the principle of least privilege. The principle of least privilege requires that an individual, program or system process is not granted any more access privileges than are necessary to perform the task. A blatant example of the failure to adhere to the principle of least privilege is logging into Windows as user Administrator to read email and surf the Web. Violations of this principle can also occur when an individual collects additional access privileges over time. This happens when employees' job duties change, or they are promoted to a new position, or they transfer to another department. The access privileges required by their new duties are frequently added onto their already existing access privileges, which may no longer be necessary or appropriate.
Physical
Physical controls monitor and control the environment of the work place and computing facilities. They also monitor and control access to and from such facilities. Examples include doors, locks, heating and air conditioning, smoke and fire alarms, fire suppression systems, cameras, barricades, fencing, security guards and cable locks. Separating the network and work place into functional areas is also a physical control.
An important physical control that is frequently overlooked is separation of duties. Separation of duties ensures that an individual cannot complete a critical task by himself. For example, an employee who submits a request for reimbursement should not also be able to authorize payment or print the check. Likewise, an applications programmer should not also be the server administrator or the database administrator; these roles and responsibilities must be separated from one another.
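The two frequently overlooked controls described above can each be sketched in a few lines of Python.
• Least privilege: a reporting job that only needs to read opens the SQLite database in read-only mode, using SQLite's real mode=ro URI parameter.
• Separation of duties: a hypothetical approve_reimbursement function refuses to let the submitter approve their own claim.

The file, table and function names are illustrative assumptions, not a prescribed design.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# --- Least privilege: a reporting job gets read-only access ---------------
path = os.path.join(tempfile.mkdtemp(), "reports.db")
admin = sqlite3.connect(path)
admin.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
admin.execute("INSERT INTO sales VALUES ('North', 100), ('South', 250)")
admin.commit()
admin.close()

# SQLite's URI parameter mode=ro opens the file without write permission.
reader = sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)
report_total = reader.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
try:
    reader.execute("DELETE FROM sales")
    write_blocked = False
except sqlite3.OperationalError:  # "attempt to write a readonly database"
    write_blocked = True
reader.close()


# --- Separation of duties: a submitter may not approve their own claim ----
def approve_reimbursement(claim: dict, approver: str) -> dict:
    """Return an approved copy of the claim, refusing self-approval."""
    if approver == claim["submitted_by"]:
        raise PermissionError("submitter cannot approve own reimbursement")
    return {**claim, "approved_by": approver}


claim = {"id": 7, "amount": 120.0, "submitted_by": "tunde"}
try:
    approve_reimbursement(claim, "tunde")
    self_approval_blocked = False
except PermissionError:
    self_approval_blocked = True
approved = approve_reimbursement(claim, "ngozi")

print(report_total, write_blocked, self_approval_blocked,
      approved["approved_by"])
```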
3.2 Database Security
Database security is primarily concerned with the secrecy of data. Secrecy means protecting a database from unauthorized access by users and software applications.
Secrecy, in the context of database security, includes a variety of threats incurred through unauthorized access. These threats range from the intentional theft or destruction of data to the acquisition of information through more subtle measures, such as inference. There are three generally accepted categories of secrecy-related problems in database systems:
1. The improper release of information from reading data that was intentionally or accidentally accessed by unauthorized users. Securing databases from unauthorized access is more difficult than controlling access to files managed by operating systems. This problem arises from the finer granularity that is used by databases when handling files, attributes, and values. This type of problem also includes the violations to secrecy that result from the problem of inference, which is the deduction of unauthorized information from the observation of authorized information. Inference is one of the most difficult factors to control in any attempt to secure data. Because the information in a database is semantically related, it is possible to determine the value of an attribute without accessing it directly. Inference problems are most serious in statistical databases, where users can trace back information on individual entities from the statistical aggregated data.
2. The improper modification of data. This threat includes violations of the security of data through mishandling and modifications by unauthorized users. These violations can result from errors, viruses, sabotage, or failures in the data that arise from access by unauthorized users.
3. Denial-of-service threats. Actions that could prevent users from using system resources or accessing data are among the most serious. This threat has been demonstrated to a significant degree recently with the SYN flooding attacks against network service providers.
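The inference problem in statistical databases can be shown with two aggregate queries, neither of which names an individual. The minimal sketch below uses Python's sqlite3 module. The staff table is hypothetical, and the rule that users may run only aggregate queries is assumed; nothing in the code enforces it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Ada", "Audit", 400), ("Bola", "Audit", 500),
    ("Chidi", "Audit", 600), ("Dayo", "Audit", 900)])


def aggregate(sql):
    """Stand-in for a statistical interface that returns one summary value."""
    return conn.execute(sql).fetchone()[0]


total_all = aggregate("SELECT SUM(salary) FROM staff WHERE dept = 'Audit'")
total_others = aggregate(
    "SELECT SUM(salary) FROM staff WHERE dept = 'Audit' AND name <> 'Dayo'")

# Neither answer identifies an individual, yet their difference does.
inferred_salary = total_all - total_others
print(inferred_salary)
```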
Discretionary vs. Mandatory Access Control Policies
Both traditional relational database management system (RDBMS) security models and OO database models make use of two general types of access control policies to protect the information in multilevel systems. The first of these policies is the discretionary policy. In the discretionary access control (DAC) policy, access is restricted based on the authorizations granted to the user.
The mandatory access control (MAC) policy secures information by assigning sensitivity levels, or labels, to data entities. MAC policies are generally more secure than DAC policies, and they are used in systems in which security is critical, such as military applications. However, the price usually paid for this tightened security is reduced performance of the database management system. Most MAC policies also incorporate DAC measures.
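The label comparison at the heart of MAC, combined with an ordinary DAC check, is shown in the Python sketch below. The reading rule it uses is a common one from the Bell-LaPadula model (its "simple security" or "no read up" property), which the text does not name: a user may read data only if their clearance is at least the data's classification. The level names follow the government labels listed in Section 3.5. The objects, users and grants are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Ordered sensitivity levels; a higher value means more sensitive."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


# Labels assigned to data entities (MAC) and clearances assigned to users.
labels = {"troop_movements": Level.SECRET, "canteen_menu": Level.UNCLASSIFIED}
clearances = {"officer": Level.SECRET, "clerk": Level.CONFIDENTIAL}
# Discretionary grants, checked in addition to the mandatory rule.
grants = {("officer", "troop_movements"), ("clerk", "canteen_menu"),
          ("clerk", "troop_movements")}


def can_read(user: str, obj: str) -> bool:
    mandatory_ok = clearances[user] >= labels[obj]  # "no read up"
    discretionary_ok = (user, obj) in grants         # DAC still applies
    return mandatory_ok and discretionary_ok


print(can_read("officer", "troop_movements"),  # cleared and granted
      can_read("clerk", "troop_movements"),    # granted, but not cleared
      can_read("clerk", "canteen_menu"))       # cleared and granted
```

The clerk holds a discretionary grant on troop_movements, but the mandatory label check still refuses access. This is the sense in which MAC tightens a DAC policy.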
3.3 Relational DBMS Security
The principal methods of security in traditional RDBMSs are the appropriate use and manipulation of views and of the structured query language (SQL) GRANT and REVOKE statements. These measures are reasonably effective because of their mathematical foundation in relational algebra and relational calculus.
3.3.1 View-Based Access Control
Views allow the database to be conceptually divided into pieces in ways that allow sensitive data to be hidden from unauthorized users. In the relational model, views provide a powerful mechanism for specifying data-dependent authorizations for data retrieval.
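The view below hides a sensitive column (salary) and also restricts which rows are visible, a data-dependent restriction. This is a minimal sketch using Python's sqlite3 module with hypothetical table and view names.

In a DBMS that supports DCL, users would then be granted access to the view only, for example GRANT SELECT ON sales_directory TO clerk, and not to the base table. SQLite has no GRANT, so the sketch shows only the view itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES (1, 'Ada', 'Sales', 500), (2, 'Bola', 'IT', 700);

-- The view exposes only non-sensitive columns (no salary) and only
-- one department's rows.
CREATE VIEW sales_directory AS
  SELECT emp_id, name, dept FROM employee WHERE dept = 'Sales';
""")

cursor = conn.execute("SELECT * FROM sales_directory")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```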
Although the individual user who creates a view is the owner and is entitled to drop the view, he or she may not be authorized to execute all privileges on it. The authorizations that the owner may exercise depend on the view semantics and on the authorizations that the owner is allowed to implement on the tables directly accessed by the view. For the owner to exercise a specific authorization on a view that he or she creates, the owner must possess the same authorization on all tables that the view uses. The privileges the owner possesses on the view are determined at the time of view definition. Each privilege the owner possesses on the tables is defined for the view. If, later on, the owner receives additional privileges on the tables used by the view, these additional privileges will not be passed on to the view. In order to use the new privileges within a view, the owner will need to create a new view.
The biggest problem with view-based mandatory access controls is that it is impractical to verify that the software performs the view interpretation and processing correctly. If the correct authorizations are to be assured, the system must contain some type of mechanism to verify the classification of the sensitivity of the information in the database. The classification must be done automatically, and the software that handles the classification must be trusted. However, any trusted software for the automatic classification process would be extremely complex. Furthermore, attempting to use a query language such as SQL to specify classifications quickly becomes convoluted and complex. Even when the complexity of the classification scheme is overcome, the view can do nothing more than limit what the user sees; it cannot restrict the operations that may be performed on the views.
3.4 Proposed OODBMS Security Models
Currently only a few models use discretionary access control measures in secure object-oriented database management systems.
Explicit Authorizations
The ORION authorization model permits access to data on the basis of explicit authorizations provided to each group of users. These authorizations are classified as positive authorizations because they specifically allow a user access to an object. Similarly, a negative authorization is used to specifically deny a user access to an object.
The placement of an individual into one or more groups is based on the role that the individual plays in the organization. In addition to the positive authorizations that are provided to users within each group, there are a variety of implicit authorizations that may be granted based on the relationships between subjects and access modes.
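Group-based positive and negative authorizations can be sketched as follows. The sketch is loosely in the spirit of the ORION model but is not ORION's actual rule set. Its central rule is an assumption the text does not state: a negative authorization held through any group overrides positive ones. The implicit authorizations mentioned above are not modelled, and all names are hypothetical.

```python
# Hypothetical group-based authorizations: positive entries allow an access
# mode on an object; negative entries explicitly deny it.
# ASSUMPTION: a negative authorization overrides any positive one.
positive = {("clerks", "read", "invoice"), ("managers", "write", "invoice")}
negative = {("temps", "read", "invoice")}
membership = {"amaka": {"clerks"}, "femi": {"clerks", "temps"},
              "kemi": {"managers"}}


def authorized(user: str, mode: str, obj: str) -> bool:
    groups = membership.get(user, set())
    if any((g, mode, obj) in negative for g in groups):
        return False                      # explicit denial wins
    return any((g, mode, obj) in positive for g in groups)


print(authorized("amaka", "read", "invoice"),   # positive via clerks
      authorized("femi", "read", "invoice"),    # denied via temps
      authorized("kemi", "read", "invoice"))    # no read authorization
```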
Data-Hiding Model
A similar discretionary access control secure model is the data-hiding model proposed by Dr. Elisa Bertino of the Università di Genova. This model distinguishes between public methods and private methods.
The data-hiding model is based on authorizations for users to execute methods on objects. The authorizations specify which methods the user is authorized to invoke. Authorizations can only be granted to users on public methods. However, the fact that a user can access a method does not automatically mean that the user can execute all actions associated with the method. As a result, several access controls may need to be performed during the execution, and all of the authorizations for the different accesses must exist if the user is to complete the processing.
Similar to the use of GRANT statements in traditional relational database management systems, the creator of an object is able to grant authorizations on the object to different users. The "creator" is also able to revoke the authorizations from users in a manner similar to REVOKE statements. However, unlike traditional RDBMS GRANT statements, the data-hiding model includes the notion of protection mode. When authorizations are provided to users in the protection mode, the authorizations actually checked by the system are those of the creator and not those of the individual executing the method. As a result, the creator is able to grant a user access to a method without granting the user the authorizations for the methods called by the original method. In other words, the creator can provide a user access to specific data without being forced to give the user complete access to all related information in the object.
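The protection mode described above can be illustrated with a hypothetical class, a minimal sketch whose names are all invented and which does not reproduce Bertino's formal model.
• average_salary internally calls read_salaries.
• A user granted average_salary in protection mode has the nested call checked against the creator's authorizations.
• A user with an ordinary grant has the nested call checked against their own authorizations.

```python
class PayrollObject:
    """Hypothetical object illustrating protection mode in a data-hiding model."""

    def __init__(self, creator, salaries):
        self.creator = creator
        self._salaries = list(salaries)       # hidden (private) data
        self._grants = {(creator, "read_salaries"),
                        (creator, "average_salary")}
        self._protected = set()               # (user, method) in protection mode

    def grant(self, user, method, protection_mode=False):
        """The creator grants a user the right to invoke a public method."""
        self._grants.add((user, method))
        if protection_mode:
            self._protected.add((user, method))

    def _check(self, principal, method):
        if (principal, method) not in self._grants:
            raise PermissionError(f"{principal} may not invoke {method}")

    def read_salaries(self, principal):
        self._check(principal, "read_salaries")
        return list(self._salaries)

    def average_salary(self, user):
        self._check(user, "average_salary")
        # In protection mode the nested call is checked against the
        # creator's authorizations rather than the caller's.
        if (user, "average_salary") in self._protected:
            principal = self.creator
        else:
            principal = user
        values = self.read_salaries(principal)
        return sum(values) / len(values)


def is_denied(call, *args):
    """Return True if the call raises PermissionError."""
    try:
        call(*args)
    except PermissionError:
        return True
    return False


payroll = PayrollObject("hr_admin", [300, 500, 700])
payroll.grant("analyst", "average_salary", protection_mode=True)
payroll.grant("intern", "average_salary")     # ordinary grant

analyst_average = payroll.average_salary("analyst")
analyst_direct_denied = is_denied(payroll.read_salaries, "analyst")
intern_denied = is_denied(payroll.average_salary, "intern")
print(analyst_average, analyst_direct_denied, intern_denied)
```

The analyst obtains the average without being able to read individual salaries. The intern's ordinary grant fails because the nested read_salaries call is checked against the intern's own authorizations.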
,.5 S"curt) C('!!/c'ton /or In/or#'ton
An important aspect of information security and risk management is recognizing the value of information and defining appropriate procedures and protection requirements for the information. Not all information is equal, and so not all information requires the same degree of protection. This requires information to be assigned a security classification.
Some factors that influence which classification information should be assigned include how much value that information has to the organization, how old the information is and whether or not the information has become obsolete. Laws and other regulatory requirements are also important considerations when classifying information.
Common information security classification labels used by the business sector are: public, sensitive, private, confidential. Common information security classification labels used by government are: Unclassified, Sensitive But Unclassified, Restricted, Confidential, Secret, Top Secret and their non-English equivalents.
All employees in the organization, as well as business partners, must be trained on the classification schema and understand the required security controls and handling procedures for each classification. The classification assigned to a particular information asset should be reviewed periodically to ensure that the classification is still appropriate for the information and that the security controls required by the classification are in place.
Access control: Access to protected information must be restricted to people who are authorized to access the information. The computer programs, and in many cases the computers that process the information, must also be authorized. This requires that mechanisms be in place to control the access to protected information. The sophistication of the access control mechanisms should be in proportion to the value of the information being protected: the more sensitive or valuable the information, the stronger the control mechanisms need to be. The foundation on which access control mechanisms are built starts with identification and authentication.
Identification is an assertion of who someone is or what something is. If a person makes the statement "Hello, my name is John Doe." they are making a claim of who they are. However, their claim may or may not be true. Before John Doe can be granted access to protected information it will be necessary to verify that the person claiming to be John Doe really is John Doe.
Authentication is the act of verifying a claim of identity. When John Doe goes into a bank to make a withdrawal, he tells the bank teller he is John Doe (a claim of identity). The bank teller asks to see a photo ID, so he hands the teller his driver's license. The bank teller checks the license to make sure it has John Doe printed on it and compares the photograph on the license against the person claiming to be John Doe. If the photo and name match the person, then the teller has authenticated that John Doe is who he claimed to be.
On computer systems in use today, the username is the most common form of identification and the password is the most common form of authentication. Usernames and passwords have served their purpose, but in our modern world they are no longer adequate. Usernames and passwords are slowly being replaced with more sophisticated authentication mechanisms.
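A minimal sketch in Python (standard library only) of the username-then-password pattern described above: the username identifies the claimant and the password authenticates the claim. The in-memory store, the function names and the PBKDF2 iteration count are assumptions for illustration. As the text notes, passwords alone are no longer considered adequate, so treat this as the baseline that stronger mechanisms build on.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple

ITERATIONS = 100_000  # assumed PBKDF2 work factor; tune for your hardware

# Hypothetical in-memory credential store: username -> (salt, derived key).
_credentials: Dict[str, Tuple[bytes, bytes]] = {}


def _derive(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 from the standard library; the password itself is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # a fresh random salt per user
    _credentials[username] = (salt, _derive(password, salt))


def authenticate(username: str, password: str) -> bool:
    # Identification: look up who the caller claims to be.
    record: Optional[Tuple[bytes, bytes]] = _credentials.get(username)
    if record is None:
        return False
    salt, stored = record
    # Authentication: verify the claim; compare_digest avoids timing leaks.
    return hmac.compare_digest(_derive(password, salt), stored)
```

Only a salted, deliberately slow hash of each password is kept, so a stolen credential table does not directly reveal the passwords.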
After a person, program or computer has successfully been identified and authenticated, it must be determined what informational resources they are permitted to access and what actions they will be allowed to perform (run, view, create, delete, or change). This is called authorization.
Authorization to access information and other computing services begins with administrative policies and procedures. The policies prescribe what information and computing services can be accessed, by whom, and under what conditions. The access control mechanisms are then configured to enforce these policies.
Different computing systems are equipped with different kinds of access control mechanisms; some may offer a choice of different access control mechanisms. The access control mechanism a system offers will be based upon one of three approaches to access control, or it may be derived from a combination of the three approaches.
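The policy-driven authorization described in this section (who may perform which of the actions run, view, create, delete or change, on which resource, and under what conditions) can be sketched as a lookup table. The subjects, the resource name and the office-hours condition below are hypothetical examples, not the policy format of any real product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class Action(Enum):
    RUN = "run"
    VIEW = "view"
    CREATE = "create"
    DELETE = "delete"
    CHANGE = "change"


@dataclass(frozen=True)
class Rule:
    actions: FrozenSet[Action]
    hours: range = range(0, 24)  # condition: hours of the day when access is allowed


# Hypothetical policy: (subject, resource) -> what may be done, and when.
POLICY: Dict[Tuple[str, str], Rule] = {
    ("jdoe", "payroll_report"): Rule(frozenset({Action.VIEW}), hours=range(8, 18)),
    ("hr_admin", "payroll_report"): Rule(frozenset({Action.VIEW, Action.CHANGE, Action.DELETE})),
}


def is_authorized(subject: str, resource: str, action: Action, hour: int,
                  authenticated: bool = True) -> bool:
    if not authenticated:
        return False  # authorization is decided only after successful authentication
    rule = POLICY.get((subject, resource))
    return rule is not None and action in rule.actions and hour in rule.hours
```

Anything not explicitly granted is refused, so a subject or resource missing from the table is denied by default.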
The non-discretionary approach consolidates all access control under a centralized administration. The access to information and other resources is usually based on the individual's function (role) in the organization or the tasks the individual must perform. The discretionary approach gives the creator or owner of the information resource the ability to control access to those resources. In the mandatory access control approach, access is granted or denied based upon the security classification assigned to the information resource.
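One common way to realize the mandatory approach is the "no read up" rule of the Bell-LaPadula model, sketched below in Python using the government classification labels listed earlier. The text does not prescribe a particular rule, and the numeric ordering of the labels (lowest to highest, in the order they are listed) is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed ordering: the government labels listed earlier, lowest to highest.
    UNCLASSIFIED = 0
    SENSITIVE_BUT_UNCLASSIFIED = 1
    RESTRICTED = 2
    CONFIDENTIAL = 3
    SECRET = 4
    TOP_SECRET = 5


def mac_allows_read(clearance: Level, classification: Level) -> bool:
    """Mandatory check: the system, not the resource owner, compares the labels."""
    # "No read up": a subject may read only resources at or below its clearance.
    return clearance >= classification
```

Unlike the discretionary approach, no owner can override this check; the decision depends only on the two labels.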
3.6 Cryptography
Information security uses cryptography to transform usable information into a form that renders it unusable by anyone other than an authorized user; this process is called encryption. Information that has been encrypted (rendered unusable) can be transformed back into its original usable form by an authorized user who possesses the cryptographic key, through the process of decryption. Cryptography is used in information security to protect information from unauthorized or accidental disclosure while the information is in transit (either electronically or physically) and while information is in storage.
Cryptography provides information security with other useful applications as well, including improved authentication methods, message digests, digital signatures, non-repudiation, and encrypted network communications.
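Message digests and keyed digests, two of the applications listed above, can be sketched with Python's standard `hashlib`, `hmac` and `secrets` modules, in the spirit of relying on peer-reviewed primitives rather than home-made cryptography. The function names are illustrative. An HMAC tag shows integrity and authenticity between parties who share the key, but it does not provide non-repudiation; that requires digital signatures based on public-key cryptography, which are not shown. Encryption itself is also not shown, since the standard library has no general-purpose cipher.

```python
import hashlib
import hmac
import secrets


def message_digest(message: bytes) -> str:
    # Unkeyed digest: detects accidental change, but anyone can recompute it.
    return hashlib.sha256(message).hexdigest()


def make_tag(key: bytes, message: bytes) -> str:
    # Keyed digest (HMAC-SHA256): only holders of the key can produce a valid tag.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    # Constant-time comparison, so timing does not reveal how much of the tag matched.
    return hmac.compare_digest(make_tag(key, message), tag)


# A 256-bit key from the operating system's secure random source; protect it like any secret.
key = secrets.token_bytes(32)
```

Changing a single character of the message produces a different tag, so `verify_tag` fails for the altered message.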
Cryptography can introduce security problems when it is not implemented correctly. Cryptographic solutions need to be implemented using industry-accepted solutions that have undergone rigorous peer review by independent experts in cryptography. The length and strength of the encryption key is also an important consideration: a key that is weak or too short will produce weak encryption. The keys used for encryption and decryption must be protected with the same degree of rigor as any other confidential information. They must be protected from unauthorized disclosure and destruction, and they must be available when needed.
Process
The terms reasonable and prudent person, due care and due diligence have been used in the fields of Finance, Securities, and Law for many years. In recent years these terms have found their way into the fields of computing and information security. U.S. Federal Sentencing Guidelines now make it possible to hold corporate officers liable for failing to exercise due care and due diligence in the management of their information systems. In the business world, stockholders, customers, business partners and governments have the expectation that corporate officers will run the business in accordance with accepted business practices and in compliance with laws and other regulatory requirements. This is often described as the "reasonable and prudent person" rule.
3.7 Disaster Recovery Planning
What is Disaster Recovery Planning?
Disaster Recovery Planning (DRP) is all about continuing an IT service. You need two or more sites, one of which is the primary site, which is planned to be recovered. The alternate site may be online, meaning production data is simultaneously transferred to both sites (sometimes called a HOT site); it may be offline, meaning data is transferred after a certain delay through other means (sometimes called a WARM site); or data may not be transferred at all, with the alternate site holding a replica of the original site's IT system that will be started whenever the primary site faces a disaster (sometimes called a COLD site).
How are DRP and BCP different?
Though DRP is part of the Business Continuity Planning (BCP) process, DRP focuses on IT systems recovery and BCP on the entire business.
How are DRP and BCP related?
DRP is one of the recovery activities during execution of a Business Continuity Plan.
4.0 CONCLUSION
Data and information systems security is the ongoing process of exercising due care and due diligence to protect information, and information systems, from unauthorized access, use, disclosure, destruction, modification, disruption or distribution. The never-ending process of information security involves ongoing training, assessment, protection, monitoring and detection, incident response and repair, documentation, and review.
5.0 SUMMARY
This unit can be summarized as follows:
Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled.
Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. The terms information security, computer security and information assurance are frequently used interchangeably.
For over twenty years information security has held that confidentiality, integrity and availability (known as the CIA Triad) are the core principles of information system security.
The principal methods of security in traditional RDBMSs are the appropriate use and manipulation of views and of the Structured Query Language (SQL) GRANT and REVOKE statements.
Authentication is the act of verifying a claim of identity.
Currently only a few models use discretionary access control measures in secure object-oriented database management systems.
An important aspect of information security and risk management is recognizing the value of information and defining appropriate procedures and protection requirements for the information.
Information security uses cryptography to transform usable information into a form that renders it unusable by anyone other than an authorized user; this process is called encryption.
Disaster Recovery Planning is all about continuing an IT service. You need two or more sites, one of which is the primary site, which is planned to be recovered.
6.0 TUTOR-MARKED ASSIGNMENT
1. List Donn Parker's six atomic elements of information security, which extend the CIA Triad.
2. Briefly discuss Disaster Recovery Planning in the security of DBMS.
7.0 REFERENCES/FURTHER READINGS
44 U.S.C. § 3542(b)(1) (2006).
Blackwell Encyclopedia of Management Information System, Vol. III. Edited by Gordon B. Davis.
Harris, Shon (2003). All-in-one CISSP Certification Exam Guide, 2nd Ed. Emeryville, CA: McGraw-Hill/Osborne.
ISACA (2006). CISA Review Manual 2006. Information Systems Audit and Control Association, p. 85. ISBN 1-933284-15-3.
Quist, Arvin S. (2002). "Security Classification of Information". Volume 1. Introduction, History, and Adverse Impacts. Oak Ridge Classification Associates, LLC. Retrieved on 2007-01-11.
UNIT 4 DATABASE ADMINISTRATOR AND ADMINISTRATION
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Duties of Database Administrator
3.2 Typical Work Activities
3.3 Database Administrations and Automation
3.3.1 Types of Database Administration
3.3.2 Nature of Database Administration
3.3.3 Database Administration Tools
3.3.4 The Impact of IT Automation on Database Administration
3.3.5 Learning Database Administration
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:
Recoverability - Creating and testing backups
Integrity - Verifying or helping to verify data integrity
Security - Defining and/or implementing access controls to the data
Availability - Ensuring maximum uptime
Performance - Ensuring maximum performance
Development and testing support - Helping programmers and engineers to efficiently utilize the database.
The role of a database administrator has changed according to the technology of database management systems (DBMSs) as well as the needs of the owners of the databases. For example, although logical and physical database designs are traditionally the duties of a database analyst or database designer, a DBA may be tasked to perform those duties.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
explain who a database administrator is
identify the various functions of a database administrator
know the different types of database administration
understand the nature of database administration
know the tools used in database administration.

3.0 MAIN CONTENT
3.1 Duties of Database Administrator
The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design.
Some of the roles of the DBA may include:
Installation of new software - It is primarily the job of the DBA to install new versions of DBMS software, application software, and other software related to DBMS administration. It is important that the DBA or other IS staff members test this new software before it is moved into a production environment.
Configuration of hardware and software with the system administrator - In many cases the system software can only be accessed by the system administrator. In this case, the DBA must work closely with the system administrator to perform software installations, and to configure hardware and software so that they function optimally with the DBMS.
Security administration - One of the main duties of the DBA is to monitor and administer DBMS security. This involves adding and removing users, administering quotas, auditing, and checking for security problems.
Data analysis - The DBA will frequently be called on to analyze the data stored in the database and to make recommendations relating to performance and efficiency of that data storage. This might relate to the more effective use of indexes, enabling "Parallel Query" execution, or other DBMS-specific features.
Database design (preliminary) - The DBA is often involved at the preliminary database-design stages. Through the involvement of the DBA, many problems that might occur can be eliminated. The DBA knows the DBMS and system, can point out potential problems, and can help the development team with special performance considerations.
Data modeling and optimization - By modeling the data, it is possible to optimize the system layout to take the most advantage of the I/O subsystem.
Responsibility for the administration of existing enterprise databases and the analysis, design, and creation of new databases:
- Data modeling, database optimization, understanding and implementation of schemas, and the ability to interpret and write complex SQL queries
- Proactively monitor systems for optimum performance and capacity constraints
- Establish standards and best practices for SQL
- Interact with and coach developers in SQL scripting
Recoverability
Recoverability means that, if a data entry error, program bug or hardware failure occurs, the DBA can bring the database backward in time to its state at an instant of logical consistency before the damage was done. Recoverability activities include making database backups and storing them in ways that minimize the risk that they will be damaged or lost, such as placing multiple copies on removable media and storing them outside the affected area of an anticipated disaster. Recoverability is the DBA's most important concern.
The backup of the database consists of data with timestamps combined with database logs, which are used to bring the data to a state consistent with a particular moment in time. It is possible to make a backup of the database containing only data without timestamps or logs, but the DBA must take the database offline to do such a backup.
The recovery tests of the database consist of restoring the data, then applying logs against that data to bring the database backup to consistency at a particular point in time, up to the last transaction in the logs. Alternatively, an offline database backup can be restored simply by placing the data in-place on another copy of the database.
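The restore-then-roll-forward idea can be sketched on a toy key-value store. This is an illustration under stated assumptions, not how any particular DBMS stores its logs: real systems log changes at the page or row level and use checkpoints, and the names `LogRecord` and `recover` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    key: str
    value: Optional[int]  # None records a deletion


def recover(backup: Dict[str, int], backup_time: datetime,
            log: Iterable[LogRecord], target_time: datetime) -> Dict[str, int]:
    """Restore `backup`, then roll forward through `log` up to `target_time`."""
    if target_time < backup_time:
        raise ValueError("cannot recover to a point earlier than the backup")
    state = dict(backup)  # step 1: restore the data (the backup copy itself is untouched)
    for record in sorted(log, key=lambda r: r.timestamp):
        if record.timestamp <= backup_time:
            continue  # already contained in the backup
        if record.timestamp > target_time:
            break  # step 3: stop at the chosen moment of logical consistency
        if record.value is None:  # step 2: apply the logged change
            state.pop(record.key, None)
        else:
            state[record.key] = record.value
    return state
```

Recovering to a moment just before an erroneous data-entry transaction reproduces the last consistent state; recovering to a time after the final log record reproduces the state up to the last transaction in the logs.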
If a DBA (or any administrator) attempts to implement a recoverability plan without the recovery tests, there is no guarantee that the backups are at all valid. In practice, in all but the most mature RDBMS packages, backups rarely are valid without extensive testing to be sure that no bugs or human error have corrupted the backups.
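The point about testing backups can be illustrated with SQLite through Python's standard `sqlite3` module: `Connection.backup` (Python 3.7 and later) takes an online copy, and the copy is then opened and checked with `PRAGMA integrity_check` and per-table row counts before it is trusted. The helper names are hypothetical, and matching row counts is only a basic check, not proof that every value was copied correctly.

```python
import sqlite3
from typing import Dict


def table_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Row count for every user table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # safe identifier quoting
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def backup_and_verify(source: sqlite3.Connection, target_path: str) -> Dict[str, int]:
    """Copy `source` to `target_path`, then test the copy before trusting it."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)  # online backup API
        status = target.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            raise RuntimeError(f"backup failed integrity check: {status}")
        expected, actual = table_counts(source), table_counts(target)
        if expected != actual:
            raise RuntimeError(f"row counts differ: {expected} != {actual}")
        return actual
    finally:
        target.close()
```

In practice the target would be a file on separate storage, and the check would run on a schedule rather than only once.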
Security
Security means that users' ability to access and change data conforms to the policies of the business and the delegation decisions of its managers. Like other metadata, a relational DBMS manages security information in the form of tables. These tables are the "keys to the kingdom", so it is important to protect them from intruders. This is why security is increasingly important for databases.
Performance
Performance means that the database does not cause unreasonable online response times, and it does not cause unattended programs to run for an unworkable period of time. In complex client/server and three-tier systems, the database is just one of many elements that determine the performance that online users and unattended programs experience. Performance is a major motivation for the DBA to become a generalist and coordinate with specialists in other parts of the system outside of traditional bureaucratic reporting lines.
Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and their ability to diagnose the problem.
In the 1990s, DBAs often focused on the database as a whole, and looked at database-wide statistics for clues that might help them find out why the system was slow. Also, the actions DBAs took in their attempts to solve performance problems were often at the global, database level, such as changing the amount of computer memory available to the database, or changing the amount of memory available to any database program that needed to sort data.
DBAs now understand that performance problems must first be diagnosed, and this is best done by examining individual SQL statements, table processes, and system architecture, not the database as a whole. Various tools, some included with the database and some available from third parties, provide a behind-the-scenes look at how the database is handling the SQL statements, shedding light on what is taking so long.
Having identified the problem, the individual SQL statement can be tuned.
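Diagnosing a single SQL statement, rather than the whole database, can be illustrated with SQLite's `EXPLAIN QUERY PLAN`, run here through Python's `sqlite3` module. The `orders` table, its data and the index name are hypothetical. The plan is inspected before and after an index is added, which is one typical way of tuning such a statement.

```python
import sqlite3
from typing import List, Sequence


def query_plan(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return SQLite's human-readable plan lines for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

slow_sql = "SELECT total FROM orders WHERE customer_id = ?"
before = query_plan(conn, slow_sql, (42,))  # diagnose: how is this one statement executed?
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, slow_sql, (42,))   # tune, then confirm that the plan changed
```

Before the index, SQLite typically reports a full scan of `orders`; afterwards it reports a search using `idx_orders_customer`. The exact wording of the plan lines differs between SQLite versions, so read them rather than match them literally.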
Development/Testing Support
Development and testing support is typically what the database administrator regards as his or her least important duty, while results-oriented managers consider it the DBA's most important duty. Support activities include collecting sample production data for testing new and changed programs and loading it into test databases; consulting with programmers about performance tuning; and making table design changes to provide new kinds of storage for new program functions.
Here are some IT roles that are related to the role of database administrator:
Application programmer or software engineer
System administrator
Data administrator
Data architect
3.2 Typical Work Activities
The work of a database administrator (DBA) varies according to the nature of the employing organization and the level of responsibility associated with the post. The work may be pure maintenance or it may also involve specializing in database development.
Typical responsibilities include some or all of the following:
establishing the needs of users and monitoring user access and security
monitoring performance and managing parameters to provide fast query responses to 'front-end' users
mapping out the conceptual design for a planned database in outline
considering both back-end organization of data and front-end accessibility for the end user
refining the logical design so that it can be translated into a specific data model
further refining the physical design to meet system storage requirements
installing and testing new versions of the database management system
maintaining data standards, including adherence to the Data Protection Act
writing database documentation, including data standards, procedures and definitions for the data dictionary (metadata)
controlling access permissions and privileges
developing, managing and testing backup and recovery plans
ensuring that storage, archiving, and backup procedures are functioning properly
capacity planning
working closely with IT project managers, database programmers, and web developers
communicating regularly with technical, applications and operational staff to ensure database integrity and security
commissioning and installing new applications.
Because of the increasing level of hacking and the sensitive nature of data stored, security and recoverability or disaster recovery have become increasingly important aspects of the work.
3.3 Database Administrations and Automation
Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL Server needs ongoing management. As such, corporations that use DBMS software often hire specialized IT (Information Technology) personnel called Database Administrators or DBAs.
3.3.1 Types of Database Administration
There are three types of DBAs:
1. Systems DBAs (sometimes also referred to as Physical DBAs, Operations DBAs or Production Support DBAs)
2. Development DBAs
3. Application DBAs
Depending on the DBA type, their functions usually vary. Below is a brief description of what different types of DBAs do:
Systems DBAs usually focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.
Development DBAs usually focus on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities.
Application DBAs are usually found in organizations that have purchased third-party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software include Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc.
While individuals usually specialize in one type of database administration, in smaller organizations it is not uncommon to find a single individual or group performing more than one type of database administration.
3.3.2 Nature of Database Administration
The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases. At one end of the spectrum, a system with minimal automation will require significant experienced resources to manage, perhaps 5-10 databases per DBA. Alternatively, an organization might choose to automate a significant amount of the work that could be done manually, thereby reducing the skills required to perform tasks. As automation increases, the personnel needs of the organization split into highly skilled workers to create and manage the automation and a group of lower-skilled "line" DBAs who simply execute the automation.
Database administration work is complex, repetitive, time-consuming and requires significant training. Since databases hold valuable and mission-critical data, companies usually look for candidates with multiple years of experience. Database administration often requires DBAs to put in work during off-hours (for example, for planned after-hours downtime, in the event of a database-related outage or if performance has been severely degraded). DBAs are commonly well compensated for the long hours.
3.3.3 Database Administration Tools
Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such tools are called native tools. For example, Microsoft SQL Server comes with SQL Server Enterprise Manager and Oracle has tools such as SQL*Plus and Oracle Enterprise Manager/Grid Control. In addition, third parties such as BMC, Quest Software, Embarcadero and SQL Maestro Group offer GUI tools to monitor the DBMS and help DBAs carry out certain functions inside the database more easily.
Another kind of database software exists to manage the provisioning of new databases and the management of existing databases and their related resources. The process of creating a new database can consist of hundreds or thousands of unique steps, from satisfying prerequisites to configuring backups, where each step must be successful before the next can start. A human cannot be expected to complete this procedure in exactly the same way time after time, yet that is exactly the goal when multiple databases exist. As the number of DBAs grows, without automation the number of unique configurations frequently grows to be costly and difficult to support. All of these complicated procedures can be modeled by the best DBAs into database automation software and executed by the standard DBAs. Software has been created specifically to improve the reliability and repeatability of these procedures, such as Stratavia's Data Palette and GridApp Systems Clarity.
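The rule that each step must succeed before the next can start is the core of such provisioning tools. Below is a hedged Python sketch of that rule only; it is not how Data Palette, Clarity or any other product works, and the step names are hypothetical placeholders for real DBMS and operating-system calls.

```python
from typing import Callable, List, Tuple

# A provisioning procedure modelled as an ordered list of (name, action) pairs.
Step = Tuple[str, Callable[[], bool]]


def run_provisioning(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure, so later steps never
    run against a half-built database. Returns the names of completed steps."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step {name!r} failed; completed so far: {completed}")
        completed.append(name)
    return completed


# Hypothetical steps; real ones would call the DBMS and the operating system.
NEW_DATABASE: List[Step] = [
    ("check_prerequisites", lambda: True),
    ("create_database", lambda: True),
    ("create_users_and_roles", lambda: True),
    ("configure_backups", lambda: True),
]
```

Because the procedure is data (an ordered list), the same steps run in the same order every time, which is the repeatability the text asks for.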
3.3.4 The Impact of IT Automation on Database Administration
Recently, automation has begun to impact this area significantly. Newer technologies such as HP/Opsware's SAS (Server Automation System) and Stratavia's Data Palette suite have begun to increase the automation of servers and databases respectively, reducing the number of database-related tasks. However, at best this only reduces the amount of mundane, repetitive activity and does not eliminate the need for DBAs. The intention of DBA automation is to enable DBAs to focus on more proactive activities around database architecture and deployment.
3.3.5 Learning Database Administration
There are several educational institutes that offer professional courses, including late-night programs, to allow candidates to learn database administration. Also, DBMS vendors such as Oracle, Microsoft and IBM offer certification programs to help companies hire qualified DBA practitioners.
4.0 CONCLUSION
The database management system (DBMS) is so important in an organization that a special manager is often appointed to oversee its activities. The database administrator is responsible for the installation and coordination of the DBMS, and therefore for managing one of the most valuable resources of any organization: its data. The database administrator must have a sound knowledge of the structure of the database and of the DBMS. The DBA must be thoroughly conversant with the organization, its systems and the information needs of its managers.
5.0 SUMMARY
A database administrator (DBA) is a person who is responsible for the environmental aspects of a database.
The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design.
Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and their ability to diagnose the problem.
The work of a database administrator (DBA) varies according to the nature of the employing organization and the level of responsibility associated with the post.
Database administration is the function of managing and maintaining database management systems (DBMS) software.
The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases.
6.0 TUTOR-MARKED ASSIGNMENT
1. Mention five roles of a database administrator.
2. Mention the types of database administration.
7.0 REFERENCES/FURTHER READINGS
Association for Computing Machinery SIGIR Forum archive, Volume 7, Issue 4.
The Origins of the Data Base Concept, Early DBMS Systems including IDS and IMS, the Data Base Task Group, and the Hierarchical, Network and Relational Data Models are discussed in Thomas Haigh, "A Veritable Bucket of Facts: Origins of the Data Base Management System," ACM SIGMOD Record 35:2 (June 2006).
How Database Systems Share Storage.
MODULE 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System
UNIT 1 RELATIONAL DATABASE MANAGEMENT SYSTEMS
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History of the Term
3.2 Market Structure
3.3 Features and Responsibilities of an RDBMS
3.4 Comparison of Relational Database Management Systems
3.4.1 General Information
3.4.2 Operating System Support
3.4.3 Fundamental Features
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model.
A short definition of an RDBMS is a DBMS in which data is stored in the form of tables and the relationships among the data are also stored in the form of tables.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
define relational database management system
trace the origin and development of RDBMS
identify the market structure of RDBMS
identify the major types of relational database management systems
compare and contrast the types of RDBMS based on several criteria.
3.0 MAIN CONTENT
3.1 History of the Term
E. F. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules. However, many of the early implementations of the relational model did not conform to all of Codd's rules, so the term gradually came to describe a broader class of database systems. At a minimum, these systems:
presented the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each table consisting of a set of rows and columns, can satisfy this property)
provided relational operators to manipulate the data in tabular form
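The second property, relational operators acting on tables, can be illustrated by modelling a relation as a set of tuples and writing three classic operators in Python. This is a teaching sketch with hypothetical names: `select` here means relational restriction (choosing rows), not the SQL keyword, and a real RDBMS evaluates such operators far more efficiently.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# A relation: a heading (attribute names) plus a set of tuples, so duplicate rows cannot exist.
Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]


def select(rel: Relation, predicate: Callable[[Dict[str, object]], bool]) -> Relation:
    """Restriction: keep the tuples that satisfy the predicate."""
    attrs, rows = rel
    return attrs, frozenset(r for r in rows if predicate(dict(zip(attrs, r))))


def project(rel: Relation, keep: Tuple[str, ...]) -> Relation:
    """Projection: keep only some attributes; duplicate tuples collapse automatically."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), frozenset(tuple(r[i] for i in idx) for r in rows)


def natural_join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine tuples that agree on every shared attribute."""
    l_attrs, l_rows = left
    r_attrs, r_rows = right
    common = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    out = set()
    for lrow in l_rows:
        ld = dict(zip(l_attrs, lrow))
        for rrow in r_rows:
            rd = dict(zip(r_attrs, rrow))
            if all(ld[a] == rd[a] for a in common):
                out.add(lrow + tuple(rd[a] for a in extra))
    return l_attrs + tuple(extra), frozenset(out)
```

Each operator takes relations in and returns a relation, so they compose: joining an employee relation to a department relation on their shared `dept_id` attribute, restricting to one department and projecting onto `name` yields another relation.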
The first systems that were relatively faithful implementations of the relational model were from the University of Michigan (Micro DBMS, 1969) and from the IBM UK Scientific Centre at Peterlee (IS1, 1970–72, and its follow-on PRTV, 1973–79). The first system sold as an RDBMS was Multics Relational Data Store, first sold in 1978. Others have been Berkeley Ingres QUEL and IBM BS12.
The most popular definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory. By this definition, RDBMS products typically implement some but not all of Codd's 12 rules.
A second, theory-based school of thought argues that if a database does not implement all of Codd's rules (or the current understanding of the relational model, as expressed by Christopher J. Date, Hugh Darwen and others), it is not relational. This view, shared by many theorists and other strict adherents to Codd's principles, would disqualify most DBMSs as not relational. For clarification, they often refer to some RDBMSs as Truly-Relational Database Management Systems (TRDBMS), naming others Pseudo-Relational Database Management Systems (PRDBMS).
Almost all commercial relational DBMSs employ SQL as their query language. Alternative query languages have been proposed and implemented, but very few have become commercial products.
3.2 Market Structure
Given below is a list of the top RDBMS vendors in 2006, with figures in millions of United States dollars, published in an IDC study.
Vendor | Global Revenue (US$ millions)
Oracle | 7,912
IBM | 3,483
Microsoft | 3,052
Sybase | 5240
Teradata | 457
Others | 1,624
Total | 16,452
Low adoption costs associated with open-source RDBMS products such as MySQL and PostgreSQL have begun to influence vendor pricing and licensing strategies.
3.3 Features and Responsibilities of an RDBMS
As mentioned earlier, an RDBMS is software that is used for creating and maintaining a database. Maintaining a database involves several tasks that an RDBMS takes care of. These tasks are as follows:
Control Data Redundancy
Since data in an RDBMS is spread across several tables, repetition or redundancy is reduced. Redundant data can be extracted and stored in another table, along with a field that is common to both tables. Data can then be extracted from the two tables by using the common field.
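The sketch below is for illustration only. It uses Python with the built-in sqlite3 module, and the table and column names are hypothetical. Each customer's details are stored once, in a customer table. An orders table refers to those details through the common field customer_id. A join on that common field recombines the data when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Customer details are stored once instead of being repeated on every order.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), amount REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Okafor Ltd', 'Lagos')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, 250.0), (101, 1, 75.5)],
)

# The common field (customer_id) lets us extract data from both tables together.
rows = conn.execute(
    "SELECT o.order_id, c.name, c.city, o.amount "
    "FROM orders AS o JOIN customer AS c ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id"
).fetchall()
for row in rows:
    print(row)
```

If the customer's city changes, only the single row in the customer table has to be updated. No order rows need to change.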
Data Abstraction
The RDBMS hides the actual way in which data is stored, while providing the user with a conceptual representation of the data.
Support for Multiple Users
A true RDBMS allows effective sharing of data. That is, it ensures that several users can access the data in the database concurrently without noticeably slowing down data access.
In a database application that several users can use concurrently, two users may try to modify a particular record at the same time. This could lead to one person's changes being overwritten by the other's. To avoid such conflicts, most RDBMSs provide a record-locking mechanism, which ensures that no two users can modify a particular record at the same time. A record is, as it were, "locked" while one user makes changes to it. Another user is therefore not allowed to modify it until the changes are complete and the record is saved. The "lock" is then released, and the record is available for editing again.
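The locking idea can be sketched in a few lines of Python. This is a toy, in-memory lock table written for illustration only. The class RecordLockManager and the function edit are hypothetical names, not any RDBMS's API. Real systems implement locking inside the database engine, usually with richer rules such as separate shared (read) and exclusive (write) locks.

```python
import threading


class RecordLockManager:
    """Toy lock table: at most one user may hold the lock on a given record."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._owners = {}               # record_id -> user currently editing it

    def acquire(self, record_id, user):
        with self._guard:
            owner = self._owners.get(record_id)
            if owner is None or owner == user:
                self._owners[record_id] = user
                return True
            return False  # someone else is editing this record

    def release(self, record_id, user):
        with self._guard:
            if self._owners.get(record_id) == user:
                del self._owners[record_id]


def edit(records, locks, record_id, user, new_value):
    """Change a record only if the user can lock it; always release afterwards."""
    if not locks.acquire(record_id, user):
        return False
    try:
        records[record_id] = new_value
    finally:
        locks.release(record_id, user)
    return True


records = {42: "original"}
locks = RecordLockManager()

locks.acquire(42, "alice")                                        # Alice starts editing
bob_first_try = edit(records, locks, 42, "bob", "Bob's change")   # refused: record locked
records[42] = "Alice's change"                                    # Alice saves...
locks.release(42, "alice")                                        # ...and releases the lock
bob_second_try = edit(records, locks, 42, "bob", "Bob's change")  # now allowed
print(bob_first_try, bob_second_try, records[42])  # False True Bob's change
```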
Multiple Ways of Interfacing to the System
This requires the database to be accessible through different query languages as well as programming languages. It also means that a variety of front-end tools should be able to use the database as a back-end. For example, data stored in Microsoft Access can be displayed and manipulated using forms created in software such as Visual Basic or FrontPage 2000.
Restricting Unauthorized Access
An RDBMS provides a security mechanism that protects data in the database from unauthorized access and malicious use. The security implemented in most RDBMSs is referred to as 'user-level security'. Under it, the various users of the database are assigned usernames and passwords, and a user can access the data in the database only after entering the correct username and password.
In addition, a particular user could be restricted to only viewing the data, while another could have the rights to modify the data. A third user could have rights to change the structure of some tables, in addition to the rights that the other two have.
When security is implemented properly, data is secure and cannot easily be tampered with.
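The following is a minimal sketch of user-level security with three levels of rights: view, modify, and alter structure. It is written in plain Python for illustration, and the UserStore class, the level names and the sample accounts are hypothetical. Passwords are kept as salted PBKDF2 hashes (hashlib.pbkdf2_hmac) rather than in clear text. In SQL-based RDBMSs the same idea is normally expressed with GRANT and REVOKE statements rather than application code.

```python
import hashlib
import hmac
import os

# Hypothetical privilege levels; each level includes the rights of the ones below it.
LEVELS = {"view": 1, "modify": 2, "alter": 3}


def hash_password(password, salt):
    # PBKDF2-HMAC-SHA256 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class UserStore:
    def __init__(self):
        self._users = {}  # username -> (salt, password hash, privilege level)

    def add_user(self, username, password, level):
        salt = os.urandom(16)
        self._users[username] = (salt, hash_password(password, salt), LEVELS[level])

    def authenticate(self, username, password):
        entry = self._users.get(username)
        if entry is None:
            return False
        salt, stored_hash, _ = entry
        return hmac.compare_digest(stored_hash, hash_password(password, salt))

    def can(self, username, password, action):
        # Access only for an authenticated user whose level covers the action.
        if not self.authenticate(username, password):
            return False
        return self._users[username][2] >= LEVELS[action]


store = UserStore()
store.add_user("viewer", "v-pass", "view")        # may only view data
store.add_user("editor", "e-pass", "modify")      # may view and modify data
store.add_user("designer", "d-pass", "alter")     # may also change table structure
```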
Enforcing Integrity Constraints
RDBMSs provide a set of rules that ensure that data entered into a table is valid. These rules must remain true for a database to preserve its integrity. 'Integrity constraints' are specified when the database is created, and are enforced by the RDBMS.
For example, in a 'Marks' table, a constraint can be added to ensure that the mark in each subject is between 0 and 100. Such a constraint is called a 'Check' constraint. It is a rule that the user can set so that only data meeting the specified criteria is allowed into the database. In this example, only a number between 0 and 100 can be entered into the marks column.
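The 'Marks' example can be tried directly with Python's built-in sqlite3 module, since SQLite enforces CHECK constraints. The table layout is hypothetical. The constraint is declared when the table is created, and an out-of-range mark is rejected with an IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is declared at table-creation time and enforced by the engine.
conn.execute(
    "CREATE TABLE marks ("
    " student_id INTEGER,"
    " subject TEXT,"
    " mark INTEGER CHECK (mark BETWEEN 0 AND 100))"
)
conn.execute("INSERT INTO marks VALUES (1, 'Accounting', 87)")  # valid: accepted

try:
    conn.execute("INSERT INTO marks VALUES (1, 'Economics', 105)")  # out of range
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM marks").fetchone()[0]
print(row_count)  # 1
```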
Backup and Recovery
Even when the database is secured against unauthorized access and invalid entries, there is always a danger that its data could be lost. Hardware problems or a system crash, for example, could result in the loss of all data. To guard against this, most RDBMSs have built-in backup and recovery techniques that protect the database from these kinds of failures too.
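As one concrete illustration, the sketch below uses SQLite through Python's sqlite3 module, which provides Connection.backup from Python 3.7 onwards. It copies a database to a backup, simulates a loss of data, and then restores from the backup. The table and values are hypothetical. Both databases live in memory only to keep the example self-contained; a real backup would be written to separate storage.

```python
import sqlite3

# The "live" database with some data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
live.execute("INSERT INTO account VALUES (1, 500.0)")
live.commit()

# Backup: copy the whole live database into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate a failure that destroys data in the live database.
live.execute("DROP TABLE account")
live.commit()

# Recovery: copy the backup back over the live database.
backup.backup(live)
restored = live.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(restored)  # 500.0
```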
3.4 Comparison of Relational Database Management Systems
The following tables compare general and technical information for a number of relational database management systems. Comparisons are based on the stable versions without any add-ons, extensions or external programs.
3.4.1 General Information
Name | Maintainer | First public release date | Latest stable version | Software license
4th Dimension | 4D s.a.s | 1984 | v11 SQL | Proprietary
ADABAS | Software AG | 1970 | ? | ?
Adaptive Server Enterprise | Sybase | 1987 | 15.0 | Proprietary
Advantage Database Server | Sybase | 1992 | 8.1 | Proprietary
Apache Derby | Apache | 2004 | 10.4.1.3 | Apache License
Datacom | CA | ? | 11.2 | Proprietary
DB2 | IBM | 1982 | 9.5 | Proprietary
DBISAM | Elevate Software | ? | 4.25 | Proprietary
Datawasp | Significant Data Systems | April 2008 | 1.0.1 | Proprietary
ElevateDB | Elevate Software | ? | 1.01 | Proprietary
FileMaker | FileMaker | 1984 | 9 | Proprietary
Firebird | Firebird project | July 25, 2000 | 2.1.0 | IPL and IDPL
Informix | IBM | 1985 | 11.10 | Proprietary
HSQLDB | HSQL Development Group | 2001 | 1.8.0 | BSD
H2 | H2 Software | 2005 | 1.0 | EPL and modified MPL
Ingres | Ingres Corp. | 1974 | Ingres 2006 r2 9.1.0 | GPL and proprietary
InterBase | CodeGear | 1985 | 2007 | Proprietary
MaxDB | SAP AG | ? | 7.6 | GPL or proprietary
Microsoft Access | Microsoft | 1992 | 12 (2007) | Proprietary
Microsoft Visual FoxPro | Microsoft | ? | 9 (2005) | Proprietary
Microsoft SQL Server | Microsoft | 1989 | 9.00.3042 (2005 SP2) | Proprietary
MonetDB | The MonetDB Developer Team | 2004 | 4.16 (Feb. 2007) | MonetDB Public License v1.1
MySQL | Sun Microsystems | November 1996 | 5.0.67 | GPL or proprietary
HP NonStop SQL | Hewlett-Packard | 1987 | SQL MX 2.0 | Proprietary
Omnis Studio | TigerLogic Inc | July 1982 | 4.3.1 Release 1 (May 2008) | Proprietary
Oracle | Oracle Corporation | November 1979 | 11g Release 1 (September 2007) | Proprietary
Oracle Rdb | Oracle Corporation | 1984 | 7.2 | Proprietary
OpenEdge | Progress Software Corporation | 1984 | 10.1C | Proprietary
OpenLink Virtuoso | OpenLink Software | 1998 | 5.0.5 (January 2008) | GPL or proprietary
Pervasive PSQL | Pervasive Software | ? | 9 | Proprietary
Polyhedra DBMS | ENEA AB | 1993 | 8.0 (July 2008) | Proprietary
PostgreSQL | PostgreSQL Global Development Group | June 1989 | 8.3.3 (12 June 2008) | BSD
Pyrrho DBMS | University of Paisley | November 2005 | 0.5 | Proprietary
RBase | RBase | ? | 7.6 | Proprietary
RDM Embedded | Birdstep Technology | 1984 | 8.1 | Proprietary
RDM Server | Birdstep Technology | 1990 | 8.0 | Proprietary
ScimoreDB | Scimore | 2005 | 2.5 | Freeware
SmallSQL | SmallSQL | April 16, 2005 | 0.19 | LGPL
SQL Anywhere | Sybase | 1992 | 10.0 | Proprietary
SQLite | D. Richard Hipp | August 17, 2000 | 3.5.7 (17 March 2008) | Public domain
Teradata | Teradata | 1984 | V12 | Proprietary
Valentina | Paradigma Software | February 1998 | 3.0.1 | Proprietary
3.4.2 Operating System Support
The operating systems on which the RDBMSs can run.
Name | Windows | Mac OS X | Linux | BSD | UNIX | z/OS (1)
4th Dimension | Yes | Yes | No | No | No | No
ADABAS | Yes | No | Yes | No | Yes | Yes
Adaptive Server Enterprise | Yes | No | Yes | Yes | Yes | No
Advantage Database Server | Yes | No | Yes | No | No | No
Apache Derby (2) | Yes | Yes | Yes | Yes | Yes | Yes
DataCom | No | No | No | No | No | Yes
Datawasp | Yes | No | No | No | No | No
DB2 (5) | Yes | No | Yes | No | Yes | Yes
Firebird | Yes | Yes | Yes | Yes | Yes | Maybe
HSQLDB (2) | Yes | Yes | Yes | Yes | Yes | Yes
H2 (2) | Yes | Yes | Yes | Yes | Yes | Maybe
FileMaker | Yes | Yes | No | No | No | No
Informix | Yes | Yes | Yes | Yes | Yes | No
Ingres | Yes | Yes | Yes | Yes | Yes | Partial
InterBase | Yes | Yes | Yes | No | Yes (Solaris) | No
MaxDB | Yes | No | Yes | No | Yes | Maybe
Microsoft Access | Yes | No | No | No | No | No
Microsoft Visual FoxPro | Yes | No | No | No | No | No
Microsoft SQL Server | Yes | No | No | No | No | No
MonetDB | Yes | Yes | Yes | No | Yes | No
MySQL | Yes | Yes | Yes | Yes | Yes | Maybe
Omnis Studio | Yes | Yes | Yes | No | No | No
Oracle (4) | Yes | Yes | Yes | No | Yes | Yes
Oracle Rdb (3) | No | No | No | No | No | No
OpenEdge | Yes | No | Yes | No | Yes | No
OpenLink Virtuoso | Yes | Yes | Yes | Yes | Yes | Yes
Polyhedra DBMS | Yes | No | Yes | No | Yes | No
PostgreSQL | Yes | Yes | Yes | Yes | Yes | No
Pyrrho DBMS | Yes (.NET) | No | Yes (Mono) | No | No | No
RBase | Yes | No | No | No | No | No
RDM Embedded | Yes | Yes | Yes | Yes | Yes | No
RDM Server | Yes | Yes | Yes | Yes | Yes | No
ScimoreDB | Yes | No | No | No | No | No
SmallSQL (2) | Yes | Yes | Yes | Yes | Yes | Yes
SQL Anywhere | Yes | Yes | Yes | No | Yes | No
SQLite | Yes | Yes | Yes | Yes | Yes | Maybe
Teradata | Yes | No | Yes | No | Yes | No
Valentina | Yes | Yes | Yes | No | No | No
Note (1): Open source databases listed as UNIX-compatible will likely compile and run under z/OS's built-in UNIX System Services (USS) subsystem. Most databases listed as Linux-compatible can run alongside z/OS on the same server using Linux on zSeries.
Note (2): The database's availability depends on the Java Virtual Machine, not on the operating system.
Note (3): Oracle Rdb was originally developed by DEC, and runs on OpenVMS.
Note (4): Oracle Database 11g also runs on OpenVMS, HP-UX and AIX. 10g also supported BS2000/OSD and z/OS (31-bit), but that support has been discontinued in 11g. Versions earlier than 10g were available on a wide variety of platforms.
Note (5): DB2 is also available for i5/OS, z/VM and z/VSE. Previous versions were also available for OS/2.
3.4.3 Fundamental Features
Information about which fundamental RDBMS features are implemented natively.
Name | ACID | Referential integrity | Transactions | Unicode | Interface
4th Dimension | Yes | Yes | Yes | Yes | GUI & SQL
ADABAS | ? | ? | ? | ? | ?
Adaptive Server Enterprise | Yes | Yes | Yes | Yes | ?
Advantage Database Server | Yes | Yes | Yes | No | API & SQL
Apache Derby | Yes | Yes | Yes | Yes | SQL
Datawasp | No | Yes | Yes | Yes | GUI
DB2 | Yes | Yes | Yes | Yes | GUI & SQL
Firebird | Yes | Yes | Yes | Yes | SQL
HSQLDB | Yes | Yes | Yes | Yes | SQL
H2 | Yes | Yes | Yes | Yes | SQL
Informix | Yes | Yes | Yes | Yes | ?
Ingres | Yes | Yes | Yes | Yes | SQL
InterBase | Yes | Yes | Yes | Yes | SQL
MaxDB | Yes | Yes | Yes | Yes | SQL
Microsoft Access | No | Yes | Yes | Yes | GUI & SQL
Microsoft Visual FoxPro | No | Yes | Yes | No | GUI & SQL
Microsoft SQL Server | Yes | Yes | Yes | Yes | SQL
MonetDB | Yes | Yes | Yes | Yes | ?
MySQL | Yes (6) | Yes (6) | Yes (6) | Partial | SQL
Oracle | Yes | Yes | Yes | Yes | SQL
Oracle Rdb | Yes | Yes | Yes | Yes | ?
OpenEdge | Yes | No (7) | Yes | Yes | Progress 4GL & SQL
OpenLink Virtuoso | Yes | Yes | Yes | Yes | ?
Polyhedra DBMS | Yes | Yes | Yes | Yes | SQL
PostgreSQL | Yes | Yes | Yes | Yes | SQL
Pyrrho DBMS | Yes | Yes | Yes | Yes | ?
RDM Embedded | Yes | Yes | Yes | Yes | SQL & API
RDM Server | Yes | Yes | Yes | Yes | SQL & API
ScimoreDB | Yes | Yes | Yes | Partial | SQL
SQL Anywhere | Yes | Yes | Yes | Yes | ?
SQLite | Yes | No (8) | Basic (8) | Yes | SQL
Teradata | Yes | Yes | Yes | Yes | SQL
Valentina | No | Yes | No | Yes | ?
Note (6): For transactions and referential integrity, the InnoDB table type must be used. The Windows installer sets this as the default if support for transactions is selected; on other operating systems the default table type is MyISAM. However, even the InnoDB table type permits storage of values that exceed the data range, and some view this as violating the consistency (integrity) requirement of ACID.
Note (7): FOREIGN KEY constraints are parsed but are not enforced. Triggers can be used instead. Nested transactions are not supported.
Note (8): Available via triggers.
4.0 CONCLUSION
The dominant model in use today is the relational database management system, usually used with the Structured Query Language (SQL). Many DBMSs also support Open Database Connectivity (ODBC), which provides a standard way for programmers to access database management systems.
5.0 SUMMARY
A relational database management system (RDBMS) is a database management system (DBMS) based on the relational model introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model.
E. F. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules.
The most popular definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory.
An RDBMS is software used for creating and maintaining a database. Maintaining a database involves several tasks that an RDBMS takes care of.
Comparisons of RDBMS products are based on the stable versions without any add-ons, extensions or external programs.
6.0 TUTOR-MARKED ASSIGNMENT
1. List 5 features of Relational Database Management Systems.
2. Mention 5 criteria you can use to differentiate types of RDBMSs.
7.0 REFERENCES/FURTHER READINGS
Comparison of different SQL implementations against SQL standards. Includes Oracle, DB2, Microsoft SQL Server, MySQL and PostgreSQL (08/Jun/2007).
Comparison of Oracle 8/9i, MySQL 4.x and PostgreSQL 7.x DBMS against SQL standards (14/Mar/2005).
Comparison of Oracle and SQL Server (2004).
Comparison of geometrical data handling in PostgreSQL, MySQL and DB2 (29/Sep/2003).
Open Source Database Software Comparison (Mar/2005).
PostgreSQL vs. MySQL vs. Commercial Databases: It's All About What You Need (12/Apr/2004).
UNIT 2 DATA WAREHOUSE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Benefits of Data Warehousing
3.3 Data Warehouse Architecture
3.4 Normalized versus Dimensional Approach to Storage of Data
3.5 Conforming Information
3.6 Top-Down versus Bottom-Up Design Methodologies
3.7 Data Warehouses versus Operational Systems
3.8 Evolution in Organization Use of Data Warehouses
3.9 Disadvantages of Data Warehouses
3.10 Data Warehouse Appliance
3.11 The Future of Data Warehousing
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the dictionary data are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
Data warehouses contrast with operational systems, which perform day-to-day transaction processing.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
define data warehouse
trace the history and development process of the data warehouse
list various benefits of the data warehouse
define the architecture of a data warehouse
compare and contrast data warehouses and operational systems
explain what a data warehouse appliance is, and the disadvantages of data warehouses
have an idea of what the future holds for the data warehouse concept.
3.0 MAIN CONTENT
3.1 History
The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly its high costs. In the absence of a data warehousing architecture, an enormous amount of redundant information was required to support the multiple decision support environments that usually existed. In larger corporations it was typical for multiple decision support environments to operate independently. Each environment served different users but often required much of the same data. The process of gathering, cleaning and integrating data from various sources was typically in part replicated for each environment. These sources were usually long-existing operational systems, often referred to as legacy systems. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. New requirements often necessitated gathering, cleaning and integrating new data from the operational systems that was logically related to previously gathered data.
Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection/storage/staging areas for corporate data. Data could be retrieved from one central point, or it could be distributed to "retail stores" or "data marts" which were tailored for ready access by users.
3.2 Benefits of Data Warehousing
Some of the benefits that a data warehouse provides are as follows:
A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
Data warehouses can work in conjunction with, and hence enhance the value of, operational business applications, notably customer relationship management (CRM) systems.
3.3 Data Warehouse Architecture
Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture. The worth of an architecture can be judged by how well the conceptualization aids in the building, maintenance and usage of the data warehouse.
One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers:
Operational Database Layer
The source data for the data warehouse. An organization's ERP systems fall into this layer.
Informational Access Layer
The data accessed for reporting and analysis, and the tools for reporting on and analyzing data. Business intelligence tools fall into this layer. The Inmon-Kimball differences about design methodology, discussed later in this unit, also concern this layer.
Data Access Layer
The interface between the operational and informational access layers. Tools to extract, transform and load data into the warehouse fall into this layer.
Metadata Layer
The data directory. This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.
3.4 Normalized versus Dimensional Approach to Storage of Data
There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach.
In the dimensional approach, transaction data are partitioned into either "facts" or "dimensions". Facts are generally numeric transaction data. Dimensions are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for them. It can also be broken into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and the salesperson responsible for receiving the order. A key advantage of the dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, retrieval of data from the data warehouse tends to be very fast. The main disadvantages of the dimensional approach are: 1) in order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2) it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.
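The following is a minimal star-schema sketch of the dimensional approach. It assumes SQLite via Python's sqlite3 module, and all table names (fact_sales, dim_product, dim_date, dim_region) and data are hypothetical. The fact table holds the numeric measures, the dimension tables hold the descriptive context, and a report joins the two and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: reference data that gives context to the facts.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, region_name TEXT);

    -- Fact table: numeric measures plus a key pointing to each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        region_key  INTEGER REFERENCES dim_region(region_key),
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_product VALUES (1, 'Rice'), (2, 'Beans');
    INSERT INTO dim_date    VALUES (20080101, 2008, 1), (20080201, 2008, 2);
    INSERT INTO dim_region  VALUES (1, 'Lagos'), (2, 'Abuja');
    INSERT INTO fact_sales VALUES
        (1, 20080101, 1, 10, 500.0),
        (2, 20080101, 1,  4, 120.0),
        (1, 20080201, 2,  6, 300.0),
        (1, 20080201, 1,  3, 150.0);
""")

# A typical dimensional query: join facts to dimensions, filter, and aggregate.
report = conn.execute("""
    SELECT r.region_name, p.product_name, SUM(f.quantity) AS units
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_region  AS r ON f.region_key  = r.region_key
    JOIN dim_date    AS d ON f.date_key    = d.date_key
    WHERE d.year = 2008
    GROUP BY r.region_name, p.product_name
    ORDER BY r.region_name, units DESC
""").fetchall()
for row in report:
    print(row)
```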
In the normalized approach, the data in the data warehouse are stored following, to a degree, the rules of database normalization introduced by Codd. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage is that, because of the number of tables involved, users can find it difficult to 1) join data from different sources into meaningful information and then 2) access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.
These approaches are not exact opposites of each other. Dimensional approaches can involve normalizing data to a degree.
3.5 Conforming Information
Another important decision in designing a data warehouse is which data to conform and how to conform it. For example, one operational system feeding data into the data warehouse may use "M" and "F" to denote the sex of an employee, while another operational system may use "Male" and "Female". Though this is a simple example, much of the work in implementing a data warehouse is devoted to making data with similar meanings consistent when they are stored in the data warehouse. Typically, extract, transform, load (ETL) tools are used in this work.
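The transform step for the "M"/"F" versus "Male"/"Female" example can be sketched as a simple lookup in Python. The mapping table, the function names and the "Unknown" fallback for unrecognized codes are hypothetical choices made for illustration.

```python
# Hypothetical mapping from each source system's codes to one conformed value.
SEX_CONFORMING_MAP = {
    "m": "Male", "male": "Male",
    "f": "Female", "female": "Female",
}


def conform_sex(raw_value):
    """Transform step: map a source code to the warehouse's standard value."""
    key = raw_value.strip().lower()
    return SEX_CONFORMING_MAP.get(key, "Unknown")


def conform_rows(rows):
    # Each row is (employee_id, sex_code) as extracted from a source system.
    return [(emp_id, conform_sex(code)) for emp_id, code in rows]


payroll_rows = [(1, "M"), (2, "F")]         # source system A uses M/F
hr_rows = [(3, "Male"), (4, "female ")]     # source system B uses Male/Female
warehouse_rows = conform_rows(payroll_rows) + conform_rows(hr_rows)
print(warehouse_rows)  # [(1, 'Male'), (2, 'Female'), (3, 'Male'), (4, 'Female')]
```

Keeping the mapping in one table, rather than scattering it through the load code, makes it easier to add a third source system later.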
3.6 Top-Down versus Bottom-Up Design Methodologies
Bottom-Up Design
Ralph Kimball, a well-known author on data warehousing, is a proponent of the bottom-up approach to data warehouse design. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse. The combination of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".
Business value can be returned as quickly as the first data marts can be created. Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure that dimensions are consistent across data marts. In Kimball's words, this means that the dimensions "conform".
Top-Down Design
Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In Inmon's vision, the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. The CIF is driven by data provided from business operations.
Inmon states that the data warehouse is:
Subject-Oriented
The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
Time-Variant
The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.
Non-Volatile
Data in the data warehouse is never overwritten or deleted; once committed, the data is static, read-only, and retained for future reporting.
Integrated
The data warehouse contains data from most or all of an organization's operational systems, and this data is made consistent.
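The time-variant and non-volatile properties can be illustrated with an append-only history table. The sketch assumes SQLite via Python's sqlite3 module. Each change is recorded as a new, date-stamped row, and triggers reject updates and deletes. The table, the trigger names and the data are hypothetical. Real warehouses typically enforce read-only behaviour through their load processes and access rights, not necessarily through triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Time-variant: each change is a new row stamped with the date it took effect.
    CREATE TABLE customer_history (
        customer_id INTEGER,
        city        TEXT,
        valid_from  TEXT
    );
    -- Non-volatile: committed rows may not be updated or deleted.
    CREATE TRIGGER no_update BEFORE UPDATE ON customer_history
    BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
    CREATE TRIGGER no_delete BEFORE DELETE ON customer_history
    BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
""")

# A customer moves: the warehouse appends a new row instead of overwriting the old one.
conn.execute("INSERT INTO customer_history VALUES (7, 'Ibadan', '2006-01-01')")
conn.execute("INSERT INTO customer_history VALUES (7, 'Lagos', '2008-03-15')")

try:
    conn.execute("UPDATE customer_history SET city = 'Kano' WHERE customer_id = 7")
    overwrite_blocked = False
except sqlite3.DatabaseError:
    overwrite_blocked = True

history = conn.execute(
    "SELECT city, valid_from FROM customer_history "
    "WHERE customer_id = 7 ORDER BY valid_from"
).fetchall()
print(overwrite_blocked, history)
```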
The top-down design methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost of implementing a data warehouse using the top-down methodology is significant. The time from the start of the project to the point at which end users experience initial benefits can also be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.
Hybrid Design
Over time it has become apparent to proponents of bottom-up and top-down data warehouse design that both methodologies have benefits and risks. Hybrid methodologies have evolved to take advantage of the fast turnaround time of bottom-up design and the enterprise-wide data consistency of top-down design.
3.7 Data Warehouses versus Operational Systems
Operational systems are optimized for preserving data integrity and for recording business transactions quickly. They achieve this through database normalization and an entity-relationship model. Operational system designers generally follow the rules of data normalization, introduced by Codd, in order to ensure data integrity. Normalization theory defines a series of increasingly stringent normal forms, of which five (first through fifth normal form) are commonly cited. Fully normalized database designs, that is, those satisfying all five normal forms, often store the information from a single business transaction in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected each time a transaction is processed. Finally, to improve performance, older data are usually purged from operational systems periodically.
Data warehouses are optimized for speed of data retrieval. Data in data warehouses are frequently denormalized via a dimension-based model. Also, to speed data retrieval, data warehouse data are often stored multiple times: in their most granular form and in summarized forms called aggregates. Data warehouse data are gathered from the operational systems and held in the data warehouse even after the data has been purged from the operational systems.
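Storing data "multiple times", at the most granular level and as aggregates, can be sketched as follows. The sketch assumes SQLite via Python's sqlite3 module, and the table names and figures are hypothetical. The monthly aggregate table is derived once from the detail table. Later queries can then read a few pre-summarized rows instead of scanning every transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Most granular form: one row per sale.
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [
        ("2008-01-05", "Rice", 100.0),
        ("2008-01-20", "Rice", 50.0),
        ("2008-01-11", "Beans", 30.0),
        ("2008-02-02", "Rice", 70.0),
    ],
)

# Summarized form (aggregate): one row per month and product.
conn.execute("""
    CREATE TABLE sales_monthly_agg AS
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS total_amount
    FROM sales_detail
    GROUP BY month, product
""")

summary = conn.execute(
    "SELECT month, product, total_amount FROM sales_monthly_agg ORDER BY month, product"
).fetchall()
print(summary)
```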
3.8 Evolution in Organization Use of Data Warehouses
Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished:
Offline Operational Databases
Data warehouses in this initial stage are developed by simply copying the data of an operational system to another server. There, the processing load of reporting against the copied data does not affect the operational system's performance.
Offline Data Warehouse
Data warehouses at this stage are updated from data in the operational systems on a regular basis, and the data warehouse data is stored in a data structure designed to facilitate reporting.
Real-Time Data Warehouse
Data warehouses at this stage are updated every time an operational system performs a transaction (e.g., an order, a delivery or a booking).
Integrated Data Warehouse
Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems.
3.9 Disadvantages of Data Warehouses
There are also disadvantages to using a data warehouse. Some of them are:
Over their life, data warehouses can have high costs. The data warehouse is usually not static, and maintenance costs are high.
Data warehouses can get outdated relatively quickly, and delivering suboptimal information to the organization has a cost.
There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Alternatively, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.
3.10 Data Warehouse Appliance
A data warehouse appliance is an integrated set of servers, storage, operating system (OS), DBMS and software, specifically pre-installed and pre-optimized for data warehousing. The term is also used for similar software-only systems that are claimed to be very easy to install on specific recommended hardware configurations. DW appliances serve the mid-to-large volume data warehouse market. They offer low-cost performance, most commonly on data volumes in the terabyte to petabyte range.
Technology Primer
Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "shared nothing architecture", in which each server is self-sufficient and controls its own memory and disk. Shared nothing architectures have a proven record of high scalability and little contention. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows DW appliances to resolve a relational query by scanning data on each server in parallel. This divide-and-conquer approach delivers high performance and can scale nearly linearly as new servers are added to the architecture.
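The divide-and-conquer idea behind a shared-nothing MPP query can be sketched in Python. Rows are distributed across hypothetical "nodes". Each node scans only its own slice and returns a partial result, and a coordinator combines the partial results. This is an illustration only: it uses a thread pool (concurrent.futures.ThreadPoolExecutor) as a stand-in for independent servers. It therefore shows the data flow rather than real parallel speed-up, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Distribute rows so that each hypothetical node owns its own slice of the data."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row["id"] % n_nodes].append(row)
    return nodes


def local_scan(node_rows, region):
    # Each node scans only its own data and returns a partial aggregate (count, sum).
    matches = [r["amount"] for r in node_rows if r["region"] == region]
    return len(matches), sum(matches)


def parallel_query(rows, region, n_nodes=4):
    nodes = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_scan, nodes, [region] * n_nodes))
    # The coordinator combines the partial results into the final answer.
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total


rows = [
    {"id": i, "region": "Abuja" if i % 3 == 0 else "Lagos", "amount": float(i)}
    for i in range(1, 101)
]
print(parallel_query(rows, "Abuja"))  # (33, 1683.0)
```

Adding a node in this model only means giving the partitioning step one more slot. This is the property that lets real MPP systems grow by adding servers.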
MPP database architectures are not new. Teradata, Tandem, Britton Lee and Sequent offered MPP SQL-based architectures in the 1980s. The re-emergence of MPP data warehouses has been aided by open source and commodity components. Advances in technology have reduced costs and improved performance in storage devices, multi-core CPUs and networking components. Open source RDBMS products, such as Ingres and PostgreSQL, reduce software license costs. They also allow DW appliance vendors to focus on optimization rather than on providing basic database functionality. Open source Linux provides a stable, well-implemented OS for DW appliances.
History
Many consider Teradata's initial product to be the first DW appliance. Others give that place to Britton Lee's product, but Britton Lee, renamed ShareBase, was acquired by Teradata in June 1990. Some regard Teradata's current offerings as still being appliances, while others argue that they fall short in ease of installation or administration. Interest in the data warehouse appliance category is generally dated to the emergence of Netezza in the early 2000s.
More recently, a second generation of modern DW appliances has emerged, marking the move to mainstream vendor integration. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Other DW appliance vendors have partnered with major hardware vendors to help bring their appliances to market:
DATAllegro partners with EMC and Dell and implements open source Ingres on Linux.
Greenplum has a partnership with Sun Microsystems and implements Bizgres (a form of PostgreSQL) on Solaris using the ZFS file system.
HP Neoview has a wholly-owned solution and uses HP NonStop SQL.
Kognitio offers a row-based “virtual” data warehouse appliance, while Vertica and ParAccel offer column-based “virtual” data warehouse appliances. Like Greenplum, ParAccel partners with Sun Microsystems. These are software-only solutions deployed on clusters of commodity hardware. Kognitio's homegrown WX2 database runs on several blade configurations. Other players in the DW appliance space include Calpont and Dataupia.
Recently, the market has seen the emergence of data warehouse bundles, where vendors combine their hardware and database software together as a data warehouse platform. The Oracle Optimized Warehouse Initiative combines the Oracle Database with the industry's leading computer manufacturers Dell, EMC, HP, IBM, SGI and Sun Microsystems. Oracle's Optimized Warehouses are pre-validated configurations and the database software comes pre-installed, though some analysts differ as to whether these should be regarded as appliances.
Benefits
Reduction in Costs
The total cost of ownership (TCO) of a data warehouse consists of initial entry costs, on-going maintenance costs and the cost of increasing capacity as the data warehouse grows. DW appliances offer low entry and maintenance costs. Initial costs range from $10,000 to $150,000 per terabyte, depending on the size of the DW appliance installed.
The resource cost for monitoring and tuning the data warehouse makes up a large part of the TCO, often as much as 80%. DW appliances reduce administration for day-to-day operations, setup and integration. Many also offer low costs for expanding processing power and capacity. With the increased focus on controlling costs combined with tight IT budgets, data warehouse managers need to reduce and manage expenses while leveraging their technology as much as possible, which makes DW appliances a natural solution.
Parallel Performance
DW appliances provide a compelling price/performance ratio. Many support mixed workloads, where a broad range of ad-hoc queries and reports run simultaneously with loading. DW appliance vendors use several distribution and partitioning methods to provide parallel performance. Some DW appliances scan data using partitioning and sequential I/O instead of index usage. Other DW appliances use standard database indexing.
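As a rough illustration of scanning by partition instead of by index, the hypothetical Python sketch below stores a fact table as one partition per month. A date-range query first skips ("prunes") partitions that cannot contain matching rows. It then reads the surviving partitions sequentially, row by row, without consulting any index. The table layout, the sample figures and the `query_sum` function are assumptions made for the example.

```python
from datetime import date

# Hypothetical range-partitioned fact table: one partition per (year, month).
sales_partitions = {
    (2008, 1): [(date(2008, 1, 5), 120), (date(2008, 1, 20), 80)],
    (2008, 2): [(date(2008, 2, 3), 200)],
    (2008, 3): [(date(2008, 3, 15), 50), (date(2008, 3, 30), 70)],
}


def query_sum(partitions, start, end):
    """Sum amounts dated between start and end (inclusive), pruning partitions."""
    first, last = (start.year, start.month), (end.year, end.month)
    scanned, total = [], 0
    for month_key, rows in partitions.items():
        if month_key < first or month_key > last:
            continue  # pruned: no row in this partition can qualify
        scanned.append(month_key)
        for day, amount in rows:  # sequential scan, no index lookup
            if start <= day <= end:
                total += amount
    return total, scanned


print(query_sum(sales_partitions, date(2008, 2, 1), date(2008, 3, 20)))
# (250, [(2008, 2), (2008, 3)])
```

The January partition is never read. Appliances that rely on indexing instead would answer the same query by looking up matching rows through an index.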
With high performance on highly granular data, DW appliances are able to address analytics that previously could not meet performance requirements.
Reduced Administration
DW appliances provide a single-vendor solution and take ownership for optimizing the parts and software within the appliance. This eliminates the customer's costs for integration and regression testing of the DBMS, storage and OS on a terabyte scale. It also avoids some of the compatibility issues that arise from multi-vendor solutions. A single support point also provides a single source for problem resolution and a simplified upgrade path for software and hardware.
DW appliances require less care and feeding than many alternative data warehouse solutions. They reduce administration through automated space allocation, reduced index maintenance and, in most cases, reduced tuning and performance analysis.
Built-in High Availability
DW appliance vendors provide built-in high availability through redundancy on components within the appliance. Many offer warm-standby servers, dual networks, dual power supplies, disk mirroring with robust failover, and solutions for server failure.
Scalability
DW appliances scale for both capacity and performance. Many DW appliances implement a modular design that database administrators can add to incrementally, eliminating up-front costs for over-provisioning. In contrast, architectures that do not support incremental expansion result in hours of production downtime, during which database administrators export and re-load terabytes of data. In MPP architectures, adding servers increases performance as well as capacity. This is not always the case with alternative solutions.
Rapid Time-to-Value
Companies increasingly expect to use business analytics to improve the current cycle. DW appliances provide fast implementations without the need for regression and integration testing. Rapid prototyping is possible because of reduced tuning and index creation, fast loading and, in some cases, reduced needs for aggregation.
Application Uses
DW appliances provide solutions for many analytic application uses, including:
Enterprise data warehousing
Super-sized sandboxes that isolate power users with resource-intensive queries
Pilot projects, or projects requiring rapid prototyping and rapid time-to-value
Off-loading projects from the enterprise data warehouse, i.e. large analytical query projects that affect the overall workload of the enterprise data warehouse
Applications with specific performance or loading requirements
Data marts that have outgrown their present environment
Turnkey data warehouses or data marts
Solutions for applications with high data growth and high performance requirements
Applications requiring data warehouse encryption
Trends
The DW appliance market is shifting trends in many areas as it evolves:
Vendors are moving toward using commodity technologies rather than proprietary assembly of commodity components.
Implemented applications show usage expanding from tactical and data mart solutions to strategic and enterprise data warehouse use.
Mainstream vendor participation is now apparent.
With a lower total cost of ownership, reduced maintenance and high performance to address business analytics on growing data volumes, most analysts believe that DW appliances will gain market share.
3.11 The Future of Data Warehousing
Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.
A 2007 Gartner Group paper predicted that the following technologies could be disruptive to the business intelligence market:
Service Oriented Architecture
Search capabilities integrated into reporting and analysis technology
Software as a Service
Analytic tools that work in memory
Visualization
Another prediction is that data warehouse performance will continue to be improved by the use of data warehouse appliances, many of which incorporate the developments in the aforementioned Gartner Group report.
Finally, management consultant Thomas Davenport, among others, predicts that more organizations will seek to differentiate themselves by using analytics enabled by data warehouses.
4.0 CONCLUSION
Data warehousing is now emerging as a very important area of database management systems. This is a result of the growth of the databases of large corporations. A data warehouse makes it easier to hold data while it is in use. However, there are challenges and constraints in the acceptance and implementation of data warehouses, which is normal in the development of any concept. The future of data warehousing looks good, as more organizations are likely to opt for it.
5.0 SUMMARY
A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".
Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built.
There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach.
Another important decision in designing a data warehouse is which data to conform and how to conform the data.
Ralph Kimball, a well-known author on data warehousing, is a proponent of the bottom-up approach to data warehouse design.
Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model.
Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves.
A data warehouse appliance is an integrated set of servers, storage, OS, DBMS and software specifically pre-installed and pre-optimized for data warehousing.
Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.
6.0 Tutor-Marked Assignment
1. Discuss the benefits associated with the use of data warehouses.
2. Mention 5 applications of data warehouse appliances.
7.0 REFERENCES/FURTHER READINGS
Inmon, W.H. "Tech Topic: What is a Data Warehouse?" Prism Solutions. Volume 1. 1995.
Yang, Jun. "WareHouse Information Prototype at Stanford (WHIPS)". Stanford University. July 7, 1998.
Caldeira, C. "Data Warehousing - Conceitos e Modelos". Edições Sílabo. 2008. ISBN 978-972-618-479-9.
Kimball, R. and Ross, M. "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling". pp. 310. Wiley. 2nd Ed. 2002. ISBN 0-471-20024-7.
Ericsson, R. "Building Business Intelligence Applications with .NET". 1st Ed. Charles River Media. February 2004. pp. 28-29.
Pendse, Nigel and Bange, Carsten. "The Missing Next Big Things".
Schlegel, Kurt. "Emerging Technologies Could Prove Disruptive to the Business Intelligence Market". Gartner Group. July 6, 2007.
Davenport, Thomas and Harris, Jeanne. "Competing on Analytics: The New Science of Winning". Harvard Business School Press. 2007. ISBN 1-4221-0332-3.
Queries from Hell blog. "When is an appliance not an appliance?"
DBMS2 (DataBase Management System Services) blog archive. "Data warehouse appliances: fact and fiction".
White, Todd. "Teradata Corp. suffers first quarterly loss in four years". Los Angeles Business Journal. November 5, 1990.
UNIT 3 DOCUMENT MANAGEMENT SYSTEM
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Document Management and Content Management
3.3 Components
3.4 Issues Addressed in Document Management
3.5 Using XML in Document and Information Management
3.6 Types of Document Management Systems
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings
1.0 INTRODUCTION
A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents. The term has some overlap with the concept of Content Management Systems. It is often viewed as a component of Enterprise Content Management (ECM) Systems and as related to Digital Asset Management, Document Imaging, Workflow systems and Records Management systems. Contract Management and Contract Lifecycle Management (CLM) can be viewed as either components or implementations of ECM.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
define a document management system
trace the history and development of document management systems
compare and contrast document management systems and content management systems
identify the basic components of document management systems
explain the issues addressed by document management systems
identify the types of document management systems available off the shelf.
3.0 MAIN CONTENT
3.1 History
Beginning in the 1980s, a number of vendors began developing systems to manage paper-based documents. These systems managed paper documents, which included not only printed and published documents, but also photos, prints, etc.
Later, a second type of system was developed to manage electronic documents, i.e., all those documents, or files, created on computers and often stored on local user file systems. The earliest electronic document management (EDM) systems were developed to manage either proprietary file types or a limited number of file formats. Many of these systems were later referred to as document imaging systems, because their main capabilities were capture, storage, indexing and retrieval of image file formats. These systems enabled an organization to capture faxes and forms, save copies of the documents as images, and store the image files in the repository for security and quick retrieval. Retrieval was possible because the system extracted the text from each document as it was captured, and the text indexer provided text retrieval capabilities.
EDM systems evolved to the point where the system was able to manage any type of file format that could be stored on the network. The applications grew to encompass electronic documents, collaboration tools, security, and auditing capabilities.
3.2 Document Management and Content Management
There is considerable confusion in the market between document management systems (DMS) and content management systems (CMS). This has not been helped by the vendors, who are keen to market their products as widely as possible.
These two types of systems are very different, and serve complementary needs. While there is an ongoing move to merge the two together (a positive step), it is important to understand when each system is appropriate.
Document Management Systems (DMS)
Document management is certainly the older discipline, born out of the need to manage huge numbers of documents in organisations.
Mature and well-tested, document management systems can be characterised as follows:
focused on managing documents, in the traditional sense (like Word files)
each unit of information (document) is fairly large, and self-contained
there are few (if any) links between documents
provides limited integration with repository (check-in, check-out, etc.)
focused primarily on storage and archiving
includes powerful workflow
targeted at storing and presenting documents in their native format
limited web publishing engine typically produces one page for each document
Note that this is just a generalised description of a DMS, with most systems offering a range of unique features and capabilities. Nonetheless, this does provide a representative outline of common DMS functionality.
A typical document management scenario:
A large legal firm purchases a DMS to track the huge number of advice documents, contracts and briefs. It allows lawyers to easily retrieve earlier advice, and to use "precedent" templates to quickly create new documents.
You can't build a website with just a DM system.
Content Management Systems (CMS)
Content management is more recent, and is primarily designed to meet the growing needs of the website and intranet markets.
A content management system can be summarised as follows:
manages small, interconnected units of information (e.g. web pages)
each unit (page) is defined by its location on the site
extensive cross-linking between pages
focused primarily on page creation and editing
provides tight integration between authoring and the repository (metadata, etc.)
provides a very powerful publishing engine (templates, scripting, etc.)
A typical content management scenario:
A CMS is purchased to manage the 5000-page corporate website. Template-based authoring allows business groups to easily create content, while the publishing system dynamically generates richly-formatted pages.
Content management and document management are complementary, not competing, technologies. You must choose an appropriate system if business needs are to be met.
3.3 Components
Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities. Here is a description of these components:
Metadata
Metadata is typically stored for each document. Metadata may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing for full-text search capability, or it can be used on its own. Extracted text can also be stored as a component of metadata, stored with the image, or stored separately as a source for searching document collections.
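The automatic metadata capture described above can be sketched in Python as a small function. It records who stored a document and when, and keeps the extracted text for full-text search. It guesses "probable keywords" by counting the most frequent non-trivial words. The stop-word list, the field names and the frequency heuristic are all assumptions for illustration; production systems use far richer extraction.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Hypothetical stop-word list; real systems use larger, language-specific lists.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with", "this", "that"}


def build_metadata(text, user, top_n=3):
    """Build a metadata record for a newly stored document."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {
        "stored_by": user,                                      # identity of the user storing it
        "stored_at": datetime.now(timezone.utc).isoformat(),    # date and time it was stored
        "keywords": [w for w, _ in counts.most_common(top_n)],  # probable keywords
        "extracted_text": text,                                 # kept for full-text search
    }


record = build_metadata("Invoice for office chairs. Invoice total due. Chairs delivered to office.", "jdoe")
print(record["stored_by"], record["keywords"])
```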

Integration
Many document management systems attempt to integrate document management directly into other applications. Users can then retrieve existing documents directly from the document management system repository, make changes, and save the changed document back to the repository as a new version, all without leaving the application. Such integration is commonly available for office suites and e-mail or collaboration/groupware software.
Capture
Capture involves acquiring images of paper documents using scanners or multifunction printers. Optical Character Recognition (OCR) software is often used, whether integrated into the hardware or as stand-alone software, in order to convert digital images into machine-readable text.
Indexing
Indexing tracks electronic documents. Indexing may be as simple as keeping track of unique document identifiers, but often it takes a more complex form. It may provide classification through the documents' metadata, or even through word indexes extracted from the documents' contents. Indexing exists mainly to support retrieval. One area of critical importance for rapid retrieval is the creation of an index topology.
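One common way to build the word index mentioned above is an inverted index, which maps each word to the identifiers of the documents that contain it. The Python sketch below is a simplified assumption of how such an index could work. The document identifiers, their contents and the helper names are hypothetical. Real systems add stemming, stop words and on-disk storage.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map every word to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def lookup(index, word):
    """Return the matching document IDs in a stable, sorted order."""
    return sorted(index.get(word.lower(), set()))


docs = {
    "DOC-001": "Lease agreement for the Lagos office",
    "DOC-002": "Office supply invoice",
    "DOC-003": "Employment contract",
}
index = build_inverted_index(docs)
print(lookup(index, "office"))  # ['DOC-001', 'DOC-002']
```

Building the index costs one pass over every document. After that, a word lookup is a single dictionary access rather than a scan of all contents.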
Storage
The storage component stores electronic documents. Storage often includes management of those same documents: where they are stored, for how long, migration of the documents from one storage medium to another (hierarchical storage management) and eventual document destruction.
Retrieval
The retrieval component retrieves electronic documents from storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful.
Simple retrieval of individual documents can be supported by letting the user specify the unique document identifier. The system then uses the basic index (or a non-indexed query on its data store) to retrieve the document. More flexible retrieval allows the user to specify partial search terms involving the document identifier and/or parts of the expected metadata. This would typically return a list of documents that match the user's search terms.
Some systems provide the capability to specify a Boolean expression containing multiple keywords or example phrases expected to exist within the documents' contents. Retrieval for this kind of query may be supported by previously built indexes. Alternatively, the system may perform more time-consuming searches through the documents' contents to return a list of potentially relevant documents.
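The three levels of retrieval described above can be sketched in Python:

- lookup by unique identifier;
- partial matching on a metadata field;
- a Boolean AND query that scans document contents (the slower, non-indexed path).

The sample documents, author names and function names are hypothetical.

```python
import re

# Hypothetical document store: ID -> metadata plus extracted text.
DOCUMENTS = {
    "DOC-001": {"title": "Lease agreement", "author": "Okafor", "text": "Lease agreement for the Lagos office"},
    "DOC-002": {"title": "Supply invoice", "author": "Bello", "text": "Invoice for office chairs and desks"},
    "DOC-003": {"title": "Employment contract", "author": "Okafor", "text": "Contract of employment for a new analyst"},
}


def get_by_id(doc_id):
    """Simple retrieval by unique identifier; returns None if absent."""
    return DOCUMENTS.get(doc_id)


def search_metadata(field, partial):
    """Flexible retrieval: IDs whose metadata field contains a partial term."""
    term = partial.lower()
    return sorted(doc_id for doc_id, meta in DOCUMENTS.items() if term in meta[field].lower())


def search_all_keywords(*keywords):
    """Boolean AND query answered by scanning every document's contents."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for doc_id, meta in DOCUMENTS.items():
        words = set(re.findall(r"[a-z]+", meta["text"].lower()))
        if wanted <= words:  # every keyword must appear in the document
            hits.append(doc_id)
    return hits


print(get_by_id("DOC-002")["title"])             # Supply invoice
print(search_metadata("author", "oka"))          # ['DOC-001', 'DOC-003']
print(search_all_keywords("office", "invoice"))  # ['DOC-002']
```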
Distribution and Security
Document security is vital in many document management applications. Compliance requirements for certain documents can be quite complex, depending on the type of documents. For instance, the Health Insurance Portability and Accountability Act (HIPAA) requires that medical documents meet certain security requirements. Some document management systems have a rights management module. This module allows an administrator to give access to documents, based on their type, only to certain people or groups of people.
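For illustration, a rights management module of the kind described above can be reduced to a table that maps each document type to the groups allowed to open it. This Python sketch assumes hypothetical group and type names. It also assumes a deny-by-default policy for document types that have no rule. It is not a model of HIPAA compliance, which involves much more than access checks.

```python
# Hypothetical rules: document type -> groups allowed to access it.
ACCESS_RULES = {
    "medical_record": {"physicians", "medical_records_staff"},
    "invoice": {"accounts_payable", "finance_managers"},
}


def can_access(user_groups, doc_type, rules=ACCESS_RULES):
    """Allow access only if the user belongs to a group permitted for this type."""
    allowed = rules.get(doc_type, set())  # unknown type: nobody is allowed
    return not allowed.isdisjoint(user_groups)


print(can_access({"physicians"}, "medical_record"))        # True
print(can_access({"accounts_payable"}, "medical_record"))  # False
```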
Workflow
Workflow is a complex problem, and some document management systems have a built-in workflow module. There are different types of workflow; which type is used depends on the environment the electronic document management system (EDMS) is applied to.
Manual workflow requires a user to view the document and decide whom to send it to.
Rules-based workflow allows an administrator to create a rule that dictates the flow of the document through an organization. For instance, an invoice passes through an approval process and is then routed to the accounts payable department.
Dynamic rules allow branches to be created in a workflow process. A simple example would be to enter an invoice amount: if the amount is lower than a certain set amount, the invoice follows a different route through the organization.
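The invoice example above, a rules-based workflow with a dynamic branch on the invoice amount, might look like this in Python. The threshold value, step names and approver roles are hypothetical. The point is only that a rule, not a person, decides the route.

```python
APPROVAL_LIMIT = 100_000  # hypothetical threshold, in the organization's currency


def route_invoice(amount, approval_limit=APPROVAL_LIMIT):
    """Return the ordered steps an invoice passes through before payment."""
    route = ["capture", "line_manager_approval"]
    if amount >= approval_limit:
        # Dynamic branch: invoices at or above the limit need an extra sign-off.
        route.append("finance_director_approval")
    route.append("accounts_payable")
    return route


print(route_invoice(25_000))   # ['capture', 'line_manager_approval', 'accounts_payable']
print(route_invoice(250_000))  # ['capture', 'line_manager_approval', 'finance_director_approval', 'accounts_payable']
```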
Collaboration
Collaboration should be inherent in an EDMS. Documents should be capable of being retrieved by an authorized user and worked on. Access should be blocked to other users while work is being performed on the document.
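Blocking other users while a document is being edited is usually done with an exclusive check-out lock. The Python class below is a minimal in-memory sketch under that assumption. A real EDMS would persist locks, attach time-outs and let administrators break stale locks. The class, method and user names are illustrative.

```python
class DocumentLockError(Exception):
    """Raised when a user tries to use a document another user has checked out."""


class LockManager:
    """Minimal in-memory exclusive check-out locks (hypothetical design)."""

    def __init__(self):
        self._locks = {}  # document ID -> user currently holding it

    def holder(self, doc_id):
        return self._locks.get(doc_id)

    def check_out(self, doc_id, user):
        current = self._locks.get(doc_id)
        if current is not None and current != user:
            raise DocumentLockError(f"{doc_id} is checked out by {current}")
        self._locks[doc_id] = user

    def check_in(self, doc_id, user):
        if self._locks.get(doc_id) != user:
            raise DocumentLockError(f"{user} has not checked out {doc_id}")
        del self._locks[doc_id]


locks = LockManager()
locks.check_out("DOC-1", "ada")
try:
    locks.check_out("DOC-1", "tunde")
except DocumentLockError as err:
    print("Blocked:", err)  # Blocked: DOC-1 is checked out by ada
```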
Versioning
Versioning is a process by which documents are checked in or out of the document management system, allowing users to retrieve previous versions and to continue work from a selected point. Versioning is useful for documents that change over time and require updating, where it may be necessary to go back to a previous copy.
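A minimal Python sketch of check-in versioning is shown below. It assumes, for illustration only, that every checked-in version is kept in memory and numbered from 1. "Continuing work from a selected point" is modelled as checking an old version back in as the newest one. The class name is hypothetical.

```python
class VersionedDocument:
    """Keeps every checked-in version; version numbers start at 1."""

    def __init__(self, doc_id, content):
        self.doc_id = doc_id
        self.versions = [content]

    @property
    def latest(self):
        return self.versions[-1]

    def get_version(self, number):
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"{self.doc_id} has no version {number}")
        return self.versions[number - 1]

    def check_in(self, new_content):
        """Store edited content as a new version and return its number."""
        self.versions.append(new_content)
        return len(self.versions)

    def restore(self, number):
        """Continue work from an earlier version by checking it in again as the newest."""
        return self.check_in(self.get_version(number))


doc = VersionedDocument("DOC-7", "first draft")
doc.check_in("second draft")
doc.restore(1)
print(len(doc.versions), doc.latest)  # 3 first draft
```

Restoring never deletes history: version 2 is still available after version 1 is brought back.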
3.4 Issues Addressed in Document Management
There are several common issues involved in managing documents. They arise whether the system is an informal, ad-hoc, paper-based method for one person or a formal, structured, computer-enhanced system for many people across multiple offices. Most methods for managing documents address the following areas:
Location: Where will documents be stored? Where will people need to go to access documents? Physical journeys to filing cabinets and file rooms are analogous to the onscreen navigation required to use a document management system.
Filing: How will documents be filed? What methods will be used to organize or index the documents to assist in later retrieval? Document management systems will typically use a database to store filing information.
Retrieval: How will documents be found? Typically, retrieval encompasses both browsing through documents and searching for specific information.
Security: How will documents be kept secure? How will unauthorized personnel be prevented from reading, modifying or destroying documents?
Disaster Recovery: How can documents be recovered in case of destruction from fires, floods or natural disasters?
Retention Period: How long should documents be kept, i.e. retained? As organizations grow and regulations increase, informal guidelines for keeping various types of documents give way to more formal Records Management practices.
Archiving: How can documents be preserved for future readability?
Distribution: How can documents be made available to the people who need them?
Workflow: If documents need to pass from one person to another, what are the rules for how their work should flow?
Creation: How are documents created? This question becomes important when multiple people need to collaborate, and the logistics of version control and authoring arise.
Authentication: Is there a way to vouch for the authenticity of a document?
3.5 Using XML in Document and Information Management
The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published on February 10, 1998, is impressive. XML has been heralded as the next important Internet technology, the next step following HTML, and the natural and worthy companion to the Java programming language itself. Enterprises of all stripes have rapturously embraced XML. An important role for XML is in managing not only documents but also the information components on which documents are based.
Document Management: Organizing Files
Document management as a technology and a discipline has traditionally augmented the capabilities of a computer's file system. Document management systems enable users to characterize their documents, which are usually stored in files. As a result, users can store, retrieve, and use their documents more easily and powerfully than they can within the file system itself.
Long before anyone thought of XML, document management systems were originally developed to help law offices maintain better control over, and access to, the many documents that legal professionals generate. Among other things, the basic mechanisms of the first document management systems performed these simple but powerful tasks:
Add information about a document to the file that contains the document
Organize the user-supplied information in a database
Create information about the relationships between different documents
In essence, document management systems created libraries of documents in a computer system or a network. The document library contained a "card catalog" where the user-supplied information was stored, and through which users could find out about the documents and access them. The card catalog was a database that captured information about a document, such as:
Author: who wrote or contributed to the document
Main topics: what subjects are covered in the document
Origination date: when it was started
Completion date: when it was finished
Related documents: what other documents are relevant to this document
Associated applications: what programs are used to process the document
Case: to which legal case (or other business process) the document is related
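The card catalog described above is, in effect, a small relational table. The Python sketch below uses the standard `sqlite3` module with an in-memory database. The table and column names, and the sample file paths, authors and case names, are all hypothetical. The column is called `case_name` because `CASE` is an SQL keyword. Related documents get their own table because one document can relate to many others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE card_catalog (
        file_path TEXT PRIMARY KEY,
        author TEXT,
        main_topics TEXT,
        origination_date TEXT,
        completion_date TEXT,
        application TEXT,
        case_name TEXT
    );
    CREATE TABLE related_documents (
        file_path TEXT REFERENCES card_catalog(file_path),
        related_path TEXT REFERENCES card_catalog(file_path)
    );
""")
conn.executemany(
    "INSERT INTO card_catalog VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("briefs/b17.doc", "A. Eze", "land title dispute", "2008-01-10", "2008-02-01", "word processor", "Eze v. Adamu"),
        ("letters/l04.doc", "K. Musa", "settlement offer", "2008-02-05", "2008-02-06", "word processor", "Eze v. Adamu"),
        ("memos/m22.doc", "A. Eze", "office policy", "2008-03-01", None, "word processor", "Internal"),
    ],
)
conn.execute("INSERT INTO related_documents VALUES (?, ?)", ("letters/l04.doc", "briefs/b17.doc"))


def documents_for_case(conn, case_name):
    """Find every document for a case by asking the catalog, not the file system."""
    cur = conn.execute(
        "SELECT file_path, author FROM card_catalog WHERE case_name = ? ORDER BY origination_date",
        (case_name,),
    )
    return cur.fetchall()


def related_to(conn, file_path):
    """List documents recorded as related to the given one."""
    cur = conn.execute("SELECT related_path FROM related_documents WHERE file_path = ?", (file_path,))
    return [row[0] for row in cur.fetchall()]


print(documents_for_case(conn, "Eze v. Adamu"))
print(related_to(conn, "letters/l04.doc"))
```

The query works no matter how the files are named or which directories they sit in. This is the "file system within the file system" idea discussed next.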
Armed with a database of such information about documents, users could find information in more sensible and intuitive ways than scanning different directories' lists of contents and hoping that a file's name might reveal what the file contained. Many people consider the first achievement of document management systems to have been the creation of "a file system within the file system."
Soon, document management systems began to provide additional and valuable functionality. By enriching the databases of information about the documents (the metadata), these systems provided these capabilities:
Version tracking: see how a document evolves over time
Document sharing: see in what business processes the document is used and re-used
Electronic review: enable users to add their comments to a document without actually changing the document itself
Document security: refine the different types of access that different users need to the document
Publishing management: control the delivery of documents to different publishing process queues
Workflow integration: associate the different stages of a document's life-cycle with people, and projects with schedules
These critical capabilities (among others) of document management systems have proven enormously successful, fueling a multi-billion dollar business.
XML: Managing Document Components
XML and its parent technology, SGML (Standard Generalized Markup Language), provide the foundation for managing not only documents but also the information components of which the documents are composed. This is due to some notable characteristics of XML data.
Documents vs. Files
In XML, documents can be seen independently of files. One document can comprise many files, or one file can contain many documents. This is the distinction between the physical and logical structure of information. XML data is primarily described by its logical structure. In a logical structure, principal interest is placed on what the pieces of information are and how they relate to each other. Secondary interest is placed on the physical items that constitute the information.
Rather than relying on file headers and other system-specific characteristics of a file as the primary means for understanding and managing information, XML relies on the markup in the data itself. A chapter in a document is not a chapter because it resides in a file called chapter1.doc, but because the chapter's content is contained in the <chapter> and </chapter> element tags.
Because elements in XML can have attributes, the components of a document can be extensively self-descriptive. For example, in XML you can learn a lot about a chapter without actually reading it if the chapter's markup is rich in attributes, as in <chapter language="English" subject="colonial economics" revision_date="19980623" author="Joan X. Pringle" thesis_advisor="Ramona Winkelhof">. When the elements carry self-describing metadata with them, systems that understand XML syntax can operate on those elements in useful ways, just as a traditional document management system can. But there is a major difference.
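With Python's standard `xml.etree.ElementTree` module, a program can read a chapter's attributes as metadata without reading the chapter text. In the hypothetical sketch below, the first `<chapter>` reuses the attributes from the example above. The enclosing `<book>`, the second chapter, both titles and the helper `find_components` are invented for illustration.

```python
import xml.etree.ElementTree as ET

book_xml = """
<book>
  <chapter language="English" subject="colonial economics"
           revision_date="19980623" author="Joan X. Pringle"
           thesis_advisor="Ramona Winkelhof">
    <title>Trade and Tribute</title>
  </chapter>
  <chapter language="French" subject="monetary history"
           revision_date="19980701" author="Joan X. Pringle">
    <title>Currency Boards</title>
  </chapter>
</book>
""".strip()


def find_components(root, tag, **criteria):
    """Return elements with the given tag whose attributes match all criteria."""
    return [
        element
        for element in root.iter(tag)
        if all(element.get(name) == value for name, value in criteria.items())
    ]


root = ET.fromstring(book_xml)
for chapter in find_components(root, "chapter", subject="colonial economics"):
    # Only the attributes (metadata) are consulted; the chapter body is not read.
    print(chapter.get("author"), chapter.get("revision_date"))  # Joan X. Pringle 19980623
```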
Information vs. Documents
XML markup provides metadata for all components of a document, not merely for the object that contains the document itself. This makes the pieces of information that constitute a document just as manageable as the fields of a record in a database. XML data follows syntactic rules for well-formedness and proper containment of elements. Document management systems that can correctly read and parse XML data can therefore apply their functions, such as those mentioned above, to any and all information components inside the document.
XML's focus on information rather than documents offers some important capabilities:
Reuse of Information
Standard document management systems do offer some measure of information reuse through file sharing. Information management systems based on XML or SGML go further: they enable people to share pieces of common information without storing each piece of information in multiple places.
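To illustrate storing a shared piece of information once, the Python sketch below keeps a reusable warning in a hypothetical component library. Two manuals point to it with a `<ref target="..."/>` element. When a manual is assembled, each reference is replaced by a copy of the shared component, so editing the library entry changes every manual that uses it. The `ref` element, the component names and the `assemble` function are assumptions for the example. Standards such as W3C XInclude address the same need.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical component library: each reusable piece of information is stored once.
library = {
    "warn-power": ET.fromstring('<warning id="warn-power">Disconnect power before servicing.</warning>'),
}

manual_a = ET.fromstring('<manual><title>Pump A</title><ref target="warn-power"/></manual>')
manual_b = ET.fromstring('<manual><title>Pump B</title><ref target="warn-power"/></manual>')


def assemble(document, components):
    """Return a copy of document with every <ref> replaced by its shared component."""
    result = copy.deepcopy(document)
    # Collect references first so the tree is not modified while it is being walked.
    refs = [(parent, child) for parent in result.iter() for child in parent if child.tag == "ref"]
    for parent, ref in refs:
        position = list(parent).index(ref)
        parent.remove(ref)
        parent.insert(position, copy.deepcopy(components[ref.get("target")]))
    return result


# One edit to the shared component is reflected in every assembled manual.
library["warn-power"].text = "Disconnect power and wait 5 minutes before servicing."
for manual in (manual_a, manual_b):
    print(ET.tostring(assemble(manual, library), encoding="unicode"))
```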
Information Harvesting
These systems let people focus on the information components that make up documents rather than on the documents themselves. As a result, they can identify and capture useful information components that have ongoing value but are "buried" inside documents whose value as documents is limited. That is, a particular document may be useful only for a short time, but chunks of information inside that document may be reusable and valuable for a longer period.
Fine-Granularity Text-Management Applications
Because the information components in XML documents are identifiable, manipulatable, and manageable, XML information management technology can support real economies in applications such as translation of technical manuals.
Evaluating Product Offerings
The general world of document management and information management is moving toward adoption of structured information and use of XML and SGML. Within it, some product offerings distinguish themselves by using underlying database management products with native support for object-oriented data. Object-oriented data matches the structure of XML data quite well, and database systems that comprehend object-oriented data adapt well to the tasks of managing XML information. By contrast, other information management products that comprehend XML or SGML data use relational database systems. They provide their own object-oriented extensions to those database systems in order to comprehend object-oriented data such as XML or SGML data. Products relying on such implementations have also garnered success and respect in the document management marketplace.
3.6 Types of Document Management Systems
Alfresco (software)
ColumbiaSoft
Main//Pyrus DMS
OpenKM
Computhink's ViewWise
Didgah
Documentum
DocPoint
Hummingbird DM
Interwoven's Worksite
Infonic Document Management (UK)
ISIS Papyrus
KnowledgeTree
Laserfiche
Livelink
O3spaces
Oracle's Stellent
Perceptive Software
Questys Solutions
Redmap
Report2Web
SharePoint
Saperion
SAP KM (SAP NetWeaver)
TRIM Context
Xerox DocuShare
4.0 CONCLUSION
Document management systems have added variety to the pool of options available for database management in corporations. Many products are available off the shelf for end users to choose from. The use of document management systems has encouraged the concept of, and the drive toward, paperless offices and transactions. It is a concept that makes the future bright, as people move toward greater efficiency by eliminating the use of paper and hard copies of data and information.
5.0 SUMMARY
A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents.
Beginning in the 1980s, a number of vendors began developing systems to manage paper-based documents. These systems managed paper documents, which included not only printed and published documents, but also photos, prints, etc.
There is considerable confusion in the market between document management systems (DMS) and content management systems (CMS).
Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities.
There are several common issues involved in managing documents, whether the system is an informal, ad-hoc, paper-based method for one person or a formal, structured, computer-enhanced system for many people across multiple offices.
The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published on February 10, 1998, is impressive. XML has been heralded as the next important Internet technology, the next step following HTML, and the natural and worthy companion to the Java programming language itself. Enterprises of all stripes have rapturously embraced XML.
6.0 TUTOR-MARKED ASSIGNMENT
1. List 5 characteristics of a document management system.
2. Briefly discuss workflow as a component of a document management system.
7.0 REFERENCES/FURTHER READINGS
BBC h2g2 guide. "Shoebox Storage".
Robertson, James. Published on 14 February 2003.
Mathieu, Miles L. and Capozzoli, Ernest A. (2002). "The Paperless Office: Accepting Digitized Data" (PDF). Troy State University.
Craine, Kevin. "Excerpts from Designing a Document Strategy" (HTML). Craine Communications Group.