COURSE TITLE: DATABASE MANAGEMENT SYSTEM

MBA 758 DATABASE MANAGEMENT SYSTEM

Course Writer: Gerald C. Okereke, Eco Communications Inc., Lagos Ikeja
Course Editor: Mr. E. Eseyin, National Open University of Nigeria
Programme Leader: Dr. O. J. Onwe, National Open University of Nigeria
Course Coordinator: Bimbola, E.U. Adegbola, National Open University of Nigeria

NATIONAL OPEN UNIVERSITY OF NIGERIA

COURSE GUIDE

National Open University of Nigeria
Headquarters
14/16 Ahmadu Bello Way
Victoria Island
Lagos

Abuja Office
5, Dar es Salaam Street
Off Aminu Kano Crescent
Wuse II, Abuja
Nigeria

e-mail: centralinfo@nou.edu.ng
URL: www.nou.edu.ng

Published by National Open University of Nigeria
Printed 2009
ISBN: 978-058-331-9
All Rights Reserved

CONTENTS

Introduction
Course Aim
Course Objectives
Course Materials
Study Units
Assignment File
Assessment
Credit Units
Presentation Schedule
Course Overview

Introduction

This course, Database Management System (DBMS), is designed for students pursuing Masters degrees in business, finance, marketing and related fields of study. It can also be studied by Postgraduate Diploma students in business, sciences and education.

The course is relevant to students studying business because information/data form the foundation of any business enterprise. A thorough understanding of how to design, manipulate and manage databases is therefore essential.

This course is primarily to be studied by students who are already graduates or postgraduates in any field of study. Students who had no exposure to computer science in their first degrees need to put in extra effort to grasp this course properly.
This course guide takes you through the nature of the course, the materials you are going to use and how to use them to your maximum benefit. You are expected to devote at least two hours to the study of each course unit. For each unit there are assessments in the form of tutor-marked assignments. You are advised to carry out the exercises immediately after studying the unit.

Tutorial lectures will be organized for this course. They serve as an avenue to interact with course instructors, who will communicate more clearly with you regarding the course. You are advised to attend the tutorial lectures because they will enhance your understanding of the course. Note that it is also through these tutorial lectures that you will submit your tutor-marked assignments and be assessed accordingly.

Course Aim

The aim behind the development and design of this course is for you to know how to design, manipulate and manage databases. Course participants are exposed to the various forms, types and models of database systems to enable them to make viable choices. Supportive and complementary concepts of managing data and documents are thoroughly examined to give a wholesome view of data/information management. The ultimate aim is to encourage the use of database management systems for effective data management.

Course Objectives

The following are the major objectives of this course:

- define a Database Management System
- give a description of the Database Management structure
- define a Database
- define basic foundational terms of Database
- understand the applications of Databases
- know the advantages and disadvantages of the different models
- compare the relational model with the Structured Query Language (SQL)
- know the constraints and controversies associated with the relational database model
- know the ACID rules guiding transactions
- identify the major types of relational management systems
- compare and contrast the types of RDBMS based on several criteria
- understand the concept of data planning and Database design
- know the steps in the development of Databases
- trace the history and development process of SQL
- know the scope and extension of SQL
- differentiate Discretionary and Mandatory Access Control Policies
- know the proposed OODBMS Security Models
- identify the various functions of the Database Administrator
- trace the history and development process of the data warehouse
- list various benefits of the data warehouse
- compare and contrast document management systems and content management systems
- know the basic components of document management systems

Course Materials

1. Course Guide
2. Study Units
3. Textbooks
4. Assignment File
5. Tutorials

Study Units

This course consists of thirteen (13) units, divided into 3 modules. Each module deals with a major aspect of the course.

Module 1

Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS

Module 2

Unit 1 Development and Design of Database
Unit 2 Structured Query Languages (SQL)
Unit 3 Database and Information Systems Security
Unit 4 Database Administrator and Administration

Module 3

Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System

In studying the units, a minimum of 2 hours is expected of you. Start by going through the unit objectives so that you know what you need to learn in the course of studying the unit. At the end of the unit, evaluate yourself to find out whether you have achieved its objectives. If not, you need to go through the unit again.

To help you ascertain how well you understood the course, there will be exercises, mainly in the form of tutor-marked assignments, at the end of each unit.
At first attempt, try to answer the questions without necessarily having to go through the unit. However, if you cannot proffer solutions offhand, then go through the unit to answer the questions.

Assignment File

For each unit, you will find one (1) or two (2) tutor-marked assignments. These assignments serve two purposes:

1. Self Evaluation: The tutor-marked assignment will assist you to go through each unit thoroughly, because you are advised to attempt the questions immediately after studying each unit. The questions are designed in such a way that at least one question prompts a typical self-assessment test.

2. Obtain Valuable Marks: The tutor-marked assignment is also a valid means of obtaining marks that will form part of your total score in this course. It constitutes 30% of the total marks obtainable in this course.

You are advised to go through the units thoroughly so that you are able to proffer correct solutions to the tutor-marked assignments.

Assessment

You will be assessed and graded in this course through tutor-marked assignments and a formal written examination. The allocation of marks is as indicated below.

Assignments = 30%
Examination = 70%

Final examination and grading

The final examination will consist of two (2) sections:

1. Section 1: This is compulsory and carries 40 marks.
2. Section 2: This consists of six (6) questions, of which you are to answer four (4). It carries 60 marks.

The duration of the examination will be 3 hours.

Credit Units

This course attracts 3 credit units only.

Presentation Schedule

This constitutes the scheduled dates and venue for tutorial classes, as well as how and when to submit the tutorials. All this will be communicated to you in due course.

Course Overview

This indicates the units/topics and issues to be studied each week. It also includes the duration of the course, the revision week and the examination week. The details are as provided below:

Unit  Title of Work                                 Week's Activity  Assessment (end of unit)
      Course Guide
      Module 1
1     Overview                                      1                TMA
2     Database                                      2                TMA
3     Database Concepts                             3                TMA
4     Database Models 1                             4                TMA
5     Database Models: Relational Model             5                TMA
6     Basic Components of DBMS                      6                TMA
      Module 2
1     Development and Design of Database            7                TMA
2     Structured Query Languages (SQL)              8                TMA
3     Database and Information Systems Security     9                TMA
4     Database Administrator and Administration     10               TMA
      Module 3
1     Relational Database Management Systems        11               TMA
2     Data Warehouse                                12               TMA
3     Document Management System                    13               TMA
      Revision and Examination                      14

CONTENTS

Module 1
Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS

Module 2
Unit 1 Development and Design of Database
Unit 2 Structured Query Languages (SQL)
Unit 3 Database and Information Systems Security
Unit 4 Database Administrator and Administration

Module 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System

MODULE 1

Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS

UNIT 1 OVERVIEW

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 Description
    3.2 DBMS Benefits
    3.3 Features and Capabilities of DBMS
    3.4 Uses of DBMS
    3.5 Models
    3.6 List of Database Management Systems Software
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A Database Management System (DBMS) is computer software designed for the purpose of managing databases based on a variety of data models.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- define a Database Management System
- give a description of the Database Management Structure
- enumerate the benefits of a Database Management System
- describe the features and capabilities of a typical DBMS
- identify and differentiate the different types and models of DBMS.

3.0 MAIN CONTENT

3.1 Description

A DBMS is a complex set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. A DBMS is sometimes also known as a database manager. It is a set of prewritten programs that are used to store, update and retrieve a database. A DBMS includes:

- A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model. The four most common types of organization are the hierarchical, network, relational and object models. Inverted lists and other methods are also used. A given database management system may provide one or more of the four models.
The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost). The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.

- Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).

- A database query language and report writer to allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data. It also controls the security of the database. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.

If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs is customized for each data entry and updating function.
- A transaction mechanism that ideally guarantees the ACID properties, in order to ensure data integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance). It also maintains the integrity of the data in the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database.

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are made by data administrators and systems analysts. Detailed database design is performed by database administrators.

Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments.

DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.

3.2 DBMS Benefits
- Improved strategic use of corporate data
- Reduced complexity of the organization's information systems environment
- Reduced data redundancy and inconsistency
- Enhanced data integrity
- Application-data independence
- Improved security
- Reduced application development and maintenance costs
- Improved flexibility of information systems
- Increased access and availability of data and information
- Logical and physical data independence
- Control of concurrent access anomalies
- Help with atomicity problems
- Central control of the system through the DBA

Figure 1: An example of a database management approach in a banking information system. Note how the savings, checking, and installment loan programs use a database management system to share a customer database. Note also that the DBMS allows a user to make a direct, ad hoc interrogation of the database without using application programs.

3.3 Features and Capabilities of DBMS

A DBMS can be characterized as an "attribute management system" where attributes are small chunks of information that describe something. For example, "colour" is an attribute of a car. The value of the attribute may be a colour such as "red", "blue" or "silver".

Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher-arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.

Throughout recent history specialized databases have existed for scientific, geospatial, imaging, document storage and like uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well.
However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.

Thus, the DBMSs of today roll together frequently needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:

Query Ability

Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?"

A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data. The security controls, subschemas and audit-trail limitations that accompany this capability are described in Section 3.1.

Backup and Replication

Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.

Rule Enforcement

Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign.

Security

Often it is desirable to limit who can see or change a given attribute or group of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.

Computation

There are common computations requested on attributes such as counting, summing, averaging, sorting, grouping, cross-referencing, etc. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations. All arithmetical work performed by computer is called a computation.

Change and Access Logging

Often one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.
Automated Optimization

If there are frequently occurring usage patterns or requests, some DBMSs can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.

3.4 Uses of Database Management Systems

The four major uses of database management systems are:

1. Database Development
2. Database Interrogation
3. Database Maintenance
4. Application Development

Database Development

Database packages like Microsoft Access and Lotus Approach allow end users to develop the databases they need. However, large organizations with client/server or mainframe-based systems usually place control of enterprise-wide database development in the hands of database administrators and other database specialists. This improves the integrity and security of organizational databases. Database developers use the data definition language (DDL) in database management systems like Oracle 9i or IBM's DB2 to develop and specify the data contents, relationships and structure of each database, and to modify these database specifications, which are stored in a data dictionary.

Figure 2: The Four Major Uses of DBMS
[Figure not reproduced. Labels: Database; Data Dictionary; Operating System; Database Management Systems; Application Programs; Database Uses: Database Development, Database Interrogation, Database Maintenance, Application Development.]

Database Interrogation

The database interrogation capability is a major use of a database management system. End users can interrogate a database management system by asking for information from a database using a query language or a report generator. They can receive an immediate response in the form of video displays or printed reports. No difficult programming ideas are required.

Database Maintenance

The databases of organizations need to be updated continually to reflect new business transactions and other events. Other miscellaneous changes must also be made to ensure the accuracy of the data in the database. This database maintenance process is accomplished by transaction processing programs and other end-user application packages with the support of the database management system. End users and information specialists can also employ various utilities provided by a DBMS for database maintenance.

Application Development

Database management system packages play major roles in application development. End users, systems analysts and other application developers can use the fourth-generation (4GL) programming languages and built-in software development tools provided by many DBMS packages to develop custom application programs. For example, you can use a DBMS to easily develop the data entry screens, forms, reports, or web pages required by a business application. A database management system also makes the job of application programmers easier, since they do not have to develop detailed data handling procedures using a conventional programming language every time they write a program.

3.5 Models

The various models of database management systems are:

1. Hierarchical
2. Network
3. Object-oriented
4. Associative
5. Column-Oriented
6. Navigational
7. Distributed
8. Real-Time
9. Relational
10. SQL

These models will be discussed in detail in subsequent units of this course.

3.6 List of Database Management Systems Software

Examples of DBMSs include:

- Oracle
- DB2
- Sybase Adaptive Server Enterprise
- FileMaker
- Firebird
- Ingres
- Informix
- Microsoft Access
- Microsoft SQL Server
- Microsoft Visual FoxPro
- MySQL
- PostgreSQL
- Progress
- SQLite
- Teradata
- CSQL
- OpenLink Virtuoso

4.0 CONCLUSION

Database management systems have continued to make data arrangement and storage much easier than they used to be. With the emergence of the relational model of database management systems, much of the big challenge in handling large databases has been reduced. More database management products will become available on the market, and the existing ones will continue to improve.

5.0 SUMMARY

A Database Management System (DBMS) is computer software designed for the purpose of managing databases based on a variety of data models.

A DBMS is a complex set of software programs that controls the organization, storage, management, and retrieval of data in a database.

When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Often it is desirable to limit who can see or change which attributes or groups of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.

A DBMS can be characterized as an "attribute management system" where attributes are small chunks of information that describe something. For example, "colour" is an attribute of a car. The value of the attribute may be a colour such as "red", "blue" or "silver".

Querying is the process of requesting attribute information from various perspectives and combinations of factors.
Example: "How many 2-door cars in Texas are green?"

As computers grew in capability, a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, IDS, founded the Database Task Group within CODASYL.

6.0 TUTOR-MARKED ASSIGNMENT

1. Mention 10 database management systems software.
2. Describe briefly the backup and replication ability of database management systems.

7.0 REFERENCES/FURTHER READINGS

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.

O'Brien, James A. (2003). Introduction to Information Systems, 11th Edition. McGraw-Hill.

UNIT 2 DATABASE

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 Foundations of Database Terms
    3.2 History
    3.3 Database Types
    3.4 Database Storage Structures
    3.5 Database Servers
    3.6 Database Replication
    3.7 Relational Database
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A database is a structured collection of data that is managed to meet the needs of a community of users. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships (see below for an explanation of the various database models).

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database.
A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- define a database
- define basic foundational terms of database
- outline the history of the development of databases
- know and differentiate the different types of database
- describe the structure of a database.

3.0 MAIN CONTENT

3.1 Foundations of Database Terms

File

A file is an ordered arrangement of records in which each record is stored in a unique identifiable location. The sequence of the records is then the means by which a record will be located. In most computer systems, the sequence of records is either alphabetic or numeric, based on a field common to all records such as name or number.

Records

A record or tuple is a complete set of related fields. For example, Table 1 below shows a set of related fields, which is a record. In other words, if this were to be a part of a table then we would call it a row of data. Therefore, a row of data is also a record.

Table 1

Sr No   Icode    Ord No    Ord Date   PDty
1       RKSK-T   0083/99   3/3/2008   120

Field

A field is a property or a characteristic that holds some piece of information about an entity. It is also a category of information within a set of records: for example, the first names, addresses or phone numbers of the people listed in an address book.

Relations

In the relational data model, the data in a database is organized in relations. A relation is synonymous with a "table". A table consists of columns and rows, which are referred to as fields and records in DBMS terms, and as attributes and tuples in relational DBMS terms.

Attributes

An attribute is a property or characteristic that holds some information about an entity. A "Customer", for example, has attributes such as a name and an address.
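
The terms defined above can be related to one another with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational DBMS; the customer table, its column names and the sample values are hypothetical and serve only to show which part of a table corresponds to a relation, an attribute (field) and a tuple (record).

```python
import sqlite3


def describe_customer_relation():
    """Build a one-row relation and return its attribute names and its record."""
    conn = sqlite3.connect(":memory:")

    # The table is the relation; each column is an attribute (field).
    conn.execute(
        "CREATE TABLE customer ("
        " customer_no INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " address TEXT NOT NULL)"
    )

    # Each row is a tuple (record): one complete set of related fields.
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?)",
        (1, "A. Customer", "1 Example Street"),
    )

    cursor = conn.execute("SELECT customer_no, name, address FROM customer")
    attributes = [column[0] for column in cursor.description]
    record = cursor.fetchone()
    conn.close()
    return attributes, record


if __name__ == "__main__":
    attributes, record = describe_customer_relation()
    for field, value in zip(attributes, record):
        print(f"{field}: {value}")
```

The list of attribute names describes the structure shared by every record, while the tuple holds the values of one particular record.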
Table 2: DBMS and Relational DBMS Terms in Comparison

Common Term   DBMS Terminology   RDBMS Terminology
Database      Table              Database
Table         Table              Relation
Column        Field              Attribute
Row           Record             Tuple

3.2 History

The earliest known use of the term database was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)

The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product. While IMS along with the CODASYL IDMS were the big, high-visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today.

The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only.
While the CODASYL network model products (IDMS) and the hierarchical model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980.

During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.

In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data.

In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products.

3.3 Database Types

Developments in information technology and business applications have resulted in the evolution of several major types of databases. Figure 1 illustrates several major conceptual categories of databases that may be found in many organizations.

Operational Database

These databases store the detailed data needed to support the business processes and operations of the e-business enterprise. They are also called subject area databases (SADB), transaction databases and production databases.
Examples are customer databases, human resources databases, inventory databases, and other databases containing data generated by business operations. This includes databases of Internet and e-commerce activity, such as click stream data describing the online behaviour of customers or visitors to a company website.

Distributed Databases

Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on any other company networks. Distributed databases may be copies of operational or analytic databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security. Ensuring that all of the data in an organization's distributed databases are consistently and currently updated is a major challenge of distributed database management.

Figure 1: Examples of the major types of databases used by organizations and end users.
External Databases

Access to a wealth of information from external databases is available for a fee from conventional online services, and with or without charge from many sources on the Internet, especially the World Wide Web. Websites provide an endless variety of hyperlinked pages of multimedia documents in hypermedia databases for you to access. Data are available in the form of statistics on economic and demographic activity from statistical data banks. You can also view or download abstracts or complete copies of newspapers, magazines, newsletters, research papers, and other published materials and periodicals from bibliographic and full text databases.

3.4 Database Storage Structures

Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages, discussed in this topic. The most commonly used are B+ trees and ISAM.

[Figure 1 labels: Client PC or NC; End User Databases; External Databases on the Internet and online services; Data Warehouse; Data Marts; Network Server; Operational Databases of the Organization; Distributed Databases on Intranets and other Networks]

Methods

Flat Files

A flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file. A flat file is a file that contains records, and in which each record is specified on a single line. The fields of each record may simply have a fixed width with padding, or may be delimited by whitespace, tabs, commas (CSV) or other characters. Extra formatting may be needed to avoid delimiter collision. There are no structural relationships. The data are "flat", as on a sheet of paper, in contrast to more complex models such as a relational database.

The classic example of a flat file database is a basic name-and-address list, where the database consists of a small, fixed number of fields: Name, Address, and Phone Number. Another example is a simple HTML table, consisting of rows and columns. This type of database is routinely encountered, although often not expressly recognized as a database.

Implementation: It is possible to write out by hand, on a sheet of paper, a list of names, addresses, and phone numbers; this is a flat file database. This can also be done with any typewriter or word processor. Many pieces of computer software, however, are designed to implement flat file databases.

Unordered storage typically stores the records in the order in which they are inserted. Insertion is efficient. Retrieval might be expected to be inefficient, but this is rarely the case in practice, because most databases use indexes on the primary keys, resulting in efficient retrieval times.

Ordered or linked list storage typically stores the records in order, and may have to rearrange the records or increase the file size when a record is inserted, which is very inefficient. It is, however, better for retrieval, as the records are pre-sorted (complexity O(log(n))).
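The contrast between scanning an unordered flat file and searching an ordered one can be sketched as follows. This is a minimal illustration, not part of the course material: the sample records and the helper names (load_records, linear_search, binary_search) are assumptions made for the example, and the comma-delimited layout is only one of the delimiter choices mentioned above.

```python
import csv
import io
from bisect import bisect_left

# Hypothetical name-and-address list held as a comma-delimited flat file:
# one record per line, one field per column.
FLAT_FILE = """Name,Address,Phone Number
Chidi,9 Broad Street,555-0103
Ada,12 Marina Road,555-0101
Bola,4 Allen Avenue,555-0102
"""

def load_records(text):
    """Parse the flat file into a list of dicts, one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))

def linear_search(records, name):
    """Unordered storage with no index: inspect records one by one, O(n)."""
    for record in records:
        if record["Name"] == name:
            return record
    return None

def binary_search(keys, sorted_records, name):
    """Ordered storage: keys are pre-sorted, so the lookup is O(log n)."""
    i = bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return sorted_records[i]
    return None

records = load_records(FLAT_FILE)                     # insertion order
sorted_records = sorted(records, key=lambda r: r["Name"])
keys = [r["Name"] for r in sorted_records]            # built once, reused per lookup
```

Keeping the records sorted is what makes the second lookup cheap; the price, as the text notes, is that every insertion may force a rearrangement.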
Structured files

The simplest and most basic method:

- insertion is efficient, with records added at the end of the file in "chronological" order
- retrieval is inefficient, as searching has to be linear
- deletion: deleted records are marked
- requires periodic reorganization if the file is very volatile

Advantages:

- good for bulk loading data
- good for relatively small relations, as indexing overheads are avoided
- good when retrievals involve a large proportion of the records

Disadvantages:

- not efficient for selective retrieval using key values, especially if the file is large
- sorting may be time-consuming
- not suitable for "volatile" tables

Hash Buckets

Hash functions calculate the address of the page in which a record is to be stored, based on one or more fields in the record.

- hashing functions are chosen to ensure that addresses are spread evenly across the address space
- "occupancy" is generally 40% to 60% of the total file size
- a unique address is not guaranteed, so collision detection and collision resolution mechanisms are required:
  - open addressing
  - chained/unchained overflow

Pros and cons:

- efficient for exact matches on the key field
- not suitable for range retrieval, which requires sequential storage
- calculates where the record is stored based on fields in the record
- hash functions ensure an even spread of data
- collisions are possible, so collision detection and resolution are required

B+ Trees

These are the most used in practice.

- the time taken to access any tuple is the same, because the same number of nodes is searched
- the index is a full index, so the data file does not have to be ordered

Pros and cons:

- versatile data structure: sequential as well as random access
- access is fast
- supports exact, range, part key and pattern matches efficiently
- "volatile" files are handled efficiently because the index is dynamic: it expands and contracts as the table grows and shrinks
- less well suited to relatively stable files; in this case, ISAM is more efficient

3.5 Database Servers

A database
server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model. The term may also refer to a computer dedicated to running such a program. Database management systems frequently provide database server functionality, and some DBMSs (e.g., MySQL) rely exclusively on the client-server model for database access.

In a master-slave model, database master servers are the central and primary locations of data, while database slave servers are synchronized backups of the master acting as proxies.

3.6 Database Replication

Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.

Multi-master replication, where updates can be submitted to any database node and then ripple through to the other servers, is often desired, but introduces substantially increased costs and complexity, which may make it impractical in some situations. The most common challenge in multi-master replication is transactional conflict prevention or resolution. Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization.

Database replication becomes difficult when it scales up.
Usually, the scale-up goes in two dimensions, horizontal and vertical: horizontal scale-up has more data replicas, while vertical scale-up has data replicas located further away in distance. Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access protocol. Vertical scale-up runs into less trouble as Internet reliability and performance improve.

3.7 Relational Database

A relational database is a database that conforms to the relational model, and refers to a database's data and schema (the database's structure of how those data are arranged). The term "relational database" is sometimes informally used to refer to a relational database management system, which is the software that is used to create and use a relational database.

The term relational database was originally defined and coined by Edgar Codd at IBM Almaden Research Center in 1970.

Strictly, a relational database is a collection of relations (frequently called tables). Other items are frequently considered part of the database, as they help to organize and structure the data, in addition to forcing the database to conform to a set of requirements.

Terminology

Relational database theory uses a different set of mathematically based terms, which are equivalent, or roughly equivalent, to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.

Relational term           SQL equivalent
relation, base relvar     table
derived relvar            view, query result, result set
tuple                     row
attribute                 column

Relations or Tables

A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns.
All the data referenced by an attribute are in the same domain and conform to the same constraints. The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.

Base and Derived Relations

In a relational database, all data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that, though they may draw information from several relations, they act as a single relation. Derived relations can also be used as an abstraction layer.

Keys

A unique key is a kind of constraint that ensures that an object, or critical information about the object, occurs in at most one tuple in a given relation. For example, a school might want each student to have a separate locker. To ensure this, the database designer creates a key on the locker attribute of the student relation. Keys can include more than one attribute. For example, a nation may impose a restriction that no province can have two cities with the same name. The key would include province and city name. This would still allow two different provinces to have a town called Springfield, because their province is different. A key over more than one attribute is called a compound key.
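The province-and-city example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name city, its column names, the sample rows and the helper try_insert are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE city (
        province TEXT NOT NULL,
        name     TEXT NOT NULL,
        UNIQUE (province, name)   -- compound key over two attributes
    )
    """
)
# The same city name in two different provinces is allowed.
conn.execute("INSERT INTO city VALUES ('Ontario', 'Springfield')")
conn.execute("INSERT INTO city VALUES ('Manitoba', 'Springfield')")

def try_insert(province, name):
    """Return True if the row was accepted, False if the key rejected it."""
    try:
        conn.execute("INSERT INTO city VALUES (?, ?)", (province, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert("Ontario", "Springfield")   # same province and name
new_city_accepted = try_insert("Ontario", "Ottawa")         # new combination
city_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
```

The constraint is checked on the combination of values, so neither column has to be unique on its own.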
Foreign Keys

A foreign key is a reference to a key in another relation, meaning that the referencing tuple has, as one of its attributes, the values of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation. Foreign keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation.

A foreign key could be described formally as: "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."

4.0 CONCLUSION

Database applications are used to store and manipulate data. A database application can be used in many business functions, including sales and inventory tracking, accounting, employee benefits, payroll, production and more. Database programs for personal computers come in various shapes and sizes. A database remains fundamental for the implementation of any database management system.

5.0 SUMMARY

- A database is a structured collection of data that is managed to meet the needs of a community of users. The structure is achieved by organizing the data according to a database model.
- The earliest known use of the term database was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base.
- Developments in information technology and business applications have resulted in the evolution of several major types of databases.
- Database tables and indexes are typically stored in memory or on hard disk in one of many forms: ordered or unordered flat files, ISAM, heaps, hash buckets or B+ trees.
- A database
server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model.
- Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies.
- A relational database is a database that conforms to the relational model, and refers to a database's data and schema.

6.0 TUTOR-MARKED ASSIGNMENT

1. Define the terms: Field, Record, Relation and Attribute
2. Briefly describe a flat file

7.0 REFERENCES/FURTHER READINGS

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387. doi: 10.1145/362384.362685.

O'Brien, James A. (2003). (11th Edition) Introduction to Information Systems. McGraw-Hill.

UNIT 3 DATABASE CONCEPTS

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 Create, Read, Update and Delete
    3.2 ACID
    3.3 Keys
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

There are basic and standard concepts associated with all databases, and these are what we will discuss in detail in this unit. They include the concept of Creating, Reading, Updating and Deleting (CRUD) data, ACID (Atomicity, Consistency, Isolation, Durability), and keys of different kinds.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- know the meaning of the acronym CRUD
- understand the applications of databases
- know the meaning of the acronym ACID and how the members of ACID differ from each other
- understand the structure of a database
- know the types of keys associated with databases.

3.0 MAIN CONTENT

3.1 Create, Read, Update and Delete

Create, read, update and delete (CRUD) are the four basic functions of persistent storage, a major part of nearly all computer software. Sometimes CRUD is expanded with the words retrieve instead of read, or destroy instead of delete.
It is also sometimes used to describe user interface conventions that facilitate viewing, searching, and changing information, often using computer-based forms and reports.

Alternate terms for CRUD (one initialism and three acronyms):

- ABCD: add, browse, change, delete
- ACID: add, change, inquire, delete (though this can be confused with the transactional use of the acronym ACID)
- BREAD: browse, read, edit, add, delete
- VADE(R): view, add, delete, edit (and restore, for systems supporting transaction processing)

Database Applications

The acronym CRUD refers to all of the major functions that need to be implemented in a relational database application to consider it complete. Each letter in the acronym can be mapped to a standard SQL statement:

Operation            SQL
Create               INSERT
Read (Retrieve)      SELECT
Update               UPDATE
Delete (Destroy)     DELETE

Although a relational database is a common persistence layer in software applications, there are numerous others. CRUD can be implemented with an object database, an XML database, flat text files, custom file formats, tape, or card, for example.

Google Scholar lists the first reference to create-read-update-delete as by Kilov in 1990. The concept seems also to be described in more detail in Kilov's 1998 book.

User Interface

CRUD is also relevant at the user interface level of most applications. For example, in address book software, the basic storage unit is an individual contact entry. As a bare minimum, the software must allow the user to:

- Create or add new entries
- Read, retrieve, search, or view existing entries
- Update or edit existing entries
- Delete existing entries

Without at least these four operations, the software cannot be considered complete. Because these operations are so fundamental, they are often documented and described under one comprehensive heading, such as "contact management" or "contact maintenance" (or "document management" in general, depending on the basic storage unit for the particular application).
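The mapping from the four CRUD operations to SQL statements can be sketched for the address book case using Python's built-in sqlite3 module. The table name contact, its columns and the sample values are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Create -> INSERT
cur = conn.execute(
    "INSERT INTO contact (name, phone) VALUES (?, ?)", ("Ada", "555-0101")
)
contact_id = cur.lastrowid

# Read -> SELECT
created = conn.execute(
    "SELECT name, phone FROM contact WHERE id = ?", (contact_id,)
).fetchone()

# Update -> UPDATE
conn.execute("UPDATE contact SET phone = ? WHERE id = ?", ("555-0199", contact_id))
updated = conn.execute(
    "SELECT name, phone FROM contact WHERE id = ?", (contact_id,)
).fetchone()

# Delete -> DELETE
conn.execute("DELETE FROM contact WHERE id = ?", (contact_id,))
remaining = conn.execute("SELECT COUNT(*) FROM contact").fetchone()[0]
```

An address book user interface would normally wrap each of these four statements behind a form or menu action.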
3.2 ACID

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.

An example of a transaction is a transfer of funds from one account to another, even though it might consist of multiple individual operations (such as debiting one account and crediting another).

Atomicity

Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. For example, a transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited. Atomicity states that database modifications must follow an "all or nothing" rule. Each transaction is said to be "atomic". If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.

Consistency

The consistency property ensures that the database is in a consistent state both before the start of the transaction and after the transaction is over (whether successful or not). Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database's consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction executes successfully, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Durability

Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone.
This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a log that can be played back to recreate the system state right before the failure. A transaction can only be deemed committed after it is safely in the log.

Implementation

Implementing the ACID properties correctly is not simple. Processing a transaction often requires a number of small changes to be made, including updating indices that are used by the system to speed up searches. This sequence of operations is subject to failure for a number of reasons; for instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time.

ACID suggests that the database be able to perform all of these operations at once. In fact this is difficult to arrange. There are two popular families of techniques: write ahead logging and shadow paging. In both cases, locks must be acquired on all information that is updated and, depending on the implementation, on all data that is being read. In write ahead logging, atomicity is guaranteed by ensuring that information about all changes is written to a log before it is written to the database. That allows the database to return to a consistent state in the event of a crash. In shadowing, updates are applied to a copy of the database, and the new copy is activated when the transaction commits. The copy refers to unchanged parts of the old version of the database, rather than being an entire duplicate.

Until recently almost all databases relied upon locking to provide ACID capabilities. This means that a lock must always be acquired before processing data in a database, even on read operations. Maintaining a large number of locks, however, results in substantial overhead as well as hurting concurrency.
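The "all or nothing" rule for the funds transfer described under Atomicity can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions: the account table, the CHECK rule that balances may not go negative, and the helper names transfer and balances are invented for the example; it shows the observable behaviour, not how a DBMS implements logging or locking internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src as one transaction; True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Current balance of every account, as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer("A", "B", 30)       # both updates succeed and are committed
failed = transfer("A", "B", 500)  # debit breaks the CHECK rule: credit is undone too
```

In the failed transfer the credit to B has already been applied when the debit is rejected; rolling back the whole transaction is what keeps B from gaining money that A never lost.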
With locking, if user A is running a transaction that has read a row of data that user B wants to modify, user B must wait until user A's transaction is finished.

An alternative to locking is multiversion concurrency control, in which the database maintains separate copies of any data that is modified. This allows users to read data without acquiring any locks. Going back to the example of user A and user B, when user A's transaction gets to data that user B has modified, the database is able to retrieve the exact version of that data that existed when user A started their transaction. This ensures that user A gets a consistent view of the database even if other users are changing data that user A needs to read. A natural implementation of this idea results in a relaxation of the isolation property, namely snapshot isolation.

It is difficult to guarantee ACID properties in a network environment. Network connections might fail, or two users might want to use the same part of the database at the same time. Two-phase commit is typically applied in distributed transactions to ensure that each participant in the transaction agrees on whether the transaction should be committed or not.

Care must be taken when running transactions in parallel. Two-phase locking is typically applied to guarantee full isolation.

3.3 Keys

3.3.1 Foreign Key

In the context of relational databases, a foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The referenced columns must be the primary key or another candidate key of the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except potentially NULL).
This way references can be made to link information together, and it is an essential part of database normalization. Multiple rows in the referencing table may refer to the same row in the referenced table. Most of the time, this reflects the one (master table, or referenced table) to many (child table, or referencing table) relationship.

The referencing and referenced table may be the same table, i.e. the foreign key refers back to the same table. Such a foreign key is known in SQL:2003 as a self-referencing or recursive foreign key.

A table may have multiple foreign keys, and each foreign key can have a different referenced table. Each foreign key is enforced independently by the database system. Therefore, cascading relationships between tables can be established using foreign keys. Improper foreign key/primary key relationships, or not enforcing those relationships, are often the source of many database and data modeling problems.

Referential Actions

Because the DBMS enforces referential constraints, it must ensure data integrity if rows in a referenced table are to be deleted (or updated). If dependent rows in referencing tables still exist, those references have to be considered. SQL:2003 specifies 5 different referential actions that shall take place in such occurrences:

- CASCADE
- RESTRICT
- NO ACTION
- SET NULL
- SET DEFAULT

CASCADE

Whenever rows in the master (referenced) table are deleted, the respective rows of the child (referencing) table with a matching foreign key column will get deleted as well. A foreign key with a cascade delete means that if a record in the parent table is deleted, then the corresponding records in the child table will automatically be deleted. This is called a cascade delete.

Example tables: Customer(customer_id, cname, caddress) and Order(customer_id, products, payment)

Customer is the master table and Order is the child table, where customer_id is the foreign key in Order and represents the customer who placed the order.
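The Customer and Order tables can be sketched with Python's built-in sqlite3 module to show both the referential constraint and a cascade delete. Some details are assumptions made for the illustration: the child table is named orders here because ORDER is a reserved word in SQL, the column types and sample rows are invented, and the PRAGMA line is specific to SQLite, which only enforces foreign keys once that setting is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY, cname TEXT, caddress TEXT)"
)
conn.execute(
    "CREATE TABLE orders ("
    " customer_id INTEGER REFERENCES customer (customer_id) ON DELETE CASCADE,"
    " products TEXT, payment REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Lagos'), (2, 'Bola', 'Abuja')")
conn.execute(
    "INSERT INTO orders VALUES (1, 'books', 20.0), (1, 'pens', 5.0), (2, 'ink', 7.5)"
)

# A referencing row must match an existing customer (or be NULL).
try:
    conn.execute("INSERT INTO orders VALUES (99, 'paper', 3.0)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Deleting the master row removes its dependent child rows as well.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
remaining_orders = conn.execute(
    "SELECT customer_id, products FROM orders"
).fetchall()
```

Replacing ON DELETE CASCADE with ON DELETE RESTRICT or ON DELETE SET NULL in the table definition would select one of the other referential actions listed above.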
When a row of Customer is deleted, any Order row matching the deleted customer's customer_id is also deleted. In other words, deleting one row in the parent table automatically deletes the corresponding rows in the child table.

RESTRICT

A row in the referenced table cannot be updated or deleted if dependent rows still exist. In that case, the data change is not even attempted.

NO ACTION

The UPDATE or DELETE SQL statement is executed on the referenced table. The DBMS verifies at the end of the statement execution that none of the referential relationships is violated. The major difference from RESTRICT is that triggers, or the semantics of the statement itself, may produce a result in which no foreign key relationship is violated. In that case, the statement can be executed successfully.

SET NULL

The foreign key values in the referencing row are set to NULL when the referenced row is updated or deleted. This is only possible if the respective columns in the referencing table are nullable. Due to the semantics of NULL, a referencing row with NULLs in the foreign key columns does not require a referenced row.

SET DEFAULT

Similarly to SET NULL, the foreign key values in the referencing row are set to the column default when the referenced row is updated or deleted.

3.3.2 Candidate Key

In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar such that, at all times, the following hold in the relation assigned to that variable:

(1) there are no two distinct tuples with the same values for these attributes, and
(2) there is no proper subset of this set of attributes for which (1) holds.

Since a superkey is defined as a set of attributes for which (1) holds, we can also define a candidate key as a minimal superkey, i.e. a superkey of which no proper subset is also a superkey.

The importance of candidate keys is that they tell us how we can identify individual tuples in a relation.
As such they are one of the most important types of database constraint that should be specified when designing a database schema. Since a relation is a set (no duplicate elements), every relation has at least one candidate key (because the entire heading is always a superkey). Since in some RDBMSs tables may also represent multisets (which strictly means these DBMSs are not relational), it is an important design rule to specify explicitly at least one candidate key for each relation. For practical reasons RDBMSs usually require that, for each relation, one of its candidate keys is declared as the primary key, which means that it is considered the preferred way to identify individual tuples. Foreign keys, for example, are usually required to reference such a primary key and not any of the other candidate keys.

Determining Candidate Keys

The definition of a candidate key does not show how such keys are determined in practice. Since most relations have a large number of instances, or even infinitely many, it would be impossible to determine all the sets of attributes with the uniqueness property for each instance. Instead it is easier to consider the sets of real-world entities that are represented by the relation and determine which attributes of the entities uniquely identify them. For example, a relation Employee(Name, Address, Dept) probably represents employees, and these are likely to be uniquely identified by a combination of Name and Address, which is therefore a superkey. Unless the same holds for Name alone or Address alone, this combination is also a candidate key.

In order to determine the candidate keys correctly it is important to determine all superkeys, which is especially difficult if the relation represents a set of relationships rather than a set of entities.

3.3.3 Unique Key

In relational database design, a unique key or primary key is a candidate key to uniquely identify each row in a table.
A unique key or primary key comprises a single column or set of columns. No two distinct rows in a table can have the same value (or combination of values) in those columns. Depending on its design, a table may have arbitrarily many unique keys but at most one primary key.

A unique key must uniquely identify all possible rows that exist in a table and not only the currently existing rows. Examples of unique keys are Social Security numbers (associated with a specific person) or ISBNs (associated with a specific book). Telephone books and dictionaries cannot use names or words or Dewey Decimal system numbers as candidate keys because they do not uniquely identify telephone numbers or words.

A primary key is a special case of a unique key. The major difference is that for unique keys the implicit NOT NULL constraint is not automatically enforced, while for primary keys it is. Thus, the values in a unique key column may or may not be NULL. Another difference is that primary keys are declared using a different syntax.

The relational model, as expressed through relational calculus and relational algebra, does not distinguish between primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the application programmer.

Unique keys as well as primary keys can be referenced by foreign keys.

3.3.4 Superkey

A superkey is defined in the relational model of database organization as a set of attributes of a relation variable (relvar) for which it holds that, in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently, a superkey can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent.

Note that if attribute set K is a superkey of relvar R, then at all times it is the case that the projection of R over K has the same cardinality as R itself.
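The cardinality test just described can be sketched in Python. This is an illustration under stated assumptions: the sample relation EMPLOYEES and the function names is_superkey and candidate_keys are invented for the example, and the check inspects a single instance only, whereas a true superkey must hold for every relation that could ever be assigned to the relvar.

```python
from itertools import combinations

# Illustrative relation instance: each tuple is a dict of attribute -> value.
EMPLOYEES = [
    {"employeeID": 1, "name": "Ada", "job": "clerk", "departmentID": 10},
    {"employeeID": 2, "name": "Bola", "job": "clerk", "departmentID": 10},
    {"employeeID": 3, "name": "Ada", "job": "analyst", "departmentID": 20},
]

def is_superkey(rows, attrs):
    """True if projecting rows over attrs keeps the same cardinality."""
    projection = {tuple(row[a] for a in attrs) for row in rows}
    return len(projection) == len(rows)

def candidate_keys(rows):
    """Minimal superkeys of this one instance, smallest attribute sets first."""
    heading = sorted(rows[0])
    found = []
    for size in range(1, len(heading) + 1):
        for attrs in combinations(heading, size):
            # Skip any set that contains an already-found (smaller) key.
            contains_smaller_key = any(set(k) <= set(attrs) for k in found)
            if not contains_smaller_key and is_superkey(rows, attrs):
                found.append(attrs)
    return found

keys_in_this_instance = candidate_keys(EMPLOYEES)
```

On three rows the search also reports combinations such as name with departmentID, which happen to be unique in this sample but would not stay unique in general. That is the reason candidate keys are determined from the real-world entities rather than from one instance of the data.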
Informally, a superkey is a set of columns within a table whose values can be used to uniquely identify a row. A candidate key is a minimal set of columns necessary to identify a row; this is also called a minimal superkey. For example, given an employee table consisting of the columns employeeID, name, job, and departmentID, we could use the employeeID in combination with any or all other columns of this table to uniquely identify a row in the table. Examples of superkeys in this table would be {employeeID, name}, {employeeID, name, job}, and {employeeID, name, job, departmentID}. In a real database we don't need values for all of those columns to identify a row. We only need, per our example, the set {employeeID}. This is a minimal superkey, that is, a minimal set of columns that can be used to identify a single row. So, employeeID is a candidate key.

Example: English Monarchs

Monarch Name    Monarch Number    Royal House
Edward          II                Plantagenet
Edward          III               Plantagenet
Richard         II                Plantagenet
Henry           IV                Lancaster

In this example, the possible superkeys are:
{Monarch Name, Monarch Number}
{Monarch Name, Monarch Number, Royal House}

3.3.5 Surrogate Key

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.

Definition

There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate (2):

Surrogate (1)
This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application.

Surrogate (2)
This definition is based on that given by Wieringa and de Jung (1991). Here a surrogate represents an object in the database itself.
The surrogate is internally generated by the system and is invisible to the user or application.

We shall adopt the surrogate (1) definition throughout this unit, largely because it is oriented towards the data model rather than the storage model. See Date (1998).

An important distinction exists between a surrogate and a primary key, depending on whether the database is a current database or a temporal database. A current database stores only currently valid data; therefore there is a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database. In this case the surrogate may be used as a primary key, resulting in the term surrogate key. However, in a temporal database there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.

Although Hall et alia (1976) say nothing about this, other authors have argued that a surrogate should have the following constraints:
the value is unique system-wide, hence never reused;
the value is system generated;
the value is not manipulable by the user or application;
the value contains no semantic meaning;
the value is not visible to the user or application;
the value is not composed of several values from different domains.

Surrogates in Practice

In a current database, the surrogate key can be the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible that the surrogate key exists in addition to the database-generated UUID, e.g. an HR number for each employee besides the UUID of each employee. A surrogate key is frequently a sequential number (e.g.
a Sybase or SQL Server "identity column", a PostgreSQL serial, an Oracle SEQUENCE or a column defined with AUTO_INCREMENT in MySQL) but doesn't have to be. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database; the surrogate key identifies the unique entity in the modelled world. These two keys are not the same. For example, table Staff may contain two rows for "John Smith", one row for when he was employed between 1990 and 1999, and another row for when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows; however, the primary key will be unique.

Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.

A surrogate may also be called a surrogate key, entity identifier, system-generated key, database sequence number, synthetic key, technical key, or arbitrary unique identifier. Some of these terms describe the way of generating new surrogate values rather than the nature of the surrogate concept.

4.0 CONCLUSION

The fundamental concepts that guide the operation of a database, that is, CRUD and ACID, remain the same irrespective of the types and models of databases that emerge by the day. However, one cannot rule out the possibility of other concepts emerging in the near future.

5.0 SUMMARY

Create, read, update and delete (CRUD) are the four basic functions of persistent storage, a major part of nearly all computer software.
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.
In the context of relational databases a foreign key is a referential constraint between two tables.
In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar such that at all times it holds in the relation assigned to that variable that there are no two distinct tuples with the same values for these attributes.
In relational database design, a unique key or primary key is a candidate key to uniquely identify each row in a table.
Superkey: A superkey is defined in the relational model of database organization as a set of attributes of a relation variable (relvar) for which it holds that in all relations assigned to that variable there are no two distinct tuples (rows) that have the same values for the attributes in this set.
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database.

6.0 TUTOR-MARKED ASSIGNMENT

1. What do the acronyms CRUD and ACID mean?
2. What are the constraints associated with surrogate keys?

7.0 REFERENCES/FURTHER READINGS

Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co. ISBN 0-7204-0459-2.
Engles, R.W. (1972). A Tutorial on Data-Base Organization, Annual Review in Automatic Programming, Vol. 7, Part 1, Pergamon Press, Oxford, pp. 1-64.
Langefors, B. (1968). Elementary Files and Elementary File Records, Proceedings of File 68, an IFIP/IAG International Seminar on File Organisation, Amsterdam, November, pp. 89-96.
The Identification of Objects and Roles: Object Identifiers Revisited by Wieringa and de Jung (1991).
Relational Database Writings 1994-1997 by C.J. Date (1998), Chapters 11 and 12.
Carter, Breck.
"Intelligent Versus Surrogate Keys". Retrieved on 2006-12-03.
Richardson, Lee. "Create Data Disaster: Avoid Unique Indexes (Mistake 3 of 10)".
Berkus, Josh. "Database Soup: Primary Keyvil, Part I".
Gray, Jim (September 1981). "The Transaction Concept: Virtues and Limitations". Proceedings of the 7th International Conference on Very Large Data Bases: pages 144-154, 19333 Vallco Parkway, Cupertino CA 95014: Tandem Computers.
Jim Gray & Andreas Reuter, Distributed Transaction Processing: Concepts and Techniques, Morgan Kaufmann 1993. ISBN 1558601902.
Date, Christopher (2003). "5: Integrity", An Introduction to Database Systems. Addison-Wesley, pp. 268-276. ISBN 978-0321189561.

UNIT 4 DATABASE MODELS 1

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Hierarchical Model
3.2 Network Model
3.3 Object-Relational Database
3.4 Object Database
3.5 Associative Model of Data
3.6 Column-Oriented DBMS
3.7 Navigational Database
3.8 Distributed Database
3.9 Real Time Database
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

Several models have evolved in the course of the development of databases and database management systems. This has resulted in several forms of models being deployed by users depending on their needs and understanding. In this unit we begin to examine these models; the discussion concludes in the subsequent unit.

2.0 OBJECTIVES

At the end of this unit, you should be able to:
know and define the different types of database models
differentiate the database models from each other
sketch the framework of hierarchical and network models
understand the concepts behind the models
know the advantages and disadvantages of the different models.
3.0 MAIN CONTENT

3.1 Hierarchical Model

In a hierarchical model, data is organized into an inverted tree-like structure, implying a multiple downward link in each node to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files. Each unit in the model is a record, which is also known as a node. In such a model, each record on one level can be related to multiple records on the next lower level. A record that has subsidiary records is called a parent and the subsidiary records are called children. Data elements in this model are well suited for one-to-many relationships with other data elements in the database.

Figure 1: A Hierarchical Structure
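The parent and child structure described above can be sketched as a small tree of records. This is only an illustration under stated assumptions: the class name, the use of the record name as the sort field, and the sample record names are all invented here and are not taken from any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """One node of the hierarchy: exactly one parent, any number of children."""
    name: str
    children: List["Record"] = field(default_factory=list)
    # The parent link is excluded from repr/compare to avoid infinite recursion.
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Record") -> "Record":
        child.parent = self
        self.children.append(child)
        # Assumption: the record name acts as the sort field for each level.
        self.children.sort(key=lambda record: record.name)
        return child

    def path(self) -> str:
        """Walk upward through the single parent links to the root."""
        parts = []
        node: Optional["Record"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical records, named only for illustration.
department = Record("Department")
project_b = department.add_child(Record("Project B"))
project_a = department.add_child(Record("Project A"))
employee = project_a.add_child(Record("Employee 1"))

print([child.name for child in department.children])  # ['Project A', 'Project B']
print(employee.path())  # Department/Project A/Employee 1
```

Because every record has a single parent link, reaching a record always means following one path from the root, which is the one-to-many restriction the text describes.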
This model is advantageous when the data elements are inherently hierarchical. The disadvantage is that in order to prepare the database it becomes necessary to identify the requisite groups of files that are to be logically integrated. Hence, a hierarchical data model may not always be flexible enough to accommodate the dynamic needs of an organization.

Example

An example of a hierarchical data model would be an organization that has records of employees in a table (entity type) called "Employees". In the table there would be attributes/columns such as First Name, Last Name, Job Name and Wage. The company also has data about the employees' children in a separate table called "Children" with attributes such as First Name, Last Name, and date of birth. The Employees table represents a parent segment and the Children table represents a child segment. These two segments form a hierarchy where an employee may have many children, but each child may only have one parent.

(Figure 1 node labels, each a data element: Department; Project A; Project B; Employee 1; Employee B.)

Consider the following structure:

EmpNo    Designation       ReportsTo
10       Director
20       Senior Manager    10
30       Typist            20
40       Programmer        20

In this, the "child" is the same type as the "parent". The hierarchy, in which EmpNo 10 is the boss of 20, and 30 and 40 each report to 20, is represented by the "ReportsTo" column. In relational database terms, the ReportsTo column is a foreign key referencing the EmpNo column. If the "child" data type were different, it would be in a different table, but there would still be a foreign key referencing the EmpNo column of the employees table.

This simple model is commonly known as the adjacency list model, and was introduced by Dr. Edgar F. Codd after initial criticisms surfaced that the relational model could not model hierarchical data.

3.2 Network Model

In the network model, records can participate in any number of named relationships.
Each relationship associates a record of one type (called the owner) with multiple records of another type (called the member). These relationships (somewhat confusingly) are called sets. For example, a student might be a member of one set whose owner is the course they are studying, and a member of another set whose owner is the college they belong to. At the same time the student might be the owner of a set of email addresses, and owner of another set containing phone numbers.

The main difference between the network model and the hierarchical model is that in a network model a child can have a number of parents, whereas in a hierarchical model a child can have only one parent. The hierarchical model is therefore a subset of the network model.

Figure 3: Network Structure
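The owner and member sets described above can be sketched with two lookup tables. This is a minimal illustration, not the CODASYL data manipulation language: the class, its method names, and the sample records are hypothetical, and the rule that a member has at most one owner per named set is an assumption built into the sketch.

```python
from collections import defaultdict


class NetworkStore:
    """Records linked by named owner/member sets (illustrative only)."""

    def __init__(self):
        self._members = defaultdict(list)  # (set_name, owner) -> [members]
        self._owners = defaultdict(dict)   # member -> {set_name: owner}

    def connect(self, set_name, owner, member):
        # Assumption of this sketch: one owner per member within a named set.
        if set_name in self._owners[member]:
            raise ValueError(f"{member!r} already has an owner in {set_name!r}")
        self._members[(set_name, owner)].append(member)
        self._owners[member][set_name] = owner

    def find_owner(self, set_name, member):
        return self._owners.get(member, {}).get(set_name)

    def find_members(self, set_name, owner):
        return list(self._members.get((set_name, owner), []))


store = NetworkStore()
# A student is a member of two different sets, so it has two "parents".
store.connect("studies", "Course: Databases", "Student A")
store.connect("belongs_to", "College: Business", "Student A")
# The same student is also the owner of a set of email addresses.
store.connect("emails", "Student A", "a@example.org")
store.connect("emails", "Student A", "a.alt@example.org")

print(store.find_owner("studies", "Student A"))     # Course: Databases
print(store.find_owner("belongs_to", "Student A"))  # College: Business
print(store.find_members("emails", "Student A"))
```

The record "Student A" has two owners through two different sets, which is exactly what a strict tree cannot express.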
Programmatic access to network databases is traditionally by means of a navigational data manipulation language, in which programmers navigate from a current record to other related records using verbs such as find owner, find next, and find prior. The most common example of such an interface is the COBOL-based Data Manipulation Language defined by CODASYL.

Network databases are traditionally implemented by using chains of pointers between related records. These pointers can be node numbers or disk addresses.

The network model became popular because it provided considerable flexibility in modelling complex data relationships, and also offered high performance by virtue of the fact that the access verbs used by programmers mapped directly to pointer-following in the implementation.

The network model has an advantage over the hierarchical model in that it promotes greater flexibility and data accessibility, since records at a lower level can be accessed without accessing the records above them. This model is more efficient than the hierarchical model, easier to understand, and can be applied to many real world problems that require routine transactions. The disadvantages are that:
It is a complex process to design and develop a network database;
It has to be refined frequently;
It requires that the relationships among all the records be defined before development starts, and changes often demand major programming efforts;
Operation and maintenance of the network model is expensive and time consuming.

Examples of database engines that have network model capabilities are RDM Embedded and RDM Server.

(Figure 3 node labels: Department A; Department B; Student A; Student B; Student C; Project A; Project B.)

However, the model had several disadvantages. Network programming proved error-prone as data models became more complex, and small changes to the data structure could require changes to many programs.
Also, because of the use of physical pointers, operations such as database loading and restructuring could be very time-consuming.

Concept and History

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium. Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.

The chief argument in favour of the network model, in comparison to the hierarchical model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface. Until the early 1980s the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications, but as hardware became faster, the extra productivity and flexibility of the relational model led to the gradual obsolescence of the network model in corporate enterprise usage.

3.3 Object-Relational Database

An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data-types and methods.
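The idea of extending the data model with a custom data type can be imitated, in a very limited way, with the adapter and converter hooks of Python's standard sqlite3 module. This is only an analogy under stated assumptions: SQLite is not an object-relational DBMS, the Point type and the site table are invented for the sketch, and the type's method runs in the application rather than inside the database engine.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical user-defined type with one method."""
    x: float
    y: float

    def distance_from_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5


def adapt_point(point: Point) -> str:
    # Python object -> value SQLite can store.
    return f"{point.x};{point.y}"


def convert_point(raw: bytes) -> Point:
    # Stored bytes -> Python object, used for columns declared as "point".
    x, y = map(float, raw.split(b";"))
    return Point(x, y)


sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

# PARSE_DECLTYPES makes sqlite3 look at the declared column type.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE site (name TEXT, location point)")
conn.execute("INSERT INTO site VALUES (?, ?)", ("depot", Point(3.0, 4.0)))

row = conn.execute("SELECT name, location FROM site").fetchone()
print(row)                             # ('depot', Point(x=3.0, y=4.0))
print(row[1].distance_from_origin())   # 5.0
```

In a genuine ORDBMS the type and its methods would be known to the query language itself, so a query could filter or sort on them; here the database only sees an opaque stored value.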
One aim for this type of system is to bridge the gap between conceptual data modeling techniques such as the Entity-relationship diagram (ERD) and object-relational mapping (ORM), which often use classes and inheritance, and relational databases, which do not directly support them.

Another, related, aim is to bridge the gap between relational databases and the object-oriented modeling techniques used in programming languages such as Java, C++ or C#. However, a more popular alternative for achieving such a bridge is to use a standard relational database system with some form of ORM software.

Whereas traditional RDBMS or SQL-DBMS products focused on the efficient management of data drawn from a limited set of data-types (defined by the relevant language standards), an object-relational DBMS allows software developers to integrate their own types and the methods that apply to them into the DBMS. ORDBMS technology aims to allow developers to raise the level of abstraction at which they view the problem domain. This goal is not universally shared; proponents of relational databases often argue that object-oriented specification lowers the abstraction level.

An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language. At the other extreme are OODBMSs, in which the database is essentially a persistent object store for software written in an object-oriented programming language, with a programming API for storing and retrieving objects, and little or no specific support for querying.

Many SQL ORDBMSs on the market today are extensible with user-defined types (UDT) and custom-written functions (e.g. stored procedures). Some (e.g.
SQL Server) allow such functions to be written in object-oriented programming languages, but this by itself doesn't make them object-oriented databases; in an object-oriented database, object orientation is a feature of the data model.

3.4 Object Database

In an object database (also object oriented database), information is represented in the form of objects as used in object-oriented programming. When database capabilities are combined with object programming language capabilities, the result is an object database management system (ODBMS). An ODBMS makes database objects appear as programming language objects in one or more object programming languages. An ODBMS extends the programming language with transparently persistent data, concurrency control, data recovery, associative queries, and other capabilities.

Some object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk. Others have their own programming languages. ODBMSs use exactly the same model as object-oriented programming languages.

Object databases are generally recommended when there is a business need for high performance processing on complex data.

Adoption of Object Databases

Object databases based on persistent programming acquired a niche in application areas such as engineering and spatial databases, telecommunications, and scientific areas such as high energy physics and molecular biology. They have made little impact on mainstream commercial data processing, though there is some usage in specialized areas of financial services. It is also worth noting that object databases held the record for the world's largest database (being first to hold over 1000 Terabytes at Stanford Linear Accelerator Center, "Lessons Learned From Managing A Petabyte") and the highest ingest rate ever recorded for a commercial database at over one Terabyte per hour.
Another group of object databases focuses on embedded use in devices, packaged software, and realtime systems.

Advantages and Disadvantages

Benchmarks between ODBMSs and RDBMSs have shown that an ODBMS can be clearly superior for certain kinds of tasks. The main reason for this is that many operations are performed using navigational rather than declarative interfaces, and navigational access to data is usually implemented very efficiently by following pointers.

Critics of navigational database-based technologies like ODBMS suggest that pointer-based techniques are optimized for very specific "search routes" or viewpoints. However, for general-purpose queries on the same information, pointer-based techniques will tend to be slower and more difficult to formulate than relational. Thus, navigation appears to simplify specific known uses at the expense of general, unforeseen, and varied future uses. However, with suitable language support, direct object references may be maintained in addition to normalised, indexed aggregations, allowing both kinds of access; furthermore, a persistent language may index aggregations on whatever is returned by some arbitrary object access method, rather than only on attribute value, which can simplify some queries.

Other things that work against an ODBMS seem to be the lack of interoperability with a great number of tools/features that are taken for granted in the SQL world, including but not limited to industry standard connectivity, reporting tools, OLAP tools, and backup and recovery standards. Additionally, object databases lack a formal mathematical foundation, unlike the relational model, and this in turn leads to weaknesses in their query support. However, this objection is offset by the fact that some ODBMSs fully support SQL in addition to navigational access, e.g. Objectivity/SQL++, Matisse, and InterSystems Caché. Effective use may require compromises to keep both paradigms in sync.
In fact there is an intrinsic tension between the notion of encapsulation, which hides data and makes it available only through a published set of interface methods, and the assumption underlying much database technology, which is that data should be accessible to queries based on data content rather than predefined access paths. Database-centric thinking tends to view the world through a declarative and attribute-driven viewpoint, while OOP tends to view the world through a behavioral viewpoint, maintaining entity-identity independently of changing attributes. This is one of the many impedance mismatch issues surrounding OOP and databases.

Although some commentators have written off object database technology as a failure, the essential arguments in its favor remain valid, and attempts to integrate database functionality more closely into object programming languages continue in both the research and the industrial communities.

3.5 Associative Model of Data

The associative model of data is an alternative data model for database systems. Other data models, such as the relational model and the object data model, are record-based. These models involve encompassing attributes about a thing, such as a car, in a record structure. Such attributes might be registration, colour, make, model, etc. In the associative model, everything which has "discrete independent existence" is modeled as an entity, and relationships between them are modeled as associations. The granularity at which data is represented is similar to schemes presented by Chen (Entity-relationship model); Bracchi, Paolini and Pelagatti (Binary Relations); and Senko (The Entity Set Model).

3.6 Column-Oriented DBMS

A column-oriented DBMS is a database management system (DBMS) which stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items.

Benefits
Comparisons between row-oriented and column-oriented systems are typically concerned with the efficiency of hard-disk access for a given workload, as seek time is incredibly long compared to the other delays in computers. Further, because seek time is improving at a slow rate relative to CPU power (see Moore's Law), this focus will likely continue on systems reliant on hard disks for storage. Following is a set of over-simplified observations which attempt to paint a picture of the trade-offs between column and row oriented organizations.

1. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
2. Column-oriented systems are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
3. Row-oriented systems are more efficient when many columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
4. Row-oriented systems are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

In practice, row oriented architectures are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. Column stores are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve a smaller number of highly complex queries over all data (possibly terabytes).

Storage Efficiency vs. Random Access

Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column oriented data that are not available in row oriented data.
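The two layouts and the observations above can be sketched in memory. This is an illustration only, under stated assumptions: the sales data and all names are invented, Python lists stand in for disk blocks (so no seek cost is modelled), and run-length encoding is used as a deliberately simple stand-in for whatever compression scheme a real column store applies.

```python
from itertools import groupby

# Hypothetical data, invented for this sketch.
SALES = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "north", "amount": 80},
    {"id": 3, "region": "north", "amount": 50},
    {"id": 4, "region": "south", "amount": 200},
    {"id": 5, "region": "south", "amount": 30},
]
COLUMNS = ("id", "region", "amount")

# Row-oriented: each record's values are stored together.
row_store = [tuple(sale[c] for c in COLUMNS) for sale in SALES]
# Column-oriented: each column's values are stored together.
column_store = {c: [sale[c] for sale in SALES] for c in COLUMNS}


def total_amount_rows(store):
    """Aggregate over a row store: every whole record is visited."""
    position = COLUMNS.index("amount")
    return sum(record[position] for record in store)


def total_amount_columns(store):
    """Aggregate over a column store: only one column is read."""
    return sum(store["amount"])


def fetch_row_from_columns(store, position):
    """Rebuilding one row needs one lookup per column."""
    return tuple(store[c][position] for c in COLUMNS)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(total_amount_rows(row_store), total_amount_columns(column_store))  # 480 480
print(fetch_row_from_columns(column_store, 3))       # (4, 'south', 200)
print(run_length_encode(column_store["region"]))     # [('north', 3), ('south', 2)]
```

The aggregate touches a single list in the column layout, while reassembling one record costs a lookup in every column, which mirrors observations 1 and 3. The region column also shows why uniform, repetitive column data compresses well.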
For example, many popular modern compression schemes, such as LZW, make use of the similarity of adjacent data to compress. While the same techniques may be used on row-oriented data, a typical implementation will achieve less effective results. Further, this behavior becomes more dramatic when a large percentage of adjacent column data is either the same or not present, such as in a sparse column (similar to a sparse matrix). The opposing tradeoff is random access. Retrieving all data from a single row is more efficient when that data is located in a single location, such as in a row-oriented architecture. Further, the greater the adjacent compression achieved, the more difficult random access may become, as data might need to be uncompressed to be read.

Implementations

For many years, only the Sybase IQ product was commonly available in the column-oriented DBMS class. However, that has changed rapidly in the last few years with many open source and commercial implementations.

3.7 Navigational Database

Navigational databases are characterized by the fact that objects in the database are found primarily by following references from other objects. Traditionally navigational interfaces are procedural, though one could characterize some modern systems like XPath as being simultaneously navigational and declarative.

Navigational access is traditionally associated with the network model and hierarchical model of database interfaces, and has evolved into set-oriented systems. Navigational techniques use "pointers" and "paths" to navigate among data records (also known as "nodes"). This is in contrast to the relational model (implemented in relational databases), which strives to use "declarative" or logic programming techniques in which you ask the system for what you want instead of how to navigate to it.
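The contrast between the two styles can be sketched side by side. This is a minimal illustration under stated assumptions: the Item record type, the item table, and the colour values are invented, a chain of Python object references stands in for navigational pointers, and the standard sqlite3 module stands in for a relational engine.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """A record that points at the next record in the chain."""
    name: str
    colour: str
    next: Optional["Item"] = None


def find_navigational(first: Optional[Item], colour: str):
    """The program states HOW: start at a record and follow pointers."""
    found = []
    current = first
    while current is not None:
        if current.colour == colour:
            found.append(current.name)
        current = current.next
    return found


def find_declarative(conn: sqlite3.Connection, colour: str):
    """The program states WHAT: the engine decides how to locate the rows."""
    cursor = conn.execute(
        "SELECT name FROM item WHERE colour = ? ORDER BY name", (colour,)
    )
    return [name for (name,) in cursor]


# Hypothetical data, built twice: once as a pointer chain, once as a table.
third = Item("C", "green")
second = Item("B", "red", next=third)
first = Item("A", "green", next=second)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("A", "green"), ("B", "red"), ("C", "green")],
)

print(find_navigational(first, "green"))  # ['A', 'C']
print(find_declarative(conn, "green"))    # ['A', 'C']
```

Both calls return the same records, but only the navigational version depends on knowing an entry point and the order of the links.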
For example, to give directions to a house, the navigational approach would resemble something like, "Get on highway 25 for 8 miles, turn onto Horse Road, left at the red barn, then stop at the 3rd house down the road". The declarative approach, by contrast, would resemble, "Visit the green house(s) within the following coordinates...."

Hierarchical models are also considered navigational because one "goes" up (to parent), down (to leaves), and there are "paths", such as the familiar file/folder paths in hierarchical file systems. In general, navigational systems will use combinations of paths and prepositions such as "next", "previous", "first", "last", "up", "down", etc.

Some also suggest that navigational database engines are easier to build and take up less memory (RAM) than relational equivalents. However, the existence of relational or relational-based products of the late 1980s that possessed small engines (by today's standards) because they did not use SQL suggests this is not necessarily the case. Whatever the reason, navigational techniques are still the preferred way to handle smaller-scale structures.

A current example of navigational structuring can be found in the Document Object Model (DOM) often used in web browsers and closely associated with JavaScript. The DOM "engine" is essentially a light-weight navigational database. The World Wide Web itself and Wikipedia could even be considered forms of navigational databases. (On a large scale, the Web is a network model and on smaller or local scales, such as domain and URL partitioning, it uses hierarchies.)

3.8 Distributed Database

A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g.
in a database) can be distributed across multiple physical locations. A distributed database is distributed into separate partitions/fragments. Each partition/fragment of a distributed database may be replicated (i.e. redundant fail-overs, RAID like).

Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy, and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to pay to ensure data security, consistency and integrity.

Important considerations

Care with a distributed database must be taken to ensure the following:
The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, amongst other things.
Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.

Advantages of Distributed Databases

Reflects organizational structure: database fragments are located in the departments they relate to.
Local autonomy: a department can control the data about them (as they are the ones familiar with it).
Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.
Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers.
  (A high load on one module of the database won't affect other modules of the database in a distributed database.)
• Economics: it costs less to create a network of smaller computers with the power of a single large computer.
• Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).

Disadvantages of Distributed Databases

• Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
• Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
• Security: remote database fragments must be secured, and they are not centralized so the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
• Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
• Inexperience: distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
• Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
• Database design more complex: besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites and data replication.

3.8 Real Time Database

A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases containing persistent data, mostly unaffected by time.
For example, a stock market changes very rapidly and is dynamic. The graphs of the different markets appear to be very unstable and yet a database has to keep track of current values for all of the markets of the New York Stock Exchange (Kanitkar). Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away (Capron). Real-time databases are useful for accounting, banking, law, medical records, multi-media, process control, reservation systems, and scientific data analysis (Snodgrass). As computers increase in power and can store more data, they are integrating themselves into our society and are employed in many applications.

Overview

Real-time databases are traditional databases that use an extension to give the additional power to yield reliable responses. They use timing constraints that represent a certain range of values for which the data are valid. This range is called temporal validity. A conventional database cannot work under these circumstances because the inconsistencies between the real world objects and the data that represent them are too severe for simple modifications. An effective system needs to be able to handle time-sensitive queries, return only temporally valid data, and support priority scheduling. To enter the data in the records, often a sensor or an input device monitors the state of the physical system and updates the database with new information to reflect the physical system more accurately (Abbot).

When designing a real-time database system, one should consider how to represent valid time and how facts are associated with the real-time system. Also, consider how to represent attribute values in the database so that process transactions and data consistency have no violations (Abbot). When designing a system, it is important to consider what the system should do when deadlines are not met.
For example, an air-traffic control system constantly monitors hundreds of aircraft and makes decisions about incoming flight paths and determines the order in which aircraft should land based on data such as fuel, altitude, and speed. If any of this information is late, the result could be devastating (Sivasankaran). To address issues of obsolete data, the timestamp can support transactions by providing clear time references (Sivasankaran).

SQL DBMS

IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. Unfortunately, System R was conceived as a way of proving Codd's ideas unimplementable, and thus the project was delivered to a group of programmers who were not under Codd's supervision, never understood his ideas fully and ended up violating several fundamentals of the relational model. The first "quickie" version was ready in 1974/5, and work then started on multi-table systems in which the data could be broken down so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).

Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots to the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus, INGRES.
Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978.

Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is primarily used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).

In Sweden, Codd's paper was also read and Mimer SQL was developed from the mid-70s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented on most other DBMS.

4.0 CONCLUSION

The evolution of database models will continue until an ideal model emerges that meets all the requirements of end users. This sounds impossible because there can never be a system that is completely fault-free. Thus we will yet see more database models. The flat and hierarchical models set the tone for emerging models.

5.0 SUMMARY

• In a hierarchical model, data is organized into an inverted tree-like structure, implying a multiple downward link in each node to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
• In the network model, records can participate in any number of named relationships. Each relationship associates a record of one type (called the owner) with multiple records of another type (called the member).
• An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language.
• In an object database
  (also object oriented database), information is represented in the form of objects as used in object-oriented programming.
• The associative model of data is an alternative data model for database systems. Other data models, such as the relational model and the object data model, are record-based.
• A column-oriented DBMS is a database management system (DBMS) which stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items.
• Navigational databases are characterized by the fact that objects in the database are found primarily by following references from other objects.
• A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU.
• A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases containing persistent data, mostly unaffected by time.

6.0 TUTOR-MARKED ASSIGNMENT

1. Mention 5 models of databases.
2. Briefly discuss the advantages and disadvantages of distributed databases.

7.0 REFERENCES/FURTHER READINGS

Charles W. Bachman, The Programmer as Navigator. ACM Turing Award Lecture, Communications of the ACM, Volume 16, Issue 11, 1973, pp. 653-658, ISSN 0001-0782, doi: 10.1145/355611.362534.

Stonebraker, Michael with Moore, Dorothy. Object-Relational DBMSs: The Next Great Wave. Morgan Kaufmann Publishers, 1996. ISBN 1-55860-397-2.

There was, at the time, some dispute whether the term was coined by Michael Stonebraker of Illustra or Won Kim of UniSQL.

Kim, Won. Introduction to Object-Oriented Databases. The MIT Press, 1990. ISBN 0-262-11124-1.

Bancilhon, Francois; Delobel, Claude; and Kanellakis, Paris. Building an Object-Oriented Database System: The Story of O2. Morgan Kaufmann Publishers, 1992. ISBN 1-55860-169-4.
C-Store: A column-oriented DBMS, Stonebraker et al, Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.

Błażewicz, Jacek; Królikowski, Zbyszko; Morzy, Tadeusz (2003). Handbook on Data Management in Information Systems. Springer, pp. 18. ISBN 3540438939.

M. T. Ozsu and P. Valduriez, Principles of Distributed Databases (2nd edition), Prentice-Hall, ISBN 0-13-659707-6.

Federal Standard 1037C.

Elmasri and Navathe, Fundamentals of Database Systems (3rd edition), Addison-Wesley Longman, ISBN 0-201-54263-3.

Abbot, Robert K., and Hector Garcia-Molina. Scheduling Real-Time Transactions: a Performance Evaluation. Stanford University and Digital Equipment Corp. ACM, 1992. 13 Dec. 2006.

Buchmann, A. "Real Time Database Systems." Encyclopedia of Database Technologies and Applications. Ed. Laura C. Rivero, Jorge H. Doorn, and Viviana E. Ferraggine. Idea Group, 2005.

Stankovic, John A., Marco Spuri, Krithi Ramamritham, and Giorgio C. Buttazzo. Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Springer, 1998.

UNIT 5 DATABASE MODELS: RELATIONAL MODEL

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 The Model
    3.2 Interpretation
    3.3 Application to Databases
    3.4 Alternatives to the Relational Model
    3.5 History
    3.6 SQL and the Relational Model
    3.7 Implementation
    3.8 Controversies
    3.9 Design
    3.10 Set-Theoretic Formulation
    3.11 Key Constraints and Functional Dependencies
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar Codd.

Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite model (in the logical sense) of the database, i.e.
a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.

The purpose of the relational model is to provide a declarative method for specifying data and queries: we directly state what information the database contains and what information we want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for getting queries answered.

IBM implemented Codd's ideas with the DB2 database management system; it introduced the SQL data definition and query language. Other relational database management systems followed, most of them using SQL as well. A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, it must be noted that SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

• define relational model of database
• understand and explain the concept behind relational models
• answer the question of how to interpret a relational database model
• know the various applications of relational database
• compare relational model with the structured query language (SQL)
• know the constraints and controversies associated with relational database model.

Figure 1: Relational Structure

Department Table

Deptno    Dname    Dloc    Dmgr
Dept A
Dept B
Dept C

Employee Table

Empno    Ename    Etitle    Esalary    Deptno
Emp 1                                  Dept A
Emp 2                                  Dept B
Emp 3                                  Dept C
Emp 4                                  Dept D
Emp 5                                  Dept E
Emp 6                                  Dept F

3.0 MAIN CONTENT

3.1 The Model

The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations,
an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL). Some think two-valued logic is an important part of the relational model, where others think a system that uses a form of three-valued logic can still be considered relational.

Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power.

The relational model of data permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.

The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an unordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.

A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.

A relation is defined as a set of n-tuples.
In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the Cartesian product becomes commutative.

A table is an accepted visual representation of a relation; a tuple is similar to the concept of row, but note that in the database language SQL the columns and the rows of a table are ordered.

A relvar is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.

The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In accordance with this Principle, a relational database is a set of relvars and the result of every query is presented as a relation.

The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are candidate key (really, superkey) and foreign key constraints.
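
The definitions above (heading, body, tuple, superkey) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names Relation, make_tuple and is_superkey are hypothetical and come neither from the course text nor from any DBMS, and Python sets stand in for mathematical sets.

```python
from typing import Any, FrozenSet, Set, Tuple

# Hypothetical helper names for illustration only.
Attribute = Tuple[str, type]        # (attribute name, type)
Row = FrozenSet[Tuple[str, Any]]    # unordered set of (name, value) pairs


def make_tuple(**values: Any) -> Row:
    """Build a relational tuple: values are identified by attribute name only."""
    return frozenset(values.items())


class Relation:
    def __init__(self, heading: Set[Attribute]) -> None:
        self.heading = frozenset(heading)
        self.body: Set[Row] = set()  # a set, so duplicate tuples cannot exist

    def insert(self, row: Row) -> None:
        types = dict(self.heading)
        # The tuple's heading must match the relation's heading.
        if {name for name, _ in row} != set(types):
            raise ValueError("tuple heading does not match relation heading")
        for name, value in row:
            if not isinstance(value, types[name]):
                raise TypeError(f"{name!r} expects {types[name].__name__}")
        self.body.add(row)

    def is_superkey(self, key: Set[str]) -> bool:
        """True if no two tuples agree on every attribute in `key`."""
        seen = set()
        for row in self.body:
            projected = frozenset((n, v) for n, v in row if n in key)
            if projected in seen:
                return False
            seen.add(projected)
        return True


employee = Relation({("empno", int), ("ename", str), ("deptno", str)})
employee.insert(make_tuple(empno=1, ename="Ada", deptno="A"))
employee.insert(make_tuple(deptno="A", ename="Ada", empno=1))  # same tuple, other order
employee.insert(make_tuple(empno=2, ename="Bob", deptno="A"))

print(len(employee.body))                # two tuples: the repeated one collapsed
print(employee.is_superkey({"empno"}))   # empno identifies every tuple
print(employee.is_superkey({"deptno"}))  # deptno does not
```

Because the body is a set and each tuple is a set of name/value pairs, inserting the same values in a different attribute order adds nothing new, which mirrors the point that attribute order carries no meaning in the relational model.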
3.2 Interpretation

To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation.

The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate, this being the set of true propositions that can be formed by replacing each free variable in that predicate by a name (a term that designates something).

There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption.

For a formal exposition of these ideas, see the section Set-Theoretic Formulation, below.

3.3 Application to Databases

A type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system.

Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation).
A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35".

A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".

A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar (relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified in the table declaration and its body is that most recently assigned to it by invoking some update operator (typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in SQL it is possible for the same row to appear more than once in the same body.)

3.4 Alternatives to the Relational Model

Other models are the hierarchical model and network model.
Some systems using these older architectures are still in use today in data centers with high data volume needs or where existing systems are so complex and abstract it would be cost prohibitive to migrate to systems employing the relational model; also of note are newer object-oriented databases, even though many of them are DBMS-construction kits, rather than proper DBMSs.

A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling (ORM), RDF/Notation 3 (N3) and in Gellish English.

The relational model was the first formal database model. After it was defined, informal models were made to describe hierarchical databases (the hierarchical model) and network databases (the network model). Hierarchical and network databases existed before relational databases, but were only described as models after the relational model was defined, in order to establish a basis for comparison.

3.5 History

The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (first published in 1995) Date and Darwen show how the relational model can accommodate certain desired object-oriented features.

3.6 SQL and the Relational Model

SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features.

The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard and in particular do not allow some of these deviations.
Whereas NULL is nearly ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.

Duplicate Rows

The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation.

Anonymous Columns

A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable.

Duplicate Column Names

Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.

Column Order Significance

The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires that no significance be attached to any ordering of the attributes of a relation.

Views without CHECK OPTION

Updates to a view defined without CHECK OPTION can be accepted but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.

Columnless Tables Unrecognized

SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality one and zero) and they are needed to represent extensions of predicates that contain no free variables.

NULL

This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row.
The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly the comparison of NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic, but this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of the use of such logics to discover more distinct reasons and at least as many as 19 have been noted, which would require a 21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero, the average of the empty set is NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".

Concepts

SQL uses concepts "table", "column", "row" instead of "relvar", "attribute", "tuple". These are not merely differences in terminology. For example, a "table" may contain duplicate rows, whereas the same tuple cannot appear more than once in a relation.
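
Some of the deviations listed in this section can be observed directly. The following is a minimal sketch that assumes SQLite, through Python's standard sqlite3 module, is an acceptable stand-in for "an SQL DBMS"; other products may differ in detail. The table emp and its columns are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Duplicate rows: no key is declared, so SQL accepts the same row twice.
cur.execute("CREATE TABLE emp (ename TEXT, deptno TEXT)")
cur.executemany("INSERT INTO emp VALUES (?, ?)", [("Ada", "A"), ("Ada", "A")])
duplicate_rows = cur.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
distinct_rows = cur.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM emp)"
).fetchone()[0]

# Three-valued logic: NULL compared with itself is unknown, not true.
# sqlite3 hands the unknown result back to Python as None.
null_equals_null = cur.execute("SELECT NULL = NULL").fetchone()[0]

# Aggregates over an empty set come back as NULL.
empty_sum, empty_avg = cur.execute(
    "SELECT SUM(length(ename)), AVG(length(ename)) FROM emp WHERE deptno = 'Z'"
).fetchone()

# A row whose comparison is unknown is silently dropped by WHERE.
cur.execute("INSERT INTO emp VALUES ('Bob', NULL)")
matched = cur.execute(
    "SELECT COUNT(*) FROM emp WHERE deptno = deptno"
).fetchone()[0]

conn.close()

print(duplicate_rows, distinct_rows)  # stored rows versus distinct rows
print(null_equals_null)               # None stands for "unknown"
print(empty_sum, empty_avg)           # both None
print(matched)                        # Bob's row is not counted
```

In a relation the first table could hold only one tuple for Ada, and the condition deptno = deptno would be true for every tuple; the sketch shows where SQL behaves differently on both counts.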
3.7 Implementation

There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far. Rel is one of the more recent attempts to do this.

3.8 Controversies

Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing or NULL) version of it in order to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But these have never been implemented, presumably because of attending complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.

3.9 Design

Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design. This trades off transactional performance for space efficiency.

There are two commonly used systems of diagramming to aid in the visual representation of the relational model: the entity-relationship diagram (ERD), and the related IDEF diagram used in the IDEF1X method created by the U.S. Air Force based on ERDs.

The tree structure of data may enforce hierarchical model organization, with a parent-child relationship table.

3.10 Set-Theoretic Formulation

Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name" and we will usually use the variables r, s, t, ... and a, b, c to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.
Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:

Tuple
A tuple is a partial function from attribute names to atomic values.

Header
A header is a finite set of attribute names.

Projection
The projection of a tuple t on a finite set of attributes A is $t[A] = \{(a, v) : (a, v) \in t,\ a \in A\}$, that is, the restriction of t to the attribute names in A.

The next definition defines relation, which formalizes the contents of a table as it is defined in the relational model.

Relation
A relation is a tuple (H, B) with H, the header, and B, the body, a set of tuples that all have the domain H.

Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.

3.11 Key Constraints and Functional Dependencies

One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.

4.0 CONCLUSION

The evolution of the relational model of database and database management systems is significant in the history and development of database and database management systems. This concept, pioneered by Edgar Codd, brought an entirely new and much more efficient way of storing and retrieving data, especially for a large database. This concept emphasized the use of tables and then linking the tables through commands.
Most of today's database management systems implement the relational model.

5.0 SUMMARY

• The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar Codd.
• The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains.
• To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation.
• A type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on.
• Other models are the hierarchical model and network model. Some systems using these older architectures are still in use today in data centers.
• The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others.
• SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places.
• There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far.
• Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design.
• Basic notions in the relational model are relation names and attribute names.
• One of the simplest and most important types of relation constraints is the key constraint.

6.0 TUTOR-MARKED ASSIGNMENT

1. Briefly discuss Interpretation in Relational Model.
2. Mention 5 ways in which the relational model differs from SQL.

7.0 REFERENCES/FURTHER READINGS

"Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", E.F.
Codd, IBM Research Report, 1969.

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387. doi: 10.1145/362384.362685.

White, Colin. In the Beginning: An RDBMS History. Teradata Magazine Online. September 2004 edition. URL: http://www.teradata.com/t/page/127057.

Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition, Addison-Wesley Professional. ISBN 0-201-70928-7.

Date, C. J. (2003). Introduction to Database Systems. 8th edition, Addison-Wesley. ISBN 0-321-19784-4.

UNIT 6 BASIC COMPONENTS OF DBMS

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 Concurrency Controls
    3.2 Java Database Connectivity
    3.3 Query Optimizer
    3.4 Open Database Connectivity
    3.5 Data Dictionary
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

This unit discusses the basic components of any database. These components ensure proper control of data, access to data, and querying of data, and they provide the methods of accessing database management systems.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

• know the ACID rules that guide transactions
• know what concurrency control in databases is
• mention the different methods of concurrency control
• define and interpret the acronym JDBC
• identify the types of JDBC drivers
• define the query optimizer, and describe its applications and cost estimation

3.0 MAIN CONTENT

3.1 Concurrency Controls

In databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Concurrency Control in Databases
Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without the concurrency violating the data integrity of a database. Executed transactions should follow the ACID rules, as described below. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed), recoverable schedules are generated. It also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database.

Transaction ACID Rules

• Atomicity - Either the effects of all or none of its operations remain when a transaction is completed. In other words, to the outside world the transaction appears to be indivisible, atomic.
• Consistency - Every transaction must leave the database in a consistent state.
• Isolation - Transactions cannot interfere with each other. Providing isolation is the main goal of concurrency control.
• Durability - Successful transactions must persist through crashes.

Concurrency Control Mechanism

The main categories of concurrency control mechanisms are:

• Optimistic - Delay the synchronization for a transaction until its end without blocking (read, write) operations, and then abort transactions that violate desired synchronization rules.
• Pessimistic - Block operations of a transaction that would cause violation of synchronization rules.

There are several methods for concurrency control. Among them:

• Two-phase locking
• Strict two-phase locking
• Conservative two-phase locking
• Index locking
• Multiple granularity locking

A lock is a database system object associated with a database object (typically a data item) that prevents undesired (typically synchronization-rule-violating) operations of other transactions by blocking them. Database system operations check for lock existence, and halt when noticing a lock type that is intended to block them.
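The locking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DBMS implements locking: the `LockManager` class, the `transfer` function and the in-memory dictionary standing in for a database are all hypothetical names invented for this example, and acquiring locks in sorted order is an added convention to keep the sketch free of deadlock.

```python
import threading


class LockManager:
    """Hypothetical lock table: one exclusive lock per data item."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def lock_for(self, item):
        with self._guard:
            return self._locks.setdefault(item, threading.Lock())


def transfer(db, locks, src, dst, amount):
    """Move `amount` from src to dst as one lock-protected transaction."""
    # Growing phase: take every lock the transaction needs before touching data.
    # Sorted order is an assumption of this sketch, used to avoid deadlock.
    held = [locks.lock_for(item) for item in sorted({src, dst})]
    for lock in held:
        lock.acquire()  # blocks while another transaction holds the lock
    try:
        if db[src] < amount:
            return False  # nothing was changed, so nothing has to be undone
        db[src] -= amount
        db[dst] += amount
        return True
    finally:
        # Shrinking phase: locks are released only after all work is done.
        for lock in reversed(held):
            lock.release()


def run_demo(transfers_each_way=50):
    """Run opposing transfers concurrently and return the final balances."""
    db = {"A": 100, "B": 100}
    locks = LockManager()
    threads = [
        threading.Thread(target=transfer, args=(db, locks, src, dst, 1))
        for src, dst in [("A", "B"), ("B", "A")] * transfers_each_way
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db
```

Because every transaction acquires all of its locks before releasing any, the transfers behave as if they had run one after another, so the two balances end where they started however the threads are scheduled.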
There are also non-lock concurrency control methods, among them:

• Conflict (serializability, precedence) graph checking
• Timestamp ordering
• Commitment ordering

Also, optimistic concurrency control methods typically do not use locks.

Almost all currently implemented lock-based and non-lock-based concurrency control mechanisms guarantee schedules that are conflict serializable (unless relaxed forms of serializability are needed). However, there are many research texts encouraging view serializable schedules for possible gains in performance, especially when not too many conflicts exist (and not too many aborts of completely executed transactions occur), due to reducing the considerable overhead of blocking mechanisms.

Concurrency Control in Operating Systems

Operating systems, especially real-time operating systems, need to maintain the illusion that many tasks are all running at the same time. Such multitasking is fairly simple when all tasks are independent from each other. However, when several tasks try to use the same resource, or when tasks try to share information, it can lead to confusion and inconsistency. The task of concurrent computing is to solve that problem. Some solutions involve "locks" similar to the locks used in databases, but they risk causing problems of their own such as deadlock. Other solutions are lock-free and wait-free algorithms.

3.2 Java Database Connectivity

Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.

Overview

JDBC has been part of the Java Standard Edition since the release of JDK 1.1. The JDBC classes are contained in the Java package java.sql. Starting with version 3.0, JDBC has been developed under the Java Community Process.
JSR 54 specifies JDBC 3.0 (included in J2SE 1.4), JSR 114 specifies the JDBC Rowset additions, and JSR 221 is the specification of JDBC 4.0 (included in Java SE 6).

JDBC allows multiple implementations to exist and be used by the same application. The API provides a mechanism for dynamically loading the correct Java packages and registering them with the JDBC Driver Manager. The Driver Manager is used as a connection factory for creating JDBC connections.

JDBC connections support creating and executing statements. These may be update statements such as SQL's CREATE, INSERT, UPDATE and DELETE, or they may be query statements such as SELECT. Additionally, stored procedures may be invoked through a JDBC connection. JDBC represents statements using one of the following classes:

• Statement - the statement is sent to the database server each and every time.
• PreparedStatement - the statement is cached and then the execution path is predetermined on the database server, allowing it to be executed multiple times in an efficient manner.
• CallableStatement - used for executing stored procedures on the database.

Update statements such as INSERT, UPDATE and DELETE return an update count that indicates how many rows were affected in the database. These statements do not return any other information.

Query statements return a JDBC row result set. The row result set is used to walk over the result set. Individual columns in a row are retrieved either by name or by column number. There may be any number of rows in the result set. The row result set has metadata that describes the names of the columns and their types.

There is an extension to the basic JDBC API in the javax.sql package that allows for scrollable result sets and cursor support among other things.

JDBC Drivers

JDBC drivers are client-side adaptors (they are installed on the client machine, not on the server) that convert requests from Java programs to a protocol that the DBMS can understand.
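The statement kinds, update counts and result-set metadata described in the Overview above can be tried out without installing a JDBC driver. The sketch below is not JDBC: it uses Python's standard sqlite3 module as a stand-in, on the assumption that the analogous ideas (a parameterized statement, an update count, a result set with column metadata) carry over. The table name `staff` and its sample rows are invented for illustration.

```python
import sqlite3


def demo():
    """Return (update_count, column_names, rows) from an in-memory database."""
    conn = sqlite3.connect(":memory:")  # the connection, obtained from the driver
    cur = conn.cursor()

    # Update statements: CREATE and INSERT change the database.
    cur.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
    cur.executemany(
        "INSERT INTO staff (id, name) VALUES (?, ?)",  # parameterized statement
        [(1, "Ada"), (2, "Edgar")],
    )

    # An UPDATE reports only how many rows were affected.
    cur.execute("UPDATE staff SET name = ? WHERE id = ?", ("Edgar Codd", 2))
    update_count = cur.rowcount

    # A query returns a result set; its metadata names the columns.
    cur.execute("SELECT id, name FROM staff ORDER BY id")
    column_names = [column[0] for column in cur.description]
    rows = cur.fetchall()

    conn.close()
    return update_count, column_names, rows
```

In JDBC itself the corresponding steps would go through DriverManager, PreparedStatement and ResultSet; the placeholders (`?`) play the same role in both APIs.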
Types

There are commercial and free drivers available for most relational database servers. These drivers fall into one of the following types:

• Type 1, the JDBC-ODBC bridge
• Type 2, the Native-API driver
• Type 3, the network-protocol driver
• Type 4, the native-protocol driver
• Internal JDBC driver, a driver embedded with the JRE in Java-enabled SQL databases. It is used for Java stored procedures. This does not belong to the above classification, although it would likely be either a Type 2 or Type 4 driver (depending on whether the database itself is implemented in Java or not). An example of this is the KPRB driver supplied with Oracle RDBMS. "jdbc:default:connection" is a relatively standard way of making such a connection (at least Oracle and Apache Derby support it). The distinction here is that the JDBC client is actually running as part of the database being accessed, so access can be made directly rather than through network protocols.

Sources

• SQLSummit.com publishes a list of drivers, including JDBC drivers and vendors.
• Sun Microsystems provides a list of some JDBC drivers and vendors.
• Simba Technologies ships an SDK for building custom JDBC drivers for any custom/proprietary relational data source.
• DataDirect Technologies provides a comprehensive suite of fast Type 4 JDBC drivers for all major databases.
• IDS Software provides a Type 3 JDBC driver for concurrent access to all major databases. Supported features include resultset caching, SSL encryption, custom data source, and dbShield.
• i-net software provides fast Type 4 JDBC drivers for all major databases.
• OpenLink Software ships JDBC drivers for a variety of databases, including bridges to other data access mechanisms (e.g., ODBC, JDBC) which can provide more functionality than the targeted mechanism.
• JDBaccess is a Java persistence library for MySQL and Oracle which defines major database access operations in an easily usable API above JDBC.
• JNetDirect provides a suite of fully Sun J2EE certified high-performance JDBC drivers.
• HSQL is an RDBMS with a JDBC driver and is available under a BSD license.

3.3 Query Optimizer

The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient. Cost-based query optimizers assign an estimated "cost" to each possible query plan, and choose the plan with the smallest cost. Costs are used to estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required, the CPU requirements, and other factors determined from the data dictionary. The set of query plans examined is formed by examining the possible access paths (e.g. index scan, sequential scan) and join algorithms (e.g. sort-merge join, hash join, nested loops). The search space can become quite large depending on the complexity of the SQL query.

The query optimizer cannot be accessed directly by users. Instead, once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs.

Implementation

Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes, that is, nodes whose output is fed as input to the parent node.
For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an index scan or a sequential scan.

Cost Estimation

One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the selection factor of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms. This technique works well for estimation of selectivities of individual predicates. However, many queries have conjunctions of predicates such as select count(*) from R where R.make='Honda' and R.model='Accord'. Query predicates are often highly correlated (for example, model='Accord' implies make='Honda'), and it is very hard to estimate the selectivity of the conjunct in general. Poor cardinality estimates and uncaught correlation are among the main reasons why query optimizers pick poor query plans. This is one reason why a DBA should regularly update the database statistics, especially after major data loads/unloads.

3.4 Open Database Connectivity

In computing, Open Database Connectivity (ODBC) provides a standard software API method for using database management systems (DBMS). The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems.

Overview

The ODBC specification offers a procedural API for using SQL queries to access data.
An implementation of ODBC will contain one or more applications, a core ODBC "Driver Manager" library, and one or more "database drivers". The Driver Manager, independent of the applications and DBMS, acts as an "interpreter" between the applications and the database drivers, whereas the database drivers contain the DBMS-specific details. Thus a programmer can write applications that use standard types and features without concern for the specifics of each DBMS that the applications may encounter. Likewise, database driver implementors need only know how to attach to the core library. This makes ODBC modular.

To write ODBC code that exploits DBMS-specific features requires more advanced programming: an application must use introspection, calling ODBC metadata functions that return information about supported features, available types, syntax, limits, isolation levels, driver capabilities and more. Even when programmers use adaptive techniques, however, ODBC may not provide some advanced DBMS features. The ODBC 3.x API operates well with traditional SQL applications such as OLTP, but it has not evolved to support richer types introduced by SQL:1999 and SQL:2003.

ODBC provides the standard of ubiquitous data access because hundreds of ODBC drivers exist for a large variety of data sources. ODBC operates with a variety of operating systems, and drivers exist for non-relational data such as spreadsheets, text and XML files. Because ODBC dates back to 1992, it offers connectivity to a wider variety of data sources than other data-access APIs. More drivers exist for ODBC than drivers or providers exist for newer APIs such as OLE DB, JDBC, and ADO.NET.

Despite the benefits of ubiquitous connectivity and platform independence, systems designers may perceive ODBC as having certain drawbacks. Administering a large number of client machines can involve a diversity of drivers and DLLs. This complexity can increase system-administration overhead.
Large organizations with thousands of PCs have often turned to ODBC server technology (also known as "Multi-Tier ODBC Drivers") to simplify the administration problems.

Differences between drivers and driver maturity can also raise important issues. Newer ODBC drivers do not always have the stability of drivers already deployed for years. Years of testing and deployment mean a driver may contain fewer bugs.

Developers needing features or types not accessible with ODBC can use other SQL APIs. When not aiming for platform independence, developers can use proprietary APIs, whether DBMS-specific (such as Transact-SQL) or language-specific (for example, JDBC for Java applications).

Bridging configurations

JDBC-ODBC Bridges

A JDBC-ODBC bridge consists of a JDBC driver which employs an ODBC driver to connect to a target database. This driver translates JDBC method calls into ODBC function calls. Programmers usually use such a bridge when a particular database lacks a JDBC driver. Sun Microsystems included one such bridge in the JVM, but viewed it as a stop-gap measure while few JDBC drivers existed. Sun never intended its bridge for production environments, and generally recommends against its use. Independent data-access vendors now deliver JDBC-ODBC bridges which support current standards for both mechanisms, and which far outperform the JVM built-in.

ODBC-JDBC Bridges

An ODBC-JDBC bridge consists of an ODBC driver which uses the services of a JDBC driver to connect to a database. This driver translates ODBC function calls into JDBC method calls. Programmers usually use such a bridge when they lack an ODBC driver for a particular database but have access to a JDBC driver.

Implementations

ODBC implementations run on many operating systems, including Microsoft Windows, Unix, Linux, OS/2, OS/400, IBM i5/OS, and Mac OS X.
Hundreds of ODBC drivers exist, including drivers for Oracle, DB2, Microsoft SQL Server, Sybase, Pervasive SQL, IBM Lotus Domino, MySQL, PostgreSQL, and desktop database products such as FileMaker and Microsoft Access.

3.5 Data Dictionary

A data dictionary, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):

• a document describing a database or collection of databases
• an integral component of a DBMS that is required to determine its structure
• a piece of middleware that extends or supplants the native data dictionary of a DBMS

Data Dictionary Documentation

Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of various tables and fields in each database, plus additional details, like the type and length of each data element. There is no universal standard as to the level of detail in such a document, but it is primarily a distillation of metadata about database structure, not the data itself. A data dictionary document also may include further information describing how data elements are encoded. One of the advantages of well-designed data dictionary documentation is that it helps to establish consistency throughout a complex database, or across a large collection of federated databases.

Data Dictionary Middleware

In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary.
Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. Extensions to the data dictionary also can assist in query optimization against distributed databases.

Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex JOINs. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types.

4.0 CONCLUSION

The basic components of any database management system serve to ensure the availability of data as well as efficiency in accessing the data. They include mainly a data dictionary, query optimizers, and Java database connectivity.

5.0 SUMMARY

• In databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.
• Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database.
  It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.
• The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient.
• In computing, Open Database Connectivity (ODBC) provides a standard software API method for using database management systems (DBMS). The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems.
• A data dictionary, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format".
• In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary.

6.0 TUTOR-MARKED ASSIGNMENT

1. Define the transaction ACID rules.
2. List and define the types of JDBC driver.

7.0 REFERENCES/FURTHER READINGS

ACM, IBM Dictionary of Computing, 10th edition, 1993.

TechTarget, SearchSOA, What is a Data Dictionary?

AHIMA Practice Brief, Guidelines for Developing a Data Dictionary, Journal of AHIMA 77, no. 2 (February 2006): 64A-D.

U.S. Patent 4774661, Database management system with active data dictionary, 11/19/1985, AT&T.

U.S. Patent 4769772, Automated Query Optimization Method using both Global and Parallel Local Optimizations for Materialization access Planning for Distributed Databases, 02/28/1985, Honeywell Bull.

PHPLens, ADOdb Data Dictionary Library for PHP.

RADICORE, What is a Data Dictionary?

Base One International Corp., Base One Data Dictionary.

Chaudhuri, Surajit (1998). "An Overview of Query Optimization in Relational Systems".
Proceedings of the ACM Symposium on Principles of Database Systems: pages 34–43. doi: 10.1145/275487.275492.

Ioannidis, Yannis (March 1996). "Query optimization". ACM Computing Surveys 28 (1): 121–123. doi: 10.1145/234313.234367.

Selinger, Patricia, et al. (1979). "Access Path Selection in a Relational Database Management System". Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data: 23–34. doi: 10.1145/582095.582099.

Parkes, Clara H. (April 1996). "Power to the People", DBMS Magazine, Miller Freeman, Inc.

MODULE 2

Unit 1 Development and Design of Database
Unit 2 Structured Query Languages (SQL)
Unit 3 Database and Information Relational Systems
Unit 4 Database Administrator and Administration

UNIT 1 DEVELOPMENT AND DESIGN OF DATABASE

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 Database Development
        3.1.1 Data Planning and Database Design
    3.2 Design of Database
        3.2.1 Database Normalization
        3.2.2 History
    3.3 Normal Forms
    3.4 Denormalization
    3.5 Non-first normal form (NF² or N1NF)
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

Database design is the process of deciding how to organize data into record types and how the record types will relate to each other. The DBMS mirrors the organization's data structure and processes transactions efficiently.

Developing small, personal databases is relatively easy using microcomputer DBMS packages or wizards. However, developing a large database of complex data types can be a complex task. In many companies, developing and managing large corporate databases are the primary responsibility of the database administrator and database design analysts. They work with end users and systems analysts to model business processes and the data required. Then they determine:

1. What data definitions should be included in the databases?
2. What structures or relationships should exist among the data elements?

2.0 OBJECTIVES

At the end of this unit, you should be able to:

• understand the concept of data planning and database design
• know the steps in the development of databases
• identify the functions of each step of the design process
• define database normalization
• know the problems addressed by normalization
• define the normal forms, from the first to the sixth
• define and understand the term denormalization

3.0 MAIN CONTENT

3.1 Database Development

3.1.1 Data Planning and Database Design

As Figure 1 illustrates, database development may start with a top-down data planning process. Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise. Then they define the information needs of end users in a business process, such as the purchasing/receiving process that all businesses have.

Next, end users must identify the key data elements that are needed to perform the specific business activities. This frequently involves developing entity relationship diagrams (ERDs) that model the relationships among the many entities involved in the business processes. End users and database designers could use the ERDs available to identify what supplier and product data are required to activate their purchasing/receiving and other business processes using enterprise resource planning (ERP) or supply chain management (SCM) software.

Such users' views are a major part of a data modeling process, where the relationships between data elements are identified. Each data model defines the logical relationships among the data elements needed to support a basic business process.
For example, can a supplier provide more than one type of product to us? Can a customer have more than one type of product to use? Can a customer have more than one type of account with us? Can an employee have several pay rates or be assigned to several projects or workgroups?

Answering such questions will identify data relationships that have to be represented in a data model that supports a business process. These data models then serve as logical frameworks (called schemas and subschemas) on which to base the physical design of databases and the development of application programs to support the business processes of the organization. A schema is an overall logical view of the relationships among the data elements in a database, while the subschema is a logical view of the data relationships needed to support specific end user application programs that will access that database.

Remember that data models represent logical views of the data and relationships of the database. Physical database design takes a physical view of the data (also called the internal view) that describes how data are to be physically stored and accessed on the storage devices of a computer system. For example, Figure 2 illustrates these different views and the software interface of a bank database processing system. In this example, checking, saving and installment lending are the business processes whose data models are part of a banking services data model that serves as a logical data framework for all bank services.

3.2 Design of Database

3.2.1 Database Normalization

Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.
For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.

Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an automated teller machine), while less normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).

Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in Third Normal Form (3NF), for example, is consequently in Second Normal Form (2NF) as well; but the reverse is not necessarily the case.

Figure 1: Database Development Structure

1. Data Planning: develops a model of business processes.
   Output: enterprise models of business processes with storage documentation.
2. Requirement Specification: defines the information needs of end users in a business process.
   Output: description of user needs, which may be represented in natural language or using the tools of a particular design methodology.
3. Conceptual Design: expresses all information requirements in the form of a high-level model.
   Output: conceptual data model, often expressed as entity relationship models.
4. Logical Design: translates the conceptual models into the data model of a DBMS.
   Output: logical data models, e.g. relational, network, hierarchical, multidimensional, or object-oriented models.
5. Physical Design: determines the data structures and process methods.
   Output: physical data models, i.e. storage representation and access methods.

Note: Database development involves data planning and database design activities. Data models that support business processes are used to develop databases that meet the information needs of users.

Figure 2: Examples of the logical and physical database views and the software interface of a banking service information system.

• Applications: Checking, Savings, Installment Loan.
• Logical user views: the Checking and Savings Data Model and the Installment Loan Data Model, i.e. the data elements and relationships (the subschemas) needed for checking, savings, or installment loan processing.
• Banking Service Data Model: the data elements and relationships (the schema) needed to support all banking services.
• Software interface: the Database Management System provides access to the bank's databases.
• Physical data views: the Bank Databases, i.e. the organization and location of data on the storage media.

Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.

Problems addressed by normalization

An Update Anomaly. Employee 519 is shown as having different addresses on different records.
An Insertion Anomaly. Until the new faculty member is assigned to teach at least one course, his details cannot be recorded.
A Deletion Anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.

A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:

• The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (that is, if the employee's address is updated on some records but not others), then the table is left in an inconsistent state.
Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly.

Ideally, a relational database table should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is called normalization. Removal of redundancies of the tables will lead to several tables, with referential integrity restrictions between them.

Normalization typically involves decomposing an unnormalized table into two or more tables that, were they to be combined (joined), would convey exactly the same information as the original table.

Background to normalization: definitions

Functional Dependency: Attribute B has a functional dependency on attribute A, i.e.
A → B if, for each value of attribute A, there is exactly one value of attribute B. If a value of A repeats across tuples, then the corresponding value of B also repeats. In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several employees could live at the same address and therefore one Employee Address value could correspond to more than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.) An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain. For example, an employer may require certain employees to split their time between two locations, such as New York City and London, and therefore want to allow employees to have more than one Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID.

Trivial Functional Dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full Functional Dependency: An attribute is fully functionally dependent on a set of attributes X if it is
- functionally dependent on X, and
- not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive Dependency: A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.
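The functional dependency definitions above can be tried out on sample data. The following is a minimal sketch in Python, not part of the original course text: the helper names `holds_fd` and `is_full_fd`, the column names, and the sample rows are all hypothetical. Note that a functional dependency is a rule about the problem domain, so a check on sample rows can only show that a dependency is violated; it cannot prove that the dependency holds in general.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no non-empty proper subset of lhs suffices."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds_fd(rows, list(subset), rhs):
                return False
    return True


# Hypothetical sample of the "Employees' Skills" table.
employee_skills = [
    {"employee_id": 426, "employee_address": "87 Sycamore Grove", "skill": "Typing"},
    {"employee_id": 426, "employee_address": "87 Sycamore Grove", "skill": "Shorthand"},
    {"employee_id": 519, "employee_address": "94 Chestnut Street", "skill": "Public Speaking"},
    {"employee_id": 607, "employee_address": "94 Chestnut Street", "skill": "Typing"},
]

# Employee ID -> Employee Address holds in this sample.
print(holds_fd(employee_skills, ["employee_id"], ["employee_address"]))
# The reverse fails: two employees share one address.
print(holds_fd(employee_skills, ["employee_address"], ["employee_id"]))
# {Employee ID, Skill} -> Employee Address holds but is not a full dependency.
print(is_full_fd(employee_skills, ["employee_id", "skill"], ["employee_address"]))
```

The third check mirrors the full functional dependency example: the address already depends on Employee ID alone, so its dependency on the pair {Employee ID, Skill} is only partial.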
Multivalued Dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows; see Fagin (1977) in the references for a rigorous definition.

Join Dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkey values. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate Key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.

Non-Prime Attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary Key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

3.2.2 History

Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form:

There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition non-simple domains are replaced by "domains whose elements are atomic (non-decomposable) values."
(Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks)

In his paper, Edgar F. Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.

3.3 Normal Forms

The normal forms (abbrev.
NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

First normal form: A table is in first normal form (1NF) if and only if it represents a relation. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.

Second normal form: The criteria for second normal form (2NF) are:
- The table must be in 1NF.
- None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies.
For example, consider an "Employees' Skills" table whose attributes are Employee ID, Employee Name, and Skill; and suppose that the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Name depends on only one of those attributes (namely, Employee ID), the table is not in 2NF.

Put simply, a table is in 2NF if it is in 1NF and all fields are dependent on the whole of the primary key; equivalently, a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

Note that if none of a 1NF table's candidate keys are composite (i.e. every candidate key consists of just one attribute) then we can say immediately that the table is in 2NF.
Every column must state a fact about the entire key, and not about a subset of the key.
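The 2NF example above can be made concrete with a small experiment. This is an illustrative sketch only, using Python's standard sqlite3 module with an in-memory database; the table names, column names, and sample rows are assumptions made for the example. It first shows the update anomaly in the non-2NF "Employees' Skills" table, then decomposes the table so that Employee Name is stored once per employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Not in 2NF: employee_name depends only on employee_id,
# which is just part of the key (employee_id, skill).
conn.executescript("""
CREATE TABLE employee_skills_flat (
    employee_id   INTEGER NOT NULL,
    employee_name TEXT    NOT NULL,
    skill         TEXT    NOT NULL,
    PRIMARY KEY (employee_id, skill)
);
INSERT INTO employee_skills_flat VALUES
    (426, 'Ann Jones', 'Typing'),
    (426, 'Ann Jones', 'Shorthand'),
    (519, 'Bob Smith', 'Typing');
""")

# A careless update changes only one of employee 426's two rows.
conn.execute(
    "UPDATE employee_skills_flat SET employee_name = 'Ann Brown' "
    "WHERE employee_id = 426 AND skill = 'Typing'"
)
names_flat = conn.execute(
    "SELECT DISTINCT employee_name FROM employee_skills_flat "
    "WHERE employee_id = 426 ORDER BY employee_name"
).fetchall()
print(names_flat)  # two conflicting names for one employee: an update anomaly

# 2NF decomposition: the name is stored once, keyed by employee_id alone.
conn.executescript("""
CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL
);
CREATE TABLE employee_skills (
    employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
    skill       TEXT    NOT NULL,
    PRIMARY KEY (employee_id, skill)
);
INSERT INTO employees VALUES (426, 'Ann Jones'), (519, 'Bob Smith');
INSERT INTO employee_skills VALUES
    (426, 'Typing'), (426, 'Shorthand'), (519, 'Typing');
""")

# The same change now touches exactly one row, so it cannot be half-applied.
conn.execute("UPDATE employees SET employee_name = 'Ann Brown' WHERE employee_id = 426")
names_2nf = conn.execute(
    "SELECT DISTINCT e.employee_name "
    "FROM employees e JOIN employee_skills s ON s.employee_id = e.employee_id "
    "WHERE e.employee_id = 426"
).fetchall()
print(names_2nf)
```

Joining employees and employee_skills reproduces the information held in the original table, which is the property normalization is expected to preserve.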
Third Normal Form: The criteria for third normal form (3NF) are:
- The table must be in 2NF.
- Transitive dependencies must be eliminated. All attributes must rely only on the primary key.
So, if a database has a table with columns Student ID, Student, Company, and Company Phone Number, it is not in 3NF. This is because the phone number relies on the Company. So, for it to be in 3NF, there must be a second table with Company and Company Phone Number columns; the Phone Number column in the first table would be removed.

Fourth normal form: A table is in fourth normal form (4NF) if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey (that is, X is either a candidate key or a superset thereof). For example, if an entity can have two phone number values and, independently, two email address values, then these should not be kept in the same table.

Fifth normal form: The criteria for fifth normal form (5NF, also known as PJ/NF) are:
- The table must be in 4NF.
- There must be no non-trivial join dependencies that do not follow from the key constraints.
A 4NF table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.

Domain/key Normal Form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.

Sixth Normal Form: According to the definition by Christopher J. Date and others, who extended database theory to take account of temporal and other interval data, a table is in sixth normal form (6NF) if and only if it satisfies no non-trivial (in the formal sense) join dependencies at all, meaning that the fifth normal form is also satisfied. When referring to "join" in this context it should be noted that Date et al. additionally use generalized definitions of relational operators that also take account of interval data (e.g. from-date to-date) by conceptually breaking them down ("unpacking" them) into atomic units (e.g.
individual days), with defined rules for joining interval data, for instance.

3.4 Denormalization

Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. It has not been established whether denormalization itself provides any increase in performance, or whether the concurrent removal of data constraints is what increases the performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.

Denormalization is also used to improve performance on smaller computers, as in computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.
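As a rough illustration of the price-lookup case mentioned above, the sketch below builds a denormalized, read-only lookup table from two normalized tables, again using Python's standard sqlite3 module. The table names, column names, and data are hypothetical and chosen only for this example; it makes no claim about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables: the category name is stored once.
CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT    NOT NULL,
    price_cents  INTEGER NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES categories(category_id)
);
INSERT INTO categories VALUES (1, 'Dairy'), (2, 'Bakery');
INSERT INTO products VALUES
    (10, 'Milk 1L', 120, 1),
    (11, 'Butter',  250, 1),
    (20, 'Bread',   180, 2);

-- Denormalized copy for look-up only: the category name is repeated on every row.
CREATE TABLE price_lookup AS
SELECT p.product_id, p.product_name, p.price_cents, c.category_name
FROM products p JOIN categories c ON c.category_id = p.category_id;
""")

# The normalized design needs a join to answer the look-up.
normalized_row = conn.execute(
    "SELECT p.product_name, p.price_cents, c.category_name "
    "FROM products p JOIN categories c ON c.category_id = p.category_id "
    "WHERE p.product_id = ?",
    (11,),
).fetchone()

# The denormalized table answers the same question from a single table.
lookup_row = conn.execute(
    "SELECT product_name, price_cents, category_name "
    "FROM price_lookup WHERE product_id = ?",
    (11,),
).fetchone()

print(normalized_row == lookup_row)
```

The redundancy is tolerable here only because the lookup table is rebuilt from the normalized tables rather than updated in place, which matches the look-up-only usage described in the text.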
3.5 Non-first normal form (NF² or N1NF)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension is a (non-optimal) way of implementing hierarchies in relations. Some academics have dubbed this practitioner-developed method "First Ab-normal Form"; Codd defined a relational database as using relations, so any table not in 1NF could not be considered to be relational. Consider the following table:

Non-First Normal Form
Person  Favorite Colors
Bob     blue, red
Jane    green, yellow, red

Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table. To transform this NF² table into 1NF, an "unnest" operator is required which extends the relational algebra of the higher normal forms. The reverse operator is called "nest", which is not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse to "nest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).

4.0 CONCLUSION

In the design and development of database management systems, organizations may use one kind of DBMS for daily transactions, and then move the detail onto another computer that uses another DBMS better suited for inquiries and analysis. Overall systems design decisions are performed by database administrators. The three most common organizations are hierarchical, network and relational models. A DBMS may provide one, two or all three models in designing database management systems.

5.0 SUMMARY

- Database design is the process of deciding how to organize data into record types and how the record types will relate to each other.
- Database development may start with a top-down data planning process.
- Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise.
- Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.
- Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form.
- The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies.
- Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP).
- OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter.
- In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains".
6.0 TUTOR-MARKED ASSIGNMENT

1. Mention the 5 phases in the development of a database.
2. Identify the criteria for the second normal form (2NF).

7.0 REFERENCES/FURTHER READINGS

Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.
Date, C.J. "What First Normal Form Really Means" in Date on Database: Writings 2000-2006 (Springer-Verlag, 2006), p. 128.
Codd, E.F. "Is Your DBMS Really Relational?" Computerworld, October 14, 1985.
Coles, M. Sic Semper Null. 2007. SQL Server Central. Redgate Software.
Kent, William. "A Simple Guide to Five Normal Forms in Relational Database Theory", Communications of the ACM 26 (2), Feb. 1983, pp. 120-125.
Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
Codd, E.F. "Recent Investigations into Relational Data Base Systems." IBM Research Report RJ1385 (April 23rd, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
Fagin, Ronald (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases". ACM Transactions on Database Systems 2 (1): 267. doi:10.1145/320557.320571.
Date, Chris J.; Hugh Darwen, Nikos A. Lorentzos [January 2003]. "Chapter 10 Database Design, Section 10.4: Sixth Normal Form", Temporal Data and the Relational Model: A Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database Management. Oxford: Elsevier LTD, p. 176. ISBN 1558608559.
O'Brien A. James, (2003). (11th Edition). Introduction to Information Systems, McGraw-Hill.
Zimanyi, E. (June 2006).
"Temporal Aggregates and Temporal Universal Quantification in Standard SQL". ACM SIGMOD Record, volume 35, number 2. ACM.

UNIT 2 STRUCTURED QUERY LANGUAGE (SQL)

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Standardization
3.3 Scope and Extensions
3.4 Language Elements
3.5 Criticisms of SQL
3.6 Alternatives to SQL
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management. SQL is a standard interactive and programming language for querying and modifying data and managing databases. Although SQL is both an ANSI and an ISO standard, many database products support SQL with proprietary extensions to the standard language. The core of SQL is formed by a command language that allows the retrieval, insertion, updating, and deletion of data, and performing management and administrative functions. SQL also includes a Call Level Interface (SQL/CLI) for accessing and managing data and databases remotely.

The first version of SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL, was designed to manipulate and retrieve data stored in IBM's original relational database product, System R. The SQL language was later formally standardized by the American National Standards Institute (ANSI) in 1986. Subsequent versions of the SQL standard have been released as International Organization for Standardization (ISO) standards.
Originally designed as a declarative query and data manipulation language, variations of SQL have been created by SQL database management system (DBMS) vendors that add procedural constructs, control-of-flow statements, user-defined data types, and various other language extensions. With the release of the SQL:1999 standard, many such extensions were formally adopted as part of the SQL language via the SQL Persistent Stored Modules (SQL/PSM) portion of the standard.

Common criticisms of SQL include a perceived lack of cross-platform portability between vendors, inappropriate handling of missing data (nulls), and unnecessarily complex and occasionally ambiguous language grammar and semantics.

SQL
Paradigm: Multi-paradigm
Appeared in: 1974
Designed by: Donald D. Chamberlin and Raymond F. Boyce
Developer: IBM
Latest release: SQL:2006 (2006)
Typing discipline: static, strong
Major implementations: Many
Dialects: SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006
Influenced by: Datalog
Influenced: CQL, LINQ, Windows PowerShell
OS: Cross-platform

2.0 OBJECTIVES

At the end of this unit, you should be able to:
- define Structured Query Language (SQL)
- trace the history and development process of SQL
- know the scope and extension of SQL
- identify the vital indices of SQL
- know what the language elements are
- know some of the criticisms of SQL
- answer the question of alternatives to SQL.

3.0 MAIN CONTENT

3.1 History

During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, A Relational Model of Data for Large Shared Data Banks. Donald D. Chamberlin and Raymond F. Boyce of IBM subsequently created the Structured English Query Language (SEQUEL) to manipulate and manage data stored in System R.
The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company. The first non-commercial non-SQL RDBMS, Ingres, was developed in 1974 at the U.C. Berkeley. Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL.

In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce and developed their own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, CIA, and other government agencies. In the summer of 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat IBM's release of the System/38 RDBMS to market by a few weeks.

After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM began developing commercial products based on their System R prototype including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981, and 1983, respectively.

3.2 Standardization

SQL was adopted as a standard by ANSI in 1986 and ISO in 1987. In the original SQL standard, ANSI declared that the official pronunciation for SQL is "es queue el". However, many English-speaking database professionals still use the nonstandard pronunciation /ˈsiːkwəl/ (like the word "sequel"). SEQUEL was an earlier IBM database language, a predecessor to the SQL language.

Until 1996, the National Institute of Standards and Technology (NIST) data management standards program was tasked with certifying SQL DBMS compliance with the SQL standard. In 1996, however, the NIST data management standards program was dissolved, and vendors are now relied upon to self-certify their products for compliance.

The SQL standard has gone through a number of revisions, as shown below:

Year  Name      Alias             Comments
1986  SQL-86    SQL-87            First published by ANSI. Ratified by ISO in 1987.
1989  SQL-89    FIPS 127-1        Minor revision, adopted as FIPS 127-1.
1992  SQL-92    SQL2, FIPS 127-2  Major revision (ISO 9075), Entry Level SQL-92 adopted as FIPS 127-2.
1999  SQL:1999  SQL3              Added regular expression matching, recursive queries, triggers, support for procedural and control-of-flow statements, non-scalar types, and some object-oriented features.
2003  SQL:2003                    Introduced XML-related features, window functions, standardized sequences, and columns with auto-generated values (including identity-columns).
2006  SQL:2006                    ISO/IEC 9075-14:2006 defines ways in which SQL can be used in conjunction with XML. It defines ways of importing and storing XML data in an SQL database, manipulating it within the database and publishing both XML and conventional SQL-data in XML form. In addition, it provides facilities that permit applications to integrate into their SQL code the use of XQuery, the XML Query Language published by the World Wide Web Consortium (W3C), to concurrently access ordinary SQL-data and XML documents.

The SQL standard is not freely available. SQL:2003 and SQL:2006 may be purchased from ISO or ANSI. A late draft of SQL:2003 is freely available as a zip archive, however, from Whitemarsh Information Systems Corporation. The zip archive contains a number of PDF files that define the parts of the SQL:2003 specification.

3.3 Scope and Extensions

Procedural Extensions

SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative query language, not an imperative language such as C or BASIC. However, there are extensions to Standard SQL which add procedural programming language functionality, such as control-of-flow constructs.
These are:

Source              Common Name  Full Name
ANSI/ISO Standard   SQL/PSM      SQL/Persistent Stored Modules
IBM                 SQL PL       SQL Procedural Language (implements SQL/PSM)
Microsoft/Sybase    T-SQL        Transact-SQL
MySQL               SQL/PSM      SQL/Persistent Stored Module (as in ISO SQL:2003)
Oracle              PL/SQL       Procedural Language/SQL (based on Ada)
PostgreSQL          PL/pgSQL     Procedural Language/PostgreSQL Structured Query Language (based on Oracle PL/SQL)
PostgreSQL          PL/PSM       Procedural Language/Persistent Stored Modules (implements SQL/PSM)

In addition to the standard SQL/PSM extensions and proprietary SQL extensions, procedural and object-oriented programmability is available on many SQL platforms via DBMS integration with other languages. The SQL standard defines SQL/JRT extensions (SQL Routines and Types for the Java Programming Language) to support Java code in SQL databases. SQL Server 2005 uses the SQLCLR (SQL Server Common Language Runtime) to host managed .NET assemblies in the database, while prior versions of SQL Server were restricted to using unmanaged extended stored procedures which were primarily written in C. Other database platforms, like MySQL and Postgres, allow functions to be written in a wide variety of languages including Perl, Python, Tcl, and C.

Additional Extensions

SQL:2003 also defines several additional extensions to the standard to increase SQL functionality overall. These extensions include:

- The SQL/CLI, or Call-Level Interface, extension is defined in ISO/IEC 9075-3:2003. This extension defines common interfacing components (structures and procedures) that can be used to execute SQL statements from applications written in other programming languages. The SQL/CLI extension is defined in such a way that SQL statements and SQL/CLI procedure calls are treated as separate from the calling application's source code.
- The SQL/MED, or Management of External Data, extension is defined by ISO/IEC 9075-9:2003.
SQL/MED provides extensions to SQL that define foreign-data wrappers and datalink types to allow SQL to manage external data. External data is data that is accessible to, but not managed by, an SQL-based DBMS.
- The SQL/OLB, or Object Language Bindings, extension is defined by ISO/IEC 9075-10:2003. SQL/OLB defines the syntax and semantics of SQLJ, which is SQL embedded in Java. The standard also describes mechanisms to ensure binary portability of SQLJ applications, and specifies various Java packages and their contained classes.
- The SQL/Schemata, or Information and Definition Schemas, extension is defined by ISO/IEC 9075-11:2003. SQL/Schemata defines the Information Schema and Definition Schema, providing a common set of tools to make SQL databases and objects self-describing. These tools include the SQL object identifier, structure and integrity constraints, security and authorization specifications, features and packages of ISO/IEC 9075, support of features provided by SQL-based DBMS implementations, SQL-based DBMS implementation information and sizing items, and the values supported by the DBMS implementations.
- The SQL/JRT, or SQL Routines and Types for the Java Programming Language, extension is defined by ISO/IEC 9075-13:2003. SQL/JRT specifies the ability to invoke static Java methods as routines from within SQL applications. It also calls for the ability to use Java classes as SQL structured user-defined types.
- The SQL/XML, or XML-Related Specifications, extension is defined by ISO/IEC 9075-14:2003. SQL/XML specifies SQL-based extensions for using XML in conjunction with SQL. The XML data type is introduced, as well as several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in an SQL database.
- The SQL/PSM, or Persistent Stored Modules, extension is defined by ISO/IEC 9075-4:2003.
SQL/PSM standardizes procedural extensions for SQL, including flow of control, condition handling, statement condition signals and resignals, cursors and local variables, and assignment of expressions to variables and parameters. In addition, SQL/PSM formalizes declaration and maintenance of persistent database language routines (e.g., "stored procedures").

3.4 Language Elements

[Chart: several of the SQL language elements that compose a single statement.]

The SQL language is sub-divided into several language elements, including:

- Statements which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
- Queries which retrieve data based on specific criteria.
- Expressions which can produce either scalar values or tables consisting of columns and rows of data.
- Predicates which specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used to limit the effects of statements and queries, or to change program flow.
- Clauses, which are in some cases optional, constituent components of statements and queries.
- Whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
- SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.

Queries

The most common operation in SQL databases is the query, which is performed with the declarative SELECT keyword. SELECT retrieves data from a specified table, or multiple related tables, in a database. While often grouped with Data Manipulation Language (DML) statements, the standard SELECT query is considered separate from SQL DML, as it has no persistent effects on the data stored in a database.
Note that there are some platform-specific variations of SELECT that can persist their effects in a database, such as the SELECT INTO syntax that exists in some databases.

SQL queries allow the user to specify a description of the desired result set, but it is left to the devices of the database management system (DBMS) to plan, optimize, and perform the physical operations necessary to produce that result set in as efficient a manner as possible. An SQL query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used as a "wildcard" indicator to specify that all available columns of a table (or multiple tables) are to be returned. SELECT is the most complex statement in SQL, with several optional keywords and clauses, including:

- The FROM clause which indicates the source table or tables from which the data is to be retrieved. The FROM clause can include optional JOIN clauses to join related tables to one another based on user-specified criteria.
- The WHERE clause includes a comparison predicate, which is used to restrict the number of rows returned by the query. The WHERE clause is applied before the GROUP BY clause. The WHERE clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
- The GROUP BY clause is used to combine, or group, rows with related values into elements of a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregate functions or to eliminate duplicate rows from a result set.
- The HAVING clause includes a comparison predicate used to eliminate rows after the GROUP BY clause is applied to the result set. Because it acts on the results of the GROUP BY clause, aggregate functions can be used in the HAVING clause predicate.
- The ORDER BY clause is used to identify which columns are used to sort the resulting data, and in which order they should be sorted (options are ascending or descending).
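The clauses listed above can be seen working together in one query. The following is a small sketch, run through Python's standard sqlite3 module so that it is self-contained; the sales table, its columns, and its rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    region  TEXT    NOT NULL,
    amount  INTEGER NOT NULL
);
INSERT INTO sales (region, amount) VALUES
    ('North', 100), ('North', 250),
    ('South', 40),  ('South', 30),
    ('East', 500),  ('East', 5);
""")

query = """
SELECT region, SUM(amount) AS total   -- columns of the result set
FROM sales                            -- source table
WHERE amount >= 10                    -- filters individual rows before grouping
GROUP BY region                       -- one result row per region
HAVING SUM(amount) > 100              -- filters whole groups after grouping
ORDER BY total DESC                   -- makes the row order well defined
"""
result = conn.execute(query).fetchall()
print(result)  # [('East', 500), ('North', 350)]
```

WHERE discards the single sale of 5 before any totals are computed, HAVING then discards the South group because its total of 70 is not above 100, and ORDER BY sorts the two remaining groups by their totals.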
The order of rows returned by an SQL query is never guaranteed unless an ORDER BY clause is specified.

Data Definition

The second group of keywords is the Data Definition Language (DDL). DDL allows the user to define new tables and associated elements. Most commercial SQL databases have proprietary extensions in their DDL, which allow control over nonstandard features of the database system. The most basic items of DDL are the CREATE, ALTER, RENAME, TRUNCATE and DROP statements:

- CREATE causes an object (a table, for example) to be created within the database.
- DROP causes an existing object within the database to be deleted, usually irretrievably.
- TRUNCATE deletes all data from a table (a non-standard, but common, SQL statement).
- The ALTER statement permits the user to modify an existing object in various ways, for example by adding a column to an existing table.

Data Control

The third group of SQL keywords is the Data Control Language (DCL). DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database. Its two main keywords are:

- GRANT authorizes one or more users to perform an operation or a set of operations on an object.
- REVOKE removes or restricts the capability of a user to perform an operation or a set of operations.

3.5 Criticisms of SQL

Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but violated, the relational model for database management and its tuple calculus realization. Recent extensions to SQL achieved relational completeness, but have worsened the violations, as documented in The Third Manifesto.

In addition, there are also some criticisms about the practical use of SQL:

- Implementations are inconsistent and, usually, incompatible between vendors.
- In particular, date and time syntax, string concatenation, nulls, and comparison case sensitivity often vary from vendor to vendor.
- The language makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted. SQL 1992 introduced the CROSS JOIN keyword, which allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax.
- It is also possible to misconstruct a WHERE clause on an UPDATE or DELETE, thereby affecting more rows in a table than desired.
- The grammar of SQL is perhaps unnecessarily complex, borrowing a COBOL-like keyword approach, when a function-influenced syntax could result in more re-use of fewer grammar and syntax rules. This is perhaps due to IBM's early goal of making the language more English-like so that it is more approachable to those without a mathematical or programming background. (Predecessors to SQL were more mathematical.)

Reasons for Lack of Portability

Popular implementations of SQL commonly omit support for basic features of Standard SQL, such as the DATE or TIME data types, preferring variations of their own. As a result, SQL code can rarely be ported between database systems without modifications.

There are several reasons for this lack of portability between database systems:

- The complexity and size of the SQL standard mean that most databases do not implement the entire standard.
- The standard does not specify database behavior in several important areas (e.g. indexes, file storage...), leaving it up to implementations of the database to decide how to behave.
- The SQL standard precisely specifies the syntax that a conforming database system must implement.
- The standard's specification of the semantics of language constructs, however, is less well defined, leading to areas of ambiguity.
- Many database vendors have large existing customer bases; where the SQL standard conflicts with the prior behavior of the vendor's database, the vendor may be unwilling to break backward compatibility.

3.6 Alternatives to SQL

A distinction should be made between alternatives to relational query languages and alternatives to SQL. The languages listed below are proposed alternatives to SQL, but are still (nominally) relational. (For alternatives to the relational model itself, see navigational databases.)

- IBM Business System 12 (IBM BS12)
- Tutorial D
- Hibernate Query Language (HQL) - a Java-based tool that uses modified SQL
- Quel, introduced in 1974 by the U.C. Berkeley Ingres project
- Object Query Language
- Datalog
- .QL - object-oriented Datalog
- LINQ
- QLC - Query Interface to Mnesia, ETS, Dets, etc. (Erlang programming language)
- 4D Query Language (4D QL)
- QBE (Query By Example), created by Moshe Zloof, IBM 1977
- Aldat Relational Algebra and Domain algebra

4.0 CONCLUSION

The Structured Query Language (SQL) has become the dominant standard language for working with database management systems. It differs from conventional programming languages because it is not necessarily procedural. An SQL statement is not really a command to the computer; rather, it is a description of some of the data contained in a database. SQL is not procedural because it does not give step-by-step commands to the computer or database. It describes data and sometimes instructs the database to do something with the data. Irrespective of this, SQL has its own criticisms.

5.0 SUMMARY

- SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management.
- During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, "A Relational Model of Data for Large Shared Data Banks".
- SQL was adopted as a standard by ANSI in 1986 and ISO in 1987. In the original SQL standard, ANSI declared that the official pronunciation for SQL is "es queue el".
- SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative query language, not an imperative language such as C or BASIC.
- This chart shows several of the SQL language elements that compose a single statement.
- Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but violated, the relational model for database management and its tuple calculus realization.
- A distinction should be made between alternatives to relational query languages and alternatives to SQL.

6.0 TUTOR-MARKED ASSIGNMENT

List and discuss the subdivisions of the Structured Query Language.

7.0 REFERENCES/FURTHER READINGS

Chapple, Mike. "SQL Fundamentals" (HTML). About.com: Databases. About.com.

"Structured Query Language (SQL)" (HTML). International Business Machines (October 27, 2006).

Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (No. 6): pp. 377-387. Association for Computing Machinery. doi: 10.1145/362384.362685.

Chamberlin, Donald D.; Boyce, Raymond F. (1974). "SEQUEL: A Structured English Query Language". Proceedings of the 1974 ACM SIGFIDET Workshop on Data Description, Access and Control: pp. 249-264. Association for Computing Machinery.
Oppel, Andy (March 1, 2004). Databases Demystified. San Francisco, CA: McGraw-Hill Osborne Media, pp. 90-91. ISBN 0-07-225364-9.

"History of IBM, 1978" (HTML). IBM Archives. IBM.

Chapple, Mike (?). "SQL Fundamentals" (HTML). About.com. About.com, A New York Times Company. Retrieved on 2007-08-30.

Melton, Jim; Alan R Simon (1993). Understanding the New SQL: A Complete Guide. Morgan Kaufmann, 536. ISBN: 1558602453. [chapter 1.2 What is SQL? SQL (correctly pronounced "ess cue ell," instead of the somewhat common "sequel"), is a...]

"Understand SQL". www.faqs.org/docs/.

Doll, Shelley (June 19, 2002). "Is SQL a Standard Anymore?" (HTML). TechRepublic's Builder.com. TechRepublic. Retrieved on 2007-06-09.

ISO/IEC 9075-11:2003: Information and Definition Schemas (SQL/Schemata), 2003, p. 1.

ANSI/ISO/IEC International Standard (IS). Database Language SQL - Part 2: Foundation (SQL/Foundation). 1999.

"INTO Clause (Transact-SQL)" (HTML). SQL Server 2005 Books Online. Microsoft (2007). Retrieved on 2007-06-17.

M. Negri, G. Pelagatti, L. Sbattella (1989). Semantics and problems of universal quantification in SQL.

Claudio Fratarcangeli (1991). Technique for universal quantification in SQL.

Jalal Kawash. Complex quantification in Structured Query Language (SQL): a Tutorial Using Relational Calculus. Journal of Computers in Mathematics and Science Teaching, ISSN 0731-9258, Volume 23, Issue 2, 2004. AACE, Norfolk, VA.

UNIT 3 DATABASE AND INFORMATION SYSTEMS SECURITY

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Basic Principles
3.2 Database Security
3.3 Relational DBMS Security
3.4 Proposed OODBMS Security Models
3.5 Security Classification for Information
3.6 Cryptography
3.7 Disaster Recovery Planning
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. Thus data security helps to ensure privacy. It also helps in protecting personal data.

Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. The terms information security, computer security and information assurance are frequently used interchangeably. These fields are interrelated and share the common goals of protecting the confidentiality, integrity and availability of information; however, there are some subtle differences between them. These differences lie primarily in the approach to the subject, the methodologies used, and the areas of concentration. Information security is concerned with the confidentiality, integrity and availability of data regardless of the form the data may take: electronic, print, or other forms.

Governments, military, financial institutions, hospitals, and private businesses amass a great deal of confidential information about their employees, customers, products, research, and financial status. Most of this information is now collected, processed and stored on electronic computers and transmitted across networks to other computers. Should confidential information about a business's customers, finances or new product line fall into the hands of a competitor, such a breach of security could lead to lost business, lawsuits or even bankruptcy of the business. Protecting confidential information is a business requirement, and in many cases also an ethical and legal requirement. For the individual, information security has a significant effect on privacy, which is viewed very differently in different cultures.
The field of information security has grown and evolved significantly in recent years. As a career choice there are many ways of gaining entry into the field. It offers many areas for specialization, including Information Systems Auditing, Business Continuity Planning and Digital Forensics Science, to name a few.

2.0 OBJECTIVES

At the end of the unit, you should be able to:

- understand the concepts of the CIA Triad in respect of information systems security
- know the components of Donn Parker's alternative model to the classic Triad
- identify the different types of information access control and how they differ from each other
- differentiate Discretionary and Mandatory Access Control Policies
- know the Proposed OODBMS Security Models
- differentiate between the OODBMS models
- define appropriate procedures and protection requirements for information security
- define cryptography and know its applications in data security.

3.0 MAIN CONTENT

3.1 Basic Principles

3.1.1 Key Concepts

For over twenty years information security has held that confidentiality, integrity and availability (known as the CIA Triad) are the core principles of information system security.

Confidentiality

Confidentiality is the property of preventing disclosure of information to unauthorized individuals or systems. For example, a credit card transaction on the Internet requires the credit card number to be transmitted from the buyer to the merchant and from the merchant to a transaction processing network. The system attempts to enforce confidentiality by encrypting the card number during transmission, by limiting the places where it might appear (in databases, log files, backups, printed receipts, and so on), and by restricting access to the places where it is stored. If an unauthorized party obtains the card number in any way, a breach of confidentiality has occurred.

Breaches of confidentiality take many forms.
Permitting someone to look over your shoulder at your computer screen while you have confidential data displayed on it could be a breach of confidentiality. If a laptop computer containing sensitive information about a company's employees is stolen or sold, it could result in a breach of confidentiality. Giving out confidential information over the telephone is a breach of confidentiality if the caller is not authorized to have the information.

Confidentiality is necessary (but not sufficient) for maintaining the privacy of the people whose personal information a system holds.

Integrity

In information security, integrity means that data cannot be modified without authorization. (This is not the same thing as referential integrity in databases.) Integrity is violated when an employee (accidentally or with malicious intent) deletes important data files, when a computer virus infects a computer, when an employee is able to modify his own salary in a payroll database, when an unauthorized user vandalizes a web site, when someone is able to cast a very large number of votes in an online poll, and so on.

Availability

For any information system to serve its purpose, the information must be available when it is needed. This means that the computing systems used to store and process the information, the security controls used to protect it, and the communication channels used to access it must be functioning correctly. High availability systems aim to remain available at all times, preventing service disruptions due to power outages, hardware failures, and system upgrades. Ensuring availability also involves preventing denial-of-service attacks.

In 2002, Donn Parker proposed an alternative model for the classic CIA triad that he called the six atomic elements of information. The elements are confidentiality, possession, integrity, authenticity, availability, and utility. The merits of the Parkerian hexad are a subject of debate amongst security professionals.
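The integrity property described above is often supported in practice by a keyed message digest, which lets a system detect (though not prevent) unauthorized modification. The sketch below is one possible illustration using Python's standard hmac and hashlib modules; the key, the record contents and the helper names make_tag and is_unmodified are hypothetical and are not taken from the course text.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Return an HMAC-SHA256 tag for data under the given secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unmodified(key: bytes, data: bytes, tag: str) -> bool:
    """Recompute the tag and compare it with the stored one in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


key = b"demo-key-not-for-production"    # hypothetical secret key
record = b"employee=42;salary=50000"     # hypothetical payroll record
tag = make_tag(key, record)              # stored alongside the record

tampered = b"employee=42;salary=90000"   # the salary was changed afterwards
print(is_unmodified(key, record, tag))    # True
print(is_unmodified(key, tampered, tag))  # False
```

Only a party holding the key can produce a matching tag, so a changed salary figure no longer verifies. Protecting the key itself then becomes part of the security problem.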
3.1.2 Authenticity

In computing, e-Business and information security it is necessary to ensure that the data, transactions, communications or documents (electronic or physical) are genuine (i.e. they have not been forged or fabricated).

3.1.3 Non-Repudiation

In law, non-repudiation implies one's intention to fulfill one's obligations under a contract. It also implies that one party to a transaction cannot deny having received a transaction, nor can the other party deny having sent a transaction.

Electronic commerce uses technology such as digital signatures and encryption to establish authenticity and non-repudiation.

3.1.4 Risk Management

[Figure: security awareness poster, "Security is everyone's responsibility". U.S. Department of Commerce/Office of Security.]

A comprehensive treatment of the topic of risk management is beyond the scope of this unit. We will, however, provide a useful definition of risk management, outline a commonly used process for risk management, and define some basic terminology.

The CISA Review Manual 2006 provides the following definition of risk management: "Risk management is the process of identifying vulnerabilities and threats to the information resources used by an organization in achieving business objectives, and deciding what countermeasures, if any, to take in reducing risk to an acceptable level, based on the value of the information resource to the organization."

There are two things in this definition that may need some clarification. First, the process of risk management is an ongoing iterative process. It must be repeated indefinitely. The business environment is constantly changing and new threats and vulnerabilities emerge every day. Second, the choice of countermeasures (controls) used to manage risks must strike a balance between productivity, cost, effectiveness of the countermeasure, and the value of the informational asset being protected.
Risk is the likelihood that something bad will happen that causes harm to an informational asset (or the loss of the asset). A vulnerability is a weakness that could be used to endanger or cause harm to an informational asset. A threat is anything (man-made or act of nature) that has the potential to cause harm.

The likelihood that a threat will use a vulnerability to cause harm creates a risk. When a threat does use a vulnerability to inflict harm, it has an impact. In the context of information security, the impact is a loss of availability, integrity, and confidentiality, and possibly other losses (lost income, loss of life, loss of real property). It should be pointed out that it is not possible to identify all risks, nor is it possible to eliminate all risk. The remaining risk is called residual risk.

A risk assessment is carried out by a team of people who have knowledge of specific areas of the business. Membership of the team may vary over time as different parts of the business are assessed. The assessment may use a subjective qualitative analysis based on informed opinion or, where reliable dollar figures and historical information are available, a quantitative analysis.

3.1.5 Controls

When Management chooses to mitigate a risk, they will do so by implementing one or more of three different types of controls.

Administrative

Administrative controls (also called procedural controls) consist of approved written policies, procedures, standards and guidelines. Administrative controls form the framework for running the business and managing people. They inform people on how the business is to be run and how day-to-day operations are to be conducted. Laws and regulations created by government bodies are also a type of administrative control because they inform the business.
Some industry sectors have policies, procedures, standards and guidelines that must be followed; the Payment Card Industry (PCI) Data Security Standard required by Visa and Master Card is one such example. Other examples of administrative controls include the corporate security policy, password policy, hiring policies, and disciplinary policies.

Administrative controls form the basis for the selection and implementation of logical and physical controls. Logical and physical controls are manifestations of administrative controls. Administrative controls are of paramount importance.

Logical

Logical controls (also called technical controls) use software and data to monitor and control access to information and computing systems. For example: passwords, network and host based firewalls, network intrusion detection systems, access control lists, and data encryption are logical controls.

An important logical control that is frequently overlooked is the principle of least privilege. The principle of least privilege requires that an individual, program or system process is not granted any more access privileges than are necessary to perform the task. A blatant example of the failure to adhere to the principle of least privilege is logging into Windows as user Administrator to read email and surf the Web. Violations of this principle can also occur when an individual collects additional access privileges over time. This happens when employees' job duties change, or they are promoted to a new position, or they transfer to another department. The access privileges required by their new duties are frequently added onto their already existing access privileges, which may no longer be necessary or appropriate.

Physical

Physical controls monitor and control the environment of the work place and computing facilities. They also monitor and control access to and from such facilities.
For example: doors, locks, heating and air conditioning, smoke and fire alarms, fire suppression systems, cameras, barricades, fencing, security guards, cable locks, etc. Separating the network and work place into functional areas is also a physical control.

An important physical control that is frequently overlooked is the separation of duties. Separation of duties ensures that an individual cannot complete a critical task alone. For example: an employee who submits a request for reimbursement should not also be able to authorize payment or print the check. An applications programmer should not also be the server administrator or the database administrator; these roles and responsibilities must be separated from one another.

3.2 Database Security

Database security is primarily concerned with the secrecy of data. Secrecy means protecting a database from unauthorized access by users and software applications.

Secrecy, in the context of database security, includes a variety of threats incurred through unauthorized access. These threats range from the intentional theft or destruction of data to the acquisition of information through more subtle measures, such as inference. There are three generally accepted categories of secrecy-related problems in database systems:

1. The improper release of information from reading data that was intentionally or accidentally accessed by unauthorized users. Securing databases from unauthorized access is more difficult than controlling access to files managed by operating systems. This problem arises from the finer granularity that is used by databases when handling files, attributes, and values. This type of problem also includes the violations of secrecy that result from the problem of inference, which is the deduction of unauthorized information from the observation of authorized information. Inference is one of the most difficult factors to control in any attempt to secure data.
Because the information in a database is semantically related, it is possible to determine the value of an attribute without accessing it directly. Inference problems are most serious in statistical databases, where users can trace back information on individual entities from the statistical aggregated data.

2. The improper modification of data. This threat includes violations of the security of data through mishandling and modifications by unauthorized users. These violations can result from errors, viruses, sabotage, or failures in the data that arise from access by unauthorized users.

3. Denial-of-service threats. Actions that could prevent users from using system resources or accessing data are among the most serious. This threat has been demonstrated to a significant degree recently with the SYN flooding attacks against network service providers.

Discretionary vs. Mandatory Access Control Policies

Both traditional relational database management system (RDBMS) security models and OO database models make use of two general types of access control policies to protect the information in multilevel systems. The first of these policies is the discretionary policy. In the discretionary access control (DAC) policy, access is restricted based on the authorizations granted to the user.

The mandatory access control (MAC) policy secures information by assigning sensitivity levels, or labels, to data entities. MAC policies are generally more secure than DAC policies and they are used in systems in which security is critical, such as military applications. However, the price that is usually paid for this tightened security is reduced performance of the database management system. Most MAC policies also incorporate DAC measures.

3.3 Relational DBMS Security

The principal methods of security in traditional RDBMSs are through the appropriate use and manipulation of views and the Structured Query Language (SQL) GRANT and REVOKE statements.
These measures are reasonably effective because of their mathematical foundation in relational algebra and relational calculus.

3.3.1 View-Based Access Control

Views allow the database to be conceptually divided into pieces in ways that allow sensitive data to be hidden from unauthorized users. In the relational model, views provide a powerful mechanism for specifying data-dependent authorizations for data retrieval.

Although the individual user who creates a view is the owner and is entitled to drop the view, he or she may not be authorized to execute all privileges on it. The authorizations that the owner may exercise depend on the view semantics and on the authorizations that the owner is allowed to implement on the tables directly accessed by the view. For the owner to exercise a specific authorization on a view that he or she creates, the owner must possess the same authorization on all tables that the view uses. The privileges the owner possesses on the view are determined at the time of view definition. Each privilege the owner possesses on the tables is defined for the view. If, later on, the owner receives additional privileges on the tables used by the view, these additional privileges will not be passed on to the view. In order to use the new privileges within a view, the owner will need to create a new view.

The biggest problem with view-based mandatory access controls is that it is impractical to verify that the software performs the view interpretation and processing correctly. If the correct authorizations are to be assured, the system must contain some type of mechanism to verify the classification of the sensitivity of the information in the database. The classification must be done automatically, and the software that handles the classification must be trusted. However, any trusted software for the automatic classification process would be extremely complex. Furthermore, attempting to use a query language such as SQL to specify classifications quickly becomes convoluted and complex.
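The basic idea of hiding sensitive data behind a view, described at the start of this subsection, can be sketched with Python's built-in sqlite3 module. The table employee, the view sales_directory and all the rows are hypothetical. Note also that SQLite has no user accounts and no GRANT or REVOKE statements, so the sketch shows only what a view exposes; in a multi-user RDBMS an administrator would typically grant users access to the view rather than to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           department TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Ada', 'Sales', 50000);
    INSERT INTO employee VALUES (2, 'Ben', 'Audit', 70000);
    INSERT INTO employee VALUES (3, 'Cy',  'Sales', 45000);

    -- The view exposes only Sales staff and omits the salary column.
    CREATE VIEW sales_directory AS
        SELECT id, name FROM employee WHERE department = 'Sales';
""")

cursor = conn.execute("SELECT * FROM sales_directory ORDER BY id")
columns = [d[0] for d in cursor.description]  # column names visible via the view
rows = cursor.fetchall()
print(columns)  # ['id', 'name']
print(rows)     # [(1, 'Ada'), (3, 'Cy')]
```

A user restricted to sales_directory sees neither the salary column nor the Audit row, which illustrates the data-dependent restriction that views provide for retrieval.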
Even when the complexity of the classification scheme is overcome, the view can do nothing more than limit what the user sees; it cannot restrict the operations that may be performed on the views.

3.4 Proposed OODBMS Security Models

Currently only a few models use discretionary access control measures in secure object-oriented database management systems.

Explicit Authorizations

The ORION authorization model permits access to data on the basis of explicit authorizations provided to each group of users. These authorizations are classified as positive authorizations because they specifically allow a user access to an object. Similarly, a negative authorization is used to specifically deny a user access to an object.

The placement of an individual into one or more groups is based on the role that the individual plays in the organization. In addition to the positive authorizations that are provided to users within each group, there are a variety of implicit authorizations that may be granted based on the relationships between subjects and access modes.

Data-Hiding Model

A similar discretionary access control secure model is the data-hiding model proposed by Dr. Elisa Bertino of the Università di Genova. This model distinguishes between public methods and private methods.

The data-hiding model is based on authorizations for users to execute methods on objects. The authorizations specify which methods the user is authorized to invoke. Authorizations can only be granted to users on public methods. However, the fact that a user can access a method does not automatically mean that the user can execute all actions associated with the method. As a result, several access controls may need to be performed during the execution, and all of the authorizations for the different accesses must exist if the user is to complete the processing.
Similar to the use of GRANT statements in traditional relational database management systems, the creator of an object is able to grant authorizations on the object to different users. The creator is also able to revoke the authorizations from users in a manner similar to REVOKE statements. However, unlike traditional RDBMS GRANT statements, the data-hiding model includes the notion of protection mode. When authorizations are provided to users in the protection mode, the authorizations actually checked by the system are those of the creator and not those of the individual executing the method. As a result, the creator is able to grant a user access to a method without granting the user the authorizations for the methods called by the original method. In other words, the creator can provide a user access to specific data without being forced to give the user complete access to all related information in the object.

3.5 Security Classification for Information

An important aspect of information security and risk management is recognizing the value of information and defining appropriate procedures and protection requirements for the information. Not all information is equal and so not all information requires the same degree of protection. This requires information to be assigned a security classification.

Some factors that influence which classification information should be assigned include how much value that information has to the organization, how old the information is and whether or not the information has become obsolete. Laws and other regulatory requirements are also important considerations when classifying information.

Common information security classification labels used by the business sector are: public, sensitive, private, confidential.

Common information security classification labels used by government are: Unclassified, Sensitive But Unclassified, Restricted, Confidential, Secret, Top Secret and their non-English equivalents.
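One way to make such labels usable by software is to treat them as an ordered scale and attach handling requirements to each level. The following sketch assumes that the four business-sector labels are ordered from least to most sensitive as listed above. That ordering, the control names and the function required_controls are all hypothetical and serve only to illustrate the idea.

```python
from enum import IntEnum


class Label(IntEnum):
    """Business-sector labels from the text, in an assumed order of sensitivity."""
    PUBLIC = 0
    SENSITIVE = 1
    PRIVATE = 2
    CONFIDENTIAL = 3


# Hypothetical handling rules: the lowest label at which each control applies.
CONTROLS = {
    "access list required": Label.SENSITIVE,
    "encrypt in storage": Label.PRIVATE,
    "encrypt in transit": Label.PRIVATE,
    "log every access": Label.CONFIDENTIAL,
}


def required_controls(label):
    """Return, in alphabetical order, the controls whose minimum label
    is at or below the given label."""
    return sorted(name for name, minimum in CONTROLS.items() if label >= minimum)


print(required_controls(Label.PUBLIC))   # []
print(required_controls(Label.PRIVATE))
# ['access list required', 'encrypt in storage', 'encrypt in transit']
```

Because the labels are ordered, a more sensitive label automatically inherits every control required at the levels beneath it, which reflects the principle that more valuable information calls for stronger protection.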
All employees in the organization, as well as business partners, must be trained on the classification schema and understand the required security controls and handling procedures for each classification. The classification a particular information asset has been assigned should be reviewed periodically to ensure the classification is still appropriate for the information and to ensure the security controls required by the classification are in place.

Access control

Access to protected information must be restricted to people who are authorized to access the information. The computer programs, and in many cases the computers that process the information, must also be authorized. This requires that mechanisms be in place to control the access to protected information. The sophistication of the access control mechanisms should be in parity with the value of the information being protected: the more sensitive or valuable the information, the stronger the control mechanisms need to be. The foundation on which access control mechanisms are built starts with identification and authentication.

Identification is an assertion of who someone is or what something is. If a person makes the statement "Hello, my name is John Doe." they are making a claim of who they are. However, their claim may or may not be true. Before John Doe can be granted access to protected information it will be necessary to verify that the person claiming to be John Doe really is John Doe.

Authentication is the act of verifying a claim of identity. When John Doe goes into a bank to make a withdrawal, he tells the bank teller he is John Doe (a claim of identity). The bank teller asks to see a photo ID, so he hands the teller his driver's license. The bank teller checks the license to make sure it has John Doe printed on it and compares the photograph on the license against the person claiming to be John Doe.
If the photo and name match the person, then the teller has authenticated that John Doe is who he claimed to be.

On computer systems in use today, the username is the most common form of identification and the password is the most common form of authentication. Usernames and passwords have served their purpose but in our modern world they are no longer adequate. Usernames and passwords are slowly being replaced with more sophisticated authentication mechanisms.

After a person, program or computer has successfully been identified and authenticated, it must be determined what informational resources they are permitted to access and what actions they will be allowed to perform (run, view, create, delete, or change). This is called authorization.

Authorization to access information and other computing services begins with administrative policies and procedures. The policies prescribe what information and computing services can be accessed, by whom, and under what conditions. The access control mechanisms are then configured to enforce these policies.

Different computing systems are equipped with different kinds of access control mechanisms; some may offer a choice of different access control mechanisms. The access control mechanism a system offers will be based upon one of three approaches to access control, or it may be derived from a combination of the three approaches.

The non-discretionary approach consolidates all access control under a centralized administration. The access to information and other resources is usually based on the individual's function (role) in the organization or the tasks the individual must perform. The discretionary approach gives the creator or owner of the information resource the ability to control access to those resources. In the mandatory access control approach, access is granted or denied based upon the security classification assigned to the information resource.

3.6 Cryptography

Information security uses cryptography to transform usable information into a form that renders it unusable by anyone other than an authorized user; this process is called encryption. Information that has been encrypted (rendered unusable) can be transformed back into its original usable form by an authorized user, who possesses the cryptographic key, through the process of decryption. Cryptography is used in information security to protect information from unauthorized or accidental disclosure while the information is in transit (either electronically or physically) and while information is in storage.

Cryptography provides information security with other useful applications as well, including improved authentication methods, message digests, digital signatures, non-repudiation, and encrypted network communications.

Cryptography can introduce security problems when it is not implemented correctly. Cryptographic solutions need to be implemented using industry-accepted solutions that have undergone rigorous peer review by independent experts in cryptography. The length and strength of the encryption key is also an important consideration. A key that is weak or too short will produce weak encryption. The keys used for encryption and decryption must be protected with the same degree of rigor as any other confidential information. They must be protected from unauthorized disclosure and destruction and they must be available when needed.

Process

The terms reasonable and prudent person, due care and due diligence have been used in the fields of Finance, Securities, and Law for many years. In recent years these terms have found their way into the fields of computing and information security. U.S.A. Federal Sentencing Guidelines now make it possible to hold corporate officers liable for failing to exercise due care and due diligence in the management of their information systems.
In the business world, stockholders, customers, business partners and governments expect corporate officers to run the business in accordance with accepted business practices and in compliance with laws and other regulatory requirements. This is often described as the "reasonable and prudent person" rule.

3.7 Disaster Recovery Planning

What is Disaster Recovery Planning?

Disaster Recovery Planning (DRP) is all about continuing an IT service. You need two or more sites: one is the primary site, which is the one planned to be recovered, and the others are alternate sites. The alternate site may be:

- online, meaning production data is transferred simultaneously to both sites (sometimes called a HOT site);
- offline, meaning data is transferred after a certain delay through other means (sometimes called a WARM site); or
- a site to which no data is transferred at all, but which holds a replica of the original site's IT system that is started whenever the primary site faces a disaster (sometimes called a COLD site).

How are DRP and BCP different?

Although DRP is part of the Business Continuity Plan (BCP) process, DRP focuses on the recovery of IT systems, while BCP covers the entire business.

How are DRP and BCP related?

DRP is one of the recovery activities carried out during execution of a Business Continuity Plan.

4.0 CONCLUSION

Data and information systems security is the ongoing process of exercising due care and due diligence to protect information, and information systems, from unauthorized access, use, disclosure, destruction, modification, disruption or distribution. The never-ending process of information security involves ongoing training, assessment, protection, monitoring and detection, incident response and repair, documentation, and review.
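The sequence described under Access control (identification, then authentication, then authorization) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the user "jdoe", the roles and the permission table are hypothetical, and the role table stands in for the non-discretionary (role-based) approach. A production system would rely on the DBMS's or operating system's own access control mechanisms.

```python
import hashlib
import hmac
import os

# Hypothetical role-to-permission table (non-discretionary, role-based approach).
ROLE_PERMISSIONS = {
    "teller": {"view"},
    "manager": {"view", "change"},
}


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the clear-text password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_user(password: str, role: str) -> dict:
    """Create a user record holding a salt, a password hash and a role."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt), "role": role}


# Hypothetical user store; the username is the claim of identity.
USERS = {"jdoe": make_user("correct horse", "teller")}


def authenticate(username: str, password: str) -> bool:
    """Authentication: verify the claim of identity made by the username."""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, user["hash"])


def authorize(username: str, action: str) -> bool:
    """Authorization: check whether the user's role permits the action."""
    role = USERS[username]["role"]
    return action in ROLE_PERMISSIONS.get(role, set())
```

In this sketch, authenticate("jdoe", "correct horse") succeeds, and authorize("jdoe", "view") is allowed while authorize("jdoe", "delete") is refused, because the hypothetical teller role only carries the view permission.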
5.0 SUMMARY

This unit can be summarized as follows:

- Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled.
- Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. The terms information security, computer security and information assurance are frequently used interchangeably.
- For over twenty years information security has held that confidentiality, integrity and availability (known as the CIA Triad) are the core principles of information system security.
- The principal methods of security in traditional RDBMSs are the appropriate use and manipulation of views and the Structured Query Language (SQL) GRANT and REVOKE statements.
- Authentication is the act of verifying a claim of identity.
- Currently only a few models use discretionary access control measures in secure object-oriented database management systems.
- An important aspect of information security and risk management is recognizing the value of information and defining appropriate procedures and protection requirements for the information.
- Information security uses cryptography to transform usable information into a form that renders it unusable by anyone other than an authorized user; this process is called encryption.
- Disaster Recovery Planning is all about continuing an IT service. You need two or more sites, one of which is the primary site that is planned to be recovered.

6.0 TUTOR-MARKED ASSIGNMENT

1. List Donn Parker's 6 atomic elements of the CIA Triad of information security.
2. Briefly discuss Disaster Recovery Planning in the security of DBMS.

7.0 REFERENCES/FURTHER READINGS

44 U.S.C § 3542 (b) (1) (2006).

Blackwell Encyclopedia of Management Information System, Vol. III, Edited by Gordon B. Davis.

Harris, Shon (2003). All-in-one CISSP Certification Exam Guide, 2nd Ed. Emeryville, CA: McGraw-Hill/Osborne.
ISACA (2006). CISA Review Manual 2006. Information Systems Audit and Control Association, p. 85. ISBN 1-933284-15-3.

Quist, Arvin S. (2002). "Security Classification of Information" (HTML). Volume 1. Introduction, History, and Adverse Impacts. Oak Ridge Classification Associates, LLC. Retrieved on 2007-01-11.

UNIT 4 DATABASE ADMINISTRATOR AND ADMINISTRATION

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 Duties of Database Administrator
    3.2 Typical Work Activities
    3.3 Database Administrations and Automation
        3.3.1 Types of Database Administration
        3.3.2 Nature of Database Administration
        3.3.3 Database Administration Tools
        3.3.4 The Impact of IT Automation on Database Administration
        3.3.5 Learning Database Administration
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:

- Recoverability - Creating and testing backups
- Integrity - Verifying or helping to verify data integrity
- Security - Defining and/or implementing access controls to the data
- Availability - Ensuring maximum uptime
- Performance - Ensuring maximum performance
- Development and testing support - Helping programmers and engineers to utilize the database efficiently.

The role of a database administrator has changed according to the technology of database management systems (DBMSs) as well as the needs of the owners of the databases. For example, although logical and physical database design are traditionally the duties of a database analyst or database designer, a DBA may be tasked to perform those duties.
2.0 OBJECTIVES

At the end of this unit, you should be able to:

- answer the question of who is a database administrator
- identify the various functions of a database administrator
- know the different types of database administration
- understand the nature of database administration
- know the tools used in database administration.
3.0 MAIN CONTENT

3.1 Duties of Database Administrator

The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies, and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design.

Some of the roles of the DBA may include:

- Installation of new software: It is primarily the job of the DBA to install new versions of DBMS software, application software, and other software related to DBMS administration. It is important that the DBA or other IS staff members test this new software before it is moved into a production environment.
- Configuration of hardware and software with the system administrator: In many cases the system software can only be accessed by the system administrator. In this case, the DBA must work closely with the system administrator to perform software installations, and to configure hardware and software so that it functions optimally with the DBMS.
- Security administration: One of the main duties of the DBA is to monitor and administer DBMS security. This involves adding and removing users, administering quotas, auditing, and checking for security problems.
- Data analysis: The DBA will frequently be called on to analyze the data stored in the database and to make recommendations relating to the performance and efficiency of that data storage. This might relate to the more effective use of indexes, enabling "Parallel Query" execution, or other DBMS-specific features.
- Database design (preliminary): The DBA is often involved at the preliminary database-design stages. Through the involvement of the DBA, many problems that might occur can be eliminated. The DBA knows the DBMS and system, can point out potential problems, and can help the development team with special performance considerations.
- Data modeling and optimization: By modeling the data, it is possible to optimize the system layout to take the most advantage of the I/O subsystem.

The DBA is responsible for the administration of existing enterprise databases and the analysis, design, and creation of new databases. This involves:

- Data modeling, database optimization, understanding and implementation of schemas, and the ability to interpret and write complex SQL queries
- Proactively monitoring systems for optimum performance and capacity constraints
- Establishing standards and best practices for SQL
- Interacting with and coaching developers in SQL scripting

Recoverability

Recoverability means that, if a data entry error, program bug or hardware failure occurs, the DBA can bring the database backward in time to its state at an instant of logical consistency before the damage was done. Recoverability activities include making database backups and storing them in ways that minimize the risk that they will be damaged or lost, such as placing multiple copies on removable media and storing them outside the affected area of an anticipated disaster. Recoverability is the DBA's most important concern.

The backup of the database consists of data with timestamps combined with database logs, which are used to make the data consistent as of a particular moment in time. It is possible to make a backup of the database containing only data without timestamps or logs, but the DBA must take the database offline to do such a backup.

The recovery tests of the database consist of restoring the data, then applying logs against that data to bring the database backup to consistency at a particular point in time, up to the last transaction in the logs. Alternatively, an offline database backup can be restored simply by placing the data in-place on another copy of the database.

If a DBA (or any administrator) attempts to implement a recoverability plan without the recovery tests, there is no guarantee that the backups are at all valid.
In practice, in all but the most mature RDBMS packages, backups rarely are valid without extensive testing to be sure that no bugs or human error have corrupted the backups.

Security

Security means that users' ability to access and change data conforms to the policies of the business and the delegation decisions of its managers. Like other metadata, a relational DBMS manages security information in the form of tables. These tables are the "keys to the kingdom", so it is important to protect them from intruders. This is why security is increasingly important for databases.

Performance

Performance means that the database does not cause unreasonable online response times, and it does not cause unattended programs to run for an unworkable period of time. In complex client/server and three-tier systems, the database is just one of many elements that determine the performance that online users and unattended programs experience. Performance is a major motivation for the DBA to become a generalist and coordinate with specialists in other parts of the system outside of traditional bureaucratic reporting lines.

Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and in their ability to diagnose the problem.

In the 1990s, DBAs often focused on the database as a whole, and looked at database-wide statistics for clues that might help them find out why the system was slow. Also, the actions DBAs took in their attempts to solve performance problems were often at the global, database level, such as changing the amount of computer memory available to the database, or changing the amount of memory available to any database program that needed to sort data.

DBAs now understand that performance problems must first be diagnosed, and that this is best done by examining individual SQL statements, table process, and system architecture, not the database as a whole.
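The statement-level diagnosis described above can be sketched with SQLite, which ships with Python and offers an EXPLAIN QUERY PLAN command. This is a minimal illustration, not a tuning method for any particular DBMS: the orders table, its columns and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical table with enough rows to make the access path matter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the plan SQLite reports for one individual SQL statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


# Without an index the statement has to scan the whole table.
before = plan(conn, QUERY, (7,))

# Tuning step: add an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, QUERY, (7,))

print("before:", before)
print("after: ", after)
```

The first plan reports a scan of the table and the second reports a search that uses idx_orders_customer, which shows how examining one statement points to a specific fix rather than a database-wide change.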
Various tools, some included with the database and some available from third parties, provide a behind-the-scenes look at how the database is handling the SQL statements, shedding light on what is taking so long. Having identified the problem, the individual SQL statement can be tuned.

Development/Testing Support

Development and testing support is typically what the database administrator regards as his or her least important duty, while results-oriented managers consider it the DBA's most important duty. Support activities include collecting sample production data for testing new and changed programs and loading it into test databases; consulting with programmers about performance tuning; and making table design changes to provide new kinds of storage for new program functions.

Here are some IT roles that are related to the role of database administrator:

- Application programmer or software engineer
- System administrator
- Data administrator
- Data architect

3.2 Typical Work Activities

The work of a database administrator (DBA) varies according to the nature of the employing organization and the level of responsibility associated with the post. The work may be pure maintenance or it may also involve specializing in database development.
Typical responsibilities include some or all of the following:

- establishing the needs of the users and monitoring user access and security
- monitoring performance and managing parameters to provide fast query responses to 'front end' users
- mapping out the conceptual design for a planned database in outline
- considering both back-end organization of data and front-end accessibility for the end user
- refining the logical design so that it can be translated into a specific data model
- further refining the physical design to meet system storage requirements
- installing and testing new versions of the database management system
- maintaining data standards, including adherence to the Data Protection Act
- writing database documentation, including data standards, procedures and definitions for the data dictionary (metadata)
- controlling access permissions and privileges
- developing, managing and testing backup and recovery plans
- ensuring that storage, archiving, and backup procedures are functioning properly
- capacity planning
- working closely with IT project managers, database programmers, and web developers
- communicating regularly with technical, applications and operational staff to ensure database integrity and security
- commissioning and installing new applications.

Because of the increasing level of hacking and the sensitive nature of the data stored, security and recoverability (disaster recovery) have become increasingly important aspects of the work.

3.3 Database Administrations and Automation

Database administration is the function of managing and maintaining database management system (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL Server needs ongoing management. As such, corporations that use DBMS software often hire specialized IT (Information Technology) personnel called Database Administrators or DBAs.

3.3.1 Types of Database Administration

There are three types of DBAs:

1. Systems DBAs (sometimes also referred to as Physical DBAs, Operations DBAs or Production Support DBAs)
2. Development DBAs
3. Application DBAs

The functions of a DBA usually vary with the DBA type. Below is a brief description of what the different types of DBAs do:

- Systems DBAs usually focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.
- Development DBAs usually focus on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality, and other pre-production activities.
- Application DBAs are usually found in organizations that have purchased third-party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software include Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc.

While individuals usually specialize in one type of database administration, in smaller organizations it is not uncommon to find a single individual or group performing more than one type of database administration.

3.3.2 Nature of Database Administration

The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases. On one end of the spectrum, a system with minimal automation will require significant experienced resources to manage, perhaps 5-10 databases per DBA. Alternatively, an organization might choose to automate a significant amount of the work that could be done manually, thereby reducing the skills required to perform tasks. As automation increases, the personnel needs of the organization split into highly skilled workers who create and manage the automation and a group of lower-skilled "line" DBAs who simply execute the automation.

Database administration work is complex, repetitive and time-consuming, and it requires significant training. Since databases hold valuable and mission-critical data, companies usually look for candidates with multiple years of experience. Database administration often requires DBAs to put in work during off-hours (for example, for planned after-hours downtime, in the event of a database-related outage, or if performance has been severely degraded). DBAs are commonly well compensated for the long hours.

3.3.3 Database Administration Tools

Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such tools are called native tools. For example, Microsoft SQL Server comes with SQL Server Enterprise Manager, and Oracle has tools such as SQL*Plus and Oracle Enterprise Manager/Grid Control.

In addition, third parties such as BMC, Quest Software, Embarcadero and SQL Maestro Group offer GUI tools to monitor the DBMS and help DBAs carry out certain functions inside the database more easily.

Another kind of database software exists to manage the provisioning of new databases and the management of existing databases and their related resources.
The process of creating a new database can consist of hundreds or thousands of unique steps, from satisfying prerequisites to configuring backups, where each step must be successful before the next can start. A human cannot be expected to complete this procedure in exactly the same way time after time, yet that is exactly the goal when multiple databases exist. As the number of DBAs grows, without automation the number of unique configurations frequently grows to be costly and difficult to support. All of these complicated procedures can be modeled by the best DBAs into database automation software and executed by the standard DBAs. Software has been created specifically to improve the reliability and repeatability of these procedures, such as Stratavia's Data Palette and GridApp Systems Clarity.

3.3.4 The Impact of IT Automation on Database Administration

Recently, automation has begun to impact this area significantly. Newer technologies such as HP/Opsware's SAS (Server Automation System) and Stratavia's Data Palette suite have begun to increase the automation of servers and databases respectively, reducing the number of database-related tasks. However, at best this only reduces the amount of mundane, repetitive activity and does not eliminate the need for DBAs. The intention of DBA automation is to enable DBAs to focus on more proactive activities around database architecture and deployment.

3.3.5 Learning Database Administration

There are several educational institutes that offer professional courses, including late-night programs, to allow candidates to learn database administration. Also, DBMS vendors such as Oracle, Microsoft and IBM offer certification programs to help companies hire qualified DBA practitioners.

4.0 CONCLUSION

A database management system (DBMS) is so important in an organization that a special manager is often appointed to oversee its activities. The database administrator is responsible for the installation and coordination of the DBMS.
They are responsible for managing one of the most valuable resources of any organization: its data. The database administrator must have a sound knowledge of the structure of the database and of the DBMS. The DBA must be thoroughly conversant with the organization, its systems and the information needs of its managers.

5.0 SUMMARY

- A database administrator (DBA) is a person who is responsible for the environmental aspects of a database.
- The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies, and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design.
- Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and in their ability to diagnose the problem.
- The work of a database administrator (DBA) varies according to the nature of the employing organization and the level of responsibility associated with the post.
- Database administration is the function of managing and maintaining database management system (DBMS) software.
- The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases.

6.0 TUTOR-MARKED ASSIGNMENT

1. Mention 5 roles of a database administrator.
2. Mention the types of database administration.

7.0 REFERENCES/FURTHER READINGS

Association for Computing Machinery SIGIR Forum archive, Volume 7, Issue 4.

Haigh, Thomas (June 2006). "'A Veritable Bucket of Facts': Origins of the Data Base Management System." ACM SIGMOD Record 35:2. Discusses the origins of the data base concept, early DBMS systems including IDS and IMS, the Data Base Task Group, and the hierarchical, network and relational data models.

How Database Systems Share Storage.
MODULE 3

Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System

UNIT 1 RELATIONAL DATABASE MANAGEMENT SYSTEMS

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 History of the Term
    3.2 Market Structure
    3.3 Features and Responsibilities of an RDBMS
    3.4 Comparison of Relational Database Management Systems
        3.4.1 General Information
        3.4.2 Operating System Support
        3.4.3 Fundamental Features
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model.

A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- define relational database management system
- trace the origin and development of RDBMS
- identify the market structure of RDBMS
- identify the major types of relational management systems
- compare and contrast the types of RDBMS based on several criteria.

3.0 MAIN CONTENT

3.1 History of the Term

E. F. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules. However, many of the early implementations of the relational model did not conform to all of Codd's rules, so the term gradually came to describe a broader class of database systems. At a minimum, these systems:

- presented the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each table consisting of a set of rows and columns, can satisfy this property)
- provided relational operators to manipulate the data in tabular form.

The first systems that were relatively faithful implementations of the relational model were Micro DBMS (1969), from the University of Michigan, and IS1 (1970-72) and its follow-on PRTV (1973-79), from the IBM UK Scientific Centre at Peterlee. The first system sold as an RDBMS was Multics Relational Data Store, first sold in 1978. Others have been Berkeley Ingres QUEL and IBM BS12.

The most popular definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory. By this definition, RDBMS products typically implement some but not all of Codd's 12 rules.

A second, theory-based school of thought argues that if a database does not implement all of Codd's rules (or the current understanding of the relational model, as expressed by Christopher J. Date, Hugh Darwen and others), it is not relational. This view, shared by many theorists and other strict adherents to Codd's principles, would disqualify most DBMSs as not relational. For clarification, they often refer to some RDBMSs as Truly-Relational Database Management Systems (TRDBMS), naming others Pseudo-Relational Database Management Systems (PRDBMS).

Almost all commercial relational DBMSs employ SQL as their query language. Alternative query languages have been proposed and implemented, but very few have become commercial products.

3.2 Market Structure

Given below is a list of the top RDBMS vendors in 2006, with figures in millions of United States Dollars, published in an IDC study.

Vendor    | Global Revenue
Oracle    | 7,912
IBM       | 3,483
Microsoft | 3,052
Sybase    | 5240
Teradata  | 457
Others    | 1,624
Total     | 16,452

Low adoption costs associated with open-source RDBMS products such as MySQL and PostgreSQL have begun influencing vendor pricing and licensing strategies.

3.3 Features and Responsibilities of an RDBMS

As mentioned earlier, an RDBMS is software that is used for creating and maintaining a database. Maintaining involves several tasks that an RDBMS takes care of. These tasks are as follows:

Control Data Redundancy

Since data in an RDBMS is spread across several tables, repetition or redundancy is reduced. Redundant data can be extracted and stored in another table, along with a field that is common to both tables. Data can then be extracted from the two tables by using the common field.

Data Abstraction

This implies that the RDBMS hides the actual way in which data is stored, while providing the user with a conceptual representation of the data.

Support for Multiple Users

A true RDBMS allows effective sharing of data. That is, it ensures that several users can concurrently access the data in the database without affecting the speed of the data access.

In a database application that can be used by several users concurrently, there is the possibility that two users may try to modify a particular record at the same time. This could lead to one person's changes being saved while the other's are overwritten. To avoid such confusion, most RDBMSs provide a record-locking mechanism. This mechanism ensures that no two users can modify a particular record at the same time. A record is, as it were, "locked" while one user makes changes to it. Another user is therefore not allowed to modify it until the changes are complete and the record is saved. The "lock" is then released, and the record is available for editing again.

Multiple Ways of Interfacing to the System

This requires the database to be accessible through different query languages as well as programming languages. It also means that a variety of front-end tools should be able to use the database as a back-end.
For example, data stored in Microsoft Access can be displayed and manipulated using forms created in software such as Visual Basic or FrontPage 2000.

Restricting Unauthorized Access

An RDBMS provides a security mechanism that ensures that data in the database is protected from unauthorized access and malicious use. The security that is implemented in most RDBMSs is referred to as 'user-level security', wherein the various users of the database are assigned usernames and passwords. Only when the user enters the correct username and password is he able to access the data in the database.

In addition to this, a particular user could be restricted to only viewing the data, while another could have the rights to modify the data. A third user could have rights to change the structure of some table itself, in addition to the rights that the other two have.

When security is implemented properly, data is secure and cannot be tampered with.

Enforcing Integrity Constraints

RDBMSs provide a set of rules that ensure that data entered into a table is valid. These rules must remain true for a database to preserve integrity. 'Integrity constraints' are specified at the time of creating the database, and are enforced by the RDBMS.

For example, in a 'Marks' table, a constraint can be added to ensure that the marks in each subject are between 0 and 100. Such a constraint is called a 'Check' constraint. It is a rule that can be set by the user to ensure that only data that meets the specified criteria is allowed to enter the database. The given example ensures that only a number between 0 and 100 can be entered into the marks column.

Backup and Recovery

In spite of ensuring that the database is secure from unauthorized access or use, as well as from invalid entries, there is always a danger that the data in the database could get lost. This could happen due to hardware problems or a system crash, and it could result in the loss of all data.
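The danger of data loss described above is the reason backups are taken and, just as importantly, tested. The following Python sketch uses SQLite's online backup API (available in the standard sqlite3 module from Python 3.7) to copy a database and then check that the copy is usable. It is an illustration only: the accounts table and its contents are hypothetical, and an in-memory database stands in for a backup file kept on separate media.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy `source` into a fresh database and run simple recovery tests."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # page-by-page copy of the committed data

    # Recovery test 1: SQLite's built-in structural check of the copy.
    status = restored.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        raise RuntimeError(f"restored copy is corrupt: {status}")

    # Recovery test 2: the copy must hold as many rows as the original.
    count_sql = "SELECT COUNT(*) FROM accounts"
    if source.execute(count_sql).fetchone() != restored.execute(count_sql).fetchone():
        raise RuntimeError("row counts differ between original and copy")
    return restored


# Hypothetical production database with two account rows.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
production.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100,), (250,)])
production.commit()

restored = backup_and_verify(production)
```

The design point is that the function does not stop at making the copy: a backup that has never been restored and checked gives no assurance that the data can actually be recovered.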
To guard the database from this, most RDBMSs have inbuilt backup and recovery techniques that ensure that the database is protected from these kinds of failures too.

3.4 Comparison of Relational Database Management Systems

The following tables compare general and technical information for a number of relational database management systems. Comparisons are based on the stable versions without any add-ons, extensions or external programs. A "?" marks a value that is not given.

3.4.1 General information

Name | Maintainer | First public release date | Latest stable version | Software license
4th Dimension | 4D s.a.s | 1984 | v11 SQL | Proprietary
ADABAS | Software AG | 1970 | ? | ?
Adaptive Server Enterprise | Sybase | 1987 | 15.0 | Proprietary
Advantage Database Server | Sybase | 1992 | 8.1 | Proprietary
Apache Derby | Apache | 2004 | 10.4.1.3 | Apache License
Datacom | CA | ? | 11.2 | Proprietary
DB2 | IBM | 1982 | 9.5 | Proprietary
DBISAM | Elevate Software | ? | 4.25 | Proprietary
Datawasp | Significant Data Systems | April 2008 | 1.0.1 | Proprietary
ElevateDB | Elevate Software | ? | 1.01 | Proprietary
FileMaker | FileMaker | 1984 | 9 | Proprietary
Firebird | Firebird project | July 25, 2000 | 2.1.0 | IPL and IDPL
Informix | IBM | 1985 | 11.10 | Proprietary
HSQLDB | HSQL Development Group | 2001 | 1.8.0 | BSD
H2 | H2 Software | 2005 | 1.0 | EPL and modified MPL
Ingres | Ingres Corp. | 1974 | Ingres 2006 r2 9.1.0 | GPL and proprietary
InterBase | CodeGear | 1985 | 2007 | Proprietary
MaxDB | SAP AG | ? | 7.6 | GPL or proprietary
Microsoft Access | Microsoft | 1992 | 12 (2007) | Proprietary
Microsoft Visual FoxPro | Microsoft | ? | 9 (2005) | Proprietary
Microsoft SQL Server | Microsoft | 1989 | 9.00.3042 (2005 SP2) | Proprietary
MonetDB | The MonetDB Developer Team | 2004 | 4.16 (Feb. 2007) | MonetDB Public License v1.1
MySQL | Sun Microsystems | November 1996 | 5.0.67 | GPL or proprietary
HP NonStop SQL | Hewlett-Packard | 1987 | SQL MX 2.0 | Proprietary
Omnis Studio | TigerLogic Inc | July 1982 | 4.3.1 Release 1 (May 2008) | Proprietary
Oracle | Oracle Corporation | November 1979 | 11g Release 1 (September 2007) | Proprietary
Oracle Rdb | Oracle Corporation | 1984 | 7.2 | Proprietary
OpenEdge | Progress Software Corporation | 1984 | 10.1C | Proprietary
OpenLink Virtuoso | OpenLink Software | 1998 | 5.0.5 (January 2008) | GPL or proprietary
Pervasive PSQL | Pervasive Software | ? | 9 | Proprietary
Polyhedra DBMS | ENEA AB | 1993 | 8.0 (July 2008) | Proprietary
PostgreSQL | PostgreSQL Global Development Group | June 1989 | 8.3.3 (12 June 2008) | BSD
Pyrrho DBMS | University of Paisley | November 2005 | 0.5 | Proprietary
RBase | RBase | ? | 7.6 | Proprietary
RDM Embedded | Birdstep Technology | 1984 | 8.1 | Proprietary
RDM Server | Birdstep Technology | 1990 | 8.0 | Proprietary
ScimoreDB | Scimore | 2005 | 2.5 | Freeware
SmallSQL | SmallSQL | April 16, 2005 | 0.19 | LGPL
SQL Anywhere | Sybase | 1992 | 10.0 | Proprietary
SQLite | D. Richard Hipp | August 17, 2000 | 3.5.7 (17 March 2008) | Public domain
Teradata | Teradata | 1984 | V12 | Proprietary
Valentina | Paradigma Software | February 1998 | 3.0.1 | Proprietary

3.4.2 Operating system support

The operating systems the RDBMSs can run on. Numbers in parentheses refer to the notes below the table.

Name | Windows | Mac OS X | Linux | BSD | UNIX | z/OS (1)
4th Dimension | Yes | Yes | No | No | No | No
ADABAS | Yes | No | Yes | No | Yes | Yes
Adaptive Server Enterprise | Yes | No | Yes | Yes | Yes | No
Advantage Database Server | Yes | No | Yes | No | No | No
Apache Derby (2) | Yes | Yes | Yes | Yes | Yes | Yes
Datacom | No | No | No | No | No | Yes
Datawasp | Yes | No | No | No | No | No
DB2 (5) | Yes | No | Yes | No | Yes | Yes
Firebird | Yes | Yes | Yes | Yes | Yes | Maybe
HSQLDB (2) | Yes | Yes | Yes | Yes | Yes | Yes
H2 (2) | Yes | Yes | Yes | Yes | Yes | Maybe
FileMaker | Yes | Yes | No | No | No | No
Informix | Yes | Yes | Yes | Yes | Yes | No
Ingres | Yes | Yes | Yes | Yes | Yes | Partial
InterBase | Yes | Yes | Yes | No | Yes (Solaris) | No
MaxDB | Yes | No | Yes | No | Yes | Maybe
Microsoft Access | Yes | No | No | No | No | No
Microsoft Visual FoxPro | Yes | No | No | No | No | No
Microsoft SQL Server | Yes | No | No | No | No | No
MonetDB | Yes | Yes | Yes | No | Yes | No
MySQL | Yes | Yes | Yes | Yes | Yes | Maybe
Omnis Studio | Yes | Yes | Yes | No | No | No
Oracle (4) | Yes | Yes | Yes | No | Yes | Yes
Oracle Rdb (3) | No | No | No | No | No | No
OpenEdge | Yes | No | Yes | No | Yes | No
OpenLink Virtuoso | Yes | Yes | Yes | Yes | Yes | Yes
Polyhedra DBMS | Yes | No | Yes | No | Yes | No
PostgreSQL | Yes | Yes | Yes | Yes | Yes | No
Pyrrho DBMS | Yes (.NET) | No | Yes (Mono) | No | No | No
RBase | Yes | No | No | No | No | No
RDM Embedded | Yes | Yes | Yes | Yes | Yes | No
RDM Server | Yes | Yes | Yes | Yes | Yes | No
ScimoreDB | Yes | No | No | No | No | No
SmallSQL (2) | Yes | Yes | Yes | Yes | Yes | Yes
SQL Anywhere | Yes | Yes | Yes | No | Yes | No
SQLite | Yes | Yes | Yes | Yes | Yes | Maybe
Teradata | Yes | No | Yes | No | Yes | No
Valentina | Yes | Yes | Yes | No | No | No

Note (1): Open source databases listed as UNIX-compatible will likely compile and run under z/OS's built-in UNIX System Services (USS) subsystem. Most databases listed as Linux-compatible can run alongside z/OS on the same server using Linux on zSeries.

Note (2): The database availability depends on the Java Virtual Machine, not on the operating system.

Note (3): Oracle Rdb was originally developed by DEC, and runs on OpenVMS.

Note (4): Oracle database 11g also runs on OpenVMS, HP/UX and AIX. 10g also supported BS2000/OSD and z/OS (31-bit), but that support has been discontinued in 11g. Versions earlier than 10g were available on a wide variety of platforms.
Note (5): DB2 is also available for i5/OS, z/VM and z/VSE. Previous versions were also available for OS/2.

3.4.3 Fundamental features

Information about which fundamental RDBMS features are implemented natively. A "?" marks a value that is not given, and numbers in parentheses refer to the notes below the table.

Name | ACID | Referential integrity | Transactions | Unicode | Interface
4th Dimension | Yes | Yes | Yes | Yes | GUI & SQL
ADABAS | ? | ? | ? | ? | ?
Adaptive Server Enterprise | Yes | Yes | Yes | Yes | ?
Advantage Database Server | Yes | Yes | Yes | No | API & SQL
Apache Derby | Yes | Yes | Yes | Yes | SQL
Datawasp | No | Yes | Yes | Yes | GUI
DB2 | Yes | Yes | Yes | Yes | GUI & SQL
Firebird | Yes | Yes | Yes | Yes | SQL
HSQLDB | Yes | Yes | Yes | Yes | SQL
H2 | Yes | Yes | Yes | Yes | SQL
Informix | Yes | Yes | Yes | Yes | ?
Ingres | Yes | Yes | Yes | Yes | SQL
InterBase | Yes | Yes | Yes | Yes | SQL
MaxDB | Yes | Yes | Yes | Yes | SQL
Microsoft Access | No | Yes | Yes | Yes | GUI & SQL
Microsoft Visual FoxPro | No | Yes | Yes | No | GUI & SQL
Microsoft SQL Server | Yes | Yes | Yes | Yes | SQL
MonetDB | Yes | Yes | Yes | Yes | ?
MySQL | Yes (6) | Yes (6) | Yes (6) | Partial | SQL
Oracle | Yes | Yes | Yes | Yes | SQL
Oracle Rdb | Yes | Yes | Yes | Yes | ?
OpenEdge | Yes | No (7) | Yes | Yes | Progress 4GL & SQL
OpenLink Virtuoso | Yes | Yes | Yes | Yes | ?
Polyhedra DBMS | Yes | Yes | Yes | Yes | SQL
PostgreSQL | Yes | Yes | Yes | Yes | SQL
Pyrrho DBMS | Yes | Yes | Yes | Yes | ?
RDM Embedded | Yes | Yes | Yes | Yes | SQL & API
RDM Server | Yes | Yes | Yes | Yes | SQL & API
ScimoreDB | Yes | Yes | Yes | Partial | SQL
SQL Anywhere | Yes | Yes | Yes | Yes | ?
SQLite | Yes | No (8) | Basic (8) | Yes | SQL
Teradata | Yes | Yes | Yes | Yes | SQL
Valentina | No | Yes | No | Yes | ?

Note (6): For transactions and referential integrity, the InnoDB table type must be used. The Windows installer sets this as the default if support for transactions is selected; on other operating systems the default table type is MyISAM. However, even the InnoDB table type permits storage of values that exceed the data range; some view this as violating the integrity constraint of ACID.

Note (7): FOREIGN KEY constraints are parsed but are not enforced. Triggers can be used instead. Nested transactions are not supported.

Note (8): Available via triggers.
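
The 'Check' constraint on the Marks table and the linking of two tables through a common field, both described under the responsibilities of an RDBMS earlier in this unit, can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names, column names and sample rows are assumptions made for the example, and SQLite stands in for whichever RDBMS is actually in use.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a students table and a marks table."""
    conn = sqlite3.connect(":memory:")
    # Student names are stored once; marks refer to them through student_id,
    # the field that is common to both tables (hypothetical schema).
    conn.execute(
        "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE marks ("
        " student_id INTEGER NOT NULL,"
        " subject TEXT NOT NULL,"
        " mark INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100))"
    )
    conn.execute("INSERT INTO students VALUES (1, 'Ada')")
    conn.execute("INSERT INTO marks VALUES (1, 'Databases', 85)")
    conn.commit()
    return conn


def try_insert_mark(conn, student_id, subject, mark):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO marks VALUES (?, ?, ?)", (student_id, subject, mark)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def marks_with_names(conn):
    """Join the two tables on the common field to rebuild the full picture."""
    rows = conn.execute(
        "SELECT s.name, m.subject, m.mark"
        " FROM students AS s JOIN marks AS m ON m.student_id = s.student_id"
        " ORDER BY m.subject"
    )
    return rows.fetchall()
```

Calling try_insert_mark(conn, 1, 'Statistics', 101) on a database built this way is rejected by the check constraint, while a mark of 70 is accepted. The student's name is never repeated in the marks table; it is recovered through the join on student_id.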
4.0 CONCLUSION

The most dominant model in use today is the relational database management system, usually used with the Structured Query Language (SQL). Many DBMSs also support Open Database Connectivity (ODBC), which provides a standard way for programmers to access database management systems.

5.0 SUMMARY

• A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model.
• E. F. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules.
• The most popular definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory.
• As mentioned earlier, an RDBMS is software that is used for creating and maintaining a database. Maintaining a database involves several tasks that the RDBMS takes care of.
• Comparisons are based on the stable versions without any add-ons, extensions or external programs.

6.0 TUTOR-MARKED ASSIGNMENT

1. List 5 features of relational database management systems.
2. Mention 5 criteria you can use to differentiate types of RDBMSs.

7.0 REFERENCES/FURTHER READINGS

Comparison of different SQL implementations against SQL standards. Includes Oracle, DB2, Microsoft SQL Server, MySQL and PostgreSQL. (08/Jun/2007).
Comparison of Oracle 8/9i, MySQL 4.x and PostgreSQL 7.x DBMS against SQL standards. (14/Mar/2005).
Comparison of Oracle and SQL Server. (2004).
Comparison of geometrical data handling in PostgreSQL, MySQL and DB2. (29/Sep/2003).
Open Source Database Software Comparison. (Mar/2005).
PostgreSQL vs. MySQL vs.
Commercial Databases: It's All About What You Need. (12/Apr/2004).

UNIT 2 DATA WAREHOUSE

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Benefits of Data Warehousing
3.3 Data Warehouse Architecture
3.4 Normalized Versus Dimensional Approach to Storage of Data
3.5 Conforming Information
3.6 Top-Down versus Bottom-Up Design Methodologies
3.7 Data Warehouses versus Operational Systems
3.8 Evolution in Organization Use of Data Warehouses
3.9 Disadvantages of Data Warehouses
3.10 Data Warehouse Appliance
3.11 The Future of Data Warehousing
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.

This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the dictionary data are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

In contrast to data warehouses are operational systems, which perform day-to-day transaction processing.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

• define data warehouse
• trace the history and development process of the data warehouse
• list various benefits of a data warehouse
• define the architecture of a data warehouse
• compare and contrast data warehouses and operational systems
• know what a data warehouse appliance is, and the disadvantages of data warehouses
• have an idea of what the future holds for the data warehouse concept.
3.0 MAIN CONTENT

3.1 History

The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it.

In the absence of a data warehousing architecture, an enormous amount of redundancy of information was required to support the multiple decision support environments that usually existed. In larger corporations it was typical for multiple decision support environments to operate independently. Each environment served different users but often required much of the same data. The process of gathering, cleaning and integrating data from various sources, usually long-existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently re-examined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from the operational systems that were logically related to previously gathered data.

Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection, storage and staging areas for corporate data. Data could be retrieved from one central point, or data could be distributed to "retail stores" or "data marts" which were tailored for ready access by users.

3.2 Benefits of Data Warehousing

Some of the benefits that a data warehouse provides are as follows:

• A data warehouse provides a common data model for all data of interest regardless of the data's source.
  This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
• Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
• Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.

3.3 Data Warehouse Architecture

Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture. The worthiness of the architecture can be judged by how the conceptualization aids in the building, maintenance and usage of the data warehouse.

One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers:

Operational Database Layer

The source data for the data warehouse. An organization's ERP systems fall into this layer.

Informational Access Layer

The data accessed for reporting and analyzing, and the tools for reporting and analyzing data. Business intelligence tools fall into this layer.
The Inmon-Kimball differences about design methodology, discussed later in this unit, also have to do with this layer.

Data Access Layer

The interface between the operational and informational access layers. Tools to extract, transform and load data into the warehouse fall into this layer.

Metadata Layer

The data directory. This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.

3.4 Normalized Versus Dimensional Approach to Storage of Data

There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach.

In the dimensional approach, transaction data are partitioned into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and the salesperson responsible for receiving the order. A key advantage of the dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. The main disadvantages of the dimensional approach are: 1) in order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2) it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.

In the normalized approach, the data in the data warehouse are stored following, to a degree, the Codd normalization rules. Tables are grouped together by subject areas
that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to 1) join data from different sources into meaningful information and then 2) access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.

These approaches are not exact opposites of each other. Dimensional approaches can involve normalizing data to a degree.

3.5 Conforming Information

Another important decision in designing a data warehouse is which data to conform and how to conform the data. For example, one operational system feeding data into the data warehouse may use "M" and "F" to denote the sex of an employee while another operational system may use "Male" and "Female". Though this is a simple example, much of the work in implementing a data warehouse is devoted to making data with similar meaning consistent when they are stored in the data warehouse. Typically, extract, transform, load tools are used in this work.

3.6 Top-Down versus Bottom-Up Design Methodologies

Bottom-Up Design

Ralph Kimball, a well-known author on data warehousing, is a proponent of the bottom-up approach to data warehouse design. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse. The combination of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture". Business value can be returned as quickly as the first data marts can be created.
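
The conforming step described in section 3.5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular extract, transform, load tool: the two source systems, the field names and the choice of "Male"/"Female" as the conformed values are all hypothetical.

```python
# Hypothetical conformed values: every source code maps to one warehouse value.
SEX_CODES = {
    "M": "Male",
    "F": "Female",
    "MALE": "Male",
    "FEMALE": "Female",
}


def conform_sex(raw_value):
    """Translate a source-system code into the single warehouse convention."""
    key = raw_value.strip().upper()
    if key not in SEX_CODES:
        # Unknown codes are surfaced rather than loaded silently.
        raise ValueError(f"unrecognised sex code: {raw_value!r}")
    return SEX_CODES[key]


def conform_rows(rows):
    """Return copies of the rows with the 'sex' field conformed."""
    return [{**row, "sex": conform_sex(row["sex"])} for row in rows]


# Two operational systems that encode the same fact differently.
payroll_rows = [{"employee": "A. Bello", "sex": "M"}]
hr_rows = [{"employee": "C. Obi", "sex": "Female"}]

# After conforming, rows from both sources can be stored side by side.
warehouse_rows = conform_rows(payroll_rows) + conform_rows(hr_rows)
```

Raising an error on an unknown code is a design choice made for this sketch; a real load process might instead route such rows to a review queue.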
Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".

Top-Down Design

Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. The CIF is driven by data provided from business operations.

Inmon states that the data warehouse is:

Subject-Oriented

The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.

Time-Variant

The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.

Non-Volatile

Data in the data warehouse is never overwritten or deleted. Once committed, the data is static, read-only, and retained for future reporting.

Integrated

The data warehouse contains data from most or all of an organization's operational systems and this data is made consistent.

The top-down design methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository.
Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost of implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of the project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.

Hybrid Design

Over time it has become apparent to proponents of bottom-up and top-down data warehouse design that both methodologies have benefits and risks. Hybrid methodologies have evolved to take advantage of the fast turn-around time of bottom-up design and the enterprise-wide data consistency of top-down design.

3.7 Data Warehouses versus Operational Systems

Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model. Operational system designers generally follow the Codd rules of data normalization in order to ensure data integrity. Five increasingly stringent rules of normalization (normal forms) are commonly described, the first three of them due to Codd. Fully normalized database designs (that is, those satisfying all five rules) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected each time a transaction is processed. Finally, in order to improve performance, older data are usually periodically purged from operational systems.

Data warehouses are optimized for speed of data retrieval.
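
To make the dimensional layout of section 3.4 concrete, the following sketch builds a very small fact table with two dimension tables and runs a summary query of the kind a data warehouse is optimized for. It uses Python's built-in sqlite3 module; every table name, column name and figure is hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star_schema():
    """Create a tiny fact table surrounded by two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimensions: reference information that gives context to the facts.
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            order_date TEXT NOT NULL,
            order_year INTEGER NOT NULL
        );
        -- Facts: numeric measurements of each sales transaction.
        CREATE TABLE fact_sales (
            customer_key INTEGER NOT NULL,
            date_key INTEGER NOT NULL,
            units_ordered INTEGER NOT NULL,
            price_paid REAL NOT NULL
        );
        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Zenith');
        INSERT INTO dim_date VALUES (1, '2007-03-01', 2007), (2, '2008-06-15', 2008);
        INSERT INTO fact_sales VALUES
            (1, 1, 10, 100.0),
            (1, 1, 5, 50.0),
            (1, 2, 2, 20.0),
            (2, 2, 7, 70.0);
        """
    )
    return conn


def sales_by_customer_and_year(conn):
    """Summarize the facts along two dimensions (an aggregate)."""
    query = """
        SELECT c.customer_name, d.order_year,
               SUM(f.units_ordered), SUM(f.price_paid)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY c.customer_name, d.order_year
        ORDER BY c.customer_name, d.order_year
    """
    return conn.execute(query).fetchall()
```

In a real warehouse the result of such a query might itself be stored as a summary table, so that frequent reports do not have to rescan the most granular rows.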
Frequently data in data warehouses are denormalized via a dimension-based model. Also, to speed data retrieval, data warehouse data are often stored multiple times: in their most granular form and in summarized forms called aggregates. Data warehouse data are gathered from the operational systems and held in the data warehouse even after the data has been purged from the operational systems.

3.8 Evolution in Organization Use of Data Warehouses

Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished:

Off-line Operational Databases

Data warehouses in this initial stage are developed by simply copying the data of an operational system to another server, where the processing load of reporting against the copied data does not impact the operational system's performance.

Off-line Data Warehouse

Data warehouses at this stage are updated from data in the operational systems on a regular basis, and the data warehouse data is stored in a data structure designed to facilitate reporting.

Real-Time Data Warehouse

Data warehouses at this stage are updated every time an operational system performs a transaction (e.g., an order, a delivery or a booking).

Integrated Data Warehouse

Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems.

3.9 Disadvantages of Data Warehouses

There are also disadvantages to using a data warehouse. Some of them are:

• Over their life, data warehouses can have high costs. The data warehouse is usually not static. Maintenance costs are high.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.
• There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.

3.10 Data Warehouse Appliance

A data warehouse appliance is an integrated set of servers, storage, OS, DBMS and software specifically pre-installed and pre-optimized for data warehousing. Alternatively, the term is also used for similar software-only systems that purportedly are very easy to install on specific recommended hardware configurations.

DW appliances provide solutions for the mid-to-large volume data warehouse market, offering low-cost performance most commonly on data volumes in the terabyte to petabyte range.

Technology Primer

Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "shared nothing architecture" where each server is self-sufficient and controls its own memory and disk. Shared nothing architectures have a proven record for high scalability and little contention. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows DW appliances to resolve a relational query by scanning data on each server in parallel. The divide-and-conquer approach delivers high performance and scales linearly as new servers are added into the architecture.

MPP database architectures are not new. Teradata, Tandem, Britton Lee and Sequent offered MPP SQL-based architectures in the 1980s. The re-emergence of MPP data warehouses has been aided by open source and commodity components. Advances in technology have reduced costs and improved performance in storage devices, multi-core CPUs and networking components.
Open source RDBMS products, such as Ingres and PostgreSQL, reduce software license costs and allow DW appliance vendors to focus on optimization rather than providing basic database functionality. Open source Linux provides a stable, well-implemented OS for DW appliances.

History

Many consider Teradata's initial product to be the first DW appliance (or Britton Lee's, but Britton Lee, renamed ShareBase, was acquired by Teradata in June 1990). Some regard Teradata's current offerings as still being appliances, while others argue that they fall short in ease of installation or administration. Interest in the data warehouse appliance category is generally dated to the emergence of Netezza in the early 2000s.

More recently, a second generation of modern DW appliances has emerged, marking the move to mainstream vendor integration. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Other DW appliance vendors have partnered with major hardware vendors to help bring their appliances to market. DATAllegro partners with EMC and Dell and implements open source Ingres on Linux. Greenplum has a partnership with Sun Microsystems and implements Bizgres (a form of PostgreSQL) on Solaris using the ZFS file system. HP Neoview has a wholly-owned solution and uses HP NonStop SQL.

Kognitio offers a row-based "virtual" data warehouse appliance, while Vertica and ParAccel offer column-based "virtual" data warehouse appliances. Like Greenplum, ParAccel partners with Sun Microsystems. These vendors provide software-only solutions deployed on clusters of commodity hardware. Kognitio's homegrown WX2 database runs on several blade configurations. Other players in the DW appliance space include Calpont and Dataupia.

Recently, the market has seen the emergence of data warehouse bundles where vendors combine their hardware and database software together as a data warehouse platform.
The Oracle Optimized Warehouse Initiative combines the Oracle Database with the industry's leading computer manufacturers: Dell, EMC, HP, IBM, SGI and Sun Microsystems. Oracle's Optimized Warehouses are pre-validated configurations and the database software comes pre-installed, though some analysts differ as to whether these should be regarded as appliances.

Benefits

Reduction in Costs

The total cost of ownership (TCO) of a data warehouse consists of initial entry costs, on-going maintenance costs and the cost of increasing capacity as the data warehouse grows. DW appliances offer low entry and maintenance costs. Initial costs range from $10,000 to $150,000 per terabyte, depending on the size of the DW appliance installed.

The resource cost for monitoring and tuning the data warehouse makes up a large part of the TCO, often as much as 80%. DW appliances reduce administration for day-to-day operations, setup and integration. Many also offer low costs for expanding processing power and capacity.

With the increased focus on controlling costs combined with tight IT budgets, data warehouse managers need to reduce and manage expenses while leveraging their technology as much as possible, making DW appliances a natural solution.

Parallel Performance

DW appliances provide a compelling price/performance ratio. Many support mixed workloads where a broad range of ad-hoc queries and reports run simultaneously with loading. DW appliance vendors use several distribution and partitioning methods to provide parallel performance. Some DW appliances scan data using partitioning and sequential I/O instead of index usage. Other DW appliances use standard database indexing.

With high performance on highly granular data, DW appliances are able to address analytics that previously could not meet performance requirements.

Reduced Administration

DW appliances provide a single vendor solution and take ownership for optimizing the parts and software within the appliance.
This eliminates the customer's costs for integration and regression testing of the DBMS, storage and OS on a terabyte scale and avoids some of the compatibility issues that arise from multi-vendor solutions. A single support point also provides a single source for problem resolution and a simplified upgrade path for software and hardware.

The care and feeding of DW appliances is less than that of many alternate data warehouse solutions. DW appliances reduce administration through automated space allocation, reduced index maintenance and, in most cases, reduced tuning and performance analysis.

Built-in High Availability

DW appliance vendors provide built-in high availability through redundancy on components within the appliance. Many offer warm-standby servers, dual networks, dual power supplies, disk mirroring with robust failover and solutions for server failure.

Scalability

DW appliances scale for both capacity and performance. Many DW appliances implement a modular design that database administrators can add to incrementally, eliminating up-front costs for over-provisioning. In contrast, architectures that do not support incremental expansion result in hours of production downtime, during which database administrators export and re-load terabytes of data. In MPP architectures, adding servers increases performance as well as capacity. This is not always the case with alternate solutions.

Rapid Time-to-Value

Companies increasingly expect to use business analytics to improve the current cycle. DW appliances provide fast implementations without the need for regression and integration testing. Rapid prototyping is possible because of reduced tuning and index creation, fast loading and, in some cases, reduced needs for aggregation.

Application Uses

DW appliances provide solutions
for many analytic application uses, including:

• Enterprise data warehousing
• Super-sized sandboxes that isolate power users with resource-intensive queries
• Pilot projects or projects requiring rapid prototyping and rapid time-to-value
• Off-loading projects from the enterprise data warehouse, i.e. large analytical query projects that affect the overall workload of the enterprise data warehouse
• Applications with specific performance or loading requirements
• Data marts that have outgrown their present environment
• Turnkey data warehouses or data marts
• Solutions for applications with high data growth and high performance requirements
• Applications requiring data warehouse encryption

Trends

The DW appliance market is showing shifting trends in many areas as it evolves:

• Vendors are moving toward using commodity technologies rather than proprietary assembly of commodity components.
• Implemented applications show usage expansion from tactical and data mart solutions to strategic and enterprise data warehouse use.
• Mainstream vendor participation is now apparent.
• With a lower total cost of ownership, reduced maintenance and high performance to address business analytics on growing data volumes, most analysts believe that DW appliances will gain market share.

3.11 The Future of Data Warehousing

Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.

A 2007 Gartner Group paper predicted that the following technologies could be disruptive to the business intelligence market:

• Service Oriented Architecture
• Search capabilities integrated into reporting and analysis technology
• Software as a Service
• Analytic tools that work in memory
• Visualization

Another prediction is that data warehouse performance will continue to be improved by use of data warehouse appliances, many of which incorporate the developments in the aforementioned Gartner Group report.
Finally, management consultant Thomas Davenport, among others, predicts that more organizations will seek to differentiate themselves by using analytics enabled by data warehouses.

4.0 CONCLUSION

The data warehouse is now emerging as very important in database management systems. This is a result of the growth in the databases of large corporations. A data warehouse makes it easier to hold data while it is in use. However, there are challenges and constraints in the acceptance and implementation of data warehouses, which is normal in the development of any concept. The future of the data warehouse is good, as some organizations will opt for it.

5.0 SUMMARY

- A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
- The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".
- Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built.
- There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach.
- Another important decision in designing a data warehouse is which data to conform and how to conform the data.
- Ralph Kimball, a well-known author on data warehousing, is a proponent of the bottom-up approach to data warehouse design.
- Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model.
- Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves.
- A data warehouse
appliance is an integrated set of servers, storage, OS, DBMS and software specifically pre-installed and pre-optimized for data warehousing.
- Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.

6.0 Tutor-Marked Assignment

1. Discuss the benefits associated with the use of a data warehouse.
2. Mention 5 applications of data warehouse appliances.

7.0 REFERENCES/FURTHER READINGS

Inmon, W.H. Tech Topic: What is a Data Warehouse? Prism Solutions. Volume 1. 1995.

Yang, Jun. WareHouse Information Prototype at Stanford (WHIPS). Stanford University. July 7, 1998.

Caldeira, C. "Data Warehousing - Conceitos e Modelos". Edições Sílabo. 2008. ISBN 978-972-618-479-9.

Kimball, R. and Ross, M. "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling". pp. 310. Wiley. 2nd Ed. 2002. ISBN 0-471-20024-7.

Ericsson, R. "Building Business Intelligence Applications with .NET". 1st Ed. Charles River Media. February 2004. pp. 28-29.

Pendse, Nigel and Bange, Carsten. "The Missing Next Big Things".

Schlegel, Kurt. "Emerging Technologies Could Prove Disruptive to the Business Intelligence Market". Gartner Group. July 6, 2007.

Davenport, Thomas and Harris, Jeanne. "Competing on Analytics: The New Science of Winning". Harvard Business School Press. 2007. ISBN 1-422-10332-3.

Queries from Hell blog. "When is an appliance not an appliance?"

DBMS2 - DataBase Management System Services. Blog Archive. "Data warehouse appliances - fact and fiction".

Todd White (November 5, 1990). "Teradata Corp. suffers first quarterly loss in four years". Los Angeles Business Journal.

UNIT 3 DOCUMENT MANAGEMENT SYSTEM

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
    3.1 History
    3.2 Document Management and Content Management
    3.3 Components
    3.4 Issues Addressed in Document Management
    3.5 Using XML in Document and Information Management
    3.6 Types of Document Management Systems
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents. The term has some overlap with the concepts of Content Management Systems and is often viewed as a component of Enterprise Content Management (ECM) Systems and related to Digital Asset Management, Document Imaging, Workflow systems and Records Management systems. Contract Management and Contract Lifecycle Management (CLM) can be viewed as either components or implementations of ECM.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- define document management system
- trace the history and development process of document management system
- compare and contrast document management system and content management systems
- know the basic components of document management systems
- answer the question of issues addressed by document management systems
- know the types of document management systems available off the shelf.

3.0 MAIN CONTENT

3.1 History

Beginning in the 1980s, a number of vendors began developing systems to manage paper-based documents. These systems managed paper documents, which included not only printed and published documents, but also photos, prints, etc.

Later, a second system was developed, to manage electronic documents, i.e., all those documents, or files, created on computers, and often stored on local user file systems. The earliest electronic document management (EDM) systems were either developed to manage proprietary file types, or a limited number of file formats. Many of these systems were later referred to as document imaging systems, because the main capabilities were capture, storage, indexing and retrieval of image file formats.
These systems enabled an organization to capture faxes and forms, save copies of the documents as images, and store the image files in the repository for security and quick retrieval (retrieval was possible because the system handled the extraction of the text from the document as it was captured, and the text indexer provided text retrieval capabilities).

EDM systems evolved to where the system was able to manage any type of file format that could be stored on the network. The applications grew to encompass electronic documents, collaboration tools, security, and auditing capabilities.

3.2 Document Management and Content Management

There is considerable confusion in the market between document management systems (DMS) and content management systems (CMS). This has not been helped by the vendors, who are keen to market their products as widely as possible.

These two types of systems are very different, and serve complementary needs. While there is an ongoing move to merge the two together (a positive step), it is important to understand when each system is appropriate.

Document Management Systems (DMS)

Document management is certainly the older discipline, born out of the need to manage huge numbers of documents in organisations.

Mature and well-tested, document management systems can be characterised as follows:

- focused on managing documents, in the traditional sense (like Word files)
- each unit of information (document) is fairly large, and self-contained
- there are few (if any) links between documents
- provides limited integration with repository (check-in, check-out, etc)
- focused primarily on storage and archiving
- includes powerful workflow
- targeted at storing and presenting documents in their native format
- limited web publishing engine typically produces one page for each document

Note that this is just a generalised description of a DMS, with most systems offering a range of unique features and capabilities.
Nonetheless, this does provide a representative outline of common DMS functionality.

A typical document management scenario:

A large legal firm purchases a DMS to track the huge number of advice documents, contracts and briefs. It allows lawyers to easily retrieve earlier advice, and to use 'precedent' templates to quickly create new documents.

You can't build a website with just a DM system.

Content Management Systems (CMS)

Content management is more recent, and is primarily designed to meet the growing needs of the website and intranet markets.
A content management system can be summarised as follows:

- manages small, interconnected units of information (e.g. web pages)
- each unit (page) is defined by its location on the site
- extensive cross-linking between pages
- focused primarily on page creation and editing
- provides tight integration between authoring and the repository (metadata, etc)
- provides a very powerful publishing engine (templates, scripting, etc)

A typical content management scenario:

A CMS is purchased to manage the 3000 page corporate website. Template-based authoring allows business groups to easily create content, while the publishing system dynamically generates richly-formatted pages.

Content management and document management are complementary, not competing technologies. You must choose an appropriate system if business needs are to be met.

3.3 Components

Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities. Here is a description of these components:

Metadata

Metadata is typically stored for each document. Metadata may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing for full text search capability, or can be used on its own. Extracted text can also be stored as a component of metadata, stored with the image, or separately as a source for searching document collections.
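The metadata handling described above can be illustrated with a minimal Python sketch. It is not taken from any particular DMS product: the class name DocumentMetadata, the helper functions and the short stop-word list are all assumptions made for illustration. The sketch records who stored a document and when, derives probable keywords from the extracted text by simple word frequency, and supports a naive full text search.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


@dataclass
class DocumentMetadata:
    """Illustrative metadata record kept alongside each stored document."""
    doc_id: str
    stored_on: date          # date the document was stored
    stored_by: str           # identity of the user storing it
    extracted_text: str = ""  # text from OCR or text extraction
    keywords: list = field(default_factory=list)


def probable_keywords(text, limit=5):
    """Return the most frequent non-stop-words as candidate keywords."""
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOP_WORDS and len(word) > 2:
            counts[word] = counts.get(word, 0) + 1
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:limit]


def store_document(doc_id, user, text, today=None):
    """Build the metadata record automatically when a document is stored."""
    meta = DocumentMetadata(doc_id, today or date.today(), user, text)
    meta.keywords = probable_keywords(text)
    return meta


def full_text_search(collection, term):
    """Return the ids of documents whose extracted text contains the term."""
    term = term.lower()
    return [m.doc_id for m in collection if term in m.extracted_text.lower()]


if __name__ == "__main__":
    meta = store_document(
        "D-001", "alice",
        "Lease agreement for office space. The lease covers the office for two years.",
    )
    print(meta.stored_by, meta.keywords)
    print(full_text_search([meta], "office"))
```

A production system would normally hold these records in a database and build a proper text index rather than scanning every document, but the division of labour is the same: metadata is captured at storage time and then drives retrieval.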
Integration

Many document management systems attempt to integrate document management directly into other applications, so that users may retrieve existing documents directly from the document management system repository, make changes, and save the changed document back to the repository as a new version, all without leaving the application. Such integration is commonly available for office suites and e-mail or collaboration/groupware software.

Capture

Capture images of paper documents using scanners or multifunction printers. Optical Character Recognition (OCR) software is often used, whether integrated into the hardware or as stand-alone software, in order to convert digital images into machine-readable text.

Indexing

Track electronic documents. Indexing may be as simple as keeping track of unique document identifiers, but often it takes a more complex form, providing classification through the documents' metadata or even through word indexes extracted from the documents' contents. Indexing exists mainly to support retrieval. One area of critical importance for rapid retrieval is the creation of an index topology.

Storage

Store electronic documents. Storage of the documents often includes management of those same documents: where they are stored, for how long, migration of the documents from one storage medium to another (hierarchical storage management) and eventual document destruction.

Retrieval

Retrieve the electronic documents from the storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful. Simple retrieval of individual documents can be supported by allowing the user to specify the unique document identifier, and having the system use the basic index (or a non-indexed query on its data store) to retrieve the document. More flexible retrieval allows the user to specify partial search terms involving the document identifier and/or parts of the expected metadata.
This would typically return a list of documents which match the user's search terms. Some systems provide the capability to specify a Boolean expression containing multiple keywords or example phrases expected to exist within the documents' contents. The retrieval for this kind of query may be supported by previously built indexes, or may perform more time-consuming searches through the documents' contents to return a list of the potentially relevant documents.

Distribution

Security

Document security is vital in many document management applications. Compliance requirements for certain documents can be quite complex depending on the type of documents. For instance, the Health Insurance Portability and Accountability Act (HIPAA) requirements dictate that medical documents have certain security requirements. Some document management systems have a rights management module that allows an administrator to give access to documents based on type to only certain people or groups of people.

Workflow

Workflow is a complex problem and some document management systems have a built-in workflow module. There are different types of workflow. Usage depends on the environment the EDMS is applied to. Manual workflow requires a user to view the document and decide who to send it to. Rules-based workflow allows an administrator to create a rule that dictates the flow of the document through an organization: for instance, an invoice passes through an approval process and then is routed to the accounts payable department. Dynamic rules allow for branches to be created in a workflow process. A simple example would be to enter an invoice amount: if the amount is lower than a certain set amount, the invoice follows a different route through the organization.

Collaboration

Collaboration should be inherent in an EDMS. Documents should be capable of being retrieved by an authorized user and worked on.
Access should be blocked to other users while work is being performed on the document.

Versioning

Versioning is a process by which documents are checked in or out of the document management system, allowing users to retrieve previous versions and to continue work from a selected point. Versioning is useful for documents that change over time and require updating, where it may be necessary to go back to a previous copy.

3.4 Issues Addressed in Document Management

There are several common issues involved in managing documents, whether the system is an informal, ad-hoc, paper-based method for one person or a formal, structured, computer-enhanced system for many people across multiple offices. Most methods for managing documents address the following areas:

Location
Where will documents be stored? Where will people need to go to access documents? Physical journeys to filing cabinets and file rooms are analogous to the onscreen navigation required to use a document management system.

Filing
How will documents be filed? What methods will be used to organize or index the documents to assist in later retrieval? Document management systems will typically use a database to store filing information.

Retrieval
How will documents be found? Typically, retrieval encompasses both browsing through documents and searching for specific information.

Security
How will documents be kept secure? How will unauthorized personnel be prevented from reading, modifying or destroying documents?

Disaster Recovery
How can documents be recovered in case of destruction from fires, floods or natural disasters?

Retention period
How long should documents be kept, i.e. retained? As organizations grow and regulations increase, informal guidelines for keeping various types of documents give way to more formal Records Management practices.

Archiving
How can documents be preserved for future readability?

Distribution
How can documents be available to the people that need them?

Workflow
If documents need to pass from one person to another, what are the rules for how their work should flow?

Creation
How are documents created? This question becomes important when multiple people need to collaborate, and the logistics of version control and authoring arise.

Authentication
Is there a way to vouch for the authenticity of a document?

3.5 Using XML in Document and Information Management

The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published February 10, 1998, is impressive. XML has been heralded as the next important internet technology, the next step following HTML, and the natural and worthy companion to the Java programming language itself. Enterprises of all stripes have rapturously embraced XML. An important role for XML is in managing not only documents but also the information components on which documents are based.

Document Management: Organizing Files

Document management as a technology and a discipline has traditionally augmented the capabilities of a computer's file system. By enabling users to characterize their documents, which are usually stored in files, document management systems enable users to store, retrieve, and use their documents more easily and powerfully than they can do within the file system itself.

Long before anyone thought of XML, document management systems were originally developed to help law offices maintain better control over and access to the many documents that legal professionals generate.
The basic mechanisms of the first document management systems performed, among others, these simple but powerful tasks:

- Add information about a document to the file that contains the document
- Organize the user-supplied information in a database
- Create information about the relationships between different documents

In essence, document management systems created libraries of documents in a computer system or a network. The document library contained a "card catalog" where the user-supplied information was stored and through which users could find out about the documents and access them. The card catalog was a database that captured information about a document, such as these:

- Author: who wrote or contributed to the document
- Main topics: what subjects are covered in the document
- Origination date: when was it started
- Completion date: when was it finished
- Related documents: what other documents are relevant to this document
- Associated applications: what programs are used to process the document
- Case: to which legal case (or other business process) is the document related

Armed with a database of such information about documents, users could find information in more sensible and intuitive ways than scanning different directories' lists of contents, hoping that a file's name might reveal what the file contained. Many people consider document management systems' first achievement to have created "a file system within the file system."

Soon, document management systems began to provide additional and valuable functionality.
By enriching the databases of information about the documents (the metadata), these systems provided these capabilities:

- Version tracking: see how a document evolves over time
- Document sharing: see in what business processes the document is used and re-used
- Electronic review: enable users to add their comments to a document without actually changing the document itself
- Document security: refine the different types of access that different users need to the document
- Publishing management: control the delivery of documents to different publishing process queues
- Workflow integration: associate the different stages of a document's life-cycle with people and projects with schedules

These critical capabilities (among others) of document management systems have proven enormously successful, fueling a multi-billion dollar business.

XML: Managing Document Components

XML and its parent technology, SGML (Standard Generalized Markup Language), provide the foundation for managing not only documents but also the information components of which the documents are composed. This is due to some notable characteristics of XML data.

Documents vs. Files

In XML, documents can be seen independently of files. One document can comprise many files, or one file can contain many documents. This is the distinction between the physical and logical structure of information. XML data is primarily described by its logical structure. In a logical structure, principal interest is placed on what the pieces of information are and how they relate to each other, and secondary interest is placed on the physical items that constitute the information.

Rather than relying on file headers and other system-specific characteristics of a file as the primary means for understanding and managing information, XML relies on the markup in the data itself.
A chapter in a document is not a chapter because it resides in a file called chapter1.doc but because the chapter's content is contained in the <chapter> and </chapter> element tags.

Because elements in XML can have attributes, the components of a document can be extensively self-descriptive. For example, in XML you can learn a lot about the chapter without actually reading it if the chapter's markup is rich in attributes, as in <chapter language="English" subject="colonial economics" revision_date="19980623" author="Joan X. Pringle" thesis_advisor="Ramona Winkelhoff">. When the elements carry self-describing metadata with them, systems that understand XML syntax can operate on those elements in useful ways, just like a traditional document management system can. But there is a major difference.

Information vs. Documents

XML markup provides metadata for all components of a document, not merely the object that contains the document itself. This makes the pieces of information that constitute a document just as manageable as the fields of a record in a database. Because XML data follows syntactic rules for well-formedness and proper containment of elements, document management systems that can correctly read and parse XML data can apply the functions of a document management system, such as those mentioned above, to any and all information components inside the document.

XML's focus on information rather than documents offers some important capabilities:

Reuse of Information

While standard document management systems do offer some measure of information reuse through file sharing, information management systems based on XML or SGML enable people to share pieces of common information without storing the piece of information in multiple places.
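The idea that a system can manage the components of a document through their markup can be shown with a short Python sketch using the standard library module xml.etree.ElementTree. The first chapter element reuses the attributes quoted in this section; the enclosing thesis element, the second chapter, the placeholder titles and the function name find_chapters are assumptions added only for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document. Only the first chapter's attributes come from the text;
# the <thesis> wrapper, second chapter and titles are hypothetical.
SAMPLE = """\
<thesis>
  <chapter language="English" subject="colonial economics"
           revision_date="19980623" author="Joan X. Pringle"
           thesis_advisor="Ramona Winkelhoff">
    <title>Chapter one (placeholder title)</title>
  </chapter>
  <chapter language="English" subject="shipping routes"
           revision_date="19980701" author="Joan X. Pringle">
    <title>Chapter two (placeholder title)</title>
  </chapter>
</thesis>
"""


def find_chapters(root, **wanted):
    """Return chapter elements whose attributes equal every wanted value."""
    return [
        chapter
        for chapter in root.iter("chapter")
        if all(chapter.get(name) == value for name, value in wanted.items())
    ]


root = ET.fromstring(SAMPLE)

# Select a component by its metadata without reading the chapter text.
hits = find_chapters(root, subject="colonial economics")
for chapter in hits:
    print(chapter.get("author"), chapter.get("revision_date"))
```

The query never looks at file names or at the body of a chapter. It works only from the attributes carried by each element, which is the sense in which the pieces of a document become as manageable as the fields of a database record.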

Information Harvesting

By enabling people to focus on information components that make up documents rather than on the documents themselves, these systems can identify and capture useful information components that have ongoing value "buried" inside documents whose value as documents is limited. That is, a particular document may be useful only for a short time, but chunks of information inside that document may be reusable and valuable for a longer period.

Fine-Granularity Text-Management Applications

Because the information components in XML documents are identifiable, manipulatable, and manageable, XML information management technology can support real economies in applications such as translation of technical manuals.

Evaluating Product Offerings

While the general world of document management and information management is moving toward adoption of structured information and use of XML and SGML, some product offerings distinguish themselves by using underlying database management products with native support for object-oriented data. Object-oriented data matches the structure of XML data quite well, and database systems that comprehend object-oriented data adapt well to the tasks of managing XML information. By contrast, other information management products that comprehend XML or SGML data use relational database systems and provide their own object-oriented extensions to those database systems in order to comprehend object-oriented data such as XML or SGML data. Products relying on such implementations have also garnered success and respect in the document management marketplace.

3.6 Types of Document Management Systems

- Alfresco (software)
- ColumbiaSoft
- Main//Pyrus DMS
- OpenKM
- Computhink's ViewWise
- Didgah
- Documentum
- DocPoint
- Hummingbird DM
- Interwoven's Worksite
- Infonic Document Management (UK)
- ISIS Papyrus
- KnowledgeTree
- Laserfiche
- Livelink
- O3spaces
- Oracle's Stellent
- Perceptive Software
- Questys Solutions
- Redmap
- Report2Web
- SharePoint
- Saperion
- SAP KM&C
- SAP Netweaver
- TRIM Context
- Xerox Docushare

4.0 CONCLUSION

Document management systems have added variety to the pool of options available for database management in corporations. Many products are available off the shelf for end users to choose from. The use of document management systems has encouraged the concept of, and the drive toward, the paperless office and paperless transactions. It is a concept that makes the future bright, as people move toward greater efficiency by eliminating the use of paper and hard copies of data and information.

5.0 SUMMARY

- A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents.
- Beginning in the 1980s, a number of vendors began developing systems to manage paper-based documents. These systems managed paper documents, which included not only printed and published documents, but also photos, prints, etc.
- There is considerable confusion in the market between document management systems (DMS) and content management systems (CMS).
- Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities.
- There are several common issues involved in managing documents, whether the system is an informal, ad-hoc, paper-based method for one person or a formal, structured, computer-enhanced system for many people across multiple offices.
- The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published February 10, 1998, is impressive. XML has been heralded as the next important Internet technology, the next step following HTML, and the natural and worthy companion to the Java programming language itself. Enterprises of all stripes have rapturously embraced XML.

6.0 TUTOR-MARKED ASSIGNMENT

1. List 5 characteristics of a document management system.
2. Briefly discuss workflow as a component of a document management system.

7.0 REFERENCES/FURTHER READINGS

BBC - h2g2 guide. Shoebox Storage.

James Robertson. Published on 14 February 2003.

Miles L. Mathieu, Ernest A. Capozzoli (2002). "The Paperless Office: Accepting Digitized Data" (PDF). Troy State University.

Kevin Craine. "Excerpts from Designing a Document Strategy" (HTML). Craine Communications Group.

1.0