An Introduction to Database Systems

■ Part I (four chapters) provides a broad introduction to the concepts of database systems in general and relational systems in particular. It also introduces the standard database language, SQL.
■ Part II (six chapters) consists of a detailed and very careful description of the relational model, which is not only the theoretical foundation underlying relational systems but is, in fact, the theoretical foundation for the entire database field.
■ Part III (four chapters) discusses the general question of database design. Three chapters are devoted to design theory, and the fourth considers semantic modeling and the entity/relationship model.
■ Part IV (two chapters) is concerned with transaction management (i.e., recovery and concurrency controls).
■ Part V (eight chapters) shows how relational concepts are relevant to a variety of further aspects of database technology: security, distributed databases, temporal data, decision support, and so on.
■ Part VI (three chapters) describes the impact of object technology on database systems. Chapter 25 describes object systems specifically; Chapter 26 considers the possibility of a rapprochement between object and relational technologies and discusses object/relational systems; and Chapter 27 addresses the relevance to databases of XML.

About the Author

C. J. DATE is an author, lecturer, researcher, and independent consultant specializing in relational database systems. An active member of the database community for nearly 35 years, C. J. Date has devoted the major part of his career to exploring, expanding, and expounding the theory and practice of relational technology. He enjoys a reputation second to none for his ability to explain complex technical material in a clear and understandable fashion.

“[C. J. Date's] book is the flag bearer of relational theory and
... in general ... as well as the runaway leader in discussing ... SQL standards ... exercises much more respect for careful language and ... importance of concepts and principles in gaining mastery of the field.”
- Carl Eckberg, San Diego State University

“[The] 8th Edition is an excellent and comprehensive presentation of the contemporary database ... chapters on types, relations, object databases, and object ... together provide an exceptionally clear, self-contained exposition ... relational approach to databases.”

“Chris Date is the computer industry's most respected expert and thinker on database technology, and his book An Introduction to Database Systems continues to be the definitive work for those ... a comprehensive and current guide to ... concurrency ... and an indispensable ... practitioner should be without this ...”
- Declan Bra..., ... Specialist, Fujitsu

“The author's deep insights into the area, informal treatment of profound topics, open-ended discussions of critical issues, comprehensive and up-to-date contents, as well as rich ... the database area for more than two decades.”
- Quane Lau, The University of Michigan, Dearborn

“... is its comprehensiveness and the fact that it is very up-to-date ... research developments. The latter factor is due mainly to [Date's] involvement with these developments, which gives him a unique opportunity to ... them.”
- David Livingstone, University of Northumbria at Newcastle

Senior Acquisitions Editor: Maite Suarez-Rivas
Project Editor: Katherine Hartunian
Marketing Manager: Nathan Schultz
Production Supervisor: Marilyn Lloyd
Proofreader: Jenn...
Design Manager: ...
Cover Design: Night & ...
Cover Image: Lindy Date

Access the latest information about Addison-Wesley titles from our World Wide Web site.

3 4 5 6 7 8 9 10-HAM-06 05 04 03

... quoted in the form: Those who don't know history are doomed to repeat it
- George Santayana

I would like to see computer science teaching set deliberately in a historical framework
... Students need to understand how the present situation has come about, what was tried, what worked and what did not, and how improvements in hardware made progress possible. The absence of this element in their training causes people to approach every problem from first principles. They are apt to ...

... on technical planning and externals design for the ... IBM in May, 1983. ... database field for well over 30 years. ... database, throughout North America and also in Europe, Australia, Latin America, and the Far East.

In addition to the present book, he is author or coauthor of a ... database texts, including, from Morgan Kaufmann, Temporal Data and the Relational Model (2003) and, from Addison-Wesley, Foundation for Future Database Systems: The Third Manifesto (2nd edition, 2000), a detailed proposal for the future di... languages, including Braille, Chinese, Dutch, French, German, Greek, Italian, Japanese, Korean, Polish, Portuguese, Russian, and Spanish.

Date has also produced over 300 technical articles and research papers and has made a variety of original contributions to database theory. For several years he was a regular columnist for the magazine Database Programming & Design. He also contributes regularly to the website http://dbdebunk.com. His professional seminars on database technology, offered both in North America and overseas, are widely considered to be second to none for the quality of the subject matter and the clarity of the exposition.

Mr. Date holds an Honours Degree in Mathematics from Cambridge University, England (BA 1962, MA 1966) and the honorary degree of Doctor of Technology from De Montfort University, England (1994).

Contents

Preface to the Eighth Edition

PART I PRELIMINARIES

Chapter 1 An Overview of Database Management
1.1 Introduction
1.2 What Is a Database System?
1.3 What Is a Database?
1.4 Why Database?
1.5 Data Independence
1.6 Relational Systems and Others
1.7 Summary
Exercises
References and Bibliography

Chapter 2 Database System Architecture
2.1 Introduction
2.2 The Three Levels of the Architecture
2.3 The External Level
2.4 The Conceptual Level
2.5 The Internal Level
2.6 Mappings
2.7 The Database Administrator
2.8 The Database Management System
2.9 Data Communications
2.10 Client/Server Architecture
2.11 Utilities
2.12 Distributed Processing
2.13 Summary
Exercises
References and Bibliography

Chapter 3 An Introduction to Relational Databases
3.1 Introduction
3.2 An Informal Look at the Relational Model
3.3 Relations and Relvars
3.4 What Relations Mean
3.5 Optimization
3.6 The Catalog
3.7 Base Relvars and Views
3.8 Transactions
3.9 The Suppliers-and-Parts Database
3.10 Summary
Exercises
References and Bibliography

Chapter 4 An Introduction to SQL
4.1 Introduction
4.2 Overview
4.3 The Catalog
4.4 Views
4.5 Transactions
4.6 Embedded SQL
4.7 Dynamic SQL and SQL/CLI
4.8 SQL Is Not Perfect
4.9 Summary
Exercises
References and Bibliography

PART II THE RELATIONAL MODEL

Chapter 5 Types
5.1 Introduction
5.2 Values vs. Variables
5.3 Types vs. Representations
5.4 Type Definition
5.5 Operators
5.6 Type Generators
5.7 SQL Facilities
5.8 Summary
Exercises
References and Bibliography

Chapter 6 Relations
6.1 Introduction
6.2 Tuples
6.3 Relation Types
6.4 Relation Values
6.5 Relation Variables
6.6 SQL Facilities
6.7 Summary
Exercises
References and Bibliography

Chapter 7 Relational Algebra
7.1 Introduction
7.2 Closure Revisited
7.3 The Original Algebra: Syntax
...
7.6 What Is the Algebra For?
7.7 Further Points
7.8 Additional Operators
7.9 Grouping and Ungrouping
7.10 Summary
Exercises
References and Bibliography

Chapter 8 Relational Calculus
8.1 Introduction
8.2 Tuple Calculus
8.3 Examples
8.4 Calculus vs. Algebra
...
Exercises
References and Bibliography

Chapter 9 Integrity
9.1 Introduction
9.2 A Closer Look
9.3 Predicates and Propositions
9.4 Relvar Predicates and Database Predicates
9.5 Checking the Constraints
9.6 Internal vs. External Predicates
9.7 Correctness vs. Consistency
9.8 Integrity and Views
9.9 A Constraint Classification Scheme
9.10 Keys
9.11 Triggers (a Digression)
9.12 SQL Facilities
9.13 Summary
Exercises
References and Bibliography

Chapter 10 Views
10.1 Introduction
10.2 What Are Views For?
10.3 View Retrievals
10.4 View Updates
10.5 Snapshots (a Digression)
10.6 SQL Facilities
10.7 Summary
Exercises
References and Bibliography

PART III DATABASE DESIGN

Chapter 11 Functional Dependencies
11.1 Introduction
11.2 Basic Definitions
11.3 Trivial and Nontrivial Dependencies
11.4 Closure of a Set of Dependencies
11.5 Closure of a Set of Attributes
11.6 Irreducible Sets of Dependencies
11.7 Summary
Exercises
References and Bibliography

Chapter 12 Further Normalization I: 1NF, 2NF, 3NF, BCNF
12.1 Introduction
12.2 Nonloss Decomposition and Functional Dependencies
12.3 First, Second, and Third Normal Forms
12.4 Dependency Preservation
12.5 Boyce/Codd Normal Form
12.6 A Note on Relation-Valued Attributes
12.7 Summary
Exercises
References and Bibliography

Chapter 13 Further Normalization II: Higher Normal Forms
13.1 Introduction
13.2 Multi-valued Dependencies and Fourth Normal Form
13.3 Join Dependencies and Fifth Normal Form
13.4 The Normalization Procedure Summarized
13.5 A Note on Denormalization
13.6 Orthogonal Design (a Digression)
13.7 Other Normal Forms
13.8 Summary
Exercises
References and Bibliography

Chapter 14 Semantic Modeling
14.1 Introduction
14.2 The Overall Approach
14.3 The E/R Model
14.4 E/R Diagrams
14.5 Database Design with the E/R Model
14.6 A Brief Analysis
14.7 Summary
Exercises
References and Bibliography

PART IV TRANSACTION MANAGEMENT

Chapter 15 Recovery
15.1 Introduction
15.2 Transactions
15.3 Transaction Recovery
15.4 System Recovery
15.5 Media Recovery
15.6 Two-Phase Commit
15.7 Savepoints (a Digression)
15.8 SQL Facilities
15.9 Summary
Exercises
References and Bibliography

Chapter 16 Concurrency
16.1 Introduction
16.2 Three Concurrency Problems
16.3 Locking
16.4 The Three Concurrency Problems Revisited
16.5 Deadlock
16.6 Serializability
16.7 Recovery Revisited
16.8 Isolation Levels
16.9 Intent Locking
16.10 Dropping ACID
16.11 SQL Facilities
16.12 Summary
Exercises
References and Bibliography

PART V FURTHER TOPICS

Chapter 17 Security
17.1 Introduction
17.2 Discretionary Access Control
17.3 Mandatory Access Control
17.4 Statistical Databases
17.5 Data Encryption
17.6 SQL Facilities
17.7 Summary
Exercises
References and Bibliography

Chapter 18 Optimization
18.1 Introduction
18.2 A Motivating Example
18.3 An Overview of Query Processing
18.4 Expression Transformation
18.5 Database Statistics
18.6 A Divide-and-Conquer Strategy
18.7 Implementing the Relational Operators
...

Chapter 19 Missing Information
19.1 Introduction
19.2 An Overview of the 3VL Approach
19.3 Some Consequences of the Foregoing Scheme
19.4 Nulls and Keys
19.5 Outer Join (a Digression)
19.6 Special Values
19.7 SQL Facilities
19.8 Summary
Exercises
References and Bibliography

Chapter 20 Type Inheritance
20.1 Introduction
20.2 Type Hierarchies
20.3 Polymorphism and Substitutability
20.4 Variables and Assignments
20.5 Specialization by Constraint
20.6 Comparisons
20.7 Operators, Versions, and Signatures
20.8 Is a Circle an Ellipse?
20.9 Specialization by Constraint Revisited
20.10 SQL Facilities
20.11 Summary
Exercises
References and Bibliography

Chapter 21 Distributed Databases
21.1 Introduction
21.2 Some Preliminaries
21.3 The Twelve Objectives
21.4 Problems of Distributed Systems
21.5 Client/Server Systems
21.6 DBMS Independence
21.7 SQL Facilities
21.8 Summary
Exercises
References and Bibliography

Chapter 22 Decision Support
22.1 Introduction
22.2 Aspects of Decision Support
22.3 Database Design for Decision Support
22.4 Data Preparation
22.5 Data Warehouses and Data Marts
22.6 Online Analytical Processing
22.7 Data Mining
22.8 SQL Facilities
22.9 Summary
Exercises
References and Bibliography

Chapter 23 Temporal Databases
23.1 Introduction
23.2 What Is the Problem?
23.3 Intervals
23.4 Packing and Unpacking Relations
23.5 Generalizing the Relational Operators
23.6 Database Design
23.7 Integrity Constraints
23.8 Summary
Exercises
References and Bibliography

Chapter 24 Logic-Based Databases
24.1 Introduction
24.2 Overview
24.3 Propositional Calculus
24.4 Predicate Calculus
24.5 A Proof-Theoretic View of Databases
24.6 Deductive Database Systems
24.7 Recursive Query Processing
24.8 Summary
Exercises
References and Bibliography

PART VI OBJECTS, RELATIONS, AND XML

Chapter 25 Object Databases
25.1 Introduction
25.2 Objects, Classes, Methods, and Messages
25.3 A Closer Look
25.4 A Cradle-to-Grave Example
25.5 Miscellaneous Issues
25.6 Summary
Exercises
References and Bibliography

Chapter 26 Object/Relational Databases
26.1 Introduction
26.2 The First Great Blunder
26.3 The Second Great Blunder
26.4 Implementation Issues
26.5 Benefits of True Rapprochement
26.6 SQL Facilities
26.7 Summary
Exercises
References and Bibliography

Chapter 27 The World Wide Web and XML
27.1 Introduction
27.2 The Web and the Internet
27.3 An Overview of XML
27.4 XML Data Definition
27.5 XML Data Manipulation
27.6 XML and Databases
27.7 SQL Facilities
27.8 Summary
Exercises
References and Bibliography

APPENDIXES

Appendix A The TransRelational™ Model
A.1 Introduction
A.2 Three Levels of Abstraction
A.3 The Basic Idea
A.4 Condensed Columns
A.5 Merged Columns
A.6 Implementing the Relational Operators
A.7 Summary
References and Bibliography

Appendix B SQL Expressions
B.1 Introduction
B.2 Table Expressions
B.3 Boolean Expressions

Appendix C Abbreviations, Acronyms, and Symbols

Fig. 1.6 Entity/relationship (E/R) diagram for KnowWare Inc.
Chapter 1 An Overview of Database Management

In addition to the basic entities themselves (suppliers, parts, and so on, in the example), there will also be relationships linking those basic entities together. Such relationships are represented by diamonds and connecting lines in Fig. 1.6. For example, there is a relationship (“SP” or shipments) between suppliers and parts: Each supplier supplies certain parts, and conversely each part is supplied by certain suppliers (more accurately, each supplier supplies certain kinds of parts, each kind of part is supplied by certain suppliers). Similarly, parts are used in projects, and conversely projects use parts (relationship PJ); parts are stored in warehouses, and warehouses store parts (relationship WP); and so on. Note that these relationships are all bidirectional; that is, they can be traversed in either direction. ...

More precisely, by tuples in relations (see Chapter 3).

... interact. The objects allow us to model the structure of data. The operators allow us to model its behavior.

We can then draw a useful (and very important!) distinction between the model and its implementation:

■ An implementation of a given data model is a physical realization on a real machine of the components of the abstract machine that together constitute that model.

In a nutshell: The model is what users have to know about; the implementation is what users do not have to know about.

As you can see from the foregoing, the distinction between model and implementation is really just a special case of the familiar distinction between logical and physical. Sadly, however, many of today's database systems, even ones that profess to be relational, do not make these distinctions as clearly as they should. Indeed, there seems to be a fairly widespread lack of understanding of these distinctions and the importance of making them.
As a consequence, there is all too often a gap between database principles (the way database systems ought to be) and database practice (the way they actually are). In this book we are concerned primarily with principles, but it is only fair to warn you that you might therefore be in for a few surprises, mostly of an unpleasant nature, if and when you start using a commercial product.

In closing this section, we should mention the fact that data model is another term that is used in the literature with two quite different meanings. The first meaning is as already explained. The second is as follows: A data model (second sense) is a model of the persistent data of some particular enterprise (e.g., the manufacturing company KnowWare Inc. discussed earlier in this section). The difference between the two meanings can be characterized thus:

■ A data model in the first sense is like a programming language (albeit one that is somewhat abstract) whose constructs can be used to solve a wide variety of specific problems, but in and of themselves have no direct connection with any such specific problem.
■ A data model in the second sense is like a specific program written in that language. In other words, a data model in the second sense takes the facilities provided by some model in the first sense and applies them to some specific problem. It can be regarded as a specific application of some model in the first sense.

In this book, the term data model will be used only in the first sense from this point forward, barring explicit statements to the contrary.

1.4 WHY DATABASE?

Why use a database system? What are the advantages? To some extent the answer to these questions depends on whether the system in question is single- or multi-user (or rather, to be more accurate, there are numerous additional advantages in the multi-user case). We consider the single-user case first.

...

Fig. 1.8 Data structure and operators in a relational system (examples). [Of the figure, only the second sample retrieval, a projection, is recoverable: SELECT WINE, BOTTLES FROM CELLAR ;]

... of the CELLAR table from Fig. 1.1, reduced in size to make it more manageable). Two sample retrievals, one involving a restriction or row-subsetting operation and the other a projection or column-subsetting operation, are shown in part b of the figure. The examples are expressed in SQL once again.

We can now distinguish between relational and nonrelational systems. In a relational system, the user sees the data as tables, and nothing but tables (as already explained). In a nonrelational system, by contrast, the user sees other data structures (either instead of or as well as the tables of a relational system). Those other structures, in turn, require other operators to access them. For example, in a hierarchic system like IBM's IMS, the data is represented to the user in the form of trees (hierarchies), and the operators provided for accessing such trees include operators for following pointers (namely, the pointers that implement the hierarchic paths up and down the trees). By contrast, as the examples in this chapter have shown, it is precisely an important distinguishing characteristic of relational systems that they involve no pointers (at least, no pointers visible to the user, i.e., no pointers at the model level, though there might well be pointers at the level of the physical implementation).

Database systems can be categorized according to the data structures and operators they present to the user. According to this scheme, the oldest (prerelational) systems fall into three broad categories: inverted list, hierarchic, and network systems.⁶ (Note: The term network here has nothing to do with networks in the data communications sense, as described in the next chapter.) We do not discuss these categories in detail in this book because, from a technological point of view at least, they must be regarded as obsolete. You can find tutorial descriptions of all three in reference [1.5] if you are interested.

6 By analogy with the relational model, earlier editions of this book referred to inverted list, hierarchic, and network models (and much of the literature still does). To talk in such terms is a little misleading, however, because, unlike the relational model, the inverted list, hierarchic, and network “models” were all defined after the fact; that is, commercial inverted list, hierarchic, and network products were implemented first, and the corresponding “models” were defined subsequently by a process of induction (in this context, a polite term for guesswork) from those existing implementations. See the annotation to reference [1.1] for further discussion.

As an aside, we remark that network systems are sometimes called either CODASYL systems or DBTG systems, after the committee that proposed them: namely, the Data Base Task Group (DBTG) of the Conference on Data Systems Languages (CODASYL). Probably the best-known example of such a system is IDMS, from Computer Associates International Inc. Like hierarchic systems (but unlike relational ones), such systems all expose pointers to the user.

The first relational products began to appear in the late 1970s and early 1980s. At the time of writing, the vast majority of database systems are relational (at least, they support SQL), and they run on just about every kind of hardware and software platform available. Leading examples include, in alphabetical order, DB2 (various versions) from IBM Corp., Ingres II from Computer Associates International Inc., Informix Dynamic Server from Informix Software Inc.,⁷ Microsoft SQL Server from Microsoft Corp., Oracle 9i from Oracle Corp., and Sybase Adaptive Server from Sybase Inc.
Note: When we have cause to refer to any of these products later in this book, we will refer to them (as most of the industry does, informally) by the abbreviated names DB2, Ingres (pronounced “ingress”), Informix, SQL Server, Oracle, and Sybase, respectively.

Subsequently, object and object/relational products began to become available: object systems in the late 1980s and early 1990s, object/relational systems in the late 1990s. The object/relational systems are extended versions of certain of the original SQL products (e.g., DB2, Informix); the object (sometimes object-oriented) systems represent attempts to do something entirely different, as in the case of GemStone from GemStone Systems Inc. and Versant ODBMS from Versant Object Technology. Such systems are discussed in Part VI of this book. (We should note that the term object as used in this paragraph has a rather specific meaning, which we will explain when we get to Part VI. Prior to that point, we will use the term in its normal generic sense, barring explicit statements to the contrary.)

In addition to the approaches already mentioned, research has proceeded over the years on a variety of alternative schemes, including the multi-dimensional approach and the logic-based (also called deductive or expert) approach. We discuss multi-dimensional systems in Chapter 22 and logic-based systems in Chapter 24. Also, the recent explosive growth of the World Wide Web and the use of XML has generated much interest in what has become known (not very aptly) as the semistructured approach. We discuss “semistructured” systems in Chapter 27.

1.7 SUMMARY

We close this introductory chapter by summarizing the main points. A database system can be thought of as a computerized record-keeping system. Such a sys...

7 The DBMS division of Informix Software Inc. was acquired by IBM Corp. in 2001.
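
As a concrete companion to the discussion of Fig. 1.8 in Section 1.6, the two kinds of retrieval described there (a restriction, which subsets rows, and a projection, which subsets columns) can be tried against any SQL product. What follows is a minimal sketch only, not a reproduction of the figure. It assumes Python's built-in sqlite3 module as the host system; the YEAR column, the rows in SAMPLE_ROWS, the year threshold, and the helper names build_cellar, restrict, and project are all illustrative assumptions. Only the table name CELLAR and the column names WINE and BOTTLES are taken from the text.

```python
import sqlite3

# Illustrative rows only (an assumption); they are not the contents of Fig. 1.8.
SAMPLE_ROWS = [
    ("Zinfandel", 1999, 2),
    ("Fume Blanc", 2000, 2),
    ("Pinot Noir", 1997, 3),
    ("Zinfandel", 1998, 9),
]


def build_cellar():
    """Create an in-memory database holding a small CELLAR table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CELLAR (WINE TEXT, YEAR INTEGER, BOTTLES INTEGER)"
    )
    conn.executemany("INSERT INTO CELLAR VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def restrict(conn, after_year):
    """Restriction: keep every column, but only the rows with YEAR > after_year."""
    cursor = conn.execute(
        "SELECT WINE, YEAR, BOTTLES FROM CELLAR WHERE YEAR > ? ORDER BY YEAR",
        (after_year,),
    )
    return cursor.fetchall()


def project(conn):
    """Projection: keep every row, but only the WINE and BOTTLES columns."""
    cursor = conn.execute("SELECT WINE, BOTTLES FROM CELLAR")
    return cursor.fetchall()


if __name__ == "__main__":
    cellar = build_cellar()
    print("Restriction:", restrict(cellar, 1998))
    print("Projection: ", project(cellar))
```

Both helpers hand back rows and nothing but rows: the caller names a table and some columns, and never follows a pointer from one row to another, which is the point the section makes about relational systems. One caveat about the host language rather than the model: SQL retains duplicate rows in the result of a projection unless DISTINCT is specified.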
