Introduction to Database

CHAPTER 1

INTRODUCTION 
        

- Database-System Applications
- Purpose of Database Systems
- View of Data
- Database Languages
- Relational Databases
- Database Design
- Data Storage and Querying
- Transaction Management
- Database Architecture
- Database Users and Administrators

Edited: Wei-Pang Yang, IM.NDHU

Database System: Introduction 

Database Management System (DBMS)
- Contains a large body of information
- A collection of interrelated data (the database)
- A set of programs to access the data

Goal of a DBMS:
- Provide a way to store and retrieve database information that is both convenient and efficient.

Functions of a DBMS: Management of Data (MOD)
- Defining structures for storing data
- Providing mechanisms for manipulating data
- Ensuring the safety of data (against system crashes, unauthorized access, misuse, ...)
- Concurrency control in a multi-user environment

Computer scientists have developed many concepts and techniques for MOD; these concepts and techniques form the focus of this book and this course.

1.1 Database-System Applications 

Database Applications:
- Banking: all transactions
- Airlines: reservations, schedules
- Universities: registration, grades, student profiles, ...
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions

Databases touch all aspects of our lives


1.2 Purpose of Database Systems 


In the early days, database applications were built on top of file systems.

Drawbacks of using file systems to store data:
- Data redundancy and inconsistency
  - Multiple file formats, duplication of information in different files
- Difficulty in accessing data
  - Need to write a new program to carry out each new task
- Data isolation: multiple files and formats
- Integrity problems
  - Integrity constraints (e.g. account balance > 0) become part of program code
  - Hard to add new constraints or change existing ones

Drawbacks of using file systems (cont.) 

Drawbacks of using file systems to store data (cont.):
- Atomicity of updates
  - Failures may leave the database in an inconsistent state with partial updates carried out
  - E.g. a transfer of funds from one account to another should either complete or not happen at all
- Concurrent access by multiple users
  - Concurrent access is needed for performance
  - Uncontrolled concurrent accesses can lead to inconsistencies
    - E.g. two people reading a balance and updating it at the same time
- Security problems

Solution
- Database systems offer solutions to all the above problems

1.3 View of Data and Data Abstraction
- Physical level: describes how a record (e.g. customer information) is stored on disk
  - By sequential file, pointer, or hash structure, ...
- Logical level: describes the data stored in the database, and the relationships among the data

    type customer = record
        name : string;
        street : string;
        city : string;
        income : integer;
    end;

- View level: application programs hide details of data types. Views can also hide information (e.g. income) for security purposes.
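The view-level idea of hiding income can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the view name customer_public and the sample row are hypothetical, and SQLite stands in for whatever DBMS the course uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full customer record, mirroring the slide's record type.
conn.execute(
    "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, income INTEGER)"
)
# Hypothetical sample row, not taken from the slides.
conn.execute("INSERT INTO customer VALUES ('Johnson', 'Alma', 'Palo Alto', 52000)")

# View level: expose every attribute except income.
conn.execute(
    "CREATE VIEW customer_public AS SELECT name, street, city FROM customer"
)

cursor = conn.execute("SELECT * FROM customer_public")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

An application that is given access only to customer_public never sees the income column, although the column still exists at the logical level.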

View of Data -1: Three Levels
[Figure: an architecture for a database system]

View of Data -2: Three Levels
[Figure: the three-level architecture. Users A1, A2, B1, B2 and B3 each work in a host language (e.g. C, C++) plus a DSL (data sublanguage, e.g. SQL). External views A and B are defined by external schemas A and B and tied to the conceptual view by external/conceptual mappings A and B. The conceptual view (conceptual schema) is tied by the conceptual/internal mapping to the stored database (internal view), whose storage structure definition is the internal schema. The database management system (DBMS) uses a dictionary (e.g. system catalog); the DBA builds and maintains the schemas and mappings.]

1.3.2 Instances and Schemas
- Schema: the logical structure of the database
  - E.g. the database consists of information about a set of customers and accounts and the relationship between them
  - Analogous to the type information of a variable in a program
  - Physical schema: database design at the physical level
  - Logical schema: database design at the logical level

account:
    create table account (account-number char(10), balance integer)

customer:
    type customer = record
        name : string;
        street : string;
        city : string;
    end;

Instances and Schemas (cont.)
- Instance: the actual content of the database at a particular point in time
  - Analogous to the value of a variable

Schema:
    create table account (account-number char(10), balance integer)

[Figure: an instance of the account table]

- Physical Data Independence: the ability to modify the physical schema without changing the logical schema
  - Applications depend on the logical schema
  - In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
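Physical data independence can be tried out in a small sketch, assuming SQLite through Python's sqlite3 module. The index name, the sample balances, and the use of underscores instead of the slides' hyphens in column names are assumptions made for the example. Adding an index changes the physical organization, while the logical schema and the query answer stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
# Hypothetical instance: the content of the table at this point in time.
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 1200)],
)

QUERY = (
    "SELECT account_number FROM account "
    "WHERE balance > 600 ORDER BY account_number"
)

def logical_schema():
    # Column names and declared types, which is what applications depend on.
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

schema_before = logical_schema()
result_before = conn.execute(QUERY).fetchall()

# Physical-level change: add an access structure on balance.
conn.execute("CREATE INDEX account_balance_idx ON account (balance)")

schema_after = logical_schema()
result_after = conn.execute(QUERY).fetchall()
print(schema_before == schema_after, result_before == result_after)
```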

View of Data: Three Levels
[Figure: an architecture for a database system, highlighting physical data independence]

1.3.3 Data Models
- A collection of conceptual tools for describing
  - data (entities, objects)
  - data relationships
  - data semantics
  - data consistency constraints
- Data models provide:
  - A way to describe the design of a database at 3 levels
    - Physical level
    - Logical level
    - View level

Category of Data Models
- Entity-Relationship model
- Relational model
- Object-oriented model
- Semi-structured data models
  - Extensible Markup Language (XML)
- Older models:
  - Network model
  - Hierarchical model

1.4 Database Languages
- Data Definition Language (DDL):
  - Specification notation for defining the database schema
  - E.g. create table account (account-number char(10), balance integer)
- Data Manipulation Language (DML):
  - To express database queries or updates
  - E.g. select account-number from account where balance > 1000
- SQL (Structured Query Language): a single language for both

1.4.1 Data-Manipulation Language (DML)
- Language for accessing and manipulating the data organized by the appropriate data model
  - DML is also known as a query language
  - For retrieval, insertion, deletion, modification (update)
- Two classes of languages
  - Procedural DMLs: the user specifies what data is required and how to get those data
    - E.g. ... in C
  - Declarative DMLs (nonprocedural DMLs): the user specifies what data is required without specifying how to get those data
    - E.g. in SQL: select account-number from account where balance > 700
- SQL is the most widely used query language
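The contrast between the two classes of DML can be sketched as follows. The account rows are hypothetical, a Python loop stands in for the procedural program the slide attributes to C, and SQLite (via sqlite3) evaluates the declarative query. Column names use underscores because hyphenated names would need quoting.

```python
import sqlite3

# Hypothetical account data: (account_number, balance).
ACCOUNTS = [("A-101", 500), ("A-102", 900), ("A-215", 700)]

def procedural(threshold):
    """Procedural style: spell out how to find the rows, one record at a time."""
    selected = []
    for account_number, balance in ACCOUNTS:
        if balance > threshold:
            selected.append(account_number)
    return sorted(selected)

def declarative(threshold):
    """Declarative style: state what is wanted and let the DBMS decide how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", ACCOUNTS)
    rows = conn.execute(
        "SELECT account_number FROM account "
        "WHERE balance > ? ORDER BY account_number",
        (threshold,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(procedural(700), declarative(700))
```

Both functions return the same accounts; only the first one commits to a particular way of finding them.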

1.4.2 Data-Definition Language (DDL)
- Specification notation for defining the database schema
  - E.g. create table account (account-number char(10), balance integer)
- Defines:
  - Attribute names
  - Data types
  - Consistency constraints (integrity constraints)
    - Domain constraints: e.g. assets are of integer type
    - Assertions: e.g. assets >= 0
    - Authorization: for different users
    - ...
  - E.g.
        create table branch (
            branch-name char(15),
            branch-city char(30),
            assets integer,
            primary key (branch-name),
            check (assets >= 0))
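As a rough check of what the constraints in the branch definition do, the sketch below runs that DDL in SQLite through Python's sqlite3 module. The helper try_insert and the sample values are hypothetical, and underscores replace the hyphens in the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE branch (
        branch_name CHAR(15),
        branch_city CHAR(30),
        assets      INTEGER,
        PRIMARY KEY (branch_name),
        CHECK (assets >= 0)
    )
    """
)

def try_insert(name, city, assets):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO branch VALUES (?, ?, ?)", (name, city, assets))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Downtown", "Brooklyn", 9000000))  # accepted
print(try_insert("Perryridge", "Horseneck", -1))    # rejected by the check
print(try_insert("Downtown", "Queens", 1))          # rejected by the primary key
```

The constraints live in the schema, so no application program has to repeat them.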

Data Dictionary and Storage Definition
- Data dictionary:
  - The DDL compiler generates a set of tables stored in a data dictionary
  - Contains metadata (i.e. data about data)
    - Database schema
    - System tables
    - Users
    - ...
  - The database system consults the data dictionary before reading or modifying actual data.
- Data storage and definition language (ch. 11, 12)
  - To specify the storage structure and access methods
  - Usually an extension of the data definition language
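One concrete data dictionary is SQLite's sqlite_master catalog table, used here only as an example of metadata kept in tables; other systems organize their catalogs differently. The table and index created below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements: their definitions are recorded in the catalog.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
conn.execute("CREATE INDEX account_balance_idx ON account (balance)")

# The catalog is itself a table and can be queried like any other.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()
for entry in catalog:
    print(entry)
```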

1.5 Relational Databases
- Definition 1: A relational database is a database that is perceived by its users as a collection of time-varying, normalized relations (tables).
  - Perceived by the users: the relational model applies at the view and logical levels.
  - Time-varying: the set of tuples changes with time.
  - Normalized: contains no repeating groups (only atomic values).
- The relational model represents a database system at a level of abstraction that is removed from the details of the underlying machine, like a high-level language.

[Figure: language levels (machine, assembler, C, PASCAL, PL/1) set beside DBMS environments (relational DBMS, relational data model)]

1.5.1 Tables
- Definition 2: A relational database is a database that is perceived by its users as a collection of tables (and nothing but tables).

1.5.2 Data-Manipulation Language
- SQL (Structured Query Language): widely used
  - E.g. find the name of the customer with customer-id 192-83-7465

        select customer.customer-name
        from customer
        where customer.customer-id = '192-83-7465'

    Output:
        customer-name
        Johnson

[Figure: the customer table]

SQL (Structured Query Language)
- E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465

        select account.balance
        from depositor, account
        where depositor.customer-id = '192-83-7465'
          and depositor.account-number = account.account-number
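To see the two-table query return something, here is a minimal sketch in SQLite via Python's sqlite3 module. The rows and balances are invented for the example, and column names use underscores instead of hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
conn.execute("CREATE TABLE depositor (customer_id CHAR(11), account_number CHAR(10))")

# Hypothetical rows: customer 192-83-7465 holds two of the three accounts.
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-201", 900), ("A-215", 700)],
)
conn.executemany(
    "INSERT INTO depositor VALUES (?, ?)",
    [("192-83-7465", "A-101"), ("192-83-7465", "A-201"), ("019-28-3746", "A-215")],
)

def balances_of(customer_id):
    """Balances of all accounts held by one customer, smallest first."""
    rows = conn.execute(
        "SELECT account.balance "
        "FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "  AND depositor.account_number = account.account_number "
        "ORDER BY account.balance",
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(balances_of("192-83-7465"))
```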

1.5.3 Data-Definition Language
- SQL provides DDL to define the database schema:
  - Tables
    - E.g. create table account (account-number char(10), balance integer)
  - Assertions (ref. p.132)
    - E.g. create assertion balance-constraint check account.balance >= 1000
  - Integrity constraints (ref. p.129)

Referential Integrity Constraint

    create table account (
        account-number char(10),
        branch-name char(15),
        balance integer,
        primary key (account-number))

    create table depositor (
        customer-name char(20),
        account-number char(10),
        primary key (customer-name, account-number),
        foreign key (account-number) references account)

[Figure: depositor references account]
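The effect of the foreign key can be observed in a short sketch with SQLite through Python's sqlite3 module. Two assumptions apply: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the helper add_depositor and its sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute(
    """
    CREATE TABLE account (
        account_number CHAR(10),
        branch_name    CHAR(15),
        balance        INTEGER,
        PRIMARY KEY (account_number)
    )
    """
)
conn.execute(
    """
    CREATE TABLE depositor (
        customer_name  CHAR(20),
        account_number CHAR(10),
        PRIMARY KEY (customer_name, account_number),
        FOREIGN KEY (account_number) REFERENCES account
    )
    """
)
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")
conn.commit()

def add_depositor(customer_name, account_number):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO depositor VALUES (?, ?)",
                (customer_name, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(add_depositor("Johnson", "A-101"))  # account exists
print(add_depositor("Smith", "A-999"))    # no such account
```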

1.5.4 Data Access from Application Programs
- Application programs generally access databases through one of
  - Language extensions to allow embedded SQL
  - Application program interfaces (e.g. ODBC/JDBC) which allow SQL queries to be sent to a database
    - ODBC: Open Database Connectivity, for C
    - JDBC: Java Database Connectivity, for the Java language
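The API route can be illustrated with Python's sqlite3 module, which plays a role loosely comparable to ODBC for C or JDBC for Java; it is a stand-in here, not one of the interfaces named above. The function name and table layout are assumptions, while the customer-id and the name Johnson come from the earlier query example.

```python
import sqlite3

def find_customer_name(conn, customer_id):
    """Send one parameterized SQL query through the API; return the name or None."""
    row = conn.execute(
        "SELECT customer_name FROM customer WHERE customer_id = ?",
        (customer_id,),  # the value is bound by the API, not pasted into the SQL text
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id CHAR(11), customer_name CHAR(20))")
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson')")

print(find_customer_name(conn, "192-83-7465"))
```

The program sends SQL text to the database and receives rows back as ordinary host-language values.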

1.6 Database Design
- Database design: the process of designing the general structure of the database:
  - Logical design
  - Physical design
- Logical design: deciding on the database schema
  - To find a "good" collection of relation schemas
  - Business decision: what attributes should we record in the database?
  - Computer science decision: what relation schemas should we have, and how should the attributes be distributed among the various relation schemas?
- Physical design: deciding on the physical layout of the database

1.6.1 Design Process
- Phase I
  - Specification of user requirements (with domain experts)
- Phase II
  - Conceptual design (ch. 6)
  - Choose a data model
  - Design tables
  - Normalization (ch. 7)
- Phase III
  - Specification of functional requirements
- Phase IV
  - Implementation
  - Logical design
  - Physical design (ch. 11, 12)

1.6.2 Database Design for Banking
- Banking database: consists of 6 relations:
  1. branch (branch-name, branch-city, assets)
  2. customer (customer-name, customer-street, customer-city)
  3. account (account-number, branch-name, balance)
  4. loan (loan-number, branch-name, amount)
  5. depositor (customer-name, account-number)
  6. borrower (customer-name, loan-number)

Example: Banking Database
[Figure: sample tables for 1. branch, 2. customer, 3. depositor, 4. borrower, 5. account, 6. loan]

1.6.3 Entity-Relationship Model (ch. 6)
- Example: schema in the Entity-Relationship model
[Figure: E-R schema]

E-R Diagram for a Banking Enterprise, p.240
[Figure: E-R diagram]

Entity Relationship Model (cont.)
- E-R model of the real world
  - Entities (objects)
    - E.g. customers, accounts, bank branch
  - Relationships between entities
    - E.g. Account A-101 is held by customer Johnson
    - E.g. Relationship set depositor associates customers with accounts
- Widely used for database design
  - A database design in the E-R model is usually converted to a design in the relational model (coming up next), which is used for storage and processing
  - E-R model (ch. 6), Relational Model (ch. 2)

1.6.4 Normalization
- Definition: A relational database is a database that is perceived by its users as a collection of tables (and nothing but tables).
- E.g. Supplier-and-Parts Database

S
    S#  SNAME  STATUS  CITY
    S1  Smith  20      London
    S2  Jones  10      Paris
    S3  Blake  30      Paris
    S4  Clark  20      London
    S5  Adams  30      Athens

P
    P#  PNAME  COLOR  WEIGHT  CITY
    P1  Nut    Red    12      London
    P2  Bolt   Green  17      Paris
    P3  Screw  Blue   17      Rome
    P4  Screw  Red    14      London
    P5  Cam    Blue   12      Paris
    P6  Cog    Red    19      London

SP
    S#  P#  QTY
    S1  P1  300
    S1  P2  200
    S1  P3  400
    S1  P4  200
    S1  P5  100
    S1  P6  100
    S2  P1  300
    S2  P2  400
    S3  P2  200
    S4  P2  200
    S4  P4  300
    S4  P5  400

Problem of Normalization
- E.g. one wide relation holding everything, one row per shipment:

    S1, Smith, 20, London, P1, Nut, Red, 12, London, 300
    S1, Smith, 20, London, P2, Bolt, Green, 17, Paris, 200
    ...
    S4, Clark, 20, London, P5, Cam, Blue, 12, Paris, 400

- Normalization: split it into S (S#, SNAME, STATUS, CITY), P (P#, ...) and SP (S#, P#, QTY)
- Or: S' (S#, SNAME, STATUS), P (P#, ...) and SP' (S#, CITY, P#, QTY)

    SP'
        S#  CITY    P#  QTY
        S1  London  P1  300
        S1  London  P2  200
        ...

  - Redundancy
  - Update anomalies!
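The update anomaly caused by a redundant CITY column can be reproduced in a small sketch. The table names sp_prime, s and sp, the column names sno and pno, and the move of supplier S1 to Paris are choices made for this example; the two shipment rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: the supplier's city is repeated in every shipment row.
conn.execute("CREATE TABLE sp_prime (sno TEXT, city TEXT, pno TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sp_prime VALUES (?, ?, ?, ?)",
    [("S1", "London", "P1", 300), ("S1", "London", "P2", 200)],
)

# Normalized design: the city is stored once per supplier.
conn.execute("CREATE TABLE s (sno TEXT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
conn.execute("INSERT INTO s VALUES ('S1', 'London')")
conn.executemany(
    "INSERT INTO sp VALUES (?, ?, ?)", [("S1", "P1", 300), ("S1", "P2", 200)]
)

# Supplier S1 moves to Paris, but the update reaches only one redundant copy.
conn.execute("UPDATE sp_prime SET city = 'Paris' WHERE sno = 'S1' AND pno = 'P1'")
cities_redundant = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT city FROM sp_prime WHERE sno = 'S1'")
)

# In the normalized design there is only one copy to update.
conn.execute("UPDATE s SET city = 'Paris' WHERE sno = 'S1'")
cities_normalized = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT s.city FROM s, sp WHERE s.sno = sp.sno AND s.sno = 'S1'"
    )
]

print(cities_redundant, cities_normalized)
```

After the partial update the redundant table reports two different cities for the same supplier, which is the inconsistency that normalization is meant to rule out.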

1.7 Object-Based and Semistructured Databases
- Extend the relational data model
  - by including object orientation and
  - constructs to deal with added data types (video, image, ...)
- Allow attributes of tuples to have complex types, including
  - non-atomic values such as nested relations (repeated data, ...)
- Preserve relational foundations,
  - in particular the declarative access to data, while extending modeling power.

1.7.2 Semistructured Data Models
- XML (Extensible Markup Language)
  - Defined by the WWW Consortium (W3C)
  - Originally intended as a document markup language, not a database language
  - The ability to specify new tags, and to create nested tag structures, made XML a great way to exchange data, not just documents
  - XML has become the basis for all new-generation data interchange formats
  - A wide variety of tools is available for parsing, browsing and querying XML documents/data

1.8 Data Storage and Querying
- Goal of a DBMS: provide a way to store and retrieve data that is both convenient and efficient.
- Components of a database system
  - Query Processor
    - Helps to simplify access to data
    - High-level view
    - Users are not burdened unnecessarily with the physical details
  - Storage Manager
    - Databases require a large amount of space
    - They cannot be stored in main memory
    - Disk speed is slower
    - Minimize the need to move data between disk and main memory

[Figure: Query -> DBMS (Language Processor; Query Processor: Optimizer, Operation Processor; Storage Manager: Access Method, File Manager) -> Database]

Overall System Structure
[Figure: overall system structure, down to the low-level data stored in the database]

1.8.1 Storage Management
- Storage Manager
  - is a program module
  - that provides the interface between the low-level data stored and the application programs and queries submitted to the system
  - i.e. responsible for storing, retrieving and updating data in the database
- Tasks of the Storage Manager:
  - Interaction with the file manager (part of the operating system)
  - Translates DML into low-level file-system commands
- Data structures of the Storage Manager
  - Data files: store the database itself
  - Data dictionary: stores metadata
  - Indices: provide fast access to data items that hold particular values

Storage Management (cont.)
- Components of the Storage Manager:
  - Authorization and Integrity Manager
    - Tests for the satisfaction of integrity constraints
    - Checks the authority of users to access data
  - Transaction Manager
    - Ensures the database remains in a consistent (correct) state after failures
    - Ensures that concurrent transaction executions proceed without conflicting
  - File Manager
    - Manages the allocation of space on disk
    - Manages the data structures used to represent the stored data
  - Buffer Manager
    - Fetches data from disk into main memory
    - Decides what data to cache in main memory
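The buffer manager's two jobs, fetching blocks and deciding what stays cached, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class and method names are hypothetical, a dict stands in for the disk, and least-recently-used (LRU) eviction is just one possible policy that the slides do not prescribe.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool: caches disk blocks in memory, evicting the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk            # block_id -> data; stands in for the file manager
        self.capacity = capacity    # number of blocks that fit in main memory
        self.pool = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:
            # Hit: no disk access, just mark the block as recently used.
            self.pool.move_to_end(block_id)
            return self.pool[block_id]
        # Miss: fetch the block from disk into main memory.
        self.disk_reads += 1
        data = self.disk[block_id]
        self.pool[block_id] = data
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)  # evict the least recently used block
        return data

disk = {1: "block-1", 2: "block-2", 3: "block-3"}
buffers = BufferManager(disk, capacity=2)
for block_id in (1, 2, 1, 3, 2):
    buffers.get_block(block_id)
print(buffers.disk_reads, list(buffers.pool))
```

In the access sequence above only the second request for block 1 is served from memory; every other request goes to disk.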

1.8.2 The Query Processor
- DDL Interpreter
  - Interprets DDL statements
  - Writes the definitions (schema, view, ...) into the data dictionary
- DML Compiler
  - Translates DML statements into an evaluation plan (or several alternative plans) consisting of low-level instructions
  - Query optimization: picks the lowest-cost evaluation plan
- Query Evaluation Engine
  - Executes the low-level instructions generated by the DML compiler

Flow of Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Query Optimizer
- Alternative ways of evaluating a given query
  - Equivalent expressions
  - Different algorithms for each operation
- Cost difference between a good and a bad way of evaluating a query can be enormous
- Need to estimate the cost of operations
  - Depends critically on statistical information about relations, which the database must maintain
  - Need to estimate statistics for intermediate results to compute the cost of complex expressions

Example: A Simple Query Processing
- Query in SQL:

      SELECT CUSTOMER.NAME
      FROM CUSTOMER, INVOICE
      WHERE REGION = 'N.Y.' AND AMOUNT > 10000
        AND CUSTOMER.C# = INVOICE.C#

- Language Processor: translates the query into an internal form (a relational-algebra expression)
- Optimizer: produces the operator sequence:

      SCAN C using region index, create C'
      SCAN I using amount index, create I'
      SORT C' and I' on C#
      JOIN C' and I' on C#
      EXTRACT name field

- Operator Processor: issues calls to the access method:

      OPEN SCAN on C with region index
      GET next tuple
      ...

- Access Method (e.g. B-tree, index, hashing): issues calls to the file system:

      GET 10th to 25th bytes from block #6 of file #5

[Figure: Query -> DBMS (Language Processor; Query Processor: Optimizer, Operator Processor; Storage Manager: Access Method, File System) -> database]
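The operator sequence in this example (scan, sort, join, extract) can be imitated in plain Python. This is a sketch under stated assumptions: the tuples are invented, the scans filter a list instead of using a region or amount index, and the join is written as a sort-merge join because the plan sorts both inputs on C# first.

```python
# Hypothetical relations: CUSTOMER (C#, NAME, REGION) and INVOICE (I#, C#, AMOUNT).
CUSTOMER = [(3, "Lee", "N.Y."), (1, "Adams", "N.Y."), (2, "Baker", "L.A.")]
INVOICE = [(10, 1, 25000), (11, 2, 40000), (12, 3, 5000), (13, 1, 12000)]

def scan(relation, predicate):
    """SCAN: keep the tuples that satisfy the predicate (no index in this sketch)."""
    return [row for row in relation if predicate(row)]

def sort_on(relation, key_index):
    """SORT: order the tuples on one column."""
    return sorted(relation, key=lambda row: row[key_index])

def merge_join(left, left_key, right, right_key):
    """JOIN: merge two inputs that are already sorted on their join columns."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        left_value, right_value = left[i][left_key], right[j][right_key]
        if left_value < right_value:
            i += 1
        elif left_value > right_value:
            j += 1
        else:
            # Pair this left tuple with every right tuple that has the same key.
            k = j
            while k < len(right) and right[k][right_key] == left_value:
                result.append(left[i] + right[k])
                k += 1
            i += 1
    return result

def run_plan():
    c_prime = scan(CUSTOMER, lambda row: row[2] == "N.Y.")   # SCAN C, create C'
    i_prime = scan(INVOICE, lambda row: row[2] > 10000)      # SCAN I, create I'
    c_sorted = sort_on(c_prime, 0)                           # SORT C' on C#
    i_sorted = sort_on(i_prime, 1)                           # SORT I' on C#
    joined = merge_join(c_sorted, 0, i_sorted, 1)            # JOIN C' and I' on C#
    return [row[1] for row in joined]                        # EXTRACT name field

print(run_plan())
```

With this sample data the customer Adams appears once per qualifying invoice, as the SQL query without DISTINCT would also report.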

1.9 Transaction Management
- Transaction:
  - A transaction is a collection of operations that performs a single logical function in a database application
  - Atomicity: all or nothing
- Failure recovery manager
  - ensures that the database remains in a consistent (correct) state
  - Failures:
    - system failures (e.g. power failures and operating system crashes)
    - transaction failures
- Concurrency-control manager
  - controls the interaction among the concurrent transactions, to ensure the consistency of the database
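Atomicity ("all or nothing") can be demonstrated with a funds transfer, sketched here with SQLite through Python's sqlite3 module. The account numbers, balances and the transfer helper are hypothetical. The credit is deliberately applied before the debit so that a failing debit leaves a partial update for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number CHAR(10) PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)])
conn.commit()

def transfer(source, target, amount):
    """Move funds as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))

print(transfer("A-101", "A-215", 200), balances())
print(transfer("A-101", "A-215", 1000), balances())
```

The second transfer would overdraw A-101, so the check constraint rejects the debit and the credit that was already applied to A-215 is rolled back with it.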

1.10 Data Mining and Analysis
- Data Analysis and Mining
  - Decision Support Systems
  - Data Analysis and OLAP (Online Analytical Processing)
  - Data Warehousing
  - Data Mining

Decision Support Systems
- Decision-support systems
  - are used to make business decisions,
  - often based on data collected by on-line transaction systems
- Examples of business decisions:
  - What items to stock?
  - What insurance premium to change?
  - To whom to send advertisements?
- Examples of data used for making decisions
  - Retail sales transaction details
  - Customer profiles (income, age, gender, etc.)

Data Mining (ch. 18)
- Data mining:
  - seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases
  - is the process of semi-automatically analyzing large databases to find useful patterns
- Prediction based on past history
  - Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ...) and past history
  - Predict if a pattern of phone calling card usage is likely to be fraudulent
- Descriptive patterns
  - Associations
    - Find books that are often bought by "similar" customers. If a new such customer buys one such book, suggest the others too. (library)
    - Associations may be used as a first step in detecting causation
    - E.g. p.23: Young women buy cars.
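The book example can be turned into a tiny association-rule sketch. Everything concrete here is an assumption: the baskets are invented, and the rule strength used (confidence, the fraction of baskets containing the first book that also contain the second) is one common measure rather than one the slides specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of book titles per customer.
PURCHASES = [
    {"db_systems", "sql_primer"},
    {"db_systems", "sql_primer", "networks"},
    {"db_systems", "networks"},
    {"sql_primer"},
]

def association_rules(baskets, min_confidence):
    """Return {(lhs, rhs): confidence} for rules 'buyers of lhs also buy rhs'."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))

    rules = {}
    for (first, second), together in pair_counts.items():
        for lhs, rhs in ((first, second), (second, first)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules

print(association_rules(PURCHASES, min_confidence=0.9))
```

With these baskets the only rule that clears a 0.9 threshold is that buyers of networks also bought db_systems, so a new buyer of networks would be shown db_systems.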

1.11 Database Architecture
- System Structure of a Database System
  - Fig. 1.6 (p.25)
- Application Structure
  - User uses the database at the site
  - Users use the database through a network
    - Client: remote database users work here
    - Server: the database system runs here
- Partition of Database Applications
  - Two-tier architecture
  - Three-tier architecture

Application Architectures
- Two-tier architecture: e.g. client programs using ODBC/JDBC to communicate with a database
- Three-tier architecture: e.g. web-based applications, and applications built using "middleware"
[Figure: two-tier and three-tier architectures]

1.12 Database Users and Administrators
[Figure: the three-level architecture, as in "View of Data -2: Three Levels": users A1 to B3 working in a host language plus DSL, the external, conceptual and internal levels, and the DBA who builds and maintains the schemas and mappings]

1.12.1 Database Users and User Interfaces
- Application programmers
  - Interact with the system through DML calls
- Sophisticated users
  - Submit queries without writing programs
  - E.g. OLAP (Online Analytical Processing), data mining tools
- Specialized users
  - Write specialized database applications that do not fit into the traditional data processing framework
  - E.g. CAD, expert systems, complex data types (graphics, audio)
- Naive users (end users)
  - Invoke one of the permanent application programs that have been written previously
  - E.g. people accessing a database over the web, bank tellers, clerical staff

1.12.2 Database Administrator
- Database Administrator:
  - Coordinates all the activities of the database system
  - Has a good understanding of the enterprise's information resources and needs
- Database Administrator's duties:
  - Schema definition
  - Storage structure and access method definition
  - Schema and physical organization modification
  - Granting of authorization for data access
  - Routine maintenance
    - Periodically back up the database
    - Upgrade the system, e.g. disk
    - Monitor performance
  - ...

1.13 History of Database Systems
- 1950s - early 1960s:
  - Tapes: sequential access
  - Application: payroll, ...
  - Input: punched decks; output: printer
- Late 1960s - 1970s:
  - Disk: direct access
  - Codd proposed the relational model, ... (Turing Award)
- 1980s:
  - System R (IBM Research Lab), Oracle, IBM DB2, Ingres, DEC Rdb
  - Relational systems replaced the network/hierarchical models
  - Research: parallel databases, distributed databases, object-oriented databases, ...
- Early 1990s:
  - Parallel databases
  - Object-relational databases
- Late 1990s:
  - Explosive growth of the World Wide Web
  - Databases were used much more than ever before
  - Databases had to support Web interfaces to data

History of Database Systems (cont.)
[Table: database technology by period. Periods: 1950-1965, 1965-1979, 1980-1989, 1990-1995, 1995-present. Entries are listed by row, roughly earliest first.]
- Data model: network, hierarchical; relation proposed; relation; semantic, object-oriented, logic; merging data models, knowledge-base; object-oriented, OO-relation, XML
- Database hardware: mainframes; mainframes, minis; mainframes, PCs; faster PCs, workstations, parallel database machines, optical memories
- User interface: none; forms; graphics, menus, query-by-forms; natural language, speech input; WWW, Web interface
- Program interface: procedural (DL/I, COBOL+DL/I); embedded query, non-procedural (SQL, QUEL); 4GL; integrated database and programming language, logic programming
- Presentation and display processing: reports, processing data; report generators, information and transaction processing; business graphics, image output, knowledge processing; multimedia

Turing Award
[Slide text mostly not recoverable. Surviving items: 1901, 1969; Alan Turing (1912-1954); Turing Award, 1966; Cook; C; Thompson; Ritchie; Unix; Codd]

Turing Award (cont.)
[Surviving items: 1936 (Turing machine); 1954; 42]
