You are on page 1of 36

Database Management Systems

Chapter 1

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
1
Basic Concepts
 Data
 Known facts that could be recorded and stored on
computer media
 Example: For a student, we have name, student
number, program, address, and so on.
 Data vs. Information
 These two are closely related and are often used
interchangeably. However, it is often useful to
distinguish between data and information.
 We define information as data that has been
processed in such a way that it can increase the
knowledge of the person who uses it.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
2
Basic Concepts

 Metadata
 Data that describe the properties or characteristics
of other data
 Data definition, data structures, rules and
constraints are some of the properties included in
metadata.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
3
Why Databases?

 information superhighway
 data can become a liability
 cost of managing and acquiring >> value derived
from it
 what do we need?
 tools that simplify the task of managing data and
extracting useful information in a timely fashion

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
4
Database

 What is a Database?
 A collection of data that is used by the application
systems of some given enterprise.

 A database is a collection of files, which in turn are
collections of records.
 Records consist of information that is related in
one way or another.
 The pieces of information that comprise a record
are called fields.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5
5
Database

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
6
DBMS

 Database Management System
 Database system is basically a computerized
record-keeping system
 The information concerned can be anything that is
deemed to be of significance to the individual or
organization the system is intended to serve
 A database system involves four major
components, namely data, hardware, software,
and users.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7
7
What Is a DBMS?

 Models real-world enterprise.
 Entities (e.g., students, courses)
 Relationships (e.g., Madonna is taking CS564)
 A Database Management System (DBMS) is a
software package designed to store and
manage databases.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
8
DBMS
 a microcosm of computer science?
 programming languages
 object-orientation
 operating systems
 concurrent programming
 data structures
 algorithms
 parallel and distributed systems
 AI and expert systems
 statistical techniques, etc.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9
9
History
 Integrated Data Store (early 1960s)
 Charles Bachman (network data model)
 Bachman became the first recipient of ACM’s
Turing Award (1973) in his work in the database
area
 Information Management System (IMS, late
1960s)
 developed by IBM (hierarchical data model)
 SABRE system
• airline reservation system (American Airlines)
• travelocity.com

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
10
History
 Relational Model (1970)
 Edgar Codd (IBM)
 Codd won the 1981 Turing Award for his work
 Database systems matured as an academic
discipline, and changed the commercial landscape.
 SQL (1980)
 the relational model is the dominant DBMS
paradigm
 the Structured Query Language for relational
databases became the standard query language

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11
11
History
 Concurrent Transactions (1980s)
 concurrent execution of database programs
 users write them to be run by themselves, the
DBMS is responsible for running them
concurrently
 James Gray won the 1999 Turing Award for his
work on transaction management
 Data Warehouses (1990s)
 more powerful query languages, richer data
models, etc.
 Internet
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
12
Paper vs DBMS

 Paper-Based System versus Database System

 Advantages of database system:
 compactness
 speed
 less drudgery
 currency

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13
13
Files vs. DBMS

 Scenario:
 A company has a large collection (500 GB) of data
on employees, departments, products, sales and so
on.
 data is accessed concurrently
 questions about the data must be answered
quickly
 changes made to the data by different users must
be applied consistently
 access to certain parts of the data (e.g., salaries)
must be restricted.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
14
Files vs. DBMS

 Application must store large datasets
between main memory and secondary
storage (e.g., buffering, page-oriented access,
32-bit addressing, etc.)
 Special code for different queries
 Must protect data from inconsistency due to
multiple concurrent users
 Crash recovery (restore data)
 Security and access control

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15
15
Why Use a DBMS?

 Data independence
 Efficient access.
 Data integrity and security.
 Uniform data administration.
 Concurrent access, recovery from crashes.
 Reduced application development time.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
16
Database Applications

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17
17
?
Why Study Databases??

 Shift from computation to information
 Internet technologies
 scientific applications
 Datasets increasing in diversity and volume.
 Digital libraries, interactive video, Human
Genome project, EOS project
 ... need for DBMS exploding
 DBMS encompasses most of CS
 OS, languages, theory, AI, multimedia, logic

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18
18
Data Models
 A data model is a collection of concepts for
describing data.
 A schema is a description of a particular
collection of data, using a given data model.
 The relational model of data is the most widely
used model today.
 Main concept: relation, basically a table with rows
and columns.
 Every relation has a schema, which describes the
columns, or fields.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19
19
Levels of Abstraction
 Many views, single View 1 View 2 View 3
conceptual (logical) schema
and physical schema.
Conceptual Schema
 Views describe how users
see the data.
Physical Schema
 Conceptual schema defines
logical structure
 Physical schema describes
the files and indexes used.

 Schemas are defined using DDL; data is modified/queried using DML.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20
20
Example: University Database
 Conceptual schema:
 Students(sid: string, name: string, login: string,
age: integer, gpa:real)
 Courses(cid: string, cname: string, credits: integer)
 Faculty(fid: string, fname: string, sal: real)
 Physical schema:
 Relations stored as unordered files.
 Index on first column of Students.
 External Schema (View):
 Course_info(cid: string, fname: string, enrollment:
integer)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21
21
Data Independence *

 Applications insulated from how data is
structured and stored.
 Logical data independence: Protection from
changes in logical structure of data
(conceptual schema).
 Physical data independence: Protection from
changes in physical structure of data (physical
schema).
 One of the most important benefits of using a DBMS!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22
22
Concurrency
 Concurrent execution of user programs
is essential for good DBMS performance.
 Because disk accesses are frequent, and relatively
slow, it is important to keep the cpu humming by
working on several user programs concurrently.
 Interleaving actions of different user programs
can lead to inconsistency: e.g., check is cleared
while account balance is being computed.
 DBMS ensures such problems don’t arise: users
can pretend they are using a single-user system.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23
23
Transaction: An Execution of a DB Program
 Key concept is transaction, which is an atomic
sequence of database actions (reads/writes).

 Each transaction, executed completely, must
leave the DB in a consistent state if DB is
consistent when the transaction began.
 Users can specify some simple integrity constraints on
the data, and the DBMS will enforce these constraints.
 Thus, ensuring that a transaction (run alone) preserves
consistency is ultimately the user’s responsibility!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24
24
Scheduling Concurrent Transactions
 DBMS ensures that execution of {T1, ... , Tn} is
equivalent to some serial execution T1’ ... Tn’.
 Before reading/writing an object, a transaction requests
a lock on the object, and waits till the DBMS gives it the
lock. All locks are released at the end of the transaction.
(Strict 2PL locking protocol.)

 Idea: If an action of Ti (say, writing X) affects Tj (which
perhaps reads X), one of them, say Ti, will obtain the
lock on X first and Tj is forced to wait until Ti completes;
this effectively orders the transactions

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25
25
Scheduling Concurrent Transactions
 Locks
 Write-lock (exclusive lock)
only one transaction at a
time
 Read-lock (shared lock) can
be held by more than 1
transaction at a time
 Deadlocks
 one of T3 or T4 must be 
rolled back and its lock
released

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26
26
Ensuring Atomicity
 DBMS ensures atomicity (all-or-nothing property)
even if system crashes in the middle of a write
action.
 Idea: Keep a log (history) of all actions carried out
by the DBMS before executing a write action:
 Before a change is made to the database, the
corresponding log entry is forced to a safe location
(disk).
 (WAL protocol; OS support for this is often inadequate.)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27
27
Ensuring Atomicity
 After a crash, the effects of partially executed
transactions are undone using the log.
 If log entry wasn’t saved before the crash, DBMS would
be unable to detect and undo the change.
 The time required to recover from a crash can be
reduced by periodically forcing some information to
disk; this periodic operation is called a checkpoint.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28
28
The Log
 The following actions are recorded in the log:
 Ti writes an object: the old value and the new value.
• Log record must go to disk before the changed page!
 Ti commits/aborts: a log record indicating this action.
 Log records chained together by write action id, so it’s
easy to undo a specific write action (e.g., to resolve a
deadlock).
 Log is often duplexed and archived on “stable” storage.
 All log related activities (and in fact, all CC related
activities such as lock/unlock, dealing with deadlocks
etc.) are handled transparently by the DBMS.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 29
29
These layers
Structure of a DBMS must consider
concurrency
control and
recovery
 A typical DBMS has a Query Optimization
layered architecture. and Execution
 The figure does not Relational Operators
show the concurrency
Files and Access Methods
control and recovery
components. Buffer Management
 This is one of several
Disk Space Management
possible architectures;
each system has its own
variations. DB

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 30
30
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 31
31
Database Roles

  Data Owner
 the person who makes the strategic and policy
decisions regarding the data of the
enterprise.
  Database Administrators
  Database Application Programmers
  End Users and DBMS Vendors

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 32
32
Database Roles

  Database Administrator (DBA) is the
person who provides the necessary technical
support for implementing those decisions.
Thus, the DBA is responsible for the overall
control of the system at a technical level.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 33
33
Database Roles

 Here are functions of the DBA:
 Defining the physical/conceptual schema
 Handling of security, authentication and integrity
rules
 Liaising with users
 Defining backup and recovery procedures for data
availability
 Monitoring performance and responding to
changing requirements

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 34
34
Databases make these folks happy ...

 End users and DBMS vendors
 DB application programmers
 E.g. smart webmasters
 Database administrator (DBA)
 Designs logical /physical schemas
 Handles security and authorization
 Data availability, crash recovery
 Database tuning as needs evolve
Must understand how a DBMS works!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 35
35
Summary
 DBMS used to maintain, query large datasets.
 Benefits include recovery from system crashes,
concurrent access, quick application
development, data integrity and security.
 Levels of abstraction give data independence.
 A DBMS typically has a layered architecture.
 DBAs hold responsible jobs
and are well-paid!
 DBMS R&D is one of the broadest,
most exciting areas in CS.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 36
36