Unit 1 Understanding Database System
• Data isolation
▫ Multiple files and formats
Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve
the appropriate data is difficult.
• Integrity problems
▫ Integrity constraints (e.g., account balance > 0) become
“buried” in program code rather than being stated explicitly
▫ Hard to add new constraints or change existing ones
The data values stored in the database must satisfy certain
types of consistency constraints.
For example, the balance of a bank account may never fall below a
prescribed amount (say, $25).
Developers enforce these constraints in the system by adding
appropriate code in the various application programs.
However, when new constraints are added, it is difficult to
change the programs to enforce them.
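As a minimal sketch of the alternative, the minimum-balance rule can be stated once in the schema rather than buried in each program. The example uses Python's built-in sqlite3 module; the table layout, the account number A-101, and the helper try_withdraw are assumptions made for illustration, with the $25 minimum taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The $25 minimum is declared once, in the schema (hypothetical table layout).
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        INTEGER NOT NULL CHECK (balance >= 25)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()

def try_withdraw(account_number, amount):
    """Return True if the withdrawal is accepted, False if the constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_withdraw("A-101", 100)  # leaves 400, which is allowed
rejected = try_withdraw("A-101", 390)  # would leave 10, below the minimum
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
```

Changing the rule then means changing one CHECK clause, not every application program that updates a balance.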
Drawbacks of using file systems to store data (Cont.)
• Atomicity of updates
▫ In many applications, it is crucial that, if a failure occurs,
the data be restored to the consistent state that existed
prior to the failure.
Consider a program to transfer $50 from account A to account
B.
If a system failure occurs during the execution of the program, it is
possible that the $50 was removed from account A but was not
credited to account B, resulting in an inconsistent database state.
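The transfer scenario can be sketched with a database transaction, which makes the debit and the credit a single all-or-nothing unit. This is an illustrative sketch using Python's sqlite3 module; the starting balances, the transfer function, and the fail_midway flag that simulates a crash are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; either both updates happen or neither does."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after debit, before credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # the debit has already been rolled back

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer("A", "B", 50, fail_midway=True)
after_failure = balances()  # the $50 was not lost: balances are unchanged
transfer("A", "B", 50)
after_success = balances()
```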
• Concurrent access by multiple users
• Security problems
▫ Hard to provide user access to some, but not all, data.
1. Network model
2. Hierarchical model
Entity-Relationship Model
• The entity-relationship (E-R) data model is based on a
perception of a real world that consists of a collection of basic
objects, called entities, and of relationships among these
objects.
• An entity is a “thing” or “object” in the real world that is
distinguishable from other objects.
▫ For example, each person is an entity, and bank accounts can be
considered as entities.
• Entities are described in a database by a set of attributes.
▫ For example, the attributes account-number and balance may
describe one particular account in a bank, and they form
attributes of the account entity set.
• A relationship is an association among several entities.
▫ For example, a depositor relationship associates a customer with
each account that she has.
The set of all entities of the same type and the set of all
relationships of the same type are termed an entity set and
relationship set, respectively.
E-R Diagrams
• The overall logical structure (schema) of a database can
be expressed graphically by an E-R diagram, which is
built up from the following components:
▫ Rectangles, which represent entity sets
▫ Ellipses, which represent attributes
▫ Diamonds, which represent relationships among entity
sets
▫ Lines, which link attributes to entity sets and entity sets to
relationships
• In addition to entities and relationships, the E-R model
represents certain constraints to which the contents of a
database must conform.
• One important constraint is mapping cardinalities,
which express the number of entities to which another
entity can be associated via a relationship set.
▫ For example, if each account must belong to only one
customer, the E-R model can express that constraint.
Sample E-R Diagram
• The E-R diagram indicates that there are two
entity sets, customer and account, with
attributes as outlined earlier.
• The diagram also shows a relationship depositor
between customer and account.
Relational Model
• The relational model revolves around a
fundamental data structure called a table,
which is a formalization of the intuitive
notion of a table.
• Informally, the relational model consists of:
▫ A class of data structures referred to as
tables.
▫ A collection of methods for building new
tables starting from an initial collection of
tables;
we refer to these methods as relational
algebra operations.
▫ A collection of constraints imposed on the
data contained in tables.
• The relational model uses a collection of
tables to represent both data and the
relationships among those data.
• Each table has multiple columns, and each
column has a unique name.
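To make "methods for building new tables" concrete, here is a small sketch of two relational algebra operations, selection and projection, written in plain Python. Tables are modelled as lists of dictionaries; the function names and the sample account rows are assumptions for illustration.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Projection: keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

# Hypothetical account table.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-217", "balance": 700},
]

large_accounts = select(account, lambda row: row["balance"] > 600)
distinct_balances = project(account, ["balance"])
```

Each operation takes tables as input and returns a new table, so operations can be composed freely.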
A Sample Relational Database
The relational model is at a lower level of abstraction than the E-R model. Database
designs are often carried out in the E-R model, and then translated to the relational
model.
Database Languages
• A database system provides a data definition language to
specify the database schema and a data manipulation
language to express database queries and updates.
• Two classes of languages
▫ Pure – used for proving properties about computational power and for optimization
Relational Algebra
Tuple relational calculus
Domain relational calculus
▫ Commercial – used in commercial systems
SQL is the most widely used commercial language
• In practice, the data definition and data manipulation
languages are not two separate languages; instead they simply
form parts of a single database language, such as the widely
used SQL language.
• The commands in the language are classified into different
categories based on their functional implementation
▫ DDL – Data Definition Language
▫ DML – Data Manipulation Language
▫ DCL – Data Control Language
▫ TCL – Transaction Control Language
Data Definition Language (DDL)
• Storage structure and access methods are specified in a special type of DDL
called a data storage and definition language.
• These statements define the implementation details of the database
schemas, which are usually hidden from the users.
• The data values stored in the database must satisfy certain consistency
constraints.
• Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
• Execution of the above DDL statement creates the instructor table.
▫ In addition, it updates a special set of tables called the data dictionary or data
directory.
▫ Data dictionary contains metadata (i.e., data about data)
Database schema
Integrity constraints
Primary key (ID uniquely identifies instructors)
Authorization
Who can access what
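The effect of a DDL statement on the data dictionary can be observed directly in SQLite, whose catalog is exposed through the sqlite_master table and the table_info pragma. This is a sketch using Python's sqlite3 module; the PRIMARY KEY clause on ID is added here as an assumption, following the primary key noted above, and is not part of the original statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The instructor table from the example, with ID declared as primary key (assumption).
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2)
    )
""")

# sqlite_master plays the role of the data dictionary: it stores metadata about tables.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(instructor)")]
key_columns = [row[1] for row in conn.execute("PRAGMA table_info(instructor)") if row[5]]
```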
Data Manipulation Language (DML)
• Data manipulation is
▫ The retrieval of information stored in the database
▫ The insertion of new information into the database
▫ The deletion of information from the database
▫ The modification of information stored in the database
• A data-manipulation language (DML) is a language that enables users to
access or manipulate data as organized by the appropriate data model.
• There are basically two types:
▫ Procedural DMLs require a user to specify what data are needed and how to get those
data.
▫ Declarative DMLs (also referred to as nonprocedural DMLs) require a user to
specify what data are needed without specifying how to get those data.
• The DML component of the SQL language is nonprocedural.
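The four kinds of data manipulation listed above map onto the SQL statements INSERT, DELETE, UPDATE, and SELECT. The sketch below runs them through Python's sqlite3 module; the instructor rows and the salary change are made-up sample data. Each statement says what data is wanted, not how to find it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)

# Insertion of new information (hypothetical rows).
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", [
    ("00001", "Ada", "CS", 65000),
    ("00002", "Ben", "EE", 80000),
    ("00003", "Cy", "CS", 72000),
])

# Modification of stored information.
conn.execute("UPDATE instructor SET salary = salary + 1000 WHERE dept_name = 'CS'")

# Deletion of information.
conn.execute("DELETE FROM instructor WHERE ID = '00002'")
conn.commit()

# Retrieval: the query names the result, the system chooses the access path.
cs_salaries = conn.execute(
    "SELECT name, salary FROM instructor WHERE dept_name = 'CS' ORDER BY name"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
```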
• Storage or memory manager
• Query processing
• Transaction manager
Storage Management
• Storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application programs
and queries submitted to the system.
• The storage manager is responsible for the
interaction with the file manager.
▫ The storage manager translates the various DML
statements into low-level file-system commands.
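Relation terminology, illustrated on a sample relation:
SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6
▫ Each column is an attribute; the number of attributes is the degree (here 4).
▫ Each row is a tuple; the set of tuples is the relational instance, and the
number of tuples is the cardinality (here 2).
▫ The heading row gives the schema of the relation.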
From ER Model to Relational Model
• Translating a relationship set into a table is intuitively simple:
▫ Build a new table with one column for each attribute in
the union of the primary keys of all participating
entity sets.
▫ Add columns for the descriptive attributes
of the relationship set (if necessary).
▫ The primary key of this table is the union of the
primary keys of the entity sets that are on the “many” side.
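The translation rule can be sketched for the depositor relationship between customer and account, treated here as many-to-many so that the key is the union of both primary keys. The column names, the descriptive attribute access_date, and the sample rows are assumptions for illustration; the code uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE account  (account_number TEXT PRIMARY KEY, balance INTEGER);

    -- Relationship set: one column per primary-key attribute of each participant,
    -- plus the descriptive attribute access_date (hypothetical).
    CREATE TABLE depositor (
        customer_id    TEXT NOT NULL REFERENCES customer(customer_id),
        account_number TEXT NOT NULL REFERENCES account(account_number),
        access_date    TEXT,
        PRIMARY KEY (customer_id, account_number)
    );

    INSERT INTO customer VALUES ('C1', 'Ann'), ('C2', 'Raj');
    INSERT INTO account  VALUES ('A-101', 500), ('A-215', 700);
    INSERT INTO depositor VALUES
        ('C1', 'A-101', '2024-01-05'),
        ('C1', 'A-215', '2024-02-10'),
        ('C2', 'A-215', '2024-02-11');
""")

def is_rejected(sql):
    """Return True if the statement violates a key or foreign-key constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The same (customer, account) pair cannot be recorded twice.
duplicate_rejected = is_rejected("INSERT INTO depositor VALUES ('C1', 'A-101', '2024-03-01')")
# A depositor row must refer to an existing customer.
dangling_rejected = is_rejected("INSERT INTO depositor VALUES ('C9', 'A-101', '2024-03-01')")

holders_of_a215 = conn.execute("""
    SELECT c.name
    FROM depositor d JOIN customer c ON c.customer_id = d.customer_id
    WHERE d.account_number = 'A-215'
    ORDER BY c.name
""").fetchall()
```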
Example – N-ary Relationship Set
[Figure: n-ary relationship set linking entity sets E-Set 1 and E-Set 3; labels shown: P-Key1, P-Key3, D-Attribute, A-Key]
Representing Multivalue Attribute
Student table:
SID   Name   Major  GPA
1234  John   CS     2.8
5678  Homer  EE     3.6

The multivalued attribute Children is stored in its own table:
Stud_SID  Children
1234      Johnson
1234      Mary
5678      Bart
5678      Lisa
5678      Maggie
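The two tables above can be created and recombined as follows. This is a sketch using Python's sqlite3 module with the sample values from the slide; the table names student and student_children and the composite primary key are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (SID INTEGER PRIMARY KEY, Name TEXT, Major TEXT, GPA REAL);

    -- One row per value of the multivalued attribute, keyed by owner plus value.
    CREATE TABLE student_children (
        Stud_SID INTEGER REFERENCES student(SID),
        Children TEXT,
        PRIMARY KEY (Stud_SID, Children)
    );

    INSERT INTO student VALUES (1234, 'John', 'CS', 2.8), (5678, 'Homer', 'EE', 3.6);
    INSERT INTO student_children VALUES
        (1234, 'Johnson'), (1234, 'Mary'),
        (5678, 'Bart'), (5678, 'Lisa'), (5678, 'Maggie');
""")

# Joining the two tables reassembles the list of children for each student.
children_of = {}
for name, child in conn.execute("""
    SELECT s.Name, c.Children
    FROM student s JOIN student_children c ON c.Stud_SID = s.SID
    ORDER BY s.SID, c.Children
"""):
    children_of.setdefault(name, []).append(child)
```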
Representing Class Hierarchy
• Two general approaches depending on
disjointness and completeness
▫ For a non-disjoint and/or non-complete class hierarchy:
Create a table for each superclass entity set
according to the normal entity set translation method.
Create a table for each subclass entity set with a
column for each of the attributes of that entity set
plus one for each attribute of the primary key of the
superclass entity set.
The primary key from the superclass entity set is also
used as the primary key for this new table.
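A minimal sketch of this approach, assuming a Person superclass keyed by SSN and a Student subclass with attributes Major and GPA; the sample rows are made up. The subclass table reuses the superclass key as its own primary key, and a person need not appear in any subclass table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Superclass table, translated like any other entity set.
    CREATE TABLE person (SSN TEXT PRIMARY KEY, Name TEXT);

    -- Subclass table: its own attributes plus the superclass key, reused as primary key.
    CREATE TABLE student (
        SSN   TEXT PRIMARY KEY REFERENCES person(SSN),
        Major TEXT,
        GPA   REAL
    );

    INSERT INTO person  VALUES ('111', 'John'), ('222', 'Mary');
    INSERT INTO student VALUES ('111', 'CS', 2.8);
""")

# A student's full description is obtained by joining on the shared key.
students = conn.execute("""
    SELECT p.SSN, p.Name, s.Major, s.GPA
    FROM person p JOIN student s ON s.SSN = p.SSN
""").fetchall()

# Non-complete hierarchy: some persons belong to no subclass.
non_students = conn.execute(
    "SELECT Name FROM person WHERE SSN NOT IN (SELECT SSN FROM student)"
).fetchall()
```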
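Class Hierarchy Example 1
[Figure: entity set Person with an ISA link to Student; attribute labels shown: SSN, Name, Gender, SID, Status, Major, GPA]
Disjoint and Complete mapping
[Figure: ISA link to Student and Faculty; labels shown: SID, Name, Major, GPA, Dept, member]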
[Figure: normalization; labels shown: Number of Tables, Redundancy, Complexity]
▫ Third Normal Form (3NF)
▫ Boyce-Codd Normal Form (BCNF)
▫ Fourth Normal Form (4NF)
▫ Fifth Normal Form (5NF)
▫ Domain Key Normal Form (DKNF)
Most databases should be 3NF or BCNF in order to avoid the database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each higher level is a subset of the lower level
First Normal Form (1NF)
A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to list of values).
Example (Not 1NF)
0-55-123456-9 | Main Street | Jones, Smith | 123-333-3333, 654-223-3455 | Small House | 714-000-0000 | $22.95
Author and AuPhone columns are not scalar
1NF - Decomposition
1. Place all items that appear in the repeating group in a new table
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table from
which the repeating group was extracted or vice versa.
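The three steps can be sketched in plain Python on the non-1NF row from the example. The column names ISBN, Title, PubName, PubPhone, and Price are assumptions (the slide names only Author and AuPhone), and decompose_repeating_group is a hypothetical helper.

```python
# One non-1NF row: Author and AuPhone hold lists rather than scalar values.
non_1nf = [{
    "ISBN": "0-55-123456-9", "Title": "Main Street",
    "Author": ["Jones", "Smith"],
    "AuPhone": ["123-333-3333", "654-223-3455"],
    "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$22.95",
}]

def decompose_repeating_group(rows, key, group):
    """Split `rows` into a parent table without the repeating group and a
    child table with one row per repeated item, carrying the parent's key."""
    parent, child = [], []
    for row in rows:
        # Step 1: the repeating-group columns leave the original table.
        parent.append({k: v for k, v in row.items() if k not in group})
        # Step 3: each repeated item gets a copy of the parent's primary key.
        for values in zip(*(row[g] for g in group)):
            child.append({key: row[key], **dict(zip(group, values))})
    return parent, child

book, book_author = decompose_repeating_group(non_1nf, "ISBN", ["Author", "AuPhone"])
# Step 2: ISBN can serve as the key of `book`; (ISBN, Author) as the key of `book_author`.
```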
Example (1NF)
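First table (one row per book):
0-55-123456-9 | Main Street | Small House | 714-000-0000 | $22.95
1-22-233700-0 | Visual Basic | Big House | 123-456-7890 | $25.00
Second table (one row per author of a book):
0-55-123456-9 | Jones | 123-333-3333
0-123-45678-0 | Joyce | 666-666-6666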
Example 1
Example 3
5 Smith 654-223-3455
6 Joyce 666-666-6666
7 Roman 444-444-4444
FD – Example
Database to track reviews of papers submitted to an academic
conference. Prospective authors submit papers for review and possible
acceptance in the published conference proceedings. Details of the
entities:
▫ Author information includes a unique author number, a name, a mailing
address, and a unique (optional) email address.
▫ Paper information includes the primary author, the paper number, the
title, the abstract, and review status (pending, accepted, rejected)
▫ Reviewer information includes the reviewer number, the name, the
mailing address, and a unique (optional) email address
▫ A completed review includes the reviewer number, the date, the paper
number, comments to the authors, comments to the program chairperson,
and ratings (overall, originality, correctness, style, clarity)
Functional Dependencies
▫ AuthNo → AuthName, AuthEmail, AuthAddress
▫ AuthEmail → AuthNo
▫ PaperNo → Primary-AuthNo, Title, Abstract, Status
▫ RevNo → RevName, RevEmail, RevAddress
▫ RevEmail → RevNo
▫ RevNo, PaperNo → AuthComm, Prog-Comm, Date,
Rating1, Rating2, Rating3, Rating4, Rating5
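A functional dependency X → Y says that rows agreeing on X must also agree on Y. The sketch below checks that condition against sample rows; the function holds and the author data are assumptions for illustration. A sample can only refute a dependency: passing the check on one instance does not prove the dependency holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault records the first right-hand side seen for this left-hand side.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical author rows: two different authors share a name.
authors = [
    {"AuthNo": 1, "AuthName": "Lee", "AuthEmail": "lee@example.org"},
    {"AuthNo": 2, "AuthName": "Lee", "AuthEmail": "lee2@example.org"},
]

authno_determines_name = holds(authors, ["AuthNo"], ["AuthName"])
email_determines_authno = holds(authors, ["AuthEmail"], ["AuthNo"])
name_determines_authno = holds(authors, ["AuthName"], ["AuthNo"])
```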
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements
▫ The table is in first normal form
▫ All nonkey attributes in the table must be functionally dependent on the entire
primary key
Note: Remember that we are dealing with non-key attributes
Example 3 (Convert to 3NF)
[Figure: table with columns BuildingID, Contractor, Fee]
Use your own judgment when decomposing schemas
BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme {MovieTitle, MovieID, PersonName, Role, Payment }
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieTitle, PersonName}
• Loss of relation {MovieID} → {MovieTitle}
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieID, MovieTitle}
• We got the {MovieID} → {MovieTitle} relationship back
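The difference between the two decompositions can be checked by projecting a sample relation onto each pair of schemes and joining the pieces back together. This is a sketch in plain Python; the movie rows and the helpers project and natural_join are assumptions for illustration.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a sorted set of (attr, value)."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result

ALL = ["MovieTitle", "MovieID", "PersonName", "Role", "Payment"]

# Hypothetical data: the same person appears in two different movies.
movies = [
    {"MovieTitle": "Heat", "MovieID": 1, "PersonName": "Ann", "Role": "Lead", "Payment": 100},
    {"MovieTitle": "Ronin", "MovieID": 2, "PersonName": "Ann", "Role": "Extra", "Payment": 50},
]
original = project(movies, ALL)
cast = project(movies, ["MovieID", "PersonName", "Role", "Payment"])

# First decomposition: titles are attached to persons, so the join invents pairings.
first_attempt = natural_join(cast, project(movies, ["MovieTitle", "PersonName"]))

# Second decomposition: titles are attached to MovieID, so the join is exact.
second_attempt = natural_join(cast, project(movies, ["MovieID", "MovieTitle"]))
```

With this data the first decomposition yields spurious rows on rejoining, while the second reproduces the original relation exactly.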