
Prepared by Kailash Dhirwani

WHAT IS A DBMS?
- To be able to carry out operations like insertion, deletion and retrieval, the database
needs to be managed by a substantial piece of software; this software is usually called a
Database Management System (DBMS).
- A DBMS is usually a very large software package that enables many different tasks
including the provision of facilities to enable the user to access and modify information in
the database.
- Languages are needed for defining the database structure and for manipulating and
retrieving the data stored in the DBMS. These are called the Data Definition Language
(DDL) and the Data Manipulation Language (DML) respectively.

An architecture for database systems, called the three-schema architecture, was proposed
to help achieve and visualize the important characteristics of the database approach.

What is a Database?
A collection of related pieces of data:
▪ Representing/capturing the information about a real-world
enterprise or part of an enterprise.
▪ Collected and maintained to serve specific data management
needs of the enterprise.
▪ Activities of the enterprise are supported by the database and
continually update the database.
University Database:
Data about students, faculty, courses, research-
laboratories, course registration/enrollment etc.
Reflects the state of affairs of the academic aspects of the
university.
Purpose: To keep an accurate track of the academic
activities of the university.
RDBMS

A Relational Database Management System (RDBMS) is a program that lets you create,
update and administer a relational database. The primary rule for an RDBMS is
that the data should be stored in the form of tables.
Most RDBMSs use the Structured Query Language (SQL) to access the database.
Data in an RDBMS is typically organised into related tables through NORMALISATION.

THE THREE-SCHEMA ARCHITECTURE:


The goal of the three-schema architecture is to separate the user applications and the
physical database. In this architecture, schemas can be defined at 3 levels :
1. Internal level or Internal schema : Describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete
details of data storage and access paths for the database.
2. Conceptual level or Conceptual schema : Describes the structure of the whole database
for a community of users. It hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user operations, and
constraints. Implementation data model can be used at this level.
3. External level or External schema : It includes a number of external schemas or user
views. Each external schema describes the part of the database that a particular user is
interested in and hides the rest of the database from that user. An implementation data
model can be used at this level.
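
As a rough illustration of the levels (the student table and view here are hypothetical examples, not taken from the text), the conceptual schema could contain a base table, while an external schema exposes only part of it as a view:

-- conceptual level: a base table describing an entity for the whole community of users
CREATE TABLE student (
  rollNumber NUMBER(6) PRIMARY KEY,
  name       VARCHAR2(40),
  address    VARCHAR2(80),
  cgpa       NUMBER(4,2)
);

-- external level: one user group sees only the columns it is interested in
CREATE VIEW student_contact AS
  SELECT rollNumber, name, address
  FROM student;

The internal level (storage structures, access paths, indexes) is managed by the DBMS and is not visible in either definition.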

What is the purpose of the mappings in the Three Schema Architecture? Is the user
or the DBMS responsible for using the mappings?
Ans: The purpose of the mappings in the Three Schema Architecture is to describe how a
schema at a higher level is derived from a schema at a lower level. The DBMS, not the
user, is responsible for using the mappings.

IMPORTANT TO REMEMBER:
Data and meta-data
– three schemas are only meta-data (descriptions of data).
– data actually exists only at the physical level.
Mapping
– DBMS must transform a request specified on an external schema into a request against
the conceptual schema, and then into the internal schema.
– requires information in meta-data on how to accomplish the mapping among various
levels.
– overhead (time-consuming) leading to inefficiencies.
– few DBMSs have implemented the full three-schema architecture.

DATA INDEPENDENCE

The separation of data descriptions from the application programs (or user interfaces)
that use the data is called data independence. Data independence is one of the main
advantages of a DBMS. The three-schema architecture provides the concept of data
independence, which means that upper levels are unaffected by changes to lower levels.
The three-schema architecture makes it easier to achieve true data independence. There
are two kinds of data independence.

- Physical data independence


* The ability to modify the physical scheme without causing application programs to be

rewritten.
* Modifications at this level are usually to improve performance.

- Logical data independence


* The ability to modify the conceptual scheme without causing application programs to
be rewritten.
* Usually done when logical structure of database is altered.

Logical data independence is harder to achieve as the application programs are usually
heavily dependent on the logical structure of the data. An analogy is made to abstract data
types in programming languages.
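
To make the idea concrete (a small sketch reusing the hypothetical student table and view from the earlier example), a change to the conceptual schema need not break existing external schemas or programs:

-- the conceptual schema changes: a new attribute is added
ALTER TABLE student ADD (email VARCHAR2(60));

-- applications and views that do not use the new column keep working unchanged
SELECT rollNumber, name
FROM student_contact;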

What is a DBMS?
Ans: A database management system (DBMS) is a collection of software that supports
the creation, use, and maintenance of databases. Initially, DBMSs provided efficient
storage and retrieval of data. Due to marketplace demands and product innovation,
DBMSs have evolved to provide a broad range of features for data acquisition, storage,
dissemination, maintenance, retrieval, and formatting. The evolution of these features
has made DBMSs rather complex.
What is SQL?
Ans: The Structured Query Language (SQL) is an industry standard language supported
by most DBMSs. SQL contains statements for data definition, data manipulation, and
data control. A DBMS has to be persistent; that is, the data should remain accessible
after the program that created it has ceased to exist, or even after the application that
created the data has been restarted. A DBMS also has to provide some uniform methods,
independent of a specific application, for accessing the information that is stored.

RDBMS stands for Relational Database Management System (Relational DBMS). This adds the
additional condition that the system supports a tabular structure for the data, with
enforced relationships between the tables. This excludes databases that don't support a
tabular structure or don't enforce relationships between tables.

Many DBAs think that an RDBMS must be a client-server database system, but that is not
required of an RDBMS.

You can say that a DBMS does not impose any constraints or security with regard to data
manipulation; it is the user's or programmer's responsibility to ensure the ACID properties
of the database. An RDBMS is stricter in this regard because it defines integrity
constraints for the purpose of upholding the ACID properties.

Many websites claim that a DBMS is for smaller organizations with small amounts of data,
where security of the data is not a major concern, and that an RDBMS is designed to take
care of large amounts of data and also the security of this data. This is completely wrong
by the definitions of DBMS and RDBMS.

Different abstract levels


- a widely accepted general architecture for a database
- database described by three abstract levels
- internal schema (physical database)
- conceptual schema (conceptual database)
- external schema (view)
Objectives
- insulation of application programs and data
- support of multiple user views
- use of schema to store the DB description (meta-data)
The Three Schema Architecture
External schema
- describes a subset of the database that a particular
user group is interested in, according to the format
the user wants, and hides the rest
- may contain virtual data that is derived from the
files, but is not explicitly stored
Conceptual schema
- hides the details of physical storage structures and
concentrates on describing entities, data types,
relationships, operations, and constraints.
Internal schema
- describes the physical storage structure of the DB
- uses a low-level (physical) data model to describe
the complete details of data storage and access paths
Benefits of Three Schema Architecture
Logical data independence
- the capacity to change the conceptual schema without
having to change external schemas or application programs
ex: Employee (E#, Name, Address, Salary)
A view including only E# and Name is not affected by
changes in any other attributes.
Physical data independence
- the capacity to change the internal schema without
having to change the conceptual (or external) schema
- internal schema may change to improve the performance
(e.g., creating additional access structure)
- easier to achieve than logical data independence, because
application programs depend mainly on logical, not physical, structures
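For instance (an illustrative sketch, using the Employee relation named in the example above), an additional access structure can be created at the internal level without changing the conceptual or external schemas; queries are unaffected and may simply run faster:

-- internal-schema change only: no application program or view needs to be rewritten
CREATE INDEX employee_name_idx ON Employee (Name);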
Data Models
Data abstraction
- one fundamental characteristic of the database approach
- hides details of data storage that are not needed by most
database users and applications
Data model
- a set of data structures and conceptual tools
used to describe the structure of a database
(data types, relationships, and constraints)
- used in the definition of the conceptual,
external, and internal schema
- must provide means for DB designers to represent
the real-world information completely and naturally
Data Models
High-level (conceptual) data models
- use concepts such as entities, attributes, relationships

- object-based models: ER model, OO model
Representational (implementation) data models
- most frequently used in commercial DBMSs
- record-based models: relational, hierarchical, network
Low-level (physical) data models
- to describe the details of how data is stored
- captures aspects of database system implementation:
record structures (fixed/variable length) and ordering,
access paths (key indexing), etc.
Schemas and Instances
In any data model, it is important to distinguish between
the description of the database and the database itself: the description is the database
schema, while the data stored at a particular moment form a database instance (or state).

Data Models
ER model
- popular high-level conceptual model used in DB design

- proposed by P. Chen in 1976 (ACM TODS)


- perception of real-world consisting of a collection of
entities and relationships among them
OO model
- DB is defined in terms of objects, their properties,
and their operations (methods)
Relational model
- represents a DB as a collection of tables
Network model
- represents DB as record types and 1:N relationships
Hierarchical model
- represents data as hierarchical tree structures.


Logical and Physical Data Organization


Logical organization
- conceptual or logical format of the data
(e.g., employee record has E#, Name, Address)
Physical organization
- actual structure of the data and all supporting access
structures (e.g., index)

(e.g., employee: E# 32 bits
Name 30 bytes
Address 50 bytes)
Benefit
- application programs must know the logical organization
but the physical organization is an implementation detail
they need not know

A Data Manipulation Language (DML) statement is executed when you


o Add new rows to a table
o Modify existing rows in a table
o Remove existing rows from a table
• A transaction consists of a collection of DML statements that form a logical unit of work.
• Adding a new row to a table is accomplished using the INSERT statement
INSERT INTO table [column, column, column]
VALUES (value, value, value);
• Because you can insert a new row that contains values for each column, the column list is
not required in the INSERT clause. However, if you do not use the column list, the values
must be listed according to the default order of the columns in the table.
• You can insert NULL values by simply omitting the column value, or by specifying either
the empty string ('') or NULL as the item to be inserted.
• Here’s an example of an insert…
SQL> insert into emp
2 values (2296, ’AROMANO’, ’SALESMAN’, 7782,
3 TO_DATE(’FEB 3, 1997’, ’MON DD, YYYY’),
4 1300, NULL, 10);
1 row created.
• Note that the TO_DATE() function formats the string into a DATE datatype.
• Creating a Script with Customized Prompts
o ACCEPT stores the value in the variable
o PROMPT displays your customized text.
• Let’s look at the following example…first, we create the following script and save it with
the following
name scriptsWithCustomizedPrompts.sql:
ACCEPT department_id PROMPT ’Please enter the -
department number:’
ACCEPT department_name PROMPT ’Please enter -
the department name:’
ACCEPT location PROMPT ’Please enter the -
location:’
INSERT INTO dept (deptno, dname, loc)
VALUES (&department_id, ’&department_name’, ’&location’);

• We then run the script using the START keyword…

SQL> START scriptsWithCustomizedPrompts


Please enter the
department number:90
Please enter
the department name:PAYROLL
Please enter the
location:HOUSTON
old 2: VALUES (&department_id, ’&department_name’,
’&location’)
new 2: VALUES (90, ’PAYROLL’, ’HOUSTON’)
1 row created.
• Notice that the ACCEPT keyword allows us to accept the value entered by the user and
to store it in the variable name that follows it, which, by the way, does not require the
substitution parameter (&). However, when we do make use of its contents in the INSERT
statement, we must include the ampersand! When the script is run, the user is asked to
provide the values for all three variables defined here: department_id, department_name,
and location.
• Copying Rows from another Table: Write your INSERT statement with a subquery. Do
not use the
VALUES clause. Match the number of columns in the INSERT clause to those in the
subquery.
SQL> create table managers(id number(4), name varchar2(10), salary
number(7,2), hiredate date)
2/
Table created.
SQL> INSERT INTO managers(id, name, salary, hiredate)
2 SELECT empno, ename, sal, hiredate
3 FROM emp
4 WHERE job = ’MANAGER’;
3 rows created.
SQL> select *
2 from managers
3/
       ID NAME          SALARY HIREDATE
--------- ---------- --------- ---------
     7566 JONES           2000 02-APR-81
     7698 BLAKE           2000 01-MAY-81
     7782 CLARK           2000 09-JUN-81
• For changing data in a table, we make use of the UPDATE statement…
SQL> update managers
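The UPDATE example above is cut off in the source; a minimal sketch of how it might continue against the managers table created earlier (the new salary and the id value are illustrative assumptions):

SQL> update managers
  2  set salary = 2500
  3  where id = 7566;
1 row updated.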

Introduction to Structured Query Language (SQL)
SQL allows users to access data in relational database management systems, such as
Oracle, Sybase,
Informix, Microsoft SQL Server, Access, and others, by allowing users to describe the
data the user wishes to
see. SQL also allows users to define the data in a database, and manipulate that data. The
SQL used in this document is "ANSI", or standard SQL.

Table of Contents
Basics of the SELECT Statement
Conditional Selection
Relational Operators
Compound Conditions
IN & BETWEEN
Using LIKE
Joins
Keys
Performing a Join
Eliminating Duplicates
Aliases & In/Subqueries
Aggregate Functions
Views
Creating New Tables
Altering Tables
Adding Data
Deleting Data
Updating Data
Indexes
GROUP BY & HAVING
More Subqueries
EXISTS & ALL
UNION & Outer Joins
Embedded SQL
Common SQL Questions

Nonstandard SQL
Syntax Summary
Exercises
Important Links
Basics of the SELECT Statement
In a relational database, data is stored in tables. An example table would relate Social
Security Number, Name,
and Address:
EmployeeAddressTable

SSN        FirstName  LastName  Address          City          State
512687458  Joe        Smith     83 First Street  Howard        Ohio
758420012  Mary       Scott     842 Vine Ave.    Losantiville  Ohio
102254896  Sam        Jones     33 Elm St.       Paris         New York
876512563  Sarah      Ackerman  440 U.S. 110     Upton         Michigan
Now, let's say you want to see the address of each employee. Use the SELECT statement,
like so:
SELECT FirstName, LastName, Address, City, State
FROM EmployeeAddressTable;
The following is the results of your query of the database:
First Name  Last Name  Address          City          State
Joe         Smith      83 First Street  Howard        Ohio
Mary        Scott      842 Vine Ave.    Losantiville  Ohio
Sam         Jones      33 Elm St.       Paris         New York
Sarah       Ackerman   440 U.S. 110     Upton         Michigan
To explain what you just did, you asked for all of the data in the EmployeeAddressTable,
and specifically, you
asked for the columns called FirstName, LastName, Address, City, and State. Note that
column names and
table names do not have spaces...they must be typed as one word; and that the statement
ends with a semicolon
(;). The general form for a SELECT statement, retrieving all of the rows in the table is:
SELECT ColumnName, ColumnName, ...
FROM TableName;
To get all columns of a table without typing all column names, use:
SELECT * FROM TableName;
Each database management system (DBMS) and database software has different methods
for logging in to the
database and entering SQL commands; see the local computer "guru" to help you get
onto the system, so that
you can use SQL.

Data model

A data model provides the details of information to be stored, and is of primary use when
the final product is the generation of computer software code for an application or the
preparation of a functional specification to aid a computer software make-or-buy
decision. The figure is an example of the interaction between process and data models.
According to Hoberman (2009), "A data model is a wayfinding tool for both business
and IT professionals, which uses a set of symbols and text to precisely explain a subset of
real information to improve communication within the organization and thereby lead to a
more flexible and stable application environment."

Database model

• A database model is a theory or specification describing how a database is
structured and used. Several such models have been suggested. Common models
include:

• Flat model: This may not strictly qualify as a data model. The flat (or table) model
consists of a single, two-dimensional array of data elements, where all members
of a given column are assumed to be similar values, and all members of a row are
assumed to be related to one another.

• Hierarchical model: In this model data is organized into a tree-like structure,


implying a single upward link in each record to describe the nesting, and a sort
field to keep the records in a particular order in each same-level list.
• Network model: This model organizes data using two fundamental constructs,
called records and sets. Records contain fields, and sets define one-to-many
relationships between records: one owner, many members.
• Relational model: a database model based on first-order predicate logic. Its core
idea is to describe a database as a collection of predicates over a finite set of
predicate variables, describing constraints on the possible values and
combinations of values.

• Object-relational model: Similar to a relational database model, but objects,


classes and inheritance are directly supported in database schemas and in the
query language.
• Star schema is the simplest style of data warehouse schema. The star schema
consists of a few "fact tables" (possibly only one, justifying the name) referencing
any number of "dimension tables". The star schema is considered an important
special case of the snowflake schema.

Hierarchical model, network model, and relational model (diagrams)

Data structure

A binary tree, a simple type of branching linked data structure.

A data structure is a way of storing data in a computer so that it can be used efficiently. It
is an organization of mathematical and logical concepts of data. Often a carefully chosen
data structure will allow the most efficient algorithm to be used. The choice of the data
structure often begins from the choice of an abstract data type.

A data model describes the structure of the data within a given domain and, by
implication, the underlying structure of that domain itself. This means that a data model
in fact specifies a dedicated grammar for a dedicated artificial language for that domain.
A data model represents classes of entities (kinds of things) about which a company
wishes to hold information, the attributes of that information, and relationships among
those entities and (often implicit) relationships among those attributes. The model
describes the organization of the data to some extent irrespective of how data might be
represented in a computer system.

The entities represented by a data model can be the tangible entities, but models that
include such concrete entity classes tend to change over time. Robust data models often
identify abstractions of such entities. For example, a data model might include an entity
class called "Person", representing all the people who interact with an organization. Such
an abstract entity class is typically more appropriate than ones called "Vendor" or
"Employee", which identify specific roles played by those people

Data flow diagram


A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system. It differs from the flowchart as it shows the data flow instead of the
control flow of the program. A data flow diagram can also be used for the visualization
of data processing (structured design). Data flow diagrams were invented by Larry
Constantine, the original developer of structured design,[19] based on Martin and Estrin's
"data flow graph" model of computation.

It is common practice to draw a context-level Data flow diagram first which shows the
interaction between the system and outside entities. The DFD is designed to show how a
system is divided into smaller portions and to highlight the flow of data between those
parts. This context-level Data flow diagram is then "exploded" to show more detail of the
system being modeled.

Object model

An object model in computer science is a collection of objects or classes through which a


program can examine and manipulate some specific parts of its world. In other words, the
object-oriented interface to some service or system. Such an interface is said to be the
object model of the represented service or system. For example, the Document Object
Model (DOM) [3] is a collection of objects that represent a page in a web browser, used
by script programs to examine and dynamically change the page. There is a Microsoft
Excel object model[21] for controlling Microsoft Excel from another program, and the
ASCOM Telescope Driver[22] is an object model for controlling an astronomical
telescope.

In computing the term object model has a distinct second meaning of the general
properties of objects in a specific computer programming language, technology, notation
or methodology that uses them. For example, the Java object model, the COM object
model, or the object model of OMT. Such object models are usually defined using
concepts such as class, message, inheritance, polymorphism, and encapsulation. There is
an extensive literature on formalized object models as a subset of the formal semantics of
programming languages.

Data properties

Some important properties of data for which requirements need to be met are:

• definition-related properties
o relevance: the usefulness of the data in the context of your business.
o clarity: the availability of a clear and shared definition for the data.

o consistency: the compatibility of the same type of data from different
sources.

Another kind of data model describes how to organize data using a database management
system or other data management technology. It describes, for example, relational tables
and columns or object-oriented classes and attributes. Such a data model is sometimes
referred to as the physical data model, but in the original ANSI three schema architecture,
it is called "logical". In that architecture, the physical model describes the storage media
(cylinders, tracks, and tablespaces). Ideally, this model is derived from the more
conceptual data model described above. It may differ, however, to account for constraints
like processing capacity and usage patterns.

While data analysis is a common term for data modeling, the activity actually has more
in common with the ideas and methods of synthesis (inferring general concepts from
particular instances) than it does with analysis (identifying component concepts from
more general ones). {Presumably we call ourselves systems analysts because no one can
say systems synthesists.} Data modeling strives to bring the data structures of interest
together into a cohesive, inseparable, whole by eliminating unnecessary data
redundancies and by relating data structures with relationships.

Concurrency Control & Recovery


In computer science, concurrency is a property of systems in which several computations
are executing simultaneously, and potentially interacting with each other.
This occurs in programs like SharePoint where two users edit the same document at the
same time. This can be avoided by using the check-in and check-out feature in
SharePoint. Versioning must be turned on at the site level for this to work.

• Concurrency Control
– Provide correct and highly available access to data
in the presence of concurrent access by large and
diverse user populations
• Recovery
– Ensures database is fault tolerant, and not
corrupted by software, system or media failure
– 7x24 access to mission critical data
• Existence of concurrency control & recovery
allows applications to be
written without explicit concern for
concurrency and fault tolerance

Database transaction and the ACID rules

The concept of a database transaction (or atomic transaction) has evolved in order to
enable both a well understood database system behavior in a faulty environment where
crashes can happen any time, and recovery from a crash to a well understood database
state. A database transaction is a unit of work, typically encapsulating a number of
operations over a database (e.g., reading a database object, writing, acquiring lock, etc.),
an abstraction supported in databases and also other systems. Each transaction has well
defined boundaries in terms of which program/code executions are included in that
transaction (determined by the transaction's programmer via special transaction
commands). Every database transaction obeys the following rules (by support in the
database system; i.e., a database system is designed to guarantee them for the transactions
it runs):

• Atomicity - Either the effects of all or none of its operations remain ("all or
nothing" semantics) when a transaction is completed (committed or aborted
respectively). In other words, to the outside world a committed transaction appears

(by its
effects) to be indivisible, atomic, and an aborted transaction does not leave effects
at all, as if it never existed.
• Consistency - Every transaction must leave the database in a consistent (correct)
state, i.e., maintain the predetermined integrity rules of the database (constraints
upon and among the database's objects). A transaction must transform a database
from one consistent state to another consistent state (it is the responsibility of the
transaction's programmer to make sure that the transaction itself is correct, i.e.,
performs correctly what it intends to perform while maintaining the integrity rules).
Thus since a database can be normally changed only by transactions, all the
database's states are consistent. An aborted transaction does not change the state.
• Isolation - Transactions cannot interfere with each other. Moreover, usually the
effects of an incomplete transaction are not visible to another transaction. Providing
isolation is the main goal of concurrency control.
• Durability - Effects of successful (committed) transactions must persist through
crashes (typically by recording the transaction's effects and its commit event in a
non-volatile memory).
• Thus concurrency control is an essential element for correctness in any system
where two database transactions or more, executed with time overlap, can access
the same data, e.g., virtually in any general-purpose database system. Consequently
a vast body of related research has been accumulated since database systems have
emerged in the early 1970s. A well established concurrency control theory exists for
database systems: serializability theory, which allows one to effectively design and
analyze concurrency control methods and mechanisms.
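
As a small illustration of a transaction as a logical unit of work (the account table, column names and amounts are assumptions, not from the text), a funds transfer groups two updates so that either both effects persist or neither does:

UPDATE account SET balance = balance - 100 WHERE accno = 1;
UPDATE account SET balance = balance + 100 WHERE accno = 2;
COMMIT;      -- make both changes durable
-- ROLLBACK; -- alternatively, undo both changes if anything went wrong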

If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency
exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled
manner, some unexpected, undesirable result may occur. Here are some typical examples:

• The lost update problem: A second transaction writes a second value of a data-item (datum) on top
of a first value written by a first concurrent transaction, and the first value is lost to other
transactions running concurrently which need, by their precedence, to read the first value. The
transactions that have read the wrong value end with incorrect results.
• The dirty read problem: Transactions read a value written by a transaction that has been later
aborted. This value disappears from the database upon abort, and should not have been read by any
transaction ("dirty read"). The reading transactions end with incorrect results.
• The incorrect summary problem: While one transaction takes a summary over the values of all the
instances of a repeated data-item, a second transaction updates some instances of that data-item. The
resulting summary does not reflect a correct result for any (usually needed for correctness)
precedence order between the two transactions (if one is executed before the other), but rather some
random result, depending on the timing of the updates, and whether certain update results have been
included in the summary or not.
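
A sketch of the lost update problem on a single illustrative data item x, initially 100:

T1: read(x)                  -- T1 sees 100
T2: read(x)                  -- T2 also sees 100
T1: x := x + 10; write(x)    -- x is now 110
T2: x := x + 20; write(x)    -- x is now 120; T1's update is lost

Any serial execution of T1 and T2 would have ended with x = 130.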

Many methods for concurrency control exist. Most of them can be implemented as either
optimistic or pessimistic techniques. The major methods, which each have many variants,
and in some cases may overlap or be combined, are:

Locking (e.g., Two-Phase Locking - 2PL)

Two-Phase Locking Protocol


• Each Xact must obtain a S (shared) lock on object
before reading, and an X (exclusive) lock on object
before writing.
• A transaction can not request additional locks
once it releases any locks.
• If an Xact holds an X lock on an object, no other
Xact can get a lock (S or X) on that object
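
In Oracle-style SQL this behaviour can be observed with explicit row locks; the table and values below are illustrative assumptions. The lock acquired by SELECT ... FOR UPDATE is held, together with the lock taken by the UPDATE, until the transaction commits, which corresponds to the strict variant of 2PL for write locks:

-- growing phase: acquire a row lock before reading the value to be changed
SELECT sal FROM emp WHERE empno = 7566 FOR UPDATE;
-- still in the growing phase: the update keeps its exclusive lock
UPDATE emp SET sal = sal * 1.1 WHERE empno = 7566;
-- shrinking phase: all locks are released together at commit
COMMIT;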

• Controlling access to data by locks assigned to the data. Access of a transaction to a


data item (database object) locked by another transaction may be blocked
(depending on lock type and access operation type) until lock release.
• Serialization graph checking (also called Serializability, or Conflict, or
Precedence graph checking) - Checking for cycles in the schedule's graph and
breaking them by aborts.

• Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling
or checking access to data by timestamp order.
• Commitment ordering (or Commit ordering; CO) - Controlling or checking
transactions' order of commit events to be compatible with their respective
precedence order.

Lock Management

Lock and unlock requests are handled by the lock


manager

Lock table entry:


• Number of transactions currently holding a lock
• Type of lock held (shared or exclusive)
• Pointer to queue of lock requests

Locking and unlocking have to be atomic operations

Lock upgrade: transaction that holds a shared lock


can be upgraded to hold an exclusive lock
Deadlocks

Deadlock: Cycle of transactions waiting for


locks to be released by each other.

Two ways of dealing with deadlocks:


• Deadlock prevention
• Deadlock detection
Deadlock Prevention

Assign priorities based on timestamps.


Assume Ti wants a lock that Tj holds. Two
policies are possible:
• Wait-Die: If Ti has higher priority, Ti waits for Tj;
otherwise Ti aborts
• Wound-Wait: If Ti has higher priority, Tj aborts;
otherwise Ti waits

If a transaction re-starts, make sure it has its


original timestamp
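
A small worked example, assuming timestamps TS(T1) = 5 and TS(T2) = 10, so that T1 is older and has higher priority:
- T1 requests a lock held by T2: under Wait-Die, T1 waits; under Wound-Wait, T2 is aborted (wounded).
- T2 requests a lock held by T1: under Wait-Die, T2 aborts (dies); under Wound-Wait, T2 waits.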
Deadlock Detection

Create a waits-for graph:


• Nodes are transactions
• There is an edge from Ti to Tj if Ti is waiting for Tj
to release a lock

Periodically check for cycles in the waits-for
graph
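
For instance (illustrative): if T1 holds an X lock on A and requests B while T2 holds an X lock on B and requests A, the waits-for graph contains the edges T1 → T2 and T2 → T1. The cycle T1 → T2 → T1 indicates a deadlock, which is resolved by aborting one of the two transactions.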

Database Systems: Introduction (Dr P Sreenivasa Kumar, Professor, CS&E Department, IIT Madras)
Database Management System (DBMS)
A general purpose software system enabling:
▪ Creation of large disk-resident databases.
▪ Posing of data retrieval queries in a standard manner.
▪ Retrieval of query results efficiently.
▪ Concurrent use of the system by a large number of users
in a consistent manner.
▪ Guaranteed availability of data irrespective of system failures.
OS File System Storage Based Approach
• Files of records – used for data storage
• data redundancy – wastage of space
• maintaining consistency becomes difficult
• Record structures – hard coded into the programs
• structure modifications – hard to perform
• Each different data access request (a query)

• performed by a separate program
• difficult to anticipate all such requests
• Creating the system
• requires a lot of effort
• Managing concurrent access and failure recovery are difficult
DBMS Approach
DBMS
• separation of data and metadata
• flexibility of changing metadata
• program-data independence
Data access language
• standardized – SQL
• ad-hoc query formulation – easy
System development
• less effort required
• concentration on logical level design is enough
• components to organize data storage
process queries, manage concurrent access,
recovery from failures, manage access control
are all available
Data Model
Collection of conceptual tools to describe the database at a
certain level of abstraction.
▪ Conceptual Data Model
▪ a high level description
▪ useful for requirements understanding.
▪ Representational Data Model
▪ describing the logical representation of data
without giving details of physical representation.
▪ Physical Data Model
▪ description giving details about record formats,
file structures etc.
E/R (Entity/Relationship) Model
▪ A conceptual level data model.
▪ Provides the concepts of entities, relationships and attributes.
The University Database Context
Entities: student, faculty member, course, departments etc.
Relationships: enrollment relationship between student & course,
employment relationship between faculty
member, department etc.
Attributes: name, rollNumber, address etc., of student entity,
name, empNumber, phoneNumber etc., of faculty
Three-schema Architecture (1/2)
Logical Level Schema

▪ Describes the logical structure of the entire database.
▪ No physical level details are given.
Physical Level Schema
▪ Describes the physical structure of data in terms of
record formats, file structures, indexes etc.
Remarks
• Views are optional
- Can be set up if the DB system is very large and if
easily identifiable user-groups exist
• The logical scheme is essential
• Modern RDBMS’s hide details of the physical layer
Three-schema Architecture (2/2)
The ability to modify physical level schema without
affecting the logical or view level schema.
Performance tuning – modification at physical level
creating a new index etc.
Physical Data Independence – modification is localized
▪ achieved by suitably modifying PL-LL mapping.
▪ a very important feature of modern DBMS.
Logical Data Independence
The ability to change the logical level scheme without affecting the
view level schemes or application programs
Adding a new attribute to some relation
• no need to change the programs or views that don’t
require to use the new attribute
Deleting an attribute
• no need to change the programs or views that use
the remaining data
• view definitions in VL-LL mapping only need to be
changed for views that use the deleted attribute

Functional dependency
Functional dependencies are represented, associated with a particular schema,
by a set of elements found in the antecedent, and a set of elements in the consequent.
These functional dependencies can be manipulated by application of
Armstrong's axioms. This manipulation, as well as automatic generation of candidate
keys, is handled by a Solver. The Solver is invoked when a user selects an
axiom to apply.

A functional dependency (FD) is a constraint between two sets of attributes in a relation
from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another


attribute Y, also in R, (written X → Y) if and only if each X value is associated with
precisely one Y value. Customarily we call X the determinant set and Y the dependent
attribute. Thus, given a tuple and the values of the attributes in X, one can determine the
corresponding value of the Y attribute. For the purposes of simplicity, given that X and Y
are sets of attributes in R, X → Y denotes that X functionally determines each of the
members of Y - in this case Y is known as the dependent set. Thus, a candidate key is a
minimal set of attributes that functionally determine all of the attributes in a relation.
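
For example (using the Employee table that appears later in these notes purely as an illustration), the FD empno → {ename, salary, hire_date} holds if each empno value is associated with exactly one ename, salary and hire_date; if empno also determines every remaining attribute and no proper subset of {empno} does, then empno is a candidate key.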

(Note: the "function" being discussed in "functional dependency" is the function


of identification.)
Constraint between two sets of attributes
Formal method for grouping attributes
DB as one single universal relation R = {A1, A2, …, An}
Two sets of attributes, X ⊆ R, Y ⊆ R
Functional dependency (FD or f.d.) X → Y:
if t1[X] = t2[X], then t1[Y] = t2[Y]
Values of the Y attributes depend on the values of X;
X functionally determines Y, not necessarily the reverse

A functional dependency FD: X → Y is called trivial if Y is a subset of X.

The determination of functional dependencies is an important part of designing databases


in the relational model, and in database normalization and denormalization. The
functional dependencies, along with the attribute domains, are selected so as to generate
constraints that would exclude as much data inappropriate to the user domain from the
system as possible.

Irreducible sets of functional dependencies

A functional dependency set S is irreducible if the set has the following three properties:

1. Each right set of a functional dependency of S contains only one attribute.


2. Each left set of a functional dependency of S is irreducible. It means that reducing
any one attribute from left set will change the content of S (S will lose some
information).
3. Reducing any functional dependency will change the content of S.

Sets of Functional Dependencies(FD) with these properties are also called canonical or
minimal.
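
A small worked example with abstract attributes: the set S = {A → BC, B → C, A → B, AB → C} is not irreducible. Splitting the right-hand sides gives {A → B, A → C, B → C, AB → C}; A → C is implied by A → B and B → C, and in AB → C the attribute B is redundant on the left (since A → B), leaving another copy of the implied A → C. A canonical (minimal) cover is therefore {A → B, B → C}.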

Properties of functional dependencies

Given that X, Y, and Z are sets of attributes in a relation R, one can derive several
properties of functional dependencies. Among the most important are Armstrong's
axioms, which are used in database normalization:

• Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X → Y


• Augmentation (Axiom of Augmentation): If X → Y, then XZ → YZ
• Transitivity (Axiom of Transitivity): If X → Y and Y → Z, then X → Z

From these rules, we can derive these secondary rules:

• Union: If X → Y and X → Z, then X → YZ


• Decomposition: If X → YZ, then X → Y and X → Z
• Pseudotransitivity: If X → Y and WY → Z, then WX → Z
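
As a worked derivation of the Union rule from Armstrong's axioms: given X → Y and X → Z, augmenting X → Y with X yields X → XY, augmenting X → Z with Y yields XY → YZ, and transitivity on X → XY and XY → YZ gives X → YZ.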

Equivalent sets of functional dependencies are called covers of each other. Every set of
functional dependencies has a canonical cover.

Inclusion dependencies
Inclusion dependencies (INDs), which can say, for example, that every manager is an
employee, are studied, including their interaction with functional dependencies (FDs). A
simple complete axiomatization for INDs is presented, and the decision problem for INDs is
shown to be PSPACE-complete. (The decision problem for INDs is the problem of
determining whether or not a set Σ of INDs logically implies a single IND σ.)
As an example, an inclusion dependency can say that every MANAGER entry of the R
relation appears as an EMPLOYEE entry of the S relation. In general, an inclusion
dependency is of the form R[A1, ..., Am] ⊆ S[B1, ..., Bm].
We note that INDs differ from other commonly studied database dependencies in
two important respects. First, INDs may be interrelational, whereas the others deal
with a single relation at a time. Second, INDs are not typed [Fa4]; they are special
cases of extended embedded implicational dependencies [Fa4], for which the
existence of "Armstrong-like databases" has been proven. We show that INDs have a
simple complete axiomatization. However, we also show the rather surprising fact that
the decision problem for INDs is PSPACE-complete.
DATA MANIPULATION LANGUAGE (DML)

These commands are used to append, change or remove the data in a Table.
A COMMIT or ROLLBACK statement should be given to make the changes permanent or to
revert them.
2.2.1 INSERT
Using this command we can append data into tables.
Syntax
INSERT INTO <table name>(<column_name1>, <column_name2>, ...)
VALUES (column1_value, column2_value, ...);
INSERT INTO <table-name>(<column_name2>, <column_name1>, ...)
VALUES (column2-value, column1-value, ...);
INSERT INTO <table-name> VALUES (value1, value2, ...);
Example
INSERT INTO Employee(empno, ename, salary, hire_date, gender, email)
VALUES(1234, ‘JOHN’, 8000, ‘18-AUG-80’, ‘M’, ‘john@miraclesoft.com’);
INSERT INTO Employee(email , ename, empno, hire_date, gender, salary)
VALUES(‘rhonda@miraclesoft.com’, ‘RHONDA’, 1235, ’24-JUL-81’, ‘F’, 7500);
INSERT INTO Employee
VALUES(1236, ‘JACK’, 15000, ’23-SEP-79’, ‘m’, ‘jack@miraclesoft.com’);

2.2.2 UPDATE
This command is used to modify the data existing in the tables.
Syntax
UPDATE <table-name> SET <column-name> = <value>;
UPDATE <table-name> SET <column-name> = <value> WHERE <condition>;
Example
UPDATE Employee SET salary = 15000;
UPDATE employee SET salary = 15000 WHERE empno = 1235;
2.2.3 DELETE
This command is used to remove the data from the tables.
Syntax
DELETE FROM <table-name>;
DELETE FROM <table-name>WHERE <condition>;
Example
DELETE FROM Employee;
DELETE FROM Employee WHERE empno = 1236;

Oracle Forms
Oracle Forms is a software product for creating screens that interact with an Oracle
database. It has an IDE including an object navigator, property sheet and code editor that
uses PL/SQL. It was originally developed to run server-side in character mode terminal
sessions. It was ported to other platforms, including Windows, to function in a client–
server environment. Later versions were ported to Java where it runs in a Java EE
container and can integrate with Java and web services.

The primary focus of Forms is to create data entry systems that access an Oracle database

Oracle Forms accesses the Oracle database and generates a screen that presents the data.
The source form (*.fmb) is compiled into an "executable" (*.fmx), that is run
(interpreted) by the forms runtime module. The form is used to view and edit data in

25
database-driven applications. Various GUI elements, such as buttons, menus, scrollbars,
and graphics can be placed on the form.

The environment supplies built-in record creation, query, and update modes, each with its
own default data manipulations. This minimizes the need to program common and
tedious operations, such as creating dynamic SQL, sensing changed fields, and locking
rows.

As is normal with event driven interfaces, the software implements event-handling


functions called triggers which are automatically invoked at critical steps in the
processing of records, the receipt of keyboard strokes, and the receipt of mouse
movements. Different triggers may be called before, during, and after each critical step.

Each trigger function is initially a stub, containing a default action or nothing.


Programming Oracle Forms therefore generally consists of modifying the contents of
these triggers in order to alter the default behavior. Some triggers, if provided by the
programmer, replace the default action while others augment it.

As a result of this strategy, it is possible to create a number of default form layouts which
possess complete database functionality yet contain no programmer-written code at all.

History

Oracle Forms is sold and released separately from the Oracle database. However, major
releases of an Oracle database usually result in a new major version of Oracle Forms to
support new features in the database.


The first version of Oracle Forms was named Interactive Application Facility (IAF). This
had two main components, the compiler (Interactive Application Generator - IAG) and
the runtime interpreter (Interactive Application Processor - IAP). This provided a
character mode interface to allow users to enter and query data from an Oracle database.

IAF was released with Oracle Database Version 2, the first commercial version of Oracle.
It was renamed to FastForms with Oracle Database version 4 and added an additional
tool to help generate a default form to edit with the standard tool (IAG).

It was renamed to SQL*Forms version 2 with the Oracle 5 database.

Oracle Forms 2.3 was character based, and did not use PL/SQL. The source file was an
*.INP ASCII file. It was common for developers to edit the INP file directly although that
was not supported by Oracle. This version used its own primitive and unfriendly built-in
language, augmented by user exits—compiled language code linked to the binary of the
Oracle-provided run-time.

Oracle Forms 3 was character based, and was the first real version of Forms, using
PL/SQL. All subsequent versions are a development of this version. It could run under X
but did not support any X interface specific features such as checkboxes. The source file
was an *.INP ASCII file. The IDE was vastly improved from 2.3 which dramatically
decreased the need to edit the INP file directly, although this was still a common practice.
Forms 3 automatically generated triggers and code to support some database constraints.
Constraints could be defined, but not enforced in the Oracle 6 database at this time, so
Oracle used Forms 3 to claim support for enforcing constraints.

There was a "GUI" version of Forms 3 which could be run in environments such as X
Window, but not Microsoft Windows. This had no new trigger types, which made it
difficult to attach PL/SQL to GUI events such as mouse movements.

Oracle Forms version 4.0 was the first "true" GUI based version. A character based
runtime was still available for certain customers on request. The arrival of Microsoft
Windows 3 forced Oracle to release this GUI version of Forms for commercial reasons.
Forms 4.0 accompanied Oracle version 6 with support for Microsoft Windows and X
Window. This version was notoriously buggy and introduced an IDE that was unpopular
with developers. This version was not used by the Oracle Financials software suite. The
4.0 source files were named *.FMB and were binary.

Oracle Forms version 4.5 was really a major release rather than a "point release" of 4.0
despite its ".5" version number. It contained significant functional changes and a brand
new IDE, replacing the unpopular IDE introduced in 4.0. It is believed to be named 4.5 in
order to meet contractual obligations to support Forms 4 for a period of time for certain
clients. It added GUI-based triggers, and provided a modern IDE with an object
navigator, property sheets and code editor.

Due to conflicting operational paradigms, Oracle Forms version 5, which accompanied


Oracle version 7, featured custom graphical modes tuned especially for each of the major
systems. However, its internal programmatic interface remained system-independent. It
was quickly superseded by Forms 6.

Forms 6 was released with Oracle 8.0 database; it was rereleased as Forms 6i with Oracle
8i. This was basically Forms 4.5 with some extra wizards and bug-fixes. But it also
included the facility to run inside a web server. A Forms Server was supplied which
solved the problem of adapting Oracle Forms to a three-tier, browser-based delivery,
without incurring major changes in its programmatic interface. The complex, highly
interactive form interface was provided by a Java applet which communicated directly
with the Forms server. However the web version did not work very well over HTTP. A
fix from Forms 9i was retrofitted to later versions of 6i to address this.

The naming and numbering system applied to Oracle Forms underwent several changes
due to marketing factors, without altering the essential nature of the product. The ability
to code in Java, as well as PL/SQL, was added in this period.

Forms 9i included many bug fixes to 6i and was a stable version. But it did not include
either client–server or character-based interfaces, and three-tier, browser-based delivery
is the only deployment choice from here on. The ability to import java classes means that
it can act as a web service client.

Forms 10g is actually Forms version 9.0.4, so is merely a rebadged forms 9i.

Forms 11 will include some new features, relying on Oracle AQ to allow it to interact
with JMS.

Oracle Forms 10g is Oracle's award winning Web Rapid Application


Development tool, part of the Oracle Developer Suite 10g. It is a highly
productive, end-to-end, PL/SQL based, development environment for
building enterprise-class, database centric Internet applications. Oracle
Application Server 10g provides out-of-the-box optimized Web
deployment platform for Oracle Forms 10g. Oracle itself is using
Oracle Forms for Oracle Applications.

DBMS Function
1. Data Dictionary Management
2. Data Storage Management
3. Data Transformation and Presentation
4. Security Management
5. Multi-User Access Control
6. Backup and Recovery Management
7. Data Integrity Management
8. Database Access Languages and
Application Programming Interfaces
9. Database Communication Interfaces

Database Model

Collection of logical constructs used to


represent the data structure and the data
relationships found within the database.
◆Conceptual models focus on what is represented
rather than how it is represented.
◆Entity Relationship Diagram
◆Object Oriented Model
◆Implementation models emphasize how the
data is represented in the database or on how the
data structures are implemented.
◆Hierarchical Database Model
◆Relational Database Model
◆Object Oriented Database Model


Database Conceptual Model


❖Three Types of Relationships
◆One-to-many relationships (1:M)
◆A painter paints many different paintings, but
each one of them is painted by only that painter.
PAINTER (1) paints PAINTING (M)

◆Many-to-many relationships (M:N)


◆An employee might learn many job skills, and
each job skill might be learned by many
employees.
EMPLOYEE (M) learns SKILL (N)

◆One-to-one relationships (1:1)


◆Each store is managed by a single employee and
each store manager (employee) only manages a
single store.

EMPLOYEE (1) manages STORE (1)
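
A rough sketch of how these relationship types are commonly implemented in a relational schema (table and column names are illustrative; the referenced painter, employee and skill tables are assumed to exist with the given keys):

-- 1:M - the "many" side carries a foreign key to the "one" side
CREATE TABLE painting (
  painting_id NUMBER PRIMARY KEY,
  title       VARCHAR2(60),
  painter_id  NUMBER REFERENCES painter (painter_id)
);

-- M:N - a separate junction table holds one row per employee/skill pairing
CREATE TABLE employee_skill (
  empno    NUMBER REFERENCES employee (empno),
  skill_id NUMBER REFERENCES skill (skill_id),
  PRIMARY KEY (empno, skill_id)
);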

❖Logically represented by an upside down tree


◆ Each parent can have many children
◆ Each child has only one parent

Implementation Model: Hierarchical Database



Hierarchical Database
❖Advantages
◆ Conceptual simplicity – Relationships defined
◆ Database security – Uniform throughout system
◆ Data independence – Data type cascaded
◆ Database integrity – Child referenced to parent
◆ Efficiency – Parent to Child (One to Many)
❖Disadvantages
◆ Complex implementation
◆ Difficult to manage
◆ Lacks structural independence
◆ Applications programming and use is complex
◆ Implementation limitations (Many to Many)
◆ Lack of standards


Implementation Model: Relational Database


❖Basic Structure
◆ Relational DataBase Management Systems
(RDBMS) allows operations in a human logical
environment
◆ The relational database is perceived as a collection
of tables.
◆ Each table consists of a series of row/column
intersections.
◆ Tables (or relations) are related to each other by
sharing a common entity characteristic
◆ The relationship type shown in a relational schema
◆ A table yields data and structural independence
◆ Microsoft Access is a RDBMS


Relational Database Model


❖Advantages
◆ Structural independence
◆ Improved conceptual simplicity
◆ Easier database design, implementation,
management, and use
◆ Ad hoc query capability (SQL)
◆ Powerful database management system
◆ Most common DBMS used today
❖Disadvantages
◆ Substantial hardware and system software
overhead
◆ Possibility of poor design and implementation
◆ Potential “islands of information” = local DB

Conceptual Model: Entity Relationship


❖E-R models are normally represented in an
Entity Relationship Diagram (ERD).
❖An entity is represented by a rectangle.
◆ Usually a Noun or Object of the sentence.
❖A relationship is represented by a diamond
connected to the related entities.
◆ Usually a Verb.
❖An attribute is a characteristic of the entity.
◆ Represented by ellipses connected to entity.
◆ Usually Nouns



Entity Relationship Model


❖Advantages
◆Exceptional conceptual simplicity
◆Visual representation
◆Effective communication tool
◆Integrated with the relational database
model
❖Disadvantages
◆Limited constraint representation
◆Limited relationship representation
◆No data manipulation language
◆Loss of information content

Implementation Model: Object-Oriented DB


❖Basic Structure
◆Objects are abstractions of actual entities.
◆Attributes are properties of an object.
◆A Class is a collection of similar objects
with shared structure (attributes) and
behavior (methods).
◆Classes are organized in a class hierarchy.
◆An object can inherit the attributes and
methods of the classes above it.

Figure 1.15: A Comparison: The OO Data Model and the ER Model

Object-Oriented Database Model


❖Advantages
◆ Visual presentation
◆ Database integrity
◆ Both structural and data independence
◆ Object Oriented Method with Class Inheritance
❖Disadvantages
◆ Lack of Object Oriented Data Model standards
◆ Complex navigational data access
◆ Steep learning curve
◆ High system overhead slows transactions

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for
determining a table's degree of vulnerability to logical inconsistencies and anomalies. The
higher the normal form applicable to a table, the less vulnerable it is to inconsistencies
and anomalies. Each table has a "highest normal form" (HNF): by definition, a table
always meets the requirements of its HNF and of all normal forms lower than its HNF;
also by definition, a table fails to meet the requirements of any normal form higher than
its HNF.

The normal forms are applicable to individual tables; to say that an entire database is in
normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an
iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on.
This is not an accurate description of how normalization typically works. A sensibly
designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is
overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms
(above 3NF) does not usually require an extra expenditure of effort on the part of the
designer, because 3NF tables usually need no modification to meet the requirements of
these higher normal forms.

The main normal forms are summarized below.

• First normal form (1NF), defined in two versions by E.F. Codd (1970) and C.J. Date (2003) [11]:
the table faithfully represents a relation and has no repeating groups.
• Second normal form (2NF), defined by E.F. Codd (1971) [12]: no non-prime attribute in the table
is functionally dependent on a proper subset of a candidate key.
• Third normal form (3NF), defined by E.F. Codd (1971) [13] (see also Carlo Zaniolo's equivalent
but differently-expressed definition, 1982 [14]): every non-prime attribute is non-transitively
dependent on every candidate key in the table.
• Boyce–Codd normal form (BCNF), defined by Raymond F. Boyce and E.F. Codd (1974) [15]:
every non-trivial functional dependency in the table is a dependency on a superkey.
• Fourth normal form (4NF), defined by Ronald Fagin (1977) [16]: every non-trivial multivalued
dependency in the table is a dependency on a superkey.
• Fifth normal form (5NF), defined by Ronald Fagin (1979) [17]: every non-trivial join dependency
in the table is implied by the superkeys of the table.
• Domain/key normal form (DKNF), defined by Ronald Fagin (1981) [18]: every constraint on the
table is a logical consequence of the table's domain constraints and key constraints.
• Sixth normal form (6NF), defined by C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002) [4]:
the table features no non-trivial join dependencies at all (with reference to a generalized join
operator).
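
To make the "functionally dependent" wording in the 2NF/3NF/BCNF definitions concrete, here is
a minimal Python sketch (with made-up rows) that tests whether a dependency X -> Y holds in a
table stored as a list of dicts. In the sample data, EmpID -> Address holds even though the
candidate key is {EmpID, Skill}, which is exactly the partial dependency that 2NF forbids.

def fd_holds(rows, X, Y):
    """Return True if every X-value determines a single Y-value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        if key in seen and seen[key] != val:
            return False        # the same X-value maps to two different Y-values
        seen[key] = val
    return True

emp_skills = [                  # hypothetical "Employees' Skills" rows
    {"EmpID": 1, "Address": "12 Oak St", "Skill": "SQL"},
    {"EmpID": 1, "Address": "12 Oak St", "Skill": "Python"},
    {"EmpID": 2, "Address": "9 Elm Ave", "Skill": "SQL"},
]
print(fd_holds(emp_skills, ["EmpID"], ["Address"]))   # True: EmpID -> Address
print(fd_holds(emp_skills, ["EmpID"], ["Skill"]))     # False: EmpID does not determine Skill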

Anomaly

When an attempt is made to modify (update, insert into, or delete from) a table, undesired
side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-
effects can only arise in tables that have not been sufficiently normalized. An
insufficiently normalized table might have one or more of the following characteristics:

• The same information can be expressed on multiple rows; therefore updates to the
table may result in logical inconsistencies. For example, each record in an
"Employees' Skills" table might contain an Employee ID, Employee Address, and
Skill; thus a change of address for a particular employee will potentially need to
be applied to multiple records (one for each of his skills). If the update is not
carried through successfully—if, that is, the employee's address is updated on
some records but not others—then the table is left in an inconsistent state.
Specifically, the table provides conflicting answers to the question of what this
particular employee's address is. This phenomenon is known as an update
anomaly.
• There are circumstances in which certain facts cannot be recorded at all. For
example, each record in a "Faculty and Their Courses" table might contain a
Faculty ID, Faculty Name, Faculty Hire Date, and Course Code—thus we can
record the details of any faculty member who teaches at least one course, but we
cannot record the details of a newly-hired faculty member who has not yet been
assigned to teach any courses except by setting the Course Code to null. This
phenomenon is known as an insertion anomaly.
• There are circumstances in which the deletion of data representing certain facts
necessitates the deletion of data representing completely different facts. The
"Faculty and Their Courses" table described in the previous example suffers from
this type of anomaly, for if a faculty member temporarily ceases to be assigned to
any courses, we must delete the last of the records on which that faculty member
appears, effectively also deleting the faculty member. This phenomenon is known
as a deletion anomaly.
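
As a hedged sketch of how normalization removes these anomalies, the "Employees' Skills" table
above can be decomposed into an Employee table and an EmployeeSkill table (column names and
values here are made up); an address change then touches exactly one row, a new employee can be
recorded before having any skill, and deleting the last skill no longer deletes the employee.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Address TEXT);
    CREATE TABLE EmployeeSkill (EmpID INTEGER REFERENCES Employee(EmpID), Skill TEXT);
    INSERT INTO Employee VALUES (1, '12 Oak St');
    INSERT INTO EmployeeSkill VALUES (1, 'SQL'), (1, 'Python');
""")

# No update anomaly: a change of address touches exactly one row.
conn.execute("UPDATE Employee SET Address = '34 Pine Rd' WHERE EmpID = 1")

# No insertion anomaly: a newly hired employee needs no Skill row (and no NULLs).
conn.execute("INSERT INTO Employee VALUES (2, '9 Elm Ave')")

# No deletion anomaly: removing the last skill leaves the employee's facts intact.
conn.execute("DELETE FROM EmployeeSkill WHERE EmpID = 1")
print(conn.execute("SELECT * FROM Employee").fetchall())   # [(1, '34 Pine Rd'), (2, '9 Elm Ave')]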

Mapping Constraints
An E-R scheme may define certain constraints to which the contents of a database must
conform.

• Mapping Cardinalities: express the number of entities to which another entity
can be associated via a relationship. For binary relationship sets between entity
sets A and B, the mapping cardinality must be one of:
1. One-to-one: An entity in A is associated with at most one entity in B, and
an entity in B is associated with at most one entity in A. (Figure 2.3)
2. One-to-many: An entity in A is associated with any number in B. An
entity in B is associated with at most one entity in A. (Figure 2.4)
3. Many-to-one: An entity in A is associated with at most one entity in B.
An entity in B is associated with any number in A. (Figure 2.5)
4. Many-to-many: Entities in A and B are associated with any number from
each other. (Figure 2.6)

The appropriate mapping cardinality for a particular relationship set depends on
the real world being modeled. (Think about the CustAcct relationship...)
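
As a hedged sketch of how these cardinalities are typically realized with keys (using the
Student/Take/Course example schema from the relational algebra section later in these notes),
a one-to-many side carries a plain foreign key, while a many-to-many relationship is stored in
a separate linking table:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (SID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (CID TEXT PRIMARY KEY, title TEXT);

    -- Many-to-many between Student and Course: each Take row pairs one student
    -- with one course; the composite key allows many such pairs per student
    -- and per course.
    CREATE TABLE Take (
        SID INTEGER REFERENCES Student(SID),
        CID TEXT    REFERENCES Course(CID),
        PRIMARY KEY (SID, CID)
    );
""")
# For a one-to-many relationship (e.g. account to transaction), only the "many"
# side would carry a foreign key, with no linking table needed.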

• Existence Dependencies: if the existence of entity X depends on the existence of
entity Y, then X is said to be existence dependent on Y. (Or we say that Y is the
dominant entity and X is the subordinate entity.)

For example,

o Consider account and transaction entity sets, and a relationship log
between them.
o This is one-to-many from account to transaction.
o If an account entity is deleted, its associated transaction entities must also
be deleted.
o Thus account is dominant and transaction is subordinate.
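
A minimal sqlite3 sketch of this dominance (the account and txn tables are hypothetical):
declaring the transaction side with ON DELETE CASCADE makes the subordinate rows disappear
automatically when the dominant account entity is deleted.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")          # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE account (acct_no INTEGER PRIMARY KEY);
    CREATE TABLE txn (
        txn_id  INTEGER PRIMARY KEY,
        acct_no INTEGER NOT NULL REFERENCES account(acct_no) ON DELETE CASCADE
    );
    INSERT INTO account VALUES (7);
    INSERT INTO txn VALUES (1, 7), (2, 7);
""")

conn.execute("DELETE FROM account WHERE acct_no = 7")
print(conn.execute("SELECT COUNT(*) FROM txn").fetchall())   # [(0,)] -- subordinate rows gone too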

ER diagram

An entity relationship diagram is a graphical representation of a data model of an application. It acts
as the basis for mapping the application to the relational database.

The Entity-Relationship (ER) Diagram. One of the key techniques in ER modeling is to document the
entity and relationship types in a graphical form called the Entity-Relationship (ER) diagram. Figure 2 is a
typical ER diagram. The entity types such as EMP and PROJ are depicted as rectangular boxes, and the
relationship types such as WORK-FOR are depicted as diamond-shaped boxes. The value sets (domains)
such as EMP#, NAME, and PHONE are depicted as circles, while attributes are the “mappings” from entity
and relationship types to the value sets. The cardinality information of a relationship is also expressed. For
example, the “1” or “N” on the lines between the entity types and relationship types indicates the upper
limit of the entities of that entity type participating in that relationship.
Fig. 2. An Entity-Relationship (ER) Diagram
ER Model is based on Strong Mathematical Foundations. The ER model is based on (1) Set Theory,
(2) Mathematical Relations, (3) Modern Algebra, (4) Logic, and (5) Lattice Theory. A formal definition of
the entity and relationship concepts can be found in Fig. 3.
Fig. 3. Formal Definitions of Entity and Relationship Concepts

Significant Differences between the ER model and the Relational Model. There are several differences
between the ER model and the Relational Model:
ER Model uses the Mathematical Relation Construct to Express the Relationships between Entities. The
relational model and the ER model both use the mathematical structure called Cartesian product. In some
way, both models look the same – both use the mathematical structure that utilizes the Cartesian product of
something. As can be seen in Figure 3, a relationship in the ER model is defined as an ordered tuple of
“entities.” In the relational model, a Cartesian product of data “domains” is a “relation,” while in the ER
model a Cartesian product of “entities” is a “relationship.” In other words, in the relational model the
mathematical relation construct is used to express the “structure of data values,” while in the ER model the
same construct is used to express the “structure of entities.”
ER Model Contains More Semantic Information than the Relational Model. By the original definition of
relation by Codd, any table is a relation. There is very little in the semantics of what a relation is or should
be. The ER model adds the semantics of data to a data structure. Several years later, Codd developed a
data model called RM/T, which incorporated some of the concepts of the ER model.
ER Model has Explicit Linkage between Entities. As can be seen in Figures 2 and 4, the linkage between
entities is explicit in the ER model, while in the relational model it is implicit. In addition, the cardinality
information is explicit in the ER model, and some of the cardinality information is not captured in the
relational model.

Object-Oriented (OO) Analysis Techniques are Partially Based on the ER Concepts

It is commonly acknowledged that one major component of the object-oriented (OO) analysis techniques
is based on the ER concepts. However, the “relationship” concept in the OO analysis techniques is still
hierarchy-oriented and not yet equal to the general relationship concept advocated in the ER model. It has
been noticeable in the past few years that the OO analysis techniques are moving toward the direction of
adopting a more general relationship concept.
4.4 Data Mining is a Way to Discover Hidden Relationships
Many of you have heard about data mining. If you think deeply about what data mining actually does,
you will see the linkage between data mining and the ER model. What is data mining? What is data
mining really doing? In our view, it is the discovery of “hidden relationships” between data entities. The
relationships exist already, and we need to discover them and then take advantage of them. This is
different from conventional database design, in which the database designers identify the relationships. In
data mining, algorithms instead of humans are used to discover the hidden relationships.

An ERD is a model that identifies the concepts or entities that exist in a
system and the relationships between those entities. An ERD is often used as a way to
visualize a relational database: each entity represents a database table, and the
relationship lines represent the keys in one table that point to specific records in related
tables. ERDs may also be more abstract, not necessarily capturing every table needed
within a database, but serving to diagram the major concepts and relationships. This
ERD is of the latter type, intended to present an abstract, theoretical view of the major
entities and relationships needed for management of e-resources. It may assist the
database design process for an ERM system, but does not identify every table that would
be necessary for an e-resource management database.
This ERD should be examined in close consultation with other components of the Report
of the DLF Electronic Resource Management Initiative, especially Appendix D (Data
Element Dictionary) and Appendix E (Data Structure). The ERD presents a visual
representation of e-resource management concepts and the relationships between them.
The Data Element Dictionary identifies and defines the individual data elements that an
e-resource management system must contain and manage, but leaves the relationship
between the elements to be inferred by the reader. The Data Structure associates each
data element with the entities and relationships defined in the ERD. Together, these three
documents form a complete conceptual data model for e-resource management.
Understanding the Model
There are several different modeling systems for entity relationship diagramming. This
ERD is presented in the “Information Engineering” style. Those unfamiliar with entity
relationship diagramming or unfamiliar with this style of notation may wish to consult
the following section to clarify the diagramming symbology.

Relational Algebra
Steps in Building and Using a Database
1. Design schema
2. Create schema in DBMS
3. Load initial data
4. Repeat: execute queries and updates on the database
Database Query Languages
What is a query?
Given a database, ask questions, get answers
Example: get all students who are now taking CS145
Example (from the TPC-D benchmark):
“The Volume Shipping Query finds, for two given nations, the gross discounted revenues derived from lineitems in which parts were
shipped from a supplier in either nation to a customer in the other nation during 1995 and 1996. The query lists the supplier nation, the
customer nation, the year, and the revenue from shipments that took place in that year. The query orders the answer by supplier nation,
customer nation, and year (all ascending).”

Some queries are easy to pose, some are not


Some queries are easy for DBMS to answer, some are not
Relational Query Languages
Formal: Relational Algebra, Relational Calculus, Datalog
Practical: SQL, Quel, QBE (Query-by-Example)

What is a relational query?
Input: a number of relations in your database
Output: one relation as the answer
Relational Algebra
Basic operators: selection, projection, cross product, union, difference,
and renaming
Additional operators (can be defined using basic ones): theta-join,
natural join, intersection, etc.

Operands: relations
Input relation(s) → operator → output relation
Example:
Student(SID, name, age, GPA)
Take(SID, CID)
Course(CID, title)
Selection
Notation: σ_{p}(R)
Purpose: pick rows according to some criteria (the selection predicate p)
Input: a table R
Output: has the same columns as R, but only the rows of R that satisfy p
Example: σ_{SID=123}(Student), the student with SID 123
Example: σ_{GPA>3.0}(Student), students with GPA higher than 3.0
Example: σ_{GPA=4.0 ∧ (age<18 ∨ age>21)}(Student), straight-A students under 18 or over 21
The selection predicate p in general can include any columns of R, constants,
comparisons such as =, <, ≤, etc., and Boolean connectives ∧ (and),
∨ (or), ¬ (not)
Projection
Notation: π_{L}(R)
Purpose: pick columns to output
Input: a table R
Output: has only the columns of R listed in L
Example: π_{SID, name}(Student), SID's and names of all students
Example: π_{SID}(Take), SID's of students taking classes
Notice the elimination of duplicate rows
Example of composing σ and π: names of students under 18 is π_{name}(σ_{age<18}(Student))
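
To ground the two operators, here is an illustrative Python sketch (made-up rows for the
Student relation above) in which a relation is a list of dicts; selection keeps rows
satisfying a predicate, and projection keeps the listed columns while eliminating duplicates:

students = [
    {"SID": 123, "name": "Amy", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 2.9},
]

def select(rows, predicate):                 # sigma_p(R): keep rows satisfying p
    return [r for r in rows if predicate(r)]

def project(rows, columns):                  # pi_L(R): keep listed columns, drop duplicate rows
    out = []
    for r in rows:
        t = {c: r[c] for c in columns}
        if t not in out:
            out.append(t)
    return out

# names of students under 18: pi_name(sigma_age<18(Student))
print(project(select(students, lambda r: r["age"] < 18), ["name"]))   # [{'name': 'Amy'}]
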
Product and Joins
Cross Product
Notation: R × S
Purpose: pair rows from two tables
Input: two tables R and S
Output: for each row r in R and each row s in S, output a row rs; the output
table has the columns of R and the columns of S
Example: Student × Take
If column names conflict, prefix the names with the table name and a dot
Looks odd to glue unrelated tuples together; why use × then?
Example: names of students and CID's of the courses they are taking:
π_{name, CID}(σ_{Student.SID = Take.SID}(Student × Take))
Theta-Join
Notation: R ⋈_{p} S
Purpose: relate rows from two tables according to some criteria (the join condition p)
Shorthand for: σ_{p}(R × S)
Example: names of students and CID's of the courses they are taking:
π_{name, CID}(Student ⋈_{Student.SID = Take.SID} Take)
Natural Join
Notation: R ⋈ S
Purpose: relate rows from two tables, and
enforce equality on all common attributes
eliminate one copy of common attributes
Shorthand for: π_{L}(R ⋈_{p} S), where p equates each attribute common to R and S, and
L lists the attributes of R and S with the duplicated common attributes removed
Example: Student ⋈ Take
Example: names of students taking calculus:
π_{name}(σ_{title='calculus'}(Student ⋈ Take ⋈ Course))
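
Continuing the same illustrative Python sketch (hypothetical rows, same list-of-dicts
representation), cross product pairs every row of one relation with every row of the other,
and natural join keeps only the pairs that agree on all common attributes:

students = [{"SID": 123, "name": "Amy"}, {"SID": 456, "name": "Bob"}]
takes    = [{"SID": 123, "CID": "CS145"}]

def cross(R, S):
    # Assumes disjoint column names; with conflicts one would prefix table names.
    return [{**r, **s} for r in R for s in S]

def natural_join(R, S):
    # Assumes non-empty relations; keeps one copy of each common attribute.
    common = set(R[0]) & set(S[0])
    return [{**r, **s}
            for r in R for s in S
            if all(r[a] == s[a] for a in common)]

print(natural_join(students, takes))
# [{'SID': 123, 'name': 'Amy', 'CID': 'CS145'}] -- students paired with the CIDs they take
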
Set Operators
Union: R ∪ S
Difference: R - S
Intersection: R ∩ S
Input: two tables R and S with identical schema
Output: has the same schema as R and S
Duplicate rows are eliminated (as usual) in union
R ∩ S is just a shorthand for R - (R - S)
Example of union:
Student(SID, name, age, GPA)
GradStudent(SID, name, age, GPA, advisor)
Find all student SID's: π_{SID}(Student) ∪ π_{SID}(GradStudent)
Example of difference: CID's of the courses that nobody is taking: π_{CID}(Course) - π_{CID}(Take)
What if we also want course titles?
Renaming
Notation: ρ_{S}(R), or ρ_{S(A1, ..., An)}(R)
Purpose: rename a table and/or its columns
Example: SID's of all pairs of classmates:
π_{T1.SID, T2.SID}(σ_{T1.CID = T2.CID ∧ T1.SID < T2.SID}(ρ_{T1}(Take) × ρ_{T2}(Take)))

Atomicity

In database systems, atomicity (or atomicness) is one of the ACID transaction properties.
In an atomic transaction, a series of database operations either all occur, or nothing
occurs. ... Atomicity – All database modifications must follow an “all or nothing” rule in which each
transaction is “atomic.” That means that if one part of the transaction fails, the entire
transaction fails. No splitting of atoms allowed! It is critical that the database
management system maintain the atomic nature of transactions in spite of any DBMS,
operating system, or hardware failure.
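
A minimal sketch of this all-or-nothing behavior using Python's sqlite3 (the accounts table
and the simulated failure are hypothetical): if a transfer fails halfway, the whole
transaction is rolled back and the database looks as if nothing happened.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated crash before crediting B")   # the second UPDATE never runs
    # conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
except Exception:
    conn.rollback()        # all or nothing: the debit from A is undone

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('A', 100), ('B', 0)] -- the half-finished transfer left no trace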

A.C.I.D. stands for Atomicity, Consistency, Isolation and Durability
