

1.1 Introduction to Database Systems
Database (DB) - An integrated collection of related data
By related data we mean that the data represents logically coherent facts about some aspects of the real world
that are required by an application
Universe of discourse or mini-world - The part of the real world that a database is designed to model within
a computer
By integrated we mean that the data for multiple applications is stored together and manipulated in a uniform
way on secondary storage such as a magnetic or optical disk. The primary goal of integration is to support
information sharing across multiple applications.
A database system consists of 1) an application-specific database, 2) the DBMS that maintains that database,
and 3) the application software that manipulates the database

Database Systems and Database Management Systems
A Database Management System (DBMS) is a collection of programs that controls a database. Specifically,
it provides us with an interface to create, maintain, and manipulate multiple databases
A DBMS is a general-purpose software system that we can use not only to create and maintain multiple
databases but also to implement database systems for different applications.
As opposed to a DBMS, which is general-purpose, a database system is developed to support the operations
of a specific organization or a specific set of applications.
Ways of Handling Databases
1. Manual – manual manipulation of data
Ex. manual card catalog
2. Computerized – electronic data handling
- Traditional File Processing System (TFPS)
- Database Management System (DBMS)
TFPS - application programs access data files directly (filenames and data definitions are embedded in each program).
When data are integrated in a single, shared data file, all application programs that share the data file must
be aware of all the data in the file, including those data items that they do not make use of or need.
The problem gets worse when a new field is added to a data file.

Disadvantages of TFPS

1. Uncontrolled Redundancy
2. Inconsistent Data
3. Inflexibility
4. Limited Data Sharing
5. Poor Enforcement of Standards
6. Low Programmer Productivity
7. Excessive Program Maintenance

College of Computer Studies 1
DBMS - stores the structure of the data as part of the description of the database in the system catalog,
separately from the application programs

Characteristics of DBMS

Data Abstraction
DBMSs allow data to be structured in ways that make it more understandable and meaningful to the
applications than the ways data are physically stored on disks. They provide users with high-level, conceptual
representations of the data (a table in relational DBMSs, or an object in object-oriented DBMSs, to give two
examples) while they hide storage details that are not of interest to most database users.
Program-data independence - the physical organization of the data can be changed without affecting the
application programs.
Program-operation independence - the implementation of abstract operations can be changed without affecting
the code of the application programs, as long as their calling interface stays the same.

Data abstraction and, in particular, data independence is what facilitates data sharing and integration. This
is the main advantage of a DBMS over traditional file processing, where application programs depend on the low-
level structure of the data or the storage organization, and each program stores its data in a separate data file.

DBMSs provide high reliability by 1) enforcing integrity constraints and 2) ensuring data consistency despite hardware
or software failures.
Integrity constraints reflect the meaning (the semantics) of the data and of the application (e.g., data
constraints: conditions and restrictions on values).
Data consistency means that interrupted update operations do not corrupt the database with values that violate
the integrity constraints, and that no data in the database is lost.
After a failure, a DBMS automatically recovers, restoring the database to the consistent state in which
it existed just prior to the interruption. This consistent state is constructed as follows. During recovery, a DBMS
rolls back all interrupted transactions, obliterating their updates from the database, and re-executes
successfully terminated transactions as necessary, restoring their updates in the database

DBMSs support both efficient space utilization and efficient access to data. By making use of the data description in
the catalog, DBMSs are able to minimize data redundancy, which in turn saves both space, by storing each data item
only once, and processing time, by eliminating the need for multiple updates to keep replicas consistent and up-to-date.
DBMSs enhance the performance of queries by means of optimizations and the use of access methods to data based
on their values. Optimizations simplify queries so that they can execute faster,

and access methods allow direct access to locations where relevant data are stored, in a way similar to the access
provided by the index in the back of a book.
DBMSs keep the response time of transactions low while allowing multiple users to access the database concurrently.
1.2 Relational Databases
Relational Database Schema
A relational database schema is a set of table schemas and a set of integrity constraints. Integrity constraints can
be sorted into two kinds:

 structural (model-specific) integrity constraints that are imposed by the model as discussed below, and
 semantic (application-specific) integrity constraints imposed by the application, such as the constraint, for
example, that the balance of a savings account cannot be negative.

Keys - Keys are columns whose values are sufficient to uniquely identify a row
1. Primary Key – uniquely identifies a record
2. Secondary Key – used to access a group of records with common attributes
3. Alternate Key – candidate to be Primary Key
4. Composite Key – composed of two or more columns to access a unique record
5. Foreign Key - a non-key attribute (ordinary column) in one table that is a primary key in another
- establishes associations (relationships) among tables within one database (in a relational
database schema)
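The key kinds above can be sketched with SQLite from Python. This is an illustration only: the BOOK table, its columns, and the sample values are invented here, not taken from a specific schema in the notes.

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("""
    CREATE TABLE MEMBER (
        MemNo     INTEGER PRIMARY KEY,  -- primary key: uniquely identifies a row
        DriverLic INTEGER UNIQUE,       -- alternate key: also unique, not the PK
        Lname     TEXT
    )""")
conn.execute("""
    CREATE TABLE BOOK (
        CallNo   TEXT PRIMARY KEY,
        Borrower INTEGER,
        -- foreign key: ordinary column here, primary key in MEMBER
        FOREIGN KEY (Borrower) REFERENCES MEMBER (MemNo)
    )""")

conn.execute("INSERT INTO MEMBER VALUES (101, 6876588, 'Jones')")
conn.execute("INSERT INTO BOOK VALUES ('QA76.9', 101)")  # member 101 exists

try:
    # Points at a member that does not exist: referential integrity violation.
    conn.execute("INSERT INTO BOOK VALUES ('QA76.1', 999)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

The foreign key is what establishes the association between the two tables: a BOOK row may only reference a MemNo that exists in MEMBER.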

DDL (Data Definition Language)

The command to create a table in SQL is the CREATE TABLE command. SQL supports all the basic
data types found in most programming languages: integer, float, character, and character string. SQL
commands are not case sensitive.

CREATE TABLE MEMBER (
      MemNo integer,
      DriverLic integer,
      Fname char(10),
      MI char,
      Lname char(15),
      PhoneNumber char(14),
      PRIMARY KEY (MemNo),
      UNIQUE (DriverLic) );
The primary key is specified using the PRIMARY KEY directive, alternate keys using the UNIQUE directive
DML (Data Manipulation Language)

Update Operations
Relational DML allows us to insert and delete rows in a table as well as to update the values of one or more
columns in a row.
In SQL, only one row can be inserted at a time, by specifying the value of each column, as in the following statement:
INSERT INTO MEMBER
VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888');
This statement inserts a new row for Susan W. Jones into the MEMBER table. In SQL, strings are enclosed
within single quotes.
Delete and update can be applied to multiple rows that satisfy a selection condition. In SQL, the selection condition
is specified by a WHERE clause. In the simplest case, a row is selected by specifying the value of its primary
key. For example, the statement
DELETE FROM MEMBER
WHERE MemNo = 102;
deletes the row with member number 102 from the MEMBER table. The following statement changes the middle initial
of member 101 in the MEMBER table (the new initial 'V' here is just an example):
UPDATE MEMBER
SET MI = 'V'
WHERE MemNo = 101;
An update operation succeeds if it does not violate any integrity constraints. For example, an insert operation
will not succeed if it attempts to insert a row whose keys, primary and alternate, conflict with existing keys. That is, if
the row were to be inserted, the property that keys should be unique would be violated. On the other hand, deleting a
row never violates a key constraint, unless the deleted row is referenced by a foreign key. In that case, deleting a row
might violate a referential integrity constraint.
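The key-conflict behavior described above can be demonstrated with SQLite from Python; the table and values are invented for illustration.

```python
import sqlite3

# Sketch: an insert with a duplicate primary key fails; a delete of an
# unreferenced row never violates a key constraint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER (MemNo INTEGER PRIMARY KEY, Lname TEXT)")
conn.execute("INSERT INTO MEMBER VALUES (101, 'Jones')")

try:
    conn.execute("INSERT INTO MEMBER VALUES (101, 'Smith')")  # key conflict
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True  # uniqueness of the primary key is preserved

conn.execute("DELETE FROM MEMBER WHERE MemNo = 101")  # no FK references it
remaining = conn.execute("SELECT COUNT(*) FROM MEMBER").fetchone()[0]
```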

QBE (Query By Example)
Query By Example (QBE) is a visual query language developed by IBM [Zloof, 1977] to simplify an
average end user's task of retrieving data from a database. QBE saves the user from having to remember the names
of tables and columns, and the precise syntax of a query language. The basic idea is to retrieve data with the help of
query templates.

QBE works as follows:

 the system provides the user with a skeleton or query template of the tables in the database, and
 the user fills in the tables with examples of what is to be retrieved or updated.

A skeleton of a table is basically a copy of the table without any rows, i.e., an empty table. For simple selection
conditions, the examples can be constant values, such as Susan, or comparisons with constant values, such
as >100, specified under a column.

Projection in QBE
Projection is specified by selecting the show button associated with each field, which we denote in our
example with "P.". To print all columns of retrieved tuples, we only need to put one "P." under the name of the table.
EX. displays MemNo, Lname, and PhoneNumber from MEMBER:

MEMBER |MemNo| DriverLic| Fname| MI| Lname| Address| PhoneNumber|
       |P.   |          |      |   | P.   |        | P.         |
The result of a query is displayed in a result table, which subsequently can be either stored or manipulated
further. In Microsoft Access, the resulting table is called a datasheet.
Selection in QBE
QBE2: Retrieve all members whose first name is John.
MEMBER |MemNo| DriverLic| Fname| MI| Lname| Address| PhoneNumber|
P.     |     |          | John |   |      |        |            |
By placing P. under the table name, this will retrieve and display the data in all the columns.

QBE3: Retrieve the name and member number of all the members whose member number is greater than 100.
MEMBER |MemNo  | DriverLic| Fname| MI| Lname| Address| PhoneNumber|
       |P. >100|          | P.   |   | P.   |        |            |
Comparison with constant value (in the above example the constant value is 100) is placed in the appropriate
column. The resulting table will have the following columns:
Result| MemNo | Fname | Lname |
In QBE, a disjunction (OR) is expressed by using different examples in different rows of the skeleton.

QBE4: Retrieve the name and member number of all the members whose first name is John or Susan.

MEMBER |MemNo| DriverLic| Fname    | MI| Lname| Address| PhoneNumber|
       |P.   |          | P. John  |   | P.   |        |            |
       |P.   |          | P. Susan |   | P.   |        |            |
A conjunction (AND), on the other hand, is expressed in the same row.

QBE5: Retrieve the name and member number of all the members whose first name is Susan and whose member
number is greater than 100.

MEMBER |MemNo  | DriverLic| Fname    | MI| Lname| Address| PhoneNumber|
       |P. >100|          | P. Susan |   | P.   |        |            |
If the conjunction is a condition involving a single column, the condition can be specified using the AND
operator, as in SQL. For example, if the MemNo should be greater than 100 and less than 150, this is specified under
the MemNo column as: ( _x > 100 ) AND ( _x < 150 )
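The conjunction examples above translate directly into SQL WHERE clauses; a runnable sketch (with invented sample rows) follows.

```python
import sqlite3

# Sketch: SQL equivalents of the QBE conjunction examples; data invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER (MemNo INTEGER PRIMARY KEY, Fname TEXT)")
conn.executemany("INSERT INTO MEMBER VALUES (?, ?)",
                 [(90, 'Susan'), (120, 'Susan'), (130, 'John'), (160, 'Susan')])

# QBE5: first name Susan AND member number > 100 (same row in QBE).
qbe5 = conn.execute(
    "SELECT MemNo, Fname FROM MEMBER "
    "WHERE Fname = 'Susan' AND MemNo > 100").fetchall()

# Single-column range condition: ( _x > 100 ) AND ( _x < 150 ).
in_range = conn.execute(
    "SELECT MemNo FROM MEMBER WHERE MemNo > 100 AND MemNo < 150").fetchall()
```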

Join in QBE
Joins can be expressed by using common example variables in multiple tables in the columns to be joined.

QBE6: List the member number and last name of all the members who currently have a borrowed book.
(The BORROWS skeleton shown here is illustrative; the shared example variable _join links the two tables
on member number.)

MEMBER  |MemNo   | DriverLic| Fname| MI| Lname| Address| PhoneNumber|
        |P. _join|          |      |   | P.   |        |            |

BORROWS |CallNo| MemNo |
        |      | _join |

To express multiple joins, you can use multiple example variables at the same time.
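The role played by a shared example variable in QBE corresponds to a join condition in SQL. A runnable sketch follows; the BORROWS table and its rows are invented for illustration.

```python
import sqlite3

# Sketch: the join that QBE expresses with a shared example variable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MEMBER  (MemNo INTEGER PRIMARY KEY, Lname TEXT);
    CREATE TABLE BORROWS (CallNo TEXT PRIMARY KEY, MemNo INTEGER);
    INSERT INTO MEMBER  VALUES (101, 'Jones'), (102, 'Smith');
    INSERT INTO BORROWS VALUES ('QA76.9', 101);
""")

# Matching MemNo values play the role of the _join example variable.
borrowers = conn.execute("""
    SELECT M.MemNo, M.Lname
    FROM MEMBER M JOIN BORROWS B ON M.MemNo = B.MemNo
""").fetchall()
```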

QUIZ # 1
1. What is a Relational Database?
2. Enumerate the different types of keys and give an example of each.

Lecture no. 2: Complete SQL
2.1 SQL
Structured Query Language
SQL – is the de-facto standard query language for relational DBMS.
- is a comprehensive language providing statements for both data definition and data manipulation.
SQL DDL – (Data Definition Language)
- Provides basic commands for defining the conceptual schema of a database.
SQL provides 3 numeric data types:
1.) Exact numbers – integers or whole numbers, which may be positive, negative, or zero.
SQL supports two integer types: INTEGER (INT) and SMALLINT.
2.) Approximate numbers – numbers that cannot be represented exactly, such as real numbers and
fractional types.
3.) Formatted numbers – numbers stored in decimal notation.

Formatted numbers can be defined using the following:
1.) DECIMAL(i, j)
2.) DEC(i, j)
3.) NUMERIC(i, j)
where i is the precision, the total number of digits excluding the decimal point,
and j is the scale, the number of fractional digits.
The default scale is zero (0).
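The precision/scale idea can be sketched with Python's decimal module; the values below are invented for illustration, and rounding to a fixed scale stands in for what a DBMS does when storing a DECIMAL(i, j) column.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sketch of DECIMAL(i, j) semantics with j = 2 fractional digits.
scale = Decimal("0.01")                       # j = 2 -> quantize to 0.01
stored = Decimal("12345.678").quantize(scale, rounding=ROUND_HALF_UP)

# Default scale is zero: only whole numbers are kept.
whole = Decimal("12345.678").quantize(Decimal("1"), rounding=ROUND_HALF_UP)
```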

Syntax for inserting rows (SQL Query Analyzer):

INSERT [INTO] <table> [(column_list)]
VALUES (data_values)

1.) Projection Operation (Π)
- Selects an attribute or an attribute list from a table r, while discarding the rest.
2.) Selection Operation (σ)
- Selects the rows of a table r that satisfy a selection condition (also called a predicate).
3.) Join Operation
- Combines two tables into one, thereby allowing us to obtain more information.
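The three operations can be sketched in a few lines of Python, modeling relations as lists of dicts; the sample relations are invented for illustration.

```python
# Sketch: the three relational operations on relations as lists of dicts.
def project(rows, cols):
    # Projection: keep only the named attributes, discarding the rest
    return [{c: r[c] for c in cols} for r in rows]

def select(rows, predicate):
    # Selection: keep the rows that satisfy the selection condition
    return [r for r in rows if predicate(r)]

def join(left, right, col):
    # Join on one common column: combine rows that agree on `col`
    return [{**a, **b} for a in left for b in right if a[col] == b[col]]

member = [{"MemNo": 101, "Lname": "Jones"}, {"MemNo": 102, "Lname": "Smith"}]
borrows = [{"MemNo": 101, "CallNo": "QA76.9"}]

names = project(member, ["Lname"])                 # Π Lname (MEMBER)
high = select(member, lambda r: r["MemNo"] > 101)  # σ MemNo>101 (MEMBER)
joined = join(member, borrows, "MemNo")            # MEMBER joined with BORROWS
```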
QUIZ # 2
1.) What is SQL and what are its capabilities?
2.) Create a database, insert values, and view the inserted data.

Lecture no. 1: DATABASE DESIGN
1.1 Database System design

If you are to implement a DBMS in an organization, you need to consider these 4 things:
1. New specialized personnel
- the organization should have or train individuals to:
a. maintain the new database software
b. develop and enforce new programming standards
c. design databases
d. manage the staff of new people and train the new employees
- such personnel may increase productivity (their skills should not be underused)

2. Need for explicit backup
- provide backup copies of data because:
a. they are helpful in restoring damaged data files
b. they provide validity checks on crucial data

3. Interference with shared data
- concurrent access to shared data via several application programs can cause problems:
a. when 2 concurrent users both want to change the same or related data, inaccurate results
can occur if access to the data is not properly synchronized.
b. when data are used exclusively for updating, different users can obtain control of different
segments of the database and lock up any use of the data.

4. Organizational conflict
- a shared database requires a consensus on data definition
a. conflicts over how to define data, field lengths and coding, rights to update shared data,
and associated issues.


1. Operational Database
- contains business transactions and a history of daily business activities
- used to support the ongoing daily activities of the organization
- used in transaction processing systems. Ex. customer orders, purchases, accounting,
shipments, and payments

2. Managerial Database
- used by middle managers for planning and control; contains summaries of the operational database
- used in management information systems

3. Strategic Database
- used by senior managers to develop corporate strategies and seek competitive advantage
- contains information on competitors and economic factors as well as corporate data
- used in decision support systems


1. Data Capture
- captures transaction data, populates databases, and maintains the currency of the data

2. Data Transfer
- moves/transfers data from one database to another
- Ex. from operational to managerial

3. Data Distribution
- applications resulting from data analysis
- converts data into readily useful information and presents it to management
in a readily understandable form
- Ex. reports, summaries, and graphs



[Figure: components of the database environment – CASE tools, user interface, application programs, repository, DBMS, and database]

1. CASE Tools
- Computer-Aided Software Engineering (CASE) tools
- automated tools used to design databases and application programs

2. Repository
- a centralized knowledge base containing all data definitions, screen and report formats,
and definitions of other organization and system components, including definitions of
data formats

3. DBMS - commercial software system used to provide access to the database and repository

4. Database
- an integrated collection of data, organized to meet the information needs of multiple
users in an organization
- contains occurrences of data (the values themselves)

5. Application Programs
- computer programs used to create and maintain the database and provide
information to users

6. User Interface
- languages, menus, and other facilities through which users interact with the system
- menu-driven systems, the mouse, and voice recognition systems promote end-user
computing: users who are not experts can define their own reports and displays

7. Data Administrators
- persons who are responsible for designing databases and for developing policies
regarding database security and integrity
- they use CASE tools to improve the productivity of database planning and design.

8. System Developers
- persons, such as systems analysts and programmers, who design new application programs
- they use CASE tools for system requirements analysis and program design.

9. End Users
- persons throughout the organization who add, edit, delete, and receive information
- e.g., encoders


2.1 Entity-Relationship Models

An entity-relationship model represents the relationships between two or more entities.

Such diagrams represent the relationships among data items, showing how each data item is related to
another. Each type of data item is represented by an ellipse or bubble with the data item name enclosed.
An association between data items is represented by an arrow connecting the data-item bubbles.

[Figure: example of data items with no meaningful association]

[Figure: example of data items with a meaningful association]


Types of Association

1. One-association - means that at any point in time, a given value of A has
one and only one value of B associated with it; given the value of A, the value of B is implicitly known.
Implicitly known means that it can be understood though not plainly expressed.
We represent a one-association with a single-headed arrow.



2. Many-association - means that at any point in time, a given value of A has one or many values of B
associated with it. We represent a many-association with a double-headed arrow.

[Figure: many-association from A to B]

Multivalued attribute - a data item B that occurs potentially multiple times for each value of A

3. Conditional Association - with this, for a given value of data item A there are two
possibilities: either there is no value of data item B, or there is one (or many) value(s) of data
item B. A conditional association is represented by a zero recorded on the arrow near the
conditional item.

[Figure: conditional association, with a zero marking the conditional item]

Cardinality - a term used by analysts for the constraints represented by the arrowheads and zeros on the
arrows, which can be thought of as having minimum and maximum values.

Reverse Association

If there is an association from data item A to data item B, there is also a reverse
association from B to A.

Types of Reverse Association

1. One-to-one association
Means that at any point in time, each value of data item A is associated with zero or
exactly one value of data item B. Conversely, each value of B is associated with one value of A.



2. One – to – many association

Means that at any point in time, each value of data item A is associated with zero, one or
many values of data item B. However, each value of B is associated with exactly one value of A.
The mapping from B to A is said to be many - to – one, since there may be many values of B
associated with one value of A.




3. Many - to – many association

Means that at any point in time, each value of data item A is associated with zero, or
one or many values of data item B. Also each value of B is associated with zero, or one or many
values of A.




Shows the relationship between records.

Crow’s Foot - used to distinguish one and many associations between entities and records.

Crow’s Foot Notation - used to represent the association between records.

Types of Association

1. One Association - no crow’s foot ( one - to – one )



2. Many Association - represented by a crow’s foot





A data model is a representation of the data about entities, events, activities, and their associations within an organization.



Semantic data models are used to capture more of the meaning of the data and to embed this as integrity
and structural clauses in the database definitions. Concepts such as class, subclass, aggregation, dynamic
properties and structures, and the handling of objects of different types (images, voice, print, as well as text
and data) are included in the SDM and other semantically rich data models.


The relational data model uses the concept of a relation to represent what we have previously called a file;
that is, a relation represents an entity class. A relation is viewed as a two-dimensional table.

The relational data model is the choice of many database builders and users. It differs from other
models not only in architecture but also in the following ways:

1. Implementation Independence - it logically represents all relationships implicitly; hence, one does
not know which associations are or are not physically represented by an efficient access method. The
relational model shares this property with the ER diagram.

2. Terminology - it uses its own terminology, most of which has equivalent terms in other data models.

3. Logical Key Pointers - it uses primary and foreign keys in records to represent the association
between 2 records, whereas the ER diagram uses arcs between entity boxes.

4. Normalization Theory - properties of a database that make it free of certain maintenance problems
have been developed within the context of the relational data model (although these properties can also be
designed into an ER diagram or a network data model).

5. High-Level Programming Languages - programming languages have been developed specifically to
access databases defined via the relational data model; these languages permit data to be manipulated as
groups of records rather than procedurally, one record at a time.


Organizations are usually viewed as a hierarchy of positions and authority. Computer programs can be
viewed as a hierarchy of control and operating modules, and various taxonomies of animals and plants view
elements in hierarchical sets of relationships. The hierarchical data model represents data as a set of nested
one-to-many relationships. It is used exclusively with hierarchical database management systems, and such
systems are, in general, being phased out.


The network data model permits as much or as little structure as is desired. We can even create a hierarchy
(a special case of a network) if that is what is needed. As with the hierarchical data model, if a certain
relationship is not explicitly included in the database definition, then it cannot be used by the DBMS in
processing a query.

The ER model is based on the perception of a real world that consists of a set of basic objects called entities
and relationships among these entities/objects. It is a graphical notation that uses special symbols to indicate
relationships among entities, intended primarily for the database design process.

Basic Symbols

[Figure: basic ER symbols - data item, primary key, and the class-subclass ISA ("is a") relationship]

Degree - the number of entities that participate in a relationship.

Most Typical Degrees for Relationship

1. Unary Relationship - a relationship between instances of a single entity class.

2. Binary Relationship - a relationship between instances of two entity classes.




3. Ternary Relationship - a relationship among instances of three entity classes.





QUIZ # 3
1.) Give two examples each of unary, binary, and ternary relationships.

Normalizing a Database

Normalization:

- is a process of reducing redundancies of data in a database.

- is a technique used when designing and redesigning a database.

- is a process or set of guidelines used to optimally design a database to reduce redundant data.

The Raw Database

A database that is not normalized may include data that is contained in one or more different tables for no apparent

reason. This could be bad for security reasons, disk space usage, speed of queries, efficiency of database updates, and,

maybe most importantly, data integrity. A database before normalization is one that has not been broken down

logically into smaller, more manageable tables.


An example of a raw table, mixing employee, customer, and order data in a single table:

emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position,
date_hire, cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax,
ord_num, qty, ord_date, prod_id

Logical Database Design

Any database should be designed with the end user in mind. Logical database design, also referred to as the logical

model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The

logical design of a database should reduce data repetition or go so far as to completely eliminate it. After all, why

store the same data twice? Naming conventions used in a database should also be standard and logical.

What are the End User’s Needs?

The needs of the end user should be one of the top considerations when designing a database. Remember that the end

user is the person who ultimately uses the database. There should be ease of use through the user’s front-end tool (a

client program that allows a user access to a database), but this, along with optimal performance, cannot be achieved if

the user’s needs are not taken into consideration.

Some user-related design considerations include the following:

 What data should be stored in the database?

 How will the user access the database?

 What privileges does the user require?

 How should the data be grouped in the database?

 What data is the most commonly accessed?

 How is all data related in the database?

 What measures should be taken to ensure accurate data?

Data redundancy

Data should not be redundant, which means that the duplication of data should be kept to a minimum for several

reasons. For example, it is unnecessary to store an employee’s home address in more than one table. With duplicate

data, unnecessary space is used. Confusion is always a threat when, for instance, an address for an employee in one

table does not match the address of the same employee in another table.

Which table is correct? Do you have documentation to verify the employee’s current address? As if data management

were not difficult enough, redundancy of data could prove to be a disaster.

The Normal Forms

Normal form is a way of measuring the level, or depth, to which a database has been normalized. A

database's level of normalization is determined by the normal form.

The following are the three most common normal forms in the normalization process :

 The first normal form

 The second normal form

 The third normal form

Of the three normal forms, each subsequent normal form depends on normalization steps taken in the

previous normal form. For example, to normalize a database using the second normal form, the database

must first be in the first normal form.

The First Normal Form

The objective of the first normal form is to divide the base data into logical units called tables. When each

table has been designed, a primary key is assigned to most or all tables.

EMPLOYEE_TBL: emp_id (PK), last_name, first_name, middle_name, address, city, state, zip, phone,
pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise

CUSTOMER_TBL: cust_id (PK), cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone,
cust_fax, ord_num, qty, ord_date, prod_id

PRODUCTS_TBL: prod_id (PK), prod_desc, cost

You can see that to achieve the first normal form, data had to be broken into logical units of related

information, each having a primary key and ensuring that there are no repeating groups in any of the tables.

Instead of one large table, there are now smaller, more manageable tables: EMPLOYEE_TBL,

CUSTOMER_TBL, and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table,

in this case: EMP_ID, CUST_ID, and PROD_ID.

The Second Normal Form

The objective of the second normal form is to take data that is only partly dependent on the primary key

and enter that data into another table.

EMPLOYEE_TBL: emp_id (PK), last_name, first_name, middle_name, address, city, state, zip, phone, pager

EMPLOYEE_PAY_TBL: emp_id (PK), position, position_desc, date_hire, pay_rate, bonus, date_last_raise

CUSTOMER_TBL: cust_id (PK), cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax

ORDERS_TBL: ord_num (PK), cust_id, prod_id, qty, ord_date


According to the figure, the second normal form is derived from the first normal form by further breaking

two tables down into more specific units.

EMPLOYEE_TBL split into two tables called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal

employee information is dependent on the primary key (EMP_ID), so that information remained in

EMPLOYEE_TBL (EMP_ID, LAST_NAME, FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE,

ZIP, PHONE, and PAGER). On the other hand, the information that is only partly dependent on EMP_ID

(each individual employee) went to EMPLOYEE_PAY_TBL (EMP_ID, POSITION, POSITION_DESC,

DATE_HIRE, PAY_RATE, BONUS, and DATE_LAST_RAISE). Notice that both tables contain the column

EMP_ID. This is the primary key of each table and is used to match corresponding data between the two tables.

CUSTOMER_TBL split into two tables called CUSTOMER_TBL and ORDERS_TBL. What took place is

similar to what occurred in EMPLOYEE_TBL. Columns that were partly dependent on the primary key

were moved to another table. The order information for a customer is dependent on each CUST_ID, but

does not directly depend on the general customer information in the original table.

The Third Normal Form

The third normal form's objective is to remove data in a table that is not dependent on the primary key.

Another table was created to illustrate the third normal form. EMPLOYEE_PAY_TBL is split into

two tables, one containing the actual employee pay information and the other containing the position

descriptions, which do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column is

totally independent of the primary key, EMP_ID.




EMPLOYEE_PAY_TBL: emp_id (PK), position, date_hire, pay_rate, bonus, date_last_raise

POSITION_TBL: position (PK), position_desc
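The third-normal-form step can be sketched in a few lines of Python; the rows and position names below are invented, and the dict-of-lists representation merely stands in for the two resulting tables.

```python
# Sketch: position_desc depends on position, not on emp_id, so it moves
# to its own table and is stored only once per position.
employee_pay = [
    {"emp_id": 1, "position": "clerk",   "position_desc": "Clerical staff"},
    {"emp_id": 2, "position": "manager", "position_desc": "Supervises staff"},
    {"emp_id": 3, "position": "clerk",   "position_desc": "Clerical staff"},
]

# New table: one row per position, description stored once.
positions_tbl = {r["position"]: r["position_desc"] for r in employee_pay}

# EMPLOYEE_PAY_TBL keeps `position` as a reference into the new table.
employee_pay_3nf = [
    {k: v for k, v in r.items() if k != "position_desc"} for r in employee_pay
]
```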

Benefits of Normalization

Normalization provides numerous benefits to a database. Some of the major benefits include the following :

- Greater overall database organization

- Reduction of redundant data

- Data consistency within the database design

- A much more flexible database design

- A better handle on database security


Drawbacks of Normalization

Although most successful databases are normalized to some degree, there is one substantial drawback of a

normalized database: reduced database performance. The acceptance of reduced performance requires the

knowledge that when a query or transaction request is sent to the database, there are factors involved, such

as CPU usage, memory usage and input/output (I/O). To make a long story short, a normalized database

requires much more CPU, memory, and I/O to process transactions and database queries than does a

denormalized database. A normalized database must locate the requested tables and then join the data from the

tables to either get the requested information or to process the desired data. A more in-depth discussion

concerning database performance occurs in Hour 18, “Managing Database Users. “

QUIZ # 4




1.1 Transaction Support

Transaction – an action or series of actions, carried out by a single user or application
program, which accesses or changes the contents of the database.

- is a logical unit of work on the database. It may be an entire program, part
of a program, or a single command.

Properties of Transactions

All transactions should possess four basic properties, collectively known as the ACID properties:

 Atomicity – the "all or nothing" property. A transaction is an indivisible unit that is either performed
in its entirety or it is not performed at all.
 Consistency- a transaction must transform the database from one consistent state to another
consistent state.
 Isolation – transactions execute independently of one another. In other words, the partial effects of
incomplete transactions should not be visible to other transactions.
 Durability – the effects of a successfully completed (committed) transaction are permanently
recorded in the database and must not be lost because of a subsequent failure.
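A minimal sketch of atomicity and durability, using Python's sqlite3 (the account table and transfer amounts are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

# A transfer is a logical unit of work: both updates succeed or neither does.
try:
    conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 'A'")
    (bal,) = conn.execute("SELECT balance FROM account WHERE id = 'A'").fetchone()
    if bal < 0:
        raise ValueError("insufficient funds")  # business rule violated
    conn.execute("UPDATE account SET balance = balance + 150 WHERE id = 'B'")
    conn.commit()          # durability: effects survive once committed
except ValueError:
    conn.rollback()        # atomicity: the partial update is undone

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {'A': 100, 'B': 100} -- the failed transfer left no trace
```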


Transaction Manager – coordinates transactions on behalf of application programs.

Scheduler – the module responsible for implementing a particular strategy for concurrency control.
Sometimes referred to as the "Lock Manager".

Recovery Manager - ensures that the database is restored to the state it was in before the start of the
transaction, and is therefore in a consistent state.

Buffer Manager – is responsible for the transfer of data between disk storage and main memory.

[Figure 1: The DBMS Transaction Subsystem, showing the Transaction Manager, Scheduler, Buffer Manager, Recovery Manager, Access Manager, File Manager, and Systems Manager, layered above the database and system catalog.]

1.2 Concurrency Control

Concurrency Control – the process of managing simultaneous operations on the
database without having them interfere with one another.

A major objective in developing a database is "to enable many users to access shared data concurrently."

Three examples of potential problems caused by concurrency:

1.) The Lost Update Problem – an apparently successfully completed update
operation by one user can be overridden by another user.

2.) The Uncommitted Dependency Problem – occurs when one transaction is allowed to see the
intermediate results of another transaction before it has committed.

3.) The Inconsistent Analysis Problem – occurs when a transaction reads several values from the
database, while a second transaction updates some of them during the execution of the first.
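The lost update problem can be simulated deterministically in a few lines of Python; the variable names and the interleaving are invented for illustration:

```python
# Simulated interleaving of two transactions, T1 and T2, with no
# concurrency control. Each reads the balance, computes, then writes;
# T2's write silently overwrites T1's committed update.
balance = 100

t1_read = balance          # T1: read balance (100)
t2_read = balance          # T2: read balance (100), before T1 writes
balance = t1_read + 100    # T1: write 200 and commit
balance = t2_read - 10     # T2: write 90 and commit; T1's +100 is lost

print(balance)  # 90, though the correct serial result would be 190
```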

1.3 Database Recovery

Database Recovery – the process of restoring the database to a correct state in the event of a failure.

Four different types of media with an increasing degree of reliability:

1.) Main Memory – is volatile storage that usually does not survive system crashes.

2.) Magnetic Disks – provide online non-volatile storage. Compared with main memory, disks are more
reliable and much cheaper, but slower by three to four orders of magnitude.

3.) Magnetic Tape – is an offline non-volatile storage medium, which is far more reliable than disk and
fairly inexpensive, but slower, providing only sequential access.

4.) Optical Disks – are more reliable than tape, generally cheaper, and faster, providing random access.

Additional Facts:

- Main memory is also called Primary Storage.
- Disks and tape are known as Secondary Storage.
- Stable storage represents information that has been replicated in several non-volatile storage
media (usually disk) with independent failure modes.

Among the causes of failure are:

1.) System crashes – due to hardware or software errors, resulting in loss of main memory.

2.) Media failures – such as head crashes or unreadable media, resulting in the loss of parts of
secondary storage.

3.) Application software errors – such as logical errors in the program that is accessing the database,
which cause one or more transactions to fail.

4.) Natural physical disasters – such as fire, floods, earthquakes, or power failures.

5.) Sabotage – the intentional corruption or destruction of data, hardware, or software.

Two Principal Effects that we need to consider:

1.) The loss of main memory, including the database buffers.
2.) The loss of the disk copy of the database.

A DBMS should provide the following facilities to assist with recovery:

 A backup mechanism, which makes periodic backup copies of the database.
 Logging facilities, which keep track of the current state of transactions and database changes.

 A checkpoint facility, which enables updates to the database that are in progress to be made permanent.
 A recovery manager, which allows the system to restore the database to a consistent state following a failure.
Log File

To keep track of database transactions, the DBMS maintains a special file called a log
(or journal) that contains information about all updates to the database.

The log may contain the following data:

1.) Transaction Records, containing:

 Transaction identifier.
 Type of log record.
 Identifier of data item affected by the database action.

2.) Checkpoint records

Checkpoint – the point of synchronization between the database and the
transaction log file. All buffers are force-written to secondary storage.

Checkpoints are scheduled at predetermined intervals and involve the following operations:

 Writing all log records in main memory to secondary storage.
 Writing the modified blocks in the database buffers to secondary storage.
 Writing a checkpoint record to the log file. This record contains the identifiers of all transactions that
are active at the time of the checkpoint.
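A toy sketch of such a log, combining the transaction record fields listed above with a checkpoint record. The JSON field names are invented; a real DBMS forces log records to stable storage rather than keeping them in memory:

```python
import json

log = []  # stand-in for the log file; a real DBMS forces this to disk

def write_log(record):
    log.append(json.dumps(record))

# Transaction records: identifier, type of log record, affected data item.
write_log({"tid": "T1", "type": "start"})
write_log({"tid": "T1", "type": "write", "item": "X"})
write_log({"tid": "T2", "type": "start"})

# Checkpoint record: lists the transactions active at the checkpoint.
active = ["T1", "T2"]
write_log({"type": "checkpoint", "active": active})

last = json.loads(log[-1])
print(last["active"])  # ['T1', 'T2']
```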

Recovery Techniques

1.) Recovery techniques using deferred update

 When a transaction starts, write a transaction start record to the log.

 When any write operation is performed, write a log record containing all the data specified previously
(excluding the before-image of the update). Do not actually write the update to the database buffers
or the database itself.

 When a transaction is about to commit, write a transaction commit log record, write all the log
records for the transaction to disk and then commit the transaction. Use the log records to perform the
actual updates to the database.

 If a transaction aborts, ignore the log records for the transaction and do not perform the writes.

 Any transaction with transaction start and transaction commit log records should be redone.

 For any transactions with transaction start and transaction abort log records, we do nothing, since no
actual writing was done to the database, so these transactions do not have to be undone.
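The redo-only replay described above can be sketched in Python. The tuple-based log format is an invented simplification; note that, per the scheme, only after-images are logged and only committed transactions are redone:

```python
# Replay of a deferred-update log after a crash: redo committed
# transactions, ignore aborted or incomplete ones.
log = [
    ("T1", "start", None, None),
    ("T1", "write", "X", 5),        # after-image only: no before-image kept
    ("T1", "commit", None, None),
    ("T2", "start", None, None),
    ("T2", "write", "Y", 9),
    ("T2", "abort", None, None),    # nothing reached the database: no undo
]

committed = {tid for tid, op, _, _ in log if op == "commit"}
database = {}
for tid, op, item, value in log:
    if op == "write" and tid in committed:   # redo committed writes only
        database[item] = value

print(database)  # {'X': 5} -- T2's write never reached the database
```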

2.) Recovery techniques using immediate update

 When a transaction starts, write a transaction start record to the log.

 When a write operation is performed, write a record containing the necessary data to the log file.

 Once the log record is written, write the update to the database buffers.

 The updates to the database itself are written when the buffers are next flushed to secondary storage.

 When a transaction commits, write a transaction commit record to the log.
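Because updates may reach the database before commit under this scheme, recovery needs both an undo pass (for uncommitted writes, using before-images) and a redo pass (for committed writes, using after-images). A sketch with an invented log format:

```python
# Immediate-update recovery: log records carry before- and after-images,
# so uncommitted writes that reached the database can be undone and
# committed writes redone.
log = [
    ("T1", "write", "X", 1, 5),   # (tid, op, item, before, after)
    ("T1", "commit", None, None, None),
    ("T2", "write", "Y", 2, 9),   # T2 never committed before the crash
]
database = {"X": 5, "Y": 9}       # state on disk at crash time

committed = {tid for tid, op, *_ in log if op == "commit"}
for tid, op, item, before, after in reversed(log):
    if op == "write" and tid not in committed:
        database[item] = before    # undo uncommitted writes, last first
for tid, op, item, before, after in log:
    if op == "write" and tid in committed:
        database[item] = after     # redo committed writes, in log order

print(database)  # {'X': 5, 'Y': 2}
```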


2.1 Hash Files

Hash function – calculates the address of the page in which the record is to be stored
based on one or more of the fields in the record.

 Hash field – the field (or fields) on which the hash function is based; also called the base field.

 Hash key – the hash field when it is also a key field of the file.

Collision – when the same address is generated for two or more records.

There are several techniques that can be used to manage collisions:

1.) Open addressing
2.) Unchained overflow
3.) Chained overflow
4.) Multiple hashing
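Chained overflow (technique 3) can be sketched in Python; the bucket count and record keys are illustrative:

```python
# A hash file with chained overflow: the hash function maps the hash
# field to a bucket (page) address; colliding records are chained
# within the same bucket.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def hash_address(key):
    return key % NUM_BUCKETS   # hash function on the hash (base) field

def insert(key, record):
    buckets[hash_address(key)].append((key, record))  # chain on collision

def lookup(key):
    for k, rec in buckets[hash_address(key)]:
        if k == key:
            return rec
    return None

insert(3, "Alice")
insert(7, "Bob")      # 7 % 4 == 3 % 4: collision, chained in bucket 3
print(lookup(7))      # Bob
```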

2.2 Indexes

Index – a data structure that allows the DBMS to locate particular records in a file more quickly, and
thereby speed responses to user queries.

 Data file – the file containing the logical records.

 Index file – the file containing the index records.

 Primary index – if the data file is sequentially ordered, and the indexing field is a key field of
the file, it is guaranteed to have a unique value in each record.

 Clustering index – if the indexing field is not a key field of the file, so that there can be more than
one record corresponding to each value of the indexing field.

 Secondary index – an index that is defined on a non-ordering field of the data file.

Indexed sequential file – a sorted data file with a primary index.

An Indexed sequential file is a more versatile structure, which normally has:

 A primary storage area.

 A separate index or indexes.

 An overflow area.
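A sparse primary index over a sequentially ordered data file can be sketched in Python; the page layout and keys are illustrative:

```python
import bisect

# One index entry per data page, holding the lowest key on that page.
data_pages = [
    [(10, "rec10"), (20, "rec20")],
    [(30, "rec30"), (40, "rec40")],
    [(50, "rec50"), (60, "rec60")],
]
index = [page[0][0] for page in data_pages]   # first key of each page

def find(key):
    # Locate the last page whose lowest key is <= the search key,
    # then scan only that page.
    page_no = bisect.bisect_right(index, key) - 1
    if page_no < 0:
        return None
    for k, rec in data_pages[page_no]:
        if k == key:
            return rec
    return None

print(find(40))  # rec40
```

The index is searched in memory (here with binary search), so only a single data page needs to be read, which is what makes an indexed sequential file faster than scanning the data file.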


1.) Define Transaction

2.) Give other Database recovery techniques



3.1 Data Warehousing

Data Warehouse - a subject-oriented, integrated, time-variant, and non-volatile
collection of data in support of management’s decision-making process.

Benefits of Data Warehousing

 Potential high returns on investment
 Competitive advantage
 Increased productivity of corporate decision-makers

Problems of Data Warehousing

 Underestimation of resources for data loading
 Hidden problems with source systems
 Required data not captured
 Increased end-user demands
 Data homogenization
 High demand for resources
 Data ownership
 High maintenance
 Long duration projects
 Complexity of integration

The Major components of a Data Warehouse

1. Operational Data

The source of data for the data warehouse is supplied from:

 Mainframe operational data held in first-generation hierarchical and network databases.
 Departmental data held in proprietary file systems such as VSAM and RMS, and in relational
DBMSs such as Informix and Oracle.
 Private data held on workstations and private servers.
 External systems such as the Internet, commercially available databases, or databases associated
with an organization’s suppliers or customers.

2. Load Manager – (also called the front-end component) performs all the operations associated with
the extraction and loading of data into the warehouse.

3. Warehouse Manager – performs all the operations associated with the management of the data in
the warehouse.

4. Query Manager - (also called the back-end component) performs all the operations associated with
the management of user queries.

5. End-user access tools – provide information to business users for strategic decision making.

Can be categorized into five main groups:

 Reporting and query tools.
 Application development tools.
 Executive information system (EIS) tools.
 Online analytical processing (OLAP) tools.
 Data mining tools.

3.2 Online Analytical Processing (OLAP) – the dynamic synthesis, analysis, and consolidation of large
volumes of multi-dimensional data.

Rules for OLAP Systems

 Multi-dimensional conceptual view
 Transparency
 Accessibility
 Consistent reporting performance
 Client-server architecture
 Generic dimensionality
 Dynamic sparse matrix handling
 Multi-user support
 Unrestricted cross-dimensional operations
 Intuitive data manipulation
 Flexible reporting
 Unlimited dimensions and aggregations

Categories of OLAP Tools

1. Multi-dimensional OLAP (MOLAP or MD-OLAP) – uses specialized data structures and
Multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.

2. Relational OLAP (ROLAP) – the fastest-growing style of OLAP technology. ROLAP
supports RDBMS products through the use of a meta-data layer, thus avoiding the
requirement to create a static multi-dimensional data structure.

3. Managed Query Environment (MQE) – MQE tools provide limited analysis capability, either
directly against RDBMS products, or by using an intermediate MOLAP server.

3.3 Data Mining

Data Mining – the process of extracting valid, previously unknown, comprehensible, and actionable
information from large databases, and using it to make crucial business decisions.

Four Main Operations Associated with Data Mining Techniques:

1. Predictive modeling 3. Link analysis
2. Database segmentation 4. Deviation detection


