
Wollo University

Kombolcha Institute of Technology

College of Informatics

Department of Information Technology

Fundamentals of Database System

By: Tadesse Kebede

1 2/24/2023
Introduction
 Data: What is data?
 Facts concerning people, objects, events or other entities.
 Can be in the form of text, graphics, sound and video segments
 On their own, they are difficult to interpret or to base decisions on
 Unprocessed, raw facts that can be stored in a database

 Information: What is Information?


 Data presented in a form suitable for interpretation.
 Data processed to be useful in decision making.
 Processed data
 Cannot be stored in a database

2 2/24/2023
Introduction … cont’d
➢ Database: What is database?
➢ An organized collection of logically related data.

➢ A shared collection of logically interrelated data, designed to meet the varied information needs of an organization.

➢ A shared collection – can be used simultaneously by many departments and users

➢ Logically related – comprises the important objects and the relationships between these objects

➢ A computerized record-keeping system

3 2/24/2023
Introduction …. cont’d
A database has the following implicit properties:
 A database represents some aspect of the real world,
sometimes called the mini world or the Universe of Discourse
(UoD).
 Changes to the mini world are reflected in the database.

 A database is a logically coherent collection of data with some inherent meaning.
 A random assortment of data cannot correctly be referred to as a
database.
 A database is designed, built, and populated with data for a
specific purpose.

4 2/24/2023
Introduction …. cont’d
Data

Course     Section   Semester   Name          Rank
MIS 3353   100       Su 01      Kemp          Instr
MIS 3353   200       Su 01      Schwarzkopf   Assoc P
MIS 3373   200       Su 01      Kemp          Instr
MIS 4663   900       Fa 01      Schwarzkopf   Assoc P
MIS 4663   901       Fa 01      Van Horn      Prof

5 2/24/2023
Introduction … cont’d
➢ Meta Data: What do we mean by meta data?
➢ Descriptions of the properties or characteristics of the data,
including data types, field sizes, allowable values, and
documentation
➢ Data that describes data
➢ Data about data
➢ Description of fields
➢ Display and format instructions
➢ Structure of files and tables
➢ Security and access rules
➢ Triggers and operational rules

6 2/24/2023
Introduction …. cont’d
Metadata

Name       Type      Length   Data Item Value (Min / Max)   Description
Course     Char      7                                      Three digit department reference and 4 digit course number
Section    Integer   3        001 / 900                     Section number
Semester   Char      10                                     Semester and year
Name       Char      30                                     Instructor name
Rank       Char      10                                     Instructor rank

7 2/24/2023
Data management approaches
 Data management: the activity of keeping and organizing data records
We have three approaches:
 Manual Approach
 File-Based Approach
 Database Approach
Manual File Handling Systems
 The primitive and traditional way of handling information
 May work well if the number of items to be stored is small
 Involves intensive human labor
 Events and objects are recorded on paper files
 Each file, containing various kinds of information, is labeled and
stored in one or more cabinets
 The cabinets may be kept in safe places for security
8 2/24/2023
Manual File Handling Systems ..cont’d
Limitations of Manual File Handling
 Problem of Data Organization
 Problem of Efficiency
 Prone to error
 Difficult to update, retrieve, integrate
 You have the data but it is difficult to compile the information
 Significant amount of duplication of data
 Cross referencing is difficult

Two computerized approaches evolved to overcome the limitations of the
manual approach:
 File-based approach → decentralised
 Database approach → centralised
9 2/24/2023
File based Approach
 File based systems were an early attempt to computerize the manual
filing system.
 It is a decentralized, computerized data-handling method, i.e. a separate
program (or a number of programs) is developed for each different
application.
 Since every application defines and manages its own data, the system
is subject to serious data duplication problems.

10 2/24/2023
Limitations of File-Based systems
 Data Redundancy (Duplication of data)
 Same data is held by different programs
 Staffsalary(staffno, name, sex, salary)
 Staff(staffno, name, position, sex, dateofb, salary)
 Wasted space (Uncontrolled duplication of data)
 Separation and isolation of data
– Each program maintains its own set of data. Users of one program
may be unaware of potentially useful data held by other programs.
 Limited data sharing- No centralized control of data
 Data Inconsistency and confusion
 Data dependence
 File structure is defined in the program code and is dependent on
the application programming language.
11 2/24/2023
Limitations of File-Based systems .. Cont’d
 Incompatible file formats (lack of data sharing and availability)
 Programs are written in different languages, and so cannot easily
access each other's files.
E.g. Personnel write in C, Payroll writes in COBOL
 Poor Security and administration
 Update Anomalies
 Modification Anomalies
 Deletion Anomalies
 Insertion Anomalies

12 2/24/2023
Database Approach
 Here, a single repository of data is maintained.
 What emerged were the database and database management systems
 Basic Database terminologies
 Enterprise: an organization like library, bank, university, etc.
 Entity: Person, place, thing, or event about which we wish to keep
data
 Attribute (Field): Property of an entity. E.g. Name, age,
telephone, grade, sex, etc.
 Record: A logically connected set of one or more Attributes that
describe a person, place or thing. (Logically related data)
 File: A collection of related records. E.g. Student file
 Relationship: an association among entities (entity records)
 Query: a question asked of the database
13 2/24/2023
Benefits of Database systems
 Data can be shared: two or more users can access and use the same data.
 Improved data accessibility: By using structured query languages,
the users can easily access data without programming experience.
 Redundancy can be reduced: Isolated data is integrated in
database.
 Quality data can be maintained: the different integrity constraints
in the database approach will maintain the quality leading to better
decision making.
 Inconsistency can be avoided: controlled data redundancy will
avoid inconsistency of the data in the database to some extent.
 Transaction support can be provided: the basic demands of any
transaction support system are implemented in a full-scale DBMS.

14 2/24/2023
Benefits of Database systems … cont’d
 Integrity can be maintained: Data at different applications will be
integrated together with additional constraints.
 Security measures can be enforced: The shared data can be secured
by data security mechanisms.
 Improved decision support: the database will provide information
useful for decision making
 Standards can be enforced: standard ways of describing and using data can be applied across users
 Less labor: data maintenance will not demand as many resources
 Centralized information control: Since relevant data in the
organization will be stored at one repository, it can be controlled and
managed at the central level.
 Data Independence - Applications insulated from how data is
structured and stored

15 2/24/2023
Limitations and risk of database approach
 Requires the introduction of new, specialized professional personnel
 High cost incurred to develop and maintain the system
 Complex backup and recovery services from the users' perspective
 High impact on the system when failure occurs to the central
system

16 2/24/2023
Users and Actors of Database System
 Actors on the scene: The people whose jobs involve the day-to-day
use of a large database
 Workers behind the scene: Those who work to maintain the
database system environment, but who are not actively interested in
the database itself.

Actors on the Scene


 Database Administrators
 Database Designers
 End Users
 System Analysts and Application Programmers (Software Engineers)

17 2/24/2023
Database Administrators
 In a database environment, the primary resource is the database itself
and the secondary resource is the DBMS and related software.
 Administering these resources is the responsibility of the Database
Administrator (DBA).
 The DBA is responsible for authorizing access to the database, for
coordinating and monitoring its use, and for acquiring software and
hardware resources as needed.
 The DBA is accountable for problems such as breach of security or
poor system response time.

18 2/24/2023
Database Designer
 Database designers are responsible for identifying the data to be stored
in the database and for choosing appropriate structures to represent and
store this data.
 It is the responsibility of database designers to communicate with all
prospective database users, in order to understand their requirements,
and to come up with a design that meets these requirements.
 In many cases, the designers are on the staff of the DBA and may be
assigned other staff responsibilities after the database design is
completed.
 The final database design must be capable of supporting the
requirements of all user groups.

19 2/24/2023
End Users
 End users are the people whose jobs require access to the database for
querying, updating, and generating reports;
 The database primarily exists for their use. There are several categories
of end users:
 Casual end users:- occasionally access the database. They are
typically middle or high-level managers or other occasional
browsers.
 Naive or parametric end users:- Their main job revolves around
constantly querying and updating the database, using standard types
of queries and updates called canned transactions that have been
carefully programmed and tested.
 Bank tellers check account balances and post withdrawals and deposits
 Reservation clerks for airlines, hotels, and car rental companies check
availability for a given request and make reservations
20 2/24/2023
End Users … Cont’d
 Sophisticated end users: Include engineers, scientists, business
analysts, and others who thoroughly familiarize themselves with
the facilities of the DBMS so as to implement their applications to
meet their complex requirements.
 Stand-alone users: Maintain personal databases by using ready-made
program packages that provide easy-to-use menu-based or graphics-based
interfaces.
 An example is the user of a tax package that stores a variety of
personal financial data for tax purposes.

21 2/24/2023
System Analysts and Application
Programmers (Software Engineers)
 System analysts: Determine the requirements of end users, especially
naive and parametric end users, and develop specifications for canned
transactions that meet these requirements.
 Application programmers implement these specifications as programs;
then they test, debug, document, and maintain these canned
transactions.
 Such analysts and programmers (nowadays called software engineers)
should be familiar with the full range of capabilities provided by the
DBMS to accomplish their tasks.

22 2/24/2023
Workers behind the Scene
 These persons are typically not interested in the database itself.
These include:
 DBMS system designers and implementers:-are persons who
design and implement the DBMS modules and interfaces as a
software package.
 A DBMS is a complex software system that consists of many
components or modules, including modules for implementing the
catalog, query language, interface processors, data access,
concurrency control, recovery, and security.
 The DBMS must interface with other system software, such as the
operating system and compilers for various programming
languages

23 2/24/2023
Workers behind the Scene … cont’d
 Tool developers: Include persons who design and implement tools
 Tools are software packages that facilitate database system design and
use, and help improve performance.
 Tools are optional packages that are often purchased separately.
 They include packages for database design, performance monitoring,
natural language or graphical interfaces, prototyping, simulation, and
test data generation.
 Operators and maintenance personnel: are the system
administration personnel who are responsible for the actual running
and maintenance of the hardware and software environment for the
database system.

24 2/24/2023
Some Common uses of Databases
In a university
 Containing information about a student, the course she/he is enrolled
in, the dormitory she/he has been given.
 Containing details of staff who work at the university, for personnel,
payroll, etc.
In a library
 There may be a database containing details of the books in the library
and details of the users,
 The database system handles activities such as
 Allowing a user to reserve a book
 Notifying users when materials are overdue

25 2/24/2023
Some Common uses of Databases … Cont’d
In travel agencies
 When you make inquiries about travel, the travel agent may access
databases containing flight details
 Flight no., date, time of departure, time of arrival
Insurance
 When you wish to take out insurance, there is a database containing
 Your personal details: name, address, age
 Information on whether you drink or smoke
 Your medical records, used to determine the cost of the insurance
Supermarkets
 When you buy goods from some supermarkets, a database will be accessed.
 The checkout assistant will run a barcode reader over the purchases.

26 2/24/2023
27 2/24/2023
Chapter 2

Database System Concepts and Architecture

1
Outlines

 Data Models, Schema and Instances

 DBMS Architecture and Data Independence

 Database Language and Interface

 The Database System Environment

 Classification of DBMS

2
Schemas, Instances and Database State
Schema
 A schema is a description of a particular collection of data, using a
given data model
 It is a definition of the database

E.g. in the relational data model we have the following rules to define a
schema:
1. Table name outside the parenthesis
2. Column name inside the parenthesis
3. Primary key underlined
Student (SID, Name, Age, Sex)
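As a hedged illustration, the same Student schema could be declared in SQL DDL; the data types below are assumptions, since the notation above does not specify them:

CREATE TABLE Student (
    SID  CHAR(10) PRIMARY KEY,   -- the underlined attribute becomes the primary key
    Name VARCHAR(30),
    Age  INT,
    Sex  CHAR(1)
);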

3
Schemas, Instances, and Database State …
Cont’d
 Schema is specified during database design and is not expected to
change frequently.
 A displayed schema is called a schema diagram
 Although it is not common to change the database schema, sometimes the
schema does need to change; this is called schema
evolution.
 We have three levels of schema:
1. Internal Schema: describes the physical storage structures and access
paths; typically uses a physical data model
2. Conceptual Schema: describes the structure and constraints of the
whole database for a community of users; uses a conceptual or an
implementation data model.
3. External Schema: describes the various user views.

4
Schemas, Instances, and Database State …
Cont’d
Database State
 The data in the database at a particular moment in time is called a
database state or snapshot.
 It is also called the current set of occurrences or instances in the
database.
 In a given database state, each schema construct has its own current set
of instances: for example, the STUDENT construct will contain the set
of individual student entities (records) as its instances.
 Every time we insert or delete a record, or change the value of a data
item in a record, we change one state of the database into another state.

5
Schemas, Instances, and Database State …
Cont’d
 When we define a new database, we specify its database schema only to
the DBMS. At this point, the corresponding database state is the empty
state with no data.
 We get the initial state of the database when the database is first
populated or loaded with the initial data.
 From then on, every time an update operation is applied to the database,
we get another database state. At any point in time, the database has a
current state
 The DBMS is partly responsible for ensuring that every state of the
database is a valid state, i.e. a state that satisfies the structure and
constraints specified in the schema.

6
DBMS
 What is DBMS?
 DBMS is a software package used for providing efficient, convenient
and safe multi-user storage of and access to massive amounts of
persistent data.
 A DBMS also provides a systematic method for creating, updating,
storing, retrieving data in a database.
 DBMS also provides the service of controlling data access, enforcing
data integrity, managing concurrency control, and recovery.
 A full-scale DBMS should provide at least the following services to
the user:
 Data storage, retrieval and update in the database
 A user accessible catalogue
 Transaction support service

7
DBMS … Cont’d
 Concurrency Control Services: allow simultaneous access to the
database by different users
 Recovery Services: a mechanism for recovering from failure
 Authorization Services (Security): support the access authorization
 Support for Data Communication: support data transfer
 Integrity Services: rules about data and the change that took place
on the data, correctness and consistency of stored data
 Services to promote data independency between the data and the
application
 Utility services: sets of utility service facilities like
 Importing data
 Statistical analysis support
 Index reorganization
 Garbage collection
8
DBMS Language
1. Data Definition Language (DDL)
 Language used to define each data element required by the
organization
 Commands for setting up schema of the database
 Used to set up a database, create, delete and alter table with the
facility of handling constraints
 Is used to define the internal and external schema
2. Data Manipulation Language (DML)
 Used for data manipulation
 Typical manipulations include retrieval, insertion, deletion, and
modification of the data.
 Since the required data or query by the user will be extracted using this
type of language, it is also called “Query Language”
9
DBMS Language … Cont’d
We have two types of DMLs:-
 Non-Procedural Manipulation Languages
 That allows the user to state what data is needed rather than how
it is to be retrieved.
E.g. SQL
 Procedural Data Manipulation Languages
 That allows the user to tell the system what data is needed and
exactly how to retrieve the data;

10
DBMS Language … Cont’d
How the Programmer Sees the DBMS
 Start with DDL to create tables
CREATE TABLE Students (
    Name CHAR (30),
    ID CHAR (9) PRIMARY KEY NOT NULL,
    Category CHAR (20)) . . .
 Continue with DML to populate tables:
INSERT INTO Students
VALUES ('Rahel', 'ICT 123', 'undergraduate')
3. Data dictionary
 The data dictionary contains definitions of objects in the system such
as tables and table relationships and rules defined on objects.
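As a hedged sketch, many SQL DBMSs expose the data dictionary through the standard INFORMATION_SCHEMA catalog views (names and coverage vary by product):

SELECT table_name, column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'Students';   -- lists the definition of each column of the Students table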

11
Database Development Life Cycle
 The major steps in database design are;
1. Planning: identifying the information gap in an organization and
proposing a database solution to solve the problem.
2. Analysis: concentrates on fact finding about the problem or the
opportunity.
 Feasibility analysis, requirement determination and structuring, and
selection of the best design method are also performed at this phase.
3. Design: in database designing more emphasis is given to this phase.
The phase is further divided into three sub-phases.
1. Conceptual Design: concise description of the data, data
type, relationship between data and constraints on the data.
 Used to elicit and structure all information requirements

12
Database Development Life Cycle … Cont’d
2. Logical Design: maps the conceptual design onto a selected, specific
data model to implement the data structure.
 It is independent of any particular DBMS and of other physical
considerations.
3. Physical Design: physical implementation of the upper-level design of
the database with respect to internal storage and file structures,
producing all technology and organizational specifications.
4. Implementation: coding, testing and deployment of the designed
database for use.
5. Operation and Support: administering and maintaining the operation
of the database system and providing support to users.

13
DBMS Architecture and Data Independence
 A major aim of a database system is to provide users with an abstract
view of data, hiding certain details of how data is stored and
manipulated.
 Since a database is a shared resource, each user may require a different
view of the data held in the database. Accordingly there are several
types of architectures of database systems.
 The American National Standards Institute/Standards Planning and
Requirements Committee (ANSI-SPARC) also introduced the three
level architecture of the database based on their degree of abstraction.
 The architecture consists of three levels: the internal level, the
conceptual level and the external level.
 In this architecture, schemas can be defined at three levels
 The goal of the three-schema architecture is to separate the user
applications and the physical database.

14
DBMS Architecture … Cont’d
1. The Internal level:
 Has an internal schema, which describes the physical storage structure of
the database.
 The internal schema uses a physical data model and describes the
complete details of data storage and access paths for the database.
 It describes the physical representation of the database on the computer.
 This level describes how the data is stored in the database.
 The way the DBMS and OS perceive the data
 The internal level is concerned with such things as:
 Storage space allocation for data
 Record description for storage
 Record placement

15
DBMS Architecture ... Cont’d
2. The conceptual level
 Has a conceptual schema, which describes the structure of the whole
database for a community of users.
 The conceptual schema hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user
operations, constraints, security and integrity information.
 A high-level data model or an implementation data model can be used at
this level.
 The community view of the database.
 This level describes what data is stored in the database and the
relationships among the data.
 It is a complete view of the data requirements of the organization.

16
DBMS Architecture … Cont’d
3. The External Level
 Includes a number of external schemas or user views.
 Each external schema describes the part of the database that a particular
user group is interested in and hides the rest of the database from that
user group.
 The users’ view of the database.
 The way users perceive the data
 Describe part of the database that is relevant to each user.
 Each user views the real world in a different way. For example,
dates may be viewed as (day, month, year) or (year, month, day)
 Entities, attributes or relationships that are not of interest to the users
may still be represented in the database, but the users will be unaware of
them.

17
DBMS Architecture … Cont’d
ANSI-SPARC Architecture and Database Design Phases

18
DBMS Architecture … Cont’d
External level (two user views):
  View 1: SNo, FName, LName, Age, Sal
  View 2: Staff_No, LName, Br_No

Conceptual level:
  Staff_No, FName, LName, DOB, Sal, Br_No

Internal level (storage record, C-like):
  struct STAFF {
      int    staff_no;
      char   fname[15];
      char   lname[15];
      struct date date_of_birth;
      float  sal;
      struct STAFF *next;
  };
19
DBMS Architecture … Cont’d
Example of data abstraction

A view when posting the grades to all students:

20
Data Independence
 The three-schema architecture can be used to explain the concept of data
independence.
 Data independence is defined as the capacity to change the schema at
one level of a database system without having to change the schema at
the next higher level.
 We can define two types of data independence:
 Physical data independence
 Logical data independence

21
Data Independence … Cont’d
1. Logical data independence
 Is the capacity to change the conceptual schema without having to
change external schemas or application programs.
 We may change the conceptual schema to expand the database (by
adding a record type or data item), or to reduce the database (by
removing a record type or data item).
 Only the view definition and the mappings need be changed in a
DBMS that supports logical data independence.
 Modifications at the logical level are necessary whenever the logical
structure of the database is altered (for example, when money
market accounts are added to a banking system).

22
Data Independence … Cont’d
2. Physical data independence
 Is the capacity to change the internal schema without having to change
the conceptual (or external) schemas.
 Changes to the internal schema may be needed because some physical
files had to be reorganized.
 Modifications at the physical level are occasionally necessary to
improve performance.
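A minimal SQL sketch of both kinds of data independence, reusing the Staff relation from the three-level example (the Email column and the exact ALTER TABLE / CREATE INDEX syntax are assumptions and vary slightly between products):

-- Logical data independence: the conceptual schema is expanded,
-- but this external view and the programs that use it keep working.
CREATE VIEW StaffView AS
    SELECT Staff_No, LName, Sal
    FROM Staff;

ALTER TABLE Staff ADD COLUMN Email VARCHAR(50);

-- Physical data independence: a storage-level change only;
-- no conceptual or external schema, and no query, has to change.
CREATE INDEX idx_staff_lname ON Staff (LName);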

23
Chapter 3

Database Models

1
Outlines
 Introduction to Modeling

 Network, hierarchical and Object Model

 Relational Model

2
Database Models
 A specific DBMS has its own specific Data Definition Language,
but this type of language is too low level to describe the data
requirements of an organization in a way that is readily
understandable by a variety of users.
 We need a higher-level language.
 Such a higher-level language is called a data model.
 A data model is a collection of tools or concepts for describing
data, the meaning of data, data relationships, and data
constraints.
 A model is a representation of real world objects and events and
their associations.
 The main purpose of a data model is to represent data in an
understandable way.

3
Database Models … Cont’d
 A database model is a type of data model that determines the
logical structure of a database and fundamentally determines in
which manner data can be stored, organized, and manipulated.

 Data models can be divided into four types:


1) Hierarchical Model
2) Network Model
3) Relational Model
4) Object oriented Model

4
Hierarchical Model
 Consists of an ordered set of trees in a parent child mode.
 A parent node can have more than one child node and a child node
should have only one parent
 Connection between child and its parent is called a Link.
 The simplest data model
 Record type is referred to as node or segment
 The top node is the root node
 The relationship between parent and child can be either 1-1 or 1-M
 To add a new record type or relationship, the database must be
redefined and then stored in a new form.

5
Hierarchical Model… Cont’d

6
Hierarchical Model… Cont’d
Advantage of Hierarchical Model
 Good for tree type problem (e.g. Family Tree Problem)
 Language is simple; uses constructs like GET, GET UNIQUE, GET
NEXT, GET NEXT WITHIN PARENT etc.

Disadvantages of Hierarchical Model


 Addition, deletion, and search operations are very difficult.
 There is duplication of data.
 Navigational and procedural nature of processing
 Database is visualized as a linear arrangement of records
 Little scope for "query optimization"

7
Network Model
 Allows record types to have more than one parent, unlike the hierarchical
model
 A network data model sees records as set members
 Each set has an owner and one or more members
 Does not allow many-to-many relationships between entities
 Like the hierarchical model, the network model is a collection of physically
linked records.
 Allows member records to have more than one owner

8
Network Model … Cont’d

9
Network Model … Cont’d
Advantage of Network Data Model:
 Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
 Can handle most situations for modeling using record types and
relationship types.
 Language is navigational; uses constructs like FIND, FIND member,
FIND owner, FIND NEXT within set, GET etc.

Disadvantages of Network Data Model:


 Navigational and procedural nature of processing
 Database contains a complex array of pointers that thread through a
set of records.
 Little scope for automated "query optimization”

10
Object Oriented Model
 The OO approach of defining objects that can be used in many programs
is now also being applied to database systems.

 An object can have properties (or attributes) but also behaviour, which
is modelled in methods (functions) in the object.

 In an OO database, each type of object in the database’s mini-world is
modelled by a class (e.g. Customer class, Account class), like tables in
the relational model.

 A class has properties (attributes).

 A class also has methods that are stored with the class definition.

11
Object Oriented Model … Cont’d
 One advantage of the OO model is sub-classes. As there are different
types of account, they can be modelled as sub-classes of the Account
class.

 SavingsAccount and CurrentAccount.

 This makes sense because the different account types have some
different behaviour (e.g. gaining interest in a savings account) but
some behaviour the same (e.g. lodging or withdrawing cash). This is
the inheritance concept of OO programming.

12
Object Oriented Model … Cont’d

Diagram – class name at the top, properties in the middle, methods at the bottom.

13
Relational Model
 Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A
Relational Model of Data for Large Shared Data Banks')
 Terminologies originate from the branch of mathematics called set
theory and relations
 Can define more flexible and complex relationship
 Viewed as a collection of tables called “Relations” equivalent to
collection of record types
 Relation: Two dimensional table
 Stores information or data in the form of tables ( rows and columns)
 A row of the table is called a tuple, which is equivalent to a record
 A column of a table is called an attribute, which is equivalent to a field
 A data value is the value of an attribute

14
Relational Model … Cont’d
 Records are related by the data stored jointly in the fields of records in
two tables or files. The related tables contain information that creates
the relation
 The tables seem to be independent but are related somehow.
 No physical consideration of the storage is required by the user
 Many tables are merged together to come up with a new virtual view
of the relationship

15
Relational Model … Cont’d
The relational data model (also called the second-generation data
model) describes entities and their relationships in the form of tables.

Entity: Student                     Entity: Course

Id    Name       CourseNo           CourseNo   Course_title         Credit_hours
123   Abebe K.   INST 321           INST 321   Database Systems     3
234   Almaz M.   INST 205           INST 205   Introduction to ICT  3
456   Abebe W.   INST 321

16
Relational Data Model
Properties of Relational Databases
 Each row of a table is uniquely identified by a PRIMARY KEY
composed of one or more columns
 Each tuple in a relation must be unique
 A group of columns that uniquely identifies a row in a table is called a
CANDIDATE KEY
 ENTITY INTEGRITY RULE of the model states that no component
of the primary key may contain a NULL value.
 A column or combination of columns that matches the primary key of
another table is called a FOREIGN KEY.
 FOREIGN KEY is Used to cross-reference tables.

17
Properties of Relational Databases … Cont’d
 The REFERENTIAL INTEGRITY RULE of the model states that,
for every foreign key value in a table there must be a corresponding
primary key value in another table in the database or it should be
NULL.

 All tables are LOGICAL ENTITIES
 A table is either a BASE TABLE (named relation) or a VIEW (unnamed relation)
 Only base tables are physically stored
 VIEWS are derived from BASE TABLES with SQL instructions like:
[SELECT .. FROM .. WHERE .. ORDER BY]
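A hedged DDL sketch of these rules, using illustrative Course and Enrollment tables (names and types are assumptions, not taken from the slides):

CREATE TABLE Course (
    CourseNo CHAR(8) PRIMARY KEY,   -- entity integrity: no part of the primary key may be NULL
    Title    VARCHAR(40)
);

CREATE TABLE Enrollment (
    StudentId CHAR(9),
    CourseNo  CHAR(8),
    Grade     CHAR(2),
    PRIMARY KEY (StudentId, CourseNo),                    -- a primary key composed of more than one column
    FOREIGN KEY (CourseNo) REFERENCES Course (CourseNo)   -- referential integrity: must match an existing Course row
);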
18
Properties of Relational Databases … Cont’d
 Database is the collection of tables
 Each entity in one table
 Attributes are fields (columns) in table
 Order of rows and columns is immaterial
 Entries with repeating groups are said to be un-normalized
 Entries are single-valued
 Each column (field or attribute) has a distinct name
 All values in a column represent the same attribute and have the same
data format

19
Building Blocks of the Relational Data Model

The building blocks of the relational data model are:

 Entities: real world physical or logical object

 Attributes: properties used to describe each Entity or real world object.

 Relationship: the association between Entities

 Constraints: rules that should be obeyed while manipulating the data.

20
ENTITIES
 The ENTITIES (persons, places, things etc.) which the organization
has to deal with.

 The name given to an entity should always be a singular noun
descriptive of each item to be stored in it, e.g. student, not students.

 Every relation has a schema, which describes the columns, or fields

 The relation itself corresponds to our familiar notion of a table.

 A relation is a collection of tuples, each of which contains values for a
fixed number of attributes.
21
ATTRIBUTES
 The ATTRIBUTES - the items of information which characterize
and describe these entities.

 Attributes are pieces of information about entities.

 The analysis must of course identify those which are actually
relevant to the proposed application.

 Attributes will give rise to recorded items of data in the database

22
ATTRIBUTES … Cont’d
At this level we need to know such things as:
 Attribute name (be explanatory words or phrases)
 Domain: is a set of values from which attribute values may be taken.
 Each attribute has values taken from a domain.
 For example, the domain of Name is string and that for salary is real
 Whether the attribute is part of the entity identifier (attributes which just
describe an entity and those which help to identify it uniquely)
 Whether it is permanent or time-varying (which attributes may change their
values over time)
 Whether it is required or optional for the entity (whose values will sometimes
be unknown or irrelevant)

23
Types of Attributes
1. Simple (Atomic) Vs Composite Attribute
 Simple : contains a single value (not divided into sub parts)
 E.g. Age, gender
 Composite: Divided into sub parts (composed of other attributes)
 E.g. Name, address
2. Single-valued Vs multi-valued Attributes
 Single-valued : have only single value(the value may change but
has only one value at one time)
 E.g. Name, Sex, Id. No. color_of_eyes
 Multi-Valued: have more than one value
 E.g. Address, dependent-name
 Person may have several college degrees

24
Types of Attributes … Cont’d
3. Stored Vs. Derived Attribute
 Stored : not possible to derive or compute E.g. Name, Address
 Derived: The value may be derived (computed) from the values of
other attributes.
 E.g. Age (current year – year of birth), Length of employment (current date-
start date), Profit (earning-cost) , G.P.A (grade point/credit hours)
4. Null Values
 NULL applies to attributes which are not applicable or which do not
have values.
 You may enter the value NA (meaning not applicable)
 Value of a key attribute can not be null.

Default value - assumed value if no explicit value

25
RELATIONSHIPS
 Related entities require setting of LINKS from one part of the database
to another.
 A relationship should be named by a word or phrase which explains its
function
 Role names are different from the names of entities forming the
relationship: one entity may take on many roles, the same role may be
played by different entities.
 An important point about a relationship is how many entities
participate in it.
 The number of entities participating in a relationship is called the
DEGREE of the relationship.
 UNARY/RECURSIVE RELATIONSHIP: Single entity
 BINARY RELATIONSHIPS: Two entities associated
 TERNARY RELATIONSHIP: Three entities associated
 N-NARY RELATIONSHIP: arbitrary number of entity sets
26
RELATIONSHIPS … Cont’d
 Another important point about relationship is the range of instances
that can be associated with a single instance from one entity in a single
relationship.
 The number of instances participating or associated with a single
instance from another entity in a relationship is called the
CARDINALITY of the relationship.
 ONE-TO-ONE, e.g. Building - Location,
 ONE-TO-MANY, e.g. hospital - patient,
 MANY-TO-ONE, e.g. Employee - Department
 MANY-TO-MANY, e.g. Author - Book.

27
RELATIONSHIPS … Cont’d

28
Relational Constraints/Integrity Rules
Relational Integrity
 Domain Integrity: No value of the attribute should be beyond the
allowable limits
 Entity Integrity: In a base relation, no attribute of a primary key can
be null
 Referential Integrity: If a foreign key exists in a relation, either the
foreign key value must match a candidate key value in its home relation or
the foreign key value must be null (foreign-key-to-primary-key match-ups)
 Enterprise Integrity: Additional rules specified by the users or
database administrators of a database are incorporated

29
Relational Constraints/Integrity Rules .. Cont’d
Key constraints
 Tuples need to be unique in the database, so we need to make each
tuple distinct. To do this we need relational keys.
 Super Key: an attribute or set of attributes that uniquely identifies a
tuple within a relation.
 Candidate Key: a super key such that no proper subset of it is itself a
super key within the relation.
 A candidate key has two properties:
1. Uniqueness
2. Irreducibility
 If a candidate key consists of more than one attribute it is called
composite key.
30
Relational Constraints/Integrity Rules .. Cont’d
 Primary Key: the candidate key that is selected to identify tuples
uniquely within the relation.
 The entire set of attributes in a relation can be considered as a
primary key in the worst case.
 Foreign Key: an attribute, or set of attributes, within one relation
that matches the candidate key of some relation.
 A foreign key is a link between different relations, used to create a
view or an unnamed relation.
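A hedged SQL sketch of how these kinds of keys can be declared (the Employee attributes used here are assumptions, purely for illustration):

CREATE TABLE Employee (
    EmpId      CHAR(9),
    NationalId CHAR(12),
    FName      VARCHAR(20),
    LName      VARCHAR(20),
    DoB        DATE,
    PRIMARY KEY (EmpId),          -- the candidate key chosen as the primary key
    UNIQUE (NationalId),          -- another candidate key
    UNIQUE (FName, LName, DoB)    -- a composite candidate key (illustration only)
);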

31
Relational languages and views
 The languages in relational DBMS are DDL and DML.
 There are two kinds of relation in a relational database; they differ in
how the relation is created, used and updated:
1. Base Relation
 A Named Relation corresponding to an entity in the conceptual
schema, whose tuples are physically stored in the database.
2. View
 Is the dynamic result of one or more relational operations operating on
the base relations to produce another virtual relation.
 So a view is a virtually derived relation that does not necessarily
exist in the database but can be produced upon request by a particular
user at the time of request.

32
Relational languages and views … Cont’d
Purpose of a view

 Hides unnecessary information from users

 Provide powerful flexibility and security

 Provide customized view of the database for users

 A view over one base relation can be updated.

 Updates on views derived from several relations are not allowed.

 Updates on views with aggregation and summary are not allowed.
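A hedged sketch of a view used to hide information and give a customized picture of the database (relation and column names are assumptions):

-- Hide Sal and other sensitive columns from general users.
CREATE VIEW StaffPublic AS
    SELECT Staff_No, FName, LName, Br_No
    FROM Staff;

-- Users query the view exactly as if it were a base table.
SELECT FName, LName
FROM StaffPublic
WHERE Br_No = 'B003';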

33
Chapter 4

Data Modelling using Entity Relationship Model

1
Outlines
 Using High level Data Models for Database Design
 Entity types and Sets, Attributes and Keys
 Relationships, Roles and Structural Constraints
 Weak Entity Types
 Database Abstraction
 E/R Diagram naming conventions, and Design issues

2
Database Design
Database design consists of several tasks:
 Requirements Analysis,
 Conceptual Design, and Schema Refinement,
 Logical Design,
 Physical Design and Tuning
 In general, one has to go back and forth between these tasks to refine
a database design, and decisions in one task can influence the choices
in another task.
 In developing a good design, one should ask:
 What are the important queries and updates?
 What attributes/relations are involved?

3
The Three levels of Database Design
Conceptual Design
• Constructing a model independent of any physical considerations.
• After the completion of conceptual design one has to go for refinement of the
  schema, which is verification of entities, attributes, and relationships.

Logical Design
• Constructing a model of the information used in an enterprise based on a
  specific data model (e.g. relational, hierarchical, network or object), but
  independent of a particular DBMS and other physical considerations.

Physical Design
• Producing a description of the implementation of the database on secondary storage.
• Describes the storage structures and access methods used to achieve efficient
  access to the data.
• Tailored to a specific DBMS system -- characteristics are a function of the DBMS
  and operating system.
• Includes an estimate of storage space.
4
Conceptual Database Design
 Conceptual design revolves around discovering and analyzing
organizational and user data requirements
 The important activities are to identify
 Entities
 Attributes
 Relationships
 Constraints
 And based on these components develop the ER model using
 ER diagrams

5
The Entity Relationship (E-R) Model
 Entity-Relationship modeling is used to represent conceptual view of
the database
 The main components of ER Modeling are:
 Entities
 Corresponds to entire table, not row
 Represented by Rectangle
 Attributes
 Represents the property used to describe an entity or a relationship
 Represented by Oval
 Relationships
 Represents the association that exist between entities
 Represented by Diamond
 Constraints
 Represent the constraint in the data
6
The Entity Relationship (E-R) Model … Cont’d
Before working on the conceptual design of the database, one has to
know and answer the following basic questions.
 What are the entities and relationships in the enterprise?
 What information about these entities and relationships should we
store in the database?
 What are the integrity constraints that hold?
 Constraints on the data with respect to update, retrieval and storage.
 Represent this information pictorially in ER diagrams, then map ER
diagram into a relational schema

7
Developing an E-R Diagram
 Designing a conceptual model for the database is not a linear
process but an iterative activity where the design is refined again
and again.
 To identify the entities, attributes, relationships, and constraints on
the data, there are different set of methods used during the analysis
phase.
 These include information gathered by…
 Interviewing end users individually and in a group

 Questionnaire survey

 Direct observation

 Examining different documents

8
Developing an E-R Diagram … Cont’d
 The basic E-R model is graphically depicted and presented for review.
 The process is repeated until the end users and designers agree that the
E-R diagram is a fair representation of the organization’s activities and
functions.
 Checking for Redundant Relationships in the ER Diagram.
 Relationships between entities indicate access from one entity to
another.
 It is therefore possible to access one entity occurrence from another
entity occurrence even if there are other entities and relationships that
separate them.
 This is often referred to as 'navigation' of the ER diagram
 The last phase in ER modeling is validating an ER Model against
requirement of the user.
9
Graphical Representations in ER Diagramming
 Entity is represented by a RECTANGLE containing the name of the
entity.

 Connected entities are called relationship participants


 Attributes are represented by OVALS and are connected to the
entity by a line.

10
Graphical Representations in ER Diagramming .. Cont’d
 A derived attribute is indicated by a DOTTED LINE.
(……..)

 PRIMARY KEYS are underlined.

 Relationships are represented by DIAMOND shaped symbols

11
Graphical Representations in ER Diagramming .. Cont’d
Example : Build an ER Diagram for the following information:
 Students
 Have an Id, Name, Dept, Age, Gpa
 Courses
 Have an Id, Name, Credit Hours
 Students enroll in courses and receive a grade

12
Graphical Representations in ER Diagramming .. Cont’d

13
Entity versus Attributes
 Consider designing a database of employees for an organization:
 Should address be an attribute of Employees or an entity (connected
to Employees by a relationship)?
 If we have several addresses per employee, address must be an entity
(attributes cannot be set-valued/multi valued)
 If the structure (city, Woreda, Kebele, etc) is important.
E.g. want to retrieve employees in a given city, address must be
modeled as an entity (attribute values are atomic)
 Cardinality on Relationship expresses the number of entity
occurrences/tuples associated with one occurrence/tuple of related
entity.

14
Entity versus Attributes … Cont’d
 Existence Dependency: the dependence of an entity on the existence
of one or more entities.
 Weak entity: an entity that can not exist without the entity with
which it has a relationship
 Participating entity in a relationship is either optional or
mandatory.

15
Structural Constraints on Relationship
1. Constraints on Relationship / Multiplicity/ Cardinality Constraints
 A multiplicity constraint is the number or range of possible occurrences
of an entity type/relation that may relate to a single occurrence/tuple of
another entity type/relation through a particular relationship.
 Mostly used to ensure appropriate enterprise constraints.

One-to-one relationship:
 A customer is associated with at most one loan via the relationship
borrower
 A loan is associated with at most one customer via borrower

16
Structural Constraints on Relationship .. Cont’d

17
Structural Constraints on Relationship .. Cont’d

18
Structural Constraints on Relationship .. Cont’d
One-to-Many Relationships
 In the one-to-many relationship a loan is associated with at most one
customer via borrower, a customer is associated with several
(including 0) loans via borrower

19
Structural Constraints on Relationship .. Cont’d

20
Structural Constraints on Relationship .. Cont’d

• Many-To-Many Relationship

21
Structural Constraints on Relationship .. Cont’d

22
Participation of an Entity Set in a Relationship Set
Total participation: every entity in the entity set participates in at least
one relationship in the relationship set.
 The entity with total participation will be connected with the
relationship using a double line.
E.g. : Participation of EMPLOYEE in “belongs to” relationship with
DEPARTMENT is total since every employee should belong to a
department.

23
Participation of an Entity Set in a Relationship Set .. Cont’d
E.g.: Participation of EMPLOYEE in “manages” relationship with
DEPARTMENT, DEPARTMENT will have total participation but not
EMPLOYEE

Partial participation: some entities may not participate in any


relationship in the relationship set
E.g.: Participation of EMPLOYEE in “manages” relationship with
DEPARTMENT, EMPLOYEE will have partial participation since not all
employees are managers.

24
Problem in ER Modeling
 The Entity-Relationship Model is a conceptual data model that views
the real world as consisting of entities and relationships.
 The model visually represents these concepts by the Entity-
Relationship diagram.
 While designing the ER model one could face a problem in the
design called a connection trap.
 Connection traps are problems arising from misinterpreting certain
relationships
 There are two types of connection traps;
1. Fan trap
2. Chasm trap

25
Fan trap
 Occurs where a model represents a relationship between entity
types, but the pathway between certain entity occurrences is
ambiguous.

 May exist where two or more one-to-many (1:M) relationships fan


out from an entity.

 The problem can be avoided by restructuring the model so that there
are no 1:M relationships fanning out from a single entity and all the
semantics of the relationship are preserved.

26
Fan trap … Cont’d

Problem: Which car (Car1, Car3 or Car5) is used by Employee 6 (Emp6),
who works in Branch 1 (Bra1)? From this ER model one cannot tell which car
is used by which staff member, since a branch can have more than one car and a branch
is also populated by more than one employee. Thus we need to restructure the model to
avoid the connection trap.
27
Fan trap … Cont’d
 To avoid the Fan Trap problem we can go for restructuring of the E-R
Model. This will result in the following E-R Model.

28
Chasm Trap
 Occurs where a model suggests the existence of a relationship
between entity types, but the path way does not exist between certain
entity occurrences.
 May exist when there are one or more relationships with a minimum
multiplicity (cardinality) of zero forming part of the pathway
between related entities.
Example:

If we have a set of projects that are not currently active, then we cannot
assign a project manager to these projects. So there are projects with no
project manager, making the minimum participation value zero.

29
Chasm Trap … Cont’d
Problem:
 How can we identify which BRANCH is responsible for which
PROJECT? We know that whether the PROJECT is active or not
there is a responsible BRANCH. But which branch is a question to
be answered, and since we have a minimum participation of zero
between EMPLOYEE and PROJECT we cannot identify the BRANCH
responsible for each PROJECT.
 The solution for this Chasm Trap problem is to add another relationship
between the extreme entities (BRANCH and PROJECT)

30
Chapter 5

Relational Algebra

1
Outlines
 Relational Algebra
 Relational Calculus

2
Relational Algebra
 Relational algebra is a theoretical language with operations that work
on one or more relations to define another relation without changing
the original relation
 The basic set of operations for the relational model is known as the
relational algebra.
 These operations enable a user to specify basic retrieval requests.
 The result of the retrieval is a new relation, which may have been
formed from one or more relations.
 A sequence of relational algebra operations forms a relational algebra
expression, whose result will also be a relation that represents the
result of a database query (or retrieval request).
3
Relational Algebra … Cont’d
 The output from one operation can become the input to another
operation (nesting is possible)
 There are different basic operations that could be applied on relations
on a database based on the requirement.
 Selection ( σ ): Selects a subset of rows from a relation.
 Projection ( π ): Deletes unwanted columns from a relation.
 Renaming ( ρ ): assigns a name to the intermediate relation produced by an operation
 Cross-Product ( x ): Allows us to combine two relations.
 Set-Difference ( - ): Tuples in relation1, but not in relation2.
 Union (∪ ): Tuples in relation1 or in relation2.
 Intersection (∩): Tuples in relation1 and in relation2
 Join :Tuples joined from two relations based on a condition
4
Relational Algebra … Cont’d
 Table 1: Sample table used to illustrate different kinds of relational
operations. The relation contains information about employees, the IT skills
they have and the school where they attended each skill.

5
Selection
 Selects subset of tuples/rows in a relation that satisfy selection
condition.
 Selection operation is a unary operator (it is applied to a single
relation)
 The Selection operation is applied to each tuple individually
 The degree of the resulting relation is the same as the original relation
but the cardinality (no. of tuples) is less than or equal to the original
relation.
 The Selection operator is commutative.
 Set of conditions can be combined using Boolean operations (∧(AND),
∨(OR), and ~(NOT))
 No duplicates in result.

6
Selection … Cont’d
 Result relation can be the input for another relational algebra
operation (Operator composition).
 It is a filter that keeps only those tuples that satisfy a qualifying
condition i.e. those satisfying the condition are selected while others
are discarded.
Notation:
σ <Selection Condition> (<Relation Name>)
Example: Find all Employees with skill type of Database.
σ <SkillType = "Database"> (Employee)

 This query will extract every tuple from a relation called Employee
with all the attributes where the SkillType attribute with a value of
“Database”.
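For comparison, a hedged SQL equivalent of this selection, assuming the Employee relation of Table 1:

SELECT *
FROM Employee
WHERE SkillType = 'Database';   -- keeps only the tuples that satisfy the condition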
7
Selection … Cont’d
The resulting relation will be the following.

Example: Find all Employees with SkillType "Database" and School "Unity" using
relational algebra.
σ <SkillType = "Database" AND School = "Unity"> (Employee)

8
Projection
 Selects certain attributes while discarding the other from the base
relation.
 The PROJECT creates a vertical partitioning.
 Deletes attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list, with
the same names that they had in the (only) input relation.
 Projection operator has to eliminate duplicates.
 Note: real systems typically don’t do duplicate elimination unless
the user explicitly asks for it.
 If the Primary Key is in the projection list, then duplication will not
occur
 Duplication removal is necessary to insure that the resulting table is
also a relation.
9
Projection … Cont’d
Notation:
π <Selected Attributes> (<Relation Name>)
Example: To display the Name, Skill, and Skill Level of an employee, the
query and the resulting relation will be:
π <FName, LName, Skill, Skill_Level> (Employee)
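A hedged SQL equivalent; DISTINCT mirrors the duplicate elimination that the algebra projection performs (real systems skip it unless asked, as noted above):

SELECT DISTINCT FName, LName, Skill, Skill_Level
FROM Employee;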

10
Projection … Cont’d
Exercise: Write a relational algebra operation that displays the Name, Skill and
Skill Level of employees with Skill "SQL" and SkillLevel greater than
5.

11
Rename Operation
 Allows us to name, and therefore to refer to, the results of relational
algebra expressions.
 Allows us to refer to a relation by more than one name.
Example: ρ x (E) returns the expression E under the name x
 We may want to apply several relational algebra operations one after
the other. The query could be written in two different forms:
1. Write the operations as a single relational algebra
expression by nesting the operations.
2. Apply one operation at a time and create intermediate result
relations.
 In the latter case, we must give names to the relations that hold the
intermediate results.

12
Rename Operation … Cont’d
 If we want to have the Name, Skill, and Skill Level of an employee
with salary greater than 1500 and working for department 5, we
can write the expression for this query using the two alternatives:
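A sketch of the two alternatives in the document's σ/π notation; the Salary and DeptNo attribute names are assumptions, since they do not appear in the Employee relation used in the earlier examples:

Alternative 1 (a single nested expression):
π <FName, LName, Skill, Skill_Level> (σ <Salary > 1500 AND DeptNo = 5> (Employee))

Alternative 2 (one operation at a time, naming the intermediate result):
Dept5Emps ← σ <Salary > 1500 AND DeptNo = 5> (Employee)
Result ← π <FName, LName, Skill, Skill_Level> (Dept5Emps)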

• The relation Result will then be equivalent to the relation we get using the
first alternative.
13
UNION Operation
 The result of this operation, denoted by R U S, is a relation that includes
all tuples that are either in R or in S or in both R and S.
 Duplicate tuples are eliminated.
 The two operands must be “type compatible”.
Type Compatibility
 The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must
have the same number of attributes, and the domains of corresponding
attributes must be compatible; that is, Dom(Ai)=Dom(Bi) for i=1, 2,.. , n.
 The resulting relation for;
 R1 ∪ R2,
 R1 ∩ R2, or
 R1 - R2 has the same attribute names as the first operand relation R1
(by convention).
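A hedged SQL sketch of the three set operations on two type-compatible relations R1 and R2 (standard keywords; some products write MINUS instead of EXCEPT and may not support all three):

SELECT * FROM R1 UNION     SELECT * FROM R2;   -- R1 ∪ R2, duplicates eliminated
SELECT * FROM R1 INTERSECT SELECT * FROM R2;   -- R1 ∩ R2
SELECT * FROM R1 EXCEPT    SELECT * FROM R2;   -- R1 - R2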
14
UNION Operation … Cont’d
Example

15
INTERSECTION Operation
 The result of this operation, denoted by R ∩ S, is a relation that
includes all tuples that are in both R and S.
 The two operands must be "type compatible"

16
Set Difference (or MINUS) Operation
 The result of this operation, denoted by R - S, is a relation that
includes all tuples that are in R but not in S.
 The two operands must be "type compatible”.
Some Properties of the Set Operators
 Notice that both union and intersection are commutative
operations; that is
 R ∪ S = S ∪ R, and R ∩ S = S ∩ R
 Both union and intersection can be treated as n-nary operations
applicable to any number of relations as both are associative
operations; that is
 R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)
 The minus operation is not commutative; that is, in general
 R - S ≠ S - R

17
Set Difference (or MINUS) Operation … Cont’d

18
CARTESIAN (cross product) Operation
 This operation is used to combine tuples from two relations in a
combinatorial fashion.
 That means, every tuple in Relation1 (R) one will be related with
every other tuple in Relation2 (S).
 In general, the result of R(A1, A2, . . ., An) x S(B1,B2, . . ., Bm) is a
relation Q with degree n + m attributes Q(A1, A2, . . ., An, B1, B2, . .
., Bm), in that order.
 Where R has n attributes and S has m attributes.
 The resulting relation Q has one tuple for each combination of tuples
i.e. one from R and one from S.
 Hence, if R has n tuples, and S has m tuples, then | R x S | will have
n* m tuples.
 The two operands do NOT have to be "type compatible”
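A hedged SQL sketch, assuming an illustrative Department relation alongside Employee (CROSS JOIN is the SQL form of the Cartesian product):

SELECT *
FROM Employee CROSS JOIN Department;   -- every Employee tuple paired with every Department tuple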
19
CARTESIAN (cross product) Operation … Cont’d
Example

20
CARTESIAN (cross product) Operation … Cont’d

Exercise: Extract employee information about managers of the


departments.

21
JOIN Operation
 The sequence of Cartesian product followed by select is used quite
commonly to identify and select related tuples from two relations.
 This special operation is called JOIN.
 The JOIN operation is denoted by the ⋈ symbol.
 This operation is very important for any relational database with more
than a single relation, because it allows us to process relationships
among relations.
 The general form of a join operation on two relations:
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:

Where R and S can be any relations that result from general relational
algebra expressions
22
JOIN Operation … Cont’d
 Since JOIN operates on two relations, it is a binary operation.
 The type of JOIN called THETA JOIN (θ-JOIN) uses θ as the comparison operator in the join condition.
θ could be one of { <, ≤, >, ≥, ≠, = }
 In Theta join tuples whose join attributes are null do not appear in the
result.

 Example: Thus if, as in the above exercise, we want to extract employee information about the managers of the departments, the algebra query using the JOIN operation will be:
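A sketch of such a query (the attribute names MgrID in DEPARTMENT and EmpID in EMPLOYEE are assumed here, with MgrID holding the EmpID of the department's manager):
   DEPT_MGR ← DEPARTMENT ⋈ MgrID = EmpID EMPLOYEE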

23
JOIN Operation … Cont’d

24
JOIN Operation … Cont’d

25
EQUIJOIN Operation
 The most common use of join involves join conditions with equality
comparisons only ( = ).
 Such a join, where the only comparison operator used is =, is called an EQUIJOIN.

26
NATURAL JOIN Operation
 The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have the
same name in both relations.
 If this is not the case, a renaming operation on the attributes is applied
first. The result of the natural join is the set of all combinations of
tuples in R and S that are equal on their common attribute names.

27
OUTER JOIN Operation
 OUTER JOIN is another version of the JOIN operation in which non-matching tuples from the first relation are also included in the resulting relation; for such tuples, the attributes of the second relation will have a value of NULL.
 An extension of the join operation that avoids loss of information.
 Outer Join Can be:
 Left Outer Join
 Right Outer Join
 Full Outer Join

28
Left Outer Join
The result of the left outer join is
the set of all combinations of
tuples in R and S that are equal on
their common attribute names, in
addition to tuples in R that have no
matching tuples in S.

29
Right Outer Join
The result of the right outer
join is the set of all
combinations of tuples in R and
S that are equal on their
common attribute names, in
addition to tuples in S that have
no matching tuples in R.

30
Full Outer Join
The result of the full outer join is
the set of all combinations of
tuples in R and S that are equal on
their common attribute names, in
addition to tuples in S that have no
matching tuples in R and tuples in
R that have no matching tuples in
S in their common attribute names

31
SEMIJOIN Operation
 SEMI JOIN is another version of the JOIN operation where the resulting relation will contain only the attributes of Relation one, for those tuples of Relation one that are related to tuples in the second relation.

32
Relational Calculus
 A relational calculus expression creates a new relation, which is
specified in terms of variables that range over rows of the stored
database relations (in tuple calculus) or over columns of the stored
relations (in domain calculus).
 In a calculus expression, there is no order of operations to specify how
to retrieve the query result.
 A calculus expression specifies only what information the result should
contain rather than how to retrieve it.
 In Relational calculus, there is no description of how to evaluate a
query, this is the main distinguishing feature between relational algebra
and relational calculus.
 Relational calculus is considered to be a non procedural language.

33
Relational Calculus … Cont’d
 This differs from relational algebra, where we must write a sequence
of operations to specify a retrieval request.
 Hence relational algebra can be considered as a procedural way of
stating a query.
 When applied to relational databases, the calculus is not the calculus of derivatives and differentials but a form of first-order logic, or predicate calculus.
 A predicate is a truth-valued function with arguments.
 When we substitute values for the arguments in the predicate, the
function yields an expression, called a proposition, which can be
either true or false.

34
Relational Calculus … Cont’d
 If a predicate contains a variable, as in ‘x is a member of staff’, there
must be a range for x. When we substitute some values of this range for
x, the proposition may be true; for other values, it may be false.
 If COND is a predicate, then the set of all tuples evaluated to be true for the predicate COND will be expressed as follows:
{t | COND(t)}
Where t is a tuple variable and COND (t) is a conditional expression
involving t. The result of such a query is the set of all tuples t that
satisfy COND (t).
 If we have set of predicates to evaluate for a single query, the predicates
can be connected using ∧(AND), ∨(OR), and ~(NOT)

35
Tuple-oriented Relational Calculus
 The tuple relational calculus is based on specifying a number of tuple
variables.
 Tuple relational calculus is interested in finding tuples for which a
predicate is true for a relation.
 Based on use of tuple variables.
 Tuple variable is a variable that ‘ranges over’ a named relation: that is,
a variable whose only permitted values are tuples of the relation.
 If E is a tuple that ranges over a relation employee, then it is
represented as EMPLOYEE(E) i.e. Range of E is EMPLOYEE
 Then to extract all tuples that satisfy a certain condition, we will
represent it as all tuples E such that COND(E) is evaluated to be true.

36
Tuple-oriented Relational Calculus … Cont’d
{E | COND(E)}
 The predicates can be connected using the Boolean operators:
∧ (AND), ∨ (OR), ∼ (NOT)
 COND(t) is a formula, and is called a Well-Formed Formula (WFF), if:
The COND is composed of n-ary predicates (a formula composed of n single predicates) and the predicates are connected by any of the Boolean operators.

 Each predicate is of the form A θ B, where θ is one of the comparison operators { <, ≤, >, ≥, ≠, = }, and the predicate can be evaluated to either true or false.
 A and B are either constants or variables.
 Formulae should be unambiguous and should make sense.
37
Tuple-oriented Relational Calculus … Cont’d
Example (Tuple Relational Calculus)
 Extract all employees whose skill level is greater than or equal to 8

{E | Employee(E) ∧ E.SkillLevel >= 8}

• E.SkillLevel means the value of the SkillLevel attribute for the tuple E.

38
Tuple-oriented Relational Calculus … Cont’d
 Exercise: Find the EmpId, FName, LName, Skill and the School where the skill was attended, for employees with skill level greater than or equal to 8.

{E.EmpId, E.FName, E.LName, E.Skill, E.School | Employee(E) ∧ E.SkillLevel >= 8}

39
Quantifiers in Relation Calculus
 To tell how many instances the predicate applies to, we can use the
two quantifiers in the predicate logic.
 One relational calculus expressed using Existential Quantifier can
also be expressed using Universal Quantifier.

1. Existential quantifier ∃ (‘there exists’)


 Existential quantifier used in formulae that must be true for at least
one instance, such as:
 An employee with skill level greater than or equal to 8 will be:
{E | Employee(E) ∧ (∃E)(E.SkillLevel >= 8)}
 This means, there exist at least one tuple of the relation employee
where the value for the SkillLevel is greater than or equal to 8

40
Quantifiers in Relation Calculus … Cont’d
2. Universal quantifier ∀ (‘for all’)
 Universal quantifier is used in statements about every instance, such
as:
 An employee with skill level greater than or equal to 8 will be:
{E | Employee(E) ∧ (∀E)(E.SkillLevel >= 8)}
 This means, for all tuples of relation employee where value for the
SkillLevel attribute is greater than or equal to 8.

41
Chapter 6

Structured Query Language (SQL)


Outline
 SQL Statements
 SQL Query
 Data Manipulation Language
 Constraints and Triggers

2
What is SQL ?
 SQL stands for structured query language
 Is a non-procedural language, i.e you can specify what information
you require, rather than how to get it.
 Is an ANSI standard computer language
 Allow you to access a database
 SQL can execute queries against a database
 Is easy to learn
 Has two major components
 DDL
 DML

3
Writing SQL Commands
 SQL statement consists of :-
 Reserved words :- fixed part of SQL language and have fixed
meaning
 Must be spelt exactly
 User defined words :- are made up by the user (according to
certain syntax rule)
 They represent the names of various database objects, e.g. tables, columns, views, indexes, etc.
 ;(semicolon) :- used as statement terminator to mark the end of SQL
statement
 SQL is case insensitive
 Exception :- literal character data must be typed exactly as it appears in the database. E.g. if we store a person's name ‘ABEBE’ and search for it using the string ‘Abebe’, the row will not be found.
4
Writing SQL Commands … Cont’d
 SQL is free format
 For readability begin with a new line for clauses
 SQL identifier
 Used to identify objects in database , such as table name, view name ,columns
 Consists of letters, digits and underscores (_)
 Can be no longer than 128 characters
 Can't contain spaces
 Must start with a letter or underscore

5
SQL data types
1. Character data : Consists of a sequence of characters
Syntax: CHARACTER [length]
        CHARACTER VARYING [length]
Length :- the maximum number of characters the column can hold.
Abbreviations: CHARACTER → CHAR
               CHARACTER VARYING → VARCHAR
 A character string may be defined as having a fixed or varying length.
 If a fixed length is defined and we enter a string with fewer characters than this length, the string is padded with blanks to make up the required size.
 If a varying length is defined and we enter a string with fewer characters than this length, only those characters entered are stored.
E.g. name CHAR(20), address VARCHAR(20)
6
SQL data types … Cont’d
2. Numeric data:- The exact numeric data type is used to define
numbers with an exact representation.
Syntax: NUMERIC [precision, [scale]]
        DECIMAL [precision, [scale]]
        INTEGER
        SMALLINT
Precision :- the total number of significant decimal digits
Scale :- the number of digits after the decimal point
Abbreviations:-
   INTEGER → INT
   DECIMAL → DEC
   NUMERIC → NUM
   SMALLINTEGER → SMALLINT
7
SQL data types … Cont’d
 NUMERIC and DECIMAL store numbers in decimal notation
 INTEGER is used for large positive or negative values
 SMALLINT is used for small positive or negative values
E.g. age SMALLINT
salary DECIMAL(5,2)

 #1. For the salary column defined above, which value is valid?
A. 999.99
B. 999.888
C. 9999.09

8
SQL data types … Cont’d
3. Date time data:- Used to define points in time
▪ DATE , TIME, TIMESTAMP
 DATE:- is used to store calendar dates using YEAR, MONTH & DAY
fields.
 TIME:-is used to store time using HOUR,MINUTE & SECOND
fields
 TIMESTAMP:- used to store date and time. E.g. birthdate DATE
4. Boolean data
 Consists of the distinct truth values TRUE and FALSE; unless prohibited by a NOT NULL constraint, a Boolean column may also be NULL.
 A NULL value is interpreted as the UNKNOWN truth value.
▪ E.g. status BOOLEAN

9
Integrity enhancement features
 Integrity refers to constraints that we wish to impose in order to protect the database from becoming inconsistent.
 It includes:- Required data
domain constraint
Entity integrity
Referential integrity
1. Required data:- Some column must contain a valid value.
 They are not allowed to contain NULL values
 NULL is distinct from blank or zero
 NULL is used to represent data that is either not available, missing
or not applicable.
 Set at defining stage (CREATE, ALTER)
E.g. position VARCHAR(10) NOT NULL
10
Integrity enhancement features … Cont’d
2. Domain Constraint:- Every column has a domain, i.e. a set of legal
values. To set these constraint use
❖ CHECK: Check clause allows a constraint to be defined on a
column or the entire table.
CHECK (searchcondition)
e.g. Sex CHAR NOT NULL CHECK(sex IN(‘M’, ‘F’))
❖ DOMAIN
Syntax: Create DOMAIN domainname[AS] datatype
[DEFAULT defaultoption]
[CHECK (searchcondition)]
E.g. Create DOMAIN sextype as CHAR
DEFAULT ‘M’
CHECK(VALUE IN(‘M’,’F’));
 This creates a domain sextype.
11
Integrity enhancement features … Cont’d
 When defining a column sex, we use the domain name sextype in place of the data type CHAR
I.e. sex sextype NOT NULL
 To remove domain use DROP DOMAIN constraint
Syntax: DROP DOMAIN domainname [RESTRICT|CASCADE]
 The drop behavior, RESTRICT|CASCADE specifies the action to be
taken if the domain is currently being used.
 If RESTRICT is specified and the domain is used in an existing table or
view the drop will fail.
 If CASCADE is specified, any column that is based on the domain is automatically changed to use the domain's underlying datatype (the constraint/default of the domain is replaced by the constraint/default of the column, if appropriate)

12
Integrity enhancement features … Cont’d
3. Entity integrity:-A primary key of a table must contain a unique,
non-null value for each row
PRIMARY KEY (sid)
 To define a composite primary key
PRIMARY KEY (sid,cid)
 Or we can use UNIQUE clause
Sid VARCHAR(5) NOT NULL,
cid VARCHAR(9) NOT NULL,
UNIQUE(sid,cid)

13
Integrity enhancement features … Cont’d
4. Referential Integrity:-
Syntax: FOREIGN KEY(columnname) REFERENCES relationname
 Referential action specified using ON UPDATE and ON DELETE subclause of
Fk clause.
 When a user attempts to delete a row from a parent table, SQL supports four options regarding the action to be taken :-
 CASCADE:- delete the row from the parent table and the matching rows from the child table.
 SET NULL:- delete the row from the parent and set NULL for the matching child rows, i.e. only possible if the FK column does not have a NOT NULL qualifier specified.
 SET DEFAULT:- delete the row from the parent and set the default value for the child, i.e. only valid if the FK column has a DEFAULT value specified.
 NO ACTION:- reject the delete operation on the parent table. It is the default setting if the ON DELETE rule is omitted.
E.g.1. FOREIGN KEY(cid) REFERENCES course ON DELETE SET NULL
2. FOREIGN KEY(cid) REFERENCES course ON UPDATE CASCADE
14
Data Definition Language (DDL)

 The SQL data definition language(DDL) allow database objects such


as schemas, domains, tables, views and indexes to be created and
destroyed.
 The main SQL data definition language statements are:-
CREATE DATABASE   DROP DATABASE
CREATE TABLE      ALTER TABLE     DROP TABLE
CREATE VIEW       DROP VIEW
CREATE INDEX      DROP INDEX

15
Data Definition Language (DDL) … Cont’d
1. Creating a database
Syntax: CREATE DATABASE database-name
e.g. CREATE DATABASE KIoT
2. Creating a table
Syntax:

CREATE TABLE table_name

(column-name datatype[NOT NULL][UNIQUE],

PRIMARY KEY(listofcolumn), FOREIGN KEY(listofcolumn)

REFERENCES parenttablename(listofcandidatekeycolumn))

16
Data Definition Language (DDL) … Cont’d
1. Create table department
(did varchar(9) primary key,
 deptname varchar(12) NOT NULL UNIQUE CHECK(deptname IN('IT','maths','stat')),
 school varchar(20))

2. create table student


(sid varchar(9) primary key, fname varchar(9) NOT NULL,lname
varchar(20),did varchar(9), constraint fk_did FOREIGN KEY(did)
REFERENCES department(did))

17
Data Definition Language (DDL) … Cont’d
3. Deleting a database and dropping tables
DROP DATABASE :- is used to delete the whole database
Syntax: DROP DATABASE database-name
E.g. DROP DATABASE KIoT
DROP:- used to remove a table from a database
Syntax: DROP TABLE table-name.
E.g. DROP TABLE student;
4. Altering a table: Used to modify a table after it is created
Syntax: ALTER TABLE table-name
DROP COLUMN column-name
ADD column-name data-type
ALTER COLUMN column-name data-type
E.g. ALTER TABLE student
     ADD age int;
     ALTER TABLE student
     ALTER COLUMN sid int;
18
Data Manipulation Language (DML)
 Is used to insert, retrieve and modify database information
 The following are commonly used DML clauses:-
 INSERT:- to insert data in a table
 SELECT:- to query data in the database
 UPDATE:- to update data in a table
 DELETE:- to delete data from a table

1. INSERT: The INSERT command in SQL is used to add records to


an existing table.
e.g. INSERT INTO department
VALUES(‘d001’,’compsc’,’SMComps’)
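A column list can also be given explicitly, so that the statement does not depend on the column order of the table (a sketch using the student table created earlier; the values are illustrative):
   INSERT INTO student (sid, fname, lname, did)
   VALUES ('s001', 'Abebe', 'Kebede', 'd001');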

19
Data Manipulation Language (DML) … Cont’d
2. SELECT:- used to retrieve selected data that match the criteria that you specify.
syntax
SELECT [DISTINCT|ALL] {* | [columnexpression [AS newname]] [,...]}
FROM tablename
WHERE[condition]
GROUP BY columnlist
HAVING[condition]
ORDER BY columnlist
 The sequence of processing in a SELECT statement is:-
 FROM : specifies the table(s) to be used
 WHERE: filter rows subject to some condition
 GROUP BY: forms groups of rows with the same column value
 HAVING: filter the groups subject to some condition
 ORDER BY: specifies the order of the output

 The result of a select statement is another table. That table is called view
20
Data Manipulation Language (DML) … Cont’d
1. Retrieve all rows and columns:- list all columns in the table
 Where clause is unnecessary
e.g. SELECT sid,fname,lname,age,did
FROM student;
 Use (*) for quick way of expressing all columns
e.g. SELECT * FROM student;
2. Use of DISTINCT clause: DISTINCT is used to eliminate the
duplicates
E.g. SELECT DISTINCT propertyno FROM property
propertyno (all rows)    propertyno (DISTINCT)
PA14                     PA14
PA14                     PG4
PG4                      PG36
PG4
PA14
PG36
PG36
21
Data Manipulation Language (DML) … Cont’d
3. Calculated fields:- We can use calculated fields in SQL
E.g. SELECT sid,age/2 AS halfage
FROM student;
4. Row selection(WHERE clause):- used to retrieve all rows from a
table that satisfy a condition. Use the following basic predicates:-
 Comparison:- compare the value of one expression with the value
of another expression.
 Range:- test whether the value falls within a specified range of values.
 Set membership:- test whether the value of an expression equals
one of a set of values.
 Pattern match:- test whether a string matches a specified pattern.
 Null:- test whether a column has a null(unknown) value

22
Data Manipulation Language (DML) … Cont’d
I. Comparison search condition:-
In SQL the following simple comparison operators are available:
   =    equals
   <>   is not equal to (ISO standard)
   <    less than
   >    greater than
   <=   less than or equal to
   >=   greater than or equal to
   !=   is not equal to (allowed in some dialects)
23
Data Manipulation Language (DML) … Cont’d
 SQL also supports logical operators like:-
AND, OR, NOT
 The rules for evaluating a conditional expression are:-
 an expression is evaluated from left to right
 sub-expressions in brackets are evaluated first
 NOTs are evaluated before ANDs and ORs
 ANDs are evaluated before ORs
 Compound comparison
E.g. select * from student
where fname=‘Alemu’ OR lname=‘Kebede’;

24
Data Manipulation Language (DML) … Cont’d
II. Range search condition(BETWEEN/NOT BETWEEN):- between
includes the end points of the range.
E.g. select * from student
Where age BETWEEN 15 AND 25;
 Use a negated version (NOT BETWEEN) that check for values
outside the range.
III. Set membership search condition(IN/NOT IN):-
 IN checks/tests whether a data value matches one of a list of values.
 NOT IN is the negated version
E.g. select * from student
Where fname IN(‘Abebe’,’Kebede’);

25
Data Manipulation Language (DML) … Cont’d
IV. Pattern match search condition(LIKE/NOT LIKE)
 (%) the percent character represents any sequence of zero or more characters.
 ( _ ) the underscore character represents any single character.
 LIKE ‘H%’ :- The first character must be H, the rest can be anything
 LIKE ‘H_ _ _’ :- There must be exactly 4 characters, the first must be H
 LIKE ‘%e’ :- The last character is e
 LIKE ‘%Abebe%’:- A sequence of character of any length containing
‘Abebe’
 NOT LIKE ‘H%’:- the first character can’t be an H.

E.g. select * from student


where fname LIKE ‘A%’;
26
Data Manipulation Language (DML) … Cont’d
V. Null search condition(IS NULL/IS NOT NULL)
 A NULL value is considered to be an unknown value.
e.g. select * from student
where did=‘d001’ AND comment IS NULL
 The negated version (IS NOT NULL) can be used to test values that
are not null.

27
Modifying data in the database
 Update statement allows the contents of existing data to be modified.
Syntax
UPDATE <tablename>
SET columnname1 = datavalue1, columnname2 = datavalue2, ...
[WHERE search condition]
 Causes rows in the named table to be changed

 To UPDATE all rows (the WHERE clause is omitted):
If you want to double the age of every student
Eg. UPDATE student
    Set age = age*2;

28
Modifying data in the database … Cont’d
 UPDATE specific rows
use where to specify a condition
E.g. UPDATE student
Set age=25
Where sid=’s001’;
 Update multiple columns
Eg. UPDATE student
Set age=45,did=’d080’
Where fname=’Abebe’

29
Modifying data in the database … Cont’d
 Deleting data from the database
Use the DELETE statement to allow rows to be deleted from a named table
This statement does not touch the table definition
Syntax DELETE FROM <tablename>
WHERE [search condition]
I. DELETE all rows : delete/remove all rows
where is omitted
Eg. DELETE from student or DELETE * from student
II. Delete specific rows: where clause is used for specifying a condition.
Eg. Delete from student
Where sid=’s001’

30
SQL Aggregate Functions
 COUNT :- returns the number of values in a specified column.
 SUM :- returns the sum of the values in a specified column.
 AVG :- returns the average of the values in a specified column.
 MIN :- returns the smallest value in a specified column.
 MAX :- returns the largest value in a specified column.

31
SQL Aggregate Functions … Cont’d
 Use of count(*): To count all rows that satisfy the condition
Eg. Select count(*) As newage
From student
Where age>25
 This statement counts all rows where age is above 25
 AS is used for labeling the result column
 Use of count(DISTINCT <column name>): Used to count distinct
value(remove repetition)
Eg. Select count(distinct age) as newage
From student
Where age>15

32
SQL Aggregate Functions … Cont’d
 Use of count and sum: We can combine functions together
E.g. Find the total number of students whose age is above 25 and the sum of their ages
Select count(sid) as sno, sum(age) as newage
From student
Where age >= 25
 Use of MIN, MAX, AVG: find the min, max and average of student age.
E.g. Select MIN(age) as minage, MAX(age) as maxage, AVG(age) as average
from student;
 Use GROUP BY to display a grouped query.
E.g. Select did, count(fname) as no, sum(age) as no_age
From student Group by did Order by did;
33
Use Group by Clause
 Used to group aggregate functions that applied to records based on
column values.
 used to display a grouped query.
Eg. Select did, count(fname) as no, sum(age) as noage
From student
Group by did
Order by did;
 Sorting results(ORDER BY clause):-
Use Ascending (ASC)
Descending (DESC)
E.g. select fname,lname,age from student
ORDER by age DESC;

34
Use of Having Clause
 The HAVING clause is used to filter grouped data
 It applies conditions, usually on aggregate functions, to the groups
E.g. Select did, count(fname) as noofstud, sum(age) as newage
from student
group by did
having count(fname)>1

35
Using a sub query using equality
 Sub queries consist of queries within queries; they are used to retrieve data from related tables
Eg. Select fname,lname
From student where did=(select did from department where depthead=‘Jemal’);
 Using a sub query with an aggregate function
Eg. Select fname,lname,age-(select avg(age) from student) as agediff
From student
Where age>(select avg(age) from student);

36
Use ANY/SOME and ALL
 ALL :- the condition will be true only if it is satisfied by all values produced by the sub query.
 ANY/SOME :- the condition will be true if it is satisfied by any (one or more) of the values produced by the sub query.
e.g1. Select fname,lname,age from student
Where age >some(select age from head
where did=’d001’)
e.g2.select fname,lname,age from student
where age>ALL(select age from head
where did=’d001’)

37
Join
Join:- used to combine columns from several tables into a single table
I. Simple join
 Used to join columns from two or more tables based on equality
 Is a type of equi join
 Use alias for short hand representation of tables
 Table Aliases: Using full table names as prefixes can make SQL queries
unnecessarily wordy. Table aliases can make the code a little more concise.

E.g. Select s.sid,fname, lname from student s, department d where s.did=d.did


Order by lname desc
The SQL standard provides the following alternatives:
 From student s Join department d ON s.did=d.did
 From student Join department USING (did)
 From student NATURAL Join department
38
Join … Cont’d
 Three table Join: Selecting values from three tables
E.g. select h.hid,d.dname,s.fname, s.lname
from head h,department d, student s
where h.hid=d.hid AND d.did=s.did
order by h.hid

E.g. Multiple grouping columns
Find the number of students in compSc
Select d.did, d.deptname, count(*) as no_of_stud
from student s, department d
where s.did=d.did and d.deptname=‘compSc’
group by d.did, d.deptname
order by d.did
39
Join … Cont’d
 The inner join of two table
Eg. Select b.*, p.*
from student b, department p where b.did=p.did
 Left outer join
Eg. Select b.*, p.*
From student b LEFT JOIN department p ON b.did=p.did

 Right outer join
Eg. Select b.*, p.*
From student b RIGHT JOIN department p ON b.did=p.did
 Full outer join
Eg. Select b.*, p.*
From student b FULL OUTER JOIN department p ON b.did=p.did

40
Views
 A view is a virtual table based on the result-set of a SELECT statement.
 A view contains rows and columns, just like a real table.
 The fields in a view are fields from one or more real tables in the database.
 You can add SQL functions, WHERE, and JOIN statements to a view and
present the data as if the data were coming from a single table.
 Note: The database design and structure will NOT be affected by the functions,
where, or join statements in a view.

Syntax: CREATE VIEW view_name AS


SELECT column_name(s)
FROM table_name
WHERE condition
 Note: The database does not store the view data. The database engine recreates
the data, using the view's SELECT statement, every time a user queries a view.
41
Using Views
 A view could be used from inside a query, a stored procedure, or from
inside another view.
 By adding functions, joins, etc., to a view, it allows you to present
exactly the data you want to the user.
 The sample database Northwind has some views installed by default. The
view "Current Product List" lists all active products (products that are
not discontinued) from the Products table.

 The view is created with the following SQL:


CREATE VIEW [Current Product List] AS
SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No
42
Using Views … Cont’d
 We can query the view above as follows:
SELECT * FROM [Current Product List]
 Another view from the Northwind sample database selects every
product in the Products table that has a unit price that is higher than
the average unit price:
CREATE VIEW [Products above Average Price] AS
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice> (SELECT AVG (UnitPrice) FROM
Products)
 We can query the view above as follows:
SELECT * FROM [Products above Average Price]

43
Using Views … Cont’d
 Another example view from the Northwind database calculates the total sale
for each category in 1997. Note that this view selects its data from another view
called "Product Sales for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName,Sum(ProductSales) AS
CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName
 We can query the view above as follows:
SELECT * FROM [Category Sales For 1997]
 We can also add a condition to the query. Now we want to see the total sale only
for the category "Beverages":
SELECT * FROM [Category Sales for 1997]
WHERE CategoryName='Beverages'
44
Database Design

1
Outlines
 Steps of Database Design

 Convert ER to relations

 Normalization

 Physical database design

2
Database Design
 Database design is the process of coming up with different kinds of
specification for the data to be stored in the database.
 Describe how data is stored in the computer system.
 Defining its structure, characteristics and contents of data.
Database Design Process
Step 1:- Requirements collection and Analysis
 Prospective users are interviewed to collect information.
 This step results in a concise set of user requirements.
 The functional requirements should be specified as well as the data requirements.
 Functional requirements can be documented using diagrams such as sequence diagrams and DFDs, and using scenarios.

3
Database Design … Cont’d
Step 2:- Conceptual Design
 Create conceptual schema.
 Conceptual schema:- a concise description of the data requirements of the users, including a detailed description of the entity types, relationships and constraints.
 End users must be able to understand it.
Step 3:- Database implementation (Logical Design)
 Use one of the DBMSs for implementation.
 The conceptual schema is transformed from the high-level data model into the implementation data model.
Step 4:- Physical Design
 Internal storage structures, indexes, access paths and file organizations are specified.
 Application programs are designed and implemented.
4
Database Design … Cont’d
 Generally the design part has divided into three sub phases
 Conceptual Design
 Logical Design
 Physical Design
Design strategies:
1. Top Down
 Start with high level abstraction and refine it.
 High level entities
 Add sub classes
 Attributes
2. Bottom up
 Start with basic abstraction and then combine them.
 Attributes
 Group in to entities
5
Database Design … Cont’d
3. Inside out
 Special case of bottom up.
 Focus on a central set of concepts and work outwards.
 No burden on the initial designer.
4. Mixed
 Start with top down then use inside out or bottom up.
 Divide and conquer.

6
Conceptual Design
 Is the process of constructing a model of the information used in an
enterprise, independent of any physical consideration.
 Is the source of information for logical design.
 Is high level and understand by non technical user.
 Conceptual model of enterprise, independent of implementation detail
such as target DBMS, application programs, programming language,
hardware platform, performance issues etc.
Tasks to be performed:-
 Identity entity types and relationships.
 Associate attributes with entities.
 Determine attribute domains.
 Determine unique identifier (Key) attributes.
 Use entity relationship model (ER).
7
Conceptual Design … Cont’d
Why conceptual model:
 Independent of DBMS.
 Allows easy communication between users and developers.
 Is permanent description of the database requirements.
Database requirements
 We must convert written database requirement in to an E-R diagram.
 Need to determine the entities, attributes and relationships.
 Nouns = entities
 Adjectives = attributes
 Verbs = relationships

8
Logical Design
 Is the process of constructing model of data used in an organization.

 Constructing a model based on a specific data model (e.g. relational, object-oriented).

 Independent of a particular DBMS and other physical consideration.

 The conceptual schema is mapped to a logical schema.

 ER Diagram converts to relations.

9
Logical Design … Cont’d
Converting ER Diagram to Relations
 Three basic rules to convert ER into tables.
 For a relationship with one to one cardinality
 All the attributes are merged into a single table.
 i.e. primary key or candidate key of one relation is foreign key for the
other.
 For a relationship with one to many cardinality
 Post the primary key or candidate key for the “one” side as a foreign
key attribute to the “many side”.
 For a relationship with many to many
 Create a new table (which is the associative entity) and post primary
key or candidate key from each entity as attributes in the new table
along with some additional attributes (if applicable), as sketched below
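A minimal SQL sketch of these three rules, mirroring the student, department and course tables used in the SQL chapter (the enrollment table name and its grade attribute are assumed here for the M:N case):

   CREATE TABLE department (
     did VARCHAR(9) PRIMARY KEY,
     deptname VARCHAR(12) NOT NULL);

   -- 1:M - post the key of the "one" side (department) as a foreign key on the "many" side (student)
   CREATE TABLE student (
     sid VARCHAR(9) PRIMARY KEY,
     fname VARCHAR(9) NOT NULL,
     did VARCHAR(9),
     FOREIGN KEY (did) REFERENCES department(did));

   CREATE TABLE course (
     cid VARCHAR(9) PRIMARY KEY,
     cname VARCHAR(20));

   -- M:N - create a new (associative) table and post the keys of both entities into it
   CREATE TABLE enrollment (        -- assumed name for the associative entity
     sid VARCHAR(9),
     cid VARCHAR(9),
     grade CHAR(2),                 -- an example of an additional attribute
     PRIMARY KEY (sid, cid),
     FOREIGN KEY (sid) REFERENCES student(sid),
     FOREIGN KEY (cid) REFERENCES course(cid));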
10
Logical Design … Cont’d

11
Logical Design … Cont’d

12
Logical Design … Cont’d

13
Logical Design … Cont’d
Mapping Regular Entities to relation
 Simple attributes: ER Attributes map directly on to the relation.
 Composite attribute: Use only their simple, component attributes
 Multi-Valued Attribute: Becomes a separate relation with a foreign
key taken from the super entity.

14
Logical Design … Cont’d

15
Normalization
 A relational database is merely a collection of data, organized in a
particular manner. The father of the relational database approach, Codd
created a series of rules called normal forms that help define that
organization.
 One of the best ways to determine what information should be stored in a
database is to clarify what questions will be asked of it and what data
would be included in the answers.
 Database normalization is a series of steps followed to obtain a database
design that allows for consistent storage and efficient access of data in a
relational database. These steps reduce data redundancy and the risk of
data becoming inconsistent.
 NORMALIZATION is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the update anomalies which are;
Insertion, Deletion and Modification Anomalies
16
Normalization … Cont’d
 Normalization may reduce system performance since data will be cross
referenced from many tables.
 Thus denormalization is sometimes used to improve performance, at
the cost of reduced consistency guarantees.
 Normalization normally is considered as good if it is lossless
decomposition.
 Mnemonic for remembering the rationale for normalization could be
the following:
 No Repeating or Redundancy: no repeating fields in the table
 The Fields Depend Upon the Key: the table should solely depend on the key
 The Whole Key: no partial key dependency
 And Nothing But The Key: no inter data dependency

17
Normalization … Cont’d
 All the normalization rules will eventually remove the update
anomalies that may exist during data manipulation after the
implementation.
 Pitfalls of Normalization
 Requires data to see the problems
 May reduce performance of the system
 Is time consuming,
 Difficult to design and apply and
 Prone to human error

18
Normalization … Cont’d
 The underlying ideas in normalization are simple enough. Through
normalization we want to design for our relational database a set of
tables that;
1. Contain all the data necessary for the purposes that the
database is to serve,
2. Have as little redundancy as possible,
3. Accommodate multiple values for types of data that require
them,
4. Permit efficient updates of the data in the database, and
5. Avoid the danger of losing data unknowingly
 The type of problems that could occur in insufficiently normalized
table is called update anomalies which includes;

19
Normalization … Cont’d
1. Insertion anomalies
 An "insertion anomaly" is a failure to place information about a new database
entry into all the places in the database where information about that new
entry needs to be stored.
 In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database.
 In an inadequately normalized database, information about a new entry may
need to be inserted into more than one place and, human fallibility being what
it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
 A "deletion anomaly" is a failure to remove information about an existing
database entry when it is time to remove that entry.
 In a properly normalized database, information about an old, to-be-gotten-rid-
of entry needs to be deleted from only one place in the database.

20
Normalization … Cont’d
 In an inadequately normalized database, information about that old entry may
need to be deleted from more than one place, and, human fallibility being what
it is, some of the needed additional deletions may be missed.
3. Modification anomalies
 A modification of a database involves changing some value of the attribute of a
table. In a properly normalized database table, what ever information is
modified by the user, the change will be effected and used accordingly.
 The purpose of normalization is to reduce the chances for anomalies to occur in a
database.

21
Normalization … Cont’d
 Deletion Anomalies: If the employee with ID 16 is deleted, then all information about the skill C++ and the type of that skill is deleted from the database. Then we will not have any information about C++ and its skill type.
 Insertion Anomalies: What if we have a new employee with a skill called Pascal? We cannot decide whether Pascal is allowed as a value for skill, and we have no clue about the type of skill that Pascal should be categorized as.
 Modification Anomalies: What if the address for Helico is changed from Piazza to Mexico? We need to look for every occurrence of Helico and change the value of School_Add from Piazza to Mexico, which is prone to error.
 Database-management system can work only with the information that we
put explicitly into its tables for a given database and into its rules for
working with those tables, where such rules are appropriate and possible.
22
Functional Dependency (FD)
 Before moving to steps of normalization, it is important to have an
understanding of "functional dependency."
Data Dependency
 The logical association between data items that point the database
designer in the direction of a good database design are referred to as
determinant or dependent relationships.
 Two data items A and B are said to be in a determinant or dependent
relationship if certain values of data item B always appears with
certain values of data item A.
 If the data item A is the determinant data item and B the dependent
data item then the direction of the association is from A to B and not
vice versa.

23
Functional Dependency (FD) … Cont’d
 The essence of this idea is that if the existence of something, call it A, implies
that B must exist and have a certain value, then we say that "B is functionally
dependent on A."
 We also often express this idea by saying that "A determines B," or that "B is
a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the
statement, "If A, then B.“
 It is important to note that the value B must be unique for a given value of A,
i.e., any given value of A must imply just one and only one value of B, in
order for the relationship to qualify for the name "function." (However, this
does not necessarily prevent different values of A from implying the same
value of B.)
 X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y

24
Functional Dependency (FD) … Cont’d
 The notation is: A → B, which is read as: B is functionally dependent on A
 In general, a functional dependency is a relationship among attributes. In
relational databases, we can have a determinant that governs one other
attribute or several other attributes.
 FDs are derived from the real-world constraints on the attributes

 Since the type of Wine served depends on the type of Dinner, we say Wine is functionally dependent on Dinner:  Dinner → Wine
Since both the Wine type and the Fork type are determined by the Dinner type, we say Wine is functionally dependent on Dinner and Fork is functionally dependent on Dinner:
Dinner → Wine
Dinner → Fork
25
Partial Dependency
 If an attribute which is not a member of the primary key is dependent
on some part of the primary key (if we have composite primary key)
then that attribute is partially functionally dependent on the primary
key.
 Let {A,B} be the primary key and C a non-key attribute.
Then if {A,B} → C and B → C,
then C is partially functionally dependent on {A,B}.

26
Full Dependency
 If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key (if we
have composite primary key) then that attribute is fully functionally
dependent on the primary key.

 Let {A,B} be the primary key and C a non-key attribute. Then if {A,B} → C holds, but neither A → C nor B → C holds on its own, C is fully functionally dependent on {A,B}.

27
Transitive Dependency
 In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."

28
Steps of Normalization
 We have various levels or steps in normalization called Normal Forms.
 The level of complexity, strength of the rule and decomposition increases
as we move from one lower level Normal Form to the higher.
 A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
 Normal form below represents a stronger condition than the previous one.
 Normalization towards a logical design consists of the following steps:
 Un Normalized Form: Identify all data elements
 First Normal Form: Find the key with which you can find all data
 Second Normal Form: Remove part-key dependencies. Make all data
dependent on the whole key.
 Third Normal Form: Remove non-key dependencies. Make all data
dependent on nothing but the key.
 For most practical purposes, databases are considered normalized if they
29 adhere to third normal form.
UNNORMALIZED FORM (UNF)
 A table that contains one or more repeating groups.
 A repeating group is a field or group of fields that hold multiple values
for a single occurrence of a field.

Repeating group= (Skill, SkillType, School, SchoolAdd, SkillLevel)

30
First Normal Form (1NF)
 Requires that all column values in a table are atomic (e.g., a number is
an atomic value, while a list or a set is not).
 We have two ways of achieving this:
 1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key relationship
 2. Moving the repeating groups to new rows by repeating the common attributes. If so, then find the key with which you can find all the data.
 Definition of a table (relation) in 1NF if:
 There are no duplicated rows in the table. Unique identifier
 Each cell is single-valued (i.e., there are no repeating groups).
 Entries in a column (attribute, field) are of the same kind.

31
First Normal Form (1NF) … Cont’d
 FIRST NORMAL FORM (1NF): Remove all repeating groups.
 Distribute the multi-valued attributes into different rows and identify a unique identifier for the relation, so that it can be said to be a relation in a relational database.

32
First Normal Form (1NF) … Cont’d
 Example 2: Consider the following UNF relation.

Here, the Tele and Fax fields are multi-valued.
- To change it into a 1NF relation, we need to split the table into three.
- The following tables are the equivalent 1st Normal Form of the above employee table:

33
Second Normal form 2NF
 No partial dependency of a non key attribute on part of the primary
key.
 Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) key is automatically also in 2NF.
 Definition of a table (relation) in 2NF
 It is in 1NF and
 If all non-key attributes are dependent on all of the key. i.e. no
partial dependency.
 Since a partial dependency occurs when a non-key attribute is
dependent on only a part of the (composite) key, the definition of
2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and
if it has no partial dependencies."

34
Second Normal form 2NF … Cont’d
 Example for 2NF:

 This schema is in its 1NF since we don’t have any repeating groups or
attributes with multi-valued property. To convert it to a 2NF we need to
remove all partial dependencies of non key attributes on part of the
primary key.
 {EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID
 But in addition to this we have the following dependencies
EmpID → EmpName
ProjNo → ProjName, ProjLoc, ProjFund, ProjMangID

35
Second Normal form 2NF … Cont’d
 As we can see some non key attributes are partially dependent on
some part of the primary key. Thus these collections of attributes
should be moved to a new relation.
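A sketch of the decomposition implied by these dependencies (using the attribute names above):
   EMPLOYEE(EmpID, EmpName)                                  - primary key EmpID
   PROJECT(ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)  - primary key ProjNo
   EMP_PROJ(EmpID, ProjNo)                                   - composite primary key {EmpID, ProjNo}
Each non-key attribute now depends on the whole key of its own relation, so every relation is in 2NF.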

36
Second Normal form 2NF … Cont’d
• Example 2: Normalize the following relation.

• The primary key for this table is the composite key (PatientId,
RelativeId).

37
Second Normal form 2NF … Cont’d
 So, to determine if it satisfies 2NF, you have to find out if all other fields in it depend fully on both PatientId and RelativeId; that is, you need to decide whether the following conditions are true:
 (PatientId, RelativeId) → Relationship and
 (PatientId, RelativeId) → Patient_tel.
 However, based on the dependencies in the patient table, only the following are true:
 (PatientId, RelativeId) → Relationship and
 (PatientId) → Patient_tel.

Therefore, based on the above dependencies, the normalized relation will be divided into two tables.

38
Second Normal form 2NF … Cont’d

39
Third Normal Form (3NF )
 Eliminate Columns Not Dependent On Key - If attributes do not contribute to
a description of the key, remove them to a separate table.
 This level avoids update and delete anomalies.
 Definition of a Table (Relation) in 3NF
 It is in 2NF and
 There are no transitive dependencies between attributes.
 Example for (3NF): Assumption: Students of same batch (same year) live in
one building or dormitory

• This schema is in its 2NF since the primary key is a single attribute.
40
Third Normal Form (3NF ) … Cont’d

41
Third Normal Form (3NF ) … Cont’d
 Consider the following example:

Now, PK = empid
 We have functional dependencies:
 Empid → depid
 Depid → depname
 Depid → depbudjet
 Therefore, the above table is not in 3NF (depname and depbudjet depend on empid only transitively, through depid). To normalize it, we can use the functional dependencies:
 Depid → depname
 Depid → depbudjet and
 Empid → depid
42
Third Normal Form (3NF ) … Cont’d
 So that the resulting tables are the following:
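A sketch of the decomposition (attribute names as above): EMPLOYEE(empid, depid, plus the remaining employee attributes) with depid as a foreign key, and DEPARTMENT(depid, depname, depbudjet); the transitive dependency of depname and depbudjet on empid through depid is thereby removed.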

43
Other Normal Forms
 Boyce-Codd Normal Form (BCNF):
 Def.: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
 Fourth Normal Form (4NF): Isolate Independent Multiple Relationships - No table may contain two or more 1:N or M:N relationships that are not directly related. The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:N relationships are resolved independently if they are indeed independent. There may be practical constraints on information that justify separating logically related many-to-many relationships.
 Def.: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

44
Other Normal Forms … Cont’d
 Fifth Normal Form (5NF): A model limited to only simple
(elemental) facts.
 Def.: A table is in 5NF, also called "Projection-Join Normal
Form" (PJNF), if it is in 4NF and if every join dependency in the
table is a consequence of the candidate keys of the table.
 Domain-Key Normal Form (DKNF): A model free from all
modification anomalies.
 Def.: A table is in DKNF if every constraint on the table is a
logical consequence of the definition of keys and domains.

45
Physical Database Design
 The Logical database design is concerned with the what;
 The Physical database design is concerned with the how.
 Physical database design is the process of producing a description of the
implementation of the database on secondary storage.
 It describes the base relations, file organization, and indexes used to
achieve effective access to the data along with any associated integrity
constraints and security measures.
 Sources of information for the physical design process include the global logical data model and the documentation that describes the model.
 Knowledge of the DBMS that is selected to host the database system, with all its functionalities, is required since the functionalities of current DBMSs vary widely.

46
Steps in physical database design
1. Translate logical data model for target DBMS
 To determine the file organizations and access methods that will be
used to store the base relations; i.e. the way in which relations and
tuples will be held on secondary storage
 Design enterprise constraints for target DBMS
 This phase is the translation of the global logical data model to produce
a relational database schema in the target DBMS. This includes creating
the data dictionary based on the logical model and information
gathered.
 After the creation of the data dictionary, the next activity is to
understand the functionality of the target DBMS so that all necessary
requirements are fulfilled for the database intended to be developed.

47
Steps in physical database design … Cont’d
 Knowledge of the DBMS includes:
 how to create base relations
 whether the system supports:
 definition of Primary key
 definition of Foreign key
 definition of Alternate key
 definition of Domains
 Referential integrity constraints
 definition of enterprise level constraints
 Some tasks to be done:
 1.1. Design base relation
 1.2. Design representation of derived data
 1.3. Design enterprise constraint

48
Steps in physical database design … Cont’d
1.1. Design base relation
 Designing base relation involves identification of all necessary requirements
about a relation starting from the name up to the referential integrity constraints.
 The implementation of the physical model is dependent on the target DBMS, since some DBMSs have more facilities than others for defining database structures.
 The base relation design along with every justifiable reason should be fully
documented.
1.2. Design representation of derived data
 While analyzing the requirement of users, we may encounter that there are some
attributes holding data that will be derived from existing or other attributes. A
decision on how to represent such data should be devised.
 Most of the time derived attributes are not expressed in the logical model but
will be included in the data dictionary. Whether to store derived attributes in a base relation or to calculate them when required is a decision to be made by the designer, considering the performance impact.
49
Steps in physical database design … Cont’d
1.3. Design enterprise constraint
 Data in the database is not only subjected to constraints on the
database and the data model used but also with some enterprise
dependent constraints.
 This constraint definition is also dependent on the DBMS selected
and enterprise level requirements.
 All the enterprise level constraints and the definition method in the
target DBMS should be fully documented.

50
Steps in physical database design … Cont’d
2. Design physical representation
This phase is the level for determining the optimal file organizations to store the
base relations and indexes that are required to achieve acceptable performance,
that is, the way in which relations and tuples will be held on the secondary
storage.
 2.1. Analyze transactions
To understand the functionality of the transactions that will run on the
database and to analyze the important transactions
 2.2. Choose file organization
To determine an efficient file organization for each base relation
 2.3. Choose indexes
Used for quick access
 2.4. Estimate disk space and system requirement
To estimate the amount of disk space that will be required by the database.
51
Steps in physical database design … Cont’d
3. Design user view
To design the user views that were identified in the conceptual
database design methodology
4. Design security mechanisms
 To design the access rules to the base relations and user views
5. Consider controlled redundancy
 To determine whether introducing redundancy in a controlled manner by relaxing the normalization rules will improve the performance of the system.
6. Monitor and tune the operational system

52
Chapter 8
Record Storage and primary File Organization
Chapter Outline
 Disk Storage Devices
 Files of Records
 Operations on Files
 Unordered Files
 Ordered Files
 Hashed Files
 Dynamic and Extendible Hashing Techniques
 RAID Technology

Chapter 13-2
Disk Storage Devices (cont.)
 Preferred secondary storage device for high storage capacity and low
cost.

 Data stored as magnetized areas on magnetic disk surfaces.

 A disk pack contains several magnetic disks connected to a rotating


spindle.

 Disks are divided into concentric circular tracks on each disk surface.
Track capacities vary typically from 4 to 50 Kbytes.

Chapter 13-3
Disk Storage Devices (cont.)
Because a track usually contains a large amount of information, it is
divided into smaller blocks or sectors.
 The division of a track into sectors is hard-coded on the disk surface
and cannot be changed. One type of sector organization calls a portion
of a track that subtends a fixed angle at the center as a sector.

 A track is divided into blocks. The block size B is fixed for each
system. Typical block sizes range from B=512 bytes to B=4096 bytes.
Whole blocks are transferred between disk and main memory for
processing.

Chapter 13-4
Disk Storage Devices (cont.)

Chapter 13-5
Disk Storage Devices (cont.)
 A read-write head moves to the track that contains the block to be
transferred. Disk rotation moves the block under the read-write head
for reading or writing.
 A physical disk block (hardware) address consists of a cylinder number (an imaginary collection of tracks of the same radius from all recorded surfaces), the track number or surface number (within the cylinder), and the block number (within the track).
 Reading or writing a disk block is time consuming because of the seek
time s and rotational delay (latency) rd.
 Double buffering can be used to speed up the transfer of contiguous
disk blocks.

Chapter 13-6
Disk Storage Devices (cont.)

Chapter 13-7
Typical Disk
Parameters

Chapter 13-8
Records
 Fixed and variable length records
 Records contain fields which have values of a particular type (e.g.,
amount, date, time, age)
 Fields themselves may be fixed length or variable length
 Variable length fields can be mixed into one record: separator characters
or length fields are needed so that the record can be “parsed”.

Chapter 13-9
Blocking
 Blocking: refers to storing a number of records in one block on the disk.
 Blocking factor (bfr) refers to the number of records per block.
 There may be empty space in a block if an integral number of records do
not fit in one block.
 Spanned Records: refer to records that exceed the size of one or more
blocks and hence span a number of blocks.
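For example, assuming an unspanned organization with block size B = 512 bytes and a fixed record size R = 100 bytes, the blocking factor is bfr = floor(B / R) = floor(512 / 100) = 5 records per block, leaving 512 - (5 x 100) = 12 bytes of unused space in each block.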

Chapter 13-10
Files of Records
 A file is a sequence of records, where each record is a collection of
data values (or data items).

 A file descriptor (or file header ) includes information that describes


the file, such as the field names and their data types, and the addresses
of the file blocks on disk.

 Records are stored on disk blocks. The blocking factor bfr for a file is
the (average) number of file records stored in a disk block.

 A file can have fixed-length records or variable-length records.

Chapter 13-11
Files of Records (cont.)
 File records can be unspanned (no record can span two blocks) or
spanned (a record can be stored in more than one block).
 The physical disk blocks that are allocated to hold the records of a file
can be contiguous, linked, or indexed.
 In a file of fixed-length records, all records have the same format.
Usually, unspanned blocking is used with such files.
 Files of variable-length records require additional information to be
stored in each record, such as separator characters and field types.
Usually spanned blocking is used with such files.

Chapter 13-12
Operation on Files
Typical file operations include:
 OPEN: Readies the file for access, and associates a pointer that will
refer to a current file record at each point in time.
 FIND: Searches for the first file record that satisfies a certain
condition, and makes it the current file record.
 FINDNEXT: Searches for the next file record (from the current
record) that satisfies a certain condition, and makes it the current file
record.
 READ: Reads the current file record into a program variable.
 INSERT: Inserts a new record into the file, and makes it the current
file record.

Chapter 13-13
Operation on Files (cont.)
 DELETE: Removes the current file record from the file, usually by
marking the record to indicate that it is no longer valid.
 MODIFY: Changes the values of some fields of the current file
record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records. For example, the
records marked deleted are physically removed from the file or a new
organization of the file records is created.
 READ_ORDERED: Read the file blocks in order of a specific field
of the file.

Chapter 13-14
Unordered Files
 Also called a heap or a pile file.

 New records are inserted at the end of the file.

 To search for a record, a linear search through the file records is


necessary. This requires reading and searching half the file blocks on
the average, and is hence quite expensive.
 Record insertion is quite efficient.
 Reading the records in order of a particular field requires sorting the
file records.

Chapter 13-15
Ordered Files
 Also called a sequential file.
 File records are kept sorted by the values of an ordering field.
 Insertion is expensive: records must be inserted in the correct order. It
is common to keep a separate unordered overflow (or transaction ) file
for new records to improve insertion efficiency; this is periodically
merged with the main ordered file.
 A binary search can be used to search for a record on its ordering field
value. This requires reading and searching only about log2(b) of the b file
blocks on the average, an improvement over linear search.
 Reading the records in order of the ordering field is quite efficient.
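A minimal Python sketch of binary search on the blocks of an ordered file; each block is assumed to hold records sorted on the ordering field, with the blocks themselves in order.

def binary_search_blocks(blocks, field, value):
    # Reads about log2(b) blocks instead of the b/2 needed by a linear search.
    low, high = 0, len(blocks) - 1
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]                       # one block read
        if value < block[0][field]:               # smaller than the block's first record
            high = mid - 1
        elif value > block[-1][field]:            # larger than the block's last record
            low = mid + 1
        else:                                     # the value falls inside this block
            return next((r for r in block if r[field] == value), None)
    return None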

Chapter 13-16
Ordered Files (cont.)

Chapter 13-17
Average Access Times
The following table shows the average time required to access a specific
record for each type of file organization.
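A summary based on the access costs stated in this chapter (b = number of file blocks):

File organization        Average blocks read to locate a record
Heap (unordered)         b/2 (linear search)
Ordered (sequential)     log2(b) (binary search on the ordering field)
Hashed                   roughly one bucket access on the hash key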

Chapter 13-18
Hashed Files
 Hashing for disk files is called External Hashing
 The file blocks are divided into M equal-sized buckets, numbered
bucket 0, bucket 1, ..., bucket M-1. Typically, a bucket corresponds to one
disk block (or a fixed number of blocks).
 One of the file fields is designated to be the hash key of the file.
 The record with hash key value K is stored in bucket i, where i=h(K),
and h is the hashing function.
 Search is very efficient on the hash key.
 Collisions occur when a new record hashes to a bucket that is already
full. An overflow file is kept for storing such records. Overflow records
that hash to each bucket can be linked together.
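As an illustration (not from the slides), a minimal Python sketch of external hashing with h(K) = K mod M; the number of buckets and the use of "ssn" as the hash key are assumptions.

M = 8                                 # number of buckets (each would map to one disk block)
buckets = [[] for _ in range(M)]

def h(key):
    # Hash function mapping a key value K to a bucket number i = h(K).
    return key % M

def insert(record):
    buckets[h(record["ssn"])].append(record)     # store the record in bucket h(K)

def search(ssn):
    return next((r for r in buckets[h(ssn)] if r["ssn"] == ssn), None)

insert({"ssn": 123456789, "name": "Abebe"})
print(search(123456789))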

Chapter 13-19
Hashed Files (cont.)
There are numerous methods for collision resolution, including the following:
 Open addressing: Proceeding from the occupied position specified by the hash
address, the program checks the subsequent positions in order until an unused
(empty) position is found.

 Chaining: For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. In addition, a pointer
field is added to each record location. A collision is resolved by placing the
new record in an unused overflow location and setting the pointer of the
occupied hash address location to the address of that overflow location.

 Multiple hashing: The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing if
necessary.
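A minimal Python sketch of the open addressing method described above: starting from the occupied hash address, subsequent positions are checked in order until an unused position is found; the table size and key values are examples.

M = 11
table = [None] * M                    # None marks an unused position

def insert_open_addressing(key):
    i = key % M                       # initial hash address
    for probe in range(M):
        slot = (i + probe) % M        # check subsequent positions in order
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

for k in (22, 33, 44):                # all three keys hash to position 0
    print(insert_open_addressing(k))  # placed at positions 0, 1, 2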

Chapter 13-20
Hashed Files (cont.)

Chapter 13-21
Hashed Files (cont.)
 To reduce overflow records, a hash file is typically kept 70-80% full.
 The hash function h should distribute the records uniformly among
the buckets; otherwise, search time will be increased because many
overflow records will exist.
 Main disadvantages of static external hashing:
- Fixed number of buckets M is a problem if the number of records
in the file grows or shrinks.
- Ordered access on the hash key is quite inefficient (requires
sorting the records).

Chapter 13-22
Hashed Files - Overflow handling

Chapter 13-23
Dynamic And Extendible Hashed Files
Dynamic and Extendible Hashing Techniques

 Hashing techniques are adapted to allow the dynamic growth and
shrinking of the number of file records.
 These techniques include the following: dynamic hashing, extendible
hashing, and linear hashing.
 Both dynamic and extendible hashing use the binary representation of
the hash value h(K) in order to access a directory. In dynamic hashing
the directory is a binary tree. In extendible hashing the directory is an
array of size 2^d where d is called the global depth.

Chapter 13-24
Dynamic And Extendible Hashing (cont.)

 The directories can be stored on disk, and they expand or shrink
dynamically. Directory entries point to the disk blocks that contain the
stored records.

 An insertion in a disk block that is full causes the block to split into
two blocks and the records are redistributed among the two blocks.
The directory is updated appropriately.

 Dynamic and extendible hashing do not require an overflow area.

 Linear hashing does require an overflow area but does not use a
directory. Blocks are split in linear order as the file expands.
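As a simplified illustration (not from the slides), a Python sketch of the extendible-hashing directory lookup: the leading d bits of the hash value select one of the 2^d directory entries. Bucket splitting and the sharing of one block by several directory entries are omitted here.

GLOBAL_DEPTH = 2                                    # d
directory = [[] for _ in range(2 ** GLOBAL_DEPTH)]  # entries point to data blocks

def bucket_for(key):
    h_value = hash(key) & 0xFFFFFFFF                # hash value as a 32-bit pattern
    prefix = h_value >> (32 - GLOBAL_DEPTH)         # leading d bits select the entry
    return directory[prefix]

bucket_for(1234).append({"key": 1234})              # store a record via the directory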

Chapter 13-25
Extendible Hashing

Chapter 13-26
Parallelizing Disk Access using RAID Technology.

 Secondary storage technology must take steps to keep up in performance
and reliability with processor technology.
 A major advance in secondary storage technology is represented by the
development of RAID, which originally stood for Redundant Arrays of
Inexpensive Disks.

 The main goal of RAID is to even out the widely different rates of
performance improvement of disks against those in memory and
microprocessors.

Chapter 13-27
RAID Technology (cont.)
 A natural solution is a large array of small independent disks acting as
a single higher-performance logical disk. A concept called data
striping is used, which utilizes parallelism to improve disk
performance.
 Data striping distributes data transparently over multiple disks to
make them appear as a single large, fast disk.
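A minimal Python sketch of block-level data striping: logical block i of the single large logical disk is placed on disk (i mod n) at position (i div n), so consecutive blocks can be transferred from different disks in parallel.

def stripe_location(logical_block, num_disks):
    disk = logical_block % num_disks            # which physical disk holds the block
    block_on_disk = logical_block // num_disks  # position of the block on that disk
    return disk, block_on_disk

# Logical blocks 0..5 over 3 disks land on disks 0, 1, 2, 0, 1, 2.
print([stripe_location(i, 3) for i in range(6)])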

Chapter 13-28
RAID Technology (cont.)
Different raid organizations were defined based on different combinations of the
two factors of granularity of data interleaving (striping) and pattern used to
compute redundant information.
 Raid level 0 has no redundant data and hence has the best write performance.
 Raid level 1 uses mirrored disks.
 Raid level 2 uses memory-style redundancy by using Hamming codes, which
contain parity bits for distinct overlapping subsets of components. Level 2
includes both error detection and correction.
 Raid level 3 uses a single parity disk relying on the disk controller to figure out
which disk has failed.
 Raid Levels 4 and 5 use block-level data striping, with level 5 distributing data
and parity information across all disks.
 Raid level 6 applies the so-called P + Q redundancy scheme using Reed-
Solomon codes to protect against up to two disk failures by using just two
redundant disks.
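As an illustration (not from the slides), a minimal Python sketch of the parity idea behind levels 4 and 5: the parity block is the bitwise XOR of the corresponding data blocks, so any single lost block can be rebuilt from the surviving blocks and the parity; the block contents are example bytes.

from functools import reduce

def xor_blocks(blocks):
    # Bitwise XOR of corresponding bytes across all blocks.
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]   # three data blocks
parity = xor_blocks(data)                        # parity block stored on a separate disk

# Suppose the disk holding data[1] fails: rebuild it from the rest plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]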

Chapter 13-29
Use of RAID Technology (cont.)

Different raid organizations are being used under different situations:
 Raid level 1 (mirrored disks) is the easiest for rebuild of a disk from other
disks
 It is used for critical applications like logs
 Raid level 3 (a single parity disk relying on the disk controller to figure out
which disk has failed) and level 5 (block-level data striping) are preferred for
large-volume storage, with level 3 giving higher transfer rates.
 Most popular uses of the RAID technology currently are: Level 0 (with
striping), Level 1 (with mirroring) and Level 5 with an extra drive for parity.
 Design Decisions for RAID include – level of RAID, number of disks, choice
of parity schemes, and grouping of disks for block-level striping.

Chapter 13-30
Use of RAID Technology (cont.)

Chapter 13-31
Trends in Disk Technology

Chapter 13-32
Storage Area Networks
 The demand for higher storage has risen considerably in recent times.
 Organizations have a need to move from a static fixed data center
oriented operation to a more flexible and dynamic infrastructure for
information processing.
 Thus they are moving to a concept of Storage Area Networks (SANs).
In a SAN, online storage peripherals are configured as nodes on a high-
speed network and can be attached and detached from servers in a very
flexible manner.
 This allows storage systems to be placed at longer distances from the
servers and provide different performance and connectivity options.

Chapter 13-33
Storage Area Networks (contd.)
Advantages of SANs are:

 Flexible many-to-many connectivity among servers and storage


devices using fiber channel hubs and switches.
 Up to 10km separation between a server and a storage system using
appropriate fiber optic cables.
 Better isolation capabilities allowing nondisruptive addition of new
peripherals and servers.

 SANs face the problem of combining storage options from multiple
vendors and dealing with evolving standards of storage management
software and hardware.

Chapter 13-34