College of Informatics
1 2/24/2023
Introduction
Data: What is data?
Facts concerning people, objects, events or other entities.
Can be in the form of text, graphics, sound and video segments
On their own, raw facts are difficult to interpret or to base decisions on
Unprocessed, raw facts that can be stored in a database
Introduction … cont’d
➢ Database: What is database?
➢ An organized collection of logically related data.
Introduction …. cont’d
A database has the following implicit properties:
A database represents some aspect of the real world,
sometimes called the mini world or the Universe of Discourse
(UoD).
Changes to the mini world are reflected in the database.
Introduction …. cont’d
Data
Example data items: Course, Section, Semester, Name, Rank
Introduction … cont’d
➢ Meta Data: What do we mean by meta data?
➢ Descriptions of the properties or characteristics of the data,
including data types, field sizes, allowable values, and
documentation
➢ Data that describes data
➢ Data about data
➢ Description of fields
➢ Display and format instructions
➢ Structure of files and tables
➢ Security and access rules
➢ Triggers and operational rules
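The "data about data" idea above can be made concrete with a small sketch. The field names, types, and sizes below are hypothetical, chosen only to illustrate how a program can use metadata to validate data without hard-coding the rules:

```python
# Metadata: a description of the Student data, not the student records themselves.
# (Field names and sizes here are illustrative, not from any real schema.)
student_metadata = {
    "Name": {"type": "CHAR", "size": 30, "allows_null": False},
    "ID":   {"type": "CHAR", "size": 9,  "allows_null": False},
    "Age":  {"type": "INTEGER", "size": None, "allows_null": True},
}

# The actual data that the metadata describes:
student_record = {"Name": "Rahel", "ID": "ICT 123", "Age": 21}

# Metadata lets a generic program check the data against the rules:
for field, rules in student_metadata.items():
    value = student_record.get(field)
    assert rules["allows_null"] or value is not None
```

In a real DBMS this descriptive information lives in the system catalog rather than in application code.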
Introduction …. cont’d
Metadata
Example: a metadata table describing each Data Item and its allowable Values
Data management approaches
Data management: keeping and organizing your data records
There are three approaches:
Manual Approach
File-Based Approach
Database Approach
Manual File Handling Systems
The primitive and traditional way of information handling
This may work well if the number of items to be stored is small.
Includes intensive human labor
Events and objects are written on files (paper)
Each of the files containing various kinds of information is labeled and
stored in one or more cabinets
The cabinets could be kept in safe places for security
Manual File Handling Systems ..cont’d
Limitations of Manual File Handling
Problem of Data Organization
Problem of Efficiency
Prone to error
Difficult to update, retrieve, integrate
You have the data but it is difficult to compile the information
Significant amount of duplication of data
Cross referencing is difficult
Limitations of File-Based systems
Data Redundancy (Duplication of data)
Same data is held by different programs
Staffsalary(staffno, name, sex, salary)
Staff(staffno,name,position,sex,dateofb,salary)
Wasted space (Uncontrolled duplication of data)
Separation and isolation of data
– Each program maintains its own set of data. Users of one program
may be unaware of potentially useful data held by other programs.
Limited data sharing- No centralized control of data
Data Inconsistency and confusion
Data dependence
File structure is defined in the program code and is dependent on
the application programming language.
Limitations of File-Based systems .. Cont’d
Incompatible file formats (lack of data sharing and availability)
Programs are written in different languages, and so cannot easily
access each other's files.
E.g. Personnel writes in C, Payroll writes in COBOL
Poor Security and administration
Update Anomalies
Modification Anomalies
Deletion Anomalies
Insertion Anomalies
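The anomalies above follow directly from uncontrolled duplication. A minimal sketch, mirroring the duplicated salary field in the Staffsalary and Staff files shown earlier (the staff number and name are hypothetical):

```python
# Two applications, each keeping its own copy of the same staff data,
# as in the file-based Staffsalary(...) and Staff(...) examples above.
payroll_file   = {"S100": {"name": "Abebe", "salary": 5000}}
personnel_file = {"S100": {"name": "Abebe", "position": "Clerk", "salary": 5000}}

# A modification anomaly: payroll records a raise, personnel is never told.
payroll_file["S100"]["salary"] = 6000

# The two copies of the "same" fact now disagree:
inconsistent = payroll_file["S100"]["salary"] != personnel_file["S100"]["salary"]
```

A database approach stores the salary once, so this kind of disagreement cannot arise.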
Database Approach
Here, a single repository of data is maintained.
What emerged were databases and database management systems
Basic Database terminologies
Enterprise: an organization like library, bank, university, etc.
Entity: Person, place, thing, or event about which we wish to keep
data
Attribute (Field): Property of an entity. E.g. Name, age,
telephone, grade, sex, etc.
Record: A logically connected set of one or more Attributes that
describe a person, place or thing. (Logically related data)
File: A collection of related records. E.g. Student file
Relationship: an association among entities (entity records)
Query: a question asked of the database
Benefits of Database systems
Data can be shared: two or more users can access and use.
Improved data accessibility: By using structured query languages,
the users can easily access data without programming experience.
Redundancy can be reduced: Isolated data is integrated in
database.
Quality data can be maintained: the different integrity constraints
in the database approach will maintain the quality leading to better
decision making.
Inconsistency can be avoided: controlled data redundancy will
avoid inconsistency of the data in the database to some extent.
Transaction support can be provided: the basic demands of any
transaction support system are implemented in a full-scale DBMS.
Benefits of Database systems … cont’d
Integrity can be maintained: Data at different applications will be
integrated together with additional constraints.
Security measures can be enforced: The shared data can be secured
by data security mechanisms.
Improved decision support: the database will provide information
useful for decision making
Standards can be enforced: standard ways of representing and using
data can be enforced across users
Less labor: data maintenance will not demand as many resources
Centralized information control: Since relevant data in the
organization will be stored at one repository, it can be controlled and
managed at the central level.
Data Independence - Applications insulated from how data is
structured and stored
Limitations and risk of database approach
Need for new, specialized professional personnel
High cost to be incurred to develop and maintain the system
Complex backup and recovery services from the users'
perspective
High impact on the system when failure occurs to the central
system
Users and Actors of Database System
Actors on the scene: The people whose jobs involve the day-to-day
use of a large database
Workers behind the scene: Those who work to maintain the
database system environment, but who are not actively interested in
the database itself.
Database Administrators
In a database environment, the primary resource is the database itself
and the secondary resource is the DBMS and related software.
Administering these resources is the responsibility of the Database
Administrator (DBA).
The DBA is responsible for authorizing access to the database, for
coordinating and monitoring its use, and for acquiring software and
hardware resources as needed.
The DBA is accountable for problems such as breach of security or
poor system response time.
Database Designer
Database designers are responsible for identifying the data to be stored
in the database and for choosing appropriate structures to represent and
store this data.
It is the responsibility of database designers to communicate with all
prospective database users, in order to understand their requirements,
and to come up with a design that meets these requirements.
In many cases, the designers are on the staff of the DBA and may be
assigned other staff responsibilities after the database design is
completed.
The final database design must be capable of supporting the
requirements of all user groups.
End Users
End users are the people whose jobs require access to the database for
querying, updating, and generating reports;
The database primarily exists for their use. There are several categories
of end users:
Casual end users:- occasionally access the database. They are
typically middle or high-level managers or other occasional
browsers.
Naive or parametric end users:- Their main job revolves around
constantly querying and updating the database, using standard types
of queries and updates called canned transactions that have been
carefully programmed and tested.
Bank tellers check account balances and post withdrawals and deposits
Reservation clerks for airlines, hotels, and car rental companies check
availability for a given request and make reservations
End Users … Cont’d
Sophisticated end users: Include engineers, scientists, business
analysts, and others who thoroughly familiarize themselves with
the facilities of the DBMS so as to implement their applications to
meet their complex requirements.
Stand-alone users: Maintain personal databases by using ready
made program packages that provide easy to use menu or graphics
based interfaces.
An example is the user of a tax package that stores a variety of
personal financial data for tax purposes.
System Analysts and Application
Programmers (Software Engineers)
System analysts: Determine the requirements of end users, especially
naive and parametric end users, and develop specifications for canned
transactions that meet these requirements.
Application programmers implement these specifications as programs;
then they test, debug, document, and maintain these canned
transactions.
Such analysts and programmers (nowadays called software engineers)
should be familiar with the full range of capabilities provided by the
DBMS to accomplish their tasks.
Workers behind the Scene
These persons are typically not interested in the database itself.
These include:
DBMS system designers and implementers:-are persons who
design and implement the DBMS modules and interfaces as a
software package.
A DBMS is a complex software system that consists of many
components or modules, including modules for implementing the
catalog, query language, interface processors, data access,
concurrency control, recovery, and security.
The DBMS must interface with other system software, such as the
operating system and compilers for various programming
languages
Workers behind the Scene … cont’d
Tool developers: Include persons who design and implement tools
Tools are software packages that facilitate database system design and
use, and help improve performance.
Tools are optional packages that are often purchased separately.
They include packages for database design, performance monitoring,
natural language or graphical interfaces, prototyping, simulation, and
test data generation.
Operators and maintenance personnel: are the system
administration personnel who are responsible for the actual running
and maintenance of the hardware and software environment for the
database system.
Some Common uses of Databases
In a university
Containing information about a student, the course she/he is enrolled
in, the dormitory she/he has been given.
Containing details of Staff who work at the university at personnel,
payroll, etc.
In a library
There may be a database containing details of the books in the library
and details of the users,
The database system handles activities such as
Allowing a user to reserve a book
Notifying users when materials are overdue
Some Common uses of Databases … Cont’d
In travel agencies
When you make inquiries about travel, the travel agent may access
databases containing flight details
Flight no., date, time of departure, time of arrival
Insurance
When you wish to take out insurance, there is a database containing
Your personal details: name, address, age
information on whether you drink or smoke,
Your medical records to determine the cost of the insurance
Supermarkets
When you buy goods from some supermarkets, a database will be accessed.
The checkout assistant will run a barcode reader over the purchases.
Chapter 2
Outlines
Classification of DBMS
Schemas, Instances and Database State
Schema
A schema is a description of a particular collection of data, using a
given data model
It is a definition of database
Schemas, Instances, and Database State …
Cont’d
Schema is specified during database design and is not expected to
change frequently.
A displayed schema is called a schema diagram
Although it is not common to change a database schema, changes are
sometimes needed; such a change to the schema is called schema
evolution.
There are three levels of schemas:
1. Internal Schema:- describes the physical storage structures and access
paths; typically uses a physical data model
2. Conceptual Schema:- describes the structure and constraints of the
whole database for a community of users; uses a conceptual or an
implementation data model
3. External Schema:- describes the various user views
Schemas, Instances, and Database State …
Cont’d
Database State
The data in the database at a particular moment in time is called a
database state or snapshot.
It is also called the current set of occurrences or instances in the
database.
In a given database state, each schema construct has its own current set
of instances: for example, the STUDENT construct will contain the set
of individual student entities (records) as its instances.
Every time we insert or delete a record, or change the value of a data
item in a record, we change one state of the database into another state.
Schemas, Instances, and Database State …
Cont’d
When we define a new database, we specify its database schema only to
the DBMS. At this point, the corresponding database state is the empty
state with no data.
We get the initial state of the database when the database is first
populated or loaded with the initial data.
From then on, every time an update operation is applied to the database,
we get another database state. At any point in time, the database has a
current state
The DBMS is partly responsible for ensuring that every state of the
database is a valid state, i.e., a state that satisfies the structure and
constraints specified in the schema.
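This can be seen with a small runnable sketch, using Python's built-in sqlite3 module as a stand-in DBMS (the table, column names, and constraints below are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        id   TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER CHECK (age >= 0)  -- a constraint in the schema
    )""")

# This insert produces a valid database state:
conn.execute("INSERT INTO Student VALUES ('ICT 123', 'Rahel', 21)")

# An update that would violate the schema's constraints is rejected,
# so the database never enters an invalid state:
try:
    conn.execute("INSERT INTO Student VALUES ('ICT 124', 'Tola', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

After the failed insert the database state is unchanged: it still holds exactly the one valid row.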
DBMS
What is DBMS?
DBMS is a software package used for providing efficient, convenient
and safe multi-user storage of and access to massive amounts of
persistent data.
A DBMS also provides a systematic method for creating, updating,
storing, retrieving data in a database.
DBMS also provides the service of controlling data access, enforcing
data integrity, managing concurrency control, and recovery.
A full scale DBMS should at least have the following services to
provide to the user.
Data storage, retrieval and update in the database
A user accessible catalogue
Transaction support service
DBMS … Cont’d
Concurrency Control Services: access of database by different
users simultaneously
Recovery Services: a mechanism for recovering from failure
Authorization Services (Security): support the access authorization
Support for Data Communication: support data transfer
Integrity Services: rules about data and the change that took place
on the data, correctness and consistency of stored data
Services to promote data independency between the data and the
application
Utility services: sets of utility service facilities like
Importing data
Statistical analysis support
Index reorganization
Garbage collection
DBMS Language
1. Data Definition Language (DDL)
Language used to define each data element required by the
organization
Commands for setting up schema of the database
Used to set up a database, create, delete and alter table with the
facility of handling constraints
Is used to define the internal and external schema
2. Data Manipulation Language (DML)
Used for data manipulation
Typical manipulations include retrieval, insertion, deletion, and
modification of the data.
Since the required data or query by the user will be extracted using this
type of language, it is also called “Query Language”
DBMS Language … Cont’d
We have two types of DMLs:-
Non-Procedural Manipulation Languages
That allows the user to state what data is needed rather than how
it is to be retrieved.
E.g. SQL
Procedural Data Manipulation Languages
That allows the user to tell the system what data is needed and
exactly how to retrieve the data;
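The contrast can be shown side by side. A minimal sketch using Python's sqlite3 module (the Staff table and its rows are hypothetical): the declarative query states only *what* is wanted, while the procedural version spells out *how* to scan and test each record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffno TEXT, name TEXT, salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [("S1", "Abebe", 4000), ("S2", "Rahel", 6000), ("S3", "Tola", 7000)])

# Non-procedural (declarative): state WHAT data is needed.
declarative = {r[0] for r in
               conn.execute("SELECT staffno FROM Staff WHERE salary > 5000")}

# Procedural: spell out HOW to retrieve it, record by record.
procedural = set()
for staffno, name, salary in conn.execute("SELECT * FROM Staff"):
    if salary > 5000:
        procedural.add(staffno)
```

Both produce the same answer; the difference is who decides the access strategy, the DBMS or the programmer.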
DBMS Language … Cont’d
How the Programmer Sees the DBMS
Start with DDL to create tables
CREATE TABLE Students (
Name CHAR (30),
ID CHAR (9) PRIMARY KEY NOT NULL,
Category CHAR (20)) . . .
Continue with DML to populate tables:
INSERT INTO Students
VALUES ('Rahel', 'ICT 123', 'undergraduate')
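The same DDL-then-DML sequence can be run end to end; the sketch below uses Python's sqlite3 module purely as a stand-in DBMS (the slides do not assume any particular system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: set up the schema.
conn.execute("""
    CREATE TABLE Students (
        Name     CHAR(30),
        ID       CHAR(9) PRIMARY KEY NOT NULL,
        Category CHAR(20)
    )""")

# DML: populate the table.
conn.execute("INSERT INTO Students VALUES ('Rahel', 'ICT 123', 'undergraduate')")

# DML: retrieve (query) the data just stored.
row = conn.execute(
    "SELECT Name, Category FROM Students WHERE ID = 'ICT 123'").fetchone()
```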
3. Data dictionary
The data dictionary contains definitions of objects in the system such
as tables and table relationships and rules defined on objects.
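Most systems expose the data dictionary as ordinary queryable tables. In SQLite, for example, the built-in catalog table `sqlite_master` stores the definitions of the objects in the database (the Student table below is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (id TEXT PRIMARY KEY, name TEXT)")

# The catalog records each object's type, name, and defining SQL text:
entry = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'Student'"
).fetchone()
```

This is the "user accessible catalogue" service listed earlier: the DBMS describes its own contents using the same query language it offers for data.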
Database Development Life Cycle
The major steps in database design are;
1. Planning: identifying the information gap in an organization and
proposing a database solution to solve the problem.
2. Analysis: That concentrates more on fact finding about the problem
or the opportunity.
Feasibility analysis, requirement determination and structuring, and
selection of best design method are also performed at this phase.
3. Design: in database designing more emphasis is given to this phase.
The phase is further divided into three sub-phases.
1. Conceptual Design: concise description of the data, data
type, relationship between data and constraints on the data.
Used to elicit and structure all information requirements
Database Development Life Cycle … Cont’d
2. Logical Design: maps the conceptual design onto a selected specific
data model to implement the data structure.
It is independent of any particular DBMS and of other physical
considerations.
3. Physical Design: physical implementation of the upper level design of
the database with respect to internal storage and file structure to
develop all technology and organizational specification.
4. Implementation: coding, testing and deployment of the designed
database for use.
5. Operation and Support: administering and maintaining the operation
of the database system and providing support to users.
DBMS Architecture and Data Independence
A major aim of a database system is to provide users with an abstract
view of data, hiding certain details of how data is stored and
manipulated.
Since a database is a shared resource, each user may require a different
view of the data held in the database. Accordingly there are several
types of architectures of database systems.
The American National Standards Institute/Standards Planning and
Requirements Committee (ANSI-SPARC) also introduced the three
level architecture of the database based on their degree of abstraction.
The architecture consists of three levels: the internal level, the
conceptual level and the external level.
In this architecture, schemas can be defined at three levels
The goal of the three-schema architecture is to separate the user
applications and the physical database.
DBMS Architecture … Cont’d
1. The Internal level:
Has an internal schema, which describes the physical storage structure of
the database.
The internal schema uses a physical data model and describes the
complete details of data storage and access paths for the database.
It describes the physical representation of the database on the computer.
This level describes how the data is stored in the database.
The way the DBMS and OS perceive the data
The internal level is concerned with such things as:
Storage space allocation for data
Record description for storage
Record placement
DBMS Architecture ... Cont’d
2. The conceptual level
Has a conceptual schema, which describes the structure of the whole
database for a community of users.
The conceptual schema hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user
operations, constraints, security and integrity information.
A high-level data model or an implementation data model can be used at
this level.
The community view of the database.
This level describes what data is stored in the database and the
relationships among the data.
It is a complete view of the data requirements of the organization.
DBMS Architecture … Cont’d
3. The External Level
Includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular
user group is interested in and hides the rest of the database from that
user group.
The users’ view of the database.
The way users perceive the data
Describe part of the database that is relevant to each user.
Each user has a view of the real world in different way. For example:
dates may be viewed in (day, month, year) or (year, month, day)
Entities, attributes or relationships that are not of interest to the users
may still be represented in the database, but the users will be unaware of
them.
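Two external schemas over one conceptual schema can be sketched with SQL views, here via Python's sqlite3 module (the Staff columns follow the example diagram later in this chapter; the row values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full Staff relation.
conn.execute("CREATE TABLE Staff (staff_no TEXT, fname TEXT, lname TEXT, "
             "age INTEGER, sal REAL, br_no TEXT)")
conn.execute("INSERT INTO Staff VALUES ('S1', 'Abebe', 'Bekele', 30, 5000, 'B1')")

# External level: each user group sees only the part relevant to it.
conn.execute("CREATE VIEW PayrollView AS "
             "SELECT staff_no, fname, lname, age, sal FROM Staff")
conn.execute("CREATE VIEW BranchView AS "
             "SELECT staff_no, lname, br_no FROM Staff")

payroll_row = conn.execute("SELECT * FROM PayrollView").fetchone()
branch_row  = conn.execute("SELECT * FROM BranchView").fetchone()
```

The branch view hides the salary entirely: users of that view are unaware the attribute even exists.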
DBMS Architecture … Cont’d
ANSI-SPARC Architecture and Database Design Phases
DBMS Architecture … Cont’d
External level (two user views):
View 1: Sno., fname, lname, age, Sal.
View 2: Staff_no, lname, Br.no.
Internal level (C record structure):
struct STAFF {
    int staff_no;
    char fname[15];
    char lname[15];
    struct date date_of_birth;
    float sal;
    struct STAFF *next;
};
DBMS Architecture … Cont’d
Example of data abstraction
Data Independence
The three-schema architecture can be used to explain the concept of data
independence.
Data independence is defined as the capacity to change the schema at
one level of a database system without having to change the schema at
the next higher level.
We can define two types of data independence:
Physical data independence
Logical data independence
Data Independence … Cont’d
1. Logical data independence
Is the capacity to change the conceptual schema without having to
change external schemas or application programs.
We may change the conceptual schema to expand the database (by
adding a record type or data item), or to reduce the database (by
removing a record type or data item).
Only the view definition and the mappings need be changed in a
DBMS that supports logical data independence.
Modifications at the logical level are necessary whenever the logical
structure of the database is altered (for example, when money
market accounts are added to a banking system).
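A runnable sketch of the idea, using a view as the external schema and Python's sqlite3 module as a stand-in DBMS (the Account table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (acct_no TEXT, balance REAL)")
conn.execute("INSERT INTO Account VALUES ('A1', 100.0)")

# The application works only through its external view:
conn.execute("CREATE VIEW AccountView AS SELECT acct_no, balance FROM Account")

# The conceptual schema expands: a new data item is added...
conn.execute("ALTER TABLE Account ADD COLUMN acct_type TEXT")

# ...but the view definition, and hence the application, is unchanged:
row = conn.execute("SELECT * FROM AccountView WHERE acct_no = 'A1'").fetchone()
```

The application still sees exactly the two columns it always saw; only the mapping from view to base table absorbed the change.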
Data Independence … Cont’d
2. Physical data independence
Is the capacity to change the internal schema without having to change
the conceptual (or external) schemas.
Changes to the internal schema may be needed because some physical
files had to be reorganized.
Modifications at the physical level are occasionally necessary to
improve performance.
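A small sketch of physical data independence, again with Python's sqlite3 module (table and index names hypothetical): adding an access path at the internal level leaves the query, and its answer, untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staff_no TEXT, lname TEXT)")
conn.execute("INSERT INTO Staff VALUES ('S1', 'Bekele')")

query = "SELECT lname FROM Staff WHERE staff_no = 'S1'"
before = conn.execute(query).fetchone()

# Reorganize the physical level: add an access path (an index).
conn.execute("CREATE INDEX idx_staff_no ON Staff (staff_no)")

# The conceptual and external levels are untouched; the same query still works.
after = conn.execute(query).fetchone()
```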
Chapter 3
Database Models
Outlines
Introduction to Modeling
Relational Model
Database Models
A specific DBMS has its own specific Data Definition Language,
but this type of language is too low level to describe the data
requirements of an organization in a way that is readily
understandable by a variety of users.
We need a higher-level language.
Such a higher-level language is called a data model.
A data model is a collection of tools or concepts for describing
data, the meaning of data, data relationships, and data
constraints.
A model is a representation of real world objects and events and
their associations.
The main purpose of Data Model is to represent the data in
understandable way.
Database Models … Cont’d
A database model is a type of data model that determines the
logical structure of a database and fundamentally determines in
which manner data can be stored, organized, and manipulated.
Hierarchical Model
Consists of an ordered set of trees in a parent-child form.
A parent node can have more than one child node, but a child node
can have only one parent
Connection between child and its parent is called a Link.
The simplest data model
Record type is referred to as node or segment
The top node is the root node
The relationship between parent and child can be either 1-1 or 1-M
To add new record type or relationship, the database must be
redefined and then stored in a new form.
Hierarchical Model… Cont’d
Advantage of Hierarchical Model
Good for tree type problem (e.g. Family Tree Problem)
Language is simple; uses constructs like GET, GET UNIQUE, GET
NEXT, GET NEXT WITHIN PARENT etc.
Network Model
Allows record types to have more than one parent, unlike the
hierarchical model
A network data model sees records as set members
Each set has an owner and one or more members
Does not allow direct many-to-many relationships between entities
Like hierarchical model network model is a collection of physically
linked records.
Allow member records to have more than one owner
Network Model … Cont’d
Advantage of Network Data Model:
Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
Can handle most situations for modeling using record types and
relationship types.
Language is navigational; uses constructs like FIND, FIND member,
FIND owner, FIND NEXT within set, GET etc.
Object Oriented Model
The OO approach of defining objects that can be used in many programs
is now also being applied to database systems.
An object can have properties (or attributes) but also behaviour, which
is modelled in methods (functions) in the object.
A class also has methods that are stored with the class definition.
Object Oriented Model … Cont’d
One advantage of the OO model is sub-classes. As there are different
types of account, they can be modelled as sub-classes of the Account
class.
This makes sense because the different account types have some
different behaviour e.g. gaining interest in a savings account but
some behaviour the same e.g. lodging or withdrawing cash. This is
the inheritance concept of OO programming.
Object Oriented Model … Cont’d
Diagram – class name at the top, properties in the middle, methods at the bottom.
Relational Model
Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A
Relational Model of Data for Large Shared Data Banks')
Terminologies originates from the branch of mathematics called set
theory and relation
Can define more flexible and complex relationship
Viewed as a collection of tables called “Relations” equivalent to
collection of record types
Relation: Two dimensional table
Stores information or data in the form of tables ( rows and columns)
A row of the table is called tuple which is equivalent to record
A column of a table is called attribute which is equivalent to fields
Data value is the value of the Attribute
Relational Model … Cont’d
Records are related by the data stored jointly in the fields of records in
two tables or files. The related tables contain information that creates
the relation
The tables seem to be independent but are related somehow.
No physical consideration of the storage is required by the user
Many tables are merged together to come up with a new virtual view
of the relationship
Relational Model … Cont’d
Relational Data model (also called the second generation data
model), describes entities and their relationships in the form of table
Relational Data Model
Properties of Relational Databases
Each row of a table is uniquely identified by a PRIMARY KEY
composed of one or more columns
Each tuple in a relation must be unique
Group of columns, that uniquely identifies a row in a table is called a
CANDIDATE KEY
ENTITY INTEGRITY RULE of the model states that no component
of the primary key may contain a NULL value.
A column or combination of columns that matches the primary key of
another table is called a FOREIGN KEY.
FOREIGN KEY is Used to cross-reference tables.
Properties of Relational Databases … Cont’d
The REFERENTIAL INTEGRITY RULE of the model states that,
for every foreign key value in a table there must be a corresponding
primary key value in another table in the database or it should be
NULL.
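The rule can be demonstrated end to end. A sketch using Python's sqlite3 module (note SQLite enforces foreign keys only when the pragma is switched on; the Department/Employee schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE Department (dept_no TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE Employee (emp_no TEXT PRIMARY KEY, "
             "dept_no TEXT REFERENCES Department(dept_no))")

conn.execute("INSERT INTO Department VALUES ('D1')")
conn.execute("INSERT INTO Employee VALUES ('E1', 'D1')")  # matches a PK: allowed
conn.execute("INSERT INTO Employee VALUES ('E2', NULL)")  # NULL FK: also allowed

try:
    conn.execute("INSERT INTO Employee VALUES ('E3', 'D9')")  # no such department
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

Exactly the two cases the rule permits succeed; the dangling reference is rejected.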
VIEWS are derived from BASE TABLES with SQL instructions like:
[SELECT .. FROM .. WHERE .. ORDER BY]
Properties of Relational Databases … Cont’d
Database is the collection of tables
Each entity in one table
Attributes are fields (columns) in table
Order of rows and columns is immaterial
Entries with repeating groups are said to be un-normalized
Entries are single-valued
Each column (field or attribute) has a distinct name
All values in a column represent the same attribute and have the same
data format
Building Blocks of the Relational Data Model
ENTITIES
The ENTITIES (persons, places, things etc.) which the organization
has to deal with.
ATTRIBUTES … Cont’d
At this level we need to know such things as:
Attribute name (be explanatory words or phrases)
Domain: is a set of values from which attribute values may be taken.
Each attribute has values taken from a domain.
For example, the domain of Name is string and that for salary is real
Whether the attribute is part of the entity identifier (attributes which just
describe an entity and those which help to identify it uniquely)
Whether it is permanent or time-varying (which attributes may change their
values over time)
Whether it is required or optional for the entity (whose values will sometimes
be unknown or irrelevant)
Types of Attributes
1. Simple (Atomic) Vs Composite Attribute
Simple : contains a single value (not divided into sub parts)
E.g. Age, gender
Composite: Divided into sub parts (composed of other attributes)
E.g. Name, address
2. Single-valued Vs multi-valued Attributes
Single-valued : have only single value(the value may change but
has only one value at one time)
E.g. Name, Sex, Id. No., color_of_eyes
Multi-Valued: have more than one value
E.g. Address, dependent-name
Person may have several college degrees
Types of Attributes … Cont’d
3. Stored Vs. Derived Attribute
Stored : not possible to derive or compute E.g. Name, Address
Derived: The value may be derived (computed) from the values of
other attributes.
E.g. Age (current year – year of birth), Length of employment (current date-
start date), Profit (earning-cost) , G.P.A (grade point/credit hours)
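The derived-attribute formulas above can be computed directly. A minimal sketch (the stored values are hypothetical; the current year matches this deck's date):

```python
# Stored attributes: kept in the database.
stored = {"name": "Rahel", "year_of_birth": 2000,
          "grade_points": 45, "credit_hours": 15}

current_year = 2023

# Derived attributes: computed from stored ones rather than stored themselves.
age = current_year - stored["year_of_birth"]            # current year - year of birth
gpa = stored["grade_points"] / stored["credit_hours"]   # grade points / credit hours
```

Storing `age` as well would duplicate information and risk inconsistency; deriving it on demand keeps one source of truth.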
4. Null Values
NULL applies to attributes which are not applicable or which do not
have values.
You may enter the value NA (meaning not applicable)
Value of a key attribute can not be null.
RELATIONSHIPS
Related entities require setting of LINKS from one part of the database
to another.
A relationship should be named by a word or phrase which explains its
function
Role names are different from the names of entities forming the
relationship: one entity may take on many roles, the same role may be
played by different entities.
An important point about a relationship is how many entities
participate in it.
The number of entities participating in a relationship is called the
DEGREE of the relationship.
UNARY/RECURSIVE RELATIONSHIP: Single entity
BINARY RELATIONSHIPS: Two entities associated
TERNARY RELATIONSHIP: Three entities associated
N-NARY RELATIONSHIP: arbitrary number of entity sets
RELATIONSHIPS … Cont’d
Another important point about a relationship is the range of instances
that can be associated with a single instance from one entity in a single
relationship.
The number of instances participating or associated with a single
instance from another entity in a relationship is called the
CARDINALITY of the relationship.
ONE-TO-ONE, e.g. Building - Location,
ONE-TO-MANY, e.g. hospital - patient,
MANY-TO-ONE, e.g. Employee - Department
MANY-TO-MANY, e.g. Author - Book.
Relational Constraints/Integrity Rules
Relational Integrity
Domain Integrity: No value of the attribute should be beyond the
allowable limits
Entity Integrity: In a base relation, no attribute of a primary key can
be null
Referential Integrity: If a foreign key exists in a relation, either the
foreign key value must match a candidate key value in its home relation
or the foreign key value must be null (foreign-key-to-primary-key
match-ups)
Enterprise Integrity: Additional rules specified by the users or
database administrators of a database are incorporated
Relational Constraints/Integrity Rules .. Cont’d
Key constraints
If tuples need to be unique in the database, then we need to
make each tuple distinct. To do this we use relational keys.
Super Key: an attribute or set of attributes that uniquely identifies a
tuple within a relation.
Candidate Key: a super key such that no proper subset of that
collection is a Super Key within the relation.
A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a candidate key consists of more than one attribute it is called
composite key.
Relational Constraints/Integrity Rules .. Cont’d
Primary Key: the candidate key that is selected to identify tuples
uniquely within the relation.
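The definitions can be checked by brute force on a small relation. A sketch (the tuples below are hypothetical): an attribute set is a super key if its projected values are unique across tuples, and a candidate key is a super key with no reducible attribute.

```python
rows = [
    {"staffno": "S1", "name": "Abebe", "sex": "M"},
    {"staffno": "S2", "name": "Rahel", "sex": "F"},
    {"staffno": "S3", "name": "Abebe", "sex": "M"},
]

def is_superkey(attrs):
    """True if the attribute set's values uniquely identify every tuple."""
    seen = [tuple(r[a] for a in attrs) for r in rows]
    return len(seen) == len(set(seen))

# {staffno, name} is a super key, but it is reducible...
assert is_superkey(("staffno", "name"))
# ...because {staffno} alone is already unique: a candidate key.
assert is_superkey(("staffno",))
# {name} is not a key: two staff members share a name.
assert not is_superkey(("name",))
```

Since `staffno` is the only single-attribute candidate key here, it is the natural choice of primary key.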
Relational languages and views
The languages in relational DBMS are DDL and DML.
There are two kinds of relations in a relational database. The
difference is in how the relation is created, used and updated:
1. Base Relation
A Named Relation corresponding to an entity in the conceptual
schema, whose tuples are physically stored in the database.
2. View
Is the dynamic result of one or more relational operations operating on
the base relations to produce another, virtual relation.
So a view is a derived virtual relation that does not necessarily
exist in the database but can be produced upon request by a particular
user, at the time of request.
32
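The base-relation/view distinction can be seen directly with Python's built-in sqlite3 module. This is an illustrative sketch (the table, column names, and sample rows are assumptions): the view stores no tuples of its own and is re-evaluated from the base relation at query time.

```python
# Sketch using sqlite3: a view is a named query over a base relation.
import sqlite3

con = sqlite3.connect(":memory:")
# Base relation: tuples are physically stored.
con.execute("CREATE TABLE employee (empid INTEGER PRIMARY KEY,"
            " name TEXT, salary INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "Abebe", 2000), (2, "Kebede", 1200)])

# View: a virtual relation produced on demand from the base relation.
con.execute("CREATE VIEW high_paid AS"
            " SELECT name FROM employee WHERE salary > 1500")
rows = [r[0] for r in con.execute("SELECT name FROM high_paid")]
```

Querying the view here returns only the employees currently satisfying the defining condition; inserting new base tuples would change the view's result without any further definition.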
Relational languages and views … Cont’d
Purpose of a view
33
Chapter 4
1
Outlines
Using High level Data Models for Database Design
Entity types and Sets, Attributes and Keys
Relationships, Roles and Structural Constraints
Weak Entity Types
Database Abstraction
E/R Diagram naming conventions, and Design issues
2
Database Design
Database design consists of several tasks:
Requirements Analysis,
Conceptual Design, and Schema Refinement,
Logical Design,
Physical Design and Tuning
In general, one has to go back and forth between these tasks to refine
a database design, and decisions in one task can influence the choices
in another task.
In developing a good design, one should ask:
What are the important queries and updates?
What attributes/relations are involved?
3
The Three levels of Database Design
Constructing a model independent of any physical
Conceptual considerations.
Design After the completion of Conceptual Design one has to go for
refinement of the schema, which is verification of Entities,
Attributes, and Relationships
5
The Entity Relationship (E-R) Model
Entity-Relationship modeling is used to represent conceptual view of
the database
The main components of ER Modeling are:
Entities
Corresponds to entire table, not row
Represented by Rectangle
Attributes
Represents the property used to describe an entity or a relationship
Represented by Oval
Relationships
Represents the association that exist between entities
Represented by Diamond
Constraints
Represent the constraint in the data
6
The Entity Relationship (E-R) Model … Cont’d
Before working on the conceptual design of the database, one has to
know and answer the following basic questions.
What are the entities and relationships in the enterprise?
What information about these entities and relationships should we
store in the database?
What are the integrity constraints that hold?
Constraints on each data with respect to update, retrieval and store.
Represent this information pictorially in ER diagrams, then map ER
diagram into a relational schema
7
Developing an E-R Diagram
Designing the conceptual model for the database is not a linear
process but an iterative activity where the design is refined again
and again.
To identify the entities, attributes, relationships, and constraints on
the data, there are different methods used during the analysis
phase.
These include information gathered by…
Interviewing end users individually and in a group
Questionnaire survey
Direct observation
8
Developing an E-R Diagram … Cont’d
The basic E-R model is graphically depicted and presented for review.
The process is repeated until the end users and designers agree that the
E-R diagram is a fair representation of the organization’s activities and
functions.
Checking for Redundant Relationships in the ER Diagram.
Relationships between entities indicate access from one entity to
another.
It is therefore possible to access one entity occurrence from another
entity occurrence even if there are other entities and relationships that
separate them.
This is often referred to as 'Navigation' of the ER diagram
The last phase in ER modeling is validating an ER Model against
requirement of the user.
9
Graphical Representations in ER Diagramming
Entity is represented by a RECTANGLE containing the name of the
entity.
10
Graphical Representations in ER Diagramming .. Cont’d
A derived attribute is indicated by a DOTTED LINE.
11
Graphical Representations in ER Diagramming .. Cont’d
Example : Build an ER Diagram for the following information:
Students
Have an Id, Name, Dept, Age, Gpa
Courses
Have an Id, Name, Credit Hours
Students enroll in courses and receive a grade
12
Graphical Representations in ER Diagramming .. Cont’d
13
Entity versus Attributes
Consider designing a database of employees for an organization:
Should address be an attribute of Employees or an entity (connected
to Employees by a relationship)?
If we have several addresses per employee, address must be an entity
(attributes cannot be set-valued/multi-valued)
If the structure (city, Woreda, Kebele, etc.) is important,
e.g. if we want to retrieve employees in a given city, address must be
modeled as an entity (attribute values are atomic)
Cardinality on a Relationship expresses the number of entity
occurrences/tuples associated with one occurrence/tuple of the related
entity.
14
Entity versus Attributes … Cont’d
Existence Dependency: the dependence of an entity on the existence
of one or more entities.
Weak entity: an entity that cannot exist without the entity with
which it has a relationship
A participating entity in a relationship is either optional or
mandatory.
15
Structural Constraints on Relationship
1. Constraints on Relationships / Multiplicity / Cardinality Constraints
A multiplicity constraint is the number or range of possible occurrences
of an entity type/relation that may relate to a single occurrence/tuple of
another entity type/relation through a particular relationship.
Mostly used to ensure appropriate enterprise constraints.
One-to-one relationship:
A customer is associated with at most one loan via the relationship
borrower
A loan is associated with at most one customer via borrower
16
Structural Constraints on Relationship .. Cont’d
17
Structural Constraints on Relationship .. Cont’d
18
Structural Constraints on Relationship .. Cont’d
One-to-Many Relationships
In the one-to-many relationship a loan is associated with at most one
customer via borrower, a customer is associated with several
(including 0) loans via borrower
19
Structural Constraints on Relationship .. Cont’d
20
Structural Constraints on Relationship .. Cont’d
• Many-To-Many Relationship
21
Structural Constraints on Relationship .. Cont’d
22
Participation of an Entity Set in a Relationship Set
Total participation: every entity in the entity set participates in at least
one relationship in the relationship set.
The entity with total participation will be connected with the
relationship using a double line.
E.g. : Participation of EMPLOYEE in “belongs to” relationship with
DEPARTMENT is total since every employee should belong to a
department.
23
Participation of an Entity Set in a Relationship Set .. Cont’d
E.g.: Participation of EMPLOYEE in “manages” relationship with
DEPARTMENT, DEPARTMENT will have total participation but not
EMPLOYEE
24
Problem in ER Modeling
The Entity-Relationship Model is a conceptual data model that views
the real world as consisting of entities and relationships.
The model visually represents these concepts by the Entity-
Relationship diagram.
While designing the ER model one may face problems in the
design called connection traps.
Connection traps are problems arising from misinterpreting certain
relationships
There are two types of connection traps;
1. Fan trap
2. Chasm trap
25
Fan trap
Occurs where a model represents a relationship between entity
types, but the pathway between certain entity occurrences is
ambiguous.
26
Fan trap … Cont’d
28
Chasm Trap
Occurs where a model suggests the existence of a relationship
between entity types, but the pathway does not exist between certain
entity occurrences.
May exist when there are one or more relationships with a minimum
multiplicity of zero forming part of the pathway
between related entities.
Example:
If we have a set of projects that are not currently active, then we cannot
assign a project manager to these projects. So there are projects with no
project manager, making the participation have a minimum value of
zero.
29
Chasm Trap … Cont’d
Problem:
How can we identify which BRANCH is responsible for which
PROJECT? We know that whether the PROJECT is active or not
there is a responsible BRANCH. But which branch is a question to
be answered, and since we have a minimum participation of zero
between EMPLOYEE and PROJECT we can't identify the BRANCH
responsible for each PROJECT.
The solution to this Chasm Trap problem is to add another
relationship between the extreme entities (BRANCH and PROJECT)
30
Chapter 5
Relational Algebra
1
Outlines
Relational Algebra
Relational Calculus
2
Relational Algebra
Relational algebra is a theoretical language with operations that work
on one or more relations to define another relation without changing
the original relation
The basic set of operations for the relational model is known as the
relational algebra.
These operations enable a user to specify basic retrieval requests.
The result of the retrieval is a new relation, which may have been
formed from one or more relations.
A sequence of relational algebra operations forms a relational algebra
expression, whose result will also be a relation that represents the
result of a database query (or retrieval request).
3
Relational Algebra … Cont’d
The output from one operation can become the input to another
operation (nesting is possible)
There are different basic operations that could be applied on relations
on a database based on the requirement.
Selection ( σ ): Selects a subset of rows from a relation.
Projection ( π ): Deletes unwanted columns from a relation.
Renaming ( ρ ): assigns a name to the result of an operation (an intermediate relation)
Cross-Product ( x ): Allows us to combine two relations.
Set-Difference ( - ): Tuples in relation1, but not in relation2.
Union (∪ ): Tuples in relation1 or in relation2.
Intersection (∩): Tuples in relation1 and in relation2
Join :Tuples joined from two relations based on a condition
4
Relational Algebra … Cont’d
Table1: Sample table used to illustrate different kinds of relational
operations. The relation contains information about employees, IT skills
they have and the school where they attend each skill.
5
Selection
Selects subset of tuples/rows in a relation that satisfy selection
condition.
Selection operation is a unary operator (it is applied to a single
relation)
The Selection operation is applied to each tuple individually
The degree of the resulting relation is the same as the original relation
but the cardinality (no. of tuples) is less than or equal to the original
relation.
The Selection operator is commutative.
Set of conditions can be combined using Boolean operations (∧(AND),
∨(OR), and ~(NOT))
No duplicates in result.
6
Selection … Cont’d
Result relation can be the input for another relational algebra
operation (Operator composition).
It is a filter that keeps only those tuples that satisfy a qualifying
condition i.e. those satisfying the condition are selected while others
are discarded.
Notation:
σ <Selection Condition> (<Relation Name>)
Example: Find all Employees with skill type of Database.
σ <SkillType = ”Database”> (Employee)
This query will extract every tuple, with all its attributes, from the
relation called Employee where the SkillType attribute has the value
“Database”.
7
Selection … Cont’d
The resulting relation will be the following.
Example: Find all Employees with SkillType “Database” and School
“Unity” using relational algebra.
σ <SkillType = ”Database” AND School = ”Unity”> (Employee)
8
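The selection above can be mirrored as a tuple-by-tuple filter. This is an illustrative Python sketch (the sample Employee tuples are assumptions, not the slides' Table 1): selection keeps whole tuples, so the degree of the result is unchanged.

```python
# Sketch of σ<SkillType="Database" AND School="Unity">(Employee)
# over made-up data; each tuple is tested individually.
employee = [
    {"Name": "Abebe",  "SkillType": "Database", "School": "Unity"},
    {"Name": "Kebede", "SkillType": "Database", "School": "AAU"},
    {"Name": "Lensa",  "SkillType": "Network",  "School": "Unity"},
]

def select(relation, condition):
    """Selection: keep only tuples satisfying the condition (degree unchanged)."""
    return [t for t in relation if condition(t)]

db_unity = select(employee, lambda t: t["SkillType"] == "Database"
                                      and t["School"] == "Unity")
```

The cardinality of `db_unity` is at most that of `employee`, matching the property stated on the slide.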
Projection
Selects certain attributes while discarding the other from the base
relation.
The PROJECT creates a vertical partitioning.
Deletes attributes that are not in projection list.
Schema of result contains exactly the fields in the projection list, with
the same names that they had in the (only) input relation.
Projection operator has to eliminate duplicates.
Note: real systems typically don’t do duplicate elimination unless
the user explicitly asks for it.
If the Primary Key is in the projection list, then duplication will not
occur
Duplicate removal is necessary to ensure that the resulting table is
also a relation.
9
Projection … Cont’d
Notation:
π <Selected Attributes> (<Relation Name>)
Example: To display Name, Skill, and Skill Level of an employee, the
query and the resulting relation will be:
π <FName, LName, Skill, Skill_Level> (Employee)
10
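Projection with duplicate elimination can be sketched as follows. This is an illustrative Python example (the sample rows are assumptions): only the listed attributes are kept, and duplicate result tuples are removed so that the result is itself a relation.

```python
# Sketch of π<FName, Skill>(Employee) over made-up data,
# including the duplicate elimination the slide describes.
employee = [
    {"FName": "Abebe",  "Skill": "SQL",  "SkillLevel": 6},
    {"FName": "Abebe",  "Skill": "SQL",  "SkillLevel": 6},   # duplicate source row
    {"FName": "Kebede", "Skill": "Java", "SkillLevel": 8},
]

def project(relation, attrs):
    """Keep only the listed attributes and drop duplicate result tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(row))
    return result

name_skill = project(employee, ["FName", "Skill"])
```

Here two source rows collapse into one result tuple; if the primary key were in the projection list, no duplicates could arise.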
Projection … Cont’d
Exercise: Write a relational operation that displays the Name, Skill and
Skill Level of employees with Skill SQL and SkillLevel greater than
5.
11
Rename Operation
Allows us to name, and therefore to refer to, the results of relational
algebra expressions.
Allows us to refer to a relation by more than one name.
Example: ρ x (E) returns the result of expression E under the name x
We may want to apply several relational algebra operations one after
the other. The query could be written in two different forms:
1. Write the operations as a single relational algebra
expression by nesting the operations.
2. Apply one operation at a time and create intermediate result
relations.
In the latter case, we must give names to the relations that hold the
intermediate results.
12
Rename Operation … Cont’d
If we want to have the Name, Skill, and Skill Level of an employee
with salary greater than 1500 and working for department 5, we
can write the expression for this query using the two alternatives:
• Then the Result will be equivalent to the relation we get using the
first alternative.
13
UNION Operation
The result of this operation, denoted by R U S, is a relation that includes
all tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.
The two operands must be “type compatible”.
Type Compatibility
The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must
have the same number of attributes, and the domains of corresponding
attributes must be compatible; that is, Dom(Ai)=Dom(Bi) for i=1, 2,.. , n.
The resulting relation for;
R1 ∪ R2,
R1 ∩ R2, or
R1 - R2 has the same attribute names as the first operand relation R1
(by convention).
14
UNION Operation … Cont’d
Example
15
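The set operators on two type-compatible relations can be sketched with Python sets of tuples. This is an illustrative example (the relations are assumptions): both operands have the same number of attributes with compatible domains, and duplicates are eliminated automatically.

```python
# Sketch of the set operators over two made-up, type-compatible relations,
# each tuple being (Name, Dept).
r1 = {("Abebe", "IT"), ("Kebede", "CS")}
r2 = {("Kebede", "CS"), ("Lensa", "IT")}

union = r1 | r2          # tuples in r1 or r2 (or both), no duplicates
intersection = r1 & r2   # tuples in both r1 and r2
difference = r1 - r2     # tuples in r1 but not in r2
```

Note that `r1 - r2` and `r2 - r1` differ, previewing the non-commutativity of minus discussed below, while union and intersection are commutative.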
INTERSECTION Operation
The result of this operation, denoted by R ∩ S, is a relation that
includes all tuples that are in both R and S.
The two operands must be "type compatible".
16
Set Difference (or MINUS) Operation
The result of this operation, denoted by R - S, is a relation that
includes all tuples that are in R but not in S.
The two operands must be "type compatible”.
Some Properties of the Set Operators
Notice that both union and intersection are commutative
operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R
Both union and intersection can be treated as n-ary operations
applicable to any number of relations as both are associative
operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)
The minus operation is not commutative; that is, in general
R − S ≠ S − R
17
Set Difference (or MINUS) Operation … Cont’d
18
CARTESIAN (cross product) Operation
This operation is used to combine tuples from two relations in a
combinatorial fashion.
That means, every tuple in Relation1 (R) will be combined with
every tuple in Relation2 (S).
In general, the result of R(A1, A2, . . ., An) x S(B1,B2, . . ., Bm) is a
relation Q with degree n + m attributes Q(A1, A2, . . ., An, B1, B2, . .
., Bm), in that order.
Where R has n attributes and S has m attributes.
The resulting relation Q has one tuple for each combination of tuples
i.e. one from R and one from S.
Hence, if R has n tuples, and S has m tuples, then | R x S | will have
n* m tuples.
The two operands do NOT have to be "type compatible”
19
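The n * m tuple count can be checked with a small sketch. This is an illustrative Python example (the two relations are assumptions): `itertools.product` generates every combination, and each result tuple has degree n + m.

```python
# Sketch of R x S over made-up relations: R(EmpId, Name) has 2 attributes
# and 2 tuples, S(Skill) has 1 attribute and 3 tuples.
from itertools import product

r = [(1, "Abebe"), (2, "Kebede")]
s = [("SQL",), ("Java",), ("C++",)]

# Each combined tuple concatenates one tuple of R with one tuple of S.
cross = [rt + st for rt, st in product(r, s)]
```

With 2 tuples in R and 3 in S, the result has 2 * 3 = 6 tuples, each of degree 2 + 1 = 3, and no type compatibility between R and S is required.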
CARTESIAN (cross product) Operation … Cont’d
Example
20
CARTESIAN (cross product) Operation … Cont’d
21
JOIN Operation
The sequence of a Cartesian product followed by a select is used quite
commonly to identify and select related tuples from two relations.
This special operation is called JOIN.
The JOIN operation is denoted by the ⋈ symbol.
This operation is very important for any relational database with more
than a single relation, because it allows us to process relationships
among relations.
The general form of a join operation on two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
R ⋈ <join condition> S
Where R and S can be any relations that result from general relational
algebra expressions
22
JOIN Operation … Cont’d
Since JOIN operates on two relations, it is a binary operation.
The type of JOIN called THETA JOIN (θ-JOIN) uses θ as the
comparison operator in the join condition.
θ could be { <, ≤ , >, ≥, ≠, = }
In a theta join, tuples whose join attributes are null do not appear in the
result.
23
JOIN Operation … Cont’d
24
JOIN Operation … Cont’d
25
EQUIJOIN Operation
The most common use of join involves join conditions with equality
comparisons only ( = ).
Such a join, where the only comparison operator used is =, is called an
EQUIJOIN.
26
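An equijoin as select-over-Cartesian-product can be sketched with a nested loop. This is an illustrative Python example (the two relations and the department-id join attribute are assumptions): only combined tuples whose join attributes are equal survive.

```python
# Sketch of an EQUIJOIN over made-up relations:
# employee(EmpId, Name, DeptId) joined with department(DeptId, DeptName)
# on equality of DeptId — i.e. select applied to the Cartesian product.
employee = [(1, "Abebe", "d01"), (2, "Kebede", "d02")]
department = [("d01", "IT"), ("d03", "Stat")]

equijoin = [e + d
            for e in employee
            for d in department
            if e[2] == d[0]]   # the theta here is "="
```

Only the Abebe/IT combination satisfies the condition; Kebede's tuple is dropped, which is exactly the behavior the outer join later repairs.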
NATURAL JOIN Operation
The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have the
same name in both relations.
If this is not the case, a renaming operation on the attributes is applied
first. The result of the natural join is the set of all combinations of
tuples in R and S that are equal on their common attribute names.
27
OUTER JOIN Operation
OUTER JOIN is another version of the JOIN operation in which non-
matching tuples from the first Relation are also included in the
resulting Relation; the attributes of the second Relation for a non-
matching tuple from the first Relation are given the value NULL.
An extension of the join operation that avoids loss of information.
Outer Join Can be:
Left Outer Join
Right Outer Join
Full Outer Join
28
Left Outer Join
The result of the left outer join is
the set of all combinations of
tuples in R and S that are equal on
their common attribute names, in
addition to tuples in R that have no
matching tuples in S.
29
Right Outer Join
The result of the right outer
join is the set of all
combinations of tuples in R and
S that are equal on their
common attribute names, in
addition to tuples in S that have
no matching tuples in R.
30
Full Outer Join
The result of the full outer join is
the set of all combinations of
tuples in R and S that are equal on
their common attribute names, in
addition to tuples in S that have no
matching tuples in R and tuples in
R that have no matching tuples in
S in their common attribute names
31
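The left outer join described above can be sketched as follows. This is an illustrative Python example (relations and join attribute are assumptions): matching tuples are combined as in the equijoin, and non-matching tuples of the first relation are kept, padded with None standing in for NULL.

```python
# Sketch of a LEFT OUTER JOIN over made-up relations; Kebede's department
# has no match, so his tuple is NULL-padded rather than lost.
employee = [(1, "Abebe", "d01"), (2, "Kebede", "d02")]
department = [("d01", "IT")]

left_outer = []
for e in employee:
    matches = [d for d in department if e[2] == d[0]]
    if matches:
        left_outer.extend(e + d for d in matches)
    else:
        left_outer.append(e + (None, None))   # no information from R is lost
```

Swapping the roles of the two relations gives the right outer join; doing both paddings at once gives the full outer join.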
SEMIJOIN Operation
SEMIJOIN is another version of the JOIN operation where the
resulting Relation contains only the tuples (with the attributes) of the
first Relation that are related to tuples in the second Relation.
32
Relational Calculus
A relational calculus expression creates a new relation, which is
specified in terms of variables that range over rows of the stored
database relations (in tuple calculus) or over columns of the stored
relations (in domain calculus).
In a calculus expression, there is no order of operations to specify how
to retrieve the query result.
A calculus expression specifies only what information the result should
contain rather than how to retrieve it.
In Relational calculus, there is no description of how to evaluate a
query, this is the main distinguishing feature between relational algebra
and relational calculus.
Relational calculus is considered to be a non procedural language.
33
Relational Calculus … Cont’d
This differs from relational algebra, where we must write a sequence
of operations to specify a retrieval request.
Hence relational algebra can be considered as a procedural way of
stating a query.
When applied to relational databases, the calculus is not that of
derivatives and differentials but a form of first-order logic or
predicate calculus
A predicate is a truth-valued function with arguments.
When we substitute values for the arguments in the predicate, the
function yields an expression, called a proposition, which can be
either true or false.
34
Relational Calculus … Cont’d
If a predicate contains a variable, as in ‘x is a member of staff’, there
must be a range for x. When we substitute some values of this range for
x, the proposition may be true; for other values, it may be false.
If COND is a predicate, then the set of all tuples that evaluate to true
for the predicate COND is expressed as follows:
{t | COND(t)}
Where t is a tuple variable and COND (t) is a conditional expression
involving t. The result of such a query is the set of all tuples t that
satisfy COND (t).
If we have set of predicates to evaluate for a single query, the predicates
can be connected using ∧(AND), ∨(OR), and ~(NOT)
35
Tuple-oriented Relational Calculus
The tuple relational calculus is based on specifying a number of tuple
variables.
Tuple relational calculus is interested in finding tuples for which a
predicate is true for a relation.
Based on use of tuple variables.
Tuple variable is a variable that ‘ranges over’ a named relation: that is,
a variable whose only permitted values are tuples of the relation.
If E is a tuple that ranges over a relation employee, then it is
represented as EMPLOYEE(E) i.e. Range of E is EMPLOYEE
Then to extract all tuples that satisfy a certain condition, we will
represent it as all tuples E such that COND(E) is evaluated to be true.
36
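The declarative form {t | COND(t)} maps almost directly onto a comprehension. This is an illustrative Python sketch (the Employee tuples are assumptions): the expression states which tuples belong to the result, not how to retrieve them.

```python
# Sketch of {E | EMPLOYEE(E) AND E.SkillLevel >= 8} read as a
# comprehension over a made-up EMPLOYEE relation: E ranges over the
# relation, and the condition decides membership in the result.
employee = [
    {"EmpId": 1, "FName": "Abebe",  "SkillLevel": 9},
    {"EmpId": 2, "FName": "Kebede", "SkillLevel": 5},
]

result = [e for e in employee if e["SkillLevel"] >= 8]
```

The comprehension body is the predicate COND(E); no sequence of algebra operations is spelled out, which is the non-procedural character the slides contrast with relational algebra.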
Tuple-oriented Relational Calculus … Cont’d
{E | COND(E)}
The predicates can be connected using the Boolean operators:
∧ (AND), ∨ (OR), ∼ (NOT)
COND(t) is a formula, and is called a Well-Formed Formula (WFF),
where COND is composed of n predicates (a formula
composed of n single predicates) and the predicates are
connected by any of the Boolean operators.
38
Tuple-oriented Relational Calculus … Cont’d
Exercise: Find the EmpId, FName, LName, Skill and the School where the
skill was attended, for employees with skill level greater than or equal
to 8.
39
Quantifiers in Relation Calculus
To tell how many instances the predicate applies to, we can use the
two quantifiers of predicate logic.
A relational calculus expression written using the Existential Quantifier
can also be expressed using the Universal Quantifier.
40
Quantifiers in Relation Calculus … Cont’d
2. Universal quantifier ∀ (‘for all’)
Universal quantifier is used in statements about every instance, such
as:
An employee with skill level greater than or equal to 8 will be:
{E | Employee(E) ∧ (∀E)(E.SkillLevel >= 8)}
This means, for all tuples of relation employee where value for the
SkillLevel attribute is greater than or equal to 8.
41
Chapter 6
2
What is SQL ?
SQL stands for Structured Query Language
Is a non-procedural language, i.e. you can specify what information
you require, rather than how to get it.
Is an ANSI standard computer language
Allow you to access a database
SQL can execute queries against a database
Is easy to learn
Has two major components
DDL
DML
3
Writing SQL Commands
SQL statement consists of :-
Reserved words :- a fixed part of the SQL language with a fixed
meaning
Must be spelled exactly
User-defined words :- are made up by the user (according to
certain syntax rules)
They represent the names of various database objects, e.g. tables,
columns, views, indexes ...
; (semicolon) :- used as a statement terminator to mark the end of an SQL
statement
SQL is case insensitive
Exception :- literal character data must be typed exactly as it appears in
the database. E.g. if we store a person's name ‘ABEBE’ and
search for it using the string ‘Abebe’ the row will not be found.
4
Writing SQL Commands … Cont’d
SQL is free format
For readability begin with a new line for clauses
SQL identifiers
Used to identify objects in the database, such as table names, view names, columns
Consist of letters, digits and underscores (_)
Can be no longer than 128 characters
Can't contain spaces
Must start with a letter or underscore
5
SQL data types
1. Character data : Consists of a sequence of characters
Syntax: CHARACTER[(length)]
CHARACTER VARYING[(length)]
length :- the maximum number of characters a column can hold.
Abbreviations: CHARACTER → CHAR
CHARACTER VARYING → VARCHAR
A character string may be defined as having a fixed or varying length.
If a fixed length is defined and we enter a string with fewer characters
than this length, the string is padded with blanks to make up the
required size.
If a varying length is defined and we enter a string with fewer characters
than this length, only those characters entered are stored.
E.g. name CHAR(20), address VARCHAR(20)
6
SQL data types … Cont’d
2. Numeric data:- The exact numeric data type is used to define
numbers with an exact representation.
Syntax: NUMERIC[(precision[, scale])]
DECIMAL[(precision[, scale])]
INTEGER
SMALLINTEGER
precision:- the total number of significant decimal digits
scale:- the total number of decimal places
Abbreviations:-
INTEGER → INT
DECIMAL → DEC
NUMERIC → NUM
SMALLINTEGER → SMALLINT
7
SQL data types … Cont’d
NUMERIC and DECIMAL store number in decimal notation
INTEGER is used for large positive or negative value
SMALLINT is used for small positive or negative value
E.g. age SMALLINT
salary DECIMAL(5,2)
8
SQL data types … Cont’d
3. Date/time data:- Used to define points in time
▪ DATE, TIME, TIMESTAMP
DATE:- used to store calendar dates using YEAR, MONTH & DAY
fields.
TIME:- used to store times using HOUR, MINUTE & SECOND
fields
TIMESTAMP:- used to store dates and times. E.g. birthdate DATE
4. Boolean data
Consists of the distinct truth values TRUE and FALSE; unless
prohibited by a NOT NULL constraint, it may also be NULL
For a NULL value it returns an UNKNOWN result.
▪ E.g. status BOOLEAN
9
Integrity enhancement features
Integrity refers to constraints that we wish to impose in order to protect
the database from becoming inconsistent.
It includes:- Required data
domain constraint
Entity integrity
Referential integrity
1. Required data:- Some columns must contain a valid value.
They are not allowed to contain NULL values
NULL is distinct from blank or zero
NULL is used to represent data that is either not available, missing
or not applicable.
Set at defining stage (CREATE, ALTER)
E.g. position VARCHAR(10) NOT NULL
10
Integrity enhancement features … Cont’d
2. Domain Constraint:- Every column has a domain, i.e. a set of legal
values. To set these constraint use
❖ CHECK: Check clause allows a constraint to be defined on a
column or the entire table.
CHECK (searchcondition)
e.g. Sex CHAR NOT NULL CHECK(sex IN(‘M’, ‘F’))
❖ DOMAIN
Syntax: Create DOMAIN domainname[AS] datatype
[DEFAULT defaultoption]
[CHECK (searchcondition)]
E.g. Create DOMAIN sextype as CHAR
DEFAULT ‘M’
CHECK(VALUE IN(‘M’,’F’));
This creates a domain sextype.
11
Integrity enhancement features … Cont’d
When defining a column sex, we use the domain name sextype in place
of the data type CHAR
i.e. sex sextype NOT NULL
To remove a domain use the DROP DOMAIN statement
Syntax: DROP DOMAIN domainname [RESTRICT|CASCADE]
The drop behavior, RESTRICT|CASCADE, specifies the action to be
taken if the domain is currently being used.
If RESTRICT is specified and the domain is used in an existing table or
view, the drop will fail.
If CASCADE is specified, any column that is based on the domain is
automatically changed to use the domain's underlying data type
(the constraint/default of the domain is replaced by a constraint/default
on the column, if appropriate)
12
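A domain-style constraint can be tried out with sqlite3, with one caveat: SQLite does not support CREATE DOMAIN, so this sketch attaches the same CHECK directly to the column. Table and column names are illustrative.

```python
# Sketch using sqlite3: the sextype domain expressed as a column-level
# CHECK constraint with a DEFAULT, since SQLite has no CREATE DOMAIN.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staff (sid TEXT PRIMARY KEY,"
            " sex CHAR(1) NOT NULL DEFAULT 'M'"
            " CHECK (sex IN ('M', 'F')))")

con.execute("INSERT INTO staff VALUES ('s1', 'F')")      # inside the domain
try:
    con.execute("INSERT INTO staff VALUES ('s2', 'X')")  # outside the domain
    violated = False
except sqlite3.IntegrityError:
    violated = True  # the DBMS rejects the value, enforcing domain integrity
```

The second insert is rejected by the DBMS itself, so domain integrity holds without any application-level checking.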
Integrity enhancement features … Cont’d
3. Entity integrity:-A primary key of a table must contain a unique,
non-null value for each row
PRIMARY KEY (sid)
To define a composite primary key
PRIMARY KEY (sid,cid)
Or we can use UNIQUE clause
sid VARCHAR(5) NOT NULL,
cid VARCHAR(9) NOT NULL,
UNIQUE(sid,cid)
13
Integrity enhancement features … Cont’d
4. Referential Integrity:-
Syntax: FOREIGN KEY(columnname) REFERENCES relationname
Referential actions are specified using the ON UPDATE and ON DELETE subclauses of
the FK clause.
When a user attempts to delete a row from a parent table, SQL supports four
options regarding the action to be taken :-
CASCADE:- delete the row from the parent table and the matching rows from the child table.
SET NULL:- delete the parent row and set the child rows' FK to NULL, i.e. valid only if the FK
column does not have a NOT NULL qualifier specified.
SET DEFAULT:- delete the parent row and set the child rows' FK to its default value,
i.e. valid only if the FK column has a DEFAULT value specified
NO ACTION:- reject the delete operation on the parent table. This is the
default if the ON DELETE rule is omitted.
E.g. 1. FOREIGN KEY(cid) REFERENCES course ON DELETE SET NULL
2. FOREIGN KEY(cid) REFERENCES course ON UPDATE CASCADE
14
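The ON DELETE CASCADE action can be demonstrated with sqlite3. This is an illustrative sketch (table and column names are assumptions); note that SQLite requires foreign-key enforcement to be switched on per connection.

```python
# Sketch using sqlite3: deleting a parent row cascades to the child table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
con.execute("CREATE TABLE course (cid TEXT PRIMARY KEY)")
con.execute("CREATE TABLE enrol (sid TEXT,"
            " cid TEXT REFERENCES course ON DELETE CASCADE)")
con.execute("INSERT INTO course VALUES ('c1')")
con.execute("INSERT INTO enrol VALUES ('s1', 'c1')")

con.execute("DELETE FROM course WHERE cid = 'c1'")   # parent row deleted...
remaining = con.execute("SELECT COUNT(*) FROM enrol").fetchone()[0]
```

The enrolment row disappears together with its parent course; with the default NO ACTION instead, the same DELETE would have been rejected while the child row existed.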
Data Definition Language (DDL)
15
Data Definition Language (DDL) … Cont’d
1. Creating a database
Syntax: CREATE DATABASE database-name
e.g. CREATE DATABASE KIoT
2. Creating a table
Syntax:
CREATE TABLE tablename
(columnname datatype [NOT NULL] [UNIQUE]
[DEFAULT defaultoption] [CHECK (searchcondition)] [, ...]
[PRIMARY KEY (listofcolumns),]
[FOREIGN KEY (listofforeignkeycolumns)
REFERENCES parenttablename(listofcandidatekeycolumns)])
16
Data Definition Language (DDL) … Cont’d
1. Create table department
(did varchar(9) primary key,
deptname varchar(12) NOT NULL UNIQUE
CHECK (deptname IN ('IT', 'maths', 'stat')),
school varchar(20))
17
Data Definition Language (DDL) … Cont’d
3. Deleting a database and dropping tables
DROP DATABASE :- used to delete the whole database
Syntax: DROP DATABASE database-name
E.g. DROP DATABASE KIoT
DROP:- used to remove a table from a database
Syntax: DROP TABLE table-name.
E.g. DROP TABLE student;
4. Altering a table: Used to modify a table after it is created
Syntax: ALTER TABLE table-name
DROP COLUMN column-name
ADD column-name data-type
ALTER COLUMN column-name data-type
E.g. ALTER TABLE student
ADD age int,
ALTER COLUMN sid int;
18
Data Manipulation Language (DML)
Is used to insert, retrieve and modify database information
The following are commonly used DML clauses:-
INSERT:- to insert data in a table
SELECT:- to query data in the database
UPDATE:- to update data in a table
DELETE:- to delete data from a table
19
Data Manipulation Language (DML) … Cont’d
2. SELECT:- used to retrieve selected data that matches the criteria that you specify.
syntax
SELECT [DISTINCT|ALL] {* | [columnexpression [AS newname]] [, ...]}
FROM tablename
WHERE[condition]
GROUP BY columnlist
HAVING[condition]
ORDER BY columnlist
The sequence of processing in a SELECT statement is:-
FROM : specifies the table(s) to be used
WHERE: filter rows subject to some condition
GROUP BY: forms groups of rows with the same column value
HAVING: filter the groups subject to some condition
ORDER BY: specifies the order of the output
The result of a select statement is another table; such a derived table is sometimes called a view
20
Data Manipulation Language (DML) … Cont’d
1. Retrieve all rows and columns:- list all columns in the table
Where clause is unnecessary
e.g. SELECT sid,fname,lname,age,did
FROM student;
Use (*) for quick way of expressing all columns
e.g. SELECT * FROM student;
2. Use of DISTINCT clause: DISTINCT is used to eliminate the
duplicates
E.g. SELECT DISTINCT propertyno FROM property
propertyno          propertyno (after DISTINCT)
PA14                PA14
PA14                PG4
PG4                 PG36
PG4
PA14
PG36
PG36
21
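The DISTINCT behavior can be reproduced with sqlite3. This is an illustrative sketch using the property data above; the table name matches the example, the rest of the setup is assumed.

```python
# Sketch using sqlite3: SELECT DISTINCT eliminates duplicate propertyno values.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE property (propertyno TEXT)")
con.executemany("INSERT INTO property VALUES (?)",
                [("PA14",), ("PA14",), ("PG4",), ("PG4",),
                 ("PA14",), ("PG36",), ("PG36",)])

with_dups = [r[0] for r in con.execute("SELECT propertyno FROM property")]
distinct = [r[0] for r in con.execute("SELECT DISTINCT propertyno FROM property")]
```

Seven stored rows collapse to the three distinct values PA14, PG4 and PG36 once DISTINCT is applied.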
Data Manipulation Language (DML) … Cont’d
3. Calculated fields:- We can use calculated fields in SQL
E.g. SELECT sid,age/2 AS halfage
FROM student;
4. Row selection(WHERE clause):- used to retrieve all rows from a
table that satisfy a condition. Use the following basic predicates:-
Comparison:- compare the value of one expression with the value
of another expression.
Range:- test whether the value falls within a specified range of
values.
Set membership:- test whether the value of an expression equals
one of a set of values.
Pattern match:- test whether a string matches a specified pattern.
Null:- test whether a column has a null(unknown) value
22
Data Manipulation Language (DML) … Cont’d
I. Comparison search condition:-
in SQL the following simple comparison operators are available:
= equals, <> not equals, < less than, <= less than or equal, > greater than, >= greater than or equal
24
Data Manipulation Language (DML) … Cont’d
II. Range search condition(BETWEEN/NOT BETWEEN):- BETWEEN
includes the end points of the range.
E.g. select * from student
Where age BETWEEN 15 AND 25;
Use the negated version (NOT BETWEEN) to check for values
outside the range.
III. Set membership search condition(IN/NOT IN):-
IN checks/tests whether a data value matches one of a list of values.
NOT IN is the negated version
E.g. select * from student
Where fname IN(‘Abebe’,’Kebede’);
25
Data Manipulation Language (DML) … Cont’d
IV. Pattern match search condition(LIKE/NOT LIKE)
(%) percent character represents any sequence of zero or more
characters.
( _ ) underscore character represents any single character.
LIKE ‘H%’ :- The first character must be H, the rest can be anything
LIKE ‘H_ _ _’ :- There must be exactly 4 characters, the first
must be H
LIKE ‘%e’ :- The last character is e
LIKE ‘%Abebe%’:- A sequence of character of any length containing
‘Abebe’
NOT LIKE ‘H%’:- the first character can’t be an H.
27
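The pattern-match predicates above can be tried out directly. A minimal sketch using Python's built-in sqlite3 module; the student table and its names are illustrative, not from an actual schema:

```python
import sqlite3

# In-memory database with a hypothetical student table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid TEXT, fname TEXT, lname TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    ("s001", "Helen", "Abebe", 21),
    ("s002", "Hana",  "Kebede", 23),
    ("s003", "Abebe", "Tesfaye", 25),
])

# LIKE 'H%'  -> the first character must be H
h_names = [r[0] for r in conn.execute(
    "SELECT fname FROM student WHERE fname LIKE 'H%' ORDER BY fname")]

# LIKE 'H___' -> exactly four characters, the first must be H
h4_names = [r[0] for r in conn.execute(
    "SELECT fname FROM student WHERE fname LIKE 'H___'")]

# NOT LIKE 'H%' -> the first character must not be H
not_h = [r[0] for r in conn.execute(
    "SELECT fname FROM student WHERE fname NOT LIKE 'H%'")]
```

Here 'H___' matches "Hana" (four characters) but not "Helen" (five), which shows the difference between % and _.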
Modifying data in the database
The UPDATE statement allows the contents of existing rows to be modified.
Syntax
UPDATE <table name>
SET column name1 = data value1, column name2 = data value2, …
[WHERE search condition]
causes rows in the named table to be changed
28
Modifying data in the database … Cont’d
UPDATE specific rows
use where to specify a condition
E.g. UPDATE student
Set age=25
Where sid=’s001’;
Update multiple columns
Eg. UPDATE student
Set age=45,did=’d080’
Where fname=’Abebe’
29
Modifying data in the database … Cont’d
Deleting data from the database
The DELETE statement allows rows to be deleted from a named table
This statement does not touch the table definition
Syntax DELETE FROM <tablename>
[WHERE search condition]
I. DELETE all rows : delete/remove all rows
WHERE is omitted
Eg. DELETE FROM student (some dialects also accept DELETE * FROM student)
II. Delete specific rows: where clause is used for specifying a condition.
Eg. Delete from student
Where sid=’s001’
30
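The UPDATE and DELETE statements above can be sketched end to end; a minimal, hypothetical student table via sqlite3 (column names are assumptions, not a fixed schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid TEXT PRIMARY KEY, fname TEXT, age INTEGER, did TEXT)")
conn.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    ("s001", "Abebe", 20, "d001"),
    ("s002", "Kebede", 22, "d002"),
])

# UPDATE a specific row, selected with WHERE
conn.execute("UPDATE student SET age = 25 WHERE sid = 's001'")
age_s001 = conn.execute("SELECT age FROM student WHERE sid = 's001'").fetchone()[0]

# UPDATE multiple columns at once
conn.execute("UPDATE student SET age = 45, did = 'd080' WHERE fname = 'Abebe'")

# DELETE specific rows; the table definition itself is untouched
conn.execute("DELETE FROM student WHERE sid = 's001'")
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```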
SQL Aggregate Functions
COUNT :- return the number of values in a specified column.
31
SQL Aggregate Functions … Cont’d
Use of count(*): counts all rows that satisfy the WHERE condition
Eg. Select count (*) As newage
From student
Where age>25
This statement counts all rows whose age is above 25
AS is used for labeling the result column
Use of count(DISTINCT <column name>): Used to count distinct
values (removing repetition)
Eg. Select count(distinct age) as newage
From student
Where age>15
32
SQL Aggregate Functions … Cont’d
Use of count and sum: We can combine functions together
E.g. Find the total number of students and the sum of their ages for those
whose age is 25 or above
Select count(sid) as sno, sum(age) as newage
From student
Where age >= 25
Use of MIN, MAX, AVG: find the minimum, maximum and average of student
ages.
E.g. Select MIN(age) as minage, MAX(age) as maxage, AVG(age) as
average from student;
Use GROUP BY to display a grouped query.
E.g. Select did, count(fname) as no, sum(age) as no_age
From student Group by did Order by did;
33
Use Group by Clause
Used to group records based on column values before aggregate
functions are applied.
Used to display a grouped query.
Eg. Select did, count(fname) as no, sum(age) as noage
From student
Group by did
Order by did;
Sorting results(ORDER BY clause):-
Use Ascending (ASC)
Descending (DESC)
E.g. select fname,lname,age from student
ORDER by age DESC;
34
Use of Having Clause
The HAVING clause is used to filter grouped data
It applies conditions to aggregate values computed for each group
E.g. Select did, count(fname) as noofstud, sum(age) as newage
from student
group by did
having count(fname)>1
35
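The aggregate functions and the GROUP BY/HAVING clauses above can be demonstrated together; a small sqlite3 sketch with a hypothetical student table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid TEXT, fname TEXT, age INTEGER, did TEXT)")
conn.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    ("s001", "Abebe",  30, "d001"),
    ("s002", "Kebede", 28, "d001"),
    ("s003", "Sara",   20, "d002"),
])

# COUNT(*) counts the rows that satisfy the WHERE condition
over25 = conn.execute("SELECT COUNT(*) FROM student WHERE age > 25").fetchone()[0]

# MIN, MAX and AVG in one query
minage, maxage, avgage = conn.execute(
    "SELECT MIN(age), MAX(age), AVG(age) FROM student").fetchone()

# GROUP BY with HAVING: keep only departments with more than one student
groups = conn.execute(
    "SELECT did, COUNT(fname), SUM(age) FROM student "
    "GROUP BY did HAVING COUNT(fname) > 1").fetchall()
```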
Using a sub query using equality
Subqueries are queries nested within other queries, used to retrieve data
from related tables
Eg. Select fname,lname
From student where did=(select did from department where
depthead=‘Jemal’);
using sub query with an aggregate function
Eg. Select fname,lname,age, (select avg(age) from student) as agediff
From student
Where age>(select avg(age)from student);
36
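Both subquery forms above can be sketched with sqlite3; the student and department tables are hypothetical, mirroring the slide's example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (fname TEXT, lname TEXT, age INTEGER, did TEXT)")
conn.execute("CREATE TABLE department (did TEXT, depthead TEXT)")
conn.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    ("Abebe", "T", 30, "d001"),
    ("Sara",  "M", 20, "d002"),
])
conn.execute("INSERT INTO department VALUES ('d001', 'Jemal')")

# Subquery with equality: students in the department headed by 'Jemal'
jemal_students = conn.execute(
    "SELECT fname FROM student WHERE did = "
    "(SELECT did FROM department WHERE depthead = 'Jemal')").fetchall()

# Subquery with an aggregate: students older than the average age
older = conn.execute(
    "SELECT fname FROM student WHERE age > (SELECT AVG(age) FROM student)").fetchall()
```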
Use ANY/SOME and ALL
ALL :- the condition is true only if it is satisfied by all values
produced by the sub query.
ANY/SOME :- the condition is true if it is satisfied by any (one
or more) of the values produced by the sub query.
e.g1. Select fname,lname,age from student
Where age >some(select age from head
where did=’d001’)
e.g2.select fname,lname,age from student
where age>ALL(select age from head
where did=’d001’)
37
Join
Join:- used to combine columns from several tables into a single table
I. Simple join
Used to join columns from two or more tables based on equality
Is a type of equi join
Use alias for short hand representation of tables
Table Aliases: Using full table names as prefixes can make SQL queries
unnecessarily wordy. Table aliases can make the code a little more concise.
Other join types include RIGHT JOIN and FULL JOIN.
40
Views
A view is a virtual table based on the result-set of a SELECT statement.
A view contains rows and columns, just like a real table.
The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and
present the data as if the data were coming from a single table.
Note: The database design and structure will NOT be affected by the functions,
where, or join statements in a view.
43
Using Views … Cont’d
Another example view from the Northwind database calculates the total sale
for each category in 1997. Note that this view selects its data from another view
called "Product Sales for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName,Sum(ProductSales) AS
CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName
We can query the view above as follows:
SELECT * FROM [Category Sales For 1997]
We can also add a condition to the query. Now we want to see the total sale only
for the category "Beverages":
SELECT * FROM [Category Sales for 1997]
WHERE CategoryName='Beverages'
44
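The view mechanism described above can be sketched with sqlite3; this uses a simplified, hypothetical sales table rather than the Northwind schema, but follows the same pattern of querying a view like a table and adding a condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?,?)", [
    ("Beverages", 100.0), ("Beverages", 50.0), ("Produce", 75.0),
])

# A view is a stored SELECT; querying it looks exactly like querying a table
conn.execute("""CREATE VIEW category_sales AS
                SELECT category, SUM(amount) AS total
                FROM sales GROUP BY category""")

all_rows = conn.execute("SELECT * FROM category_sales ORDER BY category").fetchall()

# A condition can be added when querying the view, as with a base table
bev = conn.execute(
    "SELECT total FROM category_sales WHERE category = 'Beverages'").fetchone()[0]
```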
Database Design
1
Outlines
Steps of Database Design
Convert ER to relations
Normalization
2
Database Design
Database design is the process of producing the specifications for the
data to be stored in the database.
It describes how data is stored in the computer system,
defining the structure, characteristics and contents of the data.
Database Design Process
Step 1:- Requirements collection and Analysis
Prospective users are interviewed to collect information.
This step results in a concise set of user requirements.
The functional requirements should be specified as well as the data
requirements.
Functional requirements can be documented using diagrams such as
sequence diagrams and DFD scenarios.
3
Database Design … Cont’d
Step 2:- Conceptual Design
Create conceptual schema.
Conceptual schema:- a concise description of the data requirements of the user,
including a detailed description of the entity types, relationships and
constraints.
End users must be able to understand it.
Step 3:- Database implementation (Logical Design)
Choose a DBMS for implementation.
The conceptual schema is transformed from the high level data model
into the implementation data model.
Step 4:- Physical Design
Internal storage structures, indexes, access paths and file
organizations are specified.
Application programs are designed and implemented.
4
Database Design … Cont’d
Generally the design part is divided into three sub-phases
Conceptual Design
Logical Design
Physical Design
Design strategies:
1. Top Down
Start with high level abstraction and refine it.
High level entities
Add sub classes
Attributes
2. Bottom up
Start with basic abstraction and then combine them.
Attributes
Group in to entities
5
Database Design … Cont’d
3. Inside out
Special case of bottom up.
Focus on a central set of concepts and work outwards.
No burden on the initial designer.
4. Mixed
Start with top down then use inside out or bottom up.
Divide and conquer.
6
Conceptual Design
Is the process of constructing a model of the information used in an
enterprise, independent of any physical consideration.
Is the source of information for logical design.
Is high level and understandable by non-technical users.
Conceptual model of enterprise, independent of implementation detail
such as target DBMS, application programs, programming language,
hardware platform, performance issues etc.
Tasks to be performed:-
Identify entity types and relationships.
Associate attributes with entities.
Determine attribute domains.
Determine unique identifier (Key) attributes.
Use entity relationship model (ER).
7
Conceptual Design … Cont’d
Why conceptual model:
Independent of DBMS.
Allows easy communication between users and developers.
Is permanent description of the database requirements.
Database requirements
We must convert the written database requirements into an E-R diagram.
Need to determine the entities, attributes and relationships.
Nouns = entities
Adjectives = attributes
Verbs = relationships
8
Logical Design
Is the process of constructing a model of the data used in an organization.
9
Logical Design … Cont’d
Converting ER Diagram to Relations
Three basic rules to convert ER into tables.
For a relationship with one-to-one cardinality
All the attributes are merged into a single table,
i.e. the primary key or candidate key of one relation is a foreign key for the
other.
For a relationship with one-to-many cardinality
Post the primary key or candidate key of the “one” side as a foreign
key attribute to the “many” side.
For a relationship with many-to-many cardinality
Create a new table (which is the associative entity) and post the primary
key or candidate key from each entity as attributes in the new table
along with any additional attributes (if applicable)
10
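The three conversion rules above can be sketched as DDL; the entity and attribute names (department, employee, project, works_on) are illustrative placeholders, executed here through sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1:M  -- post the primary key of the "one" side (department)
#         as a foreign key on the "many" side (employee)
conn.executescript("""
CREATE TABLE department (did TEXT PRIMARY KEY, dname TEXT);
CREATE TABLE employee (
    eid   TEXT PRIMARY KEY,
    ename TEXT,
    did   TEXT REFERENCES department(did)   -- FK posted from the "one" side
);
-- M:N -- a new associative table holding both primary keys
CREATE TABLE project (pid TEXT PRIMARY KEY, pname TEXT);
CREATE TABLE works_on (
    eid   TEXT REFERENCES employee(eid),
    pid   TEXT REFERENCES project(pid),
    hours INTEGER,                          -- extra attribute of the relationship
    PRIMARY KEY (eid, pid)
);
""")
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
```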
Logical Design … Cont’d
11
Logical Design … Cont’d
12
Logical Design … Cont’d
13
Logical Design … Cont’d
Mapping Regular Entities to relation
Simple attributes: ER Attributes map directly on to the relation.
Composite attribute: Use only their simple, component attributes
Multi-Valued Attribute: Becomes a separate relation with a foreign
key taken from the super entity.
14
Logical Design … Cont’d
15
Normalization
A relational database is merely a collection of data, organized in a
particular manner. Codd, the father of the relational database approach,
created a series of rules called normal forms that help define that
organization.
One of the best ways to determine what information should be stored in a
database is to clarify what questions will be asked of it and what data
would be included in the answers.
Database normalization is a series of steps followed to obtain a database
design that allows for consistent storage and efficient access of data in a
relational database. These steps reduce data redundancy and the risk of
data becoming inconsistent.
NORMALIZATION is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the update anomalies which are;
Insertion, Deletion and Modification Anomalies
16
Normalization … Cont’d
Normalization may reduce system performance since data will be cross
referenced from many tables.
Thus denormalization is sometimes used to improve performance, at
the cost of reduced consistency guarantees.
Normalization is normally considered good if the decomposition is
lossless.
Mnemonic for remembering the rationale for normalization could be
the following:
No Repeating or Redundancy: no repeating fields in the table
The Fields Depend Upon the Key: the table should solely depend on the key
The Whole Key: no partial key dependency
And Nothing But The Key: no inter data dependency
17
Normalization … Cont’d
All the normalization rules will eventually remove the update
anomalies that may exist during data manipulation after the
implementation.
Pitfalls of Normalization
Requires data to see the problems
May reduce performance of the system
Is time consuming,
Difficult to design and apply and
Prone to human error
18
Normalization … Cont’d
The underlying ideas in normalization are simple enough. Through
normalization we want to design for our relational database a set of
tables that;
1. Contain all the data necessary for the purposes that the
database is to serve,
2. Have as little redundancy as possible,
3. Accommodate multiple values for types of data that require
them,
4. Permit efficient updates of the data in the database, and
5. Avoid the danger of losing data unknowingly
The types of problems that can occur in an insufficiently normalized
table are called update anomalies, and include:
19
Normalization … Cont’d
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database
entry into all the places in the database where information about that new
entry needs to be stored.
In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database.
In an inadequately normalized database, information about a new entry may
need to be inserted into more than one place and, human fallibility being what
it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing
database entry when it is time to remove that entry.
In a properly normalized database, information about an old, to-be-gotten-rid-
of entry needs to be deleted from only one place in the database.
20
Normalization … Cont’d
In an inadequately normalized database, information about that old entry may
need to be deleted from more than one place, and, human fallibility being what
it is, some of the needed additional deletions may be missed.
3. Modification anomalies
A modification of a database involves changing some value of an attribute of a
table. In a properly normalized database table, whatever information is
modified by the user, the change will be effected and used accordingly.
The purpose of normalization is to reduce the chances for anomalies to occur in a
database.
21
Normalization … Cont’d
Deletion Anomalies: If the employee with ID 16 is deleted, then every
piece of information about the skill C++ and its skill type is deleted from the
database. We will then have no information about C++ and its skill
type.
Insertion Anomalies: What if we have a new employee with a skill called
Pascal? We cannot decide whether Pascal is allowed as a value for skill,
and we have no clue about the type of skill that Pascal should be
categorized as.
Modification Anomalies: What if the address for Helico is changed from
Piazza to Mexico? We need to look for every occurrence of Helico and
change the value of School_Add from Piazza to Mexico, which is prone to
error.
A database-management system can work only with the information that we
put explicitly into its tables for a given database and into its rules for
working with those tables, where such rules are appropriate and possible.
22
Functional Dependency (FD)
Before moving to steps of normalization, it is important to have an
understanding of "functional dependency."
Data Dependency
The logical association between data items that point the database
designer in the direction of a good database design are referred to as
determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent
relationship if certain values of data item B always appear with
certain values of data item A.
If the data item A is the determinant data item and B the dependent
data item then the direction of the association is from A to B and not
vice versa.
23
Functional Dependency (FD) … Cont’d
The essence of this idea is that if the existence of something, call it A, implies
that B must exist and have a certain value, then we say that "B is functionally
dependent on A."
We also often express this idea by saying that "A determines B," or that "B is
a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the
statement, "If A, then B."
It is important to note that the value B must be unique for a given value of A,
i.e., any given value of A must imply just one and only one value of B, in
order for the relationship to qualify for the name "function." (However, this
does not necessarily prevent different values of A from implying the same
value of B.)
X → Y holds if whenever two tuples have the same value for X, they must
have the same value for Y
24
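The two-tuple test above translates directly into a small checker. A sketch in Python; the function and table names are illustrative, and the rows mirror the Dinner/Wine example used later in this section:

```python
# X -> Y holds on a set of rows if any two rows that agree on X also agree on Y.
def fd_holds(rows, X, Y):
    """rows: list of dicts; X, Y: lists of attribute names."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False           # same X value, different Y value -> FD violated
        seen[x_val] = y_val
    return True

dinners = [
    {"Dinner": "Fish", "Wine": "White"},
    {"Dinner": "Meat", "Wine": "Red"},
    {"Dinner": "Fish", "Wine": "White"},
]
ok = fd_holds(dinners, ["Dinner"], ["Wine"])          # Dinner -> Wine holds

bad = dinners + [{"Dinner": "Fish", "Wine": "Red"}]
violated = fd_holds(bad, ["Dinner"], ["Wine"])        # now Fish maps to two wines
```

Note that different X values may still map to the same Y value without violating the dependency, exactly as the slide states.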
Functional Dependency (FD) … Cont’d
The notation is: A → B, which is read as: B is functionally dependent on A
In general, a functional dependency is a relationship among attributes. In
relational databases, we can have a determinant that governs one other
attribute or several other attributes.
FDs are derived from the real-world constraints on the attributes
Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner: Dinner → Wine
Since both Wine type and Fork type are determined by
the Dinner type, we say Wine is functionally dependent
on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
25
Partial Dependency
If an attribute which is not a member of the primary key is dependent
on some part of the primary key (if we have composite primary key)
then that attribute is partially functionally dependent on the primary
key.
Let {A,B} be the Primary Key and C a non-key attribute.
Then if {A,B} → C and B → C,
C is partially functionally dependent on {A,B}
26
Full Dependency
If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key (if we
have composite primary key) then that attribute is fully functionally
dependent on the primary key.
27
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."
28
Steps of Normalization
We have various levels or steps in normalization called Normal Forms.
The level of complexity, strength of the rule and decomposition increases
as we move from one lower level Normal Form to the higher.
A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
Each normal form represents a stronger condition than the previous one.
Normalization towards a logical design consists of the following steps:
Unnormalized Form: Identify all data elements
First Normal Form: Find the key with which you can find all data
Second Normal Form: Remove part-key dependencies. Make all data
dependent on the whole key.
Third Normal Form: Remove non-key dependencies. Make all data
dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they
adhere to third normal form.
29
UNNORMALIZED FORM (UNF)
A table that contains one or more repeating groups.
A repeating group is a field or group of fields that hold multiple values
for a single occurrence of a field.
30
First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is
an atomic value, while a list or a set is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key relationship
2. Moving the repeating groups to new rows by repeating the
common attributes. If so, then find the key with which you can find
all data
A table (relation) is in 1NF if:
There are no duplicated rows in the table (it has a unique identifier).
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
31
First Normal Form (1NF) … Cont’d
FIRST NORMAL FORM (1NF): Remove all repeating groups.
Distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation so that it can be considered a relation in
a relational database.
32
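The first of the two ways above (a separate table linked by a primary key/foreign key) can be sketched as follows; the employee/skill data is a hypothetical repeating group, loaded through sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized input: the multi-valued "skills" attribute violates 1NF
unf = [("e01", "Abebe", "SQL, C++"), ("e02", "Sara", "Java")]

# 1NF option 1: move the repeating group to a separate table,
# linked by a primary key / foreign key relationship
conn.executescript("""
CREATE TABLE employee (eid TEXT PRIMARY KEY, ename TEXT);
CREATE TABLE emp_skill (eid TEXT REFERENCES employee(eid), skill TEXT);
""")
for eid, ename, skills in unf:
    conn.execute("INSERT INTO employee VALUES (?,?)", (eid, ename))
    for skill in skills.split(", "):        # one atomic value per row
        conn.execute("INSERT INTO emp_skill VALUES (?,?)", (eid, skill))

rows = conn.execute(
    "SELECT eid, skill FROM emp_skill ORDER BY eid, skill").fetchall()
```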
First Normal Form (1NF) … Cont’d
Example 2: Consider the following UNF relation.
33
Second Normal form 2NF
No partial dependency of a non key attribute on part of the primary
key.
Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) key is automatically also in 2NF.
Definition of a table (relation) in 2NF
It is in 1NF and
all non-key attributes are dependent on the whole key, i.e. no
partial dependency.
Since a partial dependency occurs when a non-key attribute is
dependent on only a part of the (composite) key, the definition of
2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and
if it has no partial dependencies."
34
Second Normal form 2NF … Cont’d
Example for 2NF:
This schema is in its 1NF since we don’t have any repeating groups or
attributes with multi-valued property. To convert it to a 2NF we need to
remove all partial dependencies of non key attributes on part of the
primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund,
ProjMangID
But in addition to this we have the following dependencies
EmpID → EmpName
ProjNo → ProjName, ProjLoc, ProjFund, ProjMangID
35
Second Normal form 2NF … Cont’d
As we can see some non key attributes are partially dependent on
some part of the primary key. Thus these collections of attributes
should be moved to a new relation.
36
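The decomposition described above can be sketched as DDL; the table names employee, project and emp_proj are assumed names for the three resulting relations, executed through sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2NF decomposition of EmpProject(EmpID, ProjNo, EmpName, ProjName, ...):
# attributes that depend on only part of the composite key move out.
conn.executescript("""
CREATE TABLE employee (EmpID TEXT PRIMARY KEY, EmpName TEXT);          -- EmpID -> EmpName
CREATE TABLE project  (ProjNo TEXT PRIMARY KEY, ProjName TEXT,
                       ProjLoc TEXT, ProjFund REAL, ProjMangID TEXT);  -- ProjNo -> the rest
CREATE TABLE emp_proj (EmpID  TEXT REFERENCES employee(EmpID),         -- what remains of
                       ProjNo TEXT REFERENCES project(ProjNo),         -- the original key
                       PRIMARY KEY (EmpID, ProjNo));
""")
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
```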
Second Normal form 2NF … Cont’d
• Example 2: Normalize the following relation.
• The primary key for this table is the composite key (PatientId,
RelativeId).
37
Second Normal form 2NF … Cont’d
So, to determine if it satisfies 2NF, you have to find out if all other
fields in it depend fully on both PatientId and RelativeId; that is, you
need to decide whether the following conditions are true:
(PatientId, RelativeId) → Relationship and
(PatientId, RelativeId) → Patient_tel.
However, of the dependencies in the patient table, only the following
are true:
(PatientId, RelativeId) → Relationship and
(PatientId) → Patient_tel.
38
Second Normal form 2NF … Cont’d
39
Third Normal Form (3NF )
Eliminate Columns Not Dependent On Key - If attributes do not contribute to
a description of the key, remove them to a separate table.
This level avoids update and delete anomalies.
Definition of a Table (Relation) in 3NF
It is in 2NF and
There are no transitive dependencies between attributes.
Example for (3NF): Assumption: Students of same batch (same year) live in
one building or dormitory
• This schema is in its 2NF since the primary key is a single attribute.
40
Third Normal Form (3NF ) … Cont’d
41
Third Normal Form (3NF ) … Cont’d
Consider the following example:
Now, PK = empid
We have functional dependencies:
Empid → depid
Depid → depname and
Depid → depbudjet
Therefore, the above table is not in 3NF. To normalize it, we can use the
functional dependencies:
Depid → depname
Depid → depbudjet And
Empid → depid
42
Third Normal Form (3NF ) … Cont’d
So that the resulting tables are the following:
43
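The resulting tables described above can be sketched as DDL, breaking the transitive chain empid → depid → depname/depbudjet; the exact column set is assumed from the dependencies listed on the previous slide, run here through sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (depid TEXT PRIMARY KEY,
                         depname TEXT, depbudjet REAL);  -- depid -> depname, depbudjet
CREATE TABLE employee   (empid TEXT PRIMARY KEY,
                         depid TEXT REFERENCES department(depid));  -- empid -> depid
""")
conn.execute("INSERT INTO department VALUES ('d01', 'CS', 1000.0)")
conn.execute("INSERT INTO employee VALUES ('e01', 'd01')")

# depname is now stored exactly once and reached through the join,
# so changing it cannot create a modification anomaly
depname = conn.execute("""SELECT d.depname FROM employee e
                          JOIN department d ON e.depid = d.depid
                          WHERE e.empid = 'e01'""").fetchone()[0]
```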
Other Normal Forms
Boyce-Codd Normal Form (BCNF): A stronger form of 3NF in which
every determinant must be a candidate key; tables with multiple
overlapping candidate keys may satisfy 3NF but still violate BCNF.
Def.: A table is in BCNF if it is in 3NF and if every determinant is a
candidate key.
Fourth Normal Form (4NF): Isolate Semantically Related Multiple
Relationships - There may be practical constraints on information that
justify separating logically related many-to-many relationships.
Def.: A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies.
44
Other Normal Forms … Cont’d
Fifth Normal Form (5NF): A model limited to only simple
(elemental) facts.
Def.: A table is in 5NF, also called "Projection-Join Normal
Form" (PJNF), if it is in 4NF and if every join dependency in the
table is a consequence of the candidate keys of the table.
Domain-Key Normal Form (DKNF): A model free from all
modification anomalies.
Def.: A table is in DKNF if every constraint on the table is a
logical consequence of the definition of keys and domains.
45
Physical Database Design … Cont’d
The Logical database design is concerned with the what;
The Physical database design is concerned with the how.
Physical database design is the process of producing a description of the
implementation of the database on secondary storage.
It describes the base relations, file organization, and indexes used to
achieve effective access to the data along with any associated integrity
constraints and security measures.
Sources of information for the physical design process include the global
logical data model and the documentation that describes the model.
Knowledge of the DBMS that is selected to host the database system,
with all its functionalities, is required since the functionalities of current
DBMSs vary widely.
46
Steps in physical database design
1. Translate logical data model for target DBMS
To determine the file organizations and access methods that will be
used to store the base relations; i.e. the way in which relations and
tuples will be held on secondary storage
Design enterprise constraints for target DBMS
This phase is the translation of the global logical data model to produce
a relational database schema in the target DBMS. This includes creating
the data dictionary based on the logical model and information
gathered.
After the creation of the data dictionary, the next activity is to
understand the functionality of the target DBMS so that all necessary
requirements are fulfilled for the database intended to be developed.
47
Steps in physical database design … Cont’d
Knowledge of the DBMS includes:
how to create base relations
whether the system supports:
definition of Primary key
definition of Foreign key
definition of Alternate key
definition of Domains
Referential integrity constraints
definition of enterprise level constraints
Some tasks to be done:
1.1. Design base relation
1.2. Design representation of derived data
1.3. Design enterprise constraint
48
Steps in physical database design … Cont’d
1.1. Design base relation
Designing base relation involves identification of all necessary requirements
about a relation starting from the name up to the referential integrity constraints.
The implementation of the physical model is dependent on the target DBMS,
since some DBMSs have more facilities than others for defining database definitions.
The base relation design along with every justifiable reason should be fully
documented.
1.2. Design representation of derived data
While analyzing the requirement of users, we may encounter that there are some
attributes holding data that will be derived from existing or other attributes. A
decision on how to represent such data should be devised.
Most of the time derived attributes are not expressed in the logical model but
will be included in the data dictionary. Whether to store derived attributes in a
base relation or calculate them when required is a decision to be made by the
designer, considering the performance impact.
49
Steps in physical database design … Cont’d
1.3. Design enterprise constraint
Data in the database is subject not only to constraints of the
database and the data model used but also to some enterprise-
dependent constraints.
This constraint definition is also dependent on the DBMS selected
and enterprise level requirements.
All the enterprise level constraints and the definition method in the
target DBMS should be fully documented.
50
Steps in physical database design … Cont’d
2. Design physical representation
This phase is the level for determining the optimal file organizations to store the
base relations and indexes that are required to achieve acceptable performance,
that is, the way in which relations and tuples will be held on the secondary
storage.
2.1. Analyze transactions
To understand the functionality of the transactions that will run on the
database and to analyze the important transactions
2.2. Choose file organization
To determine an efficient file organization for each base relation
2.3. Choose indexes
Used for quick access
2.4. Estimate disk space and system requirement
To estimate the amount of disk space that will be required by the database.
51
Steps in physical database design … Cont’d
3. Design user view
To design the user views that were identified in the conceptual
database design methodology
4. Design security mechanisms
To design the access rules to the base relations and user views
5. Consider controlled redundancy
To determine whether introducing redundancy in a controlled
manner by relaxing the normalization rules will improve the
performance of the system.
6. Monitor and tune the operational system
52
Chapter 8
Record Storage and primary File Organization
Chapter Outline
Disk Storage Devices
Files of Records
Operations on Files
Unordered Files
Ordered Files
Hashed Files
Dynamic and Extendible Hashing Techniques
RAID Technology
Chapter 13-2
Disk Storage Devices (cont.)
Preferred secondary storage device for high storage capacity and low
cost.
Disks are divided into concentric circular tracks on each disk surface.
Track capacities vary typically from 4 to 50 Kbytes.
Chapter 13-3
Disk Storage Devices (cont.)
Because a track usually contains a large amount of information, it is
divided into smaller blocks or sectors.
The division of a track into sectors is hard-coded on the disk surface
and cannot be changed. In one type of sector organization, a sector is the
portion of a track that subtends a fixed angle at the center.
A track is divided into blocks. The block size B is fixed for each
system. Typical block sizes range from B=512 bytes to B=4096 bytes.
Whole blocks are transferred between disk and main memory for
processing.
Chapter 13-4
Disk Storage Devices (cont.)
Chapter 13-5
Disk Storage Devices (cont.)
A read-write head moves to the track that contains the block to be
transferred. Disk rotation moves the block under the read-write head
for reading or writing.
A physical disk block (hardware) address consists of a cylinder
number (an imaginary collection of tracks of the same radius from all
recorded surfaces), the track number or surface number (within the
cylinder), and block number (within track).
Reading or writing a disk block is time consuming because of the seek
time s and rotational delay (latency) rd.
Double buffering can be used to speed up the transfer of contiguous
disk blocks.
Chapter 13-6
Disk Storage Devices (cont.)
Chapter 13-7
Typical Disk
Parameters
Chapter 13-8
Records
Fixed and variable length records
Records contain fields which have values of a particular type (e.g.,
amount, date, time, age)
Fields themselves may be fixed length or variable length
Variable length fields can be mixed into one record: separator characters
or length fields are needed so that the record can be “parsed”.
Chapter 13-9
Blocking
Blocking: refers to storing a number of records in one block on the disk.
Blocking factor (bfr) refers to the number of records per block.
There may be empty space in a block if a whole number of records does
not exactly fit in one block.
Spanned Records: refer to records that are too large to fit in one
block and hence span a number of blocks.
Chapter 13-10
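The blocking factor can be worked through with concrete (illustrative) numbers, assuming unspanned blocking and fixed-length records:

```python
# Blocking factor for unspanned, fixed-length records:
# bfr = floor(B / R), where B is the block size and R the record size.
B = 512          # block size in bytes (illustrative)
R = 100          # record size in bytes (illustrative)
bfr = B // R     # records per block

unused = B - bfr * R          # wasted space per block with unspanned blocking

# Number of blocks needed to store r records: ceil(r / bfr)
r = 1000
blocks_needed = -(-r // bfr)  # ceiling division
```

With these numbers, 5 records fit per block, 12 bytes per block are wasted, and 1000 records need 200 blocks; spanned blocking would reclaim the wasted bytes at the cost of records crossing block boundaries.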
Files of Records
A file is a sequence of records, where each record is a collection of
data values (or data items).
Records are stored on disk blocks. The blocking factor bfr for a file is
the (average) number of file records stored in a disk block.
Chapter 13-11
Files of Records (cont.)
File records can be unspanned (no record can span two blocks) or
spanned (a record can be stored in more than one block).
The physical disk blocks that are allocated to hold the records of a file
can be contiguous, linked, or indexed.
In a file of fixed-length records, all records have the same format.
Usually, unspanned blocking is used with such files.
Files of variable-length records require additional information to be
stored in each record, such as separator characters and field types.
Usually spanned blocking is used with such files.
Chapter 13-12
Operation on Files
Typical file operations include:
OPEN: Readies the file for access, and associates a pointer that will
refer to a current file record at each point in time.
FIND: Searches for the first file record that satisfies a certain
condition, and makes it the current file record.
FINDNEXT: Searches for the next file record (from the current
record) that satisfies a certain condition, and makes it the current file
record.
READ: Reads the current file record into a program variable.
INSERT: Inserts a new record into the file, and makes it the current
file record.
Chapter 13-13
Operation on Files (cont.)
DELETE: Removes the current file record from the file, usually by
marking the record to indicate that it is no longer valid.
MODIFY: Changes the values of some fields of the current file
record.
CLOSE: Terminates access to the file.
REORGANIZE: Reorganizes the file records. For example, the
records marked deleted are physically removed from the file or a new
organization of the file records is created.
READ_ORDERED: Read the file blocks in order of a specific field
of the file.
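The operations above can be sketched as a toy in-memory file (the class and record layout are illustrative, not a real storage engine; deletion uses a marker, as described, and REORGANIZE removes the marked records):

```python
class HeapFile:
    """Toy in-memory sketch of the file operations described above.
    Records are dicts; None is used as a deletion marker."""

    def __init__(self, records):
        self.records = list(records)
        self.current = None            # index of the current record

    def find(self, predicate):
        """FIND: locate the first record satisfying the condition."""
        self.current = None
        return self.find_next(predicate)

    def find_next(self, predicate):
        """FINDNEXT: locate the next matching record after the current one."""
        start = 0 if self.current is None else self.current + 1
        for i in range(start, len(self.records)):
            if self.records[i] is not None and predicate(self.records[i]):
                self.current = i
                return True
        return False

    def read(self):
        """READ: return the current record."""
        return self.records[self.current]

    def insert(self, record):
        """INSERT: append a record and make it the current record."""
        self.records.append(record)
        self.current = len(self.records) - 1

    def delete(self):
        """DELETE: mark the current record as no longer valid."""
        self.records[self.current] = None

    def reorganize(self):
        """REORGANIZE: physically remove records marked deleted."""
        self.records = [r for r in self.records if r is not None]
        self.current = None

f = HeapFile([{"name": "a", "dept": 1}, {"name": "b", "dept": 2}])
f.find(lambda r: r["dept"] == 2)
print(f.read()["name"])  # b
```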
Chapter 13-14
Unordered Files
Also called a heap or a pile file.
New records are inserted at the end of the file, so insertion is very
efficient; searching for a record, however, requires a linear search
through the file blocks.
Chapter 13-15
Ordered Files
Also called a sequential file.
File records are kept sorted by the values of an ordering field.
Insertion is expensive: records must be inserted in the correct order. It
is common to keep a separate unordered overflow (or transaction) file
for new records to improve insertion efficiency; this is periodically
merged with the main ordered file.
A binary search on the ordering field can be used to locate a record. This
requires reading about log2(b) of the file blocks on average, where b is the
number of file blocks, an improvement over linear search.
Reading the records in order of the ordering field is quite efficient.
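The block-level binary search just described can be sketched as follows (a simplified model in which each block is a sorted list of key values and keys are ordered across blocks):

```python
def binary_search_blocks(blocks, key):
    """Binary search an ordered file on its ordering field.

    `blocks` models the file: a list of blocks, each a sorted list of
    key values, sorted across blocks as well. Returns (block_no, offset)
    if the key is found, else None. Accesses at most ~log2(b) blocks.
    """
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        blk = blocks[mid]           # one "block access"
        if key < blk[0]:
            hi = mid - 1
        elif key > blk[-1]:
            lo = mid + 1
        else:
            # the key must be in this block if it exists at all
            if key in blk:
                return (mid, blk.index(key))
            return None
    return None

# Four blocks of three records each, ordered on the key field
blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26], [29, 32, 35]]
print(binary_search_blocks(blocks, 23))  # (2, 1)
```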
Chapter 13-16
Ordered Files (cont.)
Chapter 13-17
Average Access Times
The following table shows the average access time to access a specific
record for a given type of file
Chapter 13-18
Hashed Files
Hashing for disk files is called External Hashing
The file blocks are divided into M equal-sized buckets, numbered
bucket0, bucket1, ..., bucket M-1. Typically, a bucket corresponds to one
disk block (or a fixed number of contiguous blocks).
One of the file fields is designated to be the hash key of the file.
The record with hash key value K is stored in bucket i, where i=h(K),
and h is the hashing function.
Search is very efficient on the hash key.
Collisions occur when a new record hashes to a bucket that is already
full. An overflow file is kept for storing such records. Overflow records
that hash to each bucket can be linked together.
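The bucket assignment and overflow handling above can be sketched as follows (M, the bucket capacity, and the mod hash function are illustrative choices):

```python
M = 4                 # number of buckets (fixed in static external hashing)
BUCKET_CAPACITY = 2   # records per bucket, i.e. per block

def h(key):
    """Hash function: i = h(K) = K mod M (an illustrative choice)."""
    return key % M

buckets = [[] for _ in range(M)]
overflow = []         # overflow records, tagged with their home bucket

def insert(record, key):
    i = h(key)
    if len(buckets[i]) < BUCKET_CAPACITY:
        buckets[i].append((key, record))
    else:
        overflow.append((i, key, record))   # collision: bucket i is full

def search(key):
    i = h(key)
    for k, rec in buckets[i]:               # search the home bucket
        if k == key:
            return rec
    for bi, k, rec in overflow:             # then its overflow chain
        if bi == i and k == key:
            return rec
    return None

for k in (3, 7, 11, 4):
    insert(f"rec{k}", k)
print(search(11))  # rec11 (found in overflow: bucket 3 was already full)
```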
Chapter 13-19
Hashed Files (cont.)
There are numerous methods for collision resolution, including the following:
Open addressing: Proceeding from the occupied position specified by the hash
address, the program checks the subsequent positions in order until an unused
(empty) position is found.
Chaining: For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. In addition, a pointer
field is added to each record location. A collision is resolved by placing the
new record in an unused overflow location and setting the pointer of the
occupied hash address location to the address of that overflow location.
Multiple hashing: The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing if
necessary.
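Open addressing, the first of these methods, can be sketched as a linear probe from the hash address (the table size here is an arbitrary example):

```python
def insert_open_addressing(table, key):
    """Open addressing: starting from the position given by the hash
    address, check subsequent positions in order until an unused
    (empty) position is found."""
    n = len(table)
    home = key % n
    for step in range(n):
        slot = (home + step) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")

table = [None] * 7
print(insert_open_addressing(table, 10))  # 10 % 7 = 3 -> slot 3
print(insert_open_addressing(table, 17))  # 17 % 7 = 3, occupied -> slot 4
```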
Chapter 13-20
Hashed Files (cont.)
Chapter 13-21
Hashed Files (cont.)
To reduce overflow records, a hash file is typically kept 70-80% full.
The hash function h should distribute the records uniformly among
the buckets; otherwise, search time will be increased because many
overflow records will exist.
Main disadvantages of static external hashing:
- Fixed number of buckets M is a problem if the number of records
in the file grows or shrinks.
- Ordered access on the hash key is quite inefficient (requires
sorting the records).
Chapter 13-22
Hashed Files - Overflow handling
Chapter 13-23
Dynamic And Extendible Hashed Files
Dynamic and extendible hashing techniques allow the number of buckets to
grow and shrink as the file does. Extendible hashing maintains a directory
of 2^d bucket addresses, where the first d bits of a record's hash value
are used as an index into the directory.
Chapter 13-24
Dynamic And Extendible Hashing (cont.)
An insertion in a disk block that is full causes the block to split into
two blocks and the records are redistributed among the two blocks.
The directory is updated appropriately.
Linear hashing does require an overflow area but does not use a
directory. Blocks are split in linear order as the file expands.
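The directory lookup in extendible hashing can be sketched in one line (the 8-bit hash width and the global depth d = 2 are illustrative values):

```python
def directory_index(hash_value, d, hash_bits=8):
    """Extendible hashing: use the first (high-order) d bits of the
    hash value as an index into a directory of 2**d bucket addresses."""
    return hash_value >> (hash_bits - d)

# With global depth d = 2, the 8-bit hash value 0b10110101 maps to
# directory entry 0b10 = 2.
print(directory_index(0b10110101, 2))  # 2
```

When a full block splits, the directory entries are updated to point to the two new blocks, and if necessary the directory itself doubles (d increases by one).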
Chapter 13-25
Extendible
Hashing
Chapter 13-26
Parallelizing Disk Access using RAID Technology.
The main goal of RAID is to bridge the widening performance gap between
disks on the one hand and memory and microprocessors on the other, whose
performance has improved at much faster rates.
Chapter 13-27
RAID Technology (cont.)
A natural solution is a large array of small independent disks acting as
a single higher-performance logical disk. A concept called data
striping is used, which utilizes parallelism to improve disk
performance.
Data striping distributes data transparently over multiple disks to
make them appear as a single large, fast disk.
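Block-level striping can be sketched as a simple address mapping (the round-robin layout shown is the usual convention; the disk count is an example value):

```python
def stripe_location(logical_block, num_disks):
    """Block-level striping: logical block i of the striped volume is
    stored on disk (i mod n) at stripe offset (i div n)."""
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, logical blocks 0..7 rotate across disks 0..3, so a
# sequential read of 4 consecutive blocks touches all 4 disks in parallel.
for b in range(8):
    disk, offset = stripe_location(b, 4)
    print(f"block {b} -> disk {disk}, stripe {offset}")
```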
Chapter 13-28
RAID Technology (cont.)
Different RAID organizations were defined based on different combinations of
two factors: the granularity of data interleaving (striping) and the pattern
used to compute redundant information.
RAID level 0 has no redundant data and hence has the best write performance.
RAID level 1 uses mirrored disks.
RAID level 2 uses memory-style redundancy by using Hamming codes, which
contain parity bits for distinct overlapping subsets of components. Level 2
includes both error detection and correction.
RAID level 3 uses a single parity disk, relying on the disk controller to
figure out which disk has failed.
RAID levels 4 and 5 use block-level data striping, with level 5 distributing
data and parity information across all disks.
RAID level 6 applies the so-called P + Q redundancy scheme using Reed-
Solomon codes to protect against up to two disk failures by using just two
redundant disks.
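The single-parity scheme used by RAID levels 3 through 5 is a bytewise XOR across the data blocks, which is what allows any one failed disk to be reconstructed from the survivors. A minimal sketch (the block contents are arbitrary example bytes):

```python
def parity(blocks):
    """Compute a parity block as the bytewise XOR of the given blocks."""
    p = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            p[i] ^= byte
    return bytes(p)

# Three data blocks plus one parity block (RAID 4/5 style)
d0, d1, d2 = b"\x0f\x00", b"\xf0\x0f", b"\x01\x10"
p = parity([d0, d1, d2])

# If the disk holding d1 fails, XOR-ing the surviving data blocks with
# the parity block recovers d1, since x ^ x = 0 cancels the other terms.
recovered = parity([d0, d2, p])
print(recovered == d1)  # True
```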
Chapter 13-29
Use of RAID Technology (cont.)
Chapter 13-30
Use of RAID Technology (cont.)
Chapter 13-31
Trends in Disk Technology
Chapter 13-32
Storage Area Networks
The demand for storage capacity has risen considerably in recent times.
Organizations need to move from a static, fixed, data-center-oriented
operation to a more flexible and dynamic infrastructure for information
processing.
Thus they are moving toward Storage Area Networks (SANs).
In a SAN, online storage peripherals are configured as nodes on a high-
speed network and can be attached and detached from servers in a very
flexible manner.
This allows storage systems to be placed at longer distances from the
servers and provides different performance and connectivity options.
Chapter 13-33
Storage Area Networks (contd.)
Advantages of SANs are:
Chapter 13-34