F.O. ECHOBU 1
Outline
• Database systems: review of basic concepts
• DBMS functions
• Data modeling
• Relational databases
• Database query languages
• Distributed databases
• Physical database design
• Introduction to Data Warehousing
Basic Database Systems Concepts
• Data: known facts that can be recorded.
• End-user data: raw facts of interest to the end user.
• Metadata: data about data, through which the end-user data are integrated and managed.
• Field: a character or group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data.
• Record: a logically connected set of one or more fields that describes a person, place, or thing.
• File: a collection of related records.
• Database: a collection of data that:
  • represents some aspect of the real world,
  • is a logically coherent collection (not a random collection), and
  • is designed, built, and populated for a specific purpose.
• Database Management System (DBMS): software for creating and managing databases. It provides users and programmers with a systematic way to create, retrieve, update, and manage data.
• Database schema: the description of the database.
DBMS Functions
• Data Dictionary
• Data Storage Management
• Data Transformation and Presentation
• Security
• Multi-user Access Control
• Backup and Recovery
• Data Integrity
• Database Access Language
• Database Communication Interface
Database System
Environment
Hardware
Software
People
Procedures
Data
• A database system refers to an organization of components that define and regulate the collection, storage, management, and use of data within a database environment.
• Hardware: all of the system's physical devices; for example, computers (microcomputers, workstations, servers, and supercomputers), storage devices, printers, network devices (hubs, switches, routers, fiber optics), and other devices (automated teller machines, ID readers, and so on).
• Operating system software manages all hardware components and makes it possible for all other software to run on the computers. Examples of operating system software include Microsoft Windows, Linux, Mac OS, UNIX, and MVS.
• DBMS software manages the database within the database system. Some examples of DBMS software include Microsoft SQL Server, Oracle Corporation's Oracle, MySQL AB's MySQL, and IBM's DB2.
• Application programs and utility software are used to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation take place. Application programs are most commonly used to access data found within the database to generate reports, tabulations, and other information to facilitate decision making. Utilities are the software tools used to help manage the database system's computer components. For example, all of the major DBMS vendors now provide graphical user interfaces (GUIs) to help create database structures, control database access, and monitor database operations.
• System administrators oversee the database system's general operations.
• Database administrators, also known as DBAs, manage the DBMS and ensure that the database is functioning properly.
• Database designers design the database structure. They are, in effect, the database architects.
• Systems analysts and programmers design and implement the application programs.
• End users are the people who use the application programs to run the organization's daily operations.
• Procedures are the instructions and rules that govern the design and use of the database system.
• They play an important role in a company because they enforce the standards by which business is conducted within the organization and with customers.
• They are also used to ensure that there is an organized way to monitor and audit both the data that enter the database and the information that is generated through the use of that data.
• Data refers to the collection of facts stored in the database.
Data Abstraction
A DBMS hides certain details of how data is stored and maintained and provides an abstract view of the data. This simplifies user interaction with the system. Complexity (of data and data structures) is hidden from users through several levels of abstraction, enabling users to manipulate the data without worrying about where it is located or how it is actually stored.

The overall database description can be defined at three levels, namely the internal, conceptual, and external levels; hence the name three-level DBMS architecture.
Three-Schema Architecture
Three-Level Architecture
Internal level: the lowest level of data abstraction, which deals with the physical representation of the database on the computer; it is therefore also known as the physical level. It describes how the data is physically stored and organized on the storage medium.
Conceptual level: this level of abstraction deals with the logical structure of the entire database; it is therefore also known as the logical level. It describes what data is stored in the database, the relationships among the data, and the complete view of the user's requirements, without any concern for the physical implementation; that is, it hides the complexity of the physical storage structures. The conceptual view is the overall view of the database, and it includes all the information that is going to be represented in the database.
External level: the highest level of abstraction, which deals with the user's view of the database; it is therefore also known as the view level.

In general, most users and application programs do not require the entire database. The external level describes a part of the database for a particular group of users. It permits users to access data in a way that is customized to their needs, so that the same data can be seen by different users in different ways, at the same time.

In this way, it provides a powerful and flexible security mechanism: by hiding parts of the database from certain users, a user is not even aware of the existence of attributes that are missing from the view.
These three levels are used to describe the schema of the database at various levels; thus, the three-level architecture is also known as the three-schema architecture.

The internal level has an internal schema, which describes the physical storage structure of the database.
The conceptual level has a conceptual schema, which describes the structure of the entire database.
The external level has external schemas, or user views, which describe parts of the database according to a particular user's requirements and hide the rest of the database from that user.

A key advantage of this architecture is data independence.
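As a small illustration, an external schema can be realized as a SQL view over the conceptual schema. The sketch below, run through Python's sqlite3 module, is hypothetical (table and column names are illustrative, not from the slides): a view hides sensitive columns, so users of that view are unaware the hidden attributes exist.

```python
import sqlite3

# Conceptual schema: the full Students table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL, phone TEXT)")
conn.execute("INSERT INTO Students VALUES ('S1', 'Ada', 4.5, '0800-000')")

# External schema: a user view for, say, a notice-board application.
# gpa and phone are simply not part of this view.
conn.execute("CREATE VIEW PublicStudents AS SELECT sid, name FROM Students")

row = conn.execute("SELECT * FROM PublicStudents").fetchone()
print(row)  # ('S1', 'Ada') - the hidden attributes are invisible here
```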
Three-Schema Architecture Example
• Data independence: a condition that exists when data access is unaffected by changes in the physical data storage characteristics.
Data Independence
• Physical data independence: application programs and ad hoc facilities are logically unaffected when physical access methods or storage structures are changed.
• Logical data independence: application programs and ad hoc facilities are logically unaffected when changes are made to the table structures that preserve the original table values (such as changing the order of columns or inserting columns).
• Integrity independence: all relational integrity constraints must be definable in the relational language and stored in the system catalog, not at the application level.
• Distribution independence: end users and application programs are unaware of and unaffected by the data location (distributed vs. local databases).
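The list above can be made concrete with a minimal sketch of logical data independence (the table and data are hypothetical): an application that names its columns explicitly keeps working unchanged after the table structure gains a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_num INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Employee VALUES (101, 'Okafor')")

def app_query(conn):
    # The "application program": it refers to columns by name only,
    # never by position, so it does not depend on the column order.
    return conn.execute("SELECT name FROM Employee WHERE emp_num = 101").fetchone()[0]

before = app_query(conn)
conn.execute("ALTER TABLE Employee ADD COLUMN job_class TEXT")  # structural change
after = app_query(conn)
assert before == after  # the application is logically unaffected
print("application unaffected:", after)
```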
DBMS Architecture
DBMS Processes
• Listener. The listener process listens for clients' requests and hands the SQL requests to other DBMS processes. Once a request is received, the listener passes it to the appropriate user process.
• User. The DBMS creates a user process to manage each client session. Therefore, when you log on to the DBMS, you are assigned a user process. This process handles all requests you submit to the server. There are many user processes, at least one per logged-in client.
• Scheduler. The scheduler process organizes the concurrent execution of SQL requests.
• Lock manager. This process manages all locks placed on database objects, including disk pages.
• Optimizer. The optimizer process analyzes SQL queries and finds the most efficient way to access the data.
• Receive an application’s (or an end user’s) request.
• Validate, analyze, and decompose the request. The request
might include mathematical and/or logical operations such as
the following: Select all customers with a balance greater than
$1,000. The request might require data from only a single table,
or it might require access to several tables.
DBMS Advantages
• Improved data sharing.
• Improved data security.
• Better data integration.
• Minimized data inconsistency. Data inconsistency exists when different versions of the same data appear in different places.
• Improved data access.
• Improved decision making.
• Increased end-user productivity.
Database Users
• DBA: access authorization, coordination and monitoring of database usage, problem determination, performance tuning, etc.
• Designers: identify the requirements and choose the appropriate structures to represent and store the data.
• Users (casual, parametric, sophisticated, stand-alone).
• Systems analysts and application programmers.
• DBMS system designers and implementers.
• Tool developers.
• Operators and maintenance personnel.
Responsibilities of the DBA
• Deciding the information content of the database.
• Deciding the storage structure and access strategy.
• Liaising with users.
• Defining the strategy for backup and recovery.
• Defining security and integrity checks.
• Monitoring performance and responding to changing requirements.
Data modeling
F.O. ECHOBU 26
Data Modeling
• A data model is a relatively simple representation, usually graphical, of more complex real-world data structures.
• In general terms, a model is an abstraction of a more complex real-world object or event.
• A model's main function is to help you understand the complexities of the real-world environment.
• Within the database environment, a data model represents data structures and their characteristics, relations, constraints, transformations, and other constructs, with the purpose of supporting a specific problem domain.
Development of
Data Models
Building Blocks of Data Models: Entities, Attributes, Relationships, Constraints
• An entity is anything (a person, a place, a thing, or an event) about which data are to be collected and stored. An entity represents a particular type of object in the real world (e.g., CUSTOMER, STUDENT, PRODUCT).
• An entity is represented in the Entity Relationship Diagram (ERD) by a rectangle, also known as an entity box. The name of the entity, a noun, is written in the center of the rectangle.
• The entity name is generally written in capital letters and in the singular form: STUDENT rather than STUDENTS, and EMPLOYEE rather than EMPLOYEES.
• An attribute is a characteristic of an entity. For example, a STUDENT entity would be described by attributes such as Student Surname, Student First Name, Student Phone, Student Address, Student Course, etc.
• Attributes are the equivalent of fields in file systems.
• Three types of relationships (connectivity) are used in the ER model:
  • one-to-many (1:M),
  • many-to-many (M:N), and
  • one-to-one (1:1).
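These connectivities map onto table structures in a standard way; the sketch below is a hypothetical illustration (table names are not from the slides): a 1:M relationship is carried by a foreign key in the "many" table, while an M:N relationship is decomposed into two 1:M relationships through a bridge (junction) table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
-- 1:M - each Employee row points at exactly one Department,
-- but a Department can be referenced by many Employee rows.
CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES Department(dept_id)
);
-- M:N between Student and Course, decomposed through the Enroll
-- junction table (one 1:M from each side).
CREATE TABLE Student (sid INTEGER PRIMARY KEY);
CREATE TABLE Course  (cid INTEGER PRIMARY KEY);
CREATE TABLE Enroll (
    sid INTEGER REFERENCES Student(sid),
    cid INTEGER REFERENCES Course(cid),
    PRIMARY KEY (sid, cid)
);
""")
print("tables created")
```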
Chen's vs Crow's Notations
• A constraint is a restriction placed on the data to ensure data integrity. Constraints are normally expressed in the form of rules. For example:
  • An employee's salary must have a value between 30,000 and 500,000.
  • A student's GPA must be between 0.00 and 5.00.
  • An account number must have ten digits.
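Rules like these can be expressed directly as SQL CHECK constraints. The sketch below is a minimal, hypothetical illustration (table and column names are assumptions, not from the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    emp_num INTEGER PRIMARY KEY,
    salary  REAL CHECK (salary BETWEEN 30000 AND 500000),
    acct_no TEXT CHECK (length(acct_no) = 10)   -- ten digits
);
CREATE TABLE Student (
    sid TEXT PRIMARY KEY,
    gpa REAL CHECK (gpa BETWEEN 0.00 AND 5.00)
);
""")
conn.execute("INSERT INTO Employee VALUES (101, 120000, '0123456789')")  # passes all rules
try:
    conn.execute("INSERT INTO Employee VALUES (102, 15000, '0123456789')")  # salary too low
except sqlite3.IntegrityError:
    print("constraint violation rejected")
```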
Network Data Models
• The network model was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard.
• In the network model, the user perceives the network database as a collection of records in 1:M relationships.
• However, unlike the hierarchical model, the network model allows a record to have more than one parent.
• In network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record and a member record. A set represents a 1:M relationship between the owner and the member.
Network Diagram
Hierarchical Data Models
• The basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments. A segment is the equivalent of a file system's record type.
• Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it.
• The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments. (Each parent can have many children, but each child has only one parent.)
• Limitations:
  • It was complex to implement,
  • it was difficult to manage,
  • it lacked structural independence, and
  • there were no standards for how to implement the model.
Hierarchical Diagram
Relational Data Models
• The relational model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper "A Relational Model of Data for Large Shared Data Banks".
• The relational model's foundation is a mathematical concept known as a relation.
• You can think of a relation (sometimes called a table) as a matrix composed of intersecting rows and columns.
• Each row in a relation is called a tuple.
• Each column represents an attribute.
Relational Diagram
Object-Oriented Data Models
The OO data model is based on the following components:
• An object is an abstraction of a real-world entity. In general terms, an object may be considered equivalent to an ER model's entity. More precisely, an object represents only one occurrence of an entity. (The object's semantic content is defined through several of the items in this list.)
• Attributes describe the properties of an object. For example, a PERSON object includes the attributes Name, Social Security Number, and Date of Birth.
• Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model's entity set. However, a class differs from an entity set in that it contains a set of procedures known as methods.
• A class's methods represent real-world actions such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior. Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. (Note the similarity to the hierarchical data model in this respect.)
• Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses of the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.
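The PERSON/CUSTOMER/EMPLOYEE hierarchy described above can be sketched in a few lines of Python (the specific attributes chosen here are illustrative assumptions):

```python
class Person:
    def __init__(self, name, date_of_birth):
        self.name = name                    # attribute
        self.date_of_birth = date_of_birth  # attribute

    def print_name(self):                   # method: defines behavior
        return f"Name: {self.name}"

class Customer(Person):                     # subclass of Person
    def __init__(self, name, date_of_birth, credit_limit):
        super().__init__(name, date_of_birth)
        self.credit_limit = credit_limit    # attribute specific to Customer

class Employee(Person):                     # second subclass of Person
    pass

# Customer inherits both the attributes and the print_name method of Person.
c = Customer("Ada", "1990-01-01", 5000)
print(c.print_name())  # Name: Ada
```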
Importance of Data Modeling
• Facilitates interaction among the designer, the applications programmer, and the end user.
• A well-developed data model can even foster improved understanding of the organization for which the database design is developed.
Normalization
Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby helping to eliminate data anomalies. It helps us evaluate table structures and produce good tables. Normalization is very important in database design.

Normalization works through a series of stages called normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
• Fourth normal form (4NF)
• Fifth normal form (5NF)
• Domain-key normal form (DKNF)
The first four are the most commonly used normal forms.
Normalization Example
To illustrate the normalization process, we will examine a simple business application. In this case, we will explore the simplified database activities of a construction company that manages several building projects.

Each project has its own project number, name, employees assigned to it, and so on. Each employee has an employee number, name, and job classification, such as engineer or computer technician.

The company charges its clients by billing the hours spent on each contract. The hourly billing rate is dependent on the employee's position. Periodically, a report is generated that contains the information displayed below.
Normalization
The structure of the data set above does not handle data very well, for the following reasons:
• The project number (PROJ_NUM) is apparently intended to be a primary key, but it contains nulls.
• The table entries invite data inconsistencies. For example, the JOB_CLASS value "Elect. Engineer" might be entered as "Elect. Eng." in some cases and as "El. Eng" or "EE" in others.
• The table displays data redundancies. These data redundancies yield the following anomalies:
  o Update anomalies: modifying the JOB_CLASS for employee number 105 potentially requires many alterations, one for each row in which EMP_NUM = 105.
  o Insertion anomalies: just to complete a row, a new employee must be assigned to a project. If the employee is not yet assigned, a phantom project must be created to complete the employee data entry.
  o Deletion anomalies: if employee 103 quits, deletions must be made for every entry in which EMP_NUM = 103. Such deletions will result in losing other vital project-assignment data from the database.
The table above contains what is known as repeating groups. A
repeating group derives its name from the fact that a group of
multiple (related) entries can exist for any single key attribute
occurrence.
Normalization: Conversion to First Normal Form
The normalization process starts with a simple three-step procedure:
Step 1: Eliminate the Repeating Groups
Step 2: Identify the Primary Key
Step 3: Identify All Dependencies
Step 1: Eliminate the Repeating Groups
To eliminate the repeating groups, eliminate the nulls by making sure that each repeating-group attribute contains an appropriate data value. This change converts Table 1 above to the table below.
F.O. ECHOBU 49
Normalization
Conversion to First Normal
Form
F.O. ECHOBU 50
Step 2: Identify the Primary Key
Even a casual observer of the layout in Table 2 will note that PROJ_NUM is not an adequate primary key, because the project number does not uniquely identify all the remaining entity (row) attributes.

To obtain a proper primary key that uniquely identifies every row, the new key must be composed of the combination of PROJ_NUM and EMP_NUM.
Step 3: Identify All Dependencies
Dependencies can be depicted with the help of a dependency diagram, which depicts all the dependencies found within a given table structure.

Dependency diagrams are very helpful because they make it easy to view all the relationships among a table's attributes, and their use makes it much less likely that an important dependency will be overlooked.
Dependency Diagram: 1NF
Dependency Diagram 1NF Notes
Looking at the dependency diagram above:
• The primary key attributes are bold, underlined, and shaded in a different color.
• The arrows above the attributes indicate all desirable dependencies, that is, dependencies that are based on the primary key. In this case, note that the entity's attributes are dependent on the combination of PROJ_NUM and EMP_NUM.
• The arrows below the dependency diagram indicate less desirable dependencies. Two types of such dependencies exist:
  o Partial dependencies: dependencies based on only a part of a composite primary key.
  o Transitive dependencies: a transitive dependency is a dependency of one nonprime attribute on another nonprime attribute. The problem with transitive dependencies is that they still yield data anomalies.
1NF Requirements
1. All the key attributes are defined.
2. There are no repeating groups in the table.
3. All attributes are dependent on the primary key.

All relational tables satisfy the 1NF requirements. The problem with the 1NF structure of Table 2 above is that it contains partial dependencies and a transitive dependency.
Normalization: Conversion to Second Normal Form
The rule for conversion from 1NF to 2NF is: eliminate all partial dependencies from the 1NF format.
The conversion from 1NF to 2NF is done in two steps:
Step 1: Identify All the Key Components
Step 2: Identify the Dependent Attributes
Step 1: Identify All the Key Components
Eliminating the partial dependencies from the 1NF table will produce three tables from the original table. From the dependency diagram, we can observe that two partial dependencies exist:
• PROJ_NAME depends on PROJ_NUM, and
• EMP_NAME, JOB_CLASS, and CHG_HOUR depend on EMP_NUM.
To eliminate these two partial dependencies, write each key component on a separate line, and then write the original (composite) key on the last line:
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
Each component will become the key in a new table. The original table is now divided into three tables: PROJECT, EMPLOYEE, and ASSIGN.
Step 2: Identify the Dependent Attributes
Determine which attributes are dependent on which other attributes:
PROJECT (Proj_num, Proj_name)
EMPLOYEE (Emp_num, Emp_name, Job_class, Chg_hour)
ASSIGN (Proj_num, Emp_num, Assign_hours)
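The three 2NF tables above can be sketched as SQL definitions; the column names follow the dependency diagram, while the data types are assumptions on my part:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT,
    JOB_CLASS TEXT,
    CHG_HOUR  REAL   -- still here: the transitive dependency remains in 2NF
);
CREATE TABLE ASSIGN (
    PROJ_NUM     INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM      INTEGER REFERENCES EMPLOYEE(EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)   -- the original composite key
);
""")
print("2NF tables created")
```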
Dependency Diagram 2NF
Conditions for 2NF
A table is in second normal form (2NF) if it satisfies the following two conditions:
1. It is in 1NF.
2. It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key.

A partial dependency can exist only if a table's primary key is composed of several attributes. A table whose primary key consists of only a single attribute is automatically in 2NF if it is in 1NF.
Normalization: Conversion to Third Normal Form
The rule for conversion from 2NF to 3NF is: eliminate all transitive dependencies from the 2NF format.
The conversion from 2NF to 3NF is done in three steps:
Step 1: Identify Each New Determinant
Step 2: Identify the Dependent Attributes
Step 3: Remove the Dependent Attributes from Transitive Dependencies
Step 1: Identify Each New Determinant
For every transitive dependency, write its determinant as the PK for a new table. (A determinant is any attribute whose value determines other values within a row.) If you have three different transitive dependencies, you will have three different determinants. The 2NF dependency diagram shows only one case of transitive dependency. Therefore, write the determinant for this transitive dependency:
JOB_CLASS
Step 2: Identify the Dependent Attributes
Identify the attributes that are dependent on each determinant identified in Step 1, and identify the dependency. In this case, you write:
JOB_CLASS → CHG_HOUR
Name the table to reflect its contents and function. In this case, JOB seems appropriate.
Step 3: Remove the Dependent Attributes from Transitive Dependencies
Eliminate all the dependent attributes in the transitive relationship(s) from each of the tables that have such a transitive relationship. Draw a new dependency diagram to show all the tables, and check the tables to make sure that each table has a determinant and that no table contains inappropriate dependencies (partial or transitive). When steps 1-3 have been completed, the resulting tables are shown below.
Dependency Diagram for the 3NF
Conditions for 3NF
A table is in third normal form (3NF) if the following two conditions are satisfied:
1. It is in 2NF.
2. It contains no transitive dependencies.
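The 3NF result described above can be sketched as SQL: CHG_HOUR moves out of EMPLOYEE into the new JOB table keyed by its determinant, JOB_CLASS, and EMPLOYEE keeps JOB_CLASS as a reference. (Data types are assumptions on my part.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT
);
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,   -- determinant of the transitive dependency
    CHG_HOUR  REAL                -- the attribute that depended on JOB_CLASS
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT,
    JOB_CLASS TEXT REFERENCES JOB(JOB_CLASS)   -- no transitive dependency left
);
CREATE TABLE ASSIGN (
    PROJ_NUM     INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM      INTEGER REFERENCES EMPLOYEE(EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
""")
print("3NF tables created")
```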
Benefits of Normalization
• Facilitates data integration.
• Reduces data redundancy.
• Provides a robust architecture for retrieving and maintaining data.
• Complements data modelling.
• Reduces the chances of data anomalies occurring.
Relational databases

Database query languages
DBMS Languages
The main objective of a database management system is to allow its users to perform a number of operations on the database, such as inserting, deleting, and retrieving data, in abstract terms, without knowing about the physical representation of the data.

A very important advantage of using a DBMS is that it offers data independence; that is, application programs are insulated from changes in the way the data is structured and stored. Data independence is achieved through the three levels of data abstraction; in particular, the conceptual schema and the external schema provide distinct benefits in this area.
The DBMS provides two database languages for implementing databases, namely:
• Data definition language (DDL)
• Data manipulation language (DML)
Creating and Modifying Relations Using SQL
The main construct for representing data in the relational model is a relation. A relation consists of a relation schema and a relation instance. The relation instance is a table, and the relation schema describes the column heads for the table.

We first describe the relation schema and then the relation instance. The schema specifies the relation's name, the name of each field (or column, or attribute), and the domain of each field. A domain is referred to in a relation schema by the domain name and has a set of associated values.
Example of a Relation Schema
Students(sid: string, name: string, login: string, age: integer, gpa: real)

This shows that the relation's name is Students, and the schema specifies sid of string type, name of string type, login of string type, age of integer type, and gpa of real type.
Example of Instance of a Relation
Creation of a Relation (DDL)
The CREATE TABLE statement is used to define a new table. To create the Students relation, we can use the following statement:

CREATE TABLE Students
( sid   CHAR(20),
  name  CHAR(30),
  login CHAR(20),
  age   INTEGER,
  gpa   REAL )
Creation of an Instance of a Relation
Tuples (instances or records) are inserted using the INSERT command. We can insert a single tuple into the Students table as follows:

INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)
Deleting Tuples from a Relation
We can delete tuples using the DELETE command. We can delete all Students tuples with name equal to Smith using the command:

DELETE FROM Students
WHERE name = 'Smith'
Integrity Constraints over Relations
A database is only as good as the information stored in it, and a DBMS must therefore help prevent the entry of incorrect information.

An integrity constraint (IC) is a condition specified on a database schema to restrict the entry of 'wrong data' into an instance of the database. If a database instance satisfies all the integrity constraints specified on the database schema, it is a legal instance. A DBMS enforces integrity constraints, in that it permits only legal instances to be stored in the database.
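A minimal sketch of this enforcement, using the Students relation from earlier (the GPA rule here is an illustrative assumption): the DBMS accepts inserts that keep the instance legal and rejects those that would violate a declared IC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Students (
    sid TEXT PRIMARY KEY,
    gpa REAL CHECK (gpa BETWEEN 0.0 AND 5.0)   -- an IC declared in the schema
)
""")
conn.execute("INSERT INTO Students VALUES ('53688', 3.2)")      # keeps the instance legal
try:
    conn.execute("INSERT INTO Students VALUES ('53650', 7.9)")  # would make it illegal
except sqlite3.IntegrityError:
    print("illegal instance rejected")
```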
Implementation of ICs
Integrity constraints are specified and enforced at different times:
• When the DBA or end user defines a database schema, he or she specifies the ICs that must hold on any instance of this database.
• When a database application is run, the DBMS checks for violations and disallows changes to the data that violate the specified ICs.
Referential Integrity Constraints
Referential integrity is the process of ensuring that data is consistent between related tables. It deals with parent-child relationships in a database. Constraints of this type are usually created to ensure uniformity among related tables.

Referential integrity is implemented using keys. A key is a column value that uniquely identifies a record in a table; it is used to establish relationships with other tables. The mapping is usually one-to-one, although sometimes double or multiple mappings occur. The relationship is usually referred to as a parent-child relationship.

There are two types of keys: primary keys and foreign keys.
Primary Key and Foreign Key
A primary key is a column value (or combination of column values) that uniquely identifies a row of data in a table. It is what is used to establish relationships among tables. An example of a primary key in a student table is the student matriculation number; two students can never have the same matriculation number.

A foreign key is a column value in a table that references a primary key in another table. Foreign keys are used to indicate child tables. A foreign key ensures that parent records are created before child records; it also ensures that child records are deleted before the parent record.

In a database, relationships are usually established through the use of foreign keys and primary keys. The purpose of separating data into tables and establishing table relationships is to reduce data redundancy. The process of reducing data redundancy is called normalization.
Key Constraints
A key constraint is a statement that a certain minimal subset of the fields of a relation is a unique identifier for a tuple. A set of fields that uniquely identifies a tuple according to a key constraint is called a candidate key (often abbreviated to key). Considering the Students relation above, the sid field is a candidate key.
Specifying Key Constraints in SQL
In SQL, we can declare a key by using the UNIQUE constraint. At
most one of these candidate keys can be declared to be a primary
key, using the PRIMARY KEY constraint. (SQL does not require
that such constraints be declared for a table.) Let's consider our
student table definition and specify key information:

CREATE TABLE Students ( MATRICNO CHAR(20),
                        name     CHAR(30),
                        login    CHAR(20),
                        age      INTEGER,
                        gpa      REAL,
                        UNIQUE (name, age),
                        CONSTRAINT StudentsKey PRIMARY KEY (MATRICNO) )

This definition says that MATRICNO is the primary key and the
combination of name and age is also a key.
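As a runnable sketch of the table definition above (using Python's built-in sqlite3 module purely for illustration; the slides do not prescribe a particular DBMS), the PRIMARY KEY constraint rejects a second row with a duplicate MATRICNO:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table definition from the slide: MATRICNO is the primary key,
# and the combination (name, age) is also declared a key.
cur.execute("""
    CREATE TABLE Students (
        MATRICNO CHAR(20),
        name     CHAR(30),
        login    CHAR(20),
        age      INTEGER,
        gpa      REAL,
        UNIQUE (name, age),
        CONSTRAINT StudentsKey PRIMARY KEY (MATRICNO)
    )
""")

cur.execute("INSERT INTO Students VALUES ('U001', 'Ada', 'ada@cs', 17, 3.8)")

# A second row with the same MATRICNO violates the primary key
# constraint, so the DBMS rejects it.
try:
    cur.execute("INSERT INTO Students VALUES ('U001', 'Ben', 'ben@cs', 19, 3.1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same IntegrityError would be raised for two rows sharing both name and age, because of the UNIQUE (name, age) declaration.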
Foreign Key Constraint
A foreign key can be defined as a set of fields in one relation that
is used to 'refer' to a tuple in another relation. (It must correspond
to the primary key of the second relation.)
Sometimes the information stored in a relation is linked to the
information stored in another relation. If one of the relations is
modified, the other must be checked, and perhaps modified, to
keep the data consistent.
An IC involving both relations must be specified if a DBMS is to
make such checks. The most common IC involving two relations is
a foreign key constraint.
The foreign key in the referencing relation must match the
primary key of the referenced relation; that is, it must have the
same number of columns and compatible data types, though the
column names can be different.
Representing Foreign keys in SQL
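A minimal sketch of declaring a foreign key in SQL, run here through Python's built-in sqlite3 module (the Enrolled table and its columns are illustrative assumptions, not taken from the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
cur = conn.cursor()

cur.execute("CREATE TABLE Students (MATRICNO CHAR(20) PRIMARY KEY, name CHAR(30))")

# Hypothetical child table: MATRICNO here is a foreign key that
# references the primary key of Students (the parent table).
cur.execute("""
    CREATE TABLE Enrolled (
        courseid CHAR(20),
        MATRICNO CHAR(20),
        grade    CHAR(2),
        FOREIGN KEY (MATRICNO) REFERENCES Students (MATRICNO)
    )
""")

cur.execute("INSERT INTO Students VALUES ('U001', 'Ada')")
cur.execute("INSERT INTO Enrolled VALUES ('CS101', 'U001', 'A')")  # parent row exists: accepted

# A child row whose MATRICNO has no matching parent row is rejected,
# which is exactly the parent-child guarantee described above.
try:
    cur.execute("INSERT INTO Enrolled VALUES ('CS101', 'U999', 'B')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Deleting the Students row 'U001' while its Enrolled row still exists would be rejected in the same way, enforcing child-before-parent deletion.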
Querying Relational Data
Representing a Query in SQL
Example: Consider the instance of the Students relation used
previously. We can retrieve the rows corresponding to students who
are younger than 18 with the following SQL query:

SELECT *
FROM Students
WHERE age < 18

The symbol "*" means that we retain all fields of the selected tuples
in the result.
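The query above can be run end to end; a small sketch using Python's sqlite3 module (an illustrative choice; any SQL DBMS would behave the same, and the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (MATRICNO CHAR(20) PRIMARY KEY, name CHAR(30), age INTEGER)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [("U001", "Ada", 17), ("U002", "Ben", 19), ("U003", "Eve", 16)])

# SELECT * keeps all fields of the selected tuples;
# the WHERE clause filters to students younger than 18.
rows = cur.execute("SELECT * FROM Students WHERE age < 18").fetchall()
for row in rows:
    print(row)
```

Of the three sample rows, only the two with age below 18 satisfy the predicate and appear in the result.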
Distributed databases
Distributed Database Management Systems
A distributed database management system (DDBMS) governs
the storage and processing of logically related data over
interconnected computer systems in which both data and
processing functions are distributed among several sites.
Distributed Database Management Systems
[Diagram of a distributed database management system]
Distributed DBMS Components
Computer workstations (sites or nodes) that form the network
system. The distributed database system must be independent of
the computer system hardware.
Network hardware and software components that reside in each
workstation. The network components allow all sites to interact
and exchange data. Because the components (computers,
operating systems, network hardware, and so on) are likely to be
supplied by different vendors, it is best to ensure that distributed
database functions can be run on multiple platforms.
Communications media that carry the data from one workstation
to another. The DDBMS must be communications-media-
independent; that is, it must be able to support several types of
communications media.
Distributed DBMS Components
The Transaction Processor (TP), which is the software
component found in each computer that requests data. The
transaction processor receives and processes the application's
data requests (remote and local). The TP is also known as the
application processor (AP) or the transaction manager (TM).
The Data Processor (DP), which is the software component
residing on each computer that stores and retrieves data located
at the site. The DP is also known as the data manager (DM). A data
processor may even be a centralized DBMS.
Distributed DBMS Advantages
Data are located near the site of greatest demand. The data in a
distributed database system are dispersed to match business
requirements.
Faster data access. End users often work with only a locally stored
subset of the company's data.
Faster data processing. A distributed database system spreads out
the systems workload by processing data at several sites.
Growth facilitation. New sites can be added to the network without
affecting the operations of other sites.
Improved communications. Because local sites are smaller and
located closer to customers, local sites foster better communication
among departments and between customers and company staff.
Distributed DBMS Advantages
Reduced operating costs. It is more cost-effective to add
workstations to a network than to update a mainframe system.
Development work is done more cheaply and more quickly on
low-cost PCs than on mainframes.
User-friendly interface. PCs and workstations are usually equipped
with an easy-to-use graphical user interface (GUI). The GUI
simplifies training and use for end users.
Less danger of a single-point failure. When one of the computers
fails, the workload is picked up by other workstations. Data are also
distributed at multiple sites.
Processor independence. The end user is able to access any
available copy of the data, and an end user's request is processed
by any processor at the data location.
Distributed DBMS Disadvantages
Complexity of management and control. Applications must
recognize data location, and they must be able to stitch together
data from various sites. Database administrators must have the
ability to coordinate database activities to prevent database
degradation due to data anomalies.
Technological difficulty. Data integrity, transaction management,
concurrency control, security, backup, recovery, query optimization,
access path selection, and so on, must all be addressed and resolved.
Security. The probability of security lapses increases when data are
located at multiple sites. The responsibility of data management will
be shared by different people at several sites.
Distributed DBMS Disadvantages
Lack of standards. There are no standard communication protocols
at the database level. (Although TCP/IP is the de facto standard at
the network level, there is no standard at the application level.) For
example, different database vendors employ different, and often
incompatible, techniques to manage the distribution of data and
processing in a DDBMS environment.
Increased storage and infrastructure requirements. Multiple copies
of data are required at different sites, thus requiring additional disk
storage space.
Increased training cost. Training costs are generally higher in a
distributed model than they would be in a centralized model,
sometimes even to the extent of offsetting operational and
hardware savings.
Costs. Distributed databases require duplicated infrastructure to
operate (physical location, environment, personnel, software,
licensing, etc.).
Distributed DBMS Characteristics
Application interface to interact with the end user, application
programs, and other DBMSs within the distributed database.
Validation to analyze data requests for syntax correctness.
Transformation to decompose complex requests into atomic data
request components.
Query optimization to find the best access strategy. (Which database
fragments must be accessed by the query, and how must data
updates, if any, be synchronized?)
Mapping to determine the data location of local and remote
fragments.
I/O interface to read or write data from or to permanent local
storage.
Formatting to prepare the data for presentation to the end user or
to an application program.
Security to provide data privacy at both local and remote databases.
Backup and recovery to ensure the availability and recoverability of the
database in case of a failure.