
DATABASE DESIGN AND MANAGEMENT

Outline
• Database systems: review of basic concepts
• DBMS functions
• Data modeling
• Relational databases
• Database query languages
• Distributed databases
• Physical database design
• Introduction to data warehousing

Basic Database Systems Concepts
• Data: known facts that can be recorded.
  • End-user data, that is, raw facts of interest to the end user.
  • Metadata, or data about data, through which the end-user data are integrated and managed.
• Field: a character or group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data.
• Record: a logically connected set of one or more fields that describes a person, place, or thing.
• File: a collection of related records.
• Database: a collection of data that:
  • represents some aspect of the real world;
  • is a logically coherent collection (not a random collection);
  • is designed, built, and populated for a specific purpose.
• Database Management System (DBMS): software for creating and managing databases. It provides users and programmers with a systematic way to create, retrieve, update, and manage data.
• Database schema: the description of the database.

DBMS Functions
• Data dictionary
• Data storage management
• Data transformation and presentation
• Security
• Multi-user access control
• Backup and recovery
• Data integrity
• Database access language
• Database communication interface

Database System Environment
• Hardware
• Software
• People
• Procedures
• Data

Database System Environment
• A database system refers to an organization of components that define and regulate the collection, storage, management, and use of data within a database environment.
• Hardware. Hardware refers to all of the system's physical devices; for example, computers (microcomputers, workstations, servers, and supercomputers), storage devices, printers, network devices (hubs, switches, routers, fiber optics), and other devices (automated teller machines, ID readers, and so on).

Database System Environment: Software
• Operating system software manages all hardware components and makes it possible for all other software to run on the computers. Examples of operating system software include Microsoft Windows, Linux, Mac OS, UNIX, and MVS.
• DBMS software manages the database within the database system. Some examples of DBMS software include Microsoft SQL Server, Oracle Corporation's Oracle, MySQL AB's MySQL, and IBM's DB2.
• Application programs and utility software are used to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation take place. Application programs are most commonly used to access data found within the database to generate reports, tabulations, and other information to facilitate decision making. Utilities are the software tools used to help manage the database system's computer components. For example, all of the major DBMS vendors now provide graphical user interfaces (GUIs) to help create database structures, control database access, and monitor database operations.

Database System Environment: People
• System administrators oversee the database system's general operations.
• Database administrators, also known as DBAs, manage the DBMS and ensure that the database is functioning properly.
• Database designers design the database structure. They are, in effect, the database architects.
• Systems analysts and programmers design and implement the application programs.
• End users are the people who use the application programs to run the organization's daily operations.

Database System Environment: Procedures
• Procedures are the instructions and rules that govern the design and use of the database system.
• They play an important role in a company because they enforce the standards by which business is conducted within the organization and with customers.
• They are also used to ensure that there is an organized way to monitor and audit both the data that enter the database and the information that is generated through the use of that data.

Database System Environment: Data
• Data covers the collection of facts stored in the database.
• Data are the raw material from which information is generated.
• A data item is a piece of raw fact which has no meaning until it is associated with a phenomenon.
• User data consists of one or more tables of data called relations, in which columns are called fields or attributes and rows are called records. A relation must be structured properly.
• Metadata is a description of the structure of the database. It basically means "data about data".

Data Abstraction
• A DBMS hides certain details of how data is stored and maintained, and provides an abstract view of the data. This simplifies user interaction with the system. Complexity (of data and data structure) is hidden from users through several levels of abstraction, which enables users to manipulate the data without worrying about where it is located or how it is actually stored.
• The overall database description can be defined at three levels, namely the internal, conceptual, and external levels; hence the name three-level DBMS architecture.

Three-Schema Architecture (diagram)

Three-Level Architecture: Internal Level
• Internal level: the lowest level of data abstraction, which deals with the physical representation of the database on the computer; it is therefore also known as the physical level. It describes how the data is physically stored and organized on the storage medium.

Three-Level Architecture: Conceptual Level
• Conceptual level: this level of abstraction deals with the logical structure of the entire database and is therefore also known as the logical level. It describes what data is stored in the database and the relationships among the data, giving a complete view of the user's requirements without any concern for the physical implementation. That is, it hides the complexity of physical storage structures. The conceptual view is the overall view of the database, and it includes all the information that is going to be represented in the database.

Three-Level Architecture: External Level
• External level: the highest level of abstraction, which deals with the user's view of the database; it is therefore also known as the view level.
• In general, most users and application programs do not require the entire data stored in the database. The external level describes a part of the database for a particular group of users. It permits users to access data in a way that is customized to their needs, so that the same data can be seen by different users in different ways, at the same time.
• In this way, it provides a powerful and flexible security mechanism by hiding parts of the database from certain users, as a user is not aware of the existence of any attributes that are missing from their view.

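In SQL, external schemas are typically realized as views. As a minimal sketch (the STUDENT table and its column names here are illustrative assumptions, not taken from the slides), a view presents a customized subset of the conceptual schema to one group of users:

  -- Base table at the conceptual level (illustrative names).
  CREATE TABLE STUDENT
  ( matric_no CHAR(20) PRIMARY KEY,
    name      CHAR(30),
    gpa       REAL,
    phone     CHAR(15) )

  -- External schema for academic advisers: exposes only the columns
  -- this user group needs; the phone column is hidden from the view.
  CREATE VIEW ADVISER_VIEW AS
    SELECT matric_no, name, gpa
    FROM STUDENT

Users granted access only to ADVISER_VIEW cannot see the phone attribute and need not even know it exists.
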
Three-Schema Architecture
These three levels are used to describe the schema of the database at various levels. Thus, the three-level architecture is also known as the three-schema architecture.
• The internal level has an internal schema, which describes the physical storage structure of the database.
• The conceptual level has a conceptual schema, which describes the structure of the entire database.
• The external level has external schemas, or user views, which describe parts of the database according to a particular user's requirements and hide the rest of the database from that user.
• A key advantage of this architecture is data independence.

Three-Schema Architecture Example (diagram)

Data Independence
• Data independence is a condition that exists when data access is unaffected by changes in the physical data storage characteristics.
• Data independence exists when it is possible to make changes in the data storage characteristics without affecting the application program's ability to access the data.

Data Independence
• Physical data independence: application programs and ad hoc facilities are logically unaffected when physical access methods or storage structures are changed.
• Logical data independence: application programs and ad hoc facilities are logically unaffected when changes are made to the table structures that preserve the original table values (such as changing the order of columns or inserting columns).
• Integrity independence: all relational integrity constraints must be definable in the relational language and stored in the system catalog, not at the application level.
• Distribution independence: end users and application programs are unaware of and unaffected by the data location (distributed vs. local databases).

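As a small sketch of logical data independence (the EMPLOYEE table and email column are illustrative assumptions, and exact ALTER TABLE syntax varies slightly between DBMSs), inserting a new column preserves the original table values, so existing queries that name their columns explicitly continue to work unchanged:

  -- Structural change: add a column to the table.
  ALTER TABLE EMPLOYEE ADD email CHAR(50)

  -- An existing application query, logically unaffected by the change.
  SELECT emp_num, emp_name
  FROM EMPLOYEE
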
DBMS Architecture (diagram)

DBMS Processes
• Listener. The listener process listens for clients' requests and handles the processing of SQL requests to other DBMS processes. Once a request is received, the listener passes the request to the appropriate user process.
• User. The DBMS creates a user process to manage each client session. Therefore, when you log on to the DBMS, you are assigned a user process. This process handles all requests you submit to the server. There are many user processes—at least one per logged-in client.
• Scheduler. The scheduler process organizes the concurrent execution of SQL requests.
• Lock manager. This process manages all locks placed on database objects, including disk pages.
• Optimizer. The optimizer process analyzes SQL queries and finds the most efficient way to access the data.

DBMS Activities
• Receive an application's (or an end user's) request.
• Validate, analyze, and decompose the request. The request might include mathematical and/or logical operations such as: select all customers with a balance greater than $1,000. The request might require data from only a single table, or it might require access to several tables.
• Map the request's logical-to-physical data components.
• Decompose the request into several disk I/O operations.
• Search for, locate, read, and validate the data.
• Ensure database consistency, security, and integrity.
• Validate the data for the conditions, if any, specified by the request.
• Present the selected data in the required format.

DBMS Advantages
• Improved data sharing.
• Improved data security.
• Better data integration.
• Minimized data inconsistency. Data inconsistency exists when different versions of the same data appear in different places.
• Improved data access.
• Improved decision making.
• Increased end-user productivity.

Database Users
• DBAs – access authorization, coordination & monitoring of database usage, problem determination, performance tuning, etc.
• Designers – identify the requirements & choose the appropriate structures to represent & store the data.
• Users (casual, parametric, sophisticated, stand-alone).
• Systems analysts & application programmers.
• DBMS system designers & implementers.
• Tool developers.
• Operators & maintenance personnel.

Responsibilities of the DBA
• Deciding the information content of the database
• Deciding the storage structure and access strategy
• Liaising with users
• Defining the strategy for backup and recovery
• Defining security and integrity checks
• Monitoring performance and responding to changing requirements

Data modeling

Data Modeling
• A data model is a relatively simple representation, usually graphical, of more complex real-world data structures.
• In general terms, a model is an abstraction of a more complex real-world object or event.
• A model's main function is to help you understand the complexities of the real-world environment.
• Within the database environment, a data model represents data structures and their characteristics, relations, constraints, transformations, and other constructs, with the purpose of supporting a specific problem domain.

Development of Data Models (diagram)

Building Blocks of Data Models: Entities
• An entity is anything (a person, a place, a thing, or an event) about which data are to be collected and stored. An entity represents a particular type of object in the real world (e.g. CUSTOMER, STUDENT, PRODUCT).
• An entity is represented in the Entity Relationship Diagram (ERD) by a rectangle, also known as an entity box. The name of the entity, a noun, is written in the center of the rectangle.
• The entity name is generally written in capital letters and in the singular form: STUDENT rather than STUDENTS, and EMPLOYEE rather than EMPLOYEES.

Building Blocks of Data Models: Attributes
• An attribute is a characteristic of an entity. For example, a STUDENT entity would be described by attributes such as Student Surname, Student First Name, Student Phone, Student Address, Student Course, etc.
• Attributes are the equivalent of fields in file systems.

Building Blocks of Data Models: Relationships
• The ER model uses three types of relationships, or connectivities:
  • one-to-many (1:M),
  • many-to-many (M:N), and
  • one-to-one (1:1).
• The name of the relationship is usually an active or passive verb. For example, a PAINTER paints many PAINTINGs; an EMPLOYEE learns many SKILLs; an EMPLOYEE manages a DEPARTMENT.
• There are several notations, such as Chen's notation and Crow's Foot notation.

Chen's vs. Crow's Foot Notations (diagram)

Building Blocks of Data Models: Constraints
• A constraint is a restriction placed on the data to ensure data integrity. Constraints are normally expressed in the form of rules. For example:
  • An employee's salary must have a value between 30,000 and 500,000.
  • A student's GPA must be between 0.00 and 5.00.
  • An account number must have ten digits.

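In SQL, rules like these can be declared as CHECK constraints so the DBMS enforces them on every insert and update. A minimal sketch (the table and column names are assumed for illustration; a strict digits-only test would additionally need a pattern check, whose syntax is DBMS-specific):

  CREATE TABLE EMPLOYEE
  ( emp_num CHAR(10) PRIMARY KEY,
    salary  DECIMAL(9,2) CHECK (salary BETWEEN 30000 AND 500000) )

  CREATE TABLE STUDENT
  ( matric_no CHAR(10) PRIMARY KEY,
    gpa       DECIMAL(3,2) CHECK (gpa BETWEEN 0.00 AND 5.00) )

  CREATE TABLE ACCOUNT
  ( acct_num VARCHAR(10) CHECK (LENGTH(acct_num) = 10) )
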
Network Data Models
• The network model was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard.
• In the network model, the user perceives the network database as a collection of records in 1:M relationships.
• However, unlike the hierarchical model, the network model allows a record to have more than one parent.
• In network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record and a member record. A set represents a 1:M relationship between the owner and the member.

Network Data Models: Network Diagram (diagram)

Hierarchical Data Models
• The basic logical structure of the hierarchical model is represented by an upside-down tree. The hierarchical structure contains levels, or segments. A segment is the equivalent of a file system's record type.
• Within the hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it.
• The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments. (Each parent can have many children, but each child has only one parent.)
• Limitations:
  • It was complex to implement,
  • it was difficult to manage,
  • it lacked structural independence, and
  • there were no standards for how to implement the model.

Hierarchical Data Models: Hierarchical Diagram (diagram)

Relational Data Models
• The relational model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper "A Relational Model of Data for Large Shared Data Banks".
• The relational model's foundation is a mathematical concept known as a relation.
• You can think of a relation (sometimes called a table) as a matrix composed of intersecting rows and columns.
• Each row in a relation is called a tuple.
• Each column represents an attribute.
• The relational data model is implemented through relational database management system software such as Oracle, DB2, Microsoft SQL Server, MySQL, and other mainframe relational software.

Relational Data Models: Relational Diagram (diagram)

Object-Oriented Data Models
The OO data model is based on the following components:
• An object is an abstraction of a real-world entity. In general terms, an object may be considered equivalent to an ER model's entity. More precisely, an object represents only one occurrence of an entity. (The object's semantic content is defined through several of the items in this list.)
• Attributes describe the properties of an object. For example, a PERSON object includes the attributes Name, Social Security Number, and Date of Birth.
• Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model's entity set. However, a class is different from an entity set in that it contains a set of procedures known as methods.

Object-Oriented Data Models
• A class's methods represent real-world actions such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior.
• Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. (Note the similarity to the hierarchical data model in this respect.)
• Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses of the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.

Importance of Data Modeling
• A data model facilitates interaction among the designer, the applications programmer, and the end user.
• A well-developed data model can even foster improved understanding of the organization for which the database design is developed.

Normalization
• Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby helping to eliminate data anomalies. It helps us evaluate table structures and produce good tables. Normalization is very important in database design.
• Normalization works through a series of stages called normal forms:
  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce-Codd normal form (BCNF)
  (the four above are the most commonly used normal forms)
  • Fourth normal form (4NF)
  • Fifth normal form (5NF)
  • Domain-key normal form (DKNF)

Normalization Example
• To illustrate the normalization process, we will examine a simple business application.
• In this case we will explore the simplified database activities of a construction company that manages several building projects.
• Each project has its own project number, name, employees assigned to it, and so on. Each employee has an employee number, name, and job classification, such as engineer or computer technician.
• The company charges its clients by billing the hours spent on each contract. The hourly billing rate is dependent on the employee's position. Periodically, a report is generated that contains the information displayed below.

Normalization (Table 1: the construction company's report data – figure)

Normalization
The structure of the data set above does not handle data very well, for the following reasons:
• The project number (PROJ_NUM) is apparently intended to be a primary key, but it contains nulls.
• The table entries invite data inconsistencies. For example, the JOB_CLASS value "elect. Engineer" might be entered as "elect. Eng." in some cases, and as "el. Eng" or "EE" in others.
• The table displays data redundancies. These data redundancies yield the following anomalies:
  • Update anomalies. Modifying the JOB_CLASS for employee number 105 requires potentially many alterations (one for each row in which EMP_NUM = 105).
  • Insertion anomalies. Just to complete a row definition, a new employee must be assigned to a project. If the employee is not yet assigned, a phantom project must be created to complete the employee data entry.
  • Deletion anomalies. If employee 103 quits, deletions must be made for every entry in which EMP_NUM = 103. Such deletions will result in losing other vital project-assignment data from the database.

Normalization
• The table above contains what are known as repeating groups. A repeating group derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence.
• A relational table must not contain repeating groups. The existence of repeating groups provides evidence that the table fails to meet even the lowest normal form requirements, thus reflecting data redundancies.
• Normalizing the table structure will reduce these data redundancies. If repeating groups do exist, they must be eliminated by making sure that each row defines a single entity. In addition, the dependencies must be identified to diagnose the normal form.

Normalization: Conversion to First Normal Form
The normalization process starts with a simple three-step procedure:
• Step 1: Eliminate the repeating groups
• Step 2: Identify the primary key
• Step 3: Identify all dependencies

Normalization: Conversion to First Normal Form
Step 1: Eliminate the repeating groups
• To eliminate the repeating groups, eliminate the nulls by making sure that each repeating group attribute contains an appropriate data value. This change converts Table 1 above into the table below.

Normalization: Conversion to First Normal Form (Table 2: the data in 1NF – figure)

Normalization: Conversion to First Normal Form
Step 2: Identify the primary key
• In the layout of Table 2, even a casual observer will note that PROJ_NUM is not an adequate primary key because the project number does not uniquely identify all the remaining entity (row) attributes.
• To have a proper primary key that will uniquely identify any attribute value, the new key must be composed of a combination of PROJ_NUM and EMP_NUM.

Normalization: Conversion to First Normal Form
Step 3: Identify all dependencies
• Dependencies can be depicted with the help of a dependency diagram; such a diagram depicts all the dependencies found within a given table structure.
• Dependency diagrams are very helpful because they make it easy to view all the relationships among a table's attributes, and their use makes it much less likely that one will overlook an important dependency.

Dependency Diagram: 1NF (diagram)

Normalization: Conversion to First Normal Form
Dependency diagram (1NF) notes. Looking at the dependency diagram above:
• The primary key attributes are bold, underlined, and shaded in a different color.
• The arrows above the attributes indicate all desirable dependencies, that is, dependencies that are based on the primary key. In this case, note that the entity's attributes are dependent on the combination of PROJ_NUM and EMP_NUM.
• The arrows below the dependency diagram indicate less desirable dependencies. Two types of such dependencies exist:
  • Partial dependencies. Dependencies based on only a part of a composite primary key are called partial dependencies.
  • Transitive dependencies. A transitive dependency is a dependency of one nonprime attribute on another nonprime attribute. The problem with transitive dependencies is that they still yield data anomalies.

Normalization: Conversion to First Normal Form
1NF requirements:
1. All the key attributes are defined.
2. There are no repeating groups in the table.
3. All attributes are dependent on the primary key.
All relational tables satisfy the 1NF requirements. The problem with the 1NF structure in Table 2 above is that it contains partial dependencies and a transitive dependency.

Normalization: Conversion to Second Normal Form
The rule of conversion from 1NF to 2NF is: eliminate all partial dependencies from the 1NF format.
The conversion from 1NF to 2NF is done in two steps:
• Step 1: Identify all the key components
• Step 2: Identify the dependent attributes

Normalization: Conversion to Second Normal Form
Step 1: Identify all the key components
• Eliminating partial dependencies from the 1NF table will result in producing three tables from the original table.
• From the dependency diagram we can observe that two partial dependencies exist:
  • PROJ_NAME depends on PROJ_NUM, and
  • EMP_NAME, JOB_CLASS, and CHG_HOUR depend on EMP_NUM.
• To eliminate the two existing partial dependencies, write each key component on a separate line, and then write the original (composite) key on the last line:
  • PROJ_NUM
  • EMP_NUM
  • PROJ_NUM EMP_NUM
• Each component will become the key in a new table. The original table is now divided into three tables: PROJECT, EMPLOYEE, and ASSIGN.

Normalization: Conversion to Second Normal Form
Step 2: Identify the dependent attributes
• Determine which attributes are dependent on which other attributes:
  • PROJECT (Proj_num, Proj_name)
  • EMPLOYEE (Emp_num, Emp_name, Job_class, Chg_hour)
  • ASSIGN (Proj_num, Emp_num, Assign_hours)

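As a sketch of how the three 2NF relations could be declared in SQL (the data types and sizes are assumptions; the slides give only the attribute names), note that ASSIGN keeps the original composite key and references the other two tables:

  CREATE TABLE PROJECT
  ( proj_num  CHAR(3) PRIMARY KEY,
    proj_name CHAR(50) )

  CREATE TABLE EMPLOYEE
  ( emp_num   CHAR(3) PRIMARY KEY,
    emp_name  CHAR(50),
    job_class CHAR(30),
    chg_hour  DECIMAL(7,2) )

  CREATE TABLE ASSIGN
  ( proj_num     CHAR(3),
    emp_num      CHAR(3),
    assign_hours DECIMAL(5,1),
    PRIMARY KEY (proj_num, emp_num),
    FOREIGN KEY (proj_num) REFERENCES PROJECT,
    FOREIGN KEY (emp_num) REFERENCES EMPLOYEE )
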
Dependency Diagram: 2NF (diagram)

Normalization: Conversion to Second Normal Form
Conditions for 2NF. A table is in second normal form (2NF) if it satisfies the following two conditions:
• It is in 1NF.
• It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key.
Note that a partial dependency can exist only if a table's primary key is composed of several attributes. A table whose primary key consists of only a single attribute is automatically in 2NF if it is in 1NF.

Normalization: Conversion to Third Normal Form
The rule of conversion from 2NF to 3NF is: eliminate all transitive dependencies from the 2NF format.
The conversion from 2NF to 3NF is done in three steps:
• Step 1: Identify each new determinant
• Step 2: Identify the dependent attributes
• Step 3: Remove the dependent attributes from transitive dependencies

Normalization: Conversion to Third Normal Form
Step 1: Identify each new determinant
• For every transitive dependency, write its determinant as a PK for a new table. (A determinant is any attribute whose value determines other values within a row.)
• If you have three different transitive dependencies, you will have three different determinants. The 2NF dependency diagram shows only one case of transitive dependency.
• Therefore, write the determinant for this transitive dependency:
  • JOB_CLASS

Normalization: Conversion to Third Normal Form
Step 2: Identify the dependent attributes
• Identify the attributes that are dependent on each determinant identified in Step 1, and identify the dependency. In this case, you write:
  • JOB_CLASS → CHG_HOUR
• Name the table to reflect its contents and function. In this case, JOB seems appropriate.

Normalization: Conversion to Third Normal Form
Step 3: Remove the dependent attributes from transitive dependencies
• Eliminate all the dependent attributes in the transitive relationship(s) from each of the tables that have such a transitive relationship.
• Draw a new dependency diagram to show all the tables.
• Check the tables to make sure that each table has a determinant and that no table contains inappropriate dependencies (partial or transitive).
• When steps 1-3 above have been completed, the resulting tables are as shown below.

Dependency Diagram: 3NF (diagram)

Normalization: Conversion to Third Normal Form
Conditions for 3NF. A table is in third normal form (3NF) if the following two conditions are satisfied:
1. It is in 2NF.
2. It contains no transitive dependencies.

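Continuing the SQL sketch from the 2NF step (types remain assumptions), the transitive dependency JOB_CLASS → CHG_HOUR is removed by moving CHG_HOUR into a new JOB table keyed by the determinant, which the revised EMPLOYEE table then references:

  CREATE TABLE JOB
  ( job_class CHAR(30) PRIMARY KEY,
    chg_hour  DECIMAL(7,2) )

  -- EMPLOYEE no longer stores chg_hour; it is looked up via job_class.
  CREATE TABLE EMPLOYEE
  ( emp_num   CHAR(3) PRIMARY KEY,
    emp_name  CHAR(50),
    job_class CHAR(30) REFERENCES JOB )
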
Benefits of Normalization
• Facilitates data integration.
• Reduces data redundancy.
• Provides a robust architecture for retrieving and maintaining data.
• Complements data modeling.
• Reduces the chances of data anomalies occurring.

Relational databases

Database query languages

DBMS Languages
• The main objective of a database management system is to allow its users to perform a number of operations on the database, such as inserting, deleting, and retrieving data, in abstract terms, without knowing about the physical representation of the data.
• A very important advantage of using a DBMS is that it offers data independence; that is, application programs are insulated from changes in the way the data is structured and stored. Data independence is achieved through the use of the three levels of data abstraction; in particular, the conceptual schema and the external schema provide distinct benefits in this area.

DBMS Languages
The DBMS provides two database languages to implement the databases, namely:
• Data definition language (DDL)
• Data manipulation language (DML)

• Data definition language (DDL) is used for defining the database schema. DDL accepts input in the form of instructions (statements) and generates the description of the schema as output. The output is placed in the data dictionary, which is a special type of table containing metadata. The DBMS refers to the data dictionary before reading or modifying the data. Note that database users cannot update the data dictionary; it is modified only by the database system itself.
• Data manipulation language (DML) enables users to retrieve and manipulate the data. A statement that is used to retrieve information is called a query. The part of the DML used to retrieve information is called a query language.

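A short illustration of the two languages side by side (the COURSE table is hypothetical): the first statement is DDL, whose output is a schema description recorded in the data dictionary; the remaining statements are DML, which change and read the stored data:

  -- DDL: define a schema object.
  CREATE TABLE COURSE
  ( course_code  CHAR(8) PRIMARY KEY,
    course_title CHAR(60) )

  -- DML: manipulate and retrieve data; the SELECT is a query.
  INSERT INTO COURSE VALUES ('CSC201', 'Database Design and Management')
  SELECT course_title FROM COURSE WHERE course_code = 'CSC201'
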
DBMS Languages
Creating and Modifying Relations Using SQL
• The main construct for representing data in the relational model is a relation. A relation consists of a relation schema and a relation instance. The relation instance is a table, and the relation schema describes the column heads for the table.
• We first describe the relation schema and then the relation instance. The schema specifies the relation's name, the name of each field (or column, or attribute), and the domain of each field.
• A domain is referred to in a relation schema by the domain name and has a set of associated values.

DBMS Languages
Example of a relation schema:
  Students(sid: string, name: string, login: string, age: integer, gpa: real)
This shows that the relation's name is Students, and its fields are sid of string type, name of string type, login of string type, age of integer type, and gpa of real type.

DBMS Languages
Example of an instance of a relation:

  sid   name     login                  age   gpa
  001   John     john@fudma.edu.ng      18    3.35
  002   David    david@fudma.edu.ng     19    3.42
  003   Martins  martins@fudma.edu.ng   17    3.56
  004   Umar     umar@fudma.edu.ng      20    3.20

An instance of a relation is a set of tuples, also called records, in which each tuple has the same number of fields as the relation schema. A relation instance can be thought of as a table in which each tuple is a row, and all rows have the same number of fields. (The term relation instance is often abbreviated to just relation when there is no confusion with other aspects of a relation such as its schema.)

DBMS Languages: DDL
DDL example
• The subset of SQL that supports the creation, deletion, and modification of tables is called the Data Definition Language (DDL). Example:

DBMS Languages: DDL
Creation of a relation
The CREATE TABLE statement is used to define a new table. To create the Students relation, we can use the following statement:

  CREATE TABLE Students
  ( sid   CHAR(20),
    name  CHAR(30),
    login CHAR(20),
    age   INTEGER,
    gpa   REAL )

DBMS Languages
Creation of an instance of a relation
• Tuples (instances or records) are inserted using the INSERT command. We can insert a single tuple into the Students table as follows:

  INSERT INTO Students (sid, name, login, age, gpa)
  VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)

DBMS Languages
Deleting tuples from a relation
We can delete tuples using the DELETE command. We can delete all Students tuples with name equal to Smith using the command:

  DELETE FROM Students
  WHERE name = 'Smith'

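The DML also supports in-place modification with the UPDATE command. A brief sketch against the same Students table (the new login value is illustrative):

  UPDATE Students
  SET login = 'smith@cs'
  WHERE name = 'Smith'
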
DBMS Languages
Integrity constraints over relations
• A database is only as good as the information stored in it, and a DBMS must therefore help prevent the entry of incorrect information.
• An integrity constraint (IC) is a condition specified on a database schema to restrict the entry of 'wrong' data into an instance of the database.
• If a database instance satisfies all the integrity constraints specified on the database schema, it is a legal instance.
• A DBMS enforces integrity constraints, in that it permits only legal instances to be stored in the database.

DBMS Languages
Implementation of ICs. Integrity constraints are specified and enforced at different times:
• When the DBA or end user defines a database schema, he or she specifies the ICs that must hold on any instance of this database.
• When a database application is run, the DBMS checks for violations and disallows changes to the data that violate the specified ICs.

DBMS Languages
Referential integrity constraints
• Referential integrity is the process of ensuring that data is consistent between related tables. It deals with parent-child relationships in a database. This type of constraint is usually created to ensure uniformity among related tables.
• It is implemented using keys. A key is a column value that uniquely identifies a record in a database. It is used to establish relationships with other tables. The mapping is usually one-to-one, although sometimes you can have double or multiple mappings. It is usually referred to as a parent-child relationship.
• There are two types of keys: primary keys and foreign keys.

DBMS Languages
Primary key and foreign key
• A primary key is a column value (or set of column values) that uniquely identifies a row of data in a table. It is what is used to establish relationships among tables. An example of a primary key in a student table is the student matriculation number; two students can never have the same matriculation number.
• A foreign key is a column value in a table that references a primary key in another table. Foreign keys are used to indicate child tables. A foreign key ensures that the parent record is created before the child record. It also ensures the deletion of the child record before the parent record.
• In a database, relationships are usually established through the use of foreign keys and primary keys.
• The purpose of separating data into tables and establishing table relationships is to reduce data redundancy. The process of reducing data redundancy is called normalization.

Primary Key and Foreign Key (diagram: a foreign key constraint referencing a primary key between two tables)

DBMS Languages
Key constraints
• A key constraint is a statement that a certain minimal subset of the fields of a relation is a unique identifier for a tuple.
• A set of fields that uniquely identifies a tuple according to a key constraint is called a candidate key (often abbreviated to key). Considering the Students relation above, the sid field is a candidate key.

DBMS Languages
Specifying key constraints in SQL
• In SQL, we can declare a key by using the UNIQUE constraint. At most one of these candidate keys can be declared to be a primary key, using the PRIMARY KEY constraint. (SQL does not require that such constraints be declared for a table.) Let's consider our student table definition and specify key information:

  CREATE TABLE Students
  ( MATRICNO CHAR(20),
    name     CHAR(30),
    login    CHAR(20),
    age      INTEGER,
    gpa      REAL,
    UNIQUE (name, age),
    CONSTRAINT StudentsKey PRIMARY KEY (MATRICNO) )

• This definition says that MATRICNO is the primary key and that the combination of name and age is also a key.

DBMS Languages
Foreign key constraints
• A foreign key can be defined as a set of fields in one relation that is used to 'refer' to a tuple in another relation. (It must correspond to the primary key of the second relation.)
• Sometimes the information stored in a relation is linked to the information stored in another relation. If one of the relations is modified, the other must be checked, and perhaps modified, to keep the data consistent.
• An IC involving both relations must be specified if a DBMS is to make such checks. The most common IC involving two relations is a foreign key constraint.
• The foreign key in the referencing relation must match the primary key of the referenced relation; that is, it must have the same number of columns and compatible data types, though the column names can be different.

DBMS Languages
Representing foreign keys in SQL:

  CREATE TABLE Enrolled
  ( MATRICNO CHAR(20),
    dd       CHAR(20),
    grade    CHAR(10),
    PRIMARY KEY (MATRICNO, dd),
    FOREIGN KEY (MATRICNO) REFERENCES Students
      ON DELETE CASCADE
      ON UPDATE NO ACTION )

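The two referential actions in this definition determine what the DBMS does to Enrolled rows when the referenced Students row changes: ON DELETE CASCADE deletes the matching Enrolled rows automatically, while ON UPDATE NO ACTION rejects an update of Students.MATRICNO that would leave Enrolled rows dangling. A brief usage sketch (the matriculation number is illustrative):

  -- Deleting a student also deletes that student's Enrolled rows.
  DELETE FROM Students
  WHERE MATRICNO = 'FUD/18/0001'

  -- An attempt to change a referenced MATRICNO in Students would be
  -- rejected while Enrolled rows still point to it.
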
DBMS Languages
Querying relational data
• A relational database query (query, for short) is a question about the data, and the answer consists of a new relation containing the result.
• For example, we might want to find all students younger than 18, or all students enrolled in a course. A query language is a specialized language for writing queries.
• SQL is the most popular commercial query language for relational DBMSs.

DBMS Languages
Representing a query in SQL
Example: consider the instance of the Students relation used previously. We can retrieve the rows corresponding to students who are younger than 18 with the following SQL query:

  SELECT *
  FROM Students
  WHERE age < 18

The symbol "*" means that we retain all fields of the selected tuples in the result.

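The second example query mentioned earlier—all students enrolled in a course—needs data from two tables, so it joins Students and Enrolled on the matriculation number. A sketch using the tables defined above (the course value in the dd column is illustrative):

  SELECT S.name, E.grade
  FROM Students S, Enrolled E
  WHERE S.MATRICNO = E.MATRICNO
  AND E.dd = 'CSC201'
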
Distributed databases

Distributed Database Management Systems
• A distributed database management system (DDBMS) governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.

Distributed Database Management Systems (diagram)

Distributed DBMS Components
• Computer workstations (sites or nodes) that form the network system. The distributed database system must be independent of the computer system hardware.
• Network hardware and software components that reside in each workstation. The network components allow all sites to interact and exchange data. Because the components—computers, operating systems, network hardware, and so on—are likely to be supplied by different vendors, it is best to ensure that distributed database functions can be run on multiple platforms.
• Communications media that carry the data from one workstation to another. The DDBMS must be communications-media-independent; that is, it must be able to support several types of communications media.

Distributed DBMS Components
• The transaction processor (TP), which is the software component found in each computer that requests data. The transaction processor receives and processes the application's data requests (remote and local). The TP is also known as the application processor (AP) or the transaction manager (TM).
• The data processor (DP), which is the software component residing on each computer that stores and retrieves data located at the site. The DP is also known as the data manager (DM). A data processor may even be a centralized DBMS.

Distributed DBMS Advantages
• Data are located near the site of greatest demand. The data in a distributed database system are dispersed to match business requirements.
• Faster data access. End users often work with only a locally stored subset of the company's data.
• Faster data processing. A distributed database system spreads out the system's workload by processing data at several sites.
• Growth facilitation. New sites can be added to the network without affecting the operations of other sites.
• Improved communications. Because local sites are smaller and located closer to customers, local sites foster better communication among departments and between customers and company staff.

Distributed DBMS Advantages
• Reduced operating costs. It is more cost-effective to add workstations to a network than to update a mainframe system. Development work is done more cheaply and more quickly on low-cost PCs than on mainframes.
• User-friendly interface. PCs and workstations are usually equipped with an easy-to-use graphical user interface (GUI). The GUI simplifies training and use for end users.
• Less danger of single-point failure. When one of the computers fails, the workload is picked up by other workstations. Data are also distributed at multiple sites.
• Processor independence. The end user is able to access any available copy of the data, and an end user's request is processed by any processor at the data location.

Distributed DBMS Disadvantages
• Complexity of management and control. Applications must recognize data location, and they must be able to stitch together data from various sites. Database administrators must have the ability to coordinate database activities to prevent database degradation due to data anomalies.
• Technological difficulty. Data integrity, transaction management, concurrency control, security, backup, recovery, query optimization, access path selection, and so on, must all be addressed and resolved.
• Security. The probability of security lapses increases when data are located at multiple sites. The responsibility for data management will be shared by different people at several sites.

Distributed DBMS Disadvantages
• Lack of standards. There are no standard communication protocols at the database level. (Although TCP/IP is the de facto standard at the network level, there is no standard at the application level.) For example, different database vendors employ different—and often incompatible—techniques to manage the distribution of data and processing in a DDBMS environment.
• Increased storage and infrastructure requirements. Multiple copies of data are required at different sites, thus requiring additional disk storage space.
• Increased training cost. Training costs are generally higher in a distributed model than they would be in a centralized model, sometimes even to the extent of offsetting operational and hardware savings.
• Costs. Distributed databases require duplicated infrastructure to operate (physical location, environment, personnel, software, licensing, etc.).

Distributed DBMS Characteristics
• Application interface to interact with the end user, application programs, and other DBMSs within the distributed database.
• Validation to analyze data requests for syntax correctness.
• Transformation to decompose complex requests into atomic data request components.
• Query optimization to find the best access strategy. (Which database fragments must be accessed by the query, and how must data updates, if any, be synchronized?)
• Mapping to determine the data location of local and remote fragments.
• I/O interface to read or write data from or to permanent local storage.
• Formatting to prepare the data for presentation to the end user or to an application program.
• Security to provide data privacy at both local and remote databases.

Distributed DBMS Characteristics
• Backup and recovery to ensure the availability and recoverability of the database in case of a failure.
• DB administration features for the database administrator.
• Concurrency control to manage simultaneous data access and to ensure data consistency across database fragments in the DDBMS.
• Transaction management to ensure that the data moves from one consistent state to another. This activity includes the synchronization of local and remote transactions as well as transactions across multiple distributed segments.



Physical Database Design



Database Design Methodology
• The systematic process of designing a database is known as design methodology.
• Database design involves the following:
  • understanding the operational and business needs of an organization,
  • modeling the specified requirements, and
  • realizing the requirements using a database.



Phases of Database Design & Implementation
• Requirement collection and analysis
• Conceptual database design
• Choice of a DBMS
• Logical database design
• Physical database design
• Database system implementation
• Testing and evaluation



Requirement Collection and Analysis
• What is the current system?
• What changes need to be made?
• Is the change total?
• What data needs to be stored?
• What applications should be developed?
• What do the users need from the database?
• Requirement specification techniques such as object-oriented analysis (OOA), data flow diagrams (DFDs), etc., are used to transform these requirements into a better structured form.
• Result: a requirements specification document.



Conceptual Database Design
• A high-level description of the data to be stored in the database, along with the constraints known to hold over this data, is developed.
• The goal is to create a simple description of the data that closely matches how users and developers think of the data (and the people and processes to be represented in the data).
• The conceptual schema should be expressive, simple, understandable, minimal, and formal.
• Tool: the ER model. Result: an ER diagram.



Choice of DBMS
• The choice of a DBMS depends on many factors, such as cost, DBMS features and tools, the underlying model, portability, and DBMS hardware requirements.
• The technical factors that affect the choice of a DBMS are the type of DBMS (relational, object, object-relational, etc.), the storage structures and access paths that the DBMS supports, the interfaces available, the types of high-level query languages, and the architecture it supports (client/server, parallel, or distributed).
• The various types of costs that must be considered while choosing a DBMS are software and hardware acquisition cost, maintenance cost, database creation and conversion cost, personnel cost, training cost, and operating cost.



Logical Database Design
• Once an appropriate DBMS is chosen, the next step is to map the high-level conceptual schema onto the implementation data model of the selected DBMS. In this phase, the database designer moves from an abstract data model to the implementation of the database.
• In the case of the relational model, this phase generally consists of mapping the E-R model into a relational schema.



Physical Database Design
• In this phase, the physical features such as storage structures, file organization, and access paths for the database files are specified to achieve good performance. The various options for file organization and access paths include various types of indexing, clustering of records, hashing techniques, etc. A small indexing sketch follows below.

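For example, choosing an access path often means declaring an index over the attributes a workload searches on most. A minimal sketch, reusing the EMPLOYEE table from the normalization example (the index name is arbitrary):

  -- Speeds up frequent lookups and joins on job_class.
  CREATE INDEX idx_employee_job_class ON EMPLOYEE (job_class)
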


Database  Once the logical and physical database designs are completed, the
database system can be implemented. DDL statements of the selected
System DBMS are used and compiled to create the database schema and
Implementation database files, and finally the database is loaded with the data.

F.O. ECHOBU 109


Testing and Evaluation
• In this phase, the database is tested and fine-tuned for performance, integrity, concurrent access, and security constraints. This phase is carried out in parallel with application programming. If testing fails, various actions are taken, such as modification of the physical design, modification of the logical design, or upgrading or changing the DBMS software or hardware.



Database Performance Tuning & Query Optimisation



Database Performance Tuning
• Database performance tuning refers to a set of activities and procedures designed to reduce the response time of the database system—that is, to ensure that an end-user query is processed by the DBMS in the minimum amount of time.
• End users interact with the DBMS through the use of queries to generate information, using the following sequence:
  1. The end-user (client-end) application generates a query.
  2. The query is sent to the DBMS (server end).
  3. The DBMS (server end) executes the query.
  4. The DBMS sends the resulting data set to the end-user (client-end) application.

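Most DBMSs let you inspect how step 3 will be carried out before tuning. For instance, MySQL's EXPLAIN statement (Oracle has the analogous EXPLAIN PLAN) reports the access plan the optimizer chose for a query; a sketch against the Students table used earlier:

  -- Shows the chosen access plan (e.g., table scan vs. index use)
  -- without actually running the query.
  EXPLAIN SELECT * FROM Students WHERE age < 18
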


Database Performance Tuning
• The performance of a typical DBMS is constrained by three main factors:
  • CPU processing power,
  • available primary memory (RAM), and
  • input/output (hard disk and network) throughput.



Database Performance Tuning: Guidelines for Better Performance (diagram)



Introduction to Data Warehousing



Data Warehouse
• Bill Inmon, the acknowledged "father" of the data warehouse, defines it as "an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making."
• The data warehouse is usually a read-only database optimized for data analysis and query processing.
• Typically, data are extracted from various sources and are then transformed and integrated—in other words, passed through a data filter—before being loaded into the data warehouse.
• Users access the data warehouse via front-end tools and/or end-user application software to extract the data in usable form.



Data Warehouse Characteristics
• Integrated. The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats. Data integration implies that all business entities, data elements, data characteristics, and business metrics are described in the same way throughout the enterprise.
• Subject-oriented. Data warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company. Data warehouse data are organized and summarized by topic, such as sales, marketing, finance, distribution, and transportation. For each topic, the data warehouse contains specific subjects of interest—products, customers, departments, regions, promotions, and so on.



Data Warehouse Characteristics
• Time-variant. In contrast to operational data, which focus on current transactions, warehouse data represent the flow of data through time. The data warehouse can even contain projected data generated through statistical and other models. It is also time-variant in the sense that once data are periodically uploaded to the data warehouse, all time-dependent aggregations are recomputed.
• Nonvolatile. Once data enter the data warehouse, they are never removed. Because the data in the warehouse represent the company's history, the operational data, representing the near-term history, are always added to it. Because data are never deleted and new data are continually added, the data warehouse is always growing.



Operational Database Data vs. Data Warehouse Data (comparison table)



Data Warehouse Creation (diagram)



Data Warehouse Rules
In 1994, William H. Inmon and Chuck Kelley created 12 rules defining a data warehouse:
1. The data warehouse and operational environments are separated.
2. The data warehouse data are integrated.
3. The data warehouse contains historical data over a long time.
4. The data warehouse data are snapshot data captured at a given point in time.
5. The data warehouse data are subject oriented.
6. The data warehouse data are mainly read-only, with periodic batch updates from operational data. No online updates are allowed.
7. The data warehouse development life cycle differs from classical systems development. Data warehouse development is data-driven; the classical approach is process-driven.



Data Warehouse Rules
8. The data warehouse contains data with several levels of detail: current detail data, old detail data, lightly summarized data, and highly summarized data.
9. The data warehouse environment is characterized by read-only transactions on very large data sets. The operational environment is characterized by numerous update transactions on a few data entities at a time.
10. The data warehouse environment has a system that traces data sources, transformations, and storage.
11. The data warehouse's metadata are a critical component of this environment. The metadata identify and define all data elements. The metadata provide the source, transformation, integration, storage, usage, relationships, and history of each data element.
12. The data warehouse contains a chargeback mechanism for resource usage that enforces optimal use of the data by end users.
