You are on page 1of 60

RDBMS CONCEPTS

What is Database ?
A database is a collection of information—preferably related
information and preferably organized.
-a database is a structured object.
-the structured object consists of data and metadata
-the metadata is the structured part (table definition)
-data is the actual stored descriptive information.
NAME EDUCATION % OF Marks
SMITH M Tech 98
JONES B Tech 89
MILLER Phd 87
…. ...... ….
…. ...... ….
What is DBMS ?
(DataBase Management System)

DBMS is collection of programs that enables you to store,


modify, and extract information from a database.

The general purpose of a DBMS is to provide for the


definition, storage, and management of data in a centralized
area that can be shared by many users.
Role of DBMS
•data accessibility :It must ensure that each piece of data is
available to all those users who need it.
•data security : It must protect the data and ensure that it is
not lost or damaged.
•data privacy or data confidentiality : It must protect the data
from access and change by unauthorized users.
•data integrity : It must also guarantee that the data is
consistent. This means that we must be confident that the data
is accurate and reliable and we must be able to insist on
certain values for the data
For example :
- every employee must have a name (not null)
- every row in a table must be unique
- anyone who is a manager must be a employee (FK)
- a student's date of birth must be acceptable (30th, February
1973 would be invalid).
•data concurrency :on a system where several users may be
handling the same data at the same time, the DBMS must also
ensure that there is no confusion when two or more users try
to update the same record at the same time.
data independence : For Programmers who are writing
programs for use on a DBMS, DBMS should make sure that
•Individual users and programmers should not have to worry
about how the data is held
•Nor should they need to know exactly where the data is held
•Nor should they be concerned with pieces of data which are
not used by their particular program
When these last three features are available, the DBMS is said
to offer data independence or data transparency
distributed database : it would be useful if DBMS support
networking for accessing database which may be stored across
the network
Data Model
•A data model is a model that describes in
an abstract way how data is represented in
an information system or a database
management system (DBMS).
The evolution of database modeling techniques
Relational Database Model
The relational database model improves on the restriction of a
hierarchical structure, not completely abandoning the
hierarchy of data, as shown in Figure.

Any table can be accessed directly without having to access all


parent objects.

The trick is to know what to look for—if you want to find the
address of a specific employee, you have to know which
employee to look for, or you can simply examine all
employees.

You don’t have to search the entire hierarchy, from the


company downward, to find a single employee.
Another benefit of the
relational database
model is that any
tables can be linked
together, regardless of
their hierarchical
position.
Relational Database Management System – RDBMS

A RDBMS is a term used to describe an entire suite of


programs for both managing a relational database and
communicating with that relational database engine.

Sometimes Software Development Kit (SDK) front-end tools


and complete management kits are included with relational
database packages (eg: MS Access)

In other words, an RDBMS is both the database engine and


any other tools that come with it.
Relation/Table
Attribute/column/field

Tuples/
Record/
Row

Domain
Relational Databases
A relational database is organized as a
-set of tables of data and a
-mathematical language (relational algebra) – is
available to manipulate the tables.
The presentation of data in the form of tables is known as
a relation
A relation is made up of a number of tuples (rows) and
attributes (columns).
Domains are range of values which can appear in each
column are taken from the domain for that attribute.
Example :The AGE will be in the range 16 to 65, so the
domain of the AGE attribute will be an integer in the
range 16 to 65
Relational algebra
In addition to establishing and maintaining the individual
tuples, there are also facilities for manipulating entire
relations.
These operations form what is known as relational
algebra.
Typically, the operations of relational algebra allow us to:
•SELECT – extract tuples to form a new relation
•PROJECT – extract attributes to form a new relation
•JOIN – add attributes from one relation to another
relation
•PRODUCT – combine all the tuples of one relation will all
those of another relation
•UNION – join two relations to form a new relation
•INTERSECT – form a new relation from two relations
where the key values are the same
The formal language of relational algebra is mathematical
in form and differs from that of the commercial RDBMSs.
Database Model Design
Design is the process of ensuring that it all works without actually
building it. Design is a little like testing something on paper before
spending thousands of hours building it in possibly the wrong way.

Data structure diagram


A useful way in which we can depict the entities and the
relationships between them is by means of a data structure
diagram or DSD.
It is a picture of the data encountered within a computer
system.
The data structure diagram is also known as a data model or a
logical model or an entity-relationship model.
A data structure diagram

Entities
TUTOR This DSD tells us that we
need to hold data about:
teaches on
•The tutors
•The courses, and
COURSE •The students
The information which we
attended by need to know about each
entity are its attributes

STUDENT
A data structure diagram showing attributes

Tutor number
Name
TUTOR Address
Subjects taught
teaches on

Title
COURSE Examinations available

attended by
Student enrolment number
Name
Address
STUDENT Telephone number
Subjects studied
Relationships
An entity does not exist in isolation, but is associated with
other entities by means of a relationship.
On a DSD the entities are linked, one to another, by lines
representing the relationships.
The relationships in our DSD are:
•teaches on - One tutor teaches on many courses.
•is attended by - Each course is attended by many
students.
Note that the arrowed lines point from the one end to the
many end: from the one tutor to the many courses,
and from the one course to the many students.
Types of relationship
2. One-to-one relationship
3. One-to-Many relationship
4. Many-to-Many relationship
One-to-One Relationship
Better transformation of one-to-one relationship
One to Many Relationship
Correct transformation of one-to-many relationship
Many to Many ?
Many to Many ….?
Transformation of many-to-many relationship
TWO One – Many Relationship
Navigating through a data structure diagram
In any data structure diagram, it should be possible
conceptually to navigate from entity to entity by
way of the relationships.
By doing this, we shall be creating paths by which
the DBMS – or our programs – can access the data
and use these paths to answer enquiries about the
data.
COMPANY ORDER

cno cname street city ord_no cno ord_date ship_date

C1 IGATE GUINDY CHENNAI 101 C1 12-FEB-87 15-FEB-87

C2 IBM B’GATTA Rd BLORE 102 C3 25-JUN-90 02-JUL-90


103 C1 29-AUG-92 05-SEP-92
C3 ACCEN TIDEL PARK CHENNAI

PRODUCT LINEITEM
ord_no lineno prodid qty sellprice
prodid prodname sellprice costprice
P1 Mouse 350 300 101 1 P1 2 350

P2 Keyboard 400 360 101 2 P2 3 400

P3 CPU 4900 4200 101 3 P5 1 1000


P4 64MB RAM 550 520 102 1 P2 3 400
P5 128MB RAM 1000 930 103 1 P3 1 4900
P6 Mother Board 5400 4900 103 2 P6 1 5400
Data Modeling Continued…
When talking about data in a database, it is common to look at
the data in three different ways:
•External or user views: the view of the data that each person
has for their functional area.
•Logical Data Model or Conceptual Model: the entire
collection of user views for each functional area. In other
words, it is the enterprise's overall view of the database.
•Physical or Internal view: the structure used to physically
implement the logical data model on a physical medium.
What is a Key field in a Database?
A key is the smallest subset of attributes from the
relation such that the key is unique for each tuple.
A key functionally determines a tuple (row).
The selection of keys will depend on the particular
application being considered.
Keys are crucial to a table structure for many reasons,
some of which are identified below:
•They ensure that each record in a table is precisely identified.
•They help establish and enforce various types of integrity.
•They serve to establish table relationships.
Functional dependency (FD)
•A Functional Dependency describes a relationship
between attributes in a single relation.
•An attribute is functionally dependant on another if we
can use the value of one attribute to determine the value
of another.
Ex 1: Employee_Name is functionally dependant on
Social_Security_Number because
Social_Security_Number can be used to determine the
value of Employee_Name.

Ex 2:If we know the value of the destination attribute,


then we can find the corresponding value for the fare
attribute. The fare attribute is therefore functionally
dependent upon the destination attribute.
We use the symbol -> to indicate a functional dependency.
-> means functionally determines
Examples :
destination -> fare
The attributes listed on the left hand side of the -> are called
determinants.
One can read A -> B as, "A determines B".
Not all determinants are keys.
E F CODD's RULES
Codd presented twelve rules that a database must obey
if it is to be considered truly relational.
1. The information rule. All information in a relational
database is represented explicitly at the logical level
and in exactly one way  by values in tables.
Rule 1 is basically the informal definition of a
relational database.
2. Guaranteed access rule. Each and every datum
(atomic value) in a relational database is guaranteed to
be logically accessible by resorting to a combination
of table name, primary key value, and column name.
Rule 2 stresses the importance of primary keys for
locating data in the database. The table name locates
the correct table, the column name finds the correct
column, and the primary key value finds the row
containing an individual data item of interest.
3. Systematic treatment of null values. Null values are
supported in a fully relational DBMS for representing
missing information and inapplicable information in a
systematic way, independent of the data type.
Rule 3 requires support for missing data through
NULL values.
4. Dynamic online catalog based on the relational
model. The database description is represented at the
logical level in the same way as ordinary data, so that
authorized users can apply the same relational
language to its interrogation as they apply to the
regular data.
Rule 4 requires that a relational database be self-describing. In
other words, the database must contain certain system tables
whose columns describe the structure of the database itself.
5. Comprehensive data sublanguage rule. A relational system
may support several languages and various modes of terminal
use (for example, the fill-in-the-blanks mode). However, there
must be at least one language whose statements are
expressible, per some well-defined syntax, as character strings,
and that is comprehensive in supporting all of the following
items:
•Data definition
•View definition
•Data manipulation (interactive and by program)
•Integrity constraints
•Authorization
•Transaction boundaries (begin, commit, and rollback)
Rule 5 mandates using a relational database language, such as
SQL, although SQL is not specifically required. The language
must be able to support all the central functions of a DBMS --
creating a database, retrieving and entering data, implementing
database security, and so on.
6. View updating rule. All views that are theoretically
updateable are also updateable by the system.
Rule 6 deals with views, which are virtual tables used to give
various users of a database different views of its structure. It is
one of the most challenging rules to implement in practice,
and no commercial product fully satisfies it today.
7. High-level insert, update, and delete. The capability of
handling a base relation or a derived relation as a single
operand applies not only to the retrieval of data but also to the
insertion, update, and deletion of data.
Rule 7 stresses the set-oriented nature of a relational database.
It requires that rows be treated as sets in insert, delete, and
update operations. The rule is designed to prohibit
implementations that only support row-at-a-time, navigational
modification of the database.
8. Physical data independence. Application programs and
terminal activities remain logically unimpaired whenever any
changes are made in either storage representations or access
methods.
9. Logical data independence. Application programs and
terminal activities remain logically unimpaired when
information preserving changes of any kind that theoretically
permit unimpairment are made to the base tables.
Rule 8 and Rule 9 insulate the user or application program
from the low-level implementation of the database. They
specify that specific access or storage techniques used by the
DBMS, and even changes to the structure of the tables in the
database, should not affect the user's ability to work with the
data.
10. Integrity independence. Integrity constraints specific to a
particular relational database must be definable in the
relational data sublanguage and storable in the catalog, not in
the application programs.
Rule 10 says that the database language should support
integrity constraints that restrict the data that can be entered
into the database and the database modifications that can be
made. This is another of the rules that is not support in most
commercial DBMS products.
11. Distribution independence. A relational DBMS has
distribution independence.
Rules 11 says that the database language must be able to
manipulate distributed data location on other computer
systems.
12. Nonsubversion rule. If a relational system has a low-level
(single record at a time) language, that low level cannot be
used to subvert or bypass the integrity rules and constraints
expressed in the higher level relational language (multiple
records at a time).
Rule 12 prevents "other paths" into the database that might
subvert the relational structure and integrity.
Data integrity
Data integrity means that the data values in the
database are correct and consistent.
Data integrity means, in part, that you can correctly
and consistently navigate and manipulate the tables in
the database.
Data integrity is enforced in the relational model by
entity and referential integrity rules.
Entity Integrity

•The entity integrity rule states that for every instance


of an entity, the value of the primary key must exist,
be unique, and cannot be null.
•Without entity integrity, the primary key could not
fulfill its role of uniquely identifying each instance of
an entity.

Eg: EMPNO column is primary key in EMP table


DEPTNO column is primary key in DEPT table
Referential Integrity
•The referential integrity rule states that if a relational
table has a foreign key, then every value of the foreign
key must either be null or match the values in the
relational table in which that foreign key is a primary
key.
•Referential integrity ensures that we can correctly
navigate between related entities
Eg : DEPTNO column in EMP is FOREIGN KEY
with reference to DEPTNO column in DEPT which is
a PRIMARY KEY
Foreign key….
A foreign key creates a hierarchical relationship
between two associated entities.
The entity containing the foreign key is the child, or
dependent, and the table containing the primary key
from which the foreign key values are obtained is the
parent.
In order to maintain referential integrity between the
parent and child as data is inserted or deleted from the
database certain insert and delete rules are to be
considered.
If FK and PK are in same table it is self referential
EMP table
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3000 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10

DEPT table SALGRADE table


DEPTNO DNAME LOC GRADE LOSAL HISAL
10 ACCOUNTING NEW YORK 1 700 1200
20 RESEARCH DALLAS 2 1201 1400
30 SALES CHICAGO 3 1401 2000
40 OPERATIONS BOSTON 4 2001 3000
5 3001 9999
Modification Anomalies
•Once our E-R model has been converted into relations, we
may find that some relations are not properly specified. There
can be a number of problems:
•Deletion Anomaly: Deleting a relation results in some related
information (from another entity) being lost.
•Insertion Anomaly: Inserting a relation requires we have
information from two or more entities - this situation might
not be feasible.
NORMALIZATION
NORMALIZATION
It is a process of analyzing the given relation schemas
based on their Functional Dependencies (FDs) and
primary key to achieve the properties
- Minimizing redundancy
- Minimizing insertion, deletion and update anomalies.
Note: Don’t include computed fields

Normalization comprises a set of rules called Normal


Forms i.e., First normal form, Second normal form,
Third normal form, Forth normal form, Boyce/Codd
Normal Form etc.,
FIRST NORMAL FORM
Reduce entities to first normal form (1NF) by removing
repeating or multivalued attributes to another, child entity
Relation in 1NF
Second Normal Form
Reduce first normal form entities to second normal form
(2NF) by removing attributes that are not dependent on
the whole primary key (partial dependency)
Relation in 2NF
Transitive dependency
An attribute in a table is dependent on another non primary
key attribute rather than the primary key.
A condition where A, B, and C are attributes of a relation such
that
if A -> B and B -> C
then C is transitively dependent on A via B i.e, A->C is a
transitive dependency.
This situation is eliminated when the table is in third normal
form.
Third Normal Form
•A relation is in third normal form (3NF) if it is in second
normal form and it contains no transitive dependencies.
OR
Reduce second normal form entities to third normal form
(3NF) by removing attributes that depend on other,
nonkey attributes.
Decomposed into 3NF
All the tables