
DATABASES

AGENDA

• Data & Information
• Purpose Of Database
• Challenges without a DBMS
• DBMS
• Types Of Data Model
• ACID Properties
• The Schema
• Database Architecture
• Normalization
• Relationships
• Normalization Forms
• Denormalization
• Oracle 19c Architecture
DATA & INFORMATION

Data: An Introduction
• Data : Raw Facts.
Data is raw, unorganized facts that need to be processed. Data can be
something simple and seemingly random and useless until it is organized.

• Information: Processed Data


When data is processed, organized, structured or presented in a given context
so as to make it useful, it is called information.
PURPOSE OF DATABASE

• Do you find yourself entering the same information into multiple
spreadsheets/reports/documents?
• When you make changes in one spreadsheet/report/document, are you forced to
make the same changes in others?
• Do you have a large amount of data that is becoming larger and unmanageable?
• Do several people in your organization need to view your data at the same time?
• Are you tracking related information in several spreadsheets – such as separate sheets for
sales for different departments or different geographical locations?
• When viewing your information, are you constantly scrolling on your screen to view it
all? Or do you have a difficult time viewing the specific sets of data that you want?
CHALLENGES WITHOUT A DBMS

• System crashes:
    Read 'students.txt'
    Read 'courses.txt'
    Find & update the record "Mary Johnson"    CRASH!
    Find & update the record "CSE444"
    Write 'students.txt'
    Write 'courses.txt'
• What is the problem?
• Large data sets (say 50GB)
• Why is this a problem?
• Simultaneous access by many users
• Lock students.txt – what is the problem?

DATABASE MANAGEMENT SYSTEM

• DBMS
• A collection of programs that enables you to store, modify and extract information
from a database
• A piece of software that provides services for accessing a database
• Why a DBMS?
• It provides a secure and survivable medium for the storage and retrieval of data.
BENEFITS OF A DATABASE

• Redundancy can be reduced


• Inconsistency can be avoided
• Sharing
• Standards can be enforced
• Security restrictions
• Integrity
DBMS

Two-Tier Architecture or Client-Server Architecture

[Figure: three User Applications, each linked by JDBC/ODBC connectivity; a Data File; a Server (C/C++ Programs).]


DATABASE MODEL

Hierarchical model represents the data in a tree-like structure. Each child record is associated with
a single parent. To maintain order, a sort field keeps sibling nodes in a recorded order.
This model was used by early database management systems such as IMS databases on
mainframes.
DATA MODELS

• In the network database model, a child can be linked to multiple parents, a feature that is not supported by
the hierarchical data model. The parent nodes are known as owners and the child nodes are called
members.
DATABASE MODELS

• Data models define how the logical structure of a database is modeled. Data
models are fundamental entities to introduce abstraction in a DBMS. Data
models define how data items are connected to each other and how they are processed
and stored inside the system.
• Entity-Relationship (ER) Model is based on the notion of real-world entities,
their attributes, and the relationships among those entities.
• Entity: a real-world object having properties called attributes.
Every attribute is defined by its set of values, called its domain.
• Relationship: the logical association among entities is called a relationship.
Relationships are mapped with entities in various ways. Mapping cardinalities
define the number of associations between two entities.
• Entity Cardinality
• 1:1 : Customer -> Aadhaar Card
• 1:m : Customer -> Sales
• m:n : Student <-> Subject
ER DIAGRAM
DATABASE MODELS

• Relational Model is the most popular data model in DBMS. The data is stored in a tabular format and is defined
as an n-ary relation. Each attribute is a column (Deptno, Dname, Loc) and each tuple is a row
(for example 10, IT, Chennai).

Deptno   Dname   Loc
10       IT      Chennai
20       Admin   Delhi
30       HR      Chennai
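
As a minimal sketch of the relational model, the Deptno/Dname/Loc relation above can be created and queried with Python's built-in sqlite3 module. The table name dept and the choice of SQLite are assumptions made for illustration; the slides do not prescribe a particular DBMS here.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# Each column is an attribute of the relation.
conn.execute(
    "CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)"
)

# Each inserted row is one tuple of the relation.
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [(10, "IT", "Chennai"), (20, "Admin", "Delhi"), (30, "HR", "Chennai")],
)

# Logical access: ask for rows by value, not by physical position in a file.
chennai = conn.execute(
    "SELECT deptno, dname FROM dept WHERE loc = ? ORDER BY deptno",
    ("Chennai",),
).fetchall()
print(chennai)
```

The query selects tuples by attribute value, which is the "flexible access to data (SQL)" that distinguishes a DBMS from file-based storage.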
RELATIONAL DATA MODEL
FILE BASED / DBMS

• Advent Of Databases
• Data was collated and stored manually in the form of ledgers.
• Data was then collated and stored electronically in Excel sheets and files on a computer.
• Database: an organized collection of data that is stored and accessed electronically.
• Database Management System: software for creating and managing databases.

File Based                                 DBMS
Access is only physical                    Access is physical as well as logical
Predetermined access to data               Flexible access to data (SQL)
Only one user at any point in time         Concurrent users
Redundancy permitted                       Redundancy controlled
                                           Restricted unauthorized access
                                           Backup and recovery process
Data is isolated
ACID PROPERTIES

• Atomicity: A transaction is either completed or has not begun.
• Consistency: All changes to the database are applied in order to maintain
consistency in the database. This enables other users to see the changes applied.
• Isolation: Two different users may be accessing the same table and performing the
same operation. The operations are independent of either user.
• Durability: A database must be able to survive system failure.
ATOMICITY

• Either all of a transaction's changes are stored in the
database, or none of them are stored. In the event of an
external error, it is ideal if the recovery process
can complete any transactions that were in progress at the
time; however, it is also acceptable for those transactions to
be completely rolled back.
• Results of a transaction's execution are either all committed
or all rolled back. All changes take effect, or none do.
• Example:
• If a transaction consists of 100 operations, either all 100
are applied or, if even one fails, all of them are rolled back.
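
The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The account table, its CHECK constraint and the transfer function are hypothetical names introduced only for this illustration; the connection's context manager commits if the block succeeds and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may never go negative.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The second UPDATE violated the CHECK constraint, so the
        # first UPDATE has been rolled back as well.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are kept
failed = transfer(conn, 1, 2, 500)  # fails: neither update is kept
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected, yet the final balances show no trace of it.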
CONSISTENCY

• The database is transformed from one valid state to another
valid state. This defines a transaction as legal only if it obeys
user-defined integrity constraints. Illegal transactions aren't
allowed and, if an integrity constraint can't be satisfied, then
the transaction is rolled back.

Example:
• If you are storing bank accounts that relate to bank
customers, it should not be possible to create an account for
a customer who does not exist, and it should not be possible
to delete a customer from the customers table if there are
still accounts referring to them in the accounts table.
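
The bank example can be sketched as a foreign-key constraint using Python's built-in sqlite3 module. The column names, the sample customer and the helper rejected are assumptions for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")  # hypothetical customer
conn.execute("INSERT INTO accounts VALUES (100, 1)")
conn.commit()


def rejected(sql):
    """Run one statement in its own transaction; report whether it was refused."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An account for a customer who does not exist is refused.
orphan_account_rejected = rejected("INSERT INTO accounts VALUES (101, 999)")
# Deleting a customer who still has accounts is refused.
delete_referenced_rejected = rejected("DELETE FROM customers WHERE id = 1")
print(orphan_account_rejected, delete_referenced_rejected)
```

Both illegal statements are rolled back, so the database stays in its previous valid state.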
ISOLATION

• Isolation means that transactions do not affect each other while they are
running. Each transaction should be able to view the world as though it is the
only one reading and altering things. In practice this is not usually the case, but
locks are used to achieve the illusion.
DURABILITY

• Once committed (completed), the results of a transaction are
permanent and survive future system and media failures.
• Example:
• If the airline reservation system computer gives you seat 22A
and crashes a millisecond later, it won't have forgotten that you
are sitting in 22A and also give it to someone else. Furthermore,
if a programmer spills coffee into a disk drive, it will be possible
to install a new disk and recover the transactions up to the coffee
spill, showing that you had seat 22A
DATA MODEL

• A diagram that addresses the following:
• Representation of the data structures (Entities)
• Attributes that are present in the Entities
• The relationships that exist between entities

Data Model has two outputs:
• ER diagram
• Documents that contain the Data Dictionary, Decision Logs and Business Rules
DATA MODEL

• Conceptual Data Model: This Data Model defines WHAT the system contains.
This model is typically created by Business stakeholders and Data Architects.
The purpose is to organize, scope and define business concepts and rules.
• Logical Data Model: Defines HOW the system should be implemented
regardless of the DBMS. This model is typically created by Data Architects
and Business Analysts. The purpose is to develop a technical map of rules
and data structures.
• Physical Data Model: This Data Model describes HOW the system will be
implemented using a specific DBMS. This model is typically created
by DBAs and developers. The purpose is the actual implementation of the
database.
DATABASE ARCHITECTURE

• Centralized:
• Client Server

N-Tier :
• Distributed
DATABASE ARCHITECTURE
NORMALIZATION

• Normalization theory is based on the observation that relations with certain
properties are more effective in inserting, updating and deleting data than other
sets of relations containing the same data.
• Normalization is a multi-step process beginning with an "unnormalized" relation.
NORMAL FORMS

• First Normal Form (1NF)


• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
NORMALIZATION

• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey attributes on the primary key
• Third Normal Form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
FUNCTIONAL DEPENDENCIES

• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a
unique value for Y
• X -> Y holds if whenever two tuples have the same value for X, they must have the same value
for Y
If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance r(R)
• X -> Y in R specifies a constraint on all relation instances r(R)
• FDs are derived from the real-world constraints on the attributes
EXAMPLES OF FUNCTIONAL DEPENDENCY

• Social Security Number determines employee name
SSN -> ENAME
• Project Number determines project name and location
PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works
on the project
{SSN, PNUMBER} -> HOURS
Functional Dependency and Keys
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples
with t1[K]=t2[K])
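
The definition "if t1[X]=t2[X], then t1[Y]=t2[Y]" can be checked mechanically on a single relation instance. The sketch below is illustrative only: the function name fd_holds and the sample rows are hypothetical, while the attribute names SSN, ENAME, PNUMBER and HOURS come from the examples above. A check like this can refute an FD, but it cannot prove one, because an FD must hold on every instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by this instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Two tuples agree on X but differ on Y: the FD is violated.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data.
works_on = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1, "HOURS": 32.5},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2, "HOURS": 7.5},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 2, "HOURS": 20.0},
]

print(fd_holds(works_on, ["SSN"], ["ENAME"]))             # SSN -> ENAME
print(fd_holds(works_on, ["SSN", "PNUMBER"], ["HOURS"]))  # {SSN, PNUMBER} -> HOURS
print(fd_holds(works_on, ["SSN"], ["HOURS"]))             # violated by SSN 111
```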
FIRST STEP IN NORMALIZATION

• First step in normalization is to convert the data into a two-dimensional table


• In unnormalized relations data can repeat within a column
UNNORMALIZED RELATION
FIRST NORMAL FORM

• To move to First Normal Form a relation must contain only atomic values at
each row and column.
• No repeating groups
• A column or set of columns is called a Candidate Key when its values can uniquely
identify the row in the relation
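
Removing a repeating group can be sketched as follows. The record layout, the field names patient_no and surgeon_nos, and the sample values are hypothetical and only illustrate the idea of one atomic value per row and column.

```python
def to_first_normal_form(records, key, group, item):
    """Expand each repeating group into one row per atomic value."""
    rows = []
    for record in records:
        for value in record[group]:
            rows.append({key: record[key], item: value})
    return rows


# Hypothetical unnormalized data: one patient holds a list of surgeon numbers.
unnormalized = [
    {"patient_no": 1, "surgeon_nos": [145, 311]},
    {"patient_no": 2, "surgeon_nos": [243]},
]

rows_1nf = to_first_normal_form(
    unnormalized, "patient_no", "surgeon_nos", "surgeon_no"
)

# (patient_no, surgeon_no) identifies each row of this instance uniquely,
# so it can serve as a candidate key here.
candidate_key_is_unique = len(
    {(r["patient_no"], r["surgeon_no"]) for r in rows_1nf}
) == len(rows_1nf)
print(rows_1nf, candidate_key_is_unique)
```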
FIRST NORMAL FORM

IS 257 – Fall 2008


1NF STORAGE ANOMALIES

• Insertion: A new patient has not yet undergone surgery -- hence no surgeon # --
Since surgeon # is part of the key, we cannot insert.
• Insertion: If a surgeon is newly hired and has not operated yet -- there will be no
way to include that person in the database.
• Update: If a patient comes in for a new procedure, and has moved, we need to
change multiple address entries.
• Deletion (type 1): Deleting a patient record may also delete all info about a
surgeon.
• Deletion (type 2): When there are functional dependencies (like side effects and
drug) changing one item eliminates other information.
SECOND NORMAL FORM

• A relation is said to be in Second Normal Form when every non-key attribute
is fully functionally dependent on the primary key.
• That is, every non-key attribute needs the full primary key for unique identification
WHY IS THIS NOT IN 2NF?



SECOND NORMAL FORM
SECOND NORMAL FORM



1NF ANOMALIES REMOVED

Insertion: Can now enter new patients without surgery.

Insertion: Can now enter Surgeons who have not operated.

Deletion (type 1): If Charles Brown dies, the corresponding tuples from Patient and
Surgery tables can be deleted without losing information on David Rosen.

Update: If John White comes in for third time, and has moved, we only need to change
the Patient table
2NF ANOMALIES

• Insertion: Cannot enter the fact that a particular drug has a particular side effect
unless it is given to a patient.
• Deletion: If John White receives some other drug because of the penicillin rash,
and a new drug and side effect are entered, we lose the information that
penicillin can cause a rash
• Update: If drug side effects change (a new formula) we have to update multiple
occurrences of side effects.
THIRD NORMAL FORM

• A relation is said to be in Third Normal Form if there is no
transitive functional dependency between non-key attributes
• When one non-key attribute can be determined from one or more other
non-key attributes, there is said to be a transitive functional dependency.
• The side effect column in the Surgery table is determined by
the drug administered
• Side effect is transitively functionally dependent on drug, so Surgery is
not in 3NF
WHY IS THIS NOT IN 3NF?
THIRD NORMAL FORM



THIRD NORMAL FORM



2NF ANOMALIES REMOVED

• Insertion: We can now enter the fact that a particular drug has a particular side
effect in the Drug relation.
• Deletion: If John White receives some other drug as a result of the rash from
penicillin, the information on penicillin and rash is maintained.
• Update: The side effects for each drug appear only once.
BOYCE-CODD NORMAL FORM

• Most 3NF relations are also BCNF relations.
• A table is in BCNF if, for every functional dependency X->Y, X is a super key of
the table. That is, the table should be in 3NF and, for every FD, the LHS is a
super key.
• A 3NF relation is NOT in BCNF if:
• Candidate keys in the relation are composite keys (they are not single attributes)
• There is more than one candidate key in the relation, and
• The keys are not disjoint, that is, some attributes in the keys are common
F: { (Student, Teacher) -> Subject, (Student, Subject) -> Teacher, Teacher -> Subject }

Student    Teacher    Subject
Jhansi     P.Naresh   Database
Ram        K.Das      C
Lakshman   P.Naresh   Database
Jhansi     R.Prasad   C
Mathew     Azhar      Networks
Ram        Azhar      Networks

The table is not in BCNF, because in the FD Teacher -> Subject, Teacher is
not a key. This relation suffers from anomalies:
if we delete the student Jhansi, the teacher R.Prasad who teaches C will
also be lost.

Teacher -> Subject violates BCNF [since Teacher is not a candidate key].
If X->Y violates BCNF then divide R into R1(X, Y) and R2(R-Y).

R1 (Teacher, Subject)
Teacher    Subject
P.Naresh   Database
K.Das      C
R.Prasad   C
Azhar      Networks

R2 (Student, Teacher)
Student    Teacher
Jhansi     P.Naresh
Ram        K.Das
Lakshman   P.Naresh
Jhansi     R.Prasad
Mathew     Azhar
Ram        Azhar
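
The decomposition rule "divide R into R1(X, Y) and R2(R-Y)" can be sketched on the Student/Teacher/Subject data above, with X = Teacher and Y = Subject. The helper project and the variable names are hypothetical. The last step rejoins the two projections on Teacher to show that, for this instance, no information is lost.

```python
# R(Student, Teacher, Subject), as listed on the slide.
R = [
    ("Jhansi", "P.Naresh", "Database"),
    ("Ram", "K.Das", "C"),
    ("Lakshman", "P.Naresh", "Database"),
    ("Jhansi", "R.Prasad", "C"),
    ("Mathew", "Azhar", "Networks"),
    ("Ram", "Azhar", "Networks"),
]


def project(rows, positions):
    """Project onto the given column positions, dropping duplicate tuples."""
    return sorted({tuple(row[i] for i in positions) for row in rows})


r1 = project(R, (1, 2))  # R1(Teacher, Subject), i.e. (X, Y)
r2 = project(R, (0, 1))  # R2(Student, Teacher), i.e. R - Y

# Natural join of R2 and R1 on Teacher. Because Teacher -> Subject,
# each teacher maps to exactly one subject.
subject_of = dict(r1)
rejoined = sorted((student, teacher, subject_of[teacher]) for student, teacher in r2)

# Deleting the student Jhansi now touches only R2; R1 still records
# that R.Prasad teaches C.
r2_without_jhansi = [row for row in r2 if row[0] != "Jhansi"]
print(r1, rejoined == sorted(R), ("R.Prasad", "C") in r1)
```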
CHALLENGES ARISING OUT OF NORMALIZING

• Normalization splits database information across multiple tables.


• To retrieve complete information from a normalized database, the JOIN
operation must be used.
• JOIN tends to be expensive in terms of processing time, and very large joins
are very expensive.
DENORMALIZATION

• Usually driven by the need to improve query speed


• Query speed is improved at the expense of more complex or problematic DML
(Data manipulation language) for updates, deletions and insertions.
DOWNWARD DENORMALIZATION

Before:
Customer: ID, Address, Name, Telephone
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID

After:
Customer: ID, Address, Name, Telephone
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name
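
A minimal sketch of the trade-off, using Python's built-in sqlite3 module. The table names orders_before and orders_after (ORDER is a reserved word in SQL), the reduced column set and the sample values are assumptions for illustration. Copying Cust Name down into the order table removes the JOIN from the read path, but a later change to the customer's name leaves the copy stale unless it is updated too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT, telephone TEXT);
    -- Before: the order only references the customer.
    CREATE TABLE orders_before (
        order_no INTEGER PRIMARY KEY, date_taken TEXT,
        cust_id INTEGER REFERENCES customer(id));
    -- After: the customer name is copied down into the order.
    CREATE TABLE orders_after (
        order_no INTEGER PRIMARY KEY, date_taken TEXT,
        cust_id INTEGER REFERENCES customer(id), cust_name TEXT);
    INSERT INTO customer VALUES (1, 'Asha', '12 Hill Road', '555-0100');
    INSERT INTO orders_before VALUES (500, '2024-01-15', 1);
    INSERT INTO orders_after VALUES (500, '2024-01-15', 1, 'Asha');
    """
)

# Normalized read: a JOIN is needed to show the customer name.
with_join = conn.execute(
    "SELECT o.order_no, c.name FROM orders_before o "
    "JOIN customer c ON c.id = o.cust_id"
).fetchall()

# Denormalized read: a single table is enough.
without_join = conn.execute(
    "SELECT order_no, cust_name FROM orders_after"
).fetchall()

# The cost: renaming the customer does not update the copied value.
conn.execute("UPDATE customer SET name = 'Asha K' WHERE id = 1")
stale_name = conn.execute(
    "SELECT cust_name FROM orders_after WHERE order_no = 500"
).fetchone()[0]
print(with_join, without_join, stale_name)
```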
UPWARD DENORMALIZATION

Before:
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name
Order Item: Order No, Item No, Item Price, Num Ordered

After:
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name, Order Price
Order Item: Order No, Item No, Item Price, Num Ordered
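
A minimal sketch of upward denormalization with Python's built-in sqlite3 module. It assumes that Order Price is the sum of Item Price times Num Ordered over the order's items, which the slide does not state explicitly; the table names, the reduced column set and the sample values are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_item (
        order_no INTEGER, item_no INTEGER,
        item_price REAL, num_ordered INTEGER,
        PRIMARY KEY (order_no, item_no));
    -- "orders" because ORDER is a reserved word in SQL.
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, order_price REAL);
    INSERT INTO order_item VALUES (500, 1, 10.0, 2);
    INSERT INTO order_item VALUES (500, 2, 5.0, 3);
    INSERT INTO orders VALUES (500, NULL);
    """
)

# Store the aggregate once on the parent row (assumed definition of
# Order Price: sum of item_price * num_ordered).
conn.execute(
    "UPDATE orders SET order_price = ("
    "  SELECT SUM(item_price * num_ordered) FROM order_item"
    "  WHERE order_item.order_no = orders.order_no)"
)
conn.commit()

# Reports can now read the total without joining or aggregating.
order_price = conn.execute(
    "SELECT order_price FROM orders WHERE order_no = 500"
).fetchone()[0]
print(order_price)
```

Every later insert, update or delete on the order items must also refresh the stored total, which is the more complex DML that denormalization trades for query speed.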
PROS & CONS

Normalized                                        De-Normalized
Smaller tables                                    Large table
Current transactional data                        Historical data
Quick insert/update                               Slow insert/update
Reports need multiple joins and will take time    Quick reports with fewer joins
DB size less                                      Very large databases
OLTP                                              OLAP
Oracle 19C Architecture
