
Introduction to Databases
Dr. Zaheeruddin Asif
A lecture based on With diagrams from various sources
What is a Database?

• A flat file is considered to be a one-dimensional storage system
  because it presents its information from a single point of view.

• A database is a collection of data that is multidimensional in the
  sense that internal links between its entries make the information
  accessible from a variety of perspectives.
[Figure: Data Hierarchy]
[Figure: A file versus a database organization]
[Figure: file-based approach]

Data files              Application programs             Users
Payroll                 Payroll programs                 Reports
Invoicing               Invoicing programs               Reports
Inventory control       Inventory control programs       Reports
Management inquiries    Management inquiries programs    Reports
Drawbacks of File System

• Data Duplication
• Data Inconsistency
• Program Data Dependence

Advantages of Database

• Better Data Integrity
• More Information
• More Control of Data
• Better Decision Making
• Improved Security
• Standardization of Data
Database Approach

[Figure: database approach]

Database          Interface                     Application programs    Users
Payroll data      Database management system    Payroll program         Reports
Inventory data    Database management system    Inventory program       Reports
Invoicing data    Database management system    Invoicing program       Reports
Other data        Database management system    Other programs          Reports

Disadvantages

• Expensive
• More Expertise needed

NOTE: To provide data privileges, a database relies on schemas and
subschemas.
Schemas

• Schema: A description of the structure of an entire database, used
  by database software to maintain the database

• Subschema: A description of only that portion of the database
  pertinent to a particular user’s needs, used to prevent sensitive
  data from being accessed by unauthorized personnel
Example

• Suppose the schema for a university database indicates that each
  student record contains items such as name, address, phone number,
  academic record, etc. Moreover, students and faculty are linked so
  that each student has a faculty advisor.
• The university's registrar would then have a subschema view that
  shows which faculty member advises each student, but does not show
  the past employment history of the faculty member.
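
One way to approximate a subschema in a relational system is a view that exposes only the columns a given user needs. The following is a minimal sketch using Python's built-in sqlite3 module. The table names (student, faculty), the column names, the sample rows and the view name registrar_view are all illustrative assumptions, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Full schema (hypothetical names): every attribute of every record.
    CREATE TABLE faculty (
        faculty_id INTEGER PRIMARY KEY,
        name TEXT,
        employment_history TEXT   -- sensitive
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT, address TEXT, phone TEXT,
        advisor_id INTEGER REFERENCES faculty(faculty_id)
    );
    INSERT INTO faculty VALUES (1, 'Dr. Khan', 'Previously at XYZ Ltd.');
    INSERT INTO student VALUES (10, 'Sara', 'Karachi', '555-0100', 1);

    -- Subschema for the registrar: students and their advisors,
    -- without the faculty employment history.
    CREATE VIEW registrar_view AS
        SELECT s.student_id, s.name AS student_name, f.name AS advisor_name
        FROM student AS s JOIN faculty AS f ON s.advisor_id = f.faculty_id;
""")

cursor = conn.execute("SELECT * FROM registrar_view")
registrar_columns = [col[0] for col in cursor.description]
registrar_rows = cursor.fetchall()
print(registrar_columns)
print(registrar_rows)
```

SQLite has no user accounts, so the view here only illustrates the restricted perspective. In a multi-user DBMS, access to the base tables would also have to be restricted.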
[Figure: The conceptual layers of a database implementation]
Data Models

• Hierarchical
• Network
• Relational
Hierarchical Database

[Figure: tree with three levels]
Level 1: Project 1
Level 2: Department A, Department B, Department C
Level 3: Employee 1, Employee 2, Employee 3, Employee 4, Employee 5, Employee 6
[Figure: A relation containing redundancy]
[Figure: Decomposed Relations]
ER Diagram

[Figure: ER diagram]
Entities: Customer, Product
Relationship: Order, a 1:M (one-to-many) relationship
Attributes of Customer: First name, Last name, Identification number
Attributes of Product: Name, Colour, Identification number
Attribute of Order: Date
Relational Operations

• SELECT
• PROJECT
• JOIN
[Figure: SELECT Operation]
[Figure: PROJECT Operation]
[Figure: JOIN Operation]
[Figure: Application of JOIN]
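
The three relational operations can be sketched in plain Python, treating a relation as a list of dictionaries. This is an illustrative sketch only: the function names, the sample relations and the DeptId/DeptName values are assumptions, and a real DBMS implements these operations very differently.

```python
def select(relation, predicate):
    """SELECT: keep the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, right, left_attr, right_attr):
    """JOIN: combine pairs of rows whose join attributes are equal."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[left_attr] == r[right_attr]
    ]


# Hypothetical sample relations.
employees = [
    {"EmpNum": 123, "EmpFname": "John", "DeptNum": 1},
    {"EmpNum": 456, "EmpFname": "Peter", "DeptNum": 2},
    {"EmpNum": 633, "EmpFname": "Peter", "DeptNum": 1},
]
departments = [
    {"DeptId": 1, "DeptName": "Sales"},
    {"DeptId": 2, "DeptName": "Finance"},
]

in_dept_1 = select(employees, lambda row: row["DeptNum"] == 1)
first_names = project(employees, ["EmpFname"])
emp_with_dept = join(employees, departments, "DeptNum", "DeptId")
```

SELECT picks rows, PROJECT picks columns, and JOIN pairs up rows from two relations that agree on a common value.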
SQL

SQL is an abbreviation of Structured Query Language, pronounced either
"see-kwell" or as separate letters. SQL is a standardized query language
for requesting information from a database. The original version, called
SEQUEL (Structured English Query Language), was designed at an IBM
research center in 1974 and 1975. SQL was first introduced in a
commercial database system in 1979 by Oracle Corporation.

Historically, SQL has been the favorite query language for database
management systems running on minicomputers and mainframes.
[Figure: SQL Syntax]
Database Integrity

• Commit
• Rollback
• Commit point
• Cascading Rollback
• Locking
• Shared Lock
• Exclusive Lock
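
Commit and rollback can be illustrated with Python's built-in sqlite3 module. The sketch below is a minimal example under assumed names: the account table, the transfer function and the balances are hypothetical. Locking is handled internally by SQLite and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()  # commit point: the opening balances are now permanent


def transfer(conn, source, target, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, source),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (source,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, target),
        )
        conn.commit()    # make both updates permanent together
        return True
    except ValueError:
        conn.rollback()  # undo the partial update
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds and is committed
failed = transfer(conn, 1, 2, 500)  # fails and is rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

After the failed transfer, the database is back at the state of the last commit point, so no money has been withdrawn without being deposited.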
Normalization

• 1NF
• 2NF
• 3NF

• The truth, the whole truth and nothing but the truth.
• Pure juice, nothing removed, nothing added
Functional Dependency

• A functional dependency (FD) describes a relationship between
  attributes within a single relation.

• An attribute is functionally dependent on another if we can use the
  value of one attribute to uniquely determine the value of the other.

• Denoted by X -> Y (read "X functionally determines Y")

• E.g. Std_id -> Std_name (Std_id functionally determines Std_name)

Fall 2019 01/29/2022


Functional Dependencies

EmpNum    EmpEmail          EmpFname    EmpLname
123       jdoe@abc.com      John        Doe
456       psmith@abc.com    Peter       Smith
555       alee1@abc.com     Alan        Lee
633       pdoe@abc.com      Peter       Doe
787       alee2@abc.com     Alan        Lee

If EmpNum is the PK, then the following FDs must exist:

EmpNum -> EmpEmail
EmpNum -> EmpFname
EmpNum -> EmpLname
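
Whether sample data is consistent with a functional dependency can be checked mechanically. The sketch below uses the employee rows from this slide. The function name fd_holds is an assumption made for illustration. Sample data can only refute an FD: a dependency that holds in these five rows is not thereby proven to hold in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"EmpNum": 123, "EmpEmail": "jdoe@abc.com", "EmpFname": "John", "EmpLname": "Doe"},
    {"EmpNum": 456, "EmpEmail": "psmith@abc.com", "EmpFname": "Peter", "EmpLname": "Smith"},
    {"EmpNum": 555, "EmpEmail": "alee1@abc.com", "EmpFname": "Alan", "EmpLname": "Lee"},
    {"EmpNum": 633, "EmpEmail": "pdoe@abc.com", "EmpFname": "Peter", "EmpLname": "Doe"},
    {"EmpNum": 787, "EmpEmail": "alee2@abc.com", "EmpFname": "Alan", "EmpLname": "Lee"},
]

num_determines_email = fd_holds(employees, ["EmpNum"], ["EmpEmail"])
fname_determines_lname = fd_holds(employees, ["EmpFname"], ["EmpLname"])
```

Here EmpNum -> EmpEmail is consistent with the data, while EmpFname -> EmpLname is refuted because two employees named Peter have different last names.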
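Functional Dependencies

3 different ways you might see FDs depicted:

1. One dependency per line:
   EmpNum -> EmpEmail
   EmpNum -> EmpFname
   EmpNum -> EmpLname

2. One determinant pointing to several attributes:
   EmpNum -> EmpEmail, EmpFname, EmpLname

3. Arrows drawn across the relation's attribute list:
   EmpNum    EmpEmail    EmpFname    EmpLname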


Determinant

Functional Dependency

EmpNum -> EmpEmail

The attribute on the LHS is known as the determinant.

• EmpNum is a determinant of EmpEmail


Let's Test Our Concepts

• List all functional dependencies in the tables

Table 1: Std_ID, Std Name, Father’s Name, Program, Course

Std_ID -> Std Name
Std_ID -> Father’s Name
Std_ID -> Program

Table 2: CNIC, Emp_Name, Father’s Name, Dept, Designation

CNIC -> Emp_Name
CNIC -> Father’s Name
Transitive dependency

Consider attributes A, B, and C, where
A -> B and B -> C.
Functional dependencies are transitive, which means
that we also have the functional dependency A -> C.
We say that C is transitively dependent on A through B.
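
Transitivity can be applied repeatedly to find everything a set of attributes determines (its closure). The following is a minimal sketch. The function name closure and the representation of each FD as a pair of Python sets are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
closure_of_a = closure({"A"}, fds)
```

Because C appears in the closure of A, the dependency A -> C follows from A -> B and B -> C.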


Transitive dependency

Relation: EmpNum, EmpEmail, DeptNum, DeptName

EmpNum -> DeptNum
DeptNum -> DeptName

DeptName is transitively dependent on EmpNum via DeptNum:

EmpNum -> DeptName
Partial dependency

A partial dependency exists when a non-prime attribute B is
functionally dependent on an attribute A, and A is only a
component (a proper subset) of a candidate key.

Relation: Std_ID, Subject, Score, Instructor

Candidate key: {Std_ID, Subject}

Instructor is functionally dependent on Subject and has no direct
relation with Std_ID. Subject is the determinant of Instructor, and
Subject is only part of the candidate key.
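
Partial dependencies can be spotted by comparing the left-hand side of each FD with the candidate key. The sketch below assumes the relation has a single candidate key, and assumes the FD {Std_ID, Subject} -> Score, which is not stated on the slide. The function name partial_dependencies is hypothetical.

```python
def partial_dependencies(candidate_key, fds):
    """List FDs whose left side is a proper subset of the candidate key
    and whose right side contains attributes outside the key."""
    key = set(candidate_key)
    found = []
    for lhs, rhs in fds:
        non_prime = set(rhs) - key
        if set(lhs) < key and non_prime:  # '<' means proper subset
            found.append((set(lhs), non_prime))
    return found


candidate_key = {"Std_ID", "Subject"}
fds = [
    ({"Std_ID", "Subject"}, {"Score"}),  # assumed: depends on the whole key
    ({"Subject"}, {"Instructor"}),       # depends on only part of the key
]
partials = partial_dependencies(candidate_key, fds)
```

Only Subject -> Instructor is reported, matching the partial dependency described for this relation.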
Normalization in DBMS

• Database normalization is a technique for organizing the data in
  the database.
• Normalization is a systematic approach of decomposing tables to
  eliminate data redundancy and undesirable characteristics such as
  insertion, update and deletion anomalies.
• It is a multi-step process that puts data into tabular form by
  removing duplicated data from the relation tables.
• Normalization is used mainly for two purposes:
  • Eliminating redundant (useless) data.
  • Ensuring data dependencies make sense, i.e. data is logically
    stored.
Problem Without Normalization

• Without normalization, it becomes difficult to handle and update
  the database without facing data loss. Insertion, update and
  deletion anomalies are very frequent if the database is not
  normalized. To understand these anomalies, let us take the example
  of a Student table.

S_id    S_Name    S_Address        Subject_opted
401     Adam      Cincinnati       Bio
402     Alex      San Francisco    Maths
403     Stuart    Chicago          Maths
404     Adam      Cincinnati       Physics


Anomalies

• Insertion Anomaly: Occurs when a certain attribute of an entity is
  not yet known, so a tuple has to be stored with incomplete data.
  • Suppose that for a new admission we have the student id (S_id),
    name and address of a student. If the student has not opted for
    any subjects yet, we have to insert NULL there, leading to an
    insertion anomaly.
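
The insertion anomaly can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the table name student and the new student's id, name and address are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject_opted TEXT
    )
""")

# A newly admitted student who has not opted for any subject yet
# (id 405 and the other values are hypothetical).
conn.execute("INSERT INTO student VALUES (405, 'Sam', 'Boston', NULL)")

incomplete = conn.execute(
    "SELECT S_id FROM student WHERE Subject_opted IS NULL"
).fetchall()
```

The row can only be stored with a NULL in Subject_opted, because student details and subject choices share one table.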




Anomalies

• Update Anomaly: Occurs in case of data redundancy and/or partial
  update.
  • To update the address of a student who occurs twice or more in a
    table, we have to update the S_Address column in all of those
    rows, otherwise the data becomes inconsistent.
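
The update anomaly can be reproduced with the Student table and Python's built-in sqlite3 module. The sketch assumes that the two rows named Adam describe the same student, as the wording of the slide suggests. The new address 'Dayton' is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject_opted TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (401, "Adam", "Cincinnati", "Bio"),
    (402, "Alex", "San Francisco", "Maths"),
    (403, "Stuart", "Chicago", "Maths"),
    (404, "Adam", "Cincinnati", "Physics"),
])

# Partial update: only one of Adam's two rows is changed.
conn.execute("UPDATE student SET S_Address = 'Dayton' WHERE S_id = 401")

adam_addresses = conn.execute(
    "SELECT DISTINCT S_Address FROM student WHERE S_Name = 'Adam' ORDER BY S_Address"
).fetchall()
```

The table now holds two different addresses for Adam, which is the inconsistency the update anomaly describes.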




Anomalies

• Deletion Anomaly: Occurs when deleting some data about an entity
  unintentionally deletes all other data about that entity.
  • If student 401 (S_id) has only one subject and temporarily drops
    it, then deleting that row deletes the entire student record
    along with it.
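
The deletion anomaly can be reproduced in the same way with Python's built-in sqlite3 module. This is a minimal sketch using the Student table's sample rows. The table name student is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject_opted TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (401, "Adam", "Cincinnati", "Bio"),
    (402, "Alex", "San Francisco", "Maths"),
    (403, "Stuart", "Chicago", "Maths"),
    (404, "Adam", "Cincinnati", "Physics"),
])

# Student 401 drops Bio, the only subject recorded for that id.
conn.execute("DELETE FROM student WHERE S_id = 401 AND Subject_opted = 'Bio'")

remaining = conn.execute(
    "SELECT COUNT(*) FROM student WHERE S_id = 401"
).fetchone()[0]
```

No row is left for S_id 401, so the name and address stored with that id are lost together with the dropped subject.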




Steps For Normalization

1. Specify the key of the relation.
2. Specify the functional dependencies of the relation. Sample data
   (tuples) for the relation can assist with this step.
3. Apply the definition of each normal form (starting with 1NF).
4. If a relation fails to meet the definition of a normal form,
   change the relation (most often by splitting the relation into
   two new relations) until it meets the definition.
5. Re-test the modified/new relations to ensure they meet the
   definitions of each normal form.


First Normal Form (1NF)

• As per First Normal Form:
  • The table should only have single-valued (atomic)
    attributes/columns.
  • Values stored in a column should be of the same domain.
  • All the attributes/columns in a table should have unique names.
  • Each row is unique.


Example: Student Table

For example, a relation of degree 3 (three attributes) with 3 tuples:

Student: Name of the student
Age: Number defining the age of the student
Subject: What the student is opting for

Student    Age    Subject
Adam       15     Biology, Maths
Alex       14     Maths
Stuart     17     Maths


1NF

• In First Normal Form, no row should have a column with more than
  one value, like a list separated with commas. Such a value must be
  separated into multiple rows.
• After conversion to First Normal Form, data redundancy increases,
  because the same column values are repeated in multiple rows, but
  each row as a whole is unique.
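
Splitting a multi-valued column into separate rows can be sketched as follows, using the Student table with Adam's comma-separated subjects. The function name to_first_normal_form is an assumption made for illustration.

```python
def to_first_normal_form(rows, multi_valued_attr, separator=","):
    """Split a separator-delimited attribute into one row per value."""
    result = []
    for row in rows:
        for value in row[multi_valued_attr].split(separator):
            # Copy the row, replacing the list with a single atomic value.
            result.append({**row, multi_valued_attr: value.strip()})
    return result


students = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology, Maths"},
    {"Student": "Alex", "Age": 14, "Subject": "Maths"},
    {"Student": "Stuart", "Age": 17, "Subject": "Maths"},
]
students_1nf = to_first_normal_form(students, "Subject")
```

Adam now appears in two rows, one per subject, so his name and age are repeated. This is the added redundancy that 1NF accepts.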


1NF

Student    Age    Subject
Adam       15     Biology
Adam       15     Maths
Alex       14     Maths
Stuart     17     Maths


Second Normal Form (2NF)

• A table is said to be in 2NF if both of the following conditions
  hold:
  • The table is in 1NF (First Normal Form).
  • There is no partial dependency: no non-prime attribute is
    functionally dependent on a proper subset of any candidate key
    of the table.
• An attribute that is not part of any candidate key is known as a
  non-prime attribute.


2NF Example

• In the Student table in First Normal Form, the candidate key is
  {Student, Subject}, but Age depends only on the Student column.
  This partial dependency violates Second Normal Form.
• To achieve Second Normal Form, it helps to split the subjects out
  into an independent table and match the two tables up using the
  student names as foreign keys.


Student Table

Student    Age    Subject
Adam       15     Biology
Adam       15     Maths
Alex       14     Maths
Stuart     17     Maths

New Student Table following 2NF will be:

Student    Age
Adam       15
Alex       14
Stuart     17

New Subject Table following 2NF will be:

Student    Subject
Adam       Biology
Adam       Maths
Alex       Maths
Stuart     Maths
2NF

• In the Student table, Student is the primary key.
• In the Subject table, the candidate key is {Student, Subject}.
• Both tables qualify for Second Normal Form.
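
The 2NF decomposition of the Student table can be sketched as two projections. Joining them again shows that no information was lost. The function name project_distinct is an assumption made for illustration.

```python
def project_distinct(rows, attributes):
    """Project rows onto attributes, dropping duplicates but keeping order."""
    seen = []
    for row in rows:
        reduced = tuple(row[a] for a in attributes)
        if reduced not in seen:
            seen.append(reduced)
    return seen


students_1nf = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology"},
    {"Student": "Adam", "Age": 15, "Subject": "Maths"},
    {"Student": "Alex", "Age": 14, "Subject": "Maths"},
    {"Student": "Stuart", "Age": 17, "Subject": "Maths"},
]

student_table = project_distinct(students_1nf, ["Student", "Age"])
subject_table = project_distinct(students_1nf, ["Student", "Subject"])

# Joining the two tables on Student gives back the original rows.
rejoined = [
    {"Student": student, "Age": age, "Subject": subject}
    for (student, age) in student_table
    for (subject_student, subject) in subject_table
    if student == subject_student
]
```

Each student's age is now stored once, and the original four rows can still be reconstructed by joining on Student.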


Third Normal Form (3NF)

• A table is said to be in 3NF if both of the following conditions
  hold:
  • The table is in 2NF (Second Normal Form).
  • There is no transitive dependency: every non-prime attribute is
    directly (non-transitively) dependent on the primary key of the
    table.
• In other words, no non-prime attribute is determined by another
  non-prime attribute.
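
A simplified 3NF check can look for FDs in which one non-prime attribute determines another. The sketch below assumes the primary key is the only candidate key and reuses the EmpNum/DeptNum relation from the transitive dependency slides. The function name transitive_dependencies and the decomposition shown are illustrative assumptions.

```python
def transitive_dependencies(primary_key, fds):
    """Return FDs X -> Y where neither side touches the primary key,
    i.e. a non-prime attribute is determined by non-prime attributes."""
    key = set(primary_key)
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not set(lhs) & key and not set(rhs) & key
    ]


fds = [
    ({"EmpNum"}, {"EmpEmail"}),
    ({"EmpNum"}, {"DeptNum"}),
    ({"DeptNum"}, {"DeptName"}),
]
violations = transitive_dependencies({"EmpNum"}, fds)

# One possible 3NF decomposition: move the offending FD into its own relation.
employee = ["EmpNum", "EmpEmail", "DeptNum"]
department = ["DeptNum", "DeptName"]
```

The check reports DeptNum -> DeptName. After the split, DeptName is stored once per department and DeptNum in the employee relation acts as a foreign key.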


Advantages of 3NF

• The advantages of removing transitive dependency are:
  • The amount of data duplication is reduced.
  • Data integrity is achieved.

In its broadest use, “data integrity” refers to the accuracy and
consistency of data stored in a database, data warehouse, data mart
or other construct.


Sources

• Wikipedia
• Brookshear and Brylow, Computer Science
