You are on page 1of 41

DBMS – Normalization + SQL

Himanish Shekhar Das


Assistant Professor
Computer Science and Engineering
Email: hsdas0815044@gmail.com
Contact: 9435427610
1
Normalization – Why?
• Database normalization is the process of organizing the attributes of
database to reduce or eliminate data redundancy (having same data but
at different places) .
• Normalization is the process of minimizing redundancy from a relation or
set of relations. Redundancy in relation may cause insertion, deletion and
updation anomalies. So, it helps to minimize the redundancy in
relations. Normal forms are used to eliminate or reduce redundancy in
database tables.
• Problems because of data redundancy
Data redundancy unnecessarily increases size of database as same data is
repeated on many places. Inconsistency problems also arise during insert,
delete and update operations.

2
Functional Dependency
• Functional dependency is denoted by arrow (→). If an attributed A functionally
determines B, then it is written as A → B.
• A function dependency A → B mean for all instances of a particular value of A, there is
same value of B.
• For example in the below table A → B is true, but B → A is not true as there are different
values of A for B = 3.
A B
------
1 3
2 3
4 0
1 3
4 0

3
Trivial and Non-trivial Functional Dependency
• X –> Y is trivial only when Y is subset of X.
Examples
ABC --> AB
ABC --> A
ABC --> ABC
• X –> Y is a non trivial functional dependencies when Y is not a subset of X.
• X –> Y is called completely non-trivial when X intersect Y is NULL.
Examples:
Id --> Name,
Name --> DOB

4
Normal Forms – 1NF
• If a relation contain composite or multi-valued attribute, it violates
1NF. A relation is in 1NF if every attribute in that relation is singled
valued attribute.

5
Normal Forms – 1NF

6
Normal Forms – 2NF
• To be in second normal form, a relation must be in first normal form
and relation must not contain any partial dependency. A relation is in
2NF iff it has No Partial Dependency, i.e., no non-prime attribute
(attributes which are not part of any candidate key) is dependent on
any proper subset of any candidate key of the table.

7
Normal Forms – 2NF
• Partial Dependency – If proper
subset of candidate key determines
non-prime attribute, it is called
partial dependency.
FD set: {COURSE_NO->COURSE_NAME}
• Example 1 – In relation
Candidate Key: {STUD_NO, COURSE_NO} STUDENT_COURSE given in Table 3

• In FD COURSE_NO->COURSE_NAME, COURSE_NO (proper subset of


candidate key) is determining COURSE_NAME (non-prime attribute).
Hence, it is partial dependency and relation is not in second normal
form.
8
Normal Forms – 2NF
• To convert it to second normal form, we will decompose the relation
STUDENT_COURSE (STUD_NO, COURSE_NO, COURSE_NAME) as :

STUDENT_COURSE (STUD_NO, COURSE_NO)


COURSE (COURSE_NO, COURSE_NAME)

9
Normal Forms – 3NF
• A relation is in third normal form, if there is no transitive
dependency for non-prime attributes is it is in second normal form.
A relation is in 3NF iff at least one of the following condition holds in
every non-trivial function dependency X –> Y
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).

10
Normal Forms – 3NF

• Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive
dependency.
• Example 1 – In relation STUDENT given in Table 4,FD set: {STUD_NO ->
STUD_NAME, STUD_NO -> STUD_STATE, STUD_NO -> STUD_COUNTRY, STUD_NO ->
STUD_AGE, STUD_STATE -> STUD_COUNTRY}
Candidate Key: {STUD_NO}
• For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on
STUD_NO. It violates third normal form.
11
Normal Forms – 3NF
• To convert it in third normal form, we will decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY_STUD_AGE) as:

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,


STUD_AGE)

STATE_COUNTRY (STATE, COUNTRY)

12
Normal Forms – BCNF
• A relation R is in BCNF if R is in Third Normal Form and for every FD,
LHS is super key.

• A relation is in BCNF iff in every non-trivial functional dependency


X –> Y, X is a super key.

13
Normal Forms – BCNF
• For example, Stu_ID is the key and only prime key attribute.

• We find that City can be identified by Stu_ID as well as Zip itself.


Neither Zip is a superkey nor is City a prime attribute. Additionally,
Stu_ID → Zip → City, so there exists transitive dependency.

14
Normal Forms – BCNF
• To bring this relation into third normal form, we break the relation into two
relations as follows

• In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip
is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
And Zip → City, Which confirms that both the relations are in BCNF

15
Multivalued Dependency
• If two or more independent relation are kept in a single relation or we
can say multivalue dependency occurs when the presence of one or
more rows in a table implies the presence of one or more other rows
in that same table. Put another way, two attributes (or columns) in a
table are independent of one another, but both depend on a third
attribute. A multivalued dependency always requires at least three
attributes because it consists of at least two attributes that are
dependent on a third.

16
Multivalued Dependency

• In the above table, we can see Students Amit and Akash have interest in more
than one activity.
• This is multivalued dependency because CourseDiscipline of a student are
independent of Activities, but are dependent on the student.
17
Multivalued Dependency
• Therefore, multivalued dependency:
StudentName ->-> CourseDiscipline
StudentName ->-> Activities

• The above relation violates Fourth Normal Form in Normalization.


• To correct it, divide the table into two separate tables and break
Multivalued Dependency:

18
Multivalued Dependency

This breaks the multivalued dependency and now we have two functional dependencies:

StudentName -> CourseDiscipline


StudentName - > Activities

19
Normal Forms – 4NF
• A relation R is in 4NF if and only if the following conditions are
satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any Multi-valued Dependency.

20
Join Dependency
• Join decomposition is a further generalization of Multivalued
dependencies. If the join of R1 and R2 over C is equal to relation R
then we can say that a join dependency (JD) exists, where R1 and R2
are the decomposition R1(A, B, C) and R2(C, D) of a given relations R
(A, B, C, D). Alternatively, R1 and R2 are a lossless decomposition of R.

21
Normal Forms – 5NF
• A relation R is in 5NF if and only if it satisfies following conditions:
1. R should be already in 4NF.
2. It cannot be further non loss decomposed (join dependency)

• Example – If a company makes a product and an agent is an agent for


that company, then he always sells that product for the company.

22
Normal Forms – 5NF
AGENT COMPANY PRODUCT AGENT COMPANY
A1 PQR Nut A1 PQR
A1 PQR Bolt A1 XYZ
A1 XYZ Nut A2 PQR
A1 XYZ Bolt
A2 PQR Nut AGENT PRODUCT
A1 Nut
COMPANY PRODUCT A1 Bolt
PQR Nut A2 Nut
PQR Bolt
XYZ Nut
XYZ Bolt
23
SQL
• SQL is a database computer language designed for the retrieval and
management of data in a relational database. SQL stands for
Structured Query Language.
• Structured Query Language is a standard Database language which is
used to create, maintain and retrieve the relational database.
• The most important point to be noted here is that SQL is case
insensitive, which means SELECT and select have same meaning in
SQL statements. Whereas, MySQL makes difference in table names.
So, if you are working with MySQL, then you need to give table names
as they exist in the database.

24
SQL – Query Structure
■ A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
● Ai represents an attribute
● Ri represents a relation
● P is a predicate.

■ This query is equivalent to the relational algebra expression.

■ The result of an SQL query is a relation.


∏ A ,A ,…,A (σ P(r1 × r2 ×… × rm ))
1 2 n

25
Select Clause
• The select clause list the attributes desired in the result of a query
● corresponds to the projection operation of the relational algebra

■ Example: find the names of all branches in the loan relation:

select branch_name
from loan

■ In the relational algebra, the query would be:


∏branch_name (loan)

26
Select with DISTINCT
• SQL allows duplicates in relations as well as in query results.

■ To force the elimination of duplicates, insert the keyword distinct


after select.

■ Find the names of all branches in the loan relations, and remove
duplicates

select distinct branch_name


from loan

27
SELECT *
• An asterisk in the select clause denotes “all attributes”
select *
from loan

■ The select clause can contain arithmetic expressions involving the


operation, +, –, ∗, and /, and operating on constants or attributes of
tuples.

■ The query:
select loan_number, branch_name, amount ∗ 100
from loan
would return a relation that is the same as the loan relation, except that
the value of the attribute amount is multiplied by 100.

28
WHERE Clause
• The where clause specifies conditions that the result must satisfy
● Corresponds to the selection predicate of the relational algebra.

■ To find all loan number for loans made at the Jorhat branch with
loan amounts greater than 1200.

select loan_number
from loan
where branch_name = ‘Jorhat' and amount > 1200

■ Comparison results can be combined using the logical connectives and,


or, and not.

■ Comparisons can be applied to results of arithmetic expressions.

29
WHERE with BETWEEN
• SQL includes a between comparison operator

■ Example: Find the loan number of those loans with loan


amounts between 90,000 and 100,000

select loan_number
from loan
where amount between 90000 and 100000

30
FROM Clause
• The from clause lists the relations involved in the query
● Corresponds to the Cartesian product operation of the relational algebra.

■ Find the Cartesian product borrower X loan


select ∗
from borrower, loan

■ Find the name, loan number and loan amount of all customers
having a loan at the Jorhat branch.

select customer_name, borrower.loan_number, amount


from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = ‘Jorhat'

31
Rename Operation
• The SQL allows renaming relations and attributes using the as
clause:
oldname as newname

■ Find the name, loan number and loan amount of all customers;
rename the
column name loan_number as loan_id.

select customer_name, borrower.loan_number as loan_id, amount


from borrower, loan
where borrower.loan_number = loan.loan_number

32
SQL
• Student Table: ROLL_N
NAME ADDRESS PHONE AGE
O
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18

• If we want to retrieve attributes ROLL_NO and NAME of all students,


the query will be: ROLL_NO NAME
SELECT ROLL_NO, NAME 1 RAM
FROM STUDENT; 2 RAMESH
3 SUJIT
4 SURESH
33
SQL
• Student Table: ROLL_N
NAME ADDRESS PHONE AGE
O
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18

• Retrieve ROLL_NO and NAME of the students whose ROLL_NO is


greater than 2, the query will be:
SELECT ROLL_NO, NAME ROLL_NO NAME

FROM STUDENT 3 SUJIT


4 SURESH
WHERE ROLL_NO>2;
34
SQL
• Student Table: ROLL_N
NAME ADDRESS PHONE AGE
O
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18

• If we want to retrieve all attributes of students, we can write * in place


of writing all attributes as, the query will be:
SELECT * FROM STUDENT ROLL_NO NAME ADDRESS PHONE AGE
915625313
WHERE ROLL_NO>2; 3 SUJIT ROHTAK
1
20

915676897
4 SURESH DELHI 18
1
35
SQL
• Student Table: ROLL_N
NAME ADDRESS PHONE AGE
O
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18

• If we want to retrieve distinct values of an attribute or group of


attribute, DISTINCT is used, the query will be:
SELECT DISTINCT ADDRESS ADDRESS
FROM STUDENT; DELHI
GURGAON
ROHTAK
36
SQL Syntax
• SQL SELECT Statement
SELECT column1, column2....columnN
FROM table_name;
• SQL DISTINCT Clause
SELECT DISTINCT column1, column2....columnN
FROM table_name;
• SQL WHERE Clause
SELECT column1, column2....columnN
FROM table_name
WHERE CONDITION;
37
SQL Syntax
• SQL AND/OR Clause
SELECT column1, column2....columnN
FROM table_name
WHERE CONDITION-1 {AND|OR} CONDITION-2;
• SQL IN Clause
SELECT column1, column2....columnN
FROM table_name
WHERE column_name IN (val-1, val-2,...val-N);
• SQL BETWEEN Clause
SELECT column1, column2....columnN
FROM table_name
WHERE column_name BETWEEN val-1 AND val-2;
38
SQL Syntax
• SQL LIKE Clause
SELECT column1, column2....columnN
FROM table_name
WHERE column_name LIKE { PATTERN };
• SQL ORDER BY Clause
SELECT column1, column2....columnN
FROM table_name
WHERE CONDITION
ORDER BY column_name {ASC|DESC};
• SQL GROUP BY Clause
SELECT SUM(column_name)
FROM table_name
WHERE CONDITION
GROUP BY column_name;
39
SQL Syntax
• SQL COUNT Clause
SELECT COUNT(column_name)
FROM table_name
WHERE CONDITION;
• SQL HAVING Clause
SELECT SUM(column_name)
FROM table_name
WHERE CONDITION
GROUP BY column_name
HAVING (arithematic function condition);
40
SQL
• Student Table: ROLL_N
NAME ADDRESS PHONE AGE
O
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18

• If we want to represent the relation in ascending order by AGE, we can


use ORDER BY clause: ROLL_NO NAME ADDRESS PHONE AGE
SELECT * 1 RAM DELHI 9455123451 18
FROM STUDENT 2 RAMESH GURGAON 9652431543 18
4 SURESH DELHI 9156768971 18
ORDER BY AGE;
3 SUJIT ROHTAK 9156253131 20
41

You might also like