
CSC2243: Databases, Visual Basic and Software Engineering


By NGIRUWONSANGA Albert
E-mail: ngiruwonsanga.rw@gmail.com

Phone contact: 0788 471 881


University of Rwanda
College of Education-School of Education

Department of Mathematics, Science and Physical Education


Year 2 MCsE DEGREE-Academic Year: 2019/2020
Normalization

• Normalization is a technique for organizing data into multiple related tables in order to minimize data redundancy.

• Data redundancy means repetition of the same data in multiple places.

• Repetition of data increases the size of the database.


• Repetition of data also causes other issues, such as:

i. Insertion problems
ii. Deletion problems
iii. Update problems

• Reason for data repetition: two different but related kinds of data are stored in the same table.
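Rollno  Name  Branch  Hod    Office_tel
1       Akon  CSE     Mr. X  5337
2       Bkon  CSE     Mr. X  5337
3       Ckon  CSE     Mr. X  5337
4       Dkon  CSE     Mr. X  5337

Data redundancy: the branch details (Branch, Hod, Office_tel) are repeated for every student.

Update anomaly: e.g. if Mr. X leaves and Mr. Y joins as the new HoD for the CSE branch, every student row has to be updated.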


• Deletion anomaly: loss of related data when some other data is deleted.

• How will normalization solve all these problems?

• Normalization will break the existing Students table into two tables: a Students table and a Branch table.

New Student Table

Rollno  Name  Branch
1       Akon  CSE
2       Bkon  CSE
3       Ckon  CSE
4       Dkon  CSE

New Branch Table

Branch  Hod    Branch_tel
CSE     Mr. Y  5337

Normalization is not about eliminating data redundancy; it's about minimizing data redundancy.
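
The split into a Student table and a Branch table can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the lower-case table and column names and the in-memory database are choices made for this sketch, not part of the slides. It shows that once the branch details live in one place, changing the HoD touches a single row.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (
    branch     TEXT PRIMARY KEY,
    hod        TEXT,
    branch_tel TEXT
);
CREATE TABLE student (
    rollno INTEGER PRIMARY KEY,
    name   TEXT,
    branch TEXT REFERENCES branch(branch)
);
INSERT INTO branch VALUES ('CSE', 'Mr. X', '5337');
INSERT INTO student VALUES (1, 'Akon', 'CSE'), (2, 'Bkon', 'CSE'),
                           (3, 'Ckon', 'CSE'), (4, 'Dkon', 'CSE');
""")

# Mr. X leaves and Mr. Y joins: only one row has to change.
cur = conn.execute("UPDATE branch SET hod = 'Mr. Y' WHERE branch = 'CSE'")
rows_changed = cur.rowcount

# Every student still sees the new HoD through a join.
hods = conn.execute(
    "SELECT s.name, b.hod FROM student s "
    "JOIN branch b ON s.branch = b.branch ORDER BY s.rollno"
).fetchall()
print(rows_changed, hods)
```

In the original single-table design the same change would have required updating all four student rows.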


• Less redundancy means fewer problems when inserting, deleting and updating data.

• Normalization is good because:

i. It follows the divide-and-rule principle.

ii. It yields logical, independent, but related data.


Three basic normal forms:

1. 1st Normal Form

2. 2nd Normal Form

3. 3rd Normal Form

Higher normal forms also exist.

1st Normal Form (1NF)
Rule 1

• Each column should contain atomic values.

• Entries like x,y and w,x violate this rule.

Table 1

Column 1  Column 2
A         x,y
B         w,x
C         Y
D         z

Rule 2

• A column should contain values that are of the same type.

• Do not intermix different types of values in any column.

Table 2

DoB       Name
26-10-89  A
13-2-92   SK
16-11-65  SA
R         8-9-86

Rule 3

• Each column should have a unique name.

• Identical column names lead to confusion at the time of data retrieval.

Table 3

DoB       Name  Name
26-10-89  A     A
13-2-92   S     K
16-11-65  S     K
8-9-86    R     A

Rule 4

• The order in which data is saved doesn't matter.

• Using an SQL query, you can easily fetch data in any order from a table.

Table 4

Rollno  F_Name  L_Name
3       A       A
4       S       K
1       P       W
2       R       K

• Table 5 violates 1NF.

• How do we solve this?

• All we have to do is break the multi-valued entries into atomic values; as a result, a few values get repeated.

Table 5

Rollno  Name  Subject
101     Akon  OS,NET
103     Ckon  JAVA
102     Bkon  C, C++

• But the values in the Subject column are now atomic for each row. Hence, the table is in 1NF.

Table 6

Rollno  Name  Subject
101     Akon  OS
101     Akon  NET
103     Ckon  JAVA
102     Bkon  C
102     Bkon  C++
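
The step from Table 5 to Table 6 can be sketched in a few lines of Python. The function name `to_1nf` and the tuple layout are assumptions made for this illustration; the idea is simply to emit one row per atomic subject value.

```python
# Table 5 as (rollno, name, subject) rows; Subject is not atomic here.
table5 = [
    (101, "Akon", "OS,NET"),
    (103, "Ckon", "JAVA"),
    (102, "Bkon", "C, C++"),
]

def to_1nf(table):
    """Return one (rollno, name, subject) row per atomic subject value."""
    result = []
    for rollno, name, subjects in table:
        for subject in subjects.split(","):
            result.append((rollno, name, subject.strip()))
    return result

table6 = to_1nf(table5)
for row in table6:
    print(row)
```

Rollno and Name are repeated in the output, which is the price of atomic values that the slide mentions.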

2nd Normal Form (2NF)

1. What are the criteria for it?

2. Why is it so important?

For the table to be in the second normal form:

• It should be in 1st normal form

• And, it should not have any partial dependencies.

Student_id  Name  Regno   Branch  Address
1           Akon  CSE-18  CSE     TN
2           Akon  IT-18   IT      AP
3           Bkon  CSE-18  CSE     HR
4           Ckon  CSE-18  CSE     MH
10          Akon  CSE-18  CSE     TN


• As Student_id is unique in this table, it can easily be used to fetch any row.

• A primary key can be used to fetch data from any column in the table.

• e.g. the branch name of the student with Student_id=10

• e.g. the name of the student with Student_id=10



• All I need is the Student_id; the other columns depend on it. This is called dependency, or functional dependency.

• Now let's extend our example to see whether two or more columns together can act as a primary key.

Subject Table
Subject_id Subject_name
1 Java
2 C++
4 DB

Score Table

Score_id  Student_id  Subject_id  Marks  Teacher
1         1           1           82     Allan
2         1           2           77     Kamali
3         2           1           85     Allan
4         2           2           82     Kamali
5         2           4           95     Jacqueline

In this table, the primary key does not have to be Score_id.

• The Score table saves the marks obtained by each student in each subject.

• Student_id + Subject_id together make a more meaningful primary key.

• Student_id + Subject_id can uniquely identify any row of data in the Score table.

• This is because there is a many-to-many relationship between the Student table and the Subject table:

• One student can study more than one subject.

• One subject can be studied by more than one student.

• In the Score table, the primary key is a composite of two columns: Student_id + Subject_id.


• The Teacher column depends only on the subject and not on the student. This is called partial dependency.

• For the table to be in 2nd Normal Form, partial dependencies must not exist.

• How do we remove the partial dependency?

• The objective is to remove the Teacher column from the Score table.


• There are several ways to do this:
1. One way is to move the Teacher column into the Subject table.
2. Or we can create another table for teachers and use the Teacher_id wherever we want. By doing so, we can even add more information related to teachers, such as date of joining, salary, etc.
Subject Table

Subject_id  Subject_name  Teacher
1           Java          Allan
2           C++           Kamali
3           PHP           Ella
4           DB            Jacqueline

Teacher table

Teacher_id  Teacher_name
1           Allan
2           Kamali
3           Ella
4           Jacqueline
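
The first option (moving the Teacher column into the Subject table) might look as follows in SQLite, driven from Python. This is a sketch under assumptions: the lower-case names are illustrative, and Score_id is left out so that the composite key Student_id + Subject_id is the only key in play.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id   INTEGER PRIMARY KEY,
    subject_name TEXT,
    teacher      TEXT              -- depends on the subject only
);
CREATE TABLE score (
    student_id INTEGER,
    subject_id INTEGER REFERENCES subject(subject_id),
    marks      INTEGER,
    PRIMARY KEY (student_id, subject_id)   -- composite primary key
);
INSERT INTO subject VALUES (1, 'Java', 'Allan'), (2, 'C++', 'Kamali'),
                           (3, 'PHP', 'Ella'), (4, 'DB', 'Jacqueline');
INSERT INTO score VALUES (1, 1, 82), (1, 2, 77), (2, 1, 85),
                         (2, 2, 82), (2, 4, 95);
""")

# The composite key rejects a second mark for the same student and subject.
try:
    conn.execute("INSERT INTO score VALUES (1, 1, 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The teacher is looked up through the subject, not stored per score.
report = conn.execute("""
    SELECT sc.student_id, su.subject_name, sc.marks, su.teacher
    FROM score sc JOIN subject su ON sc.subject_id = su.subject_id
    ORDER BY sc.student_id, sc.subject_id
""").fetchall()
print(duplicate_rejected, report)
```

Each teacher name is now stored once per subject instead of once per score row.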

3rd Normal Form (3NF)

• We have now normalized our Score table into 2NF. Suppose we want to save more information in the Score table.

• For a table to be in 3rd normal form:


1. It should be in 2nd Normal Form.
2. And it should not have Transitive Dependency.

Score table with the Exam_name and Total_marks fields added:

Score_id  Student_id  Subject_id  Marks  Exam_name  Total_marks

The primary key for the Score table is a composite key, which means it is made up of 2 attributes: Student_id + Subject_id.

• The Exam_name column depends on the primary key.

• e.g. a CSE student will have a practical exam but a Languages student won't.

• So the exam depends on the student, and

• the exam depends on the subject.

• What about the second new column, Total_marks? Does it depend on the primary key, or on some other column?
• Total_marks depends on Exam_name.
• How?
i. Practicals are worth 40 marks
ii. Main exams are worth 70 marks
iii. Sessionals are worth 50 marks


• So the value of Total_marks changes with the exam. In other words, Total_marks depends on Exam_name, which is not part of the primary key. This is a transitive dependency.

• What is the solution to this problem?

• The solution is to take the Exam_name and Total_marks columns, put them in an Exam table, and use Exam_id wherever it is needed.

Student table
Student_id  Name  Regno  Branch  Address

Score table
Score_id  Student_id  Subject_id  Marks  Exam_id

Subject table
Subject_id  Subject_name  Teacher

Exam table
Exam_id  Exam_name  Total_marks
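
A minimal sketch of the 3NF design, assuming an Exam table keyed by Exam_id as the text suggests. The three score rows are hypothetical sample data invented for this illustration; only the totals of 40, 70 and 50 marks come from the slides. Because Total_marks is stored once per exam, changing it is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (
    exam_id     INTEGER PRIMARY KEY,
    exam_name   TEXT,
    total_marks INTEGER
);
CREATE TABLE score (
    student_id INTEGER,
    subject_id INTEGER,
    marks      INTEGER,
    exam_id    INTEGER REFERENCES exam(exam_id),
    PRIMARY KEY (student_id, subject_id)
);
INSERT INTO exam VALUES (1, 'Practical', 40), (2, 'Main', 70),
                        (3, 'Sessional', 50);
-- Hypothetical score rows, for illustration only.
INSERT INTO score VALUES (1, 1, 35, 1), (1, 2, 60, 2), (2, 1, 38, 1);
""")

# Changing the total for practicals touches one row in the exam table.
conn.execute("UPDATE exam SET total_marks = 45 WHERE exam_name = 'Practical'")

totals = conn.execute("""
    SELECT sc.student_id, sc.subject_id, e.total_marks
    FROM score sc JOIN exam e ON sc.exam_id = e.exam_id
    ORDER BY sc.student_id, sc.subject_id
""").fetchall()
print(totals)
```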

Relational Algebra
• Data manipulation languages: relational algebra and SQL.

• Relational algebra is used for describing and manipulating access strategies to distributed databases.

• The relational algebra is a collection of operations on relations, each of which takes one or more relations as operands and produces one relation as a result.
• Relational algebra is a procedural query language, which
takes instances of relations as input and yields instances of
relations as output.
• It uses operators to perform queries.
• An operator can be either unary or binary.
• Relational algebra is similar to normal algebra (as in 2+3*x-y),
except we use relations as values instead of numbers, and the
operations and operators are different.
• Relational algebra is the mathematical basis for performing queries against a relational database.

• Operations are performed against relations, resulting in relations.

• Because the result of a relational algebra operation is a relation, operations can be nested inside one another.

• Five basic operations are defined: selection, projection, Cartesian product, union and difference.

• From these operations, some other operations are derived, such as intersection, division, join, and semi-join.


• For example, let's use a relation called Employees with the following schema:

Employees(Emp_id, Emp_name, Emp_office)

Employees relation

Emp_id  Emp_name  Emp_office
1001    Bob       10
1002    Alice     11
1003    Sandy     10
1004    Larry     11
1005    Susan     11


1. Selection Operation

• It is represented by the symbol σ (sigma). We use this operation when we want a specific set of rows out of a relation instance.

• The statement: σ Emp_id>1004 (Employees), which is also written as: σ Emp_id>1004 Employees

Result of the selection:

Emp_id  Emp_name  Emp_office
1005    Susan     11

• Select all of the rows whose Emp_id is greater than 1004 from the instance of the relation Employees.

• This operation actually says: "build a new relation instance that consists of only the rows whose employee ids are greater than 1004 from the original relation instance of Employees."

• Note that the schema of the resulting relation is the same as that of the original instance.

• If the operation had instead been σ Emp_id>1004 ∨ Emp_name='Sandy' (Employees), the resulting relation would look like:

Emp_id  Emp_name  Emp_office
1003    Sandy     10
1005    Susan     11

• Here the rows of the initial relation whose Emp_id>1004 OR whose Emp_name='Sandy' were selected.
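
The two selections above can be mimicked in plain Python. This is a sketch only: representing a relation as a list of dictionaries and the helper name `select` are assumptions made for illustration, not a standard API.

```python
# The Employees relation instance from the example above.
employees = [
    {"Emp_id": 1001, "Emp_name": "Bob",   "Emp_office": 10},
    {"Emp_id": 1002, "Emp_name": "Alice", "Emp_office": 11},
    {"Emp_id": 1003, "Emp_name": "Sandy", "Emp_office": 10},
    {"Emp_id": 1004, "Emp_name": "Larry", "Emp_office": 11},
    {"Emp_id": 1005, "Emp_name": "Susan", "Emp_office": 11},
]

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate; schema unchanged."""
    return [row for row in relation if predicate(row)]

# Selection with Emp_id > 1004
first = select(employees, lambda r: r["Emp_id"] > 1004)

# Selection with Emp_id > 1004 OR Emp_name = 'Sandy'
second = select(
    employees,
    lambda r: r["Emp_id"] > 1004 or r["Emp_name"] == "Sandy",
)

print([r["Emp_name"] for r in first])
print([r["Emp_name"] for r in second])
```

Both results keep all three attributes, which matches the note that the schema of the result is the same as that of the operand.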


• Unary operations take only one relation as operand; they include selection and projection.

• The selection SL_F R (σ_F R), where R is the operand to which the selection is applied and F is a formula which expresses a selection predicate, produces a result relation with the same relation schema as the operand relation and containing the subset of the tuples of the operand which satisfy the predicate.

• The formula involves attribute names or constants as operands, arithmetic comparison operators, and logical operators.

• For example, given a schema R(A,B,C), the following is a valid formula: (A=B or A<C).

2) The projection PJ_Attr R (π_Attr R), where Attr denotes a subset of the attributes of the operand relation, produces a result having these attributes as its relation schema.

• The tuples of the result are derived from the tuples of the operand relation by suppressing the values of attributes which do not appear in Attr.

• Moreover, replicated tuples which might result from this operation are eliminated; thus, the cardinality of the result might be less than the cardinality of the operand.

• The project command returns all values for the attributes specified after the command. It gives a vertical view of the given table.
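
Projection with duplicate elimination can be sketched as follows, reusing the Employees instance from the selection example. The helper name `project` and the list-of-dictionaries representation are assumptions for this illustration.

```python
employees = [
    {"Emp_id": 1001, "Emp_name": "Bob",   "Emp_office": 10},
    {"Emp_id": 1002, "Emp_name": "Alice", "Emp_office": 11},
    {"Emp_id": 1003, "Emp_name": "Sandy", "Emp_office": 10},
    {"Emp_id": 1004, "Emp_name": "Larry", "Emp_office": 11},
    {"Emp_id": 1005, "Emp_name": "Susan", "Emp_office": 11},
]

def project(relation, attrs):
    """Projection: keep only attrs and drop replicated tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:          # eliminate duplicates
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

offices = project(employees, ["Emp_office"])
print(offices)
```

The operand has five tuples but the result has only two, so the cardinality has dropped as described above.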

• The binary operations take two relations as operands; we review union, difference, Cartesian product, intersection, join and semi-join.

3) Intersect is the SQL command that takes two tables and returns only the rows that appear in both tables. The tables must be union-compatible for the intersect command to work.
4) The union R UN S (R ∪ S) is meaningful only between two relations R and S with the same relation schema (the tables must have the same attribute characteristics for the union command to work).

• The tables must be union-compatible, which means that the two tables have the same number of columns and the corresponding columns have the same names.
• It produces a relation with the same relation schema as its operands and the union of the tuples of R and S (i.e. all tuples appearing either in R or in S or in both).

• Union combines all the rows in one table with all of the rows in another table; duplicate tuples appear only once.

5) The difference R DF S is meaningful only between two relations R and S with the same relation schema.

• It produces a relation with the same relation schema as its operands and the difference between the tuples of R and S (i.e. all tuples appearing in R but not in S).

• The difference command gets all rows in one table that are not found in the other table.

• Basically, it subtracts one table from the other, leaving only the rows of the first table that do not appear in the second.

• For this command to work, both tables must be union-compatible.
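
Since union-compatible relations are sets of tuples over the same schema, Python's built-in set type gives a quick way to try the three set operations. The data are the relations R and S from the worked example later in this section; modelling a relation as a Python set of tuples is an assumption made for this sketch.

```python
# R(A, B, C) and S(A, B, C) from the worked example later in this section.
R = {("a", 1, "a"), ("b", 1, "b"), ("a", 1, "d"), ("b", 2, "f")}
S = {("a", 1, "a"), ("a", 3, "r")}

union = R | S          # R UN S: tuples in R or in S or in both
difference = R - S     # R DF S: tuples in R but not in S
intersection = R & S   # tuples that appear in both R and S

print(sorted(union))
print(sorted(difference))
print(sorted(intersection))
```

The tuple (a, 1, a) occurs in both operands but only once in the union, because a set cannot hold duplicates.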

6) The Cartesian product R CP S (R × S) produces a relation whose relation schema includes all the attributes of R and S.
• If two attributes with the same name appear in R and S, they are nevertheless considered different attributes.

• In order to avoid ambiguity, the name of each attribute is prefixed with the name of its "original" relation.

• Every tuple of R is combined with every tuple of S to form one tuple of the result (all possible pairs of rows from both tables being used).
R, S and T are operand relations.

R relation

A  B  C
a  1  a
b  1  b
a  1  d
b  2  f

S relation

A  B  C
a  1  a
a  3  r

T relation

B  C  D
1  a  1
1  a  1
3  c  2
1  d  4
2  a  3

Selection SL_{A=a} R (in relational algebra it is written as σ A=a R)

A  B  C
a  1  a
a  1  d

Projection PJ_{A,B} R (in relational algebra: π A,B R)

A  B
a  1
b  1
b  2

One row, (a,1), would have been repeated; as the rule stipulates, "replicated tuples which might result from the projection operation are eliminated".

Cartesian Product R CP S

R.A  R.B  R.C  S.A  S.B  S.C
a    1    a    a    1    a
b    1    b    a    1    a
a    1    d    a    1    a
b    2    f    a    1    a
a    1    a    a    3    r
b    1    b    a    3    r
a    1    d    a    3    r
b    2    f    a    3    r

Union R UN S

A  B  C
a  1  a
b  1  b
a  1  d
b  2  f
a  3  r

Difference R DF S

A  B  C
b  1  b
a  1  d
b  2  f

7) The join of two relations R and S is denoted as R JN_F S, where F is a formula which specifies the join predicate.

• The formula of a join specification is a conjunction of comparisons between attributes taken from the two operands; thus, if we consider two relations R(A,B) and S(C,D), the following is a valid formula: A=C and B>D. If only equality appears in the formula, we call the operation an equi-join.

• A join is derived from selection and Cartesian product as follows:

R JN_F S = SL_F (R CP S)


• The join command takes two or more tables and combines them into one table. It can be used in combination with other commands to get specific information.

• Thus, the relation schema of the result of the join includes all the attributes of R and S, and all the combinations of tuples from R and S which satisfy the join predicate are included in the result.
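
The derivation of a join as a selection over a Cartesian product can be written out directly. The tuples in R(A,B) and S(C,D) below are hypothetical sample data, and the helper names are assumptions made for this sketch; the predicate is the formula A=C and B>D used as the example above.

```python
from itertools import product

# Hypothetical instances of R(A, B) and S(C, D), for illustration only.
R = [("a", 3), ("b", 1), ("c", 5)]
S = [("a", 1), ("b", 2), ("d", 0)]

def cartesian(r, s):
    """R CP S: every tuple of r combined with every tuple of s."""
    return [x + y for x, y in product(r, s)]

def select(relation, predicate):
    """SL_F: keep the tuples that satisfy the predicate F."""
    return [t for t in relation if predicate(t)]

def join(r, s, predicate):
    """R JN_F S, derived as SL_F (R CP S)."""
    return select(cartesian(r, s), predicate)

# F is "A=C and B>D"; result tuples are laid out as (A, B, C, D).
def F(t):
    return t[0] == t[2] and t[1] > t[3]

pairs = cartesian(R, S)
joined = join(R, S, F)
print(len(pairs), joined)
```

Real database systems avoid materialising the full product, but the result is the same as the one this definition gives.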

Exercises-Relational Algebra
Students table

Stud_num  name   Age  Dept
3         Jones  27   MSPE
7         Smith  34   HLE
11        Bob    18   HLE
15        Jane   23   ECE
18        Mary   31   MSPE

1) Write an SQL query to select Stud_num and name.

2) Using relational algebra, write an expression to select Stud_num and Age.

3) Write an SQL query to select all students with age below 25.

4) Write a relational algebra expression to select all students with age below 25.

Exercises-Relational Algebra (Answers)
1) SELECT Stud_num, name FROM students;

2) π Stud_num, Age (students)

3) SELECT * FROM students WHERE Age < 25;

4) σ Age<25 (students)
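
The two SQL answers can be checked against the students table with Python's built-in sqlite3 module. This is a sketch: the in-memory database and the column types are assumptions, and an ORDER BY clause has been added so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (Stud_num INTEGER, name TEXT, Age INTEGER, Dept TEXT);
INSERT INTO students VALUES
    (3,  'Jones', 27, 'MSPE'),
    (7,  'Smith', 34, 'HLE'),
    (11, 'Bob',   18, 'HLE'),
    (15, 'Jane',  23, 'ECE'),
    (18, 'Mary',  31, 'MSPE');
""")

# Answer 1: projection on Stud_num and name
answer1 = conn.execute(
    "SELECT Stud_num, name FROM students ORDER BY Stud_num"
).fetchall()

# Answer 3: selection of the students younger than 25
answer3 = conn.execute(
    "SELECT * FROM students WHERE Age < 25 ORDER BY Stud_num"
).fetchall()

print(answer1)
print(answer3)
```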
