The relational model was proposed by E. F. Codd to model data in the form of relations, or
tables. After designing the conceptual model of a database using an ER diagram, we need to convert
the conceptual model into the relational model, which can be implemented using any RDBMS
such as Oracle or MySQL.
1. Attribute: Each column in a table. Attributes are the properties that define a
relation, e.g., Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format, together with
their entities. A table has two properties, rows and columns. Rows represent
records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: The name of the relation together with its attributes.
5. Degree: The total number of attributes in the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The set of values for a specific attribute.
8. Relation instance: A finite set of tuples in the RDBMS. A relation instance
never has duplicate tuples.
9. Relation key: One or more attributes whose values uniquely identify each row.
10. Attribute domain: The predefined set of values (type and scope) that an
attribute may take.
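The terms above can be made concrete with a small sketch. It models a relation instance as a Python set of tuples; the STUDENT data and the AGE attribute are made up for illustration and do not come from these notes.

```python
# Hypothetical relation schema: STUDENT(Student_Rollno, NAME, AGE)
attributes = ("Student_Rollno", "NAME", "AGE")

# A relation instance is a set of tuples, so duplicate tuples cannot occur.
student = {
    (1, "Asha", 19),
    (2, "Ravi", 20),
    (3, "Meena", 19),
    (1, "Asha", 19),  # repeated literal: the set keeps only one copy
}

degree = len(attributes)     # number of attributes (columns)
cardinality = len(student)   # number of tuples (rows)

# The column of an attribute is the collection of its values across all tuples.
name_column = sorted(row[attributes.index("NAME")] for row in student)

print(degree, cardinality, name_column)
```

Using a set rather than a list mirrors the rule that a relation instance never holds duplicate tuples.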
Mr. R. Boaz Gladson, Assistant Professor, PG Department of Computer Science, Voorhees College-Vellore.
RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS)
2. Relational Constraints
Relational integrity constraints in a DBMS are conditions that must hold
for a relation to be valid. These constraints are derived from the rules of the mini-
world that the database represents.
There are many types of integrity constraints in a DBMS. Constraints on a relational
database management system are mostly divided into three main categories:
1. Domain Constraints
2. Key Constraints
3. Referential Integrity Constraints
Domain Constraints
Domain constraints specify that, within each tuple, the value of each attribute must be
an atomic value drawn from the domain of that attribute. The domain is specified as a data
type, which includes the standard data types: integers, real numbers, characters, Booleans,
variable-length strings, etc.
A domain constraint is violated if an attribute value does not appear in the
corresponding domain or is not of the appropriate data type.
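A minimal sketch of a domain constraint, run through Python's built-in sqlite3 module. The student table, its columns and the age range are assumptions chosen for illustration. SQLite is loosely typed, so the sketch states the domain explicitly with a CHECK clause; other RDBMSs enforce the declared column type directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        -- domain of age: integers between 5 and 120 (hypothetical range)
        age     INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 5 AND 120)
    )
""")

def try_insert(row):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_row = try_insert((1, "Asha", 19))             # value lies inside the domain
bad_type = try_insert((2, "Ravi", "nineteen"))   # wrong data type
bad_range = try_insert((3, "Meena", 500))        # right type, outside the domain

print(ok_row, bad_type, bad_range)
```

Only the first row is stored; the other two are rejected because their age value falls outside the declared domain.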
Key Constraints
Example:
In the given table, CustomerID is a key attribute of the Customer table. Each customer
has exactly one key value, e.g., CustomerID = 1 belongs only to CustomerName = "Google".
CustomerID   CustomerName   Status
1            Google         Active
2            Amazon         Active
3            Apple          Inactive
Referential Integrity Constraints
Referential integrity constraints in a DBMS are based on the concept of foreign keys. A
foreign key is an attribute of one relation that refers to a key attribute of a
different relation or of the same relation. The referential integrity constraint requires
that every value of the foreign key must exist as a key value in the referenced table.
Example:
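As one possible illustration, the sketch below uses Python's sqlite3 module with a customer table and a hypothetical billing table that refers to it. The billing table and its columns are assumptions made for this example. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
conn.execute("""
    CREATE TABLE billing (
        invoice_no  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key
        amount      INTEGER
    )
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Google"), (2, "Amazon"), (3, "Apple")])

# customer_id = 1 exists in customer, so this row is accepted.
conn.execute("INSERT INTO billing VALUES (1, 1, 100)")

# customer_id = 9 does not exist in customer, so the insert is rejected.
try:
    conn.execute("INSERT INTO billing VALUES (2, 9, 250)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```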
3. RELATIONAL LANGUAGE
Relational language is a type of programming language in which the
programming logic is composed of relations and the output is computed based on
the query applied. Relational language works on relations among data and
entities to compute a result.
3.1 Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain
the result of the query. It uses operators to perform queries.
1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
o Notation: σ p(r)
Where
σ denotes the selection operator
r is the relation
p is the selection predicate, a propositional logic formula which may use connectives such as AND, OR and NOT.
The predicate may also use relational operators such as =, ≠, ≥, <, >, ≤.
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. Rest of
the attributes are eliminated from the table.
o It is denoted by ∏.
o Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are
either in R or S or both in R & S.
o It is denoted by ∪.
o Notation: R ∪ S
o R and S must have the same number of attributes, with compatible domains.
o Duplicate tuples are eliminated automatically.
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that
are in both R & S.
o It is denoted by intersection ∩.
o Notation: R ∩ S
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that
are in R but not in S.
o It is denoted by minus (-).
o Notation: R - S
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other
table. It is also known as a cross product.
o It is denoted by ×.
o Notation: E × D
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
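The operators described in this section can be sketched with Python sets, which drop duplicates in the same way relations do. The relations R and S, their schema and the helper names `select` and `project` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Relations are modelled as sets of tuples; the schema names the columns.
schema = ("ID", "NAME", "CITY")
R = {(1, "Asha", "Delhi"), (2, "Ravi", "Vellore"), (3, "Meena", "Delhi")}
S = {(3, "Meena", "Delhi"), (4, "Kiran", "Chennai")}

def select(relation, predicate):
    """sigma_p(r): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, schema, attrs):
    """pi_attrs(r): keep only the named columns; duplicates collapse."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

delhi = select(R, lambda t: t[schema.index("CITY")] == "Delhi")
cities = project(R, schema, ["CITY"])
union = R | S            # R ∪ S
intersection = R & S     # R ∩ S
difference = R - S       # R - S
cross = {r + s for r, s in product(R, S)}  # Cartesian product

print(len(union), len(intersection), len(difference), len(cross))
```

Rename has no counterpart here because it changes only the names in the schema, not the tuples themselves.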
{T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuple
P(T) is the condition used to fetch T.
The second form of relational calculus is known as domain relational calculus. In domain relational
calculus, the filtering variables range over the domains of attributes. Domain relational calculus uses the same
operators as tuple calculus. It uses the logical connectives ∧ (and), ∨ (or) and ¬ (not). It uses
existential (∃) and universal (∀) quantifiers to bind the variables. QBE, or Query By Example,
is a query language related to domain relational calculus.
Notation: {a1, a2, a3, ..., an | P(a1, a2, a3, ..., an)}
Where
a1, a2, ..., an are attributes
P stands for a formula built from these attributes
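Both notations read much like a set comprehension: state what is wanted and the condition it must satisfy, without spelling out the steps. The sketch below shows this with a hypothetical STUDENT relation; the data and the age condition are assumptions.

```python
# Hypothetical STUDENT relation as a set of (roll_no, name, age) tuples.
student = {(1, "Asha", 19), (2, "Ravi", 22), (3, "Meena", 21)}

# Tuple relational calculus: { T | T in STUDENT and T.age > 20 }
trc = {t for t in student if t[2] > 20}

# Domain relational calculus:
# { <name> | there exist roll_no, age with <roll_no, name, age> in STUDENT and age > 20 }
drc = {name for (roll_no, name, age) in student if age > 20}

# Universal quantifier: "for all students, age >= 18"
all_adults = all(age >= 18 for (_, _, age) in student)

print(sorted(trc), sorted(drc), all_adults)
```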
4. SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in
a relational database management system (RDBMS).
o It is a standard language for relational database systems. It enables a user to create, read,
update and delete relational databases and tables.
o RDBMSs such as MySQL, Informix, Oracle, MS Access and SQL Server use SQL as
their standard database language.
o SQL allows users to query the database in a number of ways, using English-like
statements.
o SQL keywords are not case sensitive. By convention, keywords are
written in uppercase.
o SQL statements do not depend on text lines. A single SQL statement can be written
on one or on multiple text lines.
o Using SQL statements, you can perform most of the actions in a database.
o SQL is based on tuple relational calculus and relational algebra.
SET operators are special type of operators which are used to combine the result of
two queries.
1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS
There are certain rules which must be followed to perform operations using SET
operators in SQL. Rules are as follows:
1. UNION:
o UNION is used to combine the results of two SELECT statements.
o Duplicate rows are eliminated from the result of the UNION operation.
o Query: mysql> SELECT * FROM t_Boaz UNION SELECT * FROM t2_Boaz;
2. UNION ALL
o This operator combines all the records from both queries.
o Duplicate rows are not eliminated from the result of the UNION ALL operation.
o Query: mysql> SELECT * FROM t_Boaz UNION ALL SELECT * FROM t2_Boaz;
3. INTERSECT:
o It is used to combine two SELECT statements, but it returns only the records that are
common to both SELECT statements.
o Query: mysql> SELECT * FROM t_Boaz INTERSECT SELECT * FROM t2_Boaz;
4. MINUS
o It displays the rows that are present in the first query but absent in the second query, with
no duplicates.
o MINUS is the Oracle keyword. Standard SQL calls this operator EXCEPT, and MySQL
supports INTERSECT and EXCEPT from version 8.0.31 onward.
o Query (Oracle): SELECT * FROM t_Boaz MINUS SELECT * FROM t2_Boaz;
o Query (MySQL): mysql> SELECT * FROM t_Boaz EXCEPT SELECT * FROM t2_Boaz;
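The four operators can be tried without a database server through Python's sqlite3 module. This is a sketch under assumptions: the contents of t_Boaz and t2_Boaz are invented here, and SQLite spells the difference operator EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_Boaz (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE t2_Boaz (id INTEGER, name TEXT)")
# Hypothetical rows; (3, 'Jackson') appears in both tables.
conn.executemany("INSERT INTO t_Boaz VALUES (?, ?)",
                 [(1, "Jack"), (2, "Harry"), (3, "Jackson")])
conn.executemany("INSERT INTO t2_Boaz VALUES (?, ?)",
                 [(3, "Jackson"), (4, "Stephan"), (5, "David")])

def run(operator):
    """Combine the two SELECT statements with the given set operator."""
    sql = f"SELECT * FROM t_Boaz {operator} SELECT * FROM t2_Boaz ORDER BY id"
    return conn.execute(sql).fetchall()

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # duplicates kept
intersect_rows = run("INTERSECT")  # rows common to both
except_rows = run("EXCEPT")        # rows only in the first query

print(len(union_rows), len(union_all_rows), intersect_rows, except_rows)
```

Both SELECT statements must return the same number of columns, which is the main rule the set operators impose.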
1. COUNT Function
o The COUNT function is used to count the number of rows in a database table. It works on
both numeric and non-numeric data types.
o COUNT(*) returns the count of all the rows in the specified
table, including duplicate rows and rows that contain NULL.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
2. SUM Function
The SUM function is used to calculate the sum of the values in the selected column. It works on numeric
fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
3. AVG function
The AVG function is used to calculate the average value of the numeric type. AVG
function returns the average of all non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function
determines the largest value of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression)
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression)
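The five functions can be compared side by side in one query. The sketch below uses Python's sqlite3 module; the product table and its rows are hypothetical, with one NULL quantity to show how COUNT(*) and COUNT(column) differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, qty INTEGER, rate INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("Item1", 2, 10),
    ("Item2", 3, 25),
    ("Item3", 2, 30),
    ("Item4", None, 10),  # NULL quantity
])

row = conn.execute("""
    SELECT COUNT(*),              -- all rows, NULLs and duplicates included
           COUNT(qty),            -- rows where qty is not NULL
           COUNT(DISTINCT rate),  -- distinct non-NULL rates
           SUM(rate),
           AVG(qty),              -- NULLs are ignored by AVG
           MAX(rate),
           MIN(rate)
    FROM product
""").fetchone()

print(row)
```

AVG(qty) divides by three, not four, because the NULL quantity is left out of the calculation.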
3. A complex view is a view created with multiple joins,
GROUP BY clauses or set operators.
4. Complex views are used to fetch complex
data from multiple tables.
Example:
2. Item: Customer_code, Item_code, Item_name, Item_category columns
INSERT Statement
The INSERT Statement adds one or more rows to a table. It has two formats:
INSERT INTO table-1 [(column-list)] VALUES (value-list)
and
INSERT INTO table-1 [(column-list)]
UPDATE Statement
The UPDATE statement modifies columns in selected table rows. It has the following general
format:
DELETE Statement
The DELETE Statement removes selected rows from a table. It has the following general format:
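A minimal sketch of the three data-modification statements, run through Python's sqlite3 module. The table sp and its columns sno, pno and qty are hypothetical names chosen for this example, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")

# INSERT with an explicit column list, then several rows at once.
conn.execute("INSERT INTO sp (sno, pno, qty) VALUES ('S3', 'P2', 200)")
conn.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                 [("S2", "P3", 500), ("S1", "P1", 50)])

# UPDATE changes columns only in the rows selected by WHERE.
conn.execute("UPDATE sp SET qty = qty + 20 WHERE sno = 'S2'")

# DELETE removes only the rows selected by WHERE.
conn.execute("DELETE FROM sp WHERE pno = 'P1'")

rows = conn.execute("SELECT sno, pno, qty FROM sp ORDER BY sno").fetchall()
print(rows)
```

Leaving out the WHERE clause makes UPDATE or DELETE act on every row of the table, so it is worth writing the condition first.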
Sample Table
EMPLOYEE
101 1 Testing
102 2 Development
103 3 Designing
104 4 Development
1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables. It returns the
combination of rows from the two tables for which the join
condition is satisfied.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the
right table. If there is no matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
LEFT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the rows of the right table
and the matched values from the left table. Where a right-table row has no match in the left table, the
left-table columns are returned as NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
RIGHT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN
In SQL, FULL JOIN combines the results of the left and right outer joins. The joined
table has all the records from both tables. It puts NULL in place of matches that are not found.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
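The join outputs above can be reproduced with Python's sqlite3 module. This sketch makes assumptions: the EMP_ID values and the PROJECT_NO column are invented so that the rows match the outputs shown. Only INNER and LEFT joins are run, because older SQLite versions do not support RIGHT and FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT)")
conn.execute("CREATE TABLE PROJECT (PROJECT_NO INTEGER PRIMARY KEY, EMP_ID INTEGER, DEPARTMENT TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [
    (1, "Angelina"), (2, "Robert"), (3, "Christian"),
    (4, "Kristen"), (5, "Russell"), (6, "Marry"),
])
conn.executemany("INSERT INTO PROJECT VALUES (?, ?, ?)", [
    (101, 1, "Testing"), (102, 2, "Development"),
    (103, 3, "Designing"), (104, 4, "Development"),
])

def join(kind):
    """Run the same query with a different join type."""
    sql = f"""
        SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
        FROM EMPLOYEE
        {kind} JOIN PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
        ORDER BY EMPLOYEE.EMP_ID
    """
    return conn.execute(sql).fetchall()

inner_rows = join("INNER")  # only employees that have a project
left_rows = join("LEFT")    # every employee; None where no project matches

print(inner_rows)
print(left_rows)
```

Python shows SQL NULL as None, so Russell and Marry appear with None as their department in the left join.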
The DDL Commands in Structured Query Language are used to create and modify the
schema of the database and its objects. The syntax of DDL commands is predefined for describing
the data.
1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command
1. CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database objects.
Syntax to create a new table:
2. DROP Command
DROP is a DDL command used to delete/remove the database objects from the SQL database.
We can easily remove the entire table, view, or index from the database using this DDL command.
Syntax to remove a database: DROP DATABASE Database_Name;
3. ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the database,
and it also changes the schema of database objects.
4. TRUNCATE Command
TRUNCATE is another DDL command which deletes or removes all the records from the table.
5. RENAME Command
RENAME is a DDL command which is used to change the name of the database table.
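A short sketch of the DDL commands, run through Python's sqlite3 module. The table and column names are hypothetical. SQLite has no TRUNCATE statement and spells RENAME as ALTER TABLE ... RENAME TO, so those two differ from the syntax of other RDBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """List the tables that currently exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [r[0] for r in rows]

# CREATE: define a new table.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")
columns = [r[1] for r in conn.execute("PRAGMA table_info(student)")]

# RENAME: change the name of the table.
conn.execute("ALTER TABLE student RENAME TO learner")
after_rename = tables()

# DROP: remove the table together with its data.
conn.execute("DROP TABLE learner")
after_drop = tables()

print(columns, after_rename, after_drop)
```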
It is the language that we use to perform operations and transactions on the databases.
Allows integrating authentication service for large scale applications.
Provides extra security to database transactions.
Avoids logical errors while performing transactions on our database.
Makes it easy to integrate the frontend and the backend of our application.
It helps us to manage big industrial applications and manage the transactions without any
added overhead.
SQL aggregate functions return a single value, calculated from values in a column.
Useful aggregate functions:
AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum
SQL scalar functions return a single value, based on the input value.
Useful scalar functions:
UCASE() - Converts a field to upper case
LCASE() - Converts a field to lower case
MID() - Extract characters from a text field
LEN() - Returns the length of a text field
ROUND() - Rounds a numeric field to the number of decimals specified
NOW() - Returns the current system date and time
FORMAT() - Formats how a field is to be displayed.
Entity Integrity - This is related to the concept of primary keys. All tables should have
their own primary keys which should uniquely identify a row and not be NULL.
Referential Integrity - This is related to the concept of foreign keys. A foreign key is a key
of one relation that is referred to in another relation.
Domain Integrity - This means that there should be a defined domain for every column in
a database.
1. Database Security
Database security has many different layers, but the key aspects are:
Authentication
User authentication makes sure that the person accessing the database is who they claim
to be. Authentication can be done at the operating system level or at the database level itself.
Biometric authentication systems, such as retina scanners, are used to make sure that
unauthorized people cannot access the database.
Authorization
Authorization is a privilege granted by the database administrator. Users of the database
can view only the contents they are authorized to view. The rest of the database is out of bounds
to them.
The different permissions for authorizations available are:
UNIT – III: DATA NORMALIZATION
Normalization
A large database defined as a single relation may result in data duplication.
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is
also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.
o Normalization divides the larger table into smaller and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
Pitfalls in Relational Database Design
Relational database design requires that we find a
“good” collection of relational schemas. A bad design may lead to
Repetition of information
Inability to represent certain information
Design Goals for Relational Database
Avoid redundant data
Ensure that relationships among attributes are represented
Facilitate the checking of updates for violation of database integrity constraints
Example
Consider the relational schema Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
Redundancy
Data for branch name, branch city, assets are repeated for each loan that a branch makes.
Wastes space and complicates updating
Null Values
We cannot store information about a branch if no loan exists.
We can use null values, but they are difficult to handle.
In this example the faulty database design is what causes the pitfalls above. If
the design is not good, there will be faults in the database.
2. Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss
of information.
o Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
3. Functional Dependencies
X → Y
The left side of an FD is known as the determinant; the right side is
known as the dependent.
Functional dependency can be written as: Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.
Example:
ID → Name,
Name → DOB
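Whether a functional dependency X → Y is consistent with a given set of rows can be checked mechanically: no two rows may agree on X and differ on Y. The sketch below is one way to do this; the helper name `holds`, the Dept column and the sample rows are hypothetical.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Dept": "Sales"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Dept": "Sales"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Dept": "HR"},
]

id_to_name = holds(employees, ["Emp_Id"], ["Emp_Name"])          # Emp_Id -> Emp_Name
name_to_dept = holds(employees, ["Emp_Name"], ["Dept"])          # two rows named Asha differ on Dept
trivial = holds(employees, ["Emp_Id", "Emp_Name"], ["Emp_Id"])   # trivial FD always holds

print(id_to_name, name_to_dept, trivial)
```

A check like this can only refute a dependency. Whether an FD holds for the schema in general is a statement about every legal instance, which comes from the rules of the mini-world and not from one sample.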
4. Normalization
A large database defined as a single relation may result in data duplication. This repetition of
data may result in:
o Making relations very large.
o Difficulty in maintaining and updating data, as it would involve searching many records in the
relation.
o Wastage and poor utilization of disk space and resources.
o An increased likelihood of errors and inconsistencies.
o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple into
a relation due to lack of data.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of data
results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
Normalization works through a series of stages called normal forms. The normal forms
apply to individual relations. A relation is said to be in a particular normal form if it satisfies
the constraints of that form.
Normal Form   Description
2NF           A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.
4NF           A relation will be in 4NF if it is in Boyce-Codd normal form and has no
              multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF, does not contain any join dependency, and
              joining is lossless.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
o The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
o In the second normal form, all non-key attributes are fully functional dependent on the
primary key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
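TEACHER table
TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38
In this table the key is {TEACHER_ID, SUBJECT}, but the non-key attribute TEACHER_AGE depends
on TEACHER_ID alone. This partial dependency violates 2NF.
To convert the given table into 2NF, we decompose it into two tables: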
TEACHER_DETAIL table:
TEACHER_ID   TEACHER_AGE
25           30
47           35
83           38
TEACHER_SUBJECT table:
TEACHER_ID   SUBJECT
25           Chemistry
25           Biology
47           English
83           Math
83           Computer
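A quick way to see that this decomposition loses no information is to join the two smaller tables back on TEACHER_ID and compare the result with the original table. The sketch below does this with Python sets, using the rows shown above; the variable names are illustrative.

```python
# Original TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = {
    (25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
    (83, "Math", 38), (83, "Computer", 38),
}

# Decomposition: project onto the two smaller schemas.
teacher_detail = {(tid, age) for (tid, _, age) in teacher}
teacher_subject = {(tid, subject) for (tid, subject, _) in teacher}

# Natural join of the two tables on TEACHER_ID.
rejoined = {
    (tid, subject, age)
    for (tid, subject) in teacher_subject
    for (tid2, age) in teacher_detail
    if tid == tid2
}

lossless = rejoined == teacher
print(len(teacher_detail), len(teacher_subject), lossless)
```

Each teacher's age is now stored once in TEACHER_DETAIL instead of once per subject, and the join still returns exactly the original rows.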
o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency of a
non-prime attribute on the key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then the relation is in
third normal form.
A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
EMPLOYEE_ZIP table:
EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
o A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A multiple values of B exist, then the
dependency is a multi-valued dependency.
Example
STUDENT
STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining
is lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to
avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
In the above table, John takes both Computer and Math classes for Semester 1 but he doesn't
take the Math class for Semester 2. In this case, a combination of all these fields is required to
identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject or who
will be taking that subject, so we leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we cannot leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 &
P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Boaz
Computer Mercy
Math Mercy
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Boaz
Semester 1 Mercy
Semester 1 Mercy
Semester 2 Akash
Semester 1 Praveen
UNIT – IV: STORAGE AND FILE ORGANIZATION
1. Disks
Magnetic Disk Storage: This type of storage media is also known as online storage media. A
magnetic disk is used for storing data for a long time. It is capable of storing an entire
database. It is the responsibility of the computer system to bring the data from
disk into main memory for further access. Also, if the system performs any operation on
the data, the modified data should be written back to the disk. The great strength of a
magnetic disk is that its data is not affected by a system crash or failure, although a disk failure
can destroy the stored data.
2. RAID
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy
of data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
RAID 2
RAID 2 records Error Correction Codes (Hamming codes) for its data, striped on
different disks. Like level 0, each data bit in a word is recorded on a separate disk, and the ECC codes
of the data words are stored on a different set of disks. Due to its complex structure and high cost,
RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is
stored on a different disk. This technique allows the array to survive a single disk failure.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4
uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data
block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and
stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance.
This level requires at least four disk drives to implement RAID.
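The parity used by levels 3 to 6 can be illustrated with XOR: the parity block is the XOR of the data blocks, so any one lost block equals the XOR of the surviving blocks and the parity. The sketch below shows single-parity recovery only. The block contents are arbitrary, and real controllers work on much larger blocks; RAID 6 uses a second, differently computed parity that is not shown here.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, imagined as living on three different disks.
data_blocks = [b"RDBM", b"SQL!", b"DISK"]
p = parity(data_blocks)  # parity block, stored on another disk

# Simulate the failure of the disk holding block 1.
lost_index = 1
survivors = [blk for i, blk in enumerate(data_blocks) if i != lost_index]

# XOR of the surviving data blocks and the parity rebuilds the lost block.
rebuilt = parity(survivors + [p])
print(rebuilt)
```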
3. Tertiary Storage
It is the storage type that is external to the computer system. It has the slowest speed, but it
is capable of storing a large amount of data. It is also known as offline storage. Tertiary storage is
generally used for data backup. The following tertiary storage devices are available:
o Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact
Disk (CD) can store 700 megabytes of data with a playtime of around 80 minutes. On the
other hand, a Digital Video Disk or a DVD can store 4.7 or 8.5 gigabytes of data on each
side of the disk.
o Tape Storage: It is a cheaper storage medium than disks. Generally, tapes are used for
archiving or backing up data. Tape provides slow access to data because it reads data
sequentially from the start. Thus, tape storage is also known as sequential-access storage.
Disk storage is known as direct-access storage because we can directly access data at any
location on the disk.
4. Storage Access
Storage media are classified by data access speed, by the cost per unit of
data of buying the medium, and by the medium's reliability. Thus, we can create a hierarchy of storage
media on the basis of cost and speed.
The higher levels are expensive but fast. On moving down, the cost per bit decreases
and the access time increases. Also, the storage media from main memory upward
are volatile, and all the devices below main memory are non-volatile.
2. Secondary memory
Secondary memory, or secondary storage, is used to store data in a computer system. Secondary
storage is slower than cache or main memory.
Example: Magnetic tape, hard disk, CD, DVD etc.
3. Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main
memory as well as its inbuilt registers. Main memory is, however, much slower than the CPU.
To reduce this speed mismatch, cache memory is introduced. Cache memory offers much faster
access than main memory and holds the data most frequently accessed by the CPU.
The memory with the fastest access is the costliest. Larger storage devices are slower
and less expensive, but they can store far larger volumes of data than CPU
registers or cache memory.
5. File Organisation
1. A file is a collection of records. Using the primary key, we can access the records. The
type and frequency of access depend on the type of file organization used for a given set
of records.
2. File organization is a logical relationship among various records. This method defines how
file records are mapped onto disk blocks.
3. File organization is used to describe the way in which the records are stored in terms of
blocks, and the blocks are placed on the storage medium.
4. The first approach to mapping the database to files is to use several files and store
records of only one fixed length in any given file.
Several file organization methods exist. Each has advantages and disadvantages for access
and selection. The programmer chooses the file organization method best suited to the
requirements.
Sequential file organization is the easiest method of file organization. In this method,
records are stored sequentially. It can be implemented in two ways:
Suppose we have a sequence of records R1, R3, and so on up to R9 and R8. Each record is
simply a row of a table. If we want to insert a new record R2 into the sequence, it is
placed at the end of the file.
Suppose there is a preexisting sorted sequence of records R1, R3, and so on up to R6 and R7.
If a new record R2 has to be inserted into the sequence, it is first placed at the end of
the file, and the sequence is then sorted again.
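The two ways of inserting into a sequential file can be sketched as follows. This is a minimal illustration that models the file as a Python list of record keys; the function names are hypothetical, and re-sorting the whole list stands in for whatever reordering a real system performs.

```python
def pile_insert(file_records, record):
    """Unsorted sequential file: a new record simply goes at the end."""
    file_records.append(record)


def sorted_insert(file_records, record):
    """Sorted sequential file: append at the end, then restore the sort order."""
    file_records.append(record)
    file_records.sort()


pile_file = ["R1", "R3", "R9", "R8"]
pile_insert(pile_file, "R2")      # R2 stays at the end of the file

sorted_file = ["R1", "R3", "R6", "R7"]
sorted_insert(sorted_file, "R2")  # R2 moves to its place between R1 and R3
```

The sketch makes the trade-off visible: the first insert is cheap but leaves the file unordered, while the second pays for a sort on every insertion.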
If the database is very large, searching, updating, or deleting a record is time-consuming
because the records are neither sorted nor ordered. In heap file organization, we must
scan the data until we find the requested record.
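The cost of searching a heap file can be illustrated with a short sketch. It assumes, purely for the example, that the file is a list of blocks and that each record is a dictionary with a hypothetical "id" field; the function counts how many blocks must be read before the record is found.

```python
def heap_search(blocks, key):
    """Scan the blocks in storage order; return (record, blocks_read)."""
    for blocks_read, block in enumerate(blocks, start=1):
        for record in block:
            if record["id"] == key:
                return record, blocks_read
    # The record is absent: every block had to be read to find that out.
    return None, len(blocks)


# Records sit wherever free space was available, in no particular order.
heap_file = [
    [{"id": 7, "name": "Ravi"}, {"id": 2, "name": "Anu"}],
    [{"id": 9, "name": "John"}, {"id": 4, "name": "Mala"}],
    [{"id": 1, "name": "Sara"}],
]

record, blocks_read = heap_search(heap_file, 1)
```

Here the record with id 1 happens to be in the last block, so all three blocks are read; a search for a missing key always reads every block.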
When a record has to be retrieved using the hash key columns, the address is generated from
the hash key, and the whole record is retrieved using that address. In the same way, when a
new record has to be inserted, the address is generated from the hash key and the record is
inserted directly at that address. The same process applies to delete and update.
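The address generation described above can be sketched as follows. The example assumes an integer hash key, a fixed number of buckets, and a modulo hash function; these choices, and the names N_BUCKETS, bucket_address, and HashFile, are hypothetical and serve only to show that every operation starts by computing the same address.

```python
N_BUCKETS = 8  # assumed number of data buckets, chosen for illustration


def bucket_address(key):
    """Hash function: map the hash key to a bucket address."""
    return key % N_BUCKETS


class HashFile:
    def __init__(self):
        self.buckets = [dict() for _ in range(N_BUCKETS)]

    def insert(self, key, record):
        self.buckets[bucket_address(key)][key] = record

    def search(self, key):
        return self.buckets[bucket_address(key)].get(key)

    def update(self, key, record):
        if self.search(key) is None:
            raise KeyError(key)
        self.buckets[bucket_address(key)][key] = record

    def delete(self, key):
        return self.buckets[bucket_address(key)].pop(key, None)


hash_file = HashFile()
hash_file.insert(103, "Asha")
hash_file.insert(111, "Kumar")  # 111 and 103 both hash to bucket 7
```

Keys 103 and 111 land in the same bucket, so a real system also needs a collision strategy; the per-bucket dictionary plays that role in this sketch.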
5.5 Indexed sequential access method (ISAM)
ISAM is an advanced form of sequential file organization. In this method, records are stored
in the file in primary key order. An index entry is generated for each primary key and mapped
to the record. This index contains the address of the record in the file.
If a record has to be retrieved based on its index value, the address of the data block
is fetched from the index and the record is retrieved from that address.
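A minimal sketch of this lookup is given below. It assumes that records are dictionaries with a hypothetical "id" primary key, that the file is a list of fixed-size blocks, and that an address is a (block number, slot) pair; a Python dictionary stands in for the index structure.

```python
def build_file(records, block_size):
    """Store records in primary key order and build an index of their addresses."""
    ordered = sorted(records, key=lambda r: r["id"])
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = {
        rec["id"]: (block_no, slot)
        for block_no, block in enumerate(blocks)
        for slot, rec in enumerate(block)
    }
    return blocks, index


def fetch(blocks, index, key):
    """Look up the address in the index, then read only that block."""
    address = index.get(key)
    if address is None:
        return None
    block_no, slot = address
    return blocks[block_no][slot]


records = [{"id": k, "name": f"student-{k}"} for k in (40, 10, 30, 20, 50)]
blocks, index = build_file(records, block_size=2)
```

With a block size of 2, the record with key 30 is stored in block 1, slot 0, and fetch(blocks, index, 30) reaches it without scanning the other blocks.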
In this method, we can directly insert, update, or delete any record. Data is sorted based
on the key with which searching is done. The cluster key is the key on which the tables are
joined.
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The
above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here,
all the records are grouped on the cluster key DEP_ID.
2. Hash Clusters:
A hash cluster is similar to an indexed cluster. Instead of storing the records based on
the cluster key itself, we compute a hash value for the cluster key and store together the
records that have the same hash value.
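The difference between the two cluster types can be sketched as follows. Only the relation names and the DEP_ID cluster key come from the text; the other column names, the sample rows, the modulo hash function, and the number of buckets are assumptions made for the example.

```python
from collections import defaultdict


def indexed_cluster(departments, employees):
    """Group DEPARTMENT and EMPLOYEE rows by the cluster key DEP_ID."""
    clusters = defaultdict(lambda: {"department": None, "employees": []})
    for dept in departments:
        clusters[dept["DEP_ID"]]["department"] = dept
    for emp in employees:
        clusters[emp["DEP_ID"]]["employees"].append(emp)
    return dict(clusters)


def hash_cluster(departments, employees, n_buckets=4):
    """Group rows by the hash value of DEP_ID instead of DEP_ID itself."""
    buckets = defaultdict(list)
    for row in departments + employees:
        buckets[row["DEP_ID"] % n_buckets].append(row)
    return dict(buckets)


departments = [{"DEP_ID": 10, "DEP_NAME": "Sales"}, {"DEP_ID": 14, "DEP_NAME": "HR"}]
employees = [
    {"EMP_ID": 1, "EMP_NAME": "Asha", "DEP_ID": 10},
    {"EMP_ID": 2, "EMP_NAME": "Kumar", "DEP_ID": 14},
    {"EMP_ID": 3, "EMP_NAME": "Mala", "DEP_ID": 10},
]

by_key = indexed_cluster(departments, employees)
by_hash = hash_cluster(departments, employees)
```

In by_key each DEP_ID has its own cluster, so a join on DEP_ID reads one cluster per department. In by_hash, DEP_ID 10 and 14 share a bucket because both hash to 2 under the assumed function.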
A data dictionary is like the A-Z dictionary of the relational database system holding all
information of each relation in the database.
o Various integrity constraints
Along with this, the system also keeps the following data about the users of the system:
In addition to this, the system may also store some statistical and descriptive data about the
relations, such as:
A system may also store the storage organization, whether sequential, hash, or heap. It also
notes the location where each relation is stored:
o If relations are stored in operating system files, the data dictionary notes and
stores the names of the files.
o If the database stores all the relations in a single file, the data dictionary notes and stores
the blocks containing the records of each relation in a data structure similar to a linked list.
Finally, it also stores information about each index on all the relations:
All the above information, or metadata, is stored in the data dictionary. The data dictionary
is also updated whenever the relations change. Such metadata constitutes a
miniature database. Some systems store the metadata in the form of a relation in the database
itself. The system designers decide how the data dictionary is represented. Also, a data
dictionary stores its data in a non-normalized manner. It does not use any normal form, so
that the data stored in the dictionary can be accessed quickly.
For example, in the data dictionary, an underline below a value indicates that the
field contains a primary key.
When the database system needs to fetch records from a relation, it first looks up the
location and storage organization of that relation in the data dictionary. After confirming
these details, it retrieves the required records from the database.
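As one concrete illustration of metadata stored as a relation in the database itself, SQLite keeps its catalog in a table named sqlite_master. The sketch below uses Python's built-in sqlite3 module; the student relation, its columns, the index name, and the describe helper are hypothetical and exist only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_student_name ON student (name)")


def describe(conn, relation):
    """Read the metadata of one relation from SQLite's own catalog."""
    # sqlite_master lists every table and index together with its defining SQL.
    objects = conn.execute(
        "SELECT type, name FROM sqlite_master WHERE tbl_name = ? ORDER BY type, name",
        (relation,),
    ).fetchall()
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    # The relation name is interpolated directly, so it must come from trusted code.
    columns = conn.execute(f"PRAGMA table_info({relation})").fetchall()
    return {
        "objects": objects,
        "attributes": [(col[1], col[2]) for col in columns],
        "primary_key": [col[1] for col in columns if col[5] > 0],
    }


info = describe(conn, "student")
```

The result lists the relation, its index, its attributes with their types, and its primary key, which is the kind of information a data dictionary is consulted for before records are fetched.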