
UNIT-I: BASIC CONCEPTS:

Database Management System - File based system - Advantages of DBMS over file based system –
Database Approach - Logical DBMS Architecture - Three level architecture of DBMS or logical DBMS
architecture – Need for three level architecture - Physical DBMS Architecture - Database Administrator
(DBA) Functions & Role - Data files indices and Data Dictionary - Types of Database.
Relational and ER Models: Data Models - Relational Model – Domains - Tuple and Relation – Super
keys - Candidate keys - Primary keys and foreign key for the Relations - Relational Constraints - Domain
Constraint – Key Constraint - Integrity Constraint - Update Operations and Dealing with Constraint
Violations - Relational Operations - Entity Relationship (ER) Model – Entities – Attributes –
Relationships - More about Entities and Relationships - Defining Relationship for College Database - ER
Diagram - Conversion of E-R Diagram to Relational Database.

MS Bhanu
Introduction to Databases:
What is Data?
Raw facts are called data. The word “raw” indicates that they have not been processed.
Ex: The number 89 on its own is data.
What is information?
Processed data is known as information.
Ex: “Marks: 89” attaches a meaning to the number, so it becomes information.
What is Knowledge?
1. Knowledge refers to the practical use of information.
2. Knowledge necessarily involves a personal experience.
DATA/INFORMATION PROCESSING:
The process of converting data (raw facts) into meaningful information is called
data/information processing.

Note: In business processing, knowledge is more useful than data or information alone for making
decisions in an organization.
DIFFERENCE BETWEEN DATA AND INFORMATION:
DATA                                  INFORMATION
1. Raw facts                          1. Processed data
2. It is in unorganized form          2. It is in organized form
3. Data doesn't help the              3. Information helps the
   decision-making process               decision-making process

FILE ORIENTED APPROACH:


The earliest business computer systems were used to process business records and produce
information. They were generally faster and more accurate than equivalent manual systems. These
systems stored groups of records in separate files, and so they were called file processing systems.
• A file system is a collection of data. To manage data in a file system, the user has to write the
procedures.
• A file system exposes the details of data representation and storage of data.
• In a file system, storing and retrieving data cannot be done efficiently.
• Concurrent access to data in a file system causes many problems, such as one user reading a file
while another is deleting or updating information in it.
• A file system doesn't provide a crash recovery mechanism.
E.g. if the system crashes while we are entering data into a file, the content of the file
may be lost.
• Protecting a file under a file system is very difficult.
The typical file-oriented system is supported by a conventional operating system. Permanent
records are stored in various files and a number of different application programs are written to
extract records from and add records to the appropriate files.

DISADVANTAGES OF FILE-ORIENTED SYSTEM:


The following are the disadvantages of File-Oriented System:
Data Redundancy and Inconsistency:
Since files and application programs are created by different programmers over a long period
of time, the files are likely to have different formats and the programs may be written in several
programming languages. Moreover, the same piece of information may be duplicated in several
places. This redundancy leads to higher storage and access cost. In addition, it may lead to data
inconsistency, because the copies of the same data may no longer agree after an update.
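The effect of redundancy can be sketched in a few lines of Python. This is only an illustration: the two dictionaries stand in for files kept by two separate applications, and the customer data and the helper `update_address` are made up for the example.

```python
# Two "files" kept by different applications, each holding its own copy
# of the customer's address (hypothetical sample data).
savings_file = {"C101": {"name": "Ravi", "address": "12 Lake Road"}}
loans_file = {"C101": {"name": "Ravi", "address": "12 Lake Road"}}

def update_address(file_records, customer_id, new_address):
    """Update the address in one file only, as a single application would."""
    file_records[customer_id]["address"] = new_address

# The savings application records a change of address...
update_address(savings_file, "C101", "45 Hill Street")

# ...but the loans application still holds the old copy.
is_consistent = savings_file["C101"]["address"] == loans_file["C101"]["address"]
print(is_consistent)
```

Because the address is stored twice, an update made through one application leaves the other copy stale, which is the inconsistency described above.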
Difficulty in Accessing Data:
Conventional file processing environments do not allow needed data to be retrieved in a
convenient and efficient manner. A better data retrieval system must be developed for general use.
Data Isolation:
Since data is scattered (spread) across various files, and the files may be in different formats, it is
difficult to write new application programs to retrieve the appropriate data.
Concurrent Access Anomalies:
In order to improve the overall performance of the system and obtain a faster response time,
many systems allow multiple users to update the data simultaneously. In such an environment,
interaction of concurrent updates may result in inconsistent data.
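A typical anomaly of this kind is the "lost update". The sketch below simulates it step by step in Python, without real threads, so the interleaving is fixed and easy to follow. The balance and withdrawal amounts are illustrative values only.

```python
balance = 500  # shared account balance (illustrative value)

# Both users read the balance before either one writes.
read_by_user_a = balance
read_by_user_b = balance

# User A withdraws 50 and writes the result back.
balance = read_by_user_a - 50

# User B withdraws 100 using the stale value and overwrites A's update.
balance = read_by_user_b - 100

expected_balance = 500 - 50 - 100
print(balance, expected_balance)
```

The stored balance reflects only User B's withdrawal, so it differs from the balance that two correctly ordered withdrawals would have produced.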
Security Problems:
Not every user of the database system should be able to access all the data. For example, in
a banking system, payroll personnel need only that part of the database that has information about
the various bank employees. They do not need access to information about customer accounts. In a
file-oriented system it is difficult to enforce such security constraints.

Integrity Problems:
The data values stored in the database must satisfy certain types of consistency constraints. For example,
the balance of a bank account may never fall below a prescribed amount. These constraints are enforced in
the system by adding appropriate code in the various application programs. When new constraints are
added, it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
Atomicity Problem:
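A computer system, like any other mechanical or electrical device, is subject to failure. In many
applications, it is crucial to ensure that once a failure has occurred and has been detected, the data are
restored to the consistent state that existed prior to the failure. For example, a transfer of funds
between two accounts must either happen completely or not happen at all. This is difficult to ensure
in a file-oriented system.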
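For contrast, here is a minimal sketch of how a DBMS transaction can give this all-or-nothing behaviour. It uses Python's built-in `sqlite3` module with an in-memory database. The `account` table, the `transfer` function and the `fail_midway` flag are assumptions made for this illustration, and the "crash" is simulated by raising an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 200)")
conn.commit()

def transfer(connection, source, target, amount, fail_midway=False):
    """Move amount from source to target as one all-or-nothing unit."""
    try:
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, source))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, target))
        connection.commit()
    except RuntimeError:
        connection.rollback()  # undo the partial debit

transfer(conn, "A", "B", 100, fail_midway=True)
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)
```

After the simulated failure, the rollback restores both balances to the values they had before the transfer started.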

Database
A database is a collection of related data organised in a way that the data can be easily accessed,
managed and updated. Whatever software or hardware it is built on, a database has one main purpose:
storing data.
In the early days of computing, data was collected and stored on tapes, which allowed only sequential
access: to reach one record, all the data stored before it had to be read first. Tapes were slow and
bulky, and computer scientists soon realised that they needed a better solution to this problem.

DBMS
A DBMS is software that allows the creation, definition and manipulation of databases, allowing users to
store, process and analyse data easily. A DBMS provides an interface or tool for performing
various operations such as creating a database, creating tables in it, storing data in it, updating
data and a lot more.
A DBMS also provides protection and security to databases. It also maintains data consistency when
there are multiple users.
Here are some examples of popular DBMS used these days:
• MySQL
• Oracle
• SQL Server
• IBM DB2
• PostgreSQL
• Amazon SimpleDB (cloud based) etc.

Characteristics of Database Management System


A database management system has the following characteristics:

1. Data stored in tables: In a relational DBMS, data is not stored directly in the database. It is stored
in tables created inside the database. The DBMS also allows relationships between tables, which makes
the data more meaningful and connected. You can easily understand what type of data is stored
where by looking at the tables created in a database.
2. Reduced redundancy: Hard drives are cheap today, but earlier, when hard drives were expensive,
unnecessary repetition of data in a database was a big problem. A DBMS supports normalisation,
which divides the data in such a way that repetition is minimal.
3. Data consistency: On live data, i.e. data that is being continuously updated and added to,
maintaining consistency can become a challenge. The DBMS handles this itself.
4. Support for multiple users and concurrent access: A DBMS allows multiple users to work on
it (update, insert, delete data) at the same time and still maintains data consistency.
5. Query language: A DBMS provides users with a simple query language, with which data can be
easily fetched, inserted, deleted and updated in a database.
6. Security: The DBMS also takes care of the security of data, protecting it from unauthorised
access. In a typical DBMS, we can create user accounts with different access
permissions, which lets us secure our data by restricting user access.
7. Transaction support: A DBMS supports transactions, which allow us to better handle and manage
data integrity in real-world applications where multi-threading is used extensively.
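Several of these characteristics (tables, relationships between tables, reduced redundancy and a query language) can be seen in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a DBMS. The table names, column names and sample rows are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data lives in tables; the department name is stored once, not per student.
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("""CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES department(dept_id))""")

conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "RAM", 1), (2, "RAMESH", 1)])

# The query language combines the related tables on demand.
rows = conn.execute("""SELECT s.name, d.dept_name
                       FROM student s
                       JOIN department d ON s.dept_id = d.dept_id
                       ORDER BY s.roll_no""").fetchall()
print(rows)
```

Each student row stores only a `dept_id`. The department name is kept in one place and joined in when needed, which is the idea behind reduced redundancy.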

ADVANTAGES OF A DBMS OVER FILE SYSTEM:


Using a DBMS to manage data has many advantages:
Data Independence:
Application programs should be as independent as possible from details of data representation and
storage. The DBMS can provide an abstract view of the data to insulate application code from such
details.
Efficient Data Access:
A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This
feature is especially important if the data is stored on external storage devices.
Data Integrity and Security:
If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on
the data. For example, before inserting salary information for an employee, the DBMS can check that
the department budget is not exceeded. Also, the DBMS can enforce access controls that govern what
data is visible to different classes of users.
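As a minimal sketch of a DBMS enforcing an integrity constraint, the example below uses Python's built-in `sqlite3` module. It uses the simpler minimum-balance rule rather than the department-budget rule mentioned above. The `account` table and the limit of 1000 are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is declared once in the schema, not in every program.
conn.execute("""CREATE TABLE account (
    acc_no INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 1000))""")
conn.execute("INSERT INTO account VALUES (1, 5000)")

try:
    # This update would take the balance below the prescribed minimum.
    conn.execute("UPDATE account SET balance = 200 WHERE acc_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 1").fetchone()[0]
print(rejected, balance)
```

Because every access goes through the DBMS, the offending update is refused and the stored balance is left unchanged, whichever application issued it.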

Concurrent Access and Crash Recovery:
A database system allows several users to access the database concurrently. Answering
different questions from different users with the same (base) data is a central aspect of an information
system. Such concurrent use of data increases the economy of a system.
An example for concurrent use is the travel database of a bigger travel agency. The employees
of different branches can access the database concurrently and book journeys for their clients. Each
travel agent sees on his interface if there are still seats available for a specific journey or if it is
already fully booked.
A DBMS also protects data from failures such as power failures and system crashes by means of
recovery schemes such as backup mechanisms and log files.
Data Administration:
When several users share the data, centralizing the administration of data can offer significant
improvements. Experienced professionals, who understand the nature of the data being managed, and
how different groups of users use it, can be responsible for organizing the data representation to
minimize redundancy and fine-tuning the storage of the data to make retrieval efficient.
Reduced Application Development Time:
DBMS supports many important functions that are common to many applications accessing
data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates
quick development of applications. Such applications are also likely to be more robust than
applications developed from scratch because many important tasks are handled by the DBMS instead
of being implemented by the application.

Disadvantages of DBMS
• Its complexity
• Commercially licensed DBMSs are generally costly, although open-source systems such as MySQL and
PostgreSQL are available.
• They are large in size

DATABASE APPROACH
The objectives of the database approach include:
1. Data sharability
2. Data availability
3. Data independence
4. Data integrity
5. Data security

1. Data sharability: This objective ensures that a data item developed by one
application can be shared among all the applications. It reduces the level
of unplanned redundancy, which occurs when the same data is stored at multiple locations.
2. Data availability: This objective ensures that the requested data is available to the user in a
meaningful format, which decreases the access time.

3. Data independence: This objective ensures that application programs are written in such a
way that they are independent of storage details. The internal schema holds the physical
storage details, the conceptual schema describes the logical structure of the whole database, and
the external schemas describe the views seen by users. Programs written against an external schema
are therefore insulated from changes at the levels below it.

4. Data integrity: This objective ensures that the data values entered in the database fall within a
specified range and are of the correct format. Data integrity can be achieved by enabling the DBA to
have full control of the database and the operations performed on it.

5. Data security: Data is a vital asset of an organization and must be kept confidential. Such
confidential data must be properly secured so that it is not accessed by unauthorized persons.
This can be achieved by employing data security measures.

Components of a DBMS
A database management system (DBMS) consists of several components. Each component plays a very
important role in the database management system environment. The major components of a database
management system are:
• Software
• Hardware
• Data
• Procedures
• Database Access Language
Software
The main component of a DBMS is the software. It is the set of programs used to handle the database
and to control and manage the overall computerized database system. It includes:
1. The DBMS software itself, which is the most important software component in the overall system.
2. The operating system, including the network software used in a network to share the data of the
database among multiple users.
3. Application programs, developed in programming languages such as C++ or Visual Basic, that
are used to access the database in the database management system. Each program contains
statements that request the DBMS to perform operations on the database. The operations may
include retrieving, updating, deleting data etc. The application programs may be run from
conventional or online workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with associated
I/O devices like disk drives), storage devices, I/O channels, and electromechanical devices that
interface between computers and real-world systems. It is impossible to implement
the DBMS without hardware devices. In a network, a powerful computer with high data
processing speed and a storage device with large storage capacity is required as the database server.

Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process the
data. In DBMS, databases are defined, constructed and then data is stored, updated and retrieved to
and from the databases. The database contains both the actual (or operational) data and the metadata
(data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the DBMS.
The users who operate and manage the DBMS require documented procedures on how to use or run the
database management system. These may include:
1. Procedure to install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of database.
5. To change the structure of database.
6. To generate the reports of data retrieved from database.

Database Access Language
The database access language is used to move data to and from the database. Users use the
database access language to enter new data, change the existing data in the database and retrieve
required data from it. The user writes a set of appropriate commands in a database access
language and submits these to the DBMS. The DBMS translates the user's commands and sends them to a
specific part of the DBMS called the database engine. The database engine generates a set of
results according to the commands submitted by the user, converts these into a user-readable form
called an inquiry report and then displays them on the screen. Administrators may also use the database
access language to create and maintain databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
Users
The users are the people who manage the databases and perform different operations on the databases
in the database system. There are three kinds of people who play different roles in a database system:
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual Basic, Java, or
C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is called database
administrator or simply DBA.
End-Users
The end-users are the people who interact with database management system to perform different
operations on database such as retrieving, updating, inserting, deleting data etc.

Three level architecture of DBMS or logical DBMS architecture
1. Physical Level
2. Conceptual Level
3. External Level

In the above diagram of the DBMS architecture:
• Mapping is the process of transforming requests and responses between the various levels of the
architecture.
• Mapping adds processing overhead, so it is less suitable for a small database.
• In External / Conceptual mapping, the DBMS transforms a request on an external schema into a
request against the conceptual schema.
• In Conceptual / Internal mapping, the DBMS transforms the request from the conceptual
level to the internal level.
1. Physical Level
• The physical level describes the physical storage structure of data in the database.
• It is also known as the internal level.
• This level is very close to the physical storage of data.
• At the lowest level, data is stored in the form of bits with physical addresses on the secondary
storage device.
• At the highest level, it can be viewed in the form of files.
• The internal schema defines the various stored data types. It uses a physical data model.
2. Conceptual Level
• The conceptual level describes the structure of the whole database for a group of users.
• It is also called the logical level, and its schema is expressed in a data model.
• The conceptual schema is a representation of the entire content of the database.
• This schema contains all the information needed to build the relevant external records.
• It hides the internal details of physical storage.
3. External Level
• The external level is related to the data that is viewed by individual end users.
• This level includes a number of user views or external schemas.
• This level is closest to the user.
• An external view describes the segment of the database that is required by a particular user group
and hides the rest of the database from that user group.
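The relationship between the conceptual and external levels can be sketched with an SQL view. The example uses Python's built-in `sqlite3` module. The `employee` table, the `employee_directory` view and the sample row are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical structure of the data.
conn.execute(
    "CREATE TABLE employee (eid INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000)")

# External level: a view for a user group that should not see salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT eid, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
visible_columns = [column[0] for column in cursor.description]
print(visible_columns)
```

Users of the view see only the `eid` and `name` columns. How the rows are physically stored (the internal level) is handled by the DBMS and does not appear in either definition.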
PHYSICAL STRUCTURE of DBMS or OVERALL STRUCTURE of DBMS
Components of DBMS are broadly classified as follows :
1. Query Processor :
(a) DML Compiler
(b) Embedded DML pre-compiler
(c) DDL Interpreter
(d) Query Evaluation Engine
2. Storage Manager :
(a) Authorization and Integrity Manager
(b) Transaction Manager
(c) File Manager
(d) Buffer Manager
3. Data Structure :
(a) Data Files
(b) Data Dictionary
(c) Indices
(d) Statistical Data
1. Query Processor Components :
• DML Compiler : It translates DML statements in a query language into low-level instructions
that the query evaluation engine understands. It also attempts to transform a user's request into an
equivalent but more efficient form.
• Embedded DML Pre-compiler : It converts DML statements embedded in an application program
to normal procedure calls in the host language. The pre-compiler must interact with the DML
compiler to generate the appropriate code.
• DDL Interpreter : It interprets DDL statements and records them in a set of tables containing
metadata, i.e. the data dictionary.
• Query Evaluation Engine : It executes low-level instructions generated by the DML compiler.
2. Storage Manager Components :
They provide the interface between the low-level data stored in the database and application programs
and queries submitted to the system.
• Authorization and Integrity Manager : It tests for the satisfaction of integrity constraints and checks
the authority of users to access data.
• Transaction Manager : It ensures that the database remains in a consistent state despite system
failures and that concurrent transaction execution proceeds without conflicts.
• File Manager : It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• Buffer Manager : It is responsible for fetching data from disk storage into main memory and
deciding what data to cache in memory.
3. Data Structures :
Following data structures are required as a part of the physical system implementation.
• Data Files : They store the database itself.
• Data Dictionary : It stores metadata (data about data) describing the structure of the database.
• Indices : They provide fast access to data items that hold particular values.
• Statistical Data : It stores statistical information about the data in the database. This information is
used by the query processor to select efficient ways to execute a query.
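To make the data dictionary and indices more concrete, the sketch below looks at them in SQLite through Python's built-in `sqlite3` module. In SQLite the catalog table is called `sqlite_master`, and `EXPLAIN QUERY PLAN` reports how a query would be evaluated. Other DBMSs expose the same ideas under different names. The `student` table and the index name are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# Data dictionary: SQLite keeps metadata about tables and indices
# in the sqlite_master catalog table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()

# Query processor: ask how a lookup by name would be evaluated.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT roll_no FROM student WHERE name = 'RAM'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(catalog)
print(plan_text)
```

The catalog lists both the table and the index, and the plan text names the index that the query processor chooses for the lookup. The exact wording of the plan varies between SQLite versions.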

Database Administrator (DBA) Functions & Role
Installing and Configuration of database:
The DBA is responsible for installing the database software. The DBA configures the software and
upgrades it when needed. There are many database products in the industry, such as Oracle, Microsoft
SQL Server and MySQL, so the DBA decides how the installation and configuration of the chosen software
will take place.

Deciding the hardware device


Depending upon the cost, performance and efficiency of the hardware, it is the DBA's duty to
decide which hardware devices will suit the company's requirements. Hardware is the interface
between end users and the database, so it needs to be of good quality.

Managing Data Integrity


Data integrity should be managed carefully because it keeps the data accurate and consistent and guards
it against unauthorized changes. The DBA manages the relationships between data to maintain data
consistency.

Decides Data Recovery and Back up method


If a company has a big database, the database may fail at any time. The DBA is therefore required
to back up the entire database at regular intervals. The DBA has to decide how much data should be
backed up and how frequently backups should be taken. The DBA also recovers the database if it
is lost.

Tuning Database Performance


Database performance plays an important role in any business. If users are not able to fetch data
quickly, the company may lose business. By tuning and modifying SQL commands, a DBA can
improve the performance of the database.

Capacity Issues
Every database has limits on how much data it can store, and physical memory is limited too.
The DBA has to decide the limits and capacity of the database and handle all the issues related to them.

Database design
The DBA produces the logical design of the database. The DBA is also responsible for physical
design, external model design, and integrity control.

Database accessibility
The DBA writes subschemas to decide the accessibility of the database. The DBA decides who the users
of the database are and which data is to be used by which user. No user has the power to access the
entire database without the permission of the DBA.

Decides validation checks on data


The DBA has to decide which data should be used and what kind of data is accurate for the company.
The DBA therefore puts validation checks on the data to make it more accurate and consistent.

Monitoring performance
Even if the database is working properly, that doesn't mean there is no task for the DBA. The DBA
still has to monitor the performance of the database, including CPU and memory usage.

Decides content of the database


A database system holds many kinds of content. The DBA decides the fields, the types of fields,
and the range of values of the content in the database system. One can say that the DBA decides the
structure of the database files.

Provides help and support to user


If any user needs help at any time, it is the duty of the DBA to provide it. The DBA gives complete
support to users who are new to the database.

Database implementation
Database has to be implemented before anyone can start using it. So DBA implements the database
system. DBA has to supervise the database loading at the time of its implementation.

Improve query processing performance


Queries made by users should be executed quickly. As discussed, users need fast
retrieval of answers, so the DBA works on improving query processing performance.
Types of Database systems:
The evolution of database systems is as follows:
1. File Management System
2. Hierarchical database System
3. Network Database System
4. Relational Database System

File Management System:
The file management system, FMS for short, is one in which all data is stored in a
single large file. The main disadvantage of this system is that searching for a record takes a long time.
This led to the introduction of indexing in this system. Even then, the FMS
had many drawbacks: updates or modifications to the data could not be handled
easily, sorting the records took a long time, and so on. All these drawbacks led to the introduction of the
Hierarchical Database System.
Hierarchical Database System:
The FMS drawback of slow record access and sorting was removed in this system by introducing a
parent-child relationship between records in the database. The origin of the data is called the root,
from which several branches hold data at different levels. The last level is called the leaf. The main
drawback was that any modification or addition to the structure required the whole structure to be
altered, which made the task tedious. To avoid this, the next system, called the Network Database
System, was developed.

Network Database System:


This system introduced the concept of many-to-many relationships. It also used
the same technology of pointers to define relationships, the difference being the
introduction of grouping data items into sets.

Relational Database System:
To overcome the drawbacks of the previous systems, the Relational Database
System was introduced, in which data is organized as tables and each record forms a row with many
fields or attributes. Relationships between tables are also formed in this system.

Data Model
Data models show how data is connected and stored in the system, and the relationships
between data. A model is basically a conceptualization of entities, their attributes and the
relationships between them. Originally there were three main data models in DBMS: network,
hierarchical, and relational. These days there are many more.
The different types of data models are listed below, and most of them are then described in detail:
1. Flat data model
2. Entity relationship model
3. Relational model
4. Record based model
5. Network model
6. Hierarchical model
7. Object oriented data model
8. Context data model

Flat Data Model

The flat data model was the first model to be introduced. In it, all the data is kept in the
same plane. Because it came so early, this model was not very scientific.

Flat Data Model

Entity Relationship Data Model

The entity relationship model is based on the notion of real-world entities and their relationships.
While formulating a real-world scenario into the database model, entity sets are created. The
model depends on two vital things:

• Entities and their attributes
• Relationships among entities

Entity Relationship Model

An entity has real-world properties called attributes, and each attribute is defined over a set of
permitted values called its domain. For example, in a university database, a student is an entity, and
name, age and sex are its attributes. The relationships among entities define the logical association
between entities.

Relational Data Model

The relational model is the most popular and most extensively used model. In this model,
data is stored in tables, and each table is called a relation. Relations can be normalized, and in a
normalized relation every value is atomic. Each row in a relation is unique
and is called a tuple. Each column contains values from the same domain and is called an
attribute.

Network Data Model

In the network model, entities are organized as a graph, and some entities in
the graph can be accessed through several paths.

Network Model
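The idea that a record can be reached through several paths can be sketched as a small graph in Python. The record names and links below are hypothetical and are not taken from the figure.

```python
# Directed links between record types; "Course" has two owners
# (hypothetical data).
links = {
    "Department": ["Professor", "Student"],
    "Professor": ["Course"],
    "Student": ["Course"],
    "Course": [],
}

def all_paths(start, goal, path=()):
    """Return every path from start to goal as a tuple of record names."""
    path = path + (start,)
    if start == goal:
        return [path]
    found = []
    for next_record in links[start]:
        found.extend(all_paths(next_record, goal, path))
    return found

paths_to_course = all_paths("Department", "Course")
print(paths_to_course)
```

Here `Course` is reachable from `Department` both through `Professor` and through `Student`, which a strict hierarchy would not allow.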

Hierarchical Data Model

The hierarchical model has a parent entity with several child entities, but at the top there must be
only one entity, called the root. For example, department is the root entity, and it has several
child entities such as students, professors and many more.

Hierarchical model
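A hierarchy of this kind can be sketched in Python by recording one parent for each entity. The entity names extend the department example and are otherwise made up. `parent_of` and `path_from_root` are names chosen for the illustration.

```python
# Each record has exactly one parent; the root has none (hypothetical data).
parent_of = {
    "Department": None,
    "Students": "Department",
    "Professors": "Department",
    "Courses": "Professors",
}

def path_from_root(record):
    """Follow parent links upward, then reverse to get the root-to-record path."""
    path = []
    while record is not None:
        path.append(record)
        record = parent_of[record]
    return list(reversed(path))

roots = [name for name, parent in parent_of.items() if parent is None]
print(roots)
print(path_from_root("Courses"))
```

Because every entity has a single parent, there is exactly one path from the root to any record. This is the main structural difference from the network model.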

Object oriented Data Model

The object oriented data model is one of the more developed data models, and it can hold audio, video
and graphic files. Its objects consist of a piece of data together with methods, which are the
instructions that operate on that data.

Object Oriented Data Model


Context Data Model

The context data model is a flexible model because it is a collection of many data models, such as
the object oriented data model, the network model and the semi-structured model.
Because of this versatility, different types of work can be done with it.

Context Model

Data models therefore support different types of users and differ in how users interact with the
database. Data models in DBMS have also brought a major change in industry by making relevant data
easier to handle. A data model is a framework that helps in creating and using databases. As we
have seen, there are different types of data models, and we can select one depending on the kind of
structure needed.

Relational Model
The relational model was proposed by E.F. Codd to model data in the form of relations or tables.
After designing the conceptual model of a database using an ER diagram, we need to convert the
conceptual model into the relational model, which can be implemented using any RDBMS
such as Oracle or MySQL. So we will see what the relational model is.
What is Relational Model?
Relational Model represents how data is stored in Relational Databases. A relational database stores
data in the form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO,
NAME, ADDRESS, PHONE and AGE shown in Table 1.
Table 1: STUDENT
ROLL_NO   NAME     ADDRESS   PHONE        AGE
1         RAM      DELHI     9455123451   18
2         RAMESH   GURGAON   9652431543   18
3         SUJIT    ROHTAK    9156253131   20
4         SURESH   DELHI                  18

IMPORTANT TERMINOLOGIES

1. Attribute: Attributes are the properties that define a relation, e.g. ROLL_NO, NAME.
2. Relation Schema: A relation schema gives the name of the relation together with its attributes, e.g.
STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation schema for
STUDENT. A schema made up of more than one relation is called a relational schema.
3. Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of
which is:

1 RAM DELHI 9455123451 18


4. Relation Instance: The set of tuples of a relation at a particular instant of time is called a
relation instance. Table 1 shows the relation instance of STUDENT at a particular time. It can
change whenever there is an insertion, deletion or update in the database.
5. Degree: The number of attributes in the relation is known as degree of the relation. The
STUDENT relation defined above has degree 5.
6. Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT
relation defined above has cardinality 4.
7. Column: A column represents the set of values for a particular attribute. The column
ROLL_NO extracted from relation STUDENT is:

ROLL_NO
1
2
3
4
8. NULL Values: The value which is not known or unavailable is called NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
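The terms above can be checked against Table 1 with a short Python sketch. The relation is written as a list of tuples, and `None` is used to stand for the NULL phone number. The variable names are chosen for the illustration.

```python
attributes = ("ROLL_NO", "NAME", "ADDRESS", "PHONE", "AGE")
student = [
    (1, "RAM", "DELHI", "9455123451", 18),
    (2, "RAMESH", "GURGAON", "9652431543", 18),
    (3, "SUJIT", "ROHTAK", "9156253131", 20),
    (4, "SURESH", "DELHI", None, 18),   # None stands for NULL
]

degree = len(attributes)        # number of attributes
cardinality = len(student)      # number of tuples

# Column: the set of values for one attribute.
roll_no_index = attributes.index("ROLL_NO")
roll_no_column = [row[roll_no_index] for row in student]

# Tuples whose PHONE value is NULL.
phone_index = attributes.index("PHONE")
null_phone_rolls = [row[roll_no_index] for row in student
                    if row[phone_index] is None]

print(degree, cardinality, roll_no_column, null_phone_rolls)
```

The results match the definitions: degree 5, cardinality 4, the ROLL_NO column holding the values 1 to 4, and a NULL PHONE for ROLL_NO 4.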

Keys
An important constraint on an entity is the key. The key is an attribute or a group of attributes
whose values can be used to uniquely identify an individual entity in an entity set.

Types of Keys
There are several types of keys. These are described below.
Candidate key
A candidate key is a simple or composite key that is unique and minimal. It is unique because
no two rows in a table may have the same value at any time. It is minimal because every column is
necessary in order to attain uniqueness.

From our COMPANY database example, if the entity is Employee(EID, First Name, Last Name,
SIN, Address, Phone, BirthDate, Salary, DepartmentID), possible candidate keys are:

• EID
• SIN
• First Name and Last Name, assuming there is no one else in the company with the same
  name
• Last Name and DepartmentID, assuming two people with the same last name don't work in
  the same department
Composite key
A composite key is composed of two or more attributes, but it must be minimal. Using the
example from the candidate key section, possible composite keys are:
• First Name and Last Name, assuming there is no one else in the company with the same
  name
• Last Name and DepartmentID, assuming two people with the same last name don't work in
  the same department
Primary key
The primary key is a candidate key that is selected by the database designer to be used as an
identifying mechanism for the whole entity set. It must uniquely identify tuples in a table and must not be
null. The primary key is indicated in the ER model by underlining the attribute.
In the following example, EID is the primary key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Secondary key
A secondary key is an attribute used strictly for retrieval purposes (can be composite), for
example: Phone and Last Name.
Alternate key
Alternate keys are all candidate keys not chosen as the primary key.
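The unique-and-minimal test for candidate keys can be sketched in a few lines. The code below is only an illustration: the three sample rows and the shortened attribute list are made up, and uniqueness in one sample instance does not prove that an attribute set is a key of the relation in general (that depends on the business rules, as the assumptions above show).

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this particular instance."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is unique but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical sample rows, using a subset of the Employee attributes.
employees = [
    {"EID": 1, "SIN": "111", "LastName": "Smith", "DepartmentID": 10},
    {"EID": 2, "SIN": "222", "LastName": "Smith", "DepartmentID": 20},
    {"EID": 3, "SIN": "333", "LastName": "Jones", "DepartmentID": 10},
]
keys = candidate_keys(employees, ("EID", "SIN", "LastName", "DepartmentID"))
primary_key = ("EID",)                                   # chosen by the designer
alternate_keys = [k for k in keys if k != primary_key]   # the remaining candidates

print(keys)            # [('EID',), ('SIN',), ('LastName', 'DepartmentID')]
print(alternate_keys)  # [('SIN',), ('LastName', 'DepartmentID')]
```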
Foreign key
A foreign key (FK) is an attribute in a table that references the primary key in another table.
Its value must either match a primary key value in that table or be null. Both foreign and primary
keys must be of the same data type. In the COMPANY database example below, DepartmentID is the foreign key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Nulls
A null is a special symbol, independent of data type, which means either unknown or inapplicable.
It does not mean zero or blank. Features of null include:
• No data entry
• Not permitted in the primary key
• Should be avoided in other attributes
• Can represent
  o An unknown attribute value
  o A known, but missing, attribute value
  o A "not applicable" condition
• Can create problems when functions such as COUNT, AVERAGE and SUM are used
• Can create logical problems when relational tables are linked
NOTE: The result of a comparison operation is null when either argument is null. The result of an
arithmetic operation is null when either argument is null (except functions that ignore nulls).
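The behaviour described in the note can be observed in a real engine. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the cut-down STUDENT table (ROLL_NO and PHONE only) is an assumption made for brevity, and other DBMS products may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INT PRIMARY KEY, PHONE TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [(1, "9455123451"), (2, "9652431543"), (3, "9156253131"), (4, None)],
)

# COUNT(*) counts rows; COUNT(PHONE) ignores rows where PHONE is NULL.
count_all, count_phone = conn.execute(
    "SELECT COUNT(*), COUNT(PHONE) FROM STUDENT"
).fetchone()

# Comparison and arithmetic with NULL both yield NULL (None in Python).
comparison = conn.execute("SELECT NULL = NULL").fetchone()[0]
arithmetic = conn.execute("SELECT 1 + NULL").fetchone()[0]

# "= NULL" never matches a row; IS NULL is the test that finds missing values.
equals_null = conn.execute("SELECT ROLL_NO FROM STUDENT WHERE PHONE = NULL").fetchall()
is_null = conn.execute("SELECT ROLL_NO FROM STUDENT WHERE PHONE IS NULL").fetchall()

print(count_all, count_phone)   # 4 3
print(comparison, arithmetic)   # None None
print(equals_null, is_null)     # [] [(4,)]
```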

Constraints in Relational Model


While designing a relational model, we define conditions that must hold for the data present in the
database. These conditions are called constraints. Constraints are checked before any operation
(insertion, deletion or update) is performed on the database. If an operation would violate any
constraint, it fails.
Domain Constraints: These are attribute-level constraints. An attribute can only take values which
lie inside its domain. e.g. if the constraint AGE>0 is applied to the STUDENT relation, inserting
a negative value of AGE will fail.
Key Integrity: Every relation in the database should have at least one set of attributes which identifies a
tuple uniquely. Such a set of attributes is called a key. e.g. ROLL_NO in STUDENT is a key. No two
students can have the same roll number. So a key has two properties:
• It should be unique for all tuples.
• It can't have NULL values.
Referential Integrity: When an attribute of a relation can only take values from an attribute of the
same relation or of another relation, it is called referential integrity. Suppose we have two relations:
STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE  BRANCH_CODE
--------------------------------------------------------
1        RAM     DELHI    9455123451  18   CS
2        RAMESH  GURGAON  9652431543  18   CS
3        SUJIT   ROHTAK   9156253131  20   ECE
4        SURESH  DELHI                18   IT

BRANCH
BRANCH_CODE  BRANCH_NAME
--------------------------------------------------------
CS           COMPUTER SCIENCE
IT           INFORMATION TECHNOLOGY
ECE          ELECTRONICS AND COMMUNICATION ENGINEERING
CV           CIVIL ENGINEERING

BRANCH_CODE of STUDENT can only take values which are present in BRANCH_CODE of
BRANCH. This is called a referential integrity constraint. The relation which refers to another
relation is called the REFERENCING RELATION (STUDENT in this case) and the relation to which
other relations refer is called the REFERENCED RELATION (BRANCH in this case).
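The referencing and referenced relations above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: STUDENT is cut down to three columns, the rows AMIT/ME and NEHA are invented test data, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when this is on
conn.execute("CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT)")
conn.execute(
    """CREATE TABLE STUDENT (
           ROLL_NO INT PRIMARY KEY,
           NAME TEXT,
           BRANCH_CODE TEXT REFERENCES BRANCH (BRANCH_CODE)
       )"""
)
conn.executemany(
    "INSERT INTO BRANCH VALUES (?, ?)",
    [("CS", "COMPUTER SCIENCE"), ("IT", "INFORMATION TECHNOLOGY")],
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'RAM', 'CS')")   # 'CS' exists in BRANCH

def rejected(sql):
    """Run a statement; return True if a constraint violation stops it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# 'ME' is not a BRANCH_CODE in BRANCH, so the referencing row is refused.
bad_insert = rejected("INSERT INTO STUDENT VALUES (5, 'AMIT', 'ME')")
# 'CS' is still referenced by RAM, so the referenced row cannot be deleted.
bad_delete = rejected("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CS'")
# A NULL foreign key is allowed: the student simply has no branch yet.
null_allowed = not rejected("INSERT INTO STUDENT VALUES (6, 'NEHA', NULL)")

print(bad_insert, bad_delete, null_allowed)   # True True True
```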

Integrity Constraints
Constraints enforce limits on the data or type of data that can be inserted into, updated in or deleted
from a table. The whole purpose of constraints is to maintain data integrity during an
update/delete/insert on a table. This section describes several types of constraints that can be
created in a DBMS.

Types of constraints

• NOT NULL
• UNIQUE
• DEFAULT
• CHECK
• Key Constraints – PRIMARY KEY, FOREIGN KEY
• Domain constraints
• Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don't provide a
value for a particular column while inserting a record into a table, it takes the NULL value by default. By
specifying the NOT NULL constraint, we can be sure that a particular column (or columns) cannot have NULL values.
Example:
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (235),
PRIMARY KEY (ROLL_NO)
);

UNIQUE:
The UNIQUE constraint forces a column or set of columns to have unique values. If a column has a
unique constraint, that column cannot have duplicate values in the table.
Example:
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
DEFAULT:
The DEFAULT constraint provides a default value to a column when there is no value provided while
inserting a record into a table.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
CHECK:
This constraint is used for specifying the range of values allowed in a particular column of a table. When this
constraint is set on a column, every value in that column must satisfy the specified condition, for example
by falling within the specified range.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL CHECK(ROLL_NO >1000) ,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)

);

In the above example we have set the check constraint on ROLL_NO column of STUDENT table.
Now, the ROLL_NO field must have the value greater than 1000.
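To see NOT NULL, DEFAULT, CHECK and PRIMARY KEY acting together, the table definition above can be run as is through Python's built-in sqlite3 module. This is a minimal sketch; the inserted rows and the helper try_insert are illustrative, and the exact error messages differ between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE STUDENT (
           ROLL_NO INT NOT NULL CHECK (ROLL_NO > 1000),
           STU_NAME VARCHAR (35) NOT NULL,
           STU_AGE INT NOT NULL,
           EXAM_FEE INT DEFAULT 10000,
           STU_ADDRESS VARCHAR (35),
           PRIMARY KEY (ROLL_NO)
       )"""
)

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (1001, 'RAM', 18)")
low_roll_no = try_insert("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (7, 'SUJIT', 20)")
missing_name = try_insert("INSERT INTO STUDENT (ROLL_NO, STU_AGE) VALUES (1002, 18)")
duplicate_key = try_insert("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (1001, 'RAMESH', 18)")

# EXAM_FEE was not supplied for roll number 1001, so the DEFAULT value is stored.
fee = conn.execute("SELECT EXAM_FEE FROM STUDENT WHERE ROLL_NO = 1001").fetchone()[0]

print(accepted, low_roll_no, missing_name, duplicate_key)   # True False False False
print(fee)                                                  # 10000
```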

Key constraints:
PRIMARY KEY:
Primary key uniquely identifies each record in a table. It must have unique values and cannot contain
nulls. In the below example the ROLL_NO field is marked as primary key, that means the ROLL_NO
field cannot have duplicate and null values.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
FOREIGN KEY:

Foreign keys are the columns of a table that point to the primary key of another table. They act as a
cross-reference between tables.
Domain constraints:

Each table has a certain set of columns, and each column allows only one type of data, based on its data
type. The column does not accept values of any other data type.

Update Operations Dealing with Constraint Violations:


The operations of the relational model can be categorized into retrievals and updates.

Relational Algebra or Relational Operations

The relational algebra is a theoretical procedural query language which takes instances of
relations and performs operations on one or more relations to define another relation without
altering the original relation(s). Thus, both the operands and the outputs are relations, so the
output of one operation can become the input to another. This allows expressions to be
nested in the relational algebra, just as arithmetic operations are nested. This property is called closure:
relations are closed under the algebra, just as numbers are closed under arithmetic operations.
The relational algebra is a relation-at-a-time (or set) language in which all tuples are manipulated in
one statement, without the use of a loop. There are several variations of syntax for relational algebra
commands; these notes use a common symbolic notation and present it informally.

The primary operations of relational algebra are as follows:
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename

• Projection (π)
Projection is used to project the required columns from a relation.
Example:
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4

π B,C (R)
B C
-----
2 4
2 3
3 4

Note: By default, projection removes duplicate tuples.


 Selection (σ)
Selection is used to select the required tuples of a relation.
For the above relation,
σ (C>3) R
selects the tuples in which C is greater than 3.
Note: Selection returns whole tuples, with every attribute of the relation. To keep only some of the
columns, apply the projection operator to the result of the selection.
For example, π A,B,C (σ (C>3) R) shows the following tuples:
A B C
-------
1 2 4
4 3 4
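Projection and selection on the relation R above can be sketched with Python sets of tuples. This is only an illustration of the two operators: the function names project and select and the SCHEMA tuple are assumptions, not a standard library API.

```python
# Relation R(A, B, C) as a set of tuples; a set has no duplicates by construction.
SCHEMA = ("A", "B", "C")
R = {(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)}

def project(relation, schema, attrs):
    """π: keep only the named columns; duplicates disappear because the result is a set."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in relation}

def select(relation, schema, predicate):
    """σ: keep the whole tuples for which the predicate is true."""
    return {row for row in relation if predicate(dict(zip(schema, row)))}

pi_bc = project(R, SCHEMA, ("B", "C"))
sigma_c = select(R, SCHEMA, lambda t: t["C"] > 3)
pi_a_of_sigma = project(sigma_c, SCHEMA, ("A",))   # nesting: output of σ is input to π

print(sorted(pi_bc))          # [(2, 3), (2, 4), (3, 4)]
print(sorted(sigma_c))        # [(1, 2, 4), (4, 3, 4)]
print(sorted(pi_a_of_sigma))  # [(1,), (4,)]
```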

 Union Operation (∪)

For R ∪ S, the union of two relations R and S defines a relation that contains all the tuples of R, or S,
or both R and S, with duplicate tuples eliminated. R and S must be union-compatible.

For a union operation to be applied, the following rules must hold:

• R and S must have the same number of attributes.
• Attribute domains must be compatible.
• Duplicate tuples are automatically eliminated.

• Set difference (−)

For R − S, the set difference operation defines a relation consisting of the tuples that are in relation R
but not in S. R and S must be union-compatible.
Example:
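Because the original illustration is not available here, the following sketch uses two small made-up relations R and S over (Id, Name). It treats relations as Python sets of tuples, so union and set difference map directly onto the | and - operators; the union_compatible helper is a rough, assumed check based on attribute count and Python value types.

```python
# Two hypothetical union-compatible relations over (Id, Name).
R = {(1, "Ram"), (2, "Sona")}
S = {(2, "Sona"), (3, "Kim")}

def union_compatible(r, s):
    """Same number of attributes and matching value types, position by position."""
    def signatures(relation):
        return {tuple(type(v) for v in row) for row in relation}
    return signatures(r) == signatures(s)

compatible = union_compatible(R, S)
union = R | S          # tuples in R, in S, or in both; (2, "Sona") appears once
difference = R - S     # tuples in R that are not in S

print(compatible)          # True
print(sorted(union))       # [(1, 'Ram'), (2, 'Sona'), (3, 'Kim')]
print(sorted(difference))  # [(1, 'Ram')]
```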

 Rename(ρ)
This is a unary operator which changes attribute names for a relation without changing any
values. Renaming is useful before set operations, because it lets relations whose attributes are
named differently be treated as compatible.
Notation: ρ Old Name → New Name (r)

where r is the table name.

For example, ρ Father → Parent (Paternity)

 Cartesian or Cross product ( ×)

The cross product of two relations A and B, written A X B, contains all the attributes of A
followed by all the attributes of B. Each record of A is paired with every record of B.
An example is given below.
A                     B
(Name  Age  Sex)      (Id  Course)
------------------    -------------
Ram    14   M         1    DS
Sona   15   F         2    DBMS
Kim    20   M

A X B
Name  Age  Sex  Id  Course
---------------------------------
Ram   14   M    1   DS
Ram   14   M    2   DBMS
Sona  15   F    1   DS
Sona  15   F    2   DBMS
Kim   20   M    1   DS
Kim   20   M    2   DBMS

Note: if A has 'n' tuples and B has 'm' tuples then A X B will have 'n*m' tuples.
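The same cross product can be reproduced with itertools.product from the Python standard library. This is a small sketch in which each relation is assumed to be a list of tuples.

```python
from itertools import product

A = [("Ram", 14, "M"), ("Sona", 15, "F"), ("Kim", 20, "M")]   # (Name, Age, Sex)
B = [(1, "DS"), (2, "DBMS")]                                   # (Id, Course)

# Every tuple of A is concatenated with every tuple of B.
A_x_B = [a + b for a, b in product(A, B)]

print(len(A_x_B))   # 6, i.e. n*m = 3*2
print(A_x_B[0])     # ('Ram', 14, 'M', 1, 'DS')
print(A_x_B[-1])    # ('Kim', 20, 'M', 2, 'DBMS')
```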

 Division Operation:

Table 1: STUDENT_SPORTS
ROLL_NO  SPORTS
------------------
1        Badminton
2        Cricket
2        Badminton
4        Badminton

Table 2: ALL_SPORTS
SPORTS
------------------
Badminton
Cricket

Division operator A÷B can be applied if and only if:

• The attributes of B are a proper subset of the attributes of A.
• The relation returned by the division operator will have attributes = (All attributes of A – All
  attributes of B).
• The relation returned by the division operator will contain those tuples from relation A which are
  associated with every tuple of B.
Consider the relations STUDENT_SPORTS and ALL_SPORTS given in Table 1 and Table 2 above.
To apply the division operator as
STUDENT_SPORTS ÷ ALL_SPORTS

• The operation is valid as the attributes of ALL_SPORTS are a proper subset of the attributes of
  STUDENT_SPORTS.
• The attributes of the resulting relation will be {ROLL_NO, SPORTS} −
  {SPORTS} = {ROLL_NO}.
• The tuples of the resulting relation will be those ROLL_NO values which are associated with all of B's
  tuples {Badminton, Cricket}. ROLL_NO 1 and 4 are associated with Badminton only. ROLL_NO
  2 is associated with all tuples of B. So the resulting relation will be:

ROLL_NO
-------
2
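The division above can be checked with a short sketch. It assumes the simple case used in these notes, a two-attribute relation A(x, y) divided by a one-attribute relation B(y), with both held as Python sets of tuples; the function name divide is illustrative.

```python
student_sports = {(1, "Badminton"), (2, "Cricket"), (2, "Badminton"), (4, "Badminton")}
all_sports = {("Badminton",), ("Cricket",)}

def divide(a, b):
    """A ÷ B for A(x, y) and B(y): the x values that are paired with every y in B."""
    required = {y for (y,) in b}
    candidates = {x for (x, _) in a}
    return {
        (x,)
        for x in candidates
        if required <= {y for (x2, y) in a if x2 == x}   # x covers all required y values
    }

result = divide(student_sports, all_sports)
print(result)   # {(2,)}
```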

 The Natural Join Operation ( )


 The natural join operation simplifies such type of queries. It combines following three operations
into one operation. The natural join operation:
  • Forms a Cartesian product of its argument relations,
  • Performs a selection for equality on the common attributes to remove
    unnecessary tuples, and
  • Removes duplicate attributes.
• The natural join is denoted by the symbol ⋈ (JOIN).
• The notation for this operation is:
  Relation1 ⋈ Relation2
Example: Combine only consistent information from the Account and Branch relations.

As shown in figure 3.11, the natural join operation yields only consistent and useful information. It removes
unnecessary tuples as well as duplicate attributes. This makes the retrieval of information from multiple
relations easy and convenient.
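The three steps of the natural join can be written out directly. Since figure 3.11 is not reproduced here, the Account and Branch rows below are invented sample data, and the function natural_join is an illustrative sketch that assumes each relation is a non-empty list of dictionaries with the same keys in every row.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]        # attributes shared by both relations
    result = []
    for t1 in r:                                   # step 1: Cartesian product
        for t2 in s:
            if all(t1[a] == t2[a] for a in common):   # step 2: equality on common attributes
                result.append({**t1, **t2})           # step 3: common attributes kept once
    return result

# Hypothetical sample data.
account = [
    {"acc_no": "A-101", "branch_name": "Downtown", "balance": 500},
    {"acc_no": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"acc_no": "A-103", "branch_name": "Uptown", "balance": 900},
]
branch = [
    {"branch_name": "Downtown", "city": "Brooklyn"},
    {"branch_name": "Perryridge", "city": "Horseneck"},
    {"branch_name": "Redwood", "city": "Palo Alto"},
]

joined = natural_join(account, branch)
for row in joined:
    print(row)
# Only the accounts whose branch_name also appears in branch survive (A-101 and A-102).
```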
 Outer Join Operation :
 An extension of the join operation that avoids loss of information.
 Computes the join and then adds tuples form one relation that does not match tuples in the other
relation to the result of the join.
 Uses null values:
 null signifies that the value is unknown or does not exist
 All comparisons involving null are (roughly speaking) false by definition.
 We shall study precise meaning of comparisons with nulls later
 Table name: Client
NAME    ID
------------
Rahul   10
Vishal  20

• Table name: Salesman

ID  CITY
------------
30  Bombay
20  Madras
40  Bombay

• Join: Client ⋈ Salesman
NAME    ID  CITY
------------------
Vishal  20  Madras
 The outer join operation can be divided into three different forms :

 Left outer join ( )


 Right outer join ( )
 Full outer join ( )

 Left outer join ( )


 The left outer join retains all the tuples of the left relation even though there is no
matching tuple in the right relation.
For such a tuple, the attributes of the right relation are padded with null in the resultant relation.

• Left outer join: Client ⟕ Salesman

NAME    ID  CITY
------------------
Rahul   10  Null
Vishal  20  Madras
 Right outer join ( )
 The right outer join retains all the tuples of the right relation even though there is no
matching tuple in the left relation.
 For such kind of tuple, the attributes of left relation will be padded with null in resultant
relation.
 Right outer join client salesman
NAME    ID  CITY
------------------
Null    30  Bombay
Vishal  20  Madras
Null    40  Bombay
 Full outer join ( )
 The full outer join retains all the tuples of both of the relations. It also pads null
values whenever required.
 Full outer join client salesman
NAME    ID  CITY
------------------
Rahul   10  Null
Null    30  Bombay
Vishal  20  Madras
Null    40  Bombay
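The three outer joins on Client and Salesman can be reproduced with a few lines of Python. This is a sketch rather than a general implementation: it assumes Client is a list of (NAME, ID) tuples, Salesman is a list of (ID, CITY) tuples, the join is on ID, and None stands in for null.

```python
client = [("Rahul", 10), ("Vishal", 20)]                        # (NAME, ID)
salesman = [(30, "Bombay"), (20, "Madras"), (40, "Bombay")]     # (ID, CITY)

def left_outer(client, salesman):
    """Keep every Client tuple; pad CITY with None when no Salesman matches."""
    out = []
    for name, cid in client:
        cities = [city for sid, city in salesman if sid == cid]
        out.extend((name, cid, city) for city in cities or [None])
    return out

def right_outer(client, salesman):
    """Keep every Salesman tuple; pad NAME with None when no Client matches."""
    out = []
    for sid, city in salesman:
        names = [name for name, cid in client if cid == sid]
        out.extend((name, sid, city) for name in names or [None])
    return out

def full_outer(client, salesman):
    """Left outer join plus the Salesman tuples that matched no Client."""
    client_ids = {cid for _, cid in client}
    unmatched = [row for row in right_outer(client, salesman) if row[1] not in client_ids]
    return left_outer(client, salesman) + unmatched

print(left_outer(client, salesman))
# [('Rahul', 10, None), ('Vishal', 20, 'Madras')]
print(right_outer(client, salesman))
# [(None, 30, 'Bombay'), ('Vishal', 20, 'Madras'), (None, 40, 'Bombay')]
print(full_outer(client, salesman))
# [('Rahul', 10, None), ('Vishal', 20, 'Madras'), (None, 30, 'Bombay'), (None, 40, 'Bombay')]
```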

Entity – Relationship Modeling


The Entity – Relationship Model: It is a detailed logical representation of the data for an
organization or for a business area.
• The E – R Model is expressed in terms of entities, relationships, and attributes in the business
  environment.
• An E – R Model is expressed as an entity – relationship diagram (E – R Diagram), which is a
  graphical representation of an E – R Model.
E – R Model Notation:-

[Figure: E – R notation symbols for Strong Entity, Weak Entity, Associative Entity, Relationship,
Identifying Relationship, Attribute, Multivalued Attribute and Derived Attribute.]

The E – R Model Construct


The E – R Model is constructed from the following building blocks:
A. Entities
B. Attributes
C. Relationships

A. Entities: An entity is a person, place, object, event or concept in the user environment
about which the organization wishes to maintain data.

Example:
Person: Employee, Student, Patient.
Place: City, State, Country.
Object: Machine, Building, Automobile.
Event: Sale, Registration, Renewal.
Concept: Account, Course.

Entity Types & Entity Instance:


An entity type is a collection of entities that share common properties or characteristics. We use
capital letters for names of entity types.
• An entity instance is a single occurrence of an entity type.
• An entity type is described just once (using metadata in a database), while many instances of
  that entity type may be represented by data stored in the database.
Example:
There is one EMPLOYEE entity type in most organizations but there may be hundreds of instances of
this entity type stored in the database.
Strong Entity & Weak Entity Type
A strong entity type is one that exists independently of other entity types.
Example: STUDENT, EMPLOYEE, AUTOMOBILE & COURSE.
• Instances of a strong entity type always have a unique characteristic (identifier).
• A weak entity type is an entity type whose existence depends on some other entity type.
• The entity type on which the weak entity type depends is called the "identifying owner".
• A weak entity type does not have its own identifier.
Example: EMPLOYEE is a strong entity type with identifier employee_id. DEPENDENT is a weak
entity type, as indicated by the double-lined rectangle.
• The relationship between a weak entity type and its owner is called an identifying relationship.
• "HAS" is the identifying relationship (indicated by the double-lined diamond symbol).
• The attribute dependent name serves as a partial identifier. Dependent name is a composite attribute
  that can be broken down into component parts.

Figure. Example of a weak entity (EMPLOYEE as owner).
Characteristic entities
Characteristic entities provide more information about another table. These entities have the
following characteristics:
• They represent multivalued attributes.
• They describe other entities.
• They typically have a one to many relationship.
• The foreign key is used to further identify the characterized table.
• Options for primary key are as follows:
  1. Use a composite of foreign key plus a qualifying column
  2. Create a new simple primary key.
In the COMPANY database, these might include:
• Employee (EID, Name, Address, Age, Salary) – EID is the simple primary key.
• EmployeePhone (EID, Phone) – EID is part of a composite primary key. Here, EID is also a foreign
  key.

Attributes
An attribute is a descriptive property or characteristic of an entity. The attributes of the entity Customer are
CustNo, Name, Street, City, PostCode, TelNo and Balance.

Types of Attributes
There are a few types of attributes you need to be familiar with. Some of these are to be left as
is, but some need to be adjusted to facilitate representation in the relational model. This first section
will discuss the types of attributes. Later on we will discuss fixing the attributes to fit correctly into
the relational model.
Simple attributes
A simple attribute is an attribute that cannot be broken down into smaller components.
Ex: The attributes of AUTOMOBILE are simple: Vehicle_id, Color, Weight.
Composite attributes
A composite attribute is an attribute that can be broken down into different components.
Ex: ADDRESS, with components such as Street, Number, SubStreet, State, Postcode.

Figure. An example of composite attributes.


Multivalued attributes
Multivalued attributes are attributes that may take more than one value for a given entity
instance.
• We indicate a multivalued attribute with an ellipse with double lines.
Ex: An example of a multivalued attribute from the COMPANY database, as seen in the figure, is the
degrees of an employee: BSc, MIT, PhD.

Figure. Example of a multivalued attribute.


Derived attributes
Derived attributes are attributes that contain values calculated from other attributes. An
example of this can be seen in Figure 8.5. Age can be derived from the attribute Birthdate. In this
situation, Birthdate is called a stored attribute, which is physically saved to the database.

Figure. Example of a derived attribute.
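A derived attribute is computed when needed instead of being stored. The sketch below derives Age from the stored Birthdate; the function name age_on and the sample dates are illustrative assumptions.

```python
from datetime import date

def age_on(birthdate, today):
    """Age in completed years, derived from the stored Birthdate attribute."""
    years = today.year - birthdate.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

birthdate = date(2000, 6, 15)                 # stored attribute
print(age_on(birthdate, date(2024, 6, 14)))   # 23 (the day before the birthday)
print(age_on(birthdate, date(2024, 6, 15)))   # 24 (on the birthday)
```

Because Age changes over time while Birthdate does not, storing only Birthdate avoids having to update every row each year.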

RELATIONSHIPS
A relationship is an association among the instances of one or more entity types that is of interest to the
organization.

Relationship Cardinality or Relationship Types


It is a meaningful association between entity types. A relationship is denoted by a diamond symbol
containing the name of the relationship.
Ex: EMPLOYEE (Emp_id, EMP_Name, other attributes) Completes COURSE (Course_id, Course_Title).

There are three main types of relationship that can exist between entities:
i. one-to-one relationship
ii. one-to-many relationship
iii. many-to-many relationship
i. one-to-one relationship: A one to one (1:1) relationship is the relationship of one entity to
only one other entity, and vice versa. It should be rare in any relational database design. In
fact, it could indicate that two entities actually belong in the same table.

Explanation:
An Order generates only one invoice and an Invoice is generated by an order.
ii. one-to-many relationship: A one to many (1:M) relationship should be the norm in any
relational database design and is found in all relational database environments. For example,
one customer makes many orders.

Explanation:
Each Customer can make one or more orders and an Order is from one customer.
iii. many-to-many relationship: For a many to many relationship, consider the following points:
• It cannot be implemented as such in the relational model.
• It can be changed into two 1:M relationships.
• It can be implemented by breaking up to produce a set of 1:M relationships.
• It involves the implementation of a composite entity.
• Creates two or more 1:M relationships.
• The composite entity table must contain at least the primary keys of the original tables.
• The linking table contains multiple occurrences of the foreign key values.
• Additional attributes may be assigned as needed.

Explanation:
An Order has one or more product and a Product can be in one or more orders.
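Breaking an M:N relationship into two 1:M relationships through a linking table can be sketched with Python's built-in sqlite3 module. The table and column names (ORDERS, PRODUCT, ORDER_LINE, QUANTITY) and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ORDERS  (ORDER_ID INT PRIMARY KEY);
    CREATE TABLE PRODUCT (PRODUCT_ID INT PRIMARY KEY, NAME TEXT);

    -- Composite (linking) entity: holds the primary keys of both original tables.
    CREATE TABLE ORDER_LINE (
        ORDER_ID   INT REFERENCES ORDERS (ORDER_ID),
        PRODUCT_ID INT REFERENCES PRODUCT (PRODUCT_ID),
        QUANTITY   INT,                        -- additional attribute of the relationship
        PRIMARY KEY (ORDER_ID, PRODUCT_ID)
    );

    INSERT INTO ORDERS VALUES (1), (2);
    INSERT INTO PRODUCT VALUES (10, 'Pen'), (20, 'Book');
    INSERT INTO ORDER_LINE VALUES (1, 10, 3), (1, 20, 1), (2, 10, 5);
    """
)

# One order has many products ...
products_in_order_1 = [
    row[0]
    for row in conn.execute(
        "SELECT P.NAME FROM ORDER_LINE L "
        "JOIN PRODUCT P ON P.PRODUCT_ID = L.PRODUCT_ID "
        "WHERE L.ORDER_ID = 1 ORDER BY P.NAME"
    )
]
# ... and one product appears in many orders.
orders_with_pen = [
    row[0]
    for row in conn.execute(
        "SELECT ORDER_ID FROM ORDER_LINE WHERE PRODUCT_ID = 10 ORDER BY ORDER_ID"
    )
]

print(products_in_order_1)   # ['Book', 'Pen']
print(orders_with_pen)       # [1, 2]
```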

Degree of Relationship Type


The number of participating entity types in a relationship is known as the degree of a relationship type.
The three degrees of relationship are listed below.
a. Unary (Degree – 1)
b. Binary (Degree – 2)
c. Ternary (Degree – 3)
a. Unary (Degree – 1): It is a relationship between the instances of a single entity type. Unary
relationships are also called recursive relationships.

b. Binary (Degree – 2): It is a relationship between the instances of two entity types and is the
most common type of relationship in data modeling. This relationship has three types.
i. One to one:

EMPLOYEE --- Is assigned to --- PARKING PLACE

• It indicates that an employee is assigned one parking place, and each parking place is assigned to
  one employee.
ii. One to Many:

PRODUCT LINE --- Is assigned --- PRODUCT

• It indicates that a product line may contain several products, and each product belongs to only
  one product line.
iii. Many to Many:

STUDENT --- Registers for --- COURSE

• It indicates that a student may register for more than one course, and that each course may have
  many student registrations.
c. Ternary (Degree – 3): A ternary relationship is a simultaneous relationship among the instances of three
entity types.

Developing an E – R diagram:
Entity Relationship diagrams are a major data modeling tool. They help organize the data in our
project into entities and define the relationships between the entities.
Components of ERD: There are four Components, they are
i. Entity
ii. Relationship
iii. Cardinality
iv. Attribute
i. Entity: A data Entity is anything real or abstract about which we want to store data.
Ex: EMPLOYEE: Employee_id, Employee_name, Address
PAYMENT: Payment_id, Payment_Type.
BOOKS: Book_id, Book_Type.
ii. Relationship: A relationship is a natural association that exists between one or more
entities.
Ex: Employee processes Payment.
iii. Cardinality: Defines the number of occurrences of one entity for a single occurrence of the
related entity.
Ex: An Employee may process many Payments but might not process any Payments,
depending on the nature of his / her job.
iv. Attribute: A data attribute is a characteristic common to all or most instances of a
particular entity.
Ex: Name, Employee_No are all attributes of the entity “EMPLOYEE”

A Simple Example for E – R Diagram

Here we are going to design an Entity Relationship (ER) model for a college database .
Say we have the following statements.
1. A college contains many departments
2. Each department can offer any number of courses
3. Many instructors can work in a department
4. An instructor can work only in one department
5. For each department there is a Head
6. An instructor can be head of only one department
7. Each instructor can take any number of courses
8. A course can be taken by only one instructor
9. A student can enroll for any number of courses
10. Each course can have any number of students
Good to go. Let's start our design.(Remember our previous topic and the notations we have
used for entities, attributes, relations etc )
Step 1 : Identify the Entities
What are the entities here?
From the statements given, the entities are
1. Department
2. Course
3. Instructor
4. Student
Step 2 : Identify the relationships
1. One department offers many courses. But one particular course can be offered by only
one department. hence the cardinality between department and course is One to Many
(1:N)
2. One department has multiple instructors . But instructor belongs to only one
department. Hence the cardinality between department and instructor is One to Many
(1:N)
3. One department has only one head and one head can be the head of only one
department. Hence the cardinality is one to one. (1:1)
4. One course can be enrolled by many students and one student can enroll for many
courses. Hence the cardinality between course and student is Many to Many (M:N)
5. One course is taught by only one instructor. But one instructor teaches many courses.
Hence the cardinality between course and instructor is Many to One (N :1)
Step 3: Identify the key attributes
• "Department_Name" can identify a department uniquely. Hence Department_Name is
  the key attribute for the Entity "Department".
• Course_ID is the key attribute for "Course" Entity.
• Student_ID is the key attribute for "Student" Entity.
• Instructor_ID is the key attribute for "Instructor" Entity.
Step 4: Identify other relevant attributes
• For the department entity, the other attribute is location
• For the course entity, the other attributes are course_name, duration
• For the instructor entity, the other attributes are first_name, last_name, phone
• For the student entity, the other attributes are first_name, last_name, phone
Step 5: Draw complete ER diagram
By connecting all these details, we can now draw ER diagram as given below.
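One possible way to carry the entities, keys and cardinalities from Steps 1 to 4 over to relational tables is sketched below with Python's built-in sqlite3 module. The mapping choices are assumptions, not the only valid design: each 1:N relationship becomes a foreign key on the "many" side, the 1:1 head relationship becomes a UNIQUE foreign key in DEPARTMENT, and the M:N enrolment becomes a separate table (here named ENROLLMENT). Column data types are also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE INSTRUCTOR (
        INSTRUCTOR_ID   INT PRIMARY KEY,
        FIRST_NAME      TEXT,
        LAST_NAME       TEXT,
        PHONE           TEXT,
        DEPARTMENT_NAME TEXT REFERENCES DEPARTMENT (DEPARTMENT_NAME)  -- works in (N:1)
    );
    CREATE TABLE DEPARTMENT (
        DEPARTMENT_NAME TEXT PRIMARY KEY,
        LOCATION        TEXT,
        HEAD_ID         INT UNIQUE REFERENCES INSTRUCTOR (INSTRUCTOR_ID)  -- head (1:1)
    );
    CREATE TABLE COURSE (
        COURSE_ID       INT PRIMARY KEY,
        COURSE_NAME     TEXT,
        DURATION        TEXT,
        DEPARTMENT_NAME TEXT REFERENCES DEPARTMENT (DEPARTMENT_NAME),  -- offered by (N:1)
        INSTRUCTOR_ID   INT REFERENCES INSTRUCTOR (INSTRUCTOR_ID)      -- taught by (N:1)
    );
    CREATE TABLE STUDENT (
        STUDENT_ID INT PRIMARY KEY,
        FIRST_NAME TEXT,
        LAST_NAME  TEXT,
        PHONE      TEXT
    );
    CREATE TABLE ENROLLMENT (                                           -- enrolls (M:N)
        STUDENT_ID INT REFERENCES STUDENT (STUDENT_ID),
        COURSE_ID  INT REFERENCES COURSE (COURSE_ID),
        PRIMARY KEY (STUDENT_ID, COURSE_ID)
    );
    """
)

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)   # ['COURSE', 'DEPARTMENT', 'ENROLLMENT', 'INSTRUCTOR', 'STUDENT']
```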

UNIT-II: DATABASE INTEGRITY AND
NORMALISATION

Relational Database Integrity - The Keys - Referential Integrity - Entity Integrity -


Redundancy and Associated Problems – Single Valued Dependencies – Normalisation -
Rules of Data Normalisation - The First Normal Form - The Second Normal Form - The
Third Normal Form - Boyce Codd Normal Form - Attribute Preservation - Lossless-join
Decomposition - Dependency Preservation.

File Organization: Physical Database Design Issues - Storage of Database on Hard Disks -
File Organization and Its Types – Heap files (Unordered files) - Sequential File
Organization - Indexed (Indexed Sequential) File Organization - Hashed File Organization
- Types of Indexes - Index and Tree Structure - Multi-Key File Organization - Need for
Multiple Access Paths - Multi-list File Organization - Inverted File Organization.

Data redundancy and associated problems
DBMS Data Redundancy:

It refers to the situation when the same data exists in more than one entity. It may also refer to the fact
that unnecessary or duplicated data is stored at different locations in the database.

Problems caused by Data Redundancy:

There are three types of anomalies that occur when the database is not normalized. These are
insertion, update and deletion anomalies. Let's take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table named employee that
has four attributes: emp_id for storing the employee's id, emp_name for storing the employee's name,
emp_address for storing the employee's address and emp_dept for storing the details of the department in which the
employee works. At some point of time the table looks like this:

emp_id  emp_name  emp_address  emp_dept
-----------------------------------------
101     Rick      Delhi        D001
101     Rick      Delhi        D002
123     Maggie    Agra         D890
166     Glenn     Chennai      D900
166     Glenn     Chennai      D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the same in
two rows or the data will become inconsistent. If somehow, the correct address gets updated in one
department but not in other then as per the database, Rick would be having two different addresses, which
is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not
assigned to any department. Then we would not be able to insert the data into the table if the emp_dept field
doesn't allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the
rows that are having emp_dept as D890 would also delete the information of employee Maggie since she
is assigned only to this department.
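The update and delete anomalies can be reproduced on the employee table with plain Python. This is a small sketch: the rows are held as tuples, the new address "Mumbai" is invented for the example, and inconsistent_ids is an illustrative helper.

```python
employee = [
    (101, "Rick", "Delhi", "D001"),
    (101, "Rick", "Delhi", "D002"),
    (123, "Maggie", "Agra", "D890"),
    (166, "Glenn", "Chennai", "D900"),
    (166, "Glenn", "Chennai", "D004"),
]

def inconsistent_ids(rows):
    """emp_ids that appear with more than one address."""
    addresses = {}
    for emp_id, _, address, _ in rows:
        addresses.setdefault(emp_id, set()).add(address)
    return sorted(emp_id for emp_id, found in addresses.items() if len(found) > 1)

# Update anomaly: Rick's address is changed in only one of his two rows.
partially_updated = [
    (emp_id, name, "Mumbai" if (emp_id, dept) == (101, "D001") else address, dept)
    for emp_id, name, address, dept in employee
]

# Delete anomaly: closing department D890 also removes everything known about Maggie.
after_closing_d890 = [row for row in employee if row[3] != "D890"]
remaining_ids = {row[0] for row in after_closing_d890}

print(inconsistent_ids(employee))            # []
print(inconsistent_ids(partially_updated))   # [101]
print(123 in remaining_ids)                  # False
```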

To overcome these anomalies we need to normalize the data. In the next section we will discuss about
normalization.

Functional dependency in DBMS


The attributes of a table are said to be dependent on each other when an attribute of a table uniquely
identifies another attribute of the same table.
For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age. Here the Stu_Id
attribute uniquely identifies the Stu_Name attribute of the student table, because if we know the student id we
can tell the student name associated with it. This is known as functional dependency and can be written as
Stu_Id->Stu_Name, or in words we can say Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of the same table, then it can be represented as A->B
(attribute B is functionally dependent on attribute A).
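Whether a dependency A->B holds in a given set of rows can be tested mechanically: no two rows may agree on A and differ on B. The sketch below assumes rows are Python dictionaries; the function name holds and the three sample students are illustrative. A check on sample data can only refute a dependency, it cannot prove that the dependency holds for every possible state of the table.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in this set of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows for the student table.
students = [
    {"Stu_Id": 1, "Stu_Name": "Ram", "Stu_Age": 18},
    {"Stu_Id": 2, "Stu_Name": "Ramesh", "Stu_Age": 18},
    {"Stu_Id": 3, "Stu_Name": "Ram", "Stu_Age": 20},
]

print(holds(students, ["Stu_Id"], ["Stu_Name"]))    # True
print(holds(students, ["Stu_Name"], ["Stu_Age"]))   # False: "Ram" appears with ages 18 and 20
```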
Types of Functional Dependencies

• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency
1. Trivial functional dependency in DBMS with example
The dependency of an attribute on a set of attributes is known as trivial functional dependency if the set of
attributes includes that attribute.
Symbolically: A ->B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A->A & B->B
For example: Consider a table with two columns Student_id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as Student_Id is a subset of
{Student_Id, Student_Name}. That makes sense because if we know the values of Student_Id and
Student_Name then the value of Student_Id can be uniquely determined.
Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial dependencies too.
2. Non trivial functional dependency in DBMS
If a functional dependency X->Y holds true where Y is not a subset of X then this dependency is called
non trivial Functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address.

The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
On the other hand, the following dependency is trivial:
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
Completely non-trivial FD:
If an FD X -> Y holds where the intersection of X and Y is empty, the dependency is said to be a
completely non-trivial functional dependency.
3. Multivalued dependency in DBMS
Multivalued dependency occurs when there is more than one independent multivalued attribute in a
table.
For example: Consider a bike manufacturing company which produces two colors (Black and Red) of
each model every year.

bike_model   manuf_year   color
M1001        2007         Black
M1001        2007         Red
M2012        2008         Black
M2012        2008         Red
M2222        2009         Black
M2222        2009         Red

Here the columns manuf_year and color are independent of each other and dependent on bike_model. In this
case these two columns are said to be multivalued dependent on bike_model. These dependencies can be
represented like this:
bike_model ->> manuf_year
bike_model ->> color
4. Transitive dependency in DBMS
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
For e.g.
X -> Z is a transitive dependency if the following three functional dependencies hold true:
 X->Y
 Y does not ->X
 Y->Z

Note: A transitive dependency can only occur in a relation of three or more attributes. This dependency
helps us normalize the database to 3NF (3rd Normal Form).
Example: Let's take an example to understand it better:

Book                 Author                Author_age
Game of Thrones      George R. R. Martin   66
Harry Potter         J. K. Rowling         49
Dying of the Light   George R. R. Martin   66

{Book} -> {Author} (if we know the book, we know the author's name)
{Author} does not -> {Book}
{Author} -> {Author_age}
Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That makes
sense because if we know the book name we can know the author's age.

KEYS
Keys: A key is an attribute or combination of attributes which can be used to uniquely identify
each row of data in a table.

Sno   Sname   AadharNo     SEmail
101   Ramu    123456789    abc@gmail.com
102   Rani    987654321    xyz@gmail.com
103   Ramu    4546321789   mno@gmail.com

• Sno, AadharNo and SEmail are keys.
Why use keys in DBMS
 To identify each row of a table uniquely.
 To maintain data integrity.
 To maintain relationship between tables.
Types of Keys
There are different types of keys available in an RDBMS. They are
1. Super Key
2. Candidate Key
3. Primary Key
4. Alternate Key
5. Foreign Key
6. Composite Key
7. Compound Key
8. Surrogate Key
1. Super Key or Key:
A super key is an attribute or combination of attributes that uniquely identifies each row in a
table.
Example:

Sno   Sname   AadharNo     SEmail
101   Ramu    123456789    abc@gmail.com
102   Rani    987654321    xyz@gmail.com
103   Ramu    4546321789   mno@gmail.com

The attributes of the above table are Sno, Sname, AadharNo and SEmail.
Super keys for the above table include
Sno
AadharNo
SEmail
Sno+AadharNo
Sno+SEmail
AadharNo+SEmail
Sno+AadharNo+SEmail
Adding Sname to any of these also gives a super key, because any superset of a super key is still a
super key.
2. Candidate Key:
• It is a minimal super key, i.e. a super key from which no attribute can be removed without losing
uniqueness.
• Minimal means "as few attributes as possible".
• Like every key, a candidate key is an attribute or combination of attributes that uniquely identifies
each row in a table.
Example:

Sno   Sname   AadharNo     SEmail
101   Ramu    123456789    abc@gmail.com
102   Rani    987654321    xyz@gmail.com
103   Ramu    4546321789   mno@gmail.com

Super keys for the above table include
Sno
AadharNo
SEmail
Sno+AadharNo
Sno+SEmail
AadharNo+SEmail
Sno+AadharNo+SEmail
Candidate keys from the above table:
A candidate key uses the minimum attributes of a super key. So in the above table Sno, AadharNo and
SEmail are candidate keys. The combinations of attributes are not candidate keys, because they contain
more attributes than are needed to identify a row.
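The relationship between super keys and candidate keys can be worked out by brute force for a small table. The sketch below is an illustration only, using the sample rows from the example above; the function name is_super_key is an assumption, and checking uniqueness against three sample rows is a simplification of what a designer would decide from the meaning of the data.

```python
from itertools import combinations

columns = ["Sno", "Sname", "AadharNo", "SEmail"]
rows = [
    (101, "Ramu", "123456789", "abc@gmail.com"),
    (102, "Rani", "987654321", "xyz@gmail.com"),
    (103, "Ramu", "4546321789", "mno@gmail.com"),
]


def is_super_key(attrs):
    """True if no two sample rows agree on all attributes in attrs."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(rows)


# Every non-empty attribute combination that identifies rows uniquely.
super_keys = [
    set(c)
    for n in range(1, len(columns) + 1)
    for c in combinations(columns, n)
    if is_super_key(c)
]

# A candidate key is a super key that contains no smaller super key.
candidate_keys = [k for k in super_keys if not any(s < k for s in super_keys)]

print(len(super_keys))                             # 14
print(sorted(sorted(k) for k in candidate_keys))   # [['AadharNo'], ['SEmail'], ['Sno']]
```

Only {Sname} fails the uniqueness test here, because two students are named Ramu.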
3. Primary Key:
A primary key is the candidate key that is chosen to uniquely identify each row of a table.
Example:

Sno   Sname   AadharNo     SEmail
101   Ramu    123456789    abc@gmail.com
102   Rani    987654321    xyz@gmail.com
103   Ramu    4546321789   mno@gmail.com

The candidate keys of the above table are
Sno
AadharNo
SEmail
Any one of the above candidate keys can be chosen as the primary key.
4. Alternate Key:
The candidate keys which are not chosen as the primary key are called alternate keys.
Example:

Sno   Sname   AadharNo     SEmail
101   Ramu    123456789    abc@gmail.com
102   Rani    987654321    xyz@gmail.com
103   Ramu    4546321789   mno@gmail.com

The candidate keys of the above table are
Sno
AadharNo
SEmail
Any one of the above candidate keys can be chosen as the primary key. If we select Sno as the primary
key, then the remaining candidate keys (AadharNo, SEmail) are alternate keys.
5. Foreign Key:
• It is used to maintain the relationship between two tables.
• A foreign key must refer to the primary key of the same table or of another table.
6. Composite Key:
A key made up of a combination of two or more attributes that together uniquely identify each row in
a table.
7. Compound Key:
If a composite key contains an attribute that is a foreign key, that key is called a
compound key.

Normalization
Normalization is the process of dividing a relation (table) into smaller, well-structured relations
(tables) in order to remove redundancy.
• It removes or reduces redundancy and eliminates insertion, update and deletion anomalies.
• It divides larger tables into smaller tables and links them using relationships.
• It was proposed by E.F. Codd.
Why Normalization is Needed
If a table is not properly normalized and its data is redundant, it takes extra memory space and the
database becomes difficult to handle and update without risking data loss.
The following normal forms are used in an RDBMS:
1. 1st Normal Form (1st NF)
2. 2nd Normal Form (2nd NF)
3. 3rd Normal Form (3rd NF)
4. BCNF
5. 4th Normal Form (4th NF)

1st NORMAL FORM (1st NF)


For a table to be in 1NF, it should follow the following 2 rules.
1. It should have single valued attributes.
2. Each record should be unique.

EmpNO   EmpName   Subject
1       Sri       C, C++
2       Ram       Java
3       Abhi      C, Java

• In the above table EmpNO, EmpName and Subject are the attributes. The Subject attribute holds
multiple values in some rows, but according to the 1NF rules every attribute must hold a single
value.
• Applying the 1NF rules to the above table, we can redesign it as below.

EmpNO   EmpName   Subject
1       Sri       C
1       Sri       C++
2       Ram       Java
3       Abhi      C
3       Abhi      Java

Note: Another way of applying the 1NF rules is to use a separate column for each value, as shown below.
This leaves NULLs where a row has fewer values and fixes the maximum number of subjects.

EmpNO   EmpName   Subject1   Subject2
1       Sri       C          C++
2       Ram       Java       Null
3       Abhi      C          Java

2nd Normal Forms (2nd NF)


A relation (table) is said to be in 2NF if it follows the following rules.
1. The relation (table) should be in 1NF.
2. All non-key attributes are fully functionally dependent on the entire primary key (P.K.) and not on
a part of the P.K., i.e. any partial dependencies have been removed.

Sid   Sname   Saddress   SubjectID   SubjectName   Marks
01    Ramu    KKT        101         C             75
02    Ramu    KKT        102         C++           80
03    Rani    WNP        101         C             65
04    Rani    WNP        102         C++           73

In the above table Sid, Sname, Saddress, SubjectID, SubjectName and Marks are the attributes. The
table already follows the 1NF rules, but it does not follow the second rule of 2NF. Taking
{Sid, SubjectID} as the primary key, SubjectName depends only on SubjectID, which is just a part of the
key. This is a partial functional dependency, so we divide the table as follows to satisfy the 2NF
rules.
Table1:

Sid   Sname   Saddress   SubjectID   Marks
01    Ramu    KKT        101         75
02    Ramu    KKT        102         80
03    Rani    WNP        101         65
04    Rani    WNP        102         73

Table2:

SubjectID   SubjectName
101         C
102         C++

3rd Normal Forms (3rd NF)
A relation (table) is said to be in 3NF if and only if it follows the rules below.
• The relation should be in 2NF.
• It should not contain any transitive dependency, i.e. every transitive dependency in the table must be
removed.
Example:

EmpId   EmpName   EmpCity   EmpZip
101     Ram       WNP       509103
102     Rani      KKT       509381
103     Ram       HYD       500008
104     Raju      MBNR      509001

The above table follows 1NF and 2NF. To identify the transitive dependency:
Super keys: {EmpId}, {EmpId, EmpZip}, {EmpId, EmpName}, {EmpId, EmpZip… etc.
Candidate key: {EmpId}
Primary key: {EmpId}
Functional dependencies: EmpId -> EmpZip
                         EmpZip -> EmpCity
Then EmpId -> EmpCity is a transitive dependency.
To remove the transitive dependency, redesign the table as follows.
Table1:

EmpId   EmpName   EmpZip
101     Ram       509103
102     Rani      509381
103     Ram       500008
104     Raju      509001

Table2:

EmpCity   EmpZip
WNP       509103
KKT       509381
HYD       500008
MBNR      509001
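A decomposition like the one above should not lose information: joining Table1 and Table2 on EmpZip should give back the original rows. The sketch below checks this with Python's built-in sqlite3 module. It is an illustration under assumptions, not part of the original notes: the table names emp, emp_main and zip_city are made up, and SQLite types are used in place of Oracle types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (EmpId INTEGER PRIMARY KEY, EmpName TEXT, EmpCity TEXT, EmpZip TEXT);
INSERT INTO emp VALUES (101, 'Ram',  'WNP',  '509103'),
                       (102, 'Rani', 'KKT',  '509381'),
                       (103, 'Ram',  'HYD',  '500008'),
                       (104, 'Raju', 'MBNR', '509001');
-- Table1: attributes that depend directly on EmpId
CREATE TABLE emp_main AS SELECT EmpId, EmpName, EmpZip FROM emp;
-- Table2: the dependency EmpZip -> EmpCity, stored once per zip code
CREATE TABLE zip_city AS SELECT DISTINCT EmpZip, EmpCity FROM emp;
""")

original = conn.execute(
    "SELECT EmpId, EmpName, EmpCity, EmpZip FROM emp ORDER BY EmpId"
).fetchall()

# Joining the two smaller tables on EmpZip rebuilds the original rows.
rejoined = conn.execute("""
    SELECT m.EmpId, m.EmpName, z.EmpCity, m.EmpZip
    FROM emp_main m JOIN zip_city z ON m.EmpZip = z.EmpZip
    ORDER BY m.EmpId
""").fetchall()

print(rejoined == original)  # True
```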

BCNF(Boyce Codd Normal form)
It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than
3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a
super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They
store the data like this

EmpID   EmpName   DeptName   DeptNo
E01     Ramu      CS         C01
E02     Rani      Maths      M01
E03     Ravi      Accounts   A01
E04     Rani      Botany     B01
E05     Sony      CS         C01
E06     Sandhya   Accounts   A01
E07     Sagar     Telugu     T01

Functional dependencies in the table above:
EmpID -> EmpName
DeptNo -> DeptName
Candidate key: {EmpID, DeptNo}
The table is not in BCNF, as neither EmpID nor DeptNo alone is a key.
To make the table comply with BCNF we can break it into three tables like this:

Table1:

EmpID   EmpName
E01     Ramu
E02     Rani
E03     Ravi
E04     Rani
E05     Sony
E06     Sandhya
E07     Sagar

Table2:

DeptNo   DeptName
C01      CS
M01      Maths
A01      Accounts
B01      Botany
T01      Telugu

Table3:

EmpID   DeptNo
E01     C01
E02     M01
E03     A01
E04     B01
E05     C01
E06     A01
E07     T01

UNIT 3
SQL (Structured Query Language)

Meaning – SQL commands - Data Definition Language - Data Manipulation Language -
Data Control Language - Transaction Control Language - Queries using Order by – Where
- Group by - Nested Queries. Joins – Views – Sequences - Indexes and Synonyms - Table
Handling.

SQL
 SQL stands for Structured Query Language. It is used for storing and managing data in Relational
Database Management System (RDBMS).
 It is a standard language for Relational Database System. It enables a user to create, read, update
and delete relational databases and tables.
 All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as their
standard database language.
 SQL allows users to query the database in a number of ways, using English-like statements.
Rules: SQL follows the following rules:
• SQL is not case sensitive. Generally, keywords of SQL are written in uppercase.
• SQL statements do not depend on text lines: a single SQL statement can be written on one or
multiple text lines.
• Using SQL statements, you can perform most of the actions in a database.
• SQL is based on tuple relational calculus and relational algebra.

SQL General Data Types


Each column in a database table is required to have a name and a data type. SQL developers have to
decide what types of data will be stored inside each and every table column when creating a SQL table.
The data type is a label and a guideline for SQL to understand what type of data is expected inside of each
column, and it also identifies how SQL will interact with the stored data. The following table lists the
general data types in SQL:

Data type     Description
CHAR(n)       Fixed length character string, with user-specified length n.
VARCHAR(n)    Variable length character string, with user-specified maximum length n.
NUMBER        Integer numerical (no decimal). Precision 10
DATE          Stores year, month, and day values. Syntax: DD/Mon/YY  Ex: 28/Aug/2016
FLOAT         Approximate numerical, mantissa precision 16
INTEGER       Integer numerical (no decimal). Precision 10

SQL Language Statements


 SQL commands are divided into 4 types. They are listed below.

1. DDL (Data Definition Language)


DDL : Data Definition Language (DDL) statements are used to define the database structure or
schema. Some examples:
• CREATE – It is used to create database objects in the database
• ALTER – It alters the structure of the database
• DROP – It is used to delete objects from the database
• TRUNCATE – It is used to remove all records from a table, including all space allocated for the
records
i. CREATE: SQL command that adds a new table or view to an SQL database. Tables are a basic
unit of organization and storage of data in SQL.
CREATE TABLE tablename (
    column1 datatype constraint,
    column2 datatype constraint,
    ...
    PRIMARY KEY (column_name or names),
    FOREIGN KEY (column_name) REFERENCES tablename (column_name)
);
Example
Create an employee table with Emp_ID, Emp_Name, Age and Phone_Num.
CREATE TABLE employee (Emp_ID NUMBER, Emp_Name VARCHAR2(20) NOT NULL,
Age NUMBER NOT NULL, Phone_Num NUMBER,
PRIMARY KEY (Emp_ID));

EMP_ID(PK)   EMP_NAME   AGE   PHONE_NUM

ii. ALTER: SQL command that can be used to add, modify, or drop a column of an existing table or
to rename a table. The ALTER statement is used to
1. Add new columns to the existing table
2. Modify existing column size or data type
3. Remove columns from the table
4. Drop constraints
5. Rename a table.
1. To add new columns
Syntax:
ALTER TABLE table_name ADD new_column_name DATATYPE;
Example:
ALTER TABLE employee ADD (Salary NUMBER);

EMP_ID(PK)   EMP_NAME   AGE   PHONE_NUM   SALARY

2. Modify columns
Syntax:
ALTER TABLE table_name MODIFY existing_column new_datatype(new_size);
Example:
ALTER TABLE employee MODIFY emp_name CHAR(25);
3. Remove columns from the table
ALTER TABLE existing_table_name DROP COLUMN name_of_the_column;
Example of removing column "Salary"
ALTER TABLE employee DROP COLUMN Salary;

EMP_ID(PK)   EMP_NAME   AGE   PHONE_NUM

4. Drop constraints
ALTER TABLE existing_table_name
DROP CONSTRAINT constraint_name;
Example of deleting the primary key constraint
ALTER TABLE employee DROP PRIMARY KEY;

EMP_ID   EMP_NAME   AGE   PHONE_NUM

This would remove the primary key constraint on the employee table.
5. Rename a table.
ALTER TABLE existing_table_name RENAME TO new_table_name;
Example of renaming a table:
ALTER TABLE employee RENAME TO emp;
This would rename the table EMPLOYEE to EMP.

iii. DROP: The SQL command that removes an entire table. The DROP TABLE statement is used to
remove a table from the database. The table definition, its constraints and permissions,
and all data stored in the table are deleted.
Syntax:
DROP TABLE table_name;
Ex: DROP TABLE emp;
Note: Before a table is dropped, the constraints that depend on it must be removed.
iv. TRUNCATE: The TRUNCATE command is used to delete all the rows from a table and free the
space they occupied.
Syntax:
TRUNCATE TABLE table_name;
Example: TRUNCATE TABLE stu;

2. DML (Data Manipulation Language)


DML is used to manipulate records in the table. The following are DML Commands
1. INSERT – To add new records into table
2. UPDATE – For correction or modification of any records in tables.
3. DELETE – To remove records from table
4. SELECT - To retrieve records from table

i. INSERT INTO:
It creates new record(s) in a table. Basic syntax of INSERT INTO:
INSERT INTO table_name
VALUES (value1, value2, value3, ..., valueN);
Example:
INSERT INTO emp (emp_id, emp_name, age, phone_num)
VALUES (1, 'Ramu', 25, 123456789);
We can insert records into specific columns as well. It is important to note that we can skip only those
columns which have been defined to support NULL values.

EMP_ID   EMP_NAME   AGE   PHONE_NUM
1        Ramu       25    123456789

To insert another row, we use the INSERT INTO query again.
INSERT INTO emp (emp_id, emp_name, age, phone_num)
VALUES (2, 'Rani', 18, 456321789);

EMP_ID   EMP_NAME   AGE   PHONE_NUM
1        Ramu       25    123456789
2        Rani       18    456321789

ii. SELECT statement: SELECT statement is used to retrieve data from the tables based on
various conditions.
1. Select all columns and data.
2. Retrieve selective columns using column names
3. Retrieve selective data from the table based on conditions
Syntax:
SELECT column_name1, column_name2,………
FROM table_name
WHERE condition;

1. Select all columns and data.
To retrieve the data of all the columns, use * as shown below:
SELECT * FROM Employee;

EMP_ID   EMP_NAME   AGE   PHONE_NUM
1        Ramu       25    123456789
2        Rani       18    456321789

2. Retrieve selective columns using column names
We can select some of the columns from the table by providing the column names in the SELECT
query.
SELECT EMP_ID, AGE FROM Employee;

EMP_ID   AGE
1        25
2        18

3. Retrieve selective data.
To retrieve selected rows from the table, use the WHERE clause.
Syntax:
SELECT * FROM table_name WHERE condition;
Example :
SELECT * FROM Employee WHERE EMP_ID = 1;

EMP_ID   EMP_NAME   AGE   PHONE_NUM
1        Ramu       25    123456789
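The CREATE, INSERT and SELECT steps above can be tried without an Oracle installation. The following is a rough sketch using Python's built-in sqlite3 module; it assumes SQLite column types (INTEGER, TEXT) in place of NUMBER and VARCHAR2, so it illustrates the statements rather than reproducing the Oracle session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: create the table (SQLite types stand in for NUMBER / VARCHAR2).
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT    NOT NULL,
        age       INTEGER NOT NULL,
        phone_num INTEGER
    )
""")

# DML: insert the two sample rows from the notes.
conn.executemany(
    "INSERT INTO employee (emp_id, emp_name, age, phone_num) VALUES (?, ?, ?, ?)",
    [(1, "Ramu", 25, 123456789), (2, "Rani", 18, 456321789)],
)

# 1. All columns, 2. selected columns, 3. selected rows.
all_rows = conn.execute("SELECT * FROM employee ORDER BY emp_id").fetchall()
some_columns = conn.execute("SELECT emp_id, age FROM employee ORDER BY emp_id").fetchall()
one_row = conn.execute("SELECT * FROM employee WHERE emp_id = ?", (1,)).fetchall()

print(all_rows)      # [(1, 'Ramu', 25, 123456789), (2, 'Rani', 18, 456321789)]
print(some_columns)  # [(1, 25), (2, 18)]
print(one_row)       # [(1, 'Ramu', 25, 123456789)]
```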

iii. Update statement


This is useful for any correction or modification of any data in tables. We can update selected or all rows
of a table.
1. To Update all rows
2. To update selected rows use WHERE clause.
1) To Update all rows
Syntax:
UPDATE table_name
SET col1 = val1, col2 = val2, ..., coln = valn;
Example:
UPDATE stu5 SET total = DBMS+Eng+Tel, average = (DBMS+Eng+Tel)/3;
(The average is computed from the marks directly, because within one UPDATE the expressions on the
right-hand side see the old value of total, not the newly assigned one.)
2) To update selected rows use WHERE clause.
Syntax:
UPDATE table_name
SET col1 = val1, col2 = val2, ..., coln = valn
WHERE condition;
Example:
UPDATE stu5 SET grade = 'First'
WHERE average >= 60 AND average < 70;

iv. DELETE statement: DELETE statement is used to delete all or selected rows from a
table.
1) To delete all rows
Syntax:
DELETE FROM table_name;
Example :
DELETE FROM stu5;
2) We can delete selected records using WHERE clause.
Syntax:
DELETE FROM table_name WHERE conditions;
Example:
DELETE FROM employee WHERE AGE = 25;
This will delete the records from the EMPLOYEE table which have AGE as 25.
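The effect of UPDATE and DELETE with and without a WHERE clause can be seen in a small experiment. This sketch uses Python's sqlite3 module; the column list of stu5 and the sample marks are assumptions made for illustration, since the notes do not show the table's contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stu5 (sno INTEGER PRIMARY KEY, dbms INTEGER, eng INTEGER, tel INTEGER,
                       total INTEGER, average REAL, grade TEXT)
""")
# Hypothetical marks for three students.
conn.executemany(
    "INSERT INTO stu5 (sno, dbms, eng, tel) VALUES (?, ?, ?, ?)",
    [(1, 60, 66, 69), (2, 80, 85, 90), (3, 30, 40, 35)],
)

# No WHERE clause: every row is updated.
conn.execute("UPDATE stu5 SET total = dbms + eng + tel, average = (dbms + eng + tel) / 3.0")

# WHERE clause: only the rows that satisfy the condition are updated.
conn.execute("UPDATE stu5 SET grade = 'First' WHERE average >= 60 AND average < 70")

# WHERE clause on DELETE: only the matching rows are removed.
conn.execute("DELETE FROM stu5 WHERE average < 40")

rows = conn.execute("SELECT sno, total, average, grade FROM stu5 ORDER BY sno").fetchall()
print(rows)  # [(1, 195, 65.0, 'First'), (2, 255, 85.0, None)]
```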

Row Selection (WHERE clause)

With a plain SELECT statement, we retrieve all rows of the specified columns from a table. To select
only some rows, or to specify a selection criterion, we use the WHERE clause. The WHERE clause filters
rows from the FROM clause tables. Omitting the WHERE clause specifies that all rows are used.
There are five basic search conditions that can be used in a query:
✓ Comparison: compares the value of an expression to the value of another expression.
✓ Range: tests whether the value of an expression falls within a specified range of values.
✓ Set membership: tests whether a value matches any value in a set of values.
✓ Pattern match: tests whether a string matches a specified pattern.
✓ Null: tests a column for a null (unknown) value.
The examples below use comparison conditions.
Syntax:
SELECT * FROM table_name WHERE condition;
Example1:
SELECT * FROM Employee2 WHERE DeptName = 'Accounts';

Example2:
SELECT * FROM Employee2 WHERE sal > 5000;
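All five kinds of search condition can be compared side by side. The following sketch uses Python's sqlite3 module; the rows of employee2 and the helper name names are assumptions made for illustration (the notes do not list the table's contents), and the operators BETWEEN, IN, LIKE and IS NULL are the usual SQL spellings of the range, set membership, pattern match and null tests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee2 (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, sal INTEGER, DeptName TEXT)"
)
# Hypothetical sample rows; Sony's salary is unknown (NULL).
conn.executemany("INSERT INTO employee2 VALUES (?, ?, ?, ?)", [
    (101, "Ramu", 4000, "Accounts"),
    (102, "Rani", 6000, "Arts"),
    (103, "Ravi", 9000, "Commerce"),
    (104, "Sony", None, "Accounts"),
])


def names(where):
    """Return the employee names that satisfy the given WHERE condition."""
    sql = "SELECT EmpName FROM employee2 WHERE " + where + " ORDER BY EmpNo"
    return [row[0] for row in conn.execute(sql)]


comparison = names("sal > 5000")                        # ['Rani', 'Ravi']
in_range = names("sal BETWEEN 4000 AND 6000")           # ['Ramu', 'Rani']
membership = names("DeptName IN ('Arts', 'Commerce')")  # ['Rani', 'Ravi']
pattern = names("EmpName LIKE 'Ra%'")                   # ['Ramu', 'Rani', 'Ravi']
is_null = names("sal IS NULL")                          # ['Sony']
```

Note that the row with a NULL salary matches neither the comparison nor the range test; it is found only by IS NULL.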

Integrity Constraints
Constraints are a set of rules used to maintain the quality of information. Integrity constraints
ensure that data insertion, updating, and other processes are performed in such a way that data
integrity is not affected. Thus, integrity constraints are used to guard against accidental damage to
the database.
The following categories of data integrity exist in each RDBMS:
• Domain Integrity Constraints
    o Check
    o Not null
• Entity Integrity Constraints
    o Unique
    o Primary key
• Referential Integrity Constraints
    o Foreign Key

1. Domain integrity:
It applies valid entries for a given column by checking the type, the format, or the range of values.
 Domain constraints can be defined as the definition of a valid set of values for an attribute.
 The data type of domain includes string, character, integer, time, date, currency, etc. The value of
the attribute must be available in the corresponding domain.

1. Check Constraint: The CHECK constraint is used to restrict the values of a column, for example to a
given range. It performs a check on the values before storing them in the database, like condition
checking before saving data into a column.
Create a table for account using the following conditions.
• Account number must start with 'A' and
• Bank balance should be greater than 1000.
CREATE TABLE account (acno VARCHAR(10) CHECK (acno LIKE 'A%'),
bal NUMBER CHECK (bal > 1000));
2. NOT NULL Constraint: The NOT NULL constraint restricts a column from having a NULL value. Once
a NOT NULL constraint is applied to a column, you cannot pass a null value to that column. It enforces
that the column contains a proper value.
CREATE TABLE student (Sno NUMBER NOT NULL,
Sname VARCHAR(20) NOT NULL);
For the above student table we must provide values for Sno and Sname.

2. Entity Integrity
It specifies that there should be no duplicate rows in a table.
 The entity integrity constraint states that primary key value can't be null.
 This is because the primary key value is used to identify individual rows in relation and if the
primary key has a null value, then we can't identify those rows.
 A table can contain a null value other than the primary key field.

1. UNIQUE Constraint: The UNIQUE constraint ensures that a field or column will only have unique
values. A column with a UNIQUE constraint cannot contain duplicate data, but it can contain NULL values.
CREATE TABLE customer (cno NUMBER UNIQUE,
name VARCHAR(20) NOT NULL);

2. Primary Key Constraint: The primary key constraint uniquely identifies each record in a table.
A primary key must contain unique values and it must not contain NULL values. Usually the primary key is
used to index the data inside the table.
CREATE TABLE dept (dno NUMBER PRIMARY KEY,
dname VARCHAR(20) NOT NULL, loc VARCHAR(20));

3. Referential integrity: Foreign Key Constraint


A referential integrity constraint is specified between two or more tables. Under a referential
integrity constraint, if a foreign key in Table1 refers to the primary key of Table2, then every value
of the foreign key in Table1 must either be null or be available in Table2.
CREATE TABLE emp (eno NUMBER PRIMARY KEY,
ename VARCHAR(20) NOT NULL, doj DATE, sal NUMBER,
dno NUMBER, FOREIGN KEY (dno) REFERENCES dept(dno));
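To see the constraints in action, one can try inserts that break each rule and observe that the database rejects them. The sketch below uses Python's sqlite3 module under a few assumptions: SQLite types replace the Oracle types, the CHECK on sal and the email column are added only for illustration, the helper name rejected is made up, and SQLite needs PRAGMA foreign_keys = ON before it enforces foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT NOT NULL, loc TEXT);
CREATE TABLE emp (
    eno   INTEGER PRIMARY KEY,            -- entity integrity
    ename TEXT NOT NULL,                  -- domain integrity
    sal   INTEGER CHECK (sal > 1000),     -- domain integrity (illustrative rule)
    email TEXT UNIQUE,                    -- entity integrity (illustrative column)
    dno   INTEGER,
    FOREIGN KEY (dno) REFERENCES dept (dno)  -- referential integrity
);
INSERT INTO dept VALUES (10, 'Accounts', 'HYD');
INSERT INTO emp VALUES (1, 'Ramu', 5000, 'abc@gmail.com', 10);
""")


def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "check": rejected("INSERT INTO emp VALUES (2, 'Rani', 500, 'xyz@gmail.com', 10)"),
    "not_null": rejected("INSERT INTO emp VALUES (3, NULL, 5000, 'mno@gmail.com', 10)"),
    "unique": rejected("INSERT INTO emp VALUES (4, 'Ravi', 5000, 'abc@gmail.com', 10)"),
    "primary_key": rejected("INSERT INTO emp VALUES (1, 'Sony', 5000, 'pqr@gmail.com', 10)"),
    "foreign_key": rejected("INSERT INTO emp VALUES (5, 'Sagar', 5000, 'stu@gmail.com', 99)"),
    # A NULL foreign key is allowed, as stated above.
    "null_foreign_key": rejected("INSERT INTO emp VALUES (6, 'Raju', 5000, 'raj@gmail.com', NULL)"),
}
print(results)
```

Every entry except null_foreign_key comes out True, i.e. the offending row is refused.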

Working with Aggregate Functions


An aggregate function allows you to perform a calculation on a set of values and return a single value.
In database management an aggregate function is a function where the values of multiple rows are
grouped together as input on certain criteria to form a single value of more significant meaning.
• sum( ) returns the sum of all non-NULL values in a set.
• count( ) returns the number of rows in a group, including rows with NULL values.
• avg( ) calculates the average of non-NULL values in a set.
• min( ) returns the lowest value (minimum) in a set of non-NULL values.
• max( ) returns the highest value (maximum) in a set of non-NULL values.
Now let us understand each aggregate function with an example:
Example : SELECT * FROM employee2;

 SUM()
It calculates the sum of a set of values.
Syntax:
SELECT SUM(columnname) FROM tablename;
To find the total salary given to all employees
SELECT SUM(sal) FROM employee2;

 SUM(DISTINCT column):
Sum of all distinct Non-Null values.
 SELECT SUM(DISTINCT sal) FROM employee2;

 COUNT( )
It counts the number of rows.
COUNT(*):
Returns the total number of records.
Compute the total number of employees (rows) in the table
SELECT COUNT(*) FROM employee2;

COUNT(sal):
Returns the number of non-NULL values in the column sal.
SELECT COUNT(sal) FROM employee2;

 AVG()
It calculates the average of a set of values.
Compute average salary in the employee2 table
SELECT AVG(sal) FROM employee2;

• MIN( )
It returns the lowest value from a set of non-NULL values.
Compute lowest salary in the employee2 table
SELECT MIN(sal) FROM employee2;

• MAX()
It returns the highest value (maximum) in a set of non-NULL values.
Compute highest salary in the employee2 table
SELECT MAX(sal) FROM employee2;

GROUP BY
Using the GROUP BY clause, related rows can be grouped together with respect to the attribute
specified in the clause, i.e. rows that have the same values are placed in the same group.
Compute total salary of each DeptName in the “employee2” table
SELECT DeptName, SUM(sal) FROM employee2 GROUP BY DeptName ORDER BY DeptName;

Compute average salary of each department in the “employee2” table
SELECT DeptName, AVG(sal) FROM employee2 GROUP BY DeptName ORDER BY DeptName;

Compute highest salary in each department in the “employee2” table
SELECT DeptName, MAX(sal) FROM employee2 GROUP BY DeptName;

Compute lowest salary in each department in the “employee2” table
SELECT DeptName, MIN(sal) FROM employee2 GROUP BY DeptName;

Compute total, average, highest & lowest salaries in each department in the “employee2” table
SELECT DeptName, SUM(sal), AVG(sal), MAX(sal), MIN(sal)
FROM employee2 GROUP BY DeptName;

Having Clause
The WHERE clause can't be used with aggregate functions, hence the HAVING clause was added to SQL.
SELECT aggregate_function(), column_name
FROM table_name
GROUP BY column_name
[HAVING condition];
Find the departments whose average salary is greater than 5000.
SELECT AVG(sal), DeptName FROM employee2
GROUP BY DeptName HAVING AVG(sal) > 5000;
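The aggregate functions, GROUP BY and HAVING can be tried together on a small table. The sketch below uses Python's sqlite3 module; the five rows of employee2 are hypothetical sample data (the query output screenshots of the original are not available), with one NULL salary included to show how NULLs are treated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee2 (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, sal INTEGER, DeptName TEXT)"
)
conn.executemany("INSERT INTO employee2 VALUES (?, ?, ?, ?)", [
    (101, "Ramu", 4000, "Accounts"),
    (102, "Rani", 6000, "Arts"),
    (103, "Ravi", 9000, "Commerce"),
    (104, "Sony", None, "Accounts"),   # NULL salary
    (105, "Sagar", 8000, "Arts"),
])

# Whole-table aggregates: COUNT(*) counts rows, the others skip NULLs.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(sal), SUM(sal), MIN(sal), MAX(sal) FROM employee2"
).fetchone()
print(totals)  # (5, 4, 27000, 4000, 9000)

# One result row per department.
per_dept = conn.execute(
    "SELECT DeptName, SUM(sal), AVG(sal) FROM employee2 GROUP BY DeptName ORDER BY DeptName"
).fetchall()
print(per_dept)  # [('Accounts', 4000, 4000.0), ('Arts', 14000, 7000.0), ('Commerce', 9000, 9000.0)]

# HAVING filters the groups after aggregation.
high_avg = conn.execute(
    "SELECT DeptName, AVG(sal) FROM employee2 "
    "GROUP BY DeptName HAVING AVG(sal) > 5000 ORDER BY DeptName"
).fetchall()
print(high_avg)  # [('Arts', 7000.0), ('Commerce', 9000.0)]
```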

Virtual View
A view is a logical table created from one or more existing tables or other views. A view contains no
data of its own, but is like a window through which data from tables can be viewed or changed.

Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT list_of_column_names
FROM existing_table_name
WHERE conditions
GROUP BY columns
HAVING conditions
ORDER BY columns;
OR REPLACE
With the “or replace” option, a view can be created even if one exists with this name already, thus
replacing the old view.
Example: Create a view of the employee2 table for employees who are working in the department Arts.
CREATE OR REPLACE VIEW ArtsGroup AS
SELECT EmpNo, EmpName, sal, DeptName
FROM employee2 WHERE DeptName = 'Arts';
View created.
0.02 seconds
 SELECT * FROM ArtsGroup;

Example: Create a view of the employee2 table for employees who are working in the department Commerce.
CREATE OR REPLACE VIEW CommerceGroup AS
SELECT EmpNo, EmpName, sal, DeptName
FROM employee2 WHERE DeptName = 'Commerce';
View created.
 SELECT * FROM CommerceGroup;
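That a view stores no data of its own can be seen by changing the base table and querying the view again. The following sketch uses Python's sqlite3 module with hypothetical sample rows. Two SQLite differences are assumptions of this sketch: SQLite has no CREATE OR REPLACE VIEW, so DROP VIEW IF EXISTS followed by CREATE VIEW is used instead, and SQLite views are read-only, so the new row is inserted into the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee2 (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, sal INTEGER, DeptName TEXT)"
)
conn.executemany("INSERT INTO employee2 VALUES (?, ?, ?, ?)", [
    (101, "Ramu", 4000, "Accounts"),
    (102, "Rani", 6000, "Arts"),
    (105, "Sagar", 8000, "Arts"),
])

# Stand-in for CREATE OR REPLACE VIEW, which SQLite does not support.
conn.execute("DROP VIEW IF EXISTS ArtsGroup")
conn.execute("""
    CREATE VIEW ArtsGroup AS
    SELECT EmpNo, EmpName, sal, DeptName
    FROM employee2 WHERE DeptName = 'Arts'
""")

before = [r[0] for r in conn.execute("SELECT EmpName FROM ArtsGroup ORDER BY EmpNo")]

# Change the base table; the view reflects it on the next query.
conn.execute("INSERT INTO employee2 VALUES (111, 'Advaith', 15000, 'Arts')")
after = [r[0] for r in conn.execute("SELECT EmpName FROM ArtsGroup ORDER BY EmpNo")]

print(before)  # ['Rani', 'Sagar']
print(after)   # ['Rani', 'Sagar', 'Advaith']
```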

Updatable views
DML Operations on Views:
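A view is a logical table, so DML operations applied to a table (such as SELECT, INSERT, UPDATE or
DELETE) can also be applied to a view, provided the view is simple enough to be updatable (for
example, it is based on a single table and uses no GROUP BY or aggregate functions).
1. INSERT
INSERT INTO CommerceGroup VALUES (111, 'Advaith', 15000, 'Commerce');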
1 row(s) inserted.
 SELECT * FROM CommerceGroup;

2. UPDATE
 UPDATE CommerceGroup SET Sal=20000 WHERE EmpNo=111;
 SELECT * FROM CommerceGroup;

3. DELETE
 DELETE FROM CommerceGroup WHERE EmpNo=111;
 SELECT * FROM CommerceGroup;

Advantages of View
1. The data of a view is never stored separately; only its definition is stored, so it needs almost no
extra memory space.
2. A view is automatically up to date each time it is used.
3. It is used to restrict the view of a table, i.e. we can hide some of the columns or rows of the table.
4. We can view the data without storing it in another database object.
5. A view is a virtual table formed from one or more base tables or views, so we can join two or more
tables and display the result to the user as one table.
6. It can limit access to the table, for example so that users cannot insert rows into the table directly.

Disadvantages of view
1. A view depends completely on the original table.
2. When the base table is removed, the view becomes inactive.

Synonym
It is used to create another name for a database object. The content of the original table is modified
if we modify records through the synonym.
Syntax:
CREATE SYNONYM <new_name> FOR <existing_name>;
Ex:
CREATE SYNONYM departments FOR dept;

Sequence
It is used to generate sequence numbers, for example for insertion into database tables.
Syntax:
CREATE SEQUENCE name
INCREMENT BY value
START WITH initial_value
MINVALUE value
MAXVALUE value
CYCLE / NOCYCLE;
Eg:
CREATE SEQUENCE sq2 INCREMENT BY 1 START WITH 18033029402001;
INSERT INTO Student
VALUES (sq2.NEXTVAL, 'Maheswari', 'Addakula', 'Accounts');
INSERT INTO Student
VALUES (sq2.NEXTVAL, 'Afreen', 'Addakula', 'Commerce');
INSERT INTO Student
VALUES (sq2.NEXTVAL, 'Anuradha', 'Warne', 'Computers');
SELECT * FROM Student;

Creating Index
An index is used to retrieve data from the database quickly, i.e. indexes are used to speed up searches.
Syntax:
 CREATE INDEX index_name ON table_name(column1, column2,…….);
Example:
 CREATE INDEX id1 ON employee2(EmpName,sal);

Removing an Index
Syntax: DROP INDEX index_name;
Example: DROP INDEX id1;
DCL: Data Control Language (DCL)
DCL is used to control access to data stored in a database. It applies database security in a
multi-user database environment.
1. GRANT
It gives users access privileges to the database, i.e. it is used to provide access or privileges on
database objects to the users.
Syntax:
 GRANT privilege_name ON Object_name TO username;
Example: Before using DCL commands we need to learn how to create users.
Syntax for creating user:
 CREATE USER user_name IDENTIFIED BY password;
Example:
 CREATE USER niveditha IDENTIFIED BY ndc;
To grant connection to user
 GRANT CONNECT TO niveditha;
To grant connection, all resources and DBA to user
 GRANT CONNECT,RESOURCE, DBA TO niveditha;
To grant DML operations like SELECT, UPDATE operations on employee2 table to user
 GRANT SELECT,UPDATE ON employee2 TO niveditha;

2. REVOKE
It is used to remove a user's access to a database object, i.e. to take back permissions from a
user.
Syntax:
 REVOKE privilege_name ON Object_name FROM username;
Remove privileges from user
To cancel DML operations like SELECT, UPDATE operations on employee2 table from user
 REVOKE SELECT,UPDATE ON employee2 FROM niveditha;
To cancel connection permission from user
 REVOKE CONNECT FROM niveditha;
To cancel connection, Resources and DBA permissions from user
 REVOKE CONNECT,RESOURCE, DBA FROM niveditha;

TCL (Transaction Control Language)


TCL statements are used to manage the changes made by DML statements. It allows statements to be
grouped together into logical transactions.

1. SAVEPOINT
The SAVEPOINT command is used to temporarily save a transaction so that you can ROLLBACK to
that point whenever required. It identifies a point in a transaction to which you can later roll back.
Syntax: SAVEPOINT savepoint_name;
Example : SAVEPOINT sp1;

2. ROLLBACK
ROLLBACK restores the database to the last committed state or to a savepoint. To restore the
database to the last committed state, use the following command.
ROLLBACK;
To restore the database to the state at a savepoint, use ROLLBACK TO.
Syntax: ROLLBACK TO savepoint_name;
Example: ROLLBACK TO sp1;

3. COMMIT
COMMIT marks any DML changes as permanent, i.e. it permanently saves the changes into the
database. When we use a DML command like INSERT, UPDATE or DELETE, the changes made by these
commands are not permanent until they are committed or the current session is closed. Until then the
changes can be rolled back. To make the changes permanent, we use COMMIT.
COMMIT;
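The interplay of SAVEPOINT, ROLLBACK and COMMIT can be followed step by step. The sketch below uses Python's sqlite3 module; opening the connection with isolation_level=None is a choice of this sketch so that the transaction statements can be issued explicitly, and the stu table and its rows are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so that
# BEGIN, SAVEPOINT, ROLLBACK and COMMIT can be issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE stu (sno INTEGER PRIMARY KEY, sname TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO stu VALUES (1, 'Ramu')")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO stu VALUES (2, 'Rani')")
conn.execute("ROLLBACK TO sp1")   # undoes only the insert made after sp1
conn.execute("COMMIT")            # makes the first insert permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM stu")
conn.execute("ROLLBACK")          # restores the last committed state

rows = conn.execute("SELECT sno, sname FROM stu").fetchall()
print(rows)  # [(1, 'Ramu')]
```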

SUB QUERIES
A Subquery is a query within another SQL query and embedded within the WHERE clause.
Important Rule:
 A subquery can be placed in a number of SQL clauses like WHERE clause, FROM clause,
HAVING clause.
 You can use Subquery with SELECT, UPDATE, INSERT, DELETE statements along with the
operators like =, <, >, >=, <=, IN, BETWEEN, etc.
 A subquery is a query within another query. The outer query is known as the main query, and the
inner query is known as a subquery.
 Subqueries are on the right side of the comparison operator.
 A subquery is enclosed in parentheses.
 In the Subquery, ORDER BY command cannot be used. But GROUP BY command can be used
to perform the same function as ORDER BY command.
Syntax:
 SELECT column_name

74
MS Bhanu
FROM table_name
WHERE column_name OPERATOR ( SELECT column_name
FROM table_name
WHERE condition );

 SELECT * FROM employee2;

Eg: find employee details whose salary is greater than laxmi’s salary
 SELECT * FROM employee2
WHERE sal > (SELECT sal FROM employee2 WHERE EmpNo = 107);

Multi Row Sub Queries


A multi-row subquery returns more than one value. In such cases we must use a multi-row operator: IN or
NOT IN on its own, or ANY or ALL placed between the comparison operator and the subquery.
Example:
 SELECT * FROM employee2
WHERE sal > ALL (SELECT sal FROM employee2
WHERE sal BETWEEN 5000 AND 15000);
Eg: Find the second highest salary
SELECT MAX(sal) FROM employee2 WHERE sal < (SELECT MAX(sal) FROM employee2);
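
The subquery examples above can be checked with a minimal sketch using Python's sqlite3 module. The four employee rows are made up for illustration. SQLite does not provide ANY or ALL, so the multi-row case is shown with IN instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee2 (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO employee2 VALUES (?, ?, ?)",
    [(101, "Ravi", 12000), (107, "Laxmi", 15000), (110, "Kiran", 20000), (112, "Anil", 9000)],
)

# Single-row subquery: the inner query returns exactly one salary.
higher_than_laxmi = conn.execute(
    "SELECT EmpName FROM employee2 "
    "WHERE sal > (SELECT sal FROM employee2 WHERE EmpNo = 107)"
).fetchall()

# Multi-row subquery: the inner query returns several salaries, so IN is used.
in_band = conn.execute(
    "SELECT EmpName FROM employee2 "
    "WHERE sal IN (SELECT sal FROM employee2 WHERE sal BETWEEN 5000 AND 15000) "
    "ORDER BY EmpNo"
).fetchall()

# Second highest salary: the largest salary below the overall maximum.
second_highest = conn.execute(
    "SELECT MAX(sal) FROM employee2 WHERE sal < (SELECT MAX(sal) FROM employee2)"
).fetchone()[0]

print(higher_than_laxmi, in_band, second_highest)
```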

WORKING WITH JOINS


Joins are used to retrieve records from multiple tables. There are 3 types of joins.
1. Self Join 2. Inner Join 3. Outer Join
Self Join: Joining a table with itself is a self join.
 SELECT e.EmpNo,e.EmpName,e.sal, m.EmpNo,m.EmpName,m.sal
FROM employee2 e, employee2 m
WHERE e.EmpNo=m.EmpNo;

Inner Joins:
An inner join between two or more tables returns the rows of their Cartesian product that satisfy the join
condition in the WHERE clause. Inner joins use a comparison operator like = (Equi Join) or <, <=, >, >=, <> (Non Equi
Join) to match rows from two tables based on the values in common columns from each table.
 SELECT * FROM employee2, department
WHERE employee2.DeptName=department.DeptName;

Outer Joins:
An outer join also retrieves the rows that have no matching value in the relevant column of the other
table. There are THREE types of outer joins.
1. Left Outer Join
2. Right Outer Join
3. Full Outer Join
Left Outer Join retrieves all rows from the first (left) table and only the matching rows from the second (right)
table.
 SELECT * FROM employee2 LEFT JOIN department
ON employee2.DeptName=department.DeptName;

Right Outer Join retrieves all rows (matching and unmatched rows) from the second (right) table and
only the matching rows from the first (left) table.
 SELECT * FROM employee2 RIGHT JOIN department
ON employee2.DeptName=department.DeptName;

Full Outer Join retrieves all rows (matching and unmatched rows) from both the first (left) and the second
(right) tables.
 SELECT * FROM employee2 FULL JOIN department
ON employee2.DeptName=department.DeptName;
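
The difference between the join types can be seen in a short sketch using Python's sqlite3 module. The rows are hypothetical: one employee works in a department that is not in the department table, and one department has no employees. Older SQLite versions lack RIGHT JOIN and FULL JOIN, so the right outer join is written here as a left join with the tables swapped, which gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (DeptName TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE employee2 (EmpNo INTEGER PRIMARY KEY, EmpName TEXT, DeptName TEXT)")
conn.executemany("INSERT INTO department VALUES (?)", [("Sales",), ("HR",), ("Finance",)])
conn.executemany(
    "INSERT INTO employee2 VALUES (?, ?, ?)",
    [(101, "Ravi", "Sales"), (107, "Laxmi", "HR"), (110, "Kiran", "IT")],
)

# Inner join: only employees whose department exists in both tables.
inner = conn.execute(
    "SELECT e.EmpName, d.DeptName FROM employee2 e "
    "JOIN department d ON e.DeptName = d.DeptName ORDER BY e.EmpNo"
).fetchall()

# Left outer join: every employee, with NULL (None) where no department matches.
left = conn.execute(
    "SELECT e.EmpName, d.DeptName FROM employee2 e "
    "LEFT JOIN department d ON e.DeptName = d.DeptName ORDER BY e.EmpNo"
).fetchall()

# Right outer join, emulated by swapping the tables: every department.
right = conn.execute(
    "SELECT e.EmpName, d.DeptName FROM department d "
    "LEFT JOIN employee2 e ON e.DeptName = d.DeptName ORDER BY d.DeptName"
).fetchall()

print(inner)
print(left)
print(right)
```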

UNIT-IV: TRANSACTIONS AND CONCURRENCY MANAGEMENT

Transactions - Concurrent Transactions - Locking Protocol - Serialisable Schedules - Locks - Two Phase
Locking (2PL) - Deadlock and its Prevention - Optimistic Concurrency Control.
Database Recovery and Security: Database Recovery meaning - Kinds of failures - Failure controlling
methods - Database errors - Backup & Recovery Techniques - Security & Integrity - Database Security -
Authorization.

Transactions
A transaction is an execution of a program and is seen by the DBMS as a series or list of actions. It is
different from an ordinary program and is the result of the execution of a program written in a high-level
data manipulation language or programming language. A transaction is delimited by the
statements “begin transaction” and “end transaction”.
In a transaction, access to the database is accomplished by two operations.
i Read(x).

ii Write(x).
The first one performs the reading operation of data item x from the database, whereas the second one
performs the writing operation of data item x to the database. Consider a transaction Ti which transfers
100/- from account “A” to account “B”. This transaction can be defined as follows:
Ti :
Read(A);
A := A - 100;
Write(A);
Read(B);
B := B + 100;
Write(B);
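
The transfer transaction can be traced with a minimal sketch, in which a Python dictionary stands in for the database and the opening balances of 1000 each are an assumption made for illustration. Each read copies a value into a local variable, and the database changes only when write is called.

```python
# The "database": data item name -> stored value (hypothetical balances).
database = {"A": 1000, "B": 1000}

def read(item):
    """Read(x): copy the value of data item x out of the database."""
    return database[item]

def write(item, value):
    """Write(x): store a new value for data item x in the database."""
    database[item] = value

def transfer_ti():
    a = read("A")
    a = a - 100          # A := A - 100 (local change only)
    write("A", a)
    b = read("B")
    b = b + 100          # B := B + 100 (local change only)
    write("B", b)

transfer_ti()
print(database)
```

After the transaction, A has lost 100 and B has gained 100, so the total A + B is unchanged.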

ACID Properties or Transaction Properties


A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in
a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known
as ACID properties − in order to ensure accuracy, completeness, and data integrity.
1. Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either
all of its operations are executed or none. There must be no state in a database where a transaction is left
partially completed. States should be defined either before the execution of the transaction or after the
execution/abortion/failure of the transaction.
2. Consistency − The database must remain in a consistent state after any transaction. No transaction
should have any adverse effect on the data residing in the database. If the database was in a consistent
state before the execution of a transaction, it must remain consistent after the execution of the transaction
as well.

3. Isolation − In a database system where more than one transaction is being executed simultaneously
and in parallel, the property of isolation states that each transaction will be carried out and executed as
if it were the only transaction in the system. No transaction will affect the existence of any other transaction.
4. Durability − The database should be durable enough to hold all its latest updates even if the system
fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will
hold the modified data. If a transaction commits but the system fails before the data could be written on to
the disk, then that data will be updated once the system springs back into action.
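
Atomicity and consistency can be illustrated with a small sketch using Python's sqlite3 module. The account table, the CHECK constraint that forbids negative balances, and the transfer function are all assumptions made for this example. The second transfer fails halfway, after B has been credited, and the rollback removes that partial change.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 1000)")

def balances():
    return dict(conn.execute("SELECT name, balance FROM account").fetchall())

def transfer(amount):
    """Move amount from A to B; either both updates happen or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")   # undo the credit to B as well
        return False

first = transfer(100)     # succeeds
second = transfer(5000)   # debit would make A negative, so everything is undone
final = balances()
print(first, second, final)
```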

Transaction States

1. Active State This is the first state in the life cycle of a transaction. A transaction is said to be in the
active state as long as its instructions are being executed. All the changes made by the transaction now
are stored in the buffer in main memory.
2. Partially Committed State After the last instruction of transaction has executed, it enters into a
partially committed state. After entering this state, the transaction is considered to be partially committed.
It is not considered fully committed because all the changes made by the transaction are still stored in the
buffer in main memory.
3. Committed State After all the changes made by the transaction have been successfully stored into
the database, it enters into a committed state. Now, the transaction is considered to be fully committed.
4. Failed State When a transaction is getting executed in the active state or partially committed state
and some failure occurs due to which it becomes impossible to continue the execution, it enters into a
failed state.

5. Aborted State After the transaction has failed and entered into a failed state, all the changes made
by it have to be undone. To undo the changes made by the transaction, it becomes necessary to roll back
the transaction. After the transaction has rolled back completely, it enters into an aborted state.
6. Terminated State This is the last state in the life cycle of a transaction. After entering the
committed state or aborted state, the transaction finally enters into a terminated state where its life cycle
finally comes to an end.
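
The six states and the moves between them can be written down as a small state machine. This is only a sketch of the life cycle described above; the names State, TRANSITIONS and is_valid_path are chosen for this example.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"
    TERMINATED = "terminated"

# Allowed moves, taken from the state descriptions above.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: {State.TERMINATED},
    State.TERMINATED: set(),
}

def is_valid_path(path):
    """True if the path starts in ACTIVE and every step is an allowed move."""
    if not path or path[0] is not State.ACTIVE:
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))

success = [State.ACTIVE, State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED]
failure = [State.ACTIVE, State.FAILED, State.ABORTED, State.TERMINATED]
print(is_valid_path(success), is_valid_path(failure))
```

A transaction can never jump straight from the active state to the committed state; it must pass through the partially committed state first.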

Concurrency Control
The coordination of the simultaneous execution of transactions in a multi user database system is
known as concurrency control.
Need for Concurrency Control
The objective of concurrency control is to ensure the serializability of transactions in a multi-user database
environment. Concurrency control is important because the simultaneous execution of transactions over a
shared database can create several data integrity and consistency problems. The three main problems are
a) Lost updates b) Uncommitted data c) Inconsistent retrievals
a) Lost updates:- The lost update problem occurs when two concurrent transactions T1 and T2 are
updating the same data element and one of the updates is lost (overwritten by the other transaction).
b) Uncommitted data:- The uncommitted data problem occurs when two transactions T1 and T2 are executed
concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already
accessed the uncommitted data. This violates the isolation property of transactions.
c) Inconsistent retrievals:- Inconsistent retrievals occur when a transaction accesses data before and after
another transaction finishes working with that data.
Serializability
A schedule is serializable if its effect is the same as executing the transactions one after another, in a
non-preemptive manner, that is, as in some serial schedule.
In a serial schedule, there is no question of sharing a single data item among many transactions,
because not more than a single transaction is executing at any point of time. However, a serial schedule is
inefficient in the sense that the transactions suffer from longer waiting time and response time, as
well as a low amount of resource utilization. Let us consider two transactions T1 and T2. We have
two accounts A and B, each containing Rs 1000/-.
 T1: transfers Rs 100/- from account A to account B.
 T2: is a new transaction which transfers Rs 50/- from account A to account B.

If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or T2 will
completely finish before T1 can begin. Suppose in schedule-1 the transactions are executed serially in
the order T1, T2, and in schedule-2 the transactions are executed serially in the order T2, T1.

In both schedules 1 & 2 the transactions are executed serially, and the final amount in account A is Rs. 850
and in account B is Rs. 1150. After the execution, the total amount (A+B) is calculated to check consistency.
The sum A+B is Rs. 2000, the same as before execution, so both schedules produce a consistent result.

Need for Concurrent Execution


In concurrent schedule, CPU time is shared among two or more transactions in order to run them
concurrently. However, this creates the possibility that more than one transaction may need to access a
single data item for read/write purpose and the database could contain inconsistent value if such accesses
are not handled properly. Let us explain with the help of an example. Suppose transactions are executed
concurrently as shown below.

In the above schedules 3 and 4 the transactions are executed concurrently, and the final amount in account A is
Rs. 850 and in account B is Rs. 1150. The sum A+B is Rs. 2000, so they also produce a consistent result.
But let us take another example where a wrong concurrent schedule can bring about disaster. Consider the
following example involving the same T1 and T2.

Schedule S5 results in an inconsistent state, because the switch to T2 is made at the second
instruction of T1. If accounts A and B both contain Rs 1000/- at the start, a correct schedule leaves
Rs 850/- in A and Rs 1150/- in B. Schedule S5 instead leaves Rs 900/- in A and Rs 1150/- in B.
The sum A+B is Rs. 2050 instead of Rs. 2000. This is an inconsistent state.
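
The arithmetic can be checked with a short simulation. The exact interleaving used for S5 here is an assumption (T1 is interrupted after its second instruction, T2's update of A is then overwritten by T1, and T2 updates B last); it is one schedule that reproduces the figures quoted above. Each transaction works on private copies of the values it has read.

```python
def steps(txn, amount):
    """Instructions of a transfer of `amount` from A to B by transaction txn."""
    return [
        (txn, "read", "A", 0), (txn, "add", "A", -amount), (txn, "write", "A", 0),
        (txn, "read", "B", 0), (txn, "add", "B", amount), (txn, "write", "B", 0),
    ]

def run_schedule(schedule):
    db = {"A": 1000, "B": 1000}
    local = {}                                  # private workspace per transaction
    for txn, op, item, delta in schedule:
        workspace = local.setdefault(txn, {})
        if op == "read":
            workspace[item] = db[item]          # copy value from the database
        elif op == "add":
            workspace[item] += delta            # change the private copy only
        elif op == "write":
            db[item] = workspace[item]          # store the private copy
    return db

t1, t2 = steps("T1", 100), steps("T2", 50)

serial = run_schedule(t1 + t2)
# Assumed S5: T1 runs 2 steps, T2 runs 3, T1 finishes, then T2 finishes.
s5 = run_schedule(t1[:2] + t2[:3] + t1[2:] + t2[3:])

print(serial, sum(serial.values()))
print(s5, sum(s5.values()))
```

In the assumed S5, T1 writes its stale value of A over T2's update, so T2's withdrawal of Rs 50 from A is a lost update while its deposit to B still takes effect.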

Lock Granularity (or) Locking Level


Lock granularity indicates the level of lock use. Locking can take place at the following levels:
a) Database level b) Table level c) Page level d) Row level e) Field (attribute) level
a) Database level:- In a database level lock the entire database is locked. So if transaction T1 is
accessing the database, then transaction T2 cannot access it. This level of locking is good for batch
processes but it is unsuitable for a multi-user DBMS, because thousands of transactions would have to wait
for the previous transaction to be completed before the next one could reserve the entire database. So the
data access would be slow.
b) Table level:- In a table level lock the entire table is locked, which means that if transaction T1 is
accessing a table then transaction T2 cannot access the same table. If a transaction requires access to
several tables, each table may be locked. Table level locks are less restrictive than database level locks,
but they are still not suitable for a multi-user DBMS. The drawback of a table level lock is that transactions
T1 and T2 cannot access the same table even when they try to use different rows; T2 must wait until T1
unlocks the table.
c) Page level:- In a page level lock, the DBMS will lock an entire disk page. A disk page or page is
also referred to as a disk block, which is described as a section of a disk. A page has a fixed size such as 4K,
8K or 16K. A table can span several pages, and a page can contain several rows of one or more tables.
Page level locks are currently the most frequently used multi-user DBMS locking method. Page level lock is
shown in the following fig.

In the above fig. T1 and T2 access the same table while locking different disk pages. If T2 requires the
use of a row located on a page that is locked by T1, T2 must wait until the page is unlocked.
d) Row level:- A row level lock is much less restrictive than the other locks. The DBMS allows
concurrent transactions to access different rows of the same table even when the rows are located on the same
page. The row level locking approach improves the availability of data. But row level locking
management requires high overhead, because a lock exists for each row in a table of the database
involved in a conflicting transaction.
e) Field level:- The field level lock allows concurrent transactions to access the same row as long as
they require the use of different fields (attributes) within that row.

Locking Methods
A lock is a mechanism to control concurrent access to a data item.

 Lock : If an object is locked by a transaction, no other transaction can use that object.
The object may be a database, table, page or row.
 Unlock : If an object is unlocked, any transaction can lock the object for its use.
Data items can be locked in two modes:
1. Exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-
X instruction.
2. Shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
Lock requests are made to the concurrency-control manager by the programmer. Transaction can proceed
only after request is granted.
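The rule that several shared locks may coexist while an exclusive lock excludes everything else can be sketched as a tiny lock table. The LockManager class below is hypothetical and deliberately simplified: a denied request just returns False instead of waiting, and lock upgrades from S to X are not handled.

```python
class LockManager:
    """Toy lock table: data item -> (mode, set of transactions holding it)."""

    def __init__(self):
        self.locks = {}

    def request(self, txn, item, mode):
        """mode is 'S' (shared) or 'X' (exclusive). Returns True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)            # shared locks are compatible
            return True
        return False                    # any combination involving X conflicts

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

manager = LockManager()
t1_shared = manager.request("T1", "Q", "S")      # granted
t2_shared = manager.request("T2", "Q", "S")      # granted, S is compatible with S
t3_exclusive = manager.request("T3", "Q", "X")   # denied while readers hold Q
manager.release("T1", "Q")
manager.release("T2", "Q")
t3_retry = manager.request("T3", "Q", "X")       # granted once Q is free
print(t1_shared, t2_shared, t3_exclusive, t3_retry)
```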

Shared/Exclusive locks can lead to two major problems:


 The resulting transaction schedule might not be serializable.
 The schedule might create deadlock.

Deadlock
A deadlock is a condition where two or more transactions are waiting indefinitely for one another
to give up locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever
gets finished and is in waiting state forever.
Or
A deadlock occurs when two transactions wait indefinitely for each other to unlock data. For
example, a deadlock occurs when two transactions, T1 and T2, exist in the following mode.
T1: access data items X and Y.
T2: access data items Y and X.
➢ T1 and T2 are executing simultaneously, so T1 has locked data item X and T2 has locked
data item Y. Now transaction T1 is waiting to lock data item Y, but it is already locked by T2.
Simultaneously, T2 is waiting to lock X, but it is already locked by T1. So both transactions are waiting to
access the other item. This condition is referred to as “Dead Lock”.

➢ In a real-world DBMS, many transactions can be executed simultaneously, thereby increasing the
probability of generating deadlocks.

➢ The 3 basic techniques to control dead locks are


a) Dead Lock Prevention:-
➢ A transaction requesting a new lock is aborted when there is the possibility that a deadlock can occur.
If the transaction is aborted, all changes made by this transaction are rolled back, and all locks obtained
by the transaction are released.
➢ This method is used when there is a high probability of deadlock.
b) Dead Lock Detection:-
➢ The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions is
aborted and the other transaction continues.
➢ This method is used when there is a low probability of deadlocks.
c) Dead Lock Avoidance:-
➢ The transaction must obtain all of the locks it needs before it can be executed. This technique avoids
the rollback of conflicting transactions.
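
One common way to implement deadlock detection, not described in these notes, is a wait-for graph: each transaction points to the transactions whose locks it is waiting for, and a cycle in the graph means a deadlock. The sketch below assumes the graph is given as a dictionary; the function name has_deadlock is chosen for this example.

```python
def has_deadlock(waits_for):
    """waits_for maps a transaction to the transactions it is waiting on.
    Returns True if the wait-for graph contains a cycle."""
    visited, on_path = set(), set()

    def visit(txn):
        if txn in on_path:          # came back to a transaction on the current path
            return True
        if txn in visited:          # already explored, no cycle through it
            return False
        visited.add(txn)
        on_path.add(txn)
        for other in waits_for.get(txn, ()):
            if visit(other):
                return True
        on_path.discard(txn)
        return False

    return any(visit(txn) for txn in list(waits_for))

# T1 holds X and waits for Y (held by T2); T2 holds Y and waits for X (held by T1).
deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2, but T2 waits for nobody, so T2 can finish and release its locks.
not_deadlocked = has_deadlock({"T1": ["T2"], "T2": []})
print(deadlocked, not_deadlocked)
```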

Concurrency Control Techniques


Concurrency control is provided in a database to:
(i) Enforce isolation among transactions.
(ii) Preserve database consistency through consistency preserving execution of transactions.
(iii) Resolve read-write and write-read conflicts.
Various concurrency control techniques are:
1. Two-phase locking Protocol

2. Time stamp ordering Protocol

1. Two-Phase Locking Protocol: Locking is an operation which secures permission to read, or permission
to write, a data item. Two-phase locking is a process used to gain ownership of shared resources in a way
that guarantees serializable schedules. The 3 activities taking place in the two
phase update algorithm are:
(i). Lock Gaining (ii). Modification of Data (iii). Release Lock
Basic two-phase locking does not by itself rule out deadlock. The variant used in distributed systems stops
deadlock from occurring by having a transaction release all the resources it has acquired if it is not
possible to acquire all the resources required without waiting for another process to finish using a lock.
This means that no process is ever in a state where it is holding some shared resources and waiting for
another process to release a shared resource which it requires, so deadlock cannot occur due to resource
contention.
The Two Phase Locking Protocol ensures conflict-serializable schedules. A transaction in the Two Phase
Locking Protocol passes through 2 phases:
Phase 1: Growing Phase
 Transaction may obtain locks, but may not release locks
Phase 2: Shrinking Phase
 Transaction may release locks, but may not obtain locks
The protocol assures serializability. It can be proved that the transactions can be serialized in the order of
their lock points (i.e., the point where a transaction acquired its final lock).
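
The two-phase rule can be checked mechanically: once a transaction has released any lock, it must not acquire another. The sketch below assumes a transaction is given as a list of (operation, data item) pairs using the lock-S, lock-X and unlock instructions; the function name is chosen for this example.

```python
def follows_two_phase_locking(operations):
    """operations: list of (op, item) with op in 'lock-S', 'lock-X', 'unlock'
    (other operations such as 'read' or 'write' are ignored)."""
    shrinking = False
    for op, _item in operations:
        if op == "unlock":
            shrinking = True                      # growing phase is over
        elif op in ("lock-S", "lock-X") and shrinking:
            return False                          # lock requested after an unlock
    return True

two_phase = [
    ("lock-X", "A"), ("write", "A"), ("lock-X", "B"),   # growing phase
    ("unlock", "A"), ("write", "B"), ("unlock", "B"),   # shrinking phase
]
not_two_phase = [
    ("lock-X", "A"), ("write", "A"), ("unlock", "A"),
    ("lock-X", "B"), ("write", "B"), ("unlock", "B"),   # locks B after unlocking A
]
print(follows_two_phase_locking(two_phase), follows_two_phase_locking(not_two_phase))
```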
Let's discuss an example now. The schedule for this example follows Conservative 2-PL (every lock is
acquired before the transaction starts executing) but does not follow Strict 2-PL (exclusive locks are held
until the transaction commits or aborts) or Rigorous 2-PL (all locks are held until the transaction commits
or aborts). It fails to meet the requirements of Strict and Rigorous 2-PL because A and B are unlocked
before the transaction commits.
2. Timestamp based Concurrency Control
A timestamp is a tag that can be attached to any transaction or any data item, which denotes a specific
time on which the transaction or the data item had been used in any way. A timestamp can be
implemented in 2 ways.
 One is to directly assign the current value of the clock to the transaction or data item.
 The other is to attach the value of a logical counter that keeps incrementing as new timestamps are
required.
The timestamp of a data item can be of 2 types:
(i) W-timestamp(X): This means the latest time when the data item X has been written into.
(ii) R-timestamp(X): This means the latest time when the data item X has been read from.
These 2 timestamps are updated each time a successful read/write operation is performed on the data item
X.
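
The notes define the two timestamps but not the rule for using them. The sketch below follows the standard basic timestamp-ordering rule, which is an addition to these notes: a read is rejected if a younger transaction has already written the item, and a write is rejected if a younger transaction has already read or written it. A rejected operation means the transaction would be rolled back; here the functions simply return False.

```python
class DataItem:
    def __init__(self):
        self.r_ts = 0   # R-timestamp(X): largest timestamp that read X
        self.w_ts = 0   # W-timestamp(X): largest timestamp that wrote X

def read(item, ts):
    """Transaction with timestamp ts tries to read the item."""
    if ts < item.w_ts:                 # a younger transaction already wrote X
        return False
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts):
    """Transaction with timestamp ts tries to write the item."""
    if ts < item.r_ts or ts < item.w_ts:   # a younger transaction already used X
        return False
    item.w_ts = ts
    return True

x = DataItem()
results = [
    read(x, 5),    # allowed, R-timestamp(X) becomes 5
    write(x, 3),   # rejected: transaction 3 is older than reader 5
    write(x, 7),   # allowed, W-timestamp(X) becomes 7
    read(x, 6),    # rejected: transaction 6 is older than writer 7
    read(x, 9),    # allowed, R-timestamp(X) becomes 9
]
print(results, x.r_ts, x.w_ts)
```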

Source and types of Database failures


1. Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every second. The
durability and robustness of a DBMS depends on its complex architecture and its underlying hardware
and system software. If it fails or crashes amid transactions, it is expected that the system would follow
some sort of algorithm or techniques to recover lost data.
2. Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −
3. Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can't go any
further. This is called transaction failure, where only a few transactions or processes are hurt.
Reasons for a transaction failure could be −
 Logical errors − Where a transaction cannot complete because it has some code error or any
internal error condition.
 System errors − Where the database system itself terminates an active transaction because the
DBMS is not able to execute it, or it has to stop because of some system condition. For example,
in case of deadlock or resource unavailability, the system aborts an active transaction.

4. System Crash
There are problems, external to the system, that may cause the system to stop abruptly and
crash. For example, an interruption in the power supply may cause the failure of the underlying hardware
or software. Other examples include operating system errors.
5. Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or storage drives
used to fail frequently. Disk failures include formation of bad sectors, unreachability to the disk, disk
head crash or any other failure, which destroys all or a part of disk storage.

Database Errors
A database represents an essential corporate resource that should be properly secured using appropriate
controls. We consider database security in relation to the following situations:
 Loss of availability.
 Loss of integrity.
 Loss of confidentiality (secrecy).
 Theft and fraud.
Any situation or event, whether intentional or accidental, can cause damage, which can have an
adverse effect on the database structure and, consequently, the organization.

 Availability loss − Availability loss refers to the non-availability of database objects to legitimate
users.
 Integrity loss − Integrity loss occurs when unacceptable operations are performed upon the
database either accidentally or maliciously. This may happen while creating, inserting, updating or
deleting data. It results in corrupted data leading to incorrect decisions.
 Confidentiality loss − Confidentiality loss occurs due to unauthorized or unintentional disclosure
of confidential information. It may result in illegal actions, security threats and loss in public
confidence.
 Theft and fraud − A threat is a situation or event, involving a person, an action or a circumstance, that
is likely to bring harm to an organization and its database. Some common types
of threats include the following.
o People : An attempt is made, either intentionally or unintentionally, to cause harm or
damage to the database environment fully or partly. For instance, applications, operating
systems, DBMS and networks can be damaged by employees, government authorities,
consultants, visitors, hackers, terrorists, intruders etc.
o Natural Disasters such as earthquakes and floods can cause harm to all components of the
database environment (fully) or to a single component (partially).
o Malicious code : Software code is malicious if it contains code which
is included in order to harm or damage database management components. Malicious code
can be in the form of bugs, macro code, boot sector viruses, worms, Trojan
horses, denial of service, spoofing and email spam.
o Technological disaster : This type of disaster occurs due to the malfunction of
equipment and devices, which causes damage to operating systems, networks and data files.
Some technological disasters are power failures, media failures, network failures and
hardware failures.

Database Security
Database security is the protection of the Database against intentional or unintentional threats using
computer-based or non-computer-based controls.

Authentication
Authentication is about validating your credentials, like User Name/User ID and password, to verify the user's
identity. The system determines whether you are who you say you are, using your credentials. It includes
 Username and Password
 Plastic ID Card
 Encryption
Data Encryption − Data encryption refers to coding data when sensitive data is to be communicated
over public channels. Even if an unauthorized agent gains access to the data, the agent cannot understand
it since it is in an incomprehensible format.
 Cryptography is the science of encoding information before sending it via unreliable
communication paths so that only an authorized receiver can decode and use it.
 The coded message is called cipher text and the original message is called plain text.
 Encryption − The process of converting plain text to cipher text by the sender is called encoding or
encryption.
 Decryption − The process of converting cipher text to plain text by the receiver is called decoding or
decryption.

Authorization
Authorization is the culmination of the administrative policies of the organization. As the name suggests,
authorization is a set of rules that can be used to determine which user has what type of access to which
portion of the database. The person who writes access rules is called an authorizer.
An authorizer may set several forms of authorization on parts of the database. Among them are the
following:
1. Read Authorization: allows reading, but not modification of data.

2. Insert Authorization: allows insertion of new data, but not the modification of existing data, e.g.
insertion of tuple in a relation.

3. Update authorization: allows modification of data, but not its deletion. But data items like primary-
key attributes may not be modified.

4. Delete authorization: allows deletion of data only.
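
A set of access rules of this kind can be pictured as a lookup table from (user, relation) to the permitted actions. The sketch below is only an illustration; the users, the relation name and the ACCESS_RULES table are hypothetical, and a real DBMS stores such rules in its own catalog.

```python
# Access rules written by the authorizer: (user, relation) -> allowed actions.
ACCESS_RULES = {
    ("clerk", "student"): {"read", "insert"},
    ("registrar", "student"): {"read", "insert", "update", "delete"},
}

def is_authorized(user, relation, action):
    """True if the rules grant this user this action on this relation."""
    return action in ACCESS_RULES.get((user, relation), set())

clerk_can_read = is_authorized("clerk", "student", "read")
clerk_can_delete = is_authorized("clerk", "student", "delete")
guest_can_read = is_authorized("guest", "student", "read")   # no rule at all
print(clerk_can_read, clerk_can_delete, guest_can_read)
```

A user with no matching rule gets no access, which is the safer default.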

UNIT 5
Distributed database
A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations that communicate via a computer network.

Features
 Databases in the collection are logically interrelated with each other. Often they represent a single
logical database.
 Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
 The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.

Distributed Database Management System


A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database in a manner as if it were all stored in a single location.

Features
 It is used to create, retrieve, update and delete distributed databases.
 It synchronizes the database periodically and provides access mechanisms by the virtue of which
the distribution becomes transparent to the users.
 It ensures that the data modified at any site is universally updated.
 It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.

Advantages of Distributed Databases


Following are the advantages of distributed databases over centralized databases.
Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption in
current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a halt.
However, in distributed systems, when a component fails, the system continues to function, possibly
at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local
data itself, thus providing faster response. On the other hand, in centralized systems, all queries have to
pass through the central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.

Types of Distributed Databases or Structure of DDBMS


Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions, as shown in the following illustration.

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
 The sites use very similar software.
 The sites use identical DBMS or DBMS from the same vendor.
 Each site is aware of all other sites and cooperates with other sites to process user requests.
 The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
 Autonomous − Each database is independent and functions on its own. The databases are integrated by a
controlling application and use message passing to share data updates.
 Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems, DBMS products
and data models. Its properties are −
 Different sites use dissimilar schemas and software.
 The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.
 Query processing is complex due to dissimilar schemas.
 Transaction processing is complex due to dissimilar software.
 A site may not be aware of other sites and so there is limited co-operation in processing user
requests.
Types of Heterogeneous Distributed Databases
 Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.
 Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.

1. Client-server architecture of a distributed system


 A client server architecture has a number of clients and a few servers connected in a network.
 A client sends a query to one of the servers. The earliest available server solves it and replies.
 A Client-server architecture is simple to implement and execute due to centralized server
system.

2. Collaborating server architecture


 Collaborating server architecture is designed to run a single query on multiple servers.
 Servers break single query into multiple small queries and the result is sent to the client.
 Collaborating server architecture has a collection of database servers. Each server is capable
of executing the current transactions across the databases.

3. Middleware architecture
 Middleware architectures are designed in such a way that single query is executed on multiple
servers.
 This system needs only one server which is capable of managing queries and transactions from
multiple servers.
 Middleware architecture uses local servers to handle local queries and transactions.
 The software used for the execution of queries and transactions across one or more independent
database servers is called middleware.

Data Replication
Data replication is the process in which the data is copied at multiple locations (Different computers or
servers) to improve the availability of data.

Goals of data replication


 Increase the availability of data.
 Speed up the query evaluation.

Types of data replication


There are two types of data replication:
1. Synchronous Replication:
In synchronous replication, the replica is modified immediately whenever changes are made to the
relation (table). So there is no difference between the original data and the replica.
2. Asynchronous replication:
In asynchronous replication, the replica is modified only after the commit is issued on the database, so the
replica may lag behind the original data for a short time.

Replication Schemes
The three replication schemes are as follows:

1. Full Replication
In full replication scheme, the database is available to almost every location or user in communication
network.

Advantages of full replication


 High availability of data, as database is available to almost every location.
 Faster execution of queries.
Disadvantages of full replication
 Concurrency control is difficult to achieve in full replication.
 Update operation is slower.

2. No Replication

No replication means, each fragment is stored exactly at one location.

Advantages of no replication
 Concurrency control overhead is minimized, since there is only one copy of each fragment.
 Easy recovery of data.
Disadvantages of no replication
 Poor availability of data.
 Slows down the query execution process, as multiple clients are accessing the same server.

3. Partial replication

Partial replication means that only some fragments of the database are replicated.
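The three schemes can be compared by writing down where each fragment is stored. The sketch below is only an illustration: the site names, fragment names, and helper functions are hypothetical, and the round-robin placement used for "no replication" is just one possible assignment.

```python
SITES = ["site_A", "site_B", "site_C"]
FRAGMENTS = ["F1", "F2", "F3"]

def full_replication(fragments, sites):
    """Every fragment is stored at every site."""
    return {f: list(sites) for f in fragments}

def no_replication(fragments, sites):
    """Every fragment is stored at exactly one site (round-robin assumed)."""
    return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}

def partial_replication(fragments, sites, replicated):
    """Only the fragments named in `replicated` are copied to every site."""
    placement = no_replication(fragments, sites)
    for f in replicated:
        placement[f] = list(sites)
    return placement

def available(placement, failed_site):
    """Fragments that can still be reached when one site is down."""
    return sorted(
        f for f, locations in placement.items()
        if any(site != failed_site for site in locations)
    )

full = full_replication(FRAGMENTS, SITES)
none = no_replication(FRAGMENTS, SITES)
partial = partial_replication(FRAGMENTS, SITES, replicated=["F1"])

print(available(full, "site_A"))     # ['F1', 'F2', 'F3']
print(available(none, "site_A"))     # ['F2', 'F3']
print(available(partial, "site_A"))  # ['F1', 'F2', 'F3']
```

With no replication, the failure of `site_A` makes `F1` unreachable, which is the poor availability listed above; replicating `F1` alone restores it, at the cost of having to update several copies.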

Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (a combination of
horizontal and vertical). Horizontal fragmentation can further be classified into two techniques: primary
horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”
Advantages of Fragmentation
• Since data is stored close to the site of usage, the efficiency of the database system is increased.
• Local query optimization techniques are sufficient for most queries, since data is locally available.
• Since irrelevant data is not available at the sites, security and privacy of the database system can
be maintained.
Disadvantages of Fragmentation
• When data from different fragments is required, access times may be very high.
• In case of recursive fragmentation, reconstruction requires expensive techniques.
• Lack of back-up copies of data at different sites may render the database ineffective if a site
fails.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical
fragmentation can be used to enforce privacy of data.
For example, consider a university database that keeps the records of all registered students in a
STUDENT table with the following schema.
STUDENT

Regd_No | Name | Course | Address | Semester | Fees | Marks

Now, suppose the fee details are maintained in the accounts section. In this case, the designer will
fragment the database as follows:

CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
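The vertical fragmentation above, together with the reconstruction it must allow, can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types, the sample rows, and the second fragment STD_INFO are hypothetical additions; only STUDENT and STD_FEES come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Address TEXT,
    Semester INTEGER, Fees REAL, Marks INTEGER)""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Asha", "Computer Science", "Hyderabad", 3, 45000.0, 82),
    (2, "Ravi", "Physics", "Chennai", 1, 30000.0, 74),
])

# Fragment for the accounts section: primary key plus the Fees column.
conn.execute("CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT")

# Hypothetical second fragment holding the remaining columns.
# It also keeps the primary key so the table can be rebuilt.
conn.execute("""CREATE TABLE STD_INFO AS
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT""")

# Reconstruction: join the fragments on the shared primary key.
rebuilt = conn.execute("""
    SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks
    FROM STD_INFO AS i
    JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No
    ORDER BY i.Regd_No""").fetchall()

original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)  # True
```

Because both fragments carry Regd_No, a join on that column returns the original rows; without the key in each fragment the rows could not be matched up again.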

Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more fields.
Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment
must have all columns of the original base table.
For example, in the student schema, if the details of all students of the Computer Science course need
to be maintained at the School of Computer Science, then the designer will horizontally fragment the
database as follows:

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE Course = 'Computer Science';
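The horizontal fragment and its reconstruction can be checked in the same way with sqlite3. Again this is only a sketch: the column types, the sample rows, and the complementary fragment OTHER_STD are assumptions added for the example, and Course is assumed never to be NULL so that every row falls into exactly one fragment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT NOT NULL, Address TEXT,
    Semester INTEGER, Fees REAL, Marks INTEGER)""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Asha", "Computer Science", "Hyderabad", 3, 45000.0, 82),
    (2, "Ravi", "Physics", "Chennai", 1, 30000.0, 74),
    (3, "Meena", "Computer Science", "Pune", 5, 47000.0, 91),
])

# Fragment kept at the School of Computer Science: all columns, selected rows.
conn.execute("""CREATE TABLE COMP_STD AS
    SELECT * FROM STUDENT WHERE Course = 'Computer Science'""")

# Hypothetical complementary fragment with the remaining rows.
conn.execute("""CREATE TABLE OTHER_STD AS
    SELECT * FROM STUDENT WHERE Course <> 'Computer Science'""")

comp_ids = [row[0] for row in
            conn.execute("SELECT Regd_No FROM COMP_STD ORDER BY Regd_No")]
print(comp_ids)  # [1, 3]

# Reconstruction: the union of the fragments gives back the original table.
rebuilt = conn.execute("""
    SELECT * FROM COMP_STD
    UNION ALL
    SELECT * FROM OTHER_STD
    ORDER BY Regd_No""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)  # True
```

Vertical fragments are rebuilt with a join, whereas horizontal fragments are rebuilt with a union, which is why each horizontal fragment must keep every column.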

Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique, since it generates fragments with minimal extraneous
information. However, reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways:
• First generate a set of horizontal fragments; then generate vertical fragments from one or more
of the horizontal fragments.
• First generate a set of vertical fragments; then generate horizontal fragments from one or more
of the vertical fragments.
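The first of the two ways (horizontal first, then vertical) can be sketched with sqlite3 as shown below. The fragment names COMP_STD_FEES and COMP_STD_INFO, the column types, and the sample rows are hypothetical and chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Address TEXT,
    Semester INTEGER, Fees REAL, Marks INTEGER)""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Asha", "Computer Science", "Hyderabad", 3, 45000.0, 82),
    (2, "Ravi", "Physics", "Chennai", 1, 30000.0, 74),
    (3, "Meena", "Computer Science", "Pune", 5, 47000.0, 91),
])

# Step 1: horizontal fragment (rows of Computer Science students).
conn.execute("""CREATE TABLE COMP_STD AS
    SELECT * FROM STUDENT WHERE Course = 'Computer Science'""")

# Step 2: vertical fragments of that horizontal fragment.
# Both keep Regd_No so that COMP_STD can be rebuilt with a join.
conn.execute("CREATE TABLE COMP_STD_FEES AS SELECT Regd_No, Fees FROM COMP_STD")
conn.execute("""CREATE TABLE COMP_STD_INFO AS
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM COMP_STD""")

comp_fees = conn.execute(
    "SELECT Regd_No, Fees FROM COMP_STD_FEES ORDER BY Regd_No").fetchall()
print(comp_fees)  # [(1, 45000.0), (3, 47000.0)]
```

The resulting fragment holds only the fee data of Computer Science students, which shows how hybrid fragmentation keeps extraneous information to a minimum. Rebuilding STUDENT now needs a join followed by a union, which is why reconstruction is more expensive.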
