Relational Database Management Systems B.Com(CA)-II Year(IV Semester)
UNIT- I
** Basic Concepts **
Data: Data is a raw collection of facts about people, places, objects and events, including
text, graphics, images, sound etc., that have meaning in the user's environment.
Data is given by the user to the computer.
It is not understandable (meaningless).
It requires processing.
Example:
1  Sri  35  45  55      Data
2  Ram  75  80  98
(Here, it is not clear whether 35 is a roll number or marks.)
Information: Information is meaningful data in an organized form. Information is processed
data that increases the knowledge of a person who uses the data.
Information is given by the computer to the user.
It is understandable (meaningful).
It is processed data.
Example: Sno Name M1 M2 M3
(Number) (Varchar) (Number) (Number) (Number)
1 Sri 35 45 55 Information
2 Ram 75 80 98
Data --> Process --> Information
Meta data: Meta data is data about data. It describes the properties/characteristics of other data.
It includes field names, data types and their size.
It is used for processing
It gives meaning to the data.
Sno Name M1 M2 M3 Meta data
(Number) (Varchar) (Number) (Number) (Number)
Database: A database is a collection of logically related data stored in a standardized format
and sharable by multiple users. (OR)
A mass storage of data that is generated over a period of time in a business environment
is called a "database".
The symbol of a database is a cylinder.
Example:
1. University database: In this we can store data about students, faculty, courses, results, etc.
2. Bank database: In this we can store data about Account holders.
User <--> DBMS S/W <--> Database
DBMS software is an interface between user and Database.
DBMS provides services like storing, updating, deleting and selecting data.
Database system: It is a set of Databases, DBMS, Hardware and people who operate on it.
Evolution of database system:
Late 1950’s : Sequential file processing systems were used in late 1950’s. In these systems, all
records in a file must be processed in sequence.
1960’s : Random access file processing systems were greatly in use in this period. It
supported direct access to a specific record. In this, it was difficult to access
multiple records though they were related to a single file.
1970’s : During this decade the hierarchical and network database management systems
were developed and were treated as first generation DBMS.
Late 1970’s: E. F. Codd and others developed the relational data model in the late 1970s. This
model is considered second generation DBMS. All data is represented in the form
of tables. SQL is used for data retrieval.
1980’s : Object Oriented model was introduced in 1980’s. In this model, both data and
their relationships (operations) are contained in a single structure known as object.
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 4
downloaded from: www.sucomputersforum.com
1990’s : Client/Server computing was introduced, followed by data warehousing and
internet applications.
2000 : Object oriented database systems were introduced.
File processing systems were an early attempt to computerize the manual filing system
that we are all familiar with. A file system is a method for storing and organizing computer files
and the data they contain to make it easy to find and access them. File systems may use a
storage device such as a hard disk or CD-ROM and involve maintaining the physical location of
the files.
Characteristics (OR) Advantages of File Processing System:
It is a group of files storing data of an organization.
Each file is independent from one another.
Each file is called a flat file.
Each file contains and processes information for one specific function, such as accounting
or inventory.
Files are designed by using programs written in programming languages such as COBOL, C,
and C++.
The physical implementation and access procedures are written into the application;
therefore, physical changes result in intensive rework on the part of the programmer.
As systems became more complex, file processing systems offered little flexibility,
presented many limitations, and were difficult to maintain.
Limitations of the File Processing System/ File-Based Approach:
1. Separated and Isolated Data: To make a decision, a user might need data from two separate
files. First, the files were evaluated by analysts and programmers to determine the specific data
required from each file and the relationships between the data and then applications could be
written in a programming language to process and extract the needed data. Imagine the work
involved if data from several files was needed.
2. Duplication of data: Often the same information is stored in more than one file.
Uncontrolled duplication of data is undesirable for several reasons:
• Duplication is wasteful. It costs time and money to enter the data more than once
• It takes up additional storage space, again with associated costs.
• Duplication can lead to loss of data integrity; in other words the data is no longer consistent.
3. Data Dependence: In the file-based approach, application programs are data dependent. This
means that when the physical representation of data (how the data is physically represented on
disk) or the access technique (how it is physically accessed) changes, the application programs
are also affected and need modification. In other words, application programs depend on how
the data is physically stored and accessed.
4. Difficulty in representing data from the user's view: To create useful applications for the
user, often data from various files must be combined. In file processing it was difficult to
determine relationships between isolated data in order to meet user requirements.
5. Data Inflexibility: Program-data interdependency and data isolation limited the flexibility
of file processing systems in providing users with ad-hoc information requests.
6. Incompatible file formats: As the structure of files is embedded in the application programs,
the structures are dependent on the application programming language. For example, the
structure of a file generated by a COBOL program may be different from the structure of a file
generated by a 'C' program. The direct incompatibility of such files makes them difficult to
process jointly.
7. Data Security: The security of data is low in a file-based system because the data
maintained in flat files is easily accessible.
8. Poor data modeling of real world: The file-based system is not able to represent
complex data and interfile relationships, which results in poor data modeling properties.
**Database Approach**
In order to remove all limitations of the File Based Approach, a new approach was
required that must be more effective known as Database approach.
The Database is a shared collection of logically related data, designed to meet the
information needs of an organization. A database is a computer-based record-keeping system
whose overall purpose is to record and maintain information. The database is a single, large
repository of data, which can be used simultaneously by many departments and users. Instead of
disconnected files with redundant data, all data items are integrated with a minimum amount of
duplication.
The database is no longer owned by one department but is a shared corporate resource.
The database holds not only the organization's operational data but also a description of this
data.
For this reason, a database is also defined as a self-describing collection of integrated
records.
The description of the data is known as the Data Dictionary or Meta Data (the 'data about
data'). It is the self-describing nature of a database that provides program-data independence.
A database implies separation of physical storage from use of the data by an application
program to achieve program/data independence. Changes (or updating) can be made to data
without affecting other components of the system.
In the DBMS approach, application program written in some programming language like
Java, Visual Basic.Net, and Developer 2000 etc. uses database connectivity to access the
database stored in the disk with the help of operating system's file management system.
The file system interface and the DBMS interface for the university management system are shown in the figure.
Building blocks of a Database:
The following three components form the building blocks of a database. They store the
data that we want to save in our database.
Columns: Columns are similar to fields, that is, individual items of data that we wish to store.
A Student' Roll Number, Name, Address etc. are all examples of columns.
Rows: Rows are similar to records, as each row contains the data of multiple columns. A row
can be made up of as many or as few columns as you want.
Tables: A table is a logical group of columns. For example, you may have a table that stores
details of customers' names and addresses. Another table would be used to store details of parts
and yet another would be used for suppliers' names and addresses.
It is the tables that make up the entire database and it is important that we do not
duplicate data at all.
3. Permanent: Data in a database exist permanently in the sense the data can live beyond the
scope of the process that created it.
4. Correctness: Data should be correct with respect to the real world entity that they represent.
5. Security: Data should be protected from unauthorized access.
6. Consistency: Whenever more than one data element in a database represents related real
world values, the values should be consistent with respect to the relationship.
7. Non-redundancy: No two data items in a database should represent the same real world
entity.
8. Independence: Data at different levels should be independent of each other so that the
changes in one level should not affect the other levels.
9. Easily Accessible: It should be available when and where it is needed i.e. it should be easily
accessible.
10.Recoverable: It should be recoverable in case of damage.
11.Flexible to change: It should be flexible to change.
**Logical and physical DBMS architecture (OR) Three level architecture of DBMS**
Database design: Database design includes conceptual database design and physical database
design.
Conceptual database design consists of defining the data elements, relationship and
constraints. To do conceptual database design, DBA Designers create different views of the
database. These views must then be integrated into a complete database structure, which defines
the logical structure of the entire database.
Physical database design determines the physical structure of the database. Technical
oriented DBA designers carry out the physical design. Their goal is to optimize the total
combination of hardware, software.
User training: The responsibility of DBA is to educate the users, in how to access (use) the
database through DBMS. This can be done by taking training sessions (classes) or by
interacting with users. An information center provides training and simple programming
services.
Database security and integrity: the DBA provides security procedures and controls to
prevent the abuse (misuse) of data. The DBA assigns ownership of a view to a specific group,
this permits limited access on database. Access to the database is controlled by a password
mechanism. The DBA is responsible for assigning passwords and controlling access
permission.
Data integrity maintains the accuracy and consistency of data values. Security mechanisms,
such as passwords and data views, protect data integrity.
Database system performance: A database system may respond very slowly when a large
number of users access it at the same time. In such situations, the DBA and technical staff
analyze the problem and resolve the response-time issues, for example by creating indexes
or physically rearranging the data.
A database administrator's responsibilities can also include the following tasks:
Installing and upgrading the database server and application tools.
Allocating system storage and planning future storage requirements for the database
system.
Indexes: A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and storage space to maintain the
index data structure. Indexes are used to quickly locate data without having to search every row
in a database table every time a database table is accessed. Indexes can be created using one or
more columns of a database table, providing the basis for both rapid random lookups and
efficient access of ordered records.
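The effect of an index can be sketched with Python's built-in sqlite3 module; the table name (student) and index name (idx_rno) below are illustrative, not from the notes:

```python
import sqlite3

# A minimal sketch: compare the query plan before and after an
# index exists on the searched column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rno INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, "s%d" % i) for i in range(1000)])

# Without an index the lookup must scan every row of the table.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE rno = 500"
).fetchall())  # plan shows a full SCAN of student

# After creating an index, SQLite locates the row directly.
conn.execute("CREATE INDEX idx_rno ON student (rno)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE rno = 500"
).fetchall()
print(plan)    # plan shows a SEARCH using idx_rno
```

The index speeds up the read at the cost of extra writes and storage space, exactly as described above.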
Personal database: The personal databases are maintained, generally, on personal computers.
They contain information that is meant for use only among a limited number of users, generally
working in the same department.
Distributed database: These databases have contributions from the common databases as well
as the data captured from the local operations. The data remains distributed at various sites in
the organization. As the sites are linked to each other with the help of communication links, the
entire collection of data at all the sites constitutes the logical database of the organization.
**Data Models**
According to Hoberman (2009), “A data model is a wayfinding tool for both business
and IT professionals, which uses a set of symbols and text to precisely explain a subset of real
information to improve communication within the organization and thereby lead to a more
flexible and stable application environment”.
A data model is an idea which describes how data can be represented and accessed.
It defines the data elements and the relationships among various data elements for a specified
system.
The main purpose of a data model is to give an idea of how the final system or software will
look after development.
A relational database simplifies the database structure by making use of tables and
columns.
Relational data model is the primary data model, which is used widely around the world
for data storage and processing. This model is simple and it has all the properties and
capabilities required to process data with storage efficiency.
Concepts in Relational model:
Table (or) Relation -- In relational data model, relations are saved in the format of Tables. This
format stores the relation among entities. A table has rows and columns, where rows represent
records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a
tuple.
Relation instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and
their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute
domain.
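The terms above map directly onto SQL. A minimal sketch in Python's sqlite3 (table and attribute names are illustrative):

```python
import sqlite3

# The CREATE TABLE statement is the relation schema; each inserted
# row is a tuple; the set of rows at any moment is the relation
# instance; the PRIMARY KEY column is the relation key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        rno   INTEGER PRIMARY KEY,  -- relation key
        name  TEXT,                 -- attribute, domain: text
        marks INTEGER               -- attribute, domain: integer
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Sri", 35), (2, "Ram", 75)])  # two tuples

instance = conn.execute("SELECT * FROM student").fetchall()
print(instance)  # [(1, 'Sri', 35), (2, 'Ram', 75)]
```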
3. Network Database Model: The Network Database Model is similar to the Hierarchical
Model; the only difference is that it allows a record to have more than one parent.
In this model, a record is not restricted to a single parent-child association as in the
hierarchical model.
4. Entity Relationship (ER) Model: The developer can easily understand the system by
looking at the constructed ER model.
In this diagram,
A rectangle represents an entity. Eg. Doctor and Patient.
An ellipse represents an attribute. Eg. DocId, Dname, PId, Pname; attributes describe each
entity.
5. Object Model: The object model stores data in the form of objects, classes and inheritance.
This model handles more complex applications, such as Geographic Information Systems
(GIS).
**Domains**
A domain is defined as the set of all unique values permitted for an attribute. For
example, a domain of date is the set of all possible valid dates, a domain of integer is all
possible whole numbers, a domain of day-of-week is Monday, Tuesday ... Sunday.
This in effect defines rules for a particular attribute. If an attribute is determined to be a
date, the database should be implemented so as to prevent invalid dates from being entered.
If the system supports domain constraints, such invalid data would never be stored in the
first place. That is, the integrity of the database is preserved.
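A domain constraint can be sketched with a CHECK clause; the table and column names below are made up for the example:

```python
import sqlite3

# Sketch: restrict the day column to the seven valid day-of-week
# values, so invalid data is rejected before it is ever stored.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shift (
        day TEXT CHECK (day IN ('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))
    )
""")
conn.execute("INSERT INTO shift VALUES ('Mon')")         # valid, accepted
try:
    conn.execute("INSERT INTO shift VALUES ('Funday')")  # invalid
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the CHECK constraint preserved integrity
```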
**Types of Keys **
Table is a collection of data in the form of rows and columns. Rows are referred as records
and columns are referred as fields.
A table includes the following components, which are called its keys.
1. Primary Key
2. Foreign Key
3. Candidate Key
4. Super Key
5. Composite Key
6. Alternate key
Primary key: A primary key is a column or set of columns in a table that uniquely identifies
tuples (rows) in that table.
Example: Student Table
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Carl 22
In the above Student table, the Stu_Id column uniquely identifies each row of the table.
We denote the primary key by underlining the column name.
The value of primary key should be unique for each row of the table. Primary key column
cannot contain duplicate values.
Primary key column should not contain nulls.
A primary key need not be a single column; more than one column together can also form
the primary key of a table. For example, {Stu_Id, Stu_Name} could collectively play the role
of primary key in the above table, but since Stu_Id alone is enough to uniquely identify each
row, there is no reason to make things more complex. We should choose more than one
column as the primary key only when no single column can play the role of primary key.
Foreign key: Foreign keys are the columns of a table that points to the primary key of another
table. They act as a cross-reference between tables.
In the below example the Stu_Id column in Course_enrollment table is a foreign key as it
points to the primary key of the Student table.
Course_enrollment table:
Course_Id  Stu_Id
C01        101
C02        102
C03        101
C05        102
C06        103

Student table:
Stu_Id  Stu_Name   Stu_Age
101     Chaitanya  22
102     Arya       26
103     Bran       25
104     Jon        21
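The cross-reference above can be sketched in Python's sqlite3; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

# Sketch of the Student / Course_enrollment tables above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
conn.execute("""
    CREATE TABLE course_enrollment (
        course_id TEXT,
        stu_id    INTEGER REFERENCES student (stu_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO student VALUES (101, 'Chaitanya')")
conn.execute("INSERT INTO course_enrollment VALUES ('C01', 101)")  # ok

try:
    # 999 does not exist in student, so the cross-reference fails
    conn.execute("INSERT INTO course_enrollment VALUES ('C02', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```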
Candidate Key: A super key with no redundant attributes is known as a candidate key.
Candidate keys are selected from the set of super keys; the only condition while selecting a
candidate key is that it should not have any redundant attributes. That is why candidate keys
are also termed minimal super keys.
For example:
Emp_Id Emp_Number Emp_Name
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
There are two candidate keys in above table:
{Emp_Id}
{Emp_Number}
Note: The primary key is selected from the group of candidate keys. That means we can
have either Emp_Id or Emp_Number as the primary key.
Super key: A super key is a set of one or more columns (attributes) that uniquely identifies
rows in a table. People often confuse super keys with candidate keys, so we also discuss
candidate keys briefly here.
How is a candidate key different from a super key?
The answer is simple: candidate keys are selected from the set of super keys, and the only
condition while selecting a candidate key is that it should not have any redundant attribute.
That is why candidate keys are also termed minimal super keys.
Let’s take an example to understand this: Employee table
Emp_SSN Emp_Number Emp_Name
123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert
Super keys:
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
All of the above sets are able to uniquely identify rows of the employee table.
Alternate Key: Out of all candidate keys, only one gets selected as primary key, remaining
keys are known as alternate or secondary keys.
Composite Key: A key that consists of more than one attribute to uniquely identify rows (also
known as records & tuples) in a table is called composite key.
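A composite key can be sketched in sqlite3; the enrollment table below uses illustrative names:

```python
import sqlite3

# Sketch: neither course_id nor stu_id is unique by itself,
# but the pair of columns identifies each row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        course_id TEXT,
        stu_id    INTEGER,
        PRIMARY KEY (course_id, stu_id)  -- composite key
    )
""")
conn.execute("INSERT INTO enrollment VALUES ('C01', 101)")
conn.execute("INSERT INTO enrollment VALUES ('C01', 102)")  # same course: ok
conn.execute("INSERT INTO enrollment VALUES ('C02', 101)")  # same student: ok
try:
    conn.execute("INSERT INTO enrollment VALUES ('C01', 101)")  # duplicate pair
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```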
** Database Constraints **
Database constraints are restrictions on the contents of the database or on database
operations.
Need of Constraints: Constraints in the database provide a way to guarantee that :
The values of individual columns are valid.
In a table, rows have valid primary key or unique key values.
In a dependent table, rows have valid foreign key values that reference rows in a parent
table.
Different Types of constraints:
1. Domain Constraints
2. Key Constraints
3. Integrity Rule/ Constraint 1 (Entity Integrity Rule or Constraint)
4. Integrity Rule/ Constraint 2 (Referential Integrity Rule or Constraint)
5. General Constraints
Domain Constraints: Domain constraints specify what set of values an attribute can
take. The value of each attribute X must be an atomic value from the domain of X.
The data type associated with domains includes integer, character, string, date, time, currency
etc. An attribute value must be available in the corresponding domain. Consider the example
below:
Key Constraints: Keys are attributes or sets of attributes that uniquely identify an entity within
its entity set. An Entity set E can have multiple keys out of which one key will be designated as
the primary key. Primary Key must have unique and not null values in the relational table.
Example of Key Constraints in a simple relational table –
Integrity Rule 1 (Entity Integrity Rule or Constraint): The Integrity Rule 1 is also called
Entity Integrity Rule or Constraint. This rule states that no attribute of primary key will contain
a null value. If a relation has a null value in the primary key attribute, then uniqueness property
of the primary key cannot be maintained. Consider the example below:
Integrity Rule 2 (Referential Integrity Rule or Constraint): The integrity Rule 2 is also
called the Referential Integrity Constraints. This rule states that if a foreign key in Table 1 refers
to the Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be null or
be available in Table 2. For example,
Some more features of foreign keys: Let the table in which the foreign key is defined be
called the foreign or detail table (Table 1 in the above example), and let the table that defines
the primary key and is referenced by the foreign key be called the master or primary table
(Table 2 in the above example). Then the following properties must hold:
Records cannot be inserted into a Foreign table if corresponding records in the master
table do not exist.
The Update Operation: Consider two existing relations named EMPLOYEE and
DEPARTMENT.
2. Update in a referenced relation: There are again three options available if an update
causes a violation.
3. Modify the referencing attributes (ON UPDATE SET NULL): set a null value, or some
other valid value, in the foreign key field for the corresponding updated referenced value;
that is, change the referencing attribute values that caused the violation to null or to
another valid value. If there is no restriction or constraint against putting NULL in the
referencing relation, the update of the referenced relation is allowed; otherwise it is
prohibited.
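A referential action of this kind can be sketched in sqlite3 on the EMPLOYEE and DEPARTMENT relations mentioned above (column names are illustrative); here ON DELETE SET NULL plays the analogous role for deletions:

```python
import sqlite3

# Sketch: deleting a referenced department sets the matching
# employee rows' foreign key to NULL instead of leaving a
# dangling reference.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department (dept_id)
                ON DELETE SET NULL
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 10)")

conn.execute("DELETE FROM department WHERE dept_id = 10")
row = conn.execute("SELECT dept_id FROM employee WHERE emp_id = 1").fetchone()
print(row)  # (None,)
```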
** Relational Operations **
To manipulate relations, the relational model supports a set of relational algebra operations.
Given this simple and restricted data structure, it is possible to define some very powerful
relational operators which, from the user's point of view, act 'in parallel' on all entries in a
table simultaneously, although their implementation may require conventional processing.
Codd originally defined eight relational operators.
1. SELECT originally called RESTRICT
2. PROJECT
3. JOIN
4. PRODUCT
5. UNION
6. INTERSECT
7. DIFFERENCE
8. DIVIDE
SELECT: RESTRICTS the rows chosen from a table to those entries with specified attribute
values.
Example: SELECT item FROM stock_level WHERE quantity > 100
constructs a new, logical table (an unnamed relation with a single column, item)
containing all rows from stock_level that satisfy the WHERE clause.
PROJECT: Selects rows made up of a sub-set of columns from a table.
Example: PROJECT stock_item OVER item AND description
produces a new logical table where each row contains only two columns - item and
description. The new table will only contain distinct rows from stock_item; i.e. any duplicate
rows so formed will be eliminated.
JOIN: Associates entries from two tables on the basis of matching column values.
Example: JOIN stock_item WITH stock_level OVER item
It is not necessary for there to be a one-to-one relationship between entries in two tables to
be joined - entries which do not match anything will be eliminated from the result, and entries
from one table which match several entries in the other will be duplicated the required number
of times.
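The three operators so far can be run in SQL through Python's sqlite3; the stock_item and stock_level tables follow the examples above, filled with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_item (item TEXT, description TEXT)")
conn.execute("CREATE TABLE stock_level (item TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO stock_item VALUES (?, ?)",
                 [("bolt", "steel bolt"), ("nut", "steel nut")])
conn.executemany("INSERT INTO stock_level VALUES (?, ?)",
                 [("bolt", 250), ("nut", 40)])

# SELECT (restrict): rows of stock_level with quantity > 100
restrict = conn.execute(
    "SELECT item FROM stock_level WHERE quantity > 100").fetchall()
print(restrict)  # [('bolt',)]

# PROJECT: only the item and description columns, duplicates removed
project = conn.execute(
    "SELECT DISTINCT item, description FROM stock_item").fetchall()

# JOIN: associate the two tables over the matching item column
joined = conn.execute("""
    SELECT i.item, i.description, l.quantity
    FROM stock_item i JOIN stock_level l ON i.item = l.item
""").fetchall()
print(joined)
```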
PRODUCT: Builds a relation from two specified relations consisting of all possible
combinations of rows, one from each of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: d    =>  A product B: a d
   b       e                     a e
   c                             b d
                                 b e
                                 c d
                                 c e
UNION: Builds a relation consisting of all rows appearing in either or both of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: a    =>  A union B: a
   b       e                    b
   c                            c
                                e
INTERSECT: Builds a relation consisting of all rows appearing in both of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: a    =>  A intersect B: a
   b       e
   c
DIFFERENCE: Builds a relation consisting of all rows appearing in the first and not in the
second of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: a    =>  A - B: b    and    B - A: e
   b       e             c
   c
DIVIDE: Takes two relations, one binary and one unary, and builds a relation consisting of all
values of one column of the binary relation that match, in the other column, all values in the
unary relation.
A: a x    B: x    =>  A divide B: a
   a y       y
   a z
   b x
   c y
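The set-based operators can be mimicked with plain Python sets, using the same values as the examples above; DIVIDE is implemented straight from its definition:

```python
# UNION, INTERSECT, DIFFERENCE on the example relations
A = {"a", "b", "c"}
B = {"a", "e"}

print(sorted(A | B))  # UNION      -> ['a', 'b', 'c', 'e']
print(sorted(A & B))  # INTERSECT  -> ['a']
print(sorted(A - B))  # DIFFERENCE -> ['b', 'c']

# PRODUCT of {a, b, c} with {d, e}: all six ordered pairs
C = {"d", "e"}
product = {(x, y) for x in A for y in C}
print(len(product))   # 6

# DIVIDE: binary relation R divided by unary relation S keeps the
# left-hand values that are paired with every value in S.
R = {("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("c", "y")}
S = {"x", "y"}
divide = {x for (x, _) in R if all((x, s) in R for s in S)}
print(divide)  # {'a'}
```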
ER Diagrams:
ERD stands for Entity Relationship Diagram.
An ER diagram shows the relationships between objects, places, people, events etc. within
a system.
It is a data modeling technique which helps in defining the business process.
It saves time: without ER diagrams it is much harder to design a database structure and
write production code.
Multi-valued Attribute: represents an attribute which can have many values for a
particular entity. For eg. Mobile Number.
**Entities**
Entities: An entity is a person, place, object, event or concept in the user environment about
which an organization maintains the data.
Types of Entities:
1. Strong Entity Types
2. Recursive Entity Types
3. Weak Entity Types
4. Composite Entity Types or Associative Entity Types
Strong Entity Type: These are entities which have a key attribute in their attribute list, or an
attribute set that forms a primary key. The strong entity type is also called a regular entity
type. For example, a student's unique RollNo identifies each student. So RollNo is the
primary key of the STUDENT entity, and hence STUDENT is a strong entity type because of
its key attribute.
Recursive Entity Type: This is also called a self-referential entity type. It is an entity type
with a foreign key referencing the same table, i.e. itself. Recursive entity types occur in
unary relationships.
A recursive relationship may be one-to-one or one-to-many.
To achieve a recursive relationship we need to set up a foreign key in the table, as shown
below:
emp_num (PK)   Ename   Job   Sal   manager_id (FK)
Here manager_id is a foreign key in the table that references the primary key
(emp_num) values of the same table. Thus, the above relationship is a recursive relationship.
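The self-referencing table can be sketched in sqlite3; the employee names below are made up:

```python
import sqlite3

# Sketch of a recursive relationship: manager_id is a foreign key
# referencing emp_num in the same table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE emp (
        emp_num    INTEGER PRIMARY KEY,
        ename      TEXT,
        manager_id INTEGER REFERENCES emp (emp_num)  -- self-reference
    )
""")
conn.execute("INSERT INTO emp VALUES (1, 'Kiran', NULL)")  # top manager
conn.execute("INSERT INTO emp VALUES (2, 'Ravi', 1)")      # reports to 1

# A self-join lists each employee alongside his or her manager.
rows = conn.execute("""
    SELECT e.ename, m.ename
    FROM emp e LEFT JOIN emp m ON e.manager_id = m.emp_num
    ORDER BY e.emp_num
""").fetchall()
print(rows)  # [('Kiran', None), ('Ravi', 'Kiran')]
```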
Weak Entity Type: An entity type with no key or primary key is called a weak entity type.
The tuples of a weak entity type may not be distinguishable using one attribute of the weak
entity alone. For every weak entity, there should be a unique OWNER entity type. In the
example below, CHILD is a weak entity type and EMPLOYEE is the owner entity type.
Composite Entities: If a many-to-many relationship exists, we must eliminate it by using a
composite entity. Composite entities are entities which exist both as a relationship and as an
entity; the many-to-many relationship is converted into two one-to-many relationships.
Composite entities are also called bridge entities, because they act like a bridge between the
two entities which have the many-to-many relationship. A bridge or composite entity is
composed of the primary keys of each of the entities to be connected. A composite entity is
represented by a diamond shape within a rectangle in an ER diagram.
In the following example, the associative entity “CERTIFICATE” has the attributes
“cnum” and “date” which are peculiar to the relationship. It associates the instances of
“STUDENT” and “COURSE”.
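The CERTIFICATE bridge entity can be sketched in sqlite3; the sample values are made up:

```python
import sqlite3

# Sketch: the bridge table holds the primary keys of STUDENT and
# COURSE plus its own attributes (cnum, date), splitting the
# many-to-many link into two one-to-many links.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT)")
conn.execute("CREATE TABLE course (cid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""
    CREATE TABLE certificate (
        cnum  INTEGER PRIMARY KEY,
        cdate TEXT,
        sid   INTEGER REFERENCES student (sid),
        cid   INTEGER REFERENCES course (cid)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Sri')")
conn.execute("INSERT INTO course VALUES (10, 'RDBMS')")
conn.execute("INSERT INTO certificate VALUES (500, '2024-01-15', 1, 10)")

# One certificate row associates one student with one course.
linked = conn.execute("""
    SELECT s.sname, c.title, cert.cdate
    FROM certificate cert
    JOIN student s ON cert.sid = s.sid
    JOIN course  c ON cert.cid = c.cid
""").fetchall()
print(linked)  # [('Sri', 'RDBMS', '2024-01-15')]
```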
**Attributes**
An attribute is a property or characteristic of an entity; attributes are also known as the
columns of a table. In other words, an attribute is an item of related information about an
entity which has a valid value. The following table shows attributes for a few entities.
Entity/ Table Attribute/ Fields
STUDENT name, rno, age, marks, address, total, average
EMPLOYEE ename, salary, dob, doj, department
An attribute can have a single value, multiple values or a range of values. In addition, each
attribute can contain a certain type of data, such as only numeric values, only alphabets, a
combination of both, dates, or negative and positive values. Depending on the values that an
attribute can take, attributes are divided into different types.
1. Simple Attribute: These kinds of attributes have values which cannot be divided further.
For example, STUDENT_ID attribute which cannot be divided further. Passport Number is
unique value and it cannot be divided.
2. Composite Attribute: This kind of attribute can be divided further to more than one simple
attribute. For example, address of a person. Here address can be further divided as Door#,
street, city, state and pin which are simple attributes.
3. Derived Attribute: Derived attributes are the one whose value can be obtained from other
attributes of entities in the database. For example, Age of a person can be obtained from date
of birth and current date. Average salary, annual salary, total marks of a student etc are few
examples of derived attribute.
4. Stored Attribute: The attributes whose values are used to compute a derived attribute are
called stored attributes. In the example above, age is derived using Date of Birth; hence
Date of Birth is a stored attribute.
5. Single Valued Attribute: These attributes will have only one value. For example,
EMPLOYEE_ID, passport#, driving license#, SSN etc have only single value for a person.
6. Multi-Valued Attribute: These attributes can have more than one value at any point of time. For example, a manager can have more than one employee working for him, and a person can have more than one email address or more than one house.
7. Simple Single Valued Attribute: This combines the simple and single-valued types: the attribute has a single value at any point of time, and that value cannot be divided further. For example, EMPLOYEE_ID is a single value and cannot be divided further.
8. Simple Multi-Valued Attribute: A person's phone number is an example: each number is indivisible (simple), yet a person can have multiple phone numbers.
9. Composite Single Valued Attribute: Date of Birth can be a composite single valued
attribute. Any person can have only one DOB and it can be further divided into date, month
and year attributes.
10.Composite Multi-Valued Attribute: The address of a shop that operates at two different locations is an example of this attribute.
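The attribute types above can be sketched in Python (a hypothetical Student entity; the class and field names are invented for illustration, not taken from the notes):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:                 # composite attribute: divisible into simple parts
    door: str
    street: str
    city: str
    pin: str

@dataclass
class Student:
    student_id: str            # simple, single-valued attribute
    dob: date                  # stored attribute
    address: Address           # composite attribute
    phones: list = field(default_factory=list)   # multi-valued attribute

    @property
    def age(self) -> int:      # derived attribute: computed from dob, not stored
        today = date(2024, 1, 1)   # fixed reference date so the example is stable
        return today.year - self.dob.year

s = Student("S101", date(2004, 1, 1),
            Address("12", "MG Road", "Guntur", "522001"),
            ["9000000001", "9000000002"])
```

Here student_id is a simple single-valued attribute, phones is simple multi-valued, address is composite single-valued, and age is derived from the stored attribute dob.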
**Relationship**
A Relationship is an association established between common fields (columns) of two or
more tables. A relationship defines how two or more entities are inter-related. For example,
STUDENT and CLASS entities are related as 'Student X studies in a Class Y'. Here 'Studies'
defines the relationship between Student and Class.
Degrees of Relationship: Two or more entities can participate in a relationship. The number of entities that participate in a particular relationship is called the degree of the relationship. If only two entities participate in the mapping, the degree of the relation is 2 (binary). If three entities are involved, the degree is 3 (ternary). If more than 3 entities are involved, the relation is called n-ary.
Cardinality of Relationship: How many instances of one entity are mapped to how many instances of another entity is known as the cardinality of a relationship. In the 'studies' relationship above, one Student X studies in one Class Y; that is, a single instance of the entity Student maps to a single instance of the entity Class. This means the cardinality between Student and Class is 1:1.
Based on the cardinality, there are 3 types of relationship.
1. One - to - One Relationship
2. One - to - Many Relationship
3. Many - to - Many Relationship
One - to - One Relationship: In One - to - One Relationship, one entity is related with only one
other entity. One row in a table is linked with only one row in another table and vice versa.
For example: A Country can have only one Capital City.
One - to - Many Relationship: In One - to - Many Relationship, one entity is related to many
other entities. One row in a table A is linked to many rows in a table B, but one row in a table B
is linked to only one row in table A.
For example: One Department has many Employees.
Many - to - Many Relationship: In Many - to - Many Relationship, many entities are related with multiple other entities; many rows in one table are linked to many rows in another table and vice versa.
For example: One Student enrols in many Courses, and one Course has many Students.
5. One course is taught by only one instructor. But one instructor teaches many courses.
Hence the cardinality between course and instructor is Many to One (N :1)
Step 3: Identify the key attributes
"Departmen_Name" can identify a department uniquely. Hence Department_Name is the
key attribute for the Entity "Department".
Course_ID is the key attribute for "Course" Entity.
Student_ID is the key attribute for "Student" Entity.
Instructor_ID is the key attribute for "Instructor" Entity.
Step 4: Identify other relevant attributes
For the Department entity, another relevant attribute is location.
For the Course entity: course_name, duration.
For the Instructor entity: first_name, last_name, phone.
For the Student entity: first_name, last_name, phone.
Step 5: Draw complete ER diagram
By connecting all these details, we can now draw ER diagram as given below.
UNIT- II
** Database Integrity **
Data integrity includes guidelines for data retention, specifying or guaranteeing how long data can be retained in a particular database. To achieve data integrity, these rules are consistently and routinely applied to all data entering the system, and any relaxation of enforcement can introduce errors into the data. Implementing checks on the data as close as possible to the source of input (such as human data entry) means less erroneous data enters the system. Strict enforcement of data integrity rules keeps error rates low, saving the time that would otherwise be spent troubleshooting and tracing erroneous data and the errors it causes downstream.
Data integrity also includes rules defining the relations a piece of data can have, to other
pieces of data, such as a Customer record being allowed to link to purchased Products, but not
to unrelated data such as Corporate Assets. Data integrity often includes checks and correction
for invalid data, based on a fixed schema or a predefined set of rules.
Types of integrity constraints: Data integrity is normally enforced in a database system by a
series of integrity constraints or rules. Three types of integrity constraints are an inherent part of
the relational data model: entity integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule
which states that every table must have a primary key and that the column or columns
chosen to be the primary key should be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential integrity rule
states that any foreign-key value can only be in one of two states. The usual state of affairs
is that the foreign-key value refers to a primary key value of some table in the database.
Occasionally, and this will depend on the rules of the data owner, a foreign-key value can
be null. In this case, we are explicitly saying that either there is no relationship between the
objects represented in the database or that this relationship is unknown.
Domain integrity specifies that all columns in a relational database must be declared upon
a defined domain. The primary unit of data in the relational data model is the data item.
Such data items are said to be non-decomposable or atomic. A domain is a set of values of
the same type. Domains are therefore pools of values from which actual values appearing
in the columns of a table are drawn.
User-defined integrity refers to a set of rules specified by a user, which do not belong to the entity, domain or referential integrity categories.
If a database supports these features, it is the responsibility of the database to ensure data
integrity as well as the consistency model for the data storage and retrieval.
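The three relational constraint types can be demonstrated with SQLite (a minimal sketch; the table and column names are invented, and SQLite only enforces foreign keys when the pragma is switched on):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")          # SQLite enforces FKs only when enabled

# Entity integrity: every table has a PRIMARY KEY (unique, not null).
con.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT NOT NULL)")

# Domain integrity: CHECK restricts sal to its domain of positive values.
# Referential integrity: emp.deptno must match an existing dept.deptno (or be NULL).
con.execute("""CREATE TABLE emp (
                 eno    INTEGER PRIMARY KEY,
                 sal    INTEGER CHECK (sal > 0),
                 deptno INTEGER REFERENCES dept(deptno))""")

con.execute("INSERT INTO dept VALUES (10, 'Sales')")
con.execute("INSERT INTO emp VALUES (1, 5000, 10)")      # satisfies all rules

try:
    con.execute("INSERT INTO emp VALUES (2, 5000, 99)")  # no dept 99: FK violation
    fk_violated = False
except sqlite3.IntegrityError:
    fk_violated = True

try:
    con.execute("INSERT INTO emp VALUES (3, -10, 10)")   # sal outside its domain
    check_violated = False
except sqlite3.IntegrityError:
    check_violated = True
```

Both offending inserts are rejected, so only the one valid employee row remains in the table.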
** Data Redundancy **
Data redundancy in database means that some data fields are repeated in the database.
This data repetition may occur either if a field is repeated in two or more tables or if the field is
repeated within the table.
Data can appear multiple times in a database for a variety of reasons. For example, a shop
may have the same customer’s name appearing several times if that customer has bought
several different products at different dates.
Disadvantages (OR) Problems associated with data redundancy:
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases efficiency of database.
4. May cause data corruption.
** Functional Dependency **
A functional dependency exists when one attribute in a relation uniquely determines another attribute.
For example, if an attribute 'X' determines the value of 'Y', it is written as X → Y, which
means "Y is functionally dependent upon X".
Here, X is the determinant attribute and
Y is the dependent attribute.
The common functional dependencies are:
1) Partial dependency.
2) Transitive dependency.
Partial dependency: A non-key attribute is partially (not fully) dependent on the key attribute; that is, a dependency based on only part of the primary key is known as a partial dependency.
Transitive dependency: A non-key attribute depends on another non-key attribute, i.e. a dependency based on an attribute that is not part of the primary key.
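The definition can be tested mechanically; the helper below (a sketch with invented sample rows, not from the notes) reports whether X → Y holds in a set of rows:

```python
def fd_holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows."""
    seen = {}
    for r in rows:
        if r[x] in seen and seen[r[x]] != r[y]:
            return False        # one X value maps to two different Y values
        seen[r[x]] = r[y]
    return True

students = [
    {"rno": 101, "sname": "Ravi", "group": "MBA", "fee": 30000},
    {"rno": 102, "sname": "Sita", "group": "MBA", "fee": 30000},
    {"rno": 103, "sname": "Ram",  "group": "MCA", "fee": 25000},
]
```

Here rno → sname and group → fee hold, but fee → rno does not, because the fee 30000 maps to both rno 101 and rno 102.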
** Normalization **
Normalization is a process of evaluating and correcting table structures to minimize data redundancies and reduce data anomalies. (OR) It is a step-by-step decomposition of complex records into simple records.
Normalization follows series of stages called “normal forms” like:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
Normalization involves decomposition (division) of "tables with anomalies" into "smaller, well-structured tables".
Rules of Data Normalization:
1. Eliminate Repeating Groups - Make a separate table for each set of related attributes, and
give each table a primary key.
2. Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key,
remove it to a separate table.
3. Eliminate Columns Not Dependent On Key - If attributes do not contribute to a
description of the key, remove them to a separate table.
4. Isolate Independent Multiple Relationships - No table may contain two or more 1:m or
m:n relationships that are not directly related.
5. Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.
Consider the following STUDENT table and see how the table is normalized from 1NF to
3NF.
rno sname group fee skills
101 Ravi MBA 30000 C
C++
java
The above STUDENT table contains a multi-valued attribute (skills), which is removed in 1NF. It contains partial dependencies, which are removed in 2NF, and transitive dependencies, which are removed in 3NF.
First Normal Form (1NF): 1NF is the lowest of the normal forms. A database table in 1NF
must satisfy the following conditions.
The primary key entity requirements are met.
Each row and column intersections can contain one and only one value.
All the table’s attributes are dependent on the primary key attribute.
The above table is changed into following table to satisfy 1NF.
rno sname group fee skills
101 Ravi MBA 30000 C
101 Ravi MBA 30000 C++
101 Ravi MBA 30000 JAVA
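The same 1NF step can be sketched in Python: the multi-valued skills column is flattened so each row/column intersection holds exactly one value (sample data taken from the table above):

```python
unnormalized = [
    {"rno": 101, "sname": "Ravi", "group": "MBA", "fee": 30000,
     "skills": ["C", "C++", "JAVA"]},    # multi-valued attribute violates 1NF
]

# One output row per skill; all other attributes are repeated in each row.
first_nf = [
    {**{k: v for k, v in row.items() if k != "skills"}, "skill": skill}
    for row in unnormalized
    for skill in row["skills"]
]
```

The single unnormalized row becomes three 1NF rows, one per skill, exactly as in the table above.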
Second Normal Form (2NF): A database table in 2NF must satisfy the following conditions.
The table must be in 1NF.
The table contains no partial dependencies. This means every non-key attribute is fully
dependent on the key attribute.
A dependency based on only part of the primary key is known as a partial dependency. The
above table contains partial dependencies (sname, group and fee depend on rno, i.e., part
of the primary key). So we need to remove these partial dependencies from the above table
to satisfy 2NF. For this we decompose the STUDENT table into STUDENT and SKILLS
tables.
STUDENT SKILLS
rno sname group fee rno skills
101 Ravi MBA 30000 101 C
101 C++
101 JAVA
Removing partial dependency:
Table: STUDENT Table: SKILLS
Third Normal Form (3NF): A database table in 3NF must satisfy the following conditions.
The table must be in 2NF.
The table contains no transitive dependencies.
A dependency based on an attribute that is not part of the primary key is known as a transitive
dependency. The above STUDENT table has a transitive dependency, i.e., a dependency between
the "fee" and "group" attributes. So we need to remove this transitive dependency by decomposing
the STUDENT table into STUDENT and GROUP.
STUDENT GROUP SKILLS
rno sname group group fee rno skills
101 Ravi MBA MBA 30000 101 C
101 C++
101 JAVA
Table
With 1NF 2NF 3NF
anomalies
BCNF (Boyce- Codd Normal form):BCNF stands for Boyce-Codd Normal Form. This normal
form is considered to be a special case of 3NF. But there are few differences between BCNF
and 3NF.
3NF is satisfying 2NF and removing Transitive dependency.
Transitive dependency exists only when a non-key attribute determines another non-key
attribute.
But it is possible for a non-key attribute to be the determinant of the PK or part of the PK
without violating the 3NF requirements; BCNF addresses exactly this case.
A table is in BCNF if and only if every determinant in the relation is a candidate key.
Consider the following table:
sid subject faculty
123 Phy Raman
123 Cs James
423 Phy Raman
423 Cs James
537 Cs Patrik
The above table, which is in 3NF, can be converted to a table in BCNF using a simple two-step
process.
1. The table is modified so that the determinant in the table that is not a candidate key
(faculty) becomes a component of the PK of the revised table.
** Decomposition **
Decomposition is the process of breaking down in parts or elements.
It replaces a relation with a collection of smaller relations.
It breaks the table into multiple tables in a database.
It should always be lossless, because it confirms that the information in the original
relation can be accurately reconstructed based on the decomposed relations.
If there is no proper decomposition of the relation, then it may lead to problems like loss
of information.
Properties of Decomposition: Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
1. Lossless Decomposition: Decomposition must be lossless. It means that the information
should not get lost from the relation that is decomposed. It gives a guarantee that the join will
result in the same relation as it was decomposed.
Example: Let 'E' be a relational schema with instance 'e', decomposed into E1, E2, E3, . . . En
with instances e1, e2, e3, . . . en. If e1 ⋈ e2 ⋈ e3 . . . ⋈ en = e, then it is called a
'Lossless Join Decomposition'.
In other words, if the natural join of all the decomposed relations gives the original
relation, the decomposition is said to be a lossless join decomposition.
Example: <Employee_Department> Table
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
Decompose the above relation into two relations to check whether decomposition is lossless
or lossy. Now, we have decomposed the relation that is Employee and Department.
Relation 1: <Employee> Table
Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
E004 XYZ 24 Mumbai 4000
E005 STU 32 Bangalore 25000
Employee Schema contains (Eid, Ename, Age, City, Salary).
Relation 2: <Department> Table
Eid Deptid DeptName
E001 D001 Finance
E002 D002 Production
E003 D003 Sales
E004 D004 Marketing
E005 D005 Human Resource
Department Schema contains (Eid, Deptid, DeptName). Joining the two relations on the common attribute Eid reconstructs the original relation, so the decomposition is lossless:
Employee ⋈ Department
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
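The lossless property can be verified directly: project the original relation onto the two schemas, natural-join the projections back on the common attribute, and compare with the original. A sketch using two of the rows above:

```python
# (eid, ename, age, city, salary, deptid, deptname)
original = {
    ("E001", "ABC", 29, "Pune", 20000, "D001", "Finance"),
    ("E002", "PQR", 30, "Pune", 30000, "D002", "Production"),
}

# Projections onto the two decomposed schemas.
employee   = {(e, n, a, c, s) for (e, n, a, c, s, d, dn) in original}
department = {(e, d, dn) for (e, n, a, c, s, d, dn) in original}

# Natural join on the common attribute Eid.
rejoined = {
    (e, n, a, c, s, d, dn)
    for (e, n, a, c, s) in employee
    for (e2, d, dn) in department
    if e == e2
}

lossless = (rejoined == original)   # True: the join recreates the original
```

Had the common attribute been dropped from one of the projections, the join would produce spurious tuples and the comparison would fail, i.e. the decomposition would be lossy.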
** RAID **
RAID (Redundant Array of Independent Disks) is a technique of storing data across multiple disks to improve performance and/or fault tolerance. RAID is organised into levels:
RAID 0: In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among the disks. Each disk receives a block of data to write/read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in level 0.
RAID 1: RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a
copy of data to all the disks in the array. RAID level 1 is also called mirroring and provides
100% redundancy in case of a failure.
RAID 2: RAID 2 records an Error Correction Code (using Hamming codes) for its data, striped
across different disks. As in level 0, each data bit of a word is recorded on a separate disk,
and the ECC codes of the data words are stored on a different set of disks. Due to its complex
structure and high cost, RAID 2 is not commercially available.
RAID 3: RAID 3 stripes the data onto multiple disks. The parity bit generated for each data
word is stored on a separate disk. This technique makes it possible to recover from single-disk failures.
RAID 4: In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level
4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement
RAID.
RAID 5: RAID 5 writes whole data blocks onto different disks, but the parity bits generated for
data block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
RAID 6: RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.
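The parity idea behind RAID levels 3-6 can be sketched with XOR: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the surviving blocks plus the parity (the block contents here are invented):

```python
def xor_parity(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    parity = bytes(len(blocks[0]))          # start with all-zero bytes
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity

stripes = [b"AAAA", b"BBBB", b"CCCC"]       # data blocks striped across 3 disks
parity = xor_parity(stripes)                # stored on a parity disk

# Disk 2 fails: rebuild its block from the remaining blocks and the parity.
rebuilt = xor_parity([stripes[0], stripes[2], parity])
```

XOR-ing parity with all the data blocks gives zero, which is why XOR-ing it with all blocks except one yields exactly the missing block.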
** File Organization **
File organization is a way of organizing the data or records in a file. It does not refer to
how files are organized in folders, but how the contents of a file are added and accessed. There
are several types of file organization, the most common of them are sequential, relative and
indexed. They differ in how easily records can be accessed and the complexity in which records
can be organized.
Some of the file organizations are
1. Sequential File Organization
2. Heap File Organization
3. Hash/Direct File Organization
4. Indexed Sequential Access Method
5. B+ Tree File Organization
6. Cluster File Organization
If a new record is inserted, then in the above case it will be inserted into data block 1.
When a record has to be retrieved from the database, this method traverses from the
beginning of the file until the requested record is found. Hence fetching records from very
large tables is time-consuming. This is because there is no sorting or ordering of the
records; all the data must be checked.
Similarly if we want to delete or update a record, first we need to search for the record.
Again, searching a record is similar to retrieving it- start from the beginning of the file till the
record is fetched. If it is a small file, it can be fetched quickly. But larger the file, greater
amount of time needs to be spent in fetching.
In addition, when a record is deleted, it is removed from the data block, but the space is
not freed and cannot be re-used. Hence as the number of records grows, so does the memory
used, and efficiency decreases. For the database to perform better, the DBA has to free this
unused memory periodically.
In the diagram above, R1, R2, R3 etc. are the records. Each contains all the attributes of a
row; i.e., a student record will have the student's id, name, address, course, DOB etc.
Thus each of R1, R2, R3 etc. can be considered one full set of attributes.
In the second method, records are sorted (either ascending or descending) each time they
are inserted into the system. This method is called sorted file method.
Sorting of records may be based on the primary key or on any other columns. Whenever
a new record is inserted, it will be inserted at the end of the file and then it will sort – ascending
or descending based on key value and placed at the correct position. In the case of update, it
will update the record and then sort the file to place the updated record in the right place. Same
is the case with delete.
In this method, if any record has to be retrieved, based on its index value, the data block
address is fetched and the record is retrieved from memory.
Advantages of ISAM:
Since each record has its data block address, searching for a record in a large database is
easy and quick. No extra effort is needed to search for records, but a proper primary key
has to be selected to make ISAM efficient.
This method gives flexibility of using any column as key field and index will be
generated based on that. In addition to the primary key and its index, we can have index
generated for other fields too. Hence searching becomes more efficient, if there is search
based on columns other than primary key.
It supports range retrieval, partial retrieval of records. Since the index is based on the key
value, we can retrieve the data for the given range of values. In the same way, when a
partial key value is provided, say student names starting with ‘JA’ can also be searched
easily.
Disadvantages of ISAM:
An extra cost has to be borne to maintain the index; i.e., we need extra space on the disk
to store the index values. When there are multiple key-index combinations, the disk space
required also increases.
As the new records are inserted, these files have to be restructured to maintain the
sequence. Similarly, when the record is deleted, the space used by it needs to be released.
Else, the performance of the database will slow down.
When a record has to be retrieved, the address is generated from the hash key column and
the whole record is retrieved directly from that address; there is no need to traverse the
whole file. Similarly, when a new record has to be inserted, the address is generated from
the hash key and the record is inserted directly there. The same applies to update and
delete. There is no effort spent searching the entire file or sorting the files. Each record
is stored effectively at random in memory.
These types of file organizations are useful in online transaction systems, where retrieval
or insertion/updation should be faster.
Advantages of Hash File Organization:
Records need not be sorted after any transaction, so the effort of sorting is avoided in
this method.
Since the block address is known from the hash function, accessing any record is very fast.
Similarly, updating or deleting a record is also very quick.
This method can handle multiple transactions as each record is independent of other. i.e.;
since there is no dependency on storage location for each record, multiple records can be
accessed at the same time.
It is suitable for online transaction systems like online banking, ticket booking system
etc.
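The address computation above can be sketched as follows (the bucket count and records are invented; Python's built-in hash of a small integer is the integer itself, so the addressing is deterministic):

```python
N_BUCKETS = 4
buckets = [[] for _ in range(N_BUCKETS)]      # one list per data block

def bucket_of(key):
    return hash(key) % N_BUCKETS              # hash key -> block address

def insert(record):                           # record = (eno, ename)
    buckets[bucket_of(record[0])].append(record)

def fetch(eno):
    # Only the one computed bucket is scanned, never the whole file.
    return next((r for r in buckets[bucket_of(eno)] if r[0] == eno), None)

for rec in [(101, "Ravi"), (102, "Sita"), (103, "Ram")]:
    insert(rec)
```

A lookup visits exactly one bucket, which is why insert, fetch, update and delete need no file scan or sorting.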
** Types of Indexes **
We know that data is stored in the form of records. Every record has a key field, which
helps it to be recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in database
systems is similar to what we see in books. Indexing is defined based on its indexing attributes.
Indexing can be of the following types:
Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field.
Ordered Indexing is of two types:
Dense Index
Sparse Index
Dense Index:
In dense index, there is an index record for every search key value in the database. This
makes searching faster but requires more space to store index records itself. Index records
contain search key value and a pointer to the actual record on the disk.
Sparse Index: In sparse index, index records are not created for every search key. An index
record here contains a search key and an actual pointer to the data on the disk. To search a
record, we first proceed by index record and reach at the actual location of the data. If the data
we are looking for is not where we directly reach by following the index, then the system starts
sequential search until the desired data is found.
Multilevel Index: Index records comprise search-key values and data pointers. Multilevel
index is stored on the disk along with the actual database files. As the size of the database
grows, so does the size of the indices. There is an immense need to keep the index records in
the main memory so as to speed up the search operations. If single-level index is used, then a
large size index cannot be kept in memory which leads to multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which can easily
be accommodated anywhere in the main memory.
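A sparse index can be sketched as one entry per block of a sorted file: locate the last index entry not greater than the search key, jump to that block, then scan within it (the blocks and keys below are invented for illustration):

```python
import bisect

# A sorted file split into blocks of (key, value) records.
blocks = [[(10, "A"), (20, "B")],
          [(30, "C"), (40, "D")],
          [(50, "E")]]

# Sparse index: only the first key of each block is indexed.
index = [block[0][0] for block in blocks]

def lookup(key):
    i = bisect.bisect_right(index, key) - 1   # last index entry <= key
    if i < 0:
        return None                           # key below every indexed value
    # Sequential scan, but only within the one selected block.
    return next((v for k, v in blocks[i] if k == key), None)
```

A dense index would instead hold one entry per record; the sparse form trades a short in-block scan for a much smaller index, which is what makes multi-level indexing practical.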
** B+ Tree **
A B+ tree is a balanced multi-way search tree that follows a multi-level index format. The
leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes
remain at the same height, and is thus balanced. Additionally, the leaf nodes are linked in a
linked list; therefore, a B+ tree can support random access as well as sequential access.
Structure of B+ Tree: Every leaf node is at equal distance from the root node. A B+ tree is
of the order n where n is fixed for every B+ tree.
Internal nodes:
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes:
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node and forms a
linked list.
B+ Tree Insertion: B+ trees are filled from the bottom, and each entry is made at a leaf node.
The salary lists have been set up in order of increasing salary within each range (record A
precedes D and C even though E#(C) and E#(D) are less than E#(A)).
We shall assume that the index for every key is dense and contains a value entry for each
distinct value in the file. Since the index entries are variable length (the number of records
with the same key value is variable), index maintenance becomes more complex than for
multi list. However, several benefits accrue from this scheme. The retrieval works in two
steps. In the first step, the indexes are processed to obtain a list of records satisfying the
query and in the second, these records are retrieved using this list. The number of disk
accesses needed is equal to the number of records being retrieved plus the number to process
the indexes.
Inverted files represent one extreme of file organization in which only the index structures
are important. The records themselves may be stored in any way (sequentially ordered by
primary key, random, linked ordered by primary key etc.).
UNIT- III
** Structured Query Language **
SQL stands for Structured Query Language, developed at IBM in the 1970s (by Donald
Chamberlin and Raymond Boyce) on the foundation of E.F. Codd's relational model.
It is a programming language which stores, manipulates and retrieves the stored data in
RDBMS.
SQL syntax is not case sensitive.
SQL is standardized by both ANSI and ISO.
It is a standard language for accessing and manipulating databases.
Characteristics of SQL:
SQL is extremely flexible.
SQL uses a free form syntax that gives the ability to user to structure the SQL statements in
a best suited way.
It is a high level language.
It allows natural extensions to its functional capabilities.
It can execute queries against the database.
Advantages of SQL:
SQL provides a greater degree of abstraction than procedural language.
It is coded without embedded data-navigational instructions.
It enables the end users to deal with a number of database management systems where it is
available.
It quickly and efficiently retrieves large numbers of records from a database.
No coding required while using standard SQL.
** SQL Commands **
SQL commands can be classified into 4 types.
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language (DCL)
4. Transaction Control Language (TCL)
4)TRUNCATE Command: It is used to delete all rows (but not the table's structure), with an
automatic commit.
Syntax: TRUNCATE TABLE <table-name>;
1. Insert command: It is used to add new rows (records) to a table.
Syntax: INSERT INTO <table-name> VALUES (value-list);
(OR)
INSERT INTO <table-name> [column-list] VALUES (value-list);
Example: 1. Take an EMPLOYEE table with the columns: eno, ename, job, sal, hiredate.
To insert a record into EMPLOYEE table:
SQL> insert into EMPLOYEE (eno, ename, job, sal, hiredate) values (101, ‘vasu’,
‘clerk’, 7000, ’12-jan-2012’);
(OR)
SQL> insert into EMPLOYEE values (102, ‘sri’, ‘manager’, 10000, ’26-feb-2012’);
2. To insert a record that contains only eno, ename and sal:
SQL> insert into EMPLOYEE (eno, ename, sal) values (201, ‘ram’, 20000);
3. To insert a record through parameter substitution
SQL> insert into EMPLOYEE (eno, ename, job, sal) values (&eno, ‘&ename’, ‘&job’,
&sal); (OR)
SQL> insert into EMPLOYEE values(&eno, ‘&ename’, ‘&job’, &sal);
When we execute the above query, it will ask values from keyboard as follows:
Enter value for eno:104
Enter value for ename: Anji
Enter value for job: Asst manager
Enter value for sal: 15000
2. Update command: It is used to edit/change the values of attributes in a table.
Syntax: UPDATE <table-name> set column_name = value [,column_name = value, …….]
[WHERE condition];
3. DELETE command: It is used to remove (delete) one or more rows from a table.
Syntax: DELETE FROM <table-name> [WHERE condition];
Clause Description
WHERE It specifies which rows to retrieve.
GROUP BY It is used to arrange the data into groups.
HAVING It selects among the groups defined by the GROUP BY clause.
ORDER BY It specifies an order in which to return the rows.
3. Save point: It is used to mark a point within a transaction to which a later rollback can return.
Syntax: SAVEPOINT savepointname;
A transaction with savepoints:
DELETE
SAVEPOINT A
INSERT
UPDATE
SAVEPOINT B
INSERT
ROLLBACK TO SAVEPOINT B undoes the work done after savepoint B; ROLLBACK TO SAVEPOINT A undoes the work done after savepoint A; a plain ROLLBACK undoes the entire transaction, while COMMIT makes all of its changes permanent.
It is a fixed length data type, which means the memory is allocated based on the size defined
by the user but not on the value assigned.
3. Varchar/varchar2: It is also used to define attribute to store the string values. The
minimum size is 1 and maximum is 4000 bytes.
Syntax: Varchar2 (n)
It is a variable length data type, which means the memory is dynamically allocated based
on the value given by the user but not on its size defined.
4. Long: It is used to define an attribute to store the text values, with the size larger than 4000
characters. Maximum size 2GB and it is a variable-length data type.
5. Date/Time: It is used to define an attribute to store date and time values given by the user.
Default format for date is: DD-MON-YY (or) DD-MON-YYYY.
Examples on HAVING:
1. To list the department numbers whose maximum salary is greater than 1000:
SQL> select deptno, MAX(sal) from EMP GROUP BY deptno HAVING MAX(sal) > 1000;
2. To list the jobs which are done by a minimum of 2 persons:
SQL> select job from EMP GROUP BY job HAVING count(*) >= 2;
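Both HAVING examples run as-is against SQLite (the sample EMP rows below are invented for the demonstration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (eno INTEGER, job TEXT, sal INTEGER, deptno INTEGER)")
con.executemany("INSERT INTO emp VALUES (?,?,?,?)", [
    (1, "clerk",    800, 10),
    (2, "clerk",   1100, 20),
    (3, "manager", 2450, 10),
    (4, "analyst", 3000, 20),
])

# Departments whose maximum salary exceeds 1000.
q1 = con.execute("SELECT deptno, MAX(sal) FROM emp "
                 "GROUP BY deptno HAVING MAX(sal) > 1000").fetchall()

# Jobs done by at least 2 persons.
q2 = con.execute("SELECT job FROM emp "
                 "GROUP BY job HAVING COUNT(*) >= 2").fetchall()
```

WHERE filters rows before grouping, while HAVING filters the groups themselves, which is why the aggregate appears in the HAVING clause.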
Joins: Joins are used to fetch the information from multiple tables. Join is a relation between
the common fields of two or more tables. Types of joins are:
1. Equi join (Inner join)
2. Non-Equi join (conditional)
3. Outer join
4. Self join (recursive join)
5. Cartesian product joins.
Consider the following table, for demonstration of joins;
EMP DEPT
Eno Ename Deptno Deptno deptname
1 Abhi 10 10 BCom
2 Balu 20 20 BSc
3 Charan 30 30 BBM
4 dhoni 40 50 BCA
Equi join: A join in which the joining condition is based on equality between the values in
common columns.
EX: Select eno, ename, EMP.Deptno, DEPT.deptno, deptname from EMP, DEPT where
EMP.deptno=DEPT.deptno;
ENO ENAME EMP. DEPTNO DEPT. DEPTNO DEPTNAME
1 Abhi 10 10 Bcom
2 Balu 20 20 BSc
3 charan 30 30 BBM
Non-Equi join: A join in which joining condition is based on non-equality between the values
in common columns.
EX: Select eno, ename, deptname from EMP, DEPT where EMP.Deptno!= DEPT.deptno;
ENO ENAME DEPTNAME
1 Abhi BSc
1 Abhi BBM
1 Abhi BCA
2 Balu BCom
2 Balu BBM
2 Balu BCA
3 Charan BCom
3 Charan BSc
3 Charan BCA
4 Dhoni BCom
4 Dhoni BSc
4 Dhoni BBM
4 Dhoni BCA
Outer join: A join in which rows that do not have matching values in the common columns are
also included in the result. It includes left outer and right outer joins.
i) Left outer join: It gives all the values of left table plus matched values from the right table.
Following query displays all records from EMP table even if there is no matching deptno in
DEPT table.
EX: select eno, ename, EMP.deptno, deptname from EMP, DEPT WHERE
EMP.deptno = DEPT.deptno (+);
ENO ENAME EMP. DEPTNO DEPTNAME
1 Abhi 10 BCom
2 Balu 20 BSc
3 Charan 30 BBM
4 Dhoni 40 ------
ii) Right outer join: It gives all the values of right table plus matched values from the left table.
Following query displays all records from DEPT table even if there is no matching deptno in
EMP table.
EX: Select eno, ename, DEPT.deptno, deptname from EMP, DEPT WHERE
EMP.deptno(+)= DEPT.deptno;
ENO ENAME DEPT. DEPTNO DEPTNAME
1 Abhi 10 BCom
2 Balu 20 BSc
3 Charan 30 BBM
-- ------ 50 BCA
Self join: This is a join in which a table is joined to itself, where joining condition is based on
columns of a same table.
EX: Select e.eno, d.ename from EMP e, EMP d where e.mid= d.eno;
EMP
eno ename mid
1 A 3
2 B 3
3 C 3
4 D 4
5 E 4
Cartesian product join: A join in which all possible combinations of all the rows of first table
with each row of second table appear.
EX: Select ename, deptname from EMP, DEPT;
ENAME DEPTNAME
Abhi BCom
Balu BCom
Charan BCom
Dhoni BCom
Abhi BSc
Balu BSc
Charan BSc
Dhoni BSc
Abhi BBM
Balu BBM
Charan BBM
Dhoni BBM
Abhi BCA
Balu BCA
Charan BCA
Dhoni BCA
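The equi join, left outer join, and Cartesian product above can be reproduced on the chapter's EMP and DEPT tables using SQLite through Python's sqlite3 module. One difference: SQLite does not support Oracle's (+) notation, so the left outer join is written with the standard LEFT JOIN keyword.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMP  (eno INTEGER, ename TEXT, deptno INTEGER);
CREATE TABLE DEPT (deptno INTEGER, deptname TEXT);
INSERT INTO EMP  VALUES (1,'Abhi',10),(2,'Balu',20),(3,'Charan',30),(4,'Dhoni',40);
INSERT INTO DEPT VALUES (10,'BCom'),(20,'BSc'),(30,'BBM'),(50,'BCA');
""")

# Equi join: only rows with matching deptno values appear (Dhoni's dept 40 has no match).
equi = con.execute("""
    SELECT eno, ename, deptname FROM EMP, DEPT
    WHERE EMP.deptno = DEPT.deptno
""").fetchall()
print(equi)        # 3 rows

# Left outer join: all EMP rows; NULL where DEPT has no matching deptno.
left = con.execute("""
    SELECT eno, ename, deptname FROM EMP
    LEFT JOIN DEPT ON EMP.deptno = DEPT.deptno
""").fetchall()
print(left)        # 4 rows, Dhoni paired with None

# Cartesian product: every EMP row paired with every DEPT row (4 x 4 = 16).
cart = con.execute("SELECT ename, deptname FROM EMP, DEPT").fetchall()
print(len(cart))   # 16
```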
** Views **
A view is a virtual table based on one or more tables. The table on which a view is
created is called the "base table".
A view can be used to restrict updates to the base table. Any changes made to the base
table are also reflected in the view.
Advantages of views:
We can provide security to the data.
We can provide limitation on data.
We can provide customized view for the data.
View uses little storage area.
It allows different users to view the same data in different ways at the same time.
It does not allow direct access to the tables of the data dictionary.
Disadvantages of views:
It can’t be indexed.
The query that defines the view is re-executed each time the view is accessed, which takes time.
We cannot use DML operations on a view that is based on more than one table.
When the base table is dropped, the view becomes invalid.
A view is a database object, so it occupies space.
Without its base table, a view will not work.
Updation is possible for simple views but not for complex views; complex views are read-only.
Syntax: Create view <view-name> as select columns from <table-name> [WHERE condition];
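The create-view syntax above can be demonstrated with SQLite through Python's sqlite3 module; the EMP table and salary values here are invented for illustration. The example shows both a restricted view and how base-table changes are reflected in it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (eno INTEGER, ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO EMP VALUES (?,?,?)",
                [(1, "Abhi", 800), (2, "Balu", 2000)])

# A view restricting which rows and columns a user may see (security/limitation).
con.execute("CREATE VIEW highpaid AS SELECT eno, ename FROM EMP WHERE sal > 1000")
v1 = con.execute("SELECT * FROM highpaid ORDER BY eno").fetchall()
print(v1)   # [(2, 'Balu')]

# Changes to the base table are reflected in the view automatically.
con.execute("INSERT INTO EMP VALUES (3, 'Charan', 3000)")
v2 = con.execute("SELECT * FROM highpaid ORDER BY eno").fetchall()
print(v2)   # [(2, 'Balu'), (3, 'Charan')]
```

The view stores only its defining query, not the data, which is why it uses little storage but is re-evaluated on each access.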
** Sequences **
A sequence is a database object that is used to generate the sequential numeric values for
any column of the base table. It is useful when we need to create a unique number to act as a
primary key.
Characteristics of sequences:
Sequences are independent objects.
Sequences have a name and can be used anywhere.
Sequences are not tied (linked) to a table.
Sequences can be created and deleted any time we want.
Ex: create sequence SQ1 MINVALUE 1 MAXVALUE 100 start with 1 increment by 1;
In the above example, sequence object “SQ1” is created and it generates the number like
1, 2, 3, 4, ……..100
To retrieve numbers from sequence object, we use following statement:
Sequencename.nextval;
Example: There is a table called “STUDENT” with two columns sno, sname.
The following insert command uses sequence to insert values into “sno” automatically.
Insert into STUDENT values (SQ1.nextval, ‘Ravi’);
Insert into STUDENT values (SQ1.nextval, ‘Ram’);
Insert into STUDENT values (SQ1.nextval, ‘RAJ’);
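SQLite has no CREATE SEQUENCE statement, so the Oracle example above cannot be run there directly; an INTEGER PRIMARY KEY AUTOINCREMENT column plays a similar role, generating a unique sequential number for each inserted row. A sketch of the same STUDENT example under that substitution:

```python
import sqlite3

# AUTOINCREMENT stands in for the Oracle sequence SQ1 in this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (sno INTEGER PRIMARY KEY AUTOINCREMENT, sname TEXT)")
for name in ("Ravi", "Ram", "Raj"):
    # sno is generated automatically, like SQ1.nextval in the chapter's inserts.
    con.execute("INSERT INTO STUDENT (sname) VALUES (?)", (name,))

students = con.execute("SELECT * FROM STUDENT ORDER BY sno").fetchall()
print(students)   # [(1, 'Ravi'), (2, 'Ram'), (3, 'Raj')]
```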
** Indexes **
An index is an object which is used to improve performance during retrieval of records.
It helps to retrieve the data quickly from the tables.
When a column contains a large number of NULL values, we can create an index on it.
It is a structure that provides faster access to the rows of a table based on the values of one or
more columns.
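The effect of an index on retrieval can be seen with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module: before the index the lookup scans the whole table, afterwards it searches the index. The table and index names here are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (eno INTEGER, ename TEXT)")
con.executemany("INSERT INTO EMP VALUES (?,?)",
                [(i, f"emp{i}") for i in range(1000)])

# Without an index, the lookup is a full table scan.
plan_before = con.execute("EXPLAIN QUERY PLAN SELECT * FROM EMP WHERE eno = 500").fetchall()
print(plan_before)   # plan mentions SCAN

# With an index, the engine searches the index instead of scanning the table.
con.execute("CREATE INDEX emp_eno_idx ON EMP (eno)")
plan_after = con.execute("EXPLAIN QUERY PLAN SELECT * FROM EMP WHERE eno = 500").fetchall()
print(plan_after)    # plan mentions the index
```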
** Synonyms **
A synonym is an alternative name for objects such as tables, views, sequences, stored
procedures, and other database objects.
We generally use synonyms when we are granting access to an object from another
schema and we don't want the users to have to worry about knowing which schema owns the
object.
Create Synonym (or Replace): We may wish to create a synonym so that users do not have to
prefix the table name with the schema name when using the table in a query.
Syntax: The syntax to create a synonym in Oracle is:
CREATE [OR REPLACE] [PUBLIC] SYNONYM [schema .] synonym_name FOR [schema .] object_name;
For example: CREATE PUBLIC SYNONYM suppliers FOR app.suppliers;
Drop synonym: Once a synonym has been created in Oracle, we might at some point need to
drop the synonym.
Syntax: The syntax to drop a synonym in Oracle is:
DROP [PUBLIC] SYNONYM [schema .] synonym_name [force];
In the above syntax:
PUBLIC
Allows us to drop a public synonym. If we have specified PUBLIC, then we don't specify
a schema.
force
It will force Oracle to drop the synonym even if it has dependencies. It is probably not a
good idea to use force as it can cause invalidation of Oracle objects.
For example: DROP PUBLIC SYNONYM suppliers;
This DROP statement would drop the synonym called suppliers that we defined earlier.
** Table handling**
The SQL DROP TABLE statement is used to remove a table definition and all the data,
indexes, triggers, constraints and permission specifications for that table.
NOTE − We should be very careful while using this command because once a table is deleted
then all the information available in that table will also be lost forever.
Syntax: The basic syntax of this DROP TABLE statement is as follows:
DROP TABLE table_name;
Example: Let us first verify the CUSTOMERS table and then we will delete it from the
database as shown below:
SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | | |
| NAME | varchar(20) | NO | | | |
| AGE | int(11) | NO | | | |
| ADDRESS | char(25) | YES | | NULL | |
| SALARY | decimal(18,2) | YES | | NULL | |
+---------+---------------+------+-----+---------+-------+
This means that the CUSTOMERS table is available in the database, so let us now drop it
as shown below.
SQL> DROP TABLE CUSTOMERS;
Query OK, 0 rows affected (0.01 sec)
Now, if we would try the DESC command, then we will get the following error:
SQL> DESC CUSTOMERS;
ERROR 1146 (42S02): Table 'TEST.CUSTOMERS' doesn't exist
Here, TEST is the database name which we are using for our examples.
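The same DROP TABLE behaviour can be reproduced with SQLite through Python's sqlite3 module: after the table is dropped, any further reference to it raises a "no such table" error, mirroring the MySQL error shown above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (id INTEGER, name TEXT)")
con.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ravi')")

# DROP TABLE removes the definition and all of the data permanently.
con.execute("DROP TABLE CUSTOMERS")

try:
    con.execute("SELECT * FROM CUSTOMERS")
except sqlite3.OperationalError as e:
    print(e)   # no such table: CUSTOMERS
```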
UNIT- IV
** Transaction **
Database Transaction is an atomic unit that contains one or more SQL statements.
It is a series of operations that performs as a single unit of work against a database.
It has a beginning and an end to specify its boundary.
Let's take a simple example of a bank transaction. Suppose a bank clerk transfers Rs. 1000
from X's account to Y's account.
X's Account
open-account (X)
prev-balance = X.balance
curr-balance = prev-balance – 1000
X.balance = curr-balance
close-account (X)
Rs. 1000 is deducted from X's account and the new (current) balance is saved; after the
transaction completes, the last step is closing the account.
Y's Account
open-account (Y)
prev - balance = Y.balance
curr - balance = prev-balance + 1000
Y.balance = curr-balance
close-account (Y)
Rs. 1000 is added to Y's account and the new (current) balance is saved; after the
transaction completes, the last step is closing the account.
The above example defines a very simple and small transaction that shows how transaction
management actually works.
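The transfer above can be sketched as one atomic database transaction using SQLite through Python's sqlite3 module: both updates commit together, and on any error the rollback restores the previous balances.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ACCOUNT (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO ACCOUNT VALUES (?,?)", [("X", 5000), ("Y", 2000)])
con.commit()

def transfer(amount):
    """Both updates succeed together or neither takes effect (atomicity)."""
    try:
        con.execute("UPDATE ACCOUNT SET balance = balance - ? WHERE name = 'X'", (amount,))
        con.execute("UPDATE ACCOUNT SET balance = balance + ? WHERE name = 'Y'", (amount,))
        con.commit()
    except sqlite3.Error:
        con.rollback()   # undo the partial work, restoring a consistent state

transfer(1000)
balances = con.execute("SELECT * FROM ACCOUNT ORDER BY name").fetchall()
print(balances)   # [('X', 4000), ('Y', 3000)]
```

The starting balances (5000 and 2000) are invented for illustration; the transferred amount follows the chapter's Rs. 1000 example.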
** Transaction Properties **
Following are the transaction properties, referred to by the acronym ACID. These
properties guarantee that database transactions are processed reliably.
1.Atomicity
2.Consistency
3.Isolation
4.Durability
1. Atomicity:
Atomicity means that either all operations of a transaction are executed or none are.
Atomicity is also known as 'all or nothing': the transaction either performs all of its
operations or performs none at all.
It must be maintained in the presence of deadlocks, CPU failures, disk failures, and
database and application software failures.
2. Consistency:
Consistency means that after a transaction finishes, the database must remain in a
consistent state.
3. Isolation:
Isolation means that concurrent transactions do not interfere with each other; the
intermediate state of one transaction is invisible to the others.
4. Durability:
Durability means that once a transaction commits, its changes survive any subsequent
system failure.
** Transaction States **
A transaction is a small unit of program which contains several low level tasks. It is an event
which occurs on the database. It has the following states,
1. Active
2. Partially Committed
3. Failed
4. Aborted
5. Committed
1. Active: Active is the initial state of every transaction. The transaction stays in the
Active state during execution.
2. Partially Committed: This state means that the transaction has executed its final
statement.
3. Failed: This state means that the execution of the transaction can no longer proceed.
4. Aborted: This state means that the transaction has been rolled back and the database
restored to a consistent state.
5. Committed: If the transaction has completed its execution successfully, it is said to be
committed.
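The five states and their legal moves can be written down as a small transition map; a minimal sketch (the state names come from the list above, the map structure is an illustration, not part of the chapter):

```python
# Allowed moves between the five transaction states described above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),   # terminal state
    "committed":           set(),   # terminal state
}

def can_move(src, dst):
    """True when a transaction in state src may move to state dst."""
    return dst in TRANSITIONS[src]

print(can_move("active", "partially committed"))  # True
print(can_move("failed", "committed"))            # False: a failed txn can only abort
```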
** Concurrency Control **
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions.
Methods for Concurrency control:
There are three main methods for concurrency control. They are as follows:
1. Locking Methods
2. Optimistic Methods
3. Time-stamp Methods
1. Locking Methods of Concurrency Control: "A lock is a variable, associated with a data
item, which controls access to that data item." Locking is the most widely used form of
concurrency control. Locking is discussed under three aspects:
i) Lock Granularity
ii) Lock Types (or) Locking protocols
iii) Deadlocks
i) Lock Granularity: A database is basically represented as a collection of named data
items. The size of the data item chosen as the unit of protection by a concurrency control
program is called GRANULARITY. Locking can take place at the following level:
Database level.
Table level.
Page level.
Row (Tuple) level.
Attributes (fields) level.
ii) Lock Types: The DBMS mainly uses following types of locking techniques.
a. Binary Locking
b. Shared / Exclusive Locking
c. Two - Phase Locking (2PL)
a. Binary Locking: A binary lock can have two states or values: locked and unlocked (or 1 and
0, for simplicity). A distinct lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when requested.
We refer to the current value (or state) of the lock associated with item X as LOCK(X).
Two operations, lock_item and unlock_item, are used with binary locking.
b. Shared/Exclusive Locking:
Shared lock: These locks are referred to as Read locks, and denoted by 'S'.
If a transaction T has obtained a shared lock on data item X, then T can read X but cannot
write it. Multiple transactions can hold a shared lock on the same data item at the same time.
Exclusive lock: These locks are referred to as Write locks, and denoted by 'X'.
If a transaction T has obtained an exclusive lock on data item X, then T can both read and
write X. Only one exclusive lock can be placed on a data item at a time, which means multiple
transactions cannot modify the same data simultaneously.
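The shared/exclusive rule can be sketched as a small readers-writer lock in Python (an illustration of the locking discipline only, not a full DBMS lock manager: there is no lock upgrading, fairness, or deadlock handling here):

```python
import threading

class SharedExclusiveLock:
    """Many transactions may hold the shared ('S') lock at once;
    the exclusive ('X') lock excludes all other holders."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # number of shared-lock holders
        self._writer = False     # is an exclusive lock held?

    def lock_shared(self):
        with self._cond:
            while self._writer:              # wait out any writer
                self._cond.wait()
            self._readers += 1

    def unlock_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def lock_exclusive(self):
        with self._cond:
            while self._writer or self._readers:   # wait until item is free
                self._cond.wait()
            self._writer = True

    def unlock_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.lock_shared(); lock.lock_shared()   # two concurrent readers: allowed
lock.unlock_shared(); lock.unlock_shared()
lock.lock_exclusive()                    # one writer, nobody else
lock.unlock_exclusive()
```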
c. Two-Phase Locking (2PL):
Two-phase locking is the standard protocol used to maintain level 3 consistency. 2PL
defines how transactions acquire and relinquish locks. The essential discipline is that after a
transaction has released a lock it may not obtain any further locks. 2PL has the following two
phases:
A growing phase: in which a transaction acquires all the required locks without unlocking any
data. Once all locks have been acquired, the transaction is in its locked
point.
A shrinking phase: in which a transaction releases all locks and cannot obtain any new lock.
A transaction showing the Two-Phase Locking technique:

Normal Locking:
  Lock (A); Read (A); A = A - 100; Write (A); Unlock (A);
  Lock (B); Read (B); B = B + 100; Write (B); Unlock (B)

2-Phase Locking:
  Growing phase:   Lock (A); Lock (B)          <- locked point reached
  Operations:      Read (A); A = A - 100; Write (A);
                   Read (B); B = B + 100; Write (B)
  Shrinking phase: Unlock (A); Unlock (B)
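The 2PL discipline (no new lock after the first unlock) can be sketched as a small class that rejects late lock requests (an illustration only; a real lock manager also tracks lock modes and blocking):

```python
class TwoPhaseTransaction:
    """Enforces the 2PL rule: once any lock is released,
    no further lock may be acquired."""
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after unlocking")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # the growing phase is over
        self.held.discard(item)

t = TwoPhaseTransaction()
t.lock("A"); t.lock("B")   # growing phase; locked point after Lock(B)
t.unlock("A")              # shrinking phase begins
try:
    t.lock("C")            # illegal under 2PL
except RuntimeError as e:
    print(e)
```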
iii) Deadlocks: A deadlock is a condition in which two or more tasks are waiting for each
other to finish, but none of the tasks is willing to give up the resources the other tasks
need. In this situation no task ever finishes, and all remain in the waiting state forever.
Neither transaction can continue because each transaction in the set is on a waiting queue,
waiting for one of the other transactions in the set to release the lock on an item. Transactions
whose lock requests have been refused are queued until the lock can be granted.
A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other. Thus in a deadlock, two transactions are mutually
excluded from accessing the next record required to complete their transactions, also called a
deadly embrace.
Example:
A deadlock exists between two transactions A and B in the following example:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction-A has acquired lock on X and is waiting to acquire lock on Y. While,
Transaction-B has acquired lock on Y and is waiting to acquire lock on X. But, none of them
can execute further.
Transaction-A                       Time    Transaction-B
---                                  t0     ---
Lock (X)  (acquired lock on X)       t1     ---
---                                  t2     Lock (Y)  (acquired lock on Y)
Lock (Y)  (request lock on Y)        t3     ---
Wait                                 t4     Lock (X)  (request lock on X)
Wait                                 t5     Wait
Wait                                 t6     Wait
Wait                                 t7     Wait
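The circular waiting in the timeline above can be detected by looking for a cycle in a wait-for graph (each transaction points to the transaction it is waiting on). A minimal detection sketch:

```python
def has_deadlock(wait_for):
    """wait_for maps each transaction to the transactions it waits on.
    A cycle in this graph means deadlock."""
    def visit(node, path):
        if node in path:                 # we returned to a node: cycle found
            return True
        for nxt in wait_for.get(node, []):
            if visit(nxt, path | {node}):
                return True
        return False
    return any(visit(t, set()) for t in wait_for)

# The timeline above: A waits for B (to release Y), B waits for A (to release X).
print(has_deadlock({"A": ["B"], "B": ["A"]}))   # True
print(has_deadlock({"A": ["B"], "B": []}))      # False: B can finish, then A
```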
Deadlock Avoidance:
Deadlock can be avoided if resources are allocated in a way that prevents deadlock from
occurring. There are two algorithms for deadlock avoidance.
Wait/Die
Wound/Wait
Here is the table representation of resource allocation for each algorithm. Both of these
algorithms take process age into consideration while determining the best possible way of
resource allocation for deadlock avoidance.
Situation                                                 Wait/Die               Wound/Wait
Older process needs a resource held by younger process    Older process waits    Younger process dies
Younger process needs a resource held by older process    Younger process dies   Younger process waits
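The table above reduces to two small decision functions on transaction timestamps (a smaller timestamp means an older process); a minimal sketch:

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits; a younger requester dies
    (is aborted and restarted later with the same timestamp)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester wounds (aborts) the younger
    holder; a younger requester waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"

print(wait_die(1, 2))    # older requests from younger -> wait
print(wait_die(2, 1))    # younger requests from older -> die
print(wound_wait(1, 2))  # older wounds the younger holder
print(wound_wait(2, 1))  # younger requester waits
```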
ii. The rollback involves only the local copy of data, the database is not involved and thus
there will not be any cascading rollbacks.
Problems of Optimistic Methods for Concurrency Control:
i. Conflicts are expensive to deal with, since the conflicting transaction must be rolled back.
ii. Longer transactions are more likely to have conflicts and may be repeatedly rolled
back because of conflicts with short transactions.
Applications of Optimistic Methods for Concurrency Control:
i. Only suitable for environments where there are few conflicts and no long transactions.
ii. Acceptable for mostly Read or Query database systems that require very few update
transactions
** Timestamp-based Protocols **
The most commonly used concurrency protocol is the timestamp-based protocol. This
protocol uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions
at the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by
the age of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is
two seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system
know when the last ‘read and write’ operation was performed on the data item.
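The chapter describes the per-item read- and write-timestamps but not the rules that use them; the checks below follow the standard textbook timestamp-ordering rules, so this is a sketch under that assumption: a read is rejected if a younger transaction has already written the item, and a write is rejected if a younger transaction has already read or written it.

```python
class TimestampedItem:
    """A data item carrying the latest read- and write-timestamps,
    as described above."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

    def read(self, ts):
        if ts < self.write_ts:          # a younger transaction already wrote
            return "abort"
        self.read_ts = max(self.read_ts, ts)
        return "ok"

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            return "abort"              # a younger transaction already read/wrote
        self.write_ts = ts
        return "ok"

x = TimestampedItem()
print(x.write(5))   # ok
print(x.read(3))    # abort: transaction 3 is older than the write at 5
print(x.read(7))    # ok
```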
** Serialisable Schedules **
In a database system, we can have a number of transactions processing at once. Related
transactions are processed one after another, while some transactions are processed in
parallel. Some of the transactions can be grouped together.
A schedule is a process of grouping the transactions into one and executing them in a
predefined order. A schedule is required in a database because when some transactions execute
in parallel, they may affect the result of the transaction – means if one transaction is updating
the values which the other transaction is accessing, then the order of these two transactions will
change the result of second transaction. Hence a schedule is created to execute the transactions.
A schedule is called serial schedule, if the transactions in the schedule are defined to execute
one after the other.
Even when we are scheduling the transactions, we can have two transactions in parallel,
if they are independent. But if they are dependent by any chance, then the results will change.
For example, say one transaction is updating the marks of a student in one subject while
another transaction is calculating the total marks of the same student. If the second
transaction is executed after the first transaction completes, both transactions will produce
correct results. But what if the second transaction runs first? It will compute the wrong
total marks.
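The marks example above can be made concrete: the two "transactions" below (the mark values are invented for illustration) give different totals depending on which runs first, which is exactly why a schedule must fix the order of dependent transactions.

```python
# One transaction updates a subject mark while another computes the total.
marks = {"sub1": 40, "sub2": 50}

def t1_update(m):
    m["sub1"] = 60          # the update transaction

def t2_total(m):
    return m["sub1"] + m["sub2"]   # the total-computing transaction

# Serial order T1 then T2: total sees the updated mark.
m = dict(marks); t1_update(m); total_after = t2_total(m)

# Serial order T2 then T1: total is computed from the stale mark.
m = dict(marks); total_before = t2_total(m); t1_update(m)

print(total_after, total_before)   # 110 90 -- the order changed the result
```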
** Database Failures **
A database includes a huge amount of data and transactions. If the system crashes or a
failure occurs, then it is very difficult to recover the database.
There are some common causes of failures (kinds of failures) such as,
1. System Crash
2. Transaction Failure
3. Network Failure
4. Disk Failure
5. Media Failure
Each transaction has ACID property. If we fail to maintain the ACID properties, it is the
failure of the database system.
1. System Crash:
System crash occurs when there is a hardware or software failure or external factors like
a power failure.
The data in secondary memory is generally not affected when the system crashes;
checkpoints prevent the loss of data by ensuring that modified data is written to
secondary memory.
2. Transaction Failure:
A transaction has to abort when it fails to execute or when it reaches a point from where
it can’t go any further. This is called transaction failure where only a few transactions or
processes are hurt.
Reasons for a transaction failure could be:
Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
3. Network Failure:
A network failure occurs when the communication network connecting a client-server
configuration or a distributed database system fails.
4. Disk Failure:
Disk Failure occurs when there are issues with hard disks like formation of bad sectors,
disk head crash, unavailability of disk etc.
5. Media Failure:
Media failure is the most dangerous failure because, it takes more time to recover than
any other kind of failures.
A disk controller or disk head crash is a typical example of media failure.
Natural disasters like floods, earthquakes, power failures, etc. damage the data.
1. Physical backup:
Physical backup provides the minute details about the transaction and modification to the
database.
2. Logical backup:
Logical Backup contains logical data which is extracted from a database.
It includes backup of logical data like views, procedures, functions, tables, etc.
It is a useful supplement to physical backups in many circumstances but not a sufficient
protection against data loss without physical backups, because logical backup provides
only structural information.
Importance of Backup:
Planning and testing backup helps against failure of media, operating system, software
and any other kind of failures that cause a serious data crash.
It determines the speed and success of the recovery.
Physical backup extracts data from physical storage (usually from disk to tape).
An operating-system-level copy of the database files is an example of a physical backup.
Logical backup extracts data using SQL from the database and store it in a binary file.
Logical backup is used to restore the database objects into the database. So the logical
backup utilities allow DBA (Database Administrator) to back up and recover selected
objects within the database.
** Storage of Data **
Data storage is the memory structure in the system. The storage of data is divided into
three categories:
1. Volatile Memory
2. Non – Volatile Memory
3. Stable Memory
1. Volatile Memory
Volatile memory can store only a small amount of data. Examples: main memory, cache
memory, etc.
Volatile memory is the primary memory device in the system and placed along with the
CPU.
In volatile memory, if the system crashes, then the data will be lost.
RAM is a primary storage device which stores a disk buffer, active logs and other related
data of a database.
Primary memory is always faster than secondary memory.
When we fire a query, the database first looks for the data in primary memory and then
moves to secondary memory to fetch the record.
If the primary memory crashes, then the whole data in the primary memory is lost and
cannot be recovered.
To avoid data loss, a copy of the contents of primary memory (all the logs and buffers) is
kept in the database, and checkpoints are created at several places so that the data is
copied to the database.
There are two methods of creating the log files and updating the database,
1. Deferred Database Modification
2. Immediate Database Modification
1. In Deferred Database Modification, all the logs for the transaction are first created and
stored in stable storage; the database is updated from those steps only afterwards. In the
above example, three log records are created and stored in a storage system, and the database
is then updated with those steps.
2. In Immediate Database Modification, after creating each log record, the database is
modified for each step of log entry immediately. In the above example, the database is modified
at each step of log entry that means after first log entry, transaction will hit the database to fetch
the record, then the second log will be entered followed by updating the employee's address,
then the third log followed by committing the database changes.
Recovery with Concurrent Transaction:
When two transactions are executed in parallel, the logs are interleaved. It would become
difficult for the recovery system to return all logs to a previous point and then start
recovering.
To overcome this situation 'Checkpoint' is used.
Checkpoint
Checkpoint acts like a benchmark.
Checkpoints are also called as Syncpoints or Savepoints.
It is a mechanism where all the previous logs are removed from the system and stored
permanently in a storage system.
It declares a point before which the database management system was in consistent state
and all the transactions were committed.
It is a point of synchronization between the database and the transaction log file.
It involves operations like writing log records in main memory to secondary storage,
writing the modified blocks in the database buffers to secondary storage and writing a
checkpoint record to the log file.
The checkpoint record contains the identifiers of all transactions that are active at the
time of the checkpoint.
Recovery
When concurrent transactions crash and recover, the checkpoint is added to the
transaction and recovery system recovers the database from failure in following manner,
1. The recovery system reads the log file backwards, from the end to the most recent
checkpoint, so that it can reverse or redo transactions as needed.
2. It maintains an undo list and a redo list.
3. It puts a transaction in the redo list if the log contains both <Tn, Start> and <Tn, Commit>.
4. It puts a transaction in the undo list if the log contains <Tn, Start> but no <Tn, Commit>.
All the transactions in the undo list are undone and their log records are removed.
All the transactions in the redo list are redone and their log records are re-saved.
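The classification step above can be sketched in a few lines: scan the log once and split transactions into redo (both <Start> and <Commit> present) and undo (<Start> only) lists. The log format here is an invented simplification for illustration.

```python
def classify(log):
    """Split transactions from a log into (redo, undo) sets.
    log is a list of (operation, transaction-id) pairs, oldest first."""
    started, committed = set(), set()
    for op, txn in log:
        if op == "start":
            started.add(txn)
        elif op == "commit":
            committed.add(txn)
    redo = started & committed   # committed: replay their effects
    undo = started - committed   # never committed: roll them back
    return redo, undo

log = [("start", "T1"), ("start", "T2"), ("commit", "T1")]
redo, undo = classify(log)
print(sorted(redo), sorted(undo))   # ['T1'] ['T2']
```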
** Database errors **
There are mainly two types of database errors. They are:
1. Logical errors
2. System errors
Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
** Database Security **
Database security refers to the collective measures used to protect and secure a database
or database management software from illegitimate use and malicious threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies that
ensure security within a database environment.
Database and functions can be managed by two different modes of security controls:
1. Authentication
2. Authorization
Authentication: Authentication is the process of confirming that a user logs in only in
accordance with the rights to perform the activities he is authorized to perform. User
authentication can be performed at the operating system level or at the database level
itself. Biometric authentication tools, such as retina scans and fingerprints, are also in
use to keep the database safe from hackers or malicious users.
The database security can be managed from outside the database system. Here are some
types of security authentication processes:
Based on Operating System authentications.
Lightweight Directory Access Protocol (LDAP)
The security service is a part of operating system. For Authentication, it requires two
different credentials; those are userid or username, and password.
Authorization: We can access the Database and its functionality within the database system,
which is managed by the Database manager. Authorization is a process managed by the
Database manager. The manager obtains information about the current authenticated user, that
indicates which database operation the user can perform or access.
Here are different ways of permissions available for authorization:
Primary permission: Grants the authorization ID directly.
Secondary permission: Grants to the groups and roles if the user is a member.
Public permission: Grants to all users publicly.
Context-sensitive permission: Grants to the trusted context role.
Authorization can be given to users based on the categories below:
System-level authorization
System administrator [SYSADM]
System Control [SYSCTRL]
System maintenance [SYSMAINT]
System monitor [SYSMON]
Authorities provide controls within the database. Other authorities for a database include
LOAD and CONNECT.
Object-Level Authorization: Object-Level authorization involves verifying privileges
when an operation is performed on an object.
Content-based Authorization: User can have read and write access to individual rows
and columns on a particular table using Label-based access Control [LBAC].
Database tables and configuration files are used to record the permissions associated with
authorization names. When a user tries to access the data, the following are verified
against the recorded permissions:
Authorization name of the user
Which groups the user belongs to
Which roles are granted directly to the user or indirectly through a group
Permissions acquired through a trusted context.
While working with the SQL statements, the Database authorization model considers the
combination of the following permissions:
Permissions granted to the primary authorization ID associated with the SQL statements.
Secondary authorization IDs associated with the SQL statements.
Granted to PUBLIC.
Granted to the trusted context role.
UNIT- V
** Distributed database **
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.
Features:
Databases in the collection are logically interrelated with each other. Often they represent
a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with
a transaction processing system.
Advantages of Distributed Databases (OR) Data distribution:
Following are the advantages of distributed databases over centralized databases.
Modular Development − If the system needs to be expanded to new locations or new units,
in centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding
new computers and local data to the new site and finally connecting them to the distributed
system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases
comes to a halt. However, in distributed systems, when a component fails, the functioning of
the system continues, possibly at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be
met from local data itself, thus providing faster response. On the other hand, in centralized
systems, all queries have to pass through the central computer for processing, which
increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally
where it is mostly used, then the communication costs for data manipulation can be
minimized. This is not feasible in centralized systems.
It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
It is designed for heterogeneous database platforms.
It maintains confidentiality and data integrity of the databases.
Factors Encouraging DDBMS: The following factors encourage moving over to DDBMS:
Distributed Nature of Organizational Units: Most organizations in the current times
are subdivided into multiple units that are physically distributed over the globe. Each unit
requires its own set of local data. Thus, the overall database of the organization becomes
distributed.
Need for Sharing of Data: The multiple organizational units often need to communicate
with each other and share their data and resources. This demands common databases or
replicated databases that should be used in a synchronized manner.
Support for Both OLTP and OLAP: Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may have
common data. Distributed database systems aid both these processing by providing
synchronized data.
Database Recovery: One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
database in any site is damaged. Users can access data from other sites while the
damaged site is being reconstructed. Thus, database failure may become almost
inconspicuous to users.
Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
Advantages of DDBMS:
1. Data are located near the greatest demand site. The data in a distributed database system
are dispersed to match business requirements which reduce the cost of data access.
2. Faster data access. End users often work with only a locally stored subset of the company’s
data.
3. Faster data processing. A distributed database system spreads out the systems workload by
processing data at several sites.
4. Growth facilitation. New sites can be added to the network without affecting the operations
of other sites.
5. Improved communications. Because local sites are smaller and located closer to customers,
local sites foster better communication among departments and between customers and
company staff.
Disadvantages of DDBMS:
1. Complexity of management and control. Applications must recognize data location, and
they must be able to stitch together data from various sites. Database administrators must have
the ability to coordinate database activities to prevent database degradation due to data
anomalies.
2. Technological difficulty. Data integrity, transaction management, concurrency control,
security, backup, recovery, query optimization, access path selection, and so on, must all be
addressed and resolved.
3. Security. The probability of security lapses increases when data are located at multiple sites.
The responsibility of data management will be shared by different people at several sites.
4. Lack of standards. There are no standard communication protocols at the database level.
(Although TCP/IP is the de facto standard at the network level, there is no standard at the
application level.) For example, different database vendors employ different—and often
incompatible—techniques to manage the distribution of data and processing in a DDBMS
environment.
5. Increased storage and infrastructure requirements. Multiple copies of data are required
at different sites, thus requiring additional disk storage space.
6. Increased training cost. Training costs are generally higher in a distributed model than they
would be in a centralized model, sometimes even to the extent of offsetting operational and
hardware savings.
7. Costs. Distributed databases require duplicated infrastructure to operate (physical location,
environment, personnel, software, licensing, etc.)
The design of a distributed database also involves schema levels at each site:
Local database Conceptual Level − Depicts local data organization at each site.
Local database Internal Level − Depicts physical data organization at each site.
** Data Replication **
Data replication is the process of storing separate copies of the database at two or more
sites. It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication:
Reliability − In case of failure of any site, the database system continues to work since a
copy is available at another site(s).
Reduction in Network Load − Since local copies of data are available, query processing
can be done with reduced network usage, particularly during prime hours. Data updating can
be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query processing and
consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become simpler in
nature.
Disadvantages of Data Replication:
Increased Storage Requirements − Maintaining multiple copies of data is associated with
increased storage costs. The storage space required is in multiples of the storage required for
a centralized system.
Increased Cost and Complexity of Data Updating − Each time a data item is updated, the
update needs to be reflected in all the copies of the data at the different sites. This requires
complex synchronization techniques and protocols.
Undesirable Application – Database coupling − If complex update mechanisms are not
used, removing data inconsistency requires complex co-ordination at application level. This
results in undesirable application – database coupling.
Some commonly used replication techniques are −
Snapshot replication
Near-real-time replication
Pull replication
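Snapshot replication, the first technique listed, periodically copies the full state of a table from the primary site to a replica. A minimal sketch follows, using two SQLite in-memory databases via Python to stand in for the two sites; the table, data and refresh function are invented for the illustration (real products use their own refresh machinery):

```python
import sqlite3

# Snapshot replication sketch: the replica's copy is rebuilt from the
# primary's current state each time a refresh runs.
primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")

primary.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
primary.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 900)])

def refresh_snapshot():
    """Drop and rebuild the replica copy from the primary's current state."""
    replica.execute("DROP TABLE IF EXISTS account")
    replica.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
    rows = primary.execute("SELECT acc_no, balance FROM account").fetchall()
    replica.executemany("INSERT INTO account VALUES (?, ?)", rows)

refresh_snapshot()
# Local reads at the replica site now need no network round trip.
print(replica.execute("SELECT SUM(balance) FROM account").fetchone()[0])  # 1400

# Updates at the primary only become visible after the next refresh.
primary.execute("UPDATE account SET balance = 600 WHERE acc_no = 1")
refresh_snapshot()
print(replica.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0])  # 600
```

The gap between two refreshes is exactly the staleness window that near-real-time and pull replication trade off differently.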
** Data Fragmentation **
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of
the table are called fragments. Fragmentation can be of three types: horizontal, vertical, and
hybrid (combination of horizontal and vertical). Horizontal fragmentation can further be
classified into two techniques: primary horizontal fragmentation and derived horizontal
fragmentation.
Fragmentation should be done in a way so that the original table can be reconstructed
from the fragments. This is needed so that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”
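The reconstructiveness requirement can be sketched concretely for the horizontal case. In the following minimal illustration (SQLite via Python stands in for a distributed DBMS; the fragment tables, which would live at different sites, and the data are invented), each fragment holds the rows matching one selection predicate, and a UNION ALL over the fragments rebuilds the original table:

```python
import sqlite3

# Primary horizontal fragmentation sketch: one fragment per dept_num value.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_num INTEGER, ename TEXT, dept_num INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(7364, "SMITH", 20), (7499, "ALLEN", 30), (7782, "CLARK", 10)])

# Each fragment holds the rows satisfying one selection predicate.
for dept in (10, 20, 30):
    con.execute(f"CREATE TABLE emp_site{dept} AS "
                f"SELECT * FROM emp WHERE dept_num = {dept}")

# Reconstructiveness: the union of the fragments rebuilds the original table.
rebuilt = con.execute("""SELECT * FROM emp_site10
                         UNION ALL SELECT * FROM emp_site20
                         UNION ALL SELECT * FROM emp_site30
                         ORDER BY emp_num""").fetchall()
original = con.execute("SELECT * FROM emp ORDER BY emp_num").fetchall()
print(rebuilt == original)  # True
```

Because the predicates are disjoint and cover every row, no row is lost or duplicated in the union, which is what makes the fragmentation reconstructive.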
Advantages of Fragmentation:
Since data is stored close to the site of usage, efficiency of the database system is
increased.
Local query optimization techniques are sufficient for most queries since data is locally
available.
Since irrelevant data is not available at the sites, the security and privacy of the database
system can be maintained.
Disadvantages of Fragmentation:
When data from different fragments are required, the access speeds may be very low.
In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
Lack of back-up copies of data at different sites may render the database ineffective in
case of failure of a site.
Hybrid fragmentation combines the two and can be generated in either of two ways:
At first, generate a set of horizontal fragments; then generate vertical fragments from one
or more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one
or more of the vertical fragments.
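Vertical fragmentation satisfies reconstructiveness in a different way: each fragment keeps a subset of the columns plus the primary key, and a join on the key restores the original rows. A minimal sketch (SQLite via Python; the fragment names and data are invented for the example):

```python
import sqlite3

# Vertical fragmentation sketch: column subsets stored separately, with the
# key repeated in each fragment so a join can rebuild the original rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_num INTEGER PRIMARY KEY, ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(7364, "SMITH", 8000), (7499, "ALLEN", 16000)])

# Fragment 1 keeps identification columns, fragment 2 the payroll columns.
con.execute("CREATE TABLE emp_id  AS SELECT emp_num, ename FROM emp")
con.execute("CREATE TABLE emp_pay AS SELECT emp_num, sal FROM emp")

# Reconstructiveness: a join on the shared key restores the original table.
rebuilt = con.execute("""SELECT i.emp_num, i.ename, p.sal
                         FROM emp_id i JOIN emp_pay p ON i.emp_num = p.emp_num
                         ORDER BY i.emp_num""").fetchall()
print(rebuilt == con.execute("SELECT * FROM emp ORDER BY emp_num").fetchall())  # True
```

Repeating the key in every vertical fragment is the price paid for being able to rebuild the rows later.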
Client/Server Architecture
The Client/Server model is basically platform independent and blends with the “cooperative
processing” or “peer-to-peer” model. The platform gives users access to the business
functionality while remaining transparent both to the underlying technology and to the user.
Client/Server architecture of a database system has two logical components, namely client
and server. Clients are generally personal computers or workstations, whereas the server is a
large workstation, mini-range computer system or a mainframe computer system. The applications
and tools of the DBMS run on one or more client platforms, while the DBMS software resides on
the server. The server computer is called the back end and the client's computer is called the
front end. These server and client computers are connected into a network. The applications and
tools act as clients of the DBMS, making requests for its services. The DBMS, in turn, processes
these requests and returns the results to the client(s). The client handles the Graphical
User Interface (GUI) and does computations and other programming of interest to the end user.
The server handles the parts of the job that are common to many clients, for example, database
access and updates.
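The request/response flow described above can be sketched as a toy front end and back end: the server owns the database and runs the SQL, while the client only sends a request and displays the result. This is a minimal illustration in Python (SQLite stands in for the DBMS; the table, data and wire format are invented for the example):

```python
import socket
import sqlite3
import threading

# Toy client/server pair: the server (back end) owns the database and runs the
# SQL; the client (front end) only sends a request and shows the result.
def serve_one_request(listener):
    con = sqlite3.connect(":memory:")          # database lives on the server
    con.execute("CREATE TABLE dept (dept_num INTEGER, dname TEXT)")
    con.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH')")
    conn, _ = listener.accept()
    query = conn.recv(1024).decode()           # client's request for service
    rows = con.execute(query).fetchall()       # database access happens here
    conn.sendall(repr(rows).encode())          # results returned to the client
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                # any free local port
listener.listen(1)
threading.Thread(target=serve_one_request, args=(listener,), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", listener.getsockname()[1]))
client.sendall(b"SELECT dname FROM dept ORDER BY dept_num")
chunks = []
while chunk := client.recv(4096):              # read until the server closes
    chunks.append(chunk)
result = b"".join(chunks).decode()
client.close()
print(result)  # [('ACCOUNTING',), ('RESEARCH',)]
```

Note how the client never touches the database file itself; everything it knows about the data arrives through the server's response, which is exactly the division of labour the text describes.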
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 99
“A competitive global economy will ensure obsolescence and obscurity to those who
cannot or are unwilling to compete” (Client/Server Architecture, 2011). According to this
statement, it is necessary for organizations to sustain their market position by re-engineering
prevailing organizational structures and business practices to achieve their business goals. In
short, it is a basic need to evolve with changing technology.
Therefore, organizations need a mechanism to retrieve and process their
corporate data so as to make business procedures more efficient, in order to excel or simply to
survive in the global market. The client/server model offers a logical perspective on distributed
cooperative processing, in which a server handles and processes all client requests. It can also
be viewed as a revolutionary milestone for the data processing industry.
“Client/server computing is the most effective source for the tools that empower
employees with authority and responsibility.”(Client/Server Architecture, 2011)
“Workstation power, workgroup empowerment, preservation of existing investments,
remote network management, and market-driven business are the forces creating the need for
client/server computing”. (Client/Server Architecture, 2011)
Client/server computing has progressed through the computer industry, leaving hardly any area
or corner untouched. Hybrid skills are often required for the development of client/server
applications, including database design, transaction processing, communication skills, graphical
user interface design and development, etc. Advanced applications also require expertise in
distributed objects and component infrastructures.
The most commonly found client/server strategy today is a PC LAN implementation optimized
for group/batch usage. This has opened the door to many new distributed
enterprises, as it eliminates host-centric computing.
Advantages:
Organizations often seek opportunities to maintain service and quality competition to
sustain their market position with the help of technology, and here the client/server model makes
an effective impact. Deploying client/server computing in an organization will positively
increase productivity through cost-effective user interfaces, enhanced data storage,
vast connectivity and reliable application services.
If properly implemented, it is capable of improving organizational behavior with the help of
the knowledgeable worker, who can manipulate data and respond to errors appropriately.
Improved Data Sharing: Data retained by usual business processes and manipulated on a
server is available to designated users (clients) through authorized access. The use of
Structured Query Language (SQL) supports open access from all client platforms, and
transparency in network services means that the same data is shared among users.
Integration of Services: Every client is given the opportunity to access corporate
information via the desktop interface, eliminating the need to log into a terminal mode or
another processor. Desktop tools such as spreadsheets, presentation software, etc. can be used
to work with corporate data, with the help of database and application servers resident on the
network, to produce meaningful information.
Shared Resources amongst Different Platforms: Applications for the client/server model
are built regardless of the hardware platform or the technical background of the entitled software
(operating system), providing an open computing environment and enabling users to
obtain the services of clients and servers (database, application, communication servers).
Inter-Operation of Data: All development tools used for client/server applications access
the back-end database server through SQL, an industry-standard data definition and access
language, which helps in the consistent management of corporate data. Advanced database products
enable a user/application to gain a merged view of corporate data dispersed over several
platforms. Rather than targeting a single platform, this ensures database integrity with the
ability to perform updates in multiple locations, while maintaining quality performance and
recovery.
Data Processing Capability despite the Location: We are in an era that is undergoing a
transformation from machine-centered systems to user-centered systems. In machine-centered
systems such as mainframe and mini/micro applications, access platforms, function keys,
navigation options, performance and security were all unique and visible. Through
client/server computing, users can directly log into a system regardless of the location or
technology of the processors.
Easy maintenance: Since client/server architecture is a distributed model representing
dispersed responsibilities among independent computers integrated across a network, it has an
advantage in terms of maintenance. It is easy to replace, repair, upgrade and relocate a server
while clients remain unaffected. This unawareness of change is called encapsulation.
Security: Servers have better control of access and resources, ensuring that only authorized
clients can access or manipulate data and that server updates are administered effectively.
The client (application database) might be a personal workstation, tailored to the needs of
the end users and thus able to provide better interfaces, high availability, faster responses
and overall improved ease of use. A single database (on the server) can be shared
across several distinct client (application) systems.
Exercise- 1
1. Create table EMP with columns emp_num, ename, sal and enter 10 records.
2. Add columns dname, dept_num, location for EMP table.
3. Rename the EMP table to Employee and modify the ename column size to 20.
4. Display all the records of employees in department number 30.
5. Display the details of employees who have two A’s in their name.
6. Drop the column dname and display the details of employees whose salary is greater than 15000.
Exercise- 2
1. Display the details of employees whose join date is 01/11/2017.
2. Add a column job to the employees table and list the clerks in department 10.
3. Display the details of employees whose salary is less than 10000.
4. Display the details of the employee salaries in descending order.
5. Display the names of the employees in uppercase.
6. Display the names of the employees in lowercase.
Exercise- 3
1. Find the department which has the maximum number of employees.
2. List the year in which the maximum number of employees were recruited.
3. Display the details of employees who are working for departments 10 and 20.
4. Update the HRA=15%, DA=10%, TA=10% for all the employees whose experience is more
than 10 years.
5. Write a query to delete duplicate records from emp.
6. Display the sum of salaries department wise.
Exercise- 4
1. Make the duplicate table as emp12 on emp.
2. Add constraint primary key for emp_num and dept_num columns for emp table.
3. Remove the referential integrity from emp and dept tables.
4. Display the names of employees who earn the highest salary in their respective departments.
5. Display the employees whose job is MANAGER.
6. Display the details of employees whose name is ALLEN.
Exercise- 5
1. Display all rows from EMP table. The system waits after every screen full of information.
2. Create view for emp table.
3. Create a view for emp table where deptno=10;
4. Drop the view of emp table.
5. Delete all the records from emp where the dname is NULL.
6. Delete the rows of employees whose experience is less than 5 years.
Table: EMP
EMP_NUM ENAME SAL JOB HIREDATE DNAME DEPT_NUM LOCATION
7364 SMITH 8000 CLERK 12/17/80 RESEARCH 20 DALLAS
7499 ALLEN 16000 SALESMAN 02/20/81 SALES 30 CHICAGO
7566 JONES 29750 MANAGER 04/02/81 RESEARCH 20 DALLAS
7654 MARTIN 12500 SALESMAN 09/28/81 SALES 30 CHICAGO
7698 BLAKE 28500 MANAGER 05/01/81 SALES 30 CHICAGO
7782 CLARK 24500 MANAGER 06/09/81 ACCOUNTING 10 NEWYORK
7788 SCOTT 30000 ANALYST 04/19/87 RESEARCH 20 DALLAS
7876 ADAMS 11000 CLERK 05/23/87 RESEARCH 20 DALLAS
7900 JAMES 9500 CLERK 12/03/81 SALES 30 CHICAGO
7902 FORD 8000 ANALYST 12/03/81 RESEARCH 20 DALLAS
7934 MILLER 13000 CLERK 01/23/82 ACCOUNTING 10 NEWYORK
Table: DEPT
DEPT_NUM DNAME LOCATION
10 ACCOUNTING NEWYORK
20 RESEARCH DALLAS
30 SALES CHICAGO
Exercise- 1
1. Create table EMP with columns emp_num, ename, sal and enter 10 records.
Creating table:
SQL> Create table EMP(emp_num number(4), ename varchar2(15), sal number(5));
Table created.
Entering records:
SQL> Insert into EMP values(&emp_num, '&ename', &sal);
Enter value for emp_num: 7364
Enter value for ename: SMITH
Enter value for sal: 8000
old 1: insert into EMP values(&emp_num, '&ename', &sal)
new 1: insert into EMP values(7364, 'SMITH', 8000)
1 row created.
SQL> / (re-run the insert and supply values for each of the remaining 9 records)
6. Drop the column dname and display the details of employees whose salary is greater than
15000.
SQL> Alter table EMP drop column dname;
Table altered.
SQL> Select * from EMP where sal>15000;
Exercise- 2
1. Display the details of employees whose join date is 01/11/2017.
SQL> Select * from EMP where to_char(hiredate, 'mm/dd/yyyy')='01/11/2017';
No rows selected.
2. Add column job to the employees table and list the clerks in the deptno of 10.
SQL> Alter table EMP add(job varchar2(10));
Table altered.
SQL> Select * from EMP where job='CLERK' and dept_num=10;
4. Display the details of the employee salaries in descending order.
SQL> Select ename, sal from EMP order by sal desc;
ENAME SAL
SCOTT 30000
JONES 29750
BLAKE 28500
CLARK 24500
ALLEN 16000
MILLER 13000
MARTIN 12500
ADAMS 11000
JAMES 9500
SMITH 8000
FORD 8000
5. Display the names of the employees in uppercase.
SQL> Select upper(ename) from EMP;
UPPER(ENAME)
SMITH
ALLEN
JONES
MARTIN
BLAKE
CLARK
SCOTT
ADAMS
JAMES
FORD
MILLER
6. Display the names of the employees in lowercase.
SQL> Select lower(ename) from EMP;
LOWER(ENAME)
smith
allen
jones
martin
blake
clark
scott
adams
james
ford
miller
Exercise- 3
1. Find the department which has the maximum number of employees.
SQL> Select dept_num from EMP group by dept_num having count(*) = (Select
max(count(*)) from EMP group by dept_num);
DEPT_NUM
30
2. List the year in which the maximum number of employees were recruited.
SQL> Select to_char(hiredate, 'yy') from EMP group by to_char(hiredate, 'yy')
having count(*) = (Select max(count(*)) from EMP group by to_char(hiredate, 'yy'));
TO_CHAR(HIREDATE,'YY')
81
3. Display the details of employees who are working for departments 10 and 20.
SQL> Select * from EMP where dept_num in(10,20);
EMP_NUM ENAME SAL JOB HIREDATE DNAME DEPT_NUM LOCATION
7364 SMITH 8000 CLERK 12/17/80 RESEARCH 20 DALLAS
7566 JONES 29750 MANAGER 04/02/81 RESEARCH 20 DALLAS
7782 CLARK 24500 MANAGER 06/09/81 ACCOUNTING 10 NEWYORK
7788 SCOTT 30000 ANALYST 04/19/87 RESEARCH 20 DALLAS
7876 ADAMS 11000 CLERK 05/23/87 RESEARCH 20 DALLAS
7902 FORD 8000 ANALYST 12/03/81 RESEARCH 20 DALLAS
7934 MILLER 13000 CLERK 01/23/82 ACCOUNTING 10 NEWYORK
4. Update the HRA=15%, DA=10%, TA=10% for all the employees whose experience is
more than 10 years.
SQL> Alter table EMP add(HRA number(5,2), DA number(5,2), TA number(5,2));
Table altered.
SQL> Update EMP set HRA=0.15*sal, DA=0.1*sal, TA=0.1*sal where
months_between(sysdate, hiredate)>120;
10 rows updated.
6. Display the sum of salaries department wise.
SQL> Select dept_num, sum(sal) from EMP group by dept_num;
DEPT_NUM SUM(SAL)
10 37500
20 86750
30 66500
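Question 5 of Exercise 3 (deleting duplicate records) has no worked answer above. One common approach keeps the row with the smallest rowid in each group of duplicates; Oracle's ROWID pseudocolumn supports the same idiom. A hedged sketch in SQLite via Python, with invented data:

```python
import sqlite3

# Sketch for Exercise 3, Q5: delete duplicate records, keeping one copy each.
# The rowid-based idiom keeps the row with the smallest rowid per group.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_num INTEGER, ename TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(7364, "SMITH"), (7364, "SMITH"), (7499, "ALLEN")])

con.execute("""DELETE FROM emp
               WHERE rowid NOT IN (SELECT MIN(rowid) FROM emp
                                   GROUP BY emp_num, ename)""")
remaining = con.execute("SELECT * FROM emp ORDER BY emp_num").fetchall()
print(remaining)  # [(7364, 'SMITH'), (7499, 'ALLEN')]
```

The GROUP BY list should name every column that defines a duplicate; grouping by the full row keeps exactly one copy of each distinct record.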
Exercise- 4
1. Make the duplicate table as emp12 on emp.
SQL> Create table EMP12 as Select * from EMP;
Table created.
Here, all the records of EMP table are copied into EMP12.
2. Add constraint primary key for emp_num and dept_num columns for emp table.
SQL> Alter table EMP add constraint c1 primary key(emp_num, dept_num);
Table altered.
4. Display the names of employees who earn the Highest salary in their respective
departments.
SQL> Select ename, sal, dept_num from EMP where (dept_num, sal) in (Select dept_num,
max(sal) from EMP group by dept_num);
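The row-value IN pattern used in this answer can be cross-checked outside Oracle, since SQLite also supports comparing a (dept_num, sal) pair against a grouped subquery. A small verification sketch (data abbreviated from the EMP table above to two rows per department):

```python
import sqlite3

# Cross-check of the Exercise 4, Q4 pattern: row-value IN against a grouped
# subquery, with SQLite standing in for Oracle.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, dept_num INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("CLARK", 24500, 10), ("MILLER", 13000, 10),
                 ("SCOTT", 30000, 20), ("SMITH", 8000, 20),
                 ("BLAKE", 28500, 30), ("JAMES", 9500, 30)])

# A row qualifies only when its (dept_num, sal) pair equals that
# department's maximum salary pair.
rows = con.execute("""SELECT ename, sal, dept_num FROM emp
                      WHERE (dept_num, sal) IN (SELECT dept_num, MAX(sal)
                                                FROM emp GROUP BY dept_num)
                      ORDER BY dept_num""").fetchall()
print(rows)  # [('CLARK', 24500, 10), ('SCOTT', 30000, 20), ('BLAKE', 28500, 30)]
```

Note that if two employees in one department tie for the maximum salary, both rows are returned, which is usually the desired behaviour for this question.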
Exercise- 5
1. Display all rows from EMP table. The system waits after every screen full of
information.
SQL> Set pause on
SQL> Select * from EMP;
5. Delete all the records from emp where the dname is NULL.
SQL> Delete from EMP where dname is NULL;
0 rows deleted.
(Note: dname='NULL' would compare against the literal string 'NULL'; the IS NULL test is
required to match missing values.)
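Question 6 of Exercise 5 (deleting employees with less than 5 years' experience) has no worked answer above. In Oracle one would write months_between(sysdate, hiredate) < 60; the sketch below uses SQLite's julianday arithmetic as a stand-in, with a fixed reference date in place of sysdate so the result is reproducible (data invented for the example):

```python
import sqlite3

# Sketch for Exercise 5, Q6: delete employees with under 5 years' experience.
# SQLite lacks months_between, so julianday arithmetic stands in
# (365.25 days per year on average); '2025-01-01' replaces sysdate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, hiredate TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("SMITH", "1980-12-17"), ("NEWHIRE", "2024-01-15")])

con.execute("""DELETE FROM emp
               WHERE (julianday('2025-01-01') - julianday(hiredate)) / 365.25 < 5""")
remaining = con.execute("SELECT ename FROM emp").fetchall()
print(remaining)  # [('SMITH',)]
```

In a live Oracle session the reference date would simply be sysdate, so the set of deleted rows changes as time passes.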