Relational Database Management Systems B.Com(CA)-II Year(IV Semester)
UNIT- I
** Basic Concepts **
Data: Data is a raw collection of facts about people, places, objects and events, including
text, graphics, images, sound etc., that have meaning in the user's environment.
Data is given by the user to the computer.
It is not understandable (meaningless).
It requires processing.
Example:
1  Sri  35  45  55      Data
2  Ram  75  80  98
(Here, it is not clear whether 35 is a roll number or marks.)
Information: Information is meaningful data in an organized form. Information is processed
data that increases the knowledge of a person who uses the data.
Information is given by the computer to the user.
It is understandable (meaningful).
It is processed data.
Example: Sno Name M1 M2 M3
(Number) (Varchar) (Number) (Number) (Number)
1 Sri 35 45 55 Information
2 Ram 75 80 98
Data --> Process --> Information
Meta data: Meta data is data about data. It describes the properties/characteristics of other data.
It includes field names, data types and their size.
It is used for processing
It gives meaning to the data.
Sno Name M1 M2 M3 Meta data
(Number) (Varchar) (Number) (Number) (Number)
Database: A database is a collection of logically related data stored in a standardized format
and sharable by multiple users. (OR)
A mass storage of data that is generated over a period of time in a business environment
is called a "database".
The symbol of a database is a cylinder.
Example:
1. University database: In this we can store data about students, faculty, courses, results, etc.
2. Bank database: In this we can store data about Account holders.
User <--> DBMS S/W <--> Database
DBMS software is an interface between user and Database.
DBMS provides services like storing, updating, deleting and selecting data.
Database system: It is a set of Databases, DBMS, Hardware and people who operate on it.
Evolution of database system:
Late 1950’s : Sequential file processing systems were used in late 1950’s. In these systems, all
records in a file must be processed in sequence.
1960’s : Random access file processing systems were greatly in use in this period. It
supported direct access to a specific record. In this, it was difficult to access
multiple records though they were related to a single file.
1970’s : During this decade the hierarchical and network database management systems
were developed and were treated as first generation DBMS.
Late 1970’s: E. F. Codd and others developed the relational data model in the late 1970s. This
model is considered second generation DBMS. All data is represented in the form
of tables. SQL is used for data retrieval.
1980’s : Object Oriented model was introduced in 1980’s. In this model, both data and
their relationships (operations) are contained in a single structure known as object.
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 4
downloaded from: www.sucomputersforum.com
1990’s : Client/Server computing was introduced, followed by data warehousing and
internet applications.
2000 : Object oriented database systems were introduced.
File processing systems were an early attempt to computerize the manual filing system
that we are all familiar with. A file system is a method for storing and organizing computer files
and the data they contain to make it easy to find and access them. File systems may use a
storage device such as a hard disk or CD-ROM and involve maintaining the physical location of
the files.
Characteristics (OR) Advantages of File Processing System:
It is a group of files storing data of an organization.
Each file is independent from one another.
Each file is called a flat file.
Each file contains and processes information for one specific function, such as accounting
or inventory.
Files are designed by using programs written in programming languages such as COBOL, C,
and C++.
The physical implementation and access procedures are written into the application;
therefore, physical changes result in intensive rework on the part of the programmer.
As systems became more complex, file processing systems offered little flexibility,
presented many limitations, and were difficult to maintain.
Limitations of the File Processing System/ File-Based Approach:
1. Separated and Isolated Data: To make a decision, a user might need data from two separate
files. First, the files were evaluated by analysts and programmers to determine the specific data
required from each file and the relationships between the data and then applications could be
written in a programming language to process and extract the needed data. Imagine the work
involved if data from several files was needed.
2. Duplication of data: Often the same information is stored in more than one file.
Uncontrolled duplication of data is undesirable for several reasons:
• Duplication is wasteful. It costs time and money to enter the data more than once
• It takes up additional storage space, again with associated costs.
• Duplication can lead to loss of data integrity; in other words the data is no longer consistent.
3. Data Dependence: In the file-based approach, application programs are data dependent. This
means that when the physical representation of data (how the data is physically represented on
disk) or the access technique (how it is physically accessed) changes, the application programs
are also affected and need modification. In other words, application programs depend on how
the data is physically stored and accessed.
4. Difficulty in representing data from the user's view: To create useful applications for the
user, often data from various files must be combined. In file processing it was difficult to
determine relationships between isolated data in order to meet user requirements.
5. Data Inflexibility: Program-data interdependency and data isolation limited the flexibility
of file processing systems in providing users with ad-hoc information requests.
6. Incompatible file formats: As the structure of files is embedded in the application programs,
the structures are dependent on the application programming language. For example, the
structure of a file generated by a COBOL program may be different from the structure of a file
generated by a 'C' program. The direct incompatibility of such files makes them difficult to
process jointly.
7. Data Security: The security of data is low in a file-based system because the data
maintained in flat files is easily accessible.
8. Poor data modeling of real world: The file-based system is not able to represent
complex data and interfile relationships, which results in poor data modeling properties.
**Database Approach**
In order to remove all limitations of the File Based Approach, a new approach was
required that must be more effective known as Database approach.
The Database is a shared collection of logically related data, designed to meet the
information needs of an organization. A database is a computer-based record-keeping system
whose overall purpose is to record and maintain information. The database is a single, large
repository of data, which can be used simultaneously by many departments and users. Instead of
disconnected files with redundant data, all data items are integrated with a minimum amount of
duplication.
The database is no longer owned by one department but is a shared corporate resource.
The database holds not only the organization's operational data but also a description of this
data.
For this reason, a database is also defined as a self-describing collection of integrated
records.
The description of the data is known as the Data Dictionary or Meta Data (the 'data about
data'). It is the self-describing nature of a database that provides program-data independence.
A database implies separation of physical storage from use of the data by an application
program to achieve program/data independence. Changes (or updating) can be made to data
without affecting other components of the system.
In the DBMS approach, application program written in some programming language like
Java, Visual Basic.Net, and Developer 2000 etc. uses database connectivity to access the
database stored in the disk with the help of operating system's file management system.
The file system interface and the DBMS interface for the university management system are shown in the figure.
Building blocks of a Database:
The following three components form the building blocks of a database. They store the
data that we want to save in our database.
Columns: Columns are similar to fields, that is, individual items of data that we wish to store.
A Student' Roll Number, Name, Address etc. are all examples of columns.
Rows: Rows are similar to records, as each row contains the data of multiple columns. A row
can be made up of as many or as few columns as you want.
Tables: A table is a logical group of columns. For example, you may have a table that stores
details of customers' names and addresses. Another table would be used to store details of parts
and yet another would be used for suppliers' names and addresses.
It is the tables that make up the entire database and it is important that we do not
duplicate data at all.
3. Permanent: Data in a database exist permanently in the sense the data can live beyond the
scope of the process that created it.
4. Correctness: Data should be correct with respect to the real world entity that they represent.
5. Security: Data should be protected from unauthorized access.
6. Consistency: Whenever more than one data element in a database represents related real
world values, the values should be consistent with respect to the relationship.
7. Non-redundancy: No two data items in a database should represent the same real world
entity.
8. Independence: Data at different levels should be independent of each other so that the
changes in one level should not affect the other levels.
9. Easily Accessible: It should be available when and where it is needed i.e. it should be easily
accessible.
10.Recoverable: It should be recoverable in case of damage.
11.Flexible to change: It should be flexible to change.
**Logical and physical DBMS architecture (OR) Three level architecture of DBMS**
Database design: Database design includes conceptual database design and physical database
design.
Conceptual database design consists of defining the data elements, relationship and
constraints. To do conceptual database design, DBA Designers create different views of the
database. These views must then be integrated into a complete database structure, which defines
the logical structure of the entire database.
Physical database design determines the physical structure of the database. Technical
oriented DBA designers carry out the physical design. Their goal is to optimize the total
combination of hardware, software.
User training: The responsibility of DBA is to educate the users, in how to access (use) the
database through DBMS. This can be done by taking training sessions (classes) or by
interacting with users. An information center provides training and simple programming
services.
Database security and integrity: the DBA provides security procedures and controls to
prevent the abuse (misuse) of data. The DBA assigns ownership of a view to a specific group,
this permits limited access on database. Access to the database is controlled by a password
mechanism. The DBA is responsible for assigning passwords and controlling access
permission.
Data integrity maintains the accuracy and consistency of data values. Security mechanisms,
such as passwords and data views, protect data integrity.
Database system performance: A database system may respond very slowly when a large
number of users access it at the same time. In such situations, the DBA and technical staff
analyze the problem and resolve the response-time issues, for example by creating indexes
or physically rearranging the data.
A database administrator's responsibilities can also include the following tasks:
Installing and upgrading the database server and application tools.
Allocating system storage and planning future storage requirements for the database
system.
Indexes: A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and storage space to maintain the
index data structure. Indexes are used to quickly locate data without having to search every row
in a database table every time a database table is accessed. Indexes can be created using one or
more columns of a database table, providing the basis for both rapid random lookups and
efficient access of ordered records.
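The effect of an index can be sketched with Python's built-in sqlite3 module; the table name (student) and index name (idx_rno) below are illustrative, not from the notes:

```python
import sqlite3

# A minimal sketch: compare the query plan before and after an
# index exists on the searched column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rno INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, "s%d" % i) for i in range(1000)])

# Without an index the lookup must scan every row of the table.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE rno = 500"
).fetchall())  # plan shows a full SCAN of student

# After creating an index, SQLite locates the row directly.
conn.execute("CREATE INDEX idx_rno ON student (rno)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE rno = 500"
).fetchall()
print(plan)    # plan shows a SEARCH using idx_rno
```

The index speeds up the read at the cost of extra writes and storage space, exactly as described above.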
Personal database: The personal databases are maintained, generally, on personal computers.
They contain information that is meant for use only among a limited number of users, generally
working in the same department.
Distributed database: These databases have contributions from the common databases as well
as the data captured from the local operations. The data remains distributed at various sites in
the organization. As the sites are linked to each other with the help of communication links, the
entire collection of data at all the sites constitutes the logical database of the organization.
**Data Models**
According to Hoberman (2009), “A data model is a wayfinding tool for both business
and IT professionals, which uses a set of symbols and text to precisely explain a subset of real
information to improve communication within the organization and thereby lead to a more
flexible and stable application environment”.
A data model is an idea which describes how data can be represented and accessed.
It defines the data elements and the relationships among various data elements for a specified
system.
The main purpose of a data model is to give an idea of how the final system or software will
look after development.
A relational database simplifies the database structure by making use of tables and
columns.
Relational data model is the primary data model, which is used widely around the world
for data storage and processing. This model is simple and it has all the properties and
capabilities required to process data with storage efficiency.
Concepts in Relational model:
Table (or) Relation -- In relational data model, relations are saved in the format of Tables. This
format stores the relation among entities. A table has rows and columns, where rows represent
records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a
tuple.
Relation instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and
their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute
domain.
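The terms above map directly onto SQL. A minimal sketch in Python's sqlite3 (table and attribute names are illustrative):

```python
import sqlite3

# The CREATE TABLE statement is the relation schema; each inserted
# row is a tuple; the set of rows at any moment is the relation
# instance; the PRIMARY KEY column is the relation key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        rno   INTEGER PRIMARY KEY,  -- relation key
        name  TEXT,                 -- attribute, domain: text
        marks INTEGER               -- attribute, domain: integer
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Sri", 35), (2, "Ram", 75)])  # two tuples

instance = conn.execute("SELECT * FROM student").fetchall()
print(instance)  # [(1, 'Sri', 35), (2, 'Ram', 75)]
```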
3. Network Database Model: The Network Database Model is similar to the Hierarchical
Model; the only difference is that it allows a record to have more than one parent.
In this model, a record is not restricted to a single parent-child association as in the
hierarchical model.
4. Entity Relationship (ER) Model: The developer can easily understand the system by
looking at the constructed ER model.
In this diagram,
A rectangle represents an entity. Eg. Doctor and Patient.
An ellipse represents an attribute. Eg. DocId, Dname, PId, Pname; attributes describe each
entity.
5. Object Model: The object model stores data in the form of objects, classes and inheritance.
This model handles more complex applications, such as Geographic Information Systems
(GIS).
**Domains**
A domain is defined as the set of all unique values permitted for an attribute. For
example, a domain of date is the set of all possible valid dates, a domain of integer is all
possible whole numbers, a domain of day-of-week is Monday, Tuesday ... Sunday.
This in effect defines rules for a particular attribute. If an attribute is determined to be a
date, the database should be implemented so as to prevent invalid dates from being entered.
If the system supports domain constraints, such invalid data would never be stored in the
first place. That is, the integrity of the database is preserved.
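A domain constraint can be sketched with a CHECK clause; the table and column names below are made up for the example:

```python
import sqlite3

# Sketch: restrict the day column to the seven valid day-of-week
# values, so invalid data is rejected before it is ever stored.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shift (
        day TEXT CHECK (day IN ('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))
    )
""")
conn.execute("INSERT INTO shift VALUES ('Mon')")         # valid, accepted
try:
    conn.execute("INSERT INTO shift VALUES ('Funday')")  # invalid
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the CHECK constraint preserved integrity
```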
**Types of Keys **
Table is a collection of data in the form of rows and columns. Rows are referred as records
and columns are referred as fields.
A table includes the following components, which are called its keys.
1. Primary Key
2. Foreign Key
3. Candidate Key
4. Super Key
5. Composite Key
6. Alternate key
Primary key: A primary key is a column or set of columns in a table that uniquely identifies
tuples (rows) in that table.
Example: Student Table
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Carl 22
In the above Student table, the Stu_Id column uniquely identifies each row of the table.
We denote the primary key by underlining the column name.
The value of primary key should be unique for each row of the table. Primary key column
cannot contain duplicate values.
Primary key column should not contain nulls.
A primary key need not be a single column; more than one column together can also form
the primary key of a table. For example, {Stu_Id, Stu_Name} could collectively play the role
of primary key in the above table, but since Stu_Id alone is enough to uniquely identify each
row, there is no reason to make things more complex. We should choose more than one
column as the primary key only when no single column can play the role of primary key.
Foreign key: Foreign keys are the columns of a table that points to the primary key of another
table. They act as a cross-reference between tables.
In the below example the Stu_Id column in Course_enrollment table is a foreign key as it
points to the primary key of the Student table.
Course_enrollment table:
Course_Id  Stu_Id
C01        101
C02        102
C03        101
C05        102
C06        103

Student table:
Stu_Id  Stu_Name   Stu_Age
101     Chaitanya  22
102     Arya       26
103     Bran       25
104     Jon        21
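The cross-reference above can be sketched in Python's sqlite3; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

# Sketch of the Student / Course_enrollment tables above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
conn.execute("""
    CREATE TABLE course_enrollment (
        course_id TEXT,
        stu_id    INTEGER REFERENCES student (stu_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO student VALUES (101, 'Chaitanya')")
conn.execute("INSERT INTO course_enrollment VALUES ('C01', 101)")  # ok

try:
    # 999 does not exist in student, so the cross-reference fails
    conn.execute("INSERT INTO course_enrollment VALUES ('C02', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```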
Candidate Key: A super key with no redundant attributes is known as a candidate key.
Candidate keys are selected from the set of super keys; the only condition while selecting a
candidate key is that it should not have any redundant attributes. That is why candidate keys
are also termed minimal super keys.
For example:
Emp_Id Emp_Number Emp_Name
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
There are two candidate keys in above table:
{Emp_Id}
{Emp_Number}
Note: The primary key is selected from the group of candidate keys. That means we can
have either Emp_Id or Emp_Number as the primary key.
Super key: A super key is a set of one or more columns (attributes) that uniquely identifies
rows in a table. People often confuse super keys with candidate keys, so we also discuss
candidate keys briefly here.
How is a candidate key different from a super key?
The answer is simple: candidate keys are selected from the set of super keys, and the only
condition while selecting a candidate key is that it should not have any redundant attribute.
That is why candidate keys are also termed minimal super keys.
Let’s take an example to understand this: Employee table
Emp_SSN Emp_Number Emp_Name
123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert
Super keys:
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
All of the above sets are able to uniquely identify rows of the employee table.
Alternate Key: Out of all candidate keys, only one gets selected as primary key, remaining
keys are known as alternate or secondary keys.
Composite Key: A key that consists of more than one attribute to uniquely identify rows (also
known as records & tuples) in a table is called composite key.
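A composite key can be sketched in sqlite3; the enrollment table below uses illustrative names:

```python
import sqlite3

# Sketch: neither course_id nor stu_id is unique by itself,
# but the pair of columns identifies each row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        course_id TEXT,
        stu_id    INTEGER,
        PRIMARY KEY (course_id, stu_id)  -- composite key
    )
""")
conn.execute("INSERT INTO enrollment VALUES ('C01', 101)")
conn.execute("INSERT INTO enrollment VALUES ('C01', 102)")  # same course: ok
conn.execute("INSERT INTO enrollment VALUES ('C02', 101)")  # same student: ok
try:
    conn.execute("INSERT INTO enrollment VALUES ('C01', 101)")  # duplicate pair
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```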
** Database Constraints **
Database constraints are restrictions on the contents of the database or on database
operations.
Need of Constraints: Constraints in the database provide a way to guarantee that :
The values of individual columns are valid.
In a table, rows have valid primary key or unique key values.
In a dependent table, rows have valid foreign key values that reference rows in a parent
table.
Different Types of constraints:
1. Domain Constraints
2. Key Constraints
3. Integrity Rule/ Constraint 1 (Entity Integrity Rule or Constraint)
4. Integrity Rule/ Constraint 2 (Referential Integrity Rule or Constraint)
5. General Constraints
Domain Constraints: Domain constraints specify what set of values an attribute can
take. The value of each attribute X must be an atomic value from the domain of X.
The data type associated with domains includes integer, character, string, date, time, currency
etc. An attribute value must be available in the corresponding domain. Consider the example
below:
Key Constraints: Keys are attributes or sets of attributes that uniquely identify an entity within
its entity set. An Entity set E can have multiple keys out of which one key will be designated as
the primary key. Primary Key must have unique and not null values in the relational table.
Example of Key Constraints in a simple relational table –
Integrity Rule 1 (Entity Integrity Rule or Constraint): The Integrity Rule 1 is also called
Entity Integrity Rule or Constraint. This rule states that no attribute of primary key will contain
a null value. If a relation has a null value in the primary key attribute, then uniqueness property
of the primary key cannot be maintained. Consider the example below:
Integrity Rule 2 (Referential Integrity Rule or Constraint): The integrity Rule 2 is also
called the Referential Integrity Constraints. This rule states that if a foreign key in Table 1 refers
to the Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be null or
be available in Table 2. For example,
Some more features of foreign keys: Let the table in which the foreign key is defined be
called the foreign or detail table (Table 1 in the above example), and let the table that defines
the primary key and is referenced by the foreign key be called the master or primary table
(Table 2 in the above example). Then the following properties must hold:
Records cannot be inserted into a Foreign table if corresponding records in the master
table do not exist.
The Update Operation: Consider two existing relations named EMPLOYEE and
DEPARTMENT.
2. Update in a referenced relation: There are again three options available if an update
causes a violation.
3. Modify the referencing attributes (ON UPDATE SET NULL): set a null value, or some
other valid value, in the foreign key field for the corresponding updated referenced value;
that is, change the referencing attribute values that caused the violation to null or to
another valid value. If there is no restriction or constraint against putting NULL in the
referencing relation, the update of the referenced relation is allowed; otherwise it is
prohibited.
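A referential action of this kind can be sketched in sqlite3 on the EMPLOYEE and DEPARTMENT relations mentioned above (column names are illustrative); here ON DELETE SET NULL plays the analogous role for deletions:

```python
import sqlite3

# Sketch: deleting a referenced department sets the matching
# employee rows' foreign key to NULL instead of leaving a
# dangling reference.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department (dept_id)
                ON DELETE SET NULL
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 10)")

conn.execute("DELETE FROM department WHERE dept_id = 10")
row = conn.execute("SELECT dept_id FROM employee WHERE emp_id = 1").fetchone()
print(row)  # (None,)
```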
** Relational Operations **
To manipulate relations, the relational model supports a set of relational algebra operations.
Given this simple and restricted data structure, it is possible to define some very powerful
relational operators which, from the user's point of view, act 'in parallel' on all entries in a
table simultaneously, although their implementation may require conventional processing.
Codd originally defined eight relational operators.
1. SELECT originally called RESTRICT
2. PROJECT
3. JOIN
4. PRODUCT
5. UNION
6. INTERSECT
7. DIFFERENCE
8. DIVIDE
SELECT: RESTRICTS the rows chosen from a table to those entries with specified attribute
values.
Example: SELECT item FROM stock_level WHERE quantity > 100
constructs a new, logical table (an unnamed relation with a single column, item)
containing all rows from stock_level that satisfy the WHERE clause.
PROJECT: Selects rows made up of a sub-set of columns from a table.
Example: PROJECT stock_item OVER item AND description
produces a new logical table where each row contains only two columns - item and
description. The new table will only contain distinct rows from stock_item; i.e. any duplicate
rows so formed will be eliminated.
JOIN: Associates entries from two tables on the basis of matching column values.
Example: JOIN stock_item WITH stock_level OVER item
It is not necessary for there to be a one-to-one relationship between entries in two tables to
be joined - entries which do not match anything will be eliminated from the result, and entries
from one table which match several entries in the other will be duplicated the required number
of times.
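The three operators so far can be run in SQL through Python's sqlite3; the stock_item and stock_level tables follow the examples above, filled with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_item (item TEXT, description TEXT)")
conn.execute("CREATE TABLE stock_level (item TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO stock_item VALUES (?, ?)",
                 [("bolt", "steel bolt"), ("nut", "steel nut")])
conn.executemany("INSERT INTO stock_level VALUES (?, ?)",
                 [("bolt", 250), ("nut", 40)])

# SELECT (restrict): rows of stock_level with quantity > 100
restrict = conn.execute(
    "SELECT item FROM stock_level WHERE quantity > 100").fetchall()
print(restrict)  # [('bolt',)]

# PROJECT: only the item and description columns, duplicates removed
project = conn.execute(
    "SELECT DISTINCT item, description FROM stock_item").fetchall()

# JOIN: associate the two tables over the matching item column
joined = conn.execute("""
    SELECT i.item, i.description, l.quantity
    FROM stock_item i JOIN stock_level l ON i.item = l.item
""").fetchall()
print(joined)
```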
PRODUCT: Builds a relation from two specified relations consisting of all possible
combinations of rows, one from each of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: d    =>  A product B: a d
   b       e                     a e
   c                             b d
                                 b e
                                 c d
                                 c e
UNION: Builds a relation consisting of all rows appearing in either or both of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: a    =>  A union B: a
   b       e                    b
   c                            c
                                e
INTERSECT: Builds a relation consisting of all rows appearing in both of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: a    =>  A intersect B: a
   b       e
   c
DIFFERENCE: Builds a relation consisting of all rows appearing in the first and not in the
second of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a    B: a    =>  A - B: b    and    B - A: e
   b       e             c
   c
DIVIDE: Takes two relations, one binary and one unary, and builds a relation consisting of all
values of one column of the binary relation that match, in the other column, all values in the
unary relation.
A: a x    B: x    =>  A divide B: a
   a y       y
   a z
   b x
   c y
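The set-based operators can be mimicked with plain Python sets, using the same values as the examples above; DIVIDE is implemented straight from its definition:

```python
# UNION, INTERSECT, DIFFERENCE on the example relations
A = {"a", "b", "c"}
B = {"a", "e"}

print(sorted(A | B))  # UNION      -> ['a', 'b', 'c', 'e']
print(sorted(A & B))  # INTERSECT  -> ['a']
print(sorted(A - B))  # DIFFERENCE -> ['b', 'c']

# PRODUCT of {a, b, c} with {d, e}: all six ordered pairs
C = {"d", "e"}
product = {(x, y) for x in A for y in C}
print(len(product))   # 6

# DIVIDE: binary relation R divided by unary relation S keeps the
# left-hand values that are paired with every value in S.
R = {("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("c", "y")}
S = {"x", "y"}
divide = {x for (x, _) in R if all((x, s) in R for s in S)}
print(divide)  # {'a'}
```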
ER Diagrams:
ERD stands for Entity Relationship Diagram.
An ER diagram shows the relationships between objects, places, people, events etc. within
a system.
It is a data modeling technique which helps in defining the business process.
It saves time: without ER diagrams it is much harder to design a database structure and
write production code.
Multi-valued Attribute: represents an attribute which can have many values for a
particular entity. For eg. Mobile Number.
**Entities**
Entities: An entity is a person, place, object, event or concept in the user environment about
which an organization maintains the data.
Types of Entities:
1. Strong Entity Types
2. Recursive Entity Types
3. Weak Entity Types
4. Composite Entity Types or Associative Entity Types
Strong Entity Type: These are entities which have a key attribute in their attribute list, or an
attribute set that forms a primary key. The strong entity type is also called a regular entity
type. For example, a student's unique RollNo identifies each student. So RollNo is the
primary key of the STUDENT entity, and hence STUDENT is a strong entity type because of
its key attribute.
Recursive Entity Type: This is also called a self-referential entity type. It is an entity type
with a foreign key referencing the same table, i.e. itself. Recursive entity types occur in
unary relationships.
A recursive relationship may be one-to-one or one-to-many.
To achieve a recursive relationship we need to set up a foreign key in the table, as shown
below:
emp_num (PK)   Ename   Job   Sal   manager_id (FK)
Here manager_id is a foreign key in the table that references the primary key
(emp_num) values of the same table. Thus, the above relationship is a recursive relationship.
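The self-referencing table can be sketched in sqlite3; the employee names below are made up:

```python
import sqlite3

# Sketch of a recursive relationship: manager_id is a foreign key
# referencing emp_num in the same table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE emp (
        emp_num    INTEGER PRIMARY KEY,
        ename      TEXT,
        manager_id INTEGER REFERENCES emp (emp_num)  -- self-reference
    )
""")
conn.execute("INSERT INTO emp VALUES (1, 'Kiran', NULL)")  # top manager
conn.execute("INSERT INTO emp VALUES (2, 'Ravi', 1)")      # reports to 1

# A self-join lists each employee alongside his or her manager.
rows = conn.execute("""
    SELECT e.ename, m.ename
    FROM emp e LEFT JOIN emp m ON e.manager_id = m.emp_num
    ORDER BY e.emp_num
""").fetchall()
print(rows)  # [('Kiran', None), ('Ravi', 'Kiran')]
```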
Weak Entity Type: An entity type with no key or primary key is called a weak entity type.
The tuples of a weak entity type may not be distinguishable using one attribute of the weak
entity alone. For every weak entity, there should be a unique OWNER entity type. In the
example below, CHILD is a weak entity type and EMPLOYEE is the owner entity type.
Composite Entities: If a many-to-many relationship exists, we must eliminate it by using a
composite entity. Composite entities are entities which exist both as a relationship and as an
entity; the many-to-many relationship is converted into two one-to-many relationships.
Composite entities are also called bridge entities, because they act like a bridge between the
two entities which have the many-to-many relationship. A bridge or composite entity is
composed of the primary keys of each of the entities to be connected. A composite entity is
represented by a diamond shape within a rectangle in an ER diagram.
In the following example, the associative entity “CERTIFICATE” has the attributes
“cnum” and “date” which are peculiar to the relationship. It associates the instances of
“STUDENT” and “COURSE”.
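The CERTIFICATE bridge entity can be sketched in sqlite3; the sample values are made up:

```python
import sqlite3

# Sketch: the bridge table holds the primary keys of STUDENT and
# COURSE plus its own attributes (cnum, date), splitting the
# many-to-many link into two one-to-many links.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT)")
conn.execute("CREATE TABLE course (cid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""
    CREATE TABLE certificate (
        cnum  INTEGER PRIMARY KEY,
        cdate TEXT,
        sid   INTEGER REFERENCES student (sid),
        cid   INTEGER REFERENCES course (cid)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Sri')")
conn.execute("INSERT INTO course VALUES (10, 'RDBMS')")
conn.execute("INSERT INTO certificate VALUES (500, '2024-01-15', 1, 10)")

# One certificate row associates one student with one course.
linked = conn.execute("""
    SELECT s.sname, c.title, cert.cdate
    FROM certificate cert
    JOIN student s ON cert.sid = s.sid
    JOIN course  c ON cert.cid = c.cid
""").fetchall()
print(linked)  # [('Sri', 'RDBMS', '2024-01-15')]
```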
**Attributes**
An attribute is a property or characteristic of an entity; attributes are also known as the
columns of a table. In other words, an attribute is an item of related information about an
entity which has a valid value. The following table shows attributes for a few entities.
Entity/ Table Attribute/ Fields
STUDENT name, rno, age, marks, address, total, average
EMPLOYEE ename, salary, dob, doj, department
An attribute can have a single value, multiple values or a range of values. In addition, each
attribute can contain a certain type of data, such as only numeric values, only alphabets, a
combination of both, dates, or negative and positive values. Depending on the values that an
attribute can take, attributes are divided into different types.
1. Simple Attribute: These kinds of attributes have values which cannot be divided further.
For example, STUDENT_ID attribute which cannot be divided further. Passport Number is
unique value and it cannot be divided.
2. Composite Attribute: This kind of attribute can be divided further to more than one simple
attribute. For example, address of a person. Here address can be further divided as Door#,
street, city, state and pin which are simple attributes.
3. Derived Attribute: Derived attributes are the one whose value can be obtained from other
attributes of entities in the database. For example, Age of a person can be obtained from date
of birth and current date. Average salary, annual salary, total marks of a student etc are few
examples of derived attribute.
4. Stored Attribute: The attributes whose values are used to compute a derived attribute are
called stored attributes. In the example above, age is derived using Date of Birth; hence
Date of Birth is a stored attribute.
5. Single Valued Attribute: These attributes will have only one value. For example,
EMPLOYEE_ID, passport#, driving license#, SSN etc have only single value for a person.
6. Multi-Valued Attribute: These attributes can have more than one value at any point of time. For example, a manager can have more than one employee working for him, and a person can have more than one email address or more than one house.
7. Simple Single Valued Attribute: This combines the simple and single-valued types: the attribute has a single value at any point of time, and that value cannot be divided further. For example, EMPLOYEE_ID is a single value and cannot be divided further.
8. Simple Multi-Valued Attribute: A person's phone number is an example: each number is indivisible (simple), yet a person can have multiple phone numbers.
9. Composite Single Valued Attribute: Date of Birth can be a composite single valued
attribute. Any person can have only one DOB and it can be further divided into date, month
and year attributes.
10.Composite Multi-Valued Attribute: The address of a shop that operates at two different locations is an example of this attribute.
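The attribute types above can be sketched in Python (a hypothetical Student entity; the class and field names are invented for illustration, not taken from the notes):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:                 # composite attribute: divisible into simple parts
    door: str
    street: str
    city: str
    pin: str

@dataclass
class Student:
    student_id: str            # simple, single-valued attribute
    dob: date                  # stored attribute
    address: Address           # composite attribute
    phones: list = field(default_factory=list)   # multi-valued attribute

    @property
    def age(self) -> int:      # derived attribute: computed from dob, not stored
        today = date(2024, 1, 1)   # fixed reference date so the example is stable
        return today.year - self.dob.year

s = Student("S101", date(2004, 1, 1),
            Address("12", "MG Road", "Guntur", "522001"),
            ["9000000001", "9000000002"])
```

Here student_id is a simple single-valued attribute, phones is simple multi-valued, address is composite single-valued, and age is derived from the stored attribute dob.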
**Relationship**
A Relationship is an association established between common fields (columns) of two or
more tables. A relationship defines how two or more entities are inter-related. For example,
STUDENT and CLASS entities are related as 'Student X studies in a Class Y'. Here 'Studies'
defines the relationship between Student and Class.
Degrees of Relationship: Two or more entities can participate in a relationship. The number of entities that participate in a particular relationship is called the degree of the relationship. If only two entities participate in the mapping, the degree of the relation is 2 (binary). If three entities are involved, the degree is 3 (ternary). If more than 3 entities are involved, the relation is called n-ary.
Cardinality of Relationship: How many instances of one entity are mapped to how many instances of another entity is known as the cardinality of a relationship. In the 'studies' relationship above, one Student X studies in one Class Y; that is, a single instance of the entity Student maps to a single instance of the entity Class. This means the cardinality between Student and Class is 1:1.
Based on the cardinality, there are 3 types of relationship.
1. One - to - One Relationship
2. One - to - Many Relationship
3. Many - to - Many Relationship
One - to - One Relationship: In One - to - One Relationship, one entity is related with only one
other entity. One row in a table is linked with only one row in another table and vice versa.
For example: A Country can have only one Capital City.
One - to - Many Relationship: In One - to - Many Relationship, one entity is related to many
other entities. One row in a table A is linked to many rows in a table B, but one row in a table B
is linked to only one row in table A.
For example: One Department has many Employees.
Many - to - Many Relationship: In Many - to - Many Relationship, many entities are related with multiple other entities; many rows in one table are linked to many rows in another table and vice versa.
For example: One Student enrols in many Courses, and one Course has many Students.
5. One course is taught by only one instructor. But one instructor teaches many courses.
Hence the cardinality between course and instructor is Many to One (N :1)
Step 3: Identify the key attributes
"Departmen_Name" can identify a department uniquely. Hence Department_Name is the
key attribute for the Entity "Department".
Course_ID is the key attribute for "Course" Entity.
Student_ID is the key attribute for "Student" Entity.
Instructor_ID is the key attribute for "Instructor" Entity.
Step 4: Identify other relevant attributes
For the Department entity, another relevant attribute is location.
For the Course entity: course_name, duration.
For the Instructor entity: first_name, last_name, phone.
For the Student entity: first_name, last_name, phone.
Step 5: Draw complete ER diagram
By connecting all these details, we can now draw ER diagram as given below.
UNIT- II
** Database Integrity **
Data integrity includes guidelines for data retention, specifying or guaranteeing how long data can be retained in a particular database. To achieve data integrity, these rules are consistently and routinely applied to all data entering the system, and any relaxation of enforcement can introduce errors into the data. Implementing checks on the data as close as possible to the source of input (such as human data entry) means less erroneous data enters the system. Strict enforcement of data integrity rules keeps error rates low, saving the time that would otherwise be spent troubleshooting and tracing erroneous data and the errors it causes downstream.
Data integrity also includes rules defining the relations a piece of data can have, to other
pieces of data, such as a Customer record being allowed to link to purchased Products, but not
to unrelated data such as Corporate Assets. Data integrity often includes checks and correction
for invalid data, based on a fixed schema or a predefined set of rules.
Types of integrity constraints: Data integrity is normally enforced in a database system by a
series of integrity constraints or rules. Three types of integrity constraints are an inherent part of
the relational data model: entity integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule
which states that every table must have a primary key and that the column or columns
chosen to be the primary key should be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential integrity rule
states that any foreign-key value can only be in one of two states. The usual state of affairs
is that the foreign-key value refers to a primary key value of some table in the database.
Occasionally, and this will depend on the rules of the data owner, a foreign-key value can
be null. In this case, we are explicitly saying that either there is no relationship between the
objects represented in the database or that this relationship is unknown.
Domain integrity specifies that all columns in a relational database must be declared upon
a defined domain. The primary unit of data in the relational data model is the data item.
Such data items are said to be non-decomposable or atomic. A domain is a set of values of
the same type. Domains are therefore pools of values from which actual values appearing
in the columns of a table are drawn.
User-defined integrity refers to a set of rules specified by a user, which do not belong to the entity, domain or referential integrity categories.
If a database supports these features, it is the responsibility of the database to ensure data
integrity as well as the consistency model for the data storage and retrieval.
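The three relational constraint types can be demonstrated with SQLite (a minimal sketch; the table and column names are invented, and SQLite only enforces foreign keys when the pragma is switched on):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")          # SQLite enforces FKs only when enabled

# Entity integrity: every table has a PRIMARY KEY (unique, not null).
con.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT NOT NULL)")

# Domain integrity: CHECK restricts sal to its domain of positive values.
# Referential integrity: emp.deptno must match an existing dept.deptno (or be NULL).
con.execute("""CREATE TABLE emp (
                 eno    INTEGER PRIMARY KEY,
                 sal    INTEGER CHECK (sal > 0),
                 deptno INTEGER REFERENCES dept(deptno))""")

con.execute("INSERT INTO dept VALUES (10, 'Sales')")
con.execute("INSERT INTO emp VALUES (1, 5000, 10)")      # satisfies all rules

try:
    con.execute("INSERT INTO emp VALUES (2, 5000, 99)")  # no dept 99: FK violation
    fk_violated = False
except sqlite3.IntegrityError:
    fk_violated = True

try:
    con.execute("INSERT INTO emp VALUES (3, -10, 10)")   # sal outside its domain
    check_violated = False
except sqlite3.IntegrityError:
    check_violated = True
```

Both offending inserts are rejected, so only the one valid employee row remains in the table.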
** Data Redundancy **
Data redundancy in database means that some data fields are repeated in the database.
This data repetition may occur either if a field is repeated in two or more tables or if the field is
repeated within the table.
Data can appear multiple times in a database for a variety of reasons. For example, a shop
may have the same customer’s name appearing several times if that customer has bought
several different products at different dates.
Disadvantages (OR) Problems associated with data redundancy:
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases efficiency of database.
4. May cause data corruption.
** Functional Dependency **
A functional dependency exists when one attribute in a relation uniquely determines another attribute.
For example, if an attribute 'X' determines the value of 'Y', it is written as X → Y, which
means "Y is functionally dependent upon X".
Here, X is the determinant attribute and
Y is the dependent attribute.
The common functional dependencies are:
1) Partial dependency.
2) Transitive dependency.
Partial dependency: A non-key attribute is partially (not fully) dependent on the key attribute; that is, a dependency based on only part of the primary key is known as a partial dependency.
Transitive dependency: A non-key attribute depends on another non-key attribute, i.e. a dependency based on an attribute that is not part of the primary key.
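The definition can be tested mechanically; the helper below (a sketch with invented sample rows, not from the notes) reports whether X → Y holds in a set of rows:

```python
def fd_holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows."""
    seen = {}
    for r in rows:
        if r[x] in seen and seen[r[x]] != r[y]:
            return False        # one X value maps to two different Y values
        seen[r[x]] = r[y]
    return True

students = [
    {"rno": 101, "sname": "Ravi", "group": "MBA", "fee": 30000},
    {"rno": 102, "sname": "Sita", "group": "MBA", "fee": 30000},
    {"rno": 103, "sname": "Ram",  "group": "MCA", "fee": 25000},
]
```

Here rno → sname and group → fee hold, but fee → rno does not, because the fee 30000 maps to both rno 101 and rno 102.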
** Normalization **
Normalization is a process of evaluating and correcting table structures to minimize data redundancies and reduce data anomalies. (OR) It is a step-by-step decomposition of complex records into simple records.
Normalization follows series of stages called “normal forms” like:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
Normalization involves decomposition (division) of "tables with anomalies" into "smaller, well-structured tables".
Rules of Data Normalization:
1. Eliminate Repeating Groups - Make a separate table for each set of related attributes, and
give each table a primary key.
2. Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key,
remove it to a separate table.
3. Eliminate Columns Not Dependent On Key - If attributes do not contribute to a
description of the key, remove them to a separate table.
4. Isolate Independent Multiple Relationships - No table may contain two or more 1:m or
m:n relationships that are not directly related.
5. Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.
Consider the following STUDENT table and see how the table is normalized from 1NF to
3NF.
rno sname group fee skills
101 Ravi MBA 30000 C
C++
java
The above STUDENT table contains a multi-valued attribute (skills), which is removed in 1NF. It contains partial dependencies, which are removed in 2NF, and transitive dependencies, which are removed in 3NF.
First Normal Form (1NF): 1NF is the lowest of the normal forms. A database table in 1NF
must satisfy the following conditions.
The primary key entity requirements are met.
Each row and column intersections can contain one and only one value.
All the table’s attributes are dependent on the primary key attribute.
The above table is changed into following table to satisfy 1NF.
rno sname group fee skills
101 Ravi MBA 30000 C
101 Ravi MBA 30000 C++
101 Ravi MBA 30000 JAVA
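The same 1NF step can be sketched in Python: the multi-valued skills column is flattened so each row/column intersection holds exactly one value (sample data taken from the table above):

```python
unnormalized = [
    {"rno": 101, "sname": "Ravi", "group": "MBA", "fee": 30000,
     "skills": ["C", "C++", "JAVA"]},    # multi-valued attribute violates 1NF
]

# One output row per skill; all other attributes are repeated in each row.
first_nf = [
    {**{k: v for k, v in row.items() if k != "skills"}, "skill": skill}
    for row in unnormalized
    for skill in row["skills"]
]
```

The single unnormalized row becomes three 1NF rows, one per skill, exactly as in the table above.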
Second Normal Form (2NF): A database table in 2NF must satisfy the following conditions.
The table must be in 1NF.
The table contains no partial dependencies. This means every non-key attribute is fully
dependent on the key attribute.
A dependency based on only part of the primary key is known as a partial dependency. The
above table contains partial dependencies (sname, group and fee depend on rno, i.e., part
of the primary key). So we need to remove these partial dependencies from the above table
to satisfy 2NF. For this we decompose the STUDENT table into STUDENT and SKILLS
tables.
STUDENT SKILLS
rno sname group fee rno skills
101 Ravi MBA 30000 101 C
101 C++
101 JAVA
Removing partial dependency:
Table: STUDENT Table: SKILLS
Third Normal Form (3NF): A database table in 3NF must satisfy the following conditions.
The table must be in 2NF.
The table contains no transitive dependencies.
A dependency based on an attribute that is not part of the primary key is known as a transitive
dependency. The above STUDENT table has a transitive dependency, i.e., a dependency between
the "fee" and "group" attributes. So we need to remove this transitive dependency by decomposing
the STUDENT table into STUDENT and GROUP.
STUDENT GROUP SKILLS
rno sname group group fee rno skills
101 Ravi MBA MBA 30000 101 C
101 C++
101 JAVA
Table
With 1NF 2NF 3NF
anomalies
BCNF (Boyce- Codd Normal form):BCNF stands for Boyce-Codd Normal Form. This normal
form is considered to be a special case of 3NF. But there are few differences between BCNF
and 3NF.
3NF is satisfying 2NF and removing Transitive dependency.
Transitive dependency exists only when a non-key attribute determines another non-key
attribute.
But it is possible for a non-key attribute to be the determinant of the PK or part of the PK
without violating the 3NF requirements; BCNF addresses exactly this case.
A table is in BCNF if and only if every determinant in the relation is a candidate key.
Consider the following table:
sid subject faculty
123 Phy Raman
123 Cs James
423 Phy Raman
423 Cs James
537 Cs Patrik
The above table, which is in 3NF, can be converted to a table in BCNF using a simple two-step
process.
1. The table is modified so that the determinant in the table that is not a candidate key
(faculty) becomes a component of the PK of the revised table.
** Decomposition **
Decomposition is the process of breaking down in parts or elements.
It replaces a relation with a collection of smaller relations.
It breaks the table into multiple tables in a database.
It should always be lossless, because it confirms that the information in the original
relation can be accurately reconstructed based on the decomposed relations.
If there is no proper decomposition of the relation, then it may lead to problems like loss
of information.
Properties of Decomposition: Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
1. Lossless Decomposition: Decomposition must be lossless. It means that the information
should not get lost from the relation that is decomposed. It gives a guarantee that the join will
result in the same relation as it was decomposed.
Example: Let 'E' be a relational schema with instance 'e', decomposed into E1, E2, E3, . . . En
with instances e1, e2, e3, . . . en. If e1 ⋈ e2 ⋈ e3 . . . ⋈ en = e, then it is called a
'Lossless Join Decomposition'.
In other words, if the natural join of all the decomposed relations gives the original
relation, the decomposition is said to be a lossless join decomposition.
Example: <Employee_Department> Table
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
Decompose the above relation into two relations to check whether decomposition is lossless
or lossy. Now, we have decomposed the relation that is Employee and Department.
Relation 1: <Employee> Table
Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
E004 XYZ 24 Mumbai 4000
E005 STU 32 Bangalore 25000
Employee Schema contains (Eid, Ename, Age, City, Salary).
Relation 2: <Department> Table
Eid Deptid DeptName
E001 D001 Finance
E002 D002 Production
E003 D003 Sales
E004 D004 Marketing
E005 D005 Human Resource
Department Schema contains (Eid, Deptid, DeptName). Joining the two relations on the common attribute Eid reconstructs the original relation, so the decomposition is lossless:
Employee ⋈ Department
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
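The lossless property can be verified directly: project the original relation onto the two schemas, natural-join the projections back on the common attribute, and compare with the original. A sketch using two of the rows above:

```python
# (eid, ename, age, city, salary, deptid, deptname)
original = {
    ("E001", "ABC", 29, "Pune", 20000, "D001", "Finance"),
    ("E002", "PQR", 30, "Pune", 30000, "D002", "Production"),
}

# Projections onto the two decomposed schemas.
employee   = {(e, n, a, c, s) for (e, n, a, c, s, d, dn) in original}
department = {(e, d, dn) for (e, n, a, c, s, d, dn) in original}

# Natural join on the common attribute Eid.
rejoined = {
    (e, n, a, c, s, d, dn)
    for (e, n, a, c, s) in employee
    for (e2, d, dn) in department
    if e == e2
}

lossless = (rejoined == original)   # True: the join recreates the original
```

Had the common attribute been dropped from one of the projections, the join would produce spurious tuples and the comparison would fail, i.e. the decomposition would be lossy.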
** RAID **
RAID (Redundant Array of Independent Disks) is a technique of storing data across multiple disks to improve performance and/or fault tolerance. RAID is organised into levels:
RAID 0: In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among the disks. Each disk receives a block of data to write/read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in level 0.
RAID 1: RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a
copy of data to all the disks in the array. RAID level 1 is also called mirroring and provides
100% redundancy in case of a failure.
RAID 2: RAID 2 records an Error Correction Code (using Hamming codes) for its data, striped
across different disks. As in level 0, each data bit of a word is recorded on a separate disk,
and the ECC codes of the data words are stored on a different set of disks. Due to its complex
structure and high cost, RAID 2 is not commercially available.
RAID 3: RAID 3 stripes the data onto multiple disks. The parity bit generated for each data
word is stored on a separate disk. This technique makes it possible to recover from single-disk failures.
RAID 4: In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level
4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement
RAID.
RAID 5: RAID 5 writes whole data blocks onto different disks, but the parity bits generated for
data block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
RAID 6: RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.
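The parity idea behind RAID levels 3-6 can be sketched with XOR: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the surviving blocks plus the parity (the block contents here are invented):

```python
def xor_parity(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    parity = bytes(len(blocks[0]))          # start with all-zero bytes
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity

stripes = [b"AAAA", b"BBBB", b"CCCC"]       # data blocks striped across 3 disks
parity = xor_parity(stripes)                # stored on a parity disk

# Disk 2 fails: rebuild its block from the remaining blocks and the parity.
rebuilt = xor_parity([stripes[0], stripes[2], parity])
```

XOR-ing parity with all the data blocks gives zero, which is why XOR-ing it with all blocks except one yields exactly the missing block.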
** File Organization **
File organization is a way of organizing the data or records in a file. It does not refer to
how files are organized in folders, but how the contents of a file are added and accessed. There
are several types of file organization, the most common of them are sequential, relative and
indexed. They differ in how easily records can be accessed and the complexity in which records
can be organized.
Some of the file organizations are
1. Sequential File Organization
2. Heap File Organization
3. Hash/Direct File Organization
4. Indexed Sequential Access Method
5. B+ Tree File Organization
6. Cluster File Organization
If a new record is inserted, then in the above case it will be inserted into data block 1.
When a record has to be retrieved from the database, this method traverses from the
beginning of the file until the requested record is found. Hence fetching records from very
large tables is time-consuming. This is because there is no sorting or ordering of the
records; all the data must be checked.
Similarly if we want to delete or update a record, first we need to search for the record.
Again, searching a record is similar to retrieving it- start from the beginning of the file till the
record is fetched. If it is a small file, it can be fetched quickly. But larger the file, greater
amount of time needs to be spent in fetching.
In addition, when a record is deleted, it is removed from the data block, but the space is
not freed and cannot be re-used. Hence as the number of records grows, so does the memory
used, and efficiency decreases. For the database to perform better, the DBA has to free this
unused memory periodically.
In the diagram above, R1, R2, R3 etc. are the records. Each contains all the attributes of a
row; i.e., a student record will have the student's id, name, address, course, DOB etc.
Thus each of R1, R2, R3 etc. can be considered one full set of attributes.
In the second method, records are sorted (either ascending or descending) each time they
are inserted into the system. This method is called sorted file method.
Sorting of records may be based on the primary key or on any other columns. Whenever
a new record is inserted, it will be inserted at the end of the file and then it will sort – ascending
or descending based on key value and placed at the correct position. In the case of update, it
will update the record and then sort the file to place the updated record in the right place. Same
is the case with delete.
In this method, if any record has to be retrieved, based on its index value, the data block
address is fetched and the record is retrieved from memory.
Advantages of ISAM:
Since each record has its data block address, searching for a record in a large database is
easy and quick. No extra effort is needed to search for records, but a proper primary key
has to be selected to make ISAM efficient.
This method gives flexibility of using any column as key field and index will be
generated based on that. In addition to the primary key and its index, we can have index
generated for other fields too. Hence searching becomes more efficient, if there is search
based on columns other than primary key.
It supports range retrieval, partial retrieval of records. Since the index is based on the key
value, we can retrieve the data for the given range of values. In the same way, when a
partial key value is provided, say student names starting with ‘JA’ can also be searched
easily.
Disadvantages of ISAM:
An extra cost has to be borne to maintain the index; i.e., we need extra space on the disk
to store the index values. When there are multiple key-index combinations, the disk space
required also increases.
As the new records are inserted, these files have to be restructured to maintain the
sequence. Similarly, when the record is deleted, the space used by it needs to be released.
Else, the performance of the database will slow down.
When a record has to be retrieved, the address is generated from the hash key column and
the whole record is retrieved directly from that address; there is no need to traverse the
whole file. Similarly, when a new record has to be inserted, the address is generated from
the hash key and the record is inserted directly there. The same applies to update and
delete. There is no effort spent searching the entire file or sorting the files. Each record
is stored effectively at random in memory.
These types of file organizations are useful in online transaction systems, where retrieval
or insertion/updation should be faster.
Advantages of Hash File Organization:
Records need not be sorted after any transaction, so the effort of sorting is avoided in
this method.
Since the block address is known from the hash function, accessing any record is very fast.
Similarly, updating or deleting a record is also very quick.
This method can handle multiple transactions as each record is independent of other. i.e.;
since there is no dependency on storage location for each record, multiple records can be
accessed at the same time.
It is suitable for online transaction systems like online banking, ticket booking system
etc.
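The address computation above can be sketched as follows (the bucket count and records are invented; Python's built-in hash of a small integer is the integer itself, so the addressing is deterministic):

```python
N_BUCKETS = 4
buckets = [[] for _ in range(N_BUCKETS)]      # one list per data block

def bucket_of(key):
    return hash(key) % N_BUCKETS              # hash key -> block address

def insert(record):                           # record = (eno, ename)
    buckets[bucket_of(record[0])].append(record)

def fetch(eno):
    # Only the one computed bucket is scanned, never the whole file.
    return next((r for r in buckets[bucket_of(eno)] if r[0] == eno), None)

for rec in [(101, "Ravi"), (102, "Sita"), (103, "Ram")]:
    insert(rec)
```

A lookup visits exactly one bucket, which is why insert, fetch, update and delete need no file scan or sorting.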
** Types of Indexes **
We know that data is stored in the form of records. Every record has a key field, which
helps it to be recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in database
systems is similar to what we see in books. Indexing is defined based on its indexing attributes.
Indexing can be of the following types:
Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field.
Ordered Indexing is of two types:
Dense Index
Sparse Index
Dense Index:
In dense index, there is an index record for every search key value in the database. This
makes searching faster but requires more space to store index records itself. Index records
contain search key value and a pointer to the actual record on the disk.
Sparse Index: In sparse index, index records are not created for every search key. An index
record here contains a search key and an actual pointer to the data on the disk. To search a
record, we first proceed by index record and reach at the actual location of the data. If the data
we are looking for is not where we directly reach by following the index, then the system starts
sequential search until the desired data is found.
Multilevel Index: Index records comprise search-key values and data pointers. Multilevel
index is stored on the disk along with the actual database files. As the size of the database
grows, so does the size of the indices. There is an immense need to keep the index records in
the main memory so as to speed up the search operations. If single-level index is used, then a
large size index cannot be kept in memory which leads to multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which can easily
be accommodated anywhere in the main memory.
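A sparse index can be sketched as one entry per block of a sorted file: locate the last index entry not greater than the search key, jump to that block, then scan within it (the blocks and keys below are invented for illustration):

```python
import bisect

# A sorted file split into blocks of (key, value) records.
blocks = [[(10, "A"), (20, "B")],
          [(30, "C"), (40, "D")],
          [(50, "E")]]

# Sparse index: only the first key of each block is indexed.
index = [block[0][0] for block in blocks]

def lookup(key):
    i = bisect.bisect_right(index, key) - 1   # last index entry <= key
    if i < 0:
        return None                           # key below every indexed value
    # Sequential scan, but only within the one selected block.
    return next((v for k, v in blocks[i] if k == key), None)
```

A dense index would instead hold one entry per record; the sparse form trades a short in-block scan for a much smaller index, which is what makes multi-level indexing practical.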
** B+ Tree **
A B+ tree is a balanced multi-way search tree that follows a multi-level index format. The
leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes
remain at the same height, and is thus balanced. Additionally, the leaf nodes are linked in a
linked list; therefore, a B+ tree can support random access as well as sequential access.
Structure of B+ Tree: Every leaf node is at equal distance from the root node. A B+ tree is
of the order n where n is fixed for every B+ tree.
Internal nodes:
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes:
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node and forms a
linked list.
B+ Tree Insertion: B+ trees are filled from the bottom, and each entry is made at a leaf node.
The salary lists have been set up in order of increasing salary within each range (record A
precedes D and C even though E#(C) and E#(D) are less than E#(A)).
We shall assume that the index for every key is dense and contains a value entry for each
distinct value in the file. Since the index entries are variable length (the number of records
with the same key value is variable), index maintenance becomes more complex than for
multi list. However, several benefits accrue from this scheme. The retrieval works in two
steps. In the first step, the indexes are processed to obtain a list of records satisfying the
query and in the second, these records are retrieved using this list. The number of disk
accesses needed is equal to the number of records being retrieved plus the number to process
the indexes.
Inverted files represent one extreme of file organization in which only the index structures
are important. The records themselves may be stored in any way (sequentially ordered by
primary key, random, linked ordered by primary key etc.).
UNIT- III
** Structured Query Language **
SQL stands for Structured Query Language, developed at IBM in the 1970s (by Donald
Chamberlin and Raymond Boyce) on the foundation of E.F. Codd's relational model.
It is a programming language which stores, manipulates and retrieves the stored data in
RDBMS.
SQL syntax is not case sensitive.
SQL is standardized by both ANSI and ISO.
It is a standard language for accessing and manipulating databases.
Characteristics of SQL:
SQL is extremely flexible.
SQL uses a free form syntax that gives the ability to user to structure the SQL statements in
a best suited way.
It is a high level language.
It allows natural extensions to its functional capabilities.
It can execute queries against the database.
Advantages of SQL:
SQL provides a greater degree of abstraction than procedural language.
It is coded without embedded data-navigational instructions.
It enables the end users to deal with a number of database management systems where it is
available.
It quickly and efficiently retrieves large numbers of records from a database.
No coding required while using standard SQL.
** SQL Commands **
SQL commands can be classified into 4 types.
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language (DCL)
4. Transaction Control Language (TCL)
4)TRUNCATE Command: It is used to delete all rows (but not the table's structure), with an
automatic commit.
Syntax: TRUNCATE TABLE <table-name>;
1. Insert command: It is used to add new rows (records) to a table.
Syntax: INSERT INTO <table-name> VALUES (value-list);
(OR)
INSERT INTO <table-name> [column-list] VALUES (value-list);
Example: 1. Take an EMPLOYEE table with the columns: eno, ename, job, sal, hiredate.
To insert a record into EMPLOYEE table:
SQL> insert into EMPLOYEE (eno, ename, job, sal, hiredate) values (101, ‘vasu’,
‘clerk’, 7000, ’12-jan-2012’);
(OR)
SQL> insert into EMPLOYEE values (102, ‘sri’, ‘manager’, 10000, ’26-feb-2012’);
2. To insert a record that contains only eno, ename and sal:
SQL> insert into EMPLOYEE (eno, ename, sal) values (201, ‘ram’, 20000);
3. To insert a record through parameter substitution
SQL> insert into EMPLOYEE (eno, ename, job, sal) values (&eno, ‘&ename’, ‘&job’,
&sal); (OR)
SQL> insert into EMPLOYEE values(&eno, ‘&ename’, ‘&job’, &sal);
When we execute the above query, it will ask values from keyboard as follows:
Enter value for eno:104
Enter value for ename: Anji
Enter value for job: Asst manager
Enter value for sal: 15000
2. Update command: It is used to edit/change the values of attributes in a table.
Syntax: UPDATE <table-name> set column_name = value [,column_name = value, …….]
[WHERE condition];
3. DELETE command: It is used to remove (delete) one or more rows from a table.
Syntax: DELETE FROM <table-name> [WHERE condition];
Clause Description
WHERE It specifies which rows to retrieve.
GROUP BY It is used to arrange the data into groups.
HAVING It selects among the groups defined by the GROUP BY clause.
ORDER BY It specifies an order in which to return the rows.
3. Save point: It is used to mark a point within a transaction to which a later rollback can return.
Syntax: SAVEPOINT savepointname;
A transaction with savepoints:
DELETE
SAVEPOINT A
INSERT
UPDATE
SAVEPOINT B
INSERT
ROLLBACK TO SAVEPOINT B undoes the work done after savepoint B; ROLLBACK TO SAVEPOINT A undoes the work done after savepoint A; a plain ROLLBACK undoes the entire transaction, while COMMIT makes all of its changes permanent.
It is a fixed length data type, which means the memory is allocated based on the size defined
by the user but not on the value assigned.
3. Varchar/varchar2: It is also used to define attribute to store the string values. The
minimum size is 1 and maximum is 4000 bytes.
Syntax: Varchar2 (n)
It is a variable length data type, which means the memory is dynamically allocated based
on the value given by the user but not on its size defined.
4. Long: It is used to define an attribute to store the text values, with the size larger than 4000
characters. Maximum size 2GB and it is a variable-length data type.
5. Date/Time: It is used to define an attribute to store date and time values given by the user.
Default format for date is: DD-MON-YY (or) DD-MON-YYYY.
Examples on HAVING:
1. To list the department numbers whose maximum salary is greater than 1000:
SQL> select deptno, MAX(sal) from EMP GROUP BY deptno HAVING MAX(sal) > 1000;
2. To list the jobs which are done by a minimum of 2 persons:
SQL> select job from EMP GROUP BY job HAVING count(*) >= 2;
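Both HAVING examples run as-is against SQLite (the sample EMP rows below are invented for the demonstration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (eno INTEGER, job TEXT, sal INTEGER, deptno INTEGER)")
con.executemany("INSERT INTO emp VALUES (?,?,?,?)", [
    (1, "clerk",    800, 10),
    (2, "clerk",   1100, 20),
    (3, "manager", 2450, 10),
    (4, "analyst", 3000, 20),
])

# Departments whose maximum salary exceeds 1000.
q1 = con.execute("SELECT deptno, MAX(sal) FROM emp "
                 "GROUP BY deptno HAVING MAX(sal) > 1000").fetchall()

# Jobs done by at least 2 persons.
q2 = con.execute("SELECT job FROM emp "
                 "GROUP BY job HAVING COUNT(*) >= 2").fetchall()
```

WHERE filters rows before grouping, while HAVING filters the groups themselves, which is why the aggregate appears in the HAVING clause.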
Joins: Joins are used to fetch the information from multiple tables. Join is a relation between
the common fields of two or more tables. Types of joins are:
1. Equi join (Inner join)
2. Non-Equi join (conditional)
3. Outer join
4. Self join (recursive join)
5. Cartesian product joins.
Consider the following table, for demonstration of joins;
EMP DEPT
Eno Ename Deptno Deptno deptname
1 Abhi 10 10 BCom
2 Balu 20 20 BSc
3 Charan 30 30 BBM
4 dhoni 40 50 BCA
Equi join: A join in which the joining condition is based on equality between the values in
common columns.
EX: Select eno, ename, EMP.Deptno, DEPT.deptno, deptname from EMP, DEPT where
EMP.deptno=DEPT.deptno;
ENO ENAME EMP. DEPTNO DEPT. DEPTNO DEPTNAME
1 Abhi 10 10 Bcom
2 Balu 20 20 BSc
3 charan 30 30 BBM
Non-Equi join: A join in which joining condition is based on non-equality between the values
in common columns.
EX: Select eno, ename, deptname from EMP, DEPT where EMP.Deptno!= DEPT.deptno;
ENO ENAME DEPTNAME
1 Abhi BSc
1 Abhi BBM
1 Abhi BCA
2 Balu BCom
2 Balu BBM
2 Balu BCA
3 Charan BCom
3 Charan BSc
3 Charan BCA
4 Dhoni BCom
4 Dhoni BSc
4 Dhoni BBM
4 Dhoni BCA
Outer join: A join in which rows that do not have matching values in the common columns are
also included in the result. It includes left outer and right outer joins.
i) Left outer join: It gives all the values of left table plus matched values from the right table.
Following query displays all records from EMP table even if there is no matching deptno in
DEPT table.
EX: select eno, ename, EMP.deptno, deptname from EMP, DEPT WHERE
EMP.deptno = DEPT.deptno (+);
ENO ENAME EMP. DEPTNO DEPTNAME
1 Abhi 10 BCom
2 Balu 20 BSc
3 Charan 30 BBM
4 Dhoni 40 ------
ii) Right outer join: It gives all the values of right table plus matched values from the left table.
Following query displays all records from DEPT table even if there is no matching deptno in
EMP table.
EX: Select eno, ename, DEPT.deptno, deptname from EMP, DEPT WHERE
EMP.deptno(+)= DEPT.deptno;
ENO ENAME DEPT. DEPTNO DEPTNAME
1 Abhi 10 BCom
2 Balu 20 BSc
3 Charan 30 BBM
-- ------ 50 BCA
Self join: This is a join in which a table is joined to itself, where joining condition is based on
columns of a same table.
EX: Select e.eno, d.ename from EMP e, EMP d where e.mid= d.eno;
EMP
eno ename mid
1 A 3
2 B 3
3 C 3
4 D 4
5 E 4
Cartesian product join: A join in which all possible combinations of all the rows of first table
with each row of second table appear.
EX: Select ename, deptname from EMP, DEPT;
ENAME DEPTNAME
Abhi BCom
Balu BCom
Charan BCom
Dhoni BCom
Abhi BSc
Balu BSc
Charan BSc
Dhoni BSc
Abhi BBM
Balu BBM
Charan BBM
Dhoni BBM
Abhi BCA
Balu BCA
Charan BCA
Dhoni BCA
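The equi join, left outer join, and Cartesian product above can be reproduced on the chapter's EMP and DEPT tables using SQLite through Python's sqlite3 module. One difference: SQLite does not support Oracle's (+) notation, so the left outer join is written with the standard LEFT JOIN keyword.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMP  (eno INTEGER, ename TEXT, deptno INTEGER);
CREATE TABLE DEPT (deptno INTEGER, deptname TEXT);
INSERT INTO EMP  VALUES (1,'Abhi',10),(2,'Balu',20),(3,'Charan',30),(4,'Dhoni',40);
INSERT INTO DEPT VALUES (10,'BCom'),(20,'BSc'),(30,'BBM'),(50,'BCA');
""")

# Equi join: only rows with matching deptno values appear (Dhoni's dept 40 has no match).
equi = con.execute("""
    SELECT eno, ename, deptname FROM EMP, DEPT
    WHERE EMP.deptno = DEPT.deptno
""").fetchall()
print(equi)        # 3 rows

# Left outer join: all EMP rows; NULL where DEPT has no matching deptno.
left = con.execute("""
    SELECT eno, ename, deptname FROM EMP
    LEFT JOIN DEPT ON EMP.deptno = DEPT.deptno
""").fetchall()
print(left)        # 4 rows, Dhoni paired with None

# Cartesian product: every EMP row paired with every DEPT row (4 x 4 = 16).
cart = con.execute("SELECT ename, deptname FROM EMP, DEPT").fetchall()
print(len(cart))   # 16
```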
** Views **
A view is a virtual table based on one or more tables. The table on which a view is
created is called the "base table".
A view can be used to restrict updates to the base table. Any changes made to the base
table are also reflected in the view.
Advantages of views:
We can provide security to the data.
We can provide limitation on data.
We can provide customized view for the data.
View uses little storage area.
It allows different users to view the same data in different ways at the same time.
It does not allow direct access to the tables of the data dictionary.
Disadvantages of views:
It can’t be indexed.
The query that defines the view is re-executed each time the view is accessed, which takes time.
We cannot use DML operations on a view that is based on more than one table.
When the base table is dropped, the view becomes invalid.
A view is a database object, so it occupies space.
Without its base table, a view will not work.
Updation is possible for simple views but not for complex views; complex views are read-only.
Syntax: Create view <view-name> as select columns from <table-name> [WHERE condition];
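The create-view syntax above can be demonstrated with SQLite through Python's sqlite3 module; the EMP table and salary values here are invented for illustration. The example shows both a restricted view and how base-table changes are reflected in it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (eno INTEGER, ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO EMP VALUES (?,?,?)",
                [(1, "Abhi", 800), (2, "Balu", 2000)])

# A view restricting which rows and columns a user may see (security/limitation).
con.execute("CREATE VIEW highpaid AS SELECT eno, ename FROM EMP WHERE sal > 1000")
v1 = con.execute("SELECT * FROM highpaid ORDER BY eno").fetchall()
print(v1)   # [(2, 'Balu')]

# Changes to the base table are reflected in the view automatically.
con.execute("INSERT INTO EMP VALUES (3, 'Charan', 3000)")
v2 = con.execute("SELECT * FROM highpaid ORDER BY eno").fetchall()
print(v2)   # [(2, 'Balu'), (3, 'Charan')]
```

The view stores only its defining query, not the data, which is why it uses little storage but is re-evaluated on each access.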
** Sequences **
A sequence is a database object that is used to generate the sequential numeric values for
any column of the base table. It is useful when we need to create a unique number to act as a
primary key.
Characteristics of sequences:
Sequences are independent objects.
Sequences have a name and can be used anywhere.
Sequences are not tied (linked) to a table.
Sequences can be created and deleted any time we want.
Ex: create sequence SQ1 MINVALUE 1 MAXVALUE 100 start with 1 increment by 1;
In the above example, sequence object “SQ1” is created and it generates the number like
1, 2, 3, 4, ……..100
To retrieve numbers from sequence object, we use following statement:
Sequencename.nextval;
Example: There is a table called “STUDENT” with two columns sno, sname.
The following insert command uses sequence to insert values into “sno” automatically.
Insert into STUDENT values (SQ1.nextval, ‘Ravi’);
Insert into STUDENT values (SQ1.nextval, ‘Ram’);
Insert into STUDENT values (SQ1.nextval, ‘RAJ’);
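SQLite has no CREATE SEQUENCE statement, so the Oracle example above cannot be run there directly; an INTEGER PRIMARY KEY AUTOINCREMENT column plays a similar role, generating a unique sequential number for each inserted row. A sketch of the same STUDENT example under that substitution:

```python
import sqlite3

# AUTOINCREMENT stands in for the Oracle sequence SQ1 in this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (sno INTEGER PRIMARY KEY AUTOINCREMENT, sname TEXT)")
for name in ("Ravi", "Ram", "Raj"):
    # sno is generated automatically, like SQ1.nextval in the chapter's inserts.
    con.execute("INSERT INTO STUDENT (sname) VALUES (?)", (name,))

students = con.execute("SELECT * FROM STUDENT ORDER BY sno").fetchall()
print(students)   # [(1, 'Ravi'), (2, 'Ram'), (3, 'Raj')]
```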
** Indexes **
An index is an object which is used to improve performance during retrieval of records.
It helps to retrieve the data quickly from the tables.
When a column contains a large number of NULL values, we can create an index on it.
It is a structure that provides faster access to the rows of a table based on the values of one or
more columns.
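The effect of an index on retrieval can be seen with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module: before the index the lookup scans the whole table, afterwards it searches the index. The table and index names here are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (eno INTEGER, ename TEXT)")
con.executemany("INSERT INTO EMP VALUES (?,?)",
                [(i, f"emp{i}") for i in range(1000)])

# Without an index, the lookup is a full table scan.
plan_before = con.execute("EXPLAIN QUERY PLAN SELECT * FROM EMP WHERE eno = 500").fetchall()
print(plan_before)   # plan mentions SCAN

# With an index, the engine searches the index instead of scanning the table.
con.execute("CREATE INDEX emp_eno_idx ON EMP (eno)")
plan_after = con.execute("EXPLAIN QUERY PLAN SELECT * FROM EMP WHERE eno = 500").fetchall()
print(plan_after)    # plan mentions the index
```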
** Synonyms **
A synonym is an alternative name for objects such as tables, views, sequences, stored
procedures, and other database objects.
We generally use synonyms when we are granting access to an object from another
schema and we don't want the users to have to worry about knowing which schema owns the
object.
Create Synonym (or Replace): We may wish to create a synonym so that users do not have to
prefix the table name with the schema name when using the table in a query.
Syntax: The syntax to create a synonym in Oracle is:
CREATE [OR REPLACE] [PUBLIC] SYNONYM [schema .] synonym_name FOR [schema .] object_name;
For example: CREATE PUBLIC SYNONYM suppliers FOR app.suppliers;
Drop synonym: Once a synonym has been created in Oracle, we might at some point need to
drop the synonym.
Syntax: The syntax to drop a synonym in Oracle is:
DROP [PUBLIC] SYNONYM [schema .] synonym_name [force];
In the above syntax:
PUBLIC
Allows us to drop a public synonym. If we have specified PUBLIC, then we don't specify
a schema.
force
It will force Oracle to drop the synonym even if it has dependencies. It is probably not a
good idea to use force as it can cause invalidation of Oracle objects.
For example: DROP PUBLIC SYNONYM suppliers;
This DROP statement would drop the synonym called suppliers that we defined earlier.
** Table handling**
The SQL DROP TABLE statement is used to remove a table definition and all the data,
indexes, triggers, constraints and permission specifications for that table.
NOTE − We should be very careful while using this command because once a table is deleted
then all the information available in that table will also be lost forever.
Syntax: The basic syntax of this DROP TABLE statement is as follows:
DROP TABLE table_name;
Example: Let us first verify the CUSTOMERS table and then we will delete it from the
database as shown below:
SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | | |
| NAME | varchar(20) | NO | | | |
| AGE | int(11) | NO | | | |
| ADDRESS | char(25) | YES | | NULL | |
| SALARY | decimal(18,2) | YES | | NULL | |
+---------+---------------+------+-----+---------+-------+
This means that the CUSTOMERS table is available in the database, so let us now drop it
as shown below.
SQL> DROP TABLE CUSTOMERS;
Query OK, 0 rows affected (0.01 sec)
Now, if we would try the DESC command, then we will get the following error:
SQL> DESC CUSTOMERS;
ERROR 1146 (42S02): Table 'TEST.CUSTOMERS' doesn't exist
Here, TEST is the database name which we are using for our examples.
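The same DROP TABLE behaviour can be reproduced with SQLite through Python's sqlite3 module: after the table is dropped, any further reference to it raises a "no such table" error, mirroring the MySQL error shown above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (id INTEGER, name TEXT)")
con.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ravi')")

# DROP TABLE removes the definition and all of the data permanently.
con.execute("DROP TABLE CUSTOMERS")

try:
    con.execute("SELECT * FROM CUSTOMERS")
except sqlite3.OperationalError as e:
    print(e)   # no such table: CUSTOMERS
```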
UNIT- IV
** Transaction **
Database Transaction is an atomic unit that contains one or more SQL statements.
It is a series of operations that performs as a single unit of work against a database.
It has a beginning and an end to specify its boundary.
Let's take a simple example of a bank transaction. Suppose a bank clerk transfers Rs. 1000
from X's account to Y's account.
X's Account
open-account (X)
prev-balance = X.balance
curr-balance = prev-balance – 1000
X.balance = curr-balance
close-account (X)
Rs. 1000 is deducted from X's account and the new (current) balance is saved; after the
transaction completes, the last step is closing the account.
Y's Account
open-account (Y)
prev - balance = Y.balance
curr - balance = prev-balance + 1000
Y.balance = curr-balance
close-account (Y)
Rs. 1000 is added to Y's account and the new (current) balance is saved; after the
transaction completes, the last step is closing the account.
The above example defines a very simple and small transaction that shows how transaction
management actually works.
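The transfer above can be sketched as one atomic database transaction using SQLite through Python's sqlite3 module: both updates commit together, and on any error the rollback restores the previous balances.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ACCOUNT (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO ACCOUNT VALUES (?,?)", [("X", 5000), ("Y", 2000)])
con.commit()

def transfer(amount):
    """Both updates succeed together or neither takes effect (atomicity)."""
    try:
        con.execute("UPDATE ACCOUNT SET balance = balance - ? WHERE name = 'X'", (amount,))
        con.execute("UPDATE ACCOUNT SET balance = balance + ? WHERE name = 'Y'", (amount,))
        con.commit()
    except sqlite3.Error:
        con.rollback()   # undo the partial work, restoring a consistent state

transfer(1000)
balances = con.execute("SELECT * FROM ACCOUNT ORDER BY name").fetchall()
print(balances)   # [('X', 4000), ('Y', 3000)]
```

The starting balances (5000 and 2000) are invented for illustration; the transferred amount follows the chapter's Rs. 1000 example.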
** Transaction Properties **
Following are the transaction properties, referred to by the acronym ACID. These
properties guarantee that database transactions are processed reliably.
1.Atomicity
2.Consistency
3.Isolation
4.Durability
1. Atomicity:
Atomicity means that either all operations of a transaction are executed or none are.
Atomicity is also known as 'all or nothing': the transaction either performs all of its
operations or performs none at all.
It must be maintained in the presence of deadlocks, CPU failures, disk failures, and
database and application software failures.
2. Consistency:
Consistency means that after a transaction finishes, the database must remain in a
consistent state.
3. Isolation:
Isolation means that concurrent transactions do not interfere with each other; the
intermediate state of one transaction is invisible to the others.
4. Durability:
Durability means that once a transaction commits, its changes survive any subsequent
system failure.
** Transaction States **
A transaction is a small unit of program which contains several low level tasks. It is an event
which occurs on the database. It has the following states,
1. Active
2. Partially Committed
3. Failed
4. Aborted
5. Committed
1. Active: Active is the initial state of every transaction. The transaction stays in the
Active state during execution.
2. Partially Committed: This state means that the transaction has executed its final
statement.
3. Failed: This state means that the execution of the transaction can no longer proceed.
4. Aborted: This state means that the transaction has been rolled back and the database
restored to a consistent state.
5. Committed: If the transaction has completed its execution successfully, it is said to be
committed.
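The five states and their legal moves can be written down as a small transition map; a minimal sketch (the state names come from the list above, the map structure is an illustration, not part of the chapter):

```python
# Allowed moves between the five transaction states described above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),   # terminal state
    "committed":           set(),   # terminal state
}

def can_move(src, dst):
    """True when a transaction in state src may move to state dst."""
    return dst in TRANSITIONS[src]

print(can_move("active", "partially committed"))  # True
print(can_move("failed", "committed"))            # False: a failed txn can only abort
```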
** Concurrency Control **
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions.
Methods for Concurrency control:
There are three main methods for concurrency control. They are as follows:
1. Locking Methods
2. Optimistic Methods
3. Time-stamp Methods
1. Locking Methods of Concurrency Control: "A lock is a variable, associated with a data
item, which controls access to that data item." Locking is the most widely used form of
concurrency control. Locking is discussed under three aspects:
i) Lock Granularity
ii) Lock Types (or) Locking protocols
iii) Deadlocks
i) Lock Granularity: A database is basically represented as a collection of named data
items. The size of the data item chosen as the unit of protection by a concurrency control
program is called GRANULARITY. Locking can take place at the following level:
Database level.
Table level.
Page level.
Row (Tuple) level.
Attributes (fields) level.
ii) Lock Types: The DBMS mainly uses following types of locking techniques.
a. Binary Locking
b. Shared / Exclusive Locking
c. Two - Phase Locking (2PL)
a. Binary Locking: A binary lock can have two states or values: locked and unlocked (or 1 and
0, for simplicity). A distinct lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when requested.
We refer to the current value (or state) of the lock associated with item X as LOCK(X).
Two operations, lock_item and unlock_item, are used with binary locking.
b. Shared/Exclusive Locking:
Shared lock: These locks are referred to as Read locks, and denoted by 'S'.
If a transaction T has obtained a shared lock on data item X, then T can read X but cannot
write it. Multiple transactions can hold a shared lock on the same data item at the same time.
Exclusive lock: These locks are referred to as Write locks, and denoted by 'X'.
If a transaction T has obtained an exclusive lock on data item X, then T can both read and
write X. Only one exclusive lock can be placed on a data item at a time, which means multiple
transactions cannot modify the same data simultaneously.
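The shared/exclusive rule can be sketched as a small readers-writer lock in Python (an illustration of the locking discipline only, not a full DBMS lock manager: there is no lock upgrading, fairness, or deadlock handling here):

```python
import threading

class SharedExclusiveLock:
    """Many transactions may hold the shared ('S') lock at once;
    the exclusive ('X') lock excludes all other holders."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # number of shared-lock holders
        self._writer = False     # is an exclusive lock held?

    def lock_shared(self):
        with self._cond:
            while self._writer:              # wait out any writer
                self._cond.wait()
            self._readers += 1

    def unlock_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def lock_exclusive(self):
        with self._cond:
            while self._writer or self._readers:   # wait until item is free
                self._cond.wait()
            self._writer = True

    def unlock_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.lock_shared(); lock.lock_shared()   # two concurrent readers: allowed
lock.unlock_shared(); lock.unlock_shared()
lock.lock_exclusive()                    # one writer, nobody else
lock.unlock_exclusive()
```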
c. Two-Phase Locking (2PL):
Two-phase locking is the standard protocol used to maintain level 3 consistency. 2PL
defines how transactions acquire and relinquish locks. The essential discipline is that after a
transaction has released a lock it may not obtain any further locks. 2PL has the following two
phases:
A growing phase: in which a transaction acquires all the required locks without unlocking any
data. Once all locks have been acquired, the transaction is in its locked
point.
A shrinking phase: in which a transaction releases all locks and cannot obtain any new lock.
A transaction showing the Two-Phase Locking technique:

Normal Locking:
  Lock (A); Read (A); A = A - 100; Write (A); Unlock (A);
  Lock (B); Read (B); B = B + 100; Write (B); Unlock (B)

2-Phase Locking:
  Growing phase:   Lock (A); Lock (B)          <- locked point reached
  Operations:      Read (A); A = A - 100; Write (A);
                   Read (B); B = B + 100; Write (B)
  Shrinking phase: Unlock (A); Unlock (B)
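The 2PL discipline (no new lock after the first unlock) can be sketched as a small class that rejects late lock requests (an illustration only; a real lock manager also tracks lock modes and blocking):

```python
class TwoPhaseTransaction:
    """Enforces the 2PL rule: once any lock is released,
    no further lock may be acquired."""
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after unlocking")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # the growing phase is over
        self.held.discard(item)

t = TwoPhaseTransaction()
t.lock("A"); t.lock("B")   # growing phase; locked point after Lock(B)
t.unlock("A")              # shrinking phase begins
try:
    t.lock("C")            # illegal under 2PL
except RuntimeError as e:
    print(e)
```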
iii) Deadlocks: A deadlock is a condition in which two or more tasks are waiting for each
other to finish, but none of the tasks is willing to give up the resources the other tasks
need. In this situation no task ever finishes, and all remain in the waiting state forever.
Neither transaction can continue because each transaction in the set is on a waiting queue,
waiting for one of the other transactions in the set to release the lock on an item. Transactions
whose lock requests have been refused are queued until the lock can be granted.
A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other. Thus in a deadlock, two transactions are mutually
excluded from accessing the next record required to complete their transactions, also called a
deadly embrace.
Example:
A deadlock exists between two transactions A and B in the following example:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction-A has acquired lock on X and is waiting to acquire lock on Y. While,
Transaction-B has acquired lock on Y and is waiting to acquire lock on X. But, none of them
can execute further.
Transaction-A                       Time    Transaction-B
---                                  t0     ---
Lock (X)  (acquired lock on X)       t1     ---
---                                  t2     Lock (Y)  (acquired lock on Y)
Lock (Y)  (request lock on Y)        t3     ---
Wait                                 t4     Lock (X)  (request lock on X)
Wait                                 t5     Wait
Wait                                 t6     Wait
Wait                                 t7     Wait
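The circular waiting in the timeline above can be detected by looking for a cycle in a wait-for graph (each transaction points to the transaction it is waiting on). A minimal detection sketch:

```python
def has_deadlock(wait_for):
    """wait_for maps each transaction to the transactions it waits on.
    A cycle in this graph means deadlock."""
    def visit(node, path):
        if node in path:                 # we returned to a node: cycle found
            return True
        for nxt in wait_for.get(node, []):
            if visit(nxt, path | {node}):
                return True
        return False
    return any(visit(t, set()) for t in wait_for)

# The timeline above: A waits for B (to release Y), B waits for A (to release X).
print(has_deadlock({"A": ["B"], "B": ["A"]}))   # True
print(has_deadlock({"A": ["B"], "B": []}))      # False: B can finish, then A
```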
Deadlock Avoidance:
Deadlock can be avoided if resources are allocated in a way that prevents deadlock from
occurring. There are two algorithms for deadlock avoidance.
Wait/Die
Wound/Wait
Here is the table representation of resource allocation for each algorithm. Both of these
algorithms take process age into consideration while determining the best possible way of
resource allocation for deadlock avoidance.
Situation                                                 Wait/Die               Wound/Wait
Older process needs a resource held by younger process    Older process waits    Younger process dies
Younger process needs a resource held by older process    Younger process dies   Younger process waits
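The table above reduces to two small decision functions on transaction timestamps (a smaller timestamp means an older process); a minimal sketch:

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits; a younger requester dies
    (is aborted and restarted later with the same timestamp)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester wounds (aborts) the younger
    holder; a younger requester waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"

print(wait_die(1, 2))    # older requests from younger -> wait
print(wait_die(2, 1))    # younger requests from older -> die
print(wound_wait(1, 2))  # older wounds the younger holder
print(wound_wait(2, 1))  # younger requester waits
```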
ii. The rollback involves only the local copy of data, the database is not involved and thus
there will not be any cascading rollbacks.
Problems of Optimistic Methods for Concurrency Control:
i. Conflicts are expensive to deal with, since the conflicting transaction must be rolled back.
ii. Longer transactions are more likely to have conflicts and may be repeatedly rolled
back because of conflicts with short transactions.
Applications of Optimistic Methods for Concurrency Control:
i. Only suitable for environments where there are few conflicts and no long transactions.
ii. Acceptable for mostly Read or Query database systems that require very few update
transactions
** Timestamp-based Protocols **
The most commonly used concurrency protocol is the timestamp-based protocol. This
protocol uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions
at the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by
the age of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is
two seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system
know when the last ‘read and write’ operation was performed on the data item.
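The chapter describes the per-item read- and write-timestamps but not the rules that use them; the checks below follow the standard textbook timestamp-ordering rules, so this is a sketch under that assumption: a read is rejected if a younger transaction has already written the item, and a write is rejected if a younger transaction has already read or written it.

```python
class TimestampedItem:
    """A data item carrying the latest read- and write-timestamps,
    as described above."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

    def read(self, ts):
        if ts < self.write_ts:          # a younger transaction already wrote
            return "abort"
        self.read_ts = max(self.read_ts, ts)
        return "ok"

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            return "abort"              # a younger transaction already read/wrote
        self.write_ts = ts
        return "ok"

x = TimestampedItem()
print(x.write(5))   # ok
print(x.read(3))    # abort: transaction 3 is older than the write at 5
print(x.read(7))    # ok
```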
** Serialisable Schedules **
In a database system, we can have a number of transactions processing at once. Related
transactions are processed one after another, while some transactions are processed in
parallel. Some of the transactions can be grouped together.
A schedule is a process of grouping the transactions into one and executing them in a
predefined order. A schedule is required in a database because when some transactions execute
in parallel, they may affect the result of the transaction – means if one transaction is updating
the values which the other transaction is accessing, then the order of these two transactions will
change the result of second transaction. Hence a schedule is created to execute the transactions.
A schedule is called serial schedule, if the transactions in the schedule are defined to execute
one after the other.
Even when we are scheduling the transactions, we can have two transactions in parallel,
if they are independent. But if they are dependent by any chance, then the results will change.
For example, say one transaction is updating the marks of a student in one subject while
another transaction is calculating the total marks of the same student. If the second
transaction is executed after the first transaction completes, both transactions will produce
correct results. But what if the second transaction runs first? It will compute the wrong
total marks.
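The marks example above can be made concrete: the two "transactions" below (the mark values are invented for illustration) give different totals depending on which runs first, which is exactly why a schedule must fix the order of dependent transactions.

```python
# One transaction updates a subject mark while another computes the total.
marks = {"sub1": 40, "sub2": 50}

def t1_update(m):
    m["sub1"] = 60          # the update transaction

def t2_total(m):
    return m["sub1"] + m["sub2"]   # the total-computing transaction

# Serial order T1 then T2: total sees the updated mark.
m = dict(marks); t1_update(m); total_after = t2_total(m)

# Serial order T2 then T1: total is computed from the stale mark.
m = dict(marks); total_before = t2_total(m); t1_update(m)

print(total_after, total_before)   # 110 90 -- the order changed the result
```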
** Database Failures **
A database includes a huge amount of data and transactions. If the system crashes or a
failure occurs, then it is very difficult to recover the database.
There are some common causes of failures (kinds of failures) such as,
1. System Crash
2. Transaction Failure
3. Network Failure
4. Disk Failure
5. Media Failure
Each transaction has ACID property. If we fail to maintain the ACID properties, it is the
failure of the database system.
1. System Crash:
System crash occurs when there is a hardware or software failure or external factors like
a power failure.
The data in secondary memory is generally not affected when the system crashes;
checkpoints prevent the loss of data by ensuring that modified data is written to
secondary memory.
2. Transaction Failure:
A transaction has to abort when it fails to execute or when it reaches a point from where
it can’t go any further. This is called transaction failure where only a few transactions or
processes are hurt.
Reasons for a transaction failure could be:
Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
3. Network Failure:
A network failure occurs when the communication network connecting a client-server
configuration or a distributed database system fails.
4. Disk Failure:
Disk Failure occurs when there are issues with hard disks like formation of bad sectors,
disk head crash, unavailability of disk etc.
5. Media Failure:
Media failure is the most dangerous failure because, it takes more time to recover than
any other kind of failures.
A disk controller or disk head crash is a typical example of media failure.
Natural disasters like floods, earthquakes, power failures, etc. damage the data.
1. Physical backup:
Physical backup provides the minute details about the transaction and modification to the
database.
2. Logical backup:
Logical Backup contains logical data which is extracted from a database.
It includes backup of logical data like views, procedures, functions, tables, etc.
It is a useful supplement to physical backups in many circumstances but not a sufficient
protection against data loss without physical backups, because logical backup provides
only structural information.
Importance of Backup:
Planning and testing backup helps against failure of media, operating system, software
and any other kind of failures that cause a serious data crash.
It determines the speed and success of the recovery.
Physical backup extracts data from physical storage (usually from disk to tape).
An operating-system-level copy of the database files is an example of a physical backup.
Logical backup extracts data using SQL from the database and store it in a binary file.
Logical backup is used to restore the database objects into the database. So the logical
backup utilities allow DBA (Database Administrator) to back up and recover selected
objects within the database.
** Storage of Data **
Data storage is the memory structure in the system. The storage of data is divided into
three categories:
1. Volatile Memory
2. Non – Volatile Memory
3. Stable Memory
1. Volatile Memory
Volatile memory can store only a small amount of data. Examples: main memory, cache
memory, etc.
Volatile memory is the primary memory device in the system and placed along with the
CPU.
In volatile memory, if the system crashes, then the data will be lost.
RAM is a primary storage device which stores a disk buffer, active logs and other related
data of a database.
Primary memory is always faster than secondary memory.
When we fire a query, the database first looks for the data in primary memory and then
moves to secondary memory to fetch the record.
If the primary memory crashes, then the whole data in the primary memory is lost and
cannot be recovered.
To avoid data loss, a copy of the contents of primary memory (all the logs and buffers) is
kept in the database, and checkpoints are created at several places so that the data is
copied to the database.
There are two methods of creating the log files and updating the database,
1. Deferred Database Modification
2. Immediate Database Modification
1. In Deferred Database Modification, all the logs for the transaction are first created and
stored in stable storage; the database is updated from those steps only afterwards. In the
above example, three log records are created and stored in a storage system, and the database
is then updated with those steps.
2. In Immediate Database Modification, after creating each log record, the database is
modified for each step of log entry immediately. In the above example, the database is modified
at each step of log entry that means after first log entry, transaction will hit the database to fetch
the record, then the second log will be entered followed by updating the employee's address,
then the third log followed by committing the database changes.
Recovery with Concurrent Transaction:
When two transactions are executed in parallel, the logs are interleaved. It would become
difficult for the recovery system to return all logs to a previous point and then start
recovering.
To overcome this situation 'Checkpoint' is used.
Checkpoint
Checkpoint acts like a benchmark.
Checkpoints are also called as Syncpoints or Savepoints.
It is a mechanism where all the previous logs are removed from the system and stored
permanently in a storage system.
It declares a point before which the database management system was in consistent state
and all the transactions were committed.
It is a point of synchronization between the database and the transaction log file.
It involves operations like writing log records in main memory to secondary storage,
writing the modified blocks in the database buffers to secondary storage and writing a
checkpoint record to the log file.
The checkpoint record contains the identifiers of all transactions that are active at the
time of the checkpoint.
Recovery
When concurrent transactions crash and recover, the checkpoint is added to the
transaction and recovery system recovers the database from failure in following manner,
1. The recovery system reads the log file backwards, from the end to the most recent
checkpoint, so that it can reverse or redo transactions as needed.
2. It maintains an undo list and a redo list.
3. It puts a transaction in the redo list if the log contains both <Tn, Start> and <Tn, Commit>.
4. It puts a transaction in the undo list if the log contains <Tn, Start> but no <Tn, Commit>.
All the transactions in the undo list are undone and their log records are removed.
All the transactions in the redo list are redone and their log records are re-saved.
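The classification step above can be sketched in a few lines: scan the log once and split transactions into redo (both <Start> and <Commit> present) and undo (<Start> only) lists. The log format here is an invented simplification for illustration.

```python
def classify(log):
    """Split transactions from a log into (redo, undo) sets.
    log is a list of (operation, transaction-id) pairs, oldest first."""
    started, committed = set(), set()
    for op, txn in log:
        if op == "start":
            started.add(txn)
        elif op == "commit":
            committed.add(txn)
    redo = started & committed   # committed: replay their effects
    undo = started - committed   # never committed: roll them back
    return redo, undo

log = [("start", "T1"), ("start", "T2"), ("commit", "T1")]
redo, undo = classify(log)
print(sorted(redo), sorted(undo))   # ['T1'] ['T2']
```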
** Database errors **
There are mainly two types of database errors. They are:
1. Logical errors
2. System errors
Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts
an active transaction.
** Database Security **
Database security refers to the collective measures used to protect and secure a database
or database management software from illegitimate use and malicious threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies that
ensure security within a database environment.
Database and functions can be managed by two different modes of security controls:
1. Authentication
2. Authorization
Authentication: Authentication is the process of confirming that a user logs in only in
accordance with the rights to perform the activities he is authorized to perform. User
authentication can be performed at the operating system level or at the database level
itself. Biometric authentication tools, such as retina scans and fingerprints, are also in
use to keep the database safe from hackers or malicious users.
The database security can be managed from outside the database system. Here are some
types of security authentication processes:
Based on Operating System authentications.
Lightweight Directory Access Protocol (LDAP)
The security service is a part of operating system. For Authentication, it requires two
different credentials; those are userid or username, and password.
Authorization: We can access the Database and its functionality within the database system,
which is managed by the Database manager. Authorization is a process managed by the
Database manager. The manager obtains information about the current authenticated user, that
indicates which database operation the user can perform or access.
Here are different ways of permissions available for authorization:
Primary permission: Grants the authorization ID directly.
Secondary permission: Grants to the groups and roles if the user is a member.
Public permission: Grants to all users publicly.
Context-sensitive permission: Grants to the trusted context role.
Authorization can be given to users based on the categories below:
System-level authorization
System administrator [SYSADM]
System Control [SYSCTRL]
System maintenance [SYSMAINT]
System monitor [SYSMON]
Authorities provide controls within the database. Other authorities for a database include
LOAD and CONNECT.
Object-Level Authorization: Object-Level authorization involves verifying privileges
when an operation is performed on an object.
Content-based Authorization: User can have read and write access to individual rows
and columns on a particular table using Label-based access Control [LBAC].
Database tables and configuration files are used to record the permissions associated with
authorization names. When a user tries to access the data, the following are verified
against the recorded permissions:
Authorization name of the user
Which groups the user belongs to
Which roles are granted directly to the user or indirectly through a group
Permissions acquired through a trusted context.
While working with the SQL statements, the Database authorization model considers the
combination of the following permissions:
Permissions granted to the primary authorization ID associated with the SQL statements.
Secondary authorization IDs associated with the SQL statements.
Granted to PUBLIC.
Granted to the trusted context role.
UNIT- V
** Distributed database **
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.
Features:
Databases in the collection are logically interrelated with each other. Often they represent
a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with
a transaction processing system.
Advantages of Distributed Databases (OR) Data distribution:
Following are the advantages of distributed databases over centralized databases.
Modular Development − If the system needs to be expanded to new locations or new units,
in centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding
new computers and local data to the new site and finally connecting them to the distributed
system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases
comes to a halt. However, in distributed systems, when a component fails, the functioning of
the system continues, possibly at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be
met from local data itself, thus providing faster response. On the other hand, in centralized
systems, all queries have to pass through the central computer for processing, which
increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally
where it is mostly used, then the communication costs for data manipulation can be
minimized. This is not feasible in centralized systems.
It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
It is designed for heterogeneous database platforms.
It maintains confidentiality and data integrity of the databases.
Factors Encouraging DDBMS: The following factors encourage moving over to DDBMS:
Distributed Nature of Organizational Units: Most organizations in the current times
are subdivided into multiple units that are physically distributed over the globe. Each unit
requires its own set of local data. Thus, the overall database of the organization becomes
distributed.
Need for Sharing of Data: The multiple organizational units often need to communicate
with each other and share their data and resources. This demands common databases or
replicated databases that should be used in a synchronized manner.
Support for Both OLTP and OLAP: Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may have
common data. Distributed database systems aid both these processing by providing
synchronized data.
Database Recovery: One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
database in any site is damaged. Users can access data from other sites while the
damaged site is being reconstructed. Thus, database failure may become almost
inconspicuous to users.
Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
Advantages of DDBMS:
1. Data are located near the greatest demand site. The data in a distributed database system
are dispersed to match business requirements which reduce the cost of data access.
2. Faster data access. End users often work with only a locally stored subset of the company’s
data.
3. Faster data processing. A distributed database system spreads out the systems workload by
processing data at several sites.
4. Growth facilitation. New sites can be added to the network without affecting the operations
of other sites.
5. Improved communications. Because local sites are smaller and located closer to customers,
local sites foster better communication among departments and between customers and
company staff.
Disadvantages of DDBMS:
1. Complexity of management and control. Applications must recognize data location, and
they must be able to stitch together data from various sites. Database administrators must have
the ability to coordinate database activities to prevent database degradation due to data
anomalies.
2. Technological difficulty. Data integrity, transaction management, concurrency control,
security, backup, recovery, query optimization, access path selection, and so on, must all be
addressed and resolved.
3. Security. The probability of security lapses increases when data are located at multiple sites.
The responsibility of data management will be shared by different people at several sites.
4. Lack of standards. There are no standard communication protocols at the database level.
(Although TCP/IP is the de facto standard at the network level, there is no standard at the
application level.) For example, different database vendors employ different—and often
incompatible—techniques to manage the distribution of data and processing in a DDBMS
environment.
5. Increased storage and infrastructure requirements. Multiple copies of data are required
at different sites, thus requiring additional disk storage space.
6. Increased training cost. Training costs are generally higher in a distributed model than they
would be in a centralized model, sometimes even to the extent of offsetting operational and
hardware savings.
7. Costs. Distributed databases require duplicated infrastructure to operate (physical location,
environment, personnel, software, licensing, etc.)
The design of a distributed database also involves schema levels at each site:
Local database Conceptual Level − Depicts local data organization at each site.
Local database Internal Level − Depicts physical data organization at each site.
** Data Replication **
Data replication is the process of storing separate copies of the database at two or more
sites. It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication:
Reliability − In case of failure of any site, the database system continues to work since a
copy is available at another site(s).
Reduction in Network Load − Since local copies of data are available, query processing
can be done with reduced network usage, particularly during prime hours. Data updating can
be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query processing and
consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become simpler in
nature.
Disadvantages of Data Replication:
Increased Storage Requirements − Maintaining multiple copies of data is associated with
increased storage costs. The storage space required is in multiples of the storage required for
a centralized system.
Increased Cost and Complexity of Data Updating − Each time a data item is updated, the
update needs to be reflected in all the copies of the data at the different sites. This requires
complex synchronization techniques and protocols.
Undesirable Application – Database coupling − If complex update mechanisms are not
used, removing data inconsistency requires complex co-ordination at application level. This
results in undesirable application – database coupling.
Some commonly used replication techniques are −
Snapshot replication
Near-real-time replication
Pull replication
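Snapshot replication, the first technique listed, periodically copies the full state of a table from the primary site to a replica. A minimal sketch follows, using two SQLite in-memory databases via Python to stand in for the two sites; the table, data and refresh function are invented for the illustration (real products use their own refresh machinery):

```python
import sqlite3

# Snapshot replication sketch: the replica's copy is rebuilt from the
# primary's current state each time a refresh runs.
primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")

primary.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
primary.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 900)])

def refresh_snapshot():
    """Drop and rebuild the replica copy from the primary's current state."""
    replica.execute("DROP TABLE IF EXISTS account")
    replica.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
    rows = primary.execute("SELECT acc_no, balance FROM account").fetchall()
    replica.executemany("INSERT INTO account VALUES (?, ?)", rows)

refresh_snapshot()
# Local reads at the replica site now need no network round trip.
print(replica.execute("SELECT SUM(balance) FROM account").fetchone()[0])  # 1400

# Updates at the primary only become visible after the next refresh.
primary.execute("UPDATE account SET balance = 600 WHERE acc_no = 1")
refresh_snapshot()
print(replica.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0])  # 600
```

The gap between two refreshes is exactly the staleness window that near-real-time and pull replication trade off differently.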
** Data Fragmentation **
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of
the table are called fragments. Fragmentation can be of three types: horizontal, vertical, and
hybrid (combination of horizontal and vertical). Horizontal fragmentation can further be
classified into two techniques: primary horizontal fragmentation and derived horizontal
fragmentation.
Fragmentation should be done in a way so that the original table can be reconstructed
from the fragments. This is needed so that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”
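The reconstructiveness requirement can be sketched concretely for the horizontal case. In the following minimal illustration (SQLite via Python stands in for a distributed DBMS; the fragment tables, which would live at different sites, and the data are invented), each fragment holds the rows matching one selection predicate, and a UNION ALL over the fragments rebuilds the original table:

```python
import sqlite3

# Primary horizontal fragmentation sketch: one fragment per dept_num value.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_num INTEGER, ename TEXT, dept_num INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(7364, "SMITH", 20), (7499, "ALLEN", 30), (7782, "CLARK", 10)])

# Each fragment holds the rows satisfying one selection predicate.
for dept in (10, 20, 30):
    con.execute(f"CREATE TABLE emp_site{dept} AS "
                f"SELECT * FROM emp WHERE dept_num = {dept}")

# Reconstructiveness: the union of the fragments rebuilds the original table.
rebuilt = con.execute("""SELECT * FROM emp_site10
                         UNION ALL SELECT * FROM emp_site20
                         UNION ALL SELECT * FROM emp_site30
                         ORDER BY emp_num""").fetchall()
original = con.execute("SELECT * FROM emp ORDER BY emp_num").fetchall()
print(rebuilt == original)  # True
```

Because the predicates are disjoint and cover every row, no row is lost or duplicated in the union, which is what makes the fragmentation reconstructive.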
Advantages of Fragmentation:
Since data is stored close to the site of usage, efficiency of the database system is
increased.
Local query optimization techniques are sufficient for most queries since data is locally
available.
Since irrelevant data is not available at the sites, the security and privacy of the database
system can be maintained.
Disadvantages of Fragmentation:
When data from different fragments are required, the access speeds may be very low.
In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
Lack of back-up copies of data at different sites may render the database ineffective in
case of failure of a site.
Hybrid fragmentation combines the two and can be generated in either of two ways:
At first, generate a set of horizontal fragments; then generate vertical fragments from one
or more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one
or more of the vertical fragments.
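Vertical fragmentation satisfies reconstructiveness in a different way: each fragment keeps a subset of the columns plus the primary key, and a join on the key restores the original rows. A minimal sketch (SQLite via Python; the fragment names and data are invented for the example):

```python
import sqlite3

# Vertical fragmentation sketch: column subsets stored separately, with the
# key repeated in each fragment so a join can rebuild the original rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_num INTEGER PRIMARY KEY, ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(7364, "SMITH", 8000), (7499, "ALLEN", 16000)])

# Fragment 1 keeps identification columns, fragment 2 the payroll columns.
con.execute("CREATE TABLE emp_id  AS SELECT emp_num, ename FROM emp")
con.execute("CREATE TABLE emp_pay AS SELECT emp_num, sal FROM emp")

# Reconstructiveness: a join on the shared key restores the original table.
rebuilt = con.execute("""SELECT i.emp_num, i.ename, p.sal
                         FROM emp_id i JOIN emp_pay p ON i.emp_num = p.emp_num
                         ORDER BY i.emp_num""").fetchall()
print(rebuilt == con.execute("SELECT * FROM emp ORDER BY emp_num").fetchall())  # True
```

Repeating the key in every vertical fragment is the price paid for being able to rebuild the rows later.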
Client/Server Architecture
The Client/Server model is basically platform independent and blends with the “cooperative
processing” or “peer-to-peer” model. The platform gives users access to the business
functionality while remaining transparent both to the underlying technology and to the user.
Client/Server architecture of a database system has two logical components, namely client
and server. Clients are generally personal computers or workstations, whereas the server is a
large workstation, mini-range computer system or a mainframe computer system. The applications
and tools of the DBMS run on one or more client platforms, while the DBMS software resides on
the server. The server computer is called the back end and the client's computer is called the
front end. These server and client computers are connected into a network. The applications and
tools act as clients of the DBMS, making requests for its services. The DBMS, in turn, processes
these requests and returns the results to the client(s). The client handles the Graphical
User Interface (GUI) and does computations and other programming of interest to the end user.
The server handles the parts of the job that are common to many clients, for example, database
access and updates.
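The request/response flow described above can be sketched as a toy front end and back end: the server owns the database and runs the SQL, while the client only sends a request and displays the result. This is a minimal illustration in Python (SQLite stands in for the DBMS; the table, data and wire format are invented for the example):

```python
import socket
import sqlite3
import threading

# Toy client/server pair: the server (back end) owns the database and runs the
# SQL; the client (front end) only sends a request and shows the result.
def serve_one_request(listener):
    con = sqlite3.connect(":memory:")          # database lives on the server
    con.execute("CREATE TABLE dept (dept_num INTEGER, dname TEXT)")
    con.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH')")
    conn, _ = listener.accept()
    query = conn.recv(1024).decode()           # client's request for service
    rows = con.execute(query).fetchall()       # database access happens here
    conn.sendall(repr(rows).encode())          # results returned to the client
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                # any free local port
listener.listen(1)
threading.Thread(target=serve_one_request, args=(listener,), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", listener.getsockname()[1]))
client.sendall(b"SELECT dname FROM dept ORDER BY dept_num")
chunks = []
while chunk := client.recv(4096):              # read until the server closes
    chunks.append(chunk)
result = b"".join(chunks).decode()
client.close()
print(result)  # [('ACCOUNTING',), ('RESEARCH',)]
```

Note how the client never touches the database file itself; everything it knows about the data arrives through the server's response, which is exactly the division of labour the text describes.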
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 99
“A competitive global economy will ensure obsolescence and obscurity to those who
cannot or are unwilling to compete” (Client/Server Architecture, 2011). According to this
statement, it is necessary for organizations to sustain their market position by re-engineering
prevailing organizational structures and business practices to achieve their business goals. In
short, it is a basic need to evolve with changing technology.
Therefore, organizations need a mechanism to retrieve and process their
corporate data so as to make business procedures more efficient, in order to excel or simply to
survive in the global market. The client/server model offers a logical perspective on distributed
cooperative processing, in which a server handles and processes all client requests. It can also
be viewed as a revolutionary milestone for the data processing industry.
“Client/server computing is the most effective source for the tools that empower
employees with authority and responsibility.”(Client/Server Architecture, 2011)
“Workstation power, workgroup empowerment, preservation of existing investments,
remote network management, and market-driven business are the forces creating the need for
client/server computing”. (Client/Server Architecture, 2011)
Client/server computing has progressed through the computer industry, leaving hardly any area
or corner untouched. Hybrid skills are often required for the development of client/server
applications, including database design, transaction processing, communication skills, graphical
user interface design and development, etc. Advanced applications also require expertise in
distributed objects and component infrastructures.
The most commonly found client/server strategy today is a PC LAN implementation optimized
for group/batch usage. This has opened the door to many new distributed
enterprises, as it eliminates host-centric computing.
Advantages:
Organizations often seek opportunities to maintain service and quality competition to
sustain their market position with the help of technology, and here the client/server model makes
an effective impact. Deploying client/server computing in an organization will positively
increase productivity through cost-effective user interfaces, enhanced data storage,
vast connectivity and reliable application services.
If properly implemented, it is capable of improving organizational behavior with the help of
the knowledgeable worker, who can manipulate data and respond to errors appropriately.
Improved Data Sharing: Data retained by usual business processes and manipulated on a
server is available to designated users (clients) through authorized access. The use of
Structured Query Language (SQL) supports open access from all client platforms, and
transparency in network services means that the same data is shared among users.
Integration of Services: Every client is given the opportunity to access corporate
information via the desktop interface, eliminating the need to log into a terminal mode or
another processor. Desktop tools such as spreadsheets, presentation software, etc. can be used
to work with corporate data, with the help of database and application servers resident on the
network, to produce meaningful information.
Shared Resources amongst Different Platforms: Applications for the client/server model
are built regardless of the hardware platform or the technical background of the entitled software
(operating system), providing an open computing environment and enabling users to
obtain the services of clients and servers (database, application, communication servers).
Inter-Operation of Data: All development tools used for client/server applications access
the back-end database server through SQL, an industry-standard data definition and access
language, which helps in the consistent management of corporate data. Advanced database products
enable a user/application to gain a merged view of corporate data dispersed over several
platforms. Rather than targeting a single platform, this ensures database integrity with the
ability to perform updates in multiple locations, while maintaining quality performance and
recovery.
Data Processing Capability despite the Location: We are in an era that is undergoing a
transformation from machine-centered systems to user-centered systems. In machine-centered
systems such as mainframe and mini/micro applications, access platforms, function keys,
navigation options, performance and security were all unique and visible. Through
client/server computing, users can directly log into a system regardless of the location or
technology of the processors.
Easy maintenance: Since client/server architecture is a distributed model representing
dispersed responsibilities among independent computers integrated across a network, it has an
advantage in terms of maintenance. It is easy to replace, repair, upgrade and relocate a server
while clients remain unaffected. This unawareness of change is called encapsulation.
Security: Servers have better control of access and resources, ensuring that only authorized
clients can access or manipulate data and that server updates are administered effectively.
The client (application database) might be a personal workstation, tailored to the needs of
the end users and thus able to provide better interfaces, high availability, faster responses
and overall improved ease of use. A single database (on the server) can be shared
across several distinct client (application) systems.
Exercise- 1
1. Create table EMP with columns emp_num, ename, sal and enter 10 records.
2. Add columns dname, dept_num, location for EMP table.
3. Rename the EMP table to Employee and modify the ename column size to 20.
4. Display all the records of employees in department number 30.
5. Display the details of employees who have two A’s in their name.
6. Drop the column dname and display the details of employees whose salary is greater than 15000.
Exercise- 2
1. Display the details of employees whose join date is 01/11/2017.
2. Add a column job to the employees table and list the clerks in department 10.
3. Display the details of employees whose salary is less than 10000.
4. Display the details of the employee salaries in descending order.
5. Display the names of the employees in uppercase.
6. Display the names of the employees in lowercase.
Exercise- 3
1. Find the department which has the maximum number of employees.
2. List the year in which the maximum number of employees were recruited.
3. Display the details of employees who are working for departments 10 and 20.
4. Update the HRA=15%, DA=10%, TA=10% for all the employees whose experience is more
than 10 years.
5. Write a query to delete duplicate records from emp.
6. Display the sum of salaries department wise.
Exercise- 4
1. Make the duplicate table as emp12 on emp.
2. Add constraint primary key for emp_num and dept_num columns for emp table.
3. Remove the referential integrity from emp and dept tables.
4. Display the names of employees who earn the highest salary in their respective departments.
5. Display the employees whose job is MANAGER.
6. Display the details of employees whose name is ALLEN.
Exercise- 5
1. Display all rows from EMP table. The system waits after every screen full of information.
2. Create view for emp table.
3. Create a view for emp table where deptno=10;
4. Drop the view of emp table.
5. Delete all the records from emp where the dname is NULL.
6. Delete the rows of employees whose experience is less than 5 years.
Table: EMP
EMP_NUM ENAME SAL JOB HIREDATE DNAME DEPT_NUM LOCATION
7364 SMITH 8000 CLERK 12/17/80 RESEARCH 20 DALLAS
7499 ALLEN 16000 SALESMAN 02/20/81 SALES 30 CHICAGO
7566 JONES 29750 MANAGER 04/02/81 RESEARCH 20 DALLAS
7654 MARTIN 12500 SALESMAN 09/28/81 SALES 30 CHICAGO
7698 BLAKE 28500 MANAGER 05/01/81 SALES 30 CHICAGO
7782 CLARK 24500 MANAGER 06/09/81 ACCOUNTING 10 NEWYORK
7788 SCOTT 30000 ANALYST 04/19/87 RESEARCH 20 DALLAS
7876 ADAMS 11000 CLERK 05/23/87 RESEARCH 20 DALLAS
7900 JAMES 9500 CLERK 12/03/81 SALES 30 CHICAGO
7902 FORD 8000 ANALYST 12/03/81 RESEARCH 20 DALLAS
7934 MILLER 13000 CLERK 01/23/82 ACCOUNTING 10 NEWYORK
Table: DEPT
DEPT_NUM DNAME LOCATION
10 ACCOUNTING NEWYORK
20 RESEARCH DALLAS
30 SALES CHICAGO
Exercise- 1
1. Create table EMP with columns emp_num, ename, sal and enter 10 records.
Creating table:
SQL> Create table EMP(emp_num number(4), ename varchar2(15), sal number(5));
Table created.
Entering records:
SQL> Insert into EMP values(&emp_num, '&ename', &sal);
Enter value for emp_num: 7364
Enter value for ename: SMITH
Enter value for sal: 8000
old 1: insert into EMP values(&emp_num, '&ename', &sal)
new 1: insert into EMP values(7364, 'SMITH', 8000)
1 row created.
SQL> / (re-run the insert and supply values for each of the remaining 9 records)
6. Drop the column dname and display the details of employees whose salary is greater than
15000.
SQL> Alter table EMP drop column dname;
Table altered.
SQL> Select * from EMP where sal>15000;
Exercise- 2
1. Display the details of employees whose join date is 01/11/2017.
SQL> Select * from EMP where to_char(hiredate, 'mm/dd/yyyy')='01/11/2017';
No rows selected.
2. Add column job to the employees table and list the clerks in the deptno of 10.
SQL> Alter table EMP add(job varchar2(10));
Table altered.
SQL> Select * from EMP where job='CLERK' and dept_num=10;
4. Display the details of the employee salaries in descending order.
SQL> Select ename, sal from EMP order by sal desc;
ENAME SAL
SCOTT 30000
JONES 29750
BLAKE 28500
CLARK 24500
ALLEN 16000
MILLER 13000
MARTIN 12500
ADAMS 11000
JAMES 9500
SMITH 8000
FORD 8000
5. Display the names of the employees in uppercase.
SQL> Select upper(ename) from EMP;
UPPER(ENAME)
SMITH
ALLEN
JONES
MARTIN
BLAKE
CLARK
SCOTT
ADAMS
JAMES
FORD
MILLER
6. Display the names of the employees in lowercase.
SQL> Select lower(ename) from EMP;
LOWER(ENAME)
smith
allen
jones
martin
blake
clark
scott
adams
james
ford
miller
Exercise- 3
1. Find the department which has the maximum number of employees.
SQL> Select dept_num from EMP group by dept_num having count(*) = (Select
max(count(*)) from EMP group by dept_num);
DEPT_NUM
30
2. List the year in which the maximum number of employees were recruited.
SQL> Select to_char(hiredate, 'yy') from EMP group by to_char(hiredate, 'yy')
having count(*) = (Select max(count(*)) from EMP group by to_char(hiredate, 'yy'));
TO_CHAR(HIREDATE,'YY')
81
3. Display the details of employees who are working for departments 10 and 20.
SQL> Select * from EMP where dept_num in(10,20);
EMP_NUM ENAME SAL JOB HIREDATE DNAME DEPT_NUM LOCATION
7364 SMITH 8000 CLERK 12/17/80 RESEARCH 20 DALLAS
7566 JONES 29750 MANAGER 04/02/81 RESEARCH 20 DALLAS
7782 CLARK 24500 MANAGER 06/09/81 ACCOUNTING 10 NEWYORK
7788 SCOTT 30000 ANALYST 04/19/87 RESEARCH 20 DALLAS
7876 ADAMS 11000 CLERK 05/23/87 RESEARCH 20 DALLAS
7902 FORD 8000 ANALYST 12/03/81 RESEARCH 20 DALLAS
7934 MILLER 13000 CLERK 01/23/82 ACCOUNTING 10 NEWYORK
4. Update the HRA=15%, DA=10%, TA=10% for all the employees whose experience is
more than 10 years.
SQL> Alter table EMP add(HRA number(5,2), DA number(5,2), TA number(5,2));
Table altered.
SQL> Update EMP set HRA=0.15*sal, DA=0.1*sal, TA=0.1*sal where
months_between(sysdate, hiredate)>120;
10 rows updated.
6. Display the sum of salaries department wise.
SQL> Select dept_num, sum(sal) from EMP group by dept_num;
DEPT_NUM SUM(SAL)
10 37500
20 86750
30 66500
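Question 5 of Exercise 3 (deleting duplicate records) has no worked answer above. One common approach keeps the row with the smallest rowid in each group of duplicates; Oracle's ROWID pseudocolumn supports the same idiom. A hedged sketch in SQLite via Python, with invented data:

```python
import sqlite3

# Sketch for Exercise 3, Q5: delete duplicate records, keeping one copy each.
# The rowid-based idiom keeps the row with the smallest rowid per group.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (emp_num INTEGER, ename TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(7364, "SMITH"), (7364, "SMITH"), (7499, "ALLEN")])

con.execute("""DELETE FROM emp
               WHERE rowid NOT IN (SELECT MIN(rowid) FROM emp
                                   GROUP BY emp_num, ename)""")
remaining = con.execute("SELECT * FROM emp ORDER BY emp_num").fetchall()
print(remaining)  # [(7364, 'SMITH'), (7499, 'ALLEN')]
```

The GROUP BY list should name every column that defines a duplicate; grouping by the full row keeps exactly one copy of each distinct record.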
Exercise- 4
1. Make the duplicate table as emp12 on emp.
SQL> Create table EMP12 as Select * from EMP;
Table created.
Here, all the records of EMP table are copied into EMP12.
2. Add constraint primary key for emp_num and dept_num columns for emp table.
SQL> Alter table EMP add constraint c1 primary key(emp_num, dept_num);
Table altered.
4. Display the names of employees who earn the Highest salary in their respective
departments.
SQL> Select ename, sal, dept_num from EMP where (dept_num, sal) in (Select dept_num,
max(sal) from EMP group by dept_num);
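The row-value IN pattern used in this answer can be cross-checked outside Oracle, since SQLite also supports comparing a (dept_num, sal) pair against a grouped subquery. A small verification sketch (data abbreviated from the EMP table above to two rows per department):

```python
import sqlite3

# Cross-check of the Exercise 4, Q4 pattern: row-value IN against a grouped
# subquery, with SQLite standing in for Oracle.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, dept_num INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("CLARK", 24500, 10), ("MILLER", 13000, 10),
                 ("SCOTT", 30000, 20), ("SMITH", 8000, 20),
                 ("BLAKE", 28500, 30), ("JAMES", 9500, 30)])

# A row qualifies only when its (dept_num, sal) pair equals that
# department's maximum salary pair.
rows = con.execute("""SELECT ename, sal, dept_num FROM emp
                      WHERE (dept_num, sal) IN (SELECT dept_num, MAX(sal)
                                                FROM emp GROUP BY dept_num)
                      ORDER BY dept_num""").fetchall()
print(rows)  # [('CLARK', 24500, 10), ('SCOTT', 30000, 20), ('BLAKE', 28500, 30)]
```

Note that if two employees in one department tie for the maximum salary, both rows are returned, which is usually the desired behaviour for this question.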
Exercise- 5
1. Display all rows from EMP table. The system waits after every screen full of
information.
SQL> Set pause on
SQL> Select * from EMP;
5. Delete all the records from emp where the dname is NULL.
SQL> Delete from EMP where dname is NULL;
0 rows deleted.
(Note: dname='NULL' would compare against the literal string 'NULL'; the IS NULL test is
required to match missing values.)
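Question 6 of Exercise 5 (deleting employees with less than 5 years' experience) has no worked answer above. In Oracle one would write months_between(sysdate, hiredate) < 60; the sketch below uses SQLite's julianday arithmetic as a stand-in, with a fixed reference date in place of sysdate so the result is reproducible (data invented for the example):

```python
import sqlite3

# Sketch for Exercise 5, Q6: delete employees with under 5 years' experience.
# SQLite lacks months_between, so julianday arithmetic stands in
# (365.25 days per year on average); '2025-01-01' replaces sysdate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, hiredate TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("SMITH", "1980-12-17"), ("NEWHIRE", "2024-01-15")])

con.execute("""DELETE FROM emp
               WHERE (julianday('2025-01-01') - julianday(hiredate)) / 365.25 < 5""")
remaining = con.execute("SELECT ename FROM emp").fetchall()
print(remaining)  # [('SMITH',)]
```

In a live Oracle session the reference date would simply be sysdate, so the set of deleted rows changes as time passes.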