Unit-I
Introduction to Databases: Introduction, Traditional File-Based Systems, Database Approach,
Roles in the Database Environment, Advantages and Disadvantages of DBMSs, The Three-
Level ANSI-SPARC Architecture, Database Languages, Data Models, Functions of a DBMS,
Components of a DBMS Relational Model: Introduction, Terminology, Integrity constraints,
Views. The Relational Algebra: Unary Operations, Set Operations, Join Operations, Division
Operation, Aggregation and Grouping Operations.
Unit-II
Entity-Relationship Modeling: Entity Types, Relationship Types, Attributes, Keys, Strong and
Weak Entity Types, Attributes on Relationships, Structural Constraints, Problems with ER
Models-Fan Traps, Chasm Traps. Enhanced Entity-Relationship Modeling:
Specialization/Generalization, Aggregation, Composition. Functional Dependency: Multi
Valued Dependency, Join Dependency. Normalization: The Purpose of Normalization, How
Normalization supports Database Design, Data Redundancy and Update Anomalies, Functional
Dependencies in brief, The Process of Normalization, 1NF, 2NF, 3NF, BCNF. The Database
Design Methodology for Relational Database (Appendix-D).
Unit-III
SQL: Introduction, Data Manipulation-Simple Queries, Sorting Results, Using the SQL
Aggregate Functions, Grouping Results, Sub- queries, ANY and ALL, Multi table Queries,
EXISTS and NOT EXISTS, Combining Result Tables, Database Updates. SQL: The ISO SQL
Data Types, Integrity Enhancement Feature-Domain Constraints, Entity Integrity, Referential
Integrity, General Constraints, Data definition-Creating a Database, Creating a Table, Changing
a Table Definition, Removing a Table, Creating an Index, Removing an Index, Views- Creating
a View, Removing a View, View Resolution, Restrictions on Views, View Updatability, WITH
CHECK OPTION, Advantages and Disadvantages of Views, View Materialization,
Transactions, Discretionary Access Control- Granting Privileges to other Users, Revoking
Privileges from Users Advanced SQL: The SQL Programming Language- Declarations,
Assignments, Control Statements, Exceptions, Cursors, Subprograms, Stored Procedures,
Functions, and Packages, Triggers, Recursion.
Unit-IV
Transaction Management: Transaction Support-Properties of Transaction, Database
Architecture, Concurrency Control- the Need for Concurrency Control, Serializability and
Recoverability, Locking Methods, Deadlock, Time Stamping Methods, Multi-version
Timestamp Ordering, Optimistic Techniques, Granularity of Data Items, Database Recovery-
The Need for Recovery, Transaction and Recovery, Recovery Facilities, Recovery Techniques,
Nested Transaction Model. Security: Database Security-Threats, Computer-Based Controls-
Authorization, Access Controls, Views, Backup and Recovery, Integrity, Encryption, RAID.
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 1
Text Book:
1. Thomas M. Connolly, Carolyn E. Begg, Database Systems- A Practical Approach to
Design, Implementation, and Management (6e).
References
1. Sharon Allen, Evan Terry, Beginning Relational Data Modeling
2. Jeffrey A. Hoffer V. Ramesh, Heikki Topi, Modern Database Management
3. Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems
4. Ramez Elmasri, Shamkant B Navathe, Fundamentals of Database Systems
5. Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts
6. C Coronel, S Morris, Peter Rob, Database Systems: Design, Implementation, and
Management
UNIT- I
Data: Data is a raw collection of facts about people, places, objects and events, which include
text, graphics, images, sound etc that have meaning in the user’s environment.
Data is given by the user to the computer.
It is not understandable (meaningless).
It requires processing.
Example:  1  Sri  35  45  55      Data (here it is not clear whether 35
          2  Ram  75  80  98      is a roll number or marks)
Information: Information is meaningful data in an organized form. Information is processed
data that increases the knowledge of a person who uses the data.
Information is given by the computer to the user.
It is understandable (meaningful).
It is processed data.
Example:  Sno       Name       M1        M2        M3
          (Number)  (Varchar)  (Number)  (Number)  (Number)
          1         Sri        35        45        55        Information
          2         Ram        75        80        98
Data → Process → Information
Meta data: Meta data is data about data. It describes the properties/characteristics of other data.
It includes field names, data types and their size.
It is used for processing
It gives meaning to the data.
Sno Name M1 M2 M3 Meta data
(Number) (Varchar) (Number) (Number) (Number)
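As a concrete sketch (using Python's built-in sqlite3 module; the student table and its columns are invented for illustration), a DBMS stores this metadata in its data dictionary and can report it back:

```python
import sqlite3

# In-memory database with a student-marks table matching the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sno INTEGER, name VARCHAR(20), "
    "m1 INTEGER, m2 INTEGER, m3 INTEGER)")

# PRAGMA table_info returns the metadata (data about data) for the table:
# each row holds the column id, field name, declared data type, and more.
metadata = conn.execute("PRAGMA table_info(student)").fetchall()
for cid, name, dtype, notnull, default, pk in metadata:
    print(name, dtype)
conn.close()
```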
Database: A database is a collection of logically related data stored in a standardized format
and sharable by multiple users. (OR)
A mass store of data generated over a period of time in a business environment is
called a “database”.
The conventional symbol for a database is a cylinder:
Example:
1. University database: In this we can store data about students, faculty, courses, results, etc.
2. Bank database: In this we can store data about Account holders.
User ↔ DBMS S/W ↔ Database
DBMS software is an interface between user and Database.
DBMS provides services like storing, updating, deleting and selecting data.
Database system: It is a set of Databases, DBMS, Hardware and people who operate on it.
Evolution of database system:
Late 1950’s : Sequential file processing systems were used in late 1950’s. In these systems, all
records in a file must be processed in sequence.
1960’s : Random access file processing systems were greatly in use in this period. It
supported direct access to a specific record. In this, it was difficult to access
multiple records though they were related to a single file.
1970’s : During this decade the hierarchical and network database management systems
were developed; these are treated as first-generation DBMSs.
Late 1970’s: E. F. Codd and others developed the relational data model during the 1970s.
Relational systems are considered second-generation DBMSs. All data is
represented in the form of tables, and SQL is used for data retrieval.
1980’s : Object Oriented model was introduced in 1980’s. In this model, both data and
their relationships (operations) are contained in a single structure known as object.
1990’s : Client/server computing was introduced, followed by data warehousing and
internet applications.
2000 : Object oriented database systems were introduced.
A file is a place where a group of related data is stored. (OR) A collection of records is called a
“file”. In the 1950s, sequential file processing systems were used to maintain data. Later, in the
1960s, random access file processing systems were introduced.
The file management system was the first method used to store data in a
computerized database. The traditional file system (TFS) is a method of storing and
arranging computer files and the information in the file (data). Basically, it organizes
these files for storage, organization, manipulation, and retrieval by the
computer's operating system.
**Database Approach**
In order to remove all limitations of the File Based Approach, a new approach was
required that must be more effective known as Database approach.
The database is a shared collection of logically related data, designed to meet the
information needs of an organization. A database is a computer-based record-keeping system
whose overall purpose is to record and maintain information. The database is a single, large
repository of data, which can be used simultaneously by many departments and users. Instead of
disconnected files with redundant data, all data items are integrated with a minimum amount of
duplication.
The database is no longer owned by one department but is a shared corporate resource.
The database holds not only the organization's operational data but also a description of this
data.
For this reason, a database is also defined as a self-describing collection of integrated
records. The description of the data is known as the Data Dictionary or Meta Data (the 'data
about data'). It is the self-describing nature of a database that provides program-data
independence.
1. Program-data independence: Because the database is self-describing, application programs
are insulated from changes in the way the data is stored.
2. Minimal data redundancy: Each primary fact is recorded in only one place in the database,
so duplication and the wasted storage space it causes are avoided.
3. Improved data consistency: By eliminating data redundancy, we can improve data
consistency. For example, if a customer address is stored only once, updating that becomes
simple.
4. Improved data sharing: A database is designed as a shared resource. Authorized users are
granted permission to use the database and each user is provided one or more user views to
facilitate this use.
5. Improved data accessibility: Without any programming experience, one can retrieve and
display data very easily. The language used to write queries is Structured Query Language
(SQL).
6. Enforcement of standards: These standards include naming conventions, data quality
standards and procedures for accessing, updating and protecting data.
7. Focus on data: The DBMS focuses on data: it first defines the data, and then all queries,
reports and programs access that data through the DBMS.
8. Reduced program maintenance: In a database environment the data are more independent
of application programs. As a result, program maintenance can be significantly reduced in a
database environment.
** Disadvantage of DBMS **
1. Cost: A DBMS requires a high initial investment in hardware, software and trained staff;
the size of the investment depends on the organization's size and required functionality.
2. Complexity: A DBMS fulfils many requirements and solves many database problems, but
all this functionality makes it extremely complex software. Developers, designers, DBAs
and end users must have the skills to use it properly; if they do not understand this complex
system, mistakes may cause loss of data or database failure.
3. Technical staff requirement: Ordinary employees, whatever other tasks they perform,
cannot easily work with a DBMS. A team of technical staff who understand the DBMS is
required, and the company has to pay them a handsome salary too.
4. Database failure: Because all the files are stored in a single database, the impact of a failure
is greater: any accidental failure of a component may cause loss of valuable data. This is a
serious concern for big firms.
5. Size: Because of its functionality, a DBMS is a large piece of software that needs
considerable disk space and memory to run efficiently, and it grows as data is fed into it.
6. Performance: Traditional file systems gave small organizations splendid performance,
whereas a DBMS can be comparatively slow for small-scale firms.
** Three-level ANSI SPARC Database Architecture **
The architecture of most commercial DBMSs available today is based on the
ANSI-SPARC database architecture, which has three main levels:
1. Internal Level (or) Physical level
2. Conceptual Level (or) Logical level
3. External Level
These three levels provide data abstraction; that is, they hide low-level complexities from end
users. A database system should be efficient in performance and convenient to use.
Using these three levels, it is possible to use complex structures at internal level for efficient
operations and to provide simpler convenient interface at external level.
1. Internal level:
This is the lowest level of data abstraction.
It describes how the data are actually stored on storage devices.
It deals with complex low level data structures, file structures and access methods in
detail.
It also deals with Data Compression and Encryption techniques, if used.
2. Conceptual level:
This is the next higher level than internal level of data abstraction.
It describes what data are stored in the database and what relationships exist among those
data.
It is also known as the logical level.
Database administrators work at this level.
Application developers also work on this level.
3. External Level:
This is the highest level of data abstraction.
It describes only the part of the entire database that concerns an end user.
It is also known as the view level.
End users need to access only part of the database rather than the entire database.
Different users need different views of the database, so there can be many view-level
descriptions of the same database.
** Data models **
A data model is a simple graphical representation of real-world entities (objects).
According to Hoberman (2009), “A data model is a wayfinding tool for both business
and IT professionals, which uses a set of symbols and text to precisely explain a subset of real
information to improve communication within the organization and thereby lead to a more
flexible and stable application environment”.
A data model represents the structure of data in a database.
A data model is a communication tool (conceptual tool) for describing data, relationship
and constraints.
Data modeling is the first step in designing a database.
A data model can provide interaction among the designer, application programmer and
end user.
Hierarchical model: In the hierarchical model, data is organized as an inverted tree of segments
in 1:M parent-child relationships. For example, a Manager (root segment) has Asst_Manager1
and Asst_Manager2 as Level-1 segments (root children), and each assistant manager has Clerk1
and Clerk2 as Level-2 segments (Level-1 children).
Advantages:
It provides data sharing.
It provides data security.
It provides data integrity, conceptual simplicity.
Efficient with 1: M relationship.
Disadvantages:
Does not support many-to-many (M: N) relationship.
No support to DDL& DML.
Difficult to manage.
Structural dependency.
Network model: In the network model, a parent can have several children and a child can also
have many parent records. Records are physically linked through pointers; a pointer stores the
address of the related record.
The network model was created to represent complex relationships.
This model was developed in 1970’s.
The network database is a collection of records in 1: M relationship.
In network database terminology, a relationship is called a set. Each set is composed of at
least two record types; a parent record & a child record.
A set represents 1: M relationship between the parent and child.
Example: supplier records S1, S2 and S3 (nodes) are connected to customer records C1, C2 and
C3 through pointers (links).
Advantages:
It supports DDL& DML.
Handles more relationships types such as M: N.
It supports data integrity.
Conceptual simplicity.
Disadvantages:
Complex implementation
Structural dependency.
Relational model: The relational model was introduced in 1970 by E. F. Codd. In this model,
the data is maintained in the form of tables consisting of rows and columns.
Data in two tables is related through common columns.
Unlike hierarchical and network models, there are no physical links in relational model.
Example: The following pair of tables is an example of the relational model.

Table: EMPLOYEE                                   Table: DEPARTMENT
empno  ename   job      salary  deptno            deptno         deptname
                                (Foreign key)     (Primary key)
1003   Ravi    Manager  12000   10                10             Finance
1004   Rajesh  Clerk    5000    20                20             Accounts
1005   Kiran   Clerk    5000    30                30             Marketing
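A minimal sketch of this relationship using Python's built-in sqlite3 module (the column types are assumptions for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (deptno INTEGER PRIMARY KEY, deptname TEXT);
CREATE TABLE employee   (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT,
                         salary INTEGER,
                         deptno INTEGER REFERENCES department(deptno));
INSERT INTO department VALUES (10,'Finance'),(20,'Accounts'),(30,'Marketing');
INSERT INTO employee VALUES (1003,'Ravi','Manager',12000,10),
                            (1004,'Rajesh','Clerk',5000,20),
                            (1005,'Kiran','Clerk',5000,30);
""")

# The two tables are related only through the common deptno column.
rows = conn.execute("""
    SELECT e.ename, d.deptname
    FROM employee e JOIN department d ON e.deptno = d.deptno
    ORDER BY e.empno
""").fetchall()
conn.close()
```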
Advantages:
Structural independence.
Supports SQL (Structured Query Language).
Easier database design, implementation and use.
Disadvantages:
Requires high configuration (Hardware & Software)
Possibility of poor design & implementation.
Object-based data model: These models represent the data in the form of objects. The
object-based data models are:
1. Object-oriented data model
2. Entity-relationship model.
Object-oriented data model: This model was introduced in 1990’s. In this model, the data is
represented in the form of objects. In this model we can store not only data but also procedures
(methods). Object-oriented data model is based on following components.
Object: An object is an entity that exists in the real world.
Attribute: It describes the properties/characteristics of an object.
Ex: STUDENT object contains attributes like sno, name, marks, etc.
Method: It is a set of instructions that performs a specific task.
Representation of an object (example): a STUDENT object packages its data (the attributes
sno, name and marks) together with the methods that operate on that data.
** Functions of DBMS **
There are several functions that a DBMS performs to ensure data integrity and
consistency of data in the database.
1. Data Dictionary Management: Data Dictionary is where the DBMS stores definitions of
the data elements and their relationships (metadata). The DBMS uses this function to look
up the required data component structures and relationships.
2. Data Storage Management: This particular function is used for the storage of data and any
related data entry forms or screen definitions, report definitions, data validation rules,
procedural code, and structures that can handle video and picture formats. Users do not need
to know how data is stored or manipulated.
3. Data Transformation and Presentation: This function exists to transform any data entered
into required data structures. By using the data transformation and presentation function the
DBMS can determine the difference between logical and physical data formats.
4. Security Management: This is one of the most important functions in the DBMS. Security
management sets rules that determine specific users that are allowed to access the database.
Users are given a username and password, or are sometimes authenticated biometrically (for
example by a fingerprint or retina scan), though biometric authentication tends to be more costly.
This function also sets restraints on what specific data any user can see or manage.
5. Multiuser Access Control: Data integrity and data consistency are the basis of this
function. Multiuser access control is a very useful tool in a DBMS, it enables multiple users
to access the database simultaneously without affecting the integrity of the database.
6. Backup and Recovery Management: Backup and recovery come to mind whenever there
are potential threats to a database. For example, if there is a power outage, recovery
management determines how the database is restored after the outage; backup management
refers to keeping safe copies of the data to preserve its safety and integrity.
7. Data Integrity Management: The DBMS enforces integrity rules to reduce data
redundancy (data stored in more than one place unnecessarily) and to maximize data
consistency, making sure the database returns the same correct answer each time the same
question is asked.
8. Database Access Languages and Application Programming Interfaces: A query
language is a nonprocedural language. An example of this is SQL (structured query
language). SQL is the most common query language supported by the majority of DBMS
vendors. The use of this language makes it easy for users to specify what they want done
without the headache of explaining how to specifically do it.
9. Database Communication Interfaces: This refers to how a DBMS can accept different end
user requests through different network environments. An example of this can be easily
related to the internet. A DBMS can provide access to the database using the Internet
through Web Browsers (Mozilla Firefox, Internet Explorer, and Netscape).
10. Transaction Management: This refers to how a DBMS must supply a method that will
guarantee that all the updates in a given transaction are made or not made. All transactions
must follow what is called the ACID properties.
A – Atomicity: a transaction is an indivisible unit that is either performed as a whole or not
performed at all, never only in part.
C – Consistency: a transaction must take the database from one consistent state to another
consistent state.
I – Isolation: transactions must execute independently of one another; part of a transaction
in progress should not be visible to another transaction.
D – Durability: once a transaction commits, its changes are permanent and must survive
subsequent failures.
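Atomicity can be sketched with Python's built-in sqlite3 module (the account table and the simulated failure are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

# Transfer 30 from A to B, but "crash" between the debit and the credit.
try:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
    raise RuntimeError("simulated failure before the credit step")
except RuntimeError:
    conn.rollback()  # atomicity: the partial debit is undone as well

balances = dict(conn.execute("SELECT name, balance FROM account"))
conn.close()
```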
** Components of DBMS **
Database system refers to the set of components that define and control the collection,
storage, management and use of data. The general database system architecture is shown below.
The figure shows People using Application Programs and DBMS software (the Software
component) to work with the Database (Data), which resides on Hardware.
Software: To make the database system work properly, two types of software are needed:
i) DBMS software and ii) application programs (together with the operating system).
DBMS software: It manages the database within the database system. Some example of DBMS
software includes Oracle, Access, MySQL and etc.
Application programs: These are used to access and manipulate data in the DBMS and to
manage the computer environment. Most of the application programs provide GUI (graphical
user interface).
People: This component includes all users of the database system. By the nature of their jobs,
two main types of users are identified:
i) Practitioners: system administrators, database administrators, database designers,
system analysts and programmers.
ii) Users: clerks, management, supervisors.
Data: Data refers to the collection of raw facts stored in the database. Data is the raw material
from which information is generated; no database system can exist without data.
Relationship between the four components: Practitioners consult with the users to identify
data needs and design database structures as per those needs. The database structures are
defined using the DBMS, and users enter data into the system by specified procedures.
The entered data is maintained on hardware media such as disks, and application programs
allow users to access the database.
** Relational Model **
Relational model stores data in the form of tables. The concept was proposed by Dr. E. F.
Codd, a researcher at IBM, in 1970. The relational model consists of three major
components:
1. The set of relations and set of domains that defines the way data can be represented (data
structure).
2. Integrity rules that define the procedure to protect the data (data integrity).
3. The operations that can be performed on data (data manipulation).
A relational database is defined as a database that allows you to group its data items
into one or more independent tables that can be related to one another by using fields common
to each related table.
Characteristics of Relational Database: Relational database systems have the following
characteristics:
The whole data is conceptually represented as an orderly arrangement of data into rows
and columns, called a relation or table.
All values are scalar. That is, at any given row/column position in the relation there is one
and only one value.
All operations are performed on an entire relation and result is an entire relation, a
concept known as closure.
Basic Terminology used in Relational Model:
Table (or) Relation -- In relational data model, relations are saved in the format of Tables. This
format stores the relation among entities. A table has rows and columns, where rows represent
records and columns represent the attributes.
Tuple − Each row of data is a tuple. Actually, each row is an n-tuple, but the "n-" is usually dropped.
Relation instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and
their names.
Relation key − Each row has one or more attributes, known as the relation key, which can
identify the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as the attribute
domain.
The figure shows a relation with the formal names of the basic components marked; the
entire structure is, as we have said, a relation.
Key Constraints: Keys are attributes or sets of attributes that uniquely identify an entity within
its entity set. An Entity set E can have multiple keys out of which one key will be designated as
the primary key. Primary Key must have unique and not null values in the relational table.
Example of Key Constraints in a simple relational table –
Integrity Rule 1 (Entity Integrity Rule or Constraint): Integrity Rule 1 is also called the
Entity Integrity Rule or Constraint. It states that no attribute of the primary key may contain
a null value: if a relation has a null value in a primary key attribute, the uniqueness property
of the primary key cannot be maintained. Consider the example below:
Integrity Rule 2 (Referential Integrity Rule or Constraint): Integrity Rule 2 is also
called the Referential Integrity Constraint. It states that if a foreign key in Table 1 refers
to the primary key of Table 2, then every value of the foreign key in Table 1 must either be
null or be available in Table 2. For example,
Some more features of foreign keys: Let the table in which the foreign key is defined be the
foreign (detail) table, i.e. Table 1 in the example above, and let the table that defines the
primary key referenced by the foreign key be the master (primary) table, i.e. Table 2 in the
example above. Then the following properties must hold:
Records cannot be inserted into the foreign table if corresponding records in the master
table do not exist.
Records of the master table cannot be deleted or updated if corresponding records exist
in the foreign table.
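A minimal sketch of referential integrity using Python's built-in sqlite3 module (note that SQLite enforces foreign keys only when the pragma is enabled; the dept/emp tables are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT,
                   deptno INTEGER REFERENCES dept(deptno));
INSERT INTO dept VALUES (10, 'Finance');
""")

# Allowed: the foreign key value exists in the master table, or is null.
conn.execute("INSERT INTO emp VALUES (1, 'Ravi', 10)")
conn.execute("INSERT INTO emp VALUES (2, 'Kiran', NULL)")

# Rejected: dept 99 does not exist in the master table.
try:
    conn.execute("INSERT INTO emp VALUES (3, 'Raju', 99)")
    violated = False
except sqlite3.IntegrityError:
    violated = True

inserted = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
conn.close()
```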
** Views **
A view is a virtual table based on one or more tables. The table on which a view is created is
called the “base table”.
Through a view we can restrict what may be updated in the base table. Any changes made
to the base table are also reflected in the view.
Advantages of views:
We can provide security to the data.
We can provide limitation on data.
We can provide customized view for the data.
View uses little storage area.
It allows different users to view the same data in different ways at the same time.
It does not allow direct access to the tables of the data dictionary.
Disadvantages of views:
It cannot be indexed.
The view's result must be computed each time the view is used.
When the base table is dropped, the view becomes inactive.
A view is a database object, so it occupies space.
Without its base table, a view will not work.
Updates are possible through simple views but not through complex views, which are
read-only.
Syntax: Create view <view-name> as select columns from <table-name> [WHERE condition];
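The syntax above can be sketched with Python's built-in sqlite3 module (the emp table is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, sal INTEGER);
INSERT INTO emp VALUES (1,'Ravi','Manager',12000),(2,'Kiran','Clerk',5000);

-- create view <view-name> as select columns from <table-name> [where condition]
CREATE VIEW clerks AS SELECT ename, sal FROM emp WHERE job = 'Clerk';
""")

before = conn.execute("SELECT * FROM clerks").fetchall()

# Changes made to the base table are reflected in the view automatically.
conn.execute("INSERT INTO emp VALUES (3, 'Raju', 'Clerk', 6000)")
after = conn.execute("SELECT * FROM clerks ORDER BY sal").fetchall()
conn.close()
```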
** Relational Algebra **
Relational model supports eight relational operators: SELECT, PROJECT, JOIN,
INTERSECT, UNION, DIFFERENCE (MINUS), PRODUCT AND DIVIDE. The use of
relational operators on existing tables produces new relations. The operators are grouped as
follows:
Unary operators are: SELECT, PROJECT
Set operators are: UNION, INTERSECT, PRODUCT AND MINUS
Join operators are: EQUI JOIN, NON EQUI JOIN AND OUTER JOIN
Division operator is: DIVIDE
SELECT: SELECT also known as RESTRICT, yields values for all the rows found in a table
that satisfy given condition. In other words, SELECT yields a horizontal subset of a table.
PROJECT: Yields all values for selected attributes. In other words, PROJECT yields a vertical
subset of a table.
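The two unary operations can be sketched with Python's built-in sqlite3 module (the EMP rows are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, sal INTEGER);
INSERT INTO emp VALUES (1,'Ravi','Manager',12000),
                       (2,'Rajesh','Clerk',5000),
                       (3,'Kiran','Clerk',6000);
""")

# SELECT (restrict): a horizontal subset -- whole rows matching a condition.
clerks = conn.execute("SELECT * FROM emp WHERE job = 'Clerk'").fetchall()

# PROJECT: a vertical subset -- chosen columns of every row.
names = [r[0] for r in conn.execute("SELECT ename FROM emp ORDER BY empno")]
conn.close()
```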
UNION: combines all rows from two tables, excluding duplicate rows. The tables must have
the same attribute characteristics (data types should match).
INTERSECT: Yields only the rows that appear in both tables. As with UNION, the two tables
must have the same attribute characteristics.
T1 (sname): Ramu, Raju, Kiran
T2 (sname): Kiran, Ramu, srinu
T1 INTERSECT T2 gives Result (sname): Kiran, Ramu
DIFFERENCE (MINUS): Yields all rows in one table that are not found in the other table;
that is, it subtracts one table from the other. Note that A DIFFERENCE B is not the same as
B DIFFERENCE A.
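The set operations can be sketched with Python's built-in sqlite3 module (SQLite spells MINUS as EXCEPT; the sname values follow the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (sname TEXT);
CREATE TABLE t2 (sname TEXT);
INSERT INTO t1 VALUES ('Ramu'),('Raju'),('Kiran');
INSERT INTO t2 VALUES ('Kiran'),('Ramu'),('srinu');
""")

union = [r[0] for r in conn.execute(
    "SELECT sname FROM t1 UNION SELECT sname FROM t2 ORDER BY sname")]
intersect = [r[0] for r in conn.execute(
    "SELECT sname FROM t1 INTERSECT SELECT sname FROM t2 ORDER BY sname")]
minus = [r[0] for r in conn.execute(
    "SELECT sname FROM t1 EXCEPT SELECT sname FROM t2")]  # MINUS in SQLite
conn.close()
```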
PRODUCT: Yields all possible pairs of rows from two tables; this is also known as the
Cartesian product. Therefore, if one table has 3 rows and the other table has 2 rows, the
product yields a list composed of 3*2 = 6 rows.
Types of Joins:
Equi-join links tables based on an equality condition that compares specified columns of
each table. The outcome of the equi-join does not eliminate duplicate columns.
Theta Join is a non-equi-join that compares specified columns of each table using a
comparison operator other than the “equals to” operator.
Outer join: In an outer join the unmatched rows are retained, and the values for the
columns from the other table are left blank or null.
DIVIDE: uses one single column table say A as the divisor and one 2-column table say B as the
dividend. The tables must have a common column. The output of the divide operation is a
single column with the values of column from the dividend table (B) rows where the value of
the common column in both tables match.
T1 (dividend) – c1, c2: (A, 5), (A, 9), (B, 5), (B, 9)
T2 (divisor) – c1: A, B
T1 DIVIDE T2 gives Result – c2: 5, 9
(5 and 9 are the c2 values that occur together with every c1 value in the divisor.)
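SQL has no direct DIVIDE operator; one common way to express it is a GROUP BY with a HAVING count, sketched here in Python's built-in sqlite3 module (this form assumes every c1 value in the dividend also appears in the divisor):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (c1 TEXT, c2 INTEGER);   -- dividend
CREATE TABLE t2 (c1 TEXT);               -- divisor
INSERT INTO t1 VALUES ('A',5),('A',9),('B',5),('B',9);
INSERT INTO t2 VALUES ('A'),('B');
""")

# Keep the c2 values that are paired with EVERY c1 value in the divisor.
result = [r[0] for r in conn.execute("""
    SELECT c2 FROM t1
    GROUP BY c2
    HAVING COUNT(DISTINCT c1) = (SELECT COUNT(*) FROM t2)
    ORDER BY c2
""")]
conn.close()
```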
Aggregate (Group) functions (Operations): These functions operate on a group of values,
aggregating the group to perform calculations such as finding the sum, maximum or minimum
of the values.
SUM: finds sum of the values of a column.
Ex: SELECT SUM(SAL) FROM EMP;
AVG: finds average of the values of a column.
Ex: SELECT AVG (SAL) FROM EMP;
MAX: finds maximum of the values of a column.
Ex: SELECT MAX (SAL) FROM EMP WHERE JOB=’MANAGER’;
MIN: finds minimum of the values of a column.
Ex: SELECT MIN (SAL) FROM EMP;
COUNT: finds number of records.
Ex: SELECT COUNT (*) FROM EMP;
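The aggregate queries above can be run as-is with Python's built-in sqlite3 module (the EMP rows are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER);
INSERT INTO emp VALUES ('Ravi','Manager',12000),
                       ('Rajesh','Clerk',5000),
                       ('Kiran','Clerk',6000);
""")

total = conn.execute("SELECT SUM(sal) FROM emp").fetchone()[0]
avg   = conn.execute("SELECT AVG(sal) FROM emp").fetchone()[0]
top   = conn.execute(
    "SELECT MAX(sal) FROM emp WHERE job = 'Manager'").fetchone()[0]
low   = conn.execute("SELECT MIN(sal) FROM emp").fetchone()[0]
cnt   = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
conn.close()
```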
UNIT- II
ER Diagrams:
ERD stands for Entity Relationship diagram.
It is a graphical representation of an information system.
ER diagram shows the relationship between objects, places, people, events etc. within
that system.
It is a data modeling technique which helps in defining the business process.
Multi-valued Attribute: represents a multi-valued attribute, one which can have many values
for a particular entity, e.g. Mobile Number.
**Entity Types**
Entities: An entity is a person, place, object, event or concept in the user environment about
which an organization maintains the data.
Types of Entities:
1. Strong Entity Types
2. Recursive Entity Types
3. Weak Entity Types
4. Composite Entity Types or Associative Entity Types
Notations of the different entity types in an ER diagram: an entity is drawn as a rectangle.
Strong Entity Type: These are entities that have a key attribute in their attribute list, or a set
of attributes that forms a primary key. The strong entity type is also called the regular entity
type. For example, a student's unique RollNo identifies each student, so RollNo is the primary
key of the STUDENT entity; hence STUDENT is a strong entity type because of its key
attribute.
Recursive Entity Type: This is also called a self-referencing entity type. It is an entity type
with a foreign key that references the same table, i.e. itself. A recursive entity type occurs in
a unary relationship.
“is married to” is a recursive one-to-one relationship between the instances of
PERSON type, i.e. one person gets married to another person.
“manages” is a recursive one-to-many relationship between the instances of the
EMPLOYEE type, i.e. one employee manages other employees.
Recursive relationships
One-to-one One-to-many
EMPLOYEE (emp_num PK, Ename, Job, Sal, manager_id FK)
Here manager_id is a foreign key that references the primary key (emp_num) of the same
table. Thus, the above relationship is a recursive relationship.
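The recursive relationship above can be sketched with Python's built-in sqlite3 module (the employee rows are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    emp_num    INTEGER PRIMARY KEY,
    ename      TEXT,
    job        TEXT,
    sal        INTEGER,
    manager_id INTEGER REFERENCES employee(emp_num)  -- FK into the same table
);
INSERT INTO employee VALUES (1,'Ravi','Manager',12000,NULL);
INSERT INTO employee VALUES (2,'Kiran','Clerk',5000,1);
INSERT INTO employee VALUES (3,'Raju','Clerk',5000,1);
""")

# Self-join: each employee paired with the employee who manages them.
pairs = conn.execute("""
    SELECT e.ename, m.ename
    FROM employee e JOIN employee m ON e.manager_id = m.emp_num
    ORDER BY e.emp_num
""").fetchall()
conn.close()
```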
Weak Entity Type: An entity type with no key or primary key is called a weak entity type.
The tuples of a weak entity type cannot be differentiated using one of its own attributes alone.
Every weak entity must have a unique OWNER entity type. In the example below, CHILD is
the WEAK entity type and EMPLOYEE is the OWNER entity type.
Composite Entities: If a many-to-many relationship exists, we must eliminate it by
introducing a composite entity. Composite entities exist as a relationship as well as an entity;
the many-to-many relationship is converted into two one-to-many relationships.
Composite entities are also called bridge entities, because they act as a bridge between the
two entities in the many-to-many relationship. A bridge (composite) entity is composed of
the primary keys of each of the entities to be connected, and is represented by a diamond
shape within a rectangle in an ER diagram.
In the following example, the composite entity “CERTIFICATE” has the attributes
“cnum” and “date” which are peculiar to the relationship. It associates the instances of
“STUDENT” and “COURSE”.
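A composite (bridge) entity like CERTIFICATE is implemented as a table whose primary key is composed of the foreign keys of the two connected entities. The following runnable sketch uses Python's built-in sqlite3 module; the table contents are purely illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE STUDENT (sno INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE COURSE  (cid INTEGER PRIMARY KEY, cname TEXT);
-- The bridge entity holds the primary keys of both connected entities,
-- plus attributes (cnum, cert_date) that belong to the relationship itself.
CREATE TABLE CERTIFICATE (
    sno INTEGER REFERENCES STUDENT(sno),
    cid INTEGER REFERENCES COURSE(cid),
    cnum TEXT, cert_date TEXT,
    PRIMARY KEY (sno, cid)       -- converts M:N into two 1:N links
);
INSERT INTO STUDENT VALUES (1, 'Ravi'), (2, 'Sita');
INSERT INTO COURSE  VALUES (10, 'DBMS'), (20, 'Java');
INSERT INTO CERTIFICATE VALUES (1, 10, 'C-001', '2024-01-05'),
                               (1, 20, 'C-002', '2024-02-10'),
                               (2, 10, 'C-003', '2024-01-07');
""")
# Each student can take many courses and each course has many students,
# yet every row of the bridge table is uniquely identified by (sno, cid).
rows = con.execute("""SELECT sname, cname, cnum
                      FROM STUDENT JOIN CERTIFICATE USING (sno)
                                   JOIN COURSE USING (cid)
                      ORDER BY cnum""").fetchall()
print(rows)
```

Because the bridge table's primary key is (sno, cid), each student-course pair can appear only once, while both one-to-many sides of the converted relationship remain unrestricted.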
6. Multi-Valued Attribute: These attributes can have more than one value at any point of time. A manager can have more than one employee working for him; a person can have more than one email address, or more than one house. These are examples.
7. Simple Single-Valued Attribute: This combines the simple and single-valued attribute types. The attribute has a single value at any point of time, and that value cannot be divided further. For example, EMPLOYEE_ID is a single value and cannot be divided further.
8. Simple Multi-Valued Attribute: A person's phone number, which is simple (indivisible) while the person can have multiple phone numbers, is an example of this attribute.
9. Composite Single-Valued Attribute: Date of birth is a composite single-valued attribute. Any person can have only one DOB, and it can be further divided into date, month and year attributes.
10. Composite Multi-Valued Attribute: The address of a shop located at two different locations can be considered an example of this attribute.
One-to-One Relationship: In a one-to-one relationship, one entity is related to only one other entity. One row in a table is linked to only one row in another table, and vice versa.
For example: A country can have only one capital city.
One-to-Many Relationship: In a one-to-many relationship, one entity is related to many other entities. One row in table A is linked to many rows in table B, but one row in table B is linked to only one row in table A.
For example: One department has many employees.
Many-to-Many Relationship: In a many-to-many relationship, many entities are related to multiple other entities. Cardinality is what refers to this kind of relation between two entities.
For example: Various books in a library are issued by many students.
** Types of Keys **
Table is a collection of data in the form of rows and columns. Rows are referred as records
and columns are referred as fields.
A table includes several following components, which are called its keys.
1. Primary Key
2. Foreign Key
3. Candidate Key
4. Super Key
5. Composite Key
6. Alternate key
Primary key: A primary key is a column, or set of columns, in a table that uniquely identifies the tuples (rows) of that table.
Example: Student Table
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Carl 22
In the above Student table, the Stu_Id column uniquely identifies each row of the table.
We denote the primary key by underlining the column name.
The value of the primary key must be unique for each row of the table. More than one column can together act as the primary key for a table: for example, {Stu_Id, Stu_Name} collectively could play the role of primary key in the above table, but that does not make sense because Stu_Id alone is enough to uniquely identify rows. We should choose more than one column as the primary key only when there is no single column that can play that role.
Foreign key: Foreign keys are columns of a table that point to the primary key of another table. They act as a cross-reference between tables.
In the below example the Stu_Id column in Course_enrollment table is a foreign key as it
points to the primary key of the Student table.
Course_enrollment table:
Course_Id   Stu_Id
C01         101
C02         102
C03         101
C05         102
C06         103

Student table:
Stu_Id   Stu_Name    Stu_Age
101      Chaitanya   22
102      Arya        26
103      Bran        25
104      Jon         21
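The cross-reference can be made concrete with sqlite3 (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; the data mirrors the illustrative tables above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
con.executescript("""
CREATE TABLE Student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT, Stu_Age INTEGER);
CREATE TABLE Course_enrollment (
    Course_Id TEXT,
    Stu_Id INTEGER REFERENCES Student(Stu_Id)   -- foreign key cross-reference
);
INSERT INTO Student VALUES (101,'Chaitanya',22),(102,'Arya',26),(103,'Bran',25),(104,'Jon',21);
INSERT INTO Course_enrollment VALUES ('C01',101),('C02',102),('C03',101);
""")
# Inserting an enrollment for a non-existent student violates the foreign key.
try:
    con.execute("INSERT INTO Course_enrollment VALUES ('C09', 999)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)   # True: the FK rejected the orphan row
```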
Candidate Key: A super key with no redundant attribute is known as a candidate key. Candidate keys are selected from the set of super keys; the only thing we take care of while selecting a candidate key is that it should not have any redundant attributes. That is the reason they are also termed minimal super keys.
For example: Employee table
Emp_Id Emp_Number Emp_Name
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
There are two candidate keys in above table:
{Emp_Id}
{Emp_Number}
Note: The primary key is selected from the group of candidate keys. That means we can have either Emp_Id or Emp_Number as the primary key.
Super key: A super key is a set of one or more columns (attributes) that uniquely identifies rows in a table. People often confuse super keys with candidate keys, so we will also discuss candidate keys a little here.
How is a candidate key different from a super key?
The answer is simple: candidate keys are selected from the set of super keys, and the only thing we take care of while selecting a candidate key is that it should not have any redundant attribute. That is the reason they are also termed minimal super keys.
Let’s take an example to understand this: Employee table
Emp_SSN Emp_Number Emp_Name
123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert
Super keys:
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
All of the above sets are able to uniquely identify rows of the employee table.
Alternate Key: Out of all candidate keys, only one gets selected as primary key, remaining
keys are known as alternate or secondary keys.
Composite Key: A key that consists of more than one attribute to uniquely identify rows (also
known as records & tuples) in a table is called composite key.
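A minimal sketch of a composite key, run through sqlite3 (table and data are illustrative): neither Course_Id nor Stu_Id is unique by itself, but the pair is.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Enrollment (
    Course_Id TEXT,
    Stu_Id    INTEGER,
    Grade     TEXT,
    PRIMARY KEY (Course_Id, Stu_Id)   -- composite key over two columns
);
INSERT INTO Enrollment VALUES ('C01', 101, 'A'), ('C01', 102, 'B'), ('C02', 101, 'A');
""")
try:
    con.execute("INSERT INTO Enrollment VALUES ('C01', 101, 'C')")  # duplicate pair
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False: the composite key rejected the duplicate (Course_Id, Stu_Id)
```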
** Attributes on relationship **
Occasionally it is convenient, or even necessary, to connect attributes with a relationship,
rather than with any one of the entity sets that the relationship connects. For instance, consider
the relationship of "Multiway Relationships" figure, which represents contracts between a star
and studio for a movie. (Here, we have reverted to the earlier concept of three-way contracts in
"Multiway Relationships" example, not the four-way relationship of "Roles in Relationships"
example).
We might wish to record the salary related with this contract. However, we cannot relate
it with the star; a star might get different salaries for different movies. Likewise, it does not
make sense to relate the salary with a studio (they may pay different salaries to different stars)
or with a movie (different stars in a movie may receive different salaries).
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 37
However, it is suitable to relate a salary with the (star, movie, studio) triple in the
relationship set for the Contracts relationship. In the above figure we see "Multiway
Relationships" figure fleshed out with attributes. The relationship has attribute salary, while the
entity sets have the same attributes that we showed for them in "Entity-Relationship Diagrams"
figure.
It is never necessary to place attributes on relationships. We can instead invent a new
entity set, whose entities have the attributes assigned to the relationship. If we then incorporate
this entity set in the relationship, we can leave out the attributes on the relationship itself.
However, attributes on a relationship are a useful convention, which we shall continue to use
where appropriate.
** Structural constraints in the E-R model **
There are three Types of Structural (Relationship) Constraints:
1) Structural Constraints
a) Participation Constraints
b) Cardinality Ratio
2) Overlap Constraints
3) Covering Constraints
Structural Constraints are applicable for binary relationships and
Overlap and Covering Constraints are applicable for EERD (Extended ER Diagrams).
a) Participation (or Optionality) Constraints: Participation concerns the involvement of entities in a relationship. It specifies whether the existence of an entity depends on another entity. There are two types of participation constraints:
1. Total/Mandatory Participation
2. Partial/Optional Participation
Notations of the different types of participation in an ER diagram:
The problem with the above ER diagram is that the question of which staff work in a particular department remains unanswered. The solution is to restructure the original E-R model to represent the correct association, as shown.
In other words the two entities should have a direct relationship between them to provide
the necessary information.
There is another way to solve the problem of the E-R diagram in the figure: by introducing a direct relationship between DEPT and STAFF, as shown in the figure.
Chasm Trap: As discussed earlier, a chasm trap occurs when a model suggests the existence of
a relationship between entity types, but the pathway does not exist between certain entity
occurrences. It occurs where there is a relationship with partial participation, which forms part
of the pathway between entities that are related.
For example: Let us consider a database where a single branch is allocated many staff, who handle the management of properties for rent. Not all staff members handle property, and not all property is managed by a member of staff. This case is represented in the E-R diagram.
Now, the above E-R diagram is not able to represent what properties are available at a
branch. The partial participation of Staff and Property in the SP relation means that some
properties cannot be associated with a branch office through a member of staff.
We need to add the missing relationship which is called BP between the Branch and the
Property entities as shown.
Example: EER model for an EMPLOYEE super type with two subtypes: "FULL TIME EMPLOYEE" and "PART TIME EMPLOYEE".
(Figure: EMPLOYEE super type (general type) with attributes enum, ename and address.)
FULL TIME EMPLOYEE has its own attributes “salary” and “hra”.
PART TIME EMPLOYEE has its own attribute “hourly-salary”.
And both share common attributes enum, ename, address from “EMPLOYEE” super
type.
** Generalization and Specialization **
Generalization: Generalization is an object set that is a super set of another object set. (OR)
Generalization is a process of creating general entity type (super type/generic type) from a set of
specialized entity types (sub types).
This is a bottom-up process of identifying a parent object from child objects.
It forms a super type from a set of subtypes.
It is a process of defining higher level entities from lower level entities.
Example: Assume that an organization needs to store the details of different types of vehicles
such as CAR and TRUCK. The ER model for both entities can be represented as shown below.
(Figure: CAR and TRUCK entity types drawn separately, each showing the attributes name, number and cost.)
The CAR entity type has the attributes "name", "number", "cost" and "no of seats".
The TRUCK entity type has the attributes "name", "number", "cost" and "permit".
These two types share the common attributes name, number and cost. Hence, from these two entity types we can create a super type "VEHICLE" with the common attributes.
(Figure: VEHICLE super type with the common attributes name, number and cost, and the subtypes CAR and TRUCK.)
EER model with super type/subtype relationship for VEHICLE super type, with two
subtypes: CAR, TRUCK.
Specialization: Specialization is an object set that is a subset of another object set. (OR) It is a
process of creating one or more subtypes from super type entity.
This is a top-down process of deriving child objects from parent objects.
It forms subtypes from super type.
It is a process of creating one or more specialized entities from general entity.
Example: Assume that an organization needs to store the details of products as PRODUCT
entity. ER model for PRODUCT entity is represented as:
(Figure: PRODUCT entity with attributes number, cost, quantity and batch_no.)
Some of the attributes in the above example are not necessary when we go for subtypes such as PURCHASED PRODUCT and MANUFACTURED PRODUCT.
Let us draw, EER model with super type/subtype relationship for PRODUCT super type
with two subtypes: PURCHASED PRODUCT and MANUFACTURED PRODUCT.
(Figure: PRODUCT super type with attributes number, description and quantity, and the subtypes PURCHASED PRODUCT and MANUFACTURED PRODUCT.)
** Aggregation **
Basically a relationship links two or more entities. In object-oriented model, relationships
can also be viewed as objects and can have attributes and participate in other relationships. Such
relationships are called aggregates.
Aggregation: In aggregation, a relationship between two or more entities is grouped as one object set. Participation of this object set in another relationship is called aggregation. Aggregation involves the use of an aggregate, i.e. a relationship viewed as an object set.
Aggregation establishes higher-level relationship, which involves three or more object sets.
Generally, we represent an aggregate by drawing a box around the relationship and its
participating object sets.
Example: Consider two object sets EMPLOYEE and BRANCH with a relationship stating "a BRANCH has EMPLOYEEs". There is an object MANAGER who manages both the BRANCH and its EMPLOYEEs.
(Figure: BRANCH has EMPLOYEE (1:*); the "has" relationship and its participating object sets are boxed as a newly perceived object set, BRANCH OFFICE, which the MANAGER manages (1:1). This boxing is the aggregation.)
** Composition **
Composition is a stronger form of aggregation, where the part cannot exist without its containing whole entity type, and the part can be part of only one whole entity type.
(Figure: DEPARTMENT is the whole entity and PROJECT the part entity; an EMPLOYEE manages the DEPARTMENT (1:1) and a SUPERVISOR works on the PROJECT.)
Here, a filled diamond on the side of the whole entity indicates composition: a project must belong to exactly one department.
** Functional Dependency **
Functional Dependency is a relationship that exists between attributes of a relation: when one attribute in a relation uniquely determines another attribute, it is called a functional dependency. This concept was given by E. F. Codd.
For example, if an attribute X determines the value of Y, it is written as X → Y, which means "Y is functionally dependent upon X".
Here, X is the determinant attribute and Y is the dependent attribute.
The common functional dependencies are:
1) Full functional dependency
2) Partial functional dependency.
3) Transitive functional dependency.
Full functional dependency: If an attribute B is functionally dependent on a composite key A, but not on any subset of that composite key, then attribute B is fully functionally dependent on A.
Partial functional dependency: A non-key attribute depends on only part of the key attribute; a dependency based on only part of the primary key is known as a partial dependency.
Transitive functional dependency: A non-key attribute depends on another non-key attribute; a dependency based on an attribute that is not part of the primary key is known as a transitive dependency.
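A functional dependency X → Y can be tested mechanically: every distinct value of X must map to a single value of Y. A small sketch in Python (the row data is illustrative):

```python
# Check whether the functional dependency X -> Y holds in a list of rows:
# every distinct X-value must map to exactly one Y-value.
def fd_holds(rows, x, y):
    seen = {}
    for row in rows:
        key = row[x]
        if key in seen and seen[key] != row[y]:
            return False          # same determinant, different dependent
        seen[key] = row[y]
    return True

# Illustrative student rows: rno determines sname, but group does NOT
# determine sname (two MBA students with different names).
students = [
    {"rno": 101, "sname": "Ravi", "group": "MBA"},
    {"rno": 102, "sname": "Sita", "group": "MBA"},
]
print(fd_holds(students, "rno", "sname"))    # True:  rno determines sname
print(fd_holds(students, "group", "sname"))  # False: group -> sname fails
```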
** Join dependency **
A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T. If one of the tables in the join has all the attributes of table T, the join dependency is called trivial.
The join dependency plays an important role in fifth normal form, also known as project-join normal form, because it can be proven that if you decompose a scheme according to a join dependency, the decomposition will be a lossless-join decomposition.
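The lossless-join idea can be demonstrated with sqlite3: a table is projected onto two smaller tables, and joining the projections on the shared column recreates the original exactly (the column grp stands in for group, which is an SQL reserved word; the data is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T (rno INTEGER, grp TEXT, fee INTEGER);
INSERT INTO T VALUES (101,'MBA',30000),(102,'MBA',30000),(103,'MCA',25000);
-- Project T onto two subsets of its attributes.
CREATE TABLE T1 AS SELECT DISTINCT rno, grp FROM T;
CREATE TABLE T2 AS SELECT DISTINCT grp, fee FROM T;
""")
original = sorted(con.execute("SELECT * FROM T").fetchall())
# Joining the projections on grp recreates T row for row.
rejoined = sorted(con.execute(
    "SELECT rno, T1.grp, fee FROM T1 JOIN T2 ON T1.grp = T2.grp").fetchall())
print(original == rejoined)   # True: the decomposition is lossless
```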
** Data redundancy **
Data redundancy is the presence of unnecessary data and duplicate data. The file system’s
structure promotes the storage of the same data in different locations. If this data is to be
updated consistently the data stored in different locations will have different versions of the
same data. Data redundancy exists when the same data are stored unnecessarily at different
places. This data redundancy leads to two main problems. They are:
1. Data inconsistency
2. Data anomalies.
Data inconsistency: Data inconsistency exists when different and conflicting versions of the same data appear in different places. For example, if we change a student's address in the ADMISSIONS file and forget to make the corresponding change in the SCHOLARSHIP file, these files contain different data for the same student, and reports will no doubt yield inconsistent results.
Data Anomalies: Anomaly means "abnormality". Generally, a change to a field value should be made in only a single place; but because of data redundancy, a field may have different values in different locations. If the data is not normalized, we may have to update a field's value in many places, and if we miss even one of them an anomaly develops. Three types of data anomalies are found:
i) Modification anomalies: If there is some redundant data, then when you modify it the value must be modified in many places. Failure to do so causes a modification anomaly, which results in different values for the same attribute.
ii) Insertion anomalies: If there is some redundant data, then when you insert new values they must be inserted in many places. Failure to do so causes an insertion anomaly, which results in different sets of records in different tables.
iii) Deletion anomalies: If there is some redundant data, then when you delete values they must be deleted in many places. Failure to do so causes a deletion anomaly, which results in both retained and deleted values for the same attribute.
** Normalization **
Normalization is a process of evaluating and correcting table structures to minimize data redundancy and reduce data anomalies. (OR) It is a step-by-step decomposition of complex records into simple records.
Purpose of Normalization: Normalization is a technique for producing a set of suitable
relations that support the data requirements of an enterprise.
Characteristics of a suitable set of relations include:
i) the minimal number of attributes necessary to support the data requirements of the enterprise;
ii) attributes with a close logical relationship are found in the same relation;
iii) minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.
The benefits of using a database that has a suitable set of relations are that the database will be:
i) easier for the user to access and maintain;
ii) minimal in the storage space it takes up on the computer.
Or simply, normalization makes a relation or table free from insert/update/delete anomalies and saves space by removing duplicate data.
The Process of Normalization: Normalization follows series of stages called “normal forms”
like:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
Normalization involves decomposition (division) of tables with anomalies into smaller, well-structured tables.
Consider the following STUDENT table and see how the table is normalized from 1NF to
3NF.
rno sname group fee skills
101 Ravi MBA 30000 C
C++
java
The above STUDENT table consists of multi valued attributes (skills), this can be removed
in 1NF. It consists of partial dependencies which are removed in 2NF and also it consists of
Transitive dependencies which are removed in 3NF.
First Normal Form (1NF): 1NF is the lowest of the normal forms. A database table in 1NF must satisfy the following conditions.
The primary key entity requirements are met.
Each row-and-column intersection can contain one and only one value.
All the table’s attributes are dependent on the primary key attribute.
The above table is changed into following table to satisfy 1NF.
rno sname group fee skills
101 Ravi MBA 30000 C
101 Ravi MBA 30000 C++
101 Ravi MBA 30000 JAVA
Second Normal Form (2NF): A database table in 2NF must satisfy the following conditions.
The table must be in 1NF.
The table contains no partial dependencies; that is, every non-key attribute is fully dependent on the key attribute.
A dependency based on only part of the primary key is known as a partial dependency. The above table contains partial dependencies (sname, group and fee depend on rno, i.e. on part of the primary key). So we need to remove these partial dependencies from the table to satisfy 2NF. For this we decompose the STUDENT table into STUDENT and SKILLS tables.
STUDENT
rno   sname   group   fee
101   Ravi    MBA     30000

SKILLS
rno   skills
101   C
101   C++
101   JAVA
Third Normal Form (3NF): A database table in 3NF must satisfy the following conditions.
The table must be in 2NF
A dependency based on an attribute that is not part of primary key is known as Transitive
dependency. The above STUDENT table has transitive dependency i.e. dependency between
“fee” and “group” attributes. So we need to remove this transitive dependency by decomposing
the STUDENT table into STUDENT and GROUP.
STUDENT
rno   sname   group
101   Ravi    MBA

GROUP
group   fee
MBA     30000

SKILLS
rno   skills
101   C
101   C++
101   JAVA
Removing the transitive dependencies yields the tables STUDENT, GROUP and SKILLS.
(Figure: a table with anomalies is progressively decomposed: 1NF → 2NF → 3NF.)
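The 3NF decomposition loses no information: joining the student, group and skills tables back together reproduces the original 1NF rows. A runnable sketch with sqlite3 (the table name GRP and column grp stand in for GROUP/group, which are SQL reserved words):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE STUDENT (rno INTEGER PRIMARY KEY, sname TEXT, grp TEXT);
CREATE TABLE GRP     (grp TEXT PRIMARY KEY, fee INTEGER);
CREATE TABLE SKILLS  (rno INTEGER, skill TEXT);
INSERT INTO STUDENT VALUES (101, 'Ravi', 'MBA');
INSERT INTO GRP     VALUES ('MBA', 30000);
INSERT INTO SKILLS  VALUES (101,'C'),(101,'C++'),(101,'JAVA');
""")
# Joining the 3NF tables reconstructs the original 1NF rows, but the
# group fee and the student details are each stored exactly once.
rows = con.execute("""SELECT rno, sname, STUDENT.grp, fee, skill
                      FROM STUDENT JOIN GRP ON STUDENT.grp = GRP.grp
                                   JOIN SKILLS USING (rno)
                      ORDER BY skill""").fetchall()
for r in rows:
    print(r)
```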
BCNF (Boyce-Codd Normal Form): BCNF stands for Boyce-Codd Normal Form. This normal form is considered a special case of 3NF, but there are a few differences between BCNF and 3NF.
3NF means satisfying 2NF and removing transitive dependencies.
A transitive dependency exists only when a non-key attribute determines another non-key attribute.
But it is possible for a non-key attribute to be the determinant of the PK, or of part of the PK, without violating the 3NF requirements. This is exactly the case BCNF addresses.
A table is in BCNF if and only if every determinant in the relation is a candidate key.
Consider the following table: STUDENT
The above table that is in 2NF can be converted to a table in BCNF using simple two step
process.
1. The table is modified so that the determinant in the table that is not a candidate key
(group) becomes a component of PK of the revised table.
Our focus is to produce a relational database schema that can be implemented in the target DBMS. All processes performed at every step of the design need to be documented for easy maintenance.
Step 4: Design File Organizations and Indexes: Since one of the key focuses of the physical
design phase is on the performance efficiency, determining the optimal file organization and
indexes is a crucial task. Among the steps that need to be taken are as follows:
Step 5: Design User Views: This step is important for a multi-user environment. The objective
of this step is to design the user views that were identified during the requirement and analysis
of the system development lifecycle.
Step 6: Design security mechanism: Security is one of the important aspects in the database
design. The objective of this step is to realize the security measures as required by the user. The
Designer must investigate the security features provided by the selected DBMS.
UNIT- III
** Structured Query Language **
SQL stands for Structured Query Language. It was developed at IBM in the early 1970s by Donald Chamberlin and Raymond Boyce, building on E. F. Codd's relational model.
It is a programming language which stores, manipulates and retrieves the stored data in
RDBMS.
SQL syntax is not case sensitive.
SQL is standardized by both ANSI and ISO.
It is a standard language for accessing and manipulating databases.
Characteristics of SQL:
SQL is extremely flexible.
SQL uses a free-form syntax that gives the user the ability to structure SQL statements in whatever way suits best.
It is a high level language.
It receives natural extensions to its functional capabilities.
It can execute queries against the database.
Advantages of SQL:
SQL provides a greater degree of abstraction than procedural languages.
It is coded without embedded data-navigational instructions.
It enables end users to work with any of the many database management systems in which it is available.
It retrieves huge numbers of records from a database quickly and efficiently.
No coding is required to use standard SQL.
** SQL Commands **
SQL commands can be classified into 4 types.
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language (DCL)
4. Transaction Control Language (TCL)
4)TRUNCATE Command: It is used to delete rows (not the table’s structure) with auto
commit.
Syntax: TRUNCATE TABLE <table-name>;
1. INSERT command: It is used to add new rows to a table.
Syntax: INSERT INTO <table-name> VALUES (value-list);
(OR)
INSERT INTO <table-name> [column-list] VALUES (value-list);
Example: 1. Take an EMPLOYEE table with the columns: eno, ename, job, sal, hiredate.
To insert a record into EMPLOYEE table:
SQL> insert into EMPLOYEE (eno, ename, job, sal, hiredate) values (101, ‘vasu’,
‘clerk’, 7000, ’12-jan-2012’);
(OR)
SQL> insert into EMPLOYEE values (102, ‘sri’, ‘manager’, 10000, ’26-feb-2012’);
2. To insert a record that contains only eno, ename and sal:
SQL> insert into EMPLOYEE (eno, ename, sal) values (201, ‘ram’, 20000);
3. To insert a record through parameter substitution
SQL> insert into EMPLOYEE (eno, ename, job, sal) values (&eno, ‘&ename’, ‘&job’,
&sal); (OR)
SQL> insert into EMPLOYEE values(&eno, ‘&ename’, ‘&job’, &sal);
When we execute the above query, it will ask values from keyboard as follows:
Enter value for eno:104
Enter value for ename: Anji
Enter value for job: Asst manager
Enter value for sal: 15000
2. Update command: It is used to edit/change the values of attributes in a table.
Syntax: UPDATE <table-name> set column_name = value [,column_name = value, …….]
[WHERE condition];
3. DELETE command: It is used to remove (delete) one or more rows from a table.
Syntax: DELETE FROM <table-name> [WHERE condition];
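UPDATE and DELETE can be tried out with sqlite3 on a hypothetical EMPLOYEE table; note how the WHERE clause limits which rows are affected:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE (eno INTEGER PRIMARY KEY, ename TEXT, job TEXT, sal INTEGER);
INSERT INTO EMPLOYEE VALUES (101,'vasu','clerk',7000),
                            (102,'sri','manager',10000),
                            (201,'ram','clerk',20000);
""")
# UPDATE: raise every clerk's salary by 500; other rows are untouched.
con.execute("UPDATE EMPLOYEE SET sal = sal + 500 WHERE job = 'clerk'")
# DELETE: remove only the rows matching the condition (without a WHERE
# clause, every row in the table would be deleted).
con.execute("DELETE FROM EMPLOYEE WHERE sal > 15000")
rows = con.execute("SELECT eno, sal FROM EMPLOYEE ORDER BY eno").fetchall()
print(rows)   # [(101, 7500), (102, 10000)]
```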
Clause Description
WHERE It specifies which rows to retrieve.
GROUP BY It is used to arrange the data into groups.
HAVING It selects among the groups defined by the GROUP BY clause.
ORDER BY It specifies an order in which to return the rows.
3. Savepoint: It is used to mark a point within a transaction to which a later ROLLBACK can return.
Syntax: SAVEPOINT savepointname;
(Figure: a transaction performs DELETE, SAVEPOINT A, INSERT, UPDATE, SAVEPOINT B, INSERT. ROLLBACK TO SAVEPOINT B undoes work back to B, ROLLBACK TO SAVEPOINT A undoes work back to A, and a plain ROLLBACK undoes the whole transaction; COMMIT makes the changes permanent.)
It is a fixed length data type, which means the memory is allocated based on the size defined
by the user but not on the value assigned.
3. Varchar/varchar2: It is also used to define attribute to store the string values. The
minimum size is 1 and maximum is 4000 bytes.
Syntax: Varchar2 (n)
It is a variable length data type, which means the memory is dynamically allocated based
on the value given by the user but not on its size defined.
4. Long: It is used to define an attribute to store the text values, with the size larger than 4000
characters. Maximum size 2GB and it is a variable-length data type.
5. Date/Time: It is used to define an attribute to store date and time values given by the user. The default format for a date is DD-MON-YY (or) DD-MON-YYYY.
** Sorting Results **
The SQL ORDER BY clause is used to sort the data in ascending or descending order,
based on one or more columns. Some databases sort the query results in an ascending order by
default.
Syntax: The basic syntax of the ORDER BY clause which would be used to sort the result in an
ascending or descending order is as follows:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
We can use more than one column in the ORDER BY clause. Make sure that whatever
column we are using to sort, that column should be in the column-list.
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would sort the result in an ascending order by NAME
and SALARY.
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME, SALARY;
This would produce the following result :
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
The following code block has an example, which would sort the result in a descending
order by NAME.
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME DESC;
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
+----+----------+-----+-----------+----------+
Notice that all aggregate functions above ignore NULL values except for the COUNT function.
SQL aggregate functions syntax: aggregate_function (DISTINCT | ALL expression)
Here, if we explicitly use the DISTINCT modifier, the aggregate function ignores duplicate values and considers only the unique values. If we use the ALL modifier, the aggregate function uses all values in the calculation or evaluation. The ALL modifier is the default if we do not specify any modifier explicitly.
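The difference between ALL (the default) and DISTINCT is easy to see with sqlite3 on a hypothetical products table; note also that COUNT(*) counts rows, so it alone keeps the NULL row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (productid INTEGER, categoryid INTEGER, unitsinstock INTEGER);
INSERT INTO products VALUES (1, 10, 5), (2, 10, 5), (3, 20, 8), (4, 20, NULL);
""")
# ALL (default) counts every non-NULL value; DISTINCT counts each unique value once.
count_all  = con.execute("SELECT COUNT(unitsinstock) FROM products").fetchone()[0]
count_dist = con.execute("SELECT COUNT(DISTINCT unitsinstock) FROM products").fetchone()[0]
count_star = con.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count_all, count_dist, count_star)   # 3 2 4
```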
Examples:
COUNT function example: To get the number of the products in the products table, we use the
COUNT function as follows:
SQL> SELECT COUNT(*) FROM products;
AVG function example: To calculate the average units in stock of the products, we use the
AVG function as follows:
SQL> SELECT AVG(unitsinstock) FROM products;
SUM function example: To calculate the sum of units in stock by product category, we use
the SUM function with the GROUP BY clause as the following query:
SQL> SELECT categoryid, SUM(unitsinstock) FROM products GROUP BY categoryid;
MIN function example: To get the minimum units in stock of products in the products table,
we use the MIN function as follows:
SQL> SELECT MIN(unitsinstock) FROM products;
MAX function example: To get the maximum units in stock of products in the products table,
we use the MAX function as shown in the following query:
SQL> SELECT MAX(unitsinstock) FROM products;
Examples on HAVING:
1. To List the department number, whose maximum salary is greaterthan 1000.
SQL> select deptno, MAX(sal) from EMP GROUP BY deptno HAVING MAX(sal) > 1000;
2. To list the jobs which are done by a minimum of 2 persons.
SQL> select job from EMP GROUP BY job HAVING count(*) >= 2;
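The GROUP BY / HAVING pattern can be verified with sqlite3; the EMP rows below are illustrative, and the query lists the jobs done by at least two employees:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMP (eno INTEGER, ename TEXT, job TEXT, sal INTEGER, deptno INTEGER);
INSERT INTO EMP VALUES (1,'Abhi','clerk',900,10),(2,'Balu','clerk',1200,20),
                       (3,'Charan','manager',3000,10),(4,'Devi','clerk',1100,10);
""")
# GROUP BY forms one group per job; HAVING filters whole groups.
jobs = con.execute("""SELECT job FROM EMP
                      GROUP BY job
                      HAVING COUNT(*) >= 2""").fetchall()
print(jobs)   # [('clerk',)] — only 'clerk' has two or more employees
```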
** Nested Queries (OR) Sub Queries **
A Sub query or Inner query or a Nested query is a query within another SQL query and
embedded within the WHERE clause.
A sub query is used to return data that will be used in the main query as a condition to
further restrict the data to be retrieved.
There are a few rules that sub queries must follow:
Sub queries must be enclosed within parentheses.
A sub query can have only one column in the SELECT clause, unless multiple columns
are in the main query for the sub query to compare its selected columns.
Sub queries that return more than one row can only be used with multiple value operators
such as the IN operator.
The BETWEEN operator cannot be used with a sub query. However, the BETWEEN
operator can be used within the sub query.
A sub query is a query inside a query.
A sub query is normally written inside parentheses.
The first query in the SQL statement is known as outer query.
The query inside the SQL statement is known as inner query.
The inner query is executed first.
The output of an inner query is used as the input for the outer query.
When a sub query is used in the condition of another sub query of a SQL statement, it is
called a “nested sub query”.
A sub query may produce one or more records comprising one or more columns.
A sub query is always a “SELECT” statement; whereas a simple query can be any
kind of statement.
EX: 1. To print the name of the employee who draws the maximum salary.
Select ename from EMP where sal = (select max(sal) from EMP);
2. To find the second maximum salary of the employees in the EMP table.
Select max(sal) from EMP where sal < (select max(sal) from EMP);
** Sub Queries with EXIST and NOT EXIST **
EXISTS: This operator gets the result set only when a sub query returns at least one row. The
following query lists the department names in which there is at least one employee.
SQL> select dname from DEPT D where EXISTS(select * from EMP where deptno=D.deptno);
NOT EXISTS: This operator gets the result set only when a sub query does not return any
row. The following query lists the department names in which there are no employees.
SQL> select dname from DEPT D where NOT EXISTS(select * from EMP where deptno=D.deptno);
ALL: ALL must be preceded by a comparison operator and evaluates to true if all of the
subquery's values meet the condition.
ALL is used in SELECT, WHERE and HAVING clauses.
To find the name of the product if all the records in OrderDetails have Quantity
equal to either 6 or 2:
SELECT ProductName FROM Products WHERE ProductID = ALL (SELECT
ProductId FROM OrderDetails WHERE Quantity = 6 OR Quantity = 2);
ANY: ANY compares a value to each value in a list or in the results of a query, and evaluates to
true if the result of the inner query contains at least one row.
ANY returns true if any of the subquery's values meet the condition.
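As a sketch against the same Products and OrderDetails tables used in the ALL example above, the following query returns product names whose ProductID matches at least one OrderDetails row with Quantity 6 or 2:

```sql
-- ANY succeeds if the comparison holds for AT LEAST ONE
-- value returned by the subquery (contrast with ALL, which
-- requires it to hold for every value).
SELECT ProductName
FROM Products
WHERE ProductID = ANY (SELECT ProductId
                       FROM OrderDetails
                       WHERE Quantity = 6 OR Quantity = 2);
```

Note that `= ANY` is equivalent to `IN`; ANY becomes more interesting with operators like `>` or `<`, e.g. `sal > ANY (...)` means "greater than at least one value in the list".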
Equi join: A join in which the joining condition is based on equality between the values in
common columns.
EX: Select eno, ename, EMP.Deptno, DEPT.deptno, deptname from EMP, DEPT where
EMP.deptno=DEPT.deptno;
ENO ENAME EMP. DEPTNO DEPT. DEPTNO DEPTNAME
1 Abhi 10 10 Bcom
2 Balu 20 20 BSc
3 charan 30 30 BBM
Non-Equi join: A join in which joining condition is based on non-equality between the values
in common columns.
EX: Select eno, ename, deptname from EMP, DEPT where EMP.Deptno!= DEPT.deptno;
ENO ENAME DEPTNAME
1 Abhi BSc
1 Abhi BBM
1 Abhi BCA
2 Balu BCom
2 Balu BBM
2 Balu BCA
3 Charan BCom
3 Charan BSc
3 Charan BCA
4 Dhoni BCom
4 Dhoni BSc
4 Dhoni BBM
4 Dhoni BCA
Outer join: A join in which rows that do not have matching values in the common columns are
also included in the result. It includes left outer and right outer joins.
i) Left outer join: It gives all the values of left table plus matched values from the right table.
Following query displays all records from EMP table even if there is no matching deptno in
DEPT table.
EX: select eno, ename, EMP.deptno, deptname from EMP, DEPT WHERE
EMP.deptno = DEPT.deptno (+);
ENO ENAME EMP. DEPTNO DEPTNAME
1 Abhi 10 BCom
2 Balu 20 BSc
3 Charan 30 BBM
4 Dhoni 40 ------
ii) Right outer join: It gives all the values of right table plus matched values from the left table.
Following query displays all records from DEPT table even if there is no matching deptno in
EMP table.
EX: Select eno, ename, DEPT.deptno, deptname from EMP, DEPT WHERE
EMP.deptno(+)= DEPT.deptno;
ENO ENAME DEPT. DEPTNO DEPTNAME
1 Abhi 10 BCom
2 Balu 20 BSc
3 Charan 30 BBM
-- ------ 50 BCA
Self join: This is a join in which a table is joined to itself, where joining condition is based on
columns of a same table.
EX: Select e.eno, d.ename from EMP e, EMP d where e.mid= d.eno;
EMP
eno ename mid
1 A 3
2 B 3
3 C 3
4 D 4
5 E 4
Cartesian product join: A join in which all possible combinations of all the rows of first table
with each row of second table appear.
EX: Select ename, deptname from EMP, DEPT;
ENAME DEPTNAME
Abhi BCom
Balu BCom
Charan BCom
Dhoni BCom
Abhi BSc
Balu BSc
Charan BSc
Dhoni BSc
Abhi BBM
Balu BBM
Charan BBM
Dhoni BBM
Abhi BCA
Balu BCA
Charan BCA
Dhoni BCA
Prepared by G. Veerachary MCA, AP-SET, UGC-NET Page 70
INDEX - Used to speed up the retrieval of data from the database.
SQL NOT NULL Constraint: By default, a column can hold NULL values. The NOT NULL
constraint enforces a column to NOT accept NULL values. This enforces a field to always
contain a value, which means that you cannot insert a new record, or update a record without
adding a value to this field.
The following SQL ensures that the "ID", "LastName", and "FirstName" columns will
NOT accept NULL values:
SQL> CREATE TABLE Persons (ID int NOT NULL, LastName varchar(255) NOT NULL,
FirstName varchar(255) NOT NULL, Age int);
SQL UNIQUE Constraint: The UNIQUE constraint ensures that all values in a column are
different. Both the UNIQUE and PRIMARY KEY constraints provide a guarantee for
uniqueness for a column or set of columns. A PRIMARY KEY constraint automatically has a
UNIQUE constraint. However, you can have many UNIQUE constraints per table, but only one
PRIMARY KEY constraint per table.
The following SQL creates a UNIQUE constraint on the "ID" column when the "Persons"
table is created:
SQL> CREATE TABLE Persons(ID int NOT NULL UNIQUE, LastName varchar(255) NOT
NULL, FirstName varchar(255), Age int);
SQL PRIMARY KEY Constraint: The PRIMARY KEY constraint uniquely identifies each
record in a database table. Primary keys must contain UNIQUE values, and cannot contain
NULL values. A table can have only one primary key, which may consist of single or multiple
fields.
The following SQL creates a PRIMARY KEY on the "ID" column when the "Persons"
table is created:
SQL>CREATE TABLE Persons(ID int NOT NULL PRIMARY KEY, LastName
varchar(255) NOT NULL, FirstName varchar(255), Age int);
SQL FOREIGN KEY Constraint: A FOREIGN KEY is a key used to link two tables
together. A FOREIGN KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table. The table containing the foreign key is called the child table,
and the table containing the candidate key is called the referenced or parent table.
Look at the following two tables:
"Persons" table:
PersonID LastName FirstName Age
1 Hansen Ola 30
2 Svendson Tove 23
3 Pettersen Kari 20
"Orders" table:
OrderID OrderNumber PersonID
1 77895 3
2 44678 3
3 22456 2
4 24562 1
Notice that the "PersonID" column in the "Orders" table points to the "PersonID" column
in the "Persons" table.
The "PersonID" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the
foreign key column, because it has to be one of the values contained in the table it points to.
The following SQL creates a FOREIGN KEY on the "PersonID" column when the
"Orders" table is created:
SQL> CREATE TABLE Orders (OrderID int NOT NULL, OrderNumber int NOT
NULL, PersonID int REFERENCES Persons(PersonID));
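For instance (a sketch, assuming the Persons and Orders tables above), an INSERT whose PersonID has no matching row in Persons is rejected by the FOREIGN KEY constraint:

```sql
-- Succeeds: PersonID 2 exists in the Persons table.
INSERT INTO Orders (OrderID, OrderNumber, PersonID)
VALUES (5, 55555, 2);

-- Fails with a referential-integrity error:
-- there is no Persons row with PersonID 99.
INSERT INTO Orders (OrderID, OrderNumber, PersonID)
VALUES (6, 66666, 99);
```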
SQL CHECK Constraint: The CHECK constraint is used to limit the value range that can be
placed in a column. If you define a CHECK constraint on a single column it allows only certain
values for this column. If you define a CHECK constraint on a table it can limit the values in
certain columns based on values in other columns in the row.
The following SQL creates a CHECK constraint on the "Age" column when the
"Persons" table is created. The CHECK constraint ensures that you can not have any person
below 18 years:
SQL> CREATE TABLE Persons (ID int NOT NULL, LastName varchar(255) NOT NULL,
FirstName varchar(255), Age int CHECK(Age>=18));
SQL DEFAULT Constraint: The DEFAULT constraint is used to provide a default value for
a column.The default value will be added to all new records IF no other value is specified.
The following SQL sets a DEFAULT value for the "City" column when the "Persons"
table is created:
SQL> CREATE TABLE Persons(ID int NOT NULL, LastName varchar(255) NOT NULL,
FirstName varchar(255), Age int, City varchar(255) DEFAULT 'Sandnes');
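A minimal sketch of the DEFAULT constraint in action, assuming the Persons table created above: when City is omitted from the INSERT, 'Sandnes' is stored automatically.

```sql
-- City is not supplied, so the DEFAULT value 'Sandnes' is used.
INSERT INTO Persons (ID, LastName, FirstName, Age)
VALUES (1, 'Hansen', 'Ola', 30);

-- City is supplied explicitly, so the default is not applied.
INSERT INTO Persons (ID, LastName, FirstName, Age, City)
VALUES (2, 'Svendson', 'Tove', 23, 'Oslo');
```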
** Indexes **
An index is an object which is used to improve performance during retrieval of records.
Creating an Index:
Syntax: CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2, ...);
The UNIQUE keyword is used when the combined values of the indexed columns must be
unique; it does not allow duplicate values to be inserted into the table.
Example: CREATE INDEX emp_ename_index ON Employee (Ename);
This creates an index on the employee name (Ename) column of the Employee table.
Removing an Index: Indexes can be dropped explicitly using the DROP INDEX command.
Syntax: DROP INDEX Index_name;
Example: DROP INDEX emp_ename_index;
** Views **
View is a virtual table on one or more tables. The table on which a view is created is
called as a “base table”.
We can restrict updates to the base table through a view. If any changes are made
to the base table, those changes are also reflected in the view.
Advantages of views:
We can provide security to the data.
We can provide limitation on data.
We can provide customized view for the data.
View uses little storage area.
It allows different users to view the same data in different ways at the same time.
It does not allow direct access to the tables of the data dictionary.
Disadvantages of views:
It can’t be indexed.
The view’s query is re-executed each time the view is used, which takes time.
When the base table is dropped, the view becomes inactive.
A view is a database object, so it occupies space.
Without its base table, a view will not work.
Updation is possible for simple views but not for complex views; complex views are
read-only.
Creating Views: Database views are created using the CREATE VIEW statement. Views can
be created from a single table, multiple tables or another view. To create a view, a user must
have the appropriate system privilege according to the specific implementation.
The basic CREATE VIEW syntax is as follows:
CREATE [OR REPLACE] VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
[WHERE condition];
View Resolution:
The SQL Server query processor treats indexed and non indexed views differently:
The rows of an indexed view are stored in the database in the same format as a table. If the
query optimizer decides to use an indexed view in a query plan, the indexed view is treated
the same way as a base table.
Only the definition of a non indexed view is stored, not the rows of the view. The query
optimizer incorporates the logic from the view definition into the execution plan it builds for
the SQL statement that references the non indexed view.
The logic used by the SQL Server query optimizer to decide when to use an indexed view is
similar to the logic used to decide when to use an index on a table. If the data in the indexed
view covers all or part of the SQL statement, and the query optimizer determines that an index
on the view is the low-cost access path, the query optimizer will choose the index regardless of
whether the view is referenced by name in the query.
Restrictions applicable while creating views: Views can be created referencing tables and
views only in the current database.
A view cannot be indexed.
A view cannot be Altered or renamed. Its columns cannot be renamed.
To alter a view, it must be dropped and re-created.
ANSI_NULLS and QUOTED_IDENTIFIER options should be turned on to create a
view.
All tables referenced in a view must be part of the same database.
Any user defined functions referenced in a view must be created with
SCHEMABINDING option.
Cannot use ROWSET, UNION, TOP, ORDER BY, DISTINCT, COUNT(*),
COMPUTE, COMPUTE BY in views.
Updating a View: A view can be updated under certain conditions, which are given below:
The SELECT clause may not contain the keyword DISTINCT.
The SELECT clause may not contain aggregate functions, GROUP BY or HAVING.
The view must be defined on a single base table.
View Materialization: Materializing a view is a technique borrowed from functional
languages, and it is sometimes described as a form of precomputation. As with other
forms of precomputation, database users typically use materialized views for performance
reasons, i.e. as a form of optimization.
Materialized views which store data based on remote tables are also known as snapshots.
In any database management system following the relational model, a view is a virtual table
representing the result of a database query. Whenever a query or an update addresses an
ordinary view's virtual table, the DBMS converts these into queries or updates against the
underlying base tables.
A materialized view takes a different approach: the query result is cached as a concrete
("materialized") table (rather than a view as such) that may be updated from the original base
tables from time to time. This enables much more efficient access, at the cost of extra storage
and of some data being potentially out-of-date. Materialized views find use especially in data
warehousing scenarios, where frequent queries of the actual base tables can be expensive.
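In Oracle, for example, a materialized view over the EMP table could be sketched as follows (the refresh options shown are one possible choice, not the only one):

```sql
-- Cache the per-department salary totals as a concrete table.
-- REFRESH COMPLETE ON DEMAND means the snapshot is rebuilt
-- only when a refresh is explicitly requested, so the cached
-- data may be out-of-date in between refreshes.
CREATE MATERIALIZED VIEW dept_sal_mv
REFRESH COMPLETE ON DEMAND
AS
SELECT deptno, SUM(sal) AS total_sal
FROM EMP
GROUP BY deptno;
```

Queries against dept_sal_mv then read the stored rows directly instead of re-aggregating EMP, trading storage and freshness for speed, exactly the trade-off described above.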
** Transactions **
Database Transaction is an atomic unit that contains one or more SQL statements.
It is a series of operations that performs as a single unit of work against a database.
It has a beginning and an end to specify its boundary.
Let's take a simple example of bank transaction, suppose a Bank clerk transfers Rs. 1000
from X's account to Y's account.
X's Account
open-account (X)
prev-balance = X.balance
curr-balance = prev-balance – 1000
X.balance = curr-balance
close-account (X)
Rs. 1000 is deducted from X's account and the new (current) balance is saved; after
completion of the transaction, the last step is closing the account.
Y's Account
open-account (Y)
prev - balance = Y.balance
curr - balance = prev-balance + 1000
Y.balance = curr-balance
close-account (Y)
Rs. 1000 is added to Y's account and the new (current) balance is saved; after
completion of the transaction, the last step is closing the account.
The above example defines a very simple and small transaction that shows how transaction
management actually works.
** Transaction Properties **
Following are the transaction properties, referred to by the acronym ACID. These
properties guarantee that database transactions are processed reliably.
1.Atomicity
2.Consistency
3.Isolation
4.Durability
1. Atomicity:
Atomicity defines that all operations of a transaction are either executed or none.
Atomicity is also known as 'All or Nothing': either all the operations are performed
or none at all.
It is maintained in the presence of deadlocks, CPU failures, disk failures, and database
and application software failures.
2. Consistency:
Consistency preserves the consistency of the database.
If execution of a transaction is successful, then the database remains in a consistent state. If
the transaction fails, then the transaction will be rolled back and the database will be
restored to a consistent state.
3. Isolation
Isolation defines that transactions are securely and independently processed at the same
time, without interference.
The operations cannot access or see the data in an intermediate state during a transaction.
Isolation is needed when there are concurrent transactions occurring at the same time.
4. Durability
Durability states that after successful completion of a transaction, the changes are made
permanent in the database.
Durability holds its latest updates even if the system fails or restarts.
It has the ability to recover committed transaction updates even if the storage media fails.
** Transaction States **
A transaction is a small unit of program which contains several low level tasks. It is an event
which occurs on the database. It has the following states,
1. Active
2. Partially Committed
3. Failed
4. Aborted
5. Committed
1. Active: Active is the initial state of every transaction. The transaction stays in the
Active state during execution.
2. Partially Committed: Partially committed state defines that the transaction has
executed the final statement.
3. Failed: Failed state defines that the execution of the transaction can no longer proceed
further.
4. Aborted: Aborted state defines that the transaction has rolled back and the database is
being restored to the consistent state.
5. Committed: If the transaction has completed its execution successfully, then it is said
to be committed.
** Discretionary Access Control **
DACs are discretionary because the subject (owner) can transfer access to objects or
information to other users. In other words, the owner determines object access privileges.
Recall that Lampson's gold standard identifies authorization, authentication, and audit as
essential mechanisms for computer security. We begin studying authorization, which controls
whether actions of principals are allowed, by considering access control. An access control
policy specifies access rights, which regulate whether requests made by principals should be
permitted or denied.
In access control, we refine the notion of a principal to be one of a:
user: a human
subject: a process executing on behalf of a user
object: the resource (such as a table or view) on which an action is requested
When no access right has been granted, the request should be denied; the principle of
Fail-safe Defaults says that denial should be the default.
To determine access is to decide whether a subject is allowed, according to some policy, to
take an action on an object.
** PL/SQL Block Structure **
A PL/SQL block has the following structure:
DECLARE
<variable declarations>
BEGIN
<SQL, PL/SQL statements>
EXCEPTION
<handling exceptions>
END;
Example: Program to add two numbers.
SQL> DECLARE
X NUMBER := &X;
Y NUMBER := &Y;
S NUMBER;
BEGIN
S := X + Y;
dbms_output.put_line('Sum = ' || S);
END;
/
The above program gives the following output.
Enter value for X: 50
Enter value for Y: 30
Sum = 80
In Oracle, to display the output of a variable or a string in a PL/SQL block we use
dbms_output.put_line( ). Here, “dbms_output” is a predefined package and put_line( ) is
one of the procedures of this package.
To enable the display of this output, we use the following command.
set serveroutput on
Declarations and Assignments: The assignment statement sets the current value of a variable,
field, parameter, or element that has been declared in the current scope.
The assignment operator (:=) in the assignment statement can also appear in a constant
or variable declaration.
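Both uses of := can be sketched in one small block (the names here are illustrative):

```sql
DECLARE
   pi  CONSTANT NUMBER := 3.14159;  -- := in a constant declaration
   r   NUMBER := 5;                 -- := in a variable declaration
   area NUMBER;
BEGIN
   area := pi * r * r;              -- := as the assignment statement
   dbms_output.put_line('Area = ' || area);
END;
/
```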
** Data types in PL/SQL **
Data type represents the type of data a variable will hold. Following are the different
types of data types.
1. Scalar data types
2. Composite types
3. Reference types.
Scalar types: All built-in data types are called scalar types. A scalar type allows only one
element.
Example: number(p), number(p, q), char, varchar2, date, blob, boolean, binary_integer,
binary_float, binary_double.
Declaration: variable datatype;
Ex: x number;
Composite type: These data types allow group of elements.
Example: record, table, arrays.
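A sketch of a composite (record) declaration, assuming the EMP table from earlier examples; %ROWTYPE makes a record whose fields match the columns of a table:

```sql
DECLARE
   emp_rec EMP%ROWTYPE;   -- record with one field per EMP column
BEGIN
   -- Fetch one employee's whole row into the record
   -- (&empno is supplied at runtime, as in the other examples).
   SELECT * INTO emp_rec FROM EMP WHERE empno = &empno;
   dbms_output.put_line(emp_rec.ename);
END;
/
```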
** Control Statements **
Control statements(structures) are programming constructs used to control the execution
flow in a program. The different control statements are listed below.
1. Conditional control structures:
a) if - then
b) if - else
c) if - elsif
2. Iterative control structures (loops)
a) Simple loop
b) While loop
c) For loop
3. Jumping control structures
a) Goto
b) Exit
c) Exit-when
Conditional control structures: These are used to execute the related statements based on the
condition specified.
if-then: It executes the related statements only when the condition is true.
Syntax: if condition then
Statements
end if;
Example: Program to find the given number is positive or not.
SQL> declare
x number : = &x;
begin
if (x>0) then
dbms_output.put_line (‘positive number’);
end if;
end;
/
Output: Enter value for x: 10
Positive number
if-else: It executes ‘if -block’ statements only when the condition is true and ‘else-block’
statements only when the condition is false.
Syntax: if condition then
statements
else
statements
end if;
Example: Program to find given number is even or odd
SQL> declare
x number: = &x;
begin
if mod(x,2) =0 then
dbms_output.put_line (‘even number’);
else
dbms_output.put_line(‘odd number’);
end if;
end;
/
Output: Enter value for x: 5
Odd number
if-elsif: In this, first block gets executed only when condition1 is true, and second block will be
executed only when condition1 is false and condition2 is true and so on.
Syntax: if condition1 then
Statements
elsif condition2 then
Statements
:
else
Statements
end if;
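As a sketch, the following block classifies a number as positive, negative or zero using if-elsif (mirroring the earlier if-then example):

```sql
SQL> declare
x number := &x;
begin
if x > 0 then
   dbms_output.put_line('positive number');
elsif x < 0 then
   dbms_output.put_line('negative number');
else
   dbms_output.put_line('zero');
end if;
end;
/
```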
Iterative control structures: These are used to execute related statements as long as the
condition is true.
Simple loop: It executes the related statements repeatedly. This is by nature an infinite
loop; to come out of it we need to use ‘exit’.
Syntax: Loop
Statements
end loop;
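Example (a sketch): printing 1 to 5 with a simple loop, using exit to terminate:

```sql
SQL> declare
a number := 1;
begin
loop
   dbms_output.put_line(a);
   a := a + 1;
   if a > 5 then
      exit;       -- without this, the loop would run forever
   end if;
end loop;
end;
/
```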
While loop: It executes the related statements as long as the condition is true, it stops the loop
when the condition is false.
Syntax:
while condition
Loop
Statements
end loop;
Example: Program to print 1 to 10 numbers.
SQL> declare
a number := 1;
begin
while a <= 10
loop
dbms_output.put_line(a);
a := a + 1;
end loop;
end;
/
Output: the numbers 1 to 10, each on its own line.
For loop: It executes the related expressions as long as the variable ‘var’ matches with the
values of the given range.
Syntax:
For var IN [REVERSE] startval .. endval
Loop
Statements
End loop;
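A sketch of the for-loop syntax above, printing 1 to 10; note that the loop variable needs no explicit declaration:

```sql
SQL> begin
for i in 1 .. 10
loop
   dbms_output.put_line(i);   -- i takes the values 1, 2, ..., 10
end loop;
end;
/
```

With the REVERSE keyword (for i in REVERSE 1 .. 10) the same range is traversed from 10 down to 1.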
Goto: It transfers control to a labelled statement unconditionally.
Example: Program to find the factorial of a given number using goto.
SQL> declare
n number :=&n;
f number: =1;
begin
<<xyz>>
f:=f*n;
n:=n-1;
if (n>0) then
goto xyz;
else
dbms_output.put_line (‘Factorial is = ‘||f);
end if;
end;
/
Output: Enter value for n: 5
Factorial is= 120
Exit-when: It stops the execution of a corresponding loop on meeting the given condition.
Syntax:
Loop
Statements
Exit when condition;
End loop;
Example: Program to print 1 to 10 numbers.
SQL> declare
a number := 1;
begin
loop
dbms_output.put_line(a);
a := a + 1;
exit when a > 10;
end loop;
end;
/
Output: the numbers 1 to 10, each on its own line.
** Exceptions **
Exception is a condition caused by a runtime error in the program. Handling runtime
errors is called as exception handling. Actually, when a runtime error occurs, a program gets
terminated abruptly, but exception handling ensures safe termination of the program.
Exceptions are of two types.
1. System defined exceptions: These are raised automatically by oracle when a
corresponding runtime error occurs.
2. User defined exceptions: These should be defined and raised by users.
System defined exception (or) predefined exceptions:
1) ZERO_DIVIDE: raised when trying to divide a number by zero.
2) VALUE_ERROR: raised when converting an inappropriate character string to a number.
3) INVALID_NUMBER: raised when the conversion of a character string to a number fails
in a SQL statement.
4) NO_DATA_FOUND: raised when no data is found during fetching.
5) TOO_MANY_ROWS: raised when more than one row is fetched by a SELECT INTO.
6) DUP_VAL_ON_INDEX: raised when entering duplicate values for a unique or primary
key attribute.
7) LOGIN_DENIED: raised when the login name or password is invalid while logging in.
8) STORAGE_ERROR: raised when PL/SQL runs out of memory.
9) PROGRAM_ERROR: raised on an internal error of PL/SQL.
10) INVALID_CURSOR: raised on an illegal cursor operation.
11) CURSOR_ALREADY_OPEN: raised when opening a cursor which is already open.
12) OTHERS: can catch any type of exception.
Syntax for Exception-handling:
DECLARE
Declaration of variables
BEGIN
Statements
EXCEPTION
WHEN exception-name1 THEN
Error-handling statements
WHEN OTHERS THEN
Error-handling statements
END;
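A sketch of handling the predefined ZERO_DIVIDE exception listed above:

```sql
SQL> declare
x number := 10;
y number := 0;
z number;
begin
z := x / y;     -- raises ZERO_DIVIDE
dbms_output.put_line(z);
exception
when ZERO_DIVIDE then
   -- control jumps here instead of terminating abruptly
   dbms_output.put_line('Division by zero is not allowed');
end;
/
```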
** Cursors **
A cursor is a pointer to a private memory area (the context area) that holds the rows returned
by a SQL statement. Cursors are of two types: implicit cursors, which Oracle creates
automatically for every SQL statement, and explicit cursors, which are declared and
controlled by the programmer.
Steps(Syntax) to use Explicit cursor: Opening, closing of cursor and fetching values from the
cursor should be written carefully.
DECLARE
CURSOR cursor-name IS SQL-Query;
BEGIN
OPEN cursor-name;
FETCH cursor-name INTO variables; /* repeated in a loop*/
CLOSE cursor-name;
END;
Example on Explicit cursor: Following is a program that prints the names of all the clerks
from “EMP” table using cursors.
DECLARE
N EMP.ENAME%TYPE;
CURSOR C1 IS SELECT ENAME FROM EMP WHERE JOB = 'CLERK';
BEGIN
OPEN C1;
LOOP
FETCH C1 INTO N;
EXIT WHEN C1%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(N);
END LOOP;
CLOSE C1;
END;
Example on Implicit cursor: Following is a program that updates the salary of all managers by
10% and prints the number of records that have been updated.
BEGIN
UPDATE EMP SET sal = sal + sal * 0.1 WHERE JOB = 'MANAGER';
IF SQL%FOUND THEN
dbms_output.put_line(SQL%ROWCOUNT || ' RECORDS UPDATED');
END IF;
END;
Cursor processing commands: To work with cursors, we need to use three cursor processing
commands: ‘OPEN’, ‘FETCH’ and ‘CLOSE’.
i) OPEN: By default the status of the cursor is closed. Opening is mandatory to fetch
values from the cursor. When we open a cursor, it executes the SQL command and
populates the cursor with data.
Syntax:
Open cursor-name;
ii) FETCH: FETCH is used to retrieve values from the cursor. It fetches values one by one
sequentially until no more rows are found in the cursor, i.e. the first time it points to the
first record, then the second record, and so on.
Syntax:
FETCH cursor-name INTO variable1, variable 2,….. ;
iii) CLOSE: This closes the cursor for process. Once, the cursor has been closed we cannot
fetch values from it
Syntax:
CLOSE cursor-name;
Cursor attributes: Cursor attributes are used to know the status of a cursor
Explicit cursor attributes:
1. %found: returns TRUE, if a cursor can fetch the records, otherwise FALSE
2. %notfound: returns TRUE, if a cursor cannot fetch the records, otherwise FALSE
3. %rowcount: holds the number of rows fetched so far.
4. %isopen: returns TRUE, if the cursor is in open state, otherwise FALSE.
Implicit cursor attributes:
1) SQL%found: if no records are affected, it returns false.
2) SQL%notfound: if no records are affected, it returns true.
3) SQl%rowcount: holds the number of rows affected by the related commands.
4) SQL%isopen: After executing related commands, it returns false.
** Subprograms **
A PL/SQL subprogram is a named PL/SQL block that can be invoked with a set of
parameters. A subprogram can be either a procedure or a function. Typically, we use a
procedure to perform an action and a function to compute and return a value.
Stored procedures: A stored procedure is a named PL/SQL block which performs one or more
specific tasks. A procedure has a header and a body. The header consists of the name of the
procedure and the parameters.
The body consists of declaration section, execution section and exception section similar
to a general PL/SQL block. It is also a database object stored in a data-dictionary. A procedure
can be called in any program.
A procedure takes three types of parameters: IN, OUT and IN OUT.
We can only read values from IN parameter.
We can write values into OUT parameter.
We can read and write values, if the parameter is of type IN OUT.
Syntax:
CREATE [OR REPLACE] PROCEDURE procedure-name
[(parameter [IN | OUT | IN OUT] datatype, ...)] IS/AS
Declarations
BEGIN
Statements
EXCEPTION
Statements
END;
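A minimal sketch of a procedure with an IN and an OUT parameter, assuming the EMP table with empno and sal columns:

```sql
-- p_empno is read-only (IN); p_sal is written by the
-- procedure and read by the caller (OUT).
CREATE OR REPLACE PROCEDURE get_sal(p_empno IN  EMP.empno%TYPE,
                                    p_sal   OUT EMP.sal%TYPE) IS
BEGIN
   SELECT sal INTO p_sal FROM EMP WHERE empno = p_empno;
END;
/
```

It could then be called from an anonymous block that declares a variable of type EMP.sal%TYPE and passes it as the OUT argument.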
Stored functions: A function is a named PL/SQL block which is similar to a procedure. The
main difference between a procedure and a function is that a function returns a value but a
procedure does not.
Syntax:
CREATE [OR REPLACE] FUNCTION function-name [(argument list)] RETURN
datatype IS/AS
Declarations
BEGIN
Statements
RETURN value;
EXCEPTION
Statements
END;
Executing a function: To execute a function, we simply call it in a PL/SQL block. It can also
be executed using EXEC with a DBMS_OUTPUT.PUT_LINE statement, or by calling it in a
SELECT statement.
Example: Following is a function that returns the factorial of a given number.
CREATE OR REPLACE FUNCTION FACT(X NUMBER) RETURN NUMBER IS
K NUMBER: =1;
BEGIN
FOR I IN 1.. X
LOOP
K:=K*I;
END LOOP;
RETURN K;
END;
Three different ways of executing the above function are shown below:
i) SQL> EXEC dbms_output.put_line(FACT(4));
ii) SQL> BEGIN
dbms_output.put_line (‘factorial=’||fact (4));
END;
iii) SQL> select FACT(4) from dual;
** Packages **
Packages are schema objects that group logically related PL/SQL types, variables, and
subprograms. A package will have two mandatory parts:
Package specification
Package body or definition
Package Specification: The specification is the interface to the package. It just DECLARES
the types, variables, constants, exceptions, cursors, and subprograms that can be referenced
from outside the package. In other words, it contains all information about the content of the
package, but excludes the code for the subprograms.
All objects placed in the specification are called public objects. Any subprogram not in
the package specification but coded in the package body is called a private object.
The following code shows a package specification having a single procedure. We can
have many global variables defined and multiple procedures or functions inside a package.
CREATE PACKAGE cust_sal AS
PROCEDURE find_sal(c_id customers.id%type);
END cust_sal;
/
When the above code is executed at the SQL prompt, it produces the following result:
Package created.
Package Body: The package body has the codes for various methods declared in the package
specification and other private declarations, which are hidden from the code outside the
package.
The CREATE PACKAGE BODY statement is used for creating the package body. The
following code shows the package body declaration for the cust_sal package created above. It
is assumed that a CUSTOMERS table already exists.
CREATE OR REPLACE PACKAGE BODY cust_sal AS
PROCEDURE find_sal(c_id customers.id%TYPE) IS
c_sal customers.salary%TYPE;
BEGIN
SELECT salary INTO c_sal FROM customers WHERE id = c_id;
dbms_output.put_line('Salary: ' || c_sal);
END find_sal;
END cust_sal;
/
** Triggers **
Trigger is a procedure that gets executed automatically when an event occurs. The events
could be INSERT or UPDATE or DELETE. A trigger fires before or after a transaction takes
place. The events are:
BEFORE INSERT
AFTER INSERT
BEFORE UPDATE
AFTER UPDATE
BEFORE DELETE
AFTER DELETE
Types of triggers: There are mainly two types of triggers.
1) Row level triggers (for each row): These triggers get executed once for each record
that has been affected by the event.
2) Statement level triggers (for each statement): These triggers get executed only
once per triggering statement, regardless of how many rows are affected.
Characteristics:
A trigger is invoked before or after a row is inserted, updated or deleted.
A trigger may be row level or statement level.
A trigger does not take parameters.
A trigger is table dependent.
Each table may have one or more triggers.
Advantages:
It can be used to enforce complex constraints.
It is a security mechanism that provides security to the table, e.g. by tracking transactions.
It can be used to ensure proper data entry.
It can replicate a table for backup purposes.
It can be used to define default values for a database table.
It can be used to interrupt a transaction when it is inappropriate.
Syntax:
CREATE [OR REPLACE] TRIGGER trigger-name BEFORE/AFTER
[INSERT] [OR UPDATE] [OR DELETE] ON table-name
[FOR EACH ROW] [WHEN condition]
BEGIN
Statements;
EXCEPTION
Statements;
END;
Example: Following trigger inserts or updates values of ename and job in uppercase even if we
enter the data as lowercase strings.
CREATE OR REPLACE TRIGGER T1 BEFORE INSERT OR UPDATE ON
EMP FOR EACH ROW
BEGIN
:NEW.ENAME := UPPER(:NEW.ENAME);
:NEW.JOB := UPPER(:NEW.JOB);
END;
After creating the trigger, if we issue the following commands with column values in
lower cases, trigger automatically converts them into uppercase letters.
SQL> UPDATE EMP SET JOB = ‘manager’ WHERE ENAME = ‘SMITH’;
Now, in EMP table, SMITH’s job becomes ‘MANAGER’ i.e. in uppercase letters.
SQL> INSERT INTO EMP(EMPNO, ENAME, JOB) VALUES (1234, ‘ramesh’,
‘clerk’);
The new record in the EMP table will be 1234, RAMESH, CLERK, i.e. name and job
in uppercase letters.
UNIT- IV
** Transaction Management **
Database Transaction is an atomic unit that contains one or more SQL statements.
It is a series of operations that performs as a single unit of work against a database.
It has a beginning and an end to specify its boundary.
Let's take a simple example of bank transaction, Suppose a Bank clerk transfers Rs. 1000
from X's account to Y's account.
X's Account
open-account (X)
prev-balance = X.balance
curr-balance = prev-balance – 1000
X.balance = curr-balance
close-account (X)
Rs. 1000 is deducted from X's account and the new (current) balance is saved; after
completion of the transaction, the last step is closing the account.
Y's Account
open-account (Y)
prev - balance = Y.balance
curr - balance = prev-balance + 1000
Y.balance = curr-balance
close-account (Y)
Rs. 1000 is added to Y's account and the new (current) balance is saved; after
completion of the transaction, the last step is closing the account.
The above example defines a very simple and small transaction that tells how the transaction
management actual works.
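The transfer above can be sketched as a single atomic unit of work. This is a minimal illustrative sketch, not any DBMS API: the dictionary-based accounts and function name are assumptions, and the Rs. 1000 amount follows the example.

```python
def transfer(accounts, source, target, amount):
    """Transfer `amount` atomically: either both updates happen or neither."""
    prev_source = accounts[source]
    prev_target = accounts[target]
    if prev_source < amount:
        # The transaction aborts before touching either account.
        raise ValueError("insufficient funds")
    accounts[source] = prev_source - amount   # debit X
    accounts[target] = prev_target + amount   # credit Y

# Assumed starting balances, for illustration only.
accounts = {"X": 5000, "Y": 2000}
transfer(accounts, "X", "Y", 1000)
```

After the call, X holds 4000 and Y holds 3000; the total across accounts is unchanged, which is the consistency the ACID properties below formalize.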
** Transaction Properties **
Following are the transaction properties, referred to by the acronym ACID. These
properties guarantee that database transactions are processed reliably.
1. Atomicity
2. Consistency
3. Isolation
4. Durability
1. Atomicity: Atomicity requires that either all operations of a transaction are executed or
none are. Atomicity is also known as 'All or Nothing': either all the operations are performed,
or none of them is. It must be maintained in the presence of deadlocks, CPU failures, disk
failures, and database and application software failures.
2. Consistency: Consistency requires that after the transaction is finished, the database must
remain in a consistent state.
3. Isolation: Isolation requires that concurrently executing transactions do not interfere with
each other; each transaction appears to execute as if it were alone in the system.
4. Durability: Durability requires that once a transaction has committed, its effects persist
even if the system subsequently fails.
** Transaction States **
A transaction is a small unit of program which contains several low level tasks. It is an event
which occurs on the database. It has the following states,
1. Active
2. Partially Committed
3. Failed
4. Aborted
5. Committed
1. Active: Active is the initial state of every transaction. The transaction stays in the
Active state during execution.
2. Partially Committed: The transaction has executed its final statement.
3. Failed: The execution of the transaction can no longer proceed further.
4. Aborted: The transaction has been rolled back and the database is being restored to a
consistent state.
5. Committed: If the transaction has completed its execution successfully, then it is said
to be committed.
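The five states and their legal transitions form a small state machine. The sketch below is purely illustrative (the state names follow the list above; the table and function are assumptions, not part of any DBMS):

```python
# Legal transitions between transaction states.
TRANSITIONS = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": set(),     # terminal: transaction rolled back
    "committed": set(),   # terminal: effects are durable
}

def can_move(state, new_state):
    """Return True if a transaction may move from `state` to `new_state`."""
    return new_state in TRANSITIONS[state]
```

For example, a transaction may go from failed to aborted, but a committed transaction can never return to the active state.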
** Database(DBMS) Architecture **
A Database Management system is not always directly available for users and
applications to access and store data in it. A Database Management system can be centralized
(all the data stored at one location), decentralized (multiple copies of database at different
locations) or hierarchical, depending upon its architecture.
A 1-tier DBMS architecture also exists; this is when the database is directly available to the
user for storing data. Generally such a setup is used for local application development,
where programmers communicate directly with the database for quick response.
Database Architecture is logically of two types:
1. 2-tier DBMS architecture
2. 3-tier DBMS architecture
2-tier DBMS Architecture: 2-tier DBMS architecture includes an Application layer between
the user and the DBMS, which is responsible to communicate the user's request to the database
management system and then send the response from the DBMS to the user.
An application interface known as ODBC (Open Database Connectivity) provides an API
that allows client-side programs to call the DBMS. Most DBMS vendors provide ODBC drivers
for their DBMS.
Such architecture provides the DBMS extra security as it is not exposed to the End User
directly. Also, security can be improved by adding security and authentication checks in the
Application layer too.
3-tier DBMS Architecture: 3-tier DBMS architecture is the most commonly used architecture
for web applications.
It is an extension of the 2-tier architecture. In the 2-tier architecture, we have an
application layer which can be accessed programmatically to perform various operations on the
DBMS; the application generally understands the database access language and processes end
users' requests to the DBMS. The 3-tier architecture adds a presentation layer (typically a web
browser or GUI) on top of this, so the end user never interacts with the DBMS directly.
** Concurrency Control **
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions.
Need For Concurrency Control:
Process of managing simultaneous operations on the database without having them
interfere with one another.
Prevents interference when two or more users are accessing database simultaneously and
at least one is updating data.
Although two transactions may be correct in themselves, interleaving of operations may
produce an incorrect result.
1) The Lost Update Problem: This problem occurs when two transactions that access the
same database items have their operations interleaved in a way that makes the value of some
database item incorrect. Successfully completed update is overridden by another user.
2) The Temporary Update (or Dirty Read) Problem: This problem occurs when one
transaction updates a database item and then the transaction fails for some reason. The
updated item is accessed by another transaction before it is changed back to its original
value. Occurs when one transaction can see intermediate results of another transaction
before it has committed.
3) The Incorrect Summary Problem: If one transaction is calculating an aggregate summary
function on a number of records while other transactions are updating some of these records,
the aggregate function may calculate some values before they are updated and others after
they are updated. Occurs when transaction reads several values but second transaction
updates some of them during execution of first.
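The lost update problem can be reproduced deterministically by interleaving the read and write steps of two transactions by hand. The schedule and values below are illustrative assumptions:

```python
balance = 100

# T1 and T2 each read the balance, then write back balance + their deposit.
t1_read = balance        # T1 reads 100
t2_read = balance        # T2 reads 100, before T1 has written
balance = t1_read + 50   # T1 writes 150
balance = t2_read + 30   # T2 writes 130, overwriting T1's update

# T1's deposit of 50 is lost: the final balance is 130, not 180.
print(balance)  # → 130
```

Each transaction is correct in isolation; it is the interleaving of their reads and writes that produces the incorrect result, which is exactly what concurrency control must prevent.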
** Deadlocks **
A deadlock is a condition where in two or more tasks are waiting for each other in order
to be finished but none of the task is willing to give up the resources that other task needs. In
this situation no task ever gets finished and is in waiting state forever.
Neither transaction can continue because each transaction in the set is on a waiting queue,
waiting for one of the other transactions in the set to release the lock on an item. Transactions
whose lock requests have been refused are queued until the lock can be granted.
A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other. Thus in a deadlock, two transactions are mutually
excluded from accessing the next record required to complete their transactions, also called a
deadly embrace.
Example: A deadlock exists between the two transactions A and B in the following example:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction-A has acquired lock on X and is waiting to acquire lock on Y. While,
Transaction-B has acquired lock on Y and is waiting to acquire lock on X. But, none of them
can execute further.
Transaction-A                       Time    Transaction-B
---                                 t0      ---
Lock (X) (acquired lock on X)       t1      ---
---                                 t2      Lock (Y) (acquired lock on Y)
Lock (Y) (request lock on Y)        t3      ---
Wait                                t4      Lock (X) (request lock on X)
Wait                                t5      Wait
Wait                                t6      Wait
Wait                                t7      Wait
Terminating all the processes involved in the deadlock, or terminating them one by one
until the deadlock is resolved, can be solutions, but neither of these approaches is good.
Terminating all processes is costly, and the partial work done by the processes is lost.
Terminating them one by one takes a lot of time, because each time a process is terminated
it must be checked whether the deadlock is resolved or not. Thus, the best approach is to
consider process age and priority while terminating processes during a deadlock condition.
Resource Preemption: Another approach is to preempt resources from some processes and
allocate them to others until the deadlock cycle is broken.
Deadlock Prevention: The deadlock prevention technique is used in two-phase locking. We
have learnt that if all four Coffman conditions hold true then a deadlock can occur, so
preventing one or more of them prevents the deadlock. The Coffman conditions can be
attacked as follows:
Removing mutual exclusion: All resources must be sharable, meaning more than one process
can hold a resource at a time. This approach is practically impossible.
Removing hold and wait: This condition is removed if the process acquires all the resources
it needs before starting. Another way to remove it is to enforce a rule that a process may
request resources only when it holds none.
Removing no preemption: Allow a resource to be taken away (preempted) from a process
that holds it while it waits for other resources.
Avoiding circular wait: This can be avoided if the resources are maintained in a hierarchy
and each process holds resources in increasing order of precedence, which prevents a circular
wait. Another way is to enforce a one-resource-per-process rule: a process can request a new
resource only once it releases the resource it currently holds. This also avoids the circular
wait.
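The resource-hierarchy idea above can be sketched with ordinary locks: every transaction acquires its locks in one fixed global order, so no circular wait can form. The lock names and helper function are assumptions for illustration:

```python
import threading

# A global set of named locks; sorted name order is the lock hierarchy.
locks = {"X": threading.Lock(), "Y": threading.Lock()}

def locked_in_order(names):
    """Acquire the named locks in a fixed (sorted) order, then release them.

    Because every caller acquires locks in the same global order, two
    callers can never hold each other's next lock, so no deadlock cycle
    can arise.
    """
    ordered = sorted(names)            # impose the hierarchy
    for name in ordered:
        locks[name].acquire()
    try:
        return ordered                 # the critical section would run here
    finally:
        for name in reversed(ordered):
            locks[name].release()
```

Transaction A asking for (X, Y) and transaction B asking for (Y, X) both actually lock X first and then Y, so the circular wait of the timeline above cannot occur.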
Deadlock Avoidance:
Deadlock can be avoided if resources are allocated in such a way that it avoids the deadlock
occurrence. There are two algorithms for deadlock avoidance.
Wait/Die
Wound/Wait
Here is the table representation of resource allocation for each algorithm. Both of these
algorithms take process age into consideration while determining the best possible way of
resource allocation for deadlock avoidance.
Situation                                         Wait/Die               Wound/Wait
Older process needs a resource held by younger    Older process waits    Younger process dies
Younger process needs a resource held by older    Younger process dies   Younger process waits
Every transaction has a timestamp associated with it, and the ordering is determined by
the age of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is
two seconds younger and the priority would be given to the older one.
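The table above reduces to a simple timestamp comparison, where a smaller timestamp means an older transaction. The function names and return values below are illustrative assumptions:

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger requester dies (aborts)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the younger holder;
    a younger requester waits for the older holder."""
    return "wound holder" if requester_ts < holder_ts else "wait"
```

For example, if a transaction stamped at time 1 requests a resource held by one stamped at time 2, Wait-Die makes the older one wait, while Wound-Wait aborts the younger holder.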
In addition, every data item is given the latest read and write-timestamp. This lets the
system know when the last ‘read and write’ operation was performed on the data item.
Timestamp methods are discussed under the following headings:
1. Granule Timestamps
2. Timestamp Ordering
3. Conflict Resolution in Timestamps
1. Granule Timestamps: Granule timestamp is a record of the timestamp of the last
transaction to access it. Each granule accessed by an active transaction must have a granule
timestamp. A separate record of last Read and Write accesses may be kept. Granule timestamp
may cause additional Write operations for Read accesses if they are stored with the granules.
The problem can be avoided by maintaining granule timestamps as an in-memory table. The
table may be of limited size, since conflicts may only occur between current transactions. An
entry in a granule timestamp table consists of the granule identifier and the transaction
timestamp. The record containing the largest (latest) granule timestamp removed from the table
is also maintained. A search for a granule timestamp, using the granule identifier, will either be
successful or will use the largest removed timestamp.
2. Timestamp Ordering: Following are the three basic variants of timestamp-based methods of
concurrency control:
(a) Total timestamp ordering
(b) Partial timestamp ordering: The algorithm allows the granule to be read by any transaction
younger than the last transaction that updated the granule. A transaction is aborted if it tries to
update a granule that has previously been accessed by a younger transaction. The partial
timestamp ordering algorithm aborts fewer transactions than the total timestamp ordering
algorithm, at the cost of extra storage for granule timestamps.
(c) Multiversion timestamp ordering:
The multiversion timestamp ordering algorithm stores several versions of an updated
granule, allowing each transaction to see a consistent set of versions of all the granules it
accesses. It thus reduces the conflicts that result in transaction restarts to Write-Write
conflicts only. Each update of a granule creates a new version, with an associated granule
timestamp. A transaction that requires read access to a granule sees the youngest version that
is older than the transaction, that is, the version whose timestamp is equal to or immediately
below the transaction's timestamp.
3. Conflict Resolution in Timestamps:
To deal with conflicts in timestamp algorithms, some transactions involved in conflicts
are made to wait while others are aborted. Following are the main strategies of conflict
resolution in timestamps:
WAIT-DIE:
The older transaction waits for the younger if the younger has accessed the granule first.
The younger transaction is aborted (dies) and restarted if it tries to access a granule after
an older concurrent transaction.
WOUND-WAIT:
The older transaction pre-empts the younger by suspending (wounding) it if the younger
has accessed the granule first. The younger transaction waits if it tries to access a granule
after an older concurrent transaction.
Data item granularity significantly affects concurrency control performance: the degree
of concurrency is low for coarse granularity and high for fine granularity.
Example of data item granularity:
1. A field of a database record (an attribute of a tuple)
2. A database record (a tuple or a relation)
3. A disk block
4. An entire file
5. The entire database
The accompanying diagram (omitted here) illustrates a hierarchy of granularity from coarse
(the entire database, DB) through files (f1, f2) down to fine (individual records r111 … r11j).
Granularity of data items and Multiple Granularity Locking
To manage such hierarchy, in addition to read and write, three additional locking modes,
called intention lock modes are defined:
Intention-shared (IS): indicates that a shared lock(s) will be requested on some
descendent nodes(s).
Intention-exclusive (IX): indicates that an exclusive lock(s) will be requested on some
descendent node(s).
Shared-intention-exclusive (SIX): indicates that the current node is locked in shared
mode but an exclusive lock(s) will be requested on some descendent nodes(s).
These locks are applied using the following compatibility matrix for the Intention-shared
(IS), Intention-exclusive (IX), Shared (S), Shared-intention-exclusive (SIX) and Exclusive (X)
modes, where 'yes' means the two modes are compatible:
       IS    IX    S     SIX   X
IS     yes   yes   yes   yes   no
IX     yes   yes   no    no    no
S      yes   no    yes   no    no
SIX    yes   no    no    no    no
X      no    no    no    no    no
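The matrix can be checked mechanically. The sketch below simply encodes the table above as a nested dictionary (the mode names follow the text; the function is an illustrative assumption):

```python
# Compatibility matrix for multiple-granularity lock modes.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def compatible(held, requested):
    """True if `requested` can be granted on a node already locked in `held` mode."""
    return COMPAT[held][requested]
```

For example, a node held in IS mode still admits an IX request, but a node held in X mode admits no further locks at all.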
** Database Recovery **
There are many situations in which a transaction may not reach a commit or abort point.
1. An operating system crash can terminate the DBMS processes
2. The DBMS can crash
3. The system might lose power
4. A disk may fail or other hardware may fail.
5. Human error can result in deletion of critical data.
In any of these situations, data in the database may become inconsistent or lost. For example,
if a transaction has completed 30 out of 40 scheduled writes to the database when the DBMS
crashes, then the database may be in an inconsistent state as only part of the transaction’s work
was completed.
Database Recovery is the process of restoring the database and the data to a consistent state.
This may include restoring lost data up to the point of the event (e.g. system crash).
1. Logical errors − where a transaction cannot complete because it has some code error or
some internal error condition.
2. System errors − where the database system itself terminates an active transaction because
the DBMS is not able to execute it, or it has to stop because of some system condition. For
example, in case of deadlock or resource unavailability, the system aborts an active
transaction.
3. Network failure − occurs when the machines of a client–server configuration or a
distributed database system, which are connected by communication networks, lose
connectivity.
4. Disk failure − occurs when there are issues with hard disks, such as the formation of bad
sectors, a disk head crash, unavailability of the disk, etc.
5. Media failure − the most dangerous failure, because it takes more time to recover from
than any other kind of failure. A disk controller or disk head crash is a typical example of
media failure. Natural disasters like floods, earthquakes and power failures can also damage
the data.
We can conclude from the above that, for recovery, the states of all the executed
transactions must be verified.
Transaction and Recovery: A transaction has to abort when it fails to execute or when it
reaches a point from where it can’t go any further. This is called transaction failure where only
a few transactions or processes are hurt.
Reasons for a transaction failure could be:
Logical errors
System errors
When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items. Transactions are made of various operations, which
are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a
whole must be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions, which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be rolled
back.
No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −
1. Maintaining the log of each transaction and writing it onto stable storage before actually
modifying the database (log-based recovery).
2. Maintaining shadow paging, where the changes are first made in volatile memory and the
actual database is updated later.
The following sections describe log-based recovery and its use with concurrent transactions.
1. Log-Based Recovery:
Logs are the sequence of records that maintain the records of actions performed by a
transaction.
In Log – Based Recovery, log of each transaction is maintained in some stable storage. If
any failure occurs, it can be recovered from there to recover the database.
The log contains the information about the transaction being executed, values that have
been modified and transaction state.
All these information will be stored in the order of execution.
Example: Assume a transaction to modify the address of an employee. The following logs are
written for this transaction,
Log 1: Transaction is initiated, writes 'START' log.
Log: <Tn START>
Log 2: Transaction modifies the address from 'Pune' to 'Mumbai'.
Log: <Tn Address, 'Pune', 'Mumbai'>
Log 3: Transaction is completed. The log indicates the end of the transaction.
Log: <Tn COMMIT>
There are two methods of creating the log files and updating the database,
1. Deferred Database Modification
2. Immediate Database Modification
1. In Deferred Database Modification, all the logs for the transaction are first created and
stored in stable storage; the database is updated with those steps only after the transaction
completes. In the above example, the three log records are created and stored in the storage
system first, and the database is updated afterwards.
2. In Immediate Database Modification, the database is modified immediately after each log
record is created. In the above example, the database is modified at each step of the log
entry: after the first log entry the transaction fetches the record, the second log entry is
followed by updating the employee's address, and the third log entry is followed by
committing the database changes.
Checkpoint
Checkpoint acts like a benchmark.
It is a mechanism where all the previous logs are removed from the system and stored
permanently in a storage system.
It declares a point before which the database management system was in consistent state
and all the transactions were committed.
It is a point of synchronization between the database and the transaction log file.
It involves operations like writing log records in main memory to secondary storage,
writing the modified blocks in the database buffers to secondary storage and writing a
checkpoint record to the log file.
The checkpoint record contains the identifiers of all transactions that are active at the
time of the checkpoint.
Recovery
When concurrent transactions crash and recover, the checkpoint is added to the
transaction and recovery system recovers the database from failure in following manner,
1. The recovery system reads the log files backwards from the end to the last checkpoint, so
that it can reverse transactions where necessary.
2. It maintains an undo-list and a redo-list.
3. It puts a transaction in the redo-list if it sees both <Tn, START> and <Tn, COMMIT> in
the log.
4. It puts a transaction in the undo-list if it sees <Tn, START> but no <Tn, COMMIT>.
All the transactions in the undo-list are undone and their logs are removed. All the
transactions in the redo-list are redone and their logs are saved.
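The classification into redo and undo lists can be sketched as a single scan over the log. Log records are modelled here as simple Python tuples, which is an assumption for illustration only:

```python
def classify(log):
    """Split transactions into a redo set (committed) and an undo set
    (started but never committed), as a recovery manager would."""
    started, committed = set(), set()
    for txn, action in log:
        if action == "START":
            started.add(txn)
        elif action == "COMMIT":
            committed.add(txn)
    redo = started & committed   # both <Tn, START> and <Tn, COMMIT> seen
    undo = started - committed   # <Tn, START> seen, no <Tn, COMMIT>
    return redo, undo

# T1 committed before the crash; T2 was still in flight.
log = [("T1", "START"), ("T1", "COMMIT"), ("T2", "START")]
redo, undo = classify(log)
```

Here T1 lands in the redo set and T2 in the undo set, matching steps 3 and 4 above.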
Nested Transaction Model:
A nested transaction model as proposed by Moss is a generalization of the flat transaction
model that allows nesting. A nested transaction forms a tree of transactions with the root being
called a top-level transaction and all other nodes called nested transactions (subtransactions).
Secondary permission: Grants to the groups and roles if the user is a member.
Public permission: Grants to all users publicly.
Context-sensitive permission: Grants to the trusted context role.
Authorization can be given to users based on the categories below:
System-level authorization
While working with the SQL statements, the Database authorization model considers the
combination of the following permissions:
Permissions granted to the primary authorization ID associated with the SQL statements.
Secondary authorization IDs associated with the SQL statements.
Granted to PUBLIC.
Security Threats:
This word has been used several times already. Security threat is any hostile agent which
randomly or with use of specialized techniques can obtain or change information in the
information system. Random security threats are:
Human errors - unintentional violations such as incorrect input or wrong use of
applications.
Intended security threats can be categorized according to their originator:
authorized users - abuse of their privileges
** Access Controls **
Access control is a way of limiting access to a system or to physical or virtual resources.
In computing, access control is a process by which users are granted access and certain
privileges to systems, resources or information.
In access control systems, users must present credentials before they can be granted
access. In physical systems, these credentials may come in many forms, but credentials that
can't be transferred provide the most security.
For example, a key card may act as an access control and grant the bearer access to a
classified area. Because this credential can be transferred or even stolen, it is not a secure way
of handling access control.
A more secure method for access control involves two-factor authentication. The person
who desires access must show credentials and a second factor to corroborate identity. The
second factor could be an access code, a PIN or even a biometric reading.
There are three factors that can be used for authentication:
1. Something only known to the user, such as a password or PIN
2. Something that is part of the user, such as a fingerprint, retina scan or another biometric
measurement
3. Something that belongs to the user, such as a card or a key
For computer security, access control includes the authorization, authentication and audit of
the entity trying to gain access. Access control models have a subject and an object.
The subject - the human user - is the one trying to gain access to the object - usually the
software. In computer systems, an access control list contains a list of permissions and the users
to whom these permissions apply.
Such data can be viewed by certain people and not by other people and is controlled by
access control. This allows an administrator to secure information and set privileges as to what
information can be accessed, who can access it and at what time it can be accessed.
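An access control list as described above can be modelled as a mapping from objects to users and their granted privileges. All names below (the object, users, and helper function) are hypothetical, for illustration only:

```python
# Hypothetical ACL: object -> user -> set of granted privileges.
acl = {
    "salaries": {
        "alice": {"read", "update"},
        "bob": {"read"},
    },
}

def is_allowed(acl, user, obj, privilege):
    """True if `user` holds `privilege` on `obj` according to the ACL."""
    return privilege in acl.get(obj, {}).get(user, set())
```

Checks fail closed: a user or object missing from the list simply has no privileges, which is the behaviour an administrator setting privileges would expect.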
** Data Backup and Recovery **
In a computer system we have primary and secondary memory storage. Primary memory
storage devices - RAM is a volatile memory which stores disk buffer, active logs, and other
related data of a database. It stores all the recent transactions and the results too. When a query
is fired, the database first fetches in the primary memory for the data, if it does not exist there,
then it moves to the secondary memory to fetch the record. Fetching the record from primary
memory is always faster than secondary memory.
What happens if the primary memory crashes? All the data in the primary memory is lost
and we cannot recover the database.
In such cases, we can follow one of the following steps so that the data in primary memory
is not lost.
We can keep a copy of the primary memory contents in the database, with all the logs and
buffers copied into the database periodically. In case of any failure we will then not lose
all the data: we can recover the data up to the point it was last copied to the database.
We can create checkpoints at several places so that data is copied to the database.
Suppose the secondary memory itself crashes. What happens to the data stored in it? All the
data is lost and cannot be recovered. We have to think of some alternative solution, because
we cannot afford the loss of data in a huge database.
There are three methods used to back up the data in the secondary memory, so that it can be
recovered if there is any failure.
1. Remote Backup: A copy of the database is created and stored on a remote network. This
database is periodically updated with the current database so that it stays in sync with the
data and other details. The remote database can be updated manually, which is called
offline backup, or it can be backed up online, where the data is updated in the current and
remote databases simultaneously. In the online case, as soon as the current database fails,
the system automatically switches over to the remote database and starts functioning; the
user will not even know that there was a failure.
2. In the second method, database is copied to memory devices like magnetic tapes and
kept at secured place. If there is any failure, the data would be copied from these tapes to
bring the database up.
3. As the database grows, it is an overhead to back up the whole database, hence only the log
files are backed up at regular intervals. These log files contain all the information about the
transactions being made, so by replaying them the database can be recovered. In this
method, log files are backed up at regular intervals and the full database is backed up, for
example, once a week.
There are two types of data backup – physical data backup and Logical data backup.
Physical data backup: The physical data backup includes physical files like data files, log
files, control files, redo- undo logs etc. They are the foundation of the recovery mechanism in
the database as they provide the minute details about the transactions and modification to the
database
Logical data backup: Logical backup includes backup of logical data like tables, views,
procedures, functions etc. Logical data backup alone is not sufficient to recover the database,
as it provides only the structural information. The physical data backup actually provides the
minute details about the database and is very important for recovery.
** Data integrity **
Data integrity is the maintenance and assurance of the accuracy and consistency of
data over its entire life-cycle, and is a critical aspect of the design, implementation and usage of
any system which stores, processes, or retrieves data. The term is broad in scope and may have
widely different meanings depending on the specific context – even under the same general
umbrella of computing. It is at times used as a proxy term for data quality, while data validation
is a pre-requisite for data integrity. Data integrity is the opposite of data corruption.
The overall intent of any data integrity technique is the same: ensure data is recorded
exactly as intended (such as a database correctly rejecting mutually exclusive possibilities) and,
upon later retrieval, ensure the data is the same as it was when it was originally recorded. In
short, data integrity aims to prevent unintentional changes to information. Data integrity is not
to be confused with data security, the discipline of protecting data from unauthorized parties.
Any unintended change to data as the result of a storage, retrieval or processing
operation, including malicious intent, unexpected hardware failure, and human error, is a
failure of data integrity. If the change is the result of unauthorized access, it may also be a
failure of data security. Depending on the data involved, this could manifest as something as
benign as a single pixel in an image appearing a different color than was originally recorded,
as the loss of vacation pictures or a business-critical database, or even as catastrophic loss of
human life in a life-critical system.
Integrity types:
Physical integrity: Physical integrity deals with challenges associated with correctly storing
and fetching the data itself. Challenges with physical integrity may include electromechanical
faults, design flaws, material fatigue, corrosion, power outages, natural disasters, acts of war
and terrorism, and other special environmental hazards such as ionizing radiation, extreme
temperatures, pressures and g-forces. Ensuring physical integrity includes methods such as
redundant hardware, an uninterruptible power supply, certain types of RAID arrays, radiation
hardened chips, error-correcting memory, use of a clustered file system, using file systems that
employ block level checksums such as ZFS, storage arrays that compute parity calculations
such as exclusive-or, or use a cryptographic hash function, and even having a watchdog timer on
critical subsystems.
Physical integrity often makes extensive use of error detecting algorithms known as
error-correcting codes. Human-induced data integrity errors are often detected through the use
of simpler checks and algorithms, such as the Damm algorithm or Luhn algorithm. These are
used to maintain data integrity after manual transcription from one computer system to another
by a human intermediary (e.g. credit card or bank routing numbers). Computer-induced
transcription errors can be detected through hash functions.
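The Luhn algorithm mentioned above can be sketched in a few lines. This is the standard check-digit scheme used on credit card numbers; the function name is an assumption:

```python
def luhn_valid(number):
    """Check a digit string with the Luhn algorithm.

    From the rightmost digit, double every second digit; if doubling
    yields a two-digit value, subtract 9. The number is valid when the
    total sum is divisible by 10.
    """
    digits = [int(d) for d in str(number)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A single mistyped digit changes the sum modulo 10, so manual transcription errors of the kind described above are caught.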
In production systems, these techniques are used together to ensure various degrees of
data integrity. For example, a computer file system may be configured on a fault-tolerant RAID
array, but might not provide block-level checksums to detect and prevent silent data corruption.
As another example, a database management system might be compliant with the ACID
properties, but the RAID controller or hard disk drive's internal write cache might not be.
Logical integrity: This type of integrity is concerned with the correctness or rationality of a
piece of data, given a particular context. This includes topics such as referential integrity and
entity integrity in a relational database or correctly ignoring impossible sensor data in robotic
systems. These concerns involve ensuring that the data "makes sense" given its environment.
Challenges include software bugs, design flaws, and human errors. Common methods of
ensuring logical integrity include things such as check constraints, foreign key constraints,
program assertions, and other run-time sanity checks.
Both physical and logical integrity often share many common challenges such as human
errors and design flaws, and both must appropriately deal with concurrent requests to record and
retrieve data, the latter of which is entirely a subject on its own.
** Data Encryption **
Encryption is a security method in which information is encoded in such a way that only
authorized user can read it. It uses encryption algorithm to generate ciphertext that can only be
read if decrypted.
Types of Encryption: There are two types of encryptions schemes as listed below:
Symmetric Key encryption
Public Key encryption
Symmetric Key encryption: Symmetric key encryption algorithm uses same cryptographic
keys for both encryption and decryption of cipher text.
Public Key encryption: Public key encryption algorithm uses pair of keys, one of which is a
secret key and one of which is public. These two keys are mathematically linked with each
other.
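The defining property of a symmetric-key scheme, that the same key encrypts and decrypts, can be illustrated with a toy XOR cipher. This sketch is for illustration only and is in no way a secure cipher:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the (repeating) key.

    Because XOR is its own inverse, applying the same key twice
    returns the original data -- the symmetric-key property.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Assumed plaintext and key, purely for demonstration.
ciphertext = xor_cipher(b"SMITH", b"k3y")
plaintext = xor_cipher(ciphertext, b"k3y")   # the same key decrypts
```

In a public-key scheme, by contrast, the encryption and decryption keys differ and are only mathematically linked, so no single shared secret has to be distributed.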
A DBMS can use encryption to protect information in certain situations where the normal
security mechanisms of the DBMS are not adequate. For example, an intruder may steal tapes
containing some data or tap a communication line. By storing and transmitting data in an
encrypted form, the DBMS ensures that such stolen data is not intelligible to the intruder. Thus,
encryption is a technique to provide privacy of data.
Encrypting data gives rise to serious technical problems at the level of physical storage
organization. For example, indexing over data stored in encrypted form can be very
difficult.
RAID 1: RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a
copy of data to all the disks in the array. RAID level 1 is also called mirroring and provides
100% redundancy in case of a failure.
RAID 2: RAID 2 records Error Correction Code using Hamming distance for its data, striped
on different disks. Like level 0, each data bit in a word is recorded on a separate disk and ECC
codes of the data words are stored on different set disks. Due to its complex structure and high
cost, RAID 2 is not commercially available.
RAID 3: RAID 3 stripes the data onto multiple disks. The parity bit generated for a data word
is stored on a separate disk. This technique makes it possible to recover from single-disk failures.
RAID 4: In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level
4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement
RAID.
RAID 5: RAID 5 writes whole data blocks onto different disks, but the parity bits generated for
data block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
RAID 6: RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.
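The parity used by RAID levels 3 to 6 is a byte-wise exclusive-or of the data blocks, and a lost block is rebuilt by XOR-ing the parity with the surviving blocks. A minimal sketch, with assumed two-byte blocks for illustration:

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID parity computation)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

d1, d2, d3 = b"\x01\x02", b"\x0f\x0f", b"\xa0\x55"
parity = xor_blocks([d1, d2, d3])      # stored on the parity disk
# The disk holding d2 fails: rebuild it from the parity and the survivors.
rebuilt = xor_blocks([parity, d1, d3])
```

Because XOR is associative and self-inverse, XOR-ing the parity with every surviving block cancels them out and leaves exactly the missing block, which is why a single-disk failure is recoverable.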