
Database Management Systems

Mahatma Gandhi University, Nalgonda.

B.Sc (Computer Science): III Year

Study Material
(2017-18)

https://mguugcs.blogspot.in
B.SC-Computer Science-III Year DBMS Study Material

UNIT-I

Introduction
Good decisions require good information, and good information is derived from data. Data are likely to be managed most efficiently when they are stored in a database.

Data:
Data are raw facts. The word "raw" indicates that the facts have not yet been processed.

Information:
Processed and meaningful data are known as information.
For example, suppose that you want to know what the users of a computer lab think of its services. You would begin with a web survey form that enables users to respond to your questions. When the survey forms have been completed, the forms' raw data are saved to a data repository. Although you now have the facts in hand, they are not particularly useful in this format. Therefore, you transform the raw data into a data summary (information). Now it is possible to get quick answers to many questions.

In this "information age," the production of accurate, relevant, and timely information is the key to good decision making, and good decision making is the key to business survival in a global market. We are now in the "knowledge age."
 Data are the foundation of information.
 Information is the bedrock of knowledge.
 Knowledge implies familiarity, awareness, and understanding of information.
 A key characteristic of knowledge is that "new" knowledge can be derived from "old" knowledge.

Data management is a discipline that focuses on the proper generation, storage, and retrieval of data.

Database and DBMS
A database is a shared, integrated structure that stores a collection of end-user data and metadata.
 End-user data are the raw facts of interest to the end user.
 Metadata provide a description of the data in the database. For example, the metadata store information such as the name of each data element, its data type, whether the data element can be left empty, and so on. Metadata provide a more complete picture of the data in the database.
A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data in the database.
 Examples of DBMSs: Oracle, Teradata, Microsoft SQL Server, MySQL, PostgreSQL, MS Access, etc.

Role and advantages of the DBMS
The DBMS serves as the intermediary between the user and the database. We access the data in a database through the DBMS. The DBMS receives all application requests and fulfils those requests.
The DBMS provides the following advantages:
I. Improved data sharing: The DBMS offers access to more data and allows end users to share their data.
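The raw-data-to-information transformation in the survey example can be sketched in a few lines of Python; the responses and rating labels here are invented purely for illustration:

```python
from collections import Counter

# Raw facts: individual survey responses (hypothetical data)
raw_responses = ["good", "poor", "good", "excellent", "good", "poor"]

# Summarizing the raw data turns it into information:
# the share of users giving each rating.
summary = Counter(raw_responses)
total = len(raw_responses)
report = {rating: count / total for rating, count in summary.items()}

print(report)
```

The `report` dictionary is the "data summary" described above: it answers a question such as "what fraction of users rated the lab good?" at a glance, which the individual raw responses could not.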


II. Improved data security: Access to the data by many users creates risks to data security. The DBMS provides a framework for better enforcement of data privacy and security policies.
III. Better data integration: The DBMS provides an integrated view of the organization's operations.
IV. Minimized data inconsistency: Data inconsistency exists when different versions of the same data appear in different places. The probability of data inconsistency is greatly reduced by the DBMS.
V. Improved data access: The DBMS makes it possible to produce quick answers to queries. A query is a request issued to the DBMS, for example, to read or update the data.
VI. Improved decision making: Better-managed data provide better-quality information and knowledge, which is useful for making better decisions.
VII. Increased end-user productivity: The availability of data and information empowers end users and improves their productivity.
****

Database Systems – Disadvantages
Database systems have the following disadvantages:
 Increased costs: Training, licensing, and regulation costs are high.
 Management complexity: It is a complex process to manage changes in the database system.
 Maintaining currency: The database system must be kept current.
 Frequent upgrade/replacement cycle: Changes in the DBMS may require hardware upgrades too.
***

Types of Databases
Databases can be classified according to:
 The number of users,
 The database location, and
 The type of usage.

According to the number of users:
According to the number of users, databases are classified as single-user or multiuser.
I) A single-user database supports only one user at a time.
 If user A is using a single-user database, then users B and C must wait until user A completes his work.
 A single-user database that runs on a personal computer is called a desktop database.
II) A multiuser database supports multiple users at the same time.
 When the database is used by a specific department within an organization, it is called a workgroup database.
 When the database is used by the entire organization, it is known as an enterprise database.

According to the location:
According to the location, databases are classified as centralized or distributed.
I) Centralized database: A database that supports data located at a single site is called a centralized database.
II) Distributed database: A database that supports data distributed across several sites is called a distributed database.

According to the type of usage:
According to the type of usage, databases are classified as operational databases or data warehouses.
I) Operational database: A database that is designed to support a company's day-to-day operations is known as an operational database (also called a transactional or production database).
II) Data warehouse: A database that focuses primarily on storing large amounts of data and generating the information required for decision support.
***


Why Database design is important?
Database design refers to the activities that focus on the design of the database structure. A database's structure must be designed carefully; otherwise, even a good DBMS will perform poorly with a badly designed database.
A database designer should keep in mind that designing a transactional database, a data warehouse, a centralized database, or a distributed database each requires a different approach.
A well-designed database facilitates better data management. A poorly designed database may lead to bad decision making, and bad decision making can lead to the failure of an organization. That is why database design is very important.
***

Files and File Systems
Basic File Terminology:
 Field: A character or group of characters that has a specific meaning. A field is used to define and store data.
 Record: A collection of fields is known as a record. It describes a person, place, or thing.
 File: A collection of related records is known as a file. For example, a file might contain the records for the students currently enrolled at Mahatma Gandhi University.
***

Problems with File System Data Management
The following are the problems associated with file systems:
I) Lengthy development times: In a file system approach, even the simplest data-retrieval task requires extensive programming. Programmers need to specify what must be done and how to do it.
II) Difficulty of getting quick answers: In a file system, programs must be written to produce even the simplest report. It does not support ad hoc queries.
III) Complex system administration: As the number of files in the system expands, system administration becomes very difficult. Even a simple file system requires several file management programs.
IV) Lack of security and limited data sharing: A file system lacks security and offers limited data sharing. Sharing data among multiple users introduces many security risks.
V) Extensive programming: In a file system environment, it is difficult to make changes. Even simple file system management requires several programs.
VI) Structural and data dependence: In a file system, access to a file is dependent on its structure. For example, to add a new field to a file, all of the file system programs must be modified.
VII) Data redundancy: With a file system, the same data might be stored in different locations, which leads to data redundancy. Uncontrolled data redundancy leads to poor data security and data inconsistency.
VIII) Lack of design and data-modeling skills: Data-modeling skills are a very important part of the design process. File systems do not have design and data-modeling features.
****

Database Systems

Database System Components
A group of components that define and control the collection, storage, management, and use of data within a database environment is known as a database system.

A database system has five major parts:
1) Hardware: Hardware refers to all of the system's physical devices. It includes computers, storage devices, printers, network devices, and other devices.
2) Software: Database systems require three types of software:
I. Operating system software,
II. DBMS software, and
III. Application programs and utilities.
Operating system software: It manages all hardware components and runs all other software. Examples: Microsoft Windows, Linux, Mac OS, UNIX, and MVS.


DBMS software: It manages the database within the database system. Examples: Oracle Corporation's Oracle, Microsoft's SQL Server, Sun's MySQL, and IBM's DB2.
Application programs and utility software: These are used to access and manipulate data in the DBMS. Utilities are the software tools used to manage the database system's computer components.
3) People: People include all users of the database system. There are five types of users in a database system:
I. System administrators,
II. Database administrators,
III. Database designers,
IV. System analysts and programmers, and
V. End users.

Figure: Database System Environment.

System administrators monitor the database system's general operations.
Database administrators, also known as DBAs, manage the DBMS and control its operations.
Database designers design the database structure. They are also known as database architects.
System analysts and programmers design and implement the application programs.
End users are the people who use the application programs to run the organization's daily operations. For example, sales clerks, supervisors, managers, and directors are all classified as end users.
4) Procedures: Procedures are the instructions and rules that direct the design and use of the database system. Procedures play an important role in a company because they enforce standards within the organization.
5) Data: Data are the collection of facts stored in the database. Data are the raw material from which information is generated.
***

DBMS Functions
A DBMS performs several important functions. They include:
1. Data dictionary management
2. Data storage management
3. Data transformation and presentation
4. Security management
5. Multi-user access control
6. Backup and recovery management
7. Data integrity management
8. Database access languages and application programming interfaces

 Data dictionary management: The DBMS stores metadata in a data dictionary and uses the dictionary to manage data component structures and relationships.

 Data storage management: The DBMS creates and manages the structures for data storage. This is important for database performance tuning.

 Data transformation and presentation: The DBMS transforms entered data into the required data structures and formats the data to meet the user's logical expectations.

 Security management: The DBMS creates a security system that enforces user security and data privacy. Security rules determine how the DBMS may be accessed.

 Multi-user access control: The DBMS allows multiple users to access the database concurrently without compromising its integrity.

 Backup and recovery management: The DBMS provides backup and data recovery to ensure data safety and integrity.


 Data integrity management: The DBMS provides integrity rules to minimize data redundancy and maximize data consistency.

 Database access languages and application programming interfaces: The DBMS provides data access through Structured Query Language (SQL). The DBMS also provides application programming interfaces to COBOL, C, Java, etc.
***

Data Models

The importance of Data models
A data model is a graphical representation of real-world data structures. Data models represent data structures and their characteristics, relations, and constraints.
Importance of data models: Data models are a communication tool that can facilitate interaction among the designer, the applications programmer, and the end user. A well-developed data model provides a better understanding of the organization. A database should be designed so that it can support all categories of users in an organization. So, a proper data model is necessary to create a good database.
The basic data-modeling components are entities, attributes, relationships, and constraints.

The basic building blocks of Data models
The basic building blocks of all data models are:
 Entities
 Attributes
 Relationships, and
 Constraints

An entity is anything about which data are to be collected and stored. For example: a person, a place, or a thing.

An attribute is a characteristic of an entity. For example, a CUSTOMER entity has attributes such as customer last name, customer first name, customer phone, customer address, and customer credit limit.

A relationship is an association among entities. For example, an agent can serve many customers, and each customer may be served by one agent.
Data models use three types of relationships: one-to-many [1:M], many-to-many [M:N], and one-to-one [1:1].
One-to-many (1:M) relationship: A painter paints many different paintings, but each one of them is painted by only one painter. Thus, the painter (the "one") is related to the paintings (the "many"). Therefore, the relationship "PAINTER paints PAINTING" is 1:M.
Many-to-many (M:N) relationship: An employee may learn many job skills, and each job skill may be learned by many employees. Therefore, the relationship "EMPLOYEE learns SKILL" is M:N.
One-to-one (1:1) relationship: A retail company's store is managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is 1:1.

A constraint is a restriction placed on the data. Constraints are normally expressed in the form of rules. For example:
 An employee's salary must have a value between 6,000 and 350,000.
 A student's GPA must be between 0.00 and 4.00.
 Each class must have one and only one teacher.
***

Business Rules

A business rule is a brief, strict, and clear description of a policy, procedure, or principle within a specific organization.
Examples of business rules are as follows:


 A customer may generate many invoices.
 An invoice is generated by only one customer.
 A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.
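Business rules like the third one are ultimately enforced in the database as constraints. As a minimal sketch (the table and column names are invented for illustration), the 10-to-30-employees rule could become a SQL CHECK constraint, demonstrated here with Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical schema: the rule "a training session cannot be scheduled
# for fewer than 10 or more than 30 employees" becomes a CHECK constraint
# that the DBMS enforces automatically on every insert and update.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE training_session (
        session_id INTEGER PRIMARY KEY,
        topic      TEXT NOT NULL,
        enrolled   INTEGER NOT NULL CHECK (enrolled BETWEEN 10 AND 30)
    )
""")

conn.execute("INSERT INTO training_session VALUES (1, 'DBMS basics', 25)")  # accepted

try:
    # Only 5 enrolled: violates the business rule, so the DBMS rejects it.
    conn.execute("INSERT INTO training_session VALUES (2, 'SQL', 5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Because the rule lives in the database schema rather than in application code, every program that touches the table is forced to obey it.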

Discovering Business Rules:
The main sources of business rules are company managers, policy makers, and written documentation. A faster and more direct source is direct interviews with end users.

Translating business rules into data model components:
A noun in a business rule is taken as an entity, and a verb is taken as a relationship among the entities. For example, the business rule "a customer may generate many invoices" contains two nouns (customer and invoices) and a verb (generate) that associates the nouns. So, here:
 Customer and Invoice should be represented by entities.
 There is a "generate" relationship between Customer and Invoice.
***

The Evolution of Data Models
The efforts towards better data management led to several different models. Those are:

The Hierarchical Model:
 The hierarchical model was developed in the 1960s. It was used to manage large amounts of data, such as those of the Apollo Moon landing project in 1969.
 Its structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments, and shows a parent-child relationship.
 The hierarchical model shows a one-to-many (1:M) relationship. (Each parent can have many children, but each child has only one parent.)
Limitations of the hierarchical model:
 It was complex to implement,
 It was difficult to manage, and
 It lacked structural independence.
 There were no standards for how to implement the model.

The Network Model:
 The network model was created to represent complex data relationships, to improve database performance, and to enforce a database standard.
 The Conference on Data Systems Languages (CODASYL) created the Database Task Group (DBTG) in the late 1960s.
 The network model represents the network database as a collection of records in 1:M relationships. The network model allows a record to have more than one parent. Here, a relationship is called a set. Each set contains an owner record and a member record, and represents a 1:M relationship between the owner and the member.
Limitations of the network model:
 The network model became too cumbersome.
 No ad hoc query capability.
 Limited data independence.


The Relational Model:
 The relational model was introduced in 1970 by E. F. Codd in his paper "A Relational Model of Data for Large Shared Data Banks".
 A relation (a table) is a matrix composed of rows and columns. Each row in a relation is called a tuple. Each column represents an attribute.
 The relational data model is implemented through a very sophisticated relational database management system (RDBMS).
 Tables are related to each other through the sharing of a common attribute.
 The relational data model / RDBMS uses Structured Query Language (SQL) to translate user queries into instructions for retrieving the requested data.

The Entity Relationship Model (ERM)
 Peter Chen introduced the ER data model in 1976.
 An E-R model is the graphical representation of entities and their relationships in a database structure.
 ER models are normally represented in an entity relationship diagram (ERD).
The ER model is based on the following components:
 Entity: An entity is anything about which data are to be collected and stored. An entity is represented in the ERD by a rectangle, also known as an entity box. The name of the entity should be a noun, written in the center of the rectangle. The entity name is generally written in capital letters and in the singular form: PAINTER rather than PAINTERS, and EMPLOYEE rather than EMPLOYEES.
 Attributes: Attributes are the characteristics of the entity. For example, the entity EMPLOYEE will have attributes such as a UID (Aadhaar number), a last name, and a first name.
 Relationships: Relationships are the associations among data. There are mainly three types of relationships: one-to-many (1:M), many-to-many (M:N), and one-to-one (1:1). The name of the relationship is usually a verb. For example, a PAINTER paints many PAINTINGs; an EMPLOYEE learns many SKILLs; an EMPLOYEE manages a STORE.
The different types of relationships can be represented in ER notation using Chen notation and Crow's Foot notation.

The Object-Oriented Data Model (OODM)
 The object-oriented data model (OODM) represents both data and their relationships in a single structure known as an object. The OODM is the basis for the object-oriented database management system (OODBMS).
The OO data model is based on the following components:
 An object is an abstraction of a real-world entity. Attributes describe the properties of an object. For example, a PERSON object includes the attributes Name, Social Security Number, and Date of Birth.
 A class is a collection of similar objects with shared attributes and methods.
 Inheritance is the ability of a class to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses of the class PERSON.
***
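The OODM's class-and-inheritance idea maps directly onto object-oriented programming languages. A minimal Python sketch of the PERSON/CUSTOMER/EMPLOYEE example (the specific attributes and method are invented for illustration):

```python
# A class is a collection of similar objects with attributes and methods.
class Person:
    def __init__(self, name, date_of_birth):
        self.name = name
        self.date_of_birth = date_of_birth

    def describe(self):
        return f"{self.name} (born {self.date_of_birth})"

# CUSTOMER and EMPLOYEE inherit Person's attributes and methods,
# then add attributes of their own.
class Customer(Person):
    def __init__(self, name, date_of_birth, credit_limit):
        super().__init__(name, date_of_birth)
        self.credit_limit = credit_limit

class Employee(Person):
    def __init__(self, name, date_of_birth, salary):
        super().__init__(name, date_of_birth)
        self.salary = salary

c = Customer("Akhila", "1995-06-10", 50_000)
print(c.describe())  # describe() is inherited from Person
```

A `Customer` object carries both its own attribute (`credit_limit`) and everything inherited from `Person`, which is exactly the reuse that inheritance provides in the OODM.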


Degrees of Data Abstraction

Data Abstraction Model
In the early 1970s, the American National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC) defined a framework for data modeling based on degrees of data abstraction. The ANSI/SPARC architecture defines three levels of data abstraction: external, conceptual, and internal.

The External Model:
 The external model is the end users' view of the data environment.
 End users are the people who use the application programs to generate information.
 End users view their data subsets as separate from other departments.
 ER diagrams are used to represent the external views.
 A specific representation of an external view is known as an external schema.

The Conceptual Model:
 The conceptual model represents a global view of the entire database.
 The conceptual model is also known as a conceptual schema.
 The most widely used conceptual model is the ER model. The ERD is used to graphically represent the conceptual schema.
 The conceptual model is independent of both software and hardware.

The Internal Model:
 The internal model is the representation of the database as "seen" by the DBMS.
 An internal schema represents the internal model.
 The internal model is hardware-independent. Therefore, a change in storage devices or even a change in operating systems will not affect the internal model.

The Physical Model:
 The physical model operates at the lowest level of abstraction. It describes the way the data are saved on storage media.
 The physical model requires the definition of both the physical storage devices and the (physical) access methods required to reach the data within those storage devices.
 The physical model is both software- and hardware-dependent.
***

The Relational Database Model

A logical view of Data:
The relational data model allows the logical representation of the data and its relationships. It stores the data in the form of a logical construct known as a relation. A relation is also known as a table.


Table structure and contents
 A table is a two-dimensional structure with rows and columns.
 A table is also called a relation.
 A table contains an entity set. For example, a STUDENT table contains a collection of entity occurrences, each representing a student.
 E. F. Codd proposed the relational model.

Characteristics of a Table:
1. A table is a two-dimensional structure composed of rows and columns.
2. Each table row (tuple) represents a single entity occurrence of an entity set.
3. Each table column represents an attribute.
4. Each row/column intersection represents a single data value.
5. All values in a column must be of the same data format.
6. The order of the rows and columns is immaterial to the DBMS.
7. Each table must have a primary key.

Relational Keys
A key consists of one or more attributes that determine other attributes. In the relational model, keys are important because they uniquely identify each row, establish relationships among tables, and maintain data integrity.
The following are the different types of keys:
Superkey: An attribute (or combination of attributes) that uniquely identifies each row in a table.
Candidate key: A minimal (irreducible) superkey; a superkey without unnecessary attributes.
Primary key: A candidate key selected to uniquely identify all other attribute values in a table. It cannot contain null values.
Secondary key: An attribute (or combination of attributes) used strictly for data retrieval purposes.
Foreign key: A foreign key (FK) is an attribute whose values must match the primary key values in the related table.

Integrity Rules
The rules that are essential to maintain data integrity are known as integrity rules.
Entity integrity: Entity integrity specifies that a table must have a primary key, and that the primary key must not be null and cannot have duplicate values.
For example, consider the following STUDENT table:
SNO     SNAME     DOB        COURSE  PHONE
468001  Akhila    10-JUN-95  B.SC    9848012345
468002  Bharath   12-DEC-95  B.SC    9848112345
468003  Charitha  24-AUG-95  B.SC    9848212345
468004  Dharani   01-JAN-95  B.SC    9848312345
468005  Eesha     02-FEB-95  B.SC    9848412345
The above STUDENT table has entity integrity because it has a primary key (SNO) with no null values and no duplicate values.

Referential Integrity
Referential integrity specifies that foreign key (FK) values must match the primary key values in the related table.
For example, consider the EMP and DEPT tables (shown as figures).
The two tables, EMP and DEPT, maintain referential integrity because the foreign key of EMP (DEPTNO) matches the values of the primary key in DEPT (DEPTNO).
*****
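Both integrity rules can be demonstrated with Python's built-in sqlite3 module. The schema below loosely follows the EMP/DEPT example; the row values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,             -- entity integrity: no duplicates
        ename  TEXT NOT NULL,
        deptno INTEGER REFERENCES dept(deptno)  -- referential integrity
    )
""")

conn.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING')")
conn.execute("INSERT INTO emp VALUES (7369, 'SMITH', 10)")  # FK matches: accepted

try:
    # No department 99 exists, so this FK value cannot match: rejected.
    conn.execute("INSERT INTO emp VALUES (7499, 'ALLEN', 99)")
except sqlite3.IntegrityError as e:
    print("referential integrity violation:", e)

try:
    # Duplicate primary key 7369: entity integrity violation, rejected.
    conn.execute("INSERT INTO emp VALUES (7369, 'DUPLICATE', 10)")
except sqlite3.IntegrityError as e:
    print("entity integrity violation:", e)
```

The DBMS, not the application, raises the errors, which is what keeps the integrity rules from being bypassed.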


Relational Set Operators

Relational algebra defines the theoretical way of manipulating the data in a database. It uses the following eight relational operators:
1. UNION
2. INTERSECT
3. DIFFERENCE
4. PRODUCT
5. SELECT
6. PROJECT
7. JOIN
8. DIVIDE

1. UNION: It combines all rows from two tables, excluding duplicate rows. The two tables must be union-compatible. When two tables share the same number of columns, with the same names and the same domains, they are said to be union-compatible.

2. INTERSECT: It yields only the rows that appear in both tables. For an INTERSECT, the tables must be union-compatible. For example, we cannot use INTERSECT if one of the attributes is numeric and the other is character.

3. DIFFERENCE: It yields all rows in one table that are NOT found in the other table; that is, it subtracts one table from the other. The tables must be union-compatible.

4. PRODUCT: It yields all possible pairs of rows from two tables, also known as the Cartesian product. If one table has four rows and the other has two rows, the PRODUCT yields a list of 4 x 2 = 8 rows.
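The four set-based operators can be sketched with Python sets standing in for union-compatible tables; the rows here are invented tuples of (name, course):

```python
# Each table is a set of rows; the two tables are union-compatible
# because every row has the same attributes in the same order.
t1 = {("Akhila", "B.SC"), ("Bharath", "B.SC")}
t2 = {("Bharath", "B.SC"), ("Charitha", "B.SC")}

union = t1 | t2          # UNION: all rows from both tables, duplicates removed
intersect = t1 & t2      # INTERSECT: only rows that appear in both tables
difference = t1 - t2     # DIFFERENCE: rows in t1 that are NOT in t2

# PRODUCT pairs every row of one table with every row of the other,
# so 2 rows x 2 rows yields 4 pairs.
product = {(a, b) for a in t1 for b in t2}

print(len(union), len(intersect), len(difference), len(product))  # 3 1 1 4
```

Note how UNION silently drops the duplicate ("Bharath", "B.SC") row, exactly as the relational operator requires.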


5. SELECT: It is also known as RESTRICT. It yields all rows that satisfy a given condition. SELECT yields a horizontal subset of the table.

6. PROJECT: It yields all values for selected attributes. In other words, PROJECT yields a vertical subset of a table.

7. JOIN: It allows information to be combined from two or more tables. JOIN is the real power behind the relational database. It joins the tables by their common attributes.
(a) Natural join: A natural join links tables by selecting only the rows with common values in their common attributes. A natural join is the result of a three-stage process:
1. First, a PRODUCT of the tables is created.
2. Second, a SELECT is performed on the result of the PRODUCT to keep only the rows whose common-attribute values match.
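All three stages of a natural join — PRODUCT, then SELECT, then a final PROJECT that removes the duplicated copy of the common attribute — can be sketched in plain Python. The EMP/DEPT rows and the common attribute `deptno` are invented for illustration:

```python
# Two toy tables as lists of dicts; deptno is the common attribute.
emp = [{"ename": "SMITH", "deptno": 10}, {"ename": "ALLEN", "deptno": 20}]
dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 30, "dname": "SALES"}]

# Stage 1: PRODUCT - every emp row paired with every dept row (2 x 2 = 4 pairs).
product = [(e, d) for e in emp for d in dept]

# Stage 2: SELECT - keep only the pairs whose common attribute matches.
selected = [(e, d) for e, d in product if e["deptno"] == d["deptno"]]

# Stage 3: PROJECT - keep a single copy of the common attribute.
natural_join = [{"ename": e["ename"], "deptno": e["deptno"], "dname": d["dname"]}
                for e, d in selected]

print(natural_join)  # only the matched row survives
```

ALLEN (deptno 20) has no matching department, so the row is dropped: the final outcome contains only the matched rows.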


3. Third, a PROJECT is performed on the result of the SELECT.
The final outcome of a natural join yields a table that has only the matched rows.

(b) Equijoin: It joins the tables on the basis of an equality condition. The outcome of an equijoin does not eliminate duplicate columns. The equijoin uses the equality comparison operator (=) in the join condition. If any other comparison operator is used, the join is called a theta join.
(c) Outer join: In an outer join, the matched pairs are retained, and any unmatched values in the other table are left null.
 A left outer join yields all of the rows of the left table in the join, including those that do not have a matching value in the right table.
 A right outer join yields all of the rows in the right table in the join, including those that do not have matching values in the left table.

8. DIVIDE: The DIVIDE operation uses one single-column table (i.e., column "a") as the divisor and one two-column table (i.e., columns "a" and "b") as the dividend. The two tables must have the common column "a". The output of the DIVIDE operation is a single column of "b" values: those whose associated column "a" values match every value in the divisor.

Data Dictionary & System Catalog
The data dictionary provides a detailed description of all tables in the database. It contains all of the attribute names and their characteristics. In short, the data dictionary contains metadata—data about data.

Figure: A sample data dictionary.

The data dictionary is sometimes called "the database designer's database" because it records the design decisions about tables and their structures.

System Catalog
Like the data dictionary, the system catalog contains metadata. The system catalog can be described as a detailed system data dictionary. It describes all objects within the database, including data about table names, the table's creator and creation date, the number of columns in each table, indexes, authorized users, access privileges, etc.
Modern RDBMSs provide only a system catalog, so a designer's data dictionary information may be derived from the system catalog.

Homonyms and Synonyms
Homonym: The word homonym indicates the use of the same attribute name to label different attributes. For example, C_NAME is used to label the customer name in the CUSTOMER table and also a name in the CONSULTANT table.
Synonym: A synonym is the opposite of a homonym and indicates the use of different names to describe the same attribute. For example, car and auto refer to the same object.
 Both homonyms and synonyms should be avoided in database design.


Data Redundancy
Data redundancy exists when the same data are stored unnecessarily in different places. Uncontrolled data redundancy causes several data integrity problems.
Data redundancy leads to:
 Data inconsistency: Data inconsistency exists when different and conflicting versions of the same data appear in different places.
 Data anomalies: Data redundancy forces a field value to be changed in many different locations. If the change is not made successfully in every location, it may lead to a data anomaly.
Some common data anomalies are:
 Update anomalies: Redundant data must be updated in all of its locations; otherwise, an update anomaly occurs.
 Insertion anomalies: An insertion anomaly is the inability to add data to the database due to the absence of other data.
 Deletion anomalies: A deletion anomaly is the unintended loss of data due to the deletion of other data.

Indexes
An index is an orderly arrangement used to access rows in a table. An index is composed of an index key and a set of pointers. Each key points to the location of the data identified by the key.
A DBMS uses indexes for many different purposes. Indexes can be used to retrieve data ordered by a specific attribute, and they play an important role in the DBMS: to support a table's primary key, the DBMS automatically creates a unique index on the primary key column. A table can have many indexes, but each index is associated with only one table.

Codd's Rules
In 1985, Dr. E. F. Codd published a list of 12 rules to define a relational database system:
Rule No. 1. Information: All information in a relational database must be logically represented in rows and columns within tables.
Rule No. 2. Guaranteed Access: Every value in a table must be accessible through a combination of table name, primary key, and column name.
Rule No. 3. Systematic Treatment of Nulls: Nulls must be represented and treated in a systematic way, independent of data type.
Rule No. 4. Dynamic On-Line Catalog Based on the Relational Model: The metadata must be stored and managed in tables, just like ordinary data.
Rule No. 5. Comprehensive Data Sublanguage: The relational database must support one well-defined, declarative language with support for data definition, view definition, data manipulation, integrity constraints, authorization, and transaction management.
Rule No. 6. View Updating: Any view that is theoretically updatable must be updatable through the system.
Rule No. 7. High-Level Insert, Update, and Delete: The database must support set-level inserts, updates, and deletes.
Rule No. 8. Physical Data Independence: Application programs and ad hoc facilities are logically unaffected when storage structures are changed.
Rule No. 9. Logical Data Independence: Application programs and ad hoc facilities are logically unaffected when changes are made to the table structures.
Rule No. 10. Integrity Independence: All relational integrity constraints must be definable in the relational language and stored in the system catalog.
Rule No. 11. Distribution Independence: End users and application programs are unaware of and unaffected by the data location (distributed or local databases).
Rule No. 12. Nonsubversion: If the system supports low-level access to the data, there must not be a way to bypass the integrity rules of the database.
Rule Zero: All 12 rules are based on the notion that for a system to be considered relational, it must use its relational facilities exclusively to manage the database.
*****
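The index behavior described in the Indexes section above can be sketched with Python's sqlite3: an explicit CREATE INDEX for ordered retrieval, plus the unique index the DBMS creates automatically for a primary key (index and column names are illustrative; other DBMSs name their automatic indexes differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A composite primary key makes SQLite create a unique index automatically.
conn.execute("""CREATE TABLE STUDENT (
    RNO   TEXT,
    SNAME TEXT,
    MARKS INTEGER,
    PRIMARY KEY (RNO, SNAME))""")
# An explicit index, usable to retrieve rows ordered by a specific attribute.
conn.execute("CREATE INDEX idx_student_marks ON STUDENT (MARKS)")

# Both indexes are visible in the catalog, associated with one table.
indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'STUDENT'")]
print(indexes)
```

The automatically created index (named `sqlite_autoindex_...` in SQLite) is what guarantees primary key uniqueness.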


UNIT-II
Data Modeling and Normalization

The Entity Relationship Model (ERM)
 Data modeling is the first step in the database design process. Entity-relationship models are extensively used for data modeling.
 An E-R model is the graphical representation of entities and their relationships in a database structure.
 Peter Chen introduced the ER data model in 1976.
 ER models are represented in an entity relationship diagram (ERD).
 An ERD can be drawn using different notations, such as:
 Chen notation
 Crow's Foot notation
 UML notation
 The Chen notation favors conceptual modeling.
 The Crow's Foot notation favors a more implementation-oriented approach.
 The UML notation can be used for both conceptual and implementation modeling.

Entity
An entity is anything about which data is to be collected and stored in a database. In a relational database, the word entity corresponds to a table, and each table row represents an entity instance. In both the Chen and Crow's Foot notations:
 An entity is represented by a rectangle containing the entity's name.
 The entity name is a noun and is usually written in all capital letters.

Attributes
Attributes are characteristics of entities. For example, the STUDENT entity includes the attributes STU_LNAME, STU_FNAME, and STU_INITIAL. In Chen notation, attributes are represented by ovals.

Types of Attributes
Attributes are of the following types:
1. Required Attribute
2. Optional Attribute
3. Composite Attribute
4. Simple Attribute
5. Single-Valued Attribute
6. Multivalued Attribute
7. Derived Attribute

Required Attribute:
A required attribute is an attribute that must have a value; in other words, it cannot be left empty.
For example: STU_LNAME and STU_FNAME are required attributes of the STUDENT entity.

Optional Attribute:
An optional attribute is an attribute that does not require a value; therefore, it can be left empty.
For example: Phone number and e-mail address are optional attributes, because not every student has (yet) a phone number and an e-mail address.

Composite Attribute:
A composite attribute is an attribute that can be further subdivided into additional attributes.
For example: The attribute ADDRESS is a composite attribute. It can be subdivided into street, city, state, and zip code.

Simple Attribute:
A simple attribute is an attribute that cannot be subdivided.
For example: Age, sex, and marital status are classified as simple attributes.

Single-Valued Attribute:
A single-valued attribute is an attribute that can have only a single value.
For example: A person can have only one Social Security number.

Multivalued Attribute:
Multivalued attributes are attributes that can have many values.


For example: SKILLS is a multivalued attribute, because a person may have many skills.

Derived Attribute:
A derived attribute is an attribute whose value is calculated (derived) from other attributes.
For example: An employee's age, EMP_AGE, may be calculated from the current date and EMP_DOB.

Relationships
A relationship is an association between entities. The entities that participate in a relationship are also known as participants. Each relationship is identified by a name that describes the relationship. The relationship name is an active or passive verb.
For example, a STUDENT takes a CLASS, and a PROFESSOR teaches a CLASS.

Connectivity and Cardinality:
 Connectivity describes the relationship classification. The values of connectivity are "one" or "many".
 The cardinality of a relationship is the number of related occurrences for each of the two entities.
In the ERD, cardinality is indicated by placing numbers beside the entities, using the format (x,y). The first value represents the minimum number of associated entities, and the second value represents the maximum number of associated entities.
For example, consider the following figure, which shows a ternary relationship with its cardinalities:
A DOCTOR writes one or more PRESCRIPTIONs.
A PATIENT may receive one or more PRESCRIPTIONs.
A DRUG may appear in one or more PRESCRIPTIONs.

The basic types of connectivity for relationships are:
• One-to-Many (1:M)
• Many-to-Many (M:N)
• One-to-One (1:1)

One-to-many (1:M) relationship: A painter paints many different paintings, but each one of them is painted by only one painter. Thus, the painter (the "one") is related to the paintings (the "many"). Therefore, the relationship "PAINTER paints PAINTING" is 1:M.

Many-to-many (M:N) relationship: An employee may learn many job skills, and each job skill may be learned by many employees. Therefore, the relationship "EMPLOYEE learns SKILL" is M:N.

One-to-one (1:1) relationship: A retail company's store is managed by a single employee, and each store manager, who is an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is 1:1.

Relationship Degree
A relationship degree indicates the number of entities that participate in a relationship.
 Unary Relationship
 Binary Relationship
 Ternary Relationship
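The 1:M connectivity described above is implemented with a foreign key on the "many" side. A sketch of the PAINTER paints PAINTING relationship using Python's sqlite3 (column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE PAINTER (PAINTER_NUM INTEGER PRIMARY KEY, NAME TEXT)")
# The foreign key on the "many" side implements the 1:M connectivity.
conn.execute("""CREATE TABLE PAINTING (
    PAINTING_NUM INTEGER PRIMARY KEY,
    TITLE        TEXT,
    PAINTER_NUM  INTEGER REFERENCES PAINTER (PAINTER_NUM))""")

conn.execute("INSERT INTO PAINTER VALUES (1, 'Vermeer')")
conn.execute("INSERT INTO PAINTING VALUES (10, 'Girl with a Pearl Earring', 1)")
conn.execute("INSERT INTO PAINTING VALUES (11, 'The Milkmaid', 1)")  # one painter, many paintings

count = conn.execute(
    "SELECT COUNT(*) FROM PAINTING WHERE PAINTER_NUM = 1").fetchone()[0]

# A painting cannot reference a painter who does not exist.
try:
    conn.execute("INSERT INTO PAINTING VALUES (12, 'Orphan', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The rejected insert at the end shows referential integrity in action: the "many" side may only point at an existing "one".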


Unary Relationship
A unary relationship exists when an association is maintained within a single entity.
For example: In "EMPLOYEE manages EMPLOYEE", one instance of the EMPLOYEE entity is the manager of other instances of the same entity.

Binary Relationship
A binary relationship exists when two entities are associated in a relationship.
For example: "A PROFESSOR teaches one or more CLASSes" represents a binary relationship.

Ternary Relationship
A ternary relationship exists when three entities are associated. A ternary relationship implies an association among three different entities.

Recursive Relationships
A recursive relationship is one in which a relationship can exist between occurrences of the same entity set. It is also known as a unary relationship.
For example: "An EMPLOYEE may manage many EMPLOYEEs, and each EMPLOYEE is managed by one EMPLOYEE," so "EMPLOYEE manages EMPLOYEE" is a recursive relationship.
Example 2: A PERSON is married_to a PERSON.
****

Types of Relationship Strengths
Weak (Non-identifying) Relationships:
A weak relationship exists if the PK of the related entity does not contain a PK component of the parent entity. For example, suppose that the COURSE and CLASS entities are defined as:

Here, the relationship between COURSE and CLASS is a weak relationship, because the CLASS PK did not inherit a PK component from the COURSE entity. A weak relationship is also known as a non-identifying relationship.

Strong (Identifying) Relationships
A strong relationship exists when the PK of the related entity contains a PK component of the parent entity. For example, suppose that the COURSE and CLASS entities are defined as:

Here, the relationship between COURSE and CLASS is a strong relationship, because the CLASS PK inherited the PK component (CRS_CODE + CLASS_SECTION) from the COURSE entity. A strong relationship is also known as an identifying relationship.
***
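A strong (identifying) relationship can be sketched by making the parent's PK part of the child's composite PK. A sketch with Python's sqlite3, reusing the CRS_CODE and CLASS_SECTION attributes mentioned above (the remaining columns and sample values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE COURSE (CRS_CODE TEXT PRIMARY KEY, CRS_DESCRIPTION TEXT)")
# CLASS inherits CRS_CODE from COURSE as part of its own primary key:
# this is what makes the relationship strong (identifying).
conn.execute("""CREATE TABLE CLASS (
    CRS_CODE      TEXT REFERENCES COURSE (CRS_CODE),
    CLASS_SECTION TEXT,
    CLASS_TIME    TEXT,
    PRIMARY KEY (CRS_CODE, CLASS_SECTION))""")

conn.execute("INSERT INTO COURSE VALUES ('CIS-420', 'Database Design')")
conn.execute("INSERT INTO CLASS VALUES ('CIS-420', '1', 'MWF 9:00')")

# The composite PK forbids a duplicate (CRS_CODE, CLASS_SECTION) pair.
try:
    conn.execute("INSERT INTO CLASS VALUES ('CIS-420', '1', 'TTh 10:00')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

In a weak relationship, CLASS would instead get its own single-attribute PK (e.g., a class code), with CRS_CODE as an ordinary foreign key.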


Developing an E-R diagram
Designing an ERD involves the following steps:
 Create a detailed description of the organization's operations.
 Identify the business rules from that description.
 Identify the main entities and relationships from the business rules.
 Develop the initial ERD.
 Identify the attributes and primary keys of the entities.
 Revise and review the ERD.

A Simple Example:
Create a detailed description of the organization's operations:
A database designer gathers this information from record reviews, interviews, and examination of the organization's daily operations.
For example:
A famous college has several departments. Each department has a supervisor and at least one lecturer. Lecturers must be assigned to one or more departments. At least one lecturer is assigned to a class. The important data fields are the names of the departments, classes, supervisors, and lecturers.

Identify the business rules:
You may identify the following business rules from the above description:
1. Each department should be managed by only one supervisor.
2. A department may have one or more lecturers.
3. A lecturer can be assigned to one or more departments.
4. At least one lecturer should be assigned to a class.

Identify the main entities and relationships:
From the above description, College, Department, Supervisor, Lecturer, and Class are identified as the entities.

Find relationships:
A relationship matrix is useful to identify the relationships.
Figure: An example complete relationship matrix
Here, you can identify the following relationships:
 A Department is assigned a Lecturer.
 A Department is run by a Supervisor.
 A Lecturer belongs to a Department.
 A Lecturer teaches a Class.
 A Supervisor runs a Department.
 A Class uses the services of a Lecturer.

Develop the initial ERD:
Now you can create an initial ERD. For this:
 Place all the entities in rectangles.
 Use diamonds for the relationships.
 Use lines to connect the entities and relationships.
The following is an example of a rough ERD:

Identify the attributes and primary keys of the entities:
The following are the attributes and primary keys for the entities:
DEPARTMENT (DEPTNO, DNAME, SUPERVISOR_NO);
SUPERVISOR (SUPERVISOR_NO, SNAME, QUALIFICATION, DOJ);
LECTURER (LECTNO, LNAME, QUALIFICATION, SUBJECT, DOJ, SUPERVISOR_NO);


CLASS (CLASSNO, CNAME, YEAR, LECTNO);
Now assign the attributes to the entities, using ovals for the attributes.
Revise and review the ERD:
Now you can revise and review your ERD to see if anything has been omitted.
****

Database Design Challenges
Database designers must address many challenges and conflicting goals, such as:
1. Adherence to design standards,
2. Processing speed, and
3. Information requirements.
Design standards:
The database design must conform to design standards. Such standards eliminate data redundancies and data anomalies and yield well-defined database components.
Processing speed:
High processing speed is essential in many database designs. High processing speed means minimal access time. A higher transaction-speed design may include derived attributes in the design.
Information requirements:
Timely information is essential from a database design. Complex information requirements may need data transformations, and they may expand the number of entities and attributes within the design. Therefore, the database may have to sacrifice some of its "clean" design structures to ensure maximum information generation.
****

Database Design Goals
• A design that meets all the requirements and design standards is an important goal.
• The designer should also consider end-user requirements such as performance, security, sharing, and data integrity.
• The designer must verify that all update, retrieval, and deletion options are available.
• The database design should support query and reporting requirements.
• Finally, documentation is essential: the designer needs to put all design activities in writing. Documentation is essential for database maintenance.

The Extended Entity Relationship Model
The extended entity relationship model (EERM) is the result of adding more constructs to the ER model. It is useful for designing more accurate database schemas. An EER model uses all ERM concepts and also includes:
 Subclasses and superclasses
 Specialization and generalization
 Entity clustering
 Attribute and relationship inheritance
The extended entity relationship model (EERM) is also known as the enhanced entity relationship model.

Entity supertypes:
An entity supertype is a generic entity type that is related to one or more entity subtypes. An entity supertype contains the common characteristics.
Entity subtypes:
An entity subtype is a subgroup of the entity supertype. An entity subtype contains the unique characteristics.

Specialization hierarchy
Entity supertypes and subtypes are organized in a specialization hierarchy. It shows the arrangement of higher-level entity supertypes (parent entities) and lower-level entity subtypes (child entities).
A specialization hierarchy can:
• Support attribute inheritance.
• Define a special supertype attribute known as the subtype discriminator.
• Define disjoint/overlapping constraints and complete/partial constraints.

Attribute Inheritance
The property that enables an entity subtype to inherit the attributes and relationships of the supertype is known as inheritance. A supertype contains the common attributes, while subtypes contain only their unique attributes.
For example: The aviation business's employees can be categorized as pilots, mechanics, accountants, and many other types of employees. Then the


following figure illustrates that PILOT, MECHANIC, and ACCOUNTANT inherit EMP_NUM, EMP_LNAME, etc. from the supertype EMPLOYEE. One important inheritance characteristic is that all entity subtypes inherit their primary key attribute from their supertype.

Subtype Discriminator:
A subtype discriminator is the attribute in the supertype entity that determines to which subtype each supertype occurrence is related.

Disjoint and Overlapping Constraints
An entity supertype can have disjoint or overlapping entity subtypes.
Disjoint subtypes:
Disjoint subtypes are subtypes that contain unique subsets of the supertype entity set. Each entity instance of the supertype can appear in only one of the subtypes. For example, an employee who is a pilot can appear only in the PILOT subtype, not in any of the other subtypes. These are also known as non-overlapping subtypes. Disjoint subtypes are indicated with the letter 'd'.
Overlapping subtypes:
Overlapping subtypes are subtypes that contain non-unique subsets of the supertype entity set. Each entity instance of the supertype may appear in more than one subtype. For example, in a university environment, a person may be an employee, a student, or both. Overlapping subtypes are indicated with the letter 'o'.
****

Specialization and Generalization
Specialization is the top-down process of identifying lower-level entity subtypes from a higher-level entity supertype. Specialization is based on grouping the unique characteristics and relationships of the subtypes.
For example: For the aviation business, multiple entity subtypes are identified from the EMPLOYEE supertype.
Generalization is the bottom-up process of identifying a higher-level entity supertype from lower-level entity subtypes. Generalization is based on grouping the common characteristics and relationships of the subtypes.
For example: For piano, violin, and guitar, one supertype "string instrument" holds the common characteristics.
****

Entity Clustering
In real-world designs, an ER diagram may contain hundreds of entity types and their relationships, which makes the ERD crowded, unreadable, and inefficient. Entity clusters can solve this problem.
An entity cluster is a "virtual" entity type used to represent multiple entities and relationships in the ERD. It is a temporary entity used to simplify the ERD.
***

ENTITY INTEGRITY: SELECTING PRIMARY KEYS
The primary key is the most important characteristic of an entity: it uniquely identifies each entity instance. Therefore, it is essential to select the primary key properly.

Primary Key Guidelines:
A primary key is an attribute or a combination of attributes whose main function is to uniquely identify an entity instance or row within a table.
• As a determinant, it should determine all its dependents. It should guarantee entity integrity.


• Second, primary keys and foreign keys are used to implement relationships among entities. Therefore, it is important to choose a good primary key.

The following are the desirable primary key characteristics:

1) PK characteristic: Unique values.
Justification: The PK must uniquely identify each entity instance. It cannot contain nulls.

2) PK characteristic: Nonintelligent.
Justification: The PK should not have embedded semantic meaning; it should be factless.

3) PK characteristic: No change over time.
Justification: The PK should be permanent and unchangeable.

4) PK characteristic: Preferably single-attribute.
Justification: A primary key should have the minimum number of attributes. Single-attribute primary keys simplify the implementation of foreign keys.

5) PK characteristic: Preferably numeric.
Justification: Primary key values are easier to manage when they are numeric. Numeric fields make it easy for the database to automatically increment the PK value for each new row.

6) PK characteristic: Security compliant.
Justification: The selected primary key must not be an attribute (or attributes) that can cause a security risk or violation. For example, using a Social Security number as the PK of an EMPLOYEE table is not a good idea.
*****

Fan Trap
A fan trap occurs when one entity is in two 1:M relationships with other entities, and there is an association among those other entities that is not expressed in the model. In other words, a fan trap occurs when a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous. It arises when 1:M relationships fan out from a single entity.

Figure: Fan Trap
A single site contains many departments and employs many staff; however, the model cannot tell which staff work in a particular department.
The fan trap is resolved by restructuring the original ER model to represent the correct association.

Figure: Resolved Fan Trap
***

NORMALIZATION OF DATABASE TABLES
Database Tables and Normalization
Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing data anomalies. Normalization proceeds through a series of stages called normal forms. There are six normal forms:
 First Normal Form (1NF)
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form (BCNF)
 Fourth Normal Form (4NF)
 Fifth Normal Form (5NF)

Need for Normalization:
There are two common situations in which database designers use normalization:
• The designer uses normalization to analyze the relationships that exist among the attributes within each entity, to determine whether the structure can be improved through normalization.
• Database designers often need to modify existing data structures, and they use the normalization process to improve the existing data structure and create an appropriate database design.
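The need for normalization can be demonstrated with an update anomaly. A sketch using Python's sqlite3 on a deliberately redundant table (the ACTIVITY_RAW name and sample data are illustrative): because each activity's fee is stored in every row, changing it in only one row leaves the table inconsistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized: the fee for an activity is repeated in every enrollment row.
conn.execute("CREATE TABLE ACTIVITY_RAW (SID INTEGER, ACTIVITY TEXT, FEE INTEGER)")
conn.executemany("INSERT INTO ACTIVITY_RAW VALUES (?, ?, ?)",
                 [(100, 'Skiing', 200), (150, 'Swimming', 50), (200, 'Swimming', 50)])

# Update anomaly: raising the Swimming fee in only one row leaves two
# conflicting fee values for the same activity.
conn.execute(
    "UPDATE ACTIVITY_RAW SET FEE = 60 WHERE SID = 150 AND ACTIVITY = 'Swimming'")
fees = {fee for (fee,) in conn.execute(
    "SELECT FEE FROM ACTIVITY_RAW WHERE ACTIVITY = 'Swimming'")}
print(fees)  # two different fees for one activity: an inconsistency
```

Normalization removes this risk by storing each fee exactly once, in its own table.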


First Normal Form (1NF):
A table is in first normal form if it has no multivalued attributes or repeating groups. Converting a table into first normal form is a three-step procedure:
1. Eliminate the repeating groups by eliminating nulls and ensuring that each attribute contains an appropriate data value.
2. Identify the primary key.
3. Identify all dependencies (partial, transitive, etc.).
By applying these three steps, the DATAORG table becomes the following:

Second Normal Form (2NF):
A relation is in 2NF if every non-key attribute is fully functionally dependent on all parts of the primary key.
Note: If the primary key consists of one attribute, the relation is automatically in 2NF.

SID  Activity  Fee
100  Skiing    200
100  Golf      65
150  Swimming  50
175  Squash    50
175  Swimming  50
200  Swimming  50
200  Golf      65

Here, Fee is not fully functionally dependent on the primary key (SID, Activity); it is dependent only on the Activity part of the primary key. Every time an activity value repeats, the corresponding fee value repeats.
To create 2NF relations from this relation, we can divide it into two relations: (SID, Activity) and (Activity, Fee).

Student_Activity
SID  Activity
100  Skiing
100  Golf
150  Swimming
175  Squash
175  Swimming
200  Swimming
200  Golf

Activity_Fee
Activity  Fee
Skiing    200
Swimming  50
Golf      65
Squash    50

A table is in 2NF if:
 each cell holds a single value,
 there are no repeating attributes,
 there are no multivalued attributes, and
 each non-key attribute is fully dependent on the entire primary key.

Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in second normal form and no transitive dependencies exist.
A transitive dependency in a relation is a functional dependency between the primary key and one or more non-key attributes that are dependent on the primary key via another non-key attribute.
For example, there are two transitive dependencies in the following CUSTOMER ORDER relation:


OrderID → CustomerID → CustomerName
OrderID → CustomerID → CustomerAddress

Transitive dependencies create unnecessary redundancy that may lead to anomalies.

Removing Transitive Dependencies:
Transitive dependencies can be removed by means of a three-step procedure:
1. For each non-key attribute that is a determinant in the relation, create a new relation. That attribute becomes the primary key of the new relation.
2. Move all of the attributes that are functionally dependent on the primary key of the new relation from the old relation to the new relation.
3. Leave the attribute that serves as the primary key of the new relation in the old relation, to serve as a foreign key that allows you to associate the two relations.
The result of applying these steps to the above relation is shown below:

Boyce-Codd Normal Form
A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant in the relation is a candidate key. The following STUDENT ADVISOR relation is not in BCNF because, although the attribute Advisor is a determinant, it is not a candidate key.

Figure: Functional dependencies in STUDENT ADVISOR

Converting a Relation to BCNF
A relation that is in 3NF can be converted into BCNF by using a simple two-step process. This process is shown in Figure B-2.
In the first step, the relation is restructured (modified) so that the determinant that is not a candidate key becomes a component of the primary key of the revised relation. The attribute that is functionally dependent on that determinant becomes a non-key attribute.
This gives the following relation:

Figure: Revised STUDENT ADVISOR relations (1NF)
The new relation has a partial functional dependency (Major is functionally dependent on Advisor).
In the second step, decompose the relation to eliminate the partial functional dependency. This gives two relations, as shown below:

Figure: Two relations in BCNF
The above relations are in BCNF because there is only one candidate key (the primary key) in each relation.

Fourth Normal Form (4NF)
A relation is in fourth normal form when it is in BCNF and has no multivalued dependencies.
Multivalued Dependencies
A multivalued dependency is a type of dependency that exists when there are at least three attributes (e.g., A, B, and C) in a relation, with a well-defined


set of B and C values for each A value, but those B and C values are independent of each other.
To remove the multivalued dependency from a relation, we divide the relation into two new relations. Each of these tables contains two attributes that had a multivalued relationship in the original relation.

Fifth Normal Form (5NF)
Fifth normal form is a higher-level normal form that deals with a property called "lossless joins." 5NF is of little practical significance because the lossless-join problems it addresses occur very rarely and are difficult to detect. The solution is to eliminate the problems caused by the multivalued dependency, by creating new tables for the components of the multivalued dependency.

Domain-Key Normal Form (DKNF)
Domain-key normal form (DKNF) is an attempt to define an "ultimate normal form" that takes into account all possible types of dependencies and constraints. Although the definition of DKNF is quite simple, its practical value is minimal.
***

Denormalization
Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software.
The following are some common denormalization examples:
1. Redundant data: This occurs, for example, when we store both ZIP and CITY attributes in the CUSTOMER table even though ZIP determines CITY. This can be controlled by a program that validates the city based on the zip code.
2. Derived data: This occurs, for example, when we store both STU_HRS and STU_CLASS even though STU_HRS determines STU_CLASS. This can be controlled by a program that validates the classification based on the student hours.
3. Preaggregated data: This occurs, for example, when we store the student grade point average (STU_GPA) aggregate value in the STUDENT table even though it can be calculated from the ENROLL and COURSE tables. This can be controlled by updating STU_GPA via administrative routines.
4. Information requirements: This occurs when we use a temporary denormalized table to hold report data. The temporary table is deleted once the report is done.
***
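The multivalued-dependency split described under 4NF above can be sketched with Python's sqlite3. The table and column names (EMP_SKILL, EMP_LANG) and the sample data are illustrative: each table now holds one independent multivalued fact, and a join losslessly rebuilds every skill/language combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each table holds one independent multivalued fact about an employee,
# removing the multivalued dependency from a single combined table.
conn.execute("CREATE TABLE EMP_SKILL (EMP TEXT, SKILL TEXT)")
conn.execute("CREATE TABLE EMP_LANG  (EMP TEXT, LANG  TEXT)")
conn.executemany("INSERT INTO EMP_SKILL VALUES (?, ?)",
                 [('Ann', 'Welding'), ('Ann', 'Painting')])
conn.executemany("INSERT INTO EMP_LANG VALUES (?, ?)",
                 [('Ann', 'English'), ('Ann', 'Spanish')])

# The join reconstructs all skill/language pairings without loss.
combined = conn.execute("""
    SELECT s.EMP, s.SKILL, l.LANG
    FROM EMP_SKILL s JOIN EMP_LANG l ON s.EMP = l.EMP""").fetchall()
print(len(combined))  # 2 skills x 2 languages = 4 rows
```

Storing the four combinations in one table would repeat each skill and each language redundantly; the two-table form stores each fact once.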


UNIT-III
Introduction to SQL
 SQL stands for Structured Query Language. SQL was first developed by IBM in 1979. SQL is also pronounced "sequel".
 The SQL language contains queries. A query allows the user to describe the desired data, leaving the DBMS to carry out the planning, optimizing, and physical operations necessary to produce the result. SQL is therefore called a non-procedural language; it is used to interact with the database through the DBMS.
 SQL is not vendor specific, but there may be slight variations among vendors, as in Oracle SQL, Microsoft SQL Server, MySQL, PostgreSQL, and so on.

Database Schema
In the SQL environment, a schema is a group of database objects, such as tables and indexes, that are related to each other. Usually, a schema belongs to a single user or application. A single database can hold multiple schemas belonging to different users or applications. Think of a schema as a logical grouping of database objects, such as tables, indexes, and views. Schemas are useful in that they group tables by owner (or function) and enforce a first level of security by allowing each user to see only the tables that belong to that user.
Schema syntax:
CREATE SCHEMA AUTHORIZATION {creator};
Example: If the creator is SCOTT, use the command:
CREATE SCHEMA AUTHORIZATION SCOTT;

Data Definition Language (DDL)
The Data Definition Language commands are used to create database objects such as schemas, tables, views, and indexes; to modify table structures; and to delete tables permanently from the database. The DDL commands are:
1. CREATE
2. ALTER
3. DROP

Create Command: The create command is used to create different database objects, such as schema objects, table objects, view objects, and index objects, in the database.
Creating a table, syntax:
CREATE TABLE tablename (column1 data type [constraint]
  [, column2 data type [constraint] ]
  [, PRIMARY KEY (column1 [, column2]) ]
  [, FOREIGN KEY (column1 [, column2]) REFERENCES tablename ]
  [, CONSTRAINT constraint ] );
Example:
CREATE TABLE STUDENT (RNO NUMBER(12), SNAME VARCHAR(20), MARKS NUMBER(3), DOB DATE, ADDR VARCHAR(20), COURSENO NUMBER(5), PRIMARY KEY (RNO), FOREIGN KEY (COURSENO) REFERENCES COURSE);

Alter Command: The alter command is used to modify the table structure: adding or deleting a column in an existing table, or modifying the data type and size of a column. The alter command uses three keywords in conjunction with it. They are:
1) ADD: to append a new column
2) MODIFY: to change the data type or size of a column
3) DROP: to delete a column
Alter with Add:
Syntax: Alter table tablename ADD colname datatype(size);
Example: Alter table student ADD mobile number(10);
Alter with Modify:
Syntax: Alter table tablename MODIFY colname newdatatype(newsize);
Example: Alter table student MODIFY mobile varchar(5);
Alter with Drop:
Syntax: Alter table tablename DROP COLUMN colname;
Example: Alter table student DROP column mobile;
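The CREATE TABLE and ALTER ... ADD commands above can be tried out with Python's sqlite3. Note that SQLite uses INTEGER/TEXT where Oracle uses NUMBER/VARCHAR, and it does not support MODIFY, so this is a sketch of the portable subset only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# SQLite equivalents of the Oracle-style examples above:
# NUMBER becomes INTEGER and VARCHAR becomes TEXT here.
conn.execute("CREATE TABLE COURSE (COURSENO INTEGER PRIMARY KEY, CNAME TEXT)")
conn.execute("""CREATE TABLE STUDENT (
    RNO      INTEGER PRIMARY KEY,
    SNAME    TEXT,
    MARKS    INTEGER,
    COURSENO INTEGER,
    FOREIGN KEY (COURSENO) REFERENCES COURSE (COURSENO))""")

# ALTER ... ADD appends a new column to the existing structure.
conn.execute("ALTER TABLE STUDENT ADD COLUMN MOBILE TEXT")

# PRAGMA table_info lists the columns, new column included.
columns = [r[1] for r in conn.execute("PRAGMA table_info(STUDENT)")]
print(columns)
```

The column list confirms that ALTER changed the table structure without touching any data.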


Drop Command: The drop command is used to permanently delete a table from the database. Drop table deletes both the table structure and the data in the table. Once a table is dropped, we cannot retrieve it, so we have to be cautious before using a drop command.
Syntax: Drop table tablename;
Example: Drop table Student;

Data Manipulation Language (DML)
Data manipulation commands are used to add data to a table, modify data in the table, delete data in the table, and display data in the table. The Data Manipulation Commands are: INSERT, SELECT, UPDATE, DELETE, COMMIT, and ROLLBACK.

Insert Command: This command is used to add data values to a table.
Syntax: Insert into tablename values(col1val, col2val, ... colNval);
Example: Insert into student values(101,'Ravi',755,'21-DEC-1985','Hyderabad');

Update Command: This command is used to modify the data values in a table.
1) Syntax: Update tablename SET columnname = value;
Example: Update student SET marks = 75;
Description: The above query changes the marks of all students to 75.
2) Syntax: Update tablename SET columnname = value WHERE condition;
Example: Update Student SET marks = 40 where marks = 30;
Description: The above query sets the value 40 only for those students whose marks equal 30, as specified in the where clause.

Delete Command: This command is used to delete the records/rows in a table.
1) Syntax: Delete from tablename;
Example: Delete from Student;
Description: The above query deletes all the rows in the student table.
2) Syntax: Delete from tablename where condition;
Example: Delete from student where marks < 35;
Description: The above query deletes only those student rows with marks less than 35.

Select Command: This command is used to retrieve and display the data in the database. The select command can display all rows and columns, or particular rows and particular columns.
Example: select * from student;
Description: Retrieves and displays all rows and columns of the student table.
Example: Select * from student where marks > 75;
Description: Displays all columns, but only the rows of students whose marks are greater than 75.
Example: Select rno, dob from student;
Description: Displays only the rno and dob columns, but for all rows.
Example: Select rno, dob from student where marks > 75;
Description: Displays only the RNO and DOB columns, and only the rows of students whose marks are greater than 75.

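The DML commands above can be combined into one short, illustrative session. The student table is assumed here to have the columns rno, name, marks, dob, and city; the exact schema is an assumption of this sketch:

```sql
-- Add a row.
INSERT INTO student VALUES (102, 'Kiran', 80, '15-JAN-1986', 'Warangal');

-- Modify it.
UPDATE student SET marks = 85 WHERE rno = 102;

-- Display it.
SELECT * FROM student WHERE rno = 102;

-- Remove it, then make the remaining changes permanent.
DELETE FROM student WHERE rno = 102;
COMMIT;
```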
https://mguugcs.blogspot.in Page: 25

Data Types added to a table. The end user may, of


The different data types in Oracle SQL can be course, enter a value other than the default value.
categorized into Numeric, Character and Date; 6) CHECK: The CHECK constraint is used
Numeric: to validate data when an attribute value is entered.
1) NUMBER(L,D) Where L indicates length The CHECK constraint does precisely what
and D indicates the number of decimal places. its name suggests: it checks to see that a specified
2) INTEGER / INT condition exists.
3) SMALLINT A relational database must adhere to
4) DECIMAL(L,D) Similar to NUMBER, but the the rules of entity integrity and referential integrity,
storage length is a minimum specification. which are crucial in a relational database environment.
Character: Fortunately, most SQL implementations support
1) CHAR(L) Fixed-length character data for up to both integrity rules.
255 characters. Unused positions are padded with spaces. Note: The CREATE TABLE command lets us
2) VARCHAR(L) Variable-length character data. define constraints in two different places:
Only the space actually needed is used.  When we create the column definition
3)VARCHAR2(L) Same as VARCHAR(L). (known as a column constraint).
Oracle automatically converts VARCHAR to  When we use the CONSTRAINT keyword
VARCHAR2 (known as a table constraint).
Date:
DATE Used to store Date type data values. Stores Example Queries:
date in Julian date format. 1) create table student (rollno number(5) NOT
NULL, name varchar(20) not null);
SQL Constraints: 2) create table student (rollno number(5) UNIQUE,
A constraint in SQL is a condition/rule that name varchar(20));
is applied to the columns. The different constraints 3) create table student(rollno number(5) primary
in SQL are; key, name varchar(20));
1) NOT NULL: The NOT NULL 4) create table student(rollno number(5) Primary
constraint ensures that a column does not accept Key, name varchar(20), marks number(3),
nulls. addr varchar(20), courseno number(2)
2) UNIQUE: The UNIQUE constraint REFERENCES course(courseno));
ensures that all values in a column are unique. 5) create table student(stuname varchar(20), marks
3) PRIMARY KEY: Entity integrity is number(3) DEFAULT 0);
enforced automatically when the PRIMARY KEY
constraint is specified in the CREATE SQL Indexes
TABLE command on column(s). Indexes are used to improve the efficiency
4) FOREIGN KEY: Referential Integrity of searches and to avoid duplicate column
is enforced automatically when the FOREIGN values. We know how to declare unique indexes on
KEY constraint is specified in the CREATE selected attributes when the table is created.
TABLE command on column(s). When the foreign In fact, when we declare a primary key, the DBMS
key is applied, we cannot delete a value in the automatically creates a unique index. Even with
primary table which is referenced by another table this feature, we often need additional indexes. The
and if any change is made in the this feature, we often need additional indexes. The
primary table, they will be reflected in the reference important. Using the CREATE INDEX command,
table. SQL indexes can be created on the basis of any
5) DEFAULT: The DEFAULT constraint selected attribute.
assigns a value to an attribute when a new row is

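The note above distinguishes column constraints from table constraints, but the example queries show only the column form. A sketch of the table-constraint form using the CONSTRAINT keyword, including a CHECK constraint (all names are illustrative):

```sql
CREATE TABLE student (
    rollno   NUMBER(5),
    name     VARCHAR(20) NOT NULL,
    marks    NUMBER(3) DEFAULT 0,
    courseno NUMBER(2),
    -- Table constraints, declared with the CONSTRAINT keyword:
    CONSTRAINT student_pk PRIMARY KEY (rollno),
    CONSTRAINT student_marks_ck CHECK (marks BETWEEN 0 AND 100),
    CONSTRAINT student_course_fk FOREIGN KEY (courseno)
        REFERENCES course (courseno)
);
```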
https://mguugcs.blogspot.in Page: 26

Syntax: Relational Operators:


CREATE [UNIQUE] INDEX indexname ON These operators are used for comparisions. They
tablename(column1 [, column2]); are;
Example: Operator Description
CREATE INDEX STU_INDEX ON = Equal to
STUDENT(ROLLNO); < Less than
<= Less than or Equal to
Many RDBMSs, including Access, automatically > Greater than
create a unique index on the PK attribute(s) when >= Greater than or Equal to
we declare the PK. A common practice is to create < > or != Not Equal to
an index on any field that is used as a search key,
in comparison operations in a conditional Example Queries:
expression, or when we want to list rows in a 1) select empno, ename, sal from emp where sal <
specific order. 2000;
Syntax for Deleting an Index: 2) select ename, sal, deptno from emp where sal
DROP INDEX indexname; 2) select ename, sal, deptno from emp where sal
Example:
DROP INDEX PROD_PRICEX; Logical Operators:
These operators are used for combining more than
Operators in SQL one expression into a compound expression. The
An operator is used to perform either arithmetic or different logical operators in SQL are;
logical computations. The operators in SQL will be 1) AND Logical AND operator
either a symbol or a word. SQL supports the 2) OR Logical OR operator
following kinds of operators. They are; 3) NOT Logical NOT operator
1) Arithmetic Operators Example Queries:
2) Relational Operators 1) select empno, ename, sal from emp where sal >
3) Logical Operators 1000 AND sal < 3000;
4) Special Operators 2) select ename, deptno from emp where sal <=
1000 OR comm > 0;
Arithmetic Operators: 3) select deptno, ename, sal from emp where NOT
These operators are used to do arithmetic (sal = 3000);
operations. The operators are;
Operator Description Special Operators:
+ Add SQL contains some special operators, they are;
- Subtract 1) BETWEEN: Used to check whether an attribute
* Multiply value is within a range.
/ Divide 2) IS NULL: Used to check whether an attribute
^ Raise to the power of value is null.
3) LIKE: Used to check whether an attribute value
Example Queries: matches a given string pattern.
1) select empno, sal+500 from emp; 4) IN: Used to check whether an attribute value
2) update emp set sal = sal + 500; matches any value within a value list.
3) select ename, sal*2 from emp; 5) EXISTS: Used to check whether a subquery
returns any rows.

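Each special operator can also be negated with NOT; the section above does not show that form. An illustrative sketch against the same emp table:

```sql
-- Employees whose salary lies outside a range.
SELECT * FROM emp WHERE sal NOT BETWEEN 1000 AND 2000;

-- Employees who do earn a commission.
SELECT * FROM emp WHERE comm IS NOT NULL;

-- Employees whose salary is none of the listed values.
SELECT * FROM emp WHERE sal NOT IN (1000, 1500, 2000);

-- Employees whose name does not start with 'S'.
SELECT * FROM emp WHERE ename NOT LIKE 'S%';
```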
https://mguugcs.blogspot.in Page: 27

Example Queries: 4) SUM:


1) select * from emp where sal BETWEEN 1000 The SUM function computes the total sum for any
AND 2000; specified attribute, using whatever condition(s) we
2) select * from emp where comm IS NULL; have imposed.
3) select * from emp where ename LIKE 'S%'; Syntax:
4) select * from emp where sal IN (1000, 1500, SUM(columnname)
2000); Example:
5) select * from emp where EXISTS (select * from SELECT SUM(SAL) FROM EMP;
emp where sal=5000); 5) AVG:
**** The AVG function format is similar to those of
Aggregate Functions MIN and MAX and is subject to the same operating
SQL can perform various mathematical summaries restrictions.
for us, such as counting the number of rows that Syntax:
contain a specified condition, finding the minimum AVG(columnname)
or maximum values for some specified attribute, Example:
summing the values in a specified column, and SELECT AVG(SAL) FROM EMP;
averaging the values in a specified column. Those
aggregate functions are shown below; Ordering Data:
1) COUNT: The ORDER BY clause is used when we
The COUNT function is used to tally the number of want to display the records sorted in either
non-null values of an attribute. COUNT can be ascending or descending order on particular
used in conjunction with the DISTINCT clause. column(s).
Syntax: Syntax:
COUNT(columnname) SELECT columnlist FROM tablelist
Examples: [WHERE conditionlist ]
SELECT COUNT(COMM) FROM EMP; [ORDER BY columnlist ASC / DESC];
SELECT COUNT(MGR) FROM EMP; Examples:
SELECT * FROM EMP ORDER BY SAL ASC;
2) MAX: SELECT * FROM EMP WHERE SAL > 1000
The MAX function is used to find the maximum of ORDER BY DEPTNO;
the given attribute. SELECT * FROM EMP ORDER BY ENAME
Syntax: DESC;
MAX(columnname) SELECT * FROM EMP ORDER BY SAL DESC,
ENAME ASC;
Example:
SELECT MAX(SAL) FROM EMP; Grouping Data:
Frequency distributions can be created quickly and
3) MIN: easily using the GROUP BY clause within the
The MIN function is used to find the minimum of SELECT statement.
the given attribute. Syntax:
Syntax: SELECT columnlist
MIN(columnname) FROM tablelist
Example: [WHERE conditionlist ]
SELECT MIN(SAL) FROM EMP; [GROUP BY columnlist ]
[HAVING conditionlist ]
[ORDER BY columnlist [ASC | DESC] ] ;

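COUNT is described above as usable with the DISTINCT clause, but none of the examples shows that form. A small sketch against the same EMP table:

```sql
-- Number of rows with a non-null commission.
SELECT COUNT(COMM) FROM EMP;

-- Number of rows in the table, nulls included.
SELECT COUNT(*) FROM EMP;

-- Number of distinct departments that have employees.
SELECT COUNT(DISTINCT DEPTNO) FROM EMP;
```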
https://mguugcs.blogspot.in Page: 28

Note: The GROUP BY clause is generally used to only specified columns and specified
when we have attribute columns combined with rows in a table.
aggregate functions in the SELECT statement. The  Views may also be used as the basis for
GROUP BY clause is valid only when used in reports.
conjunction with one of the SQL aggregate
functions, such as COUNT, MIN, MAX, AVG, and We can create a view by using the CREATE VIEW
SUM. command:
Example: Syntax:
SELECT ENAME, SUM(SAL), DEPTNO FROM CREATE VIEW viewname AS SELECT query
EMP GROUP BY DEPTNO; Example:
CREATE VIEW EMP_10 AS
The above code will generate a “not a GROUP BY We can create a view by using the CREATE VIEW
expression” error because ENAME is neither command:
grouped nor aggregated. If we list only grouped Syntax:
columns and aggregate functions, the GROUP BY CREATE VIEW viewname AS SELECT query
clause works properly. Example:
The set operators in SQL are set-oriented;
that is, they operate over entire sets of rows and
A particularly useful extension of the GROUP BY columns of the tables at once. SQL provides four
feature is the HAVING clause. The HAVING different set operators, they are;
clause operates very much like the WHERE clause 1) UNION
in the SELECT statement. However, the WHERE 2) UNION ALL
clause applies to columns and expressions for 3) INTERSECT
individual rows, while the HAVING clause is 4) MINUS
applied to the output of a GROUP BY operation.
1) UNION: The UNION statement combines rows
Example: from two or more queries without including
SELECT DEPTNO, SUM(SAL) FROM clause applies to columns and expressions for
EMP GROUP BY DEPTNO HAVING individual rows, while the HAVING clause is
(SUM(SAL) > 2500) output of two SELECT queries. (Remember that
ORDER BY SUM(SAL) DESC; the SELECT statements must be union-compatible.
That is, they must return the same number of
Virtual Tables attributes and similar data types.)
A view is a virtual table based on a SELECT query. Syntax:
The query can contain columns, computed query UNION query
columns, aliases, and aggregate functions from one Example:
or more tables. The tables on which the view is SELECT * FROM EMP1 UNION SELECT *
based are called base tables. A relational view has FROM EMP2;
several special characteristics:
 We can use the name of a view anywhere a 2) UNION ALL: The UNION ALL statement
table name is expected in a SQL statement. combines rows from two or more queries
 Views are dynamically updated. That is, the including the duplicate rows.
view is re-created on demand each time it is Syntax:
invoked. query UNION ALL query
 Views provide a level of security in the Example:
database because the view can restrict users SELECT * FROM EMP1 UNION ALL SELECT *
FROM EMP2;

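Since a view name can be used anywhere a table name is expected, the EMP_10 view defined above can itself be queried, and its definition can be removed with DROP VIEW. A brief sketch:

```sql
-- Query the view exactly like a table.
SELECT * FROM EMP_10;

-- Apply a condition to the view's columns.
SELECT * FROM EMP_10 WHERE TOTAL > 2000;

-- Remove the view definition; the base table EMP is unaffected.
DROP VIEW EMP_10;
```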
https://mguugcs.blogspot.in Page: 29

3)INTERSECT: The INTERSECT statement SELECT P_CODE, P_PRICE FROM


combines rows from two or more queries which PRODUCT
appear in both result sets.
WHERE P_PRICE >= (SELECT
Syntax:
AVG(P_PRICE) FROM PRODUCT);
query INTERSECT query
Example: IN Subqueries
SELECT * FROM EMP1 INTERSECT SELECT *
FROM EMP2; When you want to compare a single attribute to a
list of values, you use the IN operator. Such a query is known as an
4) MINUS: The MINUS statement combines rows IN subquery.
from two or more queries and returns only
the rows that appear in the first set but not in the The following example lists all customers who
second set. have purchased hammers, saws, or saw blades.
Syntax: SELECT CUS_CODE, CUS_LNAME FROM
query MINUS query CUSTOMER JOIN PRODUCT WHERE
Example: P_CODE IN (SELECT P_CODE FROM
SELECT * FROM EMP1 MINUS SELECT * PRODUCT WHERE P_DESCRIPT LIKE
FROM EMP2; '%hammer%' OR P_DESCRIPT LIKE
**** '%saw%');
Subqueries
HAVING Subqueries
A subquery is a query that is nested inside another
query. The inner query is always executed first by A subquery with a HAVING clause is known as a
the RDBMS. A subquery is also known as a HAVING subquery.
nested query or an inner query. Generally the HAVING clause is used to filter the
Characteristics of Subqueries: output of a GROUP BY query

 A subquery is a query inside a query. For example:


 It can be expressed inside brackets SELECT P_CODE, SUM(P_UNITS) FROM
(parentheses).
PRODUCT GROUP BY P_CODE HAVING
 The first query in the SQL statement is
known as the outer query. SUM(P_UNITS) > (SELECT AVG(P_UNITS)
 The query inside the SQL statement is FROM PRODUCT);
known as the inner query. Correlated Subqueries
 The inner query is executed first.
 The output of an inner query is used as the A correlated subquery is a subquery that executes
input for the outer query. once for each row in the outer query.
This is similar to a nested loop in a programming
Types of Subqueries: language. For example:
WHERE Subqueries: FOR X = 1 TO 2
A subquery that appears as an inner SELECT FOR Y = 1 TO 3
on the right side of a WHERE condition is known
as a WHERE Subquery. PRINT “X = “X, “Y = “Y

For example, to find all products with a price END


greater than or equal to the average product price: END

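As a simpler companion sketch, an IN subquery can also be written against the EMP table used elsewhere in this material. The DEPT table and its LOC column are assumptions made for this illustration:

```sql
-- Employees who work in any department located in Hyderabad.
SELECT ENAME, DEPTNO FROM EMP
WHERE DEPTNO IN (SELECT DEPTNO FROM DEPT
                 WHERE LOC = 'HYDERABAD');
```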
https://mguugcs.blogspot.in Page: 30

will yield the output Conversion / Date or Time Functions


These functions are used to convert from character
X =1 Y= 1
format to date format and from date format to
X =1 Y= 2 character format.
1) TO_CHAR: Returns a character String or a
X =1 Y= 3 formatted string from a date format.
X =2 Y= 1 Syntax:
TO_CHAR(date_value,fmt)Where fmt = format;
X =2 Y= 2 can be:
X =2 Y= 3 MONTH: Full name of month, or MON: Three-
letter month name, or MM: Two-Digit month
For a correlated subquery the RDBMS works as number, or DD: Number day of month, or D: Number
follows: for day of week, or DAY: Name of day of week,
1. It initiates the outer query. or YY: Two-digit year value, or YYYY: Four-digit
year value.
2. For each row of the outer query result set, it Example:
executes the inner query by passing the outer row 1) Select TO_CHAR ( HIREDATE,'YYYY') from
to the inner query. emp;
The query is called a correlated subquery because 2) select TO_CHAR( HIREDATE,'MM') as month
the inner query is related to the outer query by from emp;
referencing a column of the outer subquery.
2) TO_DATE: Returns a date format value from a
For example: character or string format.
Syntax:
SELECT P_CODE, P_UNITS FROM
TO_DATE (char_value, fmt)
PRODUCT, LINE WHERE P_UNITS >
Where fmt = format; can be:
(SELECT AVG(P_UNITS) FROM PRODUCT
MONTH: Full name of month, or MON: Three-
WHERE PRODUCT.P_CODE
letter month name, or MM: Two-Digit month
=LINE.P_CODE);
number, or DD: Number day of month, or D: Number
**** for day of week, or DAY: Name of day of week,
SQL Functions or YY: Two-digit year value, or YYYY: Four-digit
A SQL function can be derived from an year value.
existing attribute. Functions always use a 1) Select TO_DATE(
numerical, date, or string value. The value may be '1985/12/21','YYYY/MM/DD') from dual;
part of the command itself (a constant or literal) or 2) select TO_DATE('21/12/85','DD/MM/YY')
it may be an attribute located in a table. Therefore, from dual;
a function may appear anywhere in an SQL from emp;
statement where a value or an attribute can be used.
There are many types of SQL functions, such as Numeric Functions:
arithmetic, trigonometric, string, date, and time Numeric functions can be grouped in many
functions. different ways, such as algebraic, trigonometric,
and logarithmic. Numeric functions take one
numeric parameter and return one value. Some of
the numeric functions in Oracle SQL are;
1) ABS: Returns the absolute value of a number.

https://mguugcs.blogspot.in Page: 31

2)ROUND Returns the rounded value of a 2) UPPER Returns a String in all


number. uppercase letters.
3)CEIL Returns the ceiling value of a 3) LENGTH Returns the length of a
number. String.
4)FLOOR Returns the floor value of a number. 4) SUBSTRING Returns Substring or part of
1) ABS: a given string pattern.
It is used to find the absolute value of a given 1) LOWER:
number or attribute. Used to convert a string to lowercase letters.
Syntax: Syntax: LOWER(string_value)
ABS( numeric_value) Example:
Example: 1) Select LOWER(ENAME) from emp;
1) select ABS(-1.93) from dual; 2) Select LOWER( ' MGU ' ) from dual;
2) select ABS( 1.95 ) from dual; 2) UPPER:
2) ROUND: Used to convert a string to uppercase letters.
It is used to round a value of a given number or Syntax: UPPER(string_value)
attribute. Example:
Syntax: 1) Select UPPER(ENAME) from emp;
ROUND( numeric_value) 2) Select UPPER( ' MGU ' ) from dual;
Example: 3) LENGTH:
1) select ROUND(1.93) from dual; Used to find the string length.
2) select ROUND( 1.45 ) from dual; Syntax: LENGTH(string_value)
3) CEIL: Example
It is used to ceil a value of a given number or 1) Select LENGTH(ENAME) from emp;
attribute. 2) Select LENGTH( ' MGU ' ) from dual;
Syntax: 4) SUBSTR:
CEIL( numeric_value) Used to return a substring from a given string.
Example: Syntax: SUBSTR(string_value,P,L)Where P =
1) select CEIL( 1.93 ) from dual; Start Position L = Length of characters
2) select CEIL( 1.45 ) from dual; Example: 1) Select SUBSTR( ENAME,1,3 ) from
4) FLOOR: emp;
It is used to floor a value of a given number or 2) Select SUBSTR( ' MGU ',2,4 ) from dual;
attribute.
Syntax: ****
FLOOR( numeric_value) Oracle Sequences
Example: We can use a “sequence” to assign values to a
1) select FLOOR( 1.93 ) from dual; column on a table. The properties are;
2) select FLOOR( 1.45 ) from dual;  Oracle sequences are an independent object
in the database. (Sequences are not a data
type.)
String Functions  Oracle sequences have a name and can be
String manipulations are among the most-used used anywhere a value is expected.
functions in programming. Various string operating  Oracle sequences are not tied to a table or a
column.
functions in SQL are;
 Oracle sequences generate a numeric value
1) LOWER Returns a String in all that can be assigned to any column in any
lowercase letters. table.

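The results of the numeric and string functions above can be checked quickly against the one-row dual table; the expected values are shown as comments:

```sql
SELECT ABS(-1.93)  FROM dual;             -- 1.93
SELECT ROUND(1.93) FROM dual;             -- 2
SELECT ROUND(1.45) FROM dual;             -- 1
SELECT CEIL(1.45)  FROM dual;             -- 2
SELECT FLOOR(1.93) FROM dual;             -- 1
SELECT LOWER('MGU')          FROM dual;   -- mgu
SELECT UPPER('mgu')          FROM dual;   -- MGU
SELECT LENGTH('MGU')         FROM dual;   -- 3
SELECT SUBSTR('WELCOME',1,3) FROM dual;   -- WEL
```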
https://mguugcs.blogspot.in Page: 32

 The table attribute to which we assigned a INSERT INTO INVOICE VALUES


value based on a sequence can be edited and (INV_NUMBER_SEQ.NEXTVAL, 20010,
modified. SYSDATE);
 An Oracle sequence can be created and INSERT INTO LINE VALUES
deleted anytime. (INV_NUMBER_SEQ.CURRVAL, 1,'13-Q2/P2',
Syntax: 1, 14.99);
CREATE SEQUENCE name [START INSERT INTO LINE VALUES
WITH n] [INCREMENT BY n] [CACHE | (INV_NUMBER_SEQ.CURRVAL, 2,'23109-HB',
NOCACHE] 1, 9.95);
where: COMMIT;
 name is the name of the sequence. Dropping the Sequence:
 n is an integer value that can be
DROP SEQUENCE CUS_CODE_SEQ;
positive or negative.
DROP SEQUENCE INV_NUMBER_SEQ;
 START WITH specifies the initial
sequence value. (The default value is 1.)  Dropping a sequence does not delete the
 INCREMENT BY determines the values we assigned to table attributes.
value by which the sequence is  It deletes only the sequence object from the
incremented. database.
 The CACHE or NOCACHE clause ****
indicates whether Oracle will pre-allocate Types of JOINS
sequence numbers in memory. Joins are of the following types:
Example:  Inner Join
To create a sequence to  Outer Join
automatically assign values to the customer  Left Outer Join
code each time a new customer is added and  Right Outer Join
create another sequence to automatically  Full Outer Join
assign values to the invoice number each  Cross Join
•INNER JOIN: returns rows when there is a match
time a new invoice is added. The SQL code
in both tables.
to accomplish those tasks is:
Example:
CREATE SEQUENCE CUS_CODE_SEQ START
SQL> SELECT ID, NAME, AMOUNT, DATE
WITH 20010;
FROM CUSTOMERS
CREATE SEQUENCE INV_NUMBER_SEQ
INNER JOIN ORDERS
START WITH 4010 INCREMENT BY 10;
ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
Usage in Tables After Creation:
•LEFT JOIN: returns all rows from the left table,
To use sequences during data entry, we must use
even if there are no matches in the right table.
two special pseudo-columns with the sequence
Example:
name:
SQL> SELECT ID, NAME, AMOUNT, DATE
NEXTVAL  Retrieves the next available value
FROM CUSTOMERS
from a sequence.
LEFT JOIN ORDERS
CURRVAL  Retrieves the current value of a
ON CUSTOMERS.ID =
sequence.
ORDERS.CUSTOMER_ID;
Example:
•RIGHT JOIN: returns all rows from the right
INSERT INTO CUSTOMER VALUES
table, even if there are no matches in the left table.
(CUS_CODE_SEQ.NEXTVAL, 'Connery',
Example:
'Sean', NULL, '615', '898-2008', 0.00);
SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS

https://mguugcs.blogspot.in Page: 33

RIGHT JOIN ORDERS declare


ON CUSTOMERS.ID =
<declaration of variables, constants, function,
ORDERS.CUSTOMER_ID;
procedure, cursor etc.>;
•FULL JOIN: returns rows when there is a match begin
in one of the tables.
SQL> SELECT ID, NAME, AMOUNT, DATE <executable statement(s)>;
FROM CUSTOMERS exception
FULL JOIN ORDERS
ON CUSTOMERS.ID = <exception handling>;
ORDERS.CUSTOMER_ID end;
•CARTESIAN JOIN: returns the Cartesian
product of the sets of records from the two or more /
tables.  The Declare section starts with the keyword
SQL> SELECT ID, NAME, AMOUNT, DATE declare.
FROM CUSTOMERS, ORDERS;  The Begin section contains the Executable
statements. This section starts with the
•SELF JOIN: is used to join a table to itself. keyword begin.
SQL> SELECT a.ID, b.NAME, a.SALARY  The Exception handling section contains
FROM CUSTOMERS a, CUSTOMERS b the statements that may cause any
WHERE a.SALARY < b.SALARY; exception. This section starts with the
keyword exception.
****  The PL/SQL block is terminated by the end
PL/SQL keyword. In PL/SQL Block only the
SQL does not have procedural functionality. So to executable section is required, the
solve this problem the SQL-99 standard defined the declarative and exception handling sections
use of persistent stored modules. are optional.
 The first section in a PL/SQL block is the
A persistent stored module (PSM) is a block of Declaration section. It can be used to define
code containing standard SQL statements and variables and cursors , etc.
procedural extensions that is stored and executed at  / forward slash executes the PL/SQL block.
the DBMS server.
MS SQL Server implements persistent stored A Simple Block
modules via Transact-SQL.
Example 1:
Oracle implements PSMs through its
procedural SQL language. Begin

Procedural SQL (PL/SQL) is a language that dbms_output.put_line(‘Welcome to PL/SQL


makes it possible to combine procedural code and Programming ’);
SQL statements within the database.
dbms_output.put_line(‘PL/SQL stands for
Procedural SQL.');
PL/SQL Block
dbms_output.put('It''s a PSM from Oracle');
PL/SQL code is grouped into structures. These are
called as blocks. A Block contains three sections: End;
/

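The exception handling section described above is optional, and none of the examples uses it. The following sketch shows a block with all three sections; NO_DATA_FOUND is the predefined exception Oracle raises when a SELECT INTO returns no row (the emp table follows the earlier examples):

```sql
Declare
   name varchar2(20);
Begin
   Select ename into name from emp where empno = 9999;
   dbms_output.put_line('Name : ' || name);
Exception
   When NO_DATA_FOUND then
      dbms_output.put_line('No such employee.');
End;
/
```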
https://mguugcs.blogspot.in Page: 34

Example 2: name varchar2(20);


To get the name and salary of specified employee. sal number(9,2);
Declare Begin
name varchar2(20); /* Get employee name and Salary */
sal number(9,2); Select ename,salary into name, sal from emp where
empno=1002;
Begin
/* Display Employee name and Salry */
Select ename,salary into name, sal from emp where
empno=1002; dbms_output.put_line(‘Name : ’ || name);
dbms_output.put_line(‘Name : ’ || name); dbms_output.put_line(‘Salary : ’ || sal);
dbms_output.put_line(‘Salary : ’ || sal); End;
End; /
/ Selection/Control statements
Comments in PL/SQL Block Within PL/SQL block, Selection statements are
useful to execute a particular set of statements
Comments improve readability and make your
program more understandable. The PL/SQL engine based on a condition. ‘if’ statement can be used to
ignores them at the compilation and execution time. execute a sequence of statements based on some
condition. There are various form of if statement.
There are two kinds of comments:
if-then form
Single-line comments
if-then statement is the simple form of the if
It starts with two dashes.
statement. It has the following format:
Example
IF <boolean_expression> THEN
Begin -- declaration section is ignored
statements;
Insert into emp(ename, empno)
END IF;
values(‘Raju’,1234);
Here, boolean_expression is any expression that
End;
evaluates to a Boolean value.
/
Example
Multiline comments
Accept Number from a User and display Hello
It starts with the /* delimiter and ends with the */ message if the entered number is Positive.
delimiter. This is the same style of comments as in
Declare
the C language.
num number;
Example
Begin
To get the name and salary of specified employee.
num := &num;
Declare
if num > 0 then

https://mguugcs.blogspot.in Page: 35

dbms_output.put_line(‘Hello’); Example
end if; Accept number from a user and find out whether it
is Odd or Even.
end;
Declare
/
num number;
Example
Begin
Display Salary of a specified employee increasing
by 500 if its salary is more than 3000. num := &num;
Declare if mod(num,2) = 0 then
sal number(9,2); dbms_output.put_line(num || ' is Even');
num emp.empno%type; else
Begin dbms_output.put_line(num || ' is Odd');
num := &num; end if;
Select salary into sal from emp where empno=num; End;
If sal > 3000 then /
sal := sal + 500; ****
end if; CURSORS
dbms_output.put_line(‘Salary : ’ || sal); Oracle creates a special memory area, known as
context area. The Context Area can be useful for
End;
processing the records(rows) returned by an SQL
/ statement.

If-then-else form In PL/SQL, A cursor is a pointer to the context area.


It can be used to control the context area. A cursor
if-then-else statement allows us to execute either holds the rows returned by a SQL statement. The
true block statements or False block statements set of rows the cursor holds is known as the active
depending on a condition. It has the following format:

IF <boolean_expression> THEN We can assign a name to a cursor. So that it can be


opened to fetch and process the each row returned
True block statements; by the SQL statement.
ELSE There are two types of cursors:
False block statements;  Implicit cursors
END IF;  Explicit cursors.
Implicit Cursors:
True block statements are executed only if the
condition is satisfied; otherwise the else portion will Implicit cursors are automatically created by
be executed. Oracle whenever an SQL statement is executed.

https://mguugcs.blogspot.in Page: 36
B.SC-Computer Science-III Year DBMS Study Material

ORACLE associates an implicit cursor whenever a Explicit Cursors:


DML statement (INSERT, UPDATE and Explicit cursors are programmer defined
DELETE) is issued. cursors. An Explicit cursor is useful for gaining
more control over the context area.
In PL/SQL, an implicit cursor(SQL Cursor) can An explicit cursor should be defined in the
have the attributes like declaration section of the PL/SQL Block.
The syntax for creating an explicit cursor is :
 %FOUND, CURSOR cursor_name IS select_statement;
 %ISOPEN,
 %NOTFOUND, and The following are the steps to create an explicit
 %ROWCOUNT. cursor:
1. Declaring the cursor
Attribute: %FOUND 2. Opening the cursor
3. Fetching the cursor
Description: It returns TRUE if an INSERT, 4. Closing the cursor
UPDATE, or DELETE statement affected one or
more rows or a SELECT INTO statement returned Declaring the Cursor:
one or more rows. Otherwise, it returns FALSE. A cursor can be declared for initializing in the
Attribute: %NOTFOUND memory. A cursor can be declared by assigning a
name and its associated SELECT statement.
Description: It is the logical opposite of
%FOUND.

Attribute: %ISOPEN CURSOR C_Emp IS SELECT EMPNO, ENAME,


Description: It always returns FALSE for implicit
cursors, because Oracle closes the SQL cursor
Opening the Cursor:
automatically after executing its associated SQL
statement. A cursor can be opened to allocate memory to it and
to make it ready for fetching the rows.
Attribute: %ROWCOUNT
Description: It returns the number of rows affected For example:
by an INSERT, UPDATE or DELETE statements
or by a SELECT INTO statement. OPEN C_EMP;
Fetching the Cursor:
For example:
The following PL/SQL code illustrates the usage of Fetching the cursor involves accessing one row at a
Implicit Cursors: time.
DECLARE
number_of_rows number(2); For example:
BEGIN FETCH C_EMP INTO C_ENO, C_NAME, C_
UPDATE customers SET salary = salary + 500; SAL;
IF sql%notfound THEN
dbms_output.put_line('no customers selected'); Closing the Cursor:
ELSIF sql%found THEN
Number_of_rows := sql%rowcount; Closing the cursor means releasing the allocated
dbms_output.put_line( number_of_rows || ' memory.
customers selected ');
For example:
END IF;
END; CLOSE C_EMP;
/

https://mguugcs.blogspot.in Page: 37
B.SC-Computer Science-III Year DBMS Study Material

For example: A Trigger can be created by using the following


syntax:
DECLARE
CREATE OR REPLACE TRIGGER trigger_name
C_ENO EMP.EMPNO%type;
[BEFORE / AFTER]
C_NAME EMP.ENAME%type;
[DELETE / INSERT / UPDATE OF
C_SAL EMP.SAL%type;
column_name] ON table_name
CURSOR C_EMP is SELECT EMPNO,
[FOR EACH ROW]
ENAME,SAL FROM EMP;
[DECLARE]
BEGIN
[variable_name data type[:=initial_value] ]
OPEN C_EMP;
BEGIN
LOOP
PL/SQL instructions;
FETCH C_EMP into C_ENO, C_NAME,
C_SAL; ...
EXIT WHEN C_EMP%notfound; END;
dbms_output.put_line(C_ENO || ' '
A trigger definition contains the following parts:
||C_NAME || ' ' ||C_SAL);
END LOOP;
The triggering timing: BEFORE or AFTER.
CLOSE C_EMP;
It indicates when the trigger will be executed.
END;
The triggering event: This statement indicates a
****
DML event (INSERT, UPDATE, or DELETE).
Triggers /
A trigger is PL/SQL code that is automatically
executed when some database event occurs. The triggering level: There are two types of
triggers:
 A trigger is automatically executed before •Statement-level triggers and
or after a data row is inserted, updated, or •Row-level triggers.
deleted.
 A trigger is associated with a database table. A statement-level / Table level trigger is executed
 Each table may have one or more triggers.
just once for the table. If we omit the FOR EACH
ROW keyword then it becomes a statement level
Triggers are useful to: trigger. This is the default case.
A row-level trigger is executed once for each row.
 To enforce constraints. It requires the use of the FOR EACH ROW
 To enforce referential integrity. keyword.
 To implement database auditing. The triggering action: It is the PL/SQL code
 To update table values, insert records in
enclosed in between the BEGIN and END
tables, and call other stored procedures.
keywords.

https://mguugcs.blogspot.in Page: 38

For example: Syntax:


The following PL/SQL Code creates a Trigger on
CREATE OR REPLACE PROCEDURE
EMP Table to detect the changes in it.
procedure_name [(argument [IN/OUT] data-type,
... )] [IS/AS] [variable_name data
CREATE OR REPLACE TRIGGER T_EMP
type[:=initial_value] ]
BEFORE
INSERT OR BEGIN
UPDATE OF sal OR
UPDATE OF deptno OR PL/SQL or SQL statements;
DELETE _
ON EMP
END;
BEGIN
CASE
WHEN INSERTING THEN Here,
DBMS_OUTPUT.PUT_LINE('Inserting new  argument specifies the parameters for a
Records'); stored procedure.
WHEN UPDATING('sal') THEN  IN/OUT indicates whether the parameter is
DBMS_OUTPUT.PUT_LINE('Updating for input, output, or both.
the salary');  data-type is one of the PL/ SQL data types.
WHEN UPDATING('deptno') THEN  Variables can be declared between the
DBMS_OUTPUT.PUT_LINE('Updating keywords IS and BEGIN.
department Number');
WHEN DELETING THEN A Stored Procedure can be executed by using the
DBMS_OUTPUT.PUT_LINE('Deleting following syntax:
the Rows');
EXEC procedure_name[(parameter_list)];
END CASE;
END; Example:
****
CREATE OR REPLACE PROCEDURE
Stored Procedures
PRC_EMP
A stored procedure is a named collection of
AS
PL/SQL and SQL statements.
BEGIN
Stored procedures are stored in the database. They
can be used to encapsulate and represent business UPDATE EMP SET COMM = SAL *.05
transactions.
WHERE DEPT_NO = 20;
Advantages of stored procedures:
DBMS_OUTOUT.PUT_LINE(‘ UPDATED
 Stored procedures can reduce network ‘5% COMMISSION ‘);
traffic and increase performance.
 All transactions are executed locally on the END;
RDBMS, so each SQL statement does not /
have to travel over the network.
 Stored procedures help reduce code This can be executed as:
duplication. It offers code sharing. It
EXEC PRC_EMP;
minimizes the errors and the cost of
application development and maintenance. ***
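As a cross-check of what the PRC_EMP example computes, here is a minimal Python sketch of the same update; the EMP rows below are hypothetical sample data, not from the text:

```python
# Minimal sketch of the update performed by PRC_EMP:
# COMM = SAL * 0.05 for every employee in department 20.
emp = [
    {"empno": 7369, "sal": 800.0,  "deptno": 20, "comm": None},
    {"empno": 7499, "sal": 1600.0, "deptno": 30, "comm": 300.0},
    {"empno": 7566, "sal": 2975.0, "deptno": 20, "comm": None},
]

def prc_emp(rows):
    """Apply a 5% commission to department 20, like the stored procedure."""
    for row in rows:
        if row["deptno"] == 20:
            row["comm"] = row["sal"] * 0.05
    return rows

prc_emp(emp)
print(emp[0]["comm"])  # 40.0
```

Rows outside department 20 (here, employee 7499) are left untouched, just as the WHERE clause restricts the UPDATE.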

Database Design
Information System:
An information system is composed of people, hardware, software, databases, application programs, and procedures. It provides for data collection, storage, and retrieval, and it facilitates transforming the data into useful information.
Databases are a part of an information system. Therefore, for a successful database design, it is important to study the information system as a whole. A carefully designed development process is mandatory for a better information system.

An Introduction to Systems Development Life Cycle:
Systems analysis is used to determine the need for an information system and to design and establish its limits. The creation and evolution of an information system is called the Systems Development Life Cycle (SDLC). It is a continuous process of creation, maintenance, enhancement, and replacement of the information system.

Systems Development Life Cycle
The Systems Development Life Cycle (SDLC) traces the life cycle of an information system. It is a continuous process of creation, maintenance, enhancement, and replacement of the information system.

A traditional SDLC is divided into 5 phases:
1) Planning
2) Analysis
3) Detailed systems design
4) Implementation and
5) Maintenance.
The following figure shows the phases in SDLC:

Planning
The SDLC planning phase gives a general overview of the company and its objectives. This phase should answer some important questions:
Should the existing system be continued?
Should the existing system be modified?
Should the existing system be replaced?

If it is decided that a new system is necessary, then this phase verifies the feasibility of the system. It includes:
The hardware and software requirements: It determines the hardware requirements (desktop computer, multiprocessor computer, mainframe, or supercomputer) and the software requirements (type of operating system, DBMS software, etc.).
The system cost: It determines whether the system is economically feasible or not.
The operational cost: It determines whether skilled personnel are available or not.

Analysis
The Analysis phase studies user requirements and the existing systems. This phase also includes the logical systems design. The logical design must specify the conceptual data model, inputs, processes, and expected outputs. For this, the designer may use tools such as data flow diagrams (DFDs), hierarchical input process output (HIPO) diagrams, and entity relationship (ER) diagrams.

Detailed Systems Design:
In this phase, the designer completes the detailed system design. It includes the technical specifications for the screens, menus, reports, and other useful devices. The training principles and methodologies are also planned.

Implementation:
During the implementation phase, the hardware, DBMS software, and application programs are installed, and the database design is implemented. This phase also performs the coding, testing, and debugging. The system becomes fully operational at the end of this phase.

Maintenance
While operating the system, the end users may begin to request changes in it. Those changes generate 3 types of system maintenance activities:
Corrective maintenance: It is done in response to systems errors.
Adaptive maintenance: It is done due to changes in the business environment.
Perfective maintenance: It is done to enhance the system.

*****
Data Base Life Cycle (DBLC)
The process of designing, implementing, and evaluating the database in an information system is known as the Database Life Cycle (DBLC). The Database Life Cycle (DBLC) is a continuous (iterative) process. The DBLC contains six phases:
1. Database initial study
2. Database design
3. Implementation and loading
4. Testing and evaluation
5. Operation
6. Maintenance and evolution.
The following figure shows the phases in DBLC:

1. The Database Initial Study:
The database initial study is to:
 Analyze the company situation.
 Define problems and constraints.
 Define objectives.
 Define scope and boundaries.

2. Database Design
This phase focuses on the design of the database. During this phase, it:
 Creates a conceptual design
 Selects the DBMS software
 Creates the logical design
 Creates the physical design.

3. Implementation and Loading: During this phase, it:
 Installs the DBMS: The DBMS may be installed on a server.
 Creates the database(s): It creates the tables and other objects.
 Loads or converts the data: It loads the data into the database tables.

4. Testing and Evaluation: During this step, the DBA tests and fine-tunes the database.

5. Operation:
This phase contains a complete information system. It provides the database, its admins, its users, and its application programs.

6. Maintenance and Evolution:
During this phase, the database administrator performs routine maintenance activities.
****

Database Design Strategies
There are two classical approaches for a database design:
1. Top-Down Design
2. Bottom-Up Design
The following figure shows the two approaches:

Top-down design first defines the data sets and then defines the data elements for each set.
 It first defines entities and then defines the attributes for each entity.
 A top-down approach suits large and complex databases.
Bottom-up design first identifies the data elements and then groups them into data sets.
 It first defines attributes and then groups them into entities.
 A bottom-up approach suits small databases well.

***

Centralized VS. Decentralized Design
Centralized design and decentralized design are two different philosophies for database design.

Centralized design is good for data components with a small number of objects and procedures. Centralized design suits small databases and can be successfully done by a single person or by a small team.

Decentralized design is good for data components with a larger number of objects and procedures. In a decentralized design, the database design task is divided into several modules across several sites. Later, all modules are integrated into one conceptual model.

****

Unit-IV
Transaction Management and Concurrency Control

Transaction:
A transaction is any action that reads from and/or writes to a database.
 A transaction is a logical unit of work that must be entirely completed or entirely aborted; no intermediate states are acceptable.
 A successful transaction changes the database from one consistent state to another.

Transaction Properties
Each transaction must have four very important properties:
1. Atomicity
2. Consistency
3. Isolation
4. Durability.
These properties are known as the ACID test. In addition, the DBMS must have the property of serializability to control multiple transactions.

Atomicity: Atomicity requires that all operations of a transaction be completed. If not, the transaction is aborted.

Consistency indicates the database's consistent state. When a transaction is completed, the database must be in a consistent state.

Isolation means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. This property is particularly useful in multiuser database environments.

Durability ensures that once transaction changes are done (committed), they cannot be undone or lost, even in the event of a system failure.

Serializability ensures that the schedule for the concurrent execution of the transactions yields consistent results.

The Transaction Log
The transaction log is a critical part of the database. The DBMS uses a transaction log to record all transactions that update the database. It contains the before values and after values of each transaction. The information stored in the transaction log is useful for database recovery management.

Concurrency Control
The coordination of the simultaneous execution of transactions in a multiuser database system is known as concurrency control.
The simultaneous execution of transactions can create several problems. The three main problems are:
 Lost updates
 Uncommitted data
 Inconsistent retrievals

Lost Updates:
The lost update problem occurs when two concurrent transactions are updating the same data element.

Uncommitted Data:
The uncommitted data problem occurs when two transactions are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data.

Inconsistent Retrievals:
Inconsistent retrievals occur when a transaction accesses data before and after another transaction(s) finish working with such data.
*****
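The lost update problem can be sketched in a few lines; a minimal Python illustration with a hypothetical account balance:

```python
# Minimal sketch of the lost update problem: two transactions read the
# same balance, then write back independently; one update is lost.
balance = 100

# T1 and T2 both read the current value before either writes.
t1_read = balance          # T1: reads 100
t2_read = balance          # T2: reads 100

balance = t1_read + 50     # T1: writes 150 (deposit 50)
balance = t2_read - 30     # T2: writes 70 (withdraw 30); T1's update is lost

print(balance)             # 70, although the correct serial result is 120
```

Running the two transactions serially would give 100 + 50 - 30 = 120; the interleaved schedule silently discards T1's deposit.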

Concurrency Control with Locking Methods
A lock guarantees exclusive use of a data item to a current transaction.

Lock Granularity
Lock granularity indicates the level of lock use. It is of the following levels:
 Database Level Locks
 Table Level Locks
 Page Level Locks
 Row Level Locks
 Field (attribute) Level Locks

Database Level
In a database-level lock, the entire database is locked. It prevents other transactions from using any tables in the database. The following figure shows a database-level lock:

Table Level
In a table-level lock, the entire table is locked. It prevents other transactions from using any row in the table. The following figure shows a table-level lock:

Page Level
In a page-level lock, an entire disk page is locked. It prevents other transactions from using any page in the tables. The following figure shows a page-level lock:

Row Level
A row-level lock allows the transactions to access different rows of the same table. The following figure illustrates a row-level lock:

Field Level
The field-level lock allows the transactions to access different fields (attributes) within a row. Field-level locking requires high overhead, so it is rarely implemented.
***
Lock Types
The DBMS may use different lock types, such as:
 Binary Locks
 Shared/Exclusive Locks.

Binary Locks
 A binary lock has only two states: locked (1) or unlocked (0). It specifies that every transaction requires a lock and unlock operation for each of its data items.
 If a transaction locks a database object, then other transactions are not allowed to use it until it unlocks that object.
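A binary lock, as described above, can be sketched as follows (a minimal Python analogy, not an actual DBMS implementation):

```python
# Minimal sketch of a binary lock: a data item is either locked (1) or
# unlocked (0); a transaction must lock the item before using it.
class BinaryLock:
    def __init__(self):
        self.locked = False          # False = unlocked (0), True = locked (1)

    def lock(self):
        if self.locked:
            return False             # another transaction holds the lock
        self.locked = True
        return True

    def unlock(self):
        self.locked = False

item_x = BinaryLock()
print(item_x.lock())   # True  -- T1 acquires the lock
print(item_x.lock())   # False -- T2 is denied until T1 unlocks
item_x.unlock()        # T1 releases the lock
print(item_x.lock())   # True  -- now T2 can lock the item
```

The second `lock()` call fails while the item is held, which is exactly the all-or-nothing behavior of the two-state lock.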
Shared/Exclusive Locks
Exclusive Locks
 An exclusive lock reserves access only for the transaction that locked the object.
 An exclusive lock is issued when a transaction wants to update (write) a data item and only if no other locks are held on the data item.

Shared Locks:
 A shared lock grants read access on the object. A shared lock is issued when a transaction wants to read data from the database.
***
Deadlock
A database deadlock is caused when two or more transactions wait for each other to unlock data.
***
Two-Phase Locking Protocol
The two-phase locking protocol defines how transactions acquire and release locks. This protocol has two phases in it:
1. A Growing Phase
2. A Shrinking Phase

Growing Phase:
 In this phase a transaction acquires all required locks without unlocking any data.
 Once all locks have been acquired, the transaction is in its locked point.
Shrinking Phase:
 In this phase a transaction releases all locks and cannot obtain any new lock.

The two-phase locking protocol has the following rules:
 Two transactions cannot have conflicting locks.
 No unlock operation can precede a lock operation in the same transaction.
 No data are affected until all locks are obtained.

The following figure shows the two-phase locking protocol:

Two-phase locking increases the transaction processing cost and might cause deadlocks.
***
Deadlocks
A deadlock occurs when two transactions wait indefinitely for each other to unlock data.
For example, a deadlock occurs when two transactions, T1 and T2, exist in the following mode:
T1 = access data items X and Y
T2 = access data items Y and X

The three basic techniques to control deadlocks are:
Deadlock prevention: A transaction with a new lock request is aborted when there is the possibility that a deadlock can occur.
Deadlock detection: The DBMS periodically tests the database for deadlocks. If a deadlock is found, then one transaction is aborted and the other transaction continues.
Deadlock avoidance: The transaction must obtain all of the locks it needs before it can be executed.
***
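Deadlock detection is commonly explained with a wait-for graph; here is a minimal Python sketch (the transaction names are hypothetical):

```python
# Minimal sketch of deadlock detection: build a wait-for graph and look
# for a cycle. T1 waits for T2 (holding Y) while T2 waits for T1 (holding X).
wait_for = {"T1": "T2", "T2": "T1"}   # which transaction each one waits on

def has_deadlock(graph):
    """Return True if following the wait-for edges revisits a transaction."""
    for start in graph:
        seen, node = set(), start
        while node in graph:
            if node in seen:
                return True          # cycle in the graph => deadlock
            seen.add(node)
            node = graph[node]
    return False

print(has_deadlock(wait_for))         # True
print(has_deadlock({"T1": "T2"}))     # False: T2 is not waiting on anyone
```

A DBMS that detects such a cycle aborts one transaction in it, which is the "deadlock detection" technique above reduced to its core.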

Concurrency Control with Time-Stamping Methods:
The time-stamping approach assigns a unique time stamp to each transaction. It gives an order for the transactions.
Time stamps must have two properties:
 Uniqueness and
 Monotonicity.
 Uniqueness ensures unique time stamp values.
 Monotonicity ensures that time stamp values always increase.

There are two schemes to decide which transaction is rolled back and which should be continued:
•The wait/die scheme
•The wound/wait scheme.

The Wait/Die scheme:
By using the wait/die scheme:
If the transaction requesting the lock is the older one, it will wait until the other transaction is completed and the locks are released.
If the transaction requesting the lock is the younger one, it will die (roll back) and is rescheduled using the same time stamp.

The Wound/Wait scheme:
By using the wound/wait scheme:
 If the transaction requesting the lock is the older one, it will preempt (wound) the younger transaction. The younger transaction is rescheduled using the same time stamp.
 If the transaction requesting the lock is the younger one, it will wait until the other transaction is completed and the locks are released.

The disadvantage of the time-stamping approach is that it increases memory needs and it demands a lot of system resources.
****
Concurrency Control with Optimistic Methods
In the optimistic approach a transaction is executed without restrictions until it is committed. In this approach each transaction moves through two or three phases:
1. Read Phase
2. Validation Phase
3. Write Phase

 During the read phase, the transaction reads the database and makes all its update operations in a temporary update file.
 During the validation phase, the transaction is validated to ensure the integrity and consistency of the database. If the validation test is positive, the transaction goes to the write phase. If the validation test is negative, the transaction is restarted and the changes are rejected.
 During the write phase, the changes are permanently applied to the database.
***
Database Recovery
Database recovery restores a database from an inconsistent state to a consistent state.
Some critical events can cause damage to a database. Examples of critical events are:

Hardware/software failures: A hard disk failure, memory failure, etc., are known as hardware failures. Application program or operating system errors are known as software errors.

Human-caused incidents: This type of event can be unintentional or intentional.
 An unintentional failure is caused by carelessness by end users. Such errors include deleting the wrong rows from a table, pressing the wrong key on the keyboard, or shutting down the main database server by accident.
 Intentional events are of a more severe nature and are very serious. The security threats caused by hackers and virus attacks come under this category.

Natural disasters: This category includes fires, earthquakes, floods, and power failures.

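The read, validation, and write phases of the optimistic approach described above can be sketched as follows (a minimal Python analogy with hypothetical data items):

```python
# Minimal sketch of the optimistic approach: updates go to a temporary
# update area during the read phase, are checked in the validation phase,
# and reach the database only in the write phase.
database = {"X": 10, "Y": 20}
start_snapshot = dict(database)      # what the transaction saw when it began

# Read phase: all updates are made in a temporary update area.
temp_update = {"X": 15}

# Validation phase: the items read must not have changed in the meantime.
valid = all(database[k] == start_snapshot[k] for k in temp_update)

# Write phase: apply the changes permanently only if validation passed;
# otherwise the transaction would be restarted and the changes rejected.
if valid:
    database.update(temp_update)

print(database["X"])   # 15
```

If another transaction had changed X between the snapshot and the check, `valid` would be False and the temporary update would be discarded.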
Transaction Recovery
Database transaction recovery is the process of recovering a database from an inconsistent state to a consistent state.
There are four important concepts that affect the recovery process:
1) Write-ahead-log protocol
2) Redundant transaction logs
3) Database buffers
4) Database checkpoints

The write-ahead-log protocol ensures that transaction logs are always written before any database update.
Redundant transaction logs maintain several copies of the transaction log to ensure the DBMS's ability to recover data.
Database buffers are temporary storage areas in primary memory. They are useful to speed up disk operations and save a lot of time.
Database checkpoints are operations in which the DBMS writes all of its updated buffers to disk.

Transaction recovery procedures use two techniques:
1. Deferred-Write technique
2. Write-Through technique.

Deferred-Write technique
In a deferred-write technique only the transaction log is updated.
The recovery process in a deferred-write technique has the following steps:
1. Identify the last checkpoint in the transaction log.
2. For a transaction that started and was committed before the last checkpoint, nothing needs to be done.
3. For a transaction that performed a commit operation after the last checkpoint, the DBMS uses the transaction log to update the database.
4. For any transaction that had a ROLLBACK operation after the last checkpoint, nothing needs to be done, because the database was never updated.

Write-Through technique
In a write-through technique the database is immediately updated.
The recovery process in a write-through technique has the following steps:
1. Identify the last checkpoint in the transaction log.
2. For a transaction that started and was committed before the last checkpoint, nothing needs to be done.
3. For a transaction that was committed after the last checkpoint, the DBMS redoes the transaction.
4. For any transaction that had a ROLLBACK operation after the last checkpoint, the DBMS uses the transaction log to ROLLBACK the operations, using the "before" values in the transaction log.
***
DDBMS
Distributed Database Management Systems
Evolution of Distributed Database Management Systems:
The DBMS which manages the storage and processing of the database over interconnected computer systems is known as a Distributed Database Management System (DDBMS).
 In a DDBMS environment, both data and processing functions are distributed among several sites.
During the 1970s, organizations implemented centralized database management systems.
 In a centralized database system, data can be stored in a single central site and accessed through dumb terminals.
The following figure illustrates the centralized approach:

Distributed Processing and Distributed Databases:
The distributed processing system uses only a single-site database, but the processing can be performed among several sites. The centralized approach fails to offer faster response times and quick access.

A distributed database stores its database over several sites. Its database is composed of several parts known as database fragments.
The following figure shows a distributed database environment:
***
DDBMS Advantages and Disadvantages:

Advantages:
 Data are located near the greatest demand site.
 Faster data access: End users can access the data in a faster manner.
 Faster data processing: In a distributed database system the workload is shared by several sites.
 Growth facilitation: New sites can be added easily.
 Improved communications.
 Reduced operating costs: Development work is done more cheaply and more quickly.

Disadvantages:
 Complexity of management and control.
 Technological difficulty: Concurrency control, security, backup, recovery, etc. must be maintained.
 Security: The probability of security problems increases.
 Lack of standards: There are no standard communication protocols.
 Increased storage and infrastructure requirements.
 Increased training cost: Training costs are higher.
 Development costs: It requires more infrastructure, personnel, software, etc.
Characteristics of Distributed Database Management Systems
A DDBMS must have at least the following functions:
Application interface to interact with the end user, application programs, etc.
Validation to analyze requests for syntax correctness.
Transformation to decompose complex requests to the atomic level.
Query optimization to find the best access strategy.
Mapping to determine the data location.
I/O interface to read or write data.
Formatting to prepare the data for presentation.
Security to provide data privacy and security.
Backup and recovery to ensure availability and recoverability in case of a failure.
DB administration features for the database administrator.
Concurrency control to manage simultaneous data access.
Transaction management to ensure that the data moves from one consistent state to another.
****

DDBMS Components
The DDBMS must include at least the following components:
Computer workstations that form the network system.
Network hardware and software: The network components allow all sites to interact and exchange data.
Communications media that carry the data from one workstation to another.
The Transaction Processor (TP): It is a software component that receives and processes the application's data requests.
The Data Processor (DP): It is a software component that stores and retrieves data located at the site.
The following figure shows the DDBMS components:

Levels of data and process distribution
Current database systems can be classified into:
I. Single-Site Processing, Single-Site Data (SPSD)
II. Multiple-Site Processing, Single-Site Data (MPSD)
III. Multiple-Site Processing, Multiple-Site Data (MPMD)

Single-Site Processing, Single-Site Data:
In single-site processing, single-site data (SPSD), all processing is done on a single computer and all data are stored on the local disk system. In an SPSD scenario the DBMS can be accessed by dumb terminals.
The following figure shows an SPSD scenario:

Multiple-Site Processing, Single-Site Data (MPSD):
Multiple-site processing, single-site data (MPSD) runs multiple processes on different computers sharing a single data repository. All data selection, search, and update functions take place at the workstation. The MPSD scenario requires a network file server.
The following figure shows an MPSD scenario:
Multiple-Site Processing, Multiple-Site Data (MPMD):
The MPMD is a fully distributed DBMS. It supports multiple data processors and transaction processors at multiple sites.
The following figure shows a heterogeneous distributed database scenario:

Distributed Database Transparency Features
Transparency features are the functional characteristics that can hide the complexities in using a DDBMS.
The DDBMS transparency features are:
1) Distribution transparency: It allows a distributed database to be treated as a single logical database.
2) Transaction transparency: It allows a transaction to update data at more than one network site.
3) Failure transparency: If your node fails, the DDBMS allows other nodes in the network to fulfil your processing needs.
4) Performance transparency: It allows the system to perform like a centralized DBMS.
5) Heterogeneity transparency: It allows the integration of several different DBMSs under a common schema.

Distribution Transparency
Distribution transparency allows a distributed database to be managed like a centralized database. There are three levels of distribution transparency:
 Fragmentation transparency: The end user does not need to know the fragment names or fragment locations.
 Location transparency: The end user must specify the database fragment names but not their locations.
 Local mapping transparency: The end user must specify both the fragment names and their locations.

Suppose the EMPLOYEE data are distributed over three different locations: New Delhi employee data are stored in fragment E1, Mumbai employee data are stored in fragment E2, and Chennai employee data are stored in fragment E3, as shown in the figure:

Now, to display all employees who were born before January 1, 1990:

Case 1: The Database Supports Fragmentation Transparency
In this case, the query does not specify fragment names or locations.
Ex: SELECT * FROM EMPLOYEE WHERE EMP_DOB < ’01-JAN-1990’;

Case 2: The Database Supports Location Transparency
In this case, fragment names must be specified in the query:
Ex: SELECT * FROM E1 WHERE EMP_DOB < ’01-JAN-1990’
UNION
SELECT * FROM E2 WHERE EMP_DOB < ’01-JAN-1990’
UNION
SELECT * FROM E3 WHERE EMP_DOB < ’01-JAN-1990’ ;

Case 3: The Database Supports Local Mapping Transparency
In this case, both fragment names and locations must be specified in the query:
Ex:
SELECT * FROM E1 NODE ND WHERE EMP_DOB < ’01-JAN-1990’
UNION
SELECT * FROM E2 NODE MB WHERE EMP_DOB < ’01-JAN-1990’
UNION
SELECT * FROM E3 NODE CN WHERE EMP_DOB < ’01-JAN-1990’ ;

Transaction Transparency
Transaction transparency ensures integrity and consistency. A DDBMS requires complex mechanisms to manage its transactions:
Distributed Requests and Distributed Transactions
A distributed transaction allows a transaction to reference several different local or remote DP sites, as shown below:

A distributed request allows a single SQL statement to reference data located at several different DP sites, as shown below:

****
Distributed Concurrency Control:
Concurrency control is very important in a distributed database environment. A two-phase commit protocol ensures database consistency in a DDBMS.

Two-Phase Commit Protocol:
The two-phase commit protocol guarantees that if a transaction operation cannot be committed, all changes made at the other sites by that transaction will be cancelled (undone) to maintain consistency.
The protocol is implemented in two phases:
Phase 1: Preparation
1. The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2. The subordinates receive the message; write the transaction log, using the write-ahead protocol; and send an acknowledgment (YES/PREPARED TO COMMIT or NO/NOT PREPARED) message to the coordinator.
3. If all nodes are PREPARED TO COMMIT, the transaction goes to phase 2. If one or more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT message to all subordinates.
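The two-phase commit decision logic can be sketched as follows; this minimal Python analogy reduces the message exchange to a list of vote strings (an assumption for illustration):

```python
# Minimal sketch of the two-phase commit protocol: the coordinator
# commits only if every subordinate replies PREPARED TO COMMIT.
def two_phase_commit(votes):
    """votes: each subordinate's reply to the PREPARE TO COMMIT message."""
    # Phase 1: preparation -- collect the acknowledgments.
    if not all(v == "YES" for v in votes):
        return "ABORT"            # any NO forces a global abort
    # Phase 2: the final COMMIT -- broadcast COMMIT to all subordinates.
    return "COMMIT"

print(two_phase_commit(["YES", "YES", "YES"]))   # COMMIT
print(two_phase_commit(["YES", "NO", "YES"]))    # ABORT
```

A single NO vote aborts the whole transaction at every site, which is how the protocol keeps all sites consistent.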
Phase 2: The Final COMMIT
1. The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
2. Each subordinate receives the COMMIT message, and then updates the database using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator. If one or more subordinates did not commit, the coordinator sends an ABORT message and UNDOes all changes.

Performance Transparency and Query Optimization
The DDBMS uses query optimization techniques/algorithms to ensure database performance.
A query optimization algorithm can be classified on the basis of:
(1) Operation mode.
(2) Timing mode.
(3) Type of information.

1. Operation Mode:
Operation modes can be classified as:
 Automatic query optimization
 Manual query optimization
Automatic query optimization means that the DDBMS finds the best optimization strategy for execution.
Manual query optimization means that the user/programmer needs to find the best optimization strategy for execution.

2. Timing Mode:
Timing modes can be classified as:
 Static query optimization
 Dynamic query optimization
Static query optimization selects the best optimization strategy at compilation time.
Dynamic query optimization selects the best optimization strategy at run time (execution time).

3. The Type of Information:
Information modes can be classified as:
 Statistics-based query optimization
 Rule-based query optimization
A statistics-based query optimization algorithm uses statistical information about the database to determine the best access strategy.
A rule-based query optimization algorithm uses user-defined rules to determine the best query access strategy.
***

Client/Server VS. DDBMS
In a client/server architecture the client (TP) interacts with the end user and sends a request to the server (DP). The server receives, schedules, and executes the request, selecting only those records that are needed by the client.
Client/server applications offer several advantages:
 Client/server solutions are less expensive.
 Client/server solutions improve functionality and simplicity.
 More people in the job market have PC skills.
 The PC is well established in the workplace.
 Numerous data analysis and query tools exist to facilitate interaction with many of the DBMSs.
Client/server applications have some disadvantages:
 The client/server architecture creates a more complex environment.
 Security problems.
 The burden of training increases the cost of maintaining the environment.
****
Distributed database design:
A distributed database design includes three new issues:
1. Data Fragmentation
2. Data Replication
3. Data Allocation

Data fragmentation: Dividing a database into two or more fragments is known as Data Fragmentation. It is of three types:
1. Horizontal fragmentation
2. Vertical fragmentation
3. Mixed fragmentation

Consider the following table: CUSTOMER

Cust_ID  Cust_Name  City        State          Loan_Amount  Due
1001     ARUN       HYDERABAD   TELANGANA      100000       25000
1002     BHARATH    MUMBAI      MAHARASHTRA    400000       250000
1003     CHARITHA   WARANGAL    TELANGANA      200000       15000
1004     DHARANI    VIZAG       ANDHRAPRADESH  400000       250000
1005     EESHWAR    PUNE        MAHARASHTRA    100000       25000
1006     FAYAZ      AMARAVATHI  ANDHRAPRADESH  300000       100000

Horizontal fragmentation: The division of a relation into rows is known as Horizontal fragmentation. Each fragment has unique rows. The horizontal fragmentation of CUSTOMER based on STATE yields three fragments, one per state: a TELANGANA fragment (rows 1001 and 1003), a MAHARASHTRA fragment (rows 1002 and 1005), and an ANDHRAPRADESH fragment (rows 1004 and 1006), each containing all the columns.

Vertical fragmentation: The division of a relation into columns is known as Vertical fragmentation. Each fragment has unique columns. The primary key column is common to all fragments. This is the equivalent of the PROJECT statement in SQL.
The vertical fragmentation of CUSTOMER yields the following fragments:

CUST_SERVICES_V1
Cust_ID  Cust_Name  City        State
1001     ARUN       HYDERABAD   TELANGANA
1002     BHARATH    MUMBAI      MAHARASHTRA
1003     CHARITHA   WARANGAL    TELANGANA
1004     DHARANI    VIZAG       ANDHRAPRADESH
1005     EESHWAR    PUNE        MAHARASHTRA
1006     FAYAZ      AMARAVATHI  ANDHRAPRADESH

CUST_COLLECTIONS_V2
Cust_ID  Loan_Amount  Due
1001     100000       25000
1002     400000       250000
1003     200000       15000
1004     400000       250000
1005     100000       25000
1006     300000       100000

Mixed fragmentation: It is a combination of horizontal and vertical fragmentation. First it divides a table into fragments of rows; then each fragment can be divided into fragments of columns.
The mixed fragmentation of CUSTOMER yields the following fragments:

CUST_SERVICES_M1
Cust_ID  Cust_Name  City       State
1001     ARUN       HYDERABAD  TELANGANA
1003     CHARITHA   WARANGAL   TELANGANA

CUST_COLLECTIONS_M2
Cust_ID  Loan_Amount  Due
1001     100000       25000
1003     200000       15000
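The fragments above can be expressed as queries over the CUSTOMER table. This is an illustrative sketch, not part of the original text; the table and column names are those used in the example:

```sql
-- Horizontal fragmentation: restrict rows (the SELECT/restriction operation).
SELECT * FROM CUSTOMER WHERE State = 'TELANGANA';

-- Vertical fragmentation: project columns; the key Cust_ID appears in every fragment.
SELECT Cust_ID, Cust_Name, City, State FROM CUSTOMER;   -- CUST_SERVICES_V1
SELECT Cust_ID, Loan_Amount, Due FROM CUSTOMER;         -- CUST_COLLECTIONS_V2

-- Mixed fragmentation: restrict rows first, then project columns.
SELECT Cust_ID, Cust_Name, City, State
FROM CUSTOMER
WHERE State = 'TELANGANA';                              -- CUST_SERVICES_M1
```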
CUST_SERVICES_M3
Cust_ID  Cust_Name  City    State
1002     BHARATH    MUMBAI  MAHARASHTRA
1005     EESHWAR    PUNE    MAHARASHTRA

CUST_COLLECTIONS_M4
Cust_ID  Loan_Amount  Due
1002     400000       250000
1005     100000       25000

CUST_SERVICES_M5
Cust_ID  Cust_Name  City        State
1004     DHARANI    VIZAG       ANDHRAPRADESH
1006     FAYAZ      AMARAVATHI  ANDHRAPRADESH

CUST_COLLECTIONS_M6
Cust_ID  Loan_Amount  Due
1004     400000       250000
1006     300000       100000

Data Replication:
Storing the fragments at multiple sites is known as Data Replication. It stores copies of the fragments at several sites. It enhances data availability and response time.

Data Allocation:
The process of deciding where to locate data is known as Data Allocation. There are different types of data allocation:
Centralized data allocation: It allocates the entire database at one site.
Partitioned data allocation: It allocates the database at two or more sites.
Replicated data allocation: It allocates copies of the database at several sites.
***

Client/Server vs. DDBMS:
Because the trend toward distributed databases is firmly established, many database vendors have used the "client/server" label to indicate distributed database capability. However, distributed databases do not always accurately reflect the characteristics implied by the client/server label.
Client/server architecture refers to the way in which computers interact to form a system. The client/server architecture features a user of resources, or client, and a provider of resources, or server. The client/server architecture can be used to implement a DBMS in which the client is the TP (transaction processor) and the server is the DP (data processor).
Client/server applications offer several advantages:
 Client/server solutions tend to be less expensive than alternate midrange computer or mainframe solutions in terms of startup infrastructure requirements.
 Client/server solutions allow the end user to use the PC's GUI, thereby improving functionality and simplicity. In particular, using the ubiquitous web browser in conjunction with Java and .NET frameworks provides a familiar end-user interface.
 More people in the job market have PC skills than mainframe skills. The majority of current students are learning Java and .NET programming skills.
 The PC is well established in the workplace. In addition, the increased use of the Internet as a business channel, coupled with security advances (SSL/TLS, virtual private networks, multifactor authentication, etc.), provides a more reliable and secure platform for business transactions.
 Numerous data analysis and query tools exist to facilitate interaction with many of the DBMSs that are available in the PC market.
 There is a considerable cost advantage to offloading applications development from the mainframe to powerful PCs.
Client/server applications are also subject to some disadvantages:
 The client/server architecture creates a more complex environment in which different platforms (LANs, operating systems, and so on) are often difficult to manage.
 An increase in the number of users and processing sites often paves the way for security problems.
 The client/server environment makes it possible to spread data access to a much wider circle of users. Such an environment increases the demand for people with a broad knowledge of computers and software applications. The burden of training increases the cost of maintaining the environment.
***
Unit-V

The need for data analysis
Most managers want to monitor daily transactions to see how the business is performing. This kind of data analysis provides information such as:
 Are our sales promotions working?
 What market percentage are we controlling?
 Are we attracting new customers?
The business climate is dynamic, and firms must react fast to changes in order to remain competitive, so there is a need for decision support systems.
Different managerial levels have different decision support needs. Changes in the business world have shown new ways of managing the data. This new decision support framework became known as Business Intelligence.
***

Business Intelligence
 Business intelligence (BI) is a term used to describe the set of tools and processes that efficiently extract useful information to support decision making.
 BI is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom.
 BI has the potential to create "business wisdom". This business wisdom can be used to make sound business decisions.
BI involves the following general steps:
1. Collecting and storing operational data.
2. Aggregating the operational data into decision support data.
3. Analyzing decision support data to generate information.
4. Presenting the information to the end users.
5. Making business decisions.
6. Monitoring results.

BI architecture
BI architecture is composed of data, people, processes, technology, and the management of such components. The following figure shows all those components within the BI framework.

A BI environment should provide the following four components:
1. ETL Tools
2. Data Store
3. Data query and analysis tools
4. Data presentation and visualization tools

ETL Tools:
ETL refers to Extraction, Transformation, and Loading tools. These are used for collecting, filtering, integrating, and aggregating operational data in order to load it into a data store.
Data Store:
The data store is meant for decision support. It is generally represented by a data warehouse or a data mart.
Data Query and Analysis tools:
This component performs data retrieval, data analysis, and data mining tasks. This component is known as an OLAP (On-Line Analytical Processing) tool.
Data Presentation and Visualization tools:
This component is responsible for presenting the data to the end user. The query tool and the presentation tool are the front end to the BI environment.
***
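The ETL idea can be sketched in SQL. This is an illustrative assumption, not from the text: the operational table SALES is borrowed from the later OLAP examples, while DW_SALES_SUMMARY and the Sale_Month column are hypothetical names.

```sql
-- Hypothetical ETL load: aggregate operational data (the "transform" step)
-- and load the summary into a decision-support table (the "load" step).
INSERT INTO DW_SALES_SUMMARY (V_CODE, Sale_Month, Total_Sales)
SELECT V_CODE, Sale_Month, SUM(Units * Price)
FROM SALES
GROUP BY V_CODE, Sale_Month;
```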
Operational Data vs. Decision Support Data
Decision support data differ from operational data in three main areas:
I. Time span
II. Granularity
III. Dimensionality
Time span:
Operational data cover a short time frame. Decision support data cover a longer time frame.
Granularity (level of aggregation):
Decision support data must be presented at different levels of aggregation, from highly summarized to near-atomic. This allows the data to be drilled down or rolled up.
Dimensionality:
Operational data focus on individual transactions, but decision support data can have different dimensions.

The following are the differences between operational data and decision support data:
 Operational data represent current data. Decision support data represent historic data.
 Operational data are for update transactions. Decision support data are for query (read-only) transactions.
 Operational data are commonly stored in many tables. Decision support data are generally stored in a few tables.
 The operational database requires normalized tables, but the decision support database is non-normalized.
 Finally, decision support data can have very large amounts of data.

The Data Warehouse
Bill Inmon is known as the "father" of the data warehouse. He defined the data warehouse as:
"an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making."
Integrated: The data warehouse is a centralized database that integrates data of different formats. It brings all the data into a common standard.
Subject-oriented: Data warehouse data are organized and summarized by topic. For each topic, the data warehouse contains specific subjects: products, customers, departments, regions, and so on.
Time-variant: Data warehouse data are time-variant. The data warehouse contains a time ID that is used to generate summaries and aggregations by week, month, quarter, year, and so on.
Nonvolatile: The data in a data warehouse are never removed. Because data are never deleted and new data are continually added, the data warehouse is always growing.

Figure: Creating a DataWarehouse

 In summary, the data warehouse is usually a read-only database optimized for data analysis and query processing.

Data Mart
 Creating a data warehouse requires much time and money. Therefore, many companies focus on easily manageable data sets that can serve the needs of small groups within the organization. These smaller data stores are called data marts.
 A data mart is a small, single-subject data warehouse subset. It can provide decision support to a small group of people.
 A data mart can also be created from a larger data warehouse to support faster data access for a group.
 Constructing a data mart reduces the costs and time involved. It is possible to migrate gradually from data marts to data warehouses. Information technology (IT) departments also benefit from this approach because their personnel have the opportunity to learn the issues and develop the skills required to create a data warehouse.
 The only difference between a data mart and a data warehouse is the size and scope.
***

Twelve Rules that define a DWH
In 1994, W.H. Inmon and Chuck Kelley created 12 rules that define a data warehouse:
1. The data warehouse and operational environments are separated.
2. The data warehouse data are integrated.
3. The data warehouse contains historical data covering a long time span.
4. The data warehouse data are time variant.
5. The data warehouse data are subject oriented.
6. The data warehouse data are mainly read-only.
7. The data warehouse development life cycle differs from the SDLC.
8. The data warehouse contains data with several levels of detail.
9. The data warehouse environment is characterized by read-only transactions.
10. The data warehouse environment has a system that traces data sources, transformations, and storage.
11. The data warehouse's metadata identify and define all data elements.
12. The data warehouse contains a chargeback mechanism for resource usage.
***

OnLine Analytical Processing (OLAP)
On-Line Analytical Processing (OLAP) tools create an advanced data analysis environment that supports decision making.
OLAP systems share four main characteristics:
 Multidimensional data analysis techniques.
 Advanced database support.
 Easy-to-use end-user interfaces.
 Client/server architecture.

Multidimensional data analysis techniques:
Modern OLAP tools have the capability for multidimensional analysis. This allows end users to aggregate data at different levels.
Advanced Database Support:
OLAP tools must have advanced data access features, such as:
 Access to different kinds of DBMSs, files, and external data sources.
 Access to data warehouse data as well as the operational databases.
 Advanced features such as drill-down and roll-up.
Easy-to-Use End-User Interface:
OLAP tools are designed with an easy-to-use GUI. This makes OLAP easily accepted and readily used.
Client/Server Architecture:
The client/server environment enables an OLAP system to be divided into several components. Those components can then be placed on the same computer, or they can be distributed among several computers.
***
Star Schema
The star schema is a data modeling technique. It is used to map multidimensional decision support data into a relational database. The purpose of a star schema is to enable advanced data analysis.
The star schema has four components:
 Facts
 Dimensions
 Attributes
 Attribute hierarchies

Facts:
Facts are numeric values that represent a business aspect. Commonly used facts are units, costs, prices, and revenues. Facts are normally stored in a fact table. The fact table is the center of the star schema, and it contains facts that are linked through their dimensions. Computed or derived facts are sometimes called metrics.

Dimensions:
Dimensions are the characteristics that provide additional views of a given fact. Dimensions are normally stored in dimension tables. For example, sales have product, location, and time dimensions. Dimension tables in a star schema are de-normalized in nature.
The following is the structure of a star schema:

Attributes:
Each dimension table contains attributes. Attributes can be used to search and filter the facts.
The following table illustrates some possible attributes for each dimension in the Sales example:

DIMENSION NAME  DESCRIPTION                       POSSIBLE ATTRIBUTES
Location        It provides a description of      Region, State, City, Store,
                the location.                     and so on.
Product         It provides a description of      Product ID, Description,
                the products.                     Color, Size, and so on.
Time            It provides a time frame for      Year, Quarter, Month, Week,
                the Sales fact.                   Day, Time of day, and so on.

Attribute Hierarchy:
Attributes within dimensions can be organized in a hierarchy. The attribute hierarchy provides a top-down data organization. This hierarchy can be used for two main purposes:
I. Aggregation
II. Drill-Down/Roll-Up data analysis
For example, Location dimension attributes can be organized in a hierarchy by Region, State, City, and Store, as shown in the figure below:

The attribute hierarchy provides the capability to perform drill-down and roll-up searches in a data warehouse. The attribute hierarchy information is stored in the DBMS's data dictionary and is used by the OLAP tool to access the data warehouse properly.
An example star schema for SALES with PRODUCT, STORE (location), and PERIOD (time) dimensions:
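The SALES star schema described above can be sketched as SQL DDL. This is an illustrative sketch, not the text's own code: the column names are drawn from the possible-attributes table, while the key names and data types are assumptions.

```sql
-- Dimension tables (de-normalized descriptions of each dimension).
CREATE TABLE PRODUCT (
  Product_ID   INT PRIMARY KEY,
  Description  VARCHAR(50),
  Color        VARCHAR(20),
  Size         VARCHAR(10)
);
CREATE TABLE STORE (
  Store_ID  INT PRIMARY KEY,
  City      VARCHAR(30),
  State     VARCHAR(30),
  Region    VARCHAR(30)
);
CREATE TABLE PERIOD (
  Time_ID  INT PRIMARY KEY,
  Day      INT,
  Month    INT,
  Quarter  INT,
  Year     INT
);
-- The fact table sits at the center and references every dimension.
CREATE TABLE SALES (
  Product_ID  INT REFERENCES PRODUCT (Product_ID),
  Store_ID    INT REFERENCES STORE (Store_ID),
  Time_ID     INT REFERENCES PERIOD (Time_ID),
  Units       INT,
  Price       DECIMAL(10,2),
  PRIMARY KEY (Product_ID, Store_ID, Time_ID)
);
```

A query can then join the fact table to a dimension and aggregate at any level of the attribute hierarchy, for example total units by Region or by State.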
***

Data Mining
Data mining refers to the activities that analyze data, discover the opportunities hidden in data relationships, and use them to predict business behavior. Data mining tools use knowledge discovery and artificial intelligence techniques to obtain knowledge. This knowledge is then used to make forecasts of values such as sales returns.
The following pyramid represents how knowledge is extracted from data:

Data form the pyramid's base and represent operational databases. The second level contains information, which represents the purified and processed data. Information forms the basis for decision making. Knowledge is found at the pyramid's apex; it represents highly specialized information.

The data mining process involves four phases:
I. Data preparation
II. Data analysis and classification
III. Knowledge acquisition
IV. Prognosis

Data preparation phase: In this phase the main data sets are identified and cleansed. The data warehouse is the target set for data mining operations.
Data analysis and classification phase: This phase uses algorithms to find:
 Data groupings, classifications, clusters, or sequences.
 Data dependencies, links, or relationships.
 Data patterns, trends, and deviations.
Knowledge acquisition phase: This phase uses the results of the data analysis and classification phase. During this phase, the appropriate modeling or knowledge acquisition algorithms are selected.
Prognosis phase: In this phase, the data mining findings are used to predict future behavior and forecast business outcomes.
***

SQL Extensions for OLAP
The rise of OLAP tools has driven the development of SQL extensions to support multidimensional data analysis. SQL provides two extensions to the GROUP BY clause that are particularly useful: ROLLUP and CUBE.

The ROLLUP Extension:
The ROLLUP extension is used with the GROUP BY clause to generate aggregates by different dimensions. The syntax of GROUP BY ROLLUP is as follows:

SELECT column_list, aggregate_function(expression)
FROM table_list
[WHERE condition]
GROUP BY ROLLUP (column1, column2 [, ...])
[HAVING condition]
[ORDER BY column1 [, column2, ...]]
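For reference, ROLLUP and CUBE are shorthand for standard SQL GROUPING SETS. This equivalence note is an addition, not from the original text; it reuses the SALES example columns:

```sql
-- GROUP BY ROLLUP (V_CODE, P_CODE) is equivalent to:
SELECT V_CODE, P_CODE, SUM(Units * Price) AS "Total Sales"
FROM SALES
GROUP BY GROUPING SETS ((V_CODE, P_CODE), (V_CODE), ());

-- GROUP BY CUBE (V_CODE, P_CODE) additionally aggregates by (P_CODE) alone:
SELECT V_CODE, P_CODE, SUM(Units * Price) AS "Total Sales"
FROM SALES
GROUP BY GROUPING SETS ((V_CODE, P_CODE), (V_CODE), (P_CODE), ());
```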
Example:

SELECT V_CODE, P_CODE, SUM(Units * Price) AS "Total Sales"
FROM SALES
GROUP BY ROLLUP (V_CODE, P_CODE)
ORDER BY V_CODE, P_CODE;

The CUBE Extension:
The CUBE extension is also used with the GROUP BY clause, but it generates aggregates for all possible combinations of the listed columns. The syntax of GROUP BY CUBE is as follows:

SELECT column_list, aggregate_function(expression)
FROM table_list
[WHERE condition]
GROUP BY CUBE (column1, column2 [, ...])
[HAVING condition]
[ORDER BY column1 [, column2, ...]]

Example:

SELECT MONTH, P_CODE, SUM(Units * Price) AS "Total Sales"
FROM SALES
GROUP BY CUBE (MONTH, P_CODE)
ORDER BY MONTH, P_CODE;
***

Data as a corporate asset
Data is a valuable asset that requires careful management, and it can be translated into information. An organization is subject to a data-information-decision cycle: the data user applies intelligence to data to produce information, and information is the basis of the knowledge used in decision making. To manage data as a corporate asset, managers must understand the value of information, that is, of processed data.

Figure: Data-Information-Decision Cycle
***

The need for and role of a database in an organization
In every organization, the database's main role is to support managerial decision making at all levels while maintaining data privacy and security.
An organization's managerial structure might be divided into three levels:
 Top level management
 Middle level management
 Operational management
Top level management makes strategic decisions, middle level management makes tactical decisions, and operational management makes daily operational decisions.

At the top management level, the database must be able to do the following:
1. Provide the information for strategic decision making, planning, and goal setting.
2. Provide access to external and internal data.
3. Provide a framework for defining and enforcing organizational policies.
4. Provide feedback to monitor whether the company is achieving its goals.

At the middle management level, the database must be able to do the following:
1) Deliver the data for tactical decisions and planning.
2) Provide a framework for enforcing and ensuring data security and privacy. Security means protecting the data against unauthorized users; privacy deals with the rights of individuals.
At the operational management level, the database must do the following:
1) Represent and support the company operations as closely as possible.
2) Produce query results within specified performance levels.
3) Enhance the company's short-term operational ability.
***

Database Security
Protecting the data in the database is a function of authorization management, which defines procedures to protect database security and integrity. Those procedures include:
 User access management
 View definition
 DBMS access control
 DBMS usage monitoring

1. User access management:
This function is designed to limit access to the database. It includes the following procedures:
 Defining each user to the database: This is achieved at the operating system level and at the DBMS level. At the operating system level, the DBA creates a user ID to log on to the computer system. At the DBMS level, the DBA creates a user ID to access the DBMS.
 Assigning passwords to each user: This can also be done at both the operating system and DBMS levels. Database passwords can be assigned with predetermined expiration dates.
 Assigning access privileges: The DBA assigns access privileges to users to access specified databases. Access privileges in a relational database are managed through the SQL GRANT and REVOKE commands.
 Physical security: It can prevent unauthorized users from directly accessing the DBMS and its resources. Common physical security practices are electronic personnel badges, closed-circuit video, biometric technology, etc.

2. View definition:
The DBA must define data views to protect the physical tables. A view is a logical representation of one or more tables; views are also called virtual tables. The main advantage of a view is that it does not occupy storage space, because a view has no data of its own.
The SQL command CREATE VIEW is used in relational databases to define views. Example: to create a view for clerks, the following command is used:
SQL> create view emp_clerk as select * from emp where job='CLERK';

3. DBMS access control:
Database access can be controlled by placing limits on the use of DBMS query and reporting tools.

4. DBMS usage monitoring:
The DBA must audit the use of the data in the database. For this, an audit log is used. An audit log automatically records a brief description of the database operations performed by all users.
***

Database Administration Tools
Database administration tools are:
1) Data Dictionary
2) CASE Tools

1. Data Dictionary:
The data dictionary is a DBMS component which stores the metadata, that is, data about the data.
The two main types of data dictionaries are a) integrated and b) standalone. An integrated data dictionary is included with the DBMS. Older DBMSs do not have a built-in data dictionary, so the DBA may use a standalone data dictionary.
Data dictionaries are also classified as active or passive. An active data dictionary is automatically updated by the DBMS with every database access. A passive data dictionary is not updated automatically; it requires a batch process to be run.
The data dictionary stores the following information about data elements:
1. It contains the data element names, data types, display format, internal storage format, and validation rules.
2. The data dictionary stores the name of the table creator, the date of creation, and the number of columns.
3. The data dictionary stores information about indexes.
4. The data dictionary also stores the relationships among data elements.
The DBA can use the data dictionary to support data analysis and design, so the data dictionary can be used to support a wide range of data administration activities.

2. CASE Tools:
CASE stands for Computer-Aided Systems Engineering. A CASE tool provides an automated framework for the systems development life cycle (SDLC). CASE tools play an important role in information systems development.
 CASE tools use structured methodologies and powerful graphical interfaces.
 Front-end CASE tools provide support for the planning, analysis, and design phases.
 Back-end CASE tools provide support for the coding and implementation phases.
 The CASE data dictionary stores data flow diagrams (DFDs), structure charts, descriptions of all external and internal entities, data stores, data items, report formats, and screen formats.
A CASE tool provides five components:
 Graphics.
 Screen painters and report generators.
 An integrated repository.
 An analysis segment.
 A program documentation generator.
Database and application designers use CASE tools to store the description of the database schema, data elements, and so on.
****

The DBA's Technical Roles
The DBA's technical roles include the following:
1. Evaluating, selecting, and installing the DBMS and utilities.
2. Designing and implementing databases and applications.
3. Testing and evaluating databases and applications.
4. Operating the DBMS, utilities, and applications.
5. Training and supporting users.
6. Maintaining the DBMS, utilities, and applications.

1. Evaluating, selecting and installing the DBMS and Utilities:
The DBA's first and most important technical responsibility is selecting the database management system, utility software, and supporting hardware to be used in the organization. The DBA must develop a plan for evaluating and selecting the DBMS based on the organization's needs.

2. Designing and Implementing Databases and Applications:
The DBA function also provides data modeling and design services to end users. The primary activities of a DBA are to determine and enforce the standards and procedures to be used. This role also involves physical design, including storage space determination and creation, data loading, and conversion.

3. Testing and Evaluating Databases and Applications:
The DBA must also provide testing and evaluation services for all database and end-user applications. Testing usually starts with the loading of the testbed database, which contains test data for the applications. The testbed database's purpose is to check the data definition and integrity rules of the database and application programs.

4. Operating the DBMS, Utilities and Applications:
DBMS operations can be divided into four main areas:
a. System support
b. Performance monitoring and tuning
c. Backup and recovery
d. Security auditing and monitoring
5. Training and supporting users:
Another of the DBA's technical activities is training people in how to use the DBMS and its tools. The DBA also provides technical training for application programmers in how to use the DBMS and its utilities.

6. Maintaining the DBMS, Utilities and Applications:
DBMS maintenance includes management of the physical or secondary storage devices. One of the most common maintenance activities is reorganizing the physical location of data in the database. Maintenance activities also include upgrading the DBMS and utility software.
*****