
DBMS

CO1
1. Main Characteristics of the Database Approach
a. Allowing a set of concurrent users to retrieve from and to update the
database.
b. Concurrency control within the DBMS guarantees that each transaction is
correctly executed or aborted
c. Recovery subsystem ensures each completed transaction has its effect
permanently recorded in the database
d. OLTP (Online Transaction Processing) is a major part of database applications.
This allows hundreds of concurrent transactions to execute per second.
2. Advantages of DBMS
Controlling Redundancy: Duplication of data is largely, though not completely, avoided.
Avoiding Inconsistency: When the same data is duplicated and changes made at
one site are not propagated to the other site, inconsistency arises and the two
entries for the same data will not agree.
Providing Storage Structures (e.g. indexes) for efficient Query Processing
Data can be shared.
Data Independence.
Concurrent access.
Providing Backup and Recovery from crashes.
Restricting unauthorized access
Serving enterprise-wide requirements rather than individual requirements.
Providing multiple interfaces to different classes of users.

Integrity can be enforced: Integrity of data means that data in the database is always
accurate, so that incorrect information cannot be stored in the database.
Drawing inferences and actions from the stored data using deductive and active rules

3. Disadvantages of DBMS
 High Cost - hardware/software, staff training, data conversion, maintenance
 Higher impact of a failure.
 Complexity - extremely complex software requiring specialist knowledge.
 Frequent updates.
 Performance - for small-scale firms, a DBMS may not give good
performance.
4. When not to use a DBMS
 If the database and applications are simple, well defined, and not expected
to change.
 If access to data by multiple users is not required.
5. Different types of Database Users

1. Database Administrator (DBA) :


The Database Administrator (DBA) is a person or team who defines the schema and also
controls the 3 levels of the database.
The DBA creates a new account id and password for a user who needs to
access the database.
The DBA is also responsible for securing the database and allows only
authorized users to access or modify it.
 The DBA also monitors backup and recovery and provides technical support.
 The DBA has a DBA account in the DBMS, which is called a system or
superuser account.
 The DBA repairs damage caused by hardware and/or software failures.

2. Naive / Parametric End Users :


Parametric End Users are unsophisticated users who don't have any DBMS knowledge
but frequently use database applications in their daily life to get the desired
results.
For example, railway ticket-booking users are naive users. Clerks in any bank are
naive users because they don't have any DBMS knowledge but still use the
database and perform their given tasks.

3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They
check whether all the requirements of end users are satisfied.

4. Sophisticated Users :
Sophisticated users can be engineers, scientists, or business analysts who are familiar
with the database. They can develop their own database applications according to their
requirements. They don't write program code, but they interact with the database by
writing SQL queries directly through the query processor.

5. Data Base Designers :


Database Designers are the users who design the structure of the database, which
includes tables, indexes, views, constraints, triggers, and stored procedures. They
control what data must be stored and how the data items are to be related.

6. Application Programmers :
Application Programmers are the back-end programmers who write the code for
application programs. They are computer professionals. These programs could be
written in programming languages such as Visual Basic, Developer, C, FORTRAN,
COBOL etc.

7. Casual Users / Temporary Users :


Casual Users are users who access the database only occasionally, but each time
they access it they require different information; for example, middle
or higher level managers.

6. Data Abstraction in DBMS

Database systems are made up of complex data structures. To ease user interaction with the
database, the developers hide internal, irrelevant details from users. This process of hiding irrelevant
details from the user is called data abstraction.

We have three levels of abstraction:


Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in
database. You can get the complex data structure details at this level.

Logical level: This is the middle level of 3-level data abstraction architecture. It describes what data
is stored in database.

View level: Highest level of data abstraction. This level describes the user interaction with database
system.

Example: Let’s say we are storing customer information in a customer table. At the physical level these
records are described as blocks of storage allocated on disk. These
details are often hidden from the programmers.

At the logical level these records can be described as fields and attributes along with their data
types, their relationship among each other can be logically implemented. The programmers generally
work at this level because they are aware of such things about database systems.

At the view level, users just interact with the system with the help of a GUI and enter details on the
screen; they are not aware of how the data is stored or what data is stored. Such details are hidden from
them.
7. Data Model

A Data Model is defined as an abstract model that organizes data description, data semantics, and consistency
constraints of data. The data model emphasizes what data is needed and how it should be organized instead
of what operations will be performed on the data. A data model is like an architect's building plan, which helps to
build conceptual models and set relationships between data items.

The two types of Data Modeling Techniques are


1. Entity Relationship (E-R) Model


2. UML (Unified Modelling Language)

Types of Data Models

There are mainly three different types of data models: conceptual data models, logical
data models, and physical data models, and each one has a specific purpose. The data models are used to
represent the data and how it is stored in the database and to set the relationship between data items.

1. Conceptual Data Model: This Data Model defines WHAT the system contains. This model is
typically created by Business stakeholders and Data Architects. The purpose is to organize, scope and
define business concepts and rules.
2. Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This
model is typically created by Data Architects and Business Analysts. The purpose is to develop a
technical map of rules and data structures.
3. Physical Data Model: This Data Model describes HOW the system will be implemented using a
specific DBMS system. This model is typically created by DBA and developers. The purpose is actual
implementation of the database.

Conceptual Data Model

The 3 basic tenets of the Conceptual Data Model are

 Entity: A real-world thing


 Attribute: Characteristics or properties of an entity
 Relationship: Dependency or association between two entities

Characteristics of a conceptual data model


 Offers Organisation-wide coverage of the business concepts.
 This type of data model is designed and developed for a business audience.
 The conceptual model is developed independently of hardware specifications like data storage capacity,
location or software specifications like DBMS vendor and technology. The focus is to represent data as
a user will see it in the "real world."

Logical Data Model

At this data modeling level, no primary or secondary key is defined, and you need to
verify and adjust the connector details that were set earlier for relationships.

Characteristics of a Logical data model

 Describes data needs for a single project but could integrate with other logical data models based on the
scope of the project.
 Designed and developed independently from the DBMS.
 Data attributes will have datatypes with exact precisions and lengths.
 Normalization is typically applied to the model up to 3NF.

Physical Data Model


Characteristics of a physical data model:

 The physical data model describes data needs for a single project or application, though it may be
integrated with other physical data models based on project scope.
 The data model contains relationships between tables that address cardinality and nullability of the
relationships.
 Developed for a specific version of a DBMS, location, data storage or technology to be used in the
project.
 Columns should have exact datatypes, lengths assigned and default values.
 Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.

8. What is Data Independence of DBMS?

Data Independence is defined as a property of a DBMS that lets you change the database schema at one level
of a database system without having to change the schema at the next higher level. Data independence helps
you to keep data separated from all programs that make use of it.

You can use this stored data for computing and presentation. In many systems, data independence is an essential
function for components of the system.

 Physical Data Independence

Physical data independence helps you to separate conceptual levels from the internal/physical levels. It allows
you to provide a logical description of the database without the need to specify physical structures. Compared to
Logical Independence, it is easy to achieve physical data independence.

With physical independence, you can easily change the physical storage structures or devices without affecting
the conceptual schema. Any change made is absorbed by the mapping between the conceptual and
internal levels. Physical data independence is achieved by the presence of the internal level of the database and
the transformation from the conceptual level of the database to the internal level.
 Logical Data Independence

Logical Data Independence is the ability to change the conceptual scheme without changing

1. External views
2. External API or programs

Any change made will be absorbed by the mapping between external and conceptual levels.

When compared to Physical Data independence, it is challenging to achieve logical data independence

9. Database Language

o A DBMS has appropriate languages and interfaces to express database queries and updates.

o Database languages can be used to read, store and update the data in the database.

Types of Database Language

 1. Data Definition Language

o DDL stands for Data Definition Language. It is used to define database structure or pattern.

o It is used to create schema, tables, indexes, constraints, etc. in the database.

o Using the DDL statements, you can create the skeleton of the database.

o Data definition language is used to store the information of metadata like the number of
tables and schemas, their names, indexes, columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.

o Alter: It is used to alter the structure of the database.

o Drop: It is used to delete objects from the database.

o Truncate: It is used to remove all records from a table.

o Rename: It is used to rename an object.


o Comment: It is used to add comments to the data dictionary.

These commands are used to update the database schema, which is why they come under Data
Definition Language.
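The DDL commands above can be sketched with Python's built-in sqlite3 module (the student/pupil table and its columns are hypothetical example names; SQLite expresses RENAME through ALTER TABLE ... RENAME TO):

```python
import sqlite3

# In-memory database for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new schema object (a table).
cur.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")

# ALTER: change the structure of an existing object.
cur.execute("ALTER TABLE student ADD COLUMN age INTEGER")

# RENAME: in SQLite, renaming is done via ALTER TABLE ... RENAME TO.
cur.execute("ALTER TABLE student RENAME TO pupil")

# DROP: delete the object from the database.
cur.execute("DROP TABLE pupil")

# The catalog (metadata) now lists no tables.
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```

Note how each DDL statement changes the schema (the metadata), not the stored rows.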

 2. Data Manipulation Language

DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a


database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.

o Insert: It is used to insert data into a table.

o Update: It is used to update existing data within a table.

o Delete: It is used to delete records from a table.

o Merge: It performs UPSERT operation, i.e., insert or update operations.

o Call: It is used to call a PL/SQL or Java subprogram.

o Explain Plan: It describes the access path the database will use to execute a query.

o Lock Table: It controls concurrency.
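The four core DML statements can be demonstrated with Python's sqlite3 module (the student table and its data are hypothetical examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, marks INTEGER)")

# INSERT: add new rows to the table.
cur.execute("INSERT INTO student VALUES (1, 'Asha', 80)")
cur.execute("INSERT INTO student VALUES (2, 'Ravi', 65)")

# UPDATE: modify existing data that matches the WHERE clause.
cur.execute("UPDATE student SET marks = 70 WHERE roll_no = 2")

# DELETE: remove the rows that match the WHERE clause.
cur.execute("DELETE FROM student WHERE roll_no = 1")

# SELECT: retrieve the remaining data.
rows = cur.execute("SELECT name, marks FROM student").fetchall()
print(rows)  # [('Ravi', 70)]
```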

 3. Data Control Language

o DCL stands for Data Control Language. It is used to control access to the stored data,
using commands such as GRANT and REVOKE.

o The DCL execution is transactional and supports rollback.

(But in the Oracle database, the execution of Data Control Language does not have the
feature of rolling back.)

 4.Transaction Control Language

TCL stands for Transaction Control Language. It is used to manage the changes made by DML
statements; DML statements can be grouped together into a logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.

o Rollback: It is used to restore the database to its original state since the last Commit.
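COMMIT and ROLLBACK can be sketched with Python's sqlite3 module (the account table is a hypothetical example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER, balance INTEGER)")
cur.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()    # COMMIT: the insert is now permanent

cur.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()  # ROLLBACK: undo everything since the last COMMIT

balance = cur.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 100 -- the uncommitted update was discarded
```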

10. Interfaces in DBMS


A database management system (DBMS) interface is a user interface that allows users to input
queries to a database without using the query language itself.
User-friendly interfaces provided by a DBMS may include the following:
1. Menu-Based Interfaces for Web Clients or Browsing –
These interfaces present the user with lists of options (called menus) that lead the user through the
formation of a request. The basic advantage of using menus is that they remove the burden of
remembering specific commands and syntax of a query language; instead, the query is
composed step by step by picking options from a menu shown by the
system. Pull-down menus are a very popular technique in Web-based interfaces. They are also
often used in browsing interfaces, which allow a user to look through the contents of a database in
an exploratory and unstructured manner.
2. Forms-Based Interfaces –
A forms-based interface displays a form to each user. Users can fill out all of the form entries to
insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve
matching data for the remaining entries. Such forms are usually designed
and programmed for users who have no expertise in the DBMS. Many DBMSs
have forms specification languages, special languages that help specify such forms.
Example: SQL*Forms is a form-based language that specifies queries using a form designed in
conjunction with the relational database schema.
3. Graphical User Interface –
A GUI typically displays a schema to the user in diagrammatic form. The user can then specify a
query by manipulating the diagram. In many cases, GUIs utilize both menus and forms. Most
GUIs use a pointing device, such as a mouse, to pick a certain part of the displayed schema diagram.

4. Natural language Interfaces –


These interfaces accept requests written in English or some other language and attempt to
understand them. A natural language interface has its own schema, which is similar to the
database conceptual schema, as well as a dictionary of important words.
The natural language interface refers to the words in its schema as well as to the set of standard
words in its dictionary to interpret the request. If the interpretation is successful, the interface
generates a high-level query corresponding to the natural language request and submits it to the DBMS for
processing; otherwise a dialogue is started with the user to clarify the provided condition or
request. The main disadvantage is that the capabilities of this type of interface are not
very advanced.
5. Speech Input and Output –
There is limited use of speech, whether as a query, an answer to a question, or the result of a
request, but it is becoming commonplace. Applications with limited vocabularies, such as inquiries for
telephone directories, flight arrival/departure, and bank account information, allow speech for
input and output to enable ordinary users to access this information.
Speech input is matched against predefined words and used to set up the parameters that are
supplied to the queries. For output, a similar conversion from text or numbers into speech takes
place.
6. Interfaces for DBA –
Most database systems contain privileged commands that can be used only by the DBA's staff.
These include commands for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structures of a database.

11. What is Entity in Database?

An entity is a real-world object, either animate or inanimate, that can be easily identified. For
example, in a school database, students, teachers, classes, and courses offered can be considered
entities. All these entities have attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose
attributes share similar values. For example, a Students set may contain all the students of a school; likewise,
a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be
disjoint.
Entity-Set and Keys
Key is an attribute or collection of attributes that uniquely identifies an entity among entity set.
For example, the roll_number of a student makes him/her identifiable among students.
 Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity
set.
 Candidate Key − A minimal super key is called a candidate key. An entity set may have
more than one candidate key.
 Primary Key − A primary key is one of the candidate keys chosen by the database designer
to uniquely identify the entity set.
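The key constraint described above is enforced by the DBMS once a primary key is declared. A minimal sketch using Python's sqlite3 module (roll_number and the sample students are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# roll_number is chosen as the primary key, so the DBMS enforces uniqueness.
cur.execute("CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (101, 'Asha')")

try:
    # A second student with the same roll_number violates the key constraint.
    cur.execute("INSERT INTO student VALUES (101, 'Ravi')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False
```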

12.What is Attribute in DBMS?

Entities are represented by means of their properties, called attributes. All attributes have values.
For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's
name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Types of Attributes
 Simple attribute − Simple attributes are atomic values, which cannot be divided further. For
example, a student's phone number is an atomic value of 10 digits.
 Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.
 Derived attribute − Derived attributes are attributes that do not exist in the physical
database; their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database;
instead it can be derived. As another example, age can be derived from date_of_birth.
 Single-value attribute − Single-value attributes contain single value. For example −
Social_Security_Number.
 Multi-value attribute − Multi-value attributes may contain more than one value. For
example, a person can have more than one phone number, email_address, etc.
These attribute types can come together in a way like −

 simple single-valued attributes


 simple multi-valued attributes
 composite single-valued attributes
 composite multi-valued attributes
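The derived-attribute idea above (age computed from date_of_birth rather than stored) can be sketched in Python; the function name is a hypothetical helper:

```python
from datetime import date

def age_from_dob(date_of_birth, on_date):
    """Derive age from date_of_birth instead of storing it as a column."""
    years = on_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

age = age_from_dob(date(2000, 6, 15), on_date=date(2024, 3, 1))
print(age)  # 23
```

Storing age directly would force an update every day; deriving it keeps the database consistent automatically.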

13. Degree of Relationship


The degree of a relationship is the number of entity types that participate (are associated) in the
relationship. By looking at an E-R diagram, we can simply tell the degree of a relationship: the
number of entity types connected to a relationship is the degree of that
relationship.

For example, if we have two entity types ‘Customer’ and ‘Account’ linked using a
primary key and foreign key, we can say that the degree of the relationship is 2 because two
entity types are taking part in the relationship.

Based on the number of entity types that are connected we have the following degree of
relationships:

 Unary
 Binary
 Ternary
 N-ary

Unary (degree 1)
A unary relationship exists when both the participating entity types are the same. When such a
relationship is present, we say that the degree of the relationship is 1.

For example, suppose in a classroom we have many students who belong to a particular club, like a
dance club or basketball club, and some of them are club leads. A particular group of students is
managed by their respective club lead. Here, the group is formed from students and the club
leads are also chosen from students. So ‘Student’ is the only entity type participating here. We can
represent this relationship using the E-R diagram as follows:

So, yes: there can be only one entity type in a relationship, and the minimum degree of a
relationship is one.
Binary (degree 2)
A binary relationship exists when exactly two entity types participate. When such a relationship is
present, we say that the degree is 2. This is the most common degree of relationship. It is easy to
deal with such relationships as they can be easily converted into relational tables.

For example, we have two entity types ‘Customer’ and ‘Account’, where each ‘Customer’ has an
‘Account’ which stores the account details of the ‘Customer’. Since we have two entity types
participating, we call it a binary relationship. Also, one ‘Customer’ can have many ‘Account’ entities, but each
‘Account’ should belong to only one ‘Customer’. We can say that it is a one-to-many binary
relationship.

Ternary(degree 3)
A ternary relationship exists when exactly three entity types participate. When such a relationship is
present, we say that the degree is 3. As the number of entity types increases in the relationship, it becomes
more complex to convert them into relational tables.

For example, we have three entity types ‘Employee’, ‘Department’ and ‘Location’. The relationships
between these entities are defined as: an employee works in a department, and an employee works at a
particular location. We can see that three entity types participate in the relationship, so it is a
ternary relationship. The degree of this relationship is 3.

N-ary (n degree)
An N-ary relationship exists when ‘n’ entity types participate, so any number of
entity types can take part in a relationship; there is no limit on the maximum number of entity types
that can participate. But relationships with a higher degree are not common, because the
conversion of higher-degree relationships to relational tables gets complex. We build an E-R
model because it can easily be converted into another model for implementing the database,
and this benefit is lost with higher-degree relationships. So binary relationships are more
popular and widely used. Though a relationship can involve any number of entity types,
in practice we rarely go beyond binary.
We represent an N-ary relationship as follows:

In the above example, E1 denotes the first entity type, E2 denotes the second entity type, and so on.
R represents the relationship. Here we have a total of 5 entity types participating in the
relationship; therefore, the degree of the above n-ary relationship is 5.

14. Types of Cardinality Ratios-


 
There are 4 types of cardinality ratios-

1. Many-to-Many cardinality (m:n)


2. Many-to-One cardinality (m:1)
3. One-to-Many cardinality (1:n)
4. One-to-One cardinality (1:1 )
  

1. Many-to-Many Cardinality-
 

By this cardinality constraint,

 An entity in set A can be associated with any number (zero or more) of entities in set B.
 An entity in set B can be associated with any number (zero or more) of entities in set A.
 

Symbol Used-
 
 

Example-
 

Consider the following ER diagram-

Here,

 One student can enroll in any number (zero or more) of courses.


 One course can be enrolled by any number (zero or more) of students.
 

2. Many-to-One Cardinality-
 

By this cardinality constraint,

 An entity in set A can be associated with at most one entity in set B.


 An entity in set B can be associated with any number (zero or more) of entities in set A.
 

Symbol Used-
 

Example-
 
Consider the following ER diagram-

Here,

 One student can enroll in at most one course.


 One course can be enrolled by any number (zero or more) of students.
 

3. One-to-Many Cardinality-
 

By this cardinality constraint,

 An entity in set A can be associated with any number (zero or more) of entities in set B.
 An entity in set B can be associated with at most one entity in set A.
 

Symbol Used-
 

Example-
 

Consider the following ER diagram-

Here,

 One student can enroll in any number (zero or more) of courses.


 One course can be enrolled by at most one student.
 

4. One-to-One Cardinality-
 

By this cardinality constraint,

 An entity in set A can be associated with at most one entity in set B.


 An entity in set B can be associated with at most one entity in set A.
 

Symbol Used-
 

Example-
 

Consider the following ER diagram-

Here,

 One student can enroll in at most one course.


 One course can be enrolled by at most one student.
 
15. Participation Constraint
In a relationship, a participation constraint specifies the existence of an entity when it is related to
another entity in a relationship type. It is also called the minimum cardinality constraint.
This constraint specifies the minimum number of relationship instances in which each entity must
participate.
There are two types of Participation constraint −
Total Participation
Each entity in the entity set is involved in at least one relationship in the relationship set, i.e. the number
of relationships in which every entity is involved is greater than 0.

Consider two entities Employee and Department related via the Works_For relationship. Every
Employee works in at least one department, so an Employee entity exists only if it has at least one
Works_For relationship with a Department entity. Thus the participation of Employee in Works_For is
total.

Total Participation is represented by double line in ER diagram.


Partial Participation
Each entity in the entity set may or may not participate in a relationship in the relationship set.
For example, consider two entities Employee and Department related to each other via the
Manages relationship. An Employee may manage a Department; he or she could be the head of the
department. But not every Employee in the company manages a department. So the participation of
Employee in the Manages relationship type is partial, i.e. only a particular set of Employees
manage a Department, not all.

16. What is Relational Model?

Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a table of
values. Every row in the table represents a collection of related data values. These rows in the table denote a
real-world entity or relationship.

The table name and column names are helpful to interpret the meaning of values in each row. The data are
represented as a set of relations. In the relational model, data are stored as tables. However, the physical storage
of the data is independent of the way the data are logically organized.

Relational Model Concepts

1. Attribute: Each column in a Table. Attributes are the properties which define a relation. e.g.,
Student_Rollno, NAME,etc.
2. Tables – In the relational model, relations are saved in table format, stored along with their
entities. A table has two properties: rows and columns. Rows represent records and columns represent
attributes.
3. Tuple – It is nothing but a single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: Total number of rows present in the Table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance – Relation instance is a finite set of tuples in the RDBMS system. Relation instances
never have duplicate tuples.
9. Relation key - An attribute, or set of attributes, that can uniquely identify a row is called a relation key.
10. Attribute domain – Every attribute has some pre-defined value and scope which is known as attribute
domain
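Degree and cardinality, as defined above, can be read directly off a query result. A minimal sketch using Python's sqlite3 module (the student table and its rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, age INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, 'Asha', 20), (2, 'Ravi', 21)])

cur.execute("SELECT * FROM student")
degree = len(cur.description)        # number of attributes (columns): 3
cardinality = len(cur.fetchall())    # number of tuples (rows): 2
print(degree, cardinality)  # 3 2
```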

17. Relational Integrity Constraints

Relational integrity constraints in a DBMS refer to conditions which must hold for a valid relation.
These relational constraints are derived from the rules in the mini-world that the database represents.

There are many types of integrity constraints in a DBMS. Constraints on a relational database
management system are mostly divided into three main categories:

1. Domain Constraints
2. Key Constraints
3. Referential Integrity Constraints

Domain Constraints

Domain constraints are violated if an attribute value does not appear in the corresponding domain or is not
of the appropriate data type.

Domain constraints specify that, within each tuple, the value of each attribute must belong to that
attribute's domain. Domains are specified as data types, which include the standard data types: integers,
real numbers, characters, Booleans, variable-length strings, etc.
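A domain constraint can be expressed in SQL with a column type plus a CHECK clause. A minimal sketch using Python's sqlite3 module (the student table and the age >= 0 rule are hypothetical examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The CHECK clause restricts age to its domain (non-negative integers).
cur.execute("CREATE TABLE student (roll_no INTEGER, age INTEGER CHECK (age >= 0))")
cur.execute("INSERT INTO student VALUES (1, 20)")      # inside the domain

try:
    cur.execute("INSERT INTO student VALUES (2, -5)")  # outside the domain
    out_of_domain_allowed = True
except sqlite3.IntegrityError:
    out_of_domain_allowed = False

print(out_of_domain_allowed)  # False
```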

Key Constraints

An attribute that can uniquely identify a tuple in a relation is called the key of the table. The value of the
attribute for different tuples in the relation has to be unique.

Referential Integrity Constraints

Referential integrity constraints in a DBMS are based on the concept of foreign keys. A foreign key is an
attribute of a relation that refers to a key attribute of a different (or the same) relation. The referential
integrity constraint states that such a reference is valid only if the referenced key value actually exists in
the referenced table.
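Foreign-key enforcement can be demonstrated with Python's sqlite3 module (the department/employee tables are hypothetical; note SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
cur = conn.cursor()
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employee (
                   emp_id  INTEGER PRIMARY KEY,
                   dept_id INTEGER REFERENCES department(dept_id))""")
cur.execute("INSERT INTO department VALUES (10, 'Sales')")
cur.execute("INSERT INTO employee VALUES (1, 10)")      # dept 10 exists: accepted

try:
    cur.execute("INSERT INTO employee VALUES (2, 99)")  # dept 99 does not exist
    violation_allowed = True
except sqlite3.IntegrityError:
    violation_allowed = False

print(violation_allowed)  # False
```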
CO2
1.DML
DML is the short name for Data Manipulation Language, which deals with data
manipulation and includes the most common SQL statements such as SELECT, INSERT,
UPDATE, DELETE, etc. It is used to store, modify, retrieve, delete and update
data in a database.

 SELECT - retrieve data from a database


 INSERT - insert data into a table
 UPDATE - updates existing data within a table
 DELETE - delete records from a database table
 MERGE - UPSERT operation (insert or update)
 CALL - call a PL/SQL or Java subprogram
 EXPLAIN PLAN - interpretation of the data access path
 LOCK TABLE - concurrency Control

2. SQL JOIN statement


 The SQL Joins clause is used to combine records from two or more tables.
 A JOIN is a means for combining fields from two tables by using values common to
each.
Types of JOIN operator
 Cartesian product or Cross Join
 Inner Join
 Outer Join
 Left-outer Join or Left Join
 Right-outer Join or Right Join
 Full Join
 Self Join
CARTESIAN JOIN or CROSS JOIN
 The CARTESIAN JOIN or CROSS JOIN returns the Cartesian product of the sets of
records from the two or more joined tables. 
 Syntax of CARTESIAN JOIN or CROSS JOIN is as follows:
SELECT table1.column1, table2.column2...
FROM table1, table2 ;
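The Cartesian product can be demonstrated through Python's sqlite3 module (t1 and t2 are hypothetical tables): with 2 rows in one table and 3 in the other, the cross join yields 2 × 3 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (a TEXT)")
cur.execute("CREATE TABLE t2 (b INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [('x',), ('y',)])
cur.executemany("INSERT INTO t2 VALUES (?)", [(1,), (2,), (3,)])

# Cartesian product: every row of t1 paired with every row of t2.
rows = cur.execute("SELECT t1.a, t2.b FROM t1, t2").fetchall()
print(len(rows))  # 6
```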
Inner Join

 An Inner Join (called an Equi Join when the predicate uses equality) returns rows when there is a
match in both tables.
 The INNER JOIN creates a new result table by combining column values of two tables
(table1 and table2) based upon the join predicate. The query compares each row of
table1 with each row of table2 to find all pairs of rows which satisfy the join
predicate. When the join predicate is satisfied, column values for each matched pair
of rows of table1 and table2 are combined into a result row.
 Syntax of INNER JOIN:
 SELECT table1.column1, table2.column2...
FROM table1 INNER JOIN table2
ON table1.common_field = table2.common_field;
Left Outer Join
 Left-Outer Join returns all rows from the left table, even if there are no matches in
the right table.
 This means that if the ON clause matches 0 (zero) records in right table, the join will
still return a row in the result, but with NULL in each column from right table.
 This means that a left join returns all the values from the left table, plus matched
values from the right table or NULL in case of no matching join predicate.
 Syntax of Left Outer Join or left join
 SELECT table1.column1, table2.column2...
FROM table1 LEFT OUTER JOIN table2
ON table1.common_field = table2.common_field;
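A matching sketch of LEFT OUTER JOIN with sqlite3 (invented tables): the unmatched left-table row survives with NULL, which Python surfaces as None:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (oid INTEGER PRIMARY KEY, cust_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
INSERT INTO orders VALUES (10, 1, 250), (11, 2, 300);
""")

# Meena has no orders, yet she still appears -- with NULL (None) for amount
rows = cur.execute("""
    SELECT customers.name, orders.amount
    FROM customers LEFT OUTER JOIN orders
    ON customers.id = orders.cust_id
    ORDER BY customers.id
""").fetchall()
print(rows)  # [('Asha', 250), ('Ravi', 300), ('Meena', None)]
```
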
Right Outer Join
 RIGHT JOIN  or RIGHT OUTER JOIN returns all rows from the right table, even if there
are no matches in the left table.
 This means that if the ON clause matches 0 (zero) records in left table, the join will
still return a row in the result, but with NULL in each column from left table.
 This means that a right join returns all the values from the right table, plus matched
values from the left table or NULL in case of no matching join predicate.
 Syntax of Right Outer Join or Right join
 SELECT table1.column1, table2.column2...
FROM table1 RIGHT OUTER JOIN table2
ON table1.common_field = table2.common_field;
Full Join
 FULL JOIN combines the results of both left and right outer joins.
 The joined table will contain all records from both tables, and fill in NULLs for missing
matches on either side.
 Syntax of Full join
 SELECT table1.column1, table2.column2...
FROM table1 FULL JOIN table2
ON table1.common_field = table2.common_field;

3.Null Values
The SQL NULL is the term used to represent a missing value. A NULL value in a
table is a value in a field that appears to be blank.
A field with a NULL value is a field with no value. It is very important to understand
that a NULL value is different than a zero value or a field that contains spaces.

Syntax
The basic syntax of NULL while creating a table.
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
Here, NOT NULL signifies that the column must always be given an explicit value of the
given data type. There are two columns where we did not use NOT NULL, which
means those columns may be NULL.
A field with a NULL value is the one that has been left blank during the record
creation.
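A sketch of how NULL differs from zero, again using sqlite3 (the customers rows are invented): equality tests never match NULL, so IS NULL must be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)")
cur.execute("INSERT INTO customers VALUES (1, 'Asha', 50000), (2, 'Ravi', NULL), (3, 'Meena', 0)")

# NULL is not equal to anything, including 0 -- use IS NULL to find it
missing = cur.execute("SELECT name FROM customers WHERE salary IS NULL").fetchall()
zero    = cur.execute("SELECT name FROM customers WHERE salary = 0").fetchall()
print(missing)  # [('Ravi',)]   -- the NULL salary
print(zero)     # [('Meena',)]  -- the zero salary, a different thing entirely
```
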

4. Views in SQL
o Views in SQL are considered virtual tables. A view also contains rows and
columns.
o To create the view, we can select the fields from one or more tables present in the
database.
o A view can either have specific rows based on certain condition or all the rows
of a table.

1. Creating view

A view can be created using the CREATE VIEW statement. We can create a view from
a single table or multiple tables.
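A minimal sketch of creating and querying a view with sqlite3 (the emp table, the it_staff view, and all data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
INSERT INTO emp VALUES (1, 'Asha', 'HR', 50000), (2, 'Ravi', 'IT', 60000),
                       (3, 'Meena', 'IT', 55000);

-- CREATE VIEW names a stored SELECT; no data is copied
CREATE VIEW it_staff AS SELECT name, salary FROM emp WHERE dept = 'IT';
""")

# The view is queried exactly like a table
rows = cur.execute("SELECT * FROM it_staff ORDER BY name").fetchall()
print(rows)  # [('Meena', 55000), ('Ravi', 60000)]
```
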

2.Updating a view

A view can be updated under certain conditions:

 The SELECT clause may not contain the keyword DISTINCT.

 The SELECT clause may not contain aggregate functions.

 The SELECT clause may not contain set functions.

 The SELECT clause may not contain set operators.

 The SELECT clause may not contain an ORDER BY clause.

 The FROM clause may not contain multiple tables.

 The WHERE clause may not contain subqueries.

 The query may not contain GROUP BY or HAVING.

 Calculated columns may not be updated.

 All NOT NULL columns from the base table must be included in the view in
order for the INSERT query to function.

So if a view satisfies all the above-mentioned rules then you can update a view.

Advantages of views

 Security: Each user can be given permission to access the database only
through a small set of views that contain the specific data the user is
authorized to see, thus restricting the user's access to stored data
 Query Simplicity: A view can draw data from several different tables and
present it as a single table, turning multi-table queries into single-table
queries against the view.

 Structural simplicity: Views can give a user a "personalized" view of the
database structure, presenting the database as a set of virtual tables that
make sense for that user.

 Consistency: A view can present a consistent, unchanged image of the
structure of the database, even if the underlying source tables are split,
restructured, or renamed.

 Data Integrity: If data is accessed and entered through a view, the DBMS can
automatically check the data to ensure that it meets the specified integrity
constraints.

 Logical data independence: Views can make applications independent of the
underlying tables to a certain extent. Without views, an application must be
written directly against the base tables; with views, the program is written
against the views instead, separating it from the physical table structure.

Disadvantages of views

 Performance: Views create the appearance of a table, but the DBMS must still
translate queries against the view into queries against the underlying source
tables. If the view is defined by a complex, multi-table query then simple
queries on the views may take considerable time.

 Update restrictions: When a user tries to update rows of a view, the DBMS
must translate the request into an update on rows of the underlying base
tables. This is possible for simple views, but more complex views are often
restricted to read-only.

5. Subqueries or Nested Queries


A Subquery or Inner query or a Nested query is a query within another SQL query
and embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition
to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE
statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
 Subqueries must be enclosed within parentheses.
 A subquery can have only one column in the SELECT clause, unless multiple
columns are in the main query for the subquery to compare its selected
columns.
 An ORDER BY command cannot be used in a subquery, although the main
query can use an ORDER BY. The GROUP BY command can be used to
perform the same function as the ORDER BY in a subquery.
 Subqueries that return more than one row can only be used with multiple
value operators such as the IN operator.
 The SELECT list cannot include any references to values that evaluate to a
BLOB, ARRAY, CLOB, or NCLOB.
 A subquery cannot be immediately enclosed in a set function.
 The BETWEEN operator cannot be used with a subquery. However, the
BETWEEN operator can be used within the subquery.
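The rules above can be seen in action with sqlite3 (emp table and data invented): a single-value subquery feeds a comparison operator, while a multi-row subquery needs IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO emp VALUES (1, 'Asha', 50000), (2, 'Ravi', 60000), (3, 'Meena', 55000);
""")

# The inner query runs first; its single value feeds the outer WHERE clause
above_avg = cur.execute("""
    SELECT name FROM emp
    WHERE salary > (SELECT AVG(salary) FROM emp)
""").fetchall()
print(above_avg)  # [('Ravi',)]  -- average salary is 55000

# A subquery returning more than one row must be used with a
# multiple-value operator such as IN
names = cur.execute("""
    SELECT name FROM emp
    WHERE id IN (SELECT id FROM emp WHERE salary >= 55000)
    ORDER BY id
""").fetchall()
print(names)  # [('Ravi',), ('Meena',)]
```
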

6. Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to
obtain the result of the query. It uses operators to perform queries.

Types of Relational operation

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

1. Notation:  σ p(r)
Where:

σ is the selection operator,
r is a relation, and
p is a propositional logic formula which may use connectives such as AND, OR,
and NOT, and relational operators such as =, ≠, ≥, <, >, ≤.

2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the
result. Rest of the attributes are eliminated from the table.
o It is denoted by ∏.

1. Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Union Operation:
o Suppose there are two relations R and S. The union operation contains all the
tuples that are in R or S or in both R & S.
o It eliminates duplicate tuples. It is denoted by ∪.

1. Notation: R ∪ S   

A union operation must hold the following condition:

o R and S must have the same number of attributes (with compatible domains).


o Duplicate tuples are eliminated automatically.

Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains
all tuples that are in both R & S.
o It is denoted by ∩.

1. Notation: R ∩ S   
Set Difference:
o Suppose there are two relations R and S. The set difference operation contains
all tuples that are in R but not in S.
o It is denoted by minus (-).
1. Notation: R - S  
Cartesian product
o The Cartesian product is used to combine each row in one table with each row
in the other table. It is also known as a cross product.
o It is denoted by X.

1. Notation: E X D  
Rename Operation:

The rename operation is used to rename the output relation. It is denoted by rho (ρ).
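If a relation is modeled as a set of tuples, the operations above can be sketched directly with Python sets (the relation contents are invented; attribute positions stand in for attribute names):

```python
# Two toy relations, each a set of (id, tag) tuples
R = {(1, 'a'), (2, 'b'), (3, 'a')}
S = {(2, 'b'), (4, 'c')}

select     = {t for t in R if t[1] == 'a'}   # σ_{tag='a'}(R): filter tuples
project    = {(t[1],) for t in R}            # ∏_{tag}(R): keep one column, dedupe
union      = R | S                           # R ∪ S
intersect  = R & S                           # R ∩ S
difference = R - S                           # R − S
product    = {r + s for r in R for s in S}   # R × S: every pairing, concatenated

print(sorted(select))   # [(1, 'a'), (3, 'a')]
print(sorted(project))  # [('a',), ('b',)]
print(len(product))     # 6  (|R| * |S| = 3 * 2)
```

Because Python sets eliminate duplicates automatically, projection and union behave exactly as the algebra prescribes.
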

7.Query Tree Notation


 Query Tree

 An internal data structure to represent a query

 Standard technique for estimating the work involved in executing the
query, the generation of intermediate results, and the optimization of
execution

 Nodes stand for operations like selection, projection, join, renaming,
division, …

 Leaf nodes represent base relations

 A tree gives a good visual feel of the complexity of the query and
the operations involved

 Algebraic Query Optimization consists of rewriting the query or
modifying the query tree into an equivalent tree.

8.What is SQL?
 SQL stands for Structured Query Language
 SQL lets you access and manipulate databases
 SQL became a standard of the American National Standards
Institute (ANSI) in 1986, and of the International Organization for
Standardization (ISO) in 1987

What Can SQL do?

 SQL can execute queries against a database


 SQL can retrieve data from a database
 SQL can insert records in a database
 SQL can update records in a database
 SQL can delete records from a database
 SQL can create new databases
 SQL can create new tables in a database
 SQL can create stored procedures in a database
 SQL can create views in a database
 SQL can set permissions on tables, procedures, and views

9. RDBMS
RDBMS stands for Relational Database Management System.

RDBMS is the basis for SQL, and for all modern database systems such as MS
SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

The data in RDBMS is stored in database objects called tables. A table is a
collection of related data entries and it consists of columns and rows.

How it works

Data is represented in terms of tuples (rows) in RDBMS.

A relational database is the most commonly used kind of database. It contains a
number of tables, and each table has its own primary key.

Because the data is organized into a set of tables, it can be accessed easily in an RDBMS.

10.SQL Commands
o SQL commands are instructions. It is used to communicate with the database.
It is also used to perform specific tasks, functions, and queries of data.
o SQL can perform various tasks like create a table, add data to tables, drop the
table, modify the table, set permission for users.
a. CREATE It is used to create a new table in the database.

b. DROP: It is used to delete both the structure and record stored in the table.

c. ALTER: It is used to alter the structure of the database. This change could be either
to modify the characteristics of an existing attribute or probably to add a new
attribute.

d. TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.

e. INSERT: The INSERT statement is a SQL query used to insert rows of data into a
table.

f. UPDATE: This command is used to update or modify the value of a column in the table.

g. DELETE: It is used to remove one or more rows from a table.

CO3

1.PL/SQL
PL/SQL is tightly integrated with SQL.

• It offers extensive error checking.

• It offers the same data types as SQL.

• It offers a variety of programming structures

• It supports object-oriented programming.

• It supports the development of web applications and server pages.

• It is not case sensitive

DECLARE
<declarations section>

BEGIN

<executable command(s)>

EXCEPTION

<exception handling>

END;

Programming structures

• Conditional statements

• Loops

• Procedures

• Functions

• Cursors

• Triggers

2.Introduction to MySQL Cursor


• To handle a result set inside a stored procedure, you use a cursor.

• A cursor allows you to iterate a set of rows returned by a query and process each
row individually.

• In simple terms, a cursor is a mechanism by which you can assign a name to a
SELECT statement.

• MySQL cursor is:

➢ Read-only: Not updatable

➢ Non-scrollable: Can be traversed only in one direction and cannot skip rows

➢ Asensitive: The server may or may not make a copy of its result table
• You can use MySQL cursors in stored procedures, stored functions, and triggers.

3.MySQL Triggers
• A trigger is a stored program invoked automatically in response to an event that occurs in
the associated table.

• MySQL supports triggers that are invoked in response to the INSERT, UPDATE or DELETE
event.

• The SQL standard defines two types of triggers:

– row-level triggers

– statement-level triggers.

• MySQL supports only row-level triggers. It doesn't support statement-level
triggers.

There are 6 types of triggers that can be created:

➢ Before Insert

➢ After Insert

➢ Before Update

➢ After Update

➢ Before Delete

➢ After Delete
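The MySQL syntax cannot be demonstrated without a MySQL server, but SQLite (available through Python's stdlib) supports the same row-level BEFORE/AFTER trigger model; the emp/audit tables and the trigger name below are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE audit (msg TEXT);

-- An AFTER INSERT row-level trigger: fires once per inserted row,
-- with NEW.* giving the new row's column values
CREATE TRIGGER log_insert AFTER INSERT ON emp
FOR EACH ROW
BEGIN
  INSERT INTO audit VALUES ('inserted ' || NEW.name);
END;
""")

cur.execute("INSERT INTO emp VALUES (1, 'Asha')")
rows = cur.execute("SELECT msg FROM audit").fetchall()
print(rows)  # [('inserted Asha',)]
```
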

4.Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
o Normalization divides larger tables into smaller tables and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.

Types of Normal Forms

The main normal forms are:

Normal Form Description

1NF A relation is in 1NF if every attribute contains only atomic values.

2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key.

3NF A relation is in 3NF if it is in 2NF and no transitive dependency exists.

4NF A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.

5NF A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is
lossless.

5.Data Redundancy
Redundancy means having multiple copies of same data in the database.
This problem arises when a database is not normalized. Suppose a table of
student details attributes are: student Id, student name, college name,
college rank, course opted. 

1.Insertion Anomaly – 
If a student detail has to be inserted whose course is not being
decided yet then insertion will not be possible till the time course is
decided for student. 
This problem happens when the insertion of a data record is not
possible without adding some additional unrelated data to the
record. 
 
2.Deletion Anomaly – 
If the details of students in this table is deleted then the details of
college will also get deleted which should not occur by common
sense. 
This anomaly happens when deletion of a data record results in
losing some unrelated information that was stored as part of the
record that was deleted from a table.  
It is not possible to delete some information without losing some
other information in the table as well.
 
3.Updation Anomaly – 
Suppose the rank of the college changes; then the change will have
to be made all over the database, which will be time-consuming and
computationally costly. 
If the update does not occur at all places, the database will be left
in an inconsistent state.
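The update anomaly and its cure by decomposition can be sketched in plain Python (the student/college data and ranks are invented): once the college rank lives in its own relation, a rank change is one update instead of many.

```python
# Before normalization: the college's rank is repeated in every student row,
# so changing it means touching every copy (the update anomaly).
students_flat = [
    (1, 'Asha',  'ABC College', 12, 'CSE'),
    (2, 'Ravi',  'ABC College', 12, 'ECE'),
    (3, 'Meena', 'ABC College', 12, 'CSE'),
]

# After decomposition: student facts and college facts in separate relations,
# linked by the college name.
students = [(sid, name, college, course)
            for sid, name, college, rank, course in students_flat]
colleges = {college: rank for _, _, college, rank, _ in students_flat}

# The rank changes: exactly one update, and every student row now
# "sees" the new value through the link.
colleges['ABC College'] = 10
ranks_seen = {colleges[college] for _, _, college, _ in students}
print(ranks_seen)  # {10} -- all students agree; no inconsistency possible
```
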

6. What is File Organization?


File Organization refers to the logical relationships among various records
that constitute the file, particularly with respect to the means of identification
and access to any specific record. In simple terms, Storing the files in certain
order is called file Organization. File Structure refers to the format of the
label and data blocks and of any logical control record.

Types of File Organizations –

Various methods have been introduced to organize files. These methods have
advantages and disadvantages on the basis of access or selection. Thus it is up
to the programmer to decide the best-suited file organization method according
to his requirements.
Some types of File Organizations are :
 Sequential File Organization
 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization

7.Storage Devices
Databases are stored in file formats, which contain records. At physical level, the
actual data is stored in electromagnetic format on some device. These storage
devices can be broadly categorized into three types −
Primary Storage
• Direct access by CPU.
• Fast access to data
• Limited storage capacity.
• More expensive.
• Volatile
• Main memory & faster cache memory.

Secondary Storage
• No direct access by CPU
• Slow access to data
• Large capacity
• Less expensive
• Non-Volatile
• Online devices
• Magnetic disk & Flash memory
• Most databases are stored permanently on magnetic disks

Tertiary Storage
• No direct access to CPU
• Slower access to data
• Large capacity
• Less expensive than disks
• Non-Volatile
• Offline devices
• Magnetic tapes & Optical disks
• Used for backing up databases

8.Hashing
In a huge database structure, it is very inefficient to search all the index values and
reach the desired data. Hashing technique is used to calculate the direct location of a
data record on the disk without using index structure.

In this technique, data is stored at the data blocks whose address is generated by
using the hashing function. The memory location where these records are stored is
known as data bucket or data blocks.

In this, a hash function can use any column value to generate the address.
Most of the time, the hash function uses the primary key to generate the address of
the data block. A hash function can be anything from a simple mathematical function
to a complex one. We can even consider the primary key itself as the address of the
data block; in that case, each row is stored in the data block whose address equals
its primary key.

Types of Hashing:

Static Hashing

In static hashing, the resultant data bucket address will always be the same. That
means if we generate an address for EMP_ID =103 using the hash function mod (5)
then it will always result in same bucket address 3. Here, there will be no change in
the bucket address.

Hence in this static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in the memory used to
store the data.
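The EMP_ID = 103 example with hash function mod(5) can be sketched as follows (the extra employee IDs are invented):

```python
# Static hashing: a fixed number of buckets; the bucket address is
# always the same for a given key.
NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(emp_id):
    # mod(5) hash function from the example above
    return emp_id % NUM_BUCKETS

for emp_id in (103, 105, 108, 203):
    buckets[bucket_for(emp_id)].append(emp_id)

print(bucket_for(103))  # 3 -- always the same bucket, every time
print(buckets[3])       # [103, 108, 203] -- keys colliding in bucket 3
print(buckets[0])       # [105]
```

Note how three of the four keys collide in bucket 3: a fixed bucket count is exactly what causes the overflow problem that dynamic hashing addresses.
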

Operations of Static Hashing


o Searching a record

o Insert a Record

o Delete a Record

o Update a Record

Dynamic Hashing

o The dynamic hashing method is used to overcome the problems of static
hashing like bucket overflow.
o In this method, data buckets grow or shrink as the records increases or
decreases. This method is also known as Extendable hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion
without resulting in poor performance.

9. Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in
a database table quickly.

Index structure:

Indexes can be created using some database columns.


o The first column of the database is the search key that contains a copy of the
primary key or candidate key of the table. The values of the primary key are
stored in sorted order so that the corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of
pointers holding the address of the disk block where the value of the
particular key can be found.

Indexing Methods

Ordered indices

The indices are usually sorted to make searching faster. The indices which are sorted
are known as ordered indices.

Primary Index
o If the index is created on the basis of the primary key of the table, then it is
known as primary indexing. These primary keys are unique to each record and
contain 1:1 relation between the records.
o As primary keys are stored in sorted order, the performance of the searching
operation is quite efficient.
o The primary index can be classified into two types: Dense index and Sparse
index.

Dense index
o The dense index contains an index record for every search key value in the
data file. It makes searching faster.
o In this, the number of records in the index table is same as the number of
records in the main table.
o It needs more space to store index record itself. The index records have the
search key and a pointer to the actual record on the disk.

Sparse index
o In the data file, index record appears only for a few items. Each item points to
a block.
o In this, instead of pointing to each record in the main table, the index points
to the records in the main table in a gap.
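A sparse-index lookup can be sketched like this (block contents invented; the sketch assumes keys are sorted and every search key is at least as large as the first index entry):

```python
import bisect

# Data file as sorted blocks; the sparse index holds only the first
# key of each block, not an entry for every record.
blocks = [[101, 102, 103], [104, 105, 106], [107, 108]]
index = [b[0] for b in blocks]   # [101, 104, 107]

def lookup(key):
    # Find the last block whose first key <= key, then scan just that block
    i = bisect.bisect_right(index, key) - 1
    return key in blocks[i]

print(lookup(105))  # True  -- found by scanning block [104, 105, 106]
print(lookup(110))  # False -- the last block does not contain it
```
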

Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index
is created on non-primary key columns which may not be unique for each
record.
o In this case, to identify the record faster, we will group two or more columns
to get the unique value and create index out of them. This method is called a
clustering index.
o The records which have similar characteristics are grouped, and indexes are
created for these group.

Secondary Index

In the sparse indexing, as the size of the table grows, the size of mapping also grows.
These mappings are usually kept in the primary memory so that address fetch should
be faster. Then the secondary memory searches the actual data based on the address
got from mapping. If the mapping size grows then fetching the address itself
becomes slower. In this case, the sparse index will not be efficient. To overcome this
problem, secondary indexing is introduced.

In secondary indexing, to reduce the size of mapping, another level of indexing is
introduced. In this method, the huge range for the columns is selected initially so
that the mapping size of the first level becomes small. Then each range is further
divided into smaller ranges. The mapping of the first level is stored in the primary
memory, so that address fetch is faster. The mapping of the second level and actual
data are stored in the secondary memory (hard disk).
CO4
1.Transaction
o A transaction is a set of logically related operations; it contains a group of
tasks.
o A transaction is an action or series of actions, performed by a single user, to
access the contents of the database.
o A transaction is defined as a group of tasks [1].
o Single task – the minimum processing unit, which cannot be divided any
further [1]. In a single-user system, at most one user can use the system at a time.
o A transaction is a logical unit of database processing.

Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database and stores
it in a buffer in main memory.

Write(X): Write operation is used to write the value back to the database from the
buffer.

2.Recoverable Schedules:
 Schedules in which transactions commit only after all transactions whose
changes they read commit are called recoverable schedules. In other
words, if some transaction Tj is reading a value updated or written by some
other transaction Ti, then the commit of Tj must occur after the commit of
Ti.
Irrecoverable Schedule:
 The table below shows a schedule with two transactions, T1 reads and
writes A and that value is read and written by T2. T2 commits. But later
on, T1 fails. So we have to rollback T1. Since T2 has read the value
written by T1, it should also be rollbacked. But we have already
committed that. So this schedule is irrecoverable schedule. When Tj is
reading the value updated by Ti and Tj is committed before committing of
Ti, the schedule will be irrecoverable.
Recoverable with Cascading Rollback:
 The table below shows a schedule with two transactions, T1 reads and
writes A and that value is read and written by T2. But later on, T1 fails.
So we have to rollback T1. Since T2 has read the value written by T1, it
should also be rollbacked. As it has not committed, we can rollback T2 as
well. So it is recoverable with cascading rollback. Therefore, if Tj is
reading value updated by Ti and commit of Tj is delayed till commit of Ti,
the schedule is called recoverable with cascading rollback.
Cascadeless Recoverable Rollback:
 The table below shows a schedule with two transactions, T1 reads and
writes A and commits and that value is read by T2. But if T1 fails before
commit, no other transaction has read its value, so there is no need to
rollback other transaction. So this is a Cascadeless recoverable
schedule. So, if Tj reads value updated by Ti only after Ti is committed,
the schedule will be cascadeless recoverable.
Characterization of schedules based on Serializabilty
• Serial Schedule
• Serializable Schedule
• Result equivalent
• Conflict equivalent
• Conflict serializable

Note: Serializable and Serial are not the same; they are different notions.
Serializability is very hard to verify.
In most of the Database Management Systems, the current approach
is using locks.

3.Concurrency Control
Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database. Thus, for maintaining the concurrency of the
database, we have the concurrency control protocols.

Concurrency Control Protocols

The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of database transactions.
Therefore, these protocols are categorized as:

o Lock Based Concurrency Control Protocol


o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

Lock-Based Protocol

In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:

1. Shared lock:
o It is also known as a Read-only lock. Under a shared lock, the data item can
only be read by the transaction.
o It can be shared between transactions because a transaction holding a shared
lock cannot update the data item.

2. Exclusive lock:

o Under an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: multiple transactions cannot modify the same data
simultaneously.

Timestamp Ordering Protocol

o The Timestamp Ordering Protocol is used to order the transactions based on
their timestamps. The order of the transactions is nothing but the ascending
order of their creation.
o The older transaction has the higher priority, which is why it executes first. To
determine the timestamp of a transaction, this protocol uses system time or a
logical counter.
o The lock-based protocol is used to manage the order between conflicting
pairs among transactions at the execution time. But Timestamp based
protocols start working as soon as a transaction is created.
o Let's assume there are two transactions, T1 and T2. Suppose transaction T1
entered the system at time 0007 and transaction T2 at time 0009. T1 has the
higher priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamp of last 'read'
and 'write' operation on a data.

Validation Based Protocol

The validation-based protocol is also known as the optimistic concurrency control
technique. In the validation-based protocol, a transaction is executed in the
following three phases:

1. Read phase: In this phase, the transaction T is read and executed. It is used to
read the value of various data items and stores them in temporary local
variables. It can perform all the write operations on temporary variables
without an update to the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated
against the actual data to see if it violates the serializability.
3. Write phase: If the validation of the transaction is validated, then the
temporary results are written to the database or system otherwise the
transaction is rolled back.

4.UNDO and REDO Recovery Actions


• In this method, FORCE rule is imposed by system
• Transactions in active list are undone
• Transactions in committed list are ignored

Undo:

• Restore all BFIMs from log to database on disk.


• UNDO proceeds backward in log

Redo:

• Restore all AFIMs from log to database on disk.


• REDO proceeds forward in log

Write-Ahead Logging (WAL) protocol consists of two rules:

For Undo:

• Before a data item’s AFIM is flushed

• BFIM must be written to the log


• The log must be saved to disk.

For Redo:

• Before a transaction executes its commit operation


• AFIMs must be written to the log
• The log must be saved on a stable store.
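The UNDO/REDO passes over a log of before-images (BFIMs) and after-images (AFIMs) can be sketched as a toy in-memory model (names and values invented; this is an illustration of the idea, not an actual WAL implementation):

```python
# Toy write-ahead log: each entry is (item, BFIM, AFIM), appended
# *before* the in-place update, so recovery can always replay it.
db = {'A': 100}
log = []

def write(item, new_value):
    log.append((item, db[item], new_value))  # log first (the WAL rule)
    db[item] = new_value                     # only then update the data item

write('A', 150)
write('A', 175)

# UNDO proceeds backward in the log, restoring before-images
for item, bfim, _ in reversed(log):
    db[item] = bfim
undone = dict(db)
print(undone)  # {'A': 100} -- back to the original value

# REDO proceeds forward in the log, reapplying after-images
for item, _, afim in log:
    db[item] = afim
print(db)  # {'A': 175} -- the committed updates are re-established
```
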

Other Database Recovery Concepts

Steal/No-Steal and Force/No-Force

• Steal: recovery may require UNDO


• No-Steal: NO-UNDO
• Force: NO-REDO
• No-Force: recovery may require REDO

Deferred Update (NO-UNDO/REDO) Recovery Protocol


Recovery subsystem analyzes the log, and creates two lists:
• Active Transaction list: All active (uncommitted)
transaction ids are entered in this list.
• Committed Transaction list: Transactions committed
after the last checkpoint are entered in this list.
During recovery
• Transactions in commit list are redone
• Transactions in active list are ignored
Advantage:
• Only REDO is needed during recovery.
Disadvantage:
• Many buffers may be idle while waiting for transactions that updated
them to commit.
• system may run out of cache buffers when requests are made by new
transactions

5.Aries
• Aries- Algorithms for Recovery and Isolation Exploiting Semantics
• ARIES is a recovery algorithm designed to work with a no-force, steal
database approach
• It is based on WAL (Write Ahead Logging)
• Repeating history during redo
• Logging changes during undo

The ARIES recovery algorithm consists of three steps


• Analysis- the dirty pages in the buffer and the set of transactions active at the
time of the crash are determined
• Redo- necessary redo operations are applied
• Undo- transactions active at the time of crash are undone in reverse order
