
WACHEMO UNIVERSITY

COLLEGE OF ENGINEERING
AND TECHNOLOGY

SCHOOL OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATION TECHNOLOGY

Advanced Database Systems


Course Code: ITec3071

Developed By:
1. Girma Asefa (MSc, Information Technology, Lecturer)
2. Abduljebar Kedir (MSc, Information Technology, Lecturer)
3. Petros Haile (MSc, Information Technology, Lecturer)

Reviewed By:
1. Eleni Shiferaw (BSc, Information Technology, Assistant Lecturer)
2. Abebe W/Senbat (MSc, Information Technology, Lecturer)

Hosaina, Ethiopia, 2022

Module: Preface

This course covers file organizations, storage management, query optimization, transaction
management, recovery, concurrency control, and database authorization and security.
Additional topics, such as distributed databases, mobile databases, and integration, may also
be covered. A major component of the course is a database implementation project using
current database languages and systems.

This module also introduces the fundamental concepts necessary for designing, using, and
implementing database systems and database applications. Our presentation stresses the
fundamentals of database modeling and design, the languages and models provided by
database management systems, and database system implementation techniques. The
module is meant to be used as a text for a one-semester course in database systems. Our goal is
to provide an in-depth and up-to-date presentation of the most important aspects of database
systems and applications, and related technologies. We assume that readers are familiar with
elementary programming and data-structuring concepts and that they have had some
exposure to the basics of computer organization.

Acknowledgments

It is a great pleasure to acknowledge the assistance and contributions of many individuals
to this effort. First, we would like to thank Wachemo University for its guidance,
encouragement, and support. We would like to acknowledge the excellent work of the IT
Department Head, Miss Bethlehem Zewdu, for her support in many aspects of the
production and preparation of this course module, including its thorough copy editing. We also
thank all the IT staff for their contributions.

Contents
Chapter One: Query Processing and Optimization .......... 8
1. Basic Terms Related to Database .......... 8
1.1. The DBMS Manages Three Important Things .......... 10
1.1.1. DDL – Data Definition Language .......... 10
1.1.2. DML – Data Manipulation Language .......... 11
1.1.3. DCL – Data Control Language .......... 12
3.2. Translating SQL Queries into Relational Algebra .......... 22
Chapter 2: Database Security and Authorization .......... 33
2.1. Introduction to Database Security .......... 33
2.2. Security Issues .......... 34
2.3. Goals and Objectives .......... 34
2.4. Security Problems .......... 35
2.5. Solution Options .......... 36
2.6. General Control Procedures .......... 37
2.7. Privacy Issues .......... 38
2.8. Access Control .......... 40
2.8.1. Levels and Types of Data Access .......... 40
2.8.2. Discretionary Control .......... 42
2.8.3. Use of Views .......... 45
2.8.4. Mandatory Control .......... 46
2.8.5. Statistical Databases .......... 52
2.9. Encryption .......... 55
2.9.1. Data Encryption Standard .......... 59
Chapter 3: Transaction Processing Concepts .......... 66
3.1. Introduction to Transactions .......... 66
3.2. Properties of Transactions (ACID Properties) .......... 67
3.2.1. Atomicity .......... 67
3.2.2. Consistency .......... 68
3.2.3. Isolation .......... 68
3.2.4. Durability .......... 68
3.3. State Transition Diagram .......... 69
3.4. DBMS Schedules and the Types of Schedules .......... 71
3.4.1. Schedule .......... 71
3.4. Transaction Control Language in SQL (TCL) .......... 75
Chapter 4: DBMS Concurrency Control .......... 83
4.1. Introduction to Concurrency Control .......... 83
4.2. Methods to Control Concurrency (Mechanisms) .......... 83
4.2.2. Unrepeatable Read Problem .......... 85
4.2.3. Lost Update Problem .......... 85
4.3. Concurrency Control .......... 86
4.4. Concurrency Control Protocols .......... 87
4.4.1. Types of Lock .......... 87
4.5. Locking .......... 88
4.5.1. Locking Rules .......... 88
Table 4.1: Double Locking .......... 89
4.5.2. Two-Phase Locking (2PL) .......... 89
4.5.3. Timestamp-Based Protocols .......... 90
4.5.4. Validation-Based Protocol .......... 91
Chapter 5: Database Recovery .......... 94
5.1. Introduction to Database Recovery .......... 94
5.2. Log-Based Recovery Method .......... 95
5.2.1. Log-Based Recovery .......... 95
5.2.2. Log-Based Recovery Techniques .......... 96
5.2.3. Shadow Paging Technique .......... 100
Chapter 6: Distributed Database Systems .......... 105
6.1. Introduction to Distributed Systems .......... 105
6.2. Distributed Database Concepts .......... 107
6.4. Fragmentation Transparency .......... 109
6.5. Additional Functions of Distributed Databases .......... 111
6.5. Types of Distributed Database Systems .......... 112
6.6. Distributed Database Architectures .......... 113
6.7. Parallel versus Distributed Architectures .......... 113
6.8. General Architecture of Pure Distributed Databases .......... 115
6.9. Federated Database Schema Architecture .......... 116
6.10. An Overview of Three-Tier Client-Server Architecture .......... 117
6.11. Data Fragmentation .......... 120
6.14. Query Processing and Optimization in Distributed Databases .......... 128
6.15. Data Transfer Costs of Distributed Query Processing .......... 129
Chapter 7: Spatial/Multimedia/Mobile Databases .......... 134
7.1. Introduction .......... 134
7.1.1. Multimedia Database .......... 134
7.1.2. Spatial Data .......... 136
7.1.3. Characteristics of Spatial Databases .......... 136
7.1.4. Temporal Databases .......... 137
7.1.5. Bi-Temporal Relation .......... 138
7.1.6. Valid Time .......... 139
7.1.6. Transaction Time .......... 140
7.1.6. Time Specification in SQL .......... 140
7.1.8. Multimedia Databases .......... 141
7.1.8. Continuous Media Data .......... 142
7.1.9. Data Retrieval .......... 142
7.1.10. Content Delivery Networks .......... 143
7.1.11. Similarity-Based Retrieval .......... 143
7.1.12. Mobility and Personal Databases .......... 144
7.2. Mobility .......... 144
7.2.1. Model of Mobile Computing .......... 144
7.3. Spatial and Geographic Data .......... 145
7.3.1. Geographic Data and Applications .......... 146
References .......... 149

Figure 1.1 Types of SQL Command .......... 13
Figure 1.2 Steps in Database Query Processing .......... 24
Figure 2.1 Database security system .......... 35
Figure 2.2 Threats to database security .......... 36
Figure 2.3 Database security: layers of control .......... 39
Figure 2.4 Authorization graph .......... 44
Figure 2.5 Elements of encryption .......... 56
Figure 2.6 DES: single-key encryption .......... 60
Figure 2.7 RSA: public key encryption .......... 61
Figure 2.8 Public key encryption: data exchange .......... 63
Figure 3.1 Transaction state in DBMS .......... 70
Figure 4.1 Two-phase locking techniques .......... 90
Figure 5.1 Shadow paging technique .......... 102
Figure 6.1 Centralized Database System .......... 106
Figure 6.2 Distributed Database System .......... 106
Figure 6.3 Fragmentation .......... 109
Figure 6.4 Types of distributed systems .......... 113
Figure 6.5 Parallel versus Distributed Architectures .......... 114
Figure 6.6 Parallel versus Distributed Architectures .......... 116
Figure 6.7 Federated Database Schema Architecture .......... 117
Figure 6.8 An Overview of Three-Tier Client-Server Architecture .......... 119

Chapter One: Query Processing and Optimization

CHAPTER OUTCOME

Students will be able to:

 Translating SQL Queries into Relational Algebra


 Basic Algorithms for Executing Query Operations
 Using Heuristics in Query Optimization
 Using Selectivity and Cost Estimates in Query Optimization
 Semantic Query Optimization

1. Basic Terms Related to Database


Define the following terms.
Data
 A fact that can be recorded or stored.
E.g. Person Name, Age, Gender and Weight etc.
Information
 When data is processed, organized, structured or presented in a given context so as to
make it useful, it is called information.
Database
 A Database is a collection of inter-related (logically-related) data.
E.g. Books Database in Library, Student Database in University etc.
Metadata
 Metadata is data about data.
 Data such as table name, column name, data type, authorized user and user
access privileges for any table is called metadata for that table.
Data dictionary
 Data dictionary is an information repository which contains metadata.
 It is usually a part of the system catalog.
Data warehouse
 A data warehouse is an information repository which stores data.
 It is designed to facilitate reporting and analysis.
Field
 A field is a character or group of characters that have a specific meaning.
 It is also called a data item. It is represented in the database by a value.
 For example, customer id, name, society, and city are all fields of customer data.
Record
 A record is a collection of logically related fields.
 For example, the collection of fields (id, name, address & city) forms a record for a customer.
2. Database Management System (DBMS)
A database is a collection of related data which represents some aspect of the real world that
is designed for a certain task. A database management system (DBMS) is a software
package designed to define, store, manipulate, retrieve and manage users’ data in a database
by considering appropriate security measures. It consists of a group of programs which
manipulate the database. The DBMS accepts the request (instruction) for data from an
application and instructs the operating system to provide the specific data. It provides an
interface between the data and the software application.
Some DBMS examples include:
 Microsoft Access
 MySQL
 Oracle
 PostgreSQL
 dBASE
 FoxPro
 MongoDB
 SQLite
 IBM DB2
 LibreOffice Base
 MariaDB
 Microsoft SQL Server etc

1.1. The DBMS manages three important things
 Data accessibility: it allows data to be accessed, providing a centralized view of data that can
be accessed by multiple users, from multiple locations, in a controlled manner.
 It can limit what data the end user sees, as well as how that end user can view the data,
providing many views of a single database schema.
 End users and software programs are free from having to understand where the data is
physically located or on what type of storage media it resides, because the DBMS handles
all requests.
 It manages locking and modification.
 It defines the database schema (the database's logical structure).
Generally, the above functions provide concurrency, security, data integrity and uniform
data administration procedures. Typical database administration tasks supported by the DBMS
include change management, performance monitoring and tuning, security, and
backup and recovery. Many database management systems are also responsible
for automated rollbacks and restarts, as well as the logging and auditing of activity in
databases.
In a relational database management system (RDBMS), the most widely used type of DBMS,
the standard API is SQL, a programming language for defining, managing, protecting and
accessing data in an RDBMS. SQL commands are used for communicating with the database.
Everything, ranging from creating a table and adding data to modifying a table and setting user
permissions, is accomplished using SQL commands.
SQL Commands
There are basically four types of statements that can be executed in SQL Server for different
purposes.

1.1.1. DDL – Data Definition Language


Data definition statements are used to define the database structure or table.
CREATE
Used for creating a new table in the database.
Example: CREATE TABLE Employee (Name varchar2(20), DOB date, Salary number(6));
ALTER
This command is used for altering the structure of a database. Typically, the ALTER
command is used either to add a new attribute or to modify the characteristics of an existing
attribute. For adding new columns to the table:
ALTER TABLE Student ADD (Address varchar2(20), Age number(2));
For modifying an existing column in the table:
ALTER TABLE Student MODIFY (Name varchar2(20));
For dropping a column from the table:
ALTER TABLE Student DROP COLUMN Age;
DROP
Used for deleting an entire table from the database, along with all the data stored in it.
Example: DROP TABLE Student;
TRUNCATE
Used for deleting all rows from a table and freeing the space occupied by the table.
Example: TRUNCATE TABLE Student;
RENAME: Used for renaming a table.
Example: RENAME Student TO StudentDetails;
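The DDL commands above can be tried out in a minimal sketch using Python's built-in sqlite3 module. Note that SQLite's dialect differs slightly from the Oracle-style syntax shown above (for example, TEXT instead of varchar2, and RENAME spelled as ALTER TABLE ... RENAME TO); the table and column names here are illustrative only.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new table.
cur.execute("CREATE TABLE Student (Name TEXT, DOB TEXT, Salary INTEGER)")

# ALTER: add a new column to the existing table.
cur.execute("ALTER TABLE Student ADD COLUMN Address TEXT")

# RENAME: SQLite spells this as ALTER TABLE ... RENAME TO.
cur.execute("ALTER TABLE Student RENAME TO StudentDetails")

# Inspect the resulting schema from the catalog (metadata about the table).
cols = [row[1] for row in cur.execute("PRAGMA table_info(StudentDetails)")]
print(cols)  # ['Name', 'DOB', 'Salary', 'Address']

# DROP: delete the table and all data stored in it.
cur.execute("DROP TABLE StudentDetails")
conn.close()
```

Because DDL changes the schema rather than the data, the effect of each command is visible in the system catalog, which is what the PRAGMA query above reads.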

1.1.2. DML – Data Manipulation Language


Data manipulation statements are used for managing data within table objects.
SELECT
Used for retrieving data from a table.
Example: SELECT * FROM Student;
INSERT
Used for inserting data into the rows of a table.
Example: INSERT INTO Student (Name, Age) VALUES ('Mom', 30);
The INSERT command can also be used for inserting data into a table from
another table:
INSERT INTO Student SELECT ID, Stream FROM Student_Subject_Details;
DELETE
Used for removing one or more rows from a table.
DELETE FROM Student;
DELETE FROM Student WHERE Name = 'Mom';
UPDATE
Used to modify or update the value of a column in a table. It can update all rows or some
selective rows in the table.
UPDATE Student SET Name = 'Naa''ol' WHERE Id = 22;
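A short sketch of the four DML commands, run against SQLite from Python; the rows inserted are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Id INTEGER, Name TEXT, Age INTEGER)")

# INSERT: add rows to the table.
cur.execute("INSERT INTO Student (Id, Name, Age) VALUES (21, 'Mom', 30)")
cur.execute("INSERT INTO Student (Id, Name, Age) VALUES (22, 'Abel', 19)")

# UPDATE: modify selective rows (note '' escapes a quote inside a string).
cur.execute("UPDATE Student SET Name = 'Naa''ol' WHERE Id = 22")

# DELETE: remove matching rows.
cur.execute("DELETE FROM Student WHERE Name = 'Mom'")

# SELECT: retrieve what is left.
rows = cur.execute("SELECT Id, Name, Age FROM Student").fetchall()
print(rows)  # [(22, "Naa'ol", 19)]
conn.close()
```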

1.1.3. DCL – Data Control Language


Data control statements are used to give privileges to access limited data. The GRANT,
REVOKE, AUDIT, COMMENT, and ANALYZE statements are included in this language. In
order to protect the information in a table from unauthorized access, DCL commands are used.
A DCL command can either enable or disable a user from accessing information in a database.
List of user access privileges:

 Alter
 Delete
 Index
 Insert
 Select
 Update

GRANT
Used for granting user access privileges to a database.
GRANT SELECT, UPDATE ON Student TO ABC;
This will allow the user ABC to run only SELECT and UPDATE operations on the Student table.
GRANT ALL ON Student TO ABC WITH GRANT OPTION;
Allows the user to run all commands on the table, as well as grant access privileges to other
users.
REVOKE:
Used for taking back permission given to a user.
REVOKE UPDATE ON Student FROM ABC;
Note: - A user who is not the owner of a table but has been given the privilege to grant
permissions to other users can also revoke permissions.
TCL – Transaction Control Language
Transaction Control Language commands can only be used with DML commands. As these
operations are auto-committed in the database, they can't be used while creating or dropping
tables. Transaction control statements are used to apply changes permanently to the
database.
COMMIT
Used for saving all transactions made to a database. Ends the current transaction and makes
all changes made during the transaction permanent. Releases all transaction
locks acquired on tables.
Example: DELETE FROM Student WHERE Age=25;
COMMIT;

ROLLBACK
Used to undo transactions that aren't yet saved in the database. Ends the transaction and
undoes all changes made during the transaction. Releases all transaction locks acquired on
tables.
Example: DELETE FROM Student WHERE Age=25;
ROLLBACK;
SAVEPOINT
Used for rolling back to a certain state, known as the savepoint. Savepoints
need to be created first so that they can be used for partially rolling back transactions.
SAVEPOINT savepoint_name;
Note: An active savepoint is one that has been specified since the last COMMIT or
ROLLBACK command.
Summary of the types of SQL command:
Figure 1.1 Types of SQL Command
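The COMMIT and ROLLBACK behavior described above can be sketched in SQLite; setting isolation_level=None hands transaction control to us via explicit BEGIN/COMMIT/ROLLBACK. The table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Name TEXT, Age INTEGER)")
cur.execute("INSERT INTO Student VALUES ('Abel', 25), ('Sara', 30)")

# A transaction that is rolled back leaves the table unchanged.
cur.execute("BEGIN")
cur.execute("DELETE FROM Student WHERE Age = 25")
cur.execute("ROLLBACK")
after_rollback = cur.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

# The same change followed by COMMIT is permanent.
cur.execute("BEGIN")
cur.execute("DELETE FROM Student WHERE Age = 25")
cur.execute("COMMIT")
after_commit = cur.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

print(after_rollback, after_commit)  # 2 1
conn.close()
```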

2. Relational Algebra in DBMS
Relational algebra is one of the two formal query languages associated with the relational
model. Queries in algebra are composed using a collection of operators. A fundamental
property is that every operator in the algebra accepts (one or two) relation instances
as arguments and returns a relation instance as the result.
This property makes it easy to compose operators to form a complex query: a relational
algebra expression is recursively defined to be a relation, a unary algebra operator
applied to a single expression, or a binary algebra operator applied to two expressions.
We describe the basic operators of the algebra (selection, projection, union, cross-
product, and difference), as well as some additional operators that can be defined in
terms of the basic operators but arise frequently enough to warrant special attention,
in the following sections. Each relational query describes a step-by-step procedure for
computing the desired answer, based on the order in which operators are applied in
the query. The procedural nature of the algebra allows us to think of an algebra expression
as a recipe, or a plan, for evaluating a query, and relational systems in fact use algebra
expressions to represent query evaluation plans.
Every database management system must define a query language to allow users to access
the data stored in the database. Relational algebra is a procedural query language, which
takes instances of relations as input and yields instances of relations as output, allowing
data in database tables to be accessed in different ways. It uses operators to perform
queries. An operator can be either unary or binary.
In relational algebra, the input is a relation (the table from which data has to be accessed) and
the output is also a relation (a temporary table holding the data asked for by the user).
Relational algebra works on the whole table at once, so we do not have to use loops etc. to
iterate over all the rows (tuples) of data one by one. All we have to do is specify the table
name from which we need the data, and in a single line of command, relational algebra will
traverse the entire given table to fetch the data for you.
Relational database systems are expected to be equipped with a query language that can assist
their users in querying the database instances. There are two kinds of query languages: relational
algebra and relational calculus.
The fundamental operations of relational algebra are as follows:
I. Unary Relational Operations
A. SELECT (symbol: σ (sigma))
This is used to fetch rows (tuples) from a table (relation) which satisfy a given condition
(predicate).
Syntax: σp(r)
Where σ represents the select predicate, r is the name of the relation (the table in which you
want to look for data), and p is the propositional logic, where we specify the conditions that
must be satisfied by the data. In propositional logic, one can use unary and binary operators
like =, <, > etc. to specify the conditions.
Let's take an example of a Student table, and fetch data for students with age more than 17.
σage > 17 (Student)

This will fetch the tuples (rows) from the table Student for which age is greater than 17.
You can also use and, or etc. operators to combine conditions, for example:
σage > 17 and gender = 'Male' (Student)

This will return tuples (rows) from the table Student with information on male students of age
more than 17. (Consider that the Student table has an attribute Gender too.)
σ topic = "Database" (COURSE)

Output: selects tuples from COURSE where topic = 'Database'.

σ topic = "Database" and author = "guru99" (COURSE)

Output: selects tuples from COURSE where the topic is 'Database' and the author is 'guru99'.
σ sales > 50000 (Customers)

Output: selects tuples from Customers where sales is greater than 50000.
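The selection operator can be sketched in Python by modeling a relation as a list of dictionaries (one dictionary per tuple); the Student contents here are invented for illustration.

```python
# Example Student relation: one dict per tuple.
Student = [
    {"name": "Abel", "age": 19, "gender": "Male"},
    {"name": "Sara", "age": 17, "gender": "Female"},
    {"name": "Naa'ol", "age": 21, "gender": "Male"},
]

def select(predicate, relation):
    """sigma_p(r): keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

# sigma age > 17 (Student)
adults = select(lambda t: t["age"] > 17, Student)

# sigma age > 17 and gender = 'Male' (Student)
adult_males = select(lambda t: t["age"] > 17 and t["gender"] == "Male", Student)

print([t["name"] for t in adults])       # ['Abel', "Naa'ol"]
print([t["name"] for t in adult_males])  # ['Abel', "Naa'ol"]
```

Note that, as the text says, the result of a selection is itself a relation (here, another list of tuples), so further operators can be applied to it.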
B. PROJECT (symbol: π (pi))
The project operation is used to project only a certain set of attributes of a relation. In simple
words, if you want to see only the names of all the students in the Student table, then
you can use the project operation.
It will only project or show the columns or attributes asked for, and will also remove duplicate
data from the columns.
Syntax: ∏A1, A2...(r)
Where A1, A2 etc. are attribute names (column names).
For example: ∏Name, Age(Student)
The above statement will show us only the Name and Age columns for all the rows of
data in the Student table.
It eliminates all attributes of the input relation except those mentioned in the projection list.
The projection method defines a relation that contains a vertical subset of the relation.
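Projection can be sketched the same way, keeping only the named attributes and eliminating duplicate tuples as the text describes; the example relation is invented for illustration.

```python
# Example Student relation with a deliberate duplicate after projection.
Student = [
    {"name": "Abel", "age": 19, "city": "Hosaina"},
    {"name": "Sara", "age": 17, "city": "Hosaina"},
    {"name": "Abel", "age": 19, "city": "Addis Ababa"},
]

def project(attributes, relation):
    """pi_{A1,A2,...}(r): keep the listed columns, dropping duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # duplicate elimination
            seen.add(row)
            result.append(row)
    return result

# pi Name, Age (Student): the two 'Abel' rows collapse into one.
print(project(["name", "age"], Student))  # [('Abel', 19), ('Sara', 17)]
```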
II. Relational Algebra Operations from Set Theory
A. UNION (∪): This operation is used to fetch data from two relations (tables) or temporary
relations (results of other operations).
For this operation to work, the relations (tables) specified should have the same number
of attributes (columns) and the same attribute domains. Also, duplicate tuples are
automatically eliminated from the result.
Syntax: A ∪ B
Where A and B are relations.
For example, if we have two tables, RegularClass and ExtraClass, both of which
have a column Student to store the name of the student, then:
∏Student (RegularClass) ∪ ∏Student (ExtraClass)

The above operation will give us the names of students who are attending either regular classes
or extra classes, eliminating repetition.
UNION is symbolized by the ∪ symbol. It includes all tuples that are in table A or in B.
It also eliminates duplicate tuples.
For a union operation to be valid, the following conditions must hold:
 A and B must have the same number of attributes.
 Attribute domains need to be compatible.
 Duplicate tuples should be automatically removed.

B. INTERSECTION (∩)
Defines a relation consisting of the set of all tuples that are in both A and B. However, A and B
must be union-compatible.

C. DIFFERENCE (-)
This operation is used to find data present in one relation but not present in the second
relation. This operation is also applicable on two relations, just like the
union operation.
Syntax: A - B, where A and B are relations.
For example, if we want to find the names of students who attend the regular class but not the
extra class, then we can use the operation below:
∏Student(RegularClass) - ∏Student(ExtraClass)

The result of A - B is a relation which includes all tuples that are in A but not in B.
 The attribute names of A have to match the attribute names in B.
 The two operand relations A and B should be union-compatible.
 The result is a relation consisting of the tuples that are in relation A, but not in B.
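Because union-compatible relations are just sets of tuples, the three set-theoretic operators can be sketched directly with Python sets; the two projected name lists below stand in for the RegularClass and ExtraClass example and are invented for illustration.

```python
# Results of pi Student (RegularClass) and pi Student (ExtraClass),
# each a set of one-attribute tuples.
RegularClass = [("Abel",), ("Sara",)]
ExtraClass = [("Sara",), ("Naa'ol",)]

union = sorted(set(RegularClass) | set(ExtraClass))         # A union B
intersection = sorted(set(RegularClass) & set(ExtraClass))  # A intersect B
difference = sorted(set(RegularClass) - set(ExtraClass))    # A - B

print(union)         # [('Abel',), ("Naa'ol",), ('Sara',)]
print(intersection)  # [('Sara',)]
print(difference)    # [('Abel',)]
```

Using sets makes the duplicate elimination required by these operators automatic, which is exactly the union-compatibility behavior described above.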
D. CARTESIAN PR OD UCT ( x)
This is us e d t o c o mb in e d a ta f r o m t w o d if f e r e n t relations (tables) into one and
fetch data from the combined relation.
Syntax: A X B

18 | P a g e
For example, if we want to find the information for Regular Class and Extra Class which
are conducted during morning, then, we can use the following operation: σtime = 'morning'
(RegularClass X ExtraClass)

For the above query to work, both RegularClass and ExtraClass should have the attribute
time.
This type of operation is helpful to merge columns from two relations. Generally, a Cartesian
product is never a meaningful operation when it performs alone. However, it becomes
meaningful when it is followed by other operations.
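The point that a product alone is rarely meaningful, but becomes useful once a selection follows it, can be sketched as below. The table contents and the `time` values are invented for illustration:

```python
import sqlite3

# Hypothetical schedule tables; attribute names follow the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RegularClass (Student TEXT, time TEXT);
    CREATE TABLE ExtraClass   (Student TEXT, time TEXT);
    INSERT INTO RegularClass VALUES ('Abebe','morning'), ('Sara','evening');
    INSERT INTO ExtraClass   VALUES ('Kedir','morning'), ('Lily','evening');
""")

# Bare Cartesian product: every row of A paired with every row of B (2 x 2 = 4 rows).
product = conn.execute("SELECT * FROM RegularClass CROSS JOIN ExtraClass").fetchall()
print(len(product))  # 4

# The product becomes meaningful once a selection follows it,
# here sigma time='morning' applied to both sides of the pairing.
morning = conn.execute("""
    SELECT r.Student, e.Student
    FROM RegularClass r CROSS JOIN ExtraClass e
    WHERE r.time = 'morning' AND e.time = 'morning'
""").fetchall()
print(morning)  # [('Abebe', 'Kedir')]
```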
III. Binary Relational Operations
A. JOIN
The join operation is essentially a Cartesian product followed by a selection criterion. The
join operation is denoted by ⋈.
The JOIN operation also allows joining related tuples from different relations.
Types of Join
Various forms of join operation are:
1. Inner Joins
In an inner join, only those tuples that satisfy the matching criteria are included, while the
rest are excluded. Let us study the various types of inner joins:
Theta join: the general case of the JOIN operation is called a theta join. It is denoted by
the symbol θ.
Equi join: when a theta join uses only equality conditions, it becomes an equi join. It is
among the most difficult operations to implement efficiently in an RDBMS, and one
reason why RDBMSs can have performance problems.
Natural join (⋈): a natural join can only be performed if there is a common attribute
(column) between the relations. The name and type of the attribute must be the same.
2. Outer Joins
In an outer join, along with the tuples that satisfy the matching criteria, we
also include some or all tuples that do not match the criteria.
Left Outer Join (A ⟕ B): the left outer join keeps all tuples in the
left relation. If no matching tuple is found in the right relation, then the
attributes of the right relation in the join result are filled with null values.
Right Outer Join (A ⟖ B): the right outer join keeps all tuples in
the right relation. If no matching tuple is found in the left relation, then
the attributes of the left relation in the join result are filled with null values.

Full Outer Join (A ⟗ B): in a full outer join, all tuples from both relations are included
in the result, irrespective of the matching condition.
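The join variants above can be exercised in SQL. The Emp/Dept tables and their rows below are hypothetical, chosen so that one row on each side has no match:

```python
import sqlite3

# A minimal sketch of equi and outer joins, using made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Dno INTEGER, Dname TEXT);
    CREATE TABLE Emp  (Eno INTEGER, Ename TEXT, Dno INTEGER);
    INSERT INTO Dept VALUES (1,'IT'), (2,'HR');
    INSERT INTO Emp  VALUES (10,'Abebe',1), (11,'Sara',1), (12,'Lily',NULL);
""")

# Inner (equi) join: only tuples whose Dno values match on both sides.
inner = conn.execute("""
    SELECT e.Ename, d.Dname FROM Emp e JOIN Dept d ON e.Dno = d.Dno
""").fetchall()
print(inner)   # Lily and HR are excluded: no match

# Left outer join: every Emp tuple is kept; unmatched Dept attributes become NULL.
left = conn.execute("""
    SELECT e.Ename, d.Dname FROM Emp e LEFT JOIN Dept d ON e.Dno = d.Dno
""").fetchall()
print(left)    # Lily appears paired with None
```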
Database performance tuning often requires a deeper understanding of how
queries are processed and optimized within the database management system.
In this note we provide a general overview of how query processing works (how rule-based
and cost-based query optimizers operate) and then provide some specific examples of
query optimization in commercial DBMSs.
In this session we discuss the techniques used by a DBMS to process, optimize, and execute
high-level queries. A query expressed in a high-level query language such as SQL must first
be scanned, parsed, and validated.
The scanner identifies the language tokens, such as keywords, attribute names, and relation
names, whereas the parser checks the query syntax to determine whether it is formulated
according to the syntax rules (rules of grammar) of the query language.
The query must then be validated by checking that all attribute and relation names are valid and
semantically meaningful names in the schema of the particular database being queried. An
internal representation of the query is then created as a tree data structure called a query tree.
It is also possible to represent the query using a graph data structure called a query graph.
The DBMS must then devise an execution strategy for retrieving the result of the query from the
database. A query typically has many possible execution strategies, and the process of choosing
a suitable one for processing a query is known as query optimization.
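A minimal sketch of the scanning step described above: the toy tokenizer below is a deliberate simplification (a real SQL lexer handles far more token classes), but it shows how query text becomes tokens for the parser to consume:

```python
import re

# A toy scanner: splits a query string into (token-class, text) pairs.
# The token classes below are an illustrative assumption, not a real SQL lexer.
TOKEN_RE = re.compile(r"""
      (?P<KEYWORD>\b(?:SELECT|FROM|WHERE|AND|OR)\b)
    | (?P<NUMBER>\d+)
    | (?P<STRING>'[^']*')
    | (?P<IDENT>[A-Za-z_][A-Za-z0-9_.]*)
    | (?P<OP><=|>=|<>|[=<>*,()])
""", re.VERBOSE | re.IGNORECASE)

def scan(query):
    # Each regex match reports which named group fired via m.lastgroup.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(query)]

tokens = scan("SELECT Ename FROM Employee WHERE DOP > 10")
print(tokens)
```

On the sample query, the scanner classifies SELECT/FROM/WHERE as keywords, Ename, Employee, and DOP as identifiers, `>` as an operator, and 10 as a number, which is exactly the tokenized representation the parser then checks for correct syntax.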

3. Query-processing

The system executes the query using the optimal strategy generated. In query optimization,
an SQL query is first translated into an equivalent relational algebra expression, using a query
tree data structure, before being optimized.
We will therefore set out below how to pass from an SQL query to an expression in
relational algebra. A query is processed in two phases: the query-optimization phase and the
query-processing phase. To facilitate understanding, we will add the query-compilation
phase before the two previous phases, because queries are viewed by the user as Data
Manipulation Language (DML) scripts.
3.1. Query-compilation: DML processor translates DML statements into low-level
instructions (compiled query) that the query optimizer can understand.

3.2. Translating SQL queries into Relational Algebra


Typically, queries are decomposed into query blocks, which form the basic units that can be
translated into the algebraic operators and optimized. A query block contains a single
SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these
are part of the block. It can also include aggregate operators such as MAX, COUNT, MIN,
and SUM. These operators must also be included in the extended algebra.
 Query optimization is a function of many relational database management systems. The
query optimizer attempts to determine the most efficient way to execute a given query by
considering the possible query plans. Generally, the query optimizer cannot be accessed
directly by users: once queries are submitted to the database server and parsed by the parser,
they are then passed to the query optimizer, where optimization occurs. It aims to choose
an efficient execution strategy for query execution.
 The optimizer automatically generates a set of reasonable strategies for processing a given
query and selects an optimal one on the basis of the expected cost of each of the strategies
generated.

Query processing means the entire process or activity of query
translation into low-level instructions, query optimization to save resources, cost
estimation or evaluation of the query, and extraction of data from the database.
Goal: to find an efficient query execution plan for a given SQL query that
minimizes the cost considerably, especially time.
Cost factors: disk accesses (which typically consume time) and read/write operations
(which typically need resources such as memory/RAM).
Programmers write queries in a higher-level database query
language such as SQL, and a special component of the DBMS called the Query
Processor takes care of arranging the underlying access routines to satisfy a given query.
A query is processed in the following four general steps:
1. Scanning and Parsing
When a query is first submitted (via an application program), it must be scanned and
parsed to determine whether the query uses appropriate syntax. Scanning is the process
of converting the query text into a tokenized representation. The tokenized representation
is more compact and is suitable for processing by the parser.
This representation may be in a tree form. The Parser checks the tokenized
representation for correct syntax. In this stage, checks are made to determine if columns
and tables identified in the query exist in the database and if the query has been formed
correctly with the appropriate keywords and structure. If the query passes the parsing
checks, then it is passed on to the Query Optimizer.
2. Query Optimization or planning the execution strategy
For any given query, there may be a number of different ways to execute it. Each
operation in the query (SELECT, JOIN, etc.) can be implemented using one or more
different Access Routines.
For example, an access routine that employs an index to retrieve some rows would be
more efficient than an access routine that performs a full table scan.
The goal of the query optimizer is to find a reasonably efficient strategy for executing the
query (not quite what the name implies) using the access routines. Optimization
typically takes one of two forms:
Heuristic Optimization or Cost Based Optimization
In Heuristic Optimization, the query execution is refined based on heuristic rules for
reordering the individual operations. With Cost Based Optimization, the overall cost of
executing the query is systematically reduced by estimating the costs of executing several
different execution plans.
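The cost-based idea can be sketched with a toy cost model. The page counts and index height below are invented numbers, not a real DBMS's estimates:

```python
# A toy cost model contrasting two candidate access routines.
# All numbers are illustrative assumptions, measured in page reads.

def full_scan_cost(num_pages):
    # A full table scan reads every page of the table once.
    return num_pages

def index_lookup_cost(tree_height, matching_pages):
    # Descend the index tree, then fetch only pages holding matching rows.
    return tree_height + matching_pages

# Cost-based optimization in miniature: estimate each plan, keep the cheapest.
plans = {
    "full table scan": full_scan_cost(num_pages=1000),
    "index lookup":    index_lookup_cost(tree_height=3, matching_pages=20),
}
best = min(plans, key=plans.get)
print(plans)
print("chosen plan:", best)  # index lookup: 23 page reads versus 1000
```

A real optimizer estimates these costs from catalog statistics (table sizes, index depths, selectivities) rather than fixed constants, but the "estimate each plan, pick the cheapest" loop is the same.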
3. Query Code Generator (interpreted or compiled)
Once the query optimizer has determined the execution plan (the specific ordering of
access routines), the code generator writes out the actual access routines to be executed.
With an interactive session, the query code is interpreted and passed directly to the
runtime database processor for execution. It is also possible to compile the access
routines and store them for later execution.
4. Execution in the runtime database processor
At this point, the query has been scanned, parsed, planned and (possibly) compiled. The
runtime database processor then executes the access routines against the database. The
results are returned to the application that made the query in the first place. Any runtime
errors are also returned.
The major steps involved in query processing are depicted in the figure
below;

Figure 1. 2 Steps in Database Query Processing

Let us discuss the whole process with an example. Let us consider the following two relations
as the example tables for our discussion;
Employee (Eno, Ename, Phone) Proj_Assigned(Eno, Proj_No, Role, DOP)
Where,
Eno is Employee number, Ename is Employee name,
Proj_No is Project Number in which an employee is
assigned; Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are working
in a project that is more than 10 months old.
SELECT Ename FROM Employee, Proj_Assigned WHERE Employee.Eno =
Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given as input to the query processor. For our case, let us consider
the SQL query written above.
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the
user’s privileges to execute the query, the table names and attribute names, etc. The correct
table names, attribute names, and the privileges of the users can be taken from the system
catalog (data dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from the high-level language SQL to
low-level instructions in relational algebra.
For example, our SQL query can be converted into a relational algebra equivalent as follows:
πEname(σDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Proj_Assigned))

Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are
information about the size of the table, the length of records, the indexes created on the
table, etc. Optimizer also checks for the conditions and conditional attributes which are parts
of the query.
Step 4: Execution Plan
A query can be expressed in many ways. The query processor module, at this stage, uses
the information collected in step 3 to find different relational algebra expressions that are
equivalent to the one we have written already.
For our example, the query written in relational algebra can also be written as the one given
below:

πEname (Employee ⋈Eno (σDOP>10 (Proj_Assigned)))

So far, we have two execution plans. The only condition is that both plans should give the
same result.

Step 5: Evaluation
Though we have constructed many execution plans using the statistical data, and though they
return the same result, they differ in terms of the time consumed to execute the query and
the space required to execute it. Hence, it is mandatory to choose the one plan that
consumes the least cost.
At this stage, we choose one execution plan of the several we have developed. This execution
plan accesses data from the database to give the final result.

In our example, the second plan may be good. In the first plan, we join two relations (a costly
operation) and then apply the condition (conditions are considered as filters) on the joined
relation. This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the
Employee table. This join may need to compare fewer records. Hence, the second
plan is the better one (with the information known, though not always).
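The advantage of pushing the selection down can be sketched by counting pairwise comparisons in the two plans. The tuple counts and DOP values below are made up; the comparison count is only a rough cost proxy:

```python
# Sketch: join-then-filter versus filter-then-join for the Employee query above.
employee      = [(eno, f"E{eno}") for eno in range(100)]            # 100 tuples
proj_assigned = [(eno, 1, "dev", eno % 20) for eno in range(100)]   # DOP = 0..19

# Plan 1: pair every Employee tuple with every Proj_Assigned tuple,
# then filter DOP > 10 afterwards.
plan1_comparisons = len(employee) * len(proj_assigned)              # 100 * 100

# Plan 2: push the selection sigma DOP>10 down first, then join the
# (much smaller) filtered relation with Employee.
filtered = [t for t in proj_assigned if t[3] > 10]
plan2_comparisons = len(employee) * len(filtered)

print(plan1_comparisons, plan2_comparisons)  # plan 2 does far fewer comparisons
```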
Example: See the following schema
Sailors (sid: integer, sname: string, rating: integer, age:
real) Reserves (sid: integer, bid: integer, day: dates,
rname: string)
Reserves:
Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
Sailors:
Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
Consider the following SQL query:
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100
AND S.rating > 5
This query can be expressed in relational algebra (RA) as follows:

πsname (σbid=100 ∧ rating>5 (Reserves ⋈sid=sid Sailors))

The algebraic expression partially specifies how to evaluate the query:


✓ Compute the natural join of Reserves and Sailors
✓ Perform the selections

✓ Project the sname field
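Assuming a small made-up instance of the Sailors/Reserves schema, the query can be run and SQLite's own optimizer can be asked how it plans to evaluate it:

```python
import sqlite3

# The Sailors/Reserves query above on a tiny invented instance;
# EXPLAIN QUERY PLAN reports the strategy SQLite's optimizer picks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT, rname TEXT);
    INSERT INTO Sailors  VALUES (22,'Dustin',7,45.0), (31,'Lubber',8,55.5),
                                (58,'Rusty',3,35.0);
    INSERT INTO Reserves VALUES (22,100,'2022-10-10','r1'),
                                (58,100,'2022-10-11','r2');
""")

query = """
    SELECT S.sname FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
"""
names = [r[0] for r in conn.execute(query)]
print(names)  # only Dustin: Rusty reserved boat 100 but has rating 3

# Ask the optimizer for its plan (exact wording varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```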


The RA Tree

Query Expressed as a Relational Algebra Tree

SELECT ENAME FROM EMP, ASG WHERE EMP.ENO=ASG.ENO AND
ASG.RESP='manager'

EXAMPLE
SELECT V.Vno, Vname, count(*), sum(Amount) FROM Vendor V, Transaction T
WHERE V.Vno=T.Vno AND V.Vno BETWEEN 1000 AND 2000 GROUP BY V.Vno,
Vname HAVING sum(Amount)>100
➢ Scan the Vendor table, select all tuples where Vno is in [1000, 2000], eliminate attributes
other than Vno and Vname, and place the result in a temporary relation R1.
➢ Join the tables R1 and Transaction, eliminate attributes other than Vno, Vname, and
Amount, and place the result in a temporary relation R2. This may involve:
✓ sorting R1 on Vno
✓ sorting Transaction on Vno
✓ merging the two sorted relations to produce R2
➢ Perform grouping on R2, and place the result in a temporary relation R3. This
may involve:
✓ sorting R2 on Vno and Vname
✓ grouping tuples with identical values of Vno and Vname

✓ counting the number of tuples in each group, and adding their Amounts
➢ Scan R3, select all tuples with sum(Amount) > 100 to produce the result.
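The steps above can be checked against a real engine. The Vendor/Transaction rows below are invented, and the mis-typed commas in the query text are read as dots:

```python
import sqlite3

# The Vendor/Transaction GROUP BY ... HAVING query on a small made-up instance.
# "Transaction" is quoted because it is a reserved word in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendor        (Vno INTEGER, Vname TEXT);
    CREATE TABLE "Transaction" (Vno INTEGER, Amount REAL);
    INSERT INTO Vendor VALUES (1200,'Acme'), (1500,'Zeta'), (2500,'OutOfRange');
    INSERT INTO "Transaction" VALUES (1200,80), (1200,60), (1500,40), (2500,999);
""")

rows = conn.execute("""
    SELECT V.Vno, V.Vname, COUNT(*), SUM(T.Amount)
    FROM Vendor V, "Transaction" T
    WHERE V.Vno = T.Vno AND V.Vno BETWEEN 1000 AND 2000
    GROUP BY V.Vno, V.Vname
    HAVING SUM(T.Amount) > 100
""").fetchall()
print(rows)  # only Acme: two transactions summing to 140; Zeta's 40 fails HAVING
```

Vendor 2500 is filtered out by the BETWEEN predicate (the first scan step), and vendor 1500 survives the join and grouping but is dropped by the final HAVING scan over R3.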

Find the names of employees other than J. Doe who worked on the CAD/CAM project for
either one or two years
SELECT ENAME FROM PROJ P, ASG G, EMP E WHERE G.ENO=E.ENO AND
G.PNO=P.PNO AND E.ENAME<>'J. Doe' AND P.PNAME='CAD/CAM' AND
(G.DUR=12 OR G.DUR=24)

EQUIVALENT QUERY

ANOTHER EQUIVALENT QUERY

Example: Tree for a Query. Using the relations Bars(name, addr) and Sells(bar, beer, price),
find the names of all the bars that are either on Maple St. or sell Bud for less than $3.

Expression Trees: Example.
MovieStar(Name,Address,Gender,Birthdate) StarIn(Title,Year,StarName)
Query: “Find the birthdate and the movie title for those female stars who appeared in movies
in 1996.”
SELECT Title, Birthdate FROM MovieStar, StarIn WHERE Year=1996 AND
Gender='F' AND Name=StarName;

Example. StarIn (Title, Year, StarName).


SELECT StarName, MIN(Year) as minYear FROM StarIn GROUP BY StarName
HAVING COUNT(Title)>=3;

Review Questions 1.1.

Discuss the following terms by using appropriate examples:


What is an RDBMS?
Explain query processing in a DBMS.
How do we translate SQL queries into relational algebra?
Parsing and Translation
Query Evaluation Plan
Optimization

Chapter 1: SQL MCQ (Multiple Choice Questions)
1. What is the full form of SQL?
A. Structured Query List
B. Structured Query Language
C. Sample Query Language
D. None of these
2. Which of the following are TCL commands?
A. COMMIT and ROLLBACK
B. UPDATE and TRUNCATE
C. SELECT and INSERT
D. GRANT and REVOKE
3. Which of the following is also called an INNER JOIN?
A. SELF JOIN
B. EQUI JOIN
C. NON-EQUI JOIN
D. None of the above
4. Find the city names with the condition and temperature from the table 'weather' where condition =
sunny or cloudy but temperature >= 60.

A. SELECT city, temperature, condition FROM weather WHERE condition = 'cloudy' AND
condition = 'sunny' OR temperature >= 60
B. SELECT city, temperature, condition FROM weather WHERE condition = 'cloudy' OR condition
= 'sunny' OR temperature >= 60
C. SELECT city, temperature, condition FROM weather WHERE condition = 'sunny' OR condition
= 'cloudy' AND temperature >= 60
D. SELECT city, temperature, condition FROM weather WHERE condition = 'sunny' AND condition
= 'cloudy' AND temperature >= 60
5. Which of the following statements is correct to display all the cities with the condition, temperature,
and humidity whose humidity is in the range of 60 to 75 from the 'weather' table?
A. SELECT * FROM weather WHERE humidity IN (60 to 75)
B. SELECT * FROM weather WHERE humidity BETWEEN 60 AND 75
C. SELECT * FROM weather WHERE humidity NOT IN (60 AND 75)
D. SELECT * FROM weather WHERE humidity NOT BETWEEN 60 AND 75
6. Reducing the complexity of complex queries by similarly handling sub-queries is known as ___
A. Complex query handling
B. Multi query optimization
C. Complex query optimization
D. Parametric query optimization

Chapter 2: Database Security and Authorization
CHAPTER OUTCOME

Students will be able to:

Establish the goals and objectives of a database security system


Examine potential security problems and review solution options
Understand the principles and applications of access control
Distinguish between authentication and authorization and learn how these are performed
in a database system
Study encryption and its application to databases
Review special security considerations such as security of statistical databases and the use
of views for database security

2.1. Introduction of database security


Think of a storage place for a manufacturing organization where it keeps all the important raw
materials required for making its products. The organization uses these materials to run its
production. Various types of materials form parts of different products. Some types of
materials are more critical and sensitive to the production process. The organization cannot
perform its day-to-day operations without access to the materials.
Will the organization allow anyone to walk into the storage facility off the street and have
access to the materials? Unless the place is properly secured, unauthorized persons can enter
and steal or destroy the materials. Not even all authorized persons may be permitted access to
all parts of the storage area. Sensitive and critical materials must be off-limits to most people.
For a modern organization, its database system is even more significant than many other types
of assets. Many organizations such as financial institutions and travel companies cannot
survive even a single day without their database systems.
Any type of destruction of or unauthorized access to the database system has serious impact.
Obviously, an organization must ensure that its database system is adequately guarded against
accidental breaches of security or theft, misuse, and destruction through malicious intent.
Every organization must protect its database system from intentional and unintentional threats.
To do so, it must employ both computer-based and other types of controls. The DBMS must
include a proper security system to protect the database from unauthorized access.

2.2. Security Issues
 What are we trying to protect by ensuring database security?
 What levels of information need to be safeguarded and how?
 What are the types of problems and threats that deserve special attention?
 Can we distinguish between threats from outside and internal threats?
 Do these require different types of protection mechanisms?
 What are the solution options?
 How is protection of privacy related to database security?
Let us address these broad questions before getting into specific access control techniques.
Many organizations are opening up their database systems for access over the Internet. This
openness results in great advantages but, at the same time, makes the database system
vulnerable to threats from a much wider area. Web security demands special attention.

2.3. Goals and Objectives


Specifically, what are we trying to protect? How are we planning to protect? These questions
form the basis for discussions on database security. Let us consider the primary goals and
objectives. Figure below provides an overview of the security system for a database.
Note the following three broad goals of database security highlighted in the figure.
 Denial of access to the database by unauthorized users
 Guarantee of access to all authorized users
 Protection of privacy of data
In a broad sense, you understand database security and what protection means. However, let
us get into specific objectives. What are the particular objectives to deal with individual types
of threats? Here is a list of specific objectives of a security system:
 Shield from destruction. Shield the database from fire or any other such disaster.
 Safeguard from theft. Safeguard the database from malicious schemes of competitors or
profiteers to steal the data content.
 Defense from vandalism. Defend the database from the attempts of ingenious, disgruntled
professionals intending to tamper with and vandalize the database.
 Provide safety from fraud. Keep the database safe from persons with intentions to commit
fraud or to misuse its contents.

Figure 2. 1 Database security system

 Shelter of privacy. Shelter the privacy of individuals and institutions about whom data reside in
the database.
 Identification of users. Be able to positively identify authorized users.
 Authorization of users. Guarantee access to authorized users.
 Scope of authorization. Be able to authorize individual users for specific portions of the database
as needed.
 Levels of authorization. Provide individual users with particular authorization levels to read,
update, add, or delete data.
 Monitoring of usage. Be able to monitor access by authorized users to keep audit trails for tracing
actions.

2.4. Security Problems


Many aspects of security problems require attention in a database environment. Legal, social, and
ethical aspects are involved. Does the person requesting particular information have a legal right to
that piece of information? Also, there are policy questions about who decides on what types of access
authorizations must be granted to whom and when. Operational and administrative aspects need to be
considered. How do you allocate passwords, maintain them, and preserve confidentiality? What about
physical controls to prevent problems? Should workstations and servers be guarded with physical lock-
and-key schemes? Are hardware controls available in your environment to be used for database
security? Are there security schemes in the operating system itself? Finally, what are the security
provisions in your DBMS, and to what extent can your environment take advantage of these
provisions? To come up with solution options, first it will be worthwhile to classify the types of security
problems likely to be encountered. When you are able to classify the threats, you will be able to find
solutions to each type of problem. Broadly, we may classify the types of security exposure in a database
environment as follows:
 Natural disasters. Fire, floods, and other such catastrophes.
 Human carelessness. Unintended damage caused by authorized users, especially while
running jobs in batch.
 Malicious damage. Sabotage, vandalism, actions of malicious programmers, technical
support staff, and persons performing database administration functions.
 Crime. Theft, embezzlement, industrial espionage, and employees selling a company’s
secrets and data for mailing lists.
 Privacy invasion. Casual curiosity, data lookup by competitors, obtaining data for
political or legal reasons.
Let us put together the components of the problems of database protection and summarize the
potential threats. The figure below presents a summary of threats to database security. Note each
component showing the type of threat and its source.

Figure 2. 2 Threats to database security.

2.5. Solution Options


We have looked at the types of potential threats to a database system. Various types of sources
pose different threats. How do you make provisions to protect your database system? When
you consider each type of threat or problem, adopt a three-level approach to problem
resolution:

 Minimize the probability of the problem happening. Establish enough protection rings to
enclose the database system. Take all the necessary protective measures and institute
strong deterrents.
 Diminish the damage if it happens. If an intruder manages to penetrate the outer layer of
protection, make it progressively difficult to cut through the inner layers. Guard the most
sensitive portions of the database with the most stringent security measures.
 Devise precise recovery schemes. If a vandal manages to destroy some parts of the
database, have a tested method to recover from the damage. If a fire destroys your
database, plan to be able to restore from a copy stored off-site.
When you examine the types of threats, you will notice that most of the recovery solutions
must be a combination of general control procedures and computer-based techniques. Let us
explore the nature of these two types of solution methods.

2.6. General Control Procedures


These are matters of broad security policy and general procedures. Although these procedures
deal with the security of the database in a computer system, most of these do not involve the
direct use of computers. Many of these relate to planning and policy-making. Some are
physical controls, and a few others involve outside agencies. The following is a list of such
security measures.
 Physical controls. Include physical access to buildings, monitoring of visitors at entrances
and exits, and guarding of workstations and servers.
 Human controls. Safeguard against threats from information system professionals and
specialists by proper security clearance to work on sensitive data.
 Control of equipment. Includes secure placement of equipment such as laptops loaded
with sensitive data and printers that are designated to print critical data.
 Security through outside agencies. Refers to third-party storage areas to keep backup
copies of the database and outside installations that can be used for disaster recovery.
 Contingency plans. Intended to be adopted in case of fire or bomb alerts. Plans must
include designation of responsibilities and procedures for recovery.
 Security policy. An essential element of the security system to address the scope of the
security schemes, the duties and responsibilities of employees, the procedures to be
followed, and disciplinary action in the event of noncompliance with policy and
procedures.

 Computer-Based Techniques. Now let us turn our attention to the types of
countermeasures that are executed through the use of the computer system including the
DBMS. Here is a list of the major techniques:
 Authorization of users. Includes authentication of authorized users and granting of access
privileges to them.
 Tailoring authorization through views. Defining user views to have the ability to
authorize users for specific portions of the database.
 Backup and recovery. Creation of backup copies of the database at regular intervals and
also testing and implementing recovery procedures.
 Protection of sensitive data. Use of encryption technology to protect sensitive data. All
DBMSs have security systems to guarantee database access to authorized users.
Commonly, these security mechanisms are referred to as discretionary and mandatory
security mechanisms. Let us define the scope of this division:
 Discretionary security mechanisms. Used for granting and revoking data access
privileges to users for accessing specific parts of a database in any of the access modes of
read, update, add, and delete.
 Mandatory security mechanisms. Used for establishing security at multiple levels by
classifying users into distinct groups and grouping data into distinct segments and,
thereafter, assigning access privileges for particular user groups to data segments.
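The backup-and-recovery countermeasure listed above can be sketched with SQLite's online backup API; the account table and its contents are made up for illustration:

```python
import sqlite3

# A minimal sketch of backup and recovery using SQLite's online backup API.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (id INTEGER, balance REAL)")
source.execute("INSERT INTO account VALUES (1, 500.0), (2, 750.0)")
source.commit()

# Take a backup copy while the source database stays available to users.
backup = sqlite3.connect(":memory:")  # in practice: a file kept off-site
source.backup(backup)

# Simulate malicious damage to the live database ...
source.execute("DELETE FROM account")
source.commit()

# ... and recover by restoring from the backup copy.
backup.backup(source)
balances = source.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)  # the original rows are back
```

In a production DBMS the same idea appears as scheduled full/incremental backups plus a tested restore procedure, ideally with copies stored at a third-party or off-site location as the text recommends.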
From our discussions so far, you must have concluded that database security is critical but also
difficult. You must look toward enforcing database security at different levels. Security
mechanisms must exist at several layers, such as within the database system itself, at the level
of the operating system, the network, the application, the hardware, and so on. The figure
below illustrates the layers of control for database security.

2.7. Privacy Issues


Businesses and government agencies collect and store large volumes of information about
customers, suppliers, distributors, and employees. Data privacy concerns those kinds of
information that relate to individuals and external organizations that are part of the company’s
database. Who owns this information: the company that has the database, or the individuals and
organizations to whom the information relates? Who can access this information? Can this
information be sold to others? What are the regulations?

Data privacy fits into data security in an unorthodox manner. Data security is generally thought
of as the protection of a company’s data from unauthorized access. Who authorizes access,
and who decides on how and to whom access must be granted?

Figure 2. 3 Database security: layers of control.

Of course, the company does this because it is deemed that the company owns the
data in the database. In the same way, data privacy may be thought of as protecting information
about employees, customers, suppliers, and distributors from unauthorized access. Who
decides on this authorization? Naturally, the owners must make the decision. Who are the
owners—the company or those about whom information is collected and stored? Privacy
issues are becoming more and more sensitive in North America, as they have been in Europe
for some time. Legislation about privacy and confidentiality of information varies from region
to region. Some basic rights are available to those about whom data is retained in corporate
databases.
Individuals and institutions may inquire about what information about them is stored and may
demand to correct any information about them. Privacy concerns escalate with the widespread
use of the Internet. Although formal regulations may not be adequate, organizations are
ethically obliged to prevent misuse of the information they collect about individuals and
third-party institutions.

2.8. Access Control
Essentially, database security rests on controlling access to the database system. Controlling
physical access forms one part of database security. The other major part consists of
controlling access through the DBMS. Let us consider two primary dimensions of access
control. One dimension of access control deals with levels of data access. A single user or a
category of users may be granted access privileges to database objects at various levels of
detail.
Another dimension of access control refers to the modes or types of access granted to a single
user or to a category of users. How do you grant access privileges to a single user or user
category? This leads to the two basic approaches to access control.
As noted above, the DBMS provides two basic approaches to access control: discretionary
control and mandatory control. Discretionary access control refers to the granting of privileges
or rights to individual users. Although discretionary access control is fairly effective, it is
possible for an unauthorized user to gain privileges through an unsuspecting authorized user.
Mandatory access control is more effective in overcoming the defects of discretionary access
control.
We will first discuss how data access control pertains to levels of database objects and access
types or modes. Data levels and access types form a grid, and access privileges may be granted
at the intersections of data levels and access types. Our discussion will continue on the
mechanisms for granting access privileges under the discretionary or mandatory access control
approaches. In this section, you will also study the two important topics of authentication and
authorization of users.

2.8.1. Levels and Types of Data Access


Let us grasp the significance of data levels for the purpose of granting access privileges.
Consider the following database relation, WORKER, containing data about a worker in a
construction company:

Examine the following list of possible ways of granting access privileges to a specific user:
 User has unlimited access privileges to the entire WORKER relation.
 User has no access privileges of any kind to any part of the WORKER relation.
 User may only read any part of WORKER relation but cannot make any changes at all.
 User may read only his or her row in the relation but cannot change any columns in that row.

 User may read only his or her row in the relation but can change only the Name and
Address columns.
 User may read only the WorkerId, Name, Address, and SuperId columns of any record but
can change only the Name and Address columns.
 User may read only the WorkerId and WageRate columns of any record but can modify
the WageRate column only if the value is less than 5.00.
 User may read all columns of any record but can modify the WageRate only if the SuperId
column value is the value of WorkerId of that user.
The above list is in no way exhaustive. Yet you can readily observe that a general method of
security enforcement must possess a great range and flexibility. A flexible security system in
a DBMS must be able to grant privileges at the following data levels:
 The whole database
 An individual relation: all rows and all columns
 All rows but only specific columns of a relation
 All columns but only specific rows of a relation
 Specific rows and specific columns of a relation
Now let us move on to the consideration of modes or types of data access. You are familiar
with access types or modes of create, read, update, and delete (sometimes indicated by the
acronym CRUD). Let us expand the list of access types to include all types:
Insert or Create. Add data to a file without destroying any data.
Read. User may read and copy data from the database into the user’s environment through an
application program or a database query.
Update. Write updated values.
Delete. Delete and destroy specific data objects.
Move. Move data objects without the privilege of reading the contents.
Execute. Run a program or procedure with the implied privileges needed for the execution.
Verify Existence. Verify whether a specific database object exists in the database. You have
noted the various access types and also the levels of data eligibility based on which access
privileges may be granted. What is your observation from this discussion? What are the
implications? You can easily realize the immense flexibility needed for giving access
privileges. Although numerous variations are possible, access privileges are most commonly
granted on single relations in the CRUD modes.
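As a rough illustration of these data levels and access modes (this sketch is not part of the module's DBMS examples; the WORKER column names and the particular rules are assumptions), a privilege can be modeled as a set of permitted modes plus optional column and row restrictions:

```python
def make_privilege(modes, columns=None, row_filter=None):
    """A privilege: permitted modes, optional column list, optional row predicate."""
    return {"modes": set(modes), "columns": columns, "row_filter": row_filter}

def is_allowed(privilege, mode, row, column):
    """Check one (mode, row, column) request against a granted privilege."""
    if mode not in privilege["modes"]:
        return False
    if privilege["columns"] is not None and column not in privilege["columns"]:
        return False
    if privilege["row_filter"] is not None and not privilege["row_filter"](row):
        return False
    return True

# "User may read only his or her own row but can change only Name and Address."
user_id = "W100"
read_own_row = make_privilege(["read"],
                              row_filter=lambda r: r["WorkerId"] == user_id)
update_name_addr = make_privilege(["update"], columns={"Name", "Address"},
                                  row_filter=lambda r: r["WorkerId"] == user_id)

own_row = {"WorkerId": "W100", "Name": "Pat", "Address": "Hosaina", "WageRate": 4.5}
other_row = {"WorkerId": "W200", "Name": "Lee", "Address": "Durame", "WageRate": 6.0}

print(is_allowed(read_own_row, "read", own_row, "WageRate"))        # True
print(is_allowed(read_own_row, "read", other_row, "WageRate"))      # False
print(is_allowed(update_name_addr, "update", own_row, "Address"))   # True
print(is_allowed(update_name_addr, "update", own_row, "WageRate"))  # False
```

Note how one small rule object captures a cell in the data-level/access-type grid described above.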

2.8.2. Discretionary Control
As mentioned above, in this approach, individual users are granted privileges or rights to
access specific data items in one or more designated modes. On the basis of the specification
of privileges, a user is given the discretion to access a data item in the read, update, insert, or
delete modes. A user who creates a database object automatically receives all privileges on
the object, including the right to pass those privileges on to other users.
We introduced the SQL commands for granting and revoking access privileges. This is how
SQL supports discretionary access control. Now we will explore the fundamental concepts of
discretionary access control and go over a few more examples.
 Basic Levels. There are two basic components, or levels, for granting or revoking access
privileges:
 Database objects: a data item or data element, generally a base table or a view
 Users: a single user or a group of users identifiable by some authorization identifier
With these two components, access privileges may be granted as shown in the following
general command:
GRANT privileges ON database object TO users

At the level of users, access privileges include the following:


At the level of database objects, the access privileges listed above apply to the following:
Authorization Matrix. We can think of the two levels of users and database objects as forming
a matrix for the purpose of granting access privileges. Set the users as columns and the
database objects as rows. Then, in the cells formed by the intersection of these columns and
rows, we can specify the type of privilege granted.
Table 2.1 presents an example of this type of authorization matrix. Note how this type of
presentation makes it easy to review the access privileges in a database environment.
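The authorization-matrix idea can be sketched as a simple lookup keyed by object and user; the user names, table names, and granted modes below are illustrative assumptions, not taken from the module's tables:

```python
# Cells of the authorization matrix: (object, user) -> set of granted modes.
auth_matrix = {
    ("EMPLOYEE", "jenkins"):   {"read", "insert"},
    ("EMPLOYEE", "miller"):    {"read", "insert", "update", "delete"},
    ("DEPARTMENT", "jenkins"): {"read"},
}

def check(user, obj, mode):
    """Return True only if the matrix grants `mode` on `obj` to `user`."""
    return mode in auth_matrix.get((obj, user), set())

print(check("jenkins", "EMPLOYEE", "read"))    # True
print(check("jenkins", "EMPLOYEE", "delete"))  # False
print(check("smith", "DEPARTMENT", "read"))    # False: no entry at all
```

An absent cell denies everything, which mirrors the default-deny reading of the matrix.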

Owner Account. Each database table or relation has an owner. The user account that created
the table possesses all access privileges on that table. The DBA can assign an owner to an
entire schema and grant the appropriate access privileges.
The owner of a database object can grant privileges to another user. This second user can then
pass along the privileges to a third user and so on. The DBMS keeps track of the cycle of
granting of privileges.

Table 2.1. Authorization matrixes.


Here is an example of a cycle of privileges passed along from Rogers, who is the owner of
table EMPLOYEE

Figure 2.4 illustrates this cycle of privileges with an authorization graph. Note how the privileges
are passed along and how the revoking of privileges with the cascade option works.
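The cycle of grants can be pictured as a graph of who granted to whom. Here is a minimal sketch of cascading revocation; the user names follow the Rogers example in the text, while the data structure is an assumption about how a DBMS might track grants:

```python
from collections import defaultdict

# grants[grantor] = set of users to whom the grantor passed the privilege
grants = defaultdict(set)

def grant(grantor, grantee):
    grants[grantor].add(grantee)

def revoke_cascade(grantor, grantee):
    """Revoke grantee's privilege and, recursively, every grant grantee made."""
    grants[grantor].discard(grantee)
    for downstream in list(grants[grantee]):
        revoke_cascade(grantee, downstream)

grant("Rogers", "Miller")       # Rogers, the owner, grants to Miller
grant("Miller", "Goldstein")    # Miller passes the privilege along
grant("Goldstein", "Williams")  # ... and Goldstein passes it further

revoke_cascade("Rogers", "Miller")                # revoke with cascade
print(all(len(g) == 0 for g in grants.values()))  # True: every derived grant is gone
```

Revoking the first edge of the graph removes the entire downstream chain, which is exactly the cascade behavior described above.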
REFERENCES Option. The REFERENCES privilege is not the same as the SELECT
privilege. Let us take an example. Suppose Nash is the owner of the DEPARTMENT table, as
indicated below:

Nash can authorize Miller to create another table EMPLOYEE with a foreign key in that table
to refer to the DeptNo column in the DEPARTMENT table. Nash can do this by granting
Miller the REFERENCES privilege with respect to the DeptNo column. Note the
EMPLOYEE table shown below:

If Miller loses the REFERENCES privilege with respect to the DeptNo column in the
DEPARTMENT table, the foreign key constraint in the EMPLOYEE table will be dropped.
The EMPLOYEE table itself, however, will not be dropped.

Figure 2. 4 Authorization graph.

Now
suppose Miller has the SELECT privilege on the DeptNo column of the DEPARTMENT table,
not the REFERENCES privilege. In this case, Miller will not be allowed to create the
EMPLOYEE table with a foreign key column referring to DeptNo in the DEPARTMENT
table.
Why not grant Miller the SELECT privilege and allow him to create the EMPLOYEE table
with a foreign key column referring to the DeptNo column in the DEPARTMENT table? If
this is done, assume that Miller creates the table with a foreign key constraint as follows:

With the NO ACTION option in the foreign key specification, Nash is prevented from deleting
rows from the DEPARTMENT table even though he is the owner. For this reason, whenever
such a restrictive privilege needs to be authorized, the more stringent privilege REFERENCES
is applied. The SELECT privilege is therefore intended as permission just to read the values.
2.8.3. Use of Views
Earlier we had discussions on user views. A user view is like a personalized model of the
database tailored for individual groups of users. If a user group, say, in the marketing
department, needs to access only some columns of the DEPARTMENT and EMPLOYEE

tables, then you can satisfy their information requirements by creating a view comprising just
those columns.
This view hides the unnecessary parts of the database from the marketing group and shows
them only those columns they require.
Views are not like tables in the sense that they do not store actual data. You know that views
are just like windows into the database tables that store the data. Views are virtual tables.
When a user accesses data through a view, he or she is getting the data from the base tables,
but only from the columns defined in the view.
Views are intended to present to the user exactly what is needed from the database and to hide
the rest of the data content from the user.
Thus, views offer a flexible and simple method for granting access privileges in a
personalized manner. Views are powerful security tools. When you grant access privileges to
a user for a specific view, the privileges apply only to those data items defined in the views
and not to the complete base tables themselves.
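A small demonstration of this column-hiding effect, using Python's built-in SQLite (which supports views but not GRANT, so the privilege-granting step is omitted); the view name EMPLOYEE_CONTACT and the sample data are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE EMPLOYEE (
    EmployeeNo INTEGER, FirstName TEXT, LastName TEXT,
    Phone TEXT, Salary REAL)""")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Samantha', 'Jenkins', '555-1212', 90000)")

# The view exposes contact information only and hides the Salary column.
conn.execute("""CREATE VIEW EMPLOYEE_CONTACT AS
    SELECT EmployeeNo, FirstName, LastName, Phone FROM EMPLOYEE""")

cursor = conn.execute("SELECT * FROM EMPLOYEE_CONTACT")
cols = [d[0] for d in cursor.description]
print(cols)  # ['EmployeeNo', 'FirstName', 'LastName', 'Phone'] -- no Salary
```

In a full DBMS you would then GRANT SELECT on the view, rather than on the base table, so the user never sees the hidden columns.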
Let us review an example of a view and see how it may be used to grant access privileges. For
a user to create a view from multiple tables, the user must have access privileges on those base
tables. The view is dropped automatically if those access privileges are revoked. Note the
following example granting access privilege to Miller for reading EmployeeNo, FirstName,
LastName, Address, and Phone information of employees in the department where Miller
works.
SQL Examples

Earlier we considered a few SQL examples on granting and revoking of access privileges. Now
we will study a few more examples. These examples are intended to reinforce your
understanding of discretionary access control. We will use the DEPARTMENT and
EMPLOYEE tables shown above for our SQL examples.
DBA gives privileges to Miller to create the schema:
GRANT CREATETAB TO Miller;

Miller defines schema, beginning with create schema statement:


CREATE SCHEMA EmployeeDB AUTHORIZATION Miller; (other DDL
Statements follow to define DEPARTMENT and EMPLOYEE tables)
Miller gives privileges to Rodriguez for inserting data in both tables:
GRANT INSERT ON DEPARTMENT, EMPLOYEE TO Rodriguez;
Miller gives Goldstein privileges for inserting and deleting rows in both tables, allowing
permission to propagate these privileges:
GRANT INSERT, DELETE ON DEPARTMENT, EMPLOYEE TO Goldstein WITH
GRANT OPTION;
Goldstein passes on the privileges for inserting and deleting rows in the DEPARTMENT
table to Rogers:
GRANT INSERT, DELETE ON DEPARTMENT TO Rogers;
Miller gives Williams the privilege to update only the salary and position columns in the
EMPLOYEE table:
GRANT UPDATE (Salary, EmployeePosition) ON EMPLOYEE TO Williams;
DBA gives Shady privilege to create tables:
GRANT CREATETAB TO Shady;
Shady creates table MYTABLE and gives Miller privilege to insert rows into
MYTABLE:
GRANT INSERT ON MYTABLE TO Miller;

2.8.4. Mandatory Control


Discretionary access control provides fairly adequate protection. This has been the
traditional approach in relational databases. A user either has certain access privileges
or he or she does not. The discretionary access control method does not support
variations based on the sensitivity of parts of the database. Although the method is
sufficient in most database environments, it is not bulletproof. An ingenious

professional can drill holes into the protection mechanism and gain unauthorized
access.
Note the actions of user Shady indicated in the last few statements of the previous
subsection. Shady has created a private table MYTABLE of which he is the owner. He
has all privileges on this table. All he has to do is somehow get sensitive data into
MYTABLE. Being a clever professional, Shady may temporarily alter one of Miller’s
programs to take data from the EMPLOYEE table and move it into MYTABLE.
For this purpose, Shady has already given privileges to Miller for inserting rows into the
MYTABLE table.
This scenario may appear unlikely and contrived. Nevertheless, it demonstrates that
discretionary access control has its limitations.
Mandatory access control overcomes the shortcomings of discretionary access control.
In the mandatory access control approach, access privileges cannot be granted or
passed on by one user to another in an uncontrolled manner. A well -defined security
policy dictates which classes of data may be accessed by users at which clearance levels.
The most popular method is known as the Bell–LaPadula model. Many of the
commercial relational DBMSs do not currently provide for mandatory access control.
However, government agencies, defense departments, financial institutions, and
intelligence agencies do require security mechanisms based on the mandatory control
technique.

In the Bell–LaPadula model, each database object is assigned a security class (for example,
top secret TS, secret S, confidential C, unclassified U), and each subject is assigned a
clearance level from the same set of classes. The model rests on two properties:
 Simple security property. A subject may read an object only if the subject's clearance
level is higher than or equal to the classification of the object.
 Star property. A subject may write an object only if the subject's clearance level is lower
than or equal to the classification of the object.
Look at the first property, which is fairly intuitive. This property allows a subject to read an
object only if the subject's clearance level is higher than or equal to that of the object.

Try to understand what the second property is meant to prevent. The second property prohibits
a subject from writing to an object in a security class lower than the clearance level of the
subject. Otherwise, information may flow from a higher class to a lower class. Consider a user
with S clearance. Without the enforcement of the star property, this user can copy an object in
S class and rewrite it as a new object with U classification so that everyone will be able to see
the object.
Let us get back to the case of Shady trying to access data from the EMPLOYEE table by
tricking Miller. The mandatory access control method would spoil Shady's plan as follows:
 Classify EMPLOYEE table as S.
 Give Miller clearance for S.
 Give Shady lower clearance for C.
Shady can therefore create objects only of C or lower classification, so MYTABLE will be in
class C or lower. Miller's program will not be allowed to copy data into MYTABLE, because
writing from Miller's S clearance down into a C-classified table would violate the star property.
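The two Bell–LaPadula checks can be sketched as follows; the numeric ordering of the classes U < C < S < TS is the usual convention, and the sample values follow the Shady scenario:

```python
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}   # unclassified < confidential < secret < top secret

def can_read(subject_clearance, object_class):
    # Simple security property: no read up.
    return LEVELS[subject_clearance] >= LEVELS[object_class]

def can_write(subject_clearance, object_class):
    # Star property: no write down.
    return LEVELS[subject_clearance] <= LEVELS[object_class]

# EMPLOYEE is classified S; Miller is cleared for S; Shady for C; MYTABLE is class C.
print(can_read("S", "S"))    # True  -- Miller may read EMPLOYEE
print(can_write("S", "C"))   # False -- Miller may not write down into MYTABLE
print(can_read("C", "S"))    # False -- Shady may not read EMPLOYEE directly
```

The write-down denial in the middle line is precisely what defeats Shady's plan.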

SPECIAL SECURITY CONSIDERATIONS


We have covered several topics on database security so far. You know the common types of
security threats. You have explored solution options. You have reviewed the two major
approaches to database access control: discretionary and mandatory. Before we finish our
discussion of the significant topics, we need to consider just a few more.
In our discussion on granting access privileges, we have been referring to individual users or
user groups that need access privileges. Who are these users, and how do you identify them to
the database system? This is an important question we need to address. Another obvious
question is, where is the DBA in all of these database security provisions, and what is the role
of the DBA? Finally, we will inspect what are known as statistical databases and consider
special security problems associated with these.
Authorization
The security mechanism protecting a database system is expected to prevent users from
performing database operations unless they are authorized to do so. Authorization for data
access implies access control. We have discussed discretionary and mandatory access control
approaches. Let us now complete the discussion by touching on a few remaining topics.
 A Profile. To authorize a subject, which may be a user, a group of users, a program, or a
module, an account is assigned to the subject. Let us confine our discussion to a subject

who is a user. User Samantha Jenkins is eligible to have access to the human resources
database. So first, Jenkins must be assigned an account or user identification.
The DBMS maintains a user profile for each user account. The profile for Jenkins includes all
the database objects such as tables, views, rows, and columns that she is authorized to access.
In the user profile, you will also find the types of access privileges such as read, update, insert,
and delete granted to Jenkins.
Alternatively, the DBMS may maintain an object profile for each database object. An object
profile is another way of keeping track of the authorizations. For example, in the object profile
for the EMPLOYEE table, you will find all the user accounts that are authorized to access the
table. Just like a user profile, an object profile also indicates the types of access privileges.
 Authorization Rules. The user profile or the object profile stipulates which user can access
which database object and in what way. These are the authorization rules. By examining
these rules, the DBMS determines whether a specific user may be permitted to perform
the operations of read, update, insert, or delete on a particular database object. You have
already looked at an example of an authorization matrix in Table 2.1. This matrix tends to be
exhaustive and complex in a large database environment.

Table 2.2. Implementation of authorization rules


Many DBMSs do not implement elaborate authorization matrices as presented in Table 2.1 to
enforce authorization rules. Instead, they adopt simpler versions to implement authorization
rules. Authorization rules may be represented as an authorization table for subjects or
users; an authorization table for objects can also do the job. A DBMS may
implement authorization rules through either of the two tables or through both.

Table 2.2 presents both options for implementing authorization rules. Note how you can derive
the same rule authorizing Samantha Jenkins to access the EMPLOYEE table with read and
insert privileges.
Enforcing Authorization Rules. We have authorization rules in an authorization matrix or in
the form of authorization tables for users or objects. How are the rules enforced by the DBMS?
A highly protected, privileged module with unconstrained access to the entire database exists
to enforce the authorization rules.
This is the arbiter or security enforcer module, although it might not go by those names in
every DBMS. The primary function of the arbiter is to intercept and examine every
database operation, check it against the authorization matrix, and either allow or deny the
operation.
Suppose after going through the interrogation sequence, the arbiter has to deny a database
operation. What are the possible courses of action? Naturally, the particular course of action
to be adopted depends on a number of factors and the circumstances. Here are some basic
options provided in DBMSs:
If the sensitivity of the attempted violation is high, terminate the transaction and lock the
workstation.
For lesser violations, send an appropriate message to the user.
Record attempted security breaches in the log file.

Table 2.3. Arbiter interrogation list.


Monitor continued attempts at security breaches by the same user for possible censure.
Authentication
Let us return to the authorization of access privileges to Samantha Jenkins. The authorization
matrix contains security authorization rules for her. When she attempts to perform any
database operation, the DBMS, through its arbiter module, can verify authorization rules as
applicable and either allow or deny the operation. When Samantha Jenkins signs on to the
database system with her user-id, in effect, she declares that she is Samantha Jenkins. All
authorization she can have relates to the user known to the system as Samantha Jenkins.

Now when she signs on with her user-id and declares that she is Samantha Jenkins, how does
the system know that she is really who she says she is? How can the system be sure that it is
really Samantha Jenkins and not someone else signing on with her user-id? How can the
system authenticate her identity? Authentication is the determination of whether the user is
who he or she claims to be or declares he or she is through the user-id.
It is crucial that the authentication mechanism be effective and failsafe. Otherwise, all the
effort and sophistication of the authorization rules will be an utter waste. How can you ensure
proper authentication? Let us examine a few of the common techniques for authentication.
 Passwords. Passwords, still the most common method, can be effective if properly
administered. Passwords must be changed fairly often to deter password thefts. They must
be stored in encrypted formats and be masked while being entered. Password formats need
to be standardized to avoid easily detectable combinations. A database environment with
highly sensitive data may require one-time-use passwords.
 Personal information. The user may be prompted with questions for which the user alone
would know the answers such as mother’s maiden name, last four digits of social security
number, first three letters of the place of birth, and so on.
 Biometric verification. Verification through fingerprints, voiceprints, retina images, and
so on. Smart cards recorded with such biometric data may be used.
 Special procedures. Run a special authentication program and converse with the user.
The system sends a random number m to the user. The user performs a simple set of
operations on the random number and types in the result n. The system verifies n by
performing the same algorithm on m. Of course, m and n will be different each time, and it
will be hard for a perpetrator to guess the algorithm.
 Hang-up and call-back. After input of user-id, the system terminates the input and
reinitiates input at the workstation normally associated with that user. If the user is there
at that customary workstation and answers stored questions for the user-id, then the system
allows the user to continue with the transaction.
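The special-procedure technique above can be sketched as a challenge-response exchange; the shared transformation here is an arbitrary illustrative choice, not a real authentication protocol:

```python
import random

def shared_secret_fn(m):
    # The simple transformation both the system and the genuine user know.
    return (m * 7 + 13) % 1000

def authenticate(user_response_fn, m=None):
    """Send challenge m, collect the response n, and verify it independently."""
    if m is None:
        m = random.randint(0, 999)       # fresh challenge each time
    n = user_response_fn(m)              # the user's typed-in result
    return n == shared_secret_fn(m)      # system recomputes and compares

print(authenticate(shared_secret_fn))        # True: the user knows the secret
print(authenticate(lambda m: m + 1, m=5))    # False: an impostor guessing
```

Because m changes on every attempt, replaying an old response does not help an eavesdropper.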

2.8.4.1. Database Administrator


Data administration is a high-level function responsible for the overall management of data
resources in an organization. In order to perform its duties, the DA must know a good deal
about:

 System analysis
 Programming.
 Network
 Database
 System design
Responsibilities of the database administrator:
 Software installation and Maintenance
 Data Extraction, Transformation, and Loading
 Specialized Data Handling
 Database Backup and Recovery
 Authentication
 Troubleshooting

2.8.4.2. Role of the DBA


The database administrator plays a pivotal role in security administration. Along with user
representatives and senior management staff including the DA, the DBA develops a security
policy appropriate for the environment. The DBA is the central authority executing the
security policy by setting up user accounts and granting access privileges. The DBA has a user
account that has extensive access privileges on all of the database objects.
Let us summarize the responsibilities of the DBA:
 Creation of new accounts. Assign user-id and password to each user or group of users.
 Creation of views. Create user views as needed for the purpose of tailoring security
provisions for specific user groups.
 Granting of privileges. Grant access privileges for users or user groups to perform
database operations on database objects in accordance with security policy.
 Revocation of privileges. Cancel access privileges originally assigned to users or user
groups.
 Assignments of security levels. Assign user accounts to proper security classification for
mandatory access control. Designate security levels to database objects.
 Maintenance of audit trail. Extend log file record to include updates with user-ids.

2.8.5. Statistical Databases


Statistical databases pose a great and interesting challenge in the matter of data security.
Statistical databases are usually large databases intended to provide statistical information and
not information about individual entities. Statistical databases may contain data about large
populations of entities.
A census database contains information about the people in specific geographic areas. The
database system of a large international bank holds information about the savings and
checking account activities of significant strata of the population. Databases of large financial
institutions contain profiles of investors. Databases used in data warehousing and data mining
may be considered as statistical databases in some significant sense.
Need for Data Access. Statistical databases serve critical purposes. They store rich data
content providing population statistics by age groups, income levels, household sizes,
education levels, and so on. Government statisticians, market research companies, and
institutions estimating economic indicators depend on statistical databases.
These professionals select records from statistical databases to perform statistical and
mathematical functions. They may count the number of entities in the selected sample of
records from a statistical database, add up numbers, take averages, find maximum and
minimum amounts, and calculate statistical variances and standard deviations.
All such professionals need access to statistical databases. However, there is one big difference
between users of an operational database needing access privileges and professionals requiring
access privileges to a statistical database. Users of an operational database need information
to run the day-to-day business: to enter an order, to check the stock of a single product, to send
a single invoice. That is, these users need access privileges to individual records in the database.
On the other hand, professionals using statistical databases need access privileges to access
groups of records and perform mathematical and statistical calculations from the selected
groups. They are not interested in single records, only in samples containing groups of records
Security Challenge So what is the problem with granting access privileges to professiona ls
to use a statistical database just the way you would grant privileges to use any other type of
database? Here is the problem: The professionals must be able to read individual records in a
select sample group for performing statistical calculations but, at the same time, must not be
allowed to find out what is in a particular record.
For example, take the case of the international bank. The bank’s statisticians need access to
the bank’s database to perform statistical calculations. For this purpose, you need to grant
them access privileges to read individual records. But, at the same time, you cannot allow
them to see Jane Doe’s bank account balance. The challenge in the case of the bank is this:
How can you grant access privileges to the statisticians without compromising the
confidentiality of individual bank customers?
Perhaps one possible method is to grant access privileges to individual records, because the
statistician needs to read a group of records for the calculations, but restrict the queries to
performing only mathematical and statistical functions such as COUNT, SUM, AVG, MAX,
MIN, variance, and standard deviation.

Although this method appears to be adequate to preserve the confidentiality of individual
customers, a clever professional can run a series of queries and narrow the intersection of the
query results down to one customer record. This person can infer the values in individual rows
by running a series of ingenious queries. Each query produces a result set.
Even though only statistical functions are permitted, by combining the different results
through a series of clever queries, information about a single entity may be determined. The
figure above illustrates how, by using different predicates in queries against a bank's statistical
database, the bank balance of a single customer, Jane Doe, may be determined. Assume that
the infiltrator knows some basic information about Jane Doe.
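The attack can be sketched with made-up data: only the aggregate SUM is permitted, yet two SUM queries whose predicates differ in exactly one customer reveal that customer's balance. All names, attributes, and amounts below are illustrative assumptions:

```python
# Made-up accounts; in reality this would be the bank's customer table.
accounts = [
    {"name": "Jane Doe",   "city": "Hosaina", "age": 34, "balance": 12500.0},
    {"name": "Customer A", "city": "Hosaina", "age": 51, "balance": 900.0},
    {"name": "Customer B", "city": "Durame",  "age": 34, "balance": 700.0},
]

def query_sum(predicate):
    """The only operation the statistician is allowed: an aggregate SUM."""
    return sum(r["balance"] for r in accounts if predicate(r))

# The infiltrator knows Jane Doe is the only customer in Hosaina aged 34.
total_all = query_sum(lambda r: True)
total_without_jane = query_sum(
    lambda r: not (r["city"] == "Hosaina" and r["age"] == 34))

jane_balance = total_all - total_without_jane
print(jane_balance)  # 12500.0 -- an individual value leaked via two aggregates
```

Neither query on its own returns an individual record; the leak comes from the difference between their result sets.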
 Solution Options. Safeguarding privacy and confidentiality in a statistical database proves
to be difficult. The standard method of granting access privileges does not work. In
addition to discretionary and mandatory techniques, other restrictions must be enforced on
queries. Here is a list of some solution options. None of them is completely satisfactory;
combinations of some of the options seem to be effective. Nevertheless, protection of
privacy and confidentiality in statistical databases is becoming more and more essential.
 Only statistical functions. Allow only statistical functions in queries.

 Same sample. Reject a series of queries against the same sample set of records.
 Query types. Allow only those queries that contain statistical or mathematical functions.
 Number of queries. Allow only a certain number of queries per user per unit time.
 Query thresholds. Reject queries that produce result sets containing fewer than n records,
where n is the query threshold.
 Query combinations. The result set of two queries may have a number of common records
referred to as the intersection of the two queries. Impose a restriction saying that no two
queries may have an intersection larger than a certain threshold number.
 Data pollution. Adopt data swapping. In the case of the bank database, swap balances
between accounts. Even if a user manages to read a single customer’s record, the balances
may have been swapped with balances in another customer’s record.
 Introduce noise. Deliberately introduce slight noise or inaccuracies. Randomly add
records to the result set. This is likely to show erroneous individual records, but statistical
samples produce approximate responses quite adequate for statistical analysis.
 Log queries. Maintain a log of all queries. Maintain a history of query results and reject
queries that use a high number of records identical to those used in previous queries.
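Two of the options above, query thresholds and query-combination limits, can be sketched as a guard in front of the query processor; the threshold values and the record-id representation are illustrative assumptions:

```python
QUERY_THRESHOLD = 2   # minimum result-set size allowed
OVERLAP_LIMIT = 1     # maximum records shared with any earlier query

history = []          # result sets (as frozensets of record ids) seen so far

def guarded_query(selected_ids):
    """Apply the threshold and overlap rules before answering a query."""
    if len(selected_ids) < QUERY_THRESHOLD:
        return "rejected: below query threshold"
    for previous in history:
        if len(previous & selected_ids) > OVERLAP_LIMIT:
            return "rejected: too much overlap with an earlier query"
    history.append(selected_ids)
    return "allowed"

print(guarded_query(frozenset({1})))        # rejected: below query threshold
print(guarded_query(frozenset({1, 2, 3})))  # allowed
print(guarded_query(frozenset({1, 2, 4})))  # rejected: too much overlap
```

The overlap rule is what blocks the subtraction trick shown earlier, since the two aggregate queries necessarily share most of their records.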

2.9. ENCRYPTION
We have discussed the standard security control mechanisms in detail. You studied the
discretionary access control method whereby unauthorized persons are kept away from the
database and authorized users are guaranteed access through access privileges. You have also
understood the mandatory access control method, which addresses some of the weaknesses of
the discretionary access control scheme. Now you are confident that these two standard
schemes provide adequate protection and that potential intruders cannot invade the database.
However, the assumption is that an infiltrator or intruder tries to break into the system through
normal channels by procuring user-ids and passwords through illegal means. What if the
intruder bypasses the system to get access to the information content of the database? What if
the infiltrator steals the database by physically removing the disks or backup tapes? What if
the intruder taps into the communication lines carrying data to genuine users? What if a clever
infiltrator runs a program to retrieve the data by breaking the defenses of the operating system?
The normal security system breaks down in such cases. Standard security techniques fall short
of expectations to protect data from assaults bypassing the system. If your database contains
sensitive financial data about your customers, then you need to augment your security system
with additional safeguards. In today’s environment of electronic commerce on the Internet,
the need for dependable security techniques is all the more essential. Encryption techniques
offer added protection.
What is Encryption?
Simply stated, encryption is a method of coding data to make them unintelligible to an intruder
and then decoding the data back to their original format for use by an authorized user. Some
commercial DBMSs include encryption modules; a few others provide program exits for users
to code their own encryption routines.
Currently, encryption techniques are widely used in applications such as electronic fund
transfers (EFT) and electronic commerce. An encryption scheme needs a cryptosystem
containing the following components and concepts:
 An encryption key to code the data (the data being coded are called plaintext)
 An encryption algorithm to change plaintext into coded text (called ciphertext)
 A decryption key to decode ciphertext
 A decryption algorithm to change ciphertext back into original plaintext

Figure 2. 5 Elements of encryption.

Figure above shows the elements of encryption. Note the use of the keys and where encryption
and decryption take place.
The underlying idea in encryption is the application of an encryption algorithm to
plaintext, where the encryption algorithm itself may be accessible to the intruder. The scheme
includes an encryption key, specified by the DBA, that has to be kept secret. Also included is
a decryption algorithm to do the reverse process of transforming ciphertext back into plaintext.
A good encryption technique, therefore, must have the following features:
 Fairly simple for providers of data to encrypt
 Easy for authorized users to decrypt
 Does not depend on the secrecy of the encryption algorithm
 Relies on keeping the encryption key a secret from an intruder
 Extremely difficult for an intruder to deduce the encryption key
Just to get a feel for an encryption scheme, let us consider a simple example before proceeding
further into more details about encryption. First, we will use a simple substitution method.
Second, we will use a simple encryption key. Let us say that the plaintext we want to encrypt is the following:
ADMINISTRATOR
Simple Substitution Use simple substitution by shifting each letter in the plaintext three places to the right in the alphabetic sequence: A becomes D, D becomes G, and so on. The resulting ciphertext is as follows:
DGPLQLVWUDWRU
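The shift rule above can be sketched in a few lines of Python (an illustrative sketch; the function name and the wrap-around from Z back to A are assumptions, since the text does not say what happens at the end of the alphabet):

```python
def shift_cipher(plaintext: str, shift: int = 3) -> str:
    # Replace each letter by the letter `shift` places to its right,
    # wrapping around from Z back to A (an assumed behavior for letters near Z).
    return "".join(chr((ord(c) - ord("A") + shift) % 26 + ord("A")) for c in plaintext)

print(shift_cipher("ADMINISTRATOR"))  # DGPLQLVWUDWRU
```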
If the intruder sees a number of samples of the ciphertext, he or she is likely to deduce the
encryption algorithm.
Use of Encryption Key This is a slight improvement over the simple substitution method. Here, let us use a simple encryption key stipulated as “SAFE.” Apply the key to each four-character segment of the plaintext as shown below:
ADMINISTRATOR
SAFESAFESAFES
The encryption algorithm translates each character of the plaintext as follows:
Give each character in the plaintext and the key its position number in the alphabetic scheme: the letter “A” gets 1, the letter “Z” gets 26, and a blank in the plaintext gets 27. Add the position number of each letter of the plaintext to the position number of the corresponding letter of the key. Then apply division modulo 27 to the sum; that is, divide the sum by 27 and keep the remainder. Use the number resulting from the division modulo 27 to find the letter to be substituted in the ciphertext.
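The steps above can be sketched as follows (the helper names are hypothetical, and mapping a remainder of 0 back to a blank is an assumption the text leaves open):

```python
def pos(c: str) -> int:
    # 'A'..'Z' map to 1..26; a blank maps to 27, as the scheme specifies.
    return 27 if c == " " else ord(c) - ord("A") + 1

def char_at(n: int) -> str:
    # Inverse mapping; a remainder of 0 (sum divisible by 27) is assumed to yield a blank.
    return " " if n == 0 else chr(n + ord("A") - 1)

def key_encrypt(plaintext: str, key: str) -> str:
    # Repeat the key to the plaintext's length, add position numbers, take modulo 27.
    repeated = (key * (len(plaintext) // len(key) + 1))[: len(plaintext)]
    return "".join(char_at((pos(p) + pos(k)) % 27) for p, k in zip(plaintext, repeated))

print(key_encrypt("ADMINISTRATOR", "SAFE"))  # TESNFJYYJBZTJ
```

Note how the repeated key "SAFESAFESAFES" lines up under the plaintext exactly as shown above.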
Now compare the ciphertexts produced by the two methods and note how even a simple key
and fairly unsophisticated algorithm could improve the encryption scheme.
Encryption Methods
Three basic methods are available for encryption:
 Encoding. The simplest and least expensive method: the values of important fields are encoded. For example, instead of storing the names of bank branches, store a code representing each name.
 Substitution. Substitute, letter for letter, in the plaintext to produce the ciphertext.
 Transposition. Rearrange characters in the plaintext using a specific algorithm. Usually a combination of substitution and transposition works well. However, techniques without encryption keys do not provide adequate protection; the strength of a technique depends on the key and the algorithm used for encryption and decryption. With plain substitution or transposition, if an intruder reviews a sufficient number of encoded texts, he or she is likely to decipher them.
On the basis of the use and disposition of encryption keys, encryption techniques fall into
two categories.
1. Symmetric Encryption This technique uses the same encryption key for both
encryption and decryption. The key must be kept a secret from possible intruders, and the technique relies on safe communication to exchange the key between the provider of data and an authorized user. If the key is to be truly secure, you need a key as long as the message itself; because this is not efficient, most keys are shorter. The Data Encryption Standard (DES) is an example of this technique.
2. Asymmetric Encryption This technique utilizes different keys for encryption and
decryption. One is a public key known openly, and the other is a private key known
only to the authorized user. The encryption algorithm may also be known publicly. The
RSA model is an asymmetric encryption method.
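To make the single-key (symmetric) idea concrete, here is a toy sketch using a repeating XOR keystream. This is for illustration only; real symmetric systems use vetted ciphers such as DES or AES, and every name below is an assumption, not from the text:

```python
def xor_crypt(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying the SAME key again decrypts,
    # which is exactly the symmetric property described above.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"SAFE"
ciphertext = xor_crypt(b"ADMINISTRATOR", key)
print(xor_crypt(ciphertext, key))  # b'ADMINISTRATOR'
```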
2.9.1. Data Encryption Standard
This technique, designed and developed by IBM, was adopted in 1977 by the National Bureau of Standards as the official Data Encryption Standard (DES). Since then, various industry agencies have adopted DES. The technique uses a single key for both encryption and decryption; the key has to be kept secret from potential intruders, although the algorithm itself, which consists of character substitutions and transpositions or permutations, is publicly known. Figure 2.6 illustrates this single-key encryption technique.
How it Works DES divides plaintext into blocks of 64 bits each. A 64-bit key is used to encrypt each block. Although the key is 64 bits long, effectively it is only a 56-bit key because 8 bits are used as parity bits. Even with 56 bits in a key, there are 2^56 possible distinct keys, so there is a huge number of choices for establishing a key.
Encryption takes place in the following sequence:
Apply an initial permutation to each block using the key.
Subject each transformed or permuted block to a sequence of 16 complex substitutio n
steps.
Finally, apply another permutation algorithm, the inverse of the initial permutation.
The decryption algorithm is identical to the encryption algorithm, except that the steps are applied in the reverse order.
Figure 2. 6 DES: single-key encryption.
Weaknesses Despite the complexity of the encryption algorithm and the sophistication of key selection, DES is not universally accepted as absolutely secure. Critics point out the following deficiencies:
 56-bit keys are inadequate. With the powerful and special hardware available now, they are breakable, and even such expensive hardware is within the reach of organized crime and hostile governments. However, 128-bit keys are expected to be unbreakable within the foreseeable future; a better technique known as PGP (Pretty Good Privacy) uses 128-bit keys. Another possible remedy is double application of the algorithm at each step.
 Users must be given the key for decryption. Authorized users must receive the key through secure means, and it is very difficult to maintain this secrecy. This is a major weakness.
Public Key Encryption
This technique overcomes some of the problems associated with the DES technique. In
DES you have to keep the encryption key a secret, and this is not an easy thing to
accomplish. Public key encryption addresses this problem. The public key as well as the
encryption algorithm need not be kept secret. Is this like locking the door and making the
key available to any potential intruder? Let us examine the concept.
The widely used public key encryption technique was proposed by Rivest, Shamir, and
Adleman. It is known by the acronym RSA. The RSA model is based on the following
concepts:

60 | P a g e
Two encryption keys are used—one public and the other private.
Each user has a public key and a private key.
The public keys are all published and known openly.
The encryption algorithm is also made freely available.
Only an individual user knows his or her private key.
The encryption and decryption algorithms are inverses of each other.
Figure 2. 7 RSA: public key encryption.
How it Works Consider the following scenario:
Data provider U1 wants to share data with authorized user U2
U2 has a public key P2 and a private key R2.
U1 uses the public key P2 of U2 to encrypt the data and transmits the data to U2.
U2 uses the private key R2 to decrypt the data.
If data have to be transmitted securely, it must be extremely difficult for any intruder to deduce
the private key from the publicly known public key and the encrypting algorithm. The RSA
model stands on the premise that it is virtually impossible to deduce the private keys. This
premise rests on the following two facts:
A known efficient algorithm exists for testing whether or not a given number, however
large, is a prime number.
No known efficient algorithm exists for determining the prime factors of a given number,
however large.
Incidentally, one of the authors of the RSA technique had estimated that testing whether a
given 130-digit number is a prime or not would take about 7 minutes on a fast computer. On
the other hand, if you have a large number obtained by multiplying two 63-digit prime
numbers, then it would take about 40 quadrillion years on the same machine to determine the
prime factors of the product!
The public key encryption technique treats data as a collection of integers. The public and
private keys are computed for an authorized user as follows:
Choose two large, random prime numbers n1 and n2.
Compute the product of n1 and n2. This product is also known as the limit L, assumed to
be larger than the largest integer ever needed to be encoded.
Choose a prime number larger than both n1 and n2 as the public key P.
Choose the private key R in a special way based on n1, n2, and P.
[If you are interested, R is calculated such that R * P = 1, modulo (n1–1) * (n2–1).] The limit
L and the public key P are made known publicly. Note that the private key R may be computed
easily if the public key P and the prime numbers n1 and n2 are given. However, it is extremely
difficult to compute the private key R if just the public key P and the limit L are known. This
is because finding the prime factors of L is almost impossible if L is fairly large.
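The key computation above can be sketched with deliberately tiny primes. This is a toy illustration only: real RSA uses primes hundreds of digits long, and the specific numbers here are assumptions chosen for readability, not values from the text:

```python
# Toy RSA key setup with tiny primes -- insecure, for illustration only.
n1, n2 = 61, 53                  # the two secret primes
L = n1 * n2                      # the public limit, 3233
phi = (n1 - 1) * (n2 - 1)        # (n1 - 1) * (n2 - 1) = 3120
P = 67                           # public key: a prime larger than both n1 and n2
R = pow(P, -1, phi)              # private key, so that R * P = 1 modulo phi

m = 65                           # plaintext encoded as an integer smaller than L
c = pow(m, P, L)                 # encrypt with the public key
print(R, pow(c, R, L))           # 1723 65 -- decryption recovers the plaintext
```

Anyone who could factor L back into n1 and n2 could recompute R the same way, which is why the security rests entirely on the difficulty of factoring.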
Data Exchange Example Let us consider the use of public key encryption in a banking
application. Here are the assumptions:
Online requests for fund transfers may be made to a bank called ABCD. The bank’s
customer known as Good places a request to transfer $1 million.
The bank must be able to understand and acknowledge the request.
The bank must be able to verify that the fund transfer request was in fact made by customer
Good and not anyone else.
Also, customer Good must not be able to allege that the request was made up by the bank
to siphon funds from Good’s account.
Figure 2.8 illustrates the use of the public key encryption technique for this banking transaction. Note how each transfer is coded and decoded.
Figure 2. 8 Public key encryption: data exchange.
Assessment 2.1
List the major goals and objectives of a database security system. Which ones are important?
What are the types of general access control procedures in a database environment?
What is data privacy? What are some of the privacy issues to be addressed?
What is discretionary access control?
What are the types of access privileges available to users?
Describe an authorization graph with an example.
Part II: DBMS Security and Authorization — Choose the correct answer
1. Which type of command is GRANT?
A. Transaction Control Language (TCL) command
B. Data Query Language (DQL) command
C. Data Control language (DCL) command
D. Data Definition Language (DDL) command
E. None of these
2. In Oracle, which of the following is a group of privileges that are collected together and
granted to users?
A. Revoke
B. Grant
C. Role
D. Synonym
E. View
3. Create, Revoke, Grant and Drop commands are parts of _____________ language in
Oracle.
A. DML
B. DDL
C. Object-Oriented language
D. Procedural language
E. Assembly language
4. What is the SQL statement to grant select, update privileges to the user RAKESH for the
DEPT relation?
A. GRANT DML on DEPT to USER1;
B. GRANT select, update to USER1 on DEPT;
C. GRANT select, update on DEPT to USER1;
D. GRANT on DEPT to USER1 for select, update;
5. Data security threat includes
A. Privacy invasion
B. Hardware failure
C. Fraudulent manipulation of data
D. All of above
6. Locking can be used for
A. Deadlock
B. lost update
C. uncommitted dependency
D. inconsistent data
7. In an Oracle distributed database system, which of the following is the one in which each
server participating in a distributed database is administered independently from all other
databases?
A. Authentication through database links
B. Distributed database security
C. Auditing database links
D. Administration tools
E. Site autonomy
8. _____ limits who gains access to the database while _____ limits what a user can access
within the database.
A. Access authentication, user definition
B. Access authentication, view definition
C. Data access, user monitoring
D. Access control, database security
9. _____ is the process of transforming data into an unreadable form to anyone who does
not know the key.
A. Data authentication
B. Data security
C. Data encryption
D. Database security management
Chapter 3: Transaction Processing Concepts
CHAPTER OUTCOME
Students will be able to understand:
 Transaction and system concepts
 Properties of transactions
 Schedules and recoverability of transactions
 Serializability of schedules
 Transaction support in SQL
3.1. Introduction to Transactions
Transaction processing systems are systems with large databases and hundreds of concurrent users. A transaction provides an “all-or-nothing” proposition: each unit of work performed on the database must either complete in its entirety or have no effect whatsoever. Users of database systems usually consider consistency and integrity of data to be highly important. A simple transaction is usually issued to the database system in a language like SQL, wrapped in a transaction. The concepts of transaction management, types, properties, and recovery strategies are described in this unit.
What is a Transaction?
A transaction is a sequence of read and write operations on data items that logically functions
as one unit of work.
 A transaction is a part of program execution that accesses and updates various data
items.
 A transaction can be defined as a group of tasks in which a single task is the minimum
processing unit of work, which cannot be divided further.
 A transaction is a logical unit of work that contains one or more SQL statements.
 A transaction is an atomic unit (it either completes 0% or 100%); it should
either be done entirely or not at all.
 If it succeeds, the effects of write operations persist (commit); if it fails, no effects
of write operations persist (abort)
 A database transaction must be atomic, meaning that it must be either entirely
completed or aborted.
A transaction can be defined as a group of tasks. A single task is the minimum processing unit
which cannot be divided further. Let’s take an example of a simple transaction. Suppose a
bank employee transfers Rs 500 from A's account to B's account. This very simple and
small transaction involves several low-level tasks.
A’s Account:
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account:
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
A transaction is an atomic unit of work that should either be completed in its entirety or not
done at all. For recovery purposes, the system needs to keep track of when each
transaction starts, terminates, and commits or aborts . Therefore, the recovery manager of the
DBMS needs to keep track of the following operations:
BEGIN_TRANSACTION. This marks the beginning of transaction execution.
READ or WRITE. These specify read or write operations on the database items that are
executed as part of a transaction.
END_TRANSACTION. This specifies that the READ and WRITE operations of the transaction have ended and marks the end of transaction execution. However, at this point it may be necessary to check whether the changes introduced by the transaction can be permanently applied to the database (committed) or whether the transaction has to be aborted because it violates serializability.
3.2. Properties of Transaction (ACID property)
3.2.1. Atomicity
Either all operations of the transaction are properly reflected in the database or none are; that is, either all the operations of a transaction are executed or none of them is.
For example, consider the transaction below, which transfers Rs. 50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
7. COMMIT
In the above transaction, if Rs. 50 is deducted from account A then it must be added to account B.
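The all-or-nothing behavior can be sketched with Python's built-in sqlite3 module (a minimal sketch; the table layout and the `transfer` function name are assumptions for illustration):

```python
import sqlite3

def transfer(con, src, dst, amount):
    # Atomicity: either both UPDATEs are committed together, or neither is.
    try:
        con.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        con.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        con.commit()                  # both writes persist
    except sqlite3.Error:
        con.rollback()                # or neither does

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()
transfer(con, "A", "B", 50)
print(dict(con.execute("SELECT name, balance FROM account")))  # {'A': 50, 'B': 50}
```

If a failure occurred between the two UPDATEs, the rollback would restore A's balance, so money is never lost or created.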
3.2.2. Consistency
Execution of a transaction in isolation preserves the consistency of the database. Means our
database must remain in consistent state after execution of any transaction. In above example
total of A and B must remain same before and after the execution of transaction.

3.2.3. Isolation
Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions.
Intermediate transaction results must be hidden from other concurrently executed
transactions.
In the above example, once the transaction starts at step one, its intermediate results should not be accessed by any other transaction until the last step (step 7) is completed.

3.2.4. Durability
After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
Once the transaction has completed through step 7, its result must be stored permanently; it should not be lost if the system fails.
Transactions, Database Items, Read and Write Operations, and DBMS Buffers
A transaction (Txn) is an executing program that forms a logical unit of database processing.
A Txn includes one or more database operations.
A Txn can be embedded in an application program (or it can be a command-line query).
Txn boundary: begin Txn ... end Txn.
A single application program can contain many Txns.
If a Txn only retrieves data and makes no updates, it is called a read-only Txn; otherwise it is read-write.
A data item can be a single attribute, a record, or an entire block (this is its granularity).
Each data item has a unique name.
The basic database operations that a Txn can include are read_item(x) and write_item(x), where x is a program variable.
The basic unit of access is one disk block transferred from disk to memory.
Read_item(x):
1. Find the address of the disk block that contains x
2. Copy the disk block to memory buffer (if not in memory)
3. Copy the item from the buffer to program variable x
Write_item(x):
1. Find the address of the disk block that contains x
2. Copy the disk block to memory buffer (if not in memory)
3. Copy the program variable x into buffer
4. Store the updated buffer to disk
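The steps above can be sketched with in-memory stand-ins for the disk and the buffer pool (all names here are hypothetical illustrations, not DBMS APIs):

```python
disk = {"block0": {"x": 10}}          # each disk block holds named data items
buffer_pool = {}                      # blocks currently cached in memory

def read_item(block, item):
    if block not in buffer_pool:                  # steps 1-2: locate and fetch the block
        buffer_pool[block] = dict(disk[block])
    return buffer_pool[block][item]               # step 3: copy the item into a variable

def write_item(block, item, value):
    if block not in buffer_pool:                  # steps 1-2: locate and fetch the block
        buffer_pool[block] = dict(disk[block])
    buffer_pool[block][item] = value              # step 3: copy the variable into the buffer
    disk[block] = dict(buffer_pool[block])        # step 4: store the updated buffer to disk

x = read_item("block0", "x")          # x = 10
write_item("block0", "x", x + 5)
print(disk["block0"]["x"])            # 15
```

A real DBMS may defer step 4 and flush the buffer later; modeling it eagerly keeps the sketch short.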
COMMIT_TRANSACTION. This signals a successful end of the transaction, so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.
3.3. State Transition Diagram
Once a transaction is committed, we cannot undo its changes by rolling back the transaction. The only way to undo the effects of a committed transaction is to execute a compensating transaction. Creating a compensating transaction can be quite complex, so the task is left to the user; it is not handled by the DBMS.
Because a transaction may fail, transaction execution is broken up into states to handle the various situations. The following are the different states in transaction processing:
Active
Partially committed
Committed
Failed
Aborted
Figure 3. 1 transaction state in DBMS
Active
This is the initial state. The transaction stays in this state while it is executing.
Partially Committed
This is the state after the final statement of the transaction is executed.
At this point failure is still possible since changes may have been only done in main memory,
a hardware failure could still occur.
The DBMS needs to write out enough information to disk so that, in case of a failure, the system could
re-create the updates performed by the transaction once the system is brought back up. After it
has written out all the necessary information, it is committed.
Failed
After the discovery that normal execution can no longer proceed.
Once a transaction cannot be completed, any changes that it made must be undone by rolling it back.
Aborted
The state after the transaction has been rolled back and the database has been restored to its state prior to
the start of the transaction.
Committed
The transaction enters in this state after successful completion of the transaction.
We cannot abort or rollback a committed transaction.
3.4. DBMS Schedules and the Types of Schedules
3.4.1. Schedule
A schedule is the chronological (sequential) order in which instructions are executed in a system.
A schedule for a set of transaction must consist of all the instruction of those transactions and
must preserve the order in which the instructions appear in each individual transaction.
Example of schedule (Schedule 1)

3.4.1.1. Serial schedule
A serial schedule does not interleave the actions of different transactions.
In Schedule 1, all the instructions of T1 are grouped and run together, and then all the instructions of T2 are grouped and run together; that is, T2 does not start until all the instructions of T1 are complete. This type of schedule is called a serial schedule.
So in a serial schedule, a transaction is executed completely before the execution of another transaction starts. In other words, in a serial schedule a transaction does not start execution until the currently running transaction has finished execution.
 This type of transaction execution is also known as non-interleaved execution.
 The example we have seen above is a serial schedule.
Let T1 transfer 50 birr from A to B, and let T2 transfer 10% of the balance from A to B. The following is an example of a serial schedule in which T1 is followed by T2.
3.4.1.2. Interleaved schedule
An interleaved schedule interleaves the actions of different transactions; that is, T2 may start before all the instructions of T1 are completed.

3.4.1.3. Serializable schedule
A schedule that is equivalent (in its outcome) to a serial schedule has the serializability property.
Two schedules are equivalent if the effect of executing the first schedule is identical to the effect of executing the second schedule. We can also say that two schedules are equivalent if the output of executing the first schedule is identical to the output of executing the second schedule.
Example of serializable schedule

In the above example there are two schedules, Schedule 1 and Schedule 2. In Schedule 1 and Schedule 2, the order in which the instructions of the transactions are executed is not the same, but the result obtained is the same. This is known as serializability of transactions.

3.4.2. Conflict serializability
Instructions li and lj of transactions Ti and Tj, respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q:
1. If li and lj access different data items, then li and lj don’t conflict.
2. li = read(Q), lj = read(Q): li and lj don’t conflict.
3. li = read(Q), lj = write(Q): li and lj conflict.
4. li = write(Q), lj = read(Q): li and lj conflict.
5. li = write(Q), lj = write(Q): li and lj conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them.
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Example
Schedule S can be transformed into schedule S’ by a series of swaps of non-conflicting instructions; therefore schedule S is conflict serializable.

Instruction Ii of transaction T1 and instruction Ij of transaction T2 conflict if both instructions access the same data item A and one of them performs a write operation on A.

In the above example, the write(A) instruction of transaction T1 conflicts with the read(A) instruction of transaction T2 because both instructions access the same data item A. But the write(A) instruction of transaction T2 does not conflict with the read(B) instruction of transaction T1, because the two instructions access different data items: transaction T2 writes A while transaction T1 reads B.
So in the above example, in schedule S, the two instructions read(A) and write(A) of transaction T2 and the two instructions read(B) and write(B) of transaction T1 are interchanged, and we get schedule S’. Therefore schedule S is conflict serializable.
We are unable to swap instructions in the above schedule S’’ to obtain either the serial schedule < T3, T4 >,
or the serial schedule < T4, T3 >.
So above schedule S’’ is not conflict serializable.
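A standard way to test conflict serializability is to build a precedence graph (an edge Ti → Tj for every conflict in which Ti acts first) and check it for cycles. The sketch below assumes a schedule written as (transaction, operation, item) triples in execution order; that format and all names are illustrative assumptions:

```python
def conflict_serializable(schedule):
    # Precedence-graph edges: Ti -> Tj when an op of Ti conflicts with a LATER op of Tj.
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if x == y and ti != tj and "W" in (op1, op2):
                edges.add((ti, tj))

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node and (b in visiting or (b not in done and has_cycle(b, visiting, done))):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    # A schedule is conflict serializable iff its precedence graph is acyclic.
    return not any(has_cycle(t, set(), set()) for t in {t for t, _, _ in schedule})

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(conflict_serializable(s1), conflict_serializable(s2))  # True False
```

In s1 every conflict orders T1 before T2, so the graph has one edge and no cycle; in s2 the conflicts create both T1 → T2 and T2 → T1, producing a cycle.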
3.4.3. View serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met for each data item Q:
 If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction Ti must
read the initial value of Q.
 If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any),
then in schedule S’ also transaction Ti must read the value of Q that was produced by the same write(Q)
operation of transaction Tj .
 If transaction Ti (if any) performs the final write(Q) operation in schedule S, then in schedule S’ the final write(Q) operation must also be performed by Ti.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but not every view serializable schedule is conflict serializable.
Below is a schedule which is view serializable but not conflict serializable

The above schedule is view serializable but not conflict serializable: all the transactions access the same data item Q, and each pair of operations conflicts because at least one of them is a write on Q, so no non-conflicting operations can be interchanged.
3.5. Transaction Control Language in SQL (TCL)
SQL provides control statements for both security and transactions. The Data Control Language (DCL) statements GRANT and REVOKE enable you to determine whether a user can view, modify, add, or delete database information, while the transaction control statements COMMIT and ROLLBACK manage the changes made by transactions.
Working with transaction control
Applications execute a SQL statement or group of logically related SQL statements to perform a database
transaction. The SQL statement or statements add, delete, or modify data in the database.
Transactions are atomic and durable. To be considered atomic, a transaction must successfully complete all of
its statements; otherwise none of the statements execute. To be considered durable, a transaction's changes to a
database must be permanent. Complete a transaction by using either the COMMIT or ROLLBACK statements.
COMMIT statements make permanent the changes to the database created by a transaction. ROLLBACK restores
the database to the state it was in before the transaction was performed.
SQL Transaction Control Language (TCL) Commands
Transaction Control:
The following commands are used to control transactions:
 COMMIT: to save the changes.
 ROLLBACK: to rollback the changes.
 SAVEPOINT: creates points within groups of transactions to which you can ROLLBACK.
 SET TRANSACTION: Places a name on a transaction.
Transactional control commands are used only with the DML commands INSERT, UPDATE, and DELETE. They cannot be used while creating or dropping tables, because those operations are automatically committed in the database.
The COMMIT Command:
The COMMIT command is the transactional command used to save changes invoked by a transaction
to the database.
The COMMIT command saves all transactions to the database since the last COMMIT or ROLLBACK
command(Source from Sql Tutorial Point).
The syntax for COMMIT command is as follows:
COMMIT;

Consider the CUSTOMERS table having the following records:

Following is the example, which would delete records from the table having age = 25 and then
COMMIT the changes in the database.
DELETE FROM CUSTOMERS WHERE AGE = 25;
COMMIT;
As a result, two rows from the table would be deleted and SELECT statement would produce the following
result:
The ROLLBACK Command:
The ROLLBACK command is the transactional command used to undo transactions that have not already been
saved to the database.
The ROLLBACK command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued.
The syntax for the ROLLBACK command is as follows:
ROLLBACK;
Example:
Consider the CUSTOMERS table having the following records:
Following is the example, which would delete records from the table having age = 25 and then ROLLBACK
the changes in the database.
DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK;
As a result, delete operation would not impact the table and SELECT statement would produce the following
result:

The SAVEPOINT Command:
The syntax for the SAVEPOINT command is as follows:
SAVEPOINT SAVEPOINT_NAME;
This command serves only to create a SAVEPOINT among transactional statements; the ROLLBACK command is then used to undo a group of transactions back to that point.
The syntax for rolling back to a SAVEPOINT is as follows:
ROLLBACK TO SAVEPOINT_NAME;
Following is an example where you plan to delete the three different records from the CUSTOMERS table. You
want to create a SAVEPOINT before each delete, so that you can ROLLBACK to any SAVEPOINT at any time
to return the appropriate data to its original state:
Example:
Consider the CUSTOMERS table having the following records:

Now, here is the series of operations:

Now that the three deletions have taken place, say you have changed your mind and decided to ROLLBACK to
the SAVEPOINT that you identified as SP2. Because SP2 was created after the first deletion, the last two
deletions are undone:
Notice that only the first deletion took place since you rolled back to SP2:

The RELEASE SAVEPOINT Command:


The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that you have created.
The syntax for RELEASE SAVEPOINT is as follows:
RELEASE SAVEPOINT SAVEPOINT_NAME;
Once a SAVEPOINT has been released, you can no longer use the ROLLBACK command to undo transactions
performed since the SAVEPOINT.
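The CUSTOMERS walk-through above can be reproduced with Python's built-in sqlite3 module, which supports SAVEPOINT. This is a sketch: the three sample rows are assumptions, since the original table listing is not shown, and SQLite's dialect accepts `ROLLBACK TO name` without the SAVEPOINT keyword:

```python
import sqlite3

# isolation_level=None gives manual transaction control (no implicit BEGINs).
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ramesh"), (2, "Khilan"), (3, "Kaushik")])

con.execute("SAVEPOINT sp1")
con.execute("DELETE FROM customers WHERE id = 1")
con.execute("SAVEPOINT sp2")
con.execute("DELETE FROM customers WHERE id = 2")
con.execute("DELETE FROM customers WHERE id = 3")
con.execute("ROLLBACK TO sp2")     # undoes only the deletions made after sp2
con.execute("RELEASE sp1")         # releasing the outermost savepoint commits the rest

print([r[0] for r in con.execute("SELECT id FROM customers ORDER BY id")])  # [2, 3]
```

Only the first deletion survives, exactly as in the SP2 rollback scenario described above.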
Assessment.3.1
1. Discuss what is meant by a transaction. Why is it important? Describe the possible database operations.
2. Differentiate Read operation vs Write operation by giving appropriate examples.
3. Compare and contrast transaction properties: Atomicity, Consistency, Isolation and Durability by giving an
appropriate example from the real world scenario.
4. Compare and contrast Transaction state:- Active state, Partially committed, Failed state, Aborted State ,
Committed State by giving an appropriate example from the real world scenario.
5. Discuss Concurrency Problems in DBMS by using real world example: Dirty Read Problem/uncommitted
read, Unrepeatable Read Problem and Lost Update Problem
6. Compare and contrast the Recoverable Schedules:- Cascading Schedule and Cascadeless Schedule use
example for each
Chapter Three: Choose the correct answer
1. Which type of schedule requires that, if a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti must appear before the commit operation of Tj?
A. Recoverable schedule
B. Non-recoverable schedule
C. Cascading schedule
D. Cascadeless schedule
E. All of the above
2. The state in which a transaction stays while it is executing is termed as
A. Active
B. Partially committed
C. Initial
D. Both A and B
3. Serializability of schedules can be ensured through a mechanism called
A. Concurrency control policy
B. Evaluation control policy
C. Execution control policy
D. Cascading control policy
4. A _________ consists of a sequence of query and/or update statements.
A. Transaction
B. Commit
C. Rollback
D. Flashback
5. Which of the following makes the transaction permanent in the database?

A. View
B. Commit
C. Rollback
D. Flashback
6. Which one of the following is a part of the ACID properties of database transactions?
A. Atomicity, consistency, isolation, database
B. Atomicity, consistency, isolation, durability
C. Atomicity, consistency, integrity, durability
D. Atomicity, consistency, integrity, database
7. Which concept describes the situation where the operation of one user's transaction is overridden by the operation
of another user, producing an incorrect result?
A. Lost update problem
B. Uncommitted dependency
C. Dirty read problem
D. Incorrect summary problem
E. b and c
8. Consider the following transactions with data items P and Q initialized to zero:
T1: read (P);
    read (Q);
    if P = 0 then Q := Q + 1;
    write (Q);
T2: read (Q);
    read (P);
    if Q = 0 then P := P + 1;
    write (P);
Any non-serial interleaving of T1 and T2 for concurrent execution leads to:
A. A serializable schedule
B. A schedule that is not conflict serializable
C. A conflict serializable schedule
D. A schedule for which a precedence graph cannot be drawn
E. All of the above
9. Consider three data items D1, D2 and D3 and the following execution schedule of transactions T1, T2 and T3. In the
diagram, R(D) and W(D) denote the actions of reading and writing the data item D, respectively.

A. The schedule is serializable as T2; T3; T1


B. The schedule is serializable as T2; T1; T3
C. The schedule is serializable as T3; T2; T1
D. The schedule is not serializable

Chapter 4: DBMS Concurrency Control

4.1. Introduction of concurrency Controlling

CHAPTER OUTCOME

Students will be able to understand:

Locking Techniques for Concurrency Control


Concurrency Control Based on Timestamp Ordering
Multiversion Concurrency Control Techniques
Validation (Optimistic) Concurrency Control Technique
Granularity of Data Items and Multiple Granularity Locking
Using Locks for Concurrency Control in Indexes

In a multiprogramming environment, where more than one transaction can be executed concurrently,
there is a need for protocols to control the concurrency of transactions so as to ensure the
atomicity and isolation properties of transactions. Concurrency control protocols that ensure
serializability of transactions are the most desirable. Concurrency control protocols can be broadly
divided into the following categories:
Concurrency is the ability of a database to allow multiple (more than one) users to access
data at the same time.

4.2. Methods to control concurrency (Mechanisms)


Optimistic - Delay the checking of whether a transaction meets the isolation and other
integrity rules (e.g., serializability and recoverability) until its end, without blocking any of
its (read, write) operations, and then abort the transaction if the desired rules would be
violated upon its commit. An aborted transaction is immediately restarted and re-executed,
which incurs an obvious overhead. If not too many transactions are aborted, then being
optimistic is usually a good strategy.
Pessimistic - Block an operation of a transaction, if it may cause a violation of the rules, until the
possibility of violation disappears. Blocking operations typically involves a performance
reduction.

Semi-optimistic - Block operations in some situations, if they may cause a violation of
some rules, and do not block in other situations, while delaying rule checking (if
needed) to the transaction's end, as done with the optimistic approach.
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, several
problems can arise. Such problems are called concurrency problems. The main concurrency
problems are:
Concurrency Problems in DBMS

4.2.1. Dirty Read Problem/uncommitted read


 This read is called a dirty read because there is always a chance that the uncommitted
transaction might roll back later.
 Thus, an uncommitted transaction might make other transactions read a value that does not even
exist.
 This can lead to inconsistency of the database.
 A dirty read does not always lead to inconsistency.
 It becomes problematic only when the uncommitted transaction fails and rolls back later due
to some reason.
Here,
T1 reads the value of A.
T1 updates the value of A in the buffer.
T2 reads the value of A from the buffer.
T2 writes the updated value of A.
T2 commits.
T1 fails in later stages and rolls back.
In this example,
T2 reads the dirty value of A written by the uncommitted transaction T1.
T1 fails in later stages and rolls back.
Thus, the value that T2 read now stands to be incorrect.
Therefore, the database becomes inconsistent.
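The schedule above can be simulated in a few lines of Python. This is a toy sketch in which the "buffer" is a plain dictionary shared by both transactions; the variable names and values are illustrative only:

```python
# Toy simulation of the dirty read problem: T1 writes an uncommitted
# value into a shared buffer, T2 reads it, and then T1 rolls back.
database = {"A": 100}
buffer_pool = dict(database)       # page buffer shared by all transactions

# T1 updates A in the buffer but has not committed yet.
buffer_pool["A"] = buffer_pool["A"] + 50   # uncommitted write by T1

# T2 reads A from the buffer -- a dirty read of T1's uncommitted value.
t2_read = buffer_pool["A"]                 # T2 sees 150

# T1 fails and rolls back: the buffer is restored from the database.
buffer_pool = dict(database)

print(t2_read, database["A"])   # T2 acted on 150, but the real value is 100
```

T2 has based its work on a value (150) that, after T1's rollback, never officially existed in the database.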

4.2.2. Unrepeatable Read Problem


This problem occurs when a transaction reads different values of the same
variable in its different read operations, even though it has not updated the value itself.
Here,

T1 reads the value of X (= 10 say).


T2 reads the value of X (= 10).
T1 updates the value of X (from 10 to 15 say) in the buffer.
T2 again reads the value of X (but = 15).
In this example,
T2 gets to read a different value of X in its second reading.
T2 wonders how the value of X got changed because, according to it, it is running in
isolation.
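The same effect can be reproduced with SQLite, using two connections to one database file to stand in for the two transactions. Because each statement below runs in autocommit mode, the reader behaves like a transaction at a low isolation level; the file, table and column names are made up for illustration:

```python
import sqlite3, tempfile, os

# Two connections to the same on-disk SQLite database stand in for T1
# (the updater) and T2 (the reader).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
t2 = sqlite3.connect(path, isolation_level=None)
t2.execute("CREATE TABLE items (name TEXT PRIMARY KEY, x INTEGER)")
t2.execute("INSERT INTO items VALUES ('X', 10)")

t1 = sqlite3.connect(path, isolation_level=None)

first = t2.execute("SELECT x FROM items WHERE name = 'X'").fetchone()[0]   # T2 reads 10
t1.execute("UPDATE items SET x = 15 WHERE name = 'X'")                     # T1 updates and commits
second = t2.execute("SELECT x FROM items WHERE name = 'X'").fetchone()[0]  # T2 reads again: 15

print(first, second)   # the same query returns different values
t1.close(); t2.close()
```

A DBMS running T2 at the Repeatable Read or Serializable isolation level would return the same value for both reads.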

4.2.3. Lost Update Problem


This problem occurs when multiple transactions execute concurrently and updates from one or
more transactions get lost.

Here,
T1 reads the value of A (= 10 say).
T1 updates the value of A (= 15 say) in the buffer.
T2 does a blind write A = 25 (a write without a read) in the same buffer, overwriting T1's update.
T2 commits.
When T1 commits, A = 25 is the value written to the database.
In this example,
T1's update in the buffer is overwritten by T2 before it ever reaches the database.
Thus, the update from T1 gets lost.
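The sequence above can be sketched with the same toy shared-buffer model; the values are illustrative:

```python
# Toy simulation of the lost update problem: T1's update sits in a
# shared buffer and is overwritten by T2's blind write before it ever
# reaches the database.
database = {"A": 10}
buffer_pool = dict(database)

t1_read = buffer_pool["A"]           # T1 reads A = 10
buffer_pool["A"] = 15                # T1 updates A in the buffer
buffer_pool["A"] = 25                # T2 blind-writes A, clobbering T1's update
database["A"] = buffer_pool["A"]     # T2 commits: A = 25 reaches the database
database["A"] = buffer_pool["A"]     # T1 commits: still 25 -- the 15 is gone

print(database["A"])                 # 25: T1's update was lost
```

With proper locking, T2's write request on A would have been blocked until T1 committed, and no update would be lost.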

4.3. Concurrency Control


As discussed earlier, we now turn to concurrency control itself. Concurrency
control in database management systems permits many users (assumed to be interactive) to access
a database in a multiprogrammed environment while preserving the illusion that each user has sole
access to the system. Control is needed to coordinate concurrent accesses to a DBMS so that the
overall correctness of the database is maintained.
For example, users A and B may both wish to read and update the same record in the database at
about the same time. The relative timing of the two transactions may have an impact on the state of
the database at the end of the transactions. The end result may be an inconsistent database.
Why is concurrency control needed?
Several problems can occur when concurrent transactions execute in an uncontrolled manner.
 The lost update problem: This occurs when two transactions that access the same database items
have their operations interleaved in a way that makes the value of some database item incorrect.

 The temporary update (or dirty read) problem: This occurs when one transaction updates a
database item and then the transaction fails for some reason. The updated item is accessed by
another transaction before it is changed back to its original value.
 The incorrect summary problem: If one transaction is calculating an aggregate function on a
number of records while another transaction is updating some of these records, the aggregate
function may calculate some values before they are updated and others after they are updated.
Whenever a transaction is submitted to a DBMS for execution, the system must make sure that:
All the operations in the transaction are completed successfully and their effect is recorded
permanently in the database; or
The transaction has no effect whatsoever on the database or on other transactions, in case
the transaction fails after executing some of its operations but before executing all of
them.

4.4. Concurrency Control Protocols


 Different concurrency control protocols offer different trade-offs between the amount of
concurrency they allow and the amount of overhead they impose.
 Concurrency control techniques in DBMS:
Lock-Based Protocols
Two-Phase Locking Protocol
Timestamp-Based Protocols
Validation-Based Protocols
In lock-based protocols, a transaction cannot read or write a data item until it acquires an
appropriate lock on it.

4.4.1. Types of lock:


1. Shared lock:
 It is also known as a read-only lock. With a shared lock, the data item can only be read by the
transaction.
 It can be shared between transactions, because while a transaction holds a shared lock it
cannot update the data item.
2. Exclusive lock:
 With an exclusive lock, the data item can be both read and written by the transaction.
 This lock is exclusive: it prevents multiple transactions from modifying the same data item
simultaneously.

4.5. Locking
In order to execute transactions in an interleaved manner it is necessary to have some form
of concurrency control.
This enables a more efficient use of computer resources.
One method of avoiding problems is with the use of locks.
When a transaction requires a database object it must obtain a lock.
Locking is necessary in a concurrent environment to ensure that one process does not retrieve
or update a record that is being updated by another process. Failure to use such controls
(locking) would result in inconsistent and corrupt data.
Locks enable a multi-user DBMS to maintain the integrity of transactions by isolating a
transaction from others executing concurrently.
Locks are particularly critical in write-intensive and mixed-workload (read/write)
environments, because they can prevent the inadvertent loss of data or consistency problems
with reads. In addition to record locking, a DBMS implements several other locking mechanisms to
ensure the integrity of other data structures that provide shared I/O, communication among
different processes in a cluster, and automatic recovery in the event of a process or cluster
failure.
Aside from their integrity implications, locks can have a significant impact on performance. While
it may benefit a given application to lock a large amount of data (perhaps one or more tables) and
hold these locks for a long period of time, doing so inhibits concurrency and increases the likelihood
that other applications will have to wait for locked resources.

4.5.1. LOCKING RULES


There are various locking rules that apply when a user reads or writes data in a database.
The locking rules are:
 Any number of transactions can hold S-locks on an item
 If any transaction holds an X-lock on an item, no other transaction may hold any lock on
the item
 A transaction holding an X-lock may issue a write or a read request on the data
item

 A transaction holding an S-lock may only issue a read request on the data item

Table 4.1: double locking
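The rules above amount to the usual shared/exclusive lock compatibility check. A minimal lock-table sketch follows; the `LockManager` class and its method names are illustrative, not a real DBMS API:

```python
# Minimal S/X lock table: S is compatible with S; X is compatible with nothing.
class LockManager:
    def __init__(self):
        self.locks = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

    def acquire(self, txn, item, mode):
        """Grant the lock if compatible with the current holders, else return False."""
        entry = self.locks.get(item)
        if entry is None:                          # item is unlocked
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if entry["mode"] == "S" and mode == "S":   # S-locks can be shared
            entry["holders"].add(txn)
            return True
        return False                               # any request involving X conflicts

    def release(self, txn, item):
        entry = self.locks.get(item)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:               # last holder gone: item unlocked
                del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))  # True: first S-lock on A
print(lm.acquire("T2", "A", "S"))  # True: S-locks can be shared
print(lm.acquire("T3", "A", "X"))  # False: X conflicts with the existing S-locks
```

A real lock manager would also queue blocked requests and handle lock upgrades, which this sketch omits.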

4.5.2. Two-phase locking (2PL)


 The two-phase locking protocol divides the execution phase of the transaction into three
parts.
 In the first part, when the execution of the transaction starts, it seeks permission for the lock
it requires.
 In the second part, the transaction acquires all the locks.
 The third phase is started as soon as the transaction releases its first lock.
 In the third phase, the transaction cannot demand any new locks. It only releases the acquired
locks.
There are two main phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing lock held by the transaction may be
released, but no new locks can be acquired.
Figure 4. 1 Two phase locking techniques
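One way to picture the two-phase rule is a transaction object that refuses new lock requests once its first release has happened. This is a toy sketch with illustrative class and method names:

```python
# A transaction that enforces 2PL: once the first lock is released
# (the shrinking phase begins), any further lock request is rejected.
class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True after the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire locks in shrinking phase")
        self.held.add(item)      # growing phase: acquiring is allowed

    def unlock(self, item):
        self.shrinking = True    # the first release ends the growing phase
        self.held.discard(item)

t = TwoPhaseTransaction()
t.lock("A")
t.lock("B")        # still in the growing phase: allowed
t.unlock("A")      # the shrinking phase begins here (the lock point has passed)
try:
    t.lock("C")    # violates 2PL
except RuntimeError as e:
    print(e)
```

The point just before the first `unlock` call, where the transaction holds all its locks, corresponds to the lock point mentioned in the questions below.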

4.5.3. Timestamp-based Protocols


The timestamp-based protocol is an algorithm which uses the system time or a logical counter
as a timestamp to serialize the execution of concurrent transactions.
The timestamp-based protocol ensures that all conflicting read and write operations are
executed in timestamp order.
The older transaction is always given priority in this method.
It uses the system time to determine the timestamp of a transaction.
This is the most commonly used concurrency protocol.
Example:
Suppose there are three transactions T1, T2, & T3.
o T1 has entered the system at time 0010
o T2 has entered the system at 0020
o T3 has entered the system at 0030
o Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3.

4.5.4. Validation Based Protocol
 It assumes that multiple transactions can frequently complete without interfering with
each other.
 Before committing, each transaction verifies that no other transaction has modified the data
it has read.
 If the check reveals conflicting modifications, the committing transaction rolls back and
can be restarted.
 The validation-based protocol requires that each transaction Ti executes in two or
three different phases in its lifetime:
Read phase.
During this phase, Ti reads the values of the various data items and stores them in variables
local to Ti. It performs all write operations on temporary local variables, without updating the
actual database.
Validation phase.
Transaction Ti performs a validation test to determine whether it can copy to the database
the temporary local variables that hold the results of its write operations without causing a
violation of serializability.
Write phase.
If transaction Ti succeeds in validation, the system applies the actual updates
to the database. Otherwise, the system rolls back Ti.
Each transaction Ti is associated with three timestamps:
 Start(Ti), the time when Ti started its execution.
 Validation(Ti), the time when Ti finished its read phase and started its validation phase.
 Finish(Ti), the time when Ti finished its write phase.
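The three phases can be sketched as a small optimistic scheme: writes are buffered locally, and a backward validation test at commit aborts a transaction whose read set intersects the write set of any transaction that committed after it started. All names and the timestamping scheme are illustrative:

```python
# Toy optimistic concurrency control with backward validation.
committed = []   # list of (finish_time, write_set) for committed transactions
clock = 0

def begin():
    """Read phase starts: record the start timestamp."""
    global clock
    clock += 1
    return {"start": clock, "read_set": set(), "write_set": set(), "local": {}}

def validate_and_commit(txn, database):
    """Validation + write phase: abort if a later committer wrote what we read."""
    global clock
    for finish, write_set in committed:
        if finish > txn["start"] and write_set & txn["read_set"]:
            return False                      # conflict: roll back and restart
    clock += 1
    database.update(txn["local"])             # write phase: apply buffered writes
    committed.append((clock, txn["write_set"]))
    return True

db = {"A": 1, "B": 2}
t1, t2 = begin(), begin()
t1["read_set"].add("A"); t1["local"]["A"] = 10; t1["write_set"].add("A")
t2["read_set"].add("A"); t2["local"]["B"] = 20; t2["write_set"].add("B")
print(validate_and_commit(t1, db))   # True: nothing committed since t1 began
print(validate_and_commit(t2, db))   # False: t1 committed a write to A, which t2 read
```

T2's buffered write to B is simply discarded on abort, since no write touched the database before validation succeeded.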

Multiple Choice Questions and Answers


Question 1 : To synchronize the concurrent accessing of database items, we use
A. Transactions
B. State
C. Locks
D. None of the above
Question 2 : When a transaction never progresses then we say that it is
A. Aborted
B. Starved
C. Shared
D. Locked
Question 3 : During pessimistic approach, the order of operation execution is
A. Read, Validate, Compute, Write
B. Validate, Read, Compute, Write
C. Validate, Compare, Read, Write
D. None of the above
Question 4 : A phase during which all locks are requested is known as a
A. Growing phase
B. Shrinking phase
C. Aborted phase
D. None of the above
Question 5 : No more lock requests can be asked for in
A. Growing phase
B. Shrinking phase
C. Aborted phase
D. None of the above
Question 6 : The point in the schedule where the transaction has obtained its final lock is known as
A. Deadlock
B. Commit point
C. Lock point
D. None of the above
Question 7 : 2PL protocol suffers from:
A. Deadlock
B. Cascading Roll back
C. Both (a) and (b)
D. None of the above
Question 8 : Timestamps can be implemented by using a
A. System clock
B. Logical counter
C. Both ‘a’ and ‘b’
D. None of the above
Question 9 : By granularity, we mean
A. Size of data item being locked
B. Size of locks
C. Size of a transaction
D. None of the above
Question 10 : The major factor for concurrency control is
A. Granularity
B. Locking
C. Time stamping
D. None of the above
Question 11 : The assumption that hardware errors and bugs in the software bring the system to a halt but do
not corrupt the non-volatile storage contents is known as
A. Lock point
B. Fail-stop assumption
C. Single fault assumption
D. None of the above

Chapter 5:- Database Recovery

Students will be able to:

Recovery Concepts
Recovery Concepts Based on Deferred Update
Recovery Concepts Based on Immediate Update
Shadow Paging
The ARIES Recovery Algorithm
Recovery in Multidatabase Systems

5.1. Introduction of Database recovery


Database recovery is the process of restoring a database to the correct state after a failure.
It is a service provided by the DBMS to ensure that the database is reliable and remains in a
consistent state in case of a failure. In general, recovery refers to the various operations involved in
restoring a backup, rolling it forward and rolling it back.
Recovery restores the database to the most recent consistent state that existed just before the
failure. It is needed when a failure leaves the database in an inconsistent state, violating the
atomicity property.
The three states of database recovery are:
• Precondition
• Condition
• Post condition
Restoring a physical backup means reconstructing it and making it available to the
database server.
To recover a restored backup, data is updated by applying redo records generated after the
backup was taken.
A database server such as SQL Server or Oracle performs crash recovery and
instance recovery automatically after an instance failure.

In case of media failure, a database administrator (DBA) must initiate a recovery
operation.
Recovering a backup involves two distinct operations: rolling the backup forward to a
more recent time by applying redo data, and rolling back all changes made by uncommitted
transactions to their original state.
Backup and recovery refers to the various strategies and operations involved in
protecting the database against data loss and reconstructing the database.

5.2. Log-based Recovery Method

5.2.1. Log based recovery


A transaction log is a journal, or simply a data file, which contains the history of all
transactions performed, maintained on stable storage. It stores information about what each
transaction has done, so that if a transaction suddenly fails, the log still holds a record of the
work done so far in the buffer.
The most widely used structure for recording database modifications is the log.
The log is a sequence of log records, recording all the update activities in the database.
Since the log contains a complete record of all database activity, the volume of data
stored in the log may become unreasonably large.
For log records to be useful for recovery from system and disk failures, the log must
reside on stable storage.
A log record contains:
1. Start of transaction
2. Transaction-id
3. Record-id
4. Type of operation (insert, update, delete)
5. Old value, new value
6. End of transaction (committed or aborted)

All such log files are maintained by the DBMS itself. Normally these are sequential files.
Recovery has two operations: Rollback (Undo) and Roll forward (Redo).
When transaction Ti starts, it registers itself by writing a <Ti start>log record
Before Ti executes write(X), a log record <Ti , X, V1, V2> is written, where V1 is the
value of X before the write, and V2 is the value to be written to X.
 Log record notes that Ti has performed a write on data item Xj
 Xj had value V1 before the write, and will have value V2 after the write
When Ti finishes its last statement, the log record < Ti commit> is written.
Two approaches are used in log based recovery
1. Deferred database modification
2. Immediate database modification

5.2.2. Log based Recovery Techniques


Once a failure occurs, the DBMS recovers the database using the database backup and the transaction
log. The various log-based recovery techniques used by a DBMS are as follows:
1. Deferred Database Modification
2. Immediate Database Modification
Both of the techniques use transaction logs. These techniques are explained in following
sub-sections.

5.2.2.1. Deferred Database Modification log based recovery method.


Concept
Updates (changes) to the database are deferred (or postponed) until the transaction commits.
 During the execution of transaction, updates are recorded only in the transaction log and in
buffers. After the transaction commits, these updates are recorded in the database.
When failure occurs
 If transaction has not committed, then it has not affected the database. And so, no need to do
any undoing operations. Just restart the transaction.
 If transaction has committed, then, still, it may not have modified the database. And so, redo
the updates of the transaction.
Transaction Log
 In this technique, transaction log is used in following ways:

 Transaction T starts by writing <T start> to the log.
 Any update is recorded as <T, X, V>, where V indicates the new value for data item X. Here, there is
no need to preserve the old value of the changed data item. Also, V is not written to X in the database
immediately; that write is deferred.
 Transaction T commits by writing <T commit> to the log. Once this is entered in log,
actual updates are recorded to the database.
 If a transaction T aborts, the transaction log record is ignored, and no updates are recorded
to the database.
Example
 Consider the following two transactions, T0 and T1, given in the figure, where T0 executes
before T1. Also consider that the initial values for A, B and C are 500, 600 and 700
respectively.

 The following figure shows the transaction log for above two transactions at three
different instances of time.

 If a failure occurs:
1. Before T0 commits: no REDO actions are required.
2. After T0 commits but before T1 commits: as transaction T0 has already committed, it must be redone.
3. After both commit: as transactions T0 and T1 have already committed, they must both be redone.
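Recovery under deferred modification can be sketched as a single redo pass over the log. The tuple-based log format and the T0/T1 values below mirror the example, but are otherwise illustrative:

```python
# Deferred modification recovery: scan the log and redo only the writes
# of transactions that have a <commit> record. Uncommitted transactions
# never touched the database, so they are simply ignored.
log = [
    ("start", "T0"), ("T0", "A", 400), ("T0", "B", 700), ("commit", "T0"),
    ("start", "T1"), ("T1", "C", 500),        # T1 has no commit record
]

def recover_deferred(log_records, database):
    committed = {rec[1] for rec in log_records if rec[0] == "commit"}
    for rec in log_records:
        if len(rec) == 3 and rec[0] in committed:   # a write record of a committed txn
            txn, item, new_value = rec
            database[item] = new_value              # redo: re-apply the new value

db = {"A": 500, "B": 600, "C": 700}   # database state at the time of failure
recover_deferred(log, db)
print(db)   # {'A': 400, 'B': 700, 'C': 700}: T1's write was never applied
```

Note that no old values are needed anywhere, which is exactly why the `<T, X, V>` record of this technique omits them.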

5.2.2.2. Immediate Database Modification log based recovery method.


Concept
 Updates (changes) to the database are applied immediately as they occur without waiting
to reach to the commit point.
 Also, these updates are recorded in the transaction log.
 It is possible here that updates of the uncommitted transaction are also written to the database.
And, other transactions can access these updated values.
When failure occurs
 If transaction has not committed, then it may have modified the database. And so, undo the
updates of the transaction.
 If transaction has committed, then still it may not have modified the database. And so, redo the
updates of the transaction.
Transaction Log
 In this technique, transaction log is used in following ways :
 Transaction T starts by writing <T start> to the log.
 Any update is recorded as <T, X, Vold, Vnew > where Vold indicates the original value of
data item X and Vnew indicates new value for X. Here, as undo operation is required, it
requires preserving old value of the changed data item.
 Transaction T commits by writing <T commit> to the log.
 If a transaction T aborts, the transaction log record is consulted, and required undo
operations are performed.
Example
Again, consider the two transactions, T0 and T1, given in the figure, where T0 executes
before T1.
Also consider that the initial values for A, B and C are 500, 600 and 700 respectively.

The following figure shows the transaction log for above two transactions at three
different instances of time. Note that, here, transaction log contains original values also along
with new updated values for data items.
If a failure occurs:
1. Before T0 commits: undo transaction T0, restoring A and B to 500 and 600
respectively.
2. After T0 commits but before T1 commits: undo transaction T1, restoring C to 700; and redo
transaction T0, setting A and B to 400 and 700 respectively.
3. After both commit: redo transactions T0 and T1, setting A and B to 400 and 700
respectively, and C to 500.
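Recovery under immediate modification needs both passes: an undo pass (newest record first, using old values) for uncommitted transactions, then a redo pass (oldest first, using new values) for committed ones. A sketch with an illustrative tuple-based log format:

```python
# Immediate modification recovery. Write records carry both values:
# (txn, item, old_value, new_value), matching <T, X, Vold, Vnew>.
log = [
    ("start", "T0"), ("T0", "A", 500, 400), ("T0", "B", 600, 700), ("commit", "T0"),
    ("start", "T1"), ("T1", "C", 700, 500),      # T1 never committed
]

def recover_immediate(log_records, database):
    committed = {rec[1] for rec in log_records if rec[0] == "commit"}
    for rec in reversed(log_records):            # undo pass: newest first
        if len(rec) == 4 and rec[0] not in committed:
            txn, item, old, new = rec
            database[item] = old                 # restore the old value
    for rec in log_records:                      # redo pass: oldest first
        if len(rec) == 4 and rec[0] in committed:
            txn, item, old, new = rec
            database[item] = new                 # re-apply the committed write

# Crash state: both T0's and T1's writes already reached the database.
db = {"A": 400, "B": 700, "C": 500}
recover_immediate(log, db)
print(db)   # {'A': 400, 'B': 700, 'C': 700}: T1 undone, T0 redone
```

The undo pass is what makes preserving the old value in every log record mandatory for this technique.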

System recovery procedure with the checkpoint record concept


Problems with Deferred & Immediate Updates:
 Searching the entire log is time-consuming.
 It is possible to redo transactions that have already stored their updates to the
database.
Checkpoint
 A point of synchronization between the database and the transaction log file.
 Specifies that any operations executed before this point are done correctly and stored safely.
 At this point, all the buffers are forcefully written to secondary storage.
 Checkpoints are scheduled at predetermined time intervals.
Used to limit -
1. The size of transaction log file
2. Amount of searching, and
3. Subsequent processing that is required to carry out on the transaction log file.
When failure occurs

 Find out the nearest checkpoint.
 If transaction has already committed before this checkpoint, ignore it.
 If transaction is active at this point or after this point and has committed before failure, redo
that transaction.
 If transaction is active at this point or after this point and has not committed, undo that
transaction.
Example
 Consider the transactions given in following figure. Here, Tc indicates checkpoint, while Tf
indicates failure time.
Here, at failure time -
1. Ignore the transaction T1 as it has already been committed before checkpoint.
2. Redo transaction T2 and T3 as they are active at/after checkpoint, but have committed
before failure.
3. Undo transaction T4 as it is active after the checkpoint and has not committed.

5.2.3. Shadow Paging Technique


Concept
 Shadow paging is an alternative to transaction- log based recovery techniques.
 Here, database is considered as made up of fixed size disk blocks, called pages. These pages
are mapped to physical storage using a table, called page table.
 The page table is indexed by a page number of the database. The information about
physical pages, in which database pages are stored, is kept in this page table.
 This technique is similar to paging technique used by Operating Systems to allocate
memory, particularly to manage virtual memory.
 The following figure depicts the concept of shadow paging

5.2.3.1. Execution of Transaction


 During the execution of the transaction, two page tables are maintained.
1. Current Page Table: Used to access data items during transaction execution.
2. Shadow Page Table: Original page table, and will not get modified during transaction
execution.
Whenever any page is about to be written for the first time
1. A copy of this page is made onto a free page,
2. The current page table is made to point to the copy,
3. The update is made on this copy.
 At the start of the transaction, both tables are the same and point to the same pages.
 The shadow page table is never changed, and is used to restore the database in case of any failure
occurs. However, current page table entries may change during transaction execution, as it is
used to record all updates made to the database.
 When the transaction completes, the current page table becomes the shadow page table. At this
time, the transaction is considered to have committed.
 The following figure explains working of this technique.
 As shown in this figure, two pages - page 2 & 5 - are affected by a transaction and
copied to new physical pages. The current page table points to these pages.
 The shadow page table continues to point to old pages which are not changed by the
transaction. So, this table and pages are used for undoing the transaction.

Figure 5.1 Shadow Paging Technique

Advantages
 No overhead of maintaining transaction log.
 Recovery is quite fast, as no redo or undo operations are required.
Disadvantages
 Copying the entire page table is very expensive.
 Data are scattered or fragmented.
 After each transaction, free pages need to be collected by a garbage collector.
 It is difficult to extend this technique to allow concurrent transactions.
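The copy-on-write behaviour described above can be sketched with two dict-based page tables; all structures, page numbers and contents are illustrative:

```python
# Toy shadow paging: logical pages map to physical pages through a page
# table; the first write to a page redirects it to a fresh physical page
# in the current table, while the shadow table keeps the old mapping.
physical = {0: "data-A", 1: "data-B", 2: "data-C"}   # "disk" pages
next_free = 3

shadow_table = {1: 0, 2: 1, 3: 2}         # logical page -> physical page
current_table = dict(shadow_table)        # starts as a copy of the shadow table

def write_page(page, data):
    """Copy-on-write: the first write to a page uses a free physical page."""
    global next_free
    if current_table[page] == shadow_table[page]:     # first write to this page
        physical[next_free] = physical[current_table[page]]
        current_table[page] = next_free
        next_free += 1
    physical[current_table[page]] = data

write_page(2, "data-B-updated")

# On failure, the shadow table still points at the unchanged page:
assert physical[shadow_table[2]] == "data-B"

# On commit, the current table becomes the new shadow table:
shadow_table = dict(current_table)
print(physical[shadow_table[2]])   # data-B-updated
```

Undoing the transaction costs nothing: the shadow table is simply kept, and the copied pages become garbage to be collected later.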
Review Questions
1. In which database update strategy are a transaction's updates applied to the physical database
only after the transaction reaches its commit point?
A. Immediate update
B. Deferred update
C. Summary problem update
D. All of the above
2. Transaction Ti reads item X and finally writes item X, but before Ti is permanently
committed, transaction Tj starts and reads the value written by Ti, which is not yet
committed. Which concurrency problem does this describe?
A. Lost update problem
B. Uncommitted dependency
C. Dirty read problem
D. Incorrect summary
3. Which of the following is true about the locking technique?
A. More than one transaction can use item X at the same time
B. When one transaction is using item X, other transactions must wait until the lock is
acquired
C. When one transaction is using item X, other transactions must wait until the lock is
released
4. Which one of the following best describes shared and exclusive locks?
A. If a transaction only needs to write item X, it uses a shared lock
B. If a transaction only needs to read item Y, it uses an exclusive lock
C. If a transaction needs to read and write item Z, it uses an exclusive lock
D. If a transaction needs to read and write item Z, it uses a shared lock
E. All of the above
5. Which of the following is false about two-phase locking?
A. Locking and unlocking are done in two different phases
B. The transaction acquires locks in one phase and releases them in the other phase
C. The lock point is the point where the transaction stops acquiring locks and begins unlocking
D. In the growing phase, both lock and release activity can occur
E. None of the above
F. All of the above
6. Which one best describes the multiple granularity locking technique?
A. It allows data items of various sizes and defines a hierarchy of data granularities
B. When a table is locked, the whole database is locked
C. In this mechanism, when a table is locked, all its ascendants are locked
D. In this mechanism, when a table is locked, all its descendant records are locked
E. All of the above
7. Which of the following is not a recovery technique?
A. Deferred update
B. Immediate update
C. Two-phase commit
D. Recovery management
8. ____ deals with soft errors, such as power failures.
A. System recovery
B. Media recovery
C. Database recovery
D. Failure recovery
9. Rollback of transactions is normally used to:
A. Recover from transaction failure
B. Update the transaction
C. Retrieve old records
D. Repeat a transaction
10. A transaction running at the Read Uncommitted isolation level may experience:
A. A dirty read occurs
B. Non-repeatable read occurs
C. Phantom reads occurs
D. All of the above

Chapter 6: Distributed Database System
Outline
Introduction
Distributed Database Concepts
What Constitutes a DDB
Transparency
Availability and Reliability
Scalability and Partition Tolerance
Advantages of Distributed Databases
Data Fragmentation, Replication, and Allocation Techniques for Distributed Database
Design
Data Fragmentation and Sharding
Data Replication and Allocation
Types of Distributed Database Systems
Distributed Database Architectures
Parallel versus Distributed Architecture
General Architecture of Pure Distributed Database
Federated Database Schema Architecture
An Overview of Three-Tier Client/Server Architecture

6.1. Introduction of distribute system


A database is a collection of data describing the activities of one or more related organizations, with
a specific, well-defined structure and purpose. A database is controlled by a Database Management
System (DBMS), which maintains and utilizes large collections of data.
A distributed system is one in which hardware and software components at networked
computers communicate and coordinate their activity only by passing messages. In short, a
distributed database is a collection of databases that can be stored at different sites of a computer
network. This chapter presents an overview of distributed database systems along with their advantages
and disadvantages. It also covers aspects such as replication and fragmentation, and
various problems that can arise in distributed database systems.

A distributed database is a database in which storage devices are not all attached to a common processing unit such as the CPU. It may be stored in multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. A distributed database system consists of loosely coupled sites that share no physical components. In centralized systems, the data, process, and interface components of an information system are central.

Figure 6.1 Centralized Database System

In order to work on the system, end users use terminals or terminal emulators. In a distributed system, the data, process, and interface components of an information system are distributed across multiple locations in a computer network.
Accordingly, the processing workload is distributed across the network. Distributed systems are motivated by functional distribution, inherent distribution in the application domain, economics, better performance, and increased reliability.

Figure 6.2 Distributed Database System

6.2. Distributed Database Concepts
We can define a distributed database (DDB) as a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system (DDBMS) as a software system that manages a distributed database while making the distribution transparent to the user.
Distributed databases are different from Internet Web files. Web pages are basically a very large collection of files stored on different nodes of a network (the Internet), with interrelationships among the files represented via hyperlinks. The common functions of database management, including uniform query processing and transaction processing, do not apply to this scenario yet.
Differences between DDB and Multiprocessor Systems
We need to distinguish distributed databases from multiprocessor systems that use shared storage
(primary memory or disk). For a database to be called distributed, the following minimum
conditions should be satisfied:
 Connection of database nodes over a computer network. There are multiple computers, called sites or nodes. These sites must be connected by an underlying communication network to transmit data and commands among sites.
 Logical interrelation of the connected databases. It is essential that the information in the
databases be logically related.
 Absence of homogeneity constraint among connected nodes. It is not necessary that all nodes
be identical in terms of data, hardware, and software.
The sites may all be located in physical proximity (say, within the same building or a group of adjacent buildings) and connected via a local area network, or they may be geographically distributed over large distances and connected via a long-haul or wide area network. Local area networks typically use wireless hubs or cables, whereas long-haul networks use telephone lines or satellites. It is also possible to use a combination of networks.
Networks may have different topologies that define the direct communication paths among sites.
The type and topology of the network used may have a significant impact on the performance and
hence on the strategies for distributed query processing and distributed database design. For high-
level architectural issues, however, it does not matter what type of network is used; what matters is
that each site be able to communicate, directly or indirectly, with every other site. For the remainder of this chapter, we assume that some type of communication network exists among sites, regardless of any particular topology.
We will not address any network specific issues, although it is important to understand that for an
efficient operation of a distributed database system (DDBS), network design and performance
issues are critical and are an integral part of the overall solution. The details of the underlying
communication network are invisible to the end user.
 Transparency
The concept of transparency extends the general idea of hiding implementation details from end
users. A highly transparent system offers a lot of flexibility to the end user/application developer
since it requires little or no awareness of underlying details on their part. In the case of a traditional
centralized database, transparency simply pertains to logical and physical data independence for
application developers.
 However, in a DDB scenario, the data and software are distributed over multiple sites
connected by a computer network, so additional types of transparencies are introduced.
Consider the company database in Figure 3.5 that we have been discussing throughout the
book. The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally
and stored with possible replication as shown in Figure below. The following types of
transparencies are possible:
 Data organization transparency (also known as distribution or network transparency).
This refers to freedom for the user from the operational details of the network and the placement of
the data in the distributed system. It may be divided into location transparency and naming
transparency.
 Location transparency refers to the fact that the command used to perform a task is
independent of the location of the data and the location of the node where the command was
issued. Naming transparency implies that once a name is associated with an object, the named
objects can be accessed unambiguously without additional specification as to where the data is
located.
 Replication transparency. As we show in Figure below, copies of the same data objects may
be stored at multiple sites for better availability, performance, and reliability. Replication
transparency makes the user unaware of the existence of these copies.

Fragmentation transparency. Two types of fragmentation are possible. Horizontal fragmentation distributes a relation (table) into subrelations, that is, subsets of the tuples (rows) in the original relation. Vertical fragmentation distributes a relation into subrelations where each subrelation is defined by a subset of the columns of the original relation. A global query by the user must be transformed into several fragment queries. Fragmentation transparency makes the user unaware of the existence of fragments.

Figure 6.3 Fragmentation (Ramez Elmasri, 2011)
Other transparencies include design transparency and execution transparency—referring to
freedom from knowing how the distributed database is designed and where a transaction executes.
Autonomy
Autonomy determines the extent to which individual nodes or DBs in a connected DDB can
operate independently. A high degree of autonomy is desirable for increased flexibility and
customized maintenance of an individual node. Autonomy can be applied to design,
communication, and execution. Design autonomy refers to independence of data model usage and
transaction management techniques among nodes. Communication autonomy determines the
extent to which each node can decide on sharing of information with other nodes. Execution
autonomy refers to independence of users to act as they please.
 Reliability and Availability
Reliability and availability are two of the most common potential advantages cited for distributed databases. Reliability is broadly defined as the probability that a system is running (not down) at a certain point in time, whereas availability is the probability that the system is continuously available during a time interval. We can directly relate the reliability and availability of the database to the faults, errors, and failures associated with it. A failure can be described as a deviation of a system's behavior from that which is specified in order to ensure correct execution of operations. Errors constitute the subset of system states that causes the failure. A fault is the cause of an error.
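As a rough quantitative illustration of this distinction, availability is often approximated in reliability engineering by the steady-state ratio MTBF / (MTBF + MTTR); this formula and the numbers below are standard illustrative material rather than something specific to this module.

```python
# Steady-state availability approximated from mean time between failures
# (MTBF) and mean time to repair (MTTR). Reliability concerns whether the
# system is up at a point in time; availability is the fraction of an
# interval during which it is usable.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A site that fails once every 1,000 hours on average and takes 10 hours
# to repair is available roughly 99% of the time.
print(round(availability(1000, 10), 4))  # 0.9901
```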
To construct a system that is reliable, we can adopt several approaches. One common approach
stresses fault tolerance; it recognizes that faults will occur, and designs mechanisms that can detect
and remove faults before they can result in a system failure. Another more stringent approach
attempts to ensure that the final system does not contain any faults. This is done through an
exhaustive design process followed by extensive quality control and testing.
A reliable DDBMS tolerates failures of underlying components and processes user requests so
long as database consistency is not violated. A DDBMS recovery manager has to deal with
failures arising from transactions, hardware, and communication networks.
Hardware failures can either be those that result in loss of main memory contents or loss of
secondary storage contents. Communication failures occur due to errors associated with messages
and line failures. Message errors can include their loss, corruption, or out-of-order arrival at
destination.
 Advantages of Distributed Databases
Organizations resort to distributed database management for various reasons. Some important
advantages are listed below.
 Improved ease and flexibility of application development. Developing and maintaining
applications at geographically distributed sites of an organization is facilitated owing to
transparency of data distribution and control.
 Increased reliability and availability. This is achieved by the isolation of faults to their site of
origin without affecting the other databases connected to the network. When the data and
DDBMS software are distributed over several sites, one site may fail while other sites continue
to operate. Only the data and software that exist at the failed site cannot be accessed. This
improves both reliability and availability. Further improvement is achieved by judiciously replicating data and software at more than one site. In a centralized system, failure at a single
site makes the whole system unavailable to all users. In a distributed database, some of the data
may be unreachable, but users may still be able to access other parts of the database. If the data
in the failed site had been replicated at another site prior to the failure, then the user will not be
affected at all.
 Improved performance. A distributed DBMS fragments the database by keeping the data closer
to where it is needed most. Data localization reduces the contention for CPU and I/O services
and simultaneously reduces access delays involved in wide area networks. When a large
database is distributed over multiple sites, smaller databases exist at each site. As a result, local
queries and transactions accessing data at a single site have better performance because of the
smaller local databases.
 In addition, each site has a smaller number of transactions executing than if all transactions are
submitted to a single centralized database. Moreover, interquery and intraquery parallelism
can be achieved by executing multiple queries at different sites, or by breaking up a query into
a number of subqueries that execute in parallel. This contributes to improved performance.
 Easier expansion. In a distributed environment, expansion of the system in terms of adding more data, increasing database sizes, or adding more processors is much easier.

6.5. Additional Functions of Distributed Databases


Distribution leads to increased complexity in the system design and implementation. To achieve the
potential advantages listed previously, the DDBMS software must be able to provide the following
functions in addition to those of a centralized DBMS:
 Keeping track of data distribution. The ability to keep track of the data distribution, fragmentation, and replication by expanding the DDBMS catalog.
 Distributed query processing. The ability to access remote sites and transmit queries and data
among the various sites via a communication network.
 Distributed transaction management. The ability to devise execution strategies for queries
and transactions that access data from more than one site and to synchronize the access to
distributed data and maintain the integrity of the overall database.
 Replicated data management. The ability to decide which copy of a replicated data item to
access and to maintain the consistency of copies of a replicated data item.

 Distributed database recovery. The ability to recover from individual site crashes and from
new types of failures, such as the failure of communication links.
 Security. Distributed transactions must be executed with the proper management of the
security of the data and the authorization/access privileges of users.
 Distributed directory (catalog) management. A directory contains information (metadata)
about data in the database. The directory may be global for the entire DDB, or local for each
site. The placement and distribution of the directory are design and policy issues.
These functions themselves increase the complexity of a DDBMS over a centralized DBMS. Before
we can realize the full potential advantages of distribution, we must find satisfactory solutions to
these design issues and problems. Including all this additional functionality is hard to accomplish,
and finding optimal solutions is a step beyond that.

6.5. Types of Distributed Database Systems


The term distributed database management system can describe various systems that differ from
one another in many respects. The main thing that all such systems have in common is the fact that
data and software are distributed over multiple sites connected by some form of communication
network. In this section we discuss a number of types of DDBMSs and the criteria and factors that
make some of these systems different.
The first factor we consider is the degree of homogeneity of the DDBMS software. If all servers (or
individual local DBMSs) use identical software and all users (clients) use identical software, the
DDBMS is called homogeneous; otherwise, it is called heterogeneous. Another factor related to the
degree of homogeneity is the degree of local autonomy. If there is no provision for the local site to
function as a standalone DBMS, then the system has no local autonomy. On the other hand, if direct
access by local transactions to a server is permitted, the system has some degree of local autonomy.
Figure below shows classification of DDBMS alternatives along orthogonal axes of distribution,
autonomy, and heterogeneity. For a centralized database, there is complete autonomy, but a total
lack of distribution and heterogeneity (Point A in the figure). We see that the degree of local
autonomy provides further ground for classification into federated and multidatabase systems. At
one extreme of the autonomy spectrum, we have a DDBMS that looks like a centralized DBMS to
the user, with zero autonomy (Point B). A single conceptual schema exists, and all access to the system is obtained through a site that is part of the DDBMS, which means that no local autonomy exists.
Along the autonomy axis we encounter two types of DDBMSs called federated database system
(Point C) and multidatabase system (Point D). In such systems, each server is an independent and
autonomous centralized DBMS that has its own local users, local transactions, and DBA, and hence
has a very high degree of local autonomy.

Figure 6.4 Types of distributed database systems

6.6. Distributed Database Architectures


In this section, we first briefly point out the distinction between parallel and distributed database
architectures. While both are prevalent in industry today, there are various manifestations of the
distributed architectures that are continuously evolving among large enterprises. The parallel
architecture is more common in high performance computing, where there is a need for
multiprocessor architectures to cope with the volume of data undergoing transaction processing and
warehousing applications. We then introduce a generic architecture of a distributed database.
This is followed by discussions on the architecture of three-tier client-server and federated database
systems.

6.7. Parallel versus Distributed Architectures


There are two main types of multiprocessor system architectures that are commonplace:

 Shared memory (tightly coupled) architecture. Multiple processors share secondary (disk)
storage and also share primary memory.
 Shared disk (loosely coupled) architecture. Multiple processors share secondary (disk)
storage, but each has its own primary memory.
These architectures enable processors to communicate without the overhead of exchanging
messages over a network.
Database management systems developed using the above types of architectures are termed parallel
database management systems rather than DDBMSs, since they utilize parallel processor
technology.

Figure 6.5 Parallel versus Distributed Architectures (Ramez Elmasri, 2011)

Another type of multiprocessor architecture is called shared nothing architecture. In this architecture, every processor has its own primary and secondary (disk) memory, no common
memory exists, and the processors communicate over a high speed interconnection network (bus or
switch). Although the shared nothing architecture resembles a distributed database computing
environment, major differences exist in the mode of operation.
In shared nothing multiprocessor systems, there is symmetry and homogeneity of nodes; this is not
true of the distributed database environment where heterogeneity of hardware and operating system
at each node is very common. Shared nothing architecture is also considered as an environment for
parallel databases.
6.8. General Architecture of Pure Distributed Databases
In this section we discuss both the logical and component architectural models of a DDB. In Figure
6.5, which describes the generic schema architecture of a DDB, the enterprise is presented with a
consistent, unified view showing the logical structure of underlying data across all nodes. This view
is represented by the global conceptual schema (GCS), which provides network transparency. To accommodate
potential heterogeneity in the DDB, each node is shown as having its own local internal schema
(LIS) based on physical organization details at that particular site. The logical organization of data
at each site is specified by the local conceptual schema (LCS). The GCS, LCS, and their underlying
mappings provide the fragmentation and replication transparency discussed earlier. The figure below shows the component architecture of a DDB. It is an extension of its centralized counterpart. For the sake
of simplicity, common elements are not shown here. The global query compiler references the
global conceptual schema from the global system catalog to verify and impose defined constraints.
The global query optimizer references both global and local conceptual schemas and generates
optimized local queries from global queries. It evaluates all candidate strategies using a cost
function that estimates cost based on response time (CPU, I/O, and network latencies) and estimated
sizes of intermediate results. The latter is particularly important in queries involving joins. Having
computed the cost for each candidate, the optimizer selects the candidate with the minimum cost
for execution. Each local DBMS would have its own local query optimizer, transaction manager, and
execution engines as well as the local system catalog, which houses the local schemas. The global
transaction manager is responsible for coordinating the execution across multiple sites in
conjunction with the local transaction manager at those sites.
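A minimal sketch of the cost-based plan selection just described: each candidate execution strategy is scored by an additive cost function over CPU, I/O, and network components, and the cheapest plan is chosen. The candidate names and cost figures here are invented for illustration.

```python
# Hypothetical cost model for a global query optimizer: the cost of a
# candidate strategy is the sum of its estimated CPU, I/O, and network
# costs, and the optimizer picks the candidate with the minimum total.
# All plans and numbers below are made up for illustration.

def plan_cost(plan: dict) -> float:
    return plan["cpu"] + plan["io"] + plan["net"]

candidates = [
    {"name": "ship-whole-relation", "cpu": 5, "io": 20, "net": 300},
    {"name": "semijoin-reduction",  "cpu": 9, "io": 25, "net": 60},
    {"name": "fragment-and-join",   "cpu": 7, "io": 30, "net": 90},
]

best = min(candidates, key=plan_cost)
print(best["name"])  # semijoin-reduction
```

For queries involving joins, the network component usually dominates, which is why strategies that reduce data shipped between sites tend to win under such a model.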

Figure 6.6 Schema and component architecture of a distributed database (Ramez Elmasri, 2011)

6.9. Federated Database Schema Architecture


A typical five-level schema architecture to support global applications in the FDBS environment is shown in the figure below. In this architecture, the local schema is the conceptual schema (full database definition) of a component database, and the component schema is derived by translating the local schema into a canonical data model or common data model (CDM) for the FDBS. Schema translation from the local schema to the component schema is
accompanied by generating mappings to transform commands on a component schema into
commands on the corresponding local schema. The export schema represents the subset of a
component schema that is available to the FDBS.
The federated schema is the global schema or view, which is the result of integrating all the
shareable export schemas. The external schemas define the schema for a user group or an
application, as in the three-level schema architecture.

All the problems related to query processing, transaction processing, directory and metadata management, and recovery apply to FDBSs, with additional considerations.

Figure 6.7 Federated Database Schema Architecture (Ramez Elmasri, 2011)

6.10. An Overview of Three-Tier Client-Server Architecture


As we pointed out in the chapter introduction, full-scale DDBMSs have not been developed to
support all the types of functionalities that we have discussed so far. Instead, distributed database
applications are being developed in the context of the client-server architectures. It is now more
common to use a three-tier architecture, particularly in Web applications.
In the three-tier client-server architecture, the following three layers exist:
1. Presentation layer (client). This provides the user interface and interacts with the user. The
programs at this layer present Web interfaces or forms to the client in order to interface with the
application. Web browsers are often utilized, and the languages and specifications used include
HTML, XHTML, CSS, Flash, MathML, Scalable Vector Graphics (SVG), Java, JavaScript,
Adobe Flex, and others. This layer handles user input, output, and navigation by accepting user
commands and displaying the needed information, usually in the form of static or dynamic Web pages. The latter are employed when the interaction involves database access. When a Web
interface is used, this layer typically communicates with the application layer via the HTTP
protocol.
2. Application layer (business logic). This layer implements the application logic. For example,
queries can be formulated based on user input from the client, or query results can be formatted
and sent to the client for presentation. Additional application functionality can be handled at
this layer, such as security checks, identity verification, and other functions. The application
layer can interact with one or more databases or data sources as needed by connecting to the
database using ODBC, JDBC, SQL/CLI, or other database access techniques.
3. Database server. This layer handles query and update requests from the application layer,
processes the requests, and sends the results. Usually SQL is used to access the database if it is
relational or object-relational, and stored database procedures may also be invoked. Query results (and queries) may be formatted into XML when transmitted between the application
server and the database server.
Exactly how to divide the DBMS functionality between the client, application server, and database
server may vary. The common approach is to include the functionality of a centralized DBMS at
the database server level. A number of relational DBMS products have taken this approach, where
an SQL server is provided. The application server must then formulate the appropriate SQL queries
and connect to the database server when needed. The client provides the processing for user
interface interactions.
Since SQL is a relational standard, various SQL servers, possibly provided by different vendors,
can accept SQL commands through standards such as ODBC, JDBC, and SQL/CLI.
In this architecture, the application server may also refer to a data dictionary that includes
information on the distribution of data among the various SQL servers, as well as modules for
decomposing a global query into a number of local queries that can be executed at the various sites.
Interaction between an application server and database server might proceed as follows during the
processing of an SQL query:
1. The application server formulates a user query based on input from the client layer and
decomposes it into a number of independent site queries. Each site query is sent to the
appropriate database server site.

2. Each database server processes the local query and sends the results to the application server
site. Increasingly, XML is being touted as the standard for data exchange, so the database server
may format the query result into XML before sending it to the application server.
3. The application server combines the results of the subqueries to produce the result of the
originally required query, formats it into HTML or some other form accepted by the client, and
sends it to the client site for display.
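The three interaction steps above can be sketched in miniature as follows. The site fragments, the decomposition rule, and the merge step are all hypothetical stand-ins: a real application server would ship SQL over a network rather than call local functions.

```python
# Toy coordinator for the three steps: (1) decompose a global query into
# one subquery per site, (2) let each "database server" answer from its
# local fragment, (3) combine the partial results at the application
# server. Everything runs in-process here for illustration.

SITE_FRAGMENTS = {
    "site1": [("Smith", 5), ("Wong", 5)],      # EMPLOYEE rows with Dno = 5
    "site2": [("Wallace", 4), ("Zelaya", 4)],  # EMPLOYEE rows with Dno = 4
}

def local_query(site: str, min_dno: int) -> list:
    """Step 2: each database server filters its own fragment."""
    return [row for row in SITE_FRAGMENTS[site] if row[1] >= min_dno]

def global_query(min_dno: int) -> list:
    """Steps 1 and 3: fan out one subquery per site, then merge."""
    results = []
    for site in SITE_FRAGMENTS:
        results.extend(local_query(site, min_dno))
    return sorted(results)

print(global_query(4))  # all four employees, in sorted order
```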
The application server is responsible for generating a distributed execution plan for a multisite query
or transaction and for supervising distributed execution by sending commands to servers. These
commands include local queries and transactions to be executed, as well as commands to transmit
data to other clients or servers.
Another function controlled by the application server (or coordinator) is that of ensuring consistency
of replicated copies of a data item by employing distributed (or global) concurrency control
techniques. The application server must also ensure the atomicity of global transactions by
performing global recovery when certain sites fail.
If the DDBMS has the capability to hide the details of data distribution from the application server,
then it enables the application server to execute global queries and transactions as though the
database were centralized, without having to specify the sites at which the data referenced in the
query or transaction resides.
This property is called distribution transparency. Some DDBMSs do not provide distribution transparency, instead requiring that applications be aware of the details of data distribution.

Figure 6.8 An Overview of Three-Tier Client-Server Architecture (Ramez Elmasri, 2011)

6.11. Data Fragmentation
In a DDB, decisions must be made regarding which site should be used to store which portions of
the database. For now, we will assume that there is no replication; that is, each relation or portion
of a relation is stored at one site only. We discuss replication and its effects later in this section. We
also use the terminology of relational databases, but similar concepts apply to other data models.
We assume that we are starting with a relational database schema and must decide on how to
distribute the relations over the various sites. To illustrate our discussion, we use the relational database schema in Table 6.1 below.

Table 6.1: Data Fragmentation (Ramez Elmasri, 2011)


Before we decide on how to distribute the data, we must determine the logical units of the database
that are to be distributed. The simplest logical units are the relations themselves; that is, each whole
relation is to be stored at a particular site. In our example, we must decide on a site to store each of
the relations EMPLOYEE, DEPARTMENT, PROJECT, WORKS_ON, and DEPENDENT. In
many cases, however, a relation can be divided into smaller logical units for distribution.
For example, consider the company database shown in Figure 3.6, and assume there are three computer sites, one for each department in the company. We may want to store the database
information relating to each department at the computer site for that department. A technique called
horizontal fragmentation can be used to partition each relation by department.

Horizontal Fragmentation. A horizontal fragment of a relation is a subset of the tuples in that
relation. The tuples that belong to the horizontal fragment are specified by a condition on one or
more attributes of the relation. Often, only a single attribute is involved. For example, we may
define three horizontal fragments on the EMPLOYEE relation in Figure 3.6 with the following
conditions: (Dno = 5), (Dno = 4), and (Dno = 1); each fragment contains the EMPLOYEE tuples working for a particular department. Similarly, we may define three horizontal fragments for the PROJECT relation, with the conditions (Dnum = 5), (Dnum = 4), and (Dnum = 1); each fragment contains the PROJECT tuples controlled by a particular department. Horizontal fragmentation
divides a relation horizontally by grouping rows to create subsets of tuples, where each subset has
a certain logical meaning. These fragments can then be assigned to different sites in the distributed
system. Derived horizontal fragmentation applies the partitioning of a primary relation
(DEPARTMENT in our example) to other secondary relations (EMPLOYEE and PROJECT in our
example), which are related to the primary via a foreign key.
This way, related data between the primary and the secondary relations gets fragmented in the same
way.
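The selections described above can be made concrete with a small Python sketch; the EMPLOYEE rows and the list-of-dictionaries representation are invented for illustration.

```python
# Horizontal fragmentation of a toy EMPLOYEE relation: each fragment is
# the subset of tuples satisfying a condition on Dno, mirroring the
# selections (Dno = 5), (Dno = 4), and (Dno = 1) in the text.

EMPLOYEE = [
    {"Ssn": "111", "Name": "Smith",   "Dno": 5},
    {"Ssn": "222", "Name": "Wallace", "Dno": 4},
    {"Ssn": "333", "Name": "Wong",    "Dno": 5},
    {"Ssn": "444", "Name": "Borg",    "Dno": 1},
]

def horizontal_fragment(relation, dno):
    """Tuples of the relation that work for one department."""
    return [t for t in relation if t["Dno"] == dno]

emp_d5 = horizontal_fragment(EMPLOYEE, 5)
emp_d4 = horizontal_fragment(EMPLOYEE, 4)
emp_d1 = horizontal_fragment(EMPLOYEE, 1)

# Together the fragments cover every tuple exactly once, so this
# fragmentation is complete and disjoint.
print(len(emp_d5), len(emp_d4), len(emp_d1))  # 2 1 1
```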
Vertical Fragmentation. Each site may not need all the attributes of a relation, which would
indicate the need for a different type of fragmentation. Vertical fragmentation divides a relation
“vertically” by columns. A vertical fragment of a relation keeps only certain attributes of the
relation. For example, we may want to fragment the EMPLOYEE relation into two vertical
fragments. The first fragment includes personal information (Name, Bdate, Address, and Sex), and the second includes work-related information (Ssn, Salary, Super_ssn, and Dno). This vertical
fragmentation is not quite proper, because if the two fragments are stored separately, we cannot put
the original employee tuples back together, since there is no common attribute between the two
fragments. It is necessary to include the primary key or some candidate key attribute in every
vertical fragment so that the full relation can be reconstructed from the fragments. Hence, we must
add the Ssn attribute to the personal information fragment.
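The same idea can be sketched for vertical fragmentation, again with invented rows; note that the key Ssn is repeated in both fragments, exactly as the paragraph above requires.

```python
# Vertical fragmentation of a toy EMPLOYEE relation: each fragment keeps
# a subset of the attributes, and the primary key Ssn appears in both so
# that the original tuples can later be reconstructed.

EMPLOYEE = [
    {"Ssn": "111", "Name": "Smith", "Bdate": "1965-01-09",
     "Salary": 30000, "Dno": 5},
    {"Ssn": "222", "Name": "Wong", "Bdate": "1955-12-08",
     "Salary": 40000, "Dno": 5},
]

def vertical_fragment(relation, attrs):
    """Keep only the listed attributes of each tuple (a projection)."""
    return [{a: t[a] for a in attrs} for t in relation]

personal = vertical_fragment(EMPLOYEE, ["Ssn", "Name", "Bdate"])
work     = vertical_fragment(EMPLOYEE, ["Ssn", "Salary", "Dno"])

print(personal[0])  # {'Ssn': '111', 'Name': 'Smith', 'Bdate': '1965-01-09'}
```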
Notice that each horizontal fragment on a relation R can be specified in the relational algebra by a σCi(R) operation. A set of horizontal fragments whose conditions C1, C2, ..., Cn include all the tuples in R (that is, every tuple in R satisfies (C1 OR C2 OR ... OR Cn)) is called a complete horizontal fragmentation of R. In many cases a complete horizontal fragmentation is also disjoint; that is, no tuple in R satisfies (Ci AND Cj) for any i ≠ j. Our two earlier examples of horizontal fragmentation for the EMPLOYEE and PROJECT relations were both complete and disjoint. To
reconstruct the relation R from a complete horizontal fragmentation, we need to apply the UNION
operation to the fragments. A vertical fragment on a relation R can be specified by a πLi(R) operation in the relational algebra. A set of vertical fragments whose projection lists L1, L2, ..., Ln include all the attributes in R but share only the primary key attribute of R is called a complete vertical fragmentation of R. In this case the projection lists satisfy the following two conditions:
■ L1 ∪ L2 ∪ ... ∪ Ln = ATTRS(R).
■ Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of R.
To reconstruct the relation R from a complete vertical fragmentation, we apply the OUTER UNION
operation to the vertical fragments (assuming no horizontal fragmentation is used). Notice that we
could also apply a FULL OUTER JOIN operation and get the same result for a complete vertical
fragmentation, even when some horizontal fragmentation may also have been applied. The two
vertical fragments of the EMPLOYEE relation with projection lists L1 = {Ssn, Name, Bdate,
Address, Sex} and L2 = {Ssn, Salary, Super_ssn, Dno} constitute a complete vertical fragmentation
of EMPLOYEE.
Two horizontal fragments that are neither complete nor disjoint are those defined on the
EMPLOYEE relation by the conditions (Salary > 50000) and (Dno = 4); they may not include all
EMPLOYEE tuples, and they may include common tuples. Two vertical fragments that are not
complete are those defined by the attribute lists L1 = {Name, Address} and L2 = {Ssn, Name,
Salary}; these lists violate both conditions of a complete vertical fragmentation.
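The two conditions above are easy to check mechanically. The following Python sketch (our illustration; the function name and data layout are assumptions, not from the text) tests both the complete and the incomplete vertical fragmentations of EMPLOYEE:

```python
# Illustrative check of the two conditions for a complete vertical
# fragmentation: the lists together cover ATTRS(R), and every pairwise
# intersection equals PK(R). Attribute names follow the EMPLOYEE example.

def is_complete_vertical(fragments, attrs, pk):
    """fragments: list of attribute sets; attrs: ATTRS(R); pk: PK(R)."""
    covers_all = set().union(*fragments) == attrs
    share_only_pk = all(
        fragments[i] & fragments[j] == pk
        for i in range(len(fragments))
        for j in range(i + 1, len(fragments))
    )
    return covers_all and share_only_pk

ATTRS = {"Ssn", "Name", "Bdate", "Address", "Sex", "Salary", "Super_ssn", "Dno"}
PK = {"Ssn"}

# The complete fragmentation from the text:
L1 = {"Ssn", "Name", "Bdate", "Address", "Sex"}
L2 = {"Ssn", "Salary", "Super_ssn", "Dno"}
print(is_complete_vertical([L1, L2], ATTRS, PK))   # True

# The incomplete pair from the text violates both conditions:
M1 = {"Name", "Address"}
M2 = {"Ssn", "Name", "Salary"}
print(is_complete_vertical([M1, M2], ATTRS, PK))   # False
```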
Data Replication and Allocation
Replication is useful in improving the availability of data. The most extreme case is replication of
the whole database at every site in the distributed system, thus creating a fully replicated distributed
database.
This can improve availability remarkably because the system can continue to operate as long as at
least one site is up. It also improves performance of retrieval for global queries because the results
of such queries can be obtained locally from any one site; hence, a retrieval query can be processed
at the local site where it is submitted, if that site includes a server module.
The disadvantage of full replication is that it can slow down update operations drastically, since a
single logical update must be performed on every copy of the database to keep the copies consistent.
This is especially true if many copies of the database exist. Full replication also makes the concurrency
control and recovery techniques more expensive than they would be if there were no replication.
The other extreme from full replication is having no replication at all; that is, each fragment is
stored at exactly one site. In this case, all fragments must be disjoint, except for the repetition of
primary keys among vertical (or mixed) fragments. This is also called nonredundant allocation.
Between these two extremes lies a wide spectrum of partial replication of the data; that is, some
fragments of the database may be replicated whereas others may not.
The number of copies of each fragment can range from one up to the total number of sites in the
distributed system. A special case of partial replication occurs heavily in applications where
mobile workers, such as sales forces, financial planners, and claims adjusters, carry partially
replicated databases with them on laptops and PDAs and synchronize them periodically with the
server database.
A description of the replication of fragments is sometimes called a replication schema.
Each fragment or each copy of a fragment must be assigned to a particular site in the distribute d
system. This process is called data distribution (or data allocation). The choice of sites and the
degree of replication depend on the performance and availability goals of the system and on the
types and frequencies of transactions submitted at each site. For example, if high availability is
required, transactions can be submitted at any site, and most transactions are retrieval only, then a
fully replicated database is a good choice.
However, if certain transactions that access particular parts of the database are mostly submitted at
a particular site, the corresponding set of fragments can be allocated at that site only. Data that is
accessed at multiple sites can be replicated at those sites. If many updates are performed, it may be
useful to limit replication. Finding an optimal or even a good solution to distributed data allocatio n
is a complex optimization problem.
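To give a feel for the problem, the following toy Python heuristic (purely illustrative; real allocators solve a much harder optimization over transaction frequencies, update patterns, and network costs) gives each fragment a primary copy at its busiest site and replicates read-heavy fragments at every site that accesses them:

```python
# A toy greedy allocation heuristic (illustrative only, not from the text).
# freq[f][s] = access frequency of fragment f at site s;
# read_ratio[f] = fraction of accesses to f that are reads.

def allocate(freq, read_ratio, replicate_threshold=0.8):
    allocation = {}
    for frag, per_site in freq.items():
        primary = max(per_site, key=per_site.get)       # busiest site gets the copy
        sites = {primary}
        if read_ratio[frag] >= replicate_threshold:     # read-heavy: replicate widely
            sites |= {s for s, f in per_site.items() if f > 0}
        allocation[frag] = sites
    return allocation

freq = {"EMPD_5": {"site1": 10, "site2": 90, "site3": 5},
        "EMPD_4": {"site1": 10, "site2": 0, "site3": 80}}
read_ratio = {"EMPD_5": 0.95, "EMPD_4": 0.30}

alloc = allocate(freq, read_ratio)
print({f: sorted(s) for f, s in alloc.items()})
# {'EMPD_5': ['site1', 'site2', 'site3'], 'EMPD_4': ['site3']}
```

EMPD_5 is read-heavy, so it is replicated at every accessing site; EMPD_4 is update-heavy, so it gets a single copy at its busiest site, limiting replication as the text suggests.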

Table 6.2: Data Replication and Allocation (Ramez Elmasri, 2011)

Example of Fragmentation, Allocation, and Replication
Suppose that the company has three computer sites, one for each current department. Sites 2 and 3
are for departments 5 and 4, respectively. At each of these sites, we expect frequent access to the
EMPLOYEE and PROJECT information for the employees who work in that department and the
projects controlled by that department. Further, we assume that these sites mainly access
the Name, Ssn, Salary, and Super_ssn attributes of EMPLOYEE. Site 1 is used by company
headquarters and accesses all employee and project information regularly, in addition to keeping
track of DEPENDENT information for insurance purposes.
According to these requirements, the whole database shown in the figure can be stored at site 1. To
determine the fragments to be replicated at sites 2 and 3, first we can horizontally fragment
DEPARTMENT by its key Dnumber. Then we apply derived fragmentation to the EMPLOYEE,
PROJECT, and DEPT_LOCATIONS relations based on their foreign keys for department
number—called Dno, Dnum, and Dnumber, respectively, in the figure. We can vertically fragment
the resulting EMPLOYEE fragments to include only the attributes {Name, Ssn, Salary, Super_ssn,
Dno}. The figure shows the mixed fragments EMPD_5 and EMPD_4, which include the
EMPLOYEE tuples satisfying the conditions Dno = 5 and Dno = 4, respectively. The horizontal
fragments of PROJECT, DEPARTMENT, and DEPT_LOCATIONS are similarly fragmented by
department number. All these fragments—stored at sites 2 and 3—are replicated because they are
also stored at headquarters site 1.
We must now fragment the WORKS_ON relation and decide which fragments of WORKS_ON to
store at sites 2 and 3. We are confronted with the problem that no attribute of WORKS_ON directly
indicates the department to which each tuple belongs. In fact, each tuple in WORKS_ON relates an
employee E to a project P.
We could fragment WORKS_ON based on the department D in which E works or based on the
department D′ that controls P. Fragmentation becomes easy if we have a constraint stating that
D = D′ for all WORKS_ON tuples—that is, if employees can work only on projects controlled by the
department they work for. However, there is no such constraint in our database in Figure 3.6. For
example, the WORKS_ON tuple <333445555, 10, 10.0> relates an employee who works for

department 5 with a project controlled by department 4. In this case, we could fragment
WORKS_ON based on the department in which the employee works (which is expressed by the
condition C) and then fragment further based on the department that controls the projects that
employee is working on.
In the figure, the union of fragments G1, G2, and G3 gives all WORKS_ON tuples for employees who
work for department 5. Similarly, the union of fragments G4, G5, and G6 gives all WORKS_ON
tuples for employees who work for department 4. On the other hand, the union of fragments G1, G4,
and G7 gives all WORKS_ON tuples for projects controlled by department 5. The condition for
each of the fragments G1 through G9 is shown in the figure below. Relations that represent M:N
relationships, such as WORKS_ON, often have several possible logical fragmentations. In our
distribution shown below, we choose to include all fragments that can be joined to

Table 6.2: Example of Fragmentation, Allocation, and Replication (Ramez Elmasri, 2011)
either an EMPLOYEE tuple or a PROJECT tuple at sites 2 and 3. Hence, we place the union of
fragments G1, G2, G3, G4, and G7 at site 2 and the union of fragments G4, G5, G6, G2, and G8 at
site 3. Notice that fragments G2 and G4 are replicated at both sites. This allocation strategy permits
the join between the local EMPLOYEE or PROJECT fragments at site 2 or site 3 and the local
WORKS_ON fragment to be performed completely locally. This clearly demonstrates how
complex the problem of database fragmentation and allocation is for large databases. The Selected
Bibliography at the end of this chapter discusses some of the work done in this area.

6.14. Query Processing and Optimization in Distributed Databases


Now we give an overview of how a DDBMS processes and optimizes a query. First we discuss the
steps involved in query processing and then elaborate on the communication costs of processing a
distributed query. Finally, we discuss a special operation, called a semijoin, which is used to
optimize some types of queries in a DDBMS.
A detailed discussion of optimization algorithms is beyond the scope of this module. We attempt
to illustrate optimization principles using suitable examples.
Distributed Query Processing
A distributed database query is processed in stages as follows:
1. Query Mapping. The input query on distributed data is specified formally using a query
language. It is then translated into an algebraic query on global relations. This translation is
done by referring to the global conceptual schema and does not take into account the actual
distribution and replication of data. Hence, this translation is largely identical to the one
performed in a centralized DBMS. It is first normalized, analyzed for semantic errors,
simplified, and finally restructured into an algebraic query.
2. Localization. In a distributed database, fragmentation results in relations being stored in
separate sites, with some fragments possibly being replicated. This stage maps the distributed
query on the global schema to separate queries on individual fragments using data distributio n
and replication information.
3. Global Query Optimization. Optimization consists of selecting a strategy from a list of
candidates that is closest to optimal. A list of candidate queries can be obtained by permuting
the ordering of operations within a fragment query generated by the previous stage. Time is the

preferred unit for measuring cost. The total cost is a weighted combination of costs such as CPU
cost, I/O costs, and communication costs. Since DDBs are connected by a network, often the
communication costs over the network are the most significant. This is especially true when the
sites are connected through a wide area network (WAN).
4. Local Query Optimization. This stage is common to all sites in the DDB. The techniques are
similar to those used in centralized systems.
The first three stages discussed above are performed at a central control site, while the last stage is
performed locally.

6.15. Data Transfer Costs of Distributed Query Processing


In a distributed system, several additional factors further complicate query processing. The first is
the cost of transferring data over the network. This data includes intermediate files that are
transferred to other sites for further processing, as well as the final result files that may have to be
transferred to the site where the query result is needed. Although these costs may not be very high
if the sites are connected via a high-performance local area network, they become quite significa nt
in other types of networks. Hence, DDBMS query optimization algorithms consider the goal of
reducing the amount of data transfer as an optimization criterion in choosing a distributed query
execution strategy.
We illustrate this with two simple sample queries. Suppose that the EMPLOYEE and
DEPARTMENT relations in Figure 3.5 are distributed at two sites as shown in Figure 25.10. We
will assume in this example that neither relation is fragmented. According to Figure 25.10, the size
of the EMPLOYEE relation is 100 × 10,000 = 10^6 bytes, and the size of the DEPARTMENT relation
is 35 × 100 = 3,500 bytes. Consider the query Q:
For each employee, retrieve the employee name and the name of the department for which the
employee works. This can be stated as follows in the relational algebra:

The result of this query will include 10,000 records, assuming that every employee is related to a
department. Suppose that each record in the query result is 40 bytes long.

Table 6.3: Data Transfer Costs of Distributed Query Processing (Ramez Elmasri, 2011)
The query is submitted at a distinct site 3, which is called the result site because the query result is
needed there. Neither the EMPLOYEE nor the DEPARTMENT relations reside at site 3. There are
three simple strategies for executing this distributed query:
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site, and perform
the join at site 3. In this case, a total of 1,000,000 + 3,500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site
3. The size of the query result is 40 × 10,000 = 400,000 bytes, so 400,000 + 1,000,000 =
1,400,000 bytes must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to
site 3. In this case, 400,000 + 3,500 = 403,500 bytes must be transferred.
If minimizing the amount of data transfer is our optimization criterion, we should choose strategy
3. Now consider another query Q′: For each department, retrieve the department name and the name
of the department manager. This can be stated as follows in the relational algebra:

Again, suppose that the query is submitted at site 3. The same three strategies for executing query
Q apply to Q′, except that the result of Q′ includes only 100 records, assuming that each department
has a manager:
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site, and perform
the join at site 3. In this case, a total of 1,000,000 + 3,500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site
3. The size of the query result is 40 × 100 = 4,000 bytes, so 4,000 + 1,000,000 = 1,004,000 bytes
must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to
site 3. In this case, 4,000 + 3,500 = 7,500 bytes must be transferred.
Again, we would choose strategy 3, this time by an overwhelming margin over strategies 1 and 2.
The preceding three strategies are the most obvious ones for the case where the result site (site 3) is
different from all the sites that contain files involved in the query (sites 1 and 2). However, suppose
that the result site is site 2; then we have two simple strategies:
1. Transfer the EMPLOYEE relation to site 2, execute the query, and present the result to the user
at site 2. Here, the same number of bytes, 1,000,000, must be transferred for both Q and Q′.
2. Transfer the DEPARTMENT relation to site 1, execute the query at site 1, and send the result
back to site 2. In this case 400,000 + 3,500 = 403,500 bytes must be transferred for Q and 4,000
+ 3,500 = 7,500 bytes for Q′.
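The arithmetic behind the three result-site-3 strategies can be checked in a few lines of Python (sizes taken from the example above; the strategy labels are ours):

```python
# Reproduces the transfer-cost arithmetic from the example. Sizes in bytes:
# EMPLOYEE = 100 * 10,000, DEPARTMENT = 35 * 100; each joined result record
# is 40 bytes.

EMP = 100 * 10_000        # 1,000,000 bytes
DEPT = 35 * 100           # 3,500 bytes

def strategy_costs(result_records, record_size=40):
    result = result_records * record_size
    return {
        "1: ship both relations to site 3": EMP + DEPT,
        "2: ship EMPLOYEE to site 2, result to 3": EMP + result,
        "3: ship DEPARTMENT to site 1, result to 3": DEPT + result,
    }

q = strategy_costs(10_000)    # query Q: one record per employee
q2 = strategy_costs(100)      # query Q': one record per department

print(min(q, key=q.get), q[min(q, key=q.get)])     # strategy 3: 403,500 bytes
print(min(q2, key=q2.get), q2[min(q2, key=q2.get)])  # strategy 3: 7,500 bytes
```

Strategy 3 minimizes data transfer for both queries, matching the conclusion in the text.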

Review questions
1. Which one of the following best describes a distributed database system?
A. A collection of multiple databases that have a physical relationship and exist at different sites
B. A collection of multiple databases found at different sites whose data have no relationship to each other
C. The distributed database is managed by DBMS software at the different sites
D. In decision making, the distributed database is more transparent to the users
E. All of the above
F. None of the above
2. Which technologies are most used in a distributed database system?
A. User technology and database technology
B. Programming technology and database technology
C. Database technology and telecommunication technology
D. Programming technology and mobile technology
E. Programming technology and telecommunication technology
F. All of the above
3. The main conditions satisfied by a distributed database are:
A. The database nodes are connected through a computer programming language
B. The information of the database at the different nodes is consistent with each other
C. All sites must have the same data, hardware, and software in every respect
D. All of the above
E. None of the above
4. Which of the following is not an advantage of a distributed database system?
A. It improves the ease and flexibility of application development
B. It increases data reliability and availability
C. It makes all distributed database users use one centralized database
D. None of the above
E. All of the above
5. Distributed databases are typed as homogeneous and heterogeneous based on:
A. The physical location in which the distributed database is located
B. The logical relationship among the distributed databases
C. The software needed to implement the distributed database
D. All of the above
E. None of the above
6. Among the layers involved in distributed query processing, the data fragmentation schema is used in which layer?
A. The query decomposition layer
B. The data localization layer
C. The global optimization layer
D. The distributed execution layer
7. Among the layers involved in distributed query processing, which one takes place at the local sites?
A. The query decomposition layer
B. The data localization layer
C. The global optimization layer
D. The distributed execution layer
8. Which one of the following is the process of assigning each fragment, or each copy of a fragment, to a particular site?
A. Replication
B. Fragmentation
C. Duplication
D. Allocation
E. Data distribution
F. D and E
9. What is the main disadvantage of replication in a distributed database?
A. It slows down access to data over long distances
B. It slows down updates of data at the different sites
C. It increases the waiting time of users while they access the data
D. All of the above

Chapter 7: Spatial/Multimedia/Mobile Databases
By the end of this chapter, the students will be able to understand:
Spatial data models and spatial queries; multimedia data
sources; mobile databases and data processing

7.1. Introduction
Spatial data is data associated with geographic locations such as cities and towns. A spatial database is
optimized to store and query data representing objects that are defined in a geometric space.

7.1.1. Multimedia Database


A multimedia database is a collection of interrelated multimedia data that includes text, graphics
(sketches, drawings), images, animations, video, and audio, and may involve vast amounts of
multisource multimedia data. The framework that manages different types of multimedia data that
can be stored, delivered, and utilized in different ways is known as a multimedia database
management system. There are three classes of multimedia database: static media, dynamic media,
and dimensional media.
Content of Multimedia Database management system:
1. Media data – The actual data representing an object.
2. Media format data – Information such as sampling rate, resolution, encoding scheme
etc. about the format of the media data after it goes through the acquisition, processing
and encoding phase.
3. Media keyword data – Keywords description relating to the generation of data. It is
also known as content descriptive data. Example: date, time and place of recording.
4. Media feature data – Content dependent data such as the distribution of colors, kinds
of texture and different shapes present in data.
Types of multimedia applications based on data management characteristic are:
1. Repository applications – A Large amount of multimedia data as well as meta-data (Media
format date, Media keyword data, and Media feature data) that are stored for retrieval
purpose, e.g., Repository of satellite images, engineering drawings, and radiology scanned
pictures.
2. Presentation applications – They involve delivery of multimedia data subject to temporal
constraint. Optimal viewing or listening requires DBMS to deliver data at certain rate
offering the quality of service above a certain threshold. Here data is processed as it is
delivered. Example: Annotating of video and audio data, real-time editing analysis.
3. Collaborative work using multime dia information – It involves executing a complex task
by merging drawings, changing notifications. Example: Intelligent healthcare network.

There are still many challenges to multimedia databases, some of which are:

Modeling – Work in this area can improve database techniques versus information-retrieval
techniques; documents constitute a specialized area and deserve special consideration.
Design – The conceptual, logical, and physical design of multimedia databases has not yet been
fully addressed, and performance and tuning issues at each level are far more complex: multimedia
data comes in a variety of formats, such as JPEG, GIF, PNG, and MPEG, which are not easy to
convert from one form to another.
Storage – Storage of a multimedia database on any standard disk presents problems of
representation, compression, mapping to device hierarchies, archiving, and buffering during
input-output operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows untyped
bitmaps to be stored and retrieved.
Performance – For an application involving video playback or audio-video synchronization,
physical limitations dominate. The use of parallel processing may alleviate some problems but
such techniques are not yet fully developed. Apart from this, multimedia databases consume a
lot of processing time as well as bandwidth.
Queries and retrieval – For multimedia data such as images, video, and audio, accessing data
through queries opens up many issues, such as efficient query formulation, query execution, and
optimization, which need to be worked on.

Areas where multimedia database is applied are:

Documents and record management: Industries and businesses that keep detailed records and a
variety of documents. Example: Insurance claim records.

Knowledge dissemination: Multimedia database is a very effective tool for knowledge


dissemination in terms of providing several resources. Example: Electronic books.

Education and training: Computer-aided learning materials can be designed using multimed ia
sources which are nowadays very popular sources of learning. Example: Digital libraries.

Marketing, advertising, retailing, entertainment and travel. Example: a virtual tour of cities.

Real-time control and monitoring: Coupled with active database technology, multimed ia
presentation of information can be very effective means for monitoring and controlling complex
tasks Example: Manufacturing operation control.

7.1.2. Spatial data

Spatial data includes geographic data, such as maps, and computer-aided designs, such as integrated
circuit designs or building designs. Spatial data were initially stored as files in a file system, but as the
complexity and volume of the data, and the number of users, have grown, such ad hoc approaches to
storing and retrieving data from a file system have proved insufficient.

7.1.3. Characteristics of Spatial Database


A spatial database system has the following characteristics
 It is a database system
 It offers spatial data types (SDTs) in its data model and query language.
 It supports spatial data types in its implementation, providing at least spatial indexing and
efficient algorithms for spatial join.

Example

A road map is a visualization of geographic information. A road map is a 2-dimensional object which
contains points, lines, and polygons that can represent cities, roads, and political boundaries such as
states or provinces.
In general, spatial data can be of two types −

 Vector data: This data is represented as discrete points, lines and polygons

 Raster data: This data is represented as a matrix of square cells.
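As a minimal illustration of the kind of query a spatial database supports, the sketch below (hypothetical data; a real system would use a spatial index such as an R-tree rather than a linear scan) answers a rectangular window query over vector point data:

```python
# A naive spatial range (window) query over vector point data. The point
# set and coordinates are made up for illustration.

cities = {                      # point data: name -> (x, y)
    "A": (2.0, 3.0),
    "B": (8.0, 1.0),
    "C": (4.5, 4.5),
}

def window_query(points, xmin, ymin, xmax, ymax):
    """Return the names of all points inside the axis-aligned rectangle."""
    return sorted(name for name, (x, y) in points.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)

print(window_query(cities, 0, 0, 5, 5))   # ['A', 'C']
```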

Table 7.1: Spatial data

The spatial data in the form of points, lines, polygons etc. is used by many different databases as
shown above.
7.1.4. TEMPORAL DATABASES

A temporal database is a database with built-in-support for handling data involving time.

 Typically, databases model only one state – the current state of the real world and don’t
store information about past states.
 When state of the real world changes, the database gets updated and information about old
state gets lost. However, it is also important to store and retrieve information about current
and past states.
Examples:
 Patient databases must store information about the medical history of patients.
 Judicial records.
 Various sensory information.

So we define a temporal database as a database that stores the states of the real world across time.

Temporal aspects in databases usually include:

 Valid Time.
 Transaction Time.
 Bi-temporal Data.

Table 7.1: Temporal Databases

7.1.5. Bi-temporal Relation

We can also store both valid time and transaction time in a database. Such a relation, involving
both transaction time and valid time, is known as a bi-temporal relation.

Table 7.2: Bi-temporal Relation

STORY
 Story of Mr. X:
 Born on 3rd April 1971 at Pulwama.
 Father of Mr. X registers his D.O.B. on 4th April 1971 at Pulwama.
 Mr. X completes his graduation in August 1990.
 For job purposes Mr. X moves to Srinagar in August 1991, but forgets to register his
new address officially.
 It is on December 25, 1991 that he registers the new address officially.
 Unfortunately, Mr. X is accidentally hit by a speeding car on April 1, 2001.
 The coroner reports the date of death on the very same day.

DIFFERENCE OF STORING DATA


If we store this information about Mr. X in a current database, what will the relation look like?
And if we store the same information in a temporal database, what will the relation look like?

Table 7.3: Difference of Storing Data

7.1.6. VALID TIME

A valid time is a time for which a fact is true in the real world.
This time period may be in the past, or span the current time.

Since the official recording the birth doesn't know whether X will move to some other place, or
when he will move, Valid To is set to *:

–Person (X, Pulwama, 1971/04/03, *);

–Person (X, SXR, 1991/12/25, *);

–The original entry isn't deleted; instead, the state of the database is:

–Person (X, Pulwama, 1971/04/03, 1991/12/25);

–Person (X, SXR, 1991/12/25, *);

–When Mr. X dies, the database looks like:

–Person (X, Pulwama, 1971/04/03, 1991/12/25);

–Person (X,SXR, 1991/12/25, 2001/04/01);
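The valid-time bookkeeping above can be sketched in Python (a simplification of our own; dates are plain strings and None stands for the open "until changed" marker *):

```python
# Valid-time sketch for Mr. X: a logical update never deletes a tuple,
# it closes the old interval and inserts a new one.

person = [("X", "Pulwama", "1971/04/03", None)]   # initial registration; None = *

def move(table, name, new_city, date):
    """Close the currently open interval for `name` and open a new one."""
    for i, (n, city, start, end) in enumerate(table):
        if n == name and end is None:
            table[i] = (n, city, start, date)
    table.append((name, new_city, date, None))

def close(table, name, date):
    """Record death: close the open interval."""
    for i, (n, city, start, end) in enumerate(table):
        if n == name and end is None:
            table[i] = (n, city, start, date)

move(person, "X", "SXR", "1991/12/25")
close(person, "X", "2001/04/01")
print(person)
# [('X', 'Pulwama', '1971/04/03', '1991/12/25'),
#  ('X', 'SXR', '1991/12/25', '2001/04/01')]
```

The final state matches the two tuples shown above for the day Mr. X dies.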

7.1.7. TRANSACTION TIME

 Transaction Time records the time period during which a database entry is accepted as correct.
 Valid time doesn't record events that weren't made public; at a later stage, for auditing
purposes, transaction time will note such a record if it is found to be true.
 Suppose Mr. X moved to Dubai from 1 June 1997 to 1 June 2000, but to avoid
increased taxation he didn't report it to the authorities. In Feb. 2001 it was discovered
that Mr. X had in fact lived in Dubai for those years.
 Transaction Time will allow capturing this changing information in the database.
 Transaction Time takes into account each entry record when it was entered and when
it was superseded.

Table 7.4: TRANSACTION TIME
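A transaction-time sketch of the Dubai episode, in the same spirit (an illustration of our own; the field layout is assumed, and None marks the rows the database currently believes):

```python
# Transaction-time sketch: every tuple carries the transaction-time interval
# during which the database believed it. The late discovery in Feb. 2001
# supersedes the old record instead of overwriting it.

log = []   # rows: [city, valid_from, valid_to, tt_start, tt_end]

def record(city, vfrom, vto, now):
    log.append([city, vfrom, vto, now, None])   # tt_end=None: current belief

def supersede(now):
    for row in log:
        if row[4] is None:
            row[4] = now                        # belief ends at correction time

# Until Feb. 2001 the database believed X stayed in SXR the whole time:
record("SXR", "1991/12/25", None, "1991/12/25")
# Feb. 2001: the Dubai stay is discovered; the old belief is superseded
# and the corrected history is recorded.
supersede("2001/02/01")
record("SXR", "1991/12/25", "1997/06/01", "2001/02/01")
record("Dubai", "1997/06/01", "2000/06/01", "2001/02/01")
record("SXR", "2000/06/01", None, "2001/02/01")

current = [r for r in log if r[4] is None]
print(len(log), len(current))   # 4 3
```

All four rows are kept, so an auditor can still ask what the database believed before the correction.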

7.1.8. TIME SPECIFICATION IN SQL

 Date: four digits for the year (1--9999), two digits for the month (1--12), and two digits for the
date (1--31).
 Time: two digits for the hour, two digits for the minute, and two digits for the second, plus
optional fractional digits.
 Timestamp: the fields of date and time, with six fractional digits for the seconds field.
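As an illustrative cross-check (using Python's datetime module as an analogy, not a DBMS), the three SQL types correspond to ISO-style literals:

```python
# Mapping the SQL DATE / TIME / TIMESTAMP shapes above to ISO-style strings.
from datetime import date, time, datetime

d = date(2001, 4, 1)                            # SQL DATE
t = time(14, 30, 5)                             # SQL TIME
ts = datetime(2001, 4, 1, 14, 30, 5, 123456)    # SQL TIMESTAMP (6 fractional digits)

print(d.isoformat())    # 2001-04-01
print(t.isoformat())    # 14:30:05
print(ts.isoformat())   # 2001-04-01T14:30:05.123456
```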

7.1.9. MULTIMEDIA DATABASES

Multimedia data – images, audio, and video – are the most popular and increasingly preferred
forms of data these days.
 One approach to storing multimedia data is in file systems.

Because the size of this type of media is large, in fact very large, such data were generally stored
outside the database in file systems. General database features – transactional updates and
querying facilities – are seriously limited when the multimedia object size is very large.

Second approach

 Each media file has some descriptive attributes (file/data statistics) that include when a
particular object was created, who created it, and to which category it belongs.
 In this approach we use a database for storing the descriptive attributes and for keeping
track of the files in which the multimedia objects are stored.

Drawbacks

What if the object is missing or corrupted while the database still points to the location as if the
multimedia data existed?
The better approach is to store the multimedia data in the database itself.
 Issues that need to be addressed –
 Databases must support Large Objects

Given that the size of multimedia data objects can be in GBs, note that many database systems
don't support objects larger than a few GBs.

So larger objects must be split into smaller pieces and stored in the database.
Alternatively, the multimedia object may be stored in a file system, but the database may contain a
pointer to the object – the pointer being essentially a file name.

SQL/MED (Management of External Data) is a standard that allows external data such as

files to be treated as if they were part of the database.


Since the size of multimedia data is very large, it is essential that multimedia data is stored and
transmitted in compressed form.
Images are widely stored in JPEG format.

a) JPEG – Joint Photographic Experts Group.

b) MPEG – Moving Picture Experts Group, which developed the MPEG series of standards for
encoding video and audio data. The MPEG formats achieve a high degree of compression.

MPEG-MOVING PICTURE EXPERTS GROUP

MPEG-1 stores a minute of 30-frame-per-second video and audio in approximately 12.5 MB,
compared to about 75 MB for the same length of data stored as JPEG frames.
This huge difference is due to the fact that successive frames are nearly the same: JPEG compresses
each frame independently, whereas MPEG exploits the commonalities among a sequence of frames.

MPEG-1 is lossy – the quality of the video isn't high; it is comparable to VHS
videotape.
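A quick back-of-the-envelope check of these figures (decimal megabytes assumed):

```python
# Convert the per-minute sizes quoted above into average bitrates and compare.

def mbit_per_s(mb_per_minute):
    return mb_per_minute * 8 / 60          # MB/min -> Mbit/s (decimal MB)

mpeg1 = mbit_per_s(12.5)                   # MPEG-1: ~12.5 MB per minute
mjpeg = mbit_per_s(75)                     # frame-by-frame JPEG: ~75 MB per minute

print(round(mpeg1, 2), round(mjpeg, 2), round(mjpeg / mpeg1, 1))
# 1.67 10.0 6.0  -> roughly a 6x saving from exploiting inter-frame commonality
```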

MPEG-2 standard is designed for digital broadcast systems and DVDs.

 MPEG-2 experiences negligible loss of video quality.


 A minute of video is compressed to approximately 17 MB.

MPEG-4 provides techniques for further compression of video with variable bandwidth to support
delivery of video data over networks.

7.1.10. CONTINUOUS MEDIA DATA

 The most important types of continuous media data are video and audio data.
 Continuous media systems are characterized by their real-time information-delivery
requirements.
 By continuous media systems we mean that data must be delivered smoothly, i.e., with no gaps
in the audio or video.
 Data must be delivered at a rate that doesn't cause overflow of system buffers.
 Synchronization among distinct data streams must be maintained – e.g., lip synchronization
of audio and video.

7.1.11. DATA RETRIEVAL

Usually data are fetched in periodic cycles. Each cycle may consist of some time period – say
'n' seconds.
Data will be fetched from database and stored in Memory Buffers.
This stored data will be sent to consumer’s display unit.
Finally the display unit displays the content.

Time Period

If the time period/periodic cycle is small, more disk-arm movements are required, i.e., resources
are wasted.
If the time period/periodic cycle is large, a large memory buffer is required, plus there is a large
initial delay.
Admission Control: When a new request arrives, admission control comes into play: the
system checks whether the request can be satisfied with the available resources; if so, it is admitted,
otherwise it is rejected.
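A minimal admission-control sketch (illustrative only; a real server would also budget buffer space and disk-arm time per cycle):

```python
# Admit a new stream only if its required delivery rate fits within the
# remaining server bandwidth; otherwise reject it. Capacities are made up.

class AdmissionController:
    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.used = 0.0

    def request(self, rate_mbps):
        if self.used + rate_mbps <= self.capacity:
            self.used += rate_mbps
            return True          # admitted
        return False             # rejected: would overflow available resources

ac = AdmissionController(capacity_mbps=10.0)
print(ac.request(4.0))   # True
print(ac.request(4.0))   # True
print(ac.request(4.0))   # False (8 + 4 > 10)
```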

7.1.12. CONTENT DELIVERY NETWORKS

Content Delivery Networks provide content on demand.


It comprises:
 Video server: Multimedia data are stored on several disks – in a RAID configuration or even
cloud storage.
 Terminals: People view multimedia through various devices known as terminals – PCs,
TVs, and set-top boxes.
 Network: Transmission of multimedia data from a server to multiple terminals requires a
high capacity network.

7.1.13. SIMILARITY-BASED RETRIEVAL

Pictorial Data:

 Two pictures or images that are slightly different as represented in the database may be
considered the same by a user. When a new trademark is to be registered, the system may first
need to identify all similar trademarks that were registered previously.

Audio Data:

Speech-recognition interfaces have been developed that allow the user to give a command or
identify a data item by speaking. The input from the user must then be tested for similarity to the
commands stored in the system.

Handwritten Data:

 Signatures are used to authenticate a particular customer, e.g., for bank account
validation.
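A common way to implement similarity-based retrieval is to reduce each item to a feature vector and return the nearest stored vector. The sketch below (made-up feature vectors; Euclidean distance chosen for simplicity) finds the most similar registered trademark:

```python
# Nearest-neighbour retrieval over feature vectors (e.g., colour histograms
# extracted from trademark images). Data and names are hypothetical.
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

trademarks = {
    "logo_a": (0.9, 0.1, 0.0),
    "logo_b": (0.2, 0.7, 0.1),
    "logo_c": (0.1, 0.1, 0.8),
}

def most_similar(query, db):
    return min(db, key=lambda name: distance(query, db[name]))

print(most_similar((0.85, 0.15, 0.0), trademarks))   # logo_a
```

A production system would index the vectors (e.g., with an R-tree or a dedicated similarity index) instead of scanning them linearly.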

7.1.14. MOBILITY AND PERSONAL DATABASES

Personal databases have come to exist due to the fact that:

Laptops, notebooks, and PCs are in widespread use.

Cell phones with the capabilities of a computer are in widespread use.

Mobile Computing

A technology we use every moment in today's world.

Wireless technology has created a situation where machines no longer have fixed locations or
network addresses.

7.2. MOBILITY

Location-dependent queries are a distinct class of queries in which the location of the user
(computer) is a parameter of the query.

 Value of the location parameter is provided by a Global Positioning System (GPS).

Examples – MakeMyTrip and Goibibo hotel services, which provide data on hotels and roadside
services to users travelling to a particular destination.

Processing queries about services ahead on the current route is subject to constraints such as
the user's direction of motion and speed.

In mobile computing, energy (battery lifetime) is a scarce resource. This strongly influences the
system design architecture and the protocols used to communicate with mobile devices.
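A location-dependent query can be sketched as a distance filter over the user's GPS position. The hotel names, coordinates and radius below are illustrative assumptions; the haversine formula is the standard great-circle distance over latitude/longitude:

```python
# Location-dependent query sketch: "hotels within R km of my GPS fix".
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical hotel table: (name, latitude, longitude).
hotels = [("Hotel A", 9.03, 38.74), ("Hotel B", 9.60, 41.87)]

def hotels_near(user_lat, user_lon, radius_km):
    """The user's position is a query parameter, supplied by GPS."""
    return [name for name, lat, lon in hotels
            if haversine_km(user_lat, user_lon, lat, lon) <= radius_km]

print(hotels_near(9.02, 38.75, 10))  # only hotels within 10 km
```

Because the GPS fix changes as the user moves, the same query text yields different answers at different locations – the defining property of a location-dependent query.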

7.2.1. MODEL OF MOBILE COMPUTING

A mobile-computing environment consists of mobile computers known as mobile hosts.

 The mobile hosts communicate with the wired network via computers known as mobile
support stations.
 Cell: the geographical area that a particular mobile support station covers and supports.
 Handoffs are important to support mobility of services among users.
 It is also possible for mobile hosts to communicate directly, without the intervention of a
mobile support station. This is possible only for short-range communication such as Bluetooth
(a range of about 10 m and a speed of 721 Kbps).

Bluetooth, wireless LANs, and 2.5G and 3G cellular networks make it possible for a wide
variety of devices to communicate at low cost. The accounting, monitoring and management
data pertaining to this communication generate huge databases.

 To improve the energy efficiency of mobile devices, it is preferable to use low-power flash
memory and to power down any component that has not been used for a while.
 WAP (Wireless Application Protocol) is a standard protocol for wireless Internet access that
takes into account the constraints of mobile and wireless Web browsing.

7.3. SPATIAL AND GEOGRAPHIC DATA

Spatial data support in databases is important for efficiently storing, indexing and querying
data on the basis of spatial location.

Two types of spatial data are particularly important:

 Computer-aided-design (CAD) data: includes spatial information about how objects – such
as buildings, cars or aircraft – are constructed. Integrated-circuit and electronic-device
layouts are other examples of CAD data.
 Geographic data: such as road maps, land-usage maps, topographic elevation maps,
political maps showing boundaries, land-ownership maps, and so on.

Geographic information systems are special-purpose databases tailored for storing geographic
data.

7.3.1. GEOGRAPHIC DATA AND APPLICATIONS

Geographic data are spatial in nature and take the form of maps and satellite images.

Maps in particular not only provide location information such as boundaries, rivers and roads,
but also detailed information such as elevation, soil type, land usage and annual rainfall.

Applications

 Online map services and vehicle navigation systems.
 Distributed-network information for public-service utilities such as electric power and
water-supply systems.
 Land-usage information for ecologists and planners.

Geographic data are categorized into two types:

Raster Data

 Raster data consist of bit maps or pixel maps in two or more dimensions. A typical example
is a satellite image of an area, which includes not only the actual image but also information
such as the latitude and longitude of its corners.
 Raster data is often represented as tiles, each covering a fixed size area. A larger area can be
displayed by displaying all the tiles overlapping that area.
 To allow display of the data at different zoom levels, a separate set of tiles is created for each
zoom level. Once the user sets a zoom level, the tiles at that level that overlap the area being
displayed are retrieved and displayed.
 Raster data can be 3-dimensional.
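The tile-retrieval idea above can be sketched as follows. The 256-pixel tile size and the power-of-two zoom scheme are assumptions modeled on common web-mapping practice, not a requirement of raster data:

```python
# Raster-tile sketch: each zoom level has its own grid of fixed-size
# tiles; to draw a viewport we fetch every tile that overlaps it.

def tiles_for_view(x_min, y_min, x_max, y_max, zoom, tile_size=256):
    """Return (zoom, col, row) keys of tiles overlapping the viewport.

    At zoom z, one world unit maps to 2**z pixels, so a tile covers
    fewer world units (finer detail) as the user zooms in.
    """
    world_per_tile = tile_size / (2 ** zoom)
    c0, c1 = int(x_min // world_per_tile), int(x_max // world_per_tile)
    r0, r1 = int(y_min // world_per_tile), int(y_max // world_per_tile)
    return [(zoom, c, r)
            for c in range(c0, c1 + 1)
            for r in range(r0, r1 + 1)]

# Zooming in over the same area requires more, finer tiles.
print(len(tiles_for_view(0, 0, 500, 500, zoom=1)))  # coarse: few tiles
print(len(tiles_for_view(0, 0, 500, 500, zoom=3)))  # finer: many tiles
```

The returned `(zoom, col, row)` keys would be used to look the tile images up in the database, which is why a separate tile set is stored per zoom level.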

Vector Data

 Vector data are constructed from basic geometric objects such as points, line segments,
polylines, triangles and other polygons in 2-D, and cylinders, spheres and cuboids in 3-D.
 Map data are often represented in vector format, with roads represented as polylines.
 Geographic features such as large lakes, states and countries are represented as complex
polygons, while rivers may be represented as complex curves.
 Topographical information, i.e., information about the elevation of each point on a surface,
can be represented in raster form.

Nearness Queries

A nearness query requests objects that lie near a specified location. A query to find all ATMs
that lie within a given distance of a point is an example of a nearness query.

The nearest-neighbor query requests the object that is nearest to a specified point.

Example – we want to find the nearest recharge point.
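A minimal sketch of a nearest-neighbor query, using a linear scan over illustrative ATM coordinates; a production spatial database would answer this with a spatial index such as an R-tree rather than scanning every object:

```python
# Nearest-neighbour query sketch: find the ATM closest to a query point.
import math

# Hypothetical ATM table: name -> (x, y) coordinates.
atms = {"ATM-1": (2.0, 3.0), "ATM-2": (5.0, 1.0), "ATM-3": (0.5, 0.5)}

def nearest(point):
    """Return the ATM with the smallest Euclidean distance to `point`."""
    return min(atms, key=lambda name: math.dist(point, atms[name]))

print(nearest((1.0, 1.0)))  # ATM-3 is the closest to (1, 1)
```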

Region Queries: region queries deal with spatial regions. Such a query can ask for objects that
lie partially or fully inside a specified region.

Example – A query to find all retail shops within the geographic boundaries of a given town.

Review Questions

1. Discuss in detail the emerging trends in databases.
2. Discuss time specification in SQL.
3. Discuss multimedia databases.
4. Discuss mobility and personal databases.
5. Discuss the model of mobile computing.
1. Which of the following image formats is lossy?
A. GIF
B. MPEG
C. JPEG
D. PNG
2. MP3 is an extension of a _______ file.
A. Video file
B. Graphics image
C. Audio file
D. Text file

3. A structure of linked elements through which the user can navigate, interactive multimed ia
becomes ______.
A. Hypermedia
B. Hypertext
C. Inter media
D. Digital media
4. Moving Picture Experts Group (MPEG-2), was designed for high-quality DVD with a data rate
of
A. 3 to 6 Mbps
B. 4 to 6 Mbps
C. 5 to 6 Mbps
D. 6 to 6 Mbps
