[Figure: an object represents a real-world entity and has both state and behavior]
© e-Learning Centre, UCSC
Object Databases - Overview of Object Database Concepts
• Object Identity
• Database objects need to correspond to the real-world objects they represent so that their integrity and identity are preserved, and so that they can easily be identified and operated on. Therefore, each independent object stored in the database is assigned a unique identity, known as the Object Identifier (OID).
• The OID is generally system generated.
• The value of the OID may be hidden from external users; however, it is used internally by the system to uniquely identify each object and to create and manage inter-object references.
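As a rough sketch (not from the slides), the behaviour of a system-generated, immutable OID can be illustrated in Python; `uuid4` here stands in for whatever generation scheme a real ODBMS uses, and `DBObject` is a hypothetical name:

```python
import uuid

class DBObject:
    """A stored object with a system-generated, immutable OID."""
    def __init__(self, state):
        self._oid = uuid.uuid4().hex  # assigned once; never changed or reused
        self.state = state            # attribute values may change freely

    @property
    def oid(self):
        return self._oid

emp = DBObject({"name": "Amal", "salary": 50_000})
before = emp.oid
emp.state["salary"] = 60_000        # the state changes...
assert emp.oid == before            # ...but the identity does not
assert DBObject({}).oid != emp.oid  # every object gets a unique OID
```

The point of the sketch is that identity is separate from state: no attribute value participates in the OID.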
• Object Identifier (OID)
• Main properties of an OID:
‒ Immutable - the value of the OID does not change.
‒ Unique - each OID value is used only once; the OID of a deleted object is not assigned to any new object in the future.
• These two properties imply that an OID is independent of any attribute value of the object, because attribute values may change over time.
• An object database system must have a mechanism for generating OIDs that preserves their immutability. Since an object retains its identifier over its lifetime, the object remains the same despite changes in its state.
• An OID is similar to a primary key attribute in relational databases, which is used to uniquely identify tuples.
Literals
• The Object Model supports different literal types, which are treated as attribute values.
• Literals are embedded inside objects, and the object model makes it possible to define complex structured literals within an object.
• Literals do not have identifiers (OIDs) and therefore cannot be individually referenced from other objects the way objects can.
Literals
The literal types supported by the Object Model are
– Single-valued or atomic types where each value of the
type is considered as an atomic (indivisible) single value.
– Struct (or tuple) constructor which is used to create
standard structured types, such as the tuples (record types)
in the basic relational model.
– Collection (or multivalued) type constructors which
include the set(T), list(T), bag(T), array(T), and
dictionary(K,T) type constructors.
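These three literal categories map naturally onto familiar programming-language values. A rough Python analogy (illustrative only, not ODMG syntax):

```python
# Atomic (single-valued) literals: indivisible single values
dname = "HR"          # string
dnumber = 1           # integer

# Struct (tuple) literal: named fields, like a record type
mgr = {"Manager": "E1001", "Start_date": "2020-01-01"}

# Collection (multivalued) literals: set(T), list(T), bag/array, dictionary(K, T)
locations = {"Colombo", "Kandy"}        # set(string)
employees = ["E1001", "E1002"]          # list(EMPLOYEE references)
phone_book = {"E1001": "011-2345678"}   # dictionary(K, T)

assert "Colombo" in locations
```

The employee IDs and field names above are invented for illustration.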
Complex Object
The literal types supported by the object model enable the definition of complex objects, which may consist of other objects, as illustrated by the following DEPARTMENT example.

define type DEPARTMENT
    tuple ( Dname: string;                        (* atomic type *)
            Dnumber: integer;
            Mgr: tuple ( Manager: EMPLOYEE;       (* tuple type *)
                         Start_date: DATE; );
            Locations: set(string);               (* set type *)
            Employees: set(EMPLOYEE);
            Projects: set(PROJECT); );
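A similar nested structure could be sketched with Python dataclasses. This is only an analogy to the ODL definition above, with EMPLOYEE/PROJECT references simplified to strings:

```python
from dataclasses import dataclass, field

@dataclass
class Mgr:               # the nested tuple type
    Manager: str         # an EMPLOYEE reference, simplified to a string here
    Start_date: str

@dataclass
class Department:
    Dname: str           # atomic type
    Dnumber: int
    Mgr: Mgr             # tuple type
    Locations: set = field(default_factory=set)   # set(string)
    Employees: set = field(default_factory=set)   # set(EMPLOYEE)
    Projects: set = field(default_factory=set)    # set(PROJECT)

d = Department("Finance", 2, Mgr("E1002", "2019-05-01"), {"Colombo"})
assert d.Mgr.Manager == "E1002"
```

The sample values ("Finance", "E1002", "Colombo") are invented for illustration.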
Activity
Encapsulation
• The concept of encapsulation is applied to database objects: it is the mechanism that binds together code and the data it manipulates.
• Thus, encapsulation defines the behavior of a type of object in terms of the operations that can be externally applied to objects of that type.
• Encapsulation provides a form of data and operation independence. To support it, each operation is defined in two parts, namely the signature and the method.
Encapsulation
‒ Signature/Interface of the operation - specifies the name of the operation and its arguments (parameters).
‒ Method/Body - specifies the implementation of the operation.
• External programs invoke operations by sending messages to the objects; a message includes the operation name and the parameters.
• Thus, encapsulation restricts direct access to the values of an object, since the values must be accessed through the predefined methods.
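A minimal Python sketch of the idea (the `Account` class and its operations are invented for illustration; Python hides attributes only by convention, not by enforcement):

```python
class Account:
    """Encapsulation sketch: _balance is hidden; access goes through operations."""
    def __init__(self, balance):
        self._balance = balance   # hidden attribute (by convention)

    # signature: deposit(amount) -> new balance; the body below is the method
    def deposit(self, amount):
        self._balance += amount
        return self._balance

    def get_balance(self):        # a predefined read operation
        return self._balance

a = Account(100)
a.deposit(50)                     # a "message" invoking an operation
assert a.get_balance() == 150
```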
Encapsulation
• For database applications, the requirement that all objects be
completely encapsulated is too strict.
• This requirement is relaxed by dividing the structure of an object
into visible and hidden attributes (instance variables).
• Visible attributes can be seen by and are directly accessible to
the database users and programmers via the query language.
• The hidden attributes of an object are completely encapsulated
and can be accessed only through predefined operations.
Class
Class
• The operations defined in a class may include the
– object constructor operation (often called new), which is
used to create a new object
– destructor operation, which is used to destroy (delete) an
object.
– A number of object modifier operations can also be
declared to modify the states (values) of various attributes of
an object.
– Additional operations can retrieve information about the
object.
The following example shows how the type definitions can be extended with operations to define classes. An operation is applied to an object by using the dot notation. For example, if d is a reference to a DEPARTMENT object, an operation such as no_of_emps can be invoked by writing d.no_of_emps.

define class DEPARTMENT
    type tuple ( Dname: string;
                 Dnumber: integer;
                 Mgr: tuple ( Manager: EMPLOYEE;
                              Start_date: DATE; );
                 Locations: set(string);
                 Employees: set(EMPLOYEE);
                 Projects: set(PROJECT); );
    operations no_of_emps: integer;
               create_dept: DEPARTMENT;
               delete_dept: boolean;
               assign_emp(e: EMPLOYEE): boolean;
               (* adds an employee to the department *)
               remove_emp(e: EMPLOYEE): boolean;
               (* removes an employee from the department *)
end DEPARTMENT;
• Encapsulation of Operations

define class Student
    type tuple ( Firstname: string;
                 Lname: string;
                 NIC: string;
                 Birth_date: DATE;
                 Address: string;
                 Gender: char;
                 Dept: DEPARTMENT; );
    operations age: integer;
               create_stu: Student;
               destroy_stu: boolean;
end Student;
Activity
• Persistence of Objects
• Transient objects - in an OOPL, objects exist only while the program is running; hence these objects are known as transient objects.
• Persistent objects - objects that continue to exist even after the termination of the program.
• There are two mechanisms for making an object persistent:
i) Naming Mechanism
ii) Reachability Mechanism
Persistence of Objects
i) Naming Mechanism
• This mechanism assigns an object a name that is unique within the database. An operation or a statement can be used to specify the name.
• Users and applications perform database access through
the named persistent objects which are used as entry points
to the database.
• However, it is not practical to name all objects in a large
database that includes thousands of objects. Therefore,
most objects are made persistent by using the second
mechanism, called reachability.
Persistence of Objects
ii) Reachability Mechanism
• The reachability mechanism works by making the object
reachable from some other persistent object.
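The idea can be sketched in Python as a graph traversal from a named root; the class and object names below are invented for illustration, and a real ODBMS of course does much more than compute a reachable set:

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)     # inter-object references

def reachable(root):
    """Everything reachable from a persistent root is itself persistent."""
    seen, stack = set(), [root]
    while stack:
        o = stack.pop()
        if o.name not in seen:
            seen.add(o.name)
            stack.extend(o.refs)
    return seen

d1 = Obj("dept1", [Obj("emp1"), Obj("emp2")])
ALL_DEPARTMENTS = Obj("ALL_DEPARTMENTS", [d1])   # named entry point
persistent = reachable(ALL_DEPARTMENTS)
assert persistent == {"ALL_DEPARTMENTS", "dept1", "emp1", "emp2"}

orphan = Obj("temp")               # referenced by nothing persistent: transient
assert "temp" not in persistent
```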
Class DEPARTMENT_SET is defined as a collection of DEPARTMENT objects. A persistent collection of objects of class DEPARTMENT is known as an extent.

define class DEPARTMENT_SET
    type set (DEPARTMENT);
    operations add_dept(d: DEPARTMENT): boolean;
               (* adds a department to the DEPARTMENT_SET object *)
               remove_dept(d: DEPARTMENT): boolean;
               (* removes a department from the DEPARTMENT_SET object *)
               create_dept_set: DEPARTMENT_SET;
               destroy_dept_set: boolean;
end DEPARTMENT_SET;

(* The DEPARTMENT class, with operations such as no_of_emps, create_dept, destroy_dept, assign_emp and remove_emp, is as defined earlier. *)

persistent name ALL_DEPARTMENTS: DEPARTMENT_SET;
(* ALL_DEPARTMENTS is a persistent named object of type DEPARTMENT_SET *)
…
d := create_dept;
(* create a new DEPARTMENT object in the variable d *)
…
b := ALL_DEPARTMENTS.add_dept(d);
(* the DEPARTMENT object d is added to the set ALL_DEPARTMENTS by using the add_dept operation *)
Type Hierarchies and Inheritance
• Type/class hierarchies and inheritance are two key features of object databases; inheritance gives rise to type/class hierarchies.
• Inheritance is a key concept in the object model which allows a new class to inherit the structure and/or operations of previously defined classes.
• Inheritance promotes the reuse of existing type definitions and eases the incremental development of data types.
[Figure: two shape types, Rectangle (Length: int, Width: int) and Triangle (Base: int, Height: int)]
[Figure: STUDENT and EMPLOYEE both inherit the attributes NIC, Name, Address and Birthdate; STUDENT additionally has RegNo, IndexNo and Gpa, while EMPLOYEE additionally has Empid, Salary and Hire_date]
[Figure: type Geometric_Obj (Name, Colour, Area; operation Cal_area) with subtypes Circle_Obj (Radius; GetRadius(rad: float)) and Rectangle_Obj (width, height; GetWidth(w: float), Getheight(h: float))]
Activity
Multiple Inheritance
• Multiple inheritance takes place when a subtype inherits
from two or more supertypes. In such cases, the subtype
may inherit all the functions of all its supertypes.
• For example, a subtype ENGINEERING_MANAGER may be a subtype of
both MANAGER and ENGINEER.
• Multiple inheritance produces a type lattice rather than a
type hierarchy.
[Figure: a type lattice in which ENGINEERING-MANAGER inherits from both ENGINEER and MANAGER, which in turn inherit from EMPLOYEE]
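Multiple inheritance can be sketched directly in Python (the method names `employer`, `discipline` and `manages` are invented for illustration):

```python
class Employee:
    def employer(self):
        return "UCSC"

class Engineer(Employee):
    def discipline(self):
        return "engineering"

class Manager(Employee):
    def manages(self):
        return True

class EngineeringManager(Engineer, Manager):   # inherits from both supertypes
    pass

em = EngineeringManager()
# The subtype has all the functions of all its supertypes.
assert em.discipline() == "engineering"
assert em.manages() is True
assert em.employer() == "UCSC"
```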
Selective inheritance
• Selective inheritance enables a subtype to inherit only some of
the functions of a supertype; the remaining functions are not inherited.
• The functions of a supertype that are not to be inherited by the
subtype are listed in an EXCEPT clause.
• The mechanism of selective inheritance is typically not provided in
ODBs; it is used more frequently in artificial intelligence
applications.
[Figure: Geometric_Obj (Name, Colour, Area; operation Cal_area) with subtypes Circle_Obj (Radius; GetRadius(radius: float); Cal_area(radius: float)) and Rectangle_Obj (width, height; GetWidth(width: float), Getheight(height: float); Cal_area(width: float, height: float)); each subtype redefines Cal_area for its own shape]
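The redefinition of Cal_area in each subtype can be sketched in Python; the same operation name is invoked on every shape, and each subtype supplies its own method (an illustrative analogy, not the deck's notation):

```python
import math

class GeometricObj:
    def __init__(self, name, colour):
        self.name, self.colour = name, colour
    def cal_area(self):            # redefined (overridden) in each subtype
        raise NotImplementedError

class CircleObj(GeometricObj):
    def __init__(self, name, colour, radius):
        super().__init__(name, colour)
        self.radius = radius
    def cal_area(self):
        return math.pi * self.radius ** 2

class RectangleObj(GeometricObj):
    def __init__(self, name, colour, width, height):
        super().__init__(name, colour)
        self.width, self.height = width, height
    def cal_area(self):
        return self.width * self.height

shapes = [CircleObj("c", "red", 1.0), RectangleObj("r", "blue", 2.0, 3.0)]
areas = [round(s.cal_area(), 2) for s in shapes]  # same call, different methods
assert areas == [3.14, 6.0]
```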
Activity
XML Databases - Reason for the Origination of XML
[Figure: an interactive web application produces different output for different user input]
Activity
XML Databases - Structured, Semi - Structured and
Unstructured Data
Structured Data
• Information in Relational Databases - Structured Data
• Each Data Record in the database follows the same
structure
Semi - Structured Data
• Some data is collected in unplanned situations. Therefore,
all the data may not have the same format.
• Data may have a certain structure. However, not all the
data has the same structure
E.g.: some records have additional attributes; some
attributes may be present only in some records.
• No predefined schema
• Data model to represent semi - structured data
- Tree data structure
- Graph data structure
[Figure: a tree representation of semi-structured data with leaf values 'Pick & Ride', P001 and Colombo]
[Figure: the same tree annotated to show internal nodes, labels/tags, and leaf nodes holding the values 'Pick & Ride', P001 and Colombo]
Unstructured Data
• There is almost no indication of the type of the data.
e.g.: web pages designed using HTML
• <p></p> - outputs whatever data appears between the two tags,
regardless of its meaning.
• <p><b></b></p> - data formatting is also mixed in with the data
values.
• Unstructured Data
• HTML documents are harder to analyze and interpret
with software since they carry no schema information
about the type of the data.
• However, as human activities have shifted to an
online environment, it has become necessary to interpret
data presented online correctly and to exchange such data.
• This need gave rise to the use of XML in online data
manipulation environments.
Activity
XML Databases - XML Hierarchical (Tree) Data Model
– Attributes
<book>
    <title>Gamperaliya</title>
    <author id="ABC">Martin Wickramasinghe</author>
</book>
<book>
    <title>Siyalanga Ruu Soba</title>
    <author id="ABC">J. B. Dissanayake</author>
</book>

Here <author> is an element, and id is an attribute of that element.
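Elements and attributes are accessed differently when the document is parsed. A small sketch using Python's standard xml.etree.ElementTree (the enclosing `<books>` wrapper is added here so the fragment is a well-formed document):

```python
import xml.etree.ElementTree as ET

xml = """<books>
  <book>
    <title>Gamperaliya</title>
    <author id="ABC">Martin Wickramasinghe</author>
  </book>
</books>"""

root = ET.fromstring(xml)
book = root.find("book")
title = book.find("title").text            # element content
author_id = book.find("author").get("id")  # attribute value
assert title == "Gamperaliya"
assert author_id == "ABC"
```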
• Constructing an XML Document
[Figure: an XML tree distinguishing complex elements, which contain other elements, from simple elements, which contain only data values]
Activity
NoSQL Databases - Origins of NoSQL
Impedance Mismatch
• Impedance mismatch is the term used to refer to the
dissimilarity between the relational database model and the
programming language model (data structures in-memory).
• Although the relational data model represents data simply, as
relations in tabular format, it introduces certain limitations:
in particular, a tabular representation cannot hold nested
structure, such as a nested record or a list.
Impedance Mismatch
There are no such limitations for in-memory data structures,
which can take on much richer structures than relations.
Consequently, a richer in-memory data structure is required to be
translated into a relational representation to store it on disk.
These two different representations cause the impedance
mismatch, which requires translation from one representation to the
other.
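The translation can be made concrete with a small Python sketch: a nested in-memory order (the field names are invented for illustration) must be flattened into two flat relations before it can be stored relationally:

```python
# One in-memory order with a nested list of line items...
order = {
    "id": 1,
    "customer": "Kamal",
    "lines": [{"product": "Laptop", "qty": 1}, {"product": "Mouse", "qty": 2}],
}

# ...must be split into two flat relations for relational storage.
orders_table = [(order["id"], order["customer"])]
order_lines_table = [
    (order["id"], line["product"], line["qty"]) for line in order["lines"]
]

assert orders_table == [(1, "Kamal")]
assert order_lines_table == [(1, "Laptop", 1), (1, "Mouse", 2)]
```

Reading the order back requires the reverse translation (a join plus regrouping), which is exactly the work the mismatch imposes.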
[Figure: a single in-memory order (ID: 0001, Customer: Kamal) must be split across separate Orders and Customers tables]
Impedance Mismatch
• The impedance mismatch is made much easier to deal with by
object-relational mapping frameworks, such as Hibernate, that
implement well-known mapping patterns.
• Although these mapping frameworks remove a lot of tedious
work, the mapping problem remains an issue, because the way
the in-memory data structures are represented is decided without
taking the database perspective into consideration. This
can hinder query performance.
Drawbacks of Shared Database Integration
The drawbacks of shared database integration are as given below:
• A structure that is designed to integrate many applications
often becomes more complex than any single application
needs.
• If an application needs to change its data storage, it
must coordinate with all the other applications using the
database.
• Since different applications have different structural and
performance needs, an index required by one application may
hinder the performance of insert operations of another
application.
• Usually, a separate team is responsible for each application.
This means that the database cannot trust applications to
update the data in a way that preserves database integrity.
Thus, the responsibility of preserving database integrity is
within the database itself.
• Since the application team takes care of both the database and
the application code, the responsibility for database integrity
can be passed onto the application code.
• Web services (where applications would communicate over
HTTP) enabled a new form of a widely used communication
mechanism which challenged the use of SQL with shared
databases.
[Figure: shared database integration, where the Sales and Inventory systems share one common database, contrasted with application databases, where each application has its own database and the systems communicate through web services]
NoSQL databases emerged mainly to provide the following
advantages
a) Flexible data modeling: NoSQL emerged as a solution to the
impedance mismatch problem between relational data models and
object-oriented data models. NoSQL covers four different data
organization models as given below which are highly-customizable
to different businesses' needs.
i) Document databases: store data as documents similar to
JSON (JavaScript Object Notation) objects. Each document has
pairs of fields and values.
ii) Key-Value databases: represent a simpler type of database
where each item contains keys and values. A value can only be
retrieved by referencing its key and thus querying for a specific
key-value pair is simple.
iii) Column-Family databases: store data in tables, rows, and
dynamic columns. This data model provides a lot of flexibility
over relational databases because rows are not required to
have identical columns.
iv) Graph databases: store data as vertices (nodes) and edges,
where the edges directly represent the relationships between
data items.
NoSQL Databases - Data models in NoSQL
NoSQL Databases - Data models in NoSQL: Sample of data stored in the tables
[Figure: sample rows from relational Customer, Order, order line and Order Payment tables (e.g. Customer 1 'Shantha', Order 99, a line for 'Laptop', and a payment with card 34-886-89) that together make up one logical order]
[Figure: a key-value store with a bucket named userData, keyed by sessionID; the value stored against the key is an object (SessionData) that aggregates a UserProfile and a ShoppingCart containing CartItems]
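A key-value store can be sketched as little more than a dictionary whose values are opaque to the store. The bucket, key and value contents below are invented for illustration:

```python
# Bucket -> key -> opaque value; the store knows nothing about the value.
user_data = {}                          # the "userData" bucket

session_data = {                        # the aggregate stored as one value
    "UserProfile": {"name": "Shantha"},
    "ShoppingCart": {"CartItems": [{"sku": 27, "qty": 1}]},
}
user_data["session-42"] = session_data  # put(key, value)

value = user_data["session-42"]         # get(key): lookup is by key only
assert value["UserProfile"]["name"] == "Shantha"
assert "session-99" not in user_data    # no querying inside values
```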
[Figure: key-value pairs in which the keys are phone numbers such as 111-444-3333 and 222-555-7777]
Document Data Model Example
Relational model  →  Document model
Tables            →  Collections
Rows              →  Documents
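A document is a self-describing set of field/value pairs, typically JSON-like; documents in the same collection need not share the same fields. A Python sketch (field names and values invented for illustration):

```python
import json

customers = []                                     # a collection (vs. a table)
doc = {"_id": 1, "name": "Shantha",
       "orders": [{"orderId": 99, "total": 77}]}   # nested structure is allowed
customers.append(doc)                              # a document (vs. a row)

# Documents in one collection need not have identical fields.
customers.append({"_id": 2, "name": "Kamal", "loyalty": "gold"})

round_trip = json.loads(json.dumps(customers[0]))  # documents serialise as JSON
assert round_trip["orders"][0]["orderId"] == 99
assert "orders" not in customers[1]
```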
Column-Family Example
[Figure: a column family named Orders with row keys ORDER1 to ORDER4; each row holds its own columns, a column being a column key (e.g. 6783) together with a column value]
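The column-family structure can be sketched as a two-level map: row key to columns, where each row may carry a different set of columns. The row keys and column names below are invented for illustration:

```python
# row key -> {column key: column value}; rows may have different columns.
orders_cf = {
    "ORDER1": {"customer": "Shantha", "total": 6783},
    "ORDER2": {"customer": "Kamal"},    # no 'total' column yet: that is fine
}
orders_cf["ORDER2"]["total"] = 120      # columns are added dynamically per row

assert orders_cf["ORDER1"]["total"] == 6783
assert set(orders_cf["ORDER2"]) == {"customer", "total"}
```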
Relational model  →  Graph model
Rows              →  Vertices
Columns           →  Key/value pairs
Joins             →  Edges
[Figure: a small graph whose vertices represent Krishna, Fathima and Chamal, connected by edges]
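The mapping can be sketched in Python: vertices carry key/value pairs, and relationships are followed by traversing edges rather than by joining tables. The FRIEND_OF relationship and the city values are invented for illustration:

```python
# Vertices carry key/value pairs; edges replace relational joins.
vertices = {
    "Krishna": {"city": "Colombo"},
    "Fathima": {"city": "Kandy"},
    "Chamal":  {"city": "Galle"},
}
edges = [("Krishna", "FRIEND_OF", "Fathima"),
         ("Krishna", "FRIEND_OF", "Chamal")]

def neighbours(v):
    # edge traversal: no join over matching column values is needed
    return [dst for src, _, dst in edges if src == v]

assert neighbours("Krishna") == ["Fathima", "Chamal"]
assert vertices["Fathima"]["city"] == "Kandy"
```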
Activity
Match the most relevant description, with the data models given.
Activity
You are given a set of NoSQL databases and data models. Drag each
database on the left-hand side to its relevant data model on the right-hand side.
Key-value
HBase Neo4j
HyperTable Redis
Document
CouchDB MongoDB
What is the most suitable NoSQL database model for the above use
case?
Drag and drop the correct answer to the blanks from the given list.
(Consistency, Availability, Partition Tolerance, master-slave,
Replication, replica sets, column, memtable)
The CAP theorem states that we can ensure only two features
of_______, _________, and ________________. In Document
databases, availability is improved by data replication using the
______ setup. With __________, we can access data stored in
multiple nodes and the clients do not have to worry about the failure
of the primary node since data is available in other nodes as well. In
MongoDB, availability is achieved using ___________. Cassandra
is a column-family database which uses _____ as the basic unit of
storage. When Cassandra receives a write, the data is first appended
to a commit log and only then stored in memory.
The term used to describe that in-memory structure is ________.
Drag and drop the correct answer to the blanks from the given list.
(aggregate , unit, ACID, aggregate-oriented, clusters)
• Handling Relationships
• In ODBs, relationship attributes (also called reference
attributes) are used to manage relationships between
objects. In relational DBs, attributes with
matching values specify the relationships among
the tuples (records); foreign keys are used for
referencing relations in relational DBs.
• A single reference or a collection of references can be used
in ODBs, but the basic relational model supports
only single-valued references. Hence, representing many-
to-many relationships is not straightforward in the relational
model: a separate relation must be created to represent each
M:N relationship.
• However, mapping binary relationships in ODBs is also not
a direct process; the designer must specify
which side of the relationship should possess the reference attribute.
Object databases and Relational databases
• Handling Inheritance
• An ODB already provides a built-in construct for handling
inheritance, whereas the basic relational
model has no such facility.
• Specifying Operations
• In an ODB, operations must be specified during design,
as part of the class specification. The relational model does
not require the designer to specify the operations during the
design phase.
• The relational model supports ad-hoc queries, whereas ad-hoc
queries would violate encapsulation in ODBs.
XML and Relational databases
• Relationships
• XML databases follow a hierarchical tree structure with
simple and composite elements and attributes
representing relationships. But relational databases have
relationships among tables where one table is the parent
table, and the other table is the dependent table.
• Self-Describing Data
• In XML databases, the tags describe the meaning of the
data items and appear together with the data
values, so different data types are possible within a
single document. In relational databases, all data in a single
column must be of the same type, and the column definition
gives the data definition.
Inherent Ordering
• In XML databases, the order of the data items is defined
by the ordering of the tags; the document includes
no other form of ordering. In relational
databases, unless an ORDER BY clause is given, the
order in which rows are returned simply reflects the row
ordering inside the tables and is not guaranteed.
NoSQL and Relational databases
[Figure: an e-commerce graph with nodes Customer, Address, OrderPayment, Product and Order, connected by the relationships BELONGS_TO, BILLED_TO, PURCHASED, PAID_WITH, SHIPPED_TO and PART_OF]
Schemalessness in NoSQL
• NoSQL databases have no fixed schema, which is useful when we
have to work with nonuniform data.
• In a key-value store, data is stored against a key, and the
store places no restrictions on the value.
• A document database achieves the same effect: there are no
restrictions on the structure of a document.
• In column-family databases, each row can store data in its own
set of columns.
© e-Learning Centre, UCSC 4
NoSQL and Relational databases
In relational databases all the data items need to be of the same data
type while in XML databases, different data types are allowed.
Statement (True/False):
• A view is not like a relational table
• Since NoSQL databases don’t have views, they cannot have precomputed and cached queries
• Graph databases organise data into node and edge graphs
• Storage model describes how the database stores and manipulates the data internally
• An aggregate is a unit for data manipulation and management of consistency.
Summary
Major concepts of object-oriented, XML, and NoSQL
databases
Origins of NoSQL
NoSQL Databases
Data models in NoSQL
Summary
Contrast and compare relational databases concepts
and non-relational databases
Object DB and
Relational DB
2 : Database Constraints and Triggers
• Categories of Constraints
• Domain Constraints
• Key Constraints and Constraints on NULL Values
• Entity Integrity and Referential Integrity
• Other Types of Constraints
• Handling Constraint Violations for Insert, Delete and Update
Operations
• Inherent constraints
■ A relation consists of a certain number of simple attributes.
■ An attribute value is atomic
■ No duplicate tuples are allowed
Employee
• Other than the Primary Key, there can be other attributes which
cannot contain NULL values.
• For example, if the columns Empname, Hire_date, NIC and
salary cannot contain NULL values, you can state this as follows
when the table is created.
• NOT NULL constraint makes sure that a column does not hold
NULL value. When we cannot give a value for a particular column
while inserting a record into a table, default value taken by it is
NULL. By specifying NOT NULL constraint, we ensure that a
particular column(s) does (do) not contain NULL values.
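The effect of NOT NULL can be demonstrated with Python's built-in sqlite3 module (an illustrative sketch; the Employee column names follow the slide, and the sample values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Employee (
    Empid     TEXT PRIMARY KEY,
    Empname   TEXT NOT NULL,
    Hire_date TEXT NOT NULL,
    NIC       TEXT NOT NULL,
    salary    REAL NOT NULL)""")

con.execute("INSERT INTO Employee VALUES "
            "('E1001', 'Amal', '2020-01-01', '901234567V', 50000)")

try:
    # Omitting salary makes it default to NULL, which NOT NULL rejects.
    con.execute("INSERT INTO Employee (Empid, Empname, Hire_date, NIC) "
                "VALUES ('E1002', 'Shiva', '2021-03-01', '881234567V')")
    violated = False
except sqlite3.IntegrityError:
    violated = True

count = con.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
assert violated and count == 1    # the second insert was rejected
```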
Employee (Referencing Table)
  Empid   ename           address   did
  E1001   Amal Silva      Kandy     002
  E1002   Shiva Kumar     Colombo   001
  E1003   Fathima Siyam   Ampara    NULL

Department (Referenced Table)
  Dept_id   dname
  001       HR
  002       Finance
  003       Research
  004       Marketing
Relation (Yes/No):
• Student
• Professor
• Course
• Transcript
• Teaching
• Department
Activity
• The following two tables illustrate rows that a user tries to enter. Identify
what would happen when each tuple is inserted into the two
tables given below. Assume that the Course table already has the
rows relevant to the CS100, CS101 and CS103 courses.
Students(student_id CHAR(05), name VARCHAR(50), age INT)
StudentMarks(stdid CHAR(05),course_id CHAR(05),grade CHAR)
• The other two actions are SET DEFAULT and SET NULL.
• With these actions, when you update or delete a value in the
referenced table you can set a default value or null for the
referencing value.
CONSTRAINT Student_ID_FK
FOREIGN KEY (stdid) REFERENCES Students (student_id)
ON DELETE SET NULL ON UPDATE SET DEFAULT
If an insert violates a referential integrity constraint, the DBMS would prevent the violation.
SELECT *
FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.Salary>M.Salary AND E.Dno = D.Dnumber
AND D.Mgr_ssn = M.Ssn ) );
Activity
• Row Triggers
• Trigger is fired once for each row in a transaction.
• If an update statement modifies multiple tuples of a table, a
row trigger is fired once for each tuple affected by
the update query. If no tuple is affected by the query,
the row trigger is not executed.
• Have access to :new (new values) and :old (old values).
E.g. update the total salary of a department when the salary
value of an employee tuple is changed.
• Statement Triggers
• Trigger is fired once for each transaction.
• If a DELETE statement deletes several rows from a table, a
statement-level DELETE trigger is fired only once, regardless
of how many rows are deleted from the table.
• Does not have access to :new and :old values.
E.g. Delete a row relevant to an employee who has
completed the contract period.
• Before Trigger
• Runs before any change is made to the database.
E.g. Before withdrawing money account balance is required to
be checked.
• After Trigger
• Runs after changes are made to the database
E.g. After withdrawing money update the account balance
• Instead of
• INSTEAD OF triggers are used to modify views that cannot be
modified directly through UPDATE, INSERT, and DELETE
statements.
When inserting values into the Employee table, ensure that the
salary is >= 20000.
Employee(Empid, Ename, Salary, Dno).
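One hedged way to sketch this check is a BEFORE INSERT trigger. The syntax below is SQLite's, run through Python's sqlite3 module (an assumption for illustration; the exact RAISE mechanism differs in other DBMSs):

```python
import sqlite3

# Sketch: reject inserts into Employee whose Salary is below 20000,
# using a BEFORE INSERT trigger (SQLite syntax).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Empid INT, Ename TEXT, Salary REAL, Dno INT)")
conn.execute("""
    CREATE TRIGGER check_salary
    BEFORE INSERT ON Employee
    WHEN NEW.Salary < 20000
    BEGIN
        SELECT RAISE(ABORT, 'Salary must be >= 20000');
    END""")

conn.execute("INSERT INTO Employee VALUES (1, 'Amal', 25000, 4)")   # accepted
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Sunil', 15000, 4)")
except sqlite3.IntegrityError as e:
    print(e)  # the trigger aborts the insert
```

Only the first row is stored; the second insert is aborted by the trigger.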
• Create a trigger that accepts insertion into the student table and
checks the GPA. If the GPA of the inserted student is greater than
3.3 and less than or equal to 3.6, that student automatically
applies for the Computer Science stream. Otherwise, the student has
to apply for the Software Engineering stream. (Stream is Computer
Science, Biology, etc.; enrollment is the number of students enrolled in
the college; decision is yes or no in getting selected for a stream.)
Computer memory hierarchy: Primary Storage, Secondary Storage, Tertiary Storage.
1. Primary Storage
This memory is operated on directly by the computer's Central
Processing Unit (CPU).
Eg: Main Memory, Cache Memory.
• Provides fast access to data.
• Limited storage capacity.
• Contents of primary storage are lost when the
computer shuts down or in case of a power failure.
• Comparatively more expensive.
3.1 Disk Storage and Basic File Structures
3.1.1. Computer Memory Hierarchy
2. Secondary Storage
Operates external to the computer’s main memory.
Eg: Magnetic Disks, Flash Drives, CD-ROM
• The CPU cannot process data in secondary storage
directly. It must first be copied into primary storage
before the CPU can handle it.
• Mostly used for online storage of enterprise
databases.
• With regard to enterprise databases, magnetic
disks have been used as the main storage medium.
• Recently there is a trend to use flash memory for
storing moderate amounts of permanent data.
• Solid State Drive (SSD) is a form of memory that can
be used instead of a disk drive.
2. Secondary Storage
• Least expensive type of storage media.
• The storage capacity is measured in:
- kilobytes (KB)
- megabytes (MB)
- gigabytes (GB)
- terabytes (TB)
- petabytes (PB)
3. Tertiary Storage
Operates external to the computer’s main memory.
Eg: CD - ROMs, DVDs
• The CPU cannot process data in tertiary storage
directly. It must first be copied into primary storage
before the CPU can handle it.
• Removable media that can be used as offline storage
falls in this category.
• Large capacity to store data.
• Comparatively low cost.
• Slower access to data than primary storage media.
4. Flash Memory
• Popular type of memory due to its non-volatility.
• Uses EEPROM (Electrically Erasable Programmable
Read-Only Memory) technology.
• High-performance memory.
• Fast access.
• One disadvantage is that an entire block must be
erased and rewritten at once.
• Two Types:
- NAND Flash Memory
- NOR Flash Memory
• Common examples:
- Devices in Cameras, MP3/MP4 Players,
Cellphones, USB Flash Drives
5. Optical Drives
• Most popular type of Optical Drives are CDs and DVDs.
• Capacity of a CD is 700 MB, and DVDs have capacities
ranging from 4.5 to 15 GB.
• CD-ROMs are read by laser technology. They
cannot be overwritten.
• CD-R (compact disk recordable) and DVD-R: allow
data to be written once and read as many times as
required.
• Currently this type of storage is comparatively declining
due to the popularity of the magnetic disks.
6. Magnetic Tapes
• Used for archiving and as a backup storage of data.
• Note that Magnetic Disks (400 GB–8TB) and Magnetic
Tapes (2.5TB–8.5TB) are two different storage types.
3.1.2. Storage Organization of Databases
• Usually databases have Persistent data. This means
large volumes of data stored over long periods of
time.
• These persistent data are continuously retrieved and
processed in the storage period.
• The place where the databases are stored
permanently in the computer memory is the
secondary storage.
• Magnetic disks are widely used here since:
- If the database is too large, it will not fit in the
main memory.
- Secondary storage is non-volatile, whereas
main memory is volatile.
- The cost of storage per unit of data is lower in
secondary storage.
3.1.3. Secondary Storage Media
[Figure: hardware components of a disk: (a) a single-sided disk with read/write hardware.]
[Figure: an Employee relation illustrating records, fields, and field values.]
3.1.5. Placing File Records on Disk
Activity
Select the Data Type that best matches the description out of
the following.
[Integer, Floating Point, Date and Time, Boolean, Character]
Activity
Fill in the blanks in the following statements.
1. A file where the records are of different sizes
is called a _______________.
Block size = B bytes
Record size = R bytes
Blocking factor bfr = ⌊B / R⌋ records per block
Number of blocks needed b = ⌈r / bfr⌉
Example of Calculation
There is a disk with block size B=256 bytes. A file has
r=50,000 STUDENT records of fixed-length. Each
record has the following fields:
NAME (55 bytes), STDID (4 bytes),
DEGREE (2 bytes), PHONE (10 bytes),
SEX (1 byte).
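Using the bfr and b formulas from the preceding slides, the calculation for this file can be checked with a short Python sketch:

```python
import math

B = 256                      # block size in bytes
r = 50_000                   # number of records in the file
R = 55 + 4 + 2 + 10 + 1      # record size: NAME+STDID+DEGREE+PHONE+SEX = 72 bytes

bfr = B // R                 # blocking factor = floor(B / R)
b = math.ceil(r / bfr)       # number of blocks = ceiling(r / bfr)

print(R, bfr, b)  # → 72 3 16667
```

So each block holds 3 records and the file needs 16,667 blocks.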
Operations on files fall into two groups: retrieval operations and update operations.
3.1.6. File Operations
Open: Opens the file for reading or writing. Sets the file pointer to the beginning of the file.
Reset: Sets the file pointer of an open file to the beginning of the file.
Find (Locate): The first record that meets a search criterion is found. The block holding that record is transferred to a main memory buffer. The file pointer is set to the record in the buffer, which becomes the current record.
Read (Get): Copies the current record from the buffer to a user-defined program variable. This command may also advance the current record pointer to the next record in the file.
FindNext: Searches the file for the next record that meets the search criteria. The block holding that record is transferred to a main memory buffer.
Delete: The current record is deleted, and the file on disk is updated to reflect the deletion.
Modify: Modifies some field values of the current record, and the file on disk is updated to reflect the modification.
Scan: Returns the first record if the file has just been opened or reset; otherwise, returns the next record.
FindAll: Locates all the records in the file that satisfy a search condition.
FindOrdered: Retrieves all the records in the file in a specified order.
Reorganize: Starts the reorganization process (for example, ordering the records).
• Deleting a Record.
• To delete a record, a program must first locate its block,
copy the block into a buffer, remove the record from the
buffer, and then rewrite the block back to the disk.
• Deleting a large number of records in this way
results in wasted storage space.
• Modifying a Record.
• Because the updated record may not fit in its
former space on disk, modifying a variable-length
record may require removing the old record and
inserting the modified record.
• Reading a Record.
• To read all records in order of the values of some
field, a sorted copy of the file is produced.
Because sorting a huge disk file is a costly task,
special approaches for external sorting are
employed.
• Internal Hashing.
When it comes to internal files, hashing is usually
done with a Hash Table and an array of records.
• Method 1 for Internal Hashing
• If the array index range is 0 to m – 1, there are m slots
with addresses that correspond to the array indexes.
• Then a hash function is selected that converts the
value of the hash field into an integer between 0 and
m-1.
• The record address is then calculated using the hash function:
h(K) = K mod m
where K is the hash field value and h(K) is the computed slot address.
Pseudocode for hashing a 20-character string key K, where code() returns the numeric code of a character and M is the table size:
temp ← 1;
for i ← 1 to 20 do temp ← temp * code(K[i]) mod M;
hash_address ← temp mod M;
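The pseudocode can be turned into a small runnable sketch, taking code() as Python's ord() and relaxing the fixed 20-character key (both assumptions):

```python
def string_hash(key: str, M: int) -> int:
    """Hash a character-string key by multiplying character codes
    modulo M, following the slide's pseudocode (code() taken as ord())."""
    temp = 1
    for ch in key:
        temp = (temp * ord(ch)) % M
    return temp % M

print(string_hash("ABC", 97))  # → 19, a slot in 0..96
```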
3.1.8. Hashing techniques for storing database
records: Internal hashing, external hashing
• Internal Hashing.
• Methods of Collision Resolution
• Open Addressing - The program scans the
subsequent locations in order until an unused
(empty) position is discovered, starting with the
occupied position indicated by the hash address.
• Chaining - The new record is placed in an unused
overflow location, and the pointer of the occupied
hash address location is changed to the address of
the new record.
• Multiple Hashing - If the first hash function fails,
the program uses a second hash function. If a new
collision occurs, the program will utilize open
addressing or a third hash function, followed by
open addressing if required.
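Open addressing, the first of these methods, can be sketched in a few lines using linear probing over an m-slot table (the table size and keys below are illustrative):

```python
# A minimal sketch of internal hashing with open addressing
# (linear probing): on a collision, scan subsequent slots in order
# until an unused (empty) position is found.
m = 7
table = [None] * m

def insert(key: int) -> None:
    slot = key % m                     # h(K) = K mod m
    while table[slot] is not None:     # occupied: probe the next slot
        slot = (slot + 1) % m
    table[slot] = key

def search(key: int):
    slot = key % m
    while table[slot] is not None:
        if table[slot] == key:
            return slot
        slot = (slot + 1) % m
    return None                        # not found

for k in (10, 17, 24):                 # all hash to slot 3, forcing probes
    insert(k)
print(table)       # → [None, None, None, 10, 17, 24, None]
print(search(17))  # → 4
```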
• External Hashing.
• Hashing for disk files is called external hashing.
• The target address space is built up of Buckets,
each of which stores many records, to match the
properties of disk storage.
• A bucket is a single disk block or a contiguous
group of disk blocks.
• Rather than allocating an absolute block address to
the bucket, the hashing function translates a key to a
relative bucket number.
• The bucket number is converted into the matching
disk block address via a table in the file header.
• External Hashing.
The following diagram shows matching bucket
numbers (0 to M -1) to disk block addresses.
• External Hashing.
• Since many records that hash to the same bucket can
fit in it without causing problems, the collision
problem is less severe with buckets.
• When a bucket is full to capacity and a new record is
entered, a variant of chaining can be used in which a
pointer to a linked list of overflow records for the
bucket is stored in each bucket.
• Here, the linked list pointers should be Record
Pointers, which comprise a block address as well
as a relative record position inside the block.
• External Hashing.
Handling overflow for buckets by chaining
Activity
Match the description with the correct term.
3.2 Introduction to indexing
3.3 Types of Indexes
The image given in the next slide illustrates the index file and
respective block pointers to the data file.
Clustering Index
[Figure: clustering index with allocation of blocks for distinct values in the ordered key field.]
• Single Level Indexes: Clustering indexes
Ex: For the same ordered file with r = 300,000 and B = 4,096
bytes, suppose the data file is ordered on the non-key field Zip Code.
Assumption: each Zip Code has an equal number of records, and there
are ri = 1,000 distinct Zip Code values. Index entries
consist of a 5-byte Zip Code and a 6-byte block pointer.
Size of an index entry Ri = 5 + 6 = 11 bytes
Blocking factor bfri = ⌊B/Ri⌋ = ⌊4,096/11⌋ = 372 index entries per block
Hence, number of index blocks needed bi = ⌈ri/bfri⌉
= ⌈1,000/372⌉ = 3 blocks.
Block accesses to perform a binary search on the index
= ⌈log2 bi⌉ = ⌈log2 3⌉ = 2
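The arithmetic above can be verified with a short Python sketch:

```python
import math

B = 4096          # block size in bytes
ri = 1000         # distinct Zip Code values (index entries)
Ri = 5 + 6        # index entry: 5-byte Zip Code + 6-byte block pointer

bfri = B // Ri                        # floor(4096 / 11) = 372 entries/block
bi = math.ceil(ri / bfri)             # ceiling(1000 / 372) = 3 blocks
accesses = math.ceil(math.log2(bi))   # ceiling(log2 3) = 2

print(bfri, bi, accesses)  # → 372 3 2
```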
3.3.2 Multilevel indexes: Overview of multilevel
indexes
• If the first level index has r1 entries, blocking factor for the
first level bfr1 = fo.
• The number of blocks required for the first level is given
by, ( r1 / fo).
• Therefore, the number of records in the second level
index r2= ( r1 / fo).
• Similarly, r3= ( r2 / fo).
• However, we need to have second level only if the first
level requires more than 1 block. Likewise, we consider for
a next level only if the current level requires more than 1
block.
• If the top level is t, then t = ⌈logfo(r1)⌉.
3.4 Indexes on Multiple Keys
• All of the mentioned methods will eventually give the
same set of records as the result.
• However, the number of individual records that meet
one of the specified conditions (either department_id = 1
or gpa = 3.5) is larger than the number of records that
satisfy both conditions (department_id = 1 and gpa = 3.5).
• Hence, none of the above three methods is efficient for
searching records we required.
• Having a multiple key index on department_id and gpa
would be more efficient in this case, because we can
search for the records which meets given requirements
just by accessing the index file.
• We refer to keys containing multiple attributes as
composite keys.
• Ordered Index on Multiple Attributes
• We can create a key field for previously discussed
file as <department_id,gpa>.
• Search key is also a pair of values. For the previous
example this will be <1,3.5>
• In general, if an index is created on attributes
<A1,A2,A3 …. ,An>, the search key values are tuples
with n values ; <v1,v2,v3 …. ,vn>.
• A lexicographic (alphabetical) ordering of these tuple
values establishes an order on these composite
search keys.
• For example, all the composite keys with 1 for
department_id will precede those for department_id
2.
• When the department_id is the same, the composite
keys are sorted in ascending order of the gpa.
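Python's tuple comparison is already lexicographic, so the ordering described above can be illustrated directly (the sample pairs are made up):

```python
# Composite keys <department_id, gpa>: sorting the pairs orders them
# first by department_id, then by gpa within each department.
keys = [(2, 3.1), (1, 3.5), (1, 2.0), (2, 2.8), (1, 3.9)]
print(sorted(keys))
# → [(1, 2.0), (1, 3.5), (1, 3.9), (2, 2.8), (2, 3.1)]
```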
• Partitioned Hashing
• Partitioned hashing is an extension of static external
hashing (when a search-key value is provided, the
hash function always computes the same address)
which allows access on multiple keys.
• This is suitable only for equality comparisons. It
doesn’t support range queries.
• For a key consisting of n attributes, n separate hash
addresses are generated. The bucket address is a
concatenation of these n addresses.
• Then it is possible to search for composite key by
looking up the appropriate buckets that match the
parts of the address in which we are interested.
• Partitioned Hashing
• For example, consider the composite search key
<department_id,gpa>
• If department_id and gpa are hashed into 2-bit and
6-bit address respectively, we get an 8 bit bucket
address.
• If department_id = 1 hashed to 01 and gpa = 3.5
hashed to 100011 then the bucket address is
01100011.
• To search for students with 3.5 gpa, we can search
for buckets 00100011 , 01100011, 10100011,
11100011
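This bucket-address construction can be sketched with toy hash functions chosen so that the slide's example values (department_id = 1 → 01, gpa = 3.5 → 100011) fall out; the hash functions themselves are assumptions for illustration:

```python
# Partitioned hashing for the composite key <department_id, gpa>:
# each attribute hashes to its own bit string, and the bucket
# address is the concatenation of the two.
def h_dept(department_id: int) -> str:
    return format(department_id % 4, "02b")      # 2-bit address

def h_gpa(gpa: float) -> str:
    return format(int(gpa * 10) % 64, "06b")     # 6-bit address

def bucket(department_id: int, gpa: float) -> str:
    return h_dept(department_id) + h_gpa(gpa)    # 8-bit bucket address

print(bucket(1, 3.5))   # → '01100011', as in the slide's example

# Searching on gpa alone: fix the gpa bits and vary the dept bits.
gpa_bits = h_gpa(3.5)
candidates = [format(d, "02b") + gpa_bits for d in range(4)]
print(candidates)
# → ['00100011', '01100011', '10100011', '11100011']
```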
• Partitioned Hashing
• Advantages of partitioned hashing:
i. Ease of extending for any number of attributes.
ii. Ability to design the bucket addresses in a way
that frequently accessed attributes get higher-
order bits in the address. (Higher-order bits are
the left most bits)
iii. There is no need to maintain a separate access
structure for individual attributes.
• Partitioned Hashing
• Disadvantages of partitioned hashing:
i. Inability to handle range queries on any of the
component attributes.
ii. Most of the time, records are not maintained by
the order of the key which was used for the hash
function. Hence, using lexicographic order of
combination of attributes as a key (eg:
<department_id,gpa>) to access the records
would not be straightforward or efficient.
• Grid Files
• Constructed using a grid array with one linear
scale (or dimension) for each of the search
attributes.
• For the previous example of students file, we can
construct a linear scale for department_id and
another for gpa.
• These linear scales are created to preserve the
uniform distribution of the particular attribute that
is indexed.
• Each cell points to some bucket address where
the records corresponding to that cell are stored.
The following illustration shows a grid array for the Student file, with
one linear scale for department_id and another for the gpa attribute.
Linear scale for department_id: cell 0 → 0, cell 1 → 1, cell 2 → 2, cell 3 → 3
Linear scale for gpa: cell 0 → < 0.9, cell 1 → 1.0 - 1.9, cell 2 → 2.0 - 2.9, cell 3 → > 3.0
• Grid Files
• When we query for deparment_id = 1 and gpa =3.5, it
maps to cell (1,3) as highlighted in the previous slide.
• Records for this combination can be found in the
corresponding bucket.
• Due to nature of this indexing, we can perform range
queries.
• As an example, for range query gpa > 2.0 and
department_id < 2 , following bucket pool can be
selected.
• Grid Files
• Grid files can be applied to any number of search
keys.
• If we have n number of search keys, we’ll get a grid
array of n dimensions.
• Hence it is possible to partition the file along the
dimensions of the search key attributes.
• Thus, grid files provide an access by combinations of
values along dimensions of grid array.
• Space overhead and additional maintenance cost for
reorganization of the dynamic files are some
drawbacks of grid files.
3.5 Other types of Indexes
• Hash Indexes
• The hash index is a secondary structure that allows
access to the file using hashing.
• The search key is defined on an attribute other than
the one used for ordering the primary data file.
• Index entries consist of the hashed key and the
pointer to the record which is corresponding to the
key.
• The index files with hash index could be arranged as
dynamically expandable hash file.
[Figure: hash-based indexing.]
• Bitmap Indexes
• In the given table we have a column to record the gender of
the employee.
• The bitmap index for the values is an array of bits as shown.

Row id | Emp id | Name | Gender | M | F
0 | 51024 | Sandun | M | 1 | 0
1 | 23402 | Kamalani | F | 0 | 1
2 | 62104 | Eranda | M | 1 | 0
3 | 34723 | Christina | F | 0 | 1
4 | 81165 | Clera | F | 0 | 1
5 | 13646 | Mohamad | M | 1 | 0
6 | 54649 | Karuna | M | 1 | 0
7 | 41301 | Padma | F | 0 | 1

M 10100110
F 01011001
• Bitmap Indexes
• According to the example given in the previous slide,
• If we consider the value F in the gender column, bits 1, 3,
4 and 7 are set to "1" because records with ids 1, 3, 4
and 7 have the value F in them, while the bits for record
ids 0, 2, 5 and 6 are set to "0".
• A bitmap index is created on a set of records which
are numbered from 0 to n − 1 with a record id or row id
that can be mapped to a physical address.
• This physical address is composed of a block number
and a record offset within the block.
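Building the two bitmaps from the slide's table can be sketched as:

```python
# One bit string per distinct gender value, with bit i set to "1"
# when row i holds that value.
rows = ["M", "F", "M", "F", "F", "M", "M", "F"]   # genders of rows 0..7

bitmaps = {}
for i, value in enumerate(rows):
    bits = bitmaps.setdefault(value, ["0"] * len(rows))
    bits[i] = "1"
bitmaps = {v: "".join(bits) for v, bits in bitmaps.items()}

print(bitmaps)  # → {'M': '10100110', 'F': '01011001'}
```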
3.6 Index Creation and Tuning
• Index Creation
• An index is not an essential part of a data file;
indexes can be created and removed dynamically.
• Indexes are known as access structures. We can
create an index based on frequently used
search requirements.
• Creating a secondary index does not affect the
physical ordering of the data file.
• A secondary index can be created in conjunction with
virtually any primary record organization.
• A secondary index can be used in addition to a
primary organization such as ordering, hashing or mixed
files.
• Index Creation
• Following command is a general way of creating an
index in RDBMS;
CREATE [ UNIQUE ] INDEX <index name>
ON <table name> ( <column name> [ <order> ] { ,
<column name> [ <order> ] } )
[ CLUSTER ] ;
• Keywords in square brackets are optional.
• [Cluster] → sort records in the datafile on the
indexing attribute.
• <order> → ASC/DESC (default- ASC)
• Tuning Indexes
• The indexes that we have created, may require
modifications due to following reasons,
i. Long run time of the queries due to deficiency
of an index.
ii. Index may not get utilized.
iii. Attributes used to create the index might be
subject to frequent changes.
• DBMSs provide options to view the execution order
of queries. The indexes used and the number of disk
accesses are included in this view, which is known as
the query plan.
• With the query plan, we can identify if the above
problems are taking place and hence update or
remove index accordingly.
3.7 Physical Database Design in Relational
Databases
4 : Distributed Database Systems
Overview
Transparency
• In general, transparency means hiding implementation
details from the end user.
• There are several types of transparencies introduced in the
distributed database domain because the data is distributed
in multiple nodes.
i. Location transparency : Commands issued are not
changed according to the location of data or the
node.
ii. Naming transparency: When a name is associated
with an object, the object can be accessed without
giving additional details such as the location of data.
Transparency Cont.
iii. Replication transparency : User is not aware of the
replicas that are available in multiple nodes in order to
provide better performance, availability and reliability.
iv. Fragmentation transparency: User is not aware of
the fragments available.
v. Design transparency: User is unaware of the design
of the distributed database while performing
transactions.
vi. Execution transparency: User is unaware of the
transaction execution details.
4.1 Distributed Database Concepts, Components and
Advantage
Scalability
Autonomy
• The extent to which a single node (database) has the
capacity to operate independently is referred to as
autonomy.
• Higher flexibility is given to the nodes when there is high
autonomy.
• Autonomy can be applied in many aspects such as,
- Design autonomy: Independence of data model usage
and transaction management techniques.
- Communication autonomy: The extent to which each
node can decide on sharing of information with other
nodes.
- Execution autonomy: Independence of users to operate
as they prefer.
Advantages of DDB
1. Improves the flexibility of application development
- The ability to carry out application development
and maintenance from different physical locations.
2. Improve Availability
- Faults are isolated to the site of origin without
disturbing the other nodes connected.
- Even though a single node fails, the other nodes
continue to operate without failing the entire system.
(However, in a centralized system, failure at a single
site makes the whole system unavailable to all
users). Therefore, availability is improved with a
DDB.
Advantages of DDB Cont.
3. Improve performance
- Data items are stored closer to where they are needed
most. This reduces contention for the CPU and I/O
services required. The access delays involved in
wide area networks are also brought down.
- Since each node holds only a partition of the entire
DB, the number of transactions executed in each
site is smaller compared to the situation where all
transactions are submitted to a single centralized
database.
- Execution of queries in parallel by executing multiple
queries at different sites, or by splitting the query into
a number of subqueries also improves the
performance.
4.2 Types of Distributed Database Systems
[Figure: classification of database systems along the autonomy and heterogeneity dimensions, ranging from centralized database systems to pure distributed database systems.]
4.3 Distributed Database Design Techniques
Fragmentation
● As the name implies, in distributed architecture, separate
portions of data should be stored in different nodes.
● Initially, we have to identify the basic logical unit of data.
In a relational database, relations are the simplest logical
unit.
● Fragmentation is a process of dividing the whole
database into various sub relations so that data can be
stored in different systems.
Example
● Suppose we have a relational database schema with three
tables (EMPLOYEE, DEPARTMENT, WORKS_ON) which
we should partition in order to store them on several
nodes.
● Assume there are no replications allowed (data replication
allows storage of certain data in more than one place to
gain availability and reliability).
Employee
FNAME LNAME SSN BDATE ADDRESS
Works_on
ESSN DNO HOURS
Department
DNO DNAME LOCATION
Example - Approach 01
● One approach to data distribution is storing each complete
relation at a single site.
In the following example, we have stored the Employee table in
node 1, the Department table in node 2 and the Works_on table
in node 3.
Department
DNO   DNAME          LOCATION
d4    Headquarters   Kandy
d3    Finance        Colombo
d8    Research       Rathnapura
Horizontal Fragmentation
• A subset of rows in a relation is known as horizontal
fragment or shard.
• Selection of the tuple subset is based on a condition of one
or more attributes.
• With horizontal fragmentation, we divide a table
horizontally by creating subsets of tuples, where each
subset has a logical meaning.
• Then these fragments are assigned to different nodes in the
distributed system.
• Each horizontal fragment of a relation R can be specified in
the relational algebra by the operation σCi(R), where Ci is a
selection condition and R is the relation.
• Reconstruction of the original relation is done by taking the
union of all fragments.
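As a minimal sketch (with a made-up Employee relation, since the slides' tables are not reproduced here), horizontal fragmentation and its reconstruction by union can be illustrated as follows:

```python
# Sketch of horizontal fragmentation: each fragment is sigma_Ci(R) for one
# condition Ci, and the union of the fragments rebuilds the relation.
# The data below is illustrative, not taken from the slides' schema.

employee = [
    {"name": "Saman", "dept": "sales"},
    {"name": "Nimal", "dept": "marketing"},
    {"name": "Kamala", "dept": "sales"},
]

def select(rows, condition):
    """Relational sigma: keep only the tuples satisfying the condition."""
    return [r for r in rows if condition(r)]

sales_employee = select(employee, lambda r: r["dept"] == "sales")
marketing_employee = select(employee, lambda r: r["dept"] == "marketing")

# Reconstruction: the union of all horizontal fragments gives back the relation.
reconstructed = sales_employee + marketing_employee
assert sorted(r["name"] for r in reconstructed) == sorted(r["name"] for r in employee)
```

Each fragment here can be stored at a different node; the final assertion checks that no tuple is lost or duplicated by the fragmentation.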
• Ex:- If we want to store sales employee details and marketing
employee details separately in 2 nodes, we can use horizontal
fragmentation.
Explanation
• Original table (Employee) is divided into two subset of
rows.
• First horizontal fragment created (Sales_employee)
consists of details of employees who are working in the
sales department.
Sales_employee = σDepartment = “sales” (Employee)
• Second horizontal fragment created
(Marketing_employee) consists of details of employees
who are working in the marketing department.
Marketing_employee = σDepartment = “marketing” (Employee)
Vertical Fragmentation
• With vertical fragmentation, we can divide the table by
columns.
• There can be situations where we do not need to store all
the attributes of a relation in a certain site.
• Therefore, with the technique of vertical fragmentation, we
can keep only required columns of a relation within a
single site.
• In vertical fragmentation, every vertical fragment must
include the primary key or some unique key attribute.
Otherwise, we will not be able to reconstruct the
original table by joining the fragments together.
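A similar sketch for vertical fragmentation, again with illustrative data (the `ssn` key and attribute names are assumptions, not the slides' schema); note how keeping the primary key in every fragment makes join-based reconstruction possible:

```python
# Sketch of vertical fragmentation as projection. Every fragment keeps the
# primary key (ssn here) so the original relation can be rebuilt by joining.

employee = [
    {"ssn": 1, "name": "Saman", "salary": 45000, "dept": "sales"},
    {"ssn": 2, "name": "Nimal", "salary": 52000, "dept": "finance"},
]

def project(rows, attrs):
    """Relational pi: keep only the listed attributes of each tuple."""
    return [{a: r[a] for a in attrs} for r in rows]

pay_data = project(employee, ["ssn", "name", "salary"])
dept_data = project(employee, ["ssn", "dept"])

# Reconstruction: natural join of the fragments on the shared primary key.
by_ssn = {r["ssn"]: dict(r) for r in pay_data}
for r in dept_data:
    by_ssn[r["ssn"]].update(r)
reconstructed = sorted(by_ssn.values(), key=lambda r: r["ssn"])
assert reconstructed == employee
```

Had `ssn` been dropped from one fragment, the join key would be gone and the original table could not be rebuilt, which is exactly why the primary key must appear in every vertical fragment.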
• Ex:- If we want to store employees’ pay details and department
details separately in 2 nodes, we can use vertical fragmentation.
Explanation
• Original table (Employee) is divided into two subset of
columns.
• First vertical fragment created (Pay_data) consists of
salary details of employees.
Pay_data = πname, salary(Employee)
• Second vertical fragment created (Dept_data) consists of
department details of employees.
Dept_data = πname, Department(Employee)
Mixed Fragmentation
• Another fragmentation technique is the hybrid (mixed)
fragmentation where we can use a combination of both
the horizontal and vertical fragmentations.
• For example, take the EMPLOYEE table that we used
before.
• Employee table is vertically split into payment data and
department data (vertical fragmentation).
• Then the department-data fragment is further split
horizontally by department (horizontal fragmentation).
• Relevant fragmentations with data can be seen in the next
slide.
Replication
• The main purpose of having data replicated in several
nodes is to ensure the availability of data.
• One extreme of data replication is having a copy of the
entire database at every node (full replication).
• The other extreme is having no replication at all: every
data item is stored at only one site (no replication).
Replication Cont.
• Full replication
-With full replication, we can achieve a higher degree
of availability. The reason for this is, the entire system
keeps running, even with only one site up, because
every site contains the whole DB.
-The other advantage is improved performance of read
queries, as results can be obtained by local processing
at whichever site the query is submitted.
-However, there are drawbacks to full replication.
-One is degraded write performance, because
each update must be performed at every copy of the
data to maintain consistency.
-Another is that concurrency control and recovery
techniques become more complex and expensive.
Replication Cont.
• No replication
-When there is no replication, all fragments must be
disjoint (no tuple in a relation R can be found at more
than one site), except that the primary key is repeated
across vertical or mixed fragments.
-Also known as non-redundant allocation.
-Suitable for systems with high write traffic.
-Lesser degree of availability is a disadvantage of no
replication.
Replication Cont.
• To get a balance between the pros and cons we
discussed, we can select a degree of replication suitable
for our application.
• Some fragments of the database may be replicated, and
others may not according to the requirements.
• It is also possible to have some fragments replicated in all
the nodes in the distributed system.
• In any case, all replicas should be synchronized when an
update takes place.
Allocation
• In a DDB, there cannot be any fragment which is not
assigned to a site.
• The process of distributing data into nodes is known as
data allocation.
• The decisions of which site holds each fragment and of
how many replicas each fragment has depend on the,
- Performance requirement of the system
- Types of transactions
- Availability goals
- Transaction frequency
Allocation Cont.
Consider the following scenarios and the suggested
allocation mechanisms:
• Requires high system availability with a high number
of retrievals,
- Recommended to have a fully replicated database.
• Requires frequent retrieval of a subsection of the data,
-Recommended to allocate the required fragment to
multiple sites.
• Requires a high number of updates,
-Recommended to keep fewer replicas.
However, it is hard to find an optimal solution for distributed
data allocation, since it is a complex optimization problem.
Distribution Models
When the data volume increases, we can add more nodes
within our distributed database system to handle it. There are
different models for distributing data among these nodes.
1. Single server
• This is the minimum form of distribution and most often the
recommended option.
• Here, the database will be running in a single server without
any distribution.
• Since all read and write operations occur at a single node, it
would reduce the complexity by making the management
process easy.
Distribution Models Cont.
• Appointment of the new master can be either an
automatic or a manual process.
• The disadvantage of having replicated nodes is the
inconsistency that may occur in between nodes.
• If the changes are not propagated to all the slave nodes,
there is a chance that different clients, accessing
different slave nodes, read different values.
(Diagram: the four stages of distributed query processing: 1. Query Mapping, 2. Localization, 3. Global Query Optimization, 4. Local Query Optimization.)
4.4 Query Processing and Optimization in
Distributed Databases
Example
Suppose Employee table and Department table are stored at node
01 and node 02 respectively. Results are expected to be presented
in node 03.
Employee (stored at Node 01): size of one record = 100 bytes, no. of records = 10,000
Department (stored at Node 02): size of one record = 35 bytes, no. of records = 100
Results are produced at Node 03.
According to the details given, let’s calculate the size of each
relation:
Size of Employee = 10,000 records × 100 bytes = 1,000,000 bytes
Size of Department = 100 records × 35 bytes = 3,500 bytes
The sizes of attributes in Employee and Department relations are given
below.
EMPLOYEE
Fname field is 15 bytes long, Lname field is 15 bytes long, Address field is 10 bytes long
DEPARTMENT
Method 3
Explanation Transfer the DEPARTMENT relation to
site 1. Execute the join at site 1. Send the result to site 3.
Calculation
Total no. of bytes to be transferred = size of the Department
table + size of the query result
= 3,500 + 400,000 = 403,500 bytes
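The transfer-cost arithmetic for the transfer strategies usually compared in this example can be sketched as follows. The 40-byte result record (Fname 15 + Lname 15 + Dname 10 bytes) is an assumption consistent with the 403,500-byte figure above:

```python
# Bytes transferred under three join-placement strategies, assuming (as in
# the slides) Employee at site 1, Department at site 2, results at site 3,
# and a join result of 10,000 records of 40 bytes each.

EMP_SIZE = 10_000 * 100      # Employee relation: 1,000,000 bytes
DEPT_SIZE = 100 * 35         # Department relation: 3,500 bytes
RESULT_SIZE = 10_000 * 40    # join result: 400,000 bytes (assumed record size)

# Method 1: ship both relations to the result site 3 and join there.
method1 = EMP_SIZE + DEPT_SIZE
# Method 2: ship Employee to site 2, join there, ship the result to site 3.
method2 = EMP_SIZE + RESULT_SIZE
# Method 3: ship Department to site 1, join there, ship the result to site 3.
method3 = DEPT_SIZE + RESULT_SIZE

assert method3 == 403_500                      # matches the slide's figure
assert method3 == min(method1, method2, method3)
```

Under these assumptions, shipping the small Department relation (Method 3) is the cheapest strategy, which is the general lesson of the example: move the smaller operand toward the larger one.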
Activity
Query: Retrieve the Student Name and Course Name which the
student is following.
Write the relational algebra for the above query.
Query: Retrieve the Student Name and Course Name which the
student is following.
If we are to transfer STUDENT and COURSE relations into node
3 and perform join operation, how many bytes need to be
transferred? Explain your answer.
Activity
Suppose the STUDENT table is stored in site 1 and the COURSE table is
stored in site 2. The tables are not fragmented and the results
are stored in site 3. Every student is assigned to one course.
STUDENT(Sid, StudentName, Address, Grade, CourseID)
1000 records, each record is 50 bytes long
Sid: 5 bytes, StudentName: 10 bytes, Address: 20 bytes
Query: Retrieve the Student Name and Course Name which the
student is following.
If we are to transfer STUDENT table into site 2, and then execute
join and send result into site 3, how many bytes need to be
transferred? Explain your answer.
Activity
Suppose the STUDENT table is stored in site 1 and the COURSE table is
stored in site 2. The tables are not fragmented and the results
are stored in site 3.
STUDENT(Sid, StudentName, Address, Grade, CourseID)
1000 records, each record is 50 bytes long
Sid: 5 bytes, StudentName: 10 bytes, Address: 20 bytes
Query: Retrieve the Student Name and Course Name which the
student is following.
If we are to transfer COURSE table into site 1, and then execute
join and send result into site 3, how many bytes need to be
transferred? Explain your answer.
4.5 NoSQL Characteristics related to Distributed
Databases and Distributed Systems
1. Scalability
• NoSQL databases are typically used in applications with
high data growth.
• Scalability is the potential of a system to handle a growing
amount of data.
• In Distributed Databases, there are two strategies for
scaling a system.
- Horizontal scalability: When the amount of data
increases, distributed system can be expanded by
adding more nodes into the system.
- Vertical scalability: Increasing the storage capacity of
existing nodes.
• It is possible to scale horizontally while the system is in
operation: the data can be redistributed among the newly
added sites without disturbing the operations of the
system.
2. Availability, Replication and Eventual Consistency:
• Most applications that use NoSQL DBs require high
availability.
• It is achieved by replicating data on several nodes.
• With this technique, even if one node fails, the other
nodes that hold replicas of the same data will
respond to data requests.
• Read performance is also improved by having replicas.
When the number of read operations is high, clients
can access the replicated nodes without overloading a
single node.
Availability, Replication and Eventual Consistency Cont.:
• But having replicas may not be effective for write
operations, because after a write, all the nodes holding
the same data item must be updated in order to keep
the system consistent.
• Due to this requirement of updating all the nodes with the
same data item, the system can become slower.
• However, most NoSQL applications prefer eventual
consistency.
• Eventual consistency will be discussed in the next slide.
8
© e-Learning Centre, UCSC
4
4.5 NoSQL Characteristics related to Distributed
Databases and Distributed System
Availability, Replication and Eventual Consistency Cont.:
• Eventual Consistency
This means that at any time there may be nodes with
replication inconsistencies but if there are no further updates,
eventually all the nodes will synchronise and will be updated
to the same value.
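A toy illustration of this behaviour (a simulation of the idea, not any real replication protocol): writes land on one replica and propagate asynchronously, and once no further updates arrive, all replicas converge to the same value:

```python
# Eventual consistency in miniature: a write is applied to one replica and
# queued for the others; while messages are in flight, stale reads are
# possible, but once the queue drains, every replica holds the same value.

replicas = [{"x": 1}, {"x": 1}, {"x": 1}]
pending = []  # (target_replica_index, key, value) messages in flight

def write(replica_idx, key, value):
    replicas[replica_idx][key] = value
    for i in range(len(replicas)):           # queue updates to the other nodes
        if i != replica_idx:
            pending.append((i, key, value))

write(0, "x", 42)
# Before propagation, replicas 1 and 2 still serve the stale value.
assert replicas[1]["x"] == 1

while pending:                               # "no further updates": converge
    i, key, value = pending.pop()
    replicas[i][key] = value

assert all(r["x"] == 42 for r in replicas)   # eventually consistent
```

The window in which the stale value is readable is exactly the "replication inconsistency" the definition above allows; real systems add conflict resolution for concurrent writes, which this sketch omits.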
Replication Models Cont.:
- Master-master replication: All nodes are treated
alike; reads and writes can be performed on any
node. However, it is not guaranteed that all reads done
on different nodes see the same value. Since it is
possible for multiple users to write to the same data
item at the same time, the system can be temporarily
inconsistent.
4. Sharding of Files:
• We have discussed the concept of sharding in slide 55.
• In many NoSQL applications, there can be millions of data
records accessed by thousands of users concurrently.
• Effective responses can be provided by storing partitions of
data in several nodes.
• By using the technique called sharding (horizontal
partitioning), we can distribute the load across multiple
sites.
• Combination of sharding and replication improves load
balancing and data availability.
5. High-Performance Data Access:
• In many NoSQL applications, it might be necessary to find
a single data value or a file among billions of records.
• To achieve this, techniques such as hashing and range
partitioning are used.
- Hashing: A hash function h(K) applied to a given
key K provides the location of a particular object.
- Range partitioning: An object’s location can be
identified from a range of key values. For example,
location i would hold the objects whose key values
K are in the range Ki min ≤ K ≤ Ki max.
• We can use other indexes to locate objects based on
attribute conditions (different from the key K).
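The two placement schemes can be sketched as follows; the node count and the key ranges are made up for illustration:

```python
# Locating an object by key under the two schemes described above.

NUM_NODES = 4

def hash_location(key: int) -> int:
    """Hashing: h(K) maps the key directly to the node that stores it."""
    return key % NUM_NODES          # a simple stand-in for h(K)

# Range partitioning: node i holds keys with Ki_min <= K <= Ki_max.
RANGES = [(0, 249), (250, 499), (500, 749), (750, 999)]

def range_location(key: int) -> int:
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= key <= hi:
            return i
    raise KeyError(key)

assert hash_location(1003) == 3     # 1003 mod 4
assert range_location(600) == 2     # 600 falls in (500, 749)
```

Hashing spreads keys uniformly but destroys key order; range partitioning keeps nearby keys together, which is why it suits range queries while hashing suits exact-key lookups.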
Activity
Fill in the blanks with the most suitable word given.
(horizontal, vertical, eventual consistency, consistency, master,
slave, availability, usability)
Summary
• Distributed Database Concepts, Components and Advantages
• Types of Distributed Database Systems
• NoSQL Characteristics related to Distributed Databases and Distributed Systems
5 : Consistency and Transaction Processing
Concepts
IT3306 – Data Management
Level II - Semester 3
5.1.1. Single-user systems, Multi-user systems and Transactions
5.1.2. Transaction States
(Diagram: transaction states. Begin Transaction moves the transaction to the Active state; End Transaction moves it to Partially Committed; Commit moves it to Committed; Abort moves an Active or Partially Committed transaction to Failed; Committed and Failed transactions finally reach the Terminated state.)
5.1.3. Problems in Concurrent Execution
Example Transaction
Account balance of A (X) is 1000;
Account balance of B (Y) is 2000;
Transaction T1 - Rs.50 is withdrawn from A and deposited in B.
Transaction T2 - Rs.100 deposited to account A.
T1: read_item(X); X := X - N; write_item(X); read_item(Y); Y := Y + N; write_item(Y)
T2: read_item(X); X := X + M; write_item(X)
Executed on its own, T1 reads X (A = 1000), computes A - 50 = 950 and
writes it, then reads Y (B = 2000), computes B + 50 = 2050 and writes it.
Executed on its own, T2 reads X (A = 1000), computes A + 100 = 1100 and
writes it.
5.1.3. Problems in Concurrent Transaction Processing
The Lost Update Problem - Example
(X = 80, N = 8, M = 2)
T1: READ(X) → 80
T1: X = X - N → 72
T2: READ(X) → 80 (still 80, because the change done by T1 is not yet written)
T1: WRITE(X) → X = 72
T2: X = X + M → 82
T2: WRITE(X) → X = 82
T1: READ(Y) → 100
T1: Y = Y + N → 108
T1: WRITE(Y) → Y = 108
• After the execution, the results (X = 82 and Y = 108) do not
match the expected values (X = 74 and Y = 108).
• The resulting X value is incorrect because the update done by T1
on X is lost: T2 read the value of X directly from the DB before T1’s write.
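The interleaving above can be replayed directly; the local variables model each transaction's private workspace, and the values (X = 80, N = 8, M = 2) are those of the example:

```python
# Simulation of the lost-update interleaving: T2 reads X before T1 writes
# it back, so T1's update is silently overwritten.

db = {"X": 80, "Y": 100}
N, M = 8, 2

x1 = db["X"]          # T1: read_item(X)  -> 80
x1 = x1 - N           # T1: X := X - N    -> 72 (local only)
x2 = db["X"]          # T2: read_item(X)  -> still 80
db["X"] = x1          # T1: write_item(X) -> DB now holds 72
x2 = x2 + M           # T2: X := X + M    -> 82, based on the stale read
db["X"] = x2          # T2: write_item(X) overwrites T1's update
y1 = db["Y"]          # T1: read_item(Y)
db["Y"] = y1 + N      # T1: Y := Y + N

assert db["X"] == 82  # incorrect: serial execution gives 80 - 8 + 2 = 74
assert db["Y"] == 108
```

Running T1 and T2 one after the other (in either order) would yield X = 74, which is why the interleaved result X = 82 demonstrates a lost update.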
The Temporary Update (or Dirty Read) Problem
T1: read_item(X); X = X - N; write_item(X); read_item(Y); rollback
T2: read_item(X); X = X + M; write_item(X)
• In the schedule below, transaction T1 has updated the value of X,
and then transaction T2 has read the updated value of X before T1
rolls back.
(X = 80, N = 5, M = 4)
T1: READ(X) → 80
T1: X = X - N → 75
T1: WRITE(X) → X = 75
T2: READ(X) → 75
T2: X = X + M → 79
T2: WRITE(X) → X = 79
T1: READ(Y) → 100
T1: ROLLBACK
X should be 80 when T1 is rolled back, but T2 has read X from the
temporary update done by T1.
The Incorrect Summary Problem
(X = 80, Y = 100, N = 5, M = 4, A = 5)
T3: SUM = 0
T3: READ(A) → 5; SUM += A → 5
T1: READ(X) → 80
T1: X = X - N → 75
T1: WRITE(X) → X = 75
T3: READ(X) → 75; SUM += X → 80
T3: READ(Y) → 100; SUM += Y → 180
T1: READ(Y) → 100
T1: Y = Y + N → 105
T1: WRITE(Y) → Y = 105
T3 reads X after it is updated by T1, so the correct value of X is
used for the sum. But T3 reads Y before it is updated, and hence
reads an incorrect value for the sum. The correct sum after reading
Y should be 80 + 105 = 185, but instead it computes 80 + 100 = 180,
since Y is read as 100 instead of 105.
The Unrepeatable Read Problem
T1: READ(X) → 80
T2: READ(X) → 80
T2: X = X - 5 → 75
T2: WRITE(X) → X = 75
T1: READ(X) → 75 (T1 reads two different values for X within the same transaction)
5.1.3. DBMS Failures
5.2. Properties of Transactions
ACID properties
The ACID properties are properties of transactions that are
enforced by the concurrency control and recovery methods of
the DBMS.
ACID stands for
i) A – Atomicity
ii) C – Consistency
iii) I – Isolation
iv) D – Durability
A detailed description of each property is explained in the
upcoming slides.
i) Atomicity
• A transaction is an atomic unit of processing: it should either
be performed in its entirety or not performed at all.
ii) Consistency
• A transaction should be executed completely from
beginning to end, without interference from other
transactions, to preserve consistency. A transaction
takes the database from one consistent state to another.
• A database state is a collection of all the data values in the
database at a given point.
• Preserving consistency is viewed as the responsibility of
the programmers who write the transactions and of the
DBMS module that enforces integrity constraints.
• A consistent state of the database fulfils the requirements
indicated in the schema and other constraints on the
database that should hold.
• If a database is in a consistent state before executing the
transaction, then it will be in a consistent state after the
complete execution of the transaction (assuming that no
interference occurs with other transactions).
iii) Isolation
• During the execution of a transaction, it should appear
as if it is isolated from other transactions even though
there are many transactions happening concurrently.
• The execution of a transaction should not interfere
with other transactions executing simultaneously.
• The isolation property is enforced by the
concurrency control subsystem of the DBMS.
• If each transaction does not make its updates visible
to other transactions until it is committed, one form
of isolation is enforced that solves the temporary
update problem.
Levels of Isolation
Before talking about isolation levels, let’s discuss about
database locks.
Database Locks
A database lock is used to "lock" data in a database table so that
only one transaction/user/session may edit it. Database locks are
used to prevent two or more transactions from changing the same
piece of data at the same time.
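The locking idea can be illustrated with an in-process lock. A real DBMS uses a lock manager with row, page, or table locks; `threading.Lock` here only models the mutual exclusion:

```python
# Two concurrent "transactions" repeatedly increment the same value. The
# lock makes each read-modify-write atomic, so no update is lost.

import threading

balance = {"value": 0}
row_lock = threading.Lock()

def deposit(amount, times):
    for _ in range(times):
        with row_lock:                  # acquire the "row lock"
            balance["value"] += amount  # protected read-modify-write

t1 = threading.Thread(target=deposit, args=(1, 10_000))
t2 = threading.Thread(target=deposit, args=(1, 10_000))
t1.start(); t2.start()
t1.join(); t2.join()

assert balance["value"] == 20_000       # no lost updates
```

Without the lock, the two threads could interleave between the read and the write, reproducing exactly the lost update problem from the previous section.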
There have been attempts to define the level of isolation
of a transaction.
• Level 0 (zero) isolation (known as Read
Uncommitted) - the transaction does not overwrite the
dirty data of higher-level transactions.
• Level 1 (one) isolation (known as Read Committed) -
the transaction has no lost updates.
• Level 2 isolation (known as Repeatable Read) - the
transaction has no lost updates and no dirty reads.
• Level 3 Isolation / True isolation (known as
Serializable Read) - the transaction has no lost
updates, no dirty reads, and has repeatable reads.
Levels of Isolation
Example for Level 0 (zero) isolation
T1: UPDATE employee SET salary = salary - 100 WHERE emp_number = 25;
T2: SELECT SUM(salary) FROM employee;
T2: COMMIT;
T1: ROLLBACK;
Here T2’s sum includes T1’s uncommitted update, which T1 then rolls back.
Levels of Isolation
Example for Level 1 isolation
T1: UPDATE employee SET salary = salary - 100 WHERE emp_number = 25;
T2: SELECT SUM(salary) FROM employee WHERE emp_number < 50;
T1: ROLLBACK;
T2: COMMIT;
T1: SELECT SUM(salary) FROM employee WHERE emp_number < 25;
T2: UPDATE employee SET salary = salary - 100 WHERE emp_number = 22;
T2: COMMIT TRANSACTION;
T1: SELECT SUM(salary) FROM employee WHERE emp_number < 25;
T1: COMMIT TRANSACTION;
Example for Level 2 isolation
In the example in previous slide;
T1 queries to get the sum of salaries of employees whose
emp_number is less than 25. T2 updates the salary of the
employee whose emp_number is 22. Then T1 executes the
same query again.
If transaction T2 modifies and commits the changes to the
employee table after the first query in T1, but before the second
one, the same two queries in T1 would produce different
results. Isolation level 2 blocks transaction T2 from executing. It
would also block a transaction that attempted to delete the
selected row. Thus, lost updates and dirty reads are avoided.
Phantoms
• If a database table includes a record which was not
present at the start of a transaction but is present at the
end then it is called a phantom record.
• For example, If transaction T2 enters a record to a table
that transaction T1 currently reads (the record also
satisfies the filtering conditions used in T1), then that
record is a phantom because it was not there when T1
started but is there when T1 ends.
• If the equivalent serial order is T1 followed by T2, then the
record should not be seen. But if it is T2 followed by T1,
then the phantom record should be in the result given to
T1.
Levels of Isolation
Example for Level 2 isolation
Consider the following example on phantom reads
T1 T2
commit transaction
commit transaction
5
© e-Learning Centre, UCSC
3
Example for Level 2 isolation (Phantom reads in Level 2
Isolation)
In the example given in the previous slide;
T1 retrieves the rows from employee table where salaries are more than 45000.
Then T2 inserts a row that meets the criteria given in T1 (an employee whose
salary is greater than 45000) and commits. T1 issues the same query again.
The number of rows retrieved for the same select query in T1 are different when
the isolation level is 2.
Total no. of records retrieved by the second select statement = total no.
of records retrieved by the first select statement + 1.
This creates a phantom. Phantoms occur when one transaction reads a set of
rows that satisfy a search condition, and then a second transaction modifies
those data. If the first transaction repeats the read with the same search
conditions, it obtains a different set of rows.
In the above example, T1 sees a phantom row in the second select query.
Levels of Isolation
Example for Level 3 isolation
Snapshot isolation
Example for Snapshot isolation
T1: SELECT * FROM employee ORDER BY empID;
T2: INSERT INTO employee (empID, empname) VALUES (600, 'Anura'); COMMIT;
T1: SELECT * FROM employee ORDER BY empID;
T2: INSERT INTO employee (empID, empname) VALUES (700, 'Arjuna'); COMMIT;
Output of the first select query of T1 (it only returns the data
available in T1’s snapshot):
empID 100 (Upul), empID 200 (Manjitha)
Snapshot isolation Cont.
In the same interleaving, the second select statement of T1 produces
the same result as the first (only empID 100 and 200), even though
T2’s inserts have committed. The snapshot taken when T1 started
remains unchanged until T1 commits, so T2’s new rows are not visible
to T1.
iv) Durability
• Durability or permanency means, once the changes of a
transaction are committed to the database, those changes
must remain in the database and should not be lost.
Example for Durability
• Definition : Changes must never be lost because of
subsequent failures (eg: power failure)
• In the transaction T1, if transaction failure occurs after
write (A), but before write (B);
To recover the database,
i. We must remove changes of partially done transactions.
Therefore, the change done on A should be rolled back.
(before crash, A was 950. Then it needs to be rolled
back to 1000)
ii. We need to reconstruct completed transactions.
If the system fails after the commit operation of a
transaction, but before the data could be written on to
the disk, then that transaction needs to be
reconstructed.
The database should keep all its latest updates even if the
system fails. If a transaction commits after updating data, then
the database should have the modified data.
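One common way to support these two recovery actions is a log of before-images written ahead of each update. This is a toy sketch of the undo half of recovery, not any particular DBMS's algorithm; the balances match the example above:

```python
# A minimal write-ahead-log sketch: record the old value before each write,
# then on recovery undo every write belonging to an uncommitted transaction.

db = {"A": 1000, "B": 2000}
log = []                       # write records: (txn, item, old_value)
committed = set()              # txns whose commit record reached the log

def write(txn, item, value):
    log.append((txn, item, db[item]))   # log the before-image first
    db[item] = value

def commit(txn):
    committed.add(txn)

write("T1", "A", 950)          # T1 moves 50 out of A...
# -- crash here: T1 never wrote B and never called commit("T1") --

# Recovery: undo, in reverse order, every write of an uncommitted txn.
for txn, item, old in reversed(log):
    if txn not in committed:
        db[item] = old

assert db == {"A": 1000, "B": 2000}    # T1's partial change is rolled back
```

The complementary redo half (reapplying committed updates whose pages never reached disk) would scan the log forward using after-images, which this sketch omits for brevity.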
Activity
read (x)
x=x-n
read (x)
x=x+m
write (x)
read (y)
write (x)
y=y+n
write (y)
Activity
T1 T2
read (x)
x=x-n
write (x)
read (x)
x=x+m
write (x)
commit
read (y)
abort
Activity
Identify the problem that would result from the given schedule.
T1 T2
sum = 0
read (a)
read (x)
x=x-n
write (x)
read (x)
sum = sum + x
read (y)
sum = sum + y
read (y)
y=y+n
write (y)
Activity
Drag and drop the matching words for the sentences.
1. Problems caused by hardware, software, or network error
that occurs in the computer system during transaction
execution. –
2. Occurs due to the errors in operation such as integer
overflow or division by zero. –
3. Occurs due to some exception in the programme cause
the cancellation of a transaction –
4. Occurs due to read or write malfunction and data in some
disk blocks may get lost –
5. Problems such as power loss, failure in air-conditioning,
natural disasters, theft, sabotage, mistakenly overwriting
disks or tape, and mounting of a wrong tape by the
operator –
Activity
item table=>
item_no 1 2 3 4 5 6 7
A list of item numbers and their prices are given in the above
table. After the two transactions T1, T2 were executed on the
above item table, the output was 14,500.
What can be the least possible isolation level used in T2?
T1 T2
rollback
commit
5.3 Schedules
Schedules of Transactions
• The arrangement or order of the operations of a set of
transactions is called a schedule.
S = schedule of transactions T1, T2, T3, ..., Tn
5.3 Schedules
Schedules of Transactions
• In this slide set, we will be using a set of notations for
the operations included in a transaction and to identify
the transaction number we will be adding a subscript.
• Following are the notations and their descriptions, that
we use in this slide set.
b begin_transaction
r read_item
w write_item
e end_transaction
c commit
a abort
T1 T2
r(X)
r(X)
w(X)
r(Y)
w(X)
w(Y)
• Schedules of Transactions
If two operations in a schedule satisfy all of the following
properties, they are said to be in conflict.
1. Operations are from different transactions.
2. Do the operation on same data item.
3. At least one of the two operations is a write (insert,
update, delete)
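The three conditions above can be checked mechanically. The following is a minimal sketch; the tuple representation and function names are our own, not from the slides. Each operation is modelled as (transaction number, action, item), so (1, "r", "X") stands for r1(X).

```python
def conflicts(op1, op2):
    """Two operations conflict iff they come from different transactions,
    access the same data item, and at least one of them is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "w" in (a1, a2)

def conflicting_pairs(schedule):
    """All ordered pairs (earlier op, later op) that conflict."""
    return [(schedule[i], schedule[j])
            for i in range(len(schedule))
            for j in range(i + 1, len(schedule))
            if conflicts(schedule[i], schedule[j])]

# Schedule r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
S = [(1, "r", "X"), (2, "r", "X"), (1, "w", "X"),
     (1, "r", "Y"), (2, "w", "X"), (1, "w", "Y")]
pairs = conflicting_pairs(S)   # three conflicting pairs, all on item X
```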
5.3 Schedules
S’’ = r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;
In the above example, T2 must also be aborted once T1 aborts, because T2 has read the value of X written by T1.
5.4 Serializability
[Worked example; interleaving tables only partially recovered]
Initial values: a = 90 and b = 90. T1 performs a = a - 3 and b = b + 3; T2 performs a = a + 2.
What is the final value of a and b after completion of T1 and T2?
Depending on the interleaving, the result is either a = 89, b = 93 (the serial result) or a = 92, b = 93 (T1's write of a is lost).
A second interleaving shown on the slide produces the serial result. Is this a correct schedule? Yes, the final answers are correct.
5.4 Serializability
S1 = r1(X); w2(X);
S2 = w2(X); r1(X);
S1 and S2 are not conflict equivalent, since the order of the conflicting operations is different.
5.4 Serializability
Schedule P (serial)          Schedule Q
T1        T2                 T1        T2
r(a)                         r(a)
a=a-3                        a=a-3
w(a)                         w(a)
r(b)                                   r(a)
b=b+3                                  a=a+2
w(b)                                   w(a)
c                                      c
          r(a)               r(b)
          a=a+2              b=b+3
          w(a)               w(b)
          c                  c

• A schedule S is serializable if it is conflict equivalent to a serial schedule S’.
Ex:
• Schedule P is a serial schedule.
• Schedule Q performs all the conflicting operations in the same order as schedule P. Therefore, P and Q are conflict equivalent.
• Hence, Q is a serializable schedule.
5.4 Serializability
[Precedence-graph figures for T1 and T2 (conflict on data item X) not recovered]
5.4 Serializability
Testing for Serializability of a Schedule
Schedule over T1, T2, T3 [only some rows of the table recovered]:
1. r(Z)   2. r(Y)   7. w(X)   8. w(Y)   9. w(Z)   10. r(X)   13. r(X)
Precedence-graph edges from conflicting operations:
Line no. 3 and 4: T2 -> T3 (Y)
Line no. 1 and 9: T2 -> T3 (Z)
Line no. 7 and 10: T1 -> T2 (X)
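The usual test builds a precedence graph with an edge Ti -> Tj for every conflict in which Ti's operation comes first, then checks for a cycle; the schedule is conflict serializable iff the graph is acyclic. A minimal sketch (function names are illustrative), using the edges listed above:

```python
def has_cycle(edges, nodes):
    """Depth-first search cycle detection on a directed graph."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {n: WHITE for n in nodes}

    def dfs(u):
        color[u] = GREY
        for v in adj[u]:
            if color[v] == GREY or (color[v] == WHITE and dfs(v)):
                return True               # back edge found: cycle
        color[u] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Edges from the slide: T2 -> T3 (on Y and Z) and T1 -> T2 (on X).
edges = [("T2", "T3"), ("T2", "T3"), ("T1", "T2")]
serializable = not has_cycle(edges, {"T1", "T2", "T3"})
```

Since the graph is acyclic, an equivalent serial order is T1, T2, T3.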
5.4 Serializability
View Equivalence and View Serializability
• The criteria for two schedules S and S′ to be view equivalent are as follows.
5.4 Serializability
Schedule S                   Schedule P
T1        T2                 T1        T2
r(a)                         r(a)
w(a)                         w(a)
          r(a)               r(b)
          w(a)               w(b)
r(b)                                   r(a)
w(b)                                   w(a)
          r(b)                         r(b)
          w(b)                         w(b)
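The criteria list on the earlier slide did not survive extraction, but the standard textbook definition (the same reads-from relationship for every read, and the same final writer for every data item) can be sketched as follows; the representation and names are our own, and the S and P schedules match the pair shown above:

```python
def reads_from(schedule):
    """For each read, record which transaction wrote the value it sees.
    Ops are (txn, action, item) tuples; writer 0 denotes the initial value."""
    last_writer, result = {}, []
    for txn, action, item in schedule:
        if action == "r":
            result.append((txn, item, last_writer.get(item, 0)))
        else:
            last_writer[item] = txn
    return sorted(result)

def final_writes(schedule):
    """The last transaction to write each data item."""
    last_writer = {}
    for txn, action, item in schedule:
        if action == "w":
            last_writer[item] = txn
    return last_writer

def view_equivalent(s1, s2):
    return (reads_from(s1) == reads_from(s2)
            and final_writes(s1) == final_writes(s2))

S = [(1, "r", "a"), (1, "w", "a"), (2, "r", "a"), (2, "w", "a"),
     (1, "r", "b"), (1, "w", "b"), (2, "r", "b"), (2, "w", "b")]
P = [(1, "r", "a"), (1, "w", "a"), (1, "r", "b"), (1, "w", "b"),
     (2, "r", "a"), (2, "w", "a"), (2, "r", "b"), (2, "w", "b")]
# view_equivalent(S, P) holds: every read sees the same writer in both.
```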
5.5 Transaction Support in SQL
SET TRANSACTION
READ ONLY,
ISOLATION LEVEL READ UNCOMMITTED,
DIAGNOSTIC SIZE 6;
5.5 Transaction Support in SQL
Read
Uncommitted
Read Committed
Repeatable Read
Serializable
5.5 Transaction Support in SQL
Read Uncommitted: Declares that a transaction can read rows that have been modified by other transactions but not yet committed. Thus, it may result in dirty reads, non-repeatable reads, and phantoms.
• Example - Consider the following transactions T1 and T2 occurring on an account that holds an initial balance of Rs. 50,000.
Transaction (T1) →
Deducts Rs. 1,000 from the account (Customer_ID=Cid_1105) due to an automated bill payment that happens every month. But, since an error occurred, transaction T1 rolled back without committing.
Transaction (T2) →
While T1 executes, the customer (Customer_ID=Cid_1105) checks his account balance.
5.5 Transaction Support in SQL
Read Committed: Declares that a transaction can only read data that has been committed by other transactions. Thus, it prevents dirty reads, but may still result in non-repeatable reads and phantoms.
• Example - Consider the following transactions T1 and T2 occurring on an account that holds an initial balance of Rs. 50,000.
Transaction (T1) →
Deducts Rs. 1,000 from the account (Customer_ID=Cid_1105) due to an automated bill payment that happens every month. This transaction completed successfully and committed to the database.
Transaction (T2) →
While T1 executes, the customer (Customer_ID=Cid_1105) checks his account balance twice in succession; T2 reads the account balance twice.
5.5 Transaction Support in SQL
Transaction (T1) →
Deducts Rs. 1,000 from the account (Customer_ID=Cid_1105) due to an automated bill payment that happens every month. Then transaction T1 commits.
Transaction (T2) →
While T1 executes, the customer (Customer_ID=Cid_1105) checks his account balance twice in succession.
5.5 Transaction Support in SQL
• The first read statement of T2 will not get the balance, but the second read statement in T2 will get the output = 49,000.
• Explanation → Since the isolation level in T1 is set to “REPEATABLE READ”, the first read statement in T2 is not allowed to read the balance, because T1 has updated the balance but not yet committed.
• When T2 reads the balance again, T1 has completed and committed to the database. Hence it gets the output = 49,000.
5.5 Transaction Support in SQL
Transaction (T1) →
Reads the details of employees who work in the "123" department twice in succession.
Transaction (T2) →
At the same time, a new record with name = "June", working in the "123" department, is inserted into the employee table.
5.6 Consistency in NoSQL
Consistency
• As we discussed in the previous slides, a transaction leads the database from one consistent state to another.
• In other words, transactions must affect the database only in valid ways.
Consistency in NoSQL
• In NoSQL databases, eventual consistency is preferred over immediate consistency. This will be discussed in detail later.
5.6 Consistency in NoSQL
Update Consistency
• Update consistency in NoSQL makes sure that write-write conflicts do not occur.
• A write-write conflict occurs when two transactions update the same data item at the same time. If the server simply serializes the two updates, one of them silently overwrites the other, and a lost update occurs.
• There are two approaches for maintaining consistency:
– Pessimistic approach: prevents conflicts from occurring.
– Optimistic approach: lets conflicts occur, but detects them and takes action to sort them out.
5.6 Consistency in NoSQL
Samanali and Krishna both read record A, which has the value 100. Samanali wants to add 50 to A. Just before writing the value, she checks that A has not changed since her last read, and then makes the modification. Meanwhile, Krishna wants to subtract 20 from A. Just before his modification, he also checks that the value remains unchanged at 100. But as Samanali has already changed A to 150, Krishna's update fails.
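The scenario above is a conditional update (compare-and-set): write only if the value is still what you read. A minimal sketch, with an illustrative in-memory store:

```python
store = {"A": 100}

def conditional_update(key, expected, new_value):
    """Write only if the current value still equals the value the
    client read earlier; otherwise report a conflict."""
    if store[key] != expected:
        return False                     # value changed since the read
    store[key] = new_value
    return True

# Samanali and Krishna both read A = 100.
samanali_read = store["A"]
krishna_read = store["A"]

ok_samanali = conditional_update("A", samanali_read, samanali_read + 50)
ok_krishna = conditional_update("A", krishna_read, krishna_read - 20)
# Samanali's write succeeds; Krishna's fails because A is now 150.
```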
5.6 Consistency in NoSQL
Samanali and Krishna both read record A, which has the value 100. Then Samanali adds 50 to the value and writes it. Meanwhile, Krishna subtracts 20 from A and writes it. The DBMS saves both values, 150 (written by Samanali) and 80 (written by Krishna), as possible values of A, and marks them as conflicting.
5.6 Consistency in NoSQL
Read Consistency
• Read consistency in NoSQL guarantees that readers always get consistent responses to their requests.
• Read consistency prevents "inconsistent reads" or "read-write conflicts".
• Read consistency preserves:
- Logical consistency (ensures that different data items make sense together).
- Replication consistency (ensures that the same data item has the same value when read from different replicas).
- Session consistency (within a user's session there is read-your-writes consistency: once you have made an update, you are guaranteed to keep seeing that update).
5.6 Consistency in NoSQL
Replication
• Creating multiple copies of data items over different servers is known as replication.
• It can be implemented using the following two forms:
- Master-Slave: in master-slave replication, the master processes the updates, and the changes are then propagated to the slaves.
- Peer-to-peer: in peer-to-peer replication, all the nodes can process updates and then synchronize their copies of the data.
5.6 Consistency in NoSQL
Master-Slave Replication
[Diagram of master-slave replication not recovered]
Peer-to-Peer Replication
• All the replicas have equal weight.
• Every replica can process updates.
• Even if one replica fails, the system can operate normally.
• Pros
- Resistant to node failures
- Nodes can easily be added to improve performance
• Cons
- Write-write inconsistencies can occur
- Read-write inconsistencies can occur due to slow propagation
5.6 Consistency in NoSQL
Relaxing Consistency
• Even though consistency is a desirable property, it is normally impossible to achieve without significant sacrifices to other characteristics of the system, such as availability.
• Transactions enforce consistency, but it is possible to relax isolation levels so that individual transactions can read data that has not been committed yet.
• Relaxing the isolation level improves performance but reduces consistency.
5.6 Consistency in NoSQL
CAP Theorem
• In a distributed database with several connected nodes, out of the three properties Consistency, Availability, and Partition tolerance, it is possible to guarantee only two at a time.
- Consistency: (we discussed this earlier).
- Availability: every request received by a non-failing node in the system must result in a response.
- Partition tolerance: the system continues to operate despite communication breakages that separate the cluster into multiple partitions which are unable to communicate with each other.
• A system designed using the CAP theorem will not be perfectly consistent or perfectly available, but will have a reasonable combination of the two.
5.6 Consistency in NoSQL
[CAP triangle: Consistency (C), Availability (A), Partition tolerance (P)]
- CP category: some data might become unavailable.
- CA category: network problems might stop the system.
- AP category: data inconsistencies may occur.
5.6 Consistency in NoSQL
Durability
• Durability means that committed transactions survive permanently (even if the system crashes). This is achieved by flushing the records to disk (non-volatile memory) before acknowledging the commit.
Relaxing Durability
• With relaxed durability, the database can apply updates in memory and periodically flush the changes to disk. If the durability needs can be specified on a call-by-call basis, the more important updates can be flushed to disk.
• By relaxing durability, we gain higher performance.
5.6 Consistency in NoSQL
Relaxing Durability
• Another class of durability tradeoffs arises with replicated data.
• A failure of replication durability occurs when a node processes an update but fails before that update is replicated to the other nodes.
• For example, assume a peer-to-peer replicated system with three nodes, R1, R2 and R3. If the update is applied to the memory of R1, but R1 crashes before the update is sent to R2 and R3, a failure of replication durability occurs. This can be avoided by setting the durability level: if the system does not acknowledge the commit until the update has propagated to a majority of nodes, the above scenario will not occur.
5.6 Consistency in NoSQL
Quorums
• Answer the question, "How many nodes need to be involved to get strong consistency?"
• The write quorum specifies the number of nodes whose agreement is needed for non-conflicting writes.
• If W > N/2, the system is said to have strongly consistent writes, where
• W - the number of nodes participating in the write
• N - the number of nodes involved in replication
• The number of replicas is known as the replication factor.
• If the number of nodes that must be contacted for a read is R, then when R + W > N you can have strongly consistent reads.
5.6 Consistency in NoSQL
Quorums Example
• Consider a system with replication factor 3. How many nodes are required to confirm a write?
For the system to have strong consistency, W should be greater than N/2 (N is the replication factor).
Here, W needs to be greater than 3/2, i.e. W > 1.5.
Therefore we need at least 2 nodes to confirm a write.
• What is the number of nodes you need to contact for a read?
R + W > N (according to the definition in the previous slide)
R > N - W
R > 3 - 2
R > 1
Therefore the number of nodes you need to contact for a read is 2.
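The arithmetic above generalizes to any replication factor. A minimal sketch (the helper names are our own):

```python
def min_write_quorum(n):
    """Smallest integer W with W > N/2 (strongly consistent writes)."""
    return n // 2 + 1

def min_read_quorum(n, w):
    """Smallest integer R with R + W > N (strongly consistent reads)."""
    return n - w + 1

n = 3                         # replication factor from the example
w = min_write_quorum(n)       # 2 nodes must confirm a write
r = min_read_quorum(n, w)     # 2 nodes must be contacted for a read
```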
5.6 Consistency in NoSQL
Version Stamps
• Transactions have limitations when updates involve human interaction, because such interactions can take a long time.
• Holding locks for long periods of time affects the performance of the system. The solution for this is version stamps: a field that changes every time the underlying data in the record changes.
• The system can note the version stamp when reading the data, and can check whether it has changed before writing the data.
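One way to realize this is a counter-style version stamp that is bumped on every write: a writer supplies the stamp it saw at read time, and the write is rejected if the record has moved on. The record layout and function names here are illustrative:

```python
record = {"value": 100, "stamp": 7}

def read(rec):
    """Return the value together with the current version stamp."""
    return rec["value"], rec["stamp"]

def write(rec, new_value, stamp_seen):
    """Apply the write only if the stamp is unchanged since the read."""
    if rec["stamp"] != stamp_seen:
        return False                     # record changed under us
    rec["value"] = new_value
    rec["stamp"] += 1                    # stamp changes on every update
    return True

value, stamp = read(record)
write(record, value + 50, stamp)         # succeeds; stamp is bumped
```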
5.6 Consistency in NoSQL
ii. Create a GUID
• Pros
- Can be generated by any node
• Cons
- Large numbers
- GUIDs cannot be compared directly to find the most recent version.
Activity
Consider the T1 and T2 transactions given in tabular format. If T1 reads 65 rows and 66 rows respectively in its Read1 and Read2 operations, what is the minimum isolation level of transaction T1?
[Transaction table not recovered; Read1 returns 65 rows, Read2 returns 66 rows]
Activity
Consider the T1 and T2 transactions given in tabular format. If T1 reads 65 rows in both its Read1 and Read2 operations, what is the minimum isolation level of transaction T1?
[Transaction table not recovered; both reads return 65 rows]
Activity
Activity
Schedule S (over transactions T1, T2, T3, T4; column assignment not recovered):
r(X); w(X); c; w(X); c; w(Y); r(Z); c; r(X); r(Y)
Write whether the given statements are true or false considering the given schedule S.
1. S is conflict serializable and recoverable. (_______)
2. S is conflict serializable but not recoverable. (_______)
3. S includes blind writes. (_______)
4. S is recoverable but not conflict serializable. (_______)
Activity
Match each property with the correct explanation.
Property - Explanation
Consistency - System continues to operate even in the presence of node failure.
Availability - System continues to operate in spite of network failures.
Partition Tolerance - All the users can see the same data at the same time.
Activity
• Drag and drop the correct answer from the given list.
Summary
Properties of Transactions - ACID properties, levels of isolation
Schedules - Schedules of Transactions, Schedules Based on Recoverability