
Advanced Database Concepts

Amir Shahzad Khokhar


Centralized Database
• “Centralized database systems are those that
run on a single computer system and do not
interact with other computer systems”
Architecture for Centralized Database

• Requests are directed to the central server


• Clients do not communicate among themselves

Centralized Database Architecture


Advantages of centralized DB
• Single site provides high degree of security
• Concurrency
• Backup and recovery is easy
• Easy data management
• No need for distributed joins or a distributed directory
Disadvantages of Centralized DB
• High communication cost
• Bottleneck problem
• Availability problem
• Not responsive to the need for faster response
times and quick access to information
• The concept of distributed databases was
introduced in order to overcome these issues

Distributed Processing and Distributed Databases

• Distributed processing
– Database’s logical processing is shared among two or more
physically independent sites connected through a network
Distributed Database
• “A distributed database is a collection of multiple,
logically interrelated databases distributed over a
computer network”
• The database is scattered over various locations
which provides local access to data and thus reduces
communication costs and increases availability.
• Examples: online banking, e-commerce merchants, HR
departments, the telecommunication industry and
airline ticketing
Types of Distributed Database
• Types of distributed database
– Homogeneous: Every site runs the same type of
DBMS
– Heterogeneous: Different sites run different DBMSs
(maybe even RDBMS and ODBMS)
Experiment Environment for Distributed Database

• All sites are interconnected and can communicate with one
another

[Figure: five interconnected sites, each with its own Server/DB and Client]
Advantages of DDB
• Local autonomy
• Reduced communications costs because each
table can be located at the site that most
heavily uses it
• Improved availability because portions of the
database are available even if one or some of
the sites are down
Disadvantages of DDB
• Security issues
• Concurrency
• Backup and recovery are the main problems of a
DDB
Distributed Database Transparency Features

• Allow the end user to feel like the database's only user


• Features include:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
Distributed Database Transparency Features

• Distribution Transparency
– Allows management of physically dispersed database
as if centralized
– The user does not need to know
• That the table’s rows and columns are split
vertically or horizontally and stored among multiple
sites
• That the data are geographically dispersed among
multiple sites
• That the data are replicated among multiple sites
Distributed Database Transparency Features

• Transaction Transparency
– Allows a transaction to update data at more than one
network site
– Ensures that the transaction will be either entirely
completed or aborted in order to maintain database
integrity
• Failure Transparency
– Ensures that the system will continue to operate in the
event of a node or network failure
– Functions that were lost will be picked up by another
network node
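The all-or-nothing behaviour described under transaction transparency is commonly obtained with a two-phase commit. The slides do not name a protocol, so the following is only a minimal sketch under that assumption; `Site`, `run_transaction` and the in-memory stores are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical participant site holding a small key/value store."""
    name: str
    data: dict = field(default_factory=dict)
    can_commit: bool = True            # set to False to simulate a failing site
    _pending: dict = field(default_factory=dict)

    def prepare(self, updates: dict) -> bool:
        """Phase 1: stage the updates and vote yes (True) or no (False)."""
        if not self.can_commit:
            return False
        self._pending = dict(updates)
        return True

    def commit(self) -> None:
        """Phase 2, all sites voted yes: make the staged updates visible."""
        self.data.update(self._pending)
        self._pending = {}

    def abort(self) -> None:
        """Phase 2, some site voted no: discard the staged updates."""
        self._pending = {}


def run_transaction(updates_by_site):
    """Coordinator: apply the updates at every site or at none of them."""
    votes = [site.prepare(updates) for site, updates in updates_by_site]
    if all(votes):
        for site, _ in updates_by_site:
            site.commit()
        return True
    for site, _ in updates_by_site:
        site.abort()
    return False
```

If any site votes no in the first phase, no site changes its data, which is the "entirely completed or aborted" guarantee from the slide. A real DDBMS would also log each phase so that it can recover from a coordinator failure; that part is left out here.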
Distributed Database Transparency Features

• Performance Transparency
– Allows the system to perform as if it were a centralized DBMS
• No performance degradation due to use of a network or
platform differences
• System will find the most cost effective path to access
remote data
• Heterogeneity Transparency
– Allows the integration of several different local DBMSs under
a common global schema
• DDBMS translates the data requests from the global
schema to the local DBMS schema
Results for Centralized Database

[Figure: Server Statistics (Locally). Bar chart of server response time in milliseconds (axis 0 to 90) per query: Normal Call, Refill, Balance Inquiry, Change Language, SMS-Charge, Pre-activation, Subscriber list/service class, Subscriber/service class. Series: Avg. Value, Min. Value, Max. Value]
Results for Centralized (cont..)
Results for distributed database
Response time statistics for all sites (milliseconds)

                   Local Access            Foreign Access
Queries            Min    Avg.   Max       Min    Avg.   Max
Normal Call        2.4    4.3    31        3.4    6.7    47.8
Refill             1.8    3.3    23.4      2.6    5.9    33.2
Balance Inquiry    1      1.65   15.2      1.6    2.5    26
Change language    1.2    2.05   17.4      2      3.6    27
SMS-Charge         1.4    2.15   21        2      2.9    28
Pre-activation     1      1.65   16.4      1.6    2.4    21.4
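
To make the gap between local and foreign access easier to read, the average columns of the table can be compared directly. The snippet below is a small sketch that uses only the Avg. values listed above; the names `AVG_MS` and `foreign_overhead` are illustrative.

```python
# Average response times in milliseconds, copied from the table:
# query -> (local access avg, foreign access avg)
AVG_MS = {
    "Normal Call": (4.3, 6.7),
    "Refill": (3.3, 5.9),
    "Balance Inquiry": (1.65, 2.5),
    "Change language": (2.05, 3.6),
    "SMS-Charge": (2.15, 2.9),
    "Pre-activation": (1.65, 2.4),
}


def foreign_overhead(avg_ms):
    """Return {query: (extra_ms, ratio)} for foreign versus local access."""
    return {
        query: (round(foreign - local, 2), round(foreign / local, 2))
        for query, (local, foreign) in avg_ms.items()
    }


if __name__ == "__main__":
    for query, (extra_ms, ratio) in foreign_overhead(AVG_MS).items():
        print(f"{query:16s} +{extra_ms} ms  x{ratio}")
```

For these figures the foreign average is between about 1.3 and 1.8 times the local average for every query, which is the cost that good data allocation tries to avoid.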

[Figure: Local and Foreign Access Statistics]


Fragmentation and Distribution
• Fragmentation is the process of splitting a
database into fragments
• Prior to fragmentation, data selection patterns
of applications running on the database are
identified
• Fragmentation is an important technique to
reduce the response time.
Issues using DDBMS
• Concurrency control
• Data Allocation
• Query processing and optimization
• Fragmentation
Inconsistency Issue (Problem)
• Accessibility of distributed databases by
different users makes them vulnerable to
transaction interference
• Ignoring or weakening the property of
isolation may lead to inconsistency of data
Anomalies
• Concurrent access, if not handled properly,
may lead to a variety of anomalies in
distributed databases
– Dirty Read
– Lost Update
– Inconsistent Data Retrieval
– Phantom Read
Anomalies Contd..
• Dirty Read
– Occurs when one user reads a value changed by another
user before a commit or rollback is done. If the change
is then rolled back, the first user has read an
invalid value
• Lost Update
– In a distributed database, data is accessed by multiple
users. It is possible that the same data is accessed and
modified by two users independently, so that one
user's change overwrites the other's
Anomalies Contd..
• Inconsistent data retrieval
– This anomaly occurs when a data object is read
twice during a transaction and it is modified by
another transaction between two read operations
• Phantom Read
– Occurs when a query extracts a subset of data to
work on further and that subset is changed by
another user during the execution of the query
Inconsistent Retrieval Anomaly
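
The definition of inconsistent data retrieval can be replayed as a fixed interleaving of two transactions. This is only an illustrative sketch: the dictionary `db` stands in for a shared data object, and T1 and T2 are hypothetical transactions.

```python
def inconsistent_retrieval_demo():
    """Replay one interleaving of two transactions on a single data object."""
    db = {"balance": 100}            # hypothetical shared data object

    first_read = db["balance"]       # T1 reads the object
    db["balance"] = 60               # T2 modifies it between T1's two reads
    second_read = db["balance"]      # T1 reads the same object again

    return first_read, second_read


if __name__ == "__main__":
    first, second = inconsistent_retrieval_demo()
    print("T1 saw", first, "then", second)
```

T1 sees two different values for the same object inside one transaction, so any result it computes from both reads is inconsistent.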
Lost Update Anomaly
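
A lost update can be shown in the same way, by fixing the order in which two transactions read and write. The values and the names `lost_update_demo` and `serial_demo` are assumptions made for this sketch.

```python
def lost_update_demo():
    """Two transactions read the same value, then both write back."""
    db = {"stock": 10}               # hypothetical shared data item

    t1_copy = db["stock"]            # T1 reads 10
    t2_copy = db["stock"]            # T2 reads 10, before T1 has written
    db["stock"] = t1_copy - 3        # T1 writes 7
    db["stock"] = t2_copy - 4        # T2 writes 6 and overwrites T1's update
    return db["stock"]


def serial_demo():
    """The same two updates executed one after the other."""
    db = {"stock": 10}
    db["stock"] = db["stock"] - 3    # T1 runs to completion
    db["stock"] = db["stock"] - 4    # T2 starts only after T1 has finished
    return db["stock"]
```

The interleaved run ends with 6 although 7 units were removed in total, while the serial run ends with 3. T1's update is lost because T2 worked on a stale copy.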
Concurrency Control
• Concurrency control is the activity of
coordinating concurrent accesses to a
database in a multiuser database
management system (DBMS). Concurrency
control permits users to access a database in a
multiprogrammed fashion while preserving
the illusion that each user is executing alone
on a dedicated system
Concurrency Control Techniques
• Locking
– Transactions lock the data objects they access so that
concurrent transactions cannot interfere with each other
• Timestamp ordering
– Attaches a unique identifier (timestamp) to each
transaction and uses it to order conflicting operations
• Multi-version
– Keeps multiple versions of a data item
• Optimistic concurrency control techniques
– Rely mainly on the concept of verification and validation after a
transaction has executed its operations
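
The optimistic approach in the last bullet can be sketched with a version number per data item: a transaction remembers the version it read and is validated against the current version when it tries to commit. `VersionedStore` is a hypothetical in-memory store, not the API of any particular DBMS.

```python
class VersionedStore:
    """Hypothetical store that keeps a version number next to each value."""

    def __init__(self):
        self._items = {}             # key -> (value, version)

    def read(self, key):
        """Return (value, version); the version is kept for validation."""
        return self._items.get(key, (None, 0))

    def commit(self, key, new_value, version_read):
        """Validate, then write. Returns False if the item changed meanwhile."""
        _, current_version = self._items.get(key, (None, 0))
        if current_version != version_read:
            return False             # validation failed, caller must retry
        self._items[key] = (new_value, current_version + 1)
        return True
```

Two transactions that both read version 1 of an item cannot both commit: the first commit moves the item to version 2, so the second fails validation and has to re-read and retry. No locks are held while the transactions execute, which is why the approach is called optimistic.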
Data Allocation
• While designing the distributed database, data
allocation is a key design issue
• Allocation of the data is a challenging task
• In a distributed database system it is often required to
allocate data as Fragmented, Replicated and Centralized
• Data should be allocated or distributed to sites according
to some specific needs (user's need, system objective)
Data Allocation Techniques
• Broadly, there are four main strategies related to
allocation and placement of data
– Centralized
• Data are stored at a central server and the locality of reference
is low
– Fragmented
• Data are divided into small pieces and these pieces are then
stored at different sites
– Complete replication
• All data are copied at each site
– Selective replication
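
The first three strategies can be expressed as a placement map from sites to the fragments they store. This is a rough sketch under stated assumptions: the function `allocate`, its parameters and the access counts are hypothetical, and the fragmented case uses the rule from the DDB advantages slide (store data at the site that uses it most heavily). Selective replication is left out because the slide gives no rule for choosing the replicas.

```python
def allocate(fragments, sites, strategy, access_counts=None, central_site=None):
    """Return {site: set of fragments stored there} for one strategy."""
    placement = {site: set() for site in sites}

    if strategy == "centralized":
        # Everything lives at one central server.
        placement[central_site].update(fragments)
    elif strategy == "fragmented":
        # Each fragment is stored once, at the site that accesses it most.
        for frag in fragments:
            busiest = max(sites, key=lambda s: access_counts[frag].get(s, 0))
            placement[busiest].add(frag)
    elif strategy == "complete_replication":
        # Every site holds a copy of every fragment.
        for site in sites:
            placement[site].update(fragments)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return placement
```

For example, with fragments `F1` and `F2`, sites `A` and `B`, and access counts `{"F1": {"A": 9, "B": 1}, "F2": {"A": 2, "B": 7}}`, the fragmented strategy stores `F1` at `A` and `F2` at `B`.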
Fragmentation
• It is the decomposition of a relation into
fragments, each being treated as a unit.
• Fragmentation is done according to the data
selection patterns of applications running on
the database
Types of Fragmentation
• Fragmentation is basically divided into two
categories
– Horizontal fragmentation
• Primary horizontal fragmentation
• Derived horizontal fragmentation
– Vertical fragmentation
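
The fragmentation types can be illustrated on a small relation held as a list of rows. This is a minimal sketch: the relations `CUSTOMER` and `ACCOUNT_LOG`, their columns and the helper names are all made up for the example.

```python
# Hypothetical relations: each row is a dict.
CUSTOMER = [
    {"id": 1, "name": "A", "city": "Lahore", "balance": 120},
    {"id": 2, "name": "B", "city": "Karachi", "balance": 80},
    {"id": 3, "name": "C", "city": "Lahore", "balance": 45},
]
ACCOUNT_LOG = [
    {"log_id": 10, "customer_id": 1},
    {"log_id": 11, "customer_id": 2},
    {"log_id": 12, "customer_id": 3},
]


def horizontal_fragment(relation, predicate):
    """Primary horizontal: the rows of a relation that satisfy a predicate."""
    return [row for row in relation if predicate(row)]


def derived_horizontal_fragment(member, owner_fragment, fk, key="id"):
    """Derived horizontal: rows of a related relation that reference
    rows in an existing fragment of the owner relation."""
    owner_keys = {row[key] for row in owner_fragment}
    return [row for row in member if row[fk] in owner_keys]


def vertical_fragment(relation, columns, key="id"):
    """Vertical: selected columns of every row; the key is always kept."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in relation]


def rejoin_vertical(left, right, key="id"):
    """Rebuild the original rows by joining two vertical fragments on the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left]
```

Horizontal fragments are recombined with a union of rows, while vertical fragments are recombined with a join on the key, which is why each vertical fragment repeats the key column.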
