
Advanced Database Systems Notes

Lecture # 1

Distributed Database systems:

 A distributed database system is a collection of sites connected by a communication network.
 Each site runs a full database system of its own, but the sites agree to share the data that users store locally at each site.
 When a query is run against the distributed database, the result is produced by combining data from the distributed data sets on the network.
 The distributed data sets communicate with each other over the communication network that connects them.
 A distributed database can be built from separate data sites, such as multiple hard drives or multiple data stores.
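
The point about a query result being combined from several sites can be sketched in a few lines. This is only an illustration: the site names, rows and helper functions below are assumptions made up for the example, and a real DDBMS would send each sub-query over the network.

```python
# Hypothetical sites: each one holds its own local subset of the same relation.
SITES = {
    "site_a": [{"id": 1, "name": "Ali", "city": "Matli"}],
    "site_b": [
        {"id": 2, "name": "Sara", "city": "hyd"},
        {"id": 3, "name": "Omar", "city": "hyd"},
    ],
}


def query_site(site, predicate):
    """Run the predicate against one site's local data only."""
    return [row for row in SITES[site] if predicate(row)]


def global_query(predicate):
    """Send the query to every site and combine the partial results."""
    result = []
    for site in sorted(SITES):
        result.extend(query_site(site, predicate))
    return result


rows = global_query(lambda row: row["city"] == "hyd")
print([row["name"] for row in rows])
```

The user calls only `global_query` and never names a site, which is the behaviour the notes describe.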

Advantages of Distributed Database systems:

 Fast local access: The database is already divided into parts, so users mostly work with a locally stored subset of the data.
 Distributed databases provide connectivity between large, otherwise isolated data sets ("islands" of information).
 Data is easier to find because it is divided into groups, so a search can be limited to the group that holds the data we want.
 Growth facilitation: New sites can be added to the network easily without affecting the other sites.
 Data is located where demand for it is highest, to match business requirements.
 Reduced operating cost: Workstations can be added without changing the mainframe, which is less expensive.
 Improved communication: Data is stored at local sites, so it can be accessed quickly.
 Less danger of a single point of failure, because data is distributed across multiple sites.
 Each site has its own processor, so processing power is also distributed.
 User-friendly interface: PCs and workstations provide a GUI-based interface that is easy for trainees to handle and understand.

Disadvantages of Distributed Database systems:

 Distributed database systems are very complex.
 Distributed databases are hard to secure and to connect, so they are more difficult to manage than centralized database systems.
 Lack of standards: There is no industry-wide standard communication protocol for DDBMSs.
 Increased storage requirements: The data is very large, and multiple copies of the same data are distributed across multiple sites.
 Training costs for database administrators are also high, because a DDBMS is very complex to design and manage.

Fundamental Principles of Distributed database systems:

 A distributed database system should look like a centralized database system to the user.
 A distributed database system should be as fast as a centralized database system.
 A DDBMS should perform all the functions of a centralized DBMS.
 A DDBMS should also perform the additional functions that the distribution of data makes necessary.

12 Principles:

1. Local autonomy:
 All sites in a distributed database should be independent of each other, so that regular operations complete successfully
 Local data should be locally owned and managed, with local accountability
 Security, integrity and storage are under the control of the local site
2. No reliance on central site:
 All sites are treated as equals
 No site in a distributed database system is the “Master”, because a central site could become a bottleneck and would be a point of vulnerability: its failure could affect every site on the network.
3. Continuous operation:
 A distributed database system should provide greater reliability and availability
i. Reliability means the system is up and running at any given moment, even if it faces a crash or failure.
ii. Availability means the system is up and running continuously throughout a specified period of time.
4. Location independence (Transparency):
 Users should not have to know where data is physically stored
 When users work with the data, it should appear to be stored at their own site.
5. Fragmentation independence:
 Data should be stored where it is most often used, which reduces network traffic
 The system supports data fragmentation: a relation can be divided into pieces (fragments) for physical storage.
6. Replication independence:
 The system supports data replication, meaning multiple copies of the same relation can be stored at different sites.
 Replication provides better availability.
 All replicated copies must be updated on every update, to avoid update propagation problems.
7. Hardware independence:
 The system should not depend on hardware from a particular vendor (Dell, IBM, HP).
 The system should run on any hardware.
8. Network independence:
 The DBMS should be independent of the network architecture (wired or wireless)
 The DBMS should work on any type of network
9. DBMS independence:
 The system should be independent of any particular DBMS (Oracle, MySQL, DB2, etc.)
 The system should be able to communicate with different types of DBMS.
 The system should support both homogeneous and heterogeneous database systems
10. Operating system independence:
 The system should be independent of the operating system
 The system should run on any operating system platform
11. Distributed query processing:
 If a local user needs data held at a site in another city, the local site sends a request to that remote site to fetch the data.
 Query processing in a distributed database system should be optimized, just as in a centralized database system.
12. Distributed transaction management:
 A transaction is carried out by agents, which allow it to update many sites at a time.
 The system should support recovery control, so that after a crash or error every site either commits or rolls back.
 Recovery control can be achieved with the two-phase commit protocol.
 A relation is locked when multiple users try to access it concurrently.
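
The two-phase commit protocol named under principle 12 can be sketched as follows. This is a minimal illustration, not a full protocol: the `Participant` class and `two_phase_commit` function are hypothetical names, and timeouts, logging and coordinator failure are all left out.

```python
class Participant:
    """One site taking part in a distributed transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every site votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    decision = all(votes)
    for p in participants:                       # phase 2: broadcast decision
        if decision:
            p.commit()
        else:
            p.rollback()
    return "commit" if decision else "rollback"


sites = [Participant("hyd"), Participant("Matli", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])
```

Because one site votes no in this run, every site rolls back, so no site is left with a half-applied transaction.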
Following are some of the adversities associated with distributed databases.
 Need for complex and expensive software − DDBMS demands complex and often expensive software to
provide data transparency and co-ordination across the several sites.
 Processing overhead − Even simple operations may require a large number of communications and
additional calculations to provide uniformity in data across the sites.
 Data integrity − The need to update data at multiple sites poses problems for data integrity.
 Overheads for improper data distribution − Responsiveness of queries is largely dependent upon proper
data distribution. Improper data distribution often leads to very slow response to user requests.
Lecture # 2
Problems with the Distributed Database Systems:
When the network is slow because of limited bandwidth, it is hard to make services available to users in a short time. The resulting problems for a DDBMS are:

 Query processing:
Query processing is the translation of high-level queries into low-level expressions. It is a step-wise
process that covers query optimization, access to the physical level of the file system and the actual
execution of the query to get the result.
In a distributed system the objective of query processing is to minimize network utilization, so query
processing should itself be distributed, because the data comes from different sites.
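
A rough way to see why minimizing network utilization matters is to compare two strategies for the same query. This sketch uses made-up data, and the row count stands in for network cost, which is an assumption. A real optimizer would also weigh message counts and row sizes.

```python
# Hypothetical remote relation with 100 rows, 10 of which are for "hyd".
REMOTE_ORDERS = [
    {"order_id": i, "city": "hyd" if i % 10 == 0 else "Matli"}
    for i in range(1, 101)
]


def is_hyd(row):
    return row["city"] == "hyd"


def ship_whole_relation(predicate):
    """Strategy A: copy every remote row to the local site, then filter."""
    shipped = list(REMOTE_ORDERS)  # all rows cross the network
    return [row for row in shipped if predicate(row)], len(shipped)


def ship_query(predicate):
    """Strategy B: send the query to the remote site, return matches only."""
    shipped = [row for row in REMOTE_ORDERS if predicate(row)]  # filtered remotely
    return shipped, len(shipped)


rows_a, cost_a = ship_whole_relation(is_hyd)
rows_b, cost_b = ship_query(is_hyd)
print(cost_a, cost_b)  # 100 10
```

Both strategies return the same rows, but the second ships a tenth of the data.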
 Catalogue management:
The system catalogue is data about the data.
It stores information about the database, such as the relations in the database and the constraints on each relation.
The problem in a DDBMS is where to store the system catalogue.
There are several possibilities:
1. Centralized: The whole catalogue is stored at a single central site, but this violates the “No reliance on a central site” objective.
2. Fully replicated: The whole catalogue is stored at every site, but this causes a loss of autonomy, because every catalogue update must be propagated to every site.
3. Partitioned: Each site stores its own catalogue, and the total catalogue is the union of all the disjoint local catalogues. This makes remote operations expensive, because finding a remote object requires access to half of the sites on average.
4. Combination of 1 & 3: Each site maintains its own local catalogue, and a unified copy of the entire catalogue is also stored at a central site. This again violates “No reliance on a central site”, but it is more practical than the three approaches above, because a site reads its own catalogue locally and needs the central catalogue only for remote operations.
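
The lookup order in the combined approach (4) can be sketched as below. The site names, relation names and the `locate` helper are hypothetical, chosen only to show that local objects never need the central site.

```python
# Hypothetical local catalogues: relation name -> site that stores it.
LOCAL_CATALOGUES = {
    "site_a": {"employees": "site_a"},
    "site_b": {"orders": "site_b"},
}

# The central copy is the union of all the disjoint local catalogues.
CENTRAL_CATALOGUE = {}
for entries in LOCAL_CATALOGUES.values():
    CENTRAL_CATALOGUE.update(entries)


def locate(relation, at_site):
    """Return (site holding the relation, which catalogue answered)."""
    local = LOCAL_CATALOGUES[at_site]
    if relation in local:
        return local[relation], "local"            # no network access needed
    return CENTRAL_CATALOGUE[relation], "central"  # one trip to the central site


print(locate("employees", "site_a"))
print(locate("orders", "site_a"))
```

A remote lookup costs one access to the central site, instead of the search through about half of the sites that approach 3 needs.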
 Object naming:
Object naming becomes an issue when a site refers to an object by name but several tables in the DDBMS have that same name: the system cannot tell which object is meant or where to get it. One way out is to qualify the name, as in R.name, but this violates location independence (the user should not need to know where the data is accessed from).
So, we use the R* approach and R* SQL.
R* approach:
 Map each name to a system-wide name, which is a globally unique internal identifier for the object.
 A system-wide name has four components: creator ID (the user who created the object), creator site ID (the site on which the object was created), local name (the name of the object) and birth site ID (the site where the object was initially stored).
 E.g., Prince @ Matli.Abdul_Qayoom @ hyd.
R* SQL:
 Create a synonym for the object, so that users can refer to the globally unique name by a short name
 CREATE SYNONYM RD FOR Prince @ Matli.Abdul_Qayoom @ hyd;
This creates a synonym table in the data dictionary. In addition, each site's catalogue holds an entry for every object born at that site and an entry for every object currently stored at that site.
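How the synonym table works:
1. The user refers to RD.
2. The system looks up that name in the synonym table.
3. The system finds the system-wide name and therefore knows the object's birth site (the last component, hyd in this example), so it consults the hyd catalogue.
4. If the object has migrated from its birth site to another site, the birth-site catalogue entry is updated to record the new location and an entry is inserted in the new site's catalogue. On any later migration the entry at the old site is deleted, an entry is inserted at the new site, and the birth-site entry is updated again, so the object can always be found starting from its birth site.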
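
The synonym lookup can be sketched as below. It assumes the component order listed under the R* approach, so the last component of the system-wide name is taken as the birth site. The helper names are hypothetical, and `site_x` is a made-up site used only to show a migration.

```python
from typing import NamedTuple


class SystemWideName(NamedTuple):
    creator: str
    creator_site: str
    local_name: str
    birth_site: str


def parse_name(text):
    """Parse 'creator @ creator_site.local_name @ birth_site'."""
    left, right = text.split(".", 1)
    creator, creator_site = (part.strip() for part in left.split("@"))
    local_name, birth_site = (part.strip() for part in right.split("@"))
    return SystemWideName(creator, creator_site, local_name, birth_site)


# Synonym table: short name -> system-wide name (as set up by CREATE SYNONYM).
SYNONYMS = {"RD": parse_name("Prince @ Matli.Abdul_Qayoom @ hyd")}

# One catalogue per site: system-wide name -> site that currently stores it.
CATALOGUES = {"hyd": {}, "Matli": {}, "site_x": {}}  # site_x is hypothetical


def create_object(name):
    """Register a new object in the catalogue of its birth site."""
    CATALOGUES[name.birth_site][name] = name.birth_site


def migrate(name, new_site):
    """Move an object and keep the birth-site entry pointing at it."""
    birth_catalogue = CATALOGUES[name.birth_site]
    old_site = birth_catalogue[name]
    if old_site != name.birth_site:
        del CATALOGUES[old_site][name]     # delete the entry at the old site
    CATALOGUES[new_site][name] = new_site  # insert an entry at the new site
    birth_catalogue[name] = new_site       # update the birth-site entry


def resolve(synonym):
    """Synonym -> system-wide name -> birth-site catalogue -> current site."""
    name = SYNONYMS[synonym]
    return CATALOGUES[name.birth_site][name]


create_object(SYNONYMS["RD"])
print(resolve("RD"))               # current site before migration
migrate(SYNONYMS["RD"], "site_x")
print(resolve("RD"))               # current site after migration
```

The user only ever writes `RD`. The synonym table and the birth-site catalogue hide where the object is stored at the moment.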
 Update propagation:
When an object is replicated, every update must be propagated to all copies. If the system stops responding partway through an update because of an error, or a copy is unavailable, the update does not reach every copy. This is the update propagation problem.
We can deal with update propagation using the primary copy scheme.
Primary copy scheme:
 One copy of each replicated object is designated as the primary copy, and all other copies are secondary copies.
 The primary copies of different objects are held at different sites, so the scheme is itself distributed.
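
A minimal sketch of the primary copy scheme follows. It assumes the usual reading of the scheme, where an update counts as done once the primary copy is updated and the secondaries are brought up to date afterwards. The `ReplicatedObject` class and its methods are hypothetical.

```python
class ReplicatedObject:
    """One replicated object with a primary copy and secondary copies."""

    def __init__(self, value, primary_site, secondary_sites):
        self.primary_site = primary_site
        self.copies = {site: value for site in [primary_site, *secondary_sites]}
        self.pending = {}  # secondary site -> value still to be propagated

    def update(self, value, down_sites=()):
        """Update the primary copy, then propagate to reachable secondaries."""
        if self.primary_site in down_sites:
            raise RuntimeError("primary copy unavailable")
        self.copies[self.primary_site] = value  # update is logically done here
        for site in self.copies:
            if site == self.primary_site:
                continue
            if site in down_sites:
                self.pending[site] = value      # remember for later
            else:
                self.copies[site] = value
                self.pending.pop(site, None)

    def recover(self, site):
        """Apply the missed update when a secondary site comes back."""
        if site in self.pending:
            self.copies[site] = self.pending.pop(site)


obj = ReplicatedObject(1, "hyd", ["Matli"])
obj.update(2, down_sites={"Matli"})
print(obj.copies)
obj.recover("Matli")
print(obj.copies)
```

The update succeeds even while a secondary site is down, and the missed copy catches up on recovery.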
 Recovery control:
Recovery control in a DDBMS is based on the two-phase commit protocol. Each site must be able to act as coordinator for some transactions and to participate in the recovery process for transactions coordinated by other sites. If a single site always acted as coordinator, every other site would depend on it, which would violate the “No reliance on a central site” rule.
 Concurrency control:
Concurrency control in most distributed systems is based on locking. If each site is responsible for the locks on the objects stored at that site, then updating an object that is replicated at n remote sites needs 5n messages in a straightforward implementation:
1. n lock requests
2. n lock grants
3. n update messages
4. n acknowledgements
5. n unlock requests
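
The 5n count can be checked with a few lines of arithmetic. The function name is hypothetical, and n = 3 is just an example value.

```python
def lock_based_update_messages(n):
    """Messages needed to update an object with copies at n remote sites."""
    steps = [
        "lock request",
        "lock grant",
        "update message",
        "acknowledgement",
        "unlock request",
    ]
    return {step: n for step in steps}  # each step costs one message per site


counts = lock_based_update_messages(3)
print(sum(counts.values()))  # 15, i.e. 5n for n = 3
```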
Lecture # 3
Components of DDBMS:
Computer workstations:
o
Network hardware & software:
Communication media:
Transaction processors:
o These are embedded in every system and help it to request data.
Data processors or Data managers:
o
