You are on page 1of 7

UNIT 4

Q1- What is Distributed Database?

Ans- Distributed database (DDB) can be defined as a collection of multiple


logically interrelated databases distributed over a computer network.
In a distributed database, a database is physically stored in two or more computer
systems. Although geographically dispersed, a DDB system manages and controls the
entire database as a single collection of data. If redundant data is stored in separate
databases due to performance requirements, updates to one set of data will
automatically update the additional sets in a timely manner. A distributed system has
both data and transaction processing divided between one or more computers
connected by network, each computer playing a specific role in the system. DDBs bring
together the advantages of distributed computing and database management. DDBs
use client/server architecture to process information requests.
A DDB system allows applications to access data from local and remote
databases. In a Homogeneous DDB system, each database in the system is by the
same vendor. In a Heterogeneous DDB system, at least one of the databases will be
that of a different vendor. e.g. In a Oracle homogeneous DDB system all the databases
in the system will be oracle databases whereas in the heterogeneous one there would
be oracle databases and other databases like Sybase, DB2.
A DDB appears to a user as a single database but is, in fact, a set of databases
stored on multiple computers. The data on several computers can be simultaneously
accessed and modified using a network. Each database server in the DDB is controlled
by its local DBMS and each cooperates to maintain the consistency of the global
database.
Q2- What are the functions of the distributed database management
system?

Ans- Distribution of databases across a network leads to increased complexity in the


system design and implementation. To achieve the benefits of a distributed database
that we have seen earlier, the distributed database management software should be
able to perform the following functions in addition to the basic functions performed by a
non-distributed DBMS:
1. Distributed query processing:

Distributed query-processing means the ability to access remote sites and


transmit queries and data among the various sites along the communication network.
2. Data tracking:

The distributed DBMS should have the ability to keep track of the data
distribution, fragmentation and replication by expanding the distributed DBMS catalog.
3. Distributed transaction management:

Distributed transaction management is the ability to devise execution strategies


for queries and transactions that access data from more than one site and to
synchronize the access to distributed data and maintain the integrity of the overall
database.
4. Replicated data management:

This is the ability of the system to decide which copy of the replicated data item
to access and to maintain consistency of the copies of a replicated data item.
5. Distributed data recovery:

The distributed DBMS should have the ability to recover from individual site
crashes and failures of communication links.
6. Security:

Distributed transactions must be executed with the proper management of the


security of the data and the authorization and access privileges of users.
7. Distributed catalog management:

A catalog or directory contains information about the data inn the database -
metadata. The directory may be global for the entire distributed database or local for
each location.
Q3-What are the advantages of distributed database?

Ans-1.Transparency:

A DDB should be distribution transparent. This means hiding the details of where
each file is physically stored within the system. This is possible through DDBMS. The
following transparencies are achieved:
A. Distribution or network transparency:

This refers to the freedom for the user from the operational details of the
network. Network transparency may be divided into location transparency and
naming transparency. Location transparency refers to the fact that the command
used to perform a task independent of the location of the data and location of the
system where the command was issued. Naming transparency implies that once a
name is specified, the named objects can be accessed unambiguously without
additional specification.
B. SQL and COMMIT transparency:

A DDBMS provides query, update and transaction transparency. The standard


SQL statements like SELECT, INSERT, UPDATE and DELETE work just as they do
in a non-distributed database environment. The DDBMS should guarantee that all
nodes involved in a distributed transaction take the same action; either all commit or
all rollback the transaction. If a network or system failure occurs during the commit of
a distributed transaction, the transaction should be automatically and transparently
resolved.
C. Replication transparency:

Replication is the process by which we store copies of the database objects


at multiple sites for better availability, performance and reliability. Replication
transparency makes the user unaware of the existence of the copies.
D. Fragmentation transparency:

There are two types of fragmentation transparency - Horizontal and Vertical.


Horizontal fragmentation distributes a relation into sets of tuples. Vertical
fragmentation distributes a relation into sub-relations where each sub-relation is
defined by a subset of columns of the original relation. A global query by the user
must be transformed into several fragment queries. Fragmentation transparency
makes the user unaware of the existence of fragments.
2. Improved reliability and availability:

Reliability is defined as the probability that the system is up and running at a


certain point in time. Availability is the probability that the system is continuously
available during a time interval. When data and DBMS software are distributed over
several locations, one location may fail while other locations continue to operate. Only
the data and software of the failed location is inaccessible. This improves both reliability
and availability. Further improvement is achieved by judiciously replicating data and
software at more than one site. In the case of centralized systems the failure at one
location makes the whole system unavailable to all users.
3. Performance Improvement:

A DDBMS fragments the database by keeping the data closer to where it is


needed most. Data localization reduces the contention for CPU and I/O services and
network traffic. When a large database id distributed over multiple sites, smaller
databases exist at each site. As a result, local queries and transactions accessing data
at a single site have better performance because of the smaller local database. In
addition each site has a smaller number of transactions executing than if all transactions
are submitted to a single centralized database. Query performance can be improved by
executing multiple queries at different sites or breaking up a query into a number of
subqueries that execute in parallel.
4. Improved scalability:

In a distributed environment, expansion of the system in terms of adding more


data, increasing database sizes or adding more processors is much easier than other
systems.

5. Site autonomy:

Site autonomy means that each server participating in a DDB is administered


independently from all other databases. Although several databases can work together,
each database is a separate repository of data that is managed individually. Some of
the benefits are:
A. Local administrators control corresponding local data, therefore each database
administrator's domain of responsibility is smaller and more manageable.
B. Independent failures are less likely to disrupt other nodes of the DDB.
C. Administrators can recover from isolated system failures independently from other
nodes in the system.
D. A data dictionary exists for each local database - a global catalog is not necessary
to access local data.
E. Nodes can upgrade software independently.
F. Nodes of the system can mirror the logical organization of groups that need to
maintain independence.
Q4- Explain client/server database architecture.

Ans - Full scale DDBMSs has not been developed to support all the types of
functionalities. Instead, distributed database applications are being developed in the
context of the client-server architecture. It is now more common to use a 3-tier
architecture, particularly with web applications.

2-Tier model:

The client-server model is basic to distributed systems. It is a response to the


limitations presented by the traditional mainframe client-host model, in which a single
mainframe provides shared data access to many dumb terminals. The client-server
model is also a response to the LAN model, in which many isolated systems access a
file server that provides no processing power.
Client-server architecture provides integration of data and services and allows
clients to be isolated from inherent complexities such as communication protocols. The
simplicity of the client-server architecture allows clients to make requests that are routed
to the appropriate server. These requests are made in the form of transactions. The
client-server model consists of three parts:

1. Client:

The client is the machine running the front-end applications. It interacts with the
user through the keyboard, display and pointing device such as a mouse. The client
also refers to the client process that runs on the client machine. The client has no direct
data access responsibilities. It simply requests processes from the server and displays
data managed by the server. Therefore, the client workstation can be optimized for its
job. e.g it might not need large disk capacity, or it might benefit from graphic capabilities.
2. Server:

The server is the machine that runs the DBMS software and handles the
functions required for concurrent, shared data access. It is often referred to as the
backend. Server also refers to the server process that runs on the server machine. The
server receives and processes SQL and other query statements originating from client
applications. The server can also be optimized for its duties. e.g it can have large disk
capacity and fast processors. It can also take the load of disk I/O, printing, file transfer,
and so on.
3. Network:

The network enables remote data access through client-server and server-to-
server communication.
The clients and servers may all be located in physical proximity-say within the
same building- and connected via a LAN or they may be geographically distributed over
large distances and connected by WAN. LANs typically use cables whereas WANs use
telephone lines or satellites. It is also possible to use a combination of the two types of
networks. Networks will have different topologies. The topology defines the
communication paths among locations. The type of topology of the network used may
have significant effect on performance and hence on the strategies for distributed query
processing and DDB design.
A database server is the DBMS software managing a database and a client is an
application that requests information from a server. Each computer in a network is a
node that can host one or more databases. Each node is a distributed database system
can act as a client, a server or both, depending on the situation. Certain features are
always present in the client-server model. There is a server process that can process
requests from one or more client processes concurrently over a network connection.
The client machine provides front-end application software for accessing the data on the
server. The client initiates transactions; the server processes the transactions. Thee is a
SQL that can be used to access data stored on the server side.
e.g Assume that there are 2 database servers HQ(EMP table) and
BRANCH(SALES table) which are located in two separate locations connected
through a network.

A client can connect directly or indirectly to a database server. A direct


connection occurs when a client connects to a server and accesses information from a
database contained on that server. e.g if you connect to the HQ database and access
the EMP table on this database, you would write:
SELECT * FROM EMP;

This query is direct because you are not accessing an object on aremote
database. In contrast, an indirect connection occurs when a client connects to a server
and then accesses information contained in a database on a different server. e.g. if you
connect to the HQ database but access the SALES table on the remote BRANCH
database, you have to write:
SELECT * FROM SALES@BRANCH;

This query is indirect because the object you are accessing is not on the
database to which you are directly connected.

3-Tier model:

In the 3-tier model, the following 3 layers exists:

1. Presentation layer(client): This provides the user interface and interacts with the user.

2. Application layer(Business logic): This layer programs the application logic.

3. Database server: This layer handles query and update requests from the application
layer, processes the requests and send the results.
It is a common approach to include the functionality of a centralizzed DBMS at
the database server level. A number of relational DBMS products have taken this
approach, where an SQL server is provided. The application server must then formulate
the appropriate SQL queries and connect to the database server when needed. The
client provides the processing for user interface interactions.
In this architecture, the application server may also refer to a data dictionary that
includes information on the distribution of data among the various SQL servers as well
as modules for decomposing a global query into a number of local queries that can be
executed at various sites. Interaction between application server and database server
might proceed as follows during the processing of an SQL query:
1. The application server formulates a user query based on input from the client
layer and decomposes it into a number of independent site queries. Each site query is
sent to the appropriate database server site.
2. Each database server processes the local query and sends the results to the
application server site.
3. The application server combines the results of the subqueries to produce the
result of the originally required query, formats it into HTML or some other form accepted
by the client, and sends it to the client site for display.

If the DDBMS has the capability to hide the details of data distribution from the
application server, then it enables the application server to execute global queries and
transactions as though the database were centralized, without having to specify the
sites at which the data referenced in the query or transaction resides. This property is
called distribution transparency.

UNIT 5

Q1- What is data warehouse?

Ans-

You might also like