Distributed Databases


Good sense is of all things in the world the most equally distributed, for everybody thinks he is so well supplied with it, that even the most difficult to please in all other matters never desire more of it than they already possess.
René Descartes: Le Discours de la Méthode

This chapter describes what a distributed database is, the benefits of distributed database systems, and the Oracle distributed database architecture. The chapter includes:
• An Introduction to Distributed Databases
• Replicating Data

Note: The information in this chapter applies only for those systems using Oracle with the distributed or advanced replication options. See Oracle7 Server Distributed Systems, Volume I and Oracle7 Server Distributed Systems, Volume II for more information about distributed database systems and replicated environments. If you are using Trusted Oracle, see the Trusted Oracle7 Server Administrator's Guide for information about distributed databases in that environment.

An Introduction to Distributed Databases

A distributed database appears to a user as a single database but is, in fact, a set of databases stored on multiple computers. The data on several computers can be simultaneously accessed and modified using a network. Each database server in the distributed database is controlled by its local DBMS, and each cooperates to maintain the consistency of the global database. Figure 21-1 illustrates a representative distributed database system.

The following sections outline some of the general terminology and concepts used to discuss distributed database systems.

Clients, Servers, and Nodes

A database server is the software managing a database, and a client is an application that requests information from a server. Each computer in a system is a node. A node in a distributed database system can be a client, a server, or both. For example, in Figure 21-1, the computer that manages the HQ database is acting as a database server when a statement is issued against its own data (for example, the second statement in each transaction issues a query against the local DEPT table), and is acting as a client when it issues a statement against remote data (for example, the first statement in each transaction is issued against the remote table EMP in the SALES database).

Oracle supports heterogeneous client/server environments where clients and servers use different character sets.
The character set used by a client is defined by the value of the NLS_LANG parameter for the client session. The character set used by a server is its database character set. Data conversion is done automatically between these character sets if they are different. For more information about National Language Support features, refer to Oracle7 Server Reference.

Figure 21-1. An Example of a Distributed DBMS Architecture

Direct and Indirect Connections

A client can connect directly or indirectly to a database server. In Figure 21-1, when the client application issues the first and third statements for each transaction, the client is connected directly to the intermediate HQ database and indirectly to the SALES database that contains the remote data.

Site Autonomy

Site autonomy means that each server participating in a distributed database is administered independently (for security and backup operations) from the other databases, as though each database was a non-distributed database. Although all the databases can work together, they are distinct, separate repositories of data and are administered individually. Some of the benefits of site autonomy are as follows:

• Nodes of the system can mirror the logical organization of companies or cooperating organizations that need to maintain an "arm's length" relationship.
• Local data is controlled by the local database administrator. Therefore, each database administrator's domain of responsibility is smaller and more manageable.
• Independent failures are less likely to disrupt other nodes of the distributed database. The global database is partially available as long as one database and the network are available; no single database failure need halt all global operations or be a performance bottleneck.
• Failure recovery is usually performed on an individual node basis.
• A data dictionary exists for each local database.
• Nodes can upgrade software independently.

Schema Objects and Naming in a Distributed Database

A schema object (for example, a table) is accessible from all nodes that form a distributed database. Therefore, just as a non-distributed local DBMS architecture must provide an unambiguous naming scheme to distinctly reference objects within the local database, a distributed DBMS must use a naming scheme that ensures that objects throughout the distributed database can be uniquely identified and referenced.

To resolve references to objects (a process called name resolution) within a single database, the DBMS usually forms object names using a hierarchical approach. For example, within a single database, a DBMS guarantees that each schema has a unique name, and that within a schema, each object has a unique name. Because uniqueness is enforced at each level of the hierarchical structure, an object's local name is guaranteed to be unique within the database and references to the object's local name can be easily resolved.

Distributed database management systems simply extend the hierarchical naming model by enforcing unique database names within a network. As a result, an object's global object name is guaranteed to be unique within the distributed database, and references to the object's global object name can be resolved among the nodes of the system. For example, Figure 21-2 illustrates a representative hierarchical arrangement of databases throughout a network and how a global database name is formed.
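
The naming levels described in this section (database, schema, object) can be sketched in SQL. This is an illustration under assumptions, not an example from the original text: it assumes a second database whose global name is hq.division3.acme.com (a hypothetical name), that both databases contain a SCOTT.EMP table, and that the local database can already reach both. The @ notation for global object names is the one this chapter uses in its own remote query on SCOTT.EMP.

```sql
-- Local name: schema.object.
-- Unique only within the database the session is connected to.
SELECT COUNT(*) FROM scott.emp;

-- Global object name: schema.object@global_database_name.
-- Both tables share the local name SCOTT.EMP, but they remain distinct
-- objects because database names are unique within the network.
SELECT COUNT(*) FROM scott.emp@sales.division3.acme.com;
SELECT COUNT(*) FROM scott.emp@hq.division3.acme.com;   -- hypothetical database name
```

Each level only has to guarantee uniqueness among its own children, which is why the same schema and table names can safely recur in different databases.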

Figure 21-2. Network Directories and Global Database Names

Database Links

To facilitate connections between the individual databases of a distributed database, Oracle uses database links. A database link defines a "path" to a remote database.

Database links are essentially transparent to users of a distributed database, because the name of a database link is the same as the global name of the database to which the link points. For example, the following statement creates a database link in the local database. The database link named SALES.DIVISION3.ACME.COM describes a path to a remote database of the same name.

    CREATE PUBLIC DATABASE LINK sales.division3.acme.com ... ;

At this point, any application or user connected to the local database can access data in the SALES database by using global object names when referencing objects in the SALES database; the SALES database link is implicitly used to facilitate the connection to the SALES database. For example, consider the following remote query that references the remote table SCOTT.EMP in the SALES database:

    SELECT * FROM scott.emp@sales.division3.acme.com;

Statements and Transactions in a Distributed Database

The following sections introduce the terminology used when discussing statements and transactions in a distributed database environment.

Remote and Distributed Statements

A remote query is a query that selects information from one or more remote tables, all of which reside at the same remote node.

A remote update is an update that modifies data in one or more tables, all of which are located at the same remote node.

Note: A remote update may include a subquery that retrieves data from one or more remote nodes, but because the update is performed at only a single remote node, the statement is classified as a remote update.

A distributed query retrieves information from two or more nodes.

A distributed update modifies data on two or more nodes. A distributed update is possible using a program unit, such as a procedure or trigger, that includes two or more remote updates that access data on different nodes. Statements in the program unit are sent to the remote nodes, and the execution of the program succeeds or fails as a unit.

Remote and Distributed Transactions

A remote transaction is a transaction that contains one or more remote statements, all of which reference the same remote node. A distributed transaction is any transaction that includes one or more statements that, individually or as a group, update data on two or more distinct nodes of a distributed database. If all statements of a transaction reference only a single remote node, the transaction is remote, not distributed.

Two-Phase Commit Mechanism

A DBMS must guarantee that all statements in a transaction, distributed or non-distributed, are either committed or rolled back as a unit, so that if the transaction is designed properly, the data in the logical database can be kept consistent. The effects of a transaction should be either visible or invisible to all other transactions at all nodes; this should be true for transactions that include any type of operation, including queries, updates, or remote procedure calls.

The general mechanisms of transaction control in a non-distributed database are discussed in Chapter 12, "Transaction Management". In a distributed database, Oracle must coordinate transaction control over a network and maintain data consistency, even if a network or system failure occurs.

A two-phase commit mechanism guarantees that all database servers participating in a distributed transaction either all commit or all roll back the statements in the transaction. A two-phase commit mechanism also protects implicit DML operations performed by integrity constraints, remote procedure calls, and triggers. Two-phase commit is described in Chapter 1, "Introduction to the Oracle Server".

Transparency in a Distributed Database System

The functionality of a distributed database system must be provided in such a manner that the complexities of the distributed database are transparent to both the database users and the database administrators.
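
The statement and transaction categories defined in the preceding sections can be sketched as one transaction issued from the local (HQ) database. This is an illustration under assumptions, not an example from the original text: the column names (ENAME, SAL, DEPTNO, DNAME, LOC) and the data values are assumed, and the database link sales.division3.acme.com is taken to exist already.

```sql
-- Remote query: every referenced table resides at one remote node (SALES).
SELECT ename, sal
  FROM scott.emp@sales.division3.acme.com
 WHERE deptno = 10;

-- Distributed query: retrieves data from two nodes
-- (the local DEPT table and the remote EMP table).
SELECT d.dname, e.ename
  FROM dept d, scott.emp@sales.division3.acme.com e
 WHERE d.deptno = e.deptno;

-- Remote update: modifies data at a single remote node.
UPDATE scott.emp@sales.division3.acme.com
   SET sal = sal * 1.1
 WHERE deptno = 10;

-- Local update. Together with the remote update above, the transaction
-- now updates data on two distinct nodes, so it is a distributed transaction.
UPDATE dept
   SET loc = 'BOSTON'
 WHERE deptno = 10;

-- An ordinary COMMIT ends the transaction. Because the transaction is
-- distributed, the two-phase commit mechanism ensures that both nodes
-- commit or both roll back.
COMMIT;
```

Apart from the global object names, nothing in these statements indicates that remote data is involved; the statements and the COMMIT are standard SQL.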

For example, a distributed database system should provide methods to hide the physical location of objects throughout the system from applications and users. Location transparency exists if a user can refer to the same table the same way, regardless of the node to which the user connects. Location transparency is beneficial for the following reasons:

• Access to remote data is simplified, because the database users do not need to know the location of objects.
• Objects can be moved with no impact on end-users or database applications.

A distributed database system should also provide query, update, and transaction transparency. For example, standard SQL commands, such as SELECT, INSERT, UPDATE, and DELETE, should allow users to access remote data without the requirement for any programming. Transaction transparency occurs when the DBMS provides the functionality described below using standard SQL COMMIT, SAVEPOINT, and ROLLBACK commands, without requiring complex programming or other special operations to provide distributed transaction control.

• The statements in a single transaction can reference any number of local or remote tables.
• The DBMS guarantees that all nodes involved in a distributed transaction take the same action: they either all commit or all roll back the transaction.
• If a network or system failure occurs during the commit of a distributed transaction, the transaction is automatically and transparently resolved globally; that is, when the network or system is restored, the nodes either all commit or all roll back the transaction.

A distributed DBMS architecture should also provide facilities to transparently replicate data among the nodes of the system. Maintaining copies of a table across the databases in a distributed database is often desired so that:

• Tables that have high query and low update activity can be accessed faster by local user sessions because no network communication is necessary.
• If a database that contains a critical table experiences a prolonged failure, replicates of the table in other databases can still be accessed.

A DBMS that manages a distributed database should make table replication transparent to users working with the replicated tables.

Finally, the functional transparencies explained above are not sufficient alone. The distributed database must also perform with acceptable speed.

SQL*Net and Network Independence

When data is required from remote databases, a local database server communicates with the remote database using the network, network communications software, and Oracle's SQL*Net. Just as SQL*Net connects clients and servers that operate on different computers of a network, it also connects database servers across networks to facilitate distributed transactions. For more information about SQL*Net and its features, see "SQL*Net".

Heterogeneous Distributed Database Systems

The Oracle distributed database architecture allows the mix of different versions of Oracle along with database products from other companies to create a heterogeneous distributed database system. Figure 21-3 illustrates a heterogeneous distributed database system encompassing different versions of Oracle and non-Oracle databases.

The Mechanics of a Heterogeneous Distributed Database

In a distributed database, any application directly connected to an Oracle database can issue a SQL statement that accesses remote data in the following ways:

• Data in another Oracle database is available, no matter what version. All Oracle databases are connected by a network and use SQL*Net to maintain communication.
• Data in a non-Oracle database (such as an IBM DB2 database) is available, assuming that the non-Oracle database is supported by Oracle's gateway architecture, SQL*Connect. You can connect the Oracle and non-Oracle databases with a network and use SQL*Net to maintain communication. See the appropriate SQL*Connect documentation for more information about this product.

Figure 21-3. Heterogeneous Distributed Database Systems

When connections from an Oracle node to a remote node (Oracle or non-Oracle) initially are established, the connecting Oracle node records the capabilities of each remote system and the associated gateways. SQL statement execution proceeds, as described in the section "Statements and Transactions in a Distributed Database".

However, in heterogeneous distributed systems, SQL statements issued from an Oracle database to a non-Oracle remote database server are limited by the capabilities of the remote database server and associated gateway. For example, if a remote or distributed query includes an Oracle extended SQL function (for example, an outer join), the function may have to be performed by the local Oracle database. Extended SQL functions in remote updates (for example, an outer join in a subquery) are not supported by all gateways; see your specific SQL*Connect documentation for more information on the capabilities of your system.
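
The outer-join case mentioned in this section might look like the following sketch. The database link name db2.division3.acme.com, the remote EMP table, and the column names are hypothetical and chosen only for illustration; the query uses Oracle's (+) outer-join operator against a table assumed to be reached through a gateway.

```sql
-- Distributed query that uses an Oracle extended SQL function (outer join).
-- DEPT is local; EMP is assumed to reside in a non-Oracle database reached
-- through a hypothetical database link to a gateway.
-- The (+) marks EMP as the optional side of the join: departments with
-- no matching employees are still returned, with a null ENAME.
SELECT d.dname, e.ename
  FROM dept d, emp@db2.division3.acme.com e
 WHERE d.deptno = e.deptno (+);
```

Whether the join is evaluated at the remote system or by the local Oracle database depends on the capabilities recorded for that remote system and its gateway.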