
Advanced RDBMS UNIT V

Topics: Enhanced Data Models for Advanced Applications; Temporal Database Concepts; Spatial and Multimedia Databases; Distributed Databases and Client-Server Architecture; Data Fragmentation, Replication and Allocation Techniques; Types of Distributed Database Systems; Query Processing in Distributed Databases; Overview of Concurrency Control and Recovery in Distributed Databases; Client-Server Architecture and its Relationship to Distributed Databases; Distributed Databases in Oracle; Deductive Databases; Prolog/Datalog Notation; Interpretation of Rules; Basic Interface Mechanisms for Logic Programs

5.0 Introduction
Enhanced data models for advanced applications extend the data models we have already come across in database architecture. These extensions incorporate spatial and multimedia databases, which are widely used in modern information technology. Temporal databases, on the other hand, deal with time-related data such as calendar events, while spatial databases deal with geographical information systems, weather data, maps, and so on.

5.1 Objective
The objective of this lesson is to learn and understand the enhanced data models, including active databases and triggers; the concepts of the Distributed Database Management System and its security concerns, with an analysis of the main problem areas; and the Prolog/Datalog notation used in deductive databases.

5.2 Contents
5.2.1 Enhanced Data Models for Advanced Applications: active databases & triggers, temporal databases, spatial and multimedia databases, deductive databases

Active databases & triggers
Triggers are executed when a specified condition occurs during an insert, delete, or update. A trigger is an action that fires automatically based on such conditions. Triggers follow the Event-Condition-Action (ECA) model:
Event: a database modification (e.g., insert, delete, update).
Condition: any true/false expression. This part is optional; if no condition is specified, the condition is always true.
Action: a sequence of SQL statements that will be executed automatically.
Example requirement: when a new employee is added to a department, modify the Total_sal of the department to include the new employee's salary. Logically this means that we will CREATE a TRIGGER, called Total_sal1, which executes AFTER INSERT ON the Employee table, FOR EACH ROW, WHEN NEW.Dno IS NOT NULL. The trigger will UPDATE DEPARTMENT by setting the new Total_sal to the sum of the old Total_sal and NEW.Salary WHERE the Dno matches NEW.Dno.
Example: Trigger Definition

CREATE TRIGGER Total_sal1
AFTER INSERT ON Employee
FOR EACH ROW
WHEN (NEW.Dno IS NOT NULL)
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary
WHERE Dno = NEW.Dno;

The trigger timing can be FOR, AFTER, or INSTEAD OF; the triggering event can be INSERT, UPDATE, or DELETE; and the statement can be CREATE or ALTER:
CREATE TRIGGER <name> - creates a trigger
ALTER TRIGGER <name> - alters a trigger (assuming one exists)
CREATE OR ALTER TRIGGER <name> - creates a trigger if one does not exist, or alters it if one does exist; it works in both cases, whether a trigger exists or not
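As a minimal sketch in SQL Server style (that dialect supports CREATE OR ALTER TRIGGER; the trigger name and the idea of recomputing Total_sal from the Employee table are illustrative assumptions, not part of the original example):

CREATE OR ALTER TRIGGER Total_sal_upd
ON Employee
AFTER UPDATE
AS
BEGIN
    -- statement-level trigger: runs once per UPDATE statement;
    -- recompute Total_sal for every department touched by the statement
    UPDATE D
    SET    Total_sal = (SELECT SUM(E.Salary) FROM Employee E WHERE E.Dno = D.Dno)
    FROM   DEPARTMENT D
    WHERE  D.Dno IN (SELECT Dno FROM inserted UNION SELECT Dno FROM deleted);
END;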

Conditions
AFTER: executes after the event.
BEFORE: executes before the event.
INSTEAD OF: executes instead of the event; note that the event does not execute in this case (e.g., used for modifying views).
Row-Level versus Statement-Level
Triggers can be row-level or statement-level. FOR EACH ROW specifies a row-level trigger, which is executed separately for each affected row. Statement-level triggers (the default when FOR EACH ROW is not specified) execute once for the SQL statement.
Condition
Any true/false condition controls whether a trigger is activated or not. Absence of a condition means that the trigger will always execute for the event. Otherwise, the condition is evaluated before the event for a BEFORE trigger and after the event for an AFTER trigger.
Action
The action can be one SQL statement or a sequence of SQL statements enclosed between a BEGIN and an END. The action specifies the relevant modifications.
a. Triggers on Views
INSTEAD OF triggers are used to process view modifications.
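A minimal sketch of an INSTEAD OF trigger used to process a view modification (Oracle PL/SQL style; the Dept_Emp view and the way the insert is routed to the base table are assumptions for illustration only):

CREATE VIEW Dept_Emp AS
    SELECT E.Name, E.Salary, D.DName
    FROM   Employee E JOIN DEPARTMENT D ON E.Dno = D.Dno;

CREATE TRIGGER Dept_Emp_Insert
INSTEAD OF INSERT ON Dept_Emp
FOR EACH ROW
BEGIN
    -- the join view is not directly updatable, so route the insert to the base table
    INSERT INTO Employee (Name, Salary, Dno)
    VALUES (:NEW.Name, :NEW.Salary,
            (SELECT Dno FROM DEPARTMENT WHERE DName = :NEW.DName));
END;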

b. Active Database Concepts and Triggers
An active database allows users to make the following changes to triggers (rules): activate, deactivate, and drop.
An event can be considered in three ways: immediate consideration, deferred consideration, and detached consideration.
Immediate consideration: the condition is evaluated as part of the same transaction and can be one of the following, depending on the situation: before, after, or instead of the triggering event.
Deferred consideration: the condition is evaluated at the end of the transaction.
Detached consideration: the condition is evaluated in a separate transaction.
The potential applications for active databases are as follows:
Notification: automatic notification when a certain condition occurs.
Enforcing integrity constraints: triggers are smarter and more powerful than constraints.
Maintenance of derived data: automatically update derived data and avoid anomalies due to redundancy, e.g., the trigger to update Total_sal.
c. Triggers in SQL-99
Variables can be aliased inside the REFERENCING clause. A trigger example is:
CREATE TRIGGER Total_sal
AFTER UPDATE OF Salary ON Employee
REFERENCING OLD ROW AS O, NEW ROW AS N
FOR EACH ROW
WHEN (N.Dno IS NOT NULL)
UPDATE Department
SET Total_sal = Total_sal + N.Salary - O.Salary
WHERE Dno = N.Dno;
Temporal Database Concepts

Temporal databases are concerned with time representation, calendars, and time dimensions. Time is considered an ordered sequence of points in some granularity, and a calendar organizes time into different time units for convenience. Time representation is in terms of:
Point events: a single time point event, e.g., a bank deposit. A series of point events can form time series data.
Duration events: associated with a specific time period; the time period is represented by a start time and an end time.
Valid time is the time during which the information is true in the modeled world, while transaction time is the time when the information from a certain transaction becomes current in the database. Bitemporal databases are databases dealing with both time dimensions.
Incorporating Time in Relational Databases Using Tuple Versioning
This is done by adding a valid start time and a valid end time to every tuple, for example:
a) Valid time relations: EMP_VT(Name, ENo, Salary, Dno, Supervisor_Name) and DEPT_VT(DName, DNo, Total_sal, Manager_name), each extended with valid start time and valid end time attributes.
b) Transaction time relations: EMP_TT(Name, ENo, Salary, Dno, Supervisor_Name) and DEPT_TT(DName, DNo, Total_sal, Manager_name), each extended with transaction start time and transaction end time attributes.
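A minimal SQL sketch of tuple versioning, using the attribute names above; the VST/VET column names and the "end of time" marker are assumptions:

CREATE TABLE Emp_VT (
    Name            VARCHAR(30),
    ENo             INT,
    Salary          DECIMAL(10,2),
    Dno             INT,
    Supervisor_Name VARCHAR(30),
    VST             DATE,   -- valid start time
    VET             DATE    -- valid end time
);

-- current versions are usually marked with a special 'end of time' value
SELECT Name, Salary
FROM   Emp_VT
WHERE  VET = DATE '9999-12-31';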

Incorporating Time in Object-Oriented Databases Using Attribute Versioning
A single complex object stores all temporal changes of the object.
Time-varying attribute: an attribute that changes over time, e.g., age.
Non-time-varying attribute: an attribute that does not change over time (fixed), e.g., date of birth.
Spatial and Multimedia Databases
a. Spatial Database Concepts
Spatial databases keep track of objects in a multi-dimensional space. Examples include maps and Geographical Information Systems (GIS).

Weather data is another example. In general spatial databases are n-dimensional; this discussion is limited to 2-dimensional spatial databases.
b. Typical Spatial Queries
Range query: finds objects of a particular type within a particular distance from a given location. E.g., Taco Bells in Pleasanton, CA.
Nearest neighbor query: finds objects of a particular type that are nearest to a given location. E.g., the nearest Taco Bell from an address in Pleasanton, CA.
Spatial joins or overlays: joins objects of two types based on some spatial condition (intersecting, overlapping, within a certain distance, etc.). E.g., all Taco Bells within 2 miles of I-680.
c. R-trees
A technique for typical spatial queries. Objects close in spatial proximity are grouped on the same leaf nodes of a tree-structured index. Internal nodes define areas (rectangles) that cover all areas of the rectangles in their subtrees.
d. Quad trees
Quad trees divide subspaces into equally sized areas.
In the years ahead multimedia information systems are expected to dominate our daily lives. Our houses will be wired for bandwidth to handle interactive multimedia applications. Our high-definition TV/computer workstations will have access to a large number of databases, including digital libraries and image and video databases that will distribute vast amounts of multisource multimedia content.
e. Multimedia Databases
Types of multimedia data available in current systems:
Text: may be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are used.
Graphics: examples include drawings and illustrations that are encoded using some descriptive standard (e.g. CGM, PICT, PostScript).
Images: includes drawings, photographs, and so forth, encoded in standard formats such as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG.

These images are not subdivided into components, hence querying them by content (e.g., find all images containing circles) is nontrivial.
Animations: temporal sequences of image or graphic data.
Video: a set of temporally sequenced photographic data for presentation at specified rates, for example 30 frames per second.
Structured audio: a sequence of audio components comprising note, tone, duration, and so forth.
Audio: sample data generated from aural recordings as a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.
Composite or mixed multimedia data: a combination of multimedia data types such as audio and video, which may be physically mixed to yield a new storage format or logically mixed while retaining the original types and formats. Composite data also contains additional control information describing how the information should be rendered.
Nature of Multimedia Applications: multimedia data may be stored, delivered, and utilized in many different ways. Applications may be categorized based on their data management characteristics.
5.2.2 Distributed Databases and Client-Server Architecture
Distributed Database Concepts
The distributed database has all of the security concerns of a single-site database plus several additional problem areas. We begin our investigation with a review of the security elements common to all database systems and those issues specific to distributed systems. A secure database must satisfy the following requirements (subject to the specific priorities of the intended application):
1. It must have physical integrity (protection from data loss caused by power failures or natural disaster),
2. It must have logical integrity (protection of the logical structure of the database),
3. It must be available when needed,
4. The system must have an audit system,
5. It must have elemental integrity (accurate data),
6. Access must be controlled to some degree depending on the sensitivity of the data,
7. A system must be in place to authenticate the users of the system, and
8. Sensitive data must be protected from inference [Pflee89].
The following discussion focuses on requirements 5-8 above, since these security areas are directly affected by the choice of DBMS model. The key goal of these requirements is

to ensure that data stored in the DBMS is protected from unauthorized observation or inference, from unauthorized modification, and from inaccurate updates. This can be accomplished by using access controls, concurrency controls, updates using the two-phase commit procedure (this avoids integrity problems resulting from physical failure of the database during a transaction), and inference reduction strategies.
The level of access restriction depends on the sensitivity of the data and the degree to which the developer adheres to the principle of least privilege (access limited to only those items required to carry out assigned tasks). Typically, a lattice is maintained in the DBMS that stores the access privileges of individual users. When a user logs on, the interface obtains the specific privileges for the user. According to Pfleeger [Pflee89], access permission may be predicated on the satisfaction of one or more of the following criteria:
(1) Availability of data: unavailability of data is commonly caused by the locking of a particular data element by another subject, which forces the requesting subject to wait in a queue.
(2) Acceptability of access: only authorized users may view and/or modify the data. In a single-level system, this is relatively easy to implement. If the user is unauthorized, the operating system does not allow system access. On a multilevel system, access control is considerably more difficult to implement, because the DBMS must enforce the discretionary access privileges of the user.
(3) Assurance of authenticity: this includes the restriction of access to normal working hours to help ensure that the registered user is genuine. It also includes a usage analysis which is used to determine whether the current use is consistent with the needs of the registered user, thereby reducing the probability of a fishing expedition or an inference attack.
Concurrency controls help to ensure the integrity of the data. These controls regulate the manner in which the data is used when more than one user is using the same data element. They are particularly important in the effective management of a distributed system because, in many cases, no single DBMS controls data access. If effective concurrency controls are not integrated into the distributed system, several problems can arise. Bell and Grisom [BellGris92] identify three possible sources of concurrency problems:
(1) Lost update: a successful update was inadvertently erased by another user.
(2) Unsynchronized transactions that violate integrity constraints.
(3) Unrepeatable read: data retrieved is inaccurate because it was obtained during an update.
Each of these problems can be reduced or eliminated by implementing a suitable

locking scheme (only one subject has access to a given entity for the duration of the lock) or a timestamp method (the subject with the earlier timestamp receives priority).
Special problems exist for a DBMS that has multilevel access. In a multilevel access system, users are restricted from having complete data access. Policies restricting user access to certain data elements may result from secrecy requirements, or they may result from adherence to the principle of least privilege (a user only has access to relevant information). Access policies for multilevel systems are typically referred to as either open or closed. In an open system, all the data is considered unclassified unless access to a particular data element is expressly forbidden. A closed system is just the opposite: access to all data is prohibited unless the user has specific access privileges.
Classification of data elements is not a simple task. This is due, in part, to conflicting goals. The first goal is to provide the database user with access to all non-sensitive data. The second goal is to protect sensitive data from unauthorized observation or inference. For example, the salaries for all of a given firm's employees may be considered nonsensitive as long as the employees' names are not associated with the salaries. Legitimate use can be made of this data: summary statistics could be developed, such as mean executive salary and mean salary by gender. Yet an inference could be made from this data; for example, it would be fairly easy to identify the salaries of the top executives.
Another problem is data security classification. There is no clear-cut way to classify data. Millen and Lunt [MilLun92] demonstrate the complexity of the problem. They state that when classifying a data element, there are three dimensions:
1. The data may be classified.
2. The existence of the data may be classified.
3. The reason for classifying the data may be classified [MilLun92].
The first dimension is the easiest to handle: access to a classified data item is simply denied. The other two dimensions require more thought and more creative strategies. For example, if an unauthorized user requests a data item whose existence is classified, how does the system respond? A poorly planned response would allow the user to make inferences about the data that would potentially compromise it.
Key Issues in Distributed Databases
Three key issues we have to consider in a DDS are:
Data Allocation: where are data placed? Data should be stored at the sites with "optimal" distribution.
Fragmentation: a relation may be divided into a number of sub-relations (called fragments), which are stored at different sites.
Replication: a copy of a fragment may be maintained at several sites.
The definition and allocation of fragments is carried out strategically to achieve:
Locality of reference
Improved reliability and availability
Improved performance

Balanced storage capacities and costs
Minimal communication costs.
This involves analysing the most important transactions, based on quantitative/qualitative information.
a. Data Allocation
Four strategies regarding the placement of data are: centralized, partitioned (or fragmented), complete replication, and selective replication.
Centralized: consists of a single database stored at one site, with users distributed across the network.
Partitioned: the database is partitioned into disjoint fragments, each fragment assigned to one site.
Complete replication: consists of maintaining a complete copy of the database at each site.
Selective replication: a combination of partitioning, replication, and centralization.
b. Data Fragmentation
In a DDS, it is important to determine the site used to store the data. In order to assess the need for a distributed database system, the required partitioning of the data, or fragmentation, must first be studied. The distributed database can involve both horizontal and vertical partitioning. The types of fragmentation are horizontal, vertical, and mixed fragmentation. Horizontal partitioning means that each record is stored in its entirety at a single location, with different subsets of the records stored at different sites. Vertical partitioning means that parts of a record are stored in different locations. A small SQL sketch of both follows.
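The sketch below is illustrative only; it assumes an Employee(ENo, Name, Salary, Dno) relation, and the fragment and site names are hypothetical:

-- Horizontal fragments: each site holds complete rows for its own departments
CREATE TABLE Employee_Site_A AS
    SELECT * FROM Employee WHERE Dno IN (1, 2);        -- stored at site A
CREATE TABLE Employee_Site_B AS
    SELECT * FROM Employee WHERE Dno NOT IN (1, 2);    -- stored at site B

-- Vertical fragments: each site holds some columns of every row; the key ENo
-- is repeated so the original records can be reconstructed by a join
CREATE TABLE Employee_Pay   AS SELECT ENo, Salary    FROM Employee;  -- payroll site
CREATE TABLE Employee_Other AS SELECT ENo, Name, Dno FROM Employee;  -- HR site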

c. Available Network
The design of distributed database systems is strongly influenced by the type of underlying WAN or LAN. Distributed database systems involving vertical partitioning can run only on those networks that are connected continuously - at least during the hours when the distributed database is operational. Networks that are not continuously connected typically do not allow transactions across sites, but may keep local copies of remote data and refresh the copies periodically.

For example, a nightly backup might be taken. For applications where consistency is not critical, this is acceptable. This is also acceptable for systems involving horizontal partitioning of the data.
d. Transaction Management
When vertical partitioning is used, special techniques must be applied in order to ensure that a transaction applied to two different databases does not cause inconsistency. This technique is called the two-phase commit. It is recommended that the DBMS vendor provide the distributed transaction management software. The supplier should not attempt to write transaction management code nor buy a third party product for such a purpose.
e. Replication
Replication is the process of synchronizing several copies of the same records or record fragments located at different sites; it is used to increase the availability of data and to speed query evaluation. The supplier must lay out a detailed Replication Plan including:

- The partitioning of the data and how to select data field names and key values so as not to cause conflicts between sites
- The timing of the replication (i.e., synchronous vs. asynchronous; see the sketch after this list)
- Resolution of potentially conflicting updates at different sites and ways for detecting them
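As a minimal sketch of asynchronous (periodic) replication in Oracle style, a site can keep a local copy of a remote table and refresh it on a schedule; the database link name and refresh interval are assumptions:

CREATE MATERIALIZED VIEW Employee_copy
    REFRESH COMPLETE
    START WITH SYSDATE NEXT SYSDATE + 1          -- refresh the copy once a day
    AS SELECT * FROM Employee@hq.example.com;    -- remote master table via a database link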

Note that suppliers may feel that they can handle replication, especially asynchronous replication (i.e., copying numerous records from one database to the other). Unless such activities are labeled remote backups, it is recommended that the DBMS vendor provide the replication software. The supplier should not attempt to write replication code nor buy a third party product for such a purpose.
Types of Distributed Databases
1. Homogeneous DDBMS: all sites use the same DBMS product (e.g., Oracle). Fairly easy to design and manage.
2. Heterogeneous DDBMS: sites may run different DBMS products (e.g., Oracle and Ingres), possibly with different underlying data models (e.g., a relational DB and an OO database). This occurs when sites have implemented their own databases and integration is considered later.
Query Processing in Distributed Databases

Let us understand query processing with respect to the Employee and Department relations with no fragmentation. The processing of a distributed query can be done based on the following strategies:
1. Transfer both the Employee and Department relations to the result site.
2. Transfer the Employee relation to site A, where the Department relation is located, execute the query there, and send the result to the output site.
3. Transfer the Department relation to site B, where the Employee relation is located, execute the query there, and send the result to the output site.
Concurrency Control and Recovery in DDS
Concurrency control deals with multiple copies of data items: a copy must be made consistent with the other copies if the site on which it is stored fails and recovers later. Recovery in a DDS is taken care of in terms of the following:
Failure of individual sites: when a site recovers, its local data must be brought up to date.
Failure of communication links: the system must be able to deal with the failure of one or more communication links.
Distributed commit problem: usually solved with the two-phase commit protocol.
Distributed deadlock: techniques for dealing with deadlocks must be followed.
Assume that you and I both read the same row from the Customer table, we both change the data, and then we both try to write our new versions back to the database. Whose changes should be saved? Yours? Mine? Neither? A combination? Similarly, if we both work with the same Customer object stored in a shared object cache and try to make changes to it, what should happen? To understand how to implement concurrency control within your system you must start by understanding the basics of collisions: you can either avoid them, or detect and then resolve them. The next step is to understand transactions, which are collections of actions that potentially modify two or more entities. On modern software development projects, concurrency control and transactions are not simply the domain of databases; instead they are issues that are potentially pertinent to all of your architectural tiers.
a. Collisions
In Implementing Referential Integrity and Shared Business Logic, the referential integrity challenges that result from having an object schema mapped to a data schema were discussed - cross-schema referential integrity problems. With respect to collisions things are a little simpler; we only need to worry about ensuring the consistency of entities within the system of record. The system of record is the location where the official version of an entity is located. This is often data stored within a relational database, although other representations, such as an XML structure or an object, are also viable.

A collision is said to occur when two activities, which may or may not be full-fledged transactions, attempt to change entities within a system of record. There are three fundamental ways a collision can occur:
1. Dirty read. Activity 1 (A1) reads an entity from the system of record and then updates the system of record but does not commit the change (for example, the change hasn't been finalized). Activity 2 (A2) reads the entity, unknowingly making a copy of the uncommitted version. A1 rolls back (aborts) the changes, restoring the entity to the original state that A1 found it in. A2 now has a version of the entity that was never committed and therefore is not considered to have actually existed.
2. Non-repeatable read. A1 reads an entity from the system of record, making a copy of it. A2 deletes the entity from the system of record. A1 now has a copy of an entity that does not officially exist.
3. Phantom read. A1 retrieves a collection of entities from the system of record, making copies of them, based on some sort of search criteria such as "all customers with first name Bill". A2 then creates new entities which would have met the search criteria (for example, inserts "Bill Klassen" into the database), saving them to the system of record. If A1 reapplies the search criteria it gets a different result set.
b. Locking Strategies
So what can you do? First, you can take a pessimistic locking approach that avoids collisions but reduces system performance. Second, you can use an optimistic locking strategy that enables you to detect collisions so you can resolve them. Third, you can take an overly optimistic locking strategy that ignores the issue completely.
Pessimistic locking is an approach where an entity is locked in the database for the entire time that it is in application memory (often in the form of an object). A lock either limits or prevents other users from working with the entity in the database.
Optimistic locking: with multi-user systems it is quite common to be in a situation where collisions are infrequent; for example, two people may be working with Customer objects, but with different customers, and therefore they won't collide. When this is the case, optimistic locking becomes a viable concurrency control strategy. The idea is that you accept the fact that collisions occur infrequently, and instead of trying to prevent them you simply choose to detect them and then resolve the collision when it does occur.
Overly optimistic locking: with this strategy you neither try to avoid nor detect collisions, assuming that they will never occur. This strategy is appropriate for single-user systems, systems where the system of record is guaranteed to be accessed by only one user or system process at a time, or read-only tables. These situations do occur. It is important to recognize that this strategy is completely inappropriate for multi-user systems.
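A minimal SQL sketch of the pessimistic and optimistic strategies, assuming a Customer table to which a Version column has been added (the table, column, and literal values are illustrative only):

-- Pessimistic locking: hold a lock on the row while it is being worked with
SELECT CustomerId, Name
FROM   Customer
WHERE  CustomerId = 42
FOR UPDATE;

-- Optimistic locking: no lock is held; the collision is detected at write time
UPDATE Customer
SET    Name = 'Bill Klassen', Version = Version + 1
WHERE  CustomerId = 42
  AND  Version = 7;      -- the version that was originally read
-- if zero rows were updated, another activity changed the row: resolve the collision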


5.2.3 Overview of Client-Server Architecture and its Relationship to Distributed Databases
Evolution of Client-Server Architecture
The term client/server was first used in the 1980s in reference to personal computers (PCs) on a network. The actual client/server model started gaining acceptance in the late 1980s. The client/server software architecture is a versatile, message-based and modular infrastructure that is intended to improve usability, flexibility, interoperability, and scalability as compared to centralized, mainframe, time-sharing computing. A client is defined as a requester of services and a server is defined as the provider of services. A single machine can be both a client and a server depending on the software configuration.
Mainframe architecture (not a client/server architecture): with mainframe software architectures all intelligence is within the central host computer. Users interact with the host through a terminal that captures keystrokes and sends that information to the host.
File sharing architecture (not a client/server architecture): the original PC networks were based on file sharing architectures, where the server downloads files from the shared location to the desktop environment. The requested user job is then run (including logic and data) in the desktop environment. File sharing architectures work if shared usage is low, update contention is low, and the volume of data to be transferred is low.
Client/server architecture: as a result of the limitations of file sharing architectures, the client/server architecture emerged. This approach introduced a database server to replace the file server. Using a relational database management system (DBMS), user queries could be answered directly. The client/server architecture reduced network traffic by providing a query response rather than total file transfer, and it improves multi-user updating through a GUI front end to a shared database. In client/server architectures, Remote Procedure Calls (RPCs) or standard query language (SQL) statements are typically used to communicate between the client and server.
Relationship between Client-Server Architecture and Distributed Databases
Three tier architectures: the three tier architecture (see Three Tier Software Architectures), also referred to as the multi-tier architecture, emerged to overcome the limitations of the two tier architecture. In the three tier architecture, a middle tier was added between the user system interface client environment and the database management server environment. There are a variety of ways of implementing this middle tier, such as transaction processing monitors, message servers, or application servers. The middle tier can perform queuing, application execution, and database

staging. For example, if the middle tier provides queuing, the client can deliver its request to the middle layer and disengage, because the middle tier will access the data and return the answer to the client. In addition, the middle layer adds scheduling and prioritization for work in progress. The three tier client/server architecture has been shown to improve performance for groups with a large number of users (in the thousands) and improves flexibility when compared to the two tier approach. Flexibility in partitioning can be as simple as "dragging and dropping" application code modules onto different computers in some three tier architectures. A limitation with three tier architectures is that the development environment is reportedly more difficult to use than the visually-oriented development of two tier applications.
Three tier architecture with transaction processing monitor technology: the most basic type of three tier architecture has a middle layer consisting of Transaction Processing (TP) monitor technology (see Transaction Processing Monitor Technology). The TP monitor technology is a type of message queuing, transaction scheduling, and prioritization service where the client connects to the TP monitor (middle tier) instead of the database server. The transaction is accepted by the monitor, which queues it and then takes responsibility for managing it to completion, thus freeing up the client. When the capability is provided by third party middleware vendors it is referred to as "TP Heavy" because it can service thousands of users.
Three tier with message server: messaging is another way to implement three tier architectures. Messages are prioritized and processed asynchronously. Messages consist of headers that contain priority information, and the address and identification number. The message server connects to the relational DBMS and other data sources.
Three tier with an application server: the three tier application server architecture allocates the main body of an application to run on a shared host rather than in the user system interface client environment. The application server does not drive the GUIs; rather it shares business logic, computations, and a data retrieval engine.
Three tier with an ORB architecture: currently industry is working on developing standards to improve interoperability and determine what the common Object Request Broker (ORB) will be. Developing client/server systems using technologies that support distributed objects holds great promise, as these technologies support interoperability across languages and platforms, as well as enhancing maintainability and adaptability of the system. There are currently two prominent distributed object technologies:

- Common Object Request Broker Architecture (CORBA)
- COM/DCOM (see Component Object Model (COM), DCOM, and Related Capabilities)

Industry is working on standards to improve interoperability between CORBA and COM/DCOM. The Object Management Group (OMG) has developed a mapping between CORBA and COM/DCOM that is supported by several products.


Security Problems Unique to Distributed Database Management Systems
Centralized or Decentralized Authorization
In developing a distributed database, one of the first questions to answer is where to grant system access. Bell and Grisom [BellGris92] outline two strategies: (1) users are granted system access at their home site, or (2) users are granted system access at the remote site.
The first case is easier to handle. It is no more difficult to implement than a centralized access strategy. Bell and Grisom point out that the success of this strategy depends on reliable communication between the different sites (the remote site must receive all of the necessary clearance information). Since many different sites can grant access, the probability of unauthorized access increases. Once one site has been compromised, the entire system is compromised. If each site maintains access control for all users, the impact of the compromise of a single site is reduced (provided that the intrusion is not the result of a stolen password).
The second strategy, while perhaps more secure, has several disadvantages. Probably the most glaring is the additional processing overhead required, particularly if the given operation requires the participation of several sites. Furthermore, the maintenance of replicated clearance tables is computationally expensive and more prone to error. Finally, the replication of passwords, even though they're encrypted, increases the risk of theft.
A third possibility offered by Woo and Lam [WooLam92] centralizes the granting of access privileges at nodes called policy servers. These servers are arranged in a network. When a policy server receives a request for access, all members of the network determine whether to authorize the access of the user. Woo and Lam believe that separating the approval system from the application interface reduces the probability of compromise.
a. Integrity
Preservation of integrity is much more difficult in a heterogeneous distributed database than in a homogeneous one. The degree of central control dictates the level of difficulty with integrity constraints (integrity constraints enforce the rules of the individual organization). The homogeneous distributed database has strong central control and identical DBMS schemas. If the nodes in the distributed network are heterogeneous (the DBMS schemas and the associated organizations are dissimilar), several problems can arise that will threaten the integrity of the distributed data. Bell and Grisom list three problem areas:
1. Inconsistencies between local integrity constraints,
2. Difficulties in specifying global integrity constraints,
3. Inconsistencies between local and global constraints [BellGris92].

Bell and Grisom explain that local integrity constraints are bound to differ in a heterogeneous distributed database. The differences stem from differences in the individual organizations. These inconsistencies can cause problems, particularly with complex queries that rely on more than one database. Development of global integrity constraints can eliminate conflicts between individual databases, yet these are not always easy to implement. Global integrity constraints, on the other hand, are separated from the individual organizations, and it may not always be practical to change the organizational structure in order to make the distributed database consistent. Ultimately, this will lead to inconsistencies between local and global constraints. Conflict resolution depends on the level of central control: if there is strong global control, the global integrity constraints will take precedence; if central control is weak, the local integrity constraints will.
5.2.4 Distributed Databases in Oracle
The benefits of site autonomy in an Oracle distributed database include:
- Nodes of the system can mirror the logical organization of companies or groups that need to maintain independence.
- Local administrators control corresponding local data. Therefore, each database administrator's domain of responsibility is smaller and more manageable.
- Independent failures are less likely to disrupt other nodes of the distributed database. No single database failure need halt all distributed operations or be a performance bottleneck.
- Administrators can recover from isolated system failures independently of other nodes in the system.
- A data dictionary exists for each local database; a global catalog is not necessary to access local data.
- Nodes can upgrade software independently.
Future prospects of Client-Server Technology
The database server is the Oracle software managing a database, and a client is an application that requests information from a server. Each computer in a network is a node that can host one or more databases. Each node in a distributed database system can act as a client, a server, or both, depending on the situation. The host for the HQ database is acting as a database server when a statement is issued against its local data, but is acting as a client when it issues a statement against remote data.
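As a minimal sketch of a node acting as a client for remote data in Oracle, a query can reference a remote table through a database link; the link name, credentials, and service name are assumptions:

CREATE DATABASE LINK sales.example.com
    CONNECT TO scott IDENTIFIED BY password
    USING 'sales_db';

-- this node acts as a client for the remote Employee table
SELECT Name, Salary
FROM   Employee@sales.example.com
WHERE  Dno = 5;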

Since distributed data processing and database management technology keep growing, the growth of client-server technology is very promising.
5.2.5 Deductive Databases
What is a deductive database system? A deductive database can be defined as an advanced database augmented with an inference system.

Database + Inference = Deductive database

By evaluating rules against facts, new facts can be derived, which in turn can be used to answer queries. It makes a database system more powerful. Some basic concepts from logic

To understand the deductive database system well, some basic concepts from mathematical logic are needed: term, n-ary predicate, literal, (well-formed) formula, clause and Horn clause, fact, and logic program.
term: A term is a constant, a variable or an expression of the form f(t1, t2, ..., tn), where t1, t2, ..., tn are terms and f is a function symbol. Example: a, b, c, f(a, b), g(a, f(a, b)), x, y, g(x, y)
n-ary predicate: An n-ary predicate symbol is a symbol p appearing in an expression of the form p(t1, t2, ..., tn), called an atom, where t1, t2, ..., tn are terms. p(t1, t2, ..., tn) can only evaluate to true or false. Example: p(a, b), q(a, f(a, b)), p(x, y)
literal: A literal is either an atom or its negation. Example: p(a, f(a, b)), ¬p(a, f(a, b))

(well-formed) formula: A well-formed (logic) formula is defined inductively as follows:
- An atom is a formula.
- If P and Q are formulas, then so are ¬P, (P ∧ Q), (P ∨ Q), (P → Q), and (P ↔ Q).
- If x is a variable and P is a formula containing x, then (∀x P) and (∃x P) are formulas.
clause: A clause is an expression of the following form:
A1 ∨ A2 ∨ ... ∨ An ← B1 ∧ ... ∧ Bm
where the Ai and Bj are atoms. The above expression can be written in the following equivalent form:
B1 ∧ ... ∧ Bm → A1 ∨ ... ∨ An
The A part is the consequent and the B part is the antecedent. The implication connectives have the usual truth table, so A → B and B ← A are equivalent:
A   B   A → B   B ← A
1   1     1       1
0   1     1       1
1   0     0       0
0   0     1       1

Horn clause: A Horn clause is a clause whose head contains only one positive atom:
B ← A1, ..., An
fact: A fact is a special Horn clause of the following form:
B ←, with all variables in B being instantiated (B ← can simply be written as B).
logic program: A logic program is a set of Horn clauses.
Facts: supervise(franklin, john), supervise(franklin, ramesh), supervise(franklin, joyce), supervise(james, franklin), supervise(jennifer, alicia), supervise(jennifer, ahmad), supervise(james, jennifer).

Rules:
superior(X, Y) :- supervise(X, Y).
superior(X, Y) :- supervise(X, Z), superior(Z, Y).
subordinate(X, Y) :- superior(Y, X).
Basic inference mechanism for logic programs: interpretation of programs (rules + facts). There are two main alternatives for interpreting the theoretical meaning of rules: proof-theoretic and model-theoretic interpretation.
Proof Theoretic Interpretation
1. The facts and rules are considered to be true statements, or axioms: facts are ground axioms and rules are deductive axioms.
2. The deductive axioms are used to construct proofs that derive new facts from existing facts.
Example:
1. superior(X, Y) :- supervise(X, Y). (rule 1)
2. superior(X, Y) :- supervise(X, Z), superior(Z, Y). (rule 2)
3. supervise(jennifer, ahmad). (ground axiom, given)
4. supervise(james, jennifer). (ground axiom, given)
5. superior(jennifer, ahmad). (apply rule 1 on 3)
6. superior(james, ahmad). (apply rule 2 on 4 and 5)
Model Theoretic Interpretation
1. Given a finite or an infinite domain of constant values, assign to each predicate in the program every possible combination of values as arguments.
2. All the instantiated predicates constitute a Herbrand base.
3. An interpretation is a subset of the Herbrand base.
4. In the Herbrand base, each instantiated predicate evaluates to true or false in terms of the given facts and rules.
5. An interpretation is called a model for a specific set of rules and the corresponding facts if those rules are always true under that interpretation.
6. A model is a minimal model for a set of rules and facts if we cannot change any element in the model from true to false and still get a model for these rules and facts.
Example:
1. superior(X, Y) :- supervise(X, Y). (rule 1)
2. superior(X, Y) :- supervise(X, Z), superior(Z, Y). (rule 2)
known facts: supervise(franklin, john), supervise(franklin, ramesh), supervise(franklin, joyce), supervise(james, franklin), supervise(jennifer, alicia), supervise(jennifer, ahmad), supervise(james, jennifer). For all other possible (X, Y) combinations supervise(X, Y) is false.
domain = {james, franklin, john, ramesh, joyce, jennifer, alicia, ahmad}
Interpretation - model - minimal model
known facts: supervise(franklin, john), supervise(franklin, ramesh), supervise(franklin, joyce), supervise(james, franklin), supervise(jennifer, alicia), supervise(jennifer, ahmad), supervise(james, jennifer). For all other possible (X, Y) combinations supervise(X, Y) is false.
derived facts: superior(franklin, john), superior(franklin, ramesh),

superior(franklin, joyce), superior(jennifer, alicia), superior(jennifer, ahmad), superior(james, franklin), superior(james, jennifer), superior(james, john), superior(james, ramesh), superior(james, joyce), superior(james, alicia), superior(james, ahmad). For all other possible (X, Y) combinations superior(X, Y) is false.
The above interpretation is also a model for rules (1) and (2), since each of them always evaluates to true under the interpretation. For example, for superior(X, Y) :- supervise(X, Y):
superior(franklin, john) :- supervise(franklin, john) is true.
superior(franklin, ramesh) :- supervise(franklin, ramesh) is true.
...
and for superior(X, Y) :- supervise(X, Z), superior(Z, Y):
superior(james, ramesh) :- supervise(james, franklin), superior(franklin, ramesh) is true.
superior(james, alicia) :- supervise(james, jennifer), superior(jennifer, alicia) is true.
The model is also the minimal model for rules (1) and (2) and the corresponding facts, since eliminating any element from the model will make some facts or instantiated rules evaluate to false. For example, eliminating supervise(franklin, john) from the model will make this fact no longer true under the interpretation; eliminating superior(james, ramesh) will make the following rule no longer true under the interpretation:
superior(james, ramesh) :- supervise(james, franklin), superior(franklin, ramesh)
Inference mechanism

In general, there are two approaches to evaluating logic programs: bottom-up and top-down.

a. Bottom-up mechanism
1. The inference engine starts with the facts and applies the rules to generate new facts. That is, the inference moves forward from the facts toward the goal.
2. As facts are generated, they are checked against the query predicate goal for a match.
Example query goal: superior(james, Y)? with the rules and facts given as above.
1. Check whether any of the existing facts directly matches the query.
2. Apply the first rule to the existing facts to generate new facts.
3. Apply the second rule to the existing facts to generate new facts.
4. As each fact is generated, it is checked for a match of the query goal.
5. Repeat steps 1-4 until no more new facts can be found.

Example:
1. superior(X, Y) :- supervise(X, Y). (rule 1)
2. superior(X, Y) :- supervise(X, Z), superior(Z, Y). (rule 2)

known facts: supervise(franklin, john), supervise(franklin, ramesh), supervise(franklin, joyce), supervise(james, franklin), supervise(jennifer, alicia), supervise(jennifer, ahmad), supervise(james, jennifer). For all other possible (X, Y) combinations supervise(X, Y) is false.
domain = {james, franklin, john, ramesh, joyce, jennifer, alicia, ahmad}
Query: superior(james, Y)?
Applying the first rule gives superior(james, franklin) and superior(james, jennifer), so Y = {franklin, jennifer}.
Applying the second rule adds Y = {john, joyce, ramesh, alicia, ahmad}.
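The same bottom-up, repeat-until-no-new-facts evaluation can be sketched with a recursive SQL query (PostgreSQL-style WITH RECURSIVE); this is only an illustration and assumes the facts are stored in a table Supervise(supervisor, supervisee):

WITH RECURSIVE Superior(sup, sub) AS (
    SELECT supervisor, supervisee FROM Supervise            -- rule 1
    UNION
    SELECT s.supervisor, p.sub                               -- rule 2
    FROM   Supervise s JOIN Superior p ON p.sup = s.supervisee
)
SELECT sub FROM Superior WHERE sup = 'james';                -- superior(james, Y)?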

b. Top-down mechanism (also called backward chaining and top-down resolution)
1. The inference engine starts with the query goal and attempts to find matches to the variables that lead to valid facts in the database. That is, the inference moves backward from the intended goal to determine facts that would satisfy the goal.
2. During the course, the rules are used to generate subgoals. The matching of these subgoals will lead to the match of the intended goal.
5.2.6 Prolog/Datalog Notation
A predicate has a name and a fixed number of arguments. Convention: constants are numeric or character strings, and variables start with upper case letters. E.g., SUPERVISE(Supervisor, Supervisee) states that Supervisor supervises Supervisee.
A rule is of the form head :- body, where :- is read as "if and only if". E.g., SUPERIOR(X,Y) :- SUPERVISE(X,Y). E.g., SUBORDINATE(Y,X) :- SUPERVISE(X,Y).
A query involves a predicate symbol followed by some variable arguments, to answer the question. E.g., SUPERIOR(james,Y)? E.g., SUBORDINATE(james,X)?
(b) Supervisory tree: james supervises franklin and jennifer; franklin supervises john, ramesh and joyce; jennifer supervises alicia and ahmad.

(a) Prolog notation supervise(franklin, john), supervise(franklin, ramesh), supervise(franklin, joyce), supervise(james, franklin), supervise(jennifer, alicia), supervise(jennifer, ahmad),

supervise(james, jennifer).
Interpretation of Rules
There are two main alternatives for interpreting rules: proof-theoretic and model-theoretic.
Proof-theoretic: facts and rules are considered ground axioms and deductive axioms, respectively. Ground axioms contain no variables. Deductive axioms can be used to construct new facts from existing facts. This process is known as theorem proving, or proving a new fact.
Model-theoretic: given a finite or infinite domain of constant values, we assign to the predicate every combination of values as arguments. If this is done for every predicate, it is called an interpretation.
Model: an interpretation for a specific set of rules.
a. Model-theoretic proofs: whenever a particular substitution to the variables in the rules is applied, if all the predicates in the rule body are true under the interpretation, the predicate at the head of the rule must also be true.
b. Minimal model: one cannot change any fact from true to false and still get a model for these rules.
5.2.7 Basic interface mechanisms for logic programs
The Resource Description Framework (RDF) Model & Syntax Specification describes a metadata infrastructure which can accommodate classification elements from different vocabularies, i.e. schemas. The underlying model consists of a labeled directed acyclic graph which can be linearized into eXtensible Markup Language (XML) transfer syntax for interchange between applications.
Query Languages
In general, query languages are formal languages to retrieve data from a database. Standardized languages already exist to retrieve information from different types of databases, such as Structured Query Language (SQL) for relational databases and Object Query Language (OQL) and SQL3 for object databases.

Semi-structured query languages such as XML-QL [3] operate on the document-level structure. Logic programs consist of facts and rules, where valid inference rules are used to determine all the facts that apply within a given model. With RDF, the most suitable approach is to focus on the underlying data model. Even though XML-QL could be used to query RDF descriptions in their XML-encoded form, a single RDF data model could not be correctly determined with a single XML-QL query, because RDF allows several XML syntax encodings for the same data model.
The Metalog Approach
RDF provides the basis for structuring the data present in the web in a consistent and accurate way. However, RDF is only the first step towards the construction of what Tim Berners-Lee calls the "web of knowledge", a World Wide Web where data is structured, and users can fully benefit from this structure when accessing information on the web. RDF only provides the "basic vocabulary" in which data can be expressed and structured. Then, the whole problem of accessing and managing these structured data arises.
Metalog provides a "logical" view of metadata present on the web. The Metalog approach is composed of several components. In the first component, a particular data semantics is established. Metalog provides ways to express logical relationships like "and", "or" and so on, and to build up complex inference rules that encode logical reasoning. This "semantic layer" builds on top of RDF using a so-called RDF schema.
The second component consists of a "logical interpretation" of RDF data (optionally enriched with the semantic schema) into logic programming. This way, the understood semantics of RDF is unfolded into its logical components (a logic program, indeed). This means that every piece of reasoning on RDF data can be performed by acting upon the corresponding logical view, the logic program, providing a neat and powerful way to reason about data.

The third component is a language interface for writing structured data and reasoning rules. In principle, the first component already suffices: data and rules can be written directly in RDF, using RDF syntax and the Metalog schema. However, RDF syntax aims at being an encoding language rather than a user-friendly language, and it is well recognised in the RDF community and among vendors that typical applications will provide more user-friendly interfaces between the "raw RDF" code and the user. Another important feature of the language, in this respect, is indeed that it can be used just as an interface to RDF, without the Metalog extensions. This way, users will be able to access and structure metadata using RDF in a smooth and seamless way, using the Metalog language.
The Metalog Schema
The first correspondence in Metalog is between the basic RDF data model and the predicates in logic. The RDF data model consists of so-called statements. Statements are triples where there is a subject (the "resource"), a predicate (the "property"), and an object (the "literal"). Metalog views an RDF statement in the logical setting as just a binary predicate involving the subject and the literal; for example, an RDF statement with subject s, property p and object o is seen in logic programming as the predicate p(s, o). Once the basic correspondence between the basic RDF data model and predicates in logic is established, the next step comes easily: we can extend RDF so that the mapping to logic is able to take advantage of all of the logical relationships present in logical systems. That is to say, beyond the ability of expressing static facts, we want the ability to encode dynamic reasoning rules, like in logic programming. In order to do so, we need at least:

- the standard logical connectors (and, or, not)
- variables

The Metalog schema extends plain RDF with this "logical layer", making it possible to express arbitrary logical relationships within RDF. In fact, the Metalog schema provides more accessories besides the aforementioned basic ones (like, for example, the "implies" connector); however, so as not to burden the discussion, we don't go into further details on this

topic. What the reader should keep in mind is just that the Metalog schema provides the "meta-logic" operators to reason with RDF statements. Technically, this is quite easy to do: the Metalog schema is just a schema as defined by the RDF schema specification where, for example, "and" and "or" are subinstances of the RDF Bag connector. The mapping between "metalog RDF" and logical formulas is then completely natural: for each RDF statement that does not use a Metalog connector, there is a corresponding logical predicate as defined before. Then, the Metalog connectors are translated into the corresponding logical connectors in the natural way (so, for instance, the Metalog "and" connector is mapped using logical conjunction, while the Metalog "or" connector is mapped using logical disjunction).
The Metalog Syntax
Note that the RDF Metalog schema and the corresponding translation into logical formulas is absolutely general. However, in practice, one also needs to be able to process the resulting logical formulas in an effective way. In other words, while the RDF Metalog schema nicely extends RDF with the full power of first order predicate calculus, thus increasing by far the expressibility of basic RDF, there is still the other, computational, side of the coin: how to process and effectively reason with all these logical inference rules. It is well known that in general dealing with full first order predicate calculus is computationally infeasible. So, what we would like to have is a subset of predicate calculus. The third level is then the actual syntax interface between the user and this "metalog RDF" encoding, with the constraint that the expressibility of the language must fit within the one provided by logic programming. The Metalog syntax has been explicitly designed with the purpose of being totally natural-language based, trying to avoid any possible technicalities, and therefore making the language extremely readable and self-descriptive.

The way Metalog achieves this aim is by a careful use of upper/lower case, quotes, and by allowing a rather liberal positioning of the keywords (an advanced parser then disambiguates the keywords from each Metalog program line).
Datalog programs and their evaluation
1. A Datalog program is a logic program.
2. In a Datalog program, each predicate contains no function symbols.
3. A Datalog program normally contains two kinds of predicates:

fact-based predicates and rule-based predicates. Fact-based predicates are defined by listing all the combinations of values that make the predicate true. Rule-based predicates are defined to be the head of one or more Datalog rules; they correspond to virtual relations whose contents can be inferred by the inference engine.
Example: all the programs discussed earlier are Datalog programs.
superior(X, Y) :- supervise(X, Y).
superior(X, Y) :- supervise(X, Z), superior(Z, Y).
supervise(jennifer, ahmad).
supervise(james, jennifer).
The following is a logic program, but not a Datalog program:
p(X, Y) :- q(f(Y), X)
Two important concepts are: safety of programs and the predicate dependency graph.
Safety of programs
A Datalog program or a rule is said to be safe if it generates a finite set of facts.
Condition of unsafety: a rule is unsafe if one of the variables in the rule can range over an infinite domain of values, and that variable is not limited to ranging over a finite predicate before it is instantiated.

Example:
big_salary(Y) :- Y > 60000.
big_salary(Y) :- Y > 60000, employee(X), salary(X, Y).
The evaluation of these rules (no matter whether in bottom-up or in top-down fashion) will never terminate. The following is a safe rule:
big_salary(Y) :- employee(X), salary(X, Y), Y > 60000.
A variable X is limited if
(1) it appears in a regular (not built-in) predicate in the body of the rule (built-in predicates: <, >, ≤, ≥, =, ≠);
(2) it appears in a predicate of the form X = c or c = X, where c is a constant;
(3) it appears in a predicate of the form X = Y or Y = X in the rule body, where Y is a limited variable;
(4) before it is instantiated, some other regular predicates containing it will have been evaluated.
Condition of safety: a rule is safe if each variable in it is limited. A program is safe if each rule in it is safe.
Predicate Dependency Graphs
For a program P, we construct a dependency graph G representing a "refers to" relationship between the predicates in P. This is a directed graph where there is a node for each predicate and an arc from node q to node p if and only if the predicate q occurs in the body of a rule whose head predicate is p.
Example:
superior(X, Y) :- supervise(X, Y),
superior(X, Y) :- supervise(X, Z), superior(Z, Y),
subordinate(X, Y) :- superior(Y, X),
supervisor(X, Y) :- employee(X), supervise(X, Y),
over_40K_emp(X) :- employee(X), salary(X, Y), Y ≥ 40000,
under_40K_supervisor(X) :- supervisor(X), not(over_40K_emp(X)),
main_productx_emp(X) :- employee(X), workson(X, productx, Y), Y ≥ 20,

Advanced RDBMS
president(X) ← employee(X), not(supervise(Y, X)).

In the resulting dependency graph, the only cycle is the self-loop on superior (superior appears in the body of a rule whose head is superior), so superior is the only recursive predicate in this program.

Evaluation of nonrecursive rules

If the dependency graph for a rule set has no cycles, the rule set is nonrecursive, and evaluation involves only rule-based predicates.

Single-rule evaluation: to evaluate a rule of the form p ← p1, ..., pn, we first compute the relations corresponding to p1, ..., pn and then the relation corresponding to p.

All the rules are then evaluated along the predicate dependency graph; at each step, each rule is evaluated as described above.

The general bottom-up evaluation strategy for a nonrecursive query ?- p(x1, x2, ..., xn) is:

1. Locate a set of rules S whose head involves the predicate p. If there are no such rules, then p is a fact-based predicate corresponding to some database relation Rp; in this case, one of the following expressions is returned and the algorithm terminates.
(a) If all arguments in p are distinct variables, the relational expression returned is Rp.
(b) If some arguments are constants or if the same variable appears in more than one argument position, the expression returned is SELECT<condition>(Rp), where <condition> is a conjunctive condition made up of a number of simple conditions connected by AND, constructed as follows:
    i. if a constant c appears as argument i, include a simple condition ($i = c) in the conjunction;
    ii. if the same variable appears in both argument positions j and k, include a condition ($j = $k) in the conjunction.

2. Otherwise, one or more rules Si, i = 1, 2, ..., n, n > 0, exist with predicate p as their head. For each such rule Si, generate a relational expression as follows:
a. Apply the selection operations to the predicates in the body of each such rule, as discussed in Step 1(b).
b. Construct a natural join among the relations that correspond to the predicates in the body of rule Si, over their common variables. Let the resulting relation from this join be Rs.
c. If any built-in predicate X θ Y is defined over the arguments X and Y, the result of the join is subjected to an additional selection: SELECT X θ Y (Rs).
d. Repeat Step 2(c) until no more built-in predicates apply.

3. Take the UNION of the expressions generated in Step 2.

Evaluation of recursive rules

If the dependency graph for a rule set has at least one cycle, the rule set is recursive. For example:

ancestor(X, Y) ← parent(X, Y).
ancestor(X, Y) ← parent(X, Z), ancestor(Z, Y).

The main evaluation strategies are the naive strategy, the semi-naive strategy, and (for negation) stratified databases.

Some terminology for recursive queries:
- linearly recursive
- left linearly recursive: ancestor(X, Y) ← ancestor(X, Z), parent(Z, Y)
- right linearly recursive: ancestor(X, Y) ← parent(X, Z), ancestor(Z, Y)
- non-linearly recursive: sg(X, Y) ← sg(X, Z), sibling(Z, W), sg(W, Y)

extensional database (EDB) predicate: an EDB predicate is a predicate whose relation is stored in the database (a fact-based predicate).

intensional database (IDB) predicate: an IDB predicate is a predicate whose relation is defined by logic rules (a rule-based predicate). (In the company example above, supervise, employee, salary, and workson are EDB predicates, while superior, subordinate, supervisor, and the other rule-defined predicates are IDB predicates.)

Datalog equation: a Datalog equation is an equation obtained from the rules for a predicate by replacing ← and the conjunction of body predicates with = and a natural join (projected onto the head arguments), respectively; rules with the same head are combined by union.
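As a worked instance of this definition (the relational-algebra form here is only a sketch of the mechanical translation), the two ancestor rules given above combine into the single equation

ancestor(X, Y) = parent(X, Y) ∪ π X,Y (parent(X, Z) ⋈ ancestor(Z, Y))

whose least solution, computed iteratively starting from the stored parent relation, is exactly the ancestor relation defined by the rules.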

a(X, Y) = p(X, Y) ∪ π X,Y (p(X, Z) ⋈ a(Z, Y))

a. Naive strategy

Consider the following equation system:

Ri = Ei(R1, ..., Ri, ..., Rn)   (i = 1, ..., m)

which is formed by replacing the ← symbol with an equality sign in a Datalog program.

b. Algorithm: Jacobi naive strategy

input: a system of algebraic equations and the EDB.
output: the values of the variable relations R1, ..., Ri, ..., Rn.

for i = 1 to n do Ri := ∅;
repeat
  Con := true;
  for i = 1 to n do Si := Ri;
  for i = 1 to m do
    { Ri := Ei(S1, ..., Si, ..., Sn);
      if Ri ≠ Si then { Con := false; Si := Ri; } }
until Con = true;

Example (naive strategy):

sg(X, Y) ← sg(X, W), sibling(W, Z), sg(Z, Y).
sibling(X, Y) ← parent(X, W), sibling(W, Z), parent(Y, Z).

Evaluation of recursive queries: semi-naive strategy

1. The semi-naive evaluation method is a bottom-up strategy.
2. It is designed to eliminate redundancy in the evaluation of tuples at different iterations. Let Ri(k) be the temporary value of relation Ri at iteration step k. The differential of Ri between step k and step k - 1 is defined as follows:

Di(k) = Ri(k) - Ri(k-1)

For a linearly recursive rule set, Di(k) can be substituted for Ri in the k-th iteration of the naive algorithm.
3. The result is obtained as the union of the newly obtained term Ri and that obtained in the previous step.

c. Algorithm: semi-naive strategy
input: a system of algebraic equations and the EDB.
output: the values of the variable relations R1, ..., Ri, ..., Rn.

for i = 1 to n do Ri := ∅;
for i = 1 to m do Di := ∅;
repeat
  Con := true;
  for i = 1 to n do
    { Di := Ei(D1, ..., Di, ..., Dn) - Ri;
      Ri := Di ∪ Ri;
      if Di ≠ ∅ then Con := false; }
until Con is true;

Example:

Step 0: D0 = ∅, A0 = ∅.
Step 1: D1 = P = {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank)}
        A1 = D1 ∪ A0 = {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank)}
Step 2: D2 = {(bert, derek), (bert, pat), (alice, frank)}
        A2 = D2 ∪ A1 = {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank)}
Step 3: D3 = {(bert, frank)}
        A3 = D3 ∪ A2 = {(bert, alice), (bert, george), (alice, derek), (alice, pat), (derek, frank), (bert, derek), (bert, pat), (alice, frank), (bert, frank)}
Step 4: D4 = ∅, so the computation stops.

The advantage of the semi-naive method is that at each step a differential term Di is used in each equation instead of the whole Ri. In this way, the time complexity of the computation is decreased drastically.

The magic-set rule rewriting technique

1. During a bottom-up evaluation, too many irrelevant tuples are evaluated. For example, to evaluate the query sg(john, Z)? using the following rules:

sg(X, Y) ← flat(X, Y).
sg(X, Y) ← up(X, Z), sg(Z, W), down(W, Y).

a bottom-up method will generate all sg-tuples and then apply a selection operation to obtain the answers.

2. The magic-set technique uses the constants appearing in the query to restrict the computation.
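A sketch of what the rewritten program looks like for the query sg(john, Z)?, following the standard magic-sets transformation for a query that binds the first argument of sg (the exact rules produced depend on the sideways-information-passing strategy chosen, so this is an illustration rather than the output of any particular system):

magic_sg(john).
magic_sg(Z) ← magic_sg(X), up(X, Z).
sg(X, Y) ← magic_sg(X), flat(X, Y).
sg(X, Y) ← magic_sg(X), up(X, Z), sg(Z, W), down(W, Y).

Bottom-up evaluation of the rewritten program first computes the magic_sg set (the constants reachable from john through up edges) and then derives only those sg-tuples whose first argument lies in that set, which mirrors the focus of a top-down evaluation while keeping the termination and set-at-a-time advantages of bottom-up evaluation.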

d. Stratified databases

A stratified database is a Datalog program that contains negated predicates but no recursion through negation.

Example: suppose that a supplier wishes to back-order items that are not in the warehouse. It would be convenient to write:

backorder(X) ← item(X), not(warehouse(X)).

Its logically equivalent form is

backorder(X) ∨ warehouse(X) ← item(X).

But this rule has a different meaning: if X is an item, then back-order it or it is stored in the warehouse. This is not what we want. The real problem, however, is recursion via negation, as in:

p(X) ← not(q(X)).
q(X) ← not(p(X)).

To avoid recursion via negation, we introduce the concept of stratification, which is defined by the use of a level mapping l.

Level mapping: assign each literal in the program an integer such that, for every rule B ← A1, ..., An, if Ai is positive then l(Ai) ≤ l(B) for all i, 1 ≤ i ≤ n, and if Ai is negative then l(Ai) < l(B).
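As a quick check of this definition against the back-order example above (the level values are just one possible choice), the assignment l(item) = l(warehouse) = 0 and l(backorder) = 1 satisfies both conditions: item occurs positively and 0 ≤ 1, while warehouse occurs negatively and 0 < 1. The program is therefore stratified, and the warehouse predicate can be fully evaluated before the backorder rule is applied.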

If you can assign integers to all the literals in a program using such a level mapping, then the program is stratifiable. However, you cannot find any level mapping for the following program:

p(X) ← not(q(X)).
q(X) ← not(p(X)).

In fact, we cannot find a level mapping for any program which contains recursion via negation. For a stratifiable program, the literals are evaluated from the low levels to the high levels.

Example:

path(X, Y) ← edge(X, Y).
path(X, Y) ← edge(X, Z), path(Z, Y).
acyclic_path(X, Y) ← path(X, Y), not(path(Y, X)).

We can find many level mappings for this program; for instance, l(edge) = l(path) = 0 and l(acyclic_path) = 1, or l(edge) = 0, l(path) = 1 and l(acyclic_path) = 2.

Use of the Relational Operations

Many operations of relational algebra can be defined in the form of Datalog rules that define the result of applying these operations on database relations (fact predicates). Relational queries and views can therefore be easily specified in Datalog.

Evaluation of Non-recursive Datalog Queries: define an inference mechanism based on relational database query processing concepts.

5.2.8 Deductive database systems

Deductive database systems are database management systems whose query language and (usually) storage structure are designed around a logical model of data. As relations are naturally thought of as the "value" of a logical predicate, and relational languages such as SQL are syntactic sugarings of a limited form of logical expression, it is easy to see deductive database systems as an advanced form of relational systems. The deductive systems do, however, share with the relational systems the important property of being declarative, that is, of allowing the user to query or update by saying what he or she wants, rather than how to perform the operation.
Declarativeness is now being recognized as an important driver of the success of relational systems. As a result, we see deductive database technology, and the declarativeness it engenders, infiltrating other branches of database systems, especially the object-oriented world, where it is becoming increasingly important to interface object-oriented and logical paradigms in so-called DOOD (Declarative and Object-Oriented Database) systems. Another important thrust has been the problem of coping with negation or nonmonotonic reasoning, where classical logic does not offer, through the conventional means of logical deduction, an adequate definition of what some very natural logical statements "mean" to the programmer.

Objective of Deductive Databases

The objective of deductive databases is to provide efficient support for sophisticated queries and reasoning on large databases; toward this goal, they combine the technology of logic programming with that of relational databases. Deductive database research has produced methods and techniques for implementing the declarative semantics of logical rules via efficient computation of fixpoints. Also, advances in language design and nonmonotonic semantics were made to allow the use of negation and set-aggregates in recursive programs; these yield greater expressive power while retaining polynomial data complexity and semantic well-formedness. Deductive database systems have been used in data mining and other advanced applications, and their techniques have been incorporated into a new generation of commercial databases.

Deductive Object-Oriented Databases

A deductive database system is a database system which can make deductions (i.e., conclude additional rules or facts) based on the rules and facts stored in the (deductive) database. Deductive database systems:

- Mainly deal with rules and facts.
- Use a declarative language (such as Prolog) to specify those rules and facts.
- Use an inference engine which can deduce new facts and rules from those given.
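A minimal illustration of the last point (the names here are invented for the example): given the stored facts and the single rule

parent(tom, bob).
parent(bob, ann).
grandparent(X, Y) ← parent(X, Z), parent(Z, Y).

the inference engine can deduce the new fact grandparent(tom, ann), even though it is not stored anywhere in the database.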

A good example of a declarative language would be Prolog, but for databases Datalog is used more often. Datalog is both a syntactic subset of Prolog and a database query language; it is designed specifically for working with logic and databases. Deductive databases are also known as logic databases, knowledge systems, and inferential databases. The problem domain of an expert system or deductive database is usually quite narrow. Deductive databases are similar to expert systems: traditional expert systems have assumed that all the facts and rules they need (their knowledge base) will be loaded into main memory, whereas a deductive database uses a database (usually on disk storage) as its knowledge base. Traditional expert systems have usually also taken their facts and rules from a real expert in their problem domain, whereas deductive databases
find their knowledge inherent in the data. Deductive databases and expert systems are mainly used for:

- Replicating the functionality of a real expert.
- Hypothesis testing.
- Knowledge discovery (finding new relationships between data).

Applications of Commercial Deductive Database Systems

Notation, Definitions, and Some Basic Concepts

Deductive database systems divide their information into two categories:

1. Data, or facts, that are normally represented by a predicate with constant arguments (by a ground atom). For example, the fact parent(joe, sue) means that Sue is a parent of Joe. Here, parent is the name of a predicate, and this predicate is represented extensionally, that is, by storing in the database a relation of all the true tuples for this predicate. Thus, (joe, sue) would be one of the tuples in the stored relation.

2. Rules, which define additional predicates in terms of the stored facts and of other defined predicates. The two rules for the "same-generation" predicate sg, discussed below, are:

sg(X, Y) ← flat(X, Y).
sg(X, Y) ← up(X, U), sg(U, V), down(V, Y).

Extensional and intensional databases. Here, sg is a predicate ("same-generation"), and the head of each of the two rules is the atomic formula sg(X, Y). X and Y are variables. The other predicates found in the rules are flat, up, and down. These are presumably stored extensionally, while the relation for sg is intensional, that is, defined only by the rules. Intensional predicates play a role similar to views in conventional database systems, although we expect that in deductive applications there will be large numbers of intensional predicates and rules defining them, far more than the number of views defined in typical database applications.

The first rule can be interpreted as saying that individuals X and Y are at the same generation if they are related by the predicate flat, that is, if there is a tuple (X, Y) in the relation for flat. The second rule says that X and Y are also at the same generation if there are individuals U and V such that:

1. X and U are related by the up predicate.
2. U and V are at the same generation.
3. V and Y are related by the down predicate.

These rules thus define the notion of being at the same generation recursively. Since common implementations of SQL do not support general recursions such as this example
without going to a host-language program, we see one of the important extensions of deductive systems: the ability to support declarative, recursive queries. The optimization of recursive queries has been an active research area, and has often focused on some important classes of recursion. We say that a predicate p depends upon a predicate q (not necessarily distinct from p) if some rule with p in the head has a subgoal whose predicate either is q or (recursively) depends on q. If p depends upon q and q depends upon p, then p and q are said to be mutually recursive. A program is said to be linear recursive if each rule contains at most one subgoal whose predicate is mutually recursive with the head predicate.

Optimization Techniques

Perhaps the hardest problem in the implementation of deductive database systems is designing the query optimizer. While for nonrecursive rules the optimization problem is similar to that of conventional relational optimization, the presence of recursive rules opens up a variety of new options and problems. There is an extensive literature on the subject, and we shall attempt here to give only the most basic ideas and motivation. (Sometimes a more restrictive definition of linear recursion is used, requiring that no two distinct predicates can be mutually recursive, or even that there be at most one recursive rule in the program; we shall not worry about such distinctions.)

a. Magic Sets

The problem addressed by the magic-sets rule rewriting technique is that frequently a query asks not for the entire relation corresponding to an intensional predicate, but for a small subset. A top-down, or backward-chaining, search would start from the query as a goal and use the rules from head to body to create more goals, and none of these goals would be irrelevant to the query, although some may cause us to explore paths that happen to "dead end", because data that would lead to a solution to the query happens not to be in the database. Prolog evaluation is the best known example of top-down evaluation. However, the Prolog algorithm, like all purely top-down approaches, suffers from some problems. It is prone to recursive loops, it may perform repeated computation of some subgoals, and it is often hard to tell that all solutions to the query goal have been found. On the other hand, a bottom-up or forward-chaining search, working from the bodies of the rules to the heads, would cause us to infer facts that would never even be considered in the top-down search. Yet bottom-up evaluation is desirable because it avoids the problems of looping and repeated computation that are inherent in the top-down approach. Also, bottom-up approaches allow us to use set-at-a-time operations like
relational joins, which may be made efficient for disk-resident data, while the pure top-down methods use tuple-at-a-time operations. Magic-sets is a technique that allows us to rewrite the rules for each query form (i.e., for each pattern of which arguments of the predicate are bound to constants and which are variable), so that the advantages of top-down and bottom-up methods are combined. That is, we get the focus inherent in top-down evaluation combined with the looping-freedom, easy termination testing, and efficient evaluation of bottom-up evaluation. Magic-sets is a rule-rewriting technique. We shall not give the method, of which many variations are known and used in practice; the rewriting of the sg(john, Z) query sketched earlier should suggest the idea.

b. Other Rule-Rewriting Techniques

There are a number of other approaches to optimization that sometimes yield better performance than magic-sets. These optimizations include the counting algorithm [BMSU86, SZ86, BR87b], the factoring optimization [NRSU89, KRS90], techniques for deleting redundant rules and literals [NS89, Sag88], techniques by which "existential" queries (queries for which a single answer, any answer, suffices) can be optimized [RBK88], and "envelopes" [SS88, Sag90]. A number of researchers [IW88, ZYT88, Sar89, RSUV89] have studied how to transform a program that contains nonlinear rules into an equivalent one that contains only linear rules.

c. Iterative Fixpoint Evaluation

Most rule-rewriting techniques like magic-sets expect implementation of the rewritten rules by a bottom-up technique, where, starting with the facts in the database, we repeatedly evaluate the bodies of the rules with whatever facts are known (including facts for the intensional predicates) and infer what facts we can from the heads of the rules. This approach is called naive evaluation. We can improve the efficiency of this algorithm with a simple trick. If in some round of the repeated evaluation of the bodies we discover a new fact f, then we must have used, for at least one of the subgoals in the utilized rule, a fact that was discovered in the previous round, for if not, then f itself would have been discovered in a previous round. We may thus reorganize the substitution of facts for the subgoals so that at least one of the subgoals is replaced by a fact that was discovered in the previous round.

d. Extensions of Horn-Clause Programs

A deductive database query language can be enhanced by permitting negated subgoals in the bodies of rules. However, we then lose an important property of our rules. When rules have the simple form introduced earlier, there is a unique minimal model of the rules and data. A model of a program is a set of facts such that, for any rule, replacing body literals by facts in the
model results in a head fact that is also in the model. Thus, in the context of a model, a rule can be understood as saying, essentially, "if the body is true, the head is true". A minimal model is a model such that no proper subset of it is a model. The existence of a unique minimal model, or least model, is clearly a fundamental and desirable property. Indeed, this least model is the one computed by naive or semi-naive evaluation. Intuitively, we expect that the programmer had the least model in mind when he or she wrote the logic program. However, in the presence of negated literals, a program may not have a least model: for example, the single rule p ← not(q) has two minimal models, {p} and {q}, and neither contains the other.

An Historical Overview of Deductive Databases

The origins of deductive databases can be traced back to work in automated theorem proving and, later, logic programming. In an interesting survey of the early development of the field [Min87], Minker suggests that Green and Raphael [GR68] were the first to recognize the connection between theorem proving and deduction in databases. They developed a series of question-answering systems that used a version of Robinson's resolution principle [Rob65], demonstrating that deduction could be carried out systematically in a database context. (Cordell Green received a Grace Murray Hopper award from the ACM for this work.) Other early systems included MRPPS, DEDUCE-2, and DADM. MRPPS was an interpretive system developed at Maryland by Minker's group from 1970 through 1978 that explored several search procedures, indexing techniques, and semantic query optimization. One of the first papers on processing recursive queries was [MN82]; it contained the first description of bounded recursive queries, which are recursive queries that can be replaced by nonrecursive equivalents. DEDUCE was implemented at IBM in the mid 1970's [Cha78], and supported left-linear recursive Horn-clause rules using a compiled approach. DADM [KT81] emphasized the distinction between EDB and IDB and studied the representation of the IDB in the form of 'connection graphs', closely related to Sickel's interconnectivity graphs [Sic76], to aid in the development of query plans.

A landmark workshop on logic and deductive databases was organized by Gallaire, Minker and Nicolas at Toulouse in 1977, and several papers from the proceedings appeared in book form [GM78]. Reiter's influential paper on the closed world assumption (as well as a paper on compilation of rules) appeared in this book, as did Clark's paper on negation-as-failure and a paper by Nicolas and Yazdanian on checking integrity constraints. The workshop and the book brought together researchers in the area of logic and databases, and gave an identity to the field. (The workshop was also organized in subsequent years, with proceedings, and continued to influence the field.)

In 1976, van Emden and Kowalski [vEK76] showed that the least fixpoint of a Horn-clause logic program coincided with its least Herbrand model. This provided a firm foundation for the semantics of logic programs, and especially deductive databases, since fixpoint computation is the operational semantics associated with deductive databases (at
least, of those implemented using bottom-up evaluation). The early work focused largely on identifying suitable goals for the field, and on developing a semantic foundation. The next phase of development saw an increasing emphasis on efficient query evaluation techniques; Henschen and Naqvi proposed one of the earliest efficient techniques for evaluating recursive queries.

The area of deductive databases has matured in recent years, and it now seems appropriate to reflect upon what has been achieved and what the future holds. Here we provide an overview of the area and briefly describe a number of projects that have led to implemented systems. Deductive systems are not the only class of systems with a claim to being an extension of relational systems.

Prolog and Databases

There are two points to consider:

1. Prolog's depth-first evaluation strategy leads to infinite loops, even for positive programs and even in the absence of function symbols or arithmetic. In the presence of large volumes of data, operational reasoning is not desirable, and a higher premium is placed upon completeness and termination of the evaluation method.
2. In a typical database application, the amount of data is sufficiently large that much of it is on secondary storage. Efficient access to this data is crucial to good performance.

The first problem is adequately addressed by memoing extensions to Prolog evaluation; for example, one can efficiently extend the widely used Warren Abstract Machine (WAM) Prolog architecture [War89]. The second problem turns out to be harder. The key to accessing disk data efficiently is to utilize the set-oriented nature of typical database operations and to tailor both the clustering of data on disk and the management of buffers in order to minimize the number of pages fetched from disk. Prolog's tuple-at-a-time evaluation strategy severely curtails the implementor's ability to minimize disk accesses by re-ordering operations.

The situation can thus be summarized as follows: Prolog systems evaluate logic programs efficiently in main memory, but are tuple-at-a-time, and therefore inefficient with respect to disk accesses. In contrast, database systems implement only a nonrecursive subset of logic programs (essentially described by relational algebra), but do so efficiently with respect to disk accesses. The goal of deductive databases is to deal with a superset of relational algebra that includes support for recursion in a way that permits efficient handling of disk data. Evaluation strategies should retain Prolog's goal-directed flavor, but be more set-at-a-time.
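A minimal illustration of the first point (the program and rule ordering below are chosen to trigger the problem; they are not taken from the discussion above): given

ancestor(X, Y) ← ancestor(X, Z), parent(Z, Y).
ancestor(X, Y) ← parent(X, Y).
parent(a, b).

a Prolog query ?- ancestor(a, X) immediately re-invokes the goal ancestor(a, Z) through the first rule and loops forever, while a bottom-up (naive or semi-naive) evaluation of the same program terminates after deriving the single fact ancestor(a, b).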


5.3 Revision points


- Triggers are executed when a specified condition occurs during insert/delete/update. Triggers are actions that fire automatically based on these conditions. Triggers follow an Event-Condition-Action (ECA) model.
- Row-level triggers are executed separately for each affected row; statement-level triggers execute once for the SQL statement.
- An event can be considered as immediate, deferred, or detached.
- R-Trees are a technique for answering typical spatial queries.
- Semi-structured query languages such as XML-QL [3] operate on the document-level structure.
- Metalog provides a "logical" view of metadata present on the web.
- The middle tier can perform queuing, application execution, and database staging.

5.4 Intext Questions


1. Explain the prospects of client-server technology.
2. Elucidate the role played by distributed database management.
3. Discuss Datalog programs and evaluation.
4. Write a note on enhanced data models for advanced applications.

5.5 Summary
Event: a database modification (e.g., insert, delete, update). Condition: any true/false expression; this is optional - if no condition is specified, the condition is always true. Action: a sequence of SQL statements that will be automatically executed. A FOR EACH ROW clause specifies a row-level trigger. An active database allows users to make the following changes to triggers: (i) activate, (ii) deactivate, (iii) drop. A time-varying attribute is an attribute that changes over time. Key issues in DDS (distributed database systems) are fragmentation, data allocation, and replication. A Datalog program is a logic program.


5.6 Terminal Exercise


1. Triggers are of two types - _________ level and _________ level.
2. Text, images and graphics are available in ___________ databases.
3. A _________ is a copy of a fragment that may be maintained at several sites.
4. What is Data Fragmentation?
5. What are the types of Data Fragmentation?
6. What is Two-phase commit?
7. What are the types of Distributed Databases?
8. A _________ is said to occur when two activities, which may or may not be full-fledged transactions, attempt to change entities within a system of record.
9. What are the three locking strategies?
10. What are the different types of Architectures?
11. A _________ database can be defined as an advanced database augmented with an inference system.
12. What are the two main alternatives for interpreting rules?

5.7 Suggested Reading


1. [MilLun92] Millen, Jonathan K. and Teresa F. Lunt, "Security for Object-Oriented Database Systems", in Proceedings of the IEEE Symposium on Research in Security and Privacy, pp. 260-272, 1992.
2. [Mull94] Mullins, Craig S., "The Great Debate: Force-fitting objects into a relational database just doesn't work well. The impedance problem is at the root of the incompatibilities", Byte, v19 n4, pp. 85-96, April 1994.
3. [Pflee89] Pfleeger, Charles P., Security in Computing, New Jersey: Prentice Hall, 1989.
4. [RobCor93] Rob, Peter and Carlos Coronel, Database Systems, Belmont: Wadsworth, 1993.

5.8 Assignments
1. Discuss in detail the client-server architecture and its advantages.
2. Discuss the advantages and disadvantages of deductive databases.

5.9 Reference Books


1. [Sud95] Sudama, Ram, "Get Ready for Distributed Objects", Datamation, v41 n18, pp. 67-71, October 1995.
2. [ThurFord95] Thuraisingham, Bhavani and William Ford, "Security Constraint Processing in a Multilevel Secure Distributed Database Management System", IEEE Transactions on Knowledge and Data Engineering, v7 n2, pp. 274-293, April 1995.

5.10 Learning Activities


Individuals or groups of learners may visit the library for further reading on the topics covered in this unit.


5.11 Keywords


1. CORBA
2. COM
3. Client-Server Architecture
4. Spatial Queries
5. Fragmentation
6. Allocation
7. Replication
8. Datalog
