24 Enhanced Data Models for Advanced Applications

As the use of database systems has grown, users have demanded additional functionality from these software packages, with the purpose of making it easier to implement more advanced and complex user applications. Object-oriented databases and object-relational systems do provide features that allow users to extend their systems by specifying additional abstract data types for each application. However, it is quite useful to identify certain common features for some of these advanced applications and to create models that can represent these common features. In addition, specialized storage structures and indexing methods can be implemented to improve the performance of these common features. These features can then be implemented as abstract data type or class libraries and separately purchased with the basic DBMS software package. The term datablade has been used in Informix and cartridge in Oracle (see Chapter 22) to refer to such optional submodules that can be included in a DBMS package. Users can utilize these features directly if they are suitable for their applications, without having to reinvent, reimplement, and reprogram such common features.

This chapter introduces database concepts for some of the common features that are needed by advanced applications and that are starting to have widespread use. The features we will cover are active rules that are used in active database applications, temporal concepts that are used in temporal database applications, and briefly some of the issues involving multimedia databases. We will also discuss deductive databases. It is important to note that each of these topics is very broad, and we can give only a brief introduction to each area. In fact, each of these areas can serve as the sole topic for a complete book.
In Section 24.1, we will introduce the topic of active databases, which provide additional functionality for specifying active rules. These rules can be automatically triggered by events that occur, such as a database update or a certain time being reached, and can initiate certain actions that have been specified in the rule declaration if certain conditions are met. Many commercial packages already have some of the functionality provided by active databases in the form of triggers. Triggers are now part of the SQL-99 standard.

In Section 24.2, we will introduce the concepts of temporal databases, which permit the database system to store a history of changes, and allow users to query both current and past states of the database. Some temporal database models also allow users to store future expected information, such as planned schedules. It is important to note that many database applications are already temporal, but are often implemented without having much temporal support from the DBMS package; that is, the temporal concepts were implemented in the application programs that access the database.

Section 24.3 will give a brief overview of spatial and multimedia databases. Spatial databases provide concepts for databases that keep track of objects in a multidimensional space. For example, cartographic databases that store maps include two-dimensional spatial positions of their objects, which include countries, states, rivers, cities, roads, seas, and so on. Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points. Multimedia databases
provide features that allow users to store and query different types of multimedia information, which includes images (such as pictures or drawings), video clips (such as movies, news reels, or home videos), audio clips (such as songs, phone messages, or speeches), and documents (such as books or articles).

In Section 24.4, we discuss deductive databases,¹ an area that is at the intersection of databases, logic, and artificial intelligence or knowledge bases. A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database. Because part of the theoretical foundation for some deductive database systems is mathematical logic, such rules are often referred to as logic databases. Other types of systems, referred to as expert database systems or knowledge-based systems, also incorporate reasoning and inferencing capabilities; such systems use techniques that were developed in the field of artificial intelligence, including semantic networks, frames, production systems, or rules for capturing domain-specific knowledge.

Readers may choose to peruse the particular topics they are interested in, as the sections in this chapter are practically independent of one another.

1. Section 24.4 is a summary of Chapter 25 from the third edition.
The full chapter will be available on the book Web site.

24.1 ACTIVE DATABASE CONCEPTS AND TRIGGERS

Rules that specify actions that are automatically triggered by certain events have been considered as important enhancements to a database system for quite some time. In fact, the concept of triggers (a technique for specifying certain types of active rules) has existed in early versions of the SQL specification for relational databases, and triggers are now part of the SQL-99 standard. Commercial relational DBMSs, such as Oracle, DB2, and SYBASE, have had various versions of triggers available. However, much research into what a general model for active databases should look like has been done since the early models of triggers were proposed. In Section 24.1.1, we will present the general concepts that have been proposed for specifying rules for active databases. We will use the syntax of the Oracle commercial relational DBMS to illustrate these concepts with specific examples, since Oracle triggers are close to the way rules are specified in the SQL standard. Section 24.1.2 will discuss some general design and implementation issues for active databases. We then give examples of how active databases are implemented in the STARBURST experimental DBMS in Section 24.1.3, since STARBURST provides for many of the concepts of generalized active databases within its framework. Section 24.1.4 discusses possible applications of active databases. Finally, Section 24.1.5 describes how triggers are declared in the SQL-99 standard.

24.1.1 Generalized Model for Active Databases and Oracle Triggers

The model that has been used for specifying active database rules is referred to as the Event-Condition-Action, or ECA model. A rule in the ECA model has three components:

1. The event (or events) that triggers the rule: These events are usually database update operations that are explicitly applied to the database. However, in the general model, they could also be temporal events² or other kinds of external events.
2. The condition that determines whether the rule action should be executed: Once the triggering event has occurred, an optional condition may be evaluated. If no condition is specified, the action will be executed once the event occurs. If a condition is specified, it is first evaluated, and only if it evaluates to true will the rule action be executed.
3. The action to be taken: The action is usually a sequence of SQL statements, but it could also be a database transaction or an external program that will be automatically executed.

Let us consider some examples to illustrate these concepts. The examples are based on a much simplified variation of the COMPANY database application from Figure 5.7, which is shown in Figure 24.1, with each employee having a name (NAME), social security number (SSN), salary (SALARY), department to which they are currently assigned (DNO, a foreign key to DEPARTMENT), and a direct supervisor (SUPERVISOR_SSN, a (recursive) foreign key to EMPLOYEE). For this example, we assume that null is allowed for DNO, indicating that an employee may be temporarily unassigned to any department. Each department has a name (DNAME), number (DNO), the total salary of all employees assigned to the department (TOTAL_SAL), and a manager (MANAGER_SSN, a foreign key to EMPLOYEE).

2. An example would be a temporal event specified as a periodic time, such as: Trigger this rule every day at 5:30 A.M.

    EMPLOYEE
    NAME | SSN | SALARY | DNO | SUPERVISOR_SSN

    DEPARTMENT
    DNAME | DNO | TOTAL_SAL | MANAGER_SSN

FIGURE 24.1 A simplified COMPANY database used for active rule examples.

Notice that the TOTAL_SAL attribute is really a derived attribute, whose value should be the sum of the salaries of all employees who are assigned to the particular department. Maintaining the correct value of such a derived attribute can be done via an active rule. We first have to determine the events that may cause a change in the value of TOTAL_SAL, which are as follows:

1. Inserting (one or more) new employee tuples.
2. Changing the salary of (one or more) existing employees.
3. Changing the assignment of existing employees from one department to another.
4. Deleting (one or more) employee tuples.

In the case of event 1, we only need to recompute TOTAL_SAL if the new employee is immediately assigned to a department; that is, if the value of the DNO attribute for the new employee tuple is not null (assuming null is allowed for DNO). Hence, this would be the condition to be checked. A similar condition could be checked for event 2 (and 4) to determine whether the employee whose salary is changed (or who is being deleted) is currently assigned to a department. For event 3, we will always execute an action to maintain the value of TOTAL_SAL correctly, so no condition is needed (the action is always executed).

The action for events 1, 2, and 4 is to automatically update the value of TOTAL_SAL for the employee's department to reflect the newly inserted, updated, or deleted employee's salary. In the case of event 3, a twofold action is needed: one to update the TOTAL_SAL of the employee's old department and the other to update the TOTAL_SAL of the employee's new department.

The four active rules (or triggers) R1, R2, R3, and R4, corresponding to the above situation, can be specified in the notation of the Oracle DBMS as shown in Figure 24.2a.³

    (a) R1: CREATE TRIGGER TOTALSAL1
            AFTER INSERT ON EMPLOYEE
            FOR EACH ROW
            WHEN (NEW.DNO IS NOT NULL)
              UPDATE DEPARTMENT
              SET TOTAL_SAL = TOTAL_SAL + NEW.SALARY
              WHERE DNO = NEW.DNO;

        R2: CREATE TRIGGER TOTALSAL2
            AFTER UPDATE OF SALARY ON EMPLOYEE
            FOR EACH ROW
            WHEN (NEW.DNO IS NOT NULL)
              UPDATE DEPARTMENT
              SET TOTAL_SAL = TOTAL_SAL + NEW.SALARY - OLD.SALARY
              WHERE DNO = NEW.DNO;

        R3: CREATE TRIGGER TOTALSAL3
            AFTER UPDATE OF DNO ON EMPLOYEE
            FOR EACH ROW
            BEGIN
              UPDATE DEPARTMENT
              SET TOTAL_SAL = TOTAL_SAL + NEW.SALARY
              WHERE DNO = NEW.DNO;
              UPDATE DEPARTMENT
              SET TOTAL_SAL = TOTAL_SAL - OLD.SALARY
              WHERE DNO = OLD.DNO;
            END;

        R4: CREATE TRIGGER TOTALSAL4
            AFTER DELETE ON EMPLOYEE
            FOR EACH ROW
            WHEN (OLD.DNO IS NOT NULL)
              UPDATE DEPARTMENT
              SET TOTAL_SAL = TOTAL_SAL - OLD.SALARY
              WHERE DNO = OLD.DNO;

    (b) R5: CREATE TRIGGER INFORM_SUPERVISOR1
            BEFORE INSERT OR UPDATE OF SALARY, SUPERVISOR_SSN ON EMPLOYEE
            FOR EACH ROW
            WHEN (NEW.SALARY > (SELECT SALARY FROM EMPLOYEE
                                WHERE SSN = NEW.SUPERVISOR_SSN))
              INFORM_SUPERVISOR(NEW.SUPERVISOR_SSN, NEW.SSN);

FIGURE 24.2 Specifying active rules as triggers in Oracle notation. (a) Triggers for automatically maintaining the consistency of TOTAL_SAL of DEPARTMENT. (b) Trigger for comparing an employee's salary with that of his or her supervisor.

Let us consider rule R1 to illustrate the syntax of creating triggers in Oracle. The CREATE TRIGGER statement specifies a trigger (or active rule) name, TOTALSAL1 for R1. The AFTER-clause specifies that the rule will be triggered after the events that trigger the rule occur. The triggering events, an insert of a new employee in this example, are specified following the AFTER keyword. The ON-clause specifies the relation on which the rule is specified, EMPLOYEE for R1.
The optional keywords FOR EACH ROW specify that the rule will be triggered once for each row that is affected by the triggering event.⁴ The optional WHEN-clause is used to specify any conditions that need to be checked after the rule is triggered but before the action is executed. Finally, the action(s) to be taken are specified as a PL/SQL block, which typically contains one or more SQL statements or calls to execute external procedures.

The four triggers (active rules) R1, R2, R3, and R4 illustrate a number of features of active rules. First, the basic events that can be specified for triggering the rules are the standard SQL update commands: INSERT, DELETE, and UPDATE. These are specified by the keywords INSERT, DELETE, and UPDATE in Oracle notation. In the case of UPDATE one may specify the attributes to be updated, for example, by writing UPDATE OF SALARY. Second, the rule designer needs to have a way to refer to the tuples that have been inserted, deleted, or modified by the triggering event. The keywords NEW and OLD are used in Oracle notation: NEW is used to refer to a newly inserted or newly updated tuple, whereas OLD is used to refer to a deleted tuple or to a tuple before it was updated.

Thus rule R1 is triggered after an INSERT operation is applied to the EMPLOYEE relation. In R1, the condition (NEW.DNO IS NOT NULL) is checked, and if it evaluates to true, meaning that the newly inserted employee tuple is related to a department, then the action is executed. The action updates the DEPARTMENT tuple(s) related to the newly inserted employee by adding their salary (NEW.SALARY) to the TOTAL_SAL attribute of their related department. Rule R2 is similar to R1, but it is triggered by an UPDATE operation that updates the SALARY of an employee rather than by an INSERT.
Rule R3 is triggered by an update to the DNO attribute of EMPLOYEE, which signifies changing an employee's assignment from one department to another. There is no condition to check in R3, so the action is executed whenever the triggering event occurs. The action updates both the old department and new department of the reassigned employees by adding their salary to TOTAL_SAL of their new department and subtracting their salary from TOTAL_SAL of their old department. Note that this should work even if the value of DNO was null, because in this case no department will be selected for the rule action.⁵

3. As we shall see later, it is also possible to specify BEFORE instead of AFTER, which indicates that the rule is triggered before the triggering event is executed.
4. Again, we shall see later that an alternative is to trigger the rule only once even if multiple rows (tuples) are affected by the triggering event.
5. R1, R2, and R4 can also be written without a condition. However, they may be more efficient to execute with the condition since the action is not invoked unless it is required.

It is important to note the effect of the optional FOR EACH ROW clause, which signifies that the rule is triggered separately for each tuple. This is known as a row-level trigger. If this clause was left out, the trigger would be known as a statement-level trigger and would be triggered once for each triggering statement. To see the difference, consider the following update operation, which gives a 10 percent raise to all employees assigned to department 5. This operation would be an event that triggers rule R2:

    UPDATE EMPLOYEE
    SET SALARY = 1.1 * SALARY
    WHERE DNO = 5;

Because the above statement could update multiple records, a rule using row-level semantics, such as R2 in Figure 24.2, would be triggered once for each row, whereas a rule using statement-level semantics is triggered only once. The Oracle system allows the user to choose which of the above two options is to be used for each rule.
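The row-level behavior of rules R1 through R4 can be tried out with a small sketch. The block below is an adaptation, not the Oracle notation of the figure: it uses SQLite through Python's standard `sqlite3` module, and SQLite supports only row-level triggers with a slightly different syntax (the action is always wrapped in BEGIN ... END). The lowercase table and trigger names, the integer salaries, and the sample employees are assumptions made for illustration, and the raise is written as `salary * 11 / 10` to keep the arithmetic in integers.

```python
import sqlite3

# SQLite adaptation of rules R1-R4 (hypothetical lowercase names).
SCHEMA = """
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT,
                         total_sal INTEGER DEFAULT 0);
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary INTEGER,
                       dno INTEGER REFERENCES department(dno));

-- R1: a newly inserted employee who is assigned to a department
CREATE TRIGGER total_sal1 AFTER INSERT ON employee
FOR EACH ROW WHEN NEW.dno IS NOT NULL
BEGIN
  UPDATE department SET total_sal = total_sal + NEW.salary
  WHERE dno = NEW.dno;
END;

-- R2: a salary change for an assigned employee
CREATE TRIGGER total_sal2 AFTER UPDATE OF salary ON employee
FOR EACH ROW WHEN NEW.dno IS NOT NULL
BEGIN
  UPDATE department SET total_sal = total_sal + NEW.salary - OLD.salary
  WHERE dno = NEW.dno;
END;

-- R3: reassignment; no condition, a NULL dno simply matches no department
CREATE TRIGGER total_sal3 AFTER UPDATE OF dno ON employee
FOR EACH ROW
BEGIN
  UPDATE department SET total_sal = total_sal + NEW.salary
  WHERE dno = NEW.dno;
  UPDATE department SET total_sal = total_sal - OLD.salary
  WHERE dno = OLD.dno;
END;

-- R4: deletion of an assigned employee
CREATE TRIGGER total_sal4 AFTER DELETE ON employee
FOR EACH ROW WHEN OLD.dno IS NOT NULL
BEGIN
  UPDATE department SET total_sal = total_sal - OLD.salary
  WHERE dno = OLD.dno;
END;
"""


def demo():
    """Run a few updates and return {dno: total_sal} as kept by the triggers."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO department (dno, dname) VALUES (?, ?)",
                    [(4, "Administration"), (5, "Research")])
    # Made-up sample rows; the last employee is unassigned (dno is NULL).
    con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                    [("111", "Smith", 30000, 5), ("222", "Wong", 40000, 5),
                     ("333", "Zelaya", 25000, 4), ("444", "Jabbar", 20000, None)])
    # 10 percent raise for department 5: the R2 trigger fires once per row.
    con.execute("UPDATE employee SET salary = salary * 11 / 10 WHERE dno = 5")
    # Reassignment fires the R3 trigger; deletion fires the R4 trigger.
    con.execute("UPDATE employee SET dno = 4 WHERE ssn = '111'")
    con.execute("DELETE FROM employee WHERE ssn = '333'")
    return dict(con.execute("SELECT dno, total_sal FROM department ORDER BY dno"))


if __name__ == "__main__":
    print(demo())  # {4: 33000, 5: 44000}
```

With this sample data the raise changes two rows, so the R2 trigger runs twice and adds 3000 and then 4000 to the total of department 5, which is the row-level behavior described above.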
Including the optional FOR EACH ROW clause creates a row-level trigger, and leaving it out creates a statement-level trigger. Note that the keywords NEW and OLD can only be used with row-level triggers.

As a second example, suppose we want to check whenever an employee's salary is greater than the salary of his or her direct supervisor. Several events can trigger this rule: inserting a new employee, changing an employee's salary, or changing an employee's supervisor. Suppose that the action to take would be to call an external procedure INFORM_SUPERVISOR, which will notify the supervisor. The rule could then be written as in R5 (see Figure 24.2b).

Figure 24.3 shows the syntax for specifying some of the main options available in Oracle triggers. We will describe the syntax for triggers in the SQL-99 standard in Section 24.1.5.

    <trigger>           ::= CREATE TRIGGER <trigger name>
                            ( AFTER | BEFORE ) <triggering events> ON <table name>
                            [ FOR EACH ROW ]
                            [ WHEN <condition> ]
                            <trigger actions> ;
    <triggering events> ::= <trigger event> { OR <trigger event> }
    <trigger event>     ::= INSERT | DELETE | UPDATE [ OF <column name> { , <column name> } ]
    <trigger action>    ::= ...

24.1.2 Design and Implementation Issues for Active Databases

The previous section gave an overview of some of the main concepts for specifying active rules. In this section, we discuss some additional issues concerning how rules are designed and implemented. The first issue concerns activation, deactivation, and grouping of rules.

[...]

          ... (SELECT SUM(N.SALARY) FROM NEW-UPDATED AS N
               WHERE D.DNO = N.DNO)
          WHERE D.DNO IN (SELECT DNO FROM NEW-UPDATED);
          UPDATE DEPARTMENT AS D
          SET D.TOTAL_SAL = D.TOTAL_SAL -
              (SELECT SUM(O.SALARY) FROM OLD-UPDATED AS O
               WHERE D.DNO = O.DNO)
          WHERE D.DNO IN (SELECT DNO FROM OLD-UPDATED);

FIGURE 24.5 Active rules using statement-level semantics in STARBURST notation.

8. Note that the WHEN keyword specifies events in STARBURST but is used to specify the rule condition in SQL and Oracle triggers.

Finally, the THEN-clause is used to specify the action (or actions) to be taken, which are typically one or more SQL statements.
In STARBURST, the basic events that can be specified for triggering the rules are the standard SQL update commands: INSERT, DELETE, and UPDATE. These are specified by the keywords INSERTED, DELETED, and UPDATED in STARBURST notation. Second, the rule designer needs to have a way to refer to the tuples that have been modified. The keywords INSERTED, DELETED, NEW-UPDATED, and OLD-UPDATED are used in STARBURST notation to refer to four transition tables (relations) that include the newly inserted tuples, the deleted tuples, the updated tuples after they were updated, and the updated tuples before they were updated, respectively. Obviously, depending on the triggering events, only some of these transition tables may be available. The rule writer can refer to these tables when writing the condition and action parts of the rule. Transition tables contain tuples of the same type as those in the relation specified in the ON-clause of the rule; for R1S, R2S, and R3S, this is the EMPLOYEE relation.

In statement-level semantics, the rule designer can only refer to the transition tables as a whole and the rule is triggered only once, so the rules must be written differently than for row-level semantics. Because multiple employee tuples may be inserted in a single insert statement, we have to check if at least one of the newly inserted employee tuples is related to a department. In R1S, the condition

    EXISTS (SELECT * FROM INSERTED WHERE DNO IS NOT NULL)

is checked, and if it evaluates to true, then the action is executed. The action updates in a single statement the DEPARTMENT tuple(s) related to the newly inserted employee(s) by adding their salaries to the TOTAL_SAL attribute of each related department. Because more than one newly inserted employee may belong to the same department, we use the SUM aggregate function to ensure that all their salaries are added.
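The statement-level logic of R1S can be approximated as follows. This is only a sketch under stated assumptions: SQLite has neither statement-level triggers nor transition tables, so the hypothetical helper `insert_employees` copies the newly inserted rows into a temporary table named `inserted` (standing in for the INSERTED transition table) and then evaluates the condition and the action once for the whole batch. The table layout and sample rows are made up for illustration.

```python
import sqlite3


def insert_employees(con, rows):
    """Insert a batch of (ssn, salary, dno) rows, then apply R1S-style logic once.

    Returns True if the rule condition held and the action was executed.
    """
    # Stand-in for the INSERTED transition table (an assumption of this sketch).
    con.execute("CREATE TEMP TABLE IF NOT EXISTS inserted "
                "(ssn TEXT, salary INTEGER, dno INTEGER)")
    con.execute("DELETE FROM inserted")
    con.executemany("INSERT INTO employee (ssn, salary, dno) VALUES (?, ?, ?)", rows)
    con.executemany("INSERT INTO inserted VALUES (?, ?, ?)", rows)

    # Condition: at least one new employee is assigned to a department.
    (fire,) = con.execute(
        "SELECT EXISTS (SELECT * FROM inserted WHERE dno IS NOT NULL)").fetchone()
    if fire:
        # Action: one UPDATE for all affected departments; SUM collects the
        # salaries of all new employees that share a department.
        con.execute("""
            UPDATE department
            SET total_sal = total_sal +
                (SELECT SUM(i.salary) FROM inserted AS i
                 WHERE i.dno = department.dno)
            WHERE dno IN (SELECT dno FROM inserted)
        """)
    return bool(fire)


def demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE department (dno INTEGER PRIMARY KEY,
                                 total_sal INTEGER DEFAULT 0);
        CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary INTEGER, dno INTEGER);
        INSERT INTO department (dno) VALUES (4), (5);
    """)
    fired = insert_employees(
        con, [("111", 30000, 5), ("222", 40000, 5), ("333", 25000, None)])
    totals = dict(con.execute("SELECT dno, total_sal FROM department ORDER BY dno"))
    return fired, totals


if __name__ == "__main__":
    print(demo())  # (True, {4: 0, 5: 70000})
```

Two of the three new employees belong to department 5, so a single UPDATE adds their combined salary, while department 4 is not touched because it does not appear in the batch.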
Rule R2S is similar to R1S, but is triggered by an UPDATE operation that updates the salary of one or more employees rather than by an INSERT. Rule R3S is triggered by an update to the DNO attribute of EMPLOYEE, which signifies changing one or more employees' assignment from one department to another. There is no condition in R3S, so the action is executed whenever the triggering event occurs. The action updates both the old department(s) and new department(s) of the reassigned employees by adding their salary to TOTAL_SAL of each new department and subtracting their salary from TOTAL_SAL of each old department.

In our example, it is more complex to write the statement-level rules than the row-level rules, as can be illustrated by comparing Figures 24.2 and 24.5. However, this is not a general rule, and other types of active rules may be easier to specify using statement-level notation than when using row-level notation.

9. As in the Oracle examples, rules R1S and R2S can be written without a condition. However, they may be more efficient to execute with the condition since the action is not invoked unless it is required.

The execution model for active rules in STARBURST uses deferred consideration. That is, all the rules that are triggered within a transaction are placed in a set, called the conflict set, which is not considered for evaluation of conditions and execution until the transaction ends (by issuing its COMMIT WORK command). STARBURST also allows the user to explicitly start rule consideration in the middle of a transaction via an explicit PROCESS RULES command. Because multiple rules must be evaluated, it is necessary to specify an order among the rules.
The syntax for rule declaration in STARBURST allows the specification of ordering among the rules to instruct the system about the order in which a set of rules should be considered.¹⁰ In addition, the transition tables (INSERTED, DELETED, NEW-UPDATED, and OLD-UPDATED) contain the net effect of all the operations within the transaction that affected each table, since multiple operations may have been applied to each table during the transaction.

24.1.4 Potential Applications for Active Databases

We now briefly discuss some of the potential applications of active rules. Obviously, one important application is to allow notification of certain conditions that occur. For example, an active database may be used to monitor, say, the temperature of an industrial furnace. The application can periodically insert in the database the temperature reading records directly from temperature sensors, and active rules can be written that are triggered whenever a temperature record is inserted, with a condition that checks if the temperature exceeds the danger level, and the action to raise an alarm.

Active rules can also be used to enforce integrity constraints by specifying the types of events that may cause the constraints to be violated and then evaluating appropriate conditions that check whether the constraints are actually violated by the event or not. Hence, complex application constraints, often known as business rules, may be enforced that way. For example, in the UNIVERSITY database application, one rule may monitor the grade point average of students whenever a new grade is entered, and it may alert the advisor if the GPA of a student falls below a certain threshold; another rule may check that course prerequisites are satisfied before allowing a student to enroll in a course; and so on.
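The furnace-monitoring application described above maps directly onto the event, condition, and action parts of a rule. The following sketch, again using SQLite through Python's `sqlite3` module, is one possible rendering: the table names `reading` and `alarm`, the threshold of 900, and the sample readings are all hypothetical, and writing a row to `alarm` stands in for whatever "raising an alarm" would mean in a real installation.

```python
import sqlite3

DANGER_LEVEL = 900  # hypothetical danger threshold for the furnace


def make_monitor(danger_level=DANGER_LEVEL):
    """Create an in-memory database with an alarm rule on temperature readings."""
    con = sqlite3.connect(":memory:")
    # Trigger definitions cannot take bound parameters, so the threshold is
    # formatted into the DDL after being forced to an integer.
    con.executescript(f"""
        CREATE TABLE reading (id INTEGER PRIMARY KEY, taken_at TEXT,
                              temperature REAL);
        CREATE TABLE alarm (reading_id INTEGER, temperature REAL);

        -- Event: a reading is inserted.  Condition: it exceeds the danger
        -- level.  Action: record an alarm row.
        CREATE TRIGGER furnace_alarm AFTER INSERT ON reading
        FOR EACH ROW WHEN NEW.temperature > {int(danger_level)}
        BEGIN
          INSERT INTO alarm VALUES (NEW.id, NEW.temperature);
        END;
    """)
    return con


def demo():
    con = make_monitor()
    con.executemany("INSERT INTO reading (taken_at, temperature) VALUES (?, ?)",
                    [("08:00", 850.0), ("08:05", 910.5), ("08:10", 899.9)])
    return con.execute("SELECT reading_id, temperature FROM alarm").fetchall()


if __name__ == "__main__":
    print(demo())  # [(2, 910.5)]
```

Only the second reading exceeds the threshold, so only one alarm row is written; the application that inserts readings needs no alarm logic of its own.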
Other applications include the automatic maintenance of derived data, such as the examples of rules R1 through R4 that maintain the derived attribute TOTAL_SAL whenever individual employee tuples are changed. A similar application is to use active rules to maintain the consistency of materialized views (see Chapter 9) whenever the base relations are modified. This application is also relevant to the new data warehousing technologies (see Chapter 28). A related application is to maintain replicated tables consistent by specifying rules that modify the replicas whenever the master table is modified.

10. If no order is specified between a pair of rules, the system default order is based on placing the rule declared first ahead of the other rule.

24.1.5 Triggers in SQL-99

Triggers in the SQL-99 standard are quite similar to the examples we discussed in Section 24.1.1, with some minor syntactic differences. The basic events that can be specified for triggering the rules are the standard SQL update commands: INSERT, DELETE, and UPDATE. In the case of UPDATE one may specify the attributes to be updated. Both row-level and statement-level triggers are allowed, indicated in the trigger by the clauses FOR EACH ROW and FOR EACH STATEMENT, respectively. One syntactic difference is that the trigger may specify particular tuple variable names for the old and new tuples instead of using the keywords NEW and OLD as in Figure 24.2. Trigger T1 in Figure 24.6 shows how the row-level trigger R2 from Figure 24.2(a) may be specified in SQL-99. Inside the REFERENCING clause, we named tuple variables (aliases) O and N to refer to the OLD tuple (before modification) and NEW tuple (after modification), respectively. Trigger T2 in Figure 24.6 shows how the statement-level trigger R2S from Figure 24.5 may be specified in SQL-99.
For a statement-level trigger, the REFERENCING clause is used to refer to the table of all new tuples (newly inserted or newly updated) as N, whereas the table of all old tuples (deleted tuples or tuples before they were updated) is referred to as O.

    T1: CREATE TRIGGER TOTALSAL1
        AFTER UPDATE OF SALARY ON EMPLOYEE
        REFERENCING OLD ROW AS O, NEW ROW AS N
        FOR EACH ROW
        WHEN (N.DNO IS NOT NULL)
          UPDATE DEPARTMENT
          SET TOTAL_SAL = TOTAL_SAL + N.SALARY - O.SALARY
          WHERE DNO = N.DNO;

    T2: CREATE TRIGGER TOTALSAL2
        AFTER UPDATE OF SALARY ON EMPLOYEE
        REFERENCING OLD TABLE AS O, NEW TABLE AS N
        FOR EACH STATEMENT
        WHEN EXISTS (SELECT * FROM N WHERE N.DNO IS NOT NULL) OR
             EXISTS (SELECT * FROM O WHERE O.DNO IS NOT NULL)
          UPDATE DEPARTMENT AS D
          SET D.TOTAL_SAL = D.TOTAL_SAL
              + (SELECT SUM(N.SALARY) FROM N WHERE D.DNO = N.DNO)
              - (SELECT SUM(O.SALARY) FROM O WHERE D.DNO = O.DNO)
          WHERE DNO IN ((SELECT DNO FROM N) UNION (SELECT DNO FROM O));

FIGURE 24.6 Trigger T1 illustrating the syntax for defining triggers in SQL-99.

24.2 TEMPORAL DATABASE CONCEPTS

Temporal databases, in the broadest sense, encompass all database applications that require some aspect of time when organizing their information. Hence, they provide a good example to illustrate the need for developing a set of unifying concepts for application developers to use. Temporal database applications have been developed since the early days of database usage. However, in creating these applications, it was mainly left to the application designers and developers to discover, design, program, and implement the temporal concepts they need. There are many examples of applications where some aspect of time is needed to maintain the information in a database.
These include healthcare, where patient histories need to be maintained; insurance, where claims and accident histories are required as well as information on the times when insurance policies are in effect; reservation systems in general (hotel, airline, car rental, train, etc.), where information on the dates and times when reservations are in effect are required; scientific databases, where data collected from experiments includes the time when each data is measured; and so on. Even the two examples used in this book may be easily expanded into temporal applications. In the COMPANY database, we may wish to keep SALARY, JOB, and PROJECT histories on each employee. In the UNIVERSITY database, time is already included in the SEMESTER and YEAR of each SECTION of a COURSE; the grade history of a STUDENT; and the information on research grants. In fact, it is realistic to conclude that the majority of database applications have some temporal information. Users often attempted to simplify or ignore temporal aspects because of the complexity that they add to their applications.

In this section, we will introduce some of the concepts that have been developed to deal with the complexity of temporal database applications. Section 24.2.1 gives an overview of how time is represented in databases, the different types of temporal information, and some of the different dimensions of time that may be needed. Section 24.2.2 discusses how time can be incorporated into relational databases. Section 24.2.3 gives some additional options for representing time that are possible in database models that allow complex-structured objects, such as object databases. Section 24.2.4 introduces operations for querying temporal databases, and gives a brief overview of the TSQL2 language, which extends SQL with temporal concepts. Section 24.2.5 focuses on time series data, which is a type of temporal data that is very important in practice.
24.2.1 Time Representation, Calendars, and Time Dimensions

For temporal databases, time is considered to be an ordered sequence of points in some granularity that is determined by the application. For example, suppose that some temporal application never requires time units that are less than one second. Then, each time point represents one second in time using this granularity. In reality, each second is a (short) time duration, not a point, since it may be further divided into milliseconds, microseconds, and so on. Temporal database researchers have used the term chronon instead of point to describe this minimal granularity for a particular application. The main consequence of choosing a minimum granularity, say, one second, is that events occurring within the same second will be considered to be simultaneous events, even though in reality they may not be.

Because there is no known beginning or ending of time, one needs a reference point from which to measure specific time points. Various calendars are used by various cultures (such as Gregorian (Western), Chinese, Islamic, Hindu, Jewish, Coptic, etc.) with different reference points. A calendar organizes time into different time units for convenience. Most calendars group 60 seconds into a minute, 60 minutes into an hour, 24 hours into a day (based on the physical time of earth's rotation around its axis), and 7 days into a week. Further grouping of days into months and months into years either follow solar or lunar natural phenomena, and are generally irregular. In the Gregorian calendar, which is used in most Western countries, days are grouped into months that are either 28, 29, 30, or 31 days, and 12 months are grouped into a year. Complex formulas are used to map the different time units to one another.
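Two of the points above can be made concrete with Python's standard library. The sketch below assumes a one-second chronon and uses the hypothetical helper names `to_chronon`, `simultaneous`, and `days_in_month`; it shows that two events inside the same second collapse to the same time point, and that the irregular length of Gregorian months has to be looked up rather than computed with a fixed factor.

```python
import calendar
from datetime import datetime


def to_chronon(ts: datetime) -> datetime:
    """Map a timestamp to its chronon, assuming one-second granularity."""
    return ts.replace(microsecond=0)


def simultaneous(a: datetime, b: datetime) -> bool:
    """Two events are simultaneous if they fall within the same chronon."""
    return to_chronon(a) == to_chronon(b)


def days_in_month(year: int, month: int) -> int:
    """Number of days in a Gregorian month (28, 29, 30, or 31)."""
    # monthrange returns (weekday of the first day, number of days).
    return calendar.monthrange(year, month)[1]


if __name__ == "__main__":
    first = datetime(1999, 2, 1, 12, 0, 0, 100000)
    second = datetime(1999, 2, 1, 12, 0, 0, 900000)
    print(simultaneous(first, second))                        # True
    print(simultaneous(first, datetime(1999, 2, 1, 12, 0, 1)))  # False
    print(days_in_month(1999, 2), days_in_month(1996, 2))      # 28 29
```

A coarser granularity, such as one day, would follow the same pattern with a different truncation step.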
In SQL, the temporal data types (see Chapter 8) include DATE (specifying Year, Month, and Day as YYYY-MM-DD), TIME (specifying Hour, Minute, and Second as HH:MM:SS), TIMESTAMP (specifying a Date/Time combination, with options for including sub-second divisions if they are needed), INTERVAL (a relative time duration, such as 10 days or 250 minutes), and PERIOD (an anchored time duration with a fixed starting point, such as the 10-day period from January 1, 1999, to January 10, 1999, inclusive).

Event Information Versus Duration (or State) Information. A temporal database will store information concerning when certain events occur, or when certain facts are considered to be true. There are several different types of temporal information. Point events or facts are typically associated in the database with a single time point in some granularity. For example, a bank deposit event may be associated with the timestamp when the deposit was made, or the total monthly sales of a product (fact) may be associated with a particular month (say, February 1999). Note that even though such events or facts may have different granularities, each is still associated with a single time value in the database. This type of information is often represented as time series data, as we shall discuss in Section 24.2.5. Duration events or facts, on the other hand, are associated with a specific time period in the database.¹² For example, an employee may have worked in a company from August 15, 1993, till November 20, 1998.

A time period is represented by its start and end time points [START-TIME, END-TIME]. For example, the above period is represented as [1993-08-15, 1998-11-20]. Such a time period is often interpreted to mean the set of all time points from start-time to end-time, inclusive, in the specified granularity. Hence, assuming day granularity, the period [1993-08-15, 1998-11-20] represents the set of all days from August 15, 1993, until November 20, 1998, inclusive.¹³
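The closed-period interpretation with day granularity can be sketched in a few lines of Python. The helper names below are hypothetical; the block counts the days in a closed period, tests membership, and converts a closed period into the equivalent representation whose end point is excluded.

```python
from datetime import date, timedelta


def days_in_closed_period(start: date, end: date) -> int:
    """Number of day-granularity time points in the closed period [start, end]."""
    return (end - start).days + 1  # +1 because both end points are included


def contains(start: date, end: date, day: date) -> bool:
    """True if the given day is one of the time points of [start, end]."""
    return start <= day <= end


def closed_to_half_open(start: date, end: date):
    """Rewrite [start, end] as [start, end + 1 day), with the end excluded."""
    return start, end + timedelta(days=1)


if __name__ == "__main__":
    # The 10-day period from January 1, 1999, to January 10, 1999, inclusive.
    print(days_in_closed_period(date(1999, 1, 1), date(1999, 1, 10)))  # 10
    # The employment period used in the text.
    start, end = date(1993, 8, 15), date(1998, 11, 20)
    print(contains(start, end, end))        # True: the end day is included
    print(closed_to_half_open(start, end))  # end becomes 1998-11-21
```

Both representations denote the same set of days; only the convention for writing the end point differs.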
11. Unfortunately, the terminology has not been used consistently. For example, the term interval is often used to denote an anchored duration. For consistency, we shall use the SQL terminology.
12. This is the same as an anchored duration. It has also been frequently called a time interval, but to avoid confusion we will use period to be consistent with SQL terminology.
13. The representation [1993-08-15, 1998-11-20] is called a closed interval representation. One can also use an open interval, denoted [1993-08-15, 1998-11-21), where the set of points does not include the end point. Although the latter representation is sometimes more convenient, we shall use closed intervals throughout to avoid confusion.

Valid Time and Transaction Time Dimensions. Given a particular event or fact that is associated with a particular time point or time period in the database, the association may be interpreted to mean different things. The most natural interpretation is that the associated time is the time that the event occurred, or the period during which the fact was considered to be true in the real world. If this interpretation is used, the associated time is often referred to as the valid time. A temporal database using this interpretation is called a valid time database.

However, a different interpretation can be used, where the associated time refers to the time when the information was actually stored in the database; that is, it is the value of the system time clock when the information is valid in the system.¹⁴ In this case, the associated time is called the transaction time. A temporal database using this interpretation is called a transaction time database.

Other interpretations can also be intended, but these two are considered to be the most common ones, and they are referred to as time dimensions.
In some applications, only one of the dimensions is needed, and in other cases both time dimensions are required, in which case the temporal database is called a bitemporal database. If other interpretations are intended for time, the user can define the semantics and program the applications appropriately, and it is called a user-defined time.

The next section shows with examples how these concepts can be incorporated into relational databases, and Section 24.2.3 shows an approach to incorporate temporal concepts into object databases.

24.2.2 Incorporating Time in Relational Databases Using Tuple Versioning

Valid Time Relations. Let us now see how the different types of temporal databases may be represented in the relational model. First, suppose that we would like to include the history of changes as they occur in the real world. Consider again the database in Figure 24.1, and let us assume that, for this application, the granularity is day. Then, we could convert the two relations EMPLOYEE and DEPARTMENT into valid time relations by adding the attributes VST (Valid Start Time) and VET (Valid End Time), whose data type is DATE in order to provide day granularity. This is shown in Figure 24.7a, where the relations have been renamed EMP_VT and DEPT_VT, respectively.

Consider how the EMP_VT relation differs from the nontemporal EMPLOYEE relation (Figure 24.1).¹⁵ In EMP_VT, each tuple v represents a version of an employee's information that is valid (in the real world) only during the time period [v.VST, v.VET], whereas in EMPLOYEE each tuple represents only the current state or current version of each employee. In EMP_VT, the current version of each employee typically has a special value, now, as its valid end time. This special value, now, is a temporal variable that implicitly represents the current time as time progresses. The nontemporal EMPLOYEE relation would only include those tuples from the EMP_VT relation whose VET is now.

14. The explanation is more involved, as we shall see in Section 24.2.3.
15. A nontemporal relation is also called a snapshot relation as it shows only the current snapshot or current state of the database.

(a) EMP_VT  (NAME, SSN, SALARY, DNO, SUPERVISOR_SSN, VST, VET)
    DEPT_VT (NAME, DNO, TOTAL_SAL, MANAGER_SSN, VST, VET)
(b) EMP_TT  (NAME, SSN, SALARY, DNO, SUPERVISOR_SSN, TST, TET)
    DEPT_TT (NAME, DNO, TOTAL_SAL, MANAGER_SSN, TST, TET)
(c) EMP_BT  (NAME, SSN, SALARY, DNO, SUPERVISOR_SSN, VST, VET, TST, TET)
    DEPT_BT (NAME, DNO, TOTAL_SAL, MANAGER_SSN, VST, VET, TST, TET)

FIGURE 24.7 Different types of temporal relational databases. (a) Valid time database schema. (b) Transaction time database schema. (c) Bitemporal database schema.

Figure 24.8 shows a few tuple versions in the valid time relations EMP_VT and DEPT_VT. There are two versions of Smith, three versions of Wong, one version of Brown, and one version of Narayan. We can now see how a valid time relation should behave when information is changed. Whenever one or more attributes of an employee are updated, rather than actually overwriting the old values, as would happen in a nontemporal relation, the system should create a new version and close the current version by changing its VET to the end time. Hence, when the user issued the command to update the salary of Smith effective on June 1, 2003, to $30000, the second version of Smith was created (see Figure 24.8). At the time of this update, the first version of Smith was the current version, with now as its VET, but after the update now was changed to May 31, 2003 (one less than June 1, 2003, in day granularity), to indicate that the version has become a closed or history version and that the new (second) version of Smith is now the current one.
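The update behavior described above for a valid time relation can be sketched as follows. This is a minimal illustration, not a DBMS implementation: the EmpVersion class, the use of date.max as a stand-in for the variable now, and the sample SSN and start date are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

NOW = date.max  # stand-in for the temporal variable `now` (an assumption)

@dataclass(frozen=True)
class EmpVersion:
    name: str
    ssn: str
    salary: int
    vst: date        # valid start time
    vet: date = NOW  # valid end time

def update_salary(emp_vt, ssn, new_salary, effective):
    """Close the current version of `ssn` and append a new current version."""
    result = []
    for v in emp_vt:
        if v.ssn == ssn and v.vet == NOW:
            # Close the old version one day before the change takes effect.
            result.append(replace(v, vet=effective - timedelta(days=1)))
            # The new version keeps `now` as its valid end time.
            result.append(replace(v, salary=new_salary, vst=effective))
        else:
            result.append(v)
    return result

# Hypothetical starting state: one current version of Smith.
emp_vt = [EmpVersion("Smith", "123456789", 25000, date(2002, 6, 15))]
emp_vt = update_salary(emp_vt, "123456789", 30000, date(2003, 6, 1))
```

After the call, the list holds a closed history version ending on May 31, 2003, and a current version starting on June 1, 2003; no old value is overwritten.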
[Figure 24.8 lists tuple versions of EMP_VT (columns NAME, SSN, SALARY, DNO, SUPERVISOR_SSN, VST, VET) and of DEPT_VT (columns NAME, DNO, MANAGER_SSN, VST, VET).]

FIGURE 24.8 Some tuple versions in the valid time relations EMP_VT and DEPT_VT.

It is important to note that in a valid time relation, the user must generally provide the valid time of an update. For example, the salary update of Smith may have been entered in the database on May 15, 2003, at 8:52:12 A.M., say, even though the salary change in the real world is effective on June 1, 2003. This is called a proactive update, since it is applied to the database before it becomes effective in the real world. If the update was applied to the database after it became effective in the real world, it is called a retroactive update. An update that is applied at the same time when it becomes effective is called a simultaneous update.

The action that corresponds to deleting an employee in a nontemporal database would typically be applied to a valid time database by closing the current version of the employee being deleted. For example, if Smith leaves the company effective January 19, 2004, then this would be applied by changing VET of the current version of Smith from now to 2004-01-19. In Figure 24.8, there is no current version for Brown, because he presumably left the company on 2002-08-10 and was logically deleted. However, because the database is temporal, the old information on Brown is still there.
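The three kinds of update distinguished above depend only on how the time of entry compares with the effective (valid) time. A minimal sketch at day granularity, with a hypothetical function name:

```python
from datetime import date

def classify_update(entered_on: date, effective_on: date) -> str:
    """Classify an update by comparing its entry time with its valid time."""
    if entered_on < effective_on:
        return "proactive"      # stored before it takes effect
    if entered_on > effective_on:
        return "retroactive"    # stored after it took effect
    return "simultaneous"

# Smith's salary change: entered May 15, 2003, effective June 1, 2003.
kind = classify_update(date(2003, 5, 15), date(2003, 6, 1))
```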
The operation to insert a new employee would correspond to creating the first tuple version for that employee, and making it the current version, with the VST being the effective (real world) time when the employee starts work. In Figure 24.8, the tuple on Narayan illustrates this, since the first version has not been updated yet.

Notice that in a valid time relation, the nontemporal key, such as SSN in EMPLOYEE, is no longer unique in each tuple (version). The new relation key for EMP_VT is a combination of the nontemporal key and the valid start time attribute VST,¹⁶ so we use (SSN, VST) as primary key. This is because, at any point in time, there should be at most one valid version of each entity. Hence, the constraint that any two tuple versions representing the same entity should have nonintersecting valid time periods should hold on valid time relations. Notice that if the nontemporal primary key value may change over time, it is important to have a unique surrogate key attribute, whose value never changes for each real world entity, in order to relate together all versions of the same real world entity.

16. A combination of the nontemporal key and the valid end time attribute VET could also be used.

Valid time relations basically keep track of the history of changes as they become effective in the real world. Hence, if all real-world changes are applied, the database keeps a history of the real-world states that are represented. However, because updates, insertions, and deletions may be applied retroactively or proactively, there is no record of the actual database state at any point in time. If the actual database states are more important to an application, then one should use transaction time relations.

Transaction Time Relations. In a transaction time database, whenever a change is applied to the database, the actual timestamp of the transaction that applied the change (insert, delete, or update) is recorded.
Such a database is most useful when changes are applied simultaneously in the majority of cases, for example, real-time stock trading or banking transactions. If we convert the nontemporal database of Figure 24.1 into a transaction time database, then the two relations EMPLOYEE and DEPARTMENT are converted into transaction time relations by adding the attributes TST (Transaction Start Time) and TET (Transaction End Time), whose data type is typically TIMESTAMP. This is shown in Figure 24.7b, where the relations have been renamed EMP_TT and DEPT_TT, respectively.

In EMP_TT, each tuple v represents a version of an employee's information that was created at actual time v.TST and was (logically) removed at actual time v.TET (because the information was no longer correct). In EMP_TT, the current version of each employee typically has a special value, uc (Until Changed), as its transaction end time, which indicates that the tuple represents correct information until it is changed by some other transaction.¹⁷ A transaction time database has also been called a rollback database,¹⁸ because a user can logically roll back to the actual database state at any past point in time T by retrieving all tuple versions v whose transaction time period [v.TST, v.TET] includes time point T.

Bitemporal Relations. Some applications require both valid time and transaction time, leading to bitemporal relations. In our example, Figure 24.7c shows how the EMPLOYEE and DEPARTMENT nontemporal relations in Figure 24.1 would appear as bitemporal relations EMP_BT and DEPT_BT, respectively. Figure 24.9 shows a few tuples in these relations. In these tables, tuples whose transaction end time TET is uc are the ones representing currently valid information, whereas tuples whose TET is an absolute timestamp are tuples that were valid until (just before) that timestamp. Hence, the tuples with uc in Figure 24.9 correspond to the valid time tuples in Figure 24.8.
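The logical rollback described above for transaction time relations amounts to a filter on the transaction time period. The sketch below assumes that datetime.max stands in for uc and treats TET as exclusive, following the remark that a tuple was valid until just before its TET; both choices, and the sample rows, are assumptions for illustration.

```python
from datetime import datetime

UC = datetime.max  # stand-in for `uc` (until changed); an assumption

def rollback(versions, t):
    """Return the tuple versions that were current in the database at time t."""
    return [v for v in versions if v["tst"] <= t < v["tet"]]

# Hypothetical EMP_TT versions of one employee.
emp_tt = [
    {"name": "Smith", "salary": 25000,
     "tst": datetime(2002, 6, 8, 13, 5, 58),
     "tet": datetime(2003, 6, 4, 8, 56, 12)},
    {"name": "Smith", "salary": 30000,
     "tst": datetime(2003, 6, 4, 8, 56, 12),
     "tet": UC},
]

state_early_2003 = rollback(emp_tt, datetime(2003, 1, 1))
state_2004 = rollback(emp_tt, datetime(2004, 1, 1))
```

Selecting only the tuples whose TET is uc gives the current state; any earlier state is recovered by passing an earlier time point.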
The transaction start time attribute TST in each tuple is the timestamp of the transaction that created that tuple.

17. The uc variable in transaction time relations corresponds to the now variable in valid time relations. The semantics are slightly different though.
18. The term rollback here does not have the same meaning as transaction rollback (see Chapter 19) during recovery, where the transaction updates are physically undone. Rather, here the updates can be logically undone, allowing the user to examine the database as it appeared at a previous time point.

[Figure 24.9 lists tuple versions of EMP_BT (columns NAME, SSN, SALARY, DNO, SUPERVISOR_SSN, VST, VET, TST, TET) and of DEPT_BT (columns NAME, DNO, MANAGER_SSN, VST, VET, TST, TET).]

FIGURE 24.9 Some tuple versions in the bitemporal relations EMP_BT and DEPT_BT.

Now consider how an update operation would be implemented on a bitemporal relation. In this model of bitemporal databases,¹⁹ no attributes are physically changed in any tuple except for the transaction end time attribute TET with a value of uc.²⁰ To illustrate how tuples are created, consider the EMP_BT relation.
The current version V of an employee has uc in its TET attribute and now in its VET attribute. If some attribute (say, SALARY) is updated, then the transaction T that performs the update should have two parameters: the new value of SALARY and the valid time VT when the new salary becomes effective (in the real world). Assume that VT− is the time point before VT in the given valid time granularity and that transaction T has a timestamp TS(T). Then, the following physical changes would be applied to the EMP_BT table:

1. Make a copy V2 of the current version V; set V2.VET to VT−, V2.TST to TS(T), V2.TET to uc, and insert V2 in EMP_BT; V2 is a copy of the previous current version V after it is closed at valid time VT−.
2. Make a copy V3 of the current version V; set V3.VST to VT, V3.VET to now, V3.SALARY to the new salary value, V3.TST to TS(T), V3.TET to uc, and insert V3 in EMP_BT; V3 represents the new current version.
3. Set V.TET to TS(T) since the current version is no longer representing correct information.

19. There have been many proposed temporal database models. We are describing specific models here as examples to illustrate the concepts.
20. Some bitemporal models allow the VET attribute to be changed also, but the interpretations of the tuples are different in those models.

As an illustration, consider the first three tuples v1, v2, and v3 in EMP_BT in Figure 24.9. Before the update of Smith's salary from 25000 to 30000, only v1 was in EMP_BT and it was the current version and its TET was uc. Then, a transaction T whose timestamp TS(T) is 2003-06-04,08:56:12 updates the salary to 30000 with the effective valid time of 2003-06-01. The tuple v2 is created, which is a copy of v1 except that its VET is set to 2003-05-31, one day less than the new valid time, and its TST is the timestamp of the updating transaction. The tuple v3 is also created, which has the new salary, its VST is set to 2003-06-01, and its TST is also the timestamp of the updating transaction.
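The three numbered steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the model's definitive implementation: the EmpBT class is hypothetical, date.max and datetime.max stand in for now and uc, the starting values of v1 are invented for the example, and the in-place change of TET in step 3 is modeled by building a replacement record.

```python
from dataclasses import dataclass, replace
from datetime import date, datetime, timedelta

NOW = date.max     # stand-in for `now` (assumption)
UC = datetime.max  # stand-in for `uc` (assumption)

@dataclass(frozen=True)
class EmpBT:
    name: str
    salary: int
    vst: date
    vet: date
    tst: datetime
    tet: datetime

def bitemporal_update(emp_bt, name, new_salary, vt, ts):
    """Apply steps 1 to 3 to the current version of `name`."""
    result = []
    for v in emp_bt:
        if v.name == name and v.vet == NOW and v.tet == UC:
            # Step 1: copy of V, closed at the valid time just before vt.
            v2 = replace(v, vet=vt - timedelta(days=1), tst=ts, tet=UC)
            # Step 2: new current version carrying the new salary.
            v3 = replace(v, salary=new_salary, vst=vt, vet=NOW, tst=ts, tet=UC)
            # Step 3: V itself no longer represents correct information.
            v_closed = replace(v, tet=ts)
            result.extend([v_closed, v2, v3])
        else:
            result.append(v)
    return result

# Hypothetical initial version of Smith (values invented for the example).
v1 = EmpBT("Smith", 25000, date(2002, 6, 15), NOW,
           datetime(2002, 6, 8, 13, 5, 58), UC)
emp_bt = bitemporal_update([v1], "Smith", 30000, date(2003, 6, 1),
                           datetime(2003, 6, 4, 8, 56, 12))
```

The result contains three versions: the original one with its TET set to the transaction timestamp, the closed copy, and the new current version.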
Finally, the TET of v1 is set to the timestamp of the updating transaction, 2003-06-04,08:56:12. Note that this is a retroactive update, since the updating transaction ran on June 4, 2003, but the salary change is effective on June 1, 2003.

Similarly, when Wong's salary and department are updated (at the same time) to 30000 and 5, the updating transaction's timestamp is 2001-01-07,14:33:02 and the effective valid time for the update is 2001-02-01. Hence, this is a proactive update because the transaction ran on January 7, 2001, but the effective date was February 1, 2001. In this case, tuple v4 is logically replaced by v5 and v6.

Next, let us illustrate how a delete operation would be implemented on a bitemporal relation by considering the tuples v9 and v10 in the EMP_BT relation of Figure 24.9. Here, employee Brown left the company effective August 10, 2002, and the logical delete is carried out by a transaction T with TS(T) = 2002-08-12,10:11:07. Before this, v9 was the current version of Brown, and its TET was uc. The logical delete is implemented by setting v9.TET to 2002-08-12,10:11:07 to invalidate it, and creating the final version v10 for Brown, with its VET = 2002-08-10 (see Figure 24.9). Finally, an insert operation is implemented by creating the first version as illustrated by v11 in the EMP_BT table.

Implementation Considerations. There are various options for storing the tuples in a temporal relation. One is to store all the tuples in the same table, as in Figures 24.8 and 24.9. Another option is to create two tables: one for the currently valid information and the other for the rest of the tuples. For example, in the bitemporal EMP_BT relation, tuples with uc for their TET and now for their VET would be in one relation, the current table, since they are the ones currently valid (that is, represent the current snapshot), and all other tuples would be in another relation.
This allows the database administrator to have different access paths, such as indexes for each relation, and keeps the size of the current table reasonable. Another possibility is to create a third table for corrected tuples whose TET is not uc.

Another option that is available is to vertically partition the attributes of the temporal relation into separate relations. The reason for this is that, if a relation has many attributes, a whole new tuple version is created whenever any one of the attributes is updated. If the attributes are updated asynchronously, each new version may differ in only one of the attributes, thus needlessly repeating the other attribute values. If a separate relation is created to contain only the attributes that always change synchronously, with the primary key replicated in each relation, the database is said to be in temporal normal form. However, to combine the information, a variation of join known as temporal intersection join would be needed, which is generally expensive to implement.

It is important to note that bitemporal databases allow a complete record of changes. Even a record of corrections is possible. For example, it is possible that two tuple versions of the same employee may have the same valid time but different attribute values as long as their transaction times are disjoint. In this case, the tuple with the later transaction time is a correction of the other tuple version. Even incorrectly entered valid times may be corrected this way. The incorrect state of the database will still be available as a previous database state for querying purposes. A database that keeps such a complete record of changes and corrections has been called an append-only database.

24.2.3 Incorporating Time in Object-Oriented Databases Using Attribute Versioning

The previous section discussed the tuple versioning approach to implementing temporal databases.
In this approach, whenever one attribute value is changed, a whole new tuple version is created, even though all the other attribute values will be identical to the previous tuple version. An alternative approach can be used in database systems that support complex structured objects, such as object databases (see Chapters 20 and 21) or object-relational systems (see Chapter 22). This approach is called attribute versioning.²¹

In attribute versioning, a single complex object is used to store all the temporal changes of the object. Each attribute that changes over time is called a time-varying attribute, and it has its values versioned over time by adding temporal periods to the attribute. The temporal periods may represent valid time, transaction time, or bitemporal, depending on the application requirements. Attributes that do not change are called non-time-varying and are not associated with the temporal periods. To illustrate this, consider the example in Figure 24.10, which is an attribute versioned valid time representation of EMPLOYEE using the ODL notation for object databases (see Chapter 21). Here, we assumed that name and social security number are non-time-varying attributes (they do not change over time), whereas salary, department, and supervisor are time-varying attributes (they may change over time). Each time-varying attribute is represented as a list of tuples <valid_start_time, valid_end_time, value>, ordered by valid start time.

Whenever an attribute is changed in this model, the current attribute version is closed and a new attribute version for this attribute only is appended to the list. This allows attributes to change asynchronously. The current value for each attribute has now for its valid_end_time. When using attribute versioning, it is useful to include a lifespan temporal attribute associated with the whole object whose value is one or more valid time periods that indicate the valid time of existence for the whole object.
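Attribute versioning as described above can be sketched with one history list per time-varying attribute. The class and attribute names below (AttrVersion, EmployeeVT, sal_history, dept_history) are hypothetical, date.max is an assumed stand-in for now, and the sample values are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

NOW = date.max  # stand-in for `now` (assumption)

@dataclass
class AttrVersion:
    valid_start_time: date
    valid_end_time: date
    value: object

@dataclass
class EmployeeVT:
    name: str  # non-time-varying
    ssn: str   # non-time-varying
    sal_history: list = field(default_factory=list)   # time-varying
    dept_history: list = field(default_factory=list)  # time-varying

def set_attribute(history, value, effective):
    """Close the current attribute version (if any) and append a new one."""
    if history and history[-1].valid_end_time == NOW:
        history[-1].valid_end_time = effective - timedelta(days=1)
    history.append(AttrVersion(effective, NOW, value))

def current_value(history):
    """Value of the version whose valid_end_time is `now`, or None."""
    if history and history[-1].valid_end_time == NOW:
        return history[-1].value
    return None

emp = EmployeeVT("Smith", "123456789")
set_attribute(emp.sal_history, 25000, date(2002, 6, 15))
set_attribute(emp.dept_history, 5, date(2002, 6, 15))
set_attribute(emp.sal_history, 30000, date(2003, 6, 1))  # only salary changes
```

Because each attribute has its own list, the salary change adds a version to sal_history only; dept_history is untouched, which is the asynchronous behavior the text describes.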
Logical deletion of the object is implemented by closing the lifespan. The constraint that any time period of an attribute within an object should be a subset of the object's lifespan should be enforced.

21. Attribute versioning can also be used in the nested relational model (see Chapter 23).

class Temporal_Salary
{   attribute Date           valid_start_time;
    attribute Date           valid_end_time;
    attribute float          salary;
};

class Temporal_Dept
{   attribute Date           valid_start_time;
    attribute Date           valid_end_time;
    attribute Department_VT  dept;
};

class Temporal_Supervisor
{   attribute Date           valid_start_time;
    attribute Date           valid_end_time;
    attribute Employee_VT    supervisor;
};

class Temporal_Lifespan
{   attribute Date           valid_start_time;
    attribute Date           valid_end_time;
};

class Employee_VT
(   extent employees )
{   attribute list<Temporal_Lifespan>  lifespan;
    attribute string                   name;
    attribute string                   ssn;
    attribute list<Temporal_Salary>    ...;
    ...

The object lifespan would also include both valid and transaction time dimensions. The full capabilities of bitemporal databases can hence be available with attribute versioning. Mechanisms similar to those discussed earlier for updating tuple versions can be applied to updating attribute versions.

24.2.4 Temporal Querying Constructs and the TSQL2 Language

So far, we have discussed how data models may be extended with temporal constructs. We now give a brief overview of how query operations need to be extended for temporal querying. Then we briefly discuss the TSQL2 language, which extends SQL for querying valid time, transaction time, and bitemporal relational databases.

In nontemporal relational databases, the typical selection conditions involve attribute conditions, and tuples that satisfy these conditions are selected from the set of current tuples. Following that, the attributes of interest to the query are specified by a projection operation (see Chapter 5).
For example, in the query to retrieve the names of all employees working in department 5 whose salary is greater than 30000, the selection condition would be:

    ((SALARY > 30000) AND (DNO = 5))

The projected attribute would be NAME. In a temporal database, the conditions may involve time in addition to attributes. A pure time condition involves only time, for example, to select all employee tuple versions that were valid on a certain time point T or that were valid during a certain time period [T1, T2]. In this case, the specified time period is compared with the valid time period of each tuple version [T.VST, T.VET], and only those tuples that satisfy the condition are selected. In these operations, a period is considered to be equivalent to the set of time points from T1 to T2 inclusive, so the standard set comparison operations can be used. Additional operations, such as whether one time period ends before another starts, are also needed.²² Some of the more common operations used in queries are as follows:

    [T.VST, T.VET] INCLUDES [T1, T2]        Equivalent to T1 ≥ T.VST AND T2 ≤ T.VET
    [T.VST, T.VET] INCLUDED_IN [T1, T2]     Equivalent to T1 ≤ T.VST AND T2 ≥ T.VET
    [T.VST, T.VET] OVERLAPS [T1, T2]        Equivalent to (T1 ≤ T.VET AND T2 ≥ T.VST)²³
    [T.VST, T.VET] BEFORE [T1, T2]          Equivalent to T1 > T.VET
    [T.VST, T.VET] AFTER [T1, T2]           Equivalent to T2 < T.VST
    [T.VST, T.VET] MEETS_BEFORE [T1, T2]    Equivalent to T1 = T.VET + 1²⁴
    [T.VST, T.VET] MEETS_AFTER [T1, T2]     Equivalent to T2 + 1 = T.VST

In addition, operations are needed to manipulate time periods, such as computing the union or intersection of two time periods. The results of these operations may not themselves be periods, but rather temporal elements: a collection of one or more disjoint time periods such that no two time periods in a temporal element are directly adjacent.

22. A complete set of operations, known as Allen's algebra, has been defined for comparing time periods.
23. This operation returns true if the intersection of the two periods is not empty; it has also been called INTERSECTS_WITH.
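The period comparisons listed above translate directly into predicates over closed periods. The sketch below assumes day granularity, so that "+ 1" means one day, and takes BEFORE and AFTER as strict so that they exclude OVERLAPS; the function names are hypothetical. The union helper returns a temporal element, merging the two periods when they intersect or are directly adjacent.

```python
from datetime import date, timedelta

ONE = timedelta(days=1)  # one chronon, assuming day granularity

# Each predicate compares a tuple's valid period [vst, vet] with [t1, t2].
def includes(vst, vet, t1, t2):
    return t1 >= vst and t2 <= vet

def included_in(vst, vet, t1, t2):
    return t1 <= vst and t2 >= vet

def overlaps(vst, vet, t1, t2):
    return t1 <= vet and t2 >= vst

def before(vst, vet, t1, t2):
    return vet < t1

def after(vst, vet, t1, t2):
    return vst > t2

def meets_before(vst, vet, t1, t2):
    return t1 == vet + ONE

def meets_after(vst, vet, t1, t2):
    return t2 + ONE == vst

def union(p, q):
    """Union of two closed periods as a temporal element (list of periods)."""
    (a1, a2), (b1, b2) = sorted([p, q])
    if b1 <= a2 + ONE:  # intersecting or directly adjacent: merge
        return [(a1, max(a2, b2))]
    return [(a1, a2), (b1, b2)]

year_2002 = (date(2002, 1, 1), date(2002, 12, 31))
valid_in_2002 = overlaps(date(2002, 6, 15), date(2003, 5, 31), *year_2002)
merged = union((date(2002, 1, 1), date(2002, 6, 30)),
               (date(2002, 7, 1), date(2002, 12, 31)))
```

The sample periods are illustrative only; a real system would also need to handle the special value now as an end point.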
24. Here, 1 (one) refers to one time point in the specified granularity. The MEETS operations basically specify if one period starts immediately after the other period ends.

For any two time periods [T1, T2] and [T3, T4] in a temporal element, the following three conditions must hold:

• [T1, T2] intersection [T3, T4] is empty.
• T3 is not the time point following T2 in the given granularity.
• T1 is not the time point following T4 in the given granularity.

The latter conditions are necessary to ensure unique representations of temporal elements. If two time periods [T1, T2] and [T3, T4] are adjacent, they are combined into a single time period [T1, T4]. This is called coalescing of time periods. Coalescing also combines intersecting time periods.

To illustrate how pure time conditions can be used, suppose a user wants to select all employee versions that were valid at any point during 2002. The appropriate selection condition applied to the relation in Figure 24.8 would be

    [T.VST, T.VET] OVERLAPS [2002-01-01, 2002-12-31]

Typically, most temporal selections are applied to the valid time dimension. For a bitemporal database, one usually applies the conditions to the currently correct tuples with uc as their transaction end times. However, if the query needs to be applied to a previous database state, an AS_OF T clause is appended to the query, which means that the query is applied to the valid time tuples that were correct in the database at time T.

In addition to pure time conditions, other selections involve attribute and time conditions. For example, suppose we wish to retrieve all EMP_VT tuple versions T for employees who worked in department 5 at any time during 2002. In this case, the condition is

    ([T.VST, T.VET] OVERLAPS [2002-01-01, 2002-12-31]) AND (T.DNO = 5)

Finally, we give a brief overview of the TSQL2 query language, which extends SQL with constructs for temporal databases. The main idea behind TSQL2
is to allow users to specify whether a relation is nontemporal (that is, a standard SQL relation) or temporal. The CREATE TABLE statement is extended with an optional AS clause to allow users to declare different temporal options. The following options are available:

    AS VALID STATE                  (valid time relation with valid time period)
    AS VALID EVENT                  (valid time relation with valid time point)
    AS TRANSACTION                  (transaction time relation with transaction time period)
    AS VALID STATE AND TRANSACTION  (bitemporal relation, valid time period)
    AS VALID EVENT AND TRANSACTION  (bitemporal relation, valid time point)

The keywords STATE and EVENT are used to specify whether a time period or time point is associated with the valid time dimension. In TSQL2, rather than have the user actually see how the temporal tables are implemented (as we discussed in the previous sections), the TSQL2 language adds query language constructs to specify various types of temporal selections, temporal projections, temporal aggregations, transformation among granularities, and many other concepts. The book by Snodgrass et al. (1995) describes the language.

24.2.5 Time Series Data

Time series data is used very often in financial, sales, and economics applications. It involves data values that are recorded according to a specific predefined sequence of time points. It is hence a special type of valid event data, where the event time points are predetermined according to a fixed calendar. Consider the example of closing daily stock prices of a particular company on the New York Stock Exchange. The granularity here is day, but the days that the stock market is open are known (nonholiday weekdays). Hence, it has been common to specify a computational procedure that calculates the particular calendar associated with a time series.
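A computational procedure of the kind mentioned above, one that produces the calendar of nonholiday weekdays for a time series, might look like the following sketch. The function name and the holiday list are hypothetical; a real exchange calendar would need the actual holiday schedule.

```python
from datetime import date, timedelta

def trading_calendar(start: date, end: date, holidays: set[date]) -> list[date]:
    """Nonholiday weekdays in the closed period [start, end]."""
    days = []
    d = start
    while d <= end:
        # weekday() is 0 for Monday through 4 for Friday.
        if d.weekday() < 5 and d not in holidays:
            days.append(d)
        d += timedelta(days=1)
    return days

holidays = {date(1999, 1, 1)}  # hypothetical holiday list
calendar = trading_calendar(date(1999, 1, 1), date(1999, 1, 10), holidays)
```

Each value of the time series is then associated with exactly one of the time points in the computed calendar.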
Typical queries on time series involve temporal aggregation over higher granularity intervals, for example, finding the average or maximum weekly closing stock price or the maximum and minimum monthly closing stock price from the daily information.

As another example, consider the daily sales dollar amount at each store of a chain of stores owned by a particular company. Again, typical temporal aggregates would be retrieving the weekly, monthly, or yearly sales from the daily sales information (using the sum aggregate function), or comparing same store monthly sales with previous monthly sales, and so on.

Because of the specialized nature of time series data, and the lack of support in older DBMSs, it has been common to use specialized time series management systems rather than general purpose DBMSs for managing such information. In such systems, it has been common to store time series values in sequential order in a file, and apply specialized time series procedures to analyze the information. The problem with this approach is that the full power of high-level querying in languages such as SQL will not be available in such systems.

More recently, some commercial DBMS packages are offering time series extensions, such as the time series datablade of Informix Universal Server (see Chapter 22). In addition, the TSQL2 language provides some support for time series in the form of event tables.

24.3 MULTIMEDIA DATABASES

Because the two topics discussed in this section are very broad, we can give only a very brief introduction to these fields. Section 24.3.1 introduces spatial databases, and Section 24.3.2 briefly discusses multimedia databases.

24.3.1 Introduction to Spatial Database Concepts

Spatial databases provide concepts for databases that keep track of objects in a multidimensional space. For example, cartographic databases that store maps include two-dimensional spatial descriptions of their objects, from countries and states to rivers,
cities, roads, seas, and so on. These applications are also known as Geographical Information Systems (GIS), and are used in areas such as environmental, emergency, and battle management. Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points. In general, a spatial database stores objects that have spatial characteristics that describe them. The spatial relationships among the objects are important, and they are often needed when querying the database. Although a spatial database can in general refer to an n-dimensional space for any n, we will limit our discussion to two dimensions as an illustration.

The main extensions that are needed for spatial databases are models that can interpret spatial characteristics. In addition, special indexing and storage structures are often needed to improve performance. Let us first discuss some of the model extensions for two-dimensional spatial databases. The basic extensions needed are to include two-dimensional geometric concepts, such as points, lines and line segments, circles, polygons, and arcs, in order to specify the spatial characteristics of objects. In addition, spatial operations are needed to operate on the objects' spatial characteristics (for example, to compute the distance between two objects) as well as spatial Boolean conditions (for example, to check whether two objects spatially overlap). To illustrate, consider a database that is used for emergency management applications. A description of the spatial positions of many types of objects would be needed. Some of these objects generally have static spatial characteristics, such as streets and highways, water pumps (for fire control), police stations, fire stations, and hospitals.
Other objects have dynamic spatial characteristics that change over time, such as police vehicles, ambulances, or fire trucks.

The following categories illustrate three typical types of spatial queries:

* Range query: Finds the objects of a particular type that are within a given spatial area or within a particular distance from a given location. (For example, finds all hospitals within the Dallas city area, or finds all ambulances within five miles of an accident location.)
* Nearest neighbor query: Finds an object of a particular type that is closest to a given location. (For example, finds the police car that is closest to a particular location.)
* Spatial joins or overlays: Typically joins the objects of two types based on some spatial condition, such as the objects intersecting or overlapping spatially or being within a certain distance of one another. (For example, finds all cities that fall on a major highway or finds all homes that are within two miles of a lake.)

For these and other types of spatial queries to be answered efficiently, special techniques for spatial indexing are needed. One of the best known techniques is the use of R-trees and their variations. R-trees group together objects that are in close spatial physical proximity on the same leaf nodes of a tree structured index. Since a leaf node can point to only a certain number of objects, algorithms for dividing the space into rectangular subspaces that include the objects are needed. Typical criteria for dividing the space include minimizing the rectangle areas, since this would lead to a quicker narrowing of the search space. Problems such as having objects with overlapping spatial areas are handled in different ways by the many different variations of R-trees. The internal nodes of R-trees are associated with rectangles whose area covers all the rectangles in its subtree.
Hence, R-trees can easily answer queries, such as find all objects in a given area, by limiting the tree search to those subtrees whose rectangles intersect with the area given in the query.

Other spatial storage structures include quadtrees and their variations. Quadtrees generally divide each space or subspace into equally sized areas, and proceed with the subdivisions of each subspace to identify the positions of various objects. Recently, many newer spatial access structures have been proposed, and this area is still an active research area.

24.3.2 Introduction to Multimedia Database Concepts

Multimedia databases provide features that allow users to store and query different types of multimedia information, which includes images (such as photos or drawings), video clips (such as movies, newsreels, or home videos), audio clips (such as songs, phone messages, or speeches), and documents (such as books or articles). The main types of database queries that are needed involve locating multimedia sources that contain certain objects of interest. For example, one may want to locate all video clips in a video database that include a certain person in them, say Bill Clinton. One may also want to retrieve video clips based on certain activities included in them, such as video clips where a goal is scored in a soccer game by a certain player or team.

The above types of queries are referred to as content-based retrieval, because the multimedia source is being retrieved based on its containing certain objects or activities. Hence, a multimedia database must use some model to organize and index the multimedia sources based on their contents. Identifying the contents of multimedia sources is a difficult and time-consuming task. There are two main approaches. The first is based on automatic analysis of the multimedia sources to identify certain mathematical characteristics of their contents. This approach uses
different techniques depending on the type of multimedia source (image, text, video, or audio). The second approach depends on manual identification of the objects and activities of interest in each multimedia source and on using this information to index the sources. This approach can be applied to all the different multimedia sources, but it requires a manual preprocessing phase where a person has to scan each multimedia source to identify and catalog the objects and activities it contains so that they can be used to index these sources.

In the remainder of this section, we will very briefly discuss some of the characteristics of each type of multimedia source (images, video, audio, and text sources), in that order.

An image is typically stored either in raw form as a set of pixel or cell values, or in compressed form to save space. The image shape descriptor describes the geometric shape of the raw image, which is typically a rectangle of cells of a certain width and height. Hence, each image can be represented by an m by n grid of cells. Each cell contains a pixel value that describes the cell content. In black/white images, pixels can be one bit. In gray scale or color images, a pixel is multiple bits. Because images may require large amounts of space, they are often stored in compressed form. Compression standards, such as GIF or JPEG, use various mathematical transformations to reduce the number of cells stored but still maintain the main image characteristics. The mathematical transforms that can be used include Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and wavelet transforms.

To identify objects of interest in an image, the image is typically divided into homogeneous segments using a homogeneity predicate. For example, in a color image, cells that are adjacent to one another and whose pixel values are close are grouped into a segment.
The homogeneity predicate defines the conditions for how to automatically group those cells. Segmentation and compression can hence identify the main characteristics of an image.

A typical image database query would be to find images in the database that are similar to a given image. The given image could be an isolated segment that contains, say, a pattern of interest, and the query is to locate other images that contain that same pattern. There are two main techniques for this type of search. The first approach uses a distance function to compare the given image with the stored images and their segments. If the distance value returned is small, the probability of a match is high. Indexes can be created to group together stored images that are close in the distance metric so as to limit the search space. The second approach, called the transformation approach, measures image similarity by having a small number of transformations that can transform one image's cells to match the other image. Transformations include rotations, translations, and scaling. Although the latter approach is more general, it is also more time consuming and difficult.

A video source is typically represented as a sequence of frames, where each frame is a still image. However, rather than identifying the objects and activities in every individual frame, the video is divided into video segments, where each segment is made up of a sequence of contiguous frames that includes the same objects/activities. Each segment is identified by its starting and ending frames. The objects and activities identified in each video segment can be used to index the segments. An indexing technique called frame segment trees has been proposed for video indexing. The index includes both objects, such as persons, houses, cars, and activities, such as a person delivering a speech or two people talking. Videos are also often compressed using standards such as MPEG.
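The idea of dividing a video into segments and indexing the segments by their contents can be illustrated with a minimal Python sketch. It assumes that the objects and activities in each frame have already been identified (automatically or manually) and are given as a set of labels per frame. The names VideoSegment, segment_video, and build_index are hypothetical, and the index shown is a plain lookup table from label to segments, not the frame segment tree structure mentioned above.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Sequence, Tuple


class VideoSegment(NamedTuple):
    """A run of contiguous frames that contain the same objects/activities."""
    start_frame: int
    end_frame: int
    contents: FrozenSet[str]


def segment_video(frame_contents: Sequence[FrozenSet[str]]) -> List[VideoSegment]:
    """Group contiguous frames with identical label sets into segments."""
    segments: List[VideoSegment] = []
    for frame_no, contents in enumerate(frame_contents):
        if segments and segments[-1].contents == contents:
            # Same contents as the previous frame: extend the current segment.
            segments[-1] = segments[-1]._replace(end_frame=frame_no)
        else:
            # Contents changed: start a new segment at this frame.
            segments.append(VideoSegment(frame_no, frame_no, contents))
    return segments


def build_index(segments: Sequence[VideoSegment]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each object/activity label to the (start, end) frames of its segments."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    for seg in segments:
        for label in seg.contents:
            index.setdefault(label, []).append((seg.start_frame, seg.end_frame))
    return index


# Hypothetical labels for five frames of a clip.
frames = [
    frozenset({"person"}),
    frozenset({"person"}),
    frozenset({"person", "car"}),
    frozenset({"person", "car"}),
    frozenset({"car"}),
]
index = build_index(segment_video(frames))
print(index["car"])  # segments (start, end) in which a car appears
```

A content-based query such as "find the segments that include a car" then becomes a lookup in the index rather than a scan over every frame.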
A text/document source is basically the full text of some article, book, or magazine. These sources are typically indexed by identifying the keywords that appear in the text and their relative frequencies. However, filler words are eliminated from that process. Because there could be too many keywords when attempting to index a collection of documents, techniques have been developed to reduce the number of keywords to those that are most relevant to the collection. A technique called singular value decompositions (SVD), which is based on matrix transformations, can be used for this purpose. An indexing technique called telescoping vector trees, or TV-trees, can then be used to group similar documents together.

Audio sources include stored recorded messages, such as speeches, class presentations, or even surveillance recording of phone messages or conversations by law enforcement. Here, discrete transforms can be used to identify the main characteristics of a certain person's voice in order to have similarity based indexing and retrieval. Audio characteristic features include loudness, intensity, pitch, and clarity.

24.4 INTRODUCTION TO DEDUCTIVE DATABASES

24.4.1 Overview of Deductive Databases

In a deductive database system, we typically specify rules through a declarative language (a language in which we specify what to achieve rather than how to achieve it). An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules. The model used for deductive databases is closely related to the relational data model, and particularly to the domain relational calculus formalism (see Section 6.6). It is also related to the field of logic programming and the Prolog language. The deductive database work based on logic has used Prolog as a starting point.
A variation of Prolog called Datalog is used to define rules declaratively in conjunction with an existing set of relations, which are themselves treated as literals in the language. Although the language structure of Datalog resembles that of Prolog, its operational semantics (that is, how a Datalog program is to be executed) is still different.

A deductive database uses two main types of specifications: facts and rules. Facts are specified in a manner similar to the way relations are specified, except that it is not necessary to include the attribute names. Recall that a tuple in a relation describes some real-world fact whose meaning is partly determined by the attribute names. In a deductive database, the meaning of an attribute value in a tuple is determined solely by its position within the tuple. Rules are somewhat similar to relational views. They specify virtual relations that are not actually stored but that can be formed from the facts by applying inference mechanisms based on the rule specifications. The main difference between rules and views is that rules may involve recursion and hence may yield virtual relations that cannot be defined in terms of basic relational views.

The evaluation of Prolog programs is based on a technique called backward chaining, which involves a top-down evaluation of goals. In the deductive databases that use Datalog, attention has been devoted to handling large volumes of data stored in a relational database. Hence, evaluation techniques have been devised that resemble those for a bottom-up evaluation. Prolog suffers from the limitation that the order of specification of facts and rules is significant in evaluation; moreover, the order of literals (defined later in Section 24.4.3) within a rule is significant.
The execution techniques for Datalog programs attempt to circumvent these problems.

24.4.2 Prolog/Datalog Notation

The notation used in Prolog/Datalog is based on providing predicates with unique names. A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments. If the arguments are all constant values, the predicate simply states that a certain fact is true. If, on the other hand, the predicate has variables as arguments, it is either considered as a query or as part of a rule or constraint. Throughout this chapter, we adopt the Prolog convention that all constant values in a predicate are either numeric or character strings; they are represented as identifiers (or names) starting with lowercase letters only, whereas variable names always start with an uppercase letter.

Consider the example shown in Figure 24.11, which is based on the relational database of Figure 5.6, but in a much simplified form. There are three predicate names: supervise, superior, and subordinate. The supervise predicate is defined via a set of facts, each of which has two arguments: a supervisor name, followed by the name of a direct supervisee (subordinate) of that supervisor. These facts correspond to the actual data that is stored in the database, and they can be considered as constituting a set of tuples in a relation SUPERVISE with two attributes whose schema is

SUPERVISE(Supervisor, Supervisee)

Thus, supervise(X, Y) states the fact that "X supervises Y." Notice the omission of the attribute names in the Prolog notation.
Attribute names are only represented by virtue of the position of each argument in a predicate: the first argument represents the supervisor, and the second argument represents a direct subordinate.

The other two predicate names are defined by rules. The main contribution of deductive databases is the ability to specify recursive rules, and to provide a framework for inferring new information based on the specified rules. A rule is of the form head :- body, where :- is read as "if and only if." A rule usually has a single predicate to the left of the :- symbol (called the head or left-hand side (LHS) or conclusion of the rule) and one or more predicates to the right of the :- symbol (called the body or right-hand side (RHS) or premise(s) of the rule). A predicate with constants as arguments is said to be ground; we also refer to it as an instantiated predicate. The arguments of the predicates that appear in a rule typically include a number of variable symbols, although predicates can also contain constants as arguments. A rule specifies that, if a particular assignment or binding of constant values to the variables in the body (RHS predicates) makes the RHS predicates true, it also makes the head (LHS predicate) true by using the same assignment of constant values to variables. Hence, a rule provides us with a way of generating new facts that are instantiations of the head of the rule. These new facts are based on facts that already exist, corresponding to the instantiations (or bindings) of predicates in the body of the rule.

(a)
Facts
supervise(franklin, john).
supervise(franklin, ramesh).
supervise(franklin, joyce).
supervise(jennifer, alicia).
supervise(jennifer, ahmad).
supervise(james, franklin).
supervise(james, jennifer).

Rules
superior(X, Y) :- supervise(X, Y).
superior(X, Y) :- supervise(X, Z), superior(Z, Y).
subordinate(X, Y) :- superior(Y, X).

Queries
superior(james, Y)?
superior(james, joyce)?

(b)
james
    franklin
        john
        ramesh
        joyce
    jennifer
        alicia
        ahmad

FIGURE 24.11 (a) Prolog notation. (b) The supervisory tree.
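To see how rules generate new facts from existing ones, the following minimal Python sketch applies the three rules of Figure 24.11 to the supervise facts, repeating the recursive rule until no new facts appear. This is a naive bottom-up evaluation written only for illustration; it is not how a Prolog system executes a program (Prolog uses backward chaining), and the function names derive_superior and derive_subordinate are hypothetical. Facts are stored as plain tuples, so, as in the Prolog notation, the meaning of each argument is given only by its position.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (first argument, second argument), positional only

# The supervise facts of Figure 24.11: (supervisor, direct supervisee).
SUPERVISE: Set[Fact] = {
    ("franklin", "john"),
    ("franklin", "ramesh"),
    ("franklin", "joyce"),
    ("jennifer", "alicia"),
    ("jennifer", "ahmad"),
    ("james", "franklin"),
    ("james", "jennifer"),
}


def derive_superior(supervise: Set[Fact]) -> Set[Fact]:
    """Derive all superior(X, Y) facts implied by the two superior rules."""
    # Rule 1: superior(X, Y) :- supervise(X, Y).
    superior: Set[Fact] = set(supervise)
    while True:
        # Rule 2: superior(X, Y) :- supervise(X, Z), superior(Z, Y).
        # A binding is valid when the same constant is used for Z in both
        # body predicates.
        new_facts = {
            (x, y)
            for (x, z) in supervise
            for (z2, y) in superior
            if z == z2
        } - superior
        if not new_facts:
            return superior  # no rule application yields anything new
        superior |= new_facts


def derive_subordinate(superior: Set[Fact]) -> Set[Fact]:
    """Rule 3: subordinate(X, Y) :- superior(Y, X)."""
    return {(y, x) for (x, y) in superior}


SUPERIOR = derive_superior(SUPERVISE)
SUBORDINATE = derive_subordinate(SUPERIOR)

# Query superior(james, Y)? : every binding of Y that makes the predicate true.
james_subordinates = sorted(y for (x, y) in SUPERIOR if x == "james")

# Query superior(james, joyce)? : a ground query with a true/false answer.
james_over_joyce = ("james", "joyce") in SUPERIOR

print(james_subordinates)
print(james_over_joyce)
```

The loop stops because the set of constants is finite, so only finitely many superior facts can ever be derived.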
Notice that by listing multiple predicates in the body of a rule we implicitly apply the logical and operator to these predicates. Hence, the commas between the RHS predicates may be read as meaning "and."

Consider the definition of the predicate superior in Figure 24.11, whose first argument is an employee name and whose second argument is an employee who is either a direct or an indirect subordinate of the first employee. By indirect subordinate, we mean the subordinate of some subordinate down to any number of levels. Thus superior(X, Y) stands for the fact that "X is a superior of Y" through direct or indirect supervision. We can write two rules that together specify the meaning of the new predicate. The first rule under Rules in the figure states that, for every value of X and Y, if supervise(X, Y) (the rule body) is true, then superior(X, Y) (the rule head) is also true, since Y would be a direct subordinate of X (at one level down). This rule can be used to generate all direct superior/subordinate relationships from the facts that define the supervise predicate. The second recursive rule states that, if supervise(X, Z) and superior(Z, Y) are both true, then superior(X, Y) is also true. This is an example of a recursive rule, where one of the rule body predicates in the RHS is the same as the rule head predicate in the LHS. In general, the rule body defines a number of premises such that, if they are all true, we can deduce that the conclusion in the rule head is also true. Notice that, if we have two (or more) rules with the same head (LHS predicate), it is equivalent to saying that the predicate is true (that is, that it can be instantiated) if either one of the bodies is true; hence, it is equivalent to a logical or operation. For example, if we have two rules X :- Y and X :- Z, they are equivalent to a rule X :- Y or Z.
The latter form is not used in deductive systems, however, because it is not in the standard form of rule, called a Horn clause, as we discuss in Section 24.4.4.

A Prolog system contains a number of built-in predicates that the system can interpret directly. These typically include the equality comparison operator =(X, Y), which returns true if X and Y are identical and can also be written as X=Y by using the standard infix notation.[25] Other comparison operators for numbers, such as <, <=, >, and >=, can be treated as binary predicates. Arithmetic functions such as +, -, *, and / can be used as arguments in predicates in Prolog. In contrast, Datalog (in its basic form) does not allow functions such as arithmetic operations as arguments; indeed, this is one of the main differences between Prolog and Datalog. However, later extensions to Datalog have been proposed to include functions.

A query typically involves a predicate symbol with some variable arguments, and its meaning (or "answer") is to deduce all the different constant combinations that, when bound (assigned) to the variables, can make the predicate true. For example, the first query in Figure 24.11 requests the names of all subordinates of "james" at any level. A different type of query, which has only constant symbols as arguments, returns either a true or a false result, depending on whether the arguments provided can be deduced from the facts and rules. For example, the second query in Figure 24.11 returns true, since superior(james, joyce) can be deduced.

[25] A Prolog system typically has a number of different equality predicates that have different interpretations.

24.4.3 Datalog Notation

In Datalog, as in other logic-based languages, a program is built from basic objects called atomic formulas. It is customary to define the syntax of logic-based languages by describing the syntax of atomic formulas and identifying how they can be combined to form a program.
In Datalog, atomic formulas are literals of the form p(a1, a2, ..., an), where p is the predicate name and n is the number of arguments for predicate p. Different predicate symbols can have different numbers of arguments, and the number of arguments n of predicate p is sometimes called the arity or degree of p. The arguments can be either constant values or variable names. As mentioned earlier, we use the convention that constant values either are numeric or start with a lowercase character, whereas variable names always start with an uppercase character.

A number of built-in predicates are included in Datalog, which can also be used to construct atomic formulas. The built-in predicates are of two main types: the binary comparison predicates < (less), <= (less_or_equal), > (greater), and >= (greater_or_equal) over ordered domains; and the comparison predicates = (equal) and /= (not_equal) over ordered or unordered domains. These can be used as binary predicates with the same functional syntax as other predicates (for example, by writing less(X, 3)) or they can be specified by using the customary infix notation X<3. Notice that, because the domains of these predicates are potentially infinite, they should be used with care in rule definitions. For example, the predicate greater(X, 3), if used alone, generates an infinite set of values for X that satisfy the predicate (all integer numbers greater than 3).

A literal is either an atomic formula as defined earlier (called a positive literal) or an atomic formula preceded by not. The latter is a negated atomic formula, called a negative literal. Datalog programs can be considered to be a subset of the predicate calculus formulas, which are somewhat similar to the formulas of the domain relational calculus (see Section 6.7). In Datalog,
however, these formulas are first converted into what is known as clausal form before they are expressed in Datalog, and only formulas given in a restricted clausal form, called Horn clauses,[26] can be used in Datalog.

[26] Named after the mathematician Alfred Horn.

24.4.4 Clausal Form and Horn Clauses

Recall from Section 6.6 that a formula in the relational calculus is a condition that includes predicates called atoms (based on relation names). In addition, a formula can have quantifiers, namely the universal quantifier (for all) and the existential quantifier (there exists). In clausal form, a formula must be transformed into another formula with the following characteristics:

* All variables in the formula are universally quantified. Hence, it is not necessary to include the universal quantifiers (for all) explicitly; the quantifiers are removed, and all variables in the formula are implicitly quantified by the universal quantifier.
* In clausal form, the formula is made up of a number of clauses, where each clause is composed of a number of literals connected by OR logical connectives only. Hence, each clause is a disjunction of literals.
* The clauses themselves are connected by AND logical connectives only, to form a formula. Hence, the clausal form of a formula is a conjunction of clauses.

It can be shown that any formula can be converted into clausal form. For our purposes, we are mainly interested in the form of the individual clauses, each of which is a disjunction of literals. Recall that literals can be positive literals or negative literals. Consider a clause of the form

not(P1) OR not(P2) OR ... OR not(Pn) OR Q1 OR Q2 OR ... OR Qm    (1)

This clause has n negative literals and m positive literals. Such a clause can be transformed into the following equivalent logical formula:

P1 AND P2 AND ... AND Pn => Q1 OR Q2 OR ... OR Qm    (2)

where => is the implies symbol.
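For small n and m, the claim that formulas (1) and (2) always have the same truth value can be checked by exhaustively trying every truth assignment. The short Python sketch below does this; it is only an illustration, the function names are hypothetical, and implication is evaluated directly from its truth table (false only when the premise is true and the conclusion is false) so that the check does not simply restate formula (1).

```python
from itertools import product
from typing import Sequence


def clause_form(ps: Sequence[bool], qs: Sequence[bool]) -> bool:
    """Formula (1): not(P1) OR ... OR not(Pn) OR Q1 OR ... OR Qm."""
    return any(not p for p in ps) or any(qs)


def implication_form(ps: Sequence[bool], qs: Sequence[bool]) -> bool:
    """Formula (2): P1 AND ... AND Pn => Q1 OR ... OR Qm."""
    premise = all(ps)
    conclusion = any(qs)
    # Truth table of =>: false only for a true premise and a false conclusion.
    if premise and not conclusion:
        return False
    return True


def equivalent(n: int, m: int) -> bool:
    """Compare (1) and (2) on all 2**(n+m) truth assignments."""
    return all(
        clause_form(values[:n], values[n:]) == implication_form(values[:n], values[n:])
        for values in product([False, True], repeat=n + m)
    )


# Check every combination of 1 to 3 negative and 1 to 3 positive literals.
print(all(equivalent(n, m) for n in range(1, 4) for m in range(1, 4)))
```

An exhaustive check of this kind covers only the sizes tried; the general argument for any n and m is the one given in the text.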
The formulas (1) and (2) are equivalent, meaning that their truth values are always the same. This is the case because, if all the Pi literals (i = 1, 2, ..., n) are true, the formula (2) is true only if at least one of the Qi's is true, which is the meaning of the => (implies) symbol. For formula (1), if all the Pi literals (i = 1, 2, ..., n) are true, their negations are all false; so in this case formula (1) is true only if at least one of the Qi's is true.

In Datalog, rules are expressed as a restricted form of clauses called Horn clauses, in which a clause can contain at most one positive literal. Hence, a Horn clause is either of the form

not(P1) OR not(P2) OR ... OR not(Pn) OR Q    (3)

or of the form

not(P1) OR not(P2) OR ... OR not(Pn)    (4)

The Horn clause in (3) can be transformed into the clause

P1 AND P2 AND ... AND Pn => Q    (5)

which is written in Datalog as the following rule:

Q :- P1, P2, ..., Pn.    (6)

The Horn clause in (4) can be transformed into

P1 AND P2 AND ... AND Pn =>    (7)

which is written in Datalog as follows:

P1, P2, ..., Pn.    (8)

A Datalog rule, as in (6), is hence a Horn clause, and its meaning, based on formula (5), is that if the predicates P1 and P2 and ... and Pn are all true for a particular binding to their variable arguments, then Q is also true and can hence be inferred. The Datalog expression (8) can be considered as an integrity constraint, where all the predicates must be true to satisfy the query.

In general, a query in Datalog consists of two components:

* A Datalog program, which is a finite set of rules.
* A literal P(X1, X2, ..., Xn), where each Xi is a variable or a constant.

A Prolog or Datalog system has an internal inference engine that can be used to process and compute the results of such queries. Prolog inference engines typically return one result to the query (that is, one set of values for the variables in the query) at a time and must be prompted to return additional results. On the contrary, Datalog returns results set-at-a-time.

24.4.5 Interpretations of Rules

There are two main alternatives for interpreting the theoretical meaning of rules: proof-theoretic and model-theoretic. In practical systems, the inference mechanism within a system defines the exact interpretation, which may not coincide with either of the two theoretical interpretations. The inference mechanism is a computational procedure and hence provides a computational interpretation of the meaning of rules. In this section, we first discuss the two theoretical interpretations. Inference mechanisms are then discussed briefly as a way of defining the meaning of rules.

In the proof-theoretic interpretation of rules, we consider the facts and rules to be true statements, or axioms. Ground axioms contain no variables. The facts are ground axioms that are given to be true. Rules are called deductive axioms, since they can be used to deduce new facts. The deductive axioms can be used to construct proofs that derive new facts from existing facts. For example, Figure 24.12 shows how to prove the fact superior(james, ahmad) from the rules and facts given in Figure 24.11. The proof-theoretic interpretation gives us a procedural or computational approach for computing an answer to the Datalog query. The process of proving whether a certain fact (theorem) holds is known as theorem proving.

1. superior(X, Y) :- supervise(X, Y).    (rule 1)
2. superior(X, Y) :- supervise(X, Z), superior(Z, Y).    (rule 2)
3. supervise(jennifer, ahmad).    (ground axiom, given)
4. supervise(james, jennifer).    (ground axiom, given)
5. superior(jennifer, ahmad).    (apply rule 1 on 3)
6. superior(james, ahmad).    (apply rule 2 on 4 and 5)

FIGURE 24.12 Proving a new fact.

The second type of interpretation is called the model-theoretic interpretation. Here, given a finite or an infinite domain of constant values,[27] we assign to a predicate every possible combination of values as arguments. We must then determine whether the predicate is true or false.
In general, it is sufficient to specify the combinations of arguments that make the predicate true, and to state that all other combinations make the predicate false. If this is done for every predicate, it is called an interpretation of the set of predicates. For example, consider the interpretation shown in Figure 24.13 for the predicates supervise and superior. This interpretation assigns a truth value (true or false) to every possible combination of argument values (from a finite domain) for the two predicates.

Rules
superior(X, Y) :- supervise(X, Y).
superior(X, Y) :- supervise(X, Z), superior(Z, Y).

Interpretation

Known Facts:
supervise(franklin, john) is true.
supervise(franklin, ramesh) is true.
supervise(franklin, joyce) is true.
supervise(jennifer, alicia) is true.
supervise(jennifer, ahmad) is true.
supervise(james, franklin) is true.
supervise(james, jennifer) is true.
supervise(X, Y) is false for all other possible (X, Y) combinations.

Derived Facts:
superior(franklin, john) is true.
superior(franklin, ramesh) is true.
superior(franklin, joyce) is true.
superior(jennifer, alicia) is true.
superior(jennifer, ahmad) is true.
superior(james, franklin) is true.
superior(james, jennifer) is true.
superior(james, john) is true.
superior(james, ramesh) is true.
superior(james, joyce) is true.
superior(james, alicia) is true.
superior(james, ahmad) is true.
superior(X, Y) is false for all other possible (X, Y) combinations.

FIGURE 24.13 An interpretation that is a minimal model.

[27] The most commonly chosen domain is finite and is called the Herbrand Universe.

An interpretation is called a model for a specific set of rules if those rules are always true under that interpretation; that is, for any values assigned to the variables in the rules, the head of the rules is true when we substitute the truth values assigned to the predicates
