
JOURNAL OF COMPUTER SCIENCE AND ENGINEERING, VOLUME 18, ISSUE 2, MAY 2013

Combining Temporal Abstraction and Integrity Constraint Checking in Clinical Databases


Chung Pham Van and Tuan Anh Duong
Abstract: This paper proposes a new framework for temporal abstraction (TA) in which temporal integrity constraint checking (TICC) and temporal abstraction are tightly coupled. The TICC stage not only ensures the consistency of the raw data stored in the temporal database but also builds the basis and prepares appropriate datasets for the TA stage. The proposed approach has been applied to a clinical database system for monitoring the treatment of patients who have colorectal cancer.

Index Terms: inference graph, temporal abstraction, temporal integrity constraint checking, transition graph.

1 INTRODUCTION

In a temporal database, the stored information includes temporal attributes stating when the information is valid. Temporal abstraction (TA) is an approach that provides short, informative, context-sensitive summaries, at various levels of abstraction, of time-oriented data stored in a temporal database. Meaningful summaries include abstractions that hold over both time points and time intervals. Abstractions can be performed on the non-temporal attributes or on the temporal attributes. Temporal abstraction is especially important for decision-support applications, which require abstract data, whereas databases usually contain primitive data. The application of temporal abstraction in clinical information systems, where it can help physicians in the monitoring, diagnosis, and treatment of patients, has been studied in several research works. Typical works in this direction include the knowledge-based temporal abstraction (KBTA) framework ([10]), methods for context-sensitive and expectation-guided TA ([8]), methods that combine statistical and probabilistic techniques with TA ([1], [2]), and methods that combine TA and data mining ([6], [7]). In this paper, we propose a new approach for TA that is tightly coupled with the temporal integrity constraint checking (TICC) stage. Specifying and monitoring temporal integrity constraints is essential for any real temporal database system in order to ensure its correctness and reliability. In our proposed framework, the TICC stage not only ensures the consistency of the data stored in the temporal database but also builds the basis and extracts appropriate datasets for the TA stage. This data preparation step is very important for TA, since the input data for temporal abstraction algorithms must be valid and in the form required by the TA task.

The method has been applied to monitoring the treatment of patients who have colorectal cancer at the Ho Chi Minh City Cancer Hospital. Experimental results on a real dataset show that the performance of our TA system is quite encouraging. The paper is organized as follows. Section 2 explains transition graphs and temporal integrity constraint checking in temporal databases. Section 3 describes the inference graph, the main technique in the implementation of our TA system. Section 4 presents the data preprocessing for temporal abstraction. Experimental results are briefly reported in Section 5, and conclusions are given in Section 6.

2 CHECKING TEMPORAL INTEGRITY CONSTRAINTS IN TEMPORAL DATABASES


Practical temporal database systems need an integrity constraint checking method to ensure the consistency of the database. To monitor temporal integrity constraints (TICs) in temporal databases, Gertz and Lipeck ([5]) proposed a method based on transition graphs, which describe the admissible lifecycles of objects and can be constructed from TICs. Inspired by their approach, we extend their framework to check more complicated forms of TICs in which repeated states are allowed in the state sequences of objects.

Chung Pham Van is with the Faculty of Information Technology, HCM Industry University, Vietnam. Tuan Anh Duong is with the Faculty of Computer Science and Engineering, HCM City University of Technology, Vietnam.

2.1 Transition Graph
Consider the following temporal integrity constraint. A patient who has colorectal cancer is hospitalized. He may be treated by surgery (s1). One month after surgery, he can move to a watching state (s2) in which he must be watched for a period of time. If he is in (s2) and his current carcinoembryonic antigen (CEA) value is lower than that of the last month, he can move to a new watching state (s3) in which his CEA is measured monthly. If he is in (s3) and, after at least three months in (s3), his CEA value in the current month is lower than or equal to the previous measure, he can move to a new watching state (s4) in which his CEA is measured every three or six months. If he is in (s4), his CEA value is normal, and he has been watched for 5 years, he can be considered as recovered. If he is in one of the three states (s2), (s3) or (s4) and his CEA value becomes higher than triple the previous measure, he can move to the state called metastasis. If he is in (s4) and his CEA value increases but is lower than his previous measure, he can move back to the watching state (s3).

This TIC comes from a simplified therapy plan for treating colorectal cancer. The constraint can be represented as the transition graph given in Fig. 1, which describes the admissible lifecycles of colorectal cancer patients being treated at the hospital. The graph consists of five states and s5 is the final state. The label on an edge of the graph represents the condition that must hold for the corresponding state transition to occur.
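To make the encoding concrete, the following minimal Python sketch represents a subset of the edges of Fig. 1 as (source, target, guard) triples, where each guard is a predicate standing in for an edge label. The context keys (`cea`, `prev`, `months_in_state`, `years_watched`), the vertex name "metastasis", and the recovery threshold (CEA < 5, taken from the example rule in Section 3.2) are illustrative assumptions. The s4 to s3 edge is omitted because its label cannot be turned into a predicate without further assumptions. The function `next_state` also checks, at run time, the requirement that the labels of different outgoing edges exclude each other.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Facts that the edge labels refer to. The keys are illustrative assumptions:
# "cea" (current CEA value), "prev" (previous CEA measure),
# "months_in_state" and "years_watched".
Context = Dict[str, float]
Guard = Callable[[Context], bool]


def tripled(c: Context) -> bool:
    """CEA value higher than triple the previous measure (leads to metastasis)."""
    return c["cea"] > 3 * c["prev"]


# (source, target, guard) triples for a subset of the edges of Fig. 1.
EDGES: List[Tuple[str, str, Guard]] = [
    ("s1", "s2", lambda c: c["months_in_state"] >= 1),
    ("s2", "s3", lambda c: c["cea"] < c["prev"]),
    ("s2", "metastasis", tripled),
    ("s3", "s4", lambda c: c["months_in_state"] >= 3 and c["cea"] <= c["prev"]),
    ("s3", "metastasis", tripled),
    # Recovery: CEA below 5 and at least 5 years of watching (assumed threshold).
    ("s4", "s5", lambda c: c["cea"] < 5 and c["years_watched"] >= 5),
    ("s4", "metastasis", tripled),
]


def next_state(state: str, ctx: Context) -> Optional[str]:
    """Return the target of the unique outgoing edge whose label holds.

    Returns None when no label holds (the object stays where it is) and
    raises ValueError when several labels hold, i.e. when the graph is not
    deterministic for this context.
    """
    targets = [t for s, t, guard in EDGES if s == state and guard(ctx)]
    if len(targets) > 1:
        raise ValueError(f"non-deterministic transition from {state}: {targets}")
    return targets[0] if targets else None
```

For example, a patient in s2 whose CEA drops from 3.0 to 2.0 moves to s3, while a final vertex such as s5 has no outgoing edges, so `next_state` returns None.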
[Fig. 1 (image not reproduced): vertices (1) Surgery, (2) Measure CEA value y, (3) Measure CEA value z monthly, (4) Measure CEA value k for every 3 or 6 months, (5) Recovery, and (6) Metastasis; edges are labeled with the transition conditions described above.]
Fig. 1. An example of a transition graph.

A transition graph here is required to be deterministic, i.e., at each node the labels of different outgoing edges must exclude each other. Furthermore, in this work we support more complicated forms of TICs, which correspond to deterministic transition graphs that allow cycles, i.e., there may be repeated vertices in the state sequence of an object. A transition graph can be used to analyze a state sequence by searching for a corresponding path whose edge labels successively hold in the states of the sequence. The main advantage of using transition graphs is that the analysis of state sequences can be reduced to checks on state transitions.

Definition 1 (State sequence). A state sequence σ is an ordered list of vertices through which an object moves from the initial vertex to the final vertex of a transition graph. For example, σ = (s0, s1, s1, s2) means that the object is at the initial vertex s0 and then moves through the vertices s1, s1, and finally the final vertex s2.

To be admissible with respect to a TIC, a state sequence must satisfy certain requirements, which are described by the following three definitions.

Definition 2 (Condition on temporal ordering). Consider adjacent vertices i and j in a transition graph (i.e., there is an outgoing edge from i to j). Assume that the valid times of a given object at vertices i and j are [t1, t2] and [t3, t4]. If t3 ≥ t2, then we say that [t1, t2] and [t3, t4] satisfy the condition on temporal ordering.

Definition 3 (Condition on object lifecycle). An update to an object at vertex j is accepted if the object exists at the current vertex i and there is an outgoing edge from i to j, or if the object does not exist at any vertex and j is the initial vertex of the transition graph. Otherwise, the condition on object lifecycle is violated.

Definition 4 (Condition on continuous state sequences). Let σ = (s1, s2, ..., sn) be a state sequence. If an update causes one or more states of σ (except the final vertex sn) to be missed, then the update violates the condition on continuous state sequences.

Repeated Vertices in the State Sequence of Objects. Consider the transition graph in Fig. 1. The paths in this graph may contain many cycles, which makes TIC checking quite challenging. To solve this problem, we propose a technique that uses a positive integer t, called the frequency-value. The method is summarized as follows:
- A lifecycle of an object Ob begins when it moves from the initial vertex. At that moment, t is assigned the frequency-value 1.
- When Ob reaches a new vertex, the value of t does not change.
- When Ob moves back to an old vertex (a vertex to which it has moved before), t is incremented by 1.

Creating Tables to Represent the Transition Graph. A group of small tables represents a given transition graph internally. These tables are created automatically, and their data are derived from the transition graph when the user inputs a TIC to the system. The VERTEX table consists of the Vname column, which keeps the names of the vertices, and the Vpos column, which classifies each vertex as initial, final, or ordinary. The TRANSITION_STATE table contains the current vertex (Cur_State), the label of the outgoing edge (Label), and the target vertex (Trans_State) of the object. The structures of the two tables VERTEX and TRANSITION_STATE are as follows.
VERTEX(V_ID, Vname, Vpos)

TRANSITION_STATE(Cur_State, Label, Trans_State)

In addition, for temporal modifications we have to maintain another table, OBJECT_POS. This table records the current vertex (Vertex_To), the previously passed vertex (Vertex_From), and the frequency-value (Times) of the object under consideration (P_ID), and it must be updated whenever the object is updated.

OBJECT_POS(P_ID, Vertex_From, Vertex_To, Times)
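The following Python sketch uses the standard-library `sqlite3` module with an in-memory database to create the three tables with the column names given above, fill VERTEX and TRANSITION_STATE from a small graph, and maintain OBJECT_POS according to the three frequency-value rules for repeated vertices. The column types, the CHECK constraint, the `visited` dictionary (which remembers the vertices an object has already been at), and the treatment of a further record at the current vertex as "not a move" are assumptions of this sketch, not details taken from the paper.

```python
import sqlite3
from typing import Dict, List, Set, Tuple

SCHEMA = """
CREATE TABLE VERTEX (V_ID INTEGER PRIMARY KEY, Vname TEXT UNIQUE,
                     Vpos TEXT CHECK (Vpos IN ('initial', 'final', 'ordinary')));
CREATE TABLE TRANSITION_STATE (Cur_State TEXT, Label TEXT, Trans_State TEXT);
CREATE TABLE OBJECT_POS (P_ID INTEGER PRIMARY KEY, Vertex_From TEXT,
                         Vertex_To TEXT, Times INTEGER);
"""


def build_tables(conn: sqlite3.Connection, vertices: List[Tuple[str, str]],
                 edges: List[Tuple[str, str, str]]) -> None:
    """Create the tables and fill VERTEX and TRANSITION_STATE from a graph."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO VERTEX (Vname, Vpos) VALUES (?, ?)", vertices)
    conn.executemany("INSERT INTO TRANSITION_STATE VALUES (?, ?, ?)", edges)


def move_object(conn: sqlite3.Connection, visited: Dict[int, Set[str]],
                p_id: int, target: str) -> None:
    """Record that object p_id is now at `target`, maintaining Times."""
    seen = visited.setdefault(p_id, set())
    row = conn.execute(
        "SELECT Vertex_To, Times FROM OBJECT_POS WHERE P_ID = ?", (p_id,)
    ).fetchone()
    if row is None:
        # The lifecycle begins: the frequency-value is set to 1.
        conn.execute("INSERT INTO OBJECT_POS VALUES (?, NULL, ?, 1)", (p_id, target))
    else:
        current, times = row
        if target == current:
            return  # another record at the same vertex, not a move
        if target in seen:
            times += 1  # moving back to an old vertex
        # Moving to a new vertex leaves Times unchanged.
        conn.execute(
            "UPDATE OBJECT_POS SET Vertex_From = ?, Vertex_To = ?, Times = ? "
            "WHERE P_ID = ?",
            (current, target, times, p_id),
        )
    seen.add(target)
```

With edges s1 to s2, s2 to s3, s3 to s4 and s4 to s3, moving object 1 through s1, s2, s3, s4 and back to s3 leaves OBJECT_POS holding (1, 's4', 's3', 2): the return to s3 is the first move to an old vertex.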

2.2 Checking Integrity Constraints Using Transition Graphs
We now outline the process of checking TICs in a temporal database based on transition graphs. Any temporal modification of objects in the data model may include a valid-time interval [vs, ve], where vs is the start and ve the end of the valid time. An update at a vertex means updating a tuple of a table that keeps the data related to an object at the state represented by that vertex; for brevity, we simply say an update at a vertex. Checking TICs involves update operations such as insert, update, and delete, and these operations work on one vertex at a time. Hence, we distinguish three cases of update operations.

Insert-operation. An object Ob can be inserted at a vertex s of a transition graph. Depending on where s lies in the transition graph, the integrity constraint at s is checked as follows (see Fig. 1):
- If Ob is inserted at the initial vertex s1 and Ob does not yet exist at any vertex, no TIC checking is needed.
- If Ob exists at vertex i and there is an insert-operation at vertex i, the condition on temporal ordering (Definition 2) is checked.
- If Ob is inserted at vertex j while Ob already exists at vertex i and there is an edge from i to j, check whether the label on that edge and the condition on temporal ordering between i and j are satisfied. If not, the insert-operation is rejected; otherwise, the frequency-value is increased by 1 (because Ob has existed at i) and the insert-operation is accepted.
- If Ob is inserted at vertex j, Ob already exists at j, j is the final vertex, and there is neither a cycle at this vertex nor an outgoing edge from it, the insert-operation is rejected.

Delete-operation. An object can be deleted at a given vertex (i.e., a tuple is deleted). This operation must be checked against the condition on continuous state sequences (Definition 4). If the condition is not satisfied, the delete-operation is rejected.

Update-operation. The value of a non-temporal attribute of an object may be updated at any vertex. An update-operation can be realized as a delete-operation followed by an insert-operation; in this case, the TICs are checked against the two operations one by one.

Checking TICs in temporal databases is performed after an update to a temporal table occurs. We check the admissibility conditions given in Definitions 2, 3, and 4 to detect any violation. These conditions only require checking the previous vertex and the next vertex of the updated vertex, which considerably simplifies TIC monitoring. We have developed a procedure for checking TICs; for more details about this procedure, readers can refer to [9].
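The insert and delete checks above can be sketched in Python as follows, under stated assumptions: an object's stored tuples are a list of (vertex, valid-time interval) pairs sorted by valid time, edges are plain (source, target) pairs whose labels the caller has already evaluated, and the frequency-value is not tracked here. `TicViolation`, `check_insert`, and `check_delete` are hypothetical names, and the reading of Definition 4 used in `check_delete` (the remaining tuples must still form a walk from the initial vertex) is one possible interpretation, not the procedure of [9].

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]       # valid time [vs, ve], e.g. as day numbers
Record = Tuple[str, Interval]    # (vertex, valid time) of one stored tuple
Edges = Set[Tuple[str, str]]


class TicViolation(Exception):
    """Raised when an operation would violate a temporal integrity constraint."""


def ordered(earlier: Interval, later: Interval) -> bool:
    """Definition 2: [t1, t2] followed by [t3, t4] requires t3 >= t2."""
    return later[0] >= earlier[1]


def check_insert(history: List[Record], edges: Edges, initial: str,
                 final: str, vertex: str, valid: Interval) -> None:
    """Check inserting a tuple at `vertex` for an object whose existing
    tuples are `history` (sorted by valid time)."""
    if not history:
        # Definition 3: a lifecycle can only start at the initial vertex.
        if vertex != initial:
            raise TicViolation(f"lifecycle must start at {initial}")
        return
    current, last = history[-1]
    if vertex == current:
        # Final vertex with no cycle and no outgoing edge: reject.
        if vertex == final and not any(s == final for s, _ in edges):
            raise TicViolation(f"object already at final vertex {final}")
    elif (current, vertex) not in edges:
        raise TicViolation(f"no edge {current} -> {vertex}")  # Definition 3
    if not ordered(last, valid):
        raise TicViolation("valid times out of order")  # Definition 2


def check_delete(history: List[Record], edges: Edges, initial: str,
                 index: int) -> None:
    """Check deleting history[index] against Definition 4, read here as:
    the remaining tuples must still start at the initial vertex and each
    consecutive pair must stay at one vertex or follow an edge."""
    rest = history[:index] + history[index + 1:]
    if rest and rest[0][0] != initial:
        raise TicViolation("delete would remove the start of the lifecycle")
    for (a, _), (b, _) in zip(rest, rest[1:]):
        if a != b and (a, b) not in edges:
            raise TicViolation(f"delete would leave a gap between {a} and {b}")
```

An update-operation would then be checked by calling `check_delete` followed by `check_insert`, mirroring the delete-then-insert decomposition described above.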

3 TEMPORAL ABSTRACTION USING INFERENCE GRAPH


3.1 Background
Based on the KBTA framework developed by Shahar ([10]), this section briefly introduces the basic TA methods as well as the knowledge required in the TA process. The KBTA framework decomposes the TA task into five parallel subtasks: temporal context restriction, vertical temporal inference, horizontal temporal inference, temporal interpolation, and temporal pattern matching. For each subtask there is a corresponding TA method or mechanism. Some of these TA tasks are explained as follows.

Vertical temporal inference. This method creates abstractions by inference from contemporaneous propositions into higher-level concepts, using the contemporaneous abstraction mechanism. For example, two (primitive) parameter intervals, one with value low in [l, n] and one with value high in [m, k], are abstracted into one parameter interval with the value grade 2 in [m, n], where [m, n] is the intersection of [m, k] and [l, n].

Horizontal temporal inference. This method creates abstractions by inference from similar-type propositions attached to different time intervals. It requires that the intervals are not disjoint but differ in temporal span. In a special case, we can abstract a parameter point in the interval ti into itself in another interval tj where ti tj.

Temporal interpolation. This method creates longer parameter intervals by bridging gaps between similar-type disjoint point-based or interval-based propositions. A gap can be bridged only if its length is acceptable.

3.2 Knowledge Base for Temporal Abstraction
The knowledge required by TA consists of rules that specify how data objects change from one state to another, and inference rules that are used in the TA process. The first group are called state transition rules; the second group are called TA inference rules.
- State transition rules. Rules of this kind have the form of if..then clauses. For example, the rule IF (CEA level < 5) and (total watching time ≥ 5 years) THEN recover is the rule that allows cancer patients to move to the recovery state.
- TA inference rules. These rules are specified using the syntax of the Temporal Abstraction Rule language (TAR) proposed by Boaz et al. ([3], [4]). A TAR rule looks like a rule in a deductive database, with some differences.
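Two of the mechanisms of Section 3.1, vertical temporal inference (contemporaneous abstraction over the intersection of two intervals) and temporal interpolation (bridging acceptable gaps), can be sketched in Python as follows. Intervals are closed integer intervals; the function names, the value-combination `table`, and the `max_gap` parameter standing in for the "acceptable length" of a gap are assumptions of this sketch rather than KBTA or TAR constructs.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]   # closed interval [start, end]
Prop = Tuple[str, Interval]  # (abstract value, interval)


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two closed intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def contemporaneous(p: Prop, q: Prop,
                    table: Dict[Tuple[str, str], str]) -> Optional[Prop]:
    """Vertical inference: map two co-occurring values to a higher-level
    value that holds over the intersection of their intervals."""
    overlap = intersect(p[1], q[1])
    value = table.get((p[0], q[0]))
    if overlap is None or value is None:
        return None
    return (value, overlap)


def interpolate(props: List[Prop], max_gap: int) -> List[Prop]:
    """Temporal interpolation: join same-valued intervals whose gap is at
    most max_gap; larger gaps are not bridged."""
    merged: List[Prop] = []
    for value, (start, end) in sorted(props):  # groups by value, then time
        if merged and merged[-1][0] == value and start - merged[-1][1][1] <= max_gap:
            prev_start, prev_end = merged[-1][1]
            merged[-1] = (value, (prev_start, max(prev_end, end)))
        else:
            merged.append((value, (start, end)))
    return merged
```

For instance, low over [1, 10] and high over [5, 20] combine into grade 2 over [5, 10], and with `max_gap=3` the high intervals [0, 2] and [4, 6] are bridged into [0, 6] while [20, 22] stays separate.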

3.3 Inference Graph
From a transition graph, we can build an inference graph by attaching TA inference rules to each vertex of the transition graph. As the inference graph is more domain-specific, a TA system based on it may perform better than a single generic algorithm that interprets the inference rules. The inference graph for a TA application has the following characteristics:
- It contains sufficient (and non-redundant) knowledge for the TA inference process.
- It simplifies TA inference, since the rules and data are distributed over the vertices of the inference graph, and the knowledge and data at a particular vertex are sufficient for the temporal abstraction process at that vertex.
- Data at any vertex of the inference graph are valid and consistent over the time interval under consideration, owing to the preceding TICC stage.
- Termination of the TA inference process is ensured, since the inference graph is deterministic and the set of rules attached to it is a well-defined knowledge base.
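One way to apply the rules attached to the vertices of an inference graph along an object's own state sequence is sketched below. Records are assumed to be (vertex, valid time, CEA value) triples, rules are plain Python functions that return abstracted facts, and `decreasing` is a hypothetical rule, not one of the rules r1 to r13 of Fig. 2. Because the records are grouped into consecutive visits in valid-time order, each pass through a cycle is abstracted separately, and vertices the object never reached are never touched.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int, float]             # (vertex, valid time, CEA value)
Rule = Callable[[List[Record]], List[str]]  # returns abstracted facts


def decreasing(segment: List[Record]) -> List[str]:
    """Hypothetical TA rule: CEA strictly decreasing during one visit."""
    values = [v for _, _, v in segment]
    if len(values) >= 2 and all(b < a for a, b in zip(values, values[1:])):
        vertex, start, end = segment[0][0], segment[0][1], segment[-1][1]
        return [f"CEA decreasing in {vertex} during [{start}, {end}]"]
    return []


def run_inference(rules_at: Dict[str, List[Rule]],
                  records: List[Record]) -> List[str]:
    """Apply the rules attached to each vertex along the object's own path."""
    facts: List[str] = []
    ordered = sorted(records, key=lambda r: r[1])  # ascending valid time
    # groupby yields one group per consecutive run of records at a vertex,
    # i.e. one group per visit of that vertex.
    for vertex, visit in groupby(ordered, key=lambda r: r[0]):
        segment = list(visit)
        for rule in rules_at.get(vertex, []):
            facts.extend(rule(segment))
    return facts
```

The traversal stops naturally at the last vertex of the object's records, which matches the early termination of inference along the object's path.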

The finiteness of a well-defined knowledge base written in the TAR language is proved in ([3], [4]). Inference on the inference graph follows an explicit path, namely the state transition sequence of a given object. During the inference process, when the object reaches a vertex at which it no longer changes state, the inference terminates and does not have to scan the remaining vertices. The inference graph can be viewed as a knowledge base represented in the form of a deterministic graph. For a particular TA application, this knowledge can be acquired from experienced experts. The inference graph for TA applied to colorectal cancer patient data is shown in Fig. 2; it is built from the transition graph in Fig. 1. In Fig. 2, the rules r1, r2, ..., r13 are TA inference rules attached to the vertices of the inference graph, and the edges carry state transition rules.

[Fig. 2 (image not reproduced): the vertices and edge labels of Fig. 1, with the TA inference rule groups {r1, r2}, {r3, r4}, {r5}, {r6, r7, r8}, {r9, r10, r11}, and {r12, r13} attached to the vertices.]
Fig. 2. An example of an inference graph.

3.4 Temporal Abstraction
Let V = {s1, s2, ..., sn} be the set of vertices of the inference graph, with s1 and sn being the initial and final vertices, respectively. In our approach, during data collection or update, the TICC task also decomposes the original database table into several small tables, so that each small table stores only the time-stamped data related to one vertex of the inference graph. After this preprocessing, the TA task for a given object Ob over the time interval from ti to tj consists of two steps: a data retrieval step and a temporal abstraction step.

Data retrieval step. This step retrieves the data of Ob during the interval from ti to tj from the inference graph. It traverses the graph from the initial vertex, gathering all the data of Ob at the vertices along its state sequence, up to the vertex at which the object terminates its state transitions. All data collected in this step are stored in a temporary table T.

Temporal abstraction step. The data collected in the retrieval step are kept in ascending order of valid time. Then the TA inference rules attached to each vertex of the inference graph are applied to perform the TA mechanisms on the collected data in T. For each rule, this step reasons over all the relevant parameters. All abstracted facts are stored in T, and the answer to the TA query is composed from these generated facts.

4 TEMPORAL DATA PREPARATION FOR TEMPORAL ABSTRACTION

Within a given time interval, the data of an object change according to its state sequence, in other words, according to the temporal business rules of the organization. For data preprocessing before TA, we propose an algorithm that decomposes the underlying temporal data governed by a single temporal business rule (i.e., one transition graph). The transition graph has n + 1 vertices, with the initial state s0, the final state sn, and the intermediate states s1, s2, ..., sn-1. Algorithm 4.1 relies on the transition graph that is used for checking the TIC of every object; therefore, it can detect errors in temporal data that violate constraints. The algorithm terminates because the number of vertices in the transition graph is finite.

Algorithm 4.1 (Decomposing temporal data)
Input: a table E and a temporal integrity constraint represented by a transition graph TS.
Output: the new tables decomposed from E.
1- Create a table Temp with the same structure as E; initially Temp is empty.
2- Create a table O that contains all the objects in E, each object appearing once.
3- For each state si of the transition graph, create a new table with the same name, which will contain the data of all objects at state si.
4- For each object in O, perform the following steps:
4a- Take all the data of the object from E, sort them by valid time, and insert them into Temp.
4b- Check that the object's data in Temp follow a state sequence admissible in TS; if they do not, repair them (or remove them from E) before proceeding to step 4c.
4c- For each state si, take the object's data at state si from Temp and transfer them to the new table si.
Let k be the number of objects in the object table O and n the number of records in the underlying temporal table E. The time complexity of Algorithm 4.1 is T(n) = O(k·n). In clinical databases, k is quite small compared with n, so the complexity of the algorithm is almost linear.
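Algorithm 4.1 can be sketched in Python with in-memory lists standing in for database tables. In this sketch, a per-object sorted list plays the role of Temp, the object table O is a set of object ids, step 4b is simplified to a structural check (the sequence starts at the initial state and consecutive records stay at one state or follow an edge; edge labels and valid-time ordering are not checked), and inadmissible objects are removed rather than repaired. All names are hypothetical. The per-state tables it returns are what the data retrieval step of Section 3.4 reads back.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str, int, float]  # (object id, state, valid time, value)
Edges = Set[Tuple[str, str]]


def admissible(states: List[str], edges: Edges, initial: str) -> bool:
    """Simplified step-4b test: the sequence starts at the initial state and
    each consecutive pair stays at one state or follows an edge."""
    if not states or states[0] != initial:
        return False
    return all(a == b or (a, b) in edges for a, b in zip(states, states[1:]))


def decompose(E: List[Row], vertices: List[str], edges: Edges,
              initial: str) -> Tuple[Dict[str, List[Row]], List[int]]:
    """Split table E into one table per state; return the tables and the
    objects whose data were rejected in step 4b."""
    tables: Dict[str, List[Row]] = {v: [] for v in vertices}  # step 3
    objects = sorted({row[0] for row in E})                    # step 2
    rejected: List[int] = []
    for obj in objects:                                        # step 4
        # Step 4a: one scan of E per object gives the O(k*n) term.
        temp = sorted((r for r in E if r[0] == obj), key=lambda r: r[2])
        if not admissible([r[1] for r in temp], edges, initial):  # step 4b
            rejected.append(obj)  # removed here; the paper also allows repair
            continue
        for r in temp:                                            # step 4c
            tables[r[1]].append(r)
    return tables, rejected
```

Scanning E once per object is what yields the O(k·n) bound stated above; an index on the object id would avoid the repeated scans in a real DBMS.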

5 EXPERIMENTS
We have implemented a KBTA system for monitoring the treatment of patients who have colorectal cancer. The patient data consist of 1597 records related to 73 patients. The data have been gathered in five contiguous years from 1994 to 2001 at the Ho Chi Minh City Cancer Hospital. The original data have been decomposed into 11 small tables with an average of 147 records each. The set of inference rules for this TA application consists of about 36 rules. The user can input a TA query related to a given patient. If the query is syntactically correct, the TA system returns the TA result, which shows concise summarized findings for that particular patient. This abstracted information is very helpful to physicians for decision making, and to the patient as well. The TA system is integrated with a temporal query system and an integrity constraint checking module, all built on top of a temporal clinical database implemented in an Oracle DBMS. The whole system, called TDM, is described in more detail in [9]. All modules of the TDM system are implemented in Visual C++. Experimental results on real data from a clinical database at the Ho Chi Minh City Cancer Hospital show that the performance of our TA system is quite encouraging. Processing one TA query on a synthetic dataset (with 100000 records) takes about 7 seconds on average on a Pentium IV 2.4 GHz PC with 512 MB RAM.

6 CONCLUSION
The paper has outlined a new approach for TA that is tightly coupled with the temporal integrity checking stage. The TICC stage not only ensures the consistency of the data stored in the temporal database but also builds the basis and creates appropriate datasets for the TA stage. This data preparation step is very important for TA, since the input data for temporal abstraction algorithms must be valid and in the form required by the TA task. The TA approach in our work differs from previous TA approaches mainly in three features: (1) an advantageous data preprocessing step performed by a preceding integrity constraint checking task, (2) the inference graph for the TA task, derived from the transition graph built in the TICC stage, and (3) the domain-specific and deterministic properties of the inference graph, which help to make the TA process efficient. None of the previous work on TA, except the work by Horn et al. ([8]), has applied a tight combination of TICC and TA. Through this work, we believe that data validation should not be separated from temporal abstraction in clinical databases, and that remarkable benefits can be gained from tightly coupling them.

REFERENCES
[1] Bellazzi, R., Larizza, C., Riva, A.: Temporal abstraction for preprocessing and interpreting diabetes monitoring time series. Proc. of the Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP 97), pp. 1-9 (1997).
[2] Bellazzi, R., Larizza, C., Magni, P., Montani, S., Stefanelli, M.: Intelligent analysis of clinical time series: an application in the diabetes mellitus domain. Artificial Intelligence in Medicine, 20(1), pp. 35-57 (2000).
[3] Boaz, D., Balaban, M., Shahar, Y.: A Temporal-Abstraction Rule Language for Medical Databases. Proc. of the Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP '03), Protaras, Cyprus, pp. 67-73 (2003).
[4] Boaz, D., Shahar, Y.: A Framework for Distributed Mediation of Temporal-Abstraction Queries to Clinical Databases. Artificial Intelligence in Medicine, 34(1), pp. 3-24 (2005).
[5] Gertz, M., Lipeck, U.W.: Temporal Integrity Constraints in Temporal Databases. In: Clifford, J., Tuzhilin, A. (eds.) Proc. of the International Workshop on Temporal Databases, Sep. 1995, Workshops in Computing, Springer-Verlag, Berlin, pp. 77-92 (1995).
[6] Ho, T.B., Nguyen, D.D., Kawasaki, S., Nguyen, T.D.: Extracting Knowledge from Hepatitis Data with Temporal Abstraction. IEEE Conference on Data Mining, Workshop on Active Mining, Maebashi, Dec. 9-12, pp. 91-96 (2002).
[7] Ho, T.B., Kawasaki, S., Le, S.Q., Tran, T.N., Takabayashi, K., Yokoi, H.: Combining Temporal Abstraction and Data Mining to Study Hepatitis. ECML/PKDD 2004 Workshop on Discovery Challenge, Pisa, Sep. 20-24, 2004, pp. 155-167 (2004).
[8] Horn, W., Miksch, S., Egghart, G., Popow, C., Paky, F.: Effective Data Validation of High-Frequency Data: Time-Point-, Time-Interval-, and Trend-Based Methods. Computers in Biology and Medicine, 27(5) (1997).
[9] Pham, V.C.: Checking Temporal Integrity Constraints and Temporal Abstraction in Temporal Clinical Databases. Ph.D. Dissertation, Faculty of Computer Science and Engineering, HCM City University of Technology, March 2008.
[10] Shahar, Y.: A Framework for Knowledge-Based Temporal Abstraction. Technical Report, Section on Medical Informatics, Stanford University School of Medicine (1995).