Knowledge Modules (KMs), components of Oracle Data Integrator's Open Connector technology, are generic, highly reusable code templates that define the overall data integration process. Each Knowledge Module contains the knowledge required by ODI to perform a specific set of actions or tasks against a specific technology or set of technologies, such as connecting to that technology, extracting data from it, transforming the data, checking it, and integrating it. Oracle Data Integrator provides a large number of Knowledge Modules out of the box. Knowledge Modules are also fully extensible: the code can be opened and edited through a graphical user interface to implement new integration methods or best practices (for example, for higher performance or to comply with regulations and corporate standards).

ODI Uses Six Different Types of Knowledge Modules

1. RKM (Reverse Knowledge Module) - used to perform a customized reverse-engineering of data models for a specific technology. An RKM extracts metadata from a metadata provider into the ODI repository. RKMs are used in data models. A data model corresponds to a group of tabular data structures stored in a data server; it is based on a Logical Schema defined in the topology and contains only metadata.

2. LKM (Loading Knowledge Module) - used to extract data from heterogeneous source systems (files, middleware, databases, etc.) into a staging area. LKMs are used in interfaces. An interface consists of a set of rules that define the loading of a datastore or a temporary target structure from one or more source datastores.

3. JKM (Journalizing Knowledge Module) - used to create a journal of data modifications (insert, update and delete) on the source databases to keep track of changes. JKMs are used in data models, for Changed Data Capture.

4. IKM (Integration Knowledge Module) - used to integrate (load) data from the staging area into the target tables. IKMs are used in interfaces.

5. CKM (Check Knowledge Module) - used to check data consistency, i.e. that constraints on the sources and targets are not violated. CKMs are used in a data model's static checks and in an interface's flow checks. A static check refers to constraints or rules defined in the data model to verify the integrity of source or application data. A flow check refers to declarative rules defined in interfaces to verify an application's incoming data before it is loaded into the target tables.

6. SKM (Service Knowledge Module) - used to generate the code required for data services. SKMs are used in data models. Data services are specialized web services that enable access to application data in datastores, and to the changes captured for these datastores using Changed Data Capture.

Loading Knowledge Module (LKM)

The LKM is used to load data from a source datastore into staging tables. This loading comes into play when some transformations take place in the staging area and the source datastore is on a different data server than the staging area. The LKM is not required when all source datastores reside on the same data server as the staging area. An interface consists of a set of declarative rules that define the loading of a datastore or a temporary target structure from one or more source datastores. The LKM executes the declarative rules on the source server and retrieves a single result set that it stores in a "C$" table in the staging area, using the defined loading method. An interface may require several LKMs when it uses datastores from heterogeneous sources.

LKM Loading Methods Are Classified as Follows

1. Loading Using the Run-time Agent - A standard Java connectivity method (JDBC, JMS, etc.). The agent reads data from the source using a JDBC connector and writes it to the staging table using JDBC. This method is not suitable for loading large data sets, as it reads from the source row by row into an array and writes to the staging area row by row in batches.

2. Loading a File Using Loaders - This method leverages the most efficient loading utility available for the staging area technology (for example, Oracle's SQL*Loader, Microsoft SQL Server bcp, Teradata FastLoad or MultiLoad) when the interface uses a flat file as a source.

3. Loading Using Unload/Load - An alternative to the run-time agent when dealing with large volumes of data across heterogeneous sources. Data is extracted from the sources into a flat file, and the file is then loaded into the staging table.

4. Loading Using RDBMS-Specific Strategies - This method leverages RDBMS mechanisms for transferring data across servers (e.g. Oracle's database links, Microsoft SQL Server's linked servers, etc.).

A Typical LKM Loading Process Works in the Following Way

1. The loading process drops the temporary loading table ("C$") if it exists and then creates it in the staging area. The loading table represents a source set, i.e. an image of the columns that take part in the transformations, not of the source datastore itself. This can be illustrated with a few examples:

- If only a few columns from a source table are used in a mapping and in joins on the staging area, then the loading table contains images of only those columns. Source columns that are not required in the rest of the integration flow do not appear in the loading table.
- If a source column is only used as a filter constraint to filter out certain rows and is not used afterward in the interface, then the loading table does not include this column.
- If two tables are joined in the source and the resulting source set is used in transformations in the staging area, then the loading table contains the combined columns of both tables.
- If all the columns of a source datastore are mapped in the interface and this datastore is not joined on the source, then the loading table is an exact image of the source datastore. This is the case, for example, when a file is used as a source.

2. Data is loaded from the source (A, B, C in this case) into the loading table using an appropriate LKM loading method (run-time agent, RDBMS-specific strategy, etc.).

3. Data from the loading table is then used in the integration phase to load the integration table.

4. After the integration phase, and before the interface completes, the temporary loading table is dropped. (A simplified SQL sketch of this process follows the LKM examples below.)

LKM Naming Convention

LKM <source technology> to <target technology> [(loading method)]

Oracle Data Integrator provides a large number of Loading Knowledge Modules out of the box. The list of supported LKMs can be found in ODI Studio and in the installation directory <ODI Home>\oracledi\xml-reference. Below are examples of a few LKMs:

Loading Knowledge Module - Description

- LKM File to SQL - Loads data from an ASCII or EBCDIC file to any ISO-92 compliant database.
- LKM File to MSSQL (BULK) - Loads data from a file to a Microsoft SQL Server staging area using the BULK INSERT SQL statement.
- LKM File to Oracle (EXTERNAL TABLE) - Loads data from a file to an Oracle staging area using the EXTERNAL TABLE SQL command.
- LKM MSSQL to MSSQL (LINKED SERVERS) - Loads data from a Microsoft SQL Server database to a Microsoft SQL Server database using the Linked Servers mechanism.
- LKM MSSQL to Oracle (BCP SQLLDR) - Loads data from a Microsoft SQL Server database to an Oracle database (staging area) using the BCP and SQL*Loader utilities.
- LKM Oracle BI to Oracle (DBLINK) - Loads data from any Oracle BI physical layer to an Oracle target database using a database link.
- LKM Oracle to Oracle (datapump) - Loads data from an Oracle source database to an Oracle staging area database using external tables in the datapump format.
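To make the loading phase concrete, here is a minimal SQL sketch of steps 1, 2 and 4 above, assuming an Oracle staging area that reaches the source over a database link. The table, column and dblink names (C$_0CUSTOMER, CUSTOMER, SRC_DBLINK) are hypothetical; the code actually generated by an LKM is technology-specific and more elaborate.

-- Step 1: drop (if it exists) and re-create the temporary loading table in the staging area.
-- (Real KMs ignore the error raised when the table does not exist yet.)
DROP TABLE STAGING.C$_0CUSTOMER;
CREATE TABLE STAGING.C$_0CUSTOMER (
  CUST_ID    NUMBER,
  CUST_NAME  VARCHAR2(100),
  COUNTRY_ID NUMBER
);

-- Step 2: load only the columns needed by the rest of the flow, applying source-side filters.
INSERT INTO STAGING.C$_0CUSTOMER (CUST_ID, CUST_NAME, COUNTRY_ID)
SELECT CUSTOMER_ID, CUSTOMER_NAME, COUNTRY_ID
FROM   CUSTOMER@SRC_DBLINK
WHERE  STATUS = 'ACTIVE';

-- Step 4: after the integration phase, the temporary loading table is dropped again.
DROP TABLE STAGING.C$_0CUSTOMER;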

Integration Knowledge Module (IKM)

The IKM takes part in the interface during the integration process. It integrates data from the source (when the source datastores exist on the same data server as the staging area) or from the loading tables (i.e. the "C$" tables loaded by LKMs when the source datastores are on a data server separate from the staging area) into the target datastore, depending on the selected integration mode.

ODI Supports the Integration Modes Below

- Append - Rows are appended to the target table. Existing records are not updated. It is possible to delete all rows before performing the insert by setting the optional truncate property.
- Control Append - Performs the same operation as Append, but in addition the data flow can be checked by setting the flow control property. Flow control checks data quality to ensure that all references are validated before loading into the target.
- Incremental Update - Used to perform inserts and updates. Existing rows are updated and non-existent rows are inserted, using the natural key defined in the interface.
- Slowly Changing Dimension - Used to maintain a Type 2 slowly changing dimension for slowly changing attributes.

The IKM Integration Process Works in Two Ways

1. When staging is on the same data server as the target. This configuration is useful for performing complex integration strategies (e.g. incremental update, slowly changing dimension), implementing technology-specific optimized integration methods, recycling rejected records from previous runs, and checking flow control before loading data into the target.

Typical flow process (a simplified sketch of the flow-table-to-target step appears after the IKM examples below):

1. The IKM executes a single set-oriented SQL statement that applies the staging area and target declarative rules to all "C$" tables and source tables (D in this case) to generate the result set.
2. The IKM writes the result set directly into the target table (in the case of the "Append" integration mode) or into an integration table "I$" (in the case of more complex integration modes, e.g. incremental update or SCD) before loading it into the target. The integration table, or flow table, is an image of the target table with a few extra fields required to carry out specific operations on the data before it is loaded into the target. Data in this table is flagged for insert or update.
3. The IKM can optionally call the CKM to check the consistency of the data: rows are checked against constraints to identify invalid rows, which are loaded into the "E$" error table and removed from the "I$" table.
4. The IKM can also be configured to recycle rejected records from previous runs from the error table "E$" into the integration table "I$", by setting the RECYCLE_ERRORS property in the interface before calling the CKM. This is useful, for example, when fact or transaction rows reference an INTEGRATION_ID of a dimension that did not exist during a previous run of the interface but is available in the current run: the error records become valid and need to be reapplied to the target.
5. The IKM loads the records from the "I$" table into the target table using the defined integration mode (control append, incremental update, etc.). After the data is loaded, the IKM drops the temporary integration tables.

2. When staging is on a different data server than the target (also referred to as multi-technology IKMs). This configuration is mainly used for data servers with no transformation capabilities (for example, Server to File), and only simple integration modes are possible. CKM operations cannot be performed in this strategy.

Typical flow process:

1. The IKM executes a single set-oriented SQL statement that applies the staging area and target declarative rules to all "C$" tables and source tables (D in this case) to generate the result set.
2. The IKM then writes the result set directly into the target table using the defined integration mode (append or incremental update).

IKM Naming Convention

IKM [<staging technology>] <target technology> [<integration mode>] [(<integration method>)]

The list of supported IKMs can be found in ODI Studio and in the installation directory <ODI Home>\oracledi\xml-reference. Below are examples of a few IKMs:

Integration Knowledge Module - Description

- IKM Oracle Incremental Update (MERGE) - Integrates data into an Oracle target table in incremental update mode using the MERGE DML statement. Erroneous data can be isolated into an error table and recycled in the next execution of the interface.
- IKM Oracle Incremental Update - Integrates data into an Oracle target table in incremental update mode. Erroneous data can be isolated into an error table and recycled in the next execution of the interface. When this module is used with a journalized source table, it is possible to synchronize deletions.
- IKM Oracle Slowly Changing Dimension - Integrates data into an Oracle target table while maintaining SCD Type 2 history. Erroneous data can be isolated into an error table and recycled in the next execution of the interface. When this module is used with a journalized source table, it is possible to synchronize deletions.
- IKM Oracle Multi Table Insert - Integrates data from one source into one or many Oracle target tables in append mode, using a multi-table insert statement.
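As an illustration of the last step of the first flow above, here is a minimal sketch of an incremental update from the flow table into the target, in the spirit of IKM Oracle Incremental Update (MERGE). The table and column names (I$_CUSTOMER, CUSTOMER, CUST_ID as the natural key) are hypothetical, and the code actually generated by the IKM also handles row flagging, flow control and error recycling.

MERGE INTO TARGET_SCHEMA.CUSTOMER T
USING STAGING.I$_CUSTOMER I                -- flow table built from the C$/source result set
ON (T.CUST_ID = I.CUST_ID)                 -- natural (update) key defined in the interface
WHEN MATCHED THEN UPDATE SET
  T.CUST_NAME  = I.CUST_NAME,
  T.COUNTRY_ID = I.COUNTRY_ID
WHEN NOT MATCHED THEN INSERT
  (CUST_ID, CUST_NAME, COUNTRY_ID)
  VALUES (I.CUST_ID, I.CUST_NAME, I.COUNTRY_ID);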

Knowledge Modules

What are Knowledge Modules?

Knowledge Modules (KMs) are components of Oracle Data Integrator's Open Connector technology. KMs contain the knowledge required by Oracle Data Integrator to perform a specific set of tasks against a specific technology or set of technologies, such as connecting to this technology, extracting data from it, transforming the data, checking it, integrating it, etc. Knowledge Modules implement "how" the integration processes occur.

Combined with a connectivity layer such as JDBC, JMS, JCA or other, KMs define an Open Connector that performs defined tasks against a technology. Open Connectors contain a combination of:

- Connection strategy (JDBC or database utilities, for instance).
- Correct syntax or protocol (SQL, JMS, etc.) for the technologies involved.
- Control over the creation and deletion of all the temporary and work tables, views, triggers, etc.
- Data processing and transformation strategies.
- Data movement options (create target table, etc.).
- Transaction management (commit/rollback), depending on the technology capabilities.

Different Types of Knowledge Modules

Oracle Data Integrator's Open Connectors use six different types of Knowledge Modules. Each Knowledge Module type refers to a specific integration task:

- Reverse-engineering metadata from heterogeneous systems for Oracle Data Integrator (RKM). These KMs are used in data models.
- Handling Changed Data Capture (CDC) on a given system (JKM). These KMs are used in data models.
- Loading data from one system to another, using system-optimized methods (LKM). These KMs are used in interfaces.
- Integrating data in a target system, using specific strategies such as insert/update, insert/delete or slowly changing dimensions (IKM). These KMs are used in interfaces.
- Controlling data integrity on the data flow (CKM). These KMs are used in data models' static checks and interfaces' flow checks.
- Exposing data in the form of web services (SKM). These KMs are used in data models.

In other words:

- RKM (Reverse Knowledge Modules) are used to perform a customized reverse-engineering of data models for a specific technology.
- LKM (Loading Knowledge Modules) are used to extract data from the source database tables and other systems (files, middleware, mainframe, etc.).
- JKM (Journalizing Knowledge Modules) are used to create a journal of data modifications (insert, update and delete) of the source databases to keep track of the changes.
- IKM (Integration Knowledge Modules) are used to integrate (load) data to the target tables.
- CKM (Check Knowledge Modules) are used to check that constraints on the sources and targets are not violated.
- SKM (Service Knowledge Modules) are used to generate the code required for creating data services.

How Does It Work?

At design time, when designing interfaces in Oracle Data Integrator, these interfaces contain several phases, including data loading, data check, data integration, etc. For each of these phases, you will define:

- The functional rules (mappings, constraints, etc.) for this phase.
- The Knowledge Module to be used for this phase. You can configure this Knowledge Module for this phase using its options.

At run time, Oracle Data Integrator uses the functional rules, the Knowledge Module options and the metadata contained in the Repository (topology, models, etc.) to automatically generate a list of tasks to process the job you have defined. Tasks include connection, transaction management and the appropriate code for the job. These tasks are orchestrated by the Agent via the Open Connectors and executed by the source, target and staging area servers involved.

Customization of Knowledge Modules

Beyond the KMs that are included with Oracle Data Integrator and cover most standard data transfer and integration needs, Knowledge Modules are fully open: their source code is visible to any user authorized by the administrator. This allows clients and partners to easily extend the Oracle Data Integrator Open Connectors to adjust them to a specific strategy, to implement a different approach, or to integrate other technologies. At the same time, Oracle Data Integrator allows partners and clients to protect their intellectual property (for example a specific approach or an advanced use of certain technologies) by giving them the option of encrypting the code of their KMs. Knowledge Modules can be easily exported and imported into the Repository for easy distribution among Oracle Data Integrator installations.
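To give a feel for what such a "code template" looks like, a KM task typically mixes SQL with ODI substitution methods that are resolved at run time from the repository metadata. The snippet below is a simplified, hypothetical task body; the exact substitution calls, their parameters and the surrounding options handling vary between KMs and ODI versions.

/* Hypothetical KM task: "Insert flow into I$ table" (simplified sketch) */
insert into <%=odiRef.getTable("L", "INT_NAME", "W")%>        /* resolves to the I$ flow table name */
(
  <%=odiRef.getColList("", "[COL_NAME]", ",\n  ", "", "")%>   /* target columns from the interface */
)
select
  <%=odiRef.getColList("", "[EXPRESSION]", ",\n  ", "", "")%> /* mapping expressions defined at design time */
from <%=odiRef.getFrom()%>                                    /* source and C$ tables */
where (1=1)
<%=odiRef.getJoin()%>                                         /* joins declared in the interface */
<%=odiRef.getFilter()%>                                       /* filters declared in the interface */

Because the template references only logical metadata, the same KM can be reused by every interface that selects it, which is why a change made in the KM propagates to all transformations that use it.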

ODI - Knowledge Modules

After spending a long time on some ODI assignments, I thought I would write something about Knowledge Modules. You can find more about the ODI architecture and how to configure it here.

- Knowledge Modules (KMs) are code templates. Each KM is dedicated to an individual task in the overall data integration process.
- KMs are based on logical tasks that will be performed. They don't contain references to physical objects (data stores, columns, physical paths, etc.).
- A KM is reused across several interfaces or models. The benefit of Knowledge Modules is that you make a change once and it is instantly propagated to hundreds of transformations.

Six Types of Knowledge Modules:

1. Reverse-engineering Knowledge Modules (RKM)
2. Check Knowledge Modules (CKM)
3. Loading Knowledge Modules (LKM)
4. Integration Knowledge Modules (IKM)
5. Journalizing Knowledge Modules (JKM)
6. Service Knowledge Modules (SKM)

Reverse-engineering Knowledge Modules (RKM): The RKM is in charge of connecting to the application or metadata provider, then transforming and writing the resulting metadata into Oracle Data Integrator's repository. The metadata is written temporarily into the SNP_REV_xx tables.

Check Knowledge Modules (CKM): The CKM is in charge of checking that the records of a data set are consistent with the defined constraints. It can check either an existing table or the temporary "I$" table created by an IKM. The CKM operates in both STATIC_CONTROL and FLOW_CONTROL modes:

- In STATIC_CONTROL mode, the CKM reads the constraints of the table and checks them against the data of the table. Records that don't match the constraints are written to the "E$" error table in the staging area.
- In FLOW_CONTROL mode, the CKM reads the constraints of the target table of the interface and checks them against the data contained in the "I$" flow table of the staging area. Records that violate these constraints are written to the "E$" table of the staging area (see the sketch after the IKM description below).

Loading Knowledge Modules (LKM): An LKM is in charge of loading source data from a remote server to the staging area. It is used by interfaces when some of the source datastores are not on the same data server as the staging area. The LKM creates the "C$" temporary table in the staging area; this table holds the records loaded from the source server.

Integration Knowledge Modules (IKM): The IKM is in charge of writing the final, transformed data to the target table. Every interface uses a single IKM. When the IKM starts, all remote source data sets have already been loaded by LKMs into "C$" temporary tables in the staging area, or the source datastores are on the same data server as the staging area. Therefore, the IKM simply needs to execute the "Staging and Target" transformations, joins and filters on the "C$" tables and on the tables located on the same data server as the staging area. The resulting set is usually written into the "I$" temporary table before being loaded into the target. The final transformed records can be written in several ways depending on the IKM selected in your interface: they may be simply appended to the target, or compared for incremental updates or for slowly changing dimensions.
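As a concrete illustration of FLOW_CONTROL, the check of a single constraint conceptually amounts to copying the offending rows from the flow table into the error table and deleting them from the flow, along the lines of the hedged sketch below. The names (I$_CUSTOMER, E$_CUSTOMER, a mandatory CUST_NAME column) are hypothetical, and the code actually generated by a CKM also records the constraint name, the error message and the check date.

-- Rows violating the example constraint (CUST_NAME is mandatory) are isolated into E$ ...
INSERT INTO STAGING.E$_CUSTOMER (CUST_ID, CUST_NAME, COUNTRY_ID)
SELECT CUST_ID, CUST_NAME, COUNTRY_ID
FROM   STAGING.I$_CUSTOMER
WHERE  CUST_NAME IS NULL;

-- ... and removed from I$ so that only valid rows reach the target.
DELETE FROM STAGING.I$_CUSTOMER
WHERE  CUST_NAME IS NULL;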

Journalizing Knowledge Modules (JKM): JKMs create the infrastructure for Change Data Capture on a model, a sub-model or a datastore. JKMs are not used in interfaces, but rather within a model, to define how the CDC infrastructure is initialized.

Service Knowledge Modules (SKM): SKMs are in charge of creating and deploying data manipulation web services to your Service Oriented Architecture (SOA) infrastructure.

Several Knowledge Modules are provided to export data to a target file or to read data from a source file.

Reading from a File:

- LKM File to SQL
- LKM File to DB2 UDB (LOAD)
- LKM File to MSSQL (BULK)
- LKM File to Netezza (EXTERNAL TABLE)
- LKM File to Oracle (EXTERNAL TABLE)
- LKM File to Oracle (SQLLDR)
- LKM File to SalesForce (Upsert)
- LKM File to SAS
- LKM File to Sybase IQ (LOAD TABLE)
- IKM File to Teradata (TTUs)
- LKM File to Teradata (TTUs)

Writing to a File:

- IKM SQL to File Append
- IKM Netezza To File (EXTERNAL TABLE)
- IKM SalesForce to File (with filter)
- IKM SalesForce to File (without filter)
- IKM Teradata to File (TTUs)

Changed Data Capture

http://odiexperts.com/changed-data-capture-cdc/
http://odiexperts.com/cdc-consistent/

Introduction

Changed Data Capture (CDC) allows Oracle Data Integrator to track changes in source data caused by other applications. When running integration interfaces, Oracle Data Integrator can avoid processing unchanged data in the flow. Reducing the source data flow to only changed data is useful in many contexts, such as data synchronization and replication. It is essential when setting up an event-oriented architecture for integration. In such an architecture, applications make changes in the data ("Customer Deletion", "New Purchase Order") during a business process. These changes are captured by Oracle Data Integrator and transformed into events that are propagated throughout the information system.

Changed Data Capture is performed by journalizing models. Journalizing a model consists of setting up the infrastructure to capture the changes (inserts, updates and deletes) made to the records of this model's datastores. Oracle Data Integrator supports two journalizing modes:

- Simple Journalizing tracks changes in individual datastores in a model.
- Consistent Set Journalizing tracks changes to a group of the model's datastores, taking into account the referential integrity between these datastores. The group of datastores journalized in this mode is called a Consistent Set.

The Journalizing Components

The journalizing components are:

- Journals: Where changes are recorded. Journals only contain references to the changed records, along with the type of change (insert/update, delete).
- Capture processes: Journalizing captures the changes in the source datastores either by creating triggers on the data tables, or by using database-specific programs to retrieve log data from data server log files. See the documentation on journalizing knowledge modules for more information on the capture processes used.
- Subscribers: CDC uses a publish/subscribe model. Subscribers are entities (applications, integration processes, etc.) that use the changes tracked on a datastore or on a consistent set. They subscribe to a model's CDC to have the changes tracked for them. Changes are captured only if there is at least one subscriber to the changes. When all subscribers have consumed the captured changes, these changes are discarded from the journals.
- Journalizing views: Provide access to the changes and the changed data captured. They are used by the user to view the changes captured, and by integration processes to retrieve the changed data.

These components are implemented in the journalizing infrastructure.

Simple vs. Consistent Set Journalizing

Simple Journalizing enables you to journalize one or more datastores. Each journalized datastore is treated separately when capturing the changes. This approach has a limitation, illustrated in the following example: say you need to process changes in the ORDER and ORDER_LINE datastores (with a referential integrity constraint based on the fact that an ORDER_LINE record should have an associated ORDER record). If you have captured insertions into ORDER_LINE, you have no guarantee that the associated new records in ORDER have also been captured. Processing ORDER_LINE records with no associated ORDER records may cause referential constraint violations in the integration process.

Consistent Set Journalizing provides the guarantee that when an ORDER_LINE change has been captured, the associated ORDER change has also been captured, and vice versa. Note that consistent set journalizing guarantees the consistency of the captured changes. The set of available changes for which consistency is guaranteed is called the Consistency Window. Changes in this window should be processed in the correct sequence (ORDER followed by ORDER_LINE) by designing and sequencing integration interfaces into packages.

Although consistent set journalizing is more powerful, it is also more difficult to set up. It should be used when referential integrity constraints need to be ensured when capturing the data changes. For performance reasons, consistent set journalizing is also recommended when a large number of subscribers are required.

Note: It is not possible to journalize a model (or datastores within a model) using both consistent set and simple journalizing.

Setting up Journalizing

This is the basic process for setting up CDC on an Oracle Data Integrator data model. Each of these steps is described in more detail below.

1. Set the CDC parameters
2. Add the datastores to the CDC
3. For consistent set journalizing, arrange the datastores in order
4. Add subscribers
5. Start the journals

To set the data model CDC parameters:

This includes selecting or changing the journalizing mode and the journalizing knowledge module used for the model. If the model is already being journalized, it is recommended that you stop journalizing with the existing configuration before modifying the data model journalizing parameters.

1. Edit the data model you want to journalize, and then select the Journalizing tab.
2. Select the journalizing mode you want to set up: Consistent Set or Simple.
3. Select the journalizing KM you want to use for this model. Only knowledge modules suitable for the data model's technology and journalizing mode, and that have been previously imported into at least one of your projects, will appear in the list.
4. Set the options for this KM. Refer to the knowledge module's description for more information on the options.
5. Click OK to save the changes.

To add or remove datastores to or from the CDC:

You should now flag the datastores that you want to journalize. A change in the datastore flag is taken into account the next time the journals are (re)started. When flagging a model or a sub-model, all of the datastores contained in the model or sub-model are flagged.

1. Select the datastore, the model or the sub-model you want to add to or remove from the CDC.
2. Right-click, then select Changed Data Capture > Add to CDC to add the datastore, model or sub-model to CDC, or select Changed Data Capture > Remove from CDC to remove it.
3. Refresh the tree view. The datastores added to CDC should now have a marker icon. The journal icon represents a small clock. It should be yellow, indicating that the journal infrastructure is not yet in place.

It is possible to add datastores to the CDC after the journal creation phase. In this case, the journals should be re-started.

Note: If a datastore with journals running is removed from the CDC in simple mode, the journals should be stopped for this individual datastore. If a datastore is removed from the CDC in consistent set mode, the journals should be restarted for the model (journalizing information is preserved for the other datastores).

To arrange the datastores in order (consistent set journalizing only):

You only need to arrange the datastores in order when using consistent set journalizing. You should arrange the datastores in the consistent set into an order which preserves referential integrity when using their changed data. For example, if an ORDER table has references imported from an ORDER_LINE datastore (i.e. ORDER_LINE has a foreign key constraint that references ORDER), and both are added to the CDC, the ORDER datastore should come before ORDER_LINE. If the PRODUCT datastore has references imported from both ORDER and ORDER_LINE (i.e. both ORDER and ORDER_LINE have foreign key constraints to the PRODUCT table), its order should be lower still.

1. Edit the data model you want to journalize, then select the Journalized Tables tab.
2. If the datastores are not currently in any particular order, click the Reorganize button. This feature suggests an order for the journalized datastores based on the data models' foreign keys. Note that this automatic reorganization is not error-free, so you should review the suggested order afterwards.
3. Select a datastore from the list, then use the Up and Down buttons to move it within the list. You can also directly edit the Order value for this datastore.
4. Repeat step 3 until the datastores are ordered correctly, then click OK to save the changes.

Changes to the order of datastores are taken into account the next time the journals are (re)started. If existing scenarios consume changes from this CDC set, you should regenerate them to take into account the new organization of the CDC set.

Note: From this tab, you can also remove datastores from the CDC using the Remove from CDC button.

To add or remove subscribers:

This adds or removes a list of entities that will use the captured changes.

1. Select the journalized data model if using consistent set journalizing, or select a data model or individual datastore if using simple journalizing.
2. Right-click, then select Changed Data Capture > Subscriber > Subscribe. A window appears which lets you select your subscribers.
3. Type a subscriber name into the field, then click the Add Subscriber button. Repeat the operation for each subscriber you want to add.
4. Click OK.

A session to add the subscribers to the CDC is launched; you can track this session from the Operator. You can also add subscribers after starting the journals. Subscribers added after journal startup will only retrieve changes captured since they were added to the subscribers list.

Removing a subscriber is very similar: select the Changed Data Capture > Subscriber > Unsubscribe option instead.

To start/stop the journals:

Starting the journals creates the CDC infrastructure if it does not exist yet. It also validates the addition, removal and order changes for journalized datastores. Right-click the journalized model or datastore, then select Changed Data Capture > Start Journal if you want to start the journals, or Changed Data Capture > Drop Journal if you want to stop them. A session begins that starts or drops the journals; you can track this session from the Operator.

Note: Stopping the journals deletes the entire journalizing infrastructure and all captured changes are lost. Restarting a journal does not remove or alter any changed data that has already been captured.

Automating Journalizing Setup

The journalizing infrastructure is implemented by the journalizing KM at the physical level. Consequently, the Add Subscribers and Start Journal operations should be performed in each context where journalizing is required for the data model. It is possible to automate these operations using Oracle Data Integrator packages, and automating them is recommended in order to deploy a journalized infrastructure across different contexts. A typical situation: the developer manually configures CDC in the Development context; after this is working well, CDC is automatically deployed in the Test context by using a package; eventually the same package is used to deploy CDC in the Production context. See the Packages section for more information.

To automate journalizing setup:

1. Create a new package in Designer.
2. Drag and drop the model or datastore you want to journalize. A new step appears.
3. Double-click the step icon in the diagram. The properties panel opens.
4. In the Type list, select Journalizing Model/Datastore.
5. Check the Start box to start the journals.
6. Check the Add Subscribers box, then enter the list of subscribers into the Subscribers group.
7. Click OK to save.
8. Generate a scenario for this package.

When this scenario is executed in a context, it starts the journals according to the model configuration and creates the specified subscribers using this context. It is possible to split subscriber and journal management into different steps and packages, and deleting subscribers and stopping journals can be automated in the same manner.

Journalizing Infrastructure Details

When the journals are started, the journalizing infrastructure (if not installed yet) is deployed or updated in the following locations:

- When the journalizing knowledge module creates triggers, they are installed on the tables in the Data Schema for the Oracle Data Integrator physical schema containing the journalized tables. Journalizing trigger names are prefixed with the prefix defined in the Journalizing Elements Prefixes for the physical schema; the default value for this prefix is T$. Database-specific programs are installed separately (see the KM documentation for more information).
- Journal tables and journalizing views are installed in the Work Schema for the Oracle Data Integrator physical schema containing the journalized tables. The journal table and journalizing view names are prefixed with the prefixes defined in the Journalizing Elements Prefixes for the physical schema; the default value is J$ for journal tables and JV$ for journalizing views.
- A CDC common infrastructure for the data server is installed in the Work Schema for the Oracle Data Integrator physical schema that is flagged as Default for this data server. This common infrastructure contains information about subscribers, consistent sets, etc. for all the journalized schemas of this data server, and consists of tables whose names are prefixed with SNP_CDC_.

Note: All components of the journalizing infrastructure except the triggers (like all Oracle Data Integrator temporary objects, such as integration, error and loading tables) are installed in the Work Schema for the Oracle Data Integrator physical schemas of the data server. These work schemas should be kept separate from the schema containing the application data (Data Schema).

Important Note: The journalizing triggers are the only components for journalizing that must be installed, when needed, in the same schema as the journalized data. Before creating triggers on tables belonging to a third-party software package, please check that this operation is not a violation of the software agreement or maintenance contract. Also ensure that installing and running triggers is technically feasible without interfering with the general behavior of the software package.
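To make these prefixes concrete, here is a hedged sketch of the kind of objects a trigger-based JKM typically creates for a journalized table named CUSTOMER, following the default prefixes above. The column list, schema names and DDL are simplified assumptions; the real objects generated by a JKM vary by technology and ODI version.

-- Journal table (default prefix J$), holding references to the changed rows plus change metadata:
CREATE TABLE WORK_SCHEMA.J$CUSTOMER (
  JRN_SUBSCRIBER VARCHAR2(400) NOT NULL,   -- subscriber the change is published for
  JRN_CONSUMED   VARCHAR2(1)   NOT NULL,   -- flagged while a subscriber is consuming the change
  JRN_FLAG       VARCHAR2(1)   NOT NULL,   -- 'I' for insert/update, 'D' for delete
  JRN_DATE       DATE          NOT NULL,   -- timestamp of the change
  CUST_ID        NUMBER        NOT NULL    -- primary key of the journalized table
);

-- Journalizing view (default prefix JV$) joining the journal to the data table; this is what
-- "Journal Data" and journalized interfaces read:
CREATE OR REPLACE VIEW WORK_SCHEMA.JV$CUSTOMER AS
SELECT J.JRN_FLAG, J.JRN_DATE, J.JRN_SUBSCRIBER, D.*
FROM   WORK_SCHEMA.J$CUSTOMER J
JOIN   DATA_SCHEMA.CUSTOMER   D ON (D.CUST_ID = J.CUST_ID);

-- A trigger named with the default T$ prefix (T$CUSTOMER) is created on DATA_SCHEMA.CUSTOMER
-- and inserts one row into J$CUSTOMER for each tracked change.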

Journalizing Status

Datastores in models or interfaces have an icon marker indicating their journalizing status in Designer's current context:

- OK - Journalizing is active for this datastore in the current context, and the infrastructure is operational for this datastore.
- No Infrastructure - Journalizing is marked as active in the model, but no appropriate journalizing infrastructure was detected in the current context. Journals should be started. This state may occur if the journalizing mode implemented in the infrastructure does not match the one declared for the model.
- Remnants - Journalizing is marked as inactive in the model, but remnants of the journalizing infrastructure, such as the journalizing table, have been detected for this datastore in the context. This state may occur if the journals were not stopped and the table has been removed from CDC.

Using Changed Data

Once journalizing is started and changes are tracked for subscribers, it is possible to view the changes captured.

To view the changed data:

1. Select the journalized datastore.
2. Right-click, then select Changed Data Capture > Journal Data.

A window containing the changed data appears. Note: This window selects data using the journalizing view. The changed data displays three extra columns for the change details:

- JRN_FLAG: Flag indicating the type of change. It takes the value I for an inserted/updated record and D for a deleted record.
- JRN_DATE: Timestamp of the change.
- JRN_SUBSCRIBER: Name of the subscriber.

Changed data can be used as the source of integration interfaces; journalized data is mostly used within integration processes. The way it is used depends on the journalizing mode.

Using Changed Data: Simple Journalizing

Using changed data from simple journalizing consists of designing interfaces using journalized datastores as sources.

Designing Interfaces

Journalizing Filter: When a journalized datastore is inserted into an interface diagram, a Journalized Data Only check box appears in this datastore's property panel. When this box is checked:

- the journalizing columns (JRN_FLAG, JRN_DATE and JRN_SUBSCRIBER) become available in the datastore;
- a journalizing filter is automatically generated on this datastore. This filter reduces the amount of source data retrieved and is always executed on the source. A typical filter for retrieving all changes for a given subscriber is: JRN_SUBSCRIBER = '<subscriber_name>'. You can customize this filter, for instance to process changes in a time range, or only a specific type of change (a sketch of the resulting query appears at the end of this subsection).

Note: In simple journalizing mode, all the changes taken into account by the interface (after the journalizing filter is applied) are automatically considered consumed at the end of the interface and removed from the journal. They cannot be used by a subsequent interface.

Knowledge Module Options

When processing journalized data, the SYNC_JRN_DELETE option of the integration knowledge module should be set carefully. It invokes the deletion, from the target datastore, of the records marked as deleted (D) in the journals and not excluded by the journalizing filter. If this option is set to No, integration will only process inserts and updates.
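As an illustration, reading a journalized datastore with "Journalized Data Only" checked is conceptually equivalent to the query below. The table and column names are hypothetical, and the statement actually generated by the LKM/IKM reads the JV$ journalizing view and also manages the consumption flags.

-- Sketch of the simple-journalizing read for the default filter:
SELECT JRN_FLAG, JRN_DATE, JRN_SUBSCRIBER, CUST_ID, CUST_NAME
FROM   WORK_SCHEMA.JV$CUSTOMER
WHERE  JRN_SUBSCRIBER = 'SUNOPSIS';

-- The filter can be customized, for example to process only deletions:
--   JRN_SUBSCRIBER = 'SUNOPSIS' AND JRN_FLAG = 'D'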

Using Changed Data: Consistent Set Journalizing

Using changed data in consistent set journalizing is similar to simple journalizing as far as interface design is concerned, but it requires extra steps before and after processing the changed data in the interfaces, in order to enforce changes consistently within the set.

Operations Before Using the Changed Data

The following operations should be undertaken before using the changed data when using consistent set journalizing:

- Extend Window: The Consistency Window is a range of available changes in all the tables of the consistency set for which inserts, updates and deletes are possible without violating referential integrity. The Extend Window operation (re)computes this window to take into account new changes captured since the latest Extend Window operation. This operation is implemented using a package step with the Journalizing Model type, and it can be scheduled separately from the other journalizing operations.
- Lock Subscribers: Although the Extend Window is applied to the entire consistency set, subscribers consume the changes separately. This operation performs a subscriber-specific "snapshot" of the changes in the consistency window. This snapshot includes all the changes within the consistency window that have not been consumed yet by the subscriber(s). This operation is implemented using a package step with the Journalizing Model type. It should always be performed before the first interface using changes captured for the subscriber(s).

Designing Interfaces

The changed data in consistent set journalizing is also processed using interfaces sequenced into packages. Designing interfaces when using consistent set journalizing is similar to simple journalizing, except for the following differences:

- The changes taken into account by the interface (that is, filtered with JRN_FLAG, JRN_DATE and JRN_SUBSCRIBER) are not automatically purged at the end of the interface. They can be reused by subsequent interfaces. The Unlock Subscribers and Purge Journal operations (see below) are required to commit the consumption of these changes and to remove useless entries from the journal, respectively.
- In consistent mode, the JRN_DATE column should not be used in the journalizing filter. Using this timestamp to filter the changes consumed does not entirely ensure consistency in these changes.

Operations after Using the Changed Data

After using the changed data, the following operations should be performed:

- Unlock Subscribers: This operation commits the use of the changes that were locked during the Lock Subscribers operation for the subscribers. It should always be performed after the last interface using changes captured for the subscribers. If the changes need to be processed again (for example, in case of an error), this operation should not be performed. This operation is implemented using a package step with the Journalizing Model type.
- Purge Journal: After all subscribers have consumed the changes they have subscribed to, entries still remain in the journalizing tables and should be deleted. This is performed by the Purge Journal operation, which should be processed only after all the changes for the subscribers have been processed. This operation is implemented using a package step with the Journalizing Model type, and it can be scheduled separately from the other journalizing operations.

Journalizing Tools

Oracle Data Integrator provides a set of tools that can be used in journalizing to refresh information on the captured changes or to trigger other processes:

- SnpsWaitForData waits for a number of rows in a table or a set of tables.
- SnpsWaitForLogData waits for a certain number of modifications to occur on a journalized table or a list of journalized tables. This tool calls SnpsRefreshJournalCount to perform the count of new changes captured.
- SnpsWaitForTable waits for a table to be created and populated with a pre-determined number of rows.
- SnpsRetrieveJournalData retrieves the journalized events for a given table list or CDC set for a specified journalizing subscriber. Calling this tool is required if using database-specific processes to load the journalizing tables. This tool needs to be used with specific knowledge modules; see the knowledge module description for more information.
- SnpsRefreshJournalCount refreshes the number of rows to consume for a given table list or CDC set for a specified journalizing subscriber.

See the Oracle Data Integrator Tools Reference for more information on these functions.

To create an Extend Window, Lock/Unlock Subscriber or Purge Journal step in a package:

1. Open the package where the operations will be performed.
2. Drag and drop the model for which you want to perform the operation.
3. In the Type list, select Journalizing Model.
4. Check the option boxes corresponding to the operations you want to perform.
5. Enter the list of subscribers into the Subscribers group if performing Lock Subscriber or Unlock Subscriber operations.
6. Click OK.

Note: It is possible to perform an Extend Window or Purge Journal on a datastore. This option is provided to process changes for tables that are in the same consistency set at different frequencies, and it should be used carefully, as consistency for the changes may no longer be maintained at the consistency set level.

Package Templates for Using Journalizing

A number of templates may be used when designing packages to use journalized data. Below are some typical templates.

Template 1: One Simple Package (Consistent Set)

- Step 1: Extend Window + Lock Subscribers
- Step 2 to n-1: Interfaces using the journalized data
- Step n: Unlock Subscribers + Purge Journal

This package is scheduled to process all changes every few minutes. This template is relevant if changes are made regularly in the journalized tables.

Template 2: One Simple Package (Simple Journalizing)

- Step 1 to n: Interfaces using the journalized data

This package is scheduled to process all changes every few minutes. This template is relevant if changes are made regularly in the journalized tables.

Template 3: Using SnpsWaitForLogData (Consistent Set or Simple)

- Step 1: SnpsWaitForLogData. If no new log data is detected after a specified interval, end the package.
- Step 2: Execute a scenario equivalent to Template 1 or 2, using SnpsStartScen.

This package is scheduled regularly. Changed data will only be processed if new changes have been detected. This avoids useless processing if changes occur only sporadically in the journalized tables (i.e. it avoids running interfaces that would process no data).

Template 4: Separate Processes (Consistent Set)

This template dissociates the consistency window, the purge, and the changes consumption (for two different subscribers) into different packages.

Package 1: Extend Window

- Step 1: SnpsWaitForLogData. If no new log data is detected after a specified interval, end the package.
- Step 2: Extend Window.

This package is scheduled every minute. Extend Window may be resource consuming, so it is better to have this operation triggered only when new data appears.

Package 2: Purge Journal (at the end of the week)

- Step 1: Purge Journal

This package is scheduled once every Friday; we keep track of the journals for the entire week.

Package 3: Process the Changes for Subscriber A

- Step 1: Lock Subscriber A
- Step 2 to n-1: Interfaces using the journalized data for subscriber A
- Step n: Unlock Subscriber A

This package is scheduled every minute. Such a package is used, for instance, to generate events in a MOM.

Package 4: Process the Changes for Subscriber B

- Step 1: Lock Subscriber B
- Step 2 to n-1: Interfaces using the journalized data for subscriber B
- Step n: Unlock Subscriber B

This package is scheduled every day. Such a package is used, for instance, to load a data warehouse during the night with the changed data.

Working With Changed Data Capture

The purpose of CDC is to enable applications to process changed data only. CDC enables ODI to track changes in source data caused by other applications, so that when running integration interfaces, ODI can avoid processing unchanged data in the flow. Loads process only the changes since the last load, and the volume of data to be processed is dramatically reduced.

Reducing the source data flow to only changed data is useful in many contexts, such as data synchronization and replication. It is essential when setting up an event-oriented architecture for integration. In such an architecture, applications make changes in the data ("Customer Deletion", "New Purchase Order") during a business process. These changes are captured by Oracle Data Integrator and transformed into events that are propagated throughout the information system.

Changed Data Capture is performed by journalizing models. Journalizing a model consists of setting up the infrastructure to capture the changes (inserts, updates and deletes) made to the records of this model's datastores. Oracle Data Integrator supports two journalizing modes:

• Simple Journalizing tracks changes in individual datastores in a model.
• Consistent Set Journalizing tracks changes to a group of the model's datastores, taking into account the referential integrity between these datastores. The group of datastores journalized in this mode is called a Consistent Set.

CDC Techniques

1) Trigger based: ODI creates and maintains triggers to keep track of the changes.
2) Logs based: ODI retrieves changes from the database logs (Oracle, AS/400).
3) Time stamp based: Processes written with ODI can filter the data by comparing the time stamp value with the last load time (cannot process deletes).
4) Sequence number: If the records are numbered in sequence, ODI can filter the data based on the last value loaded (cannot process updates and deletes).

A minimal SQL sketch of the time stamp and sequence number techniques follows.
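For techniques 3 and 4, the filtering is plain SQL applied to the source. A hedged sketch with hypothetical table and column names, where the bind variables hold the boundary values recorded by the previous load:

-- Time stamp based: pick up rows modified since the previous load (deletes are not visible).
SELECT *
FROM   ORDERS
WHERE  LAST_UPDATE_DATE > :LAST_LOAD_DATE;

-- Sequence number based: pick up rows whose identifier is above the last value already loaded
-- (updates and deletes are not visible).
SELECT *
FROM   ORDERS
WHERE  ORDER_ID > :LAST_LOADED_ORDER_ID;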

The Journalizing Components

The journalizing components are:

• Journals: Where changes are recorded. Journals only contain references to the changed records, along with the type of change (insert/update, delete).
• Capture processes: Journalizing captures the changes in the source datastores either by creating triggers on the data tables, or by using database-specific programs to retrieve log data from data server log files.
• Subscribers: CDC uses a publish/subscribe model. Subscribers are entities (applications, integration processes, etc.) that use the changes tracked on a datastore or on a consistent set. They subscribe to a model's CDC to have the changes tracked for them. Changes are captured only if there is at least one subscriber to the changes. When all subscribers have consumed the captured changes, these changes are discarded from the journals.
• Journalizing views: Provide access to the changes and the changed data captured. They are used by the user to view the changes captured, and by integration processes to retrieve the changed data.

These components are implemented in the journalizing infrastructure.

Setting up Journalizing

This is the basic process for setting up CDC on an Oracle Data Integrator data model. Each of these steps is described in more detail below.

1. Set the CDC parameters
2. Add the datastores to the CDC
3. For consistent set journalizing, arrange the datastores in order
4. Add subscribers
5. Start the journals

Journalizing Tools

Oracle Data Integrator provides a set of tools that can be used in journalizing to refresh information on the captured changes or to trigger other processes:

• SnpsWaitForData waits for a number of rows in a table or a set of tables.
• SnpsWaitForLogData waits for a certain number of modifications to occur on a journalized table or a list of journalized tables. This tool calls SnpsRefreshJournalCount to perform the count of new changes captured.
• SnpsWaitForTable waits for a table to be created and populated with a pre-determined number of rows.
• SnpsRetrieveJournalData retrieves the journalized events for a given table list or CDC set for a specified journalizing subscriber. Calling this tool is required if using database-specific processes to load the journalizing tables. This tool needs to be used with specific knowledge modules.
• SnpsRefreshJournalCount refreshes the number of rows to consume for a given table list or CDC set for a specified journalizing subscriber.

Implementing Changed Data Capture

Step 1) Import the appropriate JKM into the project. Click the Projects tab. Expand the Procedure-Demo > Knowledge Modules node, right-click Journalization (JKM), and select Import Knowledge Modules. Verify the setting, as shown in the following screen.

Step 2) In the Models tab, create a new model named Oracle_Relational_01. For Technology, enter Oracle, and select the logical schema Sales_Order. Click the Reverse Engineer tab and set Context to Development. Click the Journalizing tab.

Step 3) On the Journalizing tab, in the Knowledge Module menu, select JKM Oracle Simple.Procedure-Demo, as shown in the following screen. Click Save to save your model and then close the tab.

Step 4) Reverse-engineer the model Oracle_Relational_01. Expand this model and verify its structure as follows.

Step 5) Set up the CDC infrastructure. You will start the CDC on the EMPLOYEE table in the Oracle_Relational_01 model. Expand the Oracle_Relational_01 model and right-click the EMPLOYEE table. To add the table to CDC, select Change Data Capture > Add to CDC and click Yes to confirm.

Step 6) Click the Refresh icon. The small yellow clock icon is added to the table.

Step 7) Right-click the EMPLOYEE table again and select Changed Data Capture > Start Journal.

Step 8) You use the default subscriber SUNOPSIS; for that reason, you do not have to add another subscriber. Click OK to confirm that your subscriber is SUNOPSIS. In the Information window, click OK again. Wait several seconds, then click Refresh and verify that the tiny clock icon next to the EMPLOYEE table is now green. This means that your journal has started properly.

Step 9) Click the ODI Operator icon to open Operator. Select All Executions and verify that the EMPLOYEE session executed successfully. Click Refresh.

Step 10) View the data and the changed data. In the Designer window, open the Models tab. Right-click the EMPLOYEE datastore and select Data.

Step 11) Select the row with Employee_Key = 10 and change the value of the NAME2 column to "Symond". Similarly, select the row with Employee_Key = 15 and change the value to "jacob". Save your changes and close the tab.

Step 12) Right-click the table again and select View Data. Scroll down and verify that the rows are modified. Close the tab.
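Behind the data editor, the edits made in step 11 amount to ordinary UPDATE statements, and the journalizing trigger created by the JKM records the keys of the changed rows for the SUNOPSIS subscriber. A hedged sketch (column names taken from the tutorial, journal structure simplified, dates are placeholders):

UPDATE EMPLOYEE SET NAME2 = 'Symond' WHERE EMPLOYEE_KEY = 10;
UPDATE EMPLOYEE SET NAME2 = 'jacob'  WHERE EMPLOYEE_KEY = 15;
COMMIT;

-- The journal then holds one entry per changed row, conceptually:
--   JRN_SUBSCRIBER   JRN_FLAG   JRN_DATE               EMPLOYEE_KEY
--   SUNOPSIS         I          <date of the change>   10
--   SUNOPSIS         I          <date of the change>   15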

To verify that your changed data is captured, find the captured changed records in the journal data: right-click EMPLOYEE and select Change Data Capture > Journal Data. Close the tab. Done!
