You are on page 1of 86

Extracting, Transforming and Loading Data

Purpose

This tutorial shows you how to define simple ETL mapping and also shows how to work with
user-defined database objects in Warehouse Builder. In this tutorial, you also define a simple
process flow.

Note: The screenshots in this tutorial may appear slightly different from yours depending on the
order in which you perform the other tutorials in this OBE Series. You can ignore the irrelevant
items you see on your screens.

Time to Complete

Approximately 1 hour

Topics

This tutorial covers the following topics:

Overview
Prerequisites
Defining a Simple ETL Mapping
Using User-Defined Database Objects in an ETL Mapping
Defining a Simple Process Flow
Summary

Viewing Screenshots

Place the cursor over this icon to load and view all the screenshots for this tutorial.
(Caution: This action loads all screenshots simultaneously, so response time may be slow
depending on your Internet connection.)

Note: Alternatively, you can place the cursor over each individual icon in the following steps to
load and view only the screenshot associated with that step.

Overview

ETL stands for Extract, Transform and Load. ETL involves the movement and transformation of
data from your sources to your targets. In the tutorial "Setting up the Oracle Warehouse Builder
Project", you learned how to define Warehouse Builder modules that connect to your source and
target schemas. Now, you learn how to create the data movement and transformation logic.
This tutorial begins with creating a simple mapping, using the mapping editor. Warehouse
Builder facilitates creation of large and complex mappings, increasing productivity and reducing
the number of errors.

Back to Topic List

Prerequisites

Before starting this tutorial, you should:

1. Complete the Setting Up the Oracle Warehouse Builder Project tutorial.

Note: For this tutorial, you must have completed the Installation and Configuration of Oracle
Workflow as mentioned in Prerequisite topic of the tutorial "Setting up the Oracle Warehouse
Builder Project".

Also, you must have completed the "Importing Metadata in OE module" subtopic in tutorial
"Setting up the Oracle Warehouse Builder Project"

2. Log in to SQL Plus and run the grant_priv.sql script.

The grant_priv.sql script grants select privileges to a few OE and HR tables that you use as
source tables in your mappings later in this tutorial.

Log in to SQL*Plus as sys and run the grant_priv.sql script that you downloaded into the
c:\owbdemo_files folder.

To run the script, enter the following command at the SQL> prompt:

@c:\owbdemo_files\grant_priv.sql;

Back to Topic List

Defining a Simple ETL Mapping

In this topic, you create a basic mapping to understand how ETL design works in Warehouse
Builder. You use the OE sample schema as the source from which you load the data into
DP_TGT target.

In this topic, you perform the following:

Creating a Simple ETL Mapping


Deploying the Mapping and Related
Objects
Executing the Mapping
Viewing the Loaded Data in
CONS_ORDERS table

Back to Topic List

Creating a Simple ETL Mapping

To create a simple ETL Mapping, perform the following steps:

1. Assuming that you have logged in as owb user in the Design Center, in the Project Explorer
panel, expand HANDSON > Databases > Oracle > DP_TGT. Right-click Mappings and select
New.

The Create Mapping window appears.


2. In the Create Mapping window, enter SIMPLE_MAP as name of the mapping. Click OK.
The Mapping Editor launches the SIMPLE_MAP mapping.

3. You can add existing sources and targets to a mapping through the Explorer panel, which is on
the top left of the Mapping Editor.

In the Explorer panel, ensure that Available objects tab is selected. Expand Oracle > OE >
Tables. Drag ORDERS to the canvas.

Also, drag ORDER_ITEMS on to the canvas.


4. From the Palette, drag the JOINER operator to the canvas. Place the operator to the right of the
two tables. Click ( ) icon of each operator headers to maximize the operators.
5. Connection lines graphically represent how the data flows from a source, through operators, and
to a target.

Drag a connection line from ORDERS.INOUTGRP1 into JOINER.INGRP1.


6. Drag a connection line from ORDER_ITEMS.INOUTGRP1 into JOINER.INGRP2.
Observe that the attributes from the source tables get added as input attributes in the JOINER
operator.

7. In the Mapping Editor Canvas, click the JOINER operator header to highlight it.

In the Joiner Properties panel, examine the Join Condition property.


Click Edit ( ) to view the full join condition. In the Expression Builder window, the join
condition is as follows: INGRP2.ORDER_ID = INGRP1.ORDER_ID. Warehouse Builder
frames the join condition based on the constraints between the two tables, ORDERS and
ORDER_ITEMS. Click OK.
8. From the Palette, drag a Table Operator to the canvas. Place the operator to the right of the
JOINER operator.
The Add Table Operator window displays.

9. In the Add Table Operator window, select "Create unbound operator with no attributes". In
the New operator name field, enter CONS_ORDERS. Click OK.
10. Drag a connection line from JOINER.OUTGRP1 to CONS_ORDERS.INOUTGRP1.
Maximize the CONS_ORDERS operator and ensure that the operator is highlighted.
11. In the Table Operator Properties panel, for the Loading Type property, select
TRUNCATE/INSERT. If you want to insert or update, you can simply select
INSERT/UPDATE and Warehouse Builder will generate code with a MERGE statement

12. The CONS_ORDERS table still does not exist in the repository. Right-click the
CONS_ORDERS table operator header and select Create and Bind from the menu.

In the Create and Bind window, examine the details and click OK. From the Mapping menu,
select Close to exit the Mapping Editor.

13. In DP_TGT, expand Tables. Double-click CONS_ORDERS.


Examine the table details in the Data Object Editor. From Diagram menu, select Close Window
to exit the Data Object Editor.
Note that you derived this table directly from the mapping editor and did not create the table and
its columns via typing. This adds to your productivity and saves time.

From the Design menu, select Save All. In Warehouse Builder warning dialog box , click Yes to
save your work.

Back to Topic

Deploying the Mapping and Related Objects

Deployment is the process of creating your target system from the logical design or model. When
you deploy ETL mappings, the deploy procedure generates a PL/SQL script. You need to
explicitly execute the scripts to perform ETL. Before you deploy a mapping, you should have
deployed the objects the mapping refers to.

To deploy SIMPLE_MAP mapping, perform the following steps:

1. To deploy, you must launch the Control Center Manager. From the Tools menu, select Control
Center Manager.

2. Click DP_TGT_LOCATION if not already highlighted. In the Object Details panel, observe
the two objects you defined, SIMPLE_MAP and CONS_ORDERS, design status is set to New
and Deploy Status set to Not Deployed.
3. Click Default Actions to change Deploy Action from None to Create.
As a result of performing other tutorials, you may have many other objects under
DP_TGT_LOCATION. In order to select the two objects, SIMPLE_MAP and CONS_ORDERS,
expand DP_TGT_LOCATION > DP_TGT.

Expand Tables. Expand Mappings. Hold ctrl and select the two objects, SIMPLE_MAP and
CONS_ORDERS, and then click Default Actions as shown in the screenshot below.
4. From the File menu, select Deploy > To Control Center.
You can monitor the progress of deployment in the Control Center Jobs panel on the lower right
of the Control Center window.

In the Control Center Jobs panel, the deployment status changes from "generate" to "run" to
"completed successfully." The mappings are deployed successfully and are ready for execution.

To view the generated code for the mapping, double-click the HANDSON job in the Control
Center Jobs panel.

Note: If you performed deployment in any of the other tutorials in this series, you may see
multiple jobs in this panel. If you see multiple jobs in the window, check the Finished column
for the date and time and select the latest job.
5. In the Job Details: Handson dialog box, click the Script tab.

Note that Warehouse Builder generates a .pls (PL/SQL) code for the deployed mapping. To view
the generated code for SIMPLE_MAP, select SIMPLE_MAP and click View Code
Scroll down in the code window to examine the generated code. From Code menu, select Close.
Then in the Job Details window, from File menu, click Close to exit.
Back to Topic

Executing the Mapping

You learned that when you deploy a mapping Warehouse Builder generates a .pls script. You
need to execute that deployed code to actually perform the task the mapping was designed to
do."

The SIMPLE_MAP mapping was designed to select rows from OE.ORDERS and
OE.ORDER_ITEMS that would go through a join condition using the JOINER operator and then
load the resultant data into CONS_ORDERS table.

In order to execute the mapping, perform the following steps:

1. Assuming you did not exit the Control Center, expand DP_TGT_LOCATION > DP_TGT >
Mappings and click SIMPLE_MAP.
2. From the File menu, select Start.
When you click Start, the Control Center Jobs panel switches to the Execution tab. You can
monitor the execution in this panel.

To view the execution details, double-click SIMPLE_MAP in the Job column of the Control
Center Jobs panel.
3. In the Job Details: SIMPLE_MAP dialog box, click the Execution Results tab and monitor Row
Activity.

You can get a count of how many rows were inserted in the target table, CONS_ORDERS. From
the File menu of the Job Details: SIMPLE_MAP dialog box, select Close. From the File menu
of the Control Center Manager, select Close to exit.

Back to Topic

Viewing the Loaded Data in CONS_ORDERS table


To view the data that was loaded in CONS_ORDERS table as a result of executing the mapping,
perform the following steps:

1. In Project Explorer panel, expand Tables under DP_TGT. Right-click CONS_ORDERS and
select Data from the menu.

2. The Relational Data Viewer shows the data in the CONS_ORDERS table. Scroll up/down or
left/right to examine the data. From Object menu, select Close Window to exit the Data Viewer.
Back to Topic

Back to Topic List

Using User-Defined Database Objects in an ETL Mapping

In this topic you transform and load data that originally resided in User-Defined Database
objects.

In this topic, you perform the following:

Creating a MAP_CUSTOMERS Mapping that uses a User-Defined Database Object


Generating Code for the MAP_CUSTOMERS Mapping
Deploying the MAP_CUSTOMERS Mapping
Executing the MAP_CUSTOMERS Mapping
Viewing the Loaded Data in SCALAR_CUST table
Back to Topic List

Creating a MAP_CUSTOMERS Mapping that uses a User-Defined Database Object

Now you define a mapping that uses a user-defined object from an OE.CUSTOMERS table as a
source and loads the transformed data into a target table, SCALAR_CUST.

To create the mapping, perform the following steps:

1. In Project Explorer panel, expand DP_TGT. Right-click Mappings and select New.

The Create Mapping window appears.

2. In the Create Mapping window, enter MAP_CUSTOMERS as name of the mapping. Click OK.
The Mapping Editor launches the MAP_CUSTOMERS mapping.

3. In the Explorer panel, ensure that Available objects tab is selected. Expand Oracle > OE >
Tables. Drag CUSTOMERS table to the canvas. Maximize the CUSTOMERS operator by
clicking maximize ( ).

4. Double-click CUST_ADDRESS attribute in the CUSTOMERS operator.


In the Table Editor: Customers window, note the data type of the CUST_ADDRESS attribute.
The data type is OE.CUST_ADDRESS_TYP, a user-defined data type. Click OK.
5. To interpret the CUST_ADDRESS_TYP in CUSTOMERS you use the Expand Operator.

From the Palette, drag the Expand Object Operator on the canvas. The Add Expand Object
window displays.
6. In the Add Expand Object window, select CUST_ADDRESS_TYP, as shown in the screenshot.
Click OK.
Maximize the CUST_ADDRESS_TYP operator by clicking maximize ( ).

7. Drag a connection line from CUSTOMERS.CUST_ADDRESS to


INGRP1.CUST_ADDRESS_TYPE.
8. Note that the type has COUNTRY_ID. To find the corresponding country name, you define a
lookup.

From the Palette, drag the Key Lookup Operator on the canvas. Place the operator to the right of
the Expand Object operator. The Key Lookup wizard launches. Click Next on the Welcome
page.
9. In the Name page, accept the default name, KEY_LOOKUP. Click Next.
10. In the Groups page, accept the default groups. Click Next.
11. In the Input Connections page, expand CUST_ADDRESS_TYP > OUTGRP1 and select
COUNTRY_ID.

Click > to move COUNTRY_ID from Available Attributes list to Mapped Attributes. Click
Next.
12. In the Lookup page, click the drop down list to select the object, which has the lookup result.
Expand OE and select COUNTRIES.
13. Click in the blank Lookup table Key field and select COUNTRY_C_ID_PK.
14. Click in the blank Input Attribute field. Note COUNTRY_ID gets added under
COUNTRY_C_ID_PK.

Select COUNTRY_ID under Input Attribute field next to the COUNTRY_ID lookup table key.
Click Next.
15. In the No-match Rows page, enter 'UNKNOWN' (including single quotes) in the Default Value
field for COUNTRY_NAME Lookup Table Column. Click Next. In the Summary page,
examine the details and click Finish.
16. If you are not able to see all the objects similar to the following screenshot, then in the Mapping
Editor, click Auto-Layout on the toolbar to bring the mapping to its default size. Also, you can
click Fit in Window to adjust the mapping in your window.

The Mapping looks as follows:


17. Now you create the target table to load the transformed data into.

From the Palette, drag the Table Operator on the canvas.


18. In the Add Table Operator window, select Create unbound operator with no attributes. Enter
SCALAR_CUST as the new operator name. Click OK. Place the operator to the right of the
Key Lookup operator.
19. Drag a connection line from KEY_LOOKUP.COUNTRY_NAME to
SCALAR_CUST.INOUTGRP1.
20. From CUST_ADDRESS_TYP, select all the attributes except COUNTRY_ID. Drag a
connection line to SCALAR_CUST.INOUTGRP1, as shown in the screenshot.

In Connect Operators window, accept the default option, Copy source attributes to target
group and match. Click Go. Click OK.
21. From CUSTOMERS, hold shift and select CUST_FIRST_NAME and CUST_LAST_NAME
and drag a connection line into SCALAR_CUST>INOUTGRP1.
In Connect Operators window, accept the default option, Copy source attributes to target
group and match. Click Go. Click OK.
22. Minimize all the operators except SCALAR_CUST.

Click SCALAR_CUST operator header. In the Table Operator properties panel, for Loading
Type property, select TRUNCATE/INSERT.
23. Note that the SCALAR_CUST table is still not existing in the repository.

Right-click SCALAR_CUST header and select Create and Bind from the menu.
In the Create and Bind window, examine the details and click OK. The table SCALAR_CUST
gets added under DP_TGT> Tables.

This completes the mapping. From the Mapping menu, select Close to exit the Mapping Editor.

Back to Topic

Generating Code for the MAP_CUSTOMERS Mapping

To generate the code for this mapping, perform the following steps:
1. Expand DP_TGT > Mappings. Right-click MAP_CUSTOMERS and select Generate.

The Generation Results window displays.


2. Click the Script tab. Select MAP_CUSTOMERS and click View Code.
3. Examine the generated code for the mapping. Notice the NVL in the code to ensure the
'UNKNOWN' country name, the ANSI SQL code of the left outer join and the splitting out of
the type in SQL.

Note: This code is ready for deployment. You may also right-click the MAP_CUSTOMERS
mapping and click Deploy to deploy the mapping. Rather you will generate, deploy and execute
the mapping from the Control Center Manager in the forthcoming topics in this tutorial.

In the MAP_CUSTOMERS window, from Code Menu, select Close to exit. From the File
menu, select Close to exit the Generation Results window.

From the Design menu, select Save All. In Warehouse Builder warning dialog box , click Yes to
save the changes.
Back to Topic

Deploying the MAP_CUSTOMERS Mapping

To deploy the mapping, perform the following steps:

1. From the Tools menu, select Control Center Manager.


2. In the Control Center, expand DP_TGT_LOCATION > DP_TGT. Expand Tables. Expand
Mappings.
3. Hold Ctrl and select SCALAR_CUST and MAP_CUSTOMERS, as shown in the screenshot.
Note that both the objects Design Status is set to New and Deploy Status is set to Not Deployed.
4. Click Default Actions to set the Deploy Action to Create.
5. From the File menu, select Deploy > To Control Center.
In the Object Details panel, the Deploy Status is set to Success.
Back to Topic

Executing the MAP_CUSTOMERS Mapping

To execute the MAP_CUSTOMERS mapping, perform the following steps:

1. Assuming you did not exit the Control Center, expand DP_TGT_LOCATION > DP_TGT >
Mappings and click MAP_CUSTOMERS. From the File menu, select Start.
2. When you click Start, the Control Center Jobs panel switches to the Execution tab. You can
monitor the execution in this panel.

To view the execution details, double-click MAP_CUSTOMERS in the Job column of the
Control Center Jobs panel.
3. In the Job Details: MAP_CUSTOMERS dialog box, click the Execution Results tab and
monitor Row Activity.

You can get a count of how many rows were inserted in the target table, SCALAR_CUST.

From the File menu of the Job Details: MAP_CUSTOMERS dialog box, select Close. From the
File menu of the Control Center Manager, select Close to exit.
Back to Topic

Viewing the Loaded Data in SCALAR_CUST table

To view the data that was loaded after executing the MAP_CUSTOMERS mapping, perform the
following steps:

1. In Project Explorer panel, expand Tables under DP_TGT. Right-click SCALAR_CUST and
select Data from the menu.
2. The Relational Data Viewer shows the data in the SCALAR_CUST table. Scroll up/down or
left/right to examine the data. From Object menu, select Close Window to exit the Data Viewer.
Back to Topic

Back to Topic List

Defining a Simple Process Flow

In Warehouse Builder, a process flow is used to manage the execution of mappings and related
activities, such as uploading a file to the warehouse machine using FTP or notifying the
administrator in case of an error in the load process. Using process flows, you can define the
sequence in which the mappings will run and also define actions based on the outcome of the
execution for example sending a notification email if the mapping execution failed. Warehouse
Builder also allows you to schedule the process flows.

The Process Flow Editor includes a variety of activities that you add and then connect with
transitions to design a flow. An Activity represents units of work in a process flow. These units
of work can involve components internal or external to Warehouse Builder. Transitions indicate
the sequence and conditions in which to launch activities.
In this topic, you perform the following tasks:

Creating the Process Flow Module and Package


Designing the Process Flow
Deploying and Executing the Process Flow

Back to Topic List

Creating the Process Flow Module and Package

Before you can start designing the process flow, you need to create a process flow module that
includes a process flow package. Process Flow Modules include Process Flow Packages. The
Process Flow Package is a grouping mechanism that determines which process flows you can
interrelate. At run time, you can launch one process flow that launches other process flows that
exist in the same process flow package.

In this topic, you create a Process Flow Module named PFLOWS, a Process Flow Package
named L_MAPS. The Process Flow Module acts as a container by which you can validate,
generate, and deploy a group of process flows.

To create the process flow module and package, perform the following steps:

1. In the Project Explorer, expand HANDSON > Process Flows. Right-click Process Flow
Modules and select New from the menu. The Create Module Wizard is launched. Click Next on
the Welcome page.
2. In the Name and Description page, enter PFLOWS as the name of the process flow module.
Click Next.
3. Now you create an Oracle Workflow location for the process flow module, PFLOWS.

Note: You define an Oracle Workflow location to specify where you want to deploy your
process flows. The Oracle Workflow location points to a workflow schema (OWF_MGR) that
runs in the target database. Hence, you should have Oracle Workflow installed and configured
beforehand.

On the Connection Information page, observe that Warehouse Builder creates a location with a
default name, PFLOWS_LOCATION1. Click Edit... to specify the details of the Workflow
location.

In the Edit Oracle Workflow Location window, change the name of the location to
PFLOWS_LOCATION (remove the 1 from the default name).

This tutorial uses owf_mgr as the schema name and owf_mgr as the password. If your Oracle
Workflow schema has a different name and password, substitute accordingly. Ensure the
Workflow User is unlocked.
Provide the following information to create a fully qualified location:

Name: PFLOWS_LOCATION (remove the 1 from the default


name)
Password: owf_mgr
Host Name: localhost
Port Number: 1521
Service Name: orcl
Schema: owf_mgr
Version: 2.6.4

Click Test Connection. If the connection is successful, click OK. If the connection is not
successful, check the connection details and try again. Click OK. Click Next. On the Summary
page, review the details and click Finish
4. In the Create Process Flow Package window, enter L_MAPS as the name and click OK.

5. The Create Process Flow window is launched. Enter LOAD_PF as the name of the process flow.

Click OK.

Warehouse Builder launches the Process Editor and displays the process flow with a Start
activity and an End_Success activity.
In the next topic, you now learn how to design the process flow with activities and transitions.

Back to Topic

Designing the Process Flow

To design the process flow, perform the following steps:

1. In the Process Editor, from the Palette panel, drag the Fork activity to the canvas. You can use
the Fork activity to launch multiple, concurrent activities after the completion of an activity. You
can also drag the Fork activity from the Selected Objects tab in the Explorer panel.
2. In the Process Editor, in the Explorer panel, click the Available Objects tab. Expand DP_TGT
and drag SIMPLE_MAP mapping to the canvas. Place the operator to the right of the FORK
activity.
3. In the Explorer panel, ensure that the Available Objects tab is selected. From DP_TGT, drag the
MAP_CUSTOMERS mapping to the canvas.
4. From the Palette panel, drag an AND activity to the canvas, to the right of the SIMPLE_MAP
and MAP_CUSTOMERS mapping activity.
5. Position your cursor on the Start activity. Ensure the cursor changes to ( ). Drag a line (these
lines from here on will be referred to as transitions) from Start activity to the Fork activity.

To drag a transition, the mouse pointer must change to ( ).

Drag a transition from START1 activity to the FORK activity.


Drag a transition from FORK activity to the mapping activity SIMPLE_MAP.
Drag a transition from FORK activity to the mapping activity MAP_CUSTOMERS.
Drag a transition from mapping activity SIMPLE_MAP to the AND1 activity.
Drag a transition from mapping activity MAP_CUSTOMERS to the AND1 activity.
Drag a transition from AND1 activity to the mapping activity END_SUCCESS.

On the Process Editor toolbar, click AutoLayout ( ).

After having linked all the activities with transitions, the process flow should be as shown in the
screenshot.
6. This completes the design of this process flow.

Generate the LOAD_PF process flow by selecting Generate from the Process Flow menu. The
code generated from a process flow definition is in industry standard XML Process Definition
Language (XPDL).

From Process Flow menu, select Close to exit the process editor.

In the Design Center, click Save All ( ) on the toolbar. In Warehouse Builder warning dialog
box , click Yes to save the changes.

Back to Topic

Deploying and Executing the Process Flow


To deploy and execute the process flow, perform the following steps:

1. From the Tools menu, select Control Center Manager. In the Control Center, expand
PFLOWS_LOCATION > PFLOWS. Select PFLOWS.

In the Object Details panel, the process flow package L_MAPS is selected for deployment.
Design Status is set to New. Click Default Actions to set Deploy Action to Create.

Note: You deploy the process flow package, and not the process flow. However, you execute the
process flow.

2. From the File menu, select Deploy > To Control Center.


The process flow package, L_MAPS, is deployed successfully.
3. Expand L_MAPS if not already expanded and click LOAD_PF to select it.

Select Start from the File menu or click the Start ( ) button on the toolbar.

4. To view the execution details, double-click LOAD_PF in the Job column of the Control Center
Jobs panel.

In the LOAD_PF dialog box, click the Execution Results tab and monitor Row Activity. You
can get a count of how many rows were inserted.

From the File menu of the Job Details: LOAD_PF dialog box, select Close. From the File menu
of the Control Center Manager, select Close to exit.
The LOAD_PF process flow (a very simple example) runs the two mappings parallel to one
another, then waits until both are done, and continues. Recall since you had set the Loading Type
property for the target tables as TRUNCATE/INSERT, on execution of this process flow,
Warehouse Builder truncated the rows and re-loaded the data into the two target tables,
CONS_ORDERS and SCALAR_CUST.

You can achieve much more complex flows, where you loop around, set parameters and wait for
results. You can also decide to traverse certain branches based on conditions and errors, and
create a complete error handling scenario with notifications and emails.

Back to Topic
Back to Topic List

Summary

In this tutorial, you learned how to:

Create, Deploy and Execute a Simple ETL Mapping


Create, Deploy and Execute a Mapping that uses a User-Defined Database Object
Create, Deploy and Execute a Simple Process Flow

Back to Topic List

Place the cursor over this icon to hide all screenshots.