
PowerCenter Basic Concepts

Ale Ribeiro
June 6, 2006

Agenda
What is PowerCenter?
PowerCenter Client Applications
Demo
PowerCenter Designer, Workflow Manager, Workflow Monitor
PowerCenter Architecture
Where do we use PowerCenter in IT?
Q&A

PowerCenter
A single, unified enterprise data integration
platform that allows companies and government
organizations of all sizes to access, discover,
and integrate data from virtually any business
system, in any format, and deliver that data
throughout the enterprise at any speed.
An ETL (Extract, Transform and Load) tool.

PowerCenter Client Applications

Administration

Administration Console (browser-based)
Perform domain and repository service tasks:
Create/configure nodes and repository services
Upgrade/delete
Start/stop
Backup/restore

Repository Manager
Manage repository connections, folders, objects, users and groups

Development

Designer
Create ETL mappings

Workflow Manager
Create and start workflows

Workflow Monitor
Monitor and control workflows

Designer Tools: Create mappings

Source Analyzer: create source objects
Target Designer: create target objects
Transformation Developer: create reusable transformations
Mapplet Designer: create mapplets
Mapping Designer: create mappings

Mapping
Logically Defines the ETL Process:
Reads data from sources
Applies transformation logic to data
Writes transformed data to targets

Source

Transformations

Target

Note: Sources and targets can be flat files, relational tables, XML files,
application systems, message queues, etc.

Unit 1

Mapping (contd)

A mapping is a set of source and target definitions linked by transformation
objects that define the rules for data transformation. Mappings represent the
data flow between sources and targets. When the Integration Service runs a
session, it uses the instructions configured in the mapping to read,
transform, and write data.

Every mapping must contain the following components:
Source definition. Describes the characteristics of a source table or file.
Transformation. Modifies data before writing it to targets. Use different transformation objects to
perform different functions.
Target definition. Defines the target table or file.
Links. Connect sources, targets, and transformations so the Integration Service can move the
data as it transforms it.

A mapping can also contain one or more mapplets. A mapplet is a set of
transformations that you build in the Mapplet Designer and can use in
multiple mappings.

Example
Request: "Give me an Excel file with Total Order Amount per
Customer. I also need to know when this data was
extracted (date) and the customer type initial (first letter
of the customer type)."

Define the sources
Orders
Customers

Define any required transformations
Sum of order amount
Get extracted date
Get first letter of customer type

Create the file
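
The request above can be sketched in plain Python to make the three steps concrete. This is not PowerCenter code. The sample rows, the field names (customer_id, amount, customer_type) and the function names build_report and write_report are assumptions made for illustration, and a CSV file stands in for the Excel file.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample sources (stand-ins for the Orders and Customers sources).
ORDERS = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.5},
]
CUSTOMERS = [
    {"customer_id": 1, "name": "Acme", "customer_type": "Retail"},
    {"customer_id": 2, "name": "Globex", "customer_type": "Wholesale"},
]


def build_report(orders, customers, extracted_on):
    """Return one row per customer: total, extraction date, type initial."""
    # Sum of order amount per customer.
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]

    rows = []
    for customer in customers:
        rows.append({
            "customer": customer["name"],
            "total_order_amount": totals[customer["customer_id"]],
            # Extraction date, passed in by the caller.
            "extracted_date": extracted_on.isoformat(),
            # First letter of the customer type.
            "customer_type_initial": customer["customer_type"][:1],
        })
    return rows


def write_report(rows, handle):
    """Write the report rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


report = build_report(ORDERS, CUSTOMERS, date.today())
buffer = io.StringIO()  # an in-memory file; a real run would open a file on disk
write_report(report, buffer)
```

In PowerCenter terms, the summing step would typically be an Aggregator and the date and initial would be computed in an Expression, as listed in the transformation slides.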

Transformations
Generate, modify, or pass data
Data passes into and out of
transformations through ports that
you link in a mapping
Passive transformations do not
change the number of rows received
Active transformations can change
the number of rows received
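
The passive/active distinction can be illustrated with two small Python functions. This is only an analogy, not PowerCenter syntax; the function names and sample rows are hypothetical.

```python
def expression_upper_name(rows):
    """Passive-style step: exactly one output row per input row."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def filter_positive_amount(rows):
    """Active-style step: may drop rows, so the row count can change."""
    return [row for row in rows if row["amount"] > 0]


rows = [
    {"name": "acme", "amount": 10},
    {"name": "globex", "amount": 0},
    {"name": "initech", "amount": 25},
]

passive_out = expression_upper_name(rows)   # same number of rows as the input
active_out = filter_positive_amount(rows)   # fewer rows than the input
```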

Unit 1

PowerCenter Transformations (partial list)


Source Qualifier: reads data from flat file and relational sources
Expression: performs row-level calculations
Filter: drops rows conditionally
Sorter: sorts data
Aggregator: performs aggregate calculations
Joiner: joins heterogeneous sources
Lookup: looks up values and passes them to other objects
Update Strategy: tags rows for insert, update, delete, reject
Router: routes rows conditionally
Transaction Control: allows data-driven commits and rollbacks

Advanced PowerCenter Transformations

Union: merges two or more data streams, like a SQL UNION ALL (duplicate rows are kept)
Java: allows Java syntax to be used within PowerCenter
Midstream XML Parser: reads XML from anywhere in mapping
Midstream XML Generator: writes XML to anywhere
More Source Qualifiers: read from XML, message queues and applications


Mapplet: a set of transformations that can be reused

Mapplet Input & Output transformations (pass data from or to the mapping)

Mapplet Designer Tool

Unit 14


Example: Data Sources Defined Outside Mapplet

Mapping
Source data defined outside the Mapplet

Mapplet
Mapplet Input transformation
Mapplet Output transformation
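
As a loose analogy, a mapplet behaves like a reusable function: the argument plays the role of the Mapplet Input, the return value plays the role of the Mapplet Output, and each mapping supplies its own source data. The sketch below is plain Python with hypothetical names and fields, not PowerCenter code.

```python
def clean_customer_mapplet(rows):
    """Reusable logic: trims names and derives a type initial.

    The argument stands in for the Mapplet Input transformation and
    the returned list stands in for the Mapplet Output transformation.
    """
    return [
        {
            "name": row["name"].strip(),
            "type_initial": row["customer_type"].strip()[:1].upper(),
        }
        for row in rows
    ]


def mapping_from_crm(crm_rows):
    """First hypothetical mapping: its own source feeds the shared logic."""
    return clean_customer_mapplet(crm_rows)


def mapping_from_web(web_rows):
    """Second hypothetical mapping: different source, same shared logic."""
    renamed = [
        {"name": r["full_name"], "customer_type": r["segment"]}
        for r in web_rows
    ]
    return clean_customer_mapplet(renamed)


crm_result = mapping_from_crm([{"name": " Acme ", "customer_type": "retail"}])
web_result = mapping_from_web([{"full_name": "Globex", "segment": "wholesale"}])
```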

Unit 14


Recap
1. ETL
2. Designer
3. Mapping
4. Transformation
5. Mapplet

a. Extract, transform and load data
b. Create mapping objects
c. Logically defines the ETL process
d. Generates or manipulates data
e. Set of transformations that can be reused in multiple mappings


Workflow Manager Tools: Create and Start Workflows

Create reusable tasks
Create worklets
Create workflows


Task
An executable set of actions, functions or
commands
Examples:
Session task runs a mapping
Command task runs a shell script
Email task sends an email
Decision task branches workflow conditionally
Timer task waits for a specified period


Session
Task that executes a mapping
Define Log Options, Error handling, Connections


Decision Task
Tests for a condition during the workflow and sets a flag based on the condition
Use a link condition (or a Control task) downstream to test the flag and control execution flow
Can use workflow variables in the condition

Options on all tasks to fail parent and disable

Treat inputs as AND/OR
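
The flag-then-link pattern can be sketched in Python as follows. This is only an illustration: the variable names rows_loaded and errors, the task names, and both functions are hypothetical, and real link conditions are written in the Workflow Manager, not in Python.

```python
def decision_task(workflow_vars):
    """Evaluate a condition on workflow variables and return a flag."""
    return workflow_vars["rows_loaded"] > 0 and workflow_vars["errors"] == 0


def run_after_decision(workflow_vars):
    """Return the downstream tasks whose link condition accepts the flag."""
    flag = decision_task(workflow_vars)
    # Each link carries a condition on the flag; a task runs only if it holds.
    links = [
        ("send_success_email", lambda f: f),
        ("send_failure_email", lambda f: not f),
    ]
    return [task for task, link_condition in links if link_condition(flag)]


good_run = run_after_decision({"rows_loaded": 10, "errors": 0})
bad_run = run_after_decision({"rows_loaded": 10, "errors": 2})
```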

Unit 16


Email Task
Sends an email within a workflow
Note: emails can also be sent post-session in a Session task
Can be used with a link condition to notify success or failure of prior tasks

Unit 16


Event Wait Task
Pauses processing of the pipeline until a specified event occurs
Events can be:
Pre-defined: file watch
User-defined: created by an Event Raise task elsewhere in the workflow

Unit 17


Event Wait Task (contd)
Events Tab
Specify either a pre-defined or user-defined event
User-defined events must be declared in the workflow Events tab


Event Raise Task
Sets the location of a user-defined event in the workflow
User-defined events are triggered when the Integration Service executes the Event Raise task
User-defined events must be declared in the workflow Events tab
Used with the Event Wait task
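
The pairing of Event Wait and Event Raise resembles a thread waiting on a signal set by another thread. The sketch below uses Python's threading.Event as an analogy only; the event name and the two functions are hypothetical.

```python
import threading

orders_loaded = threading.Event()   # hypothetical user-defined event
log = []


def load_orders():
    """Upstream branch: does its work, then raises the event."""
    log.append("orders loaded")
    orders_loaded.set()             # plays the role of the Event Raise task


def build_summary():
    """Downstream branch: pauses until the event has been raised."""
    raised = orders_loaded.wait(timeout=5)   # plays the role of the Event Wait task
    log.append("summary built" if raised else "timed out")


waiter = threading.Thread(target=build_summary)
raiser = threading.Thread(target=load_orders)
waiter.start()
raiser.start()
waiter.join()
raiser.join()
```

Because the waiting branch cannot proceed until the event is set, the summary step always runs after the load step, regardless of which thread starts first.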



Command Task
Specifies one or more UNIX commands or shell scripts, or DOS commands or batch files,
for the Integration Service to run during a workflow
Note: UNIX and DOS commands can also be run pre- or post-session in a Session task
Command task status (success or failure) is held in the task-specific variable
$command_task_name.STATUS
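
The idea of a command whose outcome is kept as a status can be sketched with Python's subprocess module. This is an analogy, not how the Integration Service is implemented; the function name and the status strings used here are illustrative assumptions.

```python
import subprocess
import sys


def run_command_task(argv):
    """Run one command and return a status string based on its exit code."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    # Assumed convention for this sketch: exit code 0 means success.
    return "SUCCEEDED" if completed.returncode == 0 else "FAILED"


# The Python interpreter is used as the command so the sketch runs on both
# UNIX and Windows; a real task would name a shell script or batch file.
ok_status = run_command_task([sys.executable, "-c", "print('done')"])
bad_status = run_command_task([sys.executable, "-c", "import sys; sys.exit(3)"])
```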


Command Task (contd)

Add Cmd
Remove Cmd


Reusable Tasks
Session, Email and Command tasks can be reusable
Use the Task Developer to create reusable tasks
Reusable tasks appear in the Navigator Tasks node and can be
dragged and dropped into any workflow

In a workflow, a reusable task is indicated by a special symbol

Unit 17


Worklet
An object representing a set or grouping of tasks
Can contain any task available in the Workflow Manager
Worklets expand and execute inside a workflow
A workflow that contains a worklet is called the parent workflow
Worklets can be nested
Reusable worklets are created in the Worklet Designer
Non-reusable worklets are created in the Workflow Designer

Unit 18


Workflow
A collection of ordered tasks
Tasks can be linked sequentially, concurrently and/or combined
Links can be conditional on previous tasks completing
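
A minimal Python sketch of tasks connected by conditional links is shown below. It is a simplification under stated assumptions: tasks run one at a time rather than concurrently, a task starts as soon as any one incoming link fires, and all task names and helper functions are hypothetical.

```python
def run_workflow(tasks, links, start):
    """Run tasks by following links whose condition holds.

    tasks: name -> callable returning True on success.
    links: list of (from_task, to_task, condition); the condition
           receives the success flag of from_task.
    """
    status = {}
    order = []
    pending = [start]
    while pending:
        name = pending.pop(0)
        if name in status:
            continue  # already ran through another link
        status[name] = tasks[name]()
        order.append(name)
        for source, target, condition in links:
            if source == name and condition(status[name]):
                pending.append(target)
    return order, status


def on_success(ok):
    return ok


def on_failure(ok):
    return not ok


tasks = {
    "extract": lambda: True,
    "load_a": lambda: True,
    "load_b": lambda: False,          # simulated failure
    "notify_failure": lambda: True,
}
links = [
    ("extract", "load_a", on_success),
    ("extract", "load_b", on_success),
    ("load_b", "notify_failure", on_failure),
]

order, status = run_workflow(tasks, links, "extract")
```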

Unit 1


Workflow Structure
Workflow 1
Session 1
Worklet A: Session A1, Session A2, Session A3
Worklet B: Session B1, Session B2
Worklet C: Session C1, Session C2


Workflow Schedule
A workflow can be scheduled to run continuously, to repeat at a given time or
interval, or to be started manually.
The Integration Service keeps running a scheduled workflow unless the prior workflow run fails.
When a workflow fails, the Integration Service removes the workflow from the
schedule, and you must reschedule it.
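
The rule that a failed run removes the workflow from the schedule can be sketched as a small loop. The function name, the max_runs cap and the simulated workflow are hypothetical; the sketch only mirrors the behaviour described above.

```python
def run_schedule(workflow, max_runs):
    """Run a scheduled workflow repeatedly; unschedule it after a failure.

    workflow: callable taking the run number and returning True on success.
    Returns (number of runs performed, whether it is still scheduled).
    """
    scheduled = True
    runs = 0
    while scheduled and runs < max_runs:
        runs += 1
        if not workflow(runs):
            scheduled = False  # removed from the schedule; must be rescheduled
    return runs, scheduled


# Simulated workflow that fails on its third run.
fails_third = run_schedule(lambda n: n != 3, max_runs=10)
# Simulated workflow that never fails.
never_fails = run_schedule(lambda n: True, max_runs=5)
```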


Workflow Monitor
Check Workflow Status
Recover Workflow
Get session log


Recap
1. Workflow
2. Worklet
3. Task
4. Workflow Manager
5. Workflow Monitor

a. A collection of ordered tasks
b. Set of tasks
c. An executable mapping, functions or commands
d. Create and start workflows
e. Monitor and control workflows

Unit 1


PowerCenter Architecture
Diagram labels: Domain, Sources, Integration Service, Repository Service, Repository Service Process, Targets, Administration Console, PowerCenter Client, Repository


Architecture Components

A domain is a collection of nodes and services. It is the primary unit of administration.

The Repository Service manages connections to the PowerCenter repository from
client applications. The Repository Service is a separate, multi-threaded process that
retrieves, inserts, and updates metadata in the repository database tables. The
Repository Service ensures the consistency of metadata in the repository.

The Integration Service reads mapping and session information from the repository.
It extracts data from the mapping sources and stores the data in memory while it
applies the transformation rules that you configure in the mapping. The Integration
Service loads the transformed data into the mapping targets.

The Administration Console is a web application that you use to manage a
PowerCenter domain. If you have a user login to the domain, you can access the
Administration Console. Use the Administration Console to perform administrative
tasks such as managing logs, user accounts, and domain objects. Domain objects
include services, nodes, and licenses.

The PowerCenter repository resides in a relational database. The repository
database tables contain the instructions required to extract, transform, and load data.
PowerCenter Client applications access the repository database tables through the
Repository Service.

Metadata
Defines data and processes
Examples:
Source and target definitions
Type (flat file, database table, XML file, etc.)
Datatype (character string, integer, decimal, etc.)
Other attributes (length, precision, etc.)

Mapping logic
Workflow logic

Stored in a metadata repository
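
To make "metadata defines data" concrete, a source definition can be pictured as a small record that describes a table without holding any of its rows. The Python classes below are a hypothetical illustration and do not reflect the actual layout of the repository tables.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    datatype: str        # e.g. "string", "integer", "decimal"
    length: int = 0
    precision: int = 0


@dataclass
class SourceDefinition:
    name: str
    source_type: str     # e.g. "flat file", "database table", "XML file"
    columns: list = field(default_factory=list)


# Describes the shape of a source; it contains no customer rows itself.
customers = SourceDefinition(
    name="CUSTOMERS",
    source_type="database table",
    columns=[
        Column("CUSTOMER_ID", "integer", precision=10),
        Column("CUSTOMER_TYPE", "string", length=20),
    ],
)
```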

Repository


Recap
Match the terms and explanations:
1. Metadata
2. Repository
3. Repository Manager
4. Integration Service

a. Defines data and processes
b. Collection of tables that contains PowerCenter metadata
c. Repository organization and security
d. ETL processing engine

Unit 1


Where do we use PowerCenter?

Data Warehouse (SalesVision) and Data Mart (Horizon) Loads
Customer Hub Load
Interfaces
PowerCafe Orders → PeopleSoft
Magic Leads → PowerCafe
Customer Portal Online Support Access → Atlas
ADS Sales Rep Accounts → SalesPortal LDAP


PowerCenter Connect Options

Packaged Applications and Systems: Hyperion Essbase, Lotus Notes, PeopleSoft, SAP Netweaver BW, SAS, Siebel
Databases and Flat Files: DB2, Flat files, Informix, Netezza, SQL Server, Sybase, Teradata, Web logs
Messaging and Standards: HTTP, IBM MQSeries, JMS, LDAP, MSMQ, ODBC, TIBCO Rendezvous, webMethods, Web Services, XML
Hierarchical*: Adabas, C-ISAM, Complex flat files, Datacom, IDMS, IMS, VSAM
Software as a Service (SaaS): salesforce.com


Questions?
