
1. GETTING STARTED

• Metadata Extensions tab
• Execute SQL to create table

• A transformation is a part of a mapping that generates or modifies data.

• A session is a set of instructions that tells the Integration Service how to move data from sources to targets.
• A session is a task, similar to other tasks available in the Workflow Manager.
• You create a session for each mapping that you want the Integration Service to run.
• The Integration Service uses the instructions configured in the session and mapping to move data from sources to targets.

• A workflow is a set of instructions that tells the Integration Service how to execute tasks, such as sessions, email notifications, and shell commands.
• You create a workflow for the sessions you want the Integration Service to run.
• You can include multiple sessions in a workflow to run sessions in parallel or sequentially.
• The Integration Service uses the instructions configured in the workflow to run sessions and other tasks.

• **Select a code page for the database connection (DB Connection Objects)
• The source code page must be a subset of the target code page.
• The target code page must be a superset of the source code page.

• You can create reusable or non-reusable sessions in the Workflow Manager.
• When you create a workflow, you can include reusable tasks that you create in the Task Developer.
• You can also include non-reusable tasks that you create in the Workflow Designer.

• By default, the workflow is scheduled to run on demand. The Integration Service only runs the workflow when you manually start the workflow. You can configure workflows to run on a schedule.
• All workflows begin with the Start task, but you need to instruct the Integration Service which task to run next.
• To do this, you link tasks in the Workflow Manager.

• When the PowerCenter Integration Service runs workflows, you can monitor workflow progress in the Workflow Monitor.
• You can view details about a workflow or task in either a Gantt chart view or a Task view. You can start, stop, and abort workflows from the Workflow Monitor.
• The Workflow Monitor displays workflows that have run at least once.

• The source qualifier represents the rows that the Integration Service reads from the source when it runs a session.
• Every mapping includes a Source Qualifier transformation, representing all data read from a source and temporarily stored by the Integration Service.
• The Advanced Transformation toolbar contains transformations such as Java, SQL, and XML Parser transformations.

Creating Target -
• You can also manually create a target definition, import the definition for an existing target from a database, or create a relational target from a transformation in the Designer.
• Click the Indexes tab to add an index to the target table.
• If the target database is Oracle, skip to the final step. You cannot add an index to a column that already has the PRIMARY KEY constraint added to it.

• Click Layout > Link Columns.
• When you drag ports from one transformation to another, the Designer copies the port description and links the original port to its copy.
• If you click Layout > Copy Columns, every port you drag is copied, but not linked.
• Link Columns vs. Copy Columns??

• By default, the Lookup transformation queries and stores the contents of the lookup table before the rest of the transformation runs, so it performs the join through a local copy of the table that it has cached.
• If the data types, including precision and scale, of these two columns do not match, the Designer displays a message and marks the mapping invalid.

• **Overview window
• When you run the workflow, the Integration Service runs all sessions in the workflow, either simultaneously or in sequence, depending on how you arrange the sessions in the workflow.
• Click the Gantt Chart tab at the bottom of the Time window to verify the Workflow Monitor is in Gantt Chart view.
• Note: You can also click the Task View tab at the bottom of the Time window to view the Workflow Monitor in Task view. You can switch back and forth between views at any time.
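The start, stop, and abort operations above also have command line equivalents in pmcmd, which is covered in the Admin Guide notes below. A minimal sketch, assuming the same hypothetical service, domain, user, folder, and workflow names used in the later pmcmd examples (exact options can vary by version):

# start, stop, or abort a workflow from the command line
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg
pmcmd stopworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg
pmcmd abortworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg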

• From the list of source definitions, add the following source definitions to the mapping:
- PROMOTIONS
- ITEMS_IN_PROMOTIONS
- ITEMS
- MANUFACTURERS
- ORDER_ITEMS
• Delete all Source Qualifier transformations that the Designer creates when you add these source definitions.
• Add a Source Qualifier transformation named SQ_AllData to the mapping, and connect all the source definitions to it.

Sequence Generator -
• The starting number (normally 1).
• The current value stored in the repository.
• The number that the Sequence Generator transformation adds to its current value for every request for a new ID.
• The maximum value in the sequence.
• A flag indicating whether the Sequence Generator transformation counter resets to the minimum value once it has reached its maximum value.

• The Sequence Generator transformation has two output ports, NEXTVAL and CURRVAL, which correspond to the two pseudo-columns in a sequence. When you query a value from the NEXTVAL port, the transformation generates a new value.
• You cannot add any new ports to this transformation or reconfigure NEXTVAL and CURRVAL.

Defining a Link Condition

• After you create links between tasks, you can specify conditions for each link to determine the order of execution in the workflow. If you do not specify conditions for each link, the Integration Service executes the next task in the workflow by default.
• If the link condition evaluates to True, the Integration Service runs the next task in the workflow. You can also use predefined or user-defined workflow variables in the link condition.
• You can use the -- or // comment indicators with the Expression Editor to add comments. Use comments to describe the expression (see the example after these notes).
• You can view results of link evaluation during workflow runs in the workflow log.
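A minimal sketch of a link condition that uses the comment indicator described above; the session name s_ItemSummary and the SUCCEEDED status check are hypothetical examples of a predefined task-status workflow variable, not names taken from these notes:

-- run the next task only when the preceding session succeeds
$s_ItemSummary.Status = SUCCEEDED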

XML – Creating sources/targets etc

2. INFA ADMIN GUIDE

• A domain is a collection of nodes and services that you can group in folders based on administration ownership.

• One node in the domain acts as a gateway to receive service requests from clients and route them to the appropriate service and node.
• Services and processes run on nodes in a domain.

Service Manager - A service that manages all domain operations.
Application Services - Services that represent server-based functionality, such as the Model Repository Service and the Data Integration Service.

• The Service Manager and application services control security.
• The Service Manager manages users and groups that can log in to application clients and authenticates the users who log in to the application clients.
• The Service Manager and application services authorize user requests from application clients.
• Informatica Administrator (the Administrator tool) consolidates the administrative tasks for domain objects such as services, nodes, licenses, and grids. You manage the domain and the security of the domain through the Administrator tool.

Nodes -
• Each node in the domain runs a Service Manager that manages domain operations on that node.
• The operations that the Service Manager performs depend on the type of node.
• A node can be a gateway node or a worker node.
• You can subscribe to alerts to receive notification about node events such as node failure or a master gateway election.
• You can also generate and upload node diagnostics to the Configuration Support Manager and review information such as available EBFs and Informatica recommendations.

Gateway Nodes -
• One node acts as the gateway at any given time.
• That node is called the master gateway.
• A gateway node can run application services, and it can serve as a master gateway node.
• The master gateway node is the entry point to the domain.
• You can configure more than one node to serve as a gateway.
• If the master gateway node becomes unavailable, the Service Manager on other gateway nodes elects another master gateway node.
• If you configure one node to serve as the gateway and the node becomes unavailable, the domain cannot accept service requests.

Worker Nodes -
• A worker node is any node not configured to serve as a gateway.

Service Manager -
• It runs as a service on Windows and as a daemon on UNIX.
• When you start Informatica services, you start the Service Manager.
• The Service Manager runs on each node.
• If the Service Manager is not running, the node is not available.

Application service support -
• It starts and stops services and service processes based on requests from clients.
• It also directs service requests to application services.
• The Service Manager uses TCP/IP to communicate with the application services.

Domain support -
• The functions that the Service Manager performs on a node depend on the type of node.
• For example, the Service Manager running on the master gateway node performs all domain functions on that node.
• The Service Manager running on any other node performs some domain functions on that node.

Application services represent server-based functionality. Application services include the following services:
- Analyst Service
- Content Management Service
- Data Integration Service
- Metadata Manager Service
- Model Repository Service
- PowerCenter Integration Service
- PowerCenter Repository Service
- PowerExchange Listener Service
- PowerExchange Logger Service
- Reporting Service
- SAP BW Service
- Web Services Hub

High Availability
High availability consists of the following components:
Resilience - The ability of application services to tolerate transient network failures until either the resilience timeout expires or the external system failure is fixed.
Failover - The migration of an application service or task to another node when the node running the service process becomes unavailable.
Recovery - The automatic completion of tasks after a service is interrupted. Automatic recovery is available for PowerCenter Integration Service and PowerCenter Repository Service tasks. You can also manually recover PowerCenter Integration Service workflows and sessions. Manual recovery is not part of high availability.

Informatica Administrator
Use the Administrator tool to complete the following types of tasks:

Domain administrative tasks -
• Manage logs, domain objects, user permissions, and domain reports.
• Generate and upload node diagnostics.
• Monitor jobs and applications that run on the Data Integration Service.
• Domain objects include application services, nodes, grids, folders, database connections, operating system profiles, and licenses.

Security administrative tasks -
• Manage users, groups, roles, and privileges.

The Administrator tool has the following tabs:
Domain -
• View and edit the properties of the domain and objects within the domain.
• The contents that appear and the tasks you can complete on the Domain tab vary based on the view that you select.
• You can select the following views:
- Services and Nodes - View and manage application services, nodes, grids, and licenses.
- Connections - View and manage connections.
Logs - View log events for the domain and services within the domain.
Monitoring - View the status of profile jobs, scorecard jobs, preview jobs, mapping jobs, and SQL data services for each Data Integration Service.
Reports - Run a Web Services Report or License Management Report.
Security - Manage users, groups, roles, and privileges.

Folder Management
• Folders can contain nodes, services, grids, licenses, and other folders.

User Accounts

Default Administrator -
• The default administrator is a user account in the native security domain.
• You cannot create a default administrator.
• You cannot disable or modify the user name or privileges of the default administrator.
• You can change the default administrator password.

Domain Administrator -
• A domain administrator can create and manage objects in the domain, including user accounts, nodes, grids, licenses, and application services.
• However, by default, the domain administrator cannot log in to application clients.
• The default administrator must explicitly give a domain administrator full permissions and privileges to the application services.

Application Client Administrator -
• An application client administrator can create and manage objects in an application client.
• You must create administrator accounts for the application clients.
• By default, the application client administrator does not have permissions or privileges on the domain.

User -
• A user with an account in the Informatica domain can perform tasks in the application clients.
• Typically, the default administrator or a domain administrator creates and manages user accounts and assigns roles, permissions, and privileges in the Informatica domain.
• However, any user with the required domain privileges and permissions can create a user account and assign roles, permissions, and privileges.

Understanding Authentication and Security Domains -
• When a user logs in to an application client, the Service Manager authenticates the user account in the Informatica domain and verifies that the user can use the application client.
• The Service Manager uses native and LDAP authentication to authenticate users logging in to the Informatica domain.
• You can use more than one type of authentication in an Informatica domain.
• By default, the Informatica domain uses native authentication.
• You can configure the Informatica domain to use LDAP authentication in addition to native authentication.

Privileges
Informatica includes the following privileges:
Domain privileges - Determine actions on the Informatica domain that users can perform using the Administrator tool and the infacmd and pmrep command line programs.
Analyst Service privilege - Determines actions that users can perform using Informatica Analyst.
Data Integration Service privilege - Determines actions on applications that users can perform using the Administrator tool and the infacmd command line program. This privilege also determines whether users can drill down and export profile results.
Metadata Manager Service privileges - Determine actions that users can perform using Metadata Manager.
Model Repository Service privilege - Determines actions on projects that users can perform using Informatica Analyst and Informatica Developer.
PowerCenter Repository Service privileges - Determine PowerCenter repository actions that users can perform using the Repository Manager, Designer, Workflow Manager, Workflow Monitor, and the pmrep and pmcmd command line programs.
PowerExchange application service privileges - Determine actions that users can perform on the PowerExchange Listener Service and PowerExchange Logger Service using the infacmd pwx commands.
Reporting Service privileges - Determine reporting actions that users can perform using Data Analyzer.

• You assign privileges to users and groups and to application services.
• You assign privileges to users and groups on the Security tab of the Administrator tool.

High Availability
• If you have the high availability option, you can achieve full high availability of internal Informatica components.
• You can achieve high availability with external components based on the availability of those components.
• If you do not have the high availability option, you can achieve some high availability of internal components.

Example
• While you are fetching a mapping into the PowerCenter Designer workspace, the PowerCenter Repository Service becomes unavailable, and the request fails. The PowerCenter Repository Service fails over to another node because it cannot restart on the same node.
• The PowerCenter Designer is resilient to temporary failures and tries to establish a connection to the PowerCenter Repository Service.
• The PowerCenter Repository Service starts within the resilience timeout period, and the PowerCenter Designer reestablishes the connection.
• After the PowerCenter Designer reestablishes the connection, the PowerCenter Repository Service recovers from the failed operation and fetches the mapping into the PowerCenter Designer workspace.

Resilience -
• All clients of PowerCenter components are resilient to service failures.
• PowerCenter services may also be resilient to temporary failures of external systems, such as database systems, FTP servers, and message queue sources.
• For this type of resilience to work, the external systems must be highly available.

Internal Resilience
• You can configure internal resilience at the following levels:
- Domain
- Application Services
- Gateway

• The Model Repository, Data Integration Service, and Analyst Service do not have internal resilience.
• If the master gateway node becomes unavailable and fails over to another gateway node, you must restart these services.
• After the restart, the services do not restore the state of operation and do not recover from the point of interruption.
• You must restart jobs that were previously running during the interruption.

High Availability in the Base Product
• Informatica provides some high availability functionality that does not require the high availability option.
• The base product provides the following high availability functionality:
- Internal PowerCenter resilience - The Service Manager, application services, PowerCenter Client, and command line programs are resilient to temporary unavailability of other PowerCenter internal components.
- PowerCenter Repository database resilience - The PowerCenter Repository Service is resilient to temporary unavailability of the repository database.
- Restart services - The Service Manager can restart application services after a failure.
- Manual recovery of PowerCenter workflows and sessions - You can manually recover PowerCenter workflows and sessions.

- Multiple gateway nodes - You can configure multiple nodes as gateway.
Note: You must have the high availability option for failover and automatic recovery.

You can configure the following resilience properties for the domain, application services, and command line programs:
Resilience timeout - The amount of time a client tries to connect or reconnect to a service. A limit on resilience timeouts can override the timeout.
Limit on resilience timeout - The amount of time a service waits for a client to connect or reconnect to the service. This limit can override the client resilience timeouts configured for a connecting client. This is available for the domain and application services.

Configuring Service Resilience for the Domain
• The domain resilience timeout determines how long services try to connect as clients to other services.
• The default value is 30 seconds.
• The limit on resilience timeout is the maximum amount of time that a service allows another service to connect as a client.
• This limit overrides the resilience timeout for the connecting service if the resilience timeout is a greater value.
• The default value is 180 seconds.
• The PowerCenter Client resilience timeout is 180 seconds and is not configurable.
• This resilience timeout is bound by the service limit on resilience timeout.

When you use a command line program to connect to the domain or an application service, the resilience timeout is determined by one of the following values (see the sketch after this list):
Command line option - You can determine the resilience timeout for command line programs by using a command line option, -timeout or -t, each time you run a command.
Environment variable - If you do not use the timeout option in the command line syntax, the command line program uses the value of the environment variable INFA_CLIENT_RESILIENCE_TIMEOUT that is configured on the client machine.
Default value - If you do not use the command line option or the environment variable, the command line program uses the default resilience timeout of 180 seconds.
Limit on timeout - If the limit on resilience timeout for the service is smaller than the command line resilience timeout, the command line program uses the limit as the resilience timeout.
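For example, a hedged sketch of the environment variable and the command line option; the service, domain, user, and folder names are the hypothetical ones used in the pmcmd examples later in these notes, and exact option placement can vary by pmcmd version:

# set the client resilience timeout through the environment variable named above
export INFA_CLIENT_RESILIENCE_TIMEOUT=60
# or pass -t (-timeout) on an individual pmcmd command
pmcmd startworkflow -sv MyIntService -d MyDomain -t 60 -u seller3 -p jackson -f SalesEast wf_SalesAvg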

PowerCenter Integration Service
• PowerCenter Integration Service files include run-time files, state of operation files, and session log files.
• The PowerCenter Integration Service creates files to store the state of operations for the service.
• The state of operations includes information such as the active service requests, scheduled tasks, and completed and running processes.
• If the service fails, the PowerCenter Integration Service can restore the state and recover operations from the point of interruption.

To move data from sources to targets, the PowerCenter Integration Service uses the following components:
PowerCenter Integration Service process - The PowerCenter Integration Service starts one or more PowerCenter Integration Service processes to run and monitor workflows. When you run a workflow, the PowerCenter Integration Service process starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions.
Load Balancer - The PowerCenter Integration Service uses the Load Balancer to dispatch tasks. The Load Balancer dispatches tasks to achieve optimal performance. It may dispatch tasks to a single node or across the nodes in a grid.
Data Transformation Manager (DTM) process - The PowerCenter Integration Service starts a DTM process to run each Session and Command task within a workflow. The DTM process performs session validations, creates threads to initialize the session, read, write, and transform data, and handles pre- and post-session operations.

• The PowerCenter Integration Service process accepts requests from the PowerCenter Client and from pmcmd and performs the following tasks:
- Manage workflow scheduling.
- Lock and read the workflow.
- Read the parameter file.
- Create the workflow log.
- Run workflow tasks and evaluate the conditional links connecting tasks.
- Start the DTM process or processes to run the session.
- Write historical run information to the repository.
- Send post-session email in the event of a DTM failure.

Thread Types
• The types of threads the master thread creates depend on the pre- and post-session properties, as well as the types of transformations in the mapping.
The master thread can create the following types of threads:
Mapping threads - The master thread creates one mapping thread for each session. The mapping thread fetches session and mapping information, compiles the mapping, and cleans up after session execution.
Pre- and post-session threads - The master thread creates one pre-session and one post-session thread to perform pre- and post-session operations.
Reader threads -
Transformation threads - The number of transformation threads depends on the partitioning information for each pipeline.
Writer threads -

Reject Files
• By default, the PowerCenter Integration Service process creates a reject file for each target in the session.
The writer may reject a row in the following circumstances:
- It is flagged for reject by an Update Strategy or Custom transformation.
- It violates a database constraint such as a primary key constraint.
- A field in the row was truncated or overflowed, and the target database is configured to reject truncated or overflowed data.
Note: If you enable row error logging, the PowerCenter Integration Service process does not create a reject file.

Recovery Tables Files
• The PowerCenter Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery.

Control File
• When you run a session that uses an external loader, the PowerCenter Integration Service process creates a control file and a target flat file.
• The control file contains information about the target flat file, such as data format and loading instructions for the external loader.
• The control file has an extension of .ctl.
• The PowerCenter Integration Service process creates the control file and the target flat file in the PowerCenter Integration Service variable directory, $PMTargetFileDir, by default.

Indicator File
• If you use a flat file as a target, you can configure the PowerCenter Integration Service to create an indicator file for target row type information.
• For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
• The PowerCenter Integration Service process names this file target_name.ind and stores it in the PowerCenter Integration Service variable directory, $PMTargetFileDir, by default.

Cache Files
• When the PowerCenter Integration Service process creates memory cache, it also creates cache files.
• The PowerCenter Integration Service process creates cache files for the following mapping objects:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
- XML target
• By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.
• The PowerCenter Integration Service process names the index file PM*.idx and the data file PM*.dat.
• The PowerCenter Integration Service process creates the cache file for a Sorter transformation in the $PMTempDir service process variable directory.

Incremental Aggregation Files
• If the session performs incremental aggregation, the PowerCenter Integration Service process saves index and data cache information to disk when the session finishes.
• The next time the session runs, the PowerCenter Integration Service process uses this historical information to perform the incremental aggregation.
• By default, the DTM creates the index and data files in the directory configured for the $PMCacheDir service process variable.

Persistent Lookup Cache
• By default, the DTM creates the index and data files in the directory configured for the $PMCacheDir service process variable.

Load Balancer
You configure the following settings for the domain to determine how the Load Balancer dispatches tasks:
Dispatch mode - The dispatch mode determines how the Load Balancer dispatches tasks. You can configure the Load Balancer to dispatch tasks in a simple round-robin fashion, in a round-robin fashion using node load metrics, or to the node with the most available computing resources.
Service level - Service levels establish dispatch priority among tasks that are waiting to be dispatched. You can create different service levels that a workflow developer can assign to workflows.

You configure the following Load Balancer settings for each node:
Resources - When the PowerCenter Integration Service runs on a grid, the Load Balancer can compare the resources required by a task with the resources available on each node. The Load Balancer dispatches tasks to nodes that have the required resources. You assign required resources in the task properties. You configure available resources using the Administrator tool or infacmd.
CPU profile - In adaptive dispatch mode, the Load Balancer uses the CPU profile to rank the computing throughput of each CPU and bus architecture in a grid. It uses this value to ensure that more powerful nodes get precedence for dispatch.
Resource provision thresholds - The Load Balancer checks one or more resource provision thresholds to determine if it can dispatch a task. The Load Balancer checks different thresholds depending on the dispatch mode.

The Load Balancer uses the following dispatch modes:
Round-robin - The Load Balancer dispatches tasks to available nodes in a round-robin fashion. It checks the Maximum Processes threshold on each available node and excludes a node if dispatching a task causes the threshold to be exceeded. This mode is the least compute-intensive and is useful when the load on the grid is even and the tasks to dispatch have similar computing requirements.
Metric-based - The Load Balancer evaluates nodes in a round-robin fashion. It checks all resource provision thresholds on each available node and excludes a node if dispatching a task causes the thresholds to be exceeded. The Load Balancer continues to evaluate nodes until it finds a node that can accept the task. This mode prevents overloading nodes when tasks have uneven computing requirements.
Adaptive - The Load Balancer ranks nodes according to current CPU availability. It checks all resource provision thresholds on each available node and excludes a node if dispatching a task causes the thresholds to be exceeded. This mode prevents overloading nodes and ensures the best performance on a grid that is not heavily loaded.

USING PMCMD
• pmcmd is a program you use to communicate with the Integration Service.
• With pmcmd, you can perform some of the tasks that you can also perform in the Workflow Manager, such as starting and stopping workflows and sessions.

Use pmcmd in the following modes:
Command line mode - You invoke and exit pmcmd each time you issue a command. You can write scripts to schedule workflows with the command line syntax. Each command you write in command line mode must include connection information to the Integration Service.
Interactive mode - You establish and maintain an active connection to the Integration Service. This lets you issue a series of commands.

• You can use environment variables for user names and passwords with pmcmd.
• You can also use environment variables to customize the way pmcmd displays the date and time on the machine running the Integration Service process.
• Before you use pmcmd, configure these variables on the machine running the Integration Service process.
• The environment variables apply to pmcmd commands that run on the node.
• Note: If the domain is a mixed-version domain, run pmcmd from the installation directory of the Integration Service version.

Running Commands in Command Line Mode
• When you run pmcmd in command line mode, you enter connection information such as domain name, Integration Service name, user name, and password in each command.
• For example, to start the workflow "wf_SalesAvg" in folder "SalesEast," use the following syntax:
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg

Running Commands in Interactive Mode
• Use pmcmd in interactive mode to start and stop workflows and sessions without writing a script.
• When you use the interactive mode, you enter connection information such as domain name, Integration Service name, user name, and password. You can run subsequent commands without entering the connection information for each command.
• For example, the following commands invoke the interactive mode, establish a connection to Integration Service "MyIntService," and start workflows "wf_SalesAvg" and "wf_SalesTotal" in folder "SalesEast":
pmcmd
pmcmd> connect -sv MyIntService -d MyDomain -u seller3 -p jackson
pmcmd> setfolder SalesEast
pmcmd> startworkflow wf_SalesAvg
pmcmd> startworkflow wf_SalesTotal

USING PMREP
• pmrep is a command line program that you use to update repository information and perform repository functions.
• pmrep is installed in the PowerCenter Client and PowerCenter Services bin directories.
• Use pmrep to perform repository administration tasks such as listing repository objects, creating and editing groups, restoring and deleting repositories, and updating session-related parameters and security information in the PowerCenter repository.

• When you use pmrep, you can enter commands in the following modes:
Command line mode - You can issue pmrep commands directly from the system command line. Use command line mode to script pmrep commands.
Interactive mode - You can issue pmrep commands from an interactive prompt. pmrep does not exit after it completes a command.

• You can use environment variables to set user names and passwords for pmrep. Before you use pmrep, configure these variables.
• The environment variables apply to pmrep commands that run on the node.

• All pmrep commands require a connection to the repository except for the following commands:
- Help
- ListAllPrivileges
• Use the pmrep Connect command to connect to the repository before using other pmrep commands.
• Note: If the domain is a mixed-version domain, run pmrep from the installation directory of the Repository Service version.
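For example, a hedged sketch of connecting and then listing objects with pmrep in command line mode, reusing the hypothetical domain, user, and folder names from the pmcmd examples above (the repository name MyRepository is also hypothetical):

# connect to the repository once; later pmrep commands reuse this connection
pmrep connect -r MyRepository -d MyDomain -n seller3 -x jackson
# list the workflows in the SalesEast folder
pmrep listobjects -o workflow -f SalesEast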

3. DESIGNER GUIDE

The Designer provides the following tools:
Source Analyzer - Import or create source definitions for flat file, XML, COBOL, Application, and relational sources.
Target Designer - Import or create target definitions.
Transformation Developer - Create reusable transformations.
Mapplet Designer - Create mapplets.
Mapping Designer - Create mappings.

The Designer consists of the following windows:
Navigator - Connect to multiple repositories and folders. You can also copy and delete objects and create shortcuts using the Navigator.
Workspace - View or edit sources, targets, mapplets, transformations, and mappings. You work with a single tool at a time in the workspace, which has two formats: default and workbook. You can view multiple versions of an object in the workspace.

Status bar - Displays the status of the operation you perform.
Output - Provides details when you perform certain tasks, such as saving work or validating a mapping. Right-click the Output window to access window options, such as printing output text, saving text to a file, and changing the font size.
Overview - View workbooks that contain large mappings or a lot of objects. The Overview window outlines the visible area in the workspace and highlights selected objects in color. To open the Overview window, click View > Overview Window.
Instance Data - View transformation data while you run the Debugger to debug a mapping.
Target Data - View target data while you run the Debugger to debug a mapping.

You can view a list of open windows, and you can switch from one window to another in the Designer. To view the list of open windows, click Window > Windows.

**Configuring Designer options

Creating a Toolbar
• You can create a new toolbar and choose buttons for the new toolbar.
• You can create toolbars in the Designer, Workflow Manager, and the Workflow Monitor.

Find Next
• Use the Find Next tool to search for a column or port name in:
- Transformations
- Mapplets
- Source definitions
- Target definitions
• With the Find Next tool, you can search one object at a time.
• You cannot search multiple objects at the same time.
• Use Find Next in each Designer tool.
• Select a single transformation or click in the Output window before performing the search.
• The Designer saves the last 10 strings searched in the Find Next box on the Standard toolbar.
• You can search for a string in the Save, Generate, or Validate tabs in the Output window.

• The Find in Workspace tool searches for a field name or transformation name in all transformations in the workspace.
• The Find in Workspace tool lets you search all of the transformations in the workspace for port or transformation names.
• You can search for column or port names or table names matching the search string.
• You can specify whether to search across all names in the workspace, or across the business name of a table, column, or port.
• You can also choose to search for whole word matches for the search string or matches that match the case of the search string.

You can complete the following tasks in each Designer tool:
- Add a repository.
- Print the workspace.
- View the date and time an object was last saved.
- Open and close a folder.
- Create shortcuts (you cannot create shortcuts to objects in non-shared folders).
- Check out and check in repository objects.
- Search for repository objects.
- Enter descriptions for repository objects.
- View older versions of objects in the workspace.
- Revert to a previously saved object version.
- Copy objects.
- Export and import repository objects.
- Work with multiple objects, ports, or columns.
- Rename ports.
- Use shortcut keys.
• You can also view object dependencies in the Designer.

Rules and Guidelines for Viewing and Comparing Versioned Repository Objects
• You cannot simultaneously view multiple versions of composite objects, such as mappings and mapplets.
• Older versions of composite objects might not include the child objects that were used when the composite object was checked in.
• If you open a composite object that includes a child object version that is purged from the repository, the preceding version of the child object appears in the workspace as part of the composite object.
• For example, you want to view version 5 of a mapping that originally included version 3 of a source definition, but version 3 of the source definition is purged from the repository. When you view version 5 of the mapping, version 2 of the source definition appears as part of the mapping.
• Shortcut objects are not updated when you modify the objects they reference. When you open a shortcut object, you view the same version of the object that the shortcut originally referenced, even if subsequent versions exist.

Viewing an Older Version of a Repository Object
To open an older version of an object in the workspace:
1. In the workspace or Navigator, select the object and click Versioning > View History.
2. Select the version you want to view in the workspace and click Tools > Open in Workspace.
Note: An older version of an object is read-only, and the version number appears as a prefix before the object name. You can simultaneously view multiple versions of a non-composite object in the workspace.

Reverting to a Previous Object Version
• When you edit an object in the Designer, you can revert to a previously saved version, undoing changes you entered since the last save.
• You can revert to the previously saved versions of multiple objects at the same time.

To revert to a previously saved version of an object:
1. Open the object in the workspace.
2. Select the object and click Edit > Revert to Saved.
3. Click Yes. If you selected more than one object, click Yes to All.
• The Designer removes all changes entered since the last time you saved the object.

Copying Designer Objects
• You can copy Designer objects within the same folder, to a different folder, or to a different repository.
• You can copy any of the Designer objects such as sources, targets, mappings, mapplets, transformations, and dimensions.
• You must open the target folder before you can copy objects to it.
• The Copy Wizard checks for conflicts in the target folder and provides choices to resolve the conflicts.
• The Copy Wizard displays possible resolutions.
• For a duplicate object, you can rename, reuse, replace, or skip copying the object.
• To configure display settings and functions of the Copy Wizard, click Tools > Options in the Designer.
• You can import objects from an XML file through the Import Wizard in the Designer.
• The Import Wizard provides the same options to resolve conflicts as the Copy Wizard.

Working with Multiple Ports or Columns
• In all Designer tools, you can move or delete multiple ports or columns at the same time.
• Note: You cannot select multiple ports or columns when editing COBOL sources in the Source Analyzer.
• Note: When you select multiple ports or columns, the Designer disables add, copy, and paste.

Working with Metadata Extensions
• You can extend the metadata stored in the repository by associating information with individual repository objects.
• For example, you may want to store contact information with the sources you create.
• You associate information with repository objects using metadata extensions.
• Repository objects can contain both vendor-defined and user-defined metadata extensions.
• You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
• You can create, edit, delete, and view user-defined metadata extensions and change their values.
• You can create metadata extensions for the following objects in the Designer:
- Source definitions
- Target definitions
- Transformations
- Mappings
- Mapplets
• You can create either reusable or non-reusable metadata extensions.
• You associate reusable metadata extensions with all repository objects of a certain type, such as all source definitions or all Expression transformations.
• You associate non-reusable metadata extensions with a single repository object, such as one target definition or one mapping.
• If you create a reusable metadata extension for a transformation, the metadata extension applies to all transformations of that type (for example, all Aggregator transformations or all Router transformations), and not to all transformations.
• Note: If you make a metadata extension reusable, you cannot change it back to non-reusable. The Designer makes the extension reusable as soon as you confirm the action.

Editing Reusable Metadata Extensions
• If the metadata extension you want to edit is reusable and editable, you can change the value of the metadata extension, but not any of its properties. However, if the vendor or user who created the metadata extension did not make it editable, you cannot edit the metadata extension or its value.
• To restore the default value for a metadata extension, click Revert in the UnOverride column.

Editing Non-Reusable Metadata Extensions
• If the metadata extension you want to edit is non-reusable, you can change the value of the metadata extension and its properties.
• You can also promote the metadata extension to a reusable metadata extension.
• To restore the default value for a metadata extension, click Revert in the UnOverride column.

Using Business Names
• You can add business names to sources, targets, and columns.
• Business names are descriptive names that you give to a source, target, or column.
• They appear in the Navigator in the Business Components source node and in the source and target nodes.
• Business names can also appear as column names of the source and target definition in the workspace.
• You can also create source qualifiers to display business names as column names in the Mapping and Mapplet Designers.

Using Business Documentation
• Business documentation provides details about a repository object or transformation expression.
• You can create and edit links to business documentation that you have developed for repository objects through the Designer.
• The documentation must reside on a local machine, network server, or company intranet or internet web site in a Windows environment.
• You can develop business documentation in HTML, PDF, or any text format, for the following repository objects:
- Source and target tables and table instances
- All transformations and transformation instances
- Mapplets
- Mappings
- Business component directories

• To access business documentation, you need to complete the following tasks:
- Specify the documentation path in the Designer.
- Create a link in the repository object.
- Click the link to view the documentation.

Viewing Mapplet and Mapping Reports
• You can view PowerCenter Repository Reports for mappings and mapplets in the Designer.
• View reports to get more information about the sources, targets, ports, and transformations in mappings and mapplets.
• When you view a report, the Designer launches the Data Analyzer application in a browser window and displays the report.
• You can view the following reports:
- Mapplet Composite Report
- Mapping Composite Report
• Before you run reports from the Designer, create a Reporting Service in the PowerCenter domain that contains the PowerCenter repository.
• When you create a Reporting Service for a PowerCenter repository, Data Analyzer imports the PowerCenter Repository Reports.

Viewing a Mapplet Composite Report
• The Mapplet Composite Report includes information about a mapplet:
- All objects. Information about all objects in the mapplet.
- Lookup transformations. Lookup transformations in the mapplet.
- Dependencies. Mappings that use the mapplet.
- Ports. Port details for the input and output ports.
- Sources. Source instances in the mapplet.
- Transformations. Transformations used in the mapplet.
To view a Mapplet Composite Report:
1. In the Designer, open a mapplet.
2. Right-click in the workspace and choose View Mapplet Report.

Viewing a Mapping Composite Report
• View a mapping report to get more information about the objects in a PowerCenter mapping.
• The Mapping Composite Report includes information about the following components in the mapping:
- Source and target fields. Fields used in mapping sources.
- Port connections. Port-level connections between objects.
- Transformation ports. Transformation ports for each transformation in the mapping.
- Unconnected ports. Unconnected ports in mapping objects.
- Object-level connections. Connections between all objects in the mapping.
To view a Mapping Composite Report:
1. In the Designer, open a mapping.
2. Right-click in the workspace and choose View Mapping Report.

You can import or create the following types of source definitions in the Source Analyzer:
- Relational tables, views, and synonyms
- Fixed-width and delimited flat files that do not contain binary data
- COBOL files
- XML files
- Web Services Description Language (WSDL)
- Data models using certain data modeling tools through Metadata Exchange for Data Models (an add-on product)
• You can import sources that use multibyte character sets.
• Source code pages must be a subset of the target code pages.

• Source definitions can be single- or multi-group.
• A single-group source has a single group in the source definition.
• Relational sources use a single-group source definition.
• A multi-group source has multiple groups in the source definition.
• Non-relational sources such as XML sources use multi-group source definitions.

Editing Relational Source Definitions
• You might want to manually edit a source definition to record properties that you cannot import from the source.
• You can edit a relational source definition to create key columns and key relationships.
• These relationships can be logical relationships. They do not have to exist in the database.

Working with COBOL Sources
• To provide support for mainframe source data, you can import a COBOL file as a source definition in the Designer.
• COBOL files are fixed-width files that may contain text and binary data.
• PowerCenter supports the following code pages for COBOL files:
- 7-bit ASCII
- EBCDIC-US
- 8-bit ASCII
- 8-bit EBCDIC
- ASCII-based MBCS
- EBCDIC-based MBCS
• You can import shift-sensitive COBOL files that do not contain shift keys.
• Define the shift states for each column in the COBOL source definition.

• COBOL sources often de-normalize data and compact the equivalent of separate table records into a single record.
• You use the Normalizer transformation to normalize these records in the mapping.
• COBOL files often represent the functional equivalent of multiple source tables within the same set of records.
• When you review the structure of the COBOL file, you can adjust the description to identify which groups of fields constitute a single pseudo-table.

Working with COBOL Copybooks
• The Designer cannot recognize a COBOL copybook (.cpy file) as a COBOL file (.cbl file) because it lacks the proper format.
• To import a COBOL copybook in the Designer, you can insert it into a COBOL file template by using the COBOL statement "copy".
• After you insert the copybook file into the COBOL file template, you can save the file as a .cbl file and import it in the Designer.
• If the .cbl file and the .cpy file are not in the same local directory, the Designer prompts for the location of the .cpy file.

• When the COBOL copybook file contains tabs, the Designer expands tabs into spaces.
• By default, the Designer expands a tab character into eight spaces.
• You can change this default setting in powrmart.ini.
• You can find powrmart.ini in the root directory of the PowerCenter Client installation.
• To change the default setting, add the following text to powrmart.ini:
[AnalyzerOptions]
TabSize=n
where n is the number of spaces the Designer reads for every tab character.
• To apply changes, restart the Designer.

• For example, the COBOL copybook file is called sample.cpy. The COBOL file below shows how to use the copy statement to insert the sample copybook into a COBOL file template:
identification division.
program-id. mead.
environment division.
select file-one assign to "fname".
data division.
file section.
fd FILE-ONE.
copy "sample.cpy".
working-storage section.
procedure division.
stop run.

Components in a COBOL Source File
• When you import a COBOL source, the Designer scans the file for the following components:
- FD Section
- Fields
- OCCURS
- REDEFINES

FD Section
• The Designer assumes that each FD entry defines the equivalent of a source table in a relational source and creates a different COBOL source definition for each such entry.
• For example, if the COBOL file has two FD entries, CUSTOMERS and ORDERS, the Designer creates one COBOL source definition containing the fields attributed to CUSTOMERS, and another with the fields that belong to ORDERS.

Fields
• The Designer identifies each field definition, reads the datatype, and assigns it to the appropriate source definition.

OCCURS
• COBOL files often contain multiple instances of the same type of data within the same record.
• For example, a COBOL file may include data about four different financial quarters, each stored in the same record.
• When the Designer analyzes the file, it creates a different column for each OCCURS statement in the COBOL file.
• These OCCURS statements define repeated information in the same record. Use the Normalizer transformation to normalize this information.
• For each OCCURS statement, the Designer creates the following items:
- One target table when you drag the COBOL source definition into the Target Designer.
- A primary-foreign key relationship.
- A generated column ID (GCID).

REDEFINES
• COBOL uses REDEFINES statements to build the description of one record based on the definition of another record.
• When you import the COBOL source, the Designer creates a single source that includes REDEFINES.
• The REDEFINES statement lets you specify multiple PICTURE clauses for the same physical data location.
• Therefore, you need to use Filter transformations to separate the data into the tables created by REDEFINES.
• For each REDEFINES:
- The Designer creates one target table when you drag the COBOL source definition into the Target Designer.
- The Designer creates one primary-foreign key relationship.
- The Designer creates a generated key (GK).
- You need a separate Filter transformation in the mapping.

Rules and Guidelines for Delimited File Settings
Delimited files are character-oriented and line sequential. Use the following rules and guidelines when you configure delimited files:
- The column and row delimiter character, quote character, and escape character must all be different for a source definition. These properties must also be contained in the source or target file code page.
- The escape character and delimiters must be valid in the code page of the source or target file.
Use the following rules and guidelines when you configure delimited file sources:
- In a quoted string, use the escape character to escape the quote character. If the escape character does not immediately precede a quote character, the Integration Service reads the escape character as an ordinary character.
- Use an escape character to escape the column delimiter. However, in a quoted string, you do not need to use an escape character to escape the delimiter since the quotes serve this purpose. If the escape character does not immediately precede a delimiter character, the Integration Service reads the escape character as an ordinary character.
- When two consecutive quote characters appear within a quoted string, the Integration Service reads them as one quote character. For example, the Integration Service reads the following quoted string as I'm going tomorrow: 2353,'I''m going tomorrow'MD
- The Integration Service reads a string as a quoted string only if the quote character you select is the first character of the field.
- If the field length exceeds the column size defined in the Source Qualifier transformation, the Integration Service truncates the field.
- If the row of data exceeds the larger of the line sequential buffer length or the total row size defined in the Source Qualifier transformation, the Integration Service drops the row and writes it to the session log file. To determine the row size defined in the Source Qualifier transformation, add the column precision and the delimiters, and then multiply the total by the maximum bytes per character.

Working with File Lists
• A file list is a file that contains the names and directories of each source file you want the Integration Service to use.
• When you configure a session to read a file list, the Integration Service reads rows of data from the different source files in the file list.
• To configure the mapping to write the source file name to each target row, add the CurrentlyProcessedFileName port to the flat file source definition.
• The Integration Service uses this port to return the source file name.

Creating Target Definitions
• You can create the following types of target definitions in the Target Designer:
Relational - Create a relational target for a particular database platform. Create a relational target definition when you want to use an external loader to the target database.
Flat file - Create fixed-width and delimited flat file target definitions.
XML file - Create an XML target definition to output data to an XML file.

• You can create target definitions in the following ways:
- Import the definition for an existing target. Import the target definition from a relational target or a flat file. The Target Designer uses a Flat File Wizard to import flat files.
- Create a target definition based on a source definition. Drag a source definition into the Target Designer to make a target definition.
- Create a target definition based on a transformation or mapplet. Drag a transformation into the Target Designer to make a target definition.
- Manually create a target definition. Create a target definition in the Target Designer.
- Design several related target definitions. Create several related target definitions at the same time. You can create the overall relationship, called a schema, and the target definitions, through wizards in the Designer.
• The Cubes and Dimensions Wizards follow common principles of data warehouse design to simplify the process of designing related targets.

Creating a Target Definition from a Transformation
• To create a relational target definition that closely matches a transformation in the repository, you can create the target from the transformation.
• Drag a transformation from the Navigator to the Target Designer, or create a target from a transformation in the Mapping Designer workspace.
• Create target definitions from the following types of transformations:
- Single-group transformations. Create a single target definition from a transformation with one output group.
- Multiple-group transformations. Create multiple target definitions from a transformation with multiple output groups.
- Normalizer transformations. Create a target definition from a source qualifier or pipeline Normalizer transformation.
- Mapplets. Create one or more target definitions from a mapplet instance in a mapping.

• When you create a target definition from a transformation, the target database type is the same as the repository database by default.
• After you create the target definition in the repository, you can edit it. For example, you might want to change the target type.

Creating a Target from a Transformation with One Output Group
• When you create a target from a transformation with one output group, the Designer creates one target.
• All the output ports become input ports in the target. The name of the target is the same as the transformation name.

Creating a Target from a Transformation with Multiple Output Groups
• When you create targets from a transformation with more than one output group, the Designer creates one target for each output group in the transformation.
• When the transformation is a plug-in or Custom transformation, the Designer retains the primary key-foreign key relationships between the groups in the target definitions.

Creating a Target from a Normalizer Transformation
• You can create a target from a source qualifier or pipeline Normalizer transformation.
• When you create a target from a Normalizer transformation, the Designer creates one target and includes all the columns from the Normalizer.
• It does not create separate targets to represent the record hierarchies or multiple-occurring fields in the Normalizer transformation.

Creating a Target from a Mapplet
• You can create a target from a mapplet that is under a mapping Transformation Instances node.
• When you drag the mapplet instance to the Target Designer, the Designer creates a target for each output group in the mapplet.
• Note: You cannot create a target when you drag a transformation instance from a mapplet to the Target Designer.

Data Transformation Source and Target
• Use a Data Transformation source or target to process data in any file format, such as Excel spreadsheets or PDF documents.
• You can also transform data in formats such as HL7, EDI-X12, EDIFACT, SWIFT, NACHA, FIXBAI2, and DTCC.
• A Data Transformation source or target calls a Data Transformation service from a PowerCenter session.
• Data Transformation is the application that transforms the file formats.
• The Data Transformation service is a service that has been deployed to the Data Transformation repository and is ready to run.

Data Transformation Service Types
• When you create a project in Data Transformation Studio, you choose a Data Transformation service type to define the project.
Data Transformation has the following types of services that transform data:
Parser - Converts source documents to XML. The input can have any format. The output of a parser is always XML.
Serializer - Converts an XML file to another document. The input is XML. The output can be any format.
Mapper - Converts an XML source document to another XML structure or schema. The input is XML. The output is XML.
Transformer - Modifies the data in any format. Adds, removes, converts, or changes text. Use transformers with a parser, mapper, or serializer. You can also run a transformer as a standalone component.
Streamer - Splits large input documents, such as multiple gigabyte data streams, into segments. The Streamer splits documents that have multiple messages or multiple records in them.

MAPPINGS -

Object Dependency
• Some objects in a mapping are also stored as independent objects in the repository:
- Sources
- Targets
- Reusable transformations
- Mapplets
• The mapping is dependent on these objects.
• When this metadata changes, the Designer and other PowerCenter Client applications track the effects of these changes on mappings.
• In these cases, you may find that mappings become invalid even though you do not edit the mapping.
• When a mapping becomes invalid, the Integration Service cannot run it properly, and the Workflow Manager invalidates the session.
• The only objects in a mapping that are not stored as independent repository objects are the non-reusable transformations that you build within the mapping.
• These non-reusable transformations are stored within the mapping only.

Exporting and Importing a Mapping  To view column dependencies, right-click a target column in a
 You export a mapping to an XML file and import a mapping from mapping and choose Show Field Dependencies.
an XML file through the Designer.  The Designer displays the Field Dependencies dialog box which
 You might want to use the export and import feature to copy a lists all source columns connected to the target column.
mapping to the same repository, a connected repository, or a  When you define a port expression that performs a calculation
repository to which you cannot connect using multiple source columns, and then connect that port to a
target column, the Field Dependencies dialog box lists all source
Invalidating Sessions columns you use in the expression
 When you edit and save a mapping, some changes cause the
session to be invalid even though the mapping remains valid. Options for Linking Ports
 The Integration Service does not run invalid sessions. If you edit a  When you link transformations, you can link with one of the
mapping, the Designer invalidates sessions when you perform following options:
the following actions: - One to one. Link one transformation or output group to one
- Add or remove sources or targets. transformation, input group, or target only.
- Remove mapplets or transformations. - One to many.
- Replace a source, target, mapplet, or transformation when - Link one port to multiple transformations, input groups, or
importing or copying objects targets.
- Add or remove Source Qualifiers or COBOL Normalizers, or - Link multiple ports in one transformation or output group to
change the list of associated sources for these transformations. multiple transformations, input groups, or targets.
- Add or remove a Joiner or Update Strategy transformation. - Many to one. Link many transformations to one
- Add or remove transformations from a mapplet in the transformation, input group, or target.
mapping.
- Change the database type for a source or target. Rules and Guidelines for Connecting Mapping Objects
- If the Designer detects an error when you try to link ports
Deleting a Mapping between two mapping objects, it displays a symbol indicating
 You may delete mappings that you no longer use. When you that you cannot link the ports.
delete a mapping, you do not delete any sources, targets, - Follow logic of data flow in the mapping. You can link the
mapplets, or reusable transformations defined outside the following types of ports:
mapping. - The receiving port must be an input or input/output port.
 Note: If you enable version control, deleted mapping remains - The originating port must be an output or input/output port.
checked out until you check it in. To check in a deleted - You cannot link input ports to input ports or output ports to
mapping, click Versioning > Find Checkouts. Select the deleted output ports.
mapping and click Tools > Check In. - You must link at least one port of an input group to an
upstream transformation.
Viewing Link Paths to a Port - You must link at least one port of an output group to a
 When displaying both link paths, the Designer traces the flow of downstream transformation.
data from one column in the source, in and out of each - You can link ports from one active transformation or one
transformation, and into a single port in the target. output group of an active transformation to an input group of
 For unconnected transformations, the Designer does not display another transformation.
a link path. - You cannot connect an active transformation and a passive
 For connected Lookup transformations, the Designer shows each transformation to the same downstream transformation or
output port dependent upon the input ports involved in the transformation input group.
lookup condition. - You cannot connect more than one active transformation to
 For Custom transformations, the Designer shows that an output the same downstream transformation or transformation input
port depends on all input ports by default. However, if you group.
define port relationships in a Custom transformation, the - You can connect any number of passive transformations to
Designer shows the dependent ports you define. the same downstream transformation, transformation input
 Note: You can configure the color the Designer uses to display group, or target
connectors in a link path. When configuring the format options, - You can link ports from two output groups in the same
choose the Link Selection option. transformation to one Joiner transformation configured for
sorted data if the data from both output groups is sorted.
Viewing Source Column Dependencies
- You can only link ports with compatible datatypes. The The precision of Item_desc_out is 12, ITEM_NAME is 10, and
Designer verifies that it can map between the two datatypes ITEM_DESC is 10. You change the precision of ITEM_DESC to 15. You
before linking them. The Integration Service cannot transform select parse expressions to infer dependencies and propagate the
data between ports with incompatible datatypes. While the port attributes of ITEM_NAME and ITEM_DESC. The Designer does
datatypes do not have to be identical, they do have to be not update the precision of the Item_desc_out port in the
compatible, such as Char and Varchar. Expression transformation since the ITEM_NAME and ITEM_DESC
- You must connect a source definition to a source qualifier only. ports have different precisions.
You then link the source qualifier to targets or other
transformations. Creating Target Files by Transaction
- You can link columns to a target definition in a mapping, but  You can generate a separate output file each time the Integration
you cannot copy columns into a target definition in a mapping. Service starts a new transaction.
Use the Target Designer to add columns to a target definition.  You can dynamically name each target flat file.
- The Designer marks some mappings invalid if the mapping  To generate a separate output file for each transaction, add a
violates data flow validation. FileName port to the flat file target definition.
 When you connect the FileName port in the mapping, the
Propagating Ports and attributes Integration Service creates a separate target file at each commit.
The Designer does not propagate changes to the following mapping  The Integration Service names the output file based on the
objects: FileName port value from the first row in each transaction.
- Unconnected transformations  By default, the Integration Service writes output files to
- Reusable transformations $PMTargetFileDir
- Mapplets
- Source and target instances Rules and Guidelines for Creating Target Files by Transaction
- SDK Source Qualifier - You can use a FileName column with flat file targets.
- You can add one FileName column to the flat file target
Rules and Guidelines for Propagating Ports and Attributes definition.
- The Designer does not propagate to implicit dependencies within - You can use a FileName column with data from real-time
the same transformation. sources.
- When you propagate a port description, the Designer overwrites - A session fails if you use a FileName column with merge files,
the description for the port in the other transformations in the file lists, or FTP targets.
mapping - If you pass the same file name to targets in multiple
- When you propagate backward along the link path, verify that the partitions, you might get unexpected results.
change does not cause the Integration Service to fail the session. For - When a transformation drops incoming transaction boundaries
example, if you propagate changes to a source qualifier, the and does not generate commits, the Integration Service writes
Integration Service might generate invalid SQL when it runs the all rows into the same output file. The output file name is the
session. If you change the port name “CUST_ID” to “CUSTOMER_ID,” initial value of the FileName port.
the Integration Service might generate SQL to select the wrong
column name if the source table uses “CUST_ID.” Rejecting Truncated and Overflow Data
- When you propagate port attributes, verify that the change does  When a conversion causes an overflow, the Integration Service,
not cause the Designer to invalidate the mapping. For example, by default, skips the row.
when you change the datatype of a port from integer to string and  The Integration Service does not write the data to the reject file.
propagate the datatype to other transformations, the Designer  For strings, the Integration Service truncates the string and
invalidates the mapping if a calculation uses one of the changed passes it to the next transformation.
ports. Validate the mapping after you propagate ports. If the  The Designer provides an option to let you include all truncated
Designer invalidates the mapping, click Edit > Revert to Saved to and overflow data between the last transformation and target
revert to the last saved version of the mapping. in the session reject file. If you select Reject Truncated/Overflow
- When you propagate multiple ports, and an expression or condition Rows, the Integration Service sends all truncated rows and any
depends on more than one propagated port, the Designer does not overflow rows to the session reject file or to the row error logs,
propagate attributes to implicit dependencies if the attributes do not depending on how you configure the session.
match. For example, you have the following expression in an
Expression transformation: Rules and Guidelines for Configuring the Target Update Override
Item_desc_out = Substr(ITEM_NAME, 0, 6) || Substr(ITEM_DESC, 0, - If you use target update override, you must manually put all
6) database reserved words in quotes.
- You cannot override the default UPDATE statement if the target $ParamMyCommand, as the SQL command, and set
column name contains any of the following characters: $ParamMyCommand to the SQL statement in a parameter file.
' , ( ) < > = + - * / \ t \ n \ 0 <space> - Use a semicolon (;) to separate multiple statements. The
- You can use parameters and variables in the target update Integration Service issues a commit after each statement.
query. Use any parameter or variable type that you can define in - The Integration Service ignores semicolons within /* ...*/.
the parameter file. - If you need to use a semicolon outside of comments, you can
 You can enter a parameter or variable within the UPDATE escape it with a backslash (\).
statement, or you can use a parameter or variable as the update - The Designer does not validate the SQL.
query.  Note: You can also enter pre- and post-session SQL commands on
 For example, you can enter a session parameter, the Properties tab of the Source Qualifier transformation.
$ParamMyOverride, as the update query, and set
$ParamMyOverride to the UPDATE statement in a parameter Validating the mapping
file.  When you develop a mapping, you must configure it so the
- When you save a mapping, the Designer verifies that you Integration Service can read and process the entire mapping.
have referenced valid port names. It does not validate the SQL.  The Designer marks a mapping invalid when it detects errors that
- If you update an individual row in the target table more than will prevent the Integration Service from running sessions
once, the database only has data from the last update. associated with the mapping.
 If the mapping does not define an order for the result data,  The Designer marks a mapping valid for the following reasons:
different runs of the mapping on identical input data may result Connection validation - Required ports are connected and that
in different data in the target table. all connections are valid.
- A WHERE clause that does not contain any column references Expression validation - All expressions are valid.
updates all rows in the target table, or no rows in the target Objects validation - The independent object definition matches
table, depending on the WHERE clause and the data from the the instance in the mapping.
mapping. Data flow validation - The data must be able to flow from the
 For example, the following query sets the EMP_NAME to “MIKE sources to the targets without hanging at blocking
SMITH” for all rows in the target table if any row of the transformations.
transformation has EMP_ID > 100:
 UPDATE T_SALES set EMP_NAME = 'MIKE SMITH' WHERE Connection Validation
:TU.EMP_ID > 100  The Designer performs connection validation each time you
- If the WHERE clause contains no port references, the mapping connect ports in a mapping and each time you validate or save a
updates the same set of rows for each row of the mapping. mapping.
 For example, the following query updates all employees with  When you connect ports, the Designer verifies that you make
EMP_ID > 100 to have the EMP_NAME from the last row in the valid connections.
mapping:  When you save or validate a mapping, the Designer verifies that
 UPDATE T_SALES set EMP_NAME = :TU.EMP_NAME WHERE the connections are valid and that all required ports are
EMP_ID > 100 connected.
- If the mapping includes an Update Strategy or Custom  When you save or validate a mapping, the Designer makes the
transformation, the Target Update statement only affects following connection validations:
records marked for update. - At least one source and one target must be connected.
- If you use the Target Update option, configure the session to - Source qualifiers must be mapped to a target.
mark all source records as update. - Mapplets must be connected. At least one mapplet input port
and output port is connected to the mapping. If the mapplet
Rules and Guidelines for Adding Pre- and Post-Session SQL includes a source qualifier that uses an SQL override, the
Commands Designer prompts you to connect all mapplet output ports to the
- Use any command that is valid for the database type. However, mapping.
the Integration Service does not allow nested comments, even - Datatypes between ports must be compatible. If you change a
though the database might. port datatype to one that is incompatible with the port it is
- You can use parameters and variables in the target pre- and connected to, the Designer generates an error and invalidates the
post-session SQL commands. mapping. For example, you have two Date/Time ports connected,
For example, you can enter a parameter or variable within the and you change one port to a Decimal. The Designer invalidates
command. Or, you can use a session parameter, the mapping. You can however, change the datatype if it remains
compatible with the connected ports, such as Char and Varchar
 Mapping A contains two multigroup transformations that block
Data Flow Validation data, MGT1 and MGT2. If you could run this session, MGT1
 When you validate or save a mapping, the Designer verifies that might block data from S1 while waiting for a row from S2.
the data can flow from all sources in a target load order group to  And MGT2 might block data from S2 while waiting for a row from
the targets without the Integration Service blocking all sources. S1.
 Mappings that include blocking transformations might hang at  The blocking transformations would block both source pipelines
runtime with any of the following mapping configurations: and the session would hang. Therefore, the Designer marks the
- You connect one source pipeline to multiple input groups of mapping invalid.
the blocking transformation  Mapping B contains one multigroup transformation that blocks
- You connect the sources and transformations in a target load data, MGT1.
order group in such a way that multiple blocking  Blocking transformations can never block all input groups, so
transformations could possibly block all source pipelines. MGT1 might block either S1 or S2, but never both.
 Depending on the source data used in a session, a blocking  MGT2 is not a blocking transformation, so it will never block data.
transformation might block data from one source while it waits  Therefore, this session will not hang at runtime due to blocking.
for a row from a different source.  The Designer marks the mapping valid
 When you save or validate a mapping with one of these
configurations, the Designer marks the mapping invalid. Steps to Validate a Mapping
 When the Designer marks a mapping invalid because the  You can validate a mapping while you are working on it through
mapping violates data flow validation, you must configure the the Designer. Also, when you click Repository > Save, the
mapping differently, or use a non-blocking transformation where Designer validates all mappings since the last time you saved.
possible.  When you validate or save a mapping the results of the validation
 The following figure shows mappings that are invalid because appear in the Output window.
one source provides data for multiple input groups of a blocking  The Repository Manager also displays whether a mapping is valid.
transformation:  To validate a mapping, check out and open the mapping, and
click Mappings > Validate.
 If the Output window is not open, click View > Output Window.
Review any errors to determine how to fix the mapping.

Validating Multiple Mappings


 You can validate multiple mappings without fetching them into
the workspace.
 To validate multiple mappings you must select and validate the
mappings from either a query results view or a view object
dependencies list.
 Note: If you use the Repository Manager, you can select and
validate multiple mappings from the Navigator.
 You can save and optionally check in mappings that change from invalid to valid status as a result of the validation.
 To validate multiple mappings:
1. Select mappings from either a query or a view dependencies list.
2. Right-click one of the selected mappings and choose Validate. The Validate Objects dialog box displays.
3. Choose whether to save objects and check in objects that you validate.

To make the mappings valid, use a non-blocking transformation for MGT1 or create two instances of the same source and connect them to the blocking transformation.
The following figure shows two similar mappings, one which is valid, and one which is invalid:
MAPPLETS

Mapplets Overview

 When you use a mapplet in a mapping, you use an instance of
the mapplet. Like a reusable transformation, any change made
to the mapplet is inherited by all instances of the mapplet.
 Each port in the Input transformation connected to another
 Mapplets help simplify mappings in the following ways: transformation in the mapplet becomes a mapplet input port.
Include source definitions - Use multiple source definitions and  Input transformations can receive data from a single active
source qualifiers to provide source data for a mapping. source.
Accept data from sources in a mapping - If you want the mapplet to  Unconnected ports do not display in the Mapping Designer.
receive data from the mapping, use an Input transformation to
receive source data.  You can connect an Input transformation to multiple
Include multiple transformations - A mapplet can contain as many transformations in a mapplet.
transformations as you need.  However, you cannot connect a single port in the Input
Pass data to multiple transformations - You can create a mapplet to transformation to multiple transformations in the mapplet.
feed data to multiple transformations. Each Output transformation
in a mapplet represents one output group in a mapplet. Mapplet Output
Contain unused ports - You do not have to connect all mapplet input  Use an Output transformation in a mapplet to pass data through
and output ports in a mapping. the mapplet into a mapping.
 A mapplet must contain at least one Output transformation with
Understanding Mapplet Input and Output at least one connected port in the mapplet.
 To use a mapplet in a mapping, you must configure it for input  Each connected port in an Output transformation displays as a
and output. mapplet output port in a mapping.
 In addition to transformation logic that you configure, a mapplet  Each Output transformation in a mapplet displays as an output
has the following components: group in a mapping.
Mapplet input: You can pass data into a mapplet using source  An output group can pass data to multiple pipelines in a
definitions or Input transformations or both. When you use an Input mapping.
transformation, you connect it to the source pipeline in the mapping.
Mapplet output: Each mapplet must contain one or more Output Viewing Mapplet Input and Output
transformations to pass data from the mapplet into the mapping.  Mapplets and mapplet ports display differently in the Mapplet
Mapplet ports: Mapplet ports display only in the Mapping Designer. Designer and the Mapping Designer
Mapplet ports consist of input ports from Input transformations and  The following figure shows a mapplet with both an Input
output ports from Output transformations. If a mapplet uses source transformation and an Output transformation:
definitions rather than Input transformations for input, it does not
contain any input ports in the mapping.

Mapplet Input
 Mapplet input can originate from a source definition and/or from
an Input transformation in the mapplet.
 You can create multiple pipelines in a mapplet.  When you use the mapplet in a mapping, the mapplet object
 Use multiple source definitions and source qualifiers or Input displays only the ports from the Input and Output
transformations. transformations. These are referred to as the mapplet input and
 You can also use a combination of source definitions and Input mapplet output ports.
transformations.
Using Source Definitions for Mapplet Input  The following figure shows the same mapplet in the Mapping
 Use one or more source definitions in a mapplet to provide Designer:
source data.
 When you use the mapplet in a mapping, it is the first object in
the mapping pipeline and contains no input ports.

Using Input Transformations for Mapplet Input


 Use an Input transformation in a mapplet when you want the
mapplet to receive input from a source in a mapping.
 When you use the mapplet in a mapping, the Input
transformation provides input ports so you can pass data
through the mapplet.

 You can expand the mapplet in the Mapping Designer by  Reusable transformations and shortcuts inherit changes to their
selecting it and clicking Mappings > Expand. original transformations. This might invalidate the mapplet and
 This expands the mapplet within the mapping for view. the mappings that use the mapplet
Transformation icons within an expanded mapplet display as
shaded. Validating Mapplets
 You can open or iconize all the transformations in the mapplet  The Designer validates a mapplet when you save it. You can also
and mapping. validate a mapplet using the Mapplets > Validate menu
 You cannot edit any of the properties, navigate to other folders, command. When you validate a mapplet, the Designer writes all
or save the repository while the mapplet is expanded. relevant messages about the mapplet in the Output window.
 The following figure shows an expanded mapplet in the Mapping  The Designer validates the mapplet pipeline in the same way it
Designer: validates a mapping.
 The Designer also performs the following checks specific to
mapplets:
- The mapplet can contain Input transformations and source
definitions with at least one port connected to a transformation
in the mapplet.
- The mapplet contains at least one Output transformation with
at least one port connected to a transformation in the mapplet.

 In an expanded mapping, you do not see the Input and Output Editing Mapplets
transformations.  You can edit a mapplet in the Mapplet Designer. The Designer
validates the changes when you save the mapplet.
Creating a Mapplet  When you save changes to a mapplet, all instances of the
 A mapplet can be active or passive depending on the mapplet and all shortcuts to the mapplet inherit the changes.
transformations in the mapplet.  These changes might invalidate mappings that use the mapplet.
 Active mapplets contain one or more active transformations.  To see what mappings or shortcuts may be affected by changes
 Passive mapplets contain only passive transformations. you make to a mapplet, select the mapplet in the Navigator,
 When you use a mapplet in a mapping, all transformation rules right-click, and select Dependencies. Or, click Mapplets >
apply to the mapplet depending on the mapplet type. Dependencies from the menu.
 For example, as with an active transformation, you cannot
concatenate data from an active mapplet with a different  You can make the following changes to a mapplet without
pipeline. affecting the validity of existing mappings and sessions:
- Add input or output ports.
 Use the following rules and guidelines when you add - Change port names or comments.
transformations to a mapplet: - Change Input or Output transformation names or comments.
- If you use a Sequence Generator transformation, you must use - Change transformation names, comments, or properties.
a reusable Sequence Generator transformation. - Change port default values for transformations in the mapplet.
- If you use a Stored Procedure transformation, you must - Add or remove transformations in the mapplet, providing you
configure the Stored Procedure Type to be Normal. do not change the mapplet type from active to passive or from
- You cannot include PowerMart 3.5-style LOOKUP functions in a passive to active.
mapplet.
- You cannot include the following objects in a mapplet:  Use the following rules and guidelines when you edit a mapplet
- Normalizer transformations that is used by mappings:
- COBOL sources  Do not delete a port from the mapplet: The Designer deletes
- XML Source Qualifier transformations mapplet ports in the mapping when you delete links to an Input
- XML sources or Output transformation or when you delete ports connected
- Target definitions to an Input or Output transformation.
- Other mapplets  Do not change the datatype, precision, or scale of a mapplet
 Although reusable transformations and shortcuts in a mapplet port: The data type, precision, and scale of a mapplet port is
can be used, to protect the validity of the mapplet, use a copy of defined by the transformation port to which it is connected in
a transformation instead. the mapplet. Therefore, if you edit a mapplet to change the

datatype, precision, or scale of a port connected to a port in an View links to a port: You can view links to a port in a mapplet in
Input or Output transformation, you change the mapplet port. the same way you would view links to a port in a mapping. You
 Do not change the mapplet type: If you remove all active can view the forward path, the backward path, or both paths.
transformations from an active mapplet, the mapplet becomes Propagate port attributes: You can propagate port attributes in a
passive. If you add an active transformation to a passive mapplet in the same way you would propagate port attributes in
mapplet, the mapplet becomes active. a mapping. You can propagate attributes forward, backward, or in
both directions.
Mapplets and Mappings
 The following mappings tasks can also be performed on Using Mapplets in Mappings
mapplets:  In a mapping, a mapplet has input and output ports that you can
Set tracing level: You can set the tracing level on individual connect to other transformations in the mapping.
transformations within a mapplet in the same manner as in a  You do not have to connect all mapplet ports in a mapping.
mapping.  However, if the mapplet contains an SQL override, you must
Copy mapplet: You can copy a mapplet from one folder to connect all mapplet output ports in the mapping.
another as you would any other repository object. After you copy
the mapplet, it appears in the Mapplets node of the new folder. If  Like a reusable transformation, when you drag a mapplet into a
you make changes to a mapplet, but you do not want to mapping, the Designer creates an instance of the mapplet.
overwrite the original mapplet, you can make a copy of the  You can enter comments for the instance of the mapplet in the
mapplet by clicking Mapplets > Copy As. mapping.
Export and import mapplets: You can export a mapplet to an  You cannot otherwise edit the mapplet in the Mapping
XML file or import a mapplet from an XML file through the Designer.
Designer. You might want to use the export and import feature  If you edit the mapplet in the Mapplet Designer, each instance of
to copy a mapplet to another repository. the mapplet inherits the changes.
Delete mapplets: When you delete a mapplet, you delete all  The PowerCenter Repository Reports has a Mapplets list report
instances of the mapplet. This invalidates each mapping that you use to view all mappings using a particular mapplet.
containing an instance of the mapplet or a shortcut to the
mapplet. Creating and Configuring Mapplet Ports
Compare mapplets: You can compare two mapplets to find  After creating transformation logic for a mapplet, you can create
differences between them. For example, if you have mapplets mapplet ports. Use an Input transformation to define mapplet
with the same name in different folders, you can compare them input ports if the mapplet contains no source definitions. Use an
to see if they differ. Output transformation to create a group of output ports. Only
Compare instances within a mapplet: You can compare instances connected ports in an Input or Output transformation become
in a mapplet to see if they contain similar attributes. For example, mapplet input or output ports in a mapping. Unconnected ports
you can compare a source instance with another source instance, do not display when you use the mapplet in a mapping.
or a transformation with another transformation. You compare
instances within a mapplet in the same way you compare  You can create a mapplet port in the following ways:
instances within a mapping. Manually create ports in the Input/output transformation: You
Create shortcuts to mapplets: You can create a shortcut to a can create port names in Input and Output transformations. You
mapplet if the mapplet is in a shared folder. When you use a can also enter a description for each port name. The port has no
shortcut to a mapplet in a mapping, the shortcut inherits any defined data type, precision, or scale until you connect it to a
changes you might make to the mapplet. However, these changes transformation in the mapplet.
might not appear until the Integration Service runs the workflow Drag a port from another transformation: You can create an
using the shortcut. Therefore, only use a shortcut to a mapplet input or output port by dragging a port from another
when you do not expect to edit the mapplet. transformation into the Input or Output transformation. The new
Add a description: You can add a description to the mapplet in port inherits the port name, description, data type, and scale of
the Mapplet Designer in the same manner as in a mapping. You the original port. You can edit the new port name and description
can also add a description to the mapplet instance in a mapping. in the transformation. If you change a port connection, the
When you add a description, you can also create links to Designer updates the Input or Output transformation port to
documentation files. The links must be a valid URL or file path to match the attributes of the new connection.
reference the business documentation.
 You can view the data type, precision, and scale of available
mapplet ports when you use the mapplet in a mapping.
- A mapplet must contain at least one Input transformation or
Connecting to Mapplet Output Groups source definition with at least one port connected to a
 Each Output transformation displays as an output group when transformation in the mapplet.
you use a mapplet in a mapping. - A mapplet must contain at least one Output transformation
 Connect the mapplet output ports to the mapping pipeline. with at least one port connected to another transformation in the
Use Autolink to connect the ports. mapping.
- When a mapplet contains a source qualifier that has an override
 Use the following rules and guidelines when you connect for the default SQL query, you must connect all of the source
mapplet output ports in the mapping: qualifier output ports to the next transformation within the
- When a mapplet contains a source qualifier that has an mapplet.
override for the default SQL query, you must connect all of the - If the mapplet contains more than one source qualifier, use a
source qualifier output ports to the next transformation within Joiner transformation to join the output into one pipeline. If the
the mapplet. mapplet contains only one source qualifier, you must connect the
- If the mapplet contains more than one source qualifier, use a mapplet output ports to separate pipelines. You cannot use a
Joiner transformation to join the output into one pipeline. Joiner transformation to join the output.
- If the mapplet contains only one source qualifier, you must - When you edit a mapplet, you might invalidate mappings if you
connect the mapplet output ports to separate pipelines. You change the mapplet type from passive to active.
cannot use a Joiner transformation to join the output. - If you delete ports in the mapplet when the mapplet is used in a
mapping, you can invalidate the mapping.
 If you need to join the pipelines, you can create two mappings to - Do not change the datatype, precision, or scale of a mapplet
perform this task: port when the mapplet is used by a mapping.
- Use the mapplet in the first mapping and write data in each - If you use a Sequence Generator transformation, you must use a
pipeline to separate targets. reusable Sequence Generator transformation.
- Use the targets as sources in the second mapping to join data, - If you use a Stored Procedure transformation, you must
and then perform any additional transformation necessary. configure the Stored Procedure Type to be Normal.
- You cannot include PowerMart 3.5-style LOOKUP functions in a
Setting the Target Load Plan mapplet.
 When you use a mapplet in a mapping, the Mapping Designer - You cannot include the following objects in a mapplet:
lets you set the target load plan for sources within the mapplet. - Normalizer transformations
- Cobol sources
Pipeline Partitioning - XML Source Qualifier transformations
 If you have the partitioning option, you can increase the number - XML sources
of partitions in a pipeline to improve session performance. - Target definitions
Increasing the number of partitions allows the Integration - Pre- and post- session stored procedures
Service to create multiple connections to sources and process - Other mapplets
partitions of source data concurrently.
 When you create a session, the Workflow Manager validates Mapping Parameters and Variables
each pipeline in the mapping for partitioning. You can specify
multiple partitions in a pipeline if the Integration Service can Mapping Parameters
maintain data consistency when it processes the partitioned  A mapping parameter represents a constant value that you can
data. define before running a session.
 Some partitioning restrictions apply to mapplets.  A mapping parameter retains the same value throughout the
entire session.
Rules and Guidelines for Mapplets  When you use a mapping parameter, you declare and use the
 The following list summarizes the rules and guidelines that parameter in a mapping or mapplet.
appear throughout this chapter:  Then define the value of the parameter in a parameter file.
- You can connect an Input transformation to multiple  The Integration Service evaluates all references to the parameter
transformations in a mapplet. However, you cannot connect a to that value.
single port in the Input transformation to multiple  When you want to use the same value for a mapping parameter
transformations in the mapplet. each time you run the session, use the same parameter file for
- An Input transformation must receive data from a single active each session run.
source.
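For reference, the parameter file mentioned above is a plain text file that groups values under a heading naming the folder, workflow, and session they apply to. A minimal sketch might look like the following; the folder, workflow, session, and connection names are placeholders, not objects from these notes:
[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
$$State=MD
$DBConnectionSource=Oracle_Src
Entries that start with $$ are mapping parameters and variables; entries with a single $ are session parameters such as database connections.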
 When you want to change the value of a mapping parameter between sessions you can perform one of the following tasks:
- Update the parameter file between sessions.
- Create a different parameter file and configure the session to use the new file.
- Remove the parameter file from the session properties. The Integration Service uses the parameter value in the pre-session variable assignment. If there is no pre-session variable assignment, the Integration Service uses the configured initial value of the parameter in the mapping.

Mapping Variables
 Unlike a mapping parameter, a mapping variable represents a value that can change through the session.
 The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session.
 When you use a mapping variable, you declare the variable in the mapping or mapplet, and then use a variable function in the mapping to change the value of the variable.
 At the beginning of a session, the Integration Service evaluates references to a variable to determine the start value.
 At the end of a successful session, the Integration Service saves the final value of the variable to the repository.
 The next time you run the session, the Integration Service evaluates references to the variable to the saved value.
 To override the saved value, define the start value of the variable in a parameter file or assign a value in the pre-session variable assignment in the session properties.

Using Mapping Parameters and Variables
 When the Designer validates a mapping variable in a reusable transformation, it treats the variable as an Integer datatype.
 You cannot use mapping parameters and variables interchangeably between a mapplet and a mapping.
 Mapping parameters and variables declared for a mapping cannot be used within a mapplet.
 Similarly, you cannot use a mapping parameter or variable declared for a mapplet in a mapping.

Initial and Default Values
 When you declare a mapping parameter or variable in a mapping or a mapplet, you can enter an initial value.
 The Integration Service uses the configured initial value for a mapping parameter when the parameter is not defined in the parameter file.
 Similarly, the Integration Service uses the configured initial value for a mapping variable when the variable value is not defined in the parameter file, and there is no saved variable value in the repository.
 When the Integration Service needs an initial value, and you did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the datatype of the parameter or variable.
 The following table lists the default values the Integration Service uses for different types of data:

Using String Parameters and Variables
 For example, you might use a parameter named $$State in the filter for a Source Qualifier transformation to extract rows for a particular state:
STATE = ‘$$State’
 During the session, the Integration Service replaces the parameter with a string. If $$State is defined as MD in the parameter file, the Integration Service replaces the parameter as follows:
STATE = ‘MD’
 You can perform a similar filter in the Filter transformation using the PowerCenter transformation language as follows:
STATE = $$State
 If you enclose the parameter in single quotes in the Filter transformation, the Integration Service reads it as the string literal “$$State” instead of replacing the parameter with “MD.”

Variable Datatype and Aggregation Type
 The Integration Service uses the aggregate type of a mapping variable to determine the final current value of the mapping variable.
 When you have a pipeline with multiple partitions, the Integration Service combines the variable value from each partition and saves the final current variable value into the repository.
 You can create a variable with the following aggregation types:
- Count
- Max
- Min
 You can configure a mapping variable for a Count aggregation type when it is an Integer or Small Integer.
 You can configure mapping variables of any datatype for Max or Min aggregation types.
 To keep the variable value consistent throughout the session run, the Designer limits the variable functions you use with a variable based on aggregation type.
 For example, use the SetMaxVariable function for a variable with a Max aggregation type, but not with a variable with a Min aggregation type.
 The following table describes the available variable functions and the aggregation types and datatypes you use with each function:
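To make the aggregation types above concrete, here is a minimal sketch of the common incremental-extraction pattern; the variable, port, and table names ($$MaxOrderID, ORDER_ID, ORDERS) are hypothetical, not objects from these notes:
- Declare $$MaxOrderID in the mapping as an Integer mapping variable with the Max aggregation type and an initial value of 0.
- In an Expression transformation, call the variable function on every row:
SETMAXVARIABLE($$MaxOrderID, ORDER_ID)
- In the Source Qualifier source filter, read only the rows added since the last successful run:
ORDERS.ORDER_ID > $$MaxOrderID
At the end of a successful session, the Integration Service saves the highest ORDER_ID it processed back to the repository as the new value of $$MaxOrderID, and you can still override the start value with a $$MaxOrderID entry in the parameter file.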
 You cannot use variable functions in the Rank or Aggregator transformation. Use a different transformation for variable functions.

Variable Functions
 Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.
 Use variable functions in an expression to set the value of a mapping variable for the next session run.
 The transformation language provides the following variable functions to use in a mapping:
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. To use the SetMaxVariable with a mapping variable, the aggregation type of the mapping variable must be set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. To use the SetMinVariable with a mapping variable, the aggregation type of the mapping variable must be set to Min.
SetCountVariable: Increments the variable value by one. In other words, it adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. To use the SetCountVariable with a mapping variable, the aggregation type of the mapping variable must be set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository. To use the SetVariable function with a mapping variable, the aggregation type of the mapping variable must be set to Max or Min. The SetVariable function ignores rows marked for delete or reject.
 Use variable functions only once for each mapping variable in a pipeline.
 The Integration Service processes variable functions as it encounters them in the mapping.
 The order in which the Integration Service encounters variable functions in the mapping may not be the same for every session run.
 This may cause inconsistent results when you use the same variable function multiple times in a mapping.
 The Integration Service does not save the final current value of a mapping variable to the repository when any of the following conditions are true:
- The session fails to complete.
- The session is configured for a test load.
- The session is a debug session.
- The session runs in debug mode and is configured to discard session output.

Working with User-Defined Functions
 After you create the function, you can create the following expression in an Expression transformation to remove leading and trailing spaces from last names:
:UDF.REMOVESPACES(LAST_NAME)

Configuring a User-Defined Function Name
 A valid function name meets the following requirements:
- It begins with a letter.
- It can contain letters, numbers, and underscores. It cannot contain any other character.
- It cannot contain spaces.
- It must be 80 characters or fewer.

Configuring the Function Type
 You can place user-defined functions in other user-defined functions. You can also configure a user-defined function to be callable from expressions. Callable means that you can place user-defined functions in an expression.
 Select one of the following options when you configure a user-defined function:
Public: Callable from any user-defined function, transformation expression, link condition expression, or task expression.
Private: Callable from another user-defined function. Create a private function when you want the function to be part of a more complex function. The simple function may not be usable independently of the complex function.
 After you create a public user-defined function, you cannot change the function type to private.
 Although you can place a user-defined function in another user-defined function, a function cannot refer to itself.
 For example, the user-defined function RemoveSpaces includes a user-defined function TrimLeadingandTrailingSpaces.
 TrimLeadingandTrailingSpaces cannot include RemoveSpaces. Otherwise, RemoveSpaces is invalid.

Configuring Public Functions that Contain Private Functions
 When you include ports as arguments in a private user-defined function, you must also include the ports as arguments in any public function that contains the private function. Use the same datatype and precision for the arguments in the private and public function.
 For example, you define a function to modify order IDs to include ‘INFA’ and the customer ID. You first create the following private function called ConcatCust that concatenates ‘INFA’ with the port CUST_ID: CONCAT (‘INFA’, CUST_ID)
 After you create the private function, you create a public function called ConcatOrder that contains ConcatCust:
CONCAT (:UDF.CONCATCUST( CUST_ID), ORDER_ID)
 When you add ConcatCust to ConcatOrder, you add the argument CUST_ID with the same datatype and precision to the public function.
 Note: If you enter a user-defined function when you manually define the public function syntax, you must prefix the user-defined function with :UDF.
 The following table describes the user-defined function management tasks and lists where you can perform each task:

Validating User-Defined Functions
You can validate a user-defined function from the following areas:
- Expression Editor when you create or edit a UDF
- Tools menu
- Query Results window
- View History window
 When you validate a user-defined function, the PowerCenter Client does not validate other user-defined functions and expressions that use the function.
 If a user-defined function is invalid, any user-defined function and expression that uses the function is also invalid.
 Similarly, mappings and workflows that use the user-defined function are invalid.

Using Debugger

Debugger Session Types
 You can select three different debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types when you configure the Debugger:
Use an existing non-reusable session: The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.
Use an existing reusable session: The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.
Create a debug session instance: You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service runs a debug instance of the debug workflow and creates and runs a debug workflow for the session.

The following figure shows the windows in the Mapping Designer that appear when you run the Debugger:

 Note: You cannot create breakpoints for mapplet Input and Output transformations.

Creating Error Breakpoints
 When you create an error breakpoint, the Debugger pauses when the Integration Service encounters error conditions such as a transformation error or calls to the ERROR function.
 You also set the number of errors to skip for each breakpoint before the Debugger pauses:
- If you want the Debugger to pause at every error, set the number of errors to zero.
- If you want the Debugger to pause after a specified number of errors, set the number of errors greater than zero. For example, if you set the number of errors to five, the Debugger skips five errors and pauses at every sixth error.

Using ISNULL and ISDEFAULT
 You can create ISNULL and ISDEFAULT conditions in transformation and global data breakpoints.
 When you use the ISNULL or ISDEFAULT operator, you cannot use the type or value in the condition.
 When you create an ISNULL condition, the Debugger pauses when the Integration Service encounters a null input value, and the port contains the system default value.
 When you create an ISDEFAULT condition, the Debugger pauses in the following circumstances:
- The Integration Service encounters an output transformation error, and the port contains a user-defined default value of a constant value or constant expression.
- The Integration Service encounters a null input value, and the port contains a user-defined default value of a constant value or constant expression.

Running an Existing Session in Debug Mode
 If you choose to run an existing session in debug mode, the Debugger Wizard displays a list of all sessions in the current folder that use the mapping. Select the session you want to use.
 You cannot run the Debugger against a session configured with multiple partitions or a session configured to run on a grid. You must either change the properties of the session or choose to create a debug session for the mapping.

Set Target Options
 On the last page of the Debugger Wizard, you can select the following target options:
 Discard target data: You can choose to load or discard target data when you run the Debugger. If you discard target data, the Integration Service does not connect to the target.
 Display target data: You can select the target instances you want to display in the Target window while you run a debug session.
 When you click Finish, if the mapping includes mapplets, the Debugger displays the mapplet instance dialog box.
 Select the mapplets from this dialog box that you want to debug. To clear a selected mapplet, press the Ctrl key and select the mapplet.
 When you select a mapplet to debug, the Designer expands it to display the individual transformations when the Debugger runs.
 When you do not select a mapplet to debug, the Designer does not expand it in the workspace.
 You cannot complete the following tasks for transformations in the mapplet:
- Monitor or modify transformation data.
- Evaluate expressions.
- Edit breakpoints.
- Step to a transformation instance.
 The Debugger can be in one of the following states:
Initializing: The Designer connects to the Integration Service.
Running: The Integration Service processes the data.
Paused: The Integration Service encounters a break and pauses the Debugger.
 Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab.
 The Debugger does not use the high availability functionality.
 The following table describes the different tasks you can perform in each of the Debugger states:
Working with Persisted Values
 When you run the Debugger against mappings with sequence generators and mapping variables, the Integration Service might save or discard persisted values:
Discard persisted values: The Integration Service does not save final values of generated sequence numbers or mapping variables to the repository when you run a debug session, or when you run a session in debug mode and discard target data.
Save persisted values: The Integration Service saves final values of generated sequence numbers and mapping variables to the repository when you run a session in debug mode and do not discard target data. You can view the final value for Sequence Generator and Normalizer transformations in the transformation properties.

Designer Behavior
When the Debugger starts, you cannot perform the following tasks:
- Close the folder or open another folder.
- Use the Navigator.
- Perform repository functions, such as Save.
- Edit or close the mapping.
- Switch to another tool in the Designer, such as Target Designer.
- Close the Designer.
Note: Dynamic partitioning is disabled during debugging.

Monitoring the Debugger
 When you run the Debugger, you can monitor the following information:
Session status: Monitor the status of the session.
Data movement: Monitor data as it moves through transformations.
Breakpoints: Monitor data that meets breakpoint conditions.
Target data: Monitor target data on a row-by-row basis.

 The Mapping Designer displays windows and debug indicators that help you monitor the session:
Debug indicators: Debug indicators on transformations help you follow breakpoints and data flow.
Instance window: When the Debugger pauses, you can view transformation data and row information in the Instance window.
Target window: View target data for each target in the mapping.
Output window: The Integration Service writes messages to the following tabs in the Output window:
- Debugger tab: The debug log displays in the Debugger tab.
- Session Log tab: The session log displays in the Session Log tab.
- Notifications tab: Displays messages from the Repository Service.

 You can step to connected transformations in the mapping, even if they do not have an associated breakpoint.
 You cannot step to the following instances:
- Sources
- Targets
- Unconnected transformations
- Mapplets not selected for debugging

Modifying Data
 When the Debugger pauses, the current instance displays in the Instance window, and the current instance indicator displays on the transformation in the mapping. You can make the following modifications to the current instance when the Debugger pauses on a data breakpoint:
Modify output data: You can modify output data of the current transformation. When you continue the session, the Integration Service validates the data. It performs the same validation it performs when it passes data from port to port in a regular session.
Change null data to not-null: Clear the null column, and enter a value in the value column to change null data to not-null.
Change not-null to null: Select the null column to change not-null data to null. The Designer prompts you to confirm that you want to make this change.
Modify row types: Modify Update Strategy, Filter, or Router transformation row types.

 For Router transformations, you can change the row type to override the group condition evaluation for user-defined groups.
 For example, if the group condition evaluates to false, the rows are not passed through the output ports to the next transformation or target.
 The Instance window displays <no data available>, and the row type is filtered. If you want to pass the filtered row to the next transformation or target, you can change the row type to Insert.
 Likewise, for a group that meets the group condition, you can change the row type from insert to filtered.
 After you change data, you can refresh the cache before you continue the session.
 When you issue the Refresh command, the Designer processes the request for the current transformation, and you can see if the data you enter is valid.
 You can change the data again before you continue the session.

Restrictions
 You cannot change data for the following output ports:
Normalizer transformation: Generated Keys and Generated Column ID ports.
Rank transformation: RANKINDEX port.
Router transformation: All output ports.
Sequence Generator transformation: CURRVAL and NEXTVAL
ports.
Lookup transformation: NewLookupRow port for a Lookup
transformation configured to use a dynamic cache.
Custom transformation: Ports in output groups other than the
current output group.
Java transformation: Ports in output groups other than the
current output group.

 Additionally, you cannot change data associated with the following:


- Mapplets that are not selected for debugging
- Input or input/output ports
- Output ports when the Debugger pauses on an error breakpoint

4. WORKFLOW BASICS GUIDE

Workflow Tasks
 You can create tasks in the Task Developer, Workflow Designer, or Worklet Designer.
 Tasks created in the Task Developer are reusable, but tasks created in the Workflow Designer or Worklet Designer are not.
 You can create the following types of tasks in the Workflow Manager:
Assignment: Assigns a value to a workflow variable.
Command: Specifies a shell command to run during the workflow.
Control: Stops or aborts the workflow.
Decision: Specifies a condition to evaluate.
Email: Sends email during the workflow.
Event-Raise: Notifies the Event-Wait task that an event has occurred.
Event-Wait: Waits for an event to occur before executing the next task.
Session: Runs a mapping you create in the Designer.
Timer: Waits for a timed event to trigger.

Workflow Manager Windows
 The Workflow Manager displays the following windows to help you create and organize workflows:
- Navigator
- Workspace
- Output
- Overview

Enhanced Security
 The Workflow Manager has an enhanced security option to specify a default set of permissions for connection objects.
 When you enable enhanced security, the Workflow Manager assigns default permissions on connection objects for users, groups, and others.
 When you disable enhanced security, the Workflow Manager assigns read, write, and execute permissions to all users that would otherwise receive permissions of the default group.
 If you delete the owner from the repository, the Workflow Manager assigns ownership of the object to the administrator.

Viewing and Comparing Versioned Repository Objects
 You can view and compare versions of objects in the Workflow Manager. If an object has multiple versions, you can find the versions of the object in the View History window. In addition to comparing versions of an object in a window, you can view the various versions of an object in the workspace to graphically compare them.
 Use the following rules and guidelines when you view older versions of objects in the workspace:
- You cannot simultaneously view multiple versions of composite objects, such as workflows and worklets.
- Older versions of a composite object might not include the child objects that were used when the composite object was checked in. If you open a composite object that includes a child object version that is purged from the repository, the preceding version of the child object appears in the workspace as part of the composite object. For example, you might want to view version 5 of a workflow that originally included version 3 of a session, but version 3 of the session is purged from the repository. When you view version 5 of the workflow, version 2 of the session appears as part of the workflow.
- You cannot view older versions of sessions if they reference deleted or invalid mappings, or if they do not have a session configuration.

Searching for Versioned Objects
 Use an object query to search for versioned objects in the repository that meet specified conditions. When you run a query, the repository returns results based on those conditions. You may want to create an object query to perform the following tasks:
 Track repository objects during development: You can add Label, User, Last saved, or Comments parameters to queries to track objects during development.
 Associate a query with a deployment group: When you create a dynamic deployment group, you can associate an object query with it.

Comparing Repository Objects
 Use the Workflow Manager to compare two repository objects of the same type to identify differences between the objects. For example, if you have two similar Email tasks in a folder, you can compare them to see which one contains the attributes you need. When you compare two objects, the Workflow Manager displays their attributes in detail.
 You can compare objects across folders and repositories. You must open both folders to compare the objects. You can compare a reusable object with a non-reusable object. You can also compare two versions of the same object.

 You can compare the following types of objects:
- Tasks
- Sessions
- Worklets
- Workflows

 You can also compare instances of the same type. For example, if the workflows you compare contain worklet instances with the same name, you can compare the instances to see if they differ.
 Use the Workflow Manager to compare the following instances and attributes:
- Instances of sessions and tasks in a workflow or worklet comparison. For example, when you compare workflows, you can compare task instances that have the same name.
- Instances of mappings and transformations in a session comparison. For example, when you compare sessions, you can compare mapping instances.
- The attributes of instances of the same type within a mapping comparison. For example, when you compare flat file sources, you can compare attributes, such as file type (delimited or fixed), delimiters, escape characters, and optional quotes.

 You can compare schedulers and session configuration objects in the Repository Manager.
 You cannot compare objects of different types. For example, you cannot compare an Email task with a Session task.
 When you compare objects, the Workflow Manager displays the results in the Diff Tool window. The Diff Tool output contains different nodes for different types of objects.
 When you import Workflow Manager objects, you can compare object conflicts.

 A workflow must contain a Start task. The Start task represents the beginning of a workflow.
 When you create a workflow, the Workflow Designer creates a Start task and adds it to the workflow.
 You cannot delete the Start task.

 You may decide to delete a workflow that you no longer use. When you delete a workflow, you delete all nonreusable tasks and reusable task instances associated with the workflow. Reusable tasks used in the workflow remain in the folder when you delete the workflow.
 If you delete a workflow that is running, the Integration Service aborts the workflow.
 If you delete a workflow that is scheduled to run, the Integration Service removes the workflow from the schedule.

 If you want to write performance data to the repository, you must perform the following tasks:
- Configure the session to collect performance data.
- Configure the session to write performance data to the repository.
- Configure the Integration Service to persist run-time statistics to the repository at the verbose level.

Guidelines for Entering Pre- and Post-Session SQL Commands
 Use the following guidelines when creating the SQL statements:
- Use any command that is valid for the database type. However, the Integration Service does not allow nested comments, even though the database might.
- Use a semicolon (;) to separate multiple statements. The Integration Service issues a commit after each statement.
- The Integration Service ignores semicolons within /* ... */.
- If you need to use a semicolon outside of comments, you can escape it with a backslash (\).
- The Workflow Manager does not validate the SQL.
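To make the guidelines concrete, here is a sketch of a pre-session SQL entry; the STG_ORDERS and ETL_AUDIT tables are hypothetical. The /* ... */ comment is ignored, the unescaped semicolon separates two statements that each get their own commit, and the backslash keeps the semicolon inside the string literal from being treated as a statement separator:

    /* clear the staging table before the load */
    DELETE FROM STG_ORDERS;
    UPDATE ETL_AUDIT SET LAST_NOTE = 'pre-load step\; staging cleared' WHERE JOB_NAME = 'ORDERS_LOAD';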
Error Handling
 You can configure error handling on the Config Object tab.
 You can choose to stop or continue the session if the Integration Service encounters an error issuing the pre- or post-session SQL command.
 The Workflow Manager provides the following types of shell
commands for each Session task:
Pre-session command - The Integration Service performs pre-
session shell commands at the beginning of a session. You can
configure a session to stop or continue if a pre-session shell
command fails.
Post-session success command - The Integration Service
performs post-session success commands only if the session
completed successfully.
Post-session failure command - The Integration Service performs
post-session failure commands only if the session failed to
complete.

Pre-Session Shell Command Errors


 If you select stop, the Integration Service stops the session, but
continues with the rest of the workflow.
 If you select Continue, the Integration Service ignores the errors
and continues the session.
 By default the Integration Service stops the session upon shell
command errors.

Session Configuration Object

Configuration Object and Config Object Tab Settings


 You can configure the following settings in a session configuration
object or on the Config Object tab in session properties:
Advanced - Advanced settings allow you to configure constraint-
based loading, lookup caches, and buffer sizes.
Log options - Log options allow you to configure how you want to
save the session log. By default, the Log Manager saves only the
current session log.
Error handling - Error Handling settings allow you to determine if
the session fails or continues when it encounters pre-session
command errors, stored procedure errors, or a specified number
of session errors.
Partitioning options - Partitioning options allow the Integration
Service to determine the number of partitions to create at run
time.
Session on grid - When Session on Grid is enabled, the
Integration Service distributes session threads to the nodes in a
grid to increase performance and scalability.

Tasks

Decision Task
 You can enter a condition that determines the execution of the workflow, similar to a link condition, with the Decision task.
 The Decision task has a predefined variable called $Decision_task_name.condition that represents the result of the decision condition.
 The Integration Service evaluates the condition in the Decision task and sets the predefined condition variable to True (1) or False (0).
 You can specify one decision condition per Decision task.
 After the Integration Service evaluates the Decision task, use the predefined condition variable in other expressions in the workflow to help you develop the workflow.
 Depending on the workflow, you might use link conditions instead of a Decision task.
 However, the Decision task simplifies the workflow.
If you do not specify a condition in the Decision task, the Integration Service evaluates the Decision task to True.

Using the Decision Task
 Use the Decision task instead of multiple link conditions in a workflow.
 Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions.

Example
 For example, you have a Command task that depends on the status of the three sessions in the workflow. You want the Integration Service to run the Command task when any of the three sessions fails. To accomplish this, use a Decision task with the following decision condition:
$Q1_session.status = FAILED OR $Q2_session.status = FAILED OR $Q3_session.status = FAILED

 You can then use the predefined condition variable in the input link condition of the Command task. Configure the input link with the following link condition:
$Decision.condition = True

 You can configure the same logic in the workflow without the Decision task.
 Without the Decision task, you need to use three link conditions and treat the input links to the Command task as OR links.

Event Task
 You can define events in the workflow to specify the sequence of task execution.
 The event is triggered based on the completion of the sequence of tasks.
 Use the following tasks to help you use events in the workflow:
Event-Raise task - The Event-Raise task represents a user-defined event. When the Integration Service runs the Event-Raise task, the Event-Raise task triggers the event. Use the Event-Raise task with the Event-Wait task to define events.
Event-Wait task - The Event-Wait task waits for an event to occur. Once the event triggers, the Integration Service continues executing the rest of the workflow.

 To coordinate the execution of the workflow, you may specify the following types of events for the Event-Wait and Event-Raise tasks:
Predefined event - A predefined event is a file-watch event. For predefined events, use an Event-Wait task to instruct the Integration Service to wait for the specified indicator file to appear before continuing with the rest of the workflow. When the Integration Service locates the indicator file, it starts the next task in the workflow.
User-defined event - A user-defined event is a sequence of tasks in the workflow. Use an Event-Raise task to specify the location of the user-defined event in the workflow. A user-defined event is a sequence of tasks in the branch from the Start task leading to the Event-Raise task.

 When all the tasks in the branch from the Start task to the Event-Raise task complete, the Event-Raise task triggers the event. The Event-Wait task waits for the Event-Raise task to trigger the event before continuing with the rest of the tasks in its branch.

Timer Task
 You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task.
 You can choose to start the next task in the workflow at a specified time and date.
 You can also choose to wait a period of time after the start time of another task, workflow, or worklet before starting the next task.

The Timer task has the following types of settings:
Absolute time - You specify the time that the Integration Service starts running the next task in the workflow. You may specify the date and time, or you can choose a user-defined workflow variable to specify the time.
Relative time - You instruct the Integration Service to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.

 For example, a workflow contains two sessions. You want the Integration Service to wait 10 minutes after the first session completes before it runs the second session. Use a Timer task after the first session. In the Relative Time setting of the Timer task, specify ten minutes from the start time of the Timer task. Use a Timer task anywhere in the workflow after the Start task.

Sources
Allocating Buffer Memory
 When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data.
 The Integration Service allocates at least two blocks for each source and target partition.
 Sessions that use a large number of sources or targets might require additional memory blocks.
 If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session.

Partitioning Sources
 You can create multiple partitions for relational, Application, and file sources.
 For relational or Application sources, the Integration Service creates a separate connection to the source database for each partition you set in the session properties.
 For file sources, you can configure the session to read the source with one thread or multiple threads.

Overriding the Source Table Name
 If you override the source table name on the Properties tab of the source instance, and you override the source table name using an SQL query, the Integration Service uses the source table name defined in the SQL query.
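For example (hypothetical table and column names), if the source qualifier contains the following SQL query override, the Integration Service reads CUSTOMERS_ARCHIVE even if the Properties tab names a different source table:

    SELECT CUSTOMER_ID, CUSTOMER_NAME, STATUS
    FROM CUSTOMERS_ARCHIVE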

Targets
Working with Relational Targets
 When you configure a session to load data to a relational target, you define most properties in the Transformations view on the Mapping tab.

Performing a Test Load
 With a test load, the Integration Service reads and transforms data without writing to targets.
 The Integration Service reads the number of rows you configure for the test load.
 The Integration Service generates all session files and performs all pre- and post-session functions, as if running the full session.
 To configure a session to perform a test load, enable test load and enter the number of rows to test.
 The Integration Service writes data to relational targets, but rolls back the data when the session completes.
 For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets.

Use the following guidelines when performing a test load:
- You cannot perform a test load on sessions using XML sources.
- You can perform a test load for relational targets when you configure a session for normal mode.
- If you configure the session for bulk mode, the session fails.
- Enable a test load on the session Properties tab.
 You can configure the following properties for relational targets:
Target database connection - Define database connection
information.
Target properties - You can define target properties such as
target load type, target update options, and reject options.
Truncate target tables - The Integration Service can truncate
target tables before loading data.
Deadlock retries - You can configure the session to retry
deadlocks when writing to targets or a recovery table.
Drop and recreate indexes - Use pre- and post-session SQL to
drop and recreate an index on a relational target table to
optimize query speed.
Constraint-based loading - The Integration Service can load data
to targets based on primary key-foreign key constraints and active
sources in the session mapping.
Bulk loading - You can specify bulk mode when loading to DB2,
Microsoft SQL Server, Oracle, and Sybase databases.
 You can define the following properties in the session and override the properties you define in the mapping:
Table name prefix - You can specify the target owner name or prefix in the session properties to override the table name prefix in the mapping.
Pre-session SQL - You can create SQL commands and execute them in the target database before loading data to the target. For example, you might want to drop the index for the target table before loading data into it.
Post-session SQL - You can create SQL commands and execute them in the target database after loading data to the target. For example, you might want to recreate the index for the target table after loading data into it.
Target table name - You can override the target table name for each relational target.

Target Table Truncation
 If you enable truncate target tables with the following sessions, the Integration Service does not truncate target tables:
Incremental aggregation - When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.
Test load - When you enable both truncate target tables and test load, the Integration Service disables the truncate table function, runs a test load session, and writes a message to the session log indicating that the truncate target tables option is turned off for the test load session.
Real-time - The Integration Service does not truncate target tables when you restart a JMS or WebSphere MQ real-time session that has recovery data.

Dropping and Recreating Indexes
 After you insert significant amounts of data into a target, you normally need to drop and recreate indexes on that table to optimize query speed.
 You can drop and recreate indexes by:
Using pre- and post-session SQL - The preferred method for dropping and re-creating indexes is to define an SQL statement in the Pre SQL property that drops indexes before loading data to the target. Use the Post SQL property to recreate the indexes after loading data to the target. Define the Pre SQL and Post SQL properties for relational targets in the Transformations view on the Mapping tab in the session properties.
Using the Designer - The same dialog box you use to generate and execute DDL code for table creation can drop and recreate indexes. However, this process is not automatic.
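As a sketch of the preferred method above, the Pre SQL and Post SQL entries for a hypothetical SALES_FACT target with an IDX_SALES_DATE index might look like this (Oracle-style syntax; DROP INDEX syntax varies by database):

    -- Pre SQL: drop the index before the load
    DROP INDEX IDX_SALES_DATE;
    -- Post SQL: recreate the index after the load
    CREATE INDEX IDX_SALES_DATE ON SALES_FACT (SALE_DATE);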
Constraint-Based Loading
 In the Workflow Manager, you can specify constraint-based loading for a session.
 When you select this option, the Integration Service orders the target load on a row-by-row basis.
 For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables.
 Constraint-based loading depends on the following requirements:
Active source - Related target tables must have the same active source.
Key relationships - Target tables must have key relationships.
Target connection groups - Targets must be in one target connection group.
Treat rows as insert - Use this option when you insert into the target. You cannot use updates with constraint-based loading.

 In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
 Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
 The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3.
 The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular order.
 The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3.
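The key relationships behind this load order can be sketched as DDL; the column names below are invented for illustration:

    -- T_1 is the parent table, so it loads first.
    CREATE TABLE T_1 (PK1 INTEGER PRIMARY KEY, COL1 VARCHAR(30));
    -- T_2 and T_3 reference T_1, so they load after T_1 (in no particular order).
    CREATE TABLE T_2 (PK2 INTEGER PRIMARY KEY, FK1 INTEGER REFERENCES T_1 (PK1));
    CREATE TABLE T_3 (PK3 INTEGER PRIMARY KEY, FK1 INTEGER REFERENCES T_1 (PK1));
    -- T_4 references T_3, so it loads last.
    CREATE TABLE T_4 (PK4 INTEGER PRIMARY KEY, FK3 INTEGER REFERENCES T_3 (PK3));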
 After loading the first set of targets, the Integration Service begins reading source B.
 If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.
 If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:
- T_5
- T_6

 T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties.
 T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties.
 The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.
Bulk Loading
 You can enable bulk loading when you load to DB2, Sybase, Oracle, or Microsoft SQL Server.
 If you enable bulk loading for other database types, the Integration Service reverts to a normal load.
 When bulk loading, the Integration Service invokes the database bulk utility and bypasses the database log, which speeds performance.
 Without writing to the database log, however, the target database cannot perform rollback.
 Note: When loading to DB2, Microsoft SQL Server, and Oracle targets, you must specify a normal load for data driven sessions. When you specify bulk mode and data driven, the Integration Service reverts to normal load.

Committing Data
 When bulk loading to Sybase and DB2 targets, the Integration Service ignores the commit interval you define in the session properties and commits data when the writer block is full.
 When bulk loading to Microsoft SQL Server and Oracle targets, the Integration Service commits data at each commit interval. Also, Microsoft SQL Server and Oracle start a new bulk load transaction after each commit.

Reserved Words
 If any table name or column name contains a database reserved word, such as MONTH or YEAR, the session fails with database errors when the Integration Service executes SQL against the database.
 You can create and maintain a reserved words file, reswords.txt, in the server/bin directory.
 When the Integration Service initializes a session, it searches for reswords.txt.
 If the file exists, the Integration Service places quotes around matching reserved words when it executes SQL against the database.
 Use the following rules and guidelines when working with reserved words:
- The Integration Service searches the reserved words file when it generates SQL to connect to source, target, and lookup databases.
- If you override the SQL for a source, target, or lookup, you must enclose any reserved word in quotes.
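As an example of the override rule, a hand-coded SQL override must quote reserved words itself. The table and columns below are hypothetical, and ANSI double quotes are shown; the exact delimiter character depends on the database:

    SELECT "MONTH", "YEAR", SALES_AMT
    FROM SALES_SUMMARY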
Working with Active Sources
 An active source is an active transformation the Integration
Service uses to generate rows. An active source can be any of
the following transformations:
- Aggregator
- Application Source Qualifier
- Custom, configured as an active transformation
- Joiner
- MQ Source Qualifier
- Normalizer (VSAM or pipeline)
- Rank
- Sorter
- Source Qualifier
- XML Source Qualifier
- Mapplet, if it contains any of the above transformations

 Note: The Filter, Router, Transaction Control, and Update Strategy transformations are active transformations in that they can change the number of rows that pass through. However, they are not active sources in the mapping because they do not generate rows. Only transformations that can generate rows are active sources.

Integration Service Handling for File Targets
Writing to Fixed-Width Flat Files with Relational Target Definitions

Writing to Fixed-Width Files with Flat File Target Definitions
 When you want to output to a fixed-width flat file based on a flat file target definition, you must configure precision and field width for the target field to accommodate the total length of the target field.
 If the data for a target field is too long for the total length of the field, the Integration Service performs one of the following actions:
- Truncates the row for string columns
- Writes the row to the reject file for numeric and datetime columns
 Note: When the Integration Service writes a row to the reject file, it writes a message in the session log.

Writing Empty Fields for Unconnected Ports in Fixed-Width File Definitions
 The Integration Service does not write data in unconnected ports to fixed-width files.
 If you want the Integration Service to write empty fields for the unconnected ports, create output ports in an upstream transformation that do not contain data.
 Then connect these ports containing null values to the fixed-width flat file target definition.

Workflow Validation
 The Workflow Manager validates the following properties:
Expressions - Expressions in the workflow must be valid.
Tasks - Non-reusable task and reusable task instances in the workflow must follow validation rules.
Scheduler - If the workflow uses a reusable scheduler, the Workflow Manager verifies that the scheduler exists.
 The Workflow Manager marks the workflow invalid if the scheduler you specify for the workflow does not exist in the folder.
 The Workflow Manager also verifies that you linked each task properly.
Note: The Workflow Manager validates Session tasks separately. If a session is invalid, the workflow may still be valid.

Validating Multiple Workflows
 You can validate multiple workflows or worklets without fetching them into the workspace.
 To validate multiple workflows, you must select and validate the workflows from a query results view or a view dependencies list.
 When you validate multiple workflows, the validation does not include sessions, nested worklets, or reusable worklet objects in the workflows.
 You can save and optionally check in workflows that change from invalid to valid status.

Task Validation
 The Workflow Manager validates each task in the workflow as you create it.
 When you save or validate the workflow, the Workflow Manager validates all tasks in the workflow except Session tasks.
 When you delete a reusable task, the Workflow Manager removes the instance of the deleted task from workflows.
 The Workflow Manager also marks the workflow invalid when you delete a reusable task used in a workflow.
 The Workflow Manager verifies that there are no duplicate task names in a folder, and that there are no duplicate task instances in the workflow.

 The Workflow Manager uses the following rules to validate tasks:
Assignment - The Workflow Manager validates the expression you enter for the Assignment task. For example, the Workflow Manager verifies that you assigned a matching datatype value to the workflow variable in the assignment expression.
Command - The Workflow Manager does not validate the shell command you enter for the Command task.
Event-Wait - If you choose to wait for a predefined event, the Workflow Manager verifies that you specified a file to watch. If you choose to use the Event-Wait task to wait for a user-defined event, the Workflow Manager verifies that you specified an event.
Event-Raise - The Workflow Manager verifies that you specified a user-defined event for the Event-Raise task.
Timer - The Workflow Manager verifies that the variable you specified for the Absolute Time setting has the Date/Time datatype.
Start - The Workflow Manager verifies that you linked the Start task to at least one task in the workflow.

 When a task instance is invalid, the workflow using the task instance becomes invalid.
 When a reusable task is invalid, it does not affect the validity of the task instance used in the workflow.
 However, if a Session task instance is invalid, the workflow may still be valid.
 The Workflow Manager validates sessions differently.

Session Validation
 If you delete objects associated with a Session task, such as the session configuration object, Email task, or Command task, the Workflow Manager marks a reusable session invalid.
 However, the Workflow Manager does not mark a non-reusable session invalid if you delete an object associated with the session.
 If you delete a shortcut to a source or target from the mapping, the Workflow Manager does not mark the session invalid.
 The Workflow Manager does not validate SQL overrides or filter conditions entered in the session properties when you validate a session.
 You must validate SQL override and filter conditions in the SQL Editor.
 If a reusable or non-reusable session instance is invalid, the Workflow Manager marks it invalid in the Navigator and in the Workflow Designer workspace.
 Workflows using the session instance remain valid.

Expression Validation
 The Workflow Manager validates all expressions in the workflow. You can enter expressions in the Assignment task, Decision task, and link conditions.
 The Workflow Manager writes any error message to the Output window.

Workflow Schedules
 You can schedule a workflow to run continuously, repeat at a given time or interval, or you can manually start a workflow.
 If you configure multiple instances of a workflow, and you schedule the workflow run time, the Integration Service runs all instances at the scheduled time. You cannot schedule workflow instances to run at different times.
 If you choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows.
 This includes workflows that are scheduled to run continuously but whose start time has passed and workflows that are scheduled to run continuously but were unscheduled.
 You must manually reschedule workflows whose start time has passed if they are not scheduled to run continuously.
 If you delete a folder, the Integration Service removes workflows
from the schedule when it receives notification from the
Repository Service.
 If you copy a folder into a repository, the Integration Service
reschedules all workflows in the folder when it receives the
notification.

 The Integration Service does not run the workflow in the following situations:
- The prior workflow run fails. When a workflow fails, the
Integration Service removes the workflow from the schedule, and
you must manually reschedule it. You can reschedule the
workflow in the Workflow Manager or using pmcmd.
- The Integration Service process fails during a prior workflow
run. When the Integration Service process fails in a highly
available domain and a running workflow is not configured for
recovery, the Integration Service removes the workflow from the
schedule. You can reschedule the workflow in the Workflow
Manager or using pmcmd.
- You remove the workflow from the schedule. You can remove
the workflow from the schedule in the Workflow Manager or
using pmcmd.
- The Integration Service is running in safe mode. In safe mode,
the Integration Service does not run scheduled workflows,
including workflows scheduled to run continuously or run on
service initialization. When you enable the Integration Service in
normal mode, the Integration Service runs the scheduled
workflows.

Session and Workflow Logs


 The following steps describe how the Log Manager processes
session and workflow logs:
1. The Integration Service writes binary log files on the node. It
sends information about the sessions and workflows to the Log
Manager.
2. The Log Manager stores information about workflow and
session logs in the domain configuration database. The domain
configuration database stores information such as the path to the
log file location, the node that contains the log, and the
Integration Service that created the log.

3. When you view a session or workflow in the Log Events window, the Log Manager retrieves the information from the domain configuration database to determine the location of the session or workflow logs.
4. The Log Manager dispatches a Log Agent to retrieve the log events on each node to display in the Log Events window.

 You can also configure a workflow to produce text log files.
 When you configure the workflow or session to produce text log files, the Integration Service creates the binary log and the text log file.

Message Severity

Log Events Window
 The Log Events window displays the following information for each session and workflow:
Severity - Lists the type of message, such as informational or error.
Time stamp - Date and time the log event reached the Log Agent.
Node - Node on which the Integration Service process is running.
Thread - Thread ID for the workflow or session.
Process ID - Windows or UNIX process identification numbers. Displays in the Output window only.
Message Code - Message code and number.
Message - Message associated with the log event.

Writing to Log Files
 When you create a workflow or session log, you can configure log options in the workflow or session properties.
 You can configure the following information for a workflow or session log:
Write Backward Compatible Log File - Select this option to create a text file for workflow or session logs. If you do not select the option, the Integration Service creates the binary log only.
Log File Directory - The directory where you want the log file created. By default, the Integration Service writes the workflow log file in the directory specified in the service process variable, $PMWorkflowLogDir. It writes the session log file in the directory specified in the service process variable, $PMSessionLogDir.

5. ADVANCED WORKFLOW GUIDE

Understanding Pipeline Partitioning
 A partition is a pipeline stage that executes in a single reader, transformation, or writer thread.
 The number of partitions in any pipeline stage equals the number of threads in the stage.
 By default, the Integration Service creates one partition in every pipeline stage.

 Complete the following tasks to configure partitions for a session:
- Set partition attributes including partition points, the number of partitions, and the partition types.
- You can enable the Integration Service to set partitioning at run time. When you enable dynamic partitioning, the Integration Service scales the number of session partitions based on factors such as the source database partitions or the number of nodes in a grid.
- After you configure a session for partitioning, you can configure memory requirements and cache directories for each transformation.
- The Integration Service evaluates mapping variables for each partition in a target load order group. You can use variable functions in the mapping to set the variable values.
- When you create multiple partitions in a pipeline, the Workflow Manager verifies that the Integration Service can maintain data consistency in the session using the partitions. When you edit object properties in the session, you can impact partitioning and cause a session to fail.
- You add or edit partition points in the session properties. When you change partition points, you can define the partition type and add or delete partitions.

Partitioning Attributes
Partition points - Partition points mark thread boundaries and divide the pipeline into stages. The Integration Service redistributes rows of data at partition points.
Number of partitions - A partition is a pipeline stage that executes in a single thread. If you purchase the Partitioning option, you can set the number of partitions at any partition point. When you add partitions, you increase the number of processing threads, which can improve session performance.
Partition types - The Integration Service creates a default partition type at each partition point. If you have the Partitioning option, you can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points.

Partition Points
 A stage is a section of a pipeline between any two partition points.
 When you set a partition point at a transformation, the new pipeline stage includes that transformation.
 Figure shows the default partition points and pipeline stages for a mapping with one pipeline:
 When you add a partition point, you increase the number of pipeline stages by one.
 Similarly, when you delete a partition point, you reduce the number of stages by one.
 Partition points mark the points in the pipeline where the Integration Service can redistribute data across partitions.

Number of Partitions
 A partition is a pipeline stage that executes in a single reader, transformation, or writer thread.
 The number of partitions in any pipeline stage equals the number of threads in that stage.
 You can define up to 64 partitions at any partition point in a pipeline.
 When you increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
 The number of partitions remains consistent throughout the pipeline.
 If you define three partitions at any partition point, the Workflow Manager creates three partitions at all other partition points in the pipeline.
 The number of partitions you create equals the number of connections to the source or target.
 If the pipeline contains a relational source or target, the number of partitions at the source qualifier or target instance equals the number of connections to the database.
 If the pipeline contains file sources, you can configure the session to read the source with one thread or with multiple threads.
 For example, when you define three partitions across the mapping, the master thread creates three threads at each pipeline stage, for a total of 12 threads.

Partitioning Multiple Input Group Transformations
 When you connect more than one pipeline to a multiple input group transformation, the Integration Service maintains the transformation threads or creates a new transformation thread depending on whether or not the multiple input group transformation is a partition point:
Partition point does not exist at multiple input group transformation - When a partition point does not exist at a multiple input group transformation, the Integration Service processes one thread at a time for the multiple input group transformation and all downstream transformations in the stage.
Partition point exists at multiple input group transformation - When a partition point exists at a multiple input group transformation, the Integration Service creates a new pipeline stage and processes the stage with one thread for each partition. The Integration Service creates one transformation thread for each partition regardless of the number of output groups the transformation contains.

Partition Types
You can define the following partition types in the Workflow Manager:
Database partitioning - The Integration Service queries the IBM DB2 or Oracle database system for table partition information. It reads partitioned data from the corresponding nodes in the database. You can use database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace. You can use database partitioning with DB2 targets.
Hash auto-keys - The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key. The Integration Service uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.
Hash user keys - The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.
Key range - With key range partitioning, the Integration Service distributes rows of data based on a port or set of ports that you define as the partition key. For each port, you define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.
Pass-through - In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
Round-robin - The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.

Dynamic Partitioning
 When you use dynamic partitioning, you can configure the partition information so the Integration Service determines the number of partitions to create at run time.
 The Integration Service scales the number of session partitions at run time based on factors such as source database partitions or the number of nodes in a grid.
 If any transformation in a stage does not support partitioning, or if the partition configuration does not support dynamic partitioning, the Integration Service does not scale partitions in the pipeline. The data passes through one partition.

Note: Do not configure dynamic partitioning for a session that contains manual partitions. If you set dynamic partitioning to a value other than disabled and you manually partition the session, the session is invalid.

Configuring Dynamic Partitioning
Configure dynamic partitioning using one of the following methods:
Disabled - Do not use dynamic partitioning. Define the number of partitions on the Mapping tab.
Based on number of partitions - Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
Based on number of nodes in grid - Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
Based on source partitioning - Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source. For Oracle sources that use composite partitioning, the number of partitions is the maximum of the number of subpartitions at the source.
Based on number of CPUs - Sets the number of partitions equal to the number of CPUs on the node that prepares the session. If the session is configured to run on a grid, dynamic partitioning sets the number of partitions equal to the number of CPUs on the node that prepares the session multiplied by the number of nodes in the grid.

Rules and Guidelines for Dynamic Partitioning
 Use the following rules and guidelines with dynamic partitioning:
- Dynamic partitioning uses the same connection for each partition.
- You cannot use dynamic partitioning with XML sources and targets.
- You cannot use dynamic partitioning with the Debugger.
- Sessions that use SFTP fail if you enable dynamic partitioning.
- When you set dynamic partitioning to a value other than disabled, and you manually partition the session on the Mapping tab, you invalidate the session.
- The session fails if you use a parameter other than $DynamicPartitionCount to set the number of partitions.

The following dynamic partitioning configurations cause a session to run with one partition:
1. You override the default cache directory for an Aggregator, Joiner, Lookup, or Rank transformation. The Integration Service partitions a transformation cache directory when the default is $PMCacheDir.
2. You override the Sorter transformation default work directory. The Integration Service partitions the Sorter transformation work directory when the default is $PMTempDir.
3. You use an open-ended range of numbers or date keys with a key range partition type.
4. You use datatypes other than numbers or dates as keys in key range partitioning.
5. You use key range relational target partitioning.
6. You create a user-defined SQL statement or a user-defined source filter.
7. You set dynamic partitioning to the number of nodes in the grid, and the session does not run on a grid.
8. You use pass-through relational source partitioning.
9. You use dynamic partitioning with an Application Source Qualifier.
10. You use SDK or PowerConnect sources and targets with dynamic partitioning.

Cache Partitioning
 When you create a session with multiple partitions, the Integration Service may use cache partitioning for the Aggregator, Joiner, Lookup, Rank, and Sorter transformations.
 When the Integration Service partitions a cache, it creates a separate cache for each partition and allocates the configured cache size to each partition. The Integration Service stores different data in each cache, where each cache contains only the rows needed by that partition.
 As a result, the Integration Service requires a portion of total cache memory for each partition.

Mapping Variables in Partitioned Pipelines
The Integration Service evaluates the value of a mapping variable in each partition separately.
The Integration Service uses the following process to evaluate variable values:
1. It updates the current value of the variable separately in each partition according to the variable function used in the mapping.
2. After loading all the targets in a target load order group, the Integration Service combines the current values from each partition into a single final value based on the aggregation type of the variable.
3. If there is more than one target load order group in the session, the final current value of a mapping variable in a target load order group becomes the current value in the next target load order group.
4. When the Integration Service finishes loading the last target load order group, the final current value of the variable is saved into the repository.

The following changes to mappings can cause session failure:
 Any change you make that affects the existing partitions or partition points, such as adding or altering transformations.

Partition Points
 Partition points mark the boundaries between threads in a pipeline and divide the pipeline into stages.
 By default, the Integration Service creates one reader and one writer partition point.
 The Integration Service redistributes rows of data at partition points.
 You can add partition points to increase the number of transformation threads and increase session performance.

 When you configure a session to read a source database, the Integration Service creates a separate connection and SQL query to the source database for each partition.
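As an illustration only, with key range partitioning on a hypothetical ORDERS source keyed on ORDER_ID, the per-partition queries might differ only in the range filter applied on each connection:

    -- partition 1
    SELECT ORDER_ID, ORDER_DATE, AMOUNT FROM ORDERS WHERE ORDER_ID >= 1 AND ORDER_ID < 100000
    -- partition 2
    SELECT ORDER_ID, ORDER_DATE, AMOUNT FROM ORDERS WHERE ORDER_ID >= 100000 AND ORDER_ID < 200000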
 When you configure a session to load data to a relational target, the Integration Service creates a separate connection to the target database for each partition at the target instance.
 You configure the reject file names and directories for the target. The Integration Service creates one reject file for each target partition.
 You can configure a session to read a source file with one thread or with multiple threads.
 You must choose the same connection type for all partitions that read the file.
 When you configure a session to write to a file target, you can write the target output to a separate file for each partition or to a merge file that contains the target output for all partitions.
 You can configure connection settings and file properties for each target partition.
 When you create a partition point at transformations, the Workflow Manager sets the default partition type.
 You can change the partition type depending on the transformation type.

Adding and Deleting Partition Points

Rules and Guidelines for Adding and Deleting Partition Points
- You cannot create a partition point at a source instance.
- You cannot create a partition point at a Sequence Generator transformation or an unconnected transformation.
- You can add a partition point at any other transformation provided that no partition point receives input from more than one pipeline stage.
- You cannot delete a partition point at a Source Qualifier transformation, a Normalizer transformation for COBOL sources, or a target instance.
- You cannot delete a partition point at a multiple input group Custom transformation that is configured to use one thread per partition.
- You cannot delete a partition point at a multiple input group transformation that is upstream from a multiple input group Custom transformation that is configured to use one thread per partition.
- The following partition types have restrictions with dynamic partitioning:
Pass-through - When you use dynamic partitioning, if you change the number of partitions at a partition point, the number of partitions in each pipeline stage changes.
Key Range - To use key range with dynamic partitioning, you must define a closed range of numbers or date keys. If you use an open-ended range, the session runs with one partition.

You can add and delete partition points at other transformations in the pipeline according to the following rules:
- You cannot create partition points at source instances.
- You cannot create partition points at Sequence Generator transformations or unconnected transformations.
- You can add partition points at any other transformation provided that no partition point receives input from more than one pipeline stage.

Note: When you create a custom SQL query to read database tables and you set database partitioning, the Integration Service reverts to pass-through partitioning and prints a message in the session log.

Partitioning File Sources
 The Integration Service creates one connection to the file source when you configure the session to read with one thread, and it creates multiple concurrent connections to the file source when you configure the session to read with multiple threads.
 Use the following types of partitioned file sources:
Flat file - You can configure a session to read flat file, XML, or COBOL source files.
Command - You can configure a session to use an operating system command to generate source data rows or generate a file list.

 When connecting to file sources, you must choose the same connection type for all partitions.
 You may choose different connection objects as long as each object is of the same type.
 To specify single- or multi-threaded reading for flat file sources, configure the source file name property for partitions 2-n.
 To configure for single-threaded reading, pass empty data through partitions 2-n.
 To configure for multi-threaded reading, leave the source file name blank for partitions 2-n.

Rules and Guidelines for Partitioning File Sources
Use the following rules and guidelines when you configure a file source session with multiple partitions:
- Use pass-through partitioning at the source qualifier.
- Use single- or multi-threaded reading with flat file or COBOL sources.
- Use single-threaded reading with XML sources.
- You cannot use multi-threaded reading if the source files are non-disk files, such as FTP files or WebSphere MQ sources.
- If you use a shift-sensitive code page, use multi-threaded reading if the following conditions are true:
- The file is fixed-width.
- The file is not line sequential.
- You did not enable user-defined shift state in the source definition.
- To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.
- If you configure a session for multi-threaded reading, and the Integration Service cannot create multiple threads to a file source, it writes a message to the session log and reads the source with one thread.
- When the Integration Service uses multiple threads to read a source file, it may not read the rows in the file sequentially. If sort order is important, configure the session to read the file with a single thread. For example, sort order may be important if the mapping contains a sorted Joiner transformation and the file source is the sort origin.
- You can also use a combination of direct and indirect files to balance the load.
- Session performance for multi-threaded reading is optimal with large source files. The load may be unbalanced if the amount of input data is small.
- You cannot use a command for a file source if the command generates source data and the session is configured to run on a grid or is configured with the resume from the last checkpoint recovery strategy.

Using One Thread to Read a File Source
 When the Integration Service uses one thread to read a file source, it creates one connection to the source.
 The Integration Service reads the rows in the file or file list sequentially.
 You can configure single-threaded reading for direct or indirect file sources in a session:
Reading direct files - You can configure the Integration Service to read from one or more direct files. If you configure the session with more than one direct file, the Integration Service creates a concurrent connection to each file. It does not create multiple connections to a file.
Reading indirect files - When the Integration Service reads an indirect file, it reads the file list and then reads the files in the list sequentially. If the session has more than one file list, the Integration Service reads the file lists concurrently, and it reads the files in the list sequentially.

Using Multiple Threads to Read a File Source
 When the Integration Service uses multiple threads to read a source file, it creates multiple concurrent connections to the source.
 The Integration Service may or may not read the rows in a file sequentially.
44
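To make the direct/indirect distinction concrete: an indirect file is simply a text file that lists the direct files to read, one path per line. The paths below are hypothetical and only illustrate the format:

/data/src/orders_east.dat
/data/src/orders_west.dat
/data/src/orders_central.dat

Point the session's source file name at this list file and set the source filetype to Indirect. With one thread the Integration Service works through the listed files sequentially; with multi-threaded reading (source file name left blank for partitions 2-n, as noted above) it can open concurrent connections to them.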
Partitioning Joiner Transformations
 When you create a partition point at the Joiner transformation, the Workflow Manager sets the partition type to hash auto-keys when the transformation scope is All Input.
 The Workflow Manager sets the partition type to pass-through when the transformation scope is Transaction.
 You must create the same number of partitions for the master and detail source.
 If you configure the Joiner transformation for sorted input, you can change the partition type to pass-through.
 You can specify only one partition if the pipeline contains the master source for a Joiner transformation and you do not add a partition point at the Joiner transformation.
 The Integration Service uses cache partitioning when you create a partition point at the Joiner transformation.
 When you use partitioning with a Joiner transformation, you can create multiple partitions for the master and detail source of the Joiner transformation.
 If you do not create a partition point at the Joiner transformation, you can create n partitions for the detail source and one partition for the master source (1:n).
Note: You cannot add a partition point at the Joiner transformation when you configure the Joiner transformation to use the row transformation scope.

 When you join data, you can partition data for the master and detail pipelines in the following ways:
1:n - Use one partition for the master source and multiple partitions for the detail source. The Integration Service maintains the sort order because it does not redistribute master data among partitions.
n:n - Use an equal number of partitions for the master and detail sources. When you use n:n partitions, the Integration Service processes multiple partitions concurrently. You may need to configure the partitions to maintain the sort order, depending on the type of partition you use at the Joiner transformation.
Note: When you use 1:n partitions, do not add a partition point at the Joiner transformation. If you add a partition point at the Joiner transformation, the Workflow Manager adds an equal number of partitions to both master and detail pipelines.

Pushdown Optimization
 When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database.
 The source or target database executes the SQL queries to process the transformations.
 The amount of transformation logic you can push to the database depends on the database, the transformation logic, and the mapping and session configuration.
 The Integration Service processes all transformation logic that it cannot push to a database.

Pushdown Optimization Types
You can configure the following types of pushdown optimization:
Source-side pushdown optimization
- The Integration Service pushes as much transformation logic as possible to the source database.
- The Integration Service analyzes the mapping from the source to the target, or until it reaches a downstream transformation it cannot push to the source database.
- It generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database, then reads the results of this SQL query and processes the remaining transformations.
Target-side pushdown optimization
- The Integration Service pushes as much transformation logic as possible to the target database.
- The Integration Service analyzes the mapping from the target to the source, or until it reaches an upstream transformation it cannot push to the target database.
- It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database.
Full pushdown optimization
- The Integration Service attempts to push all transformation logic to the target database.
- To use full pushdown optimization, the source and target databases must be in the same relational database management system.
- If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
- When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction.
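As a rough sketch of source-side pushdown, suppose a mapping filters a hypothetical CUSTOMERS table on STATUS and uppercases a name in an Expression transformation. The exact SQL the Integration Service generates depends on the mapping, the session configuration, and the database; the statement below only illustrates how that logic might be folded into a single SELECT:

SELECT CUST_ID, UPPER(CUST_NAME)
FROM CUSTOMERS
WHERE STATUS = 'ACTIVE'

The Integration Service reads the rows this query returns and processes whatever transformations it could not push to the source database.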
- Consider the following database performance issues when you generate a long transaction:
- A long transaction uses more database resources.
- A long transaction locks the database for longer periods of time, which reduces database concurrency and increases the likelihood of deadlock.
- A long transaction increases the likelihood of an unexpected event.
- To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.

Active and Idle Databases
 During pushdown optimization, the Integration Service pushes the transformation logic to one database, which is called the active database.
 A database that does not process transformation logic is called an idle database.
 For example, a mapping contains two sources that are joined by a Joiner transformation. If the session is configured for source-side pushdown optimization, the Integration Service pushes the Joiner transformation logic to the source in the detail pipeline, which is the active database. The source in the master pipeline is the idle database because it does not process transformation logic.
 The Integration Service uses the following criteria to determine which database is active or idle:
- When using full pushdown optimization, the target database is active and the source database is idle.
- In sessions that contain a Lookup transformation, the source or target database is active, and the lookup database is idle.
- In sessions that contain a Joiner transformation, the source in the detail pipeline is active, and the source in the master pipeline is idle.
- In sessions that contain a Union transformation, the source in the first input group is active. The sources in the other input groups are idle.

Pushdown Compatibility
 To push a transformation with multiple connections to a database, the connections must be pushdown compatible.
 Connections are pushdown compatible if they connect to databases on the same database management system and the Integration Service can identify the database tables that the connections access.
 The following transformations can have multiple connections:
Joiner - The Joiner transformation can join data from multiple source connections.
Union - The Union transformation can merge data from multiple source connections.
Lookup - The connection for the Lookup transformation can differ from the source connection.
Target - The target connection can differ from the source connection.
 Each connection object is pushdown compatible with itself.
 If you configure a session to use the same connection object for the source and target connections, the Integration Service can push the transformation logic to the source or target database.

Error Handling, Logging, and Recovery
 The Integration Service and the database process error handling, logging, and recovery differently.

Error Handling
 When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database. As a result, it handles errors differently than when it processes the transformations in the session.
 When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.

Logging
 When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server.
 The statistics the Integration Service can trace depend on the type of pushdown optimization.
 When you push transformation logic to the database, the Integration Service generates a session log with the following differences:
- The session log does not contain details for transformations processed by the database.
- The session log does not contain the thread busy percentage when the session is configured for full pushdown optimization.
- The session log contains the number of loaded rows when the session is configured for source-side, target-side, or full pushdown optimization.
- The session log does not contain the number of rows read from the source when the Integration Service uses full pushdown optimization and pushes all transformation logic to the database.
- The session log contains the number of rows read from each source when the Integration Service uses source-side pushdown optimization.
Recovery
 If you configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations. Instead, the database rolls back the transactions.
 If the database server fails, it rolls back transactions when it restarts.
 If the Integration Service fails, the database server rolls back the transaction.
 If the failure occurs while the Integration Service is creating temporary sequence objects or views in the database, which is before any rows have been processed, the Integration Service runs the generated SQL on the database again.
 If the failure occurs before the database processes all rows, the Integration Service performs the following tasks:
1. If applicable, the Integration Service drops and recreates temporary view or sequence objects in the database to ensure duplicate values are not produced.
2. The Integration Service runs the generated SQL on the database again.
 If the failure occurs while the Integration Service is dropping the temporary view or sequence objects from the database, which is after all rows are processed, the Integration Service tries to drop the temporary objects again.

Using the $$PushdownConfig Mapping Parameter
 Depending on the database workload, you might want to use source-side, target-side, or full pushdown optimization at different times.
 For example, use source-side or target-side pushdown optimization during the peak hours of the day, but use full pushdown optimization from midnight until 2 a.m. when database activity is low.
 To use different pushdown optimization configurations at different times, use the $$PushdownConfig mapping parameter.
 The settings in the $$PushdownConfig parameter override the pushdown optimization settings in the session properties.

Partitioning
 You can push a session with multiple partitions to a database if the partition type is pass-through partitioning or key range partitioning.
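One way to apply the $$PushdownConfig approach described above is to keep two parameter files and point the session at the appropriate one per run. The folder, workflow, and session names below are hypothetical, and the exact parameter values (strings such as Source, Target, Full, or None are typical) should be verified against your PowerCenter version:

daytime.param:
[MyFolder.WF:wf_daily_load.ST:s_load_orders]
$$PushdownConfig=Source

night.param:
[MyFolder.WF:wf_daily_load.ST:s_load_orders]
$$PushdownConfig=Full

Start the workflow with pmcmd and the -paramfile option (shown later in these notes) to pick the file that matches the current database load.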
Rules and Guidelines for Pushdown Optimization and Transformations
 Use the following rules and guidelines when you configure the Integration Service to push transformation logic to a database.
 The Integration Service processes the transformation logic itself if any of the following conditions are true:
- The transformation logic updates a mapping variable and saves it to the repository database.
- The transformation contains a variable port.
- The transformation meets all of the following criteria: it is not a Sorter transformation, Union transformation, or target; it is pushed to Microsoft SQL Server, Sybase, or Teradata; and it is downstream from a Sorter transformation that is downstream from a Union transformation or contains a distinct sort.
- The session is configured to override the default values of input or output ports.
- The database does not have an equivalent operator, variable, or function that is used in an expression in the transformation.
- The mapping contains too many branches. When you branch a pipeline, the SQL statement required to represent the mapping logic becomes more complex. The Integration Service cannot generate SQL for a mapping that contains more than 64 two-way branches, 43 three-way branches, or 32 four-way branches. If the mapping branches exceed these limitations, the Integration Service processes the downstream transformations.
 The Integration Service processes all transformations in the mapping if any of the following conditions are true:
- The session is a data profiling or debug session.
- The session is configured to log row errors.

Row Error Logging
 You can log row errors into relational tables or flat files.
 When you enable error logging, the Integration Service creates the error tables or an error log file the first time it runs the session. Error logs are cumulative: if the error logs exist, the Integration Service appends error data to the existing error logs.
 You can log source row data from flat file or relational sources, but you cannot log row errors from XML file sources. You can view the XML source errors in the session log.
 By default, the Integration Service logs transformation errors in the session log and reject rows in the reject file.
 When you enable error logging, the Integration Service does not generate a reject file or write dropped rows to the session log. Without a reject file, the Integration Service does not log Transaction Control transformation rollback or commit errors.
 If you want to write rows to the session log in addition to the row error log, you can enable verbose data tracing.
Note: When you log row errors, session performance may decrease because the Integration Service processes one row at a time instead of a block of rows at once.

Understanding the Error Log Tables
 When you choose relational database error logging, the Integration Service creates the following error tables the first time you run a session:
PMERR_DATA - Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG - Stores metadata about an error and the error message.
PMERR_SESS - Stores metadata about the session.
PMERR_TRANS - Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
 If the error tables exist for a session, the Integration Service appends row errors to these tables.
 Relational database error logging lets you collect row errors from multiple sessions in one set of error tables. You can specify a prefix for the error tables. The error table names can have up to eleven characters.
 The Integration Service creates the error tables without specifying primary and foreign keys. However, you can specify key columns.
Workflow Recovery
 You can recover a workflow if the Integration Service can access the workflow state of operation.
 The workflow state of operation includes the status of tasks in the workflow and workflow variable values.
 The Integration Service stores the state in memory or on disk, based on how you configure the workflow:
Enable recovery - When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts. The workflow does not have to be running.
Suspend - When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory. You can recover the suspended workflow if a task fails. You can fix the task error and recover the workflow.
 The Integration Service recovers tasks in the workflow based on the recovery strategy of the task.
 By default, the recovery strategy for Session and Command tasks is to fail the task and continue running the workflow. You can configure the recovery strategy for Session and Command tasks; the strategy for all other tasks is to restart the task.

Workflow State of Operation
 When you enable a workflow for recovery, the Integration Service stores the workflow state of operation in the shared location, $PMStorageDir.
 The Integration Service can restore the state of operation to recover a stopped, aborted, or terminated workflow.
 The workflow state of operation includes the following information:
- Active service requests
- Completed and running task status
- Workflow variable values
 When you run concurrent workflows, the Integration Service appends the instance name or the workflow run ID to the workflow recovery storage file in $PMStorageDir.
 When you enable a workflow for recovery, the Integration Service does not store the session state of operation by default.

Session State of Operation
 When you configure the session recovery strategy to resume from the last checkpoint, the Integration Service stores the session state of operation in the shared location, $PMStorageDir. The Integration Service also saves relational target recovery information in target database tables.
 When the Integration Service performs recovery, it restores the state of operation to recover the session from the point of interruption. It uses the target recovery data to determine how to recover the target tables.
 You can configure the session to save the session state of operation even if you do not save the workflow state of operation. You can recover the session, or you can recover the workflow from the session.
 The session state of operation includes the following information:
Source - If the output from a source is not deterministic and repeatable, the Integration Service saves the result from the SQL query to a shared storage file in $PMStorageDir.
Transformation - The Integration Service creates checkpoints in $PMStorageDir to determine where to start processing the pipeline when it runs a recovery session. When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run.
Relational target recovery data - The Integration Service writes recovery information to recovery tables in the target database to determine the last row committed to the target when the session was interrupted.

Target Recovery Tables
 When the Integration Service runs a session that has a resume recovery strategy, it writes to recovery tables on the target database system.
 When the Integration Service recovers the session, it uses information in the recovery tables to determine where to begin loading data to the target tables.
 The Integration Service creates the following recovery tables in the target database:
PM_RECOVERY - Contains target load information for the session run. The Integration Service removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.
PM_TGT_RUN_ID - Contains information the Integration Service uses to identify each target on the database. The information remains in the table between session runs. If you manually create this table, you must create a row and enter a value other than zero for LAST_TGT_RUN_ID to ensure that the session recovers successfully.
PM_REC_STATE - Contains information the Integration Service uses to determine if it needs to write messages to the target table during recovery for a real-time session.
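If you create PM_TGT_RUN_ID manually, the note above says the seed row must hold a nonzero LAST_TGT_RUN_ID. A hedged SQL sketch of that seed, assuming LAST_TGT_RUN_ID is the only column you need to supply (check the table definition that ships with your PowerCenter installation before running anything like this):

INSERT INTO PM_TGT_RUN_ID (LAST_TGT_RUN_ID) VALUES (1);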
Task Recovery Strategies
Each task in a workflow has a recovery strategy. When the Integration Service recovers a workflow, it recovers tasks based on the recovery strategy:
Restart task - When the Integration Service recovers a workflow, it restarts each recoverable task that is configured with a restart strategy. You can configure Session and Command tasks with a restart recovery strategy. All other tasks have a restart recovery strategy by default.
Fail task and continue workflow - When the Integration Service recovers a workflow, it does not recover the task. The task status becomes failed, and the Integration Service continues running the workflow. Configure a fail recovery strategy if you want to complete the workflow but you do not want to recover the task. You can configure Session and Command tasks with the fail task and continue workflow recovery strategy.
Resume from the last checkpoint - The Integration Service recovers a stopped, aborted, or terminated session from the last checkpoint. You can configure a Session task with a resume strategy.

Automatically Recovering Terminated Tasks
 When you have the high availability option, you can configure automatic recovery of terminated tasks.
 When you enable automatic task recovery, the Integration Service recovers terminated Session and Command tasks without user intervention if the workflow is still running.
 You configure the number of times the Integration Service attempts to recover the task.
 Enable automatic task recovery in the workflow properties.

Resuming Sessions
 When the Integration Service resumes a session, the recovery session must produce the same data as the original session.
 The session is not valid if you configure recovery to resume from the last checkpoint but the session cannot produce repeatable data.
 When you recover a session from the last checkpoint, the Integration Service restores the session state of operation to determine the type of recovery it can perform:
Incremental - The Integration Service starts processing data at the point of interruption. It does not read or transform rows that it processed before the interruption. By default, the Integration Service attempts to perform incremental recovery.
Full - The Integration Service reads all source rows again and performs all transformation logic if it cannot perform incremental recovery. The Integration Service begins writing to the target at the last commit point. If any session component requires full recovery, the Integration Service performs full recovery on the session.

Working with Repeatable Data
 When you configure recovery to resume from the last checkpoint, the recovery session must be able to produce the same data in the same order as the original session.
 When you validate a session, the Workflow Manager verifies that the transformations are configured to produce repeatable and deterministic data. The session is not valid if you configure recovery to resume from the last checkpoint but the transformations are not configured for repeatable and deterministic data.
 Session data is repeatable when all targets receive repeatable data from the following mapping objects:
Source - The output data from the source is repeatable between the original run and the recovery run.
Transformation - The output data from each transformation to the target is repeatable.

Configuring a Mapping for Recovery
 You can configure a mapping to enable transformations in the session to produce the same data between the session run and the recovery run.
 When a mapping contains a transformation that never produces repeatable data, you can add a transformation that always produces repeatable data immediately after it.
 For example, a mapping contains two Source Qualifier transformations that produce repeatable data, and a Union and a Custom transformation that never produce repeatable data. The Lookup transformation produces repeatable data when it receives repeatable data. Therefore, the target does not receive repeatable data and you cannot configure the session to resume recovery.
 You can modify the mapping to enable resume recovery. Add a Sorter transformation configured for distinct output rows immediately after the transformations that never output repeatable data; in this example, add the Sorter transformation after the Custom transformation. The Lookup transformation then produces repeatable data because it receives repeatable data from the Sorter transformation.
 The following table describes when transformations produce repeatable data (table not reproduced in these notes).
 You can configure the Output is Repeatable and Output is Deterministic properties for the following transformations, or you can add a transformation that produces repeatable data immediately after these transformations:
- Application Source Qualifier
- Custom
- External Procedure
- Source Qualifier, relational
- Stored Procedure
Steps to Recover Workflows and Tasks
 You can use one of the following methods to recover a workflow or task:
Recover a workflow - Continue processing the workflow from the point of interruption.
Recover a session - Recover a session but not the rest of the workflow.
Recover a workflow from a session - Recover a session and continue processing the workflow.
 If you want to restart a workflow or task without recovery, you can restart the workflow or task in cold start mode.

Rules and Guidelines for Session Recovery
Configuring Recovery to Resume from the Last Checkpoint
Use the following rules and guidelines when configuring recovery to resume from the last checkpoint:
- You must use pass-through partitioning for each transformation.
- You cannot configure recovery to resume from the last checkpoint for a session that runs on a grid.
- When you configure a session for full pushdown optimization, the Integration Service runs the session on the database. As a result, it cannot perform incremental recovery if the session fails. When you perform recovery for sessions that contain SQL overrides, the Integration Service must drop and recreate views.
- When you modify a workflow or session between the interrupted run and the recovery run, you might get unexpected results. The Integration Service does not prevent recovery for a modified workflow. The recovery workflow or session log displays a message when the workflow or the task has been modified since the last run.
- The pre-session command and pre-SQL commands run only once when you resume a session from the last checkpoint. If a pre- or post-session command or SQL command fails, the Integration Service runs the command again during recovery. Design the commands so you can rerun them.
- You cannot configure a session to resume if it writes to a relational target in bulk mode.

Unrecoverable Workflows or Tasks
 In some cases, the Integration Service cannot recover a workflow or task. You cannot recover a workflow or task under the following circumstances:
You change the number of partitions - If you change the number of partitions after a session fails, the recovery session fails.
The interrupted task has a fail recovery strategy - If you configure a Command or Session task to fail and continue the workflow on recovery, the task is not recoverable.
Recovery storage file is missing - The Integration Service fails the recovery session or workflow if the recovery storage file is missing from $PMStorageDir or if the definition of $PMStorageDir changes between the original and recovery run.
Recovery table is empty or missing from the target database - The Integration Service fails a recovery session under the following circumstances:
- You deleted the table after the Integration Service created it.
- The session enabled for recovery failed immediately after the Integration Service removed the recovery information from the table.
 You might get inconsistent data if you perform recovery under the following circumstances:
The sources or targets change after the initial session - If you drop or create indexes or edit data in the source or target tables before recovering a session, the Integration Service may return missing or repeat rows.
The source or target code pages change after the initial session failure - If you change the source or target code page, the Integration Service might return incorrect data. You can perform recovery if the code pages are two-way compatible with the original code pages.

Stopping and Aborting
 When you stop a workflow, the Integration Service tries to stop all the tasks that are currently running in the workflow. If it cannot stop the workflow, you need to abort the workflow.
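Stop and abort requests can also be issued from the command line. A hedged sketch using pmcmd's stopworkflow and abortworkflow commands, with connection flags mirroring the startworkflow examples later in these notes (check pmcmd help for the exact syntax in your version; service, folder, and workflow names are hypothetical):

pmcmd stopworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg
pmcmd abortworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg

Abort is the heavier option: as noted above, use it when a stop request cannot bring the running tasks down cleanly.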
 The Integration Service can stop the following tasks completely:
- Session
- Command
- Timer
- Event-Wait
- Worklet
 When you stop a Command task that contains multiple commands, the Integration Service finishes executing the current command and does not run the rest of the commands.
 The Integration Service cannot stop tasks such as the Email task. For example, if the Integration Service has already started sending an email when you issue the stop command, it finishes sending the email before it stops running the workflow.
 The Integration Service aborts the workflow if the Repository Service process shuts down.

Concurrent Workflows
Use the following rules and guidelines for concurrent workflows:
- You cannot reference workflow run instances in parameter files. To use separate parameters for each instance, you must configure different parameter files.
- If you use the same cache file name for more than one concurrent workflow instance, each workflow instance will be valid. However, sessions will fail if conflicts occur writing to the cache.
- You can use pmcmd to restart concurrent workflows by run ID or instance name.
- If you configure multiple instances of a workflow and you schedule the workflow, the Integration Service runs all instances at the scheduled time. You cannot run instances on separate schedules.
- Configure a worklet to run concurrently on the worklet General tab.
- You must enable a worklet to run concurrently if the parent workflow is enabled to run concurrently. Otherwise the workflow is invalid.
- You can enable a worklet to run concurrently and place it in two non-concurrent workflows. The Integration Service can run the two worklets concurrently.
- Two workflows enabled to run concurrently can run the same worklet. One workflow can run two instances of the same worklet if the worklet has no persisted variables.
- A session in a worklet can run concurrently with a session in another worklet of the same instance name when the session does not contain persisted variables.
 The following transformations have restrictions with concurrent workflows:
Aggregator transformation - You cannot use incremental aggregation in a concurrent workflow. The session fails.
Lookup transformation - Use the following rules and guidelines for Lookup transformations in concurrent workflows:
- You can use a static or dynamic lookup cache with concurrent workflows.
- When the cache is non-persistent, the Integration Service adds the workflow run ID as a prefix to the cache file name.
- When the cache is an unnamed persistent cache, the Integration Service adds the run instance name as a prefix to the cache file name.
- If the cache is a dynamic, unnamed, persistent cache and the current workflow is configured to allow concurrent runs with the same instance name, the session fails.
- If the lookup cache name is parameterized, the Integration Service names the cache file with the parameter value. Pass a different file name for each run instance.
Sequence Generator transformation - To avoid generating the same set of sequence numbers for concurrent workflows, configure the number of cached values in the Sequence Generator transformation.
Grid Processing
Rules and Guidelines for Configuring a Workflow or Session to Run on a Grid
- To run sessions over the grid, verify that the operating system and bit mode are the same for each node of the grid. A session might not run on the grid if the nodes run on different operating systems or bit modes.
- If you override a service process variable, ensure that the Integration Service can access input files, caches, logs, storage and temporary directories, and source and target file directories.
- To ensure that a Session, Command, or predefined Event-Wait task runs on a particular node, configure the Integration Service to check resources and specify a resource requirement for the task.
- To ensure that session threads for a mapping object run on a particular node, configure the Integration Service to check resources and specify a resource requirement for the object.
- When you run a session that creates cache files, configure the root and cache directory to use a shared location to ensure consistency between cache files.
- Ensure the Integration Service builds the cache in a shared location when you add a partition point at a Joiner transformation and the transformation is configured for 1:n partitioning. The cache for the detail pipeline must be shared.
- Ensure the Integration Service builds the cache in a shared location when you add a partition point at a Lookup transformation and the partition type is not hash auto-keys.
- When you run a session that uses dynamic partitioning and you want to distribute session threads across all nodes in the grid, configure dynamic partitioning for the session to use the "Based on number of nodes in the grid" method.
- You cannot run a debug session on a grid.
- You cannot configure a resume recovery strategy for a session that you run on a grid.
- Configure the session to run on a grid when you work with sessions that take a long time to run.
- Configure the workflow to run on a grid when you have multiple concurrent sessions.
- You can run a persistent profile session on a grid, but you cannot run a temporary profile session on a grid.
- When you use a Sequence Generator transformation, increase the number of cached values to reduce the communication required between the master and worker DTM processes and the repository.
- To ensure that the Log Viewer can accurately order log events when you run a workflow or session on a grid, use time synchronization software to ensure that the nodes of the grid use a synchronized date/time.
- If the workflow uses an Email task in a Windows environment, configure the same Microsoft Outlook profile on each node to ensure the Email task can run.

Workflow Variables
 Use the following types of workflow variables:
Predefined workflow variables - The Workflow Manager provides predefined workflow variables for tasks within a workflow.
User-defined workflow variables - You create user-defined workflow variables when you create a workflow.
 Use workflow variables when you configure the following types of tasks:
Assignment tasks - Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
Decision tasks - Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
Links - Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.
Timer tasks - Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.
Print 161-163

Workflow Variable Start and Current Values
 Conceptually, the Integration Service holds two different values for a workflow variable during a workflow run:
- Start value of a workflow variable
- Current value of a workflow variable
 The Integration Service looks for the start value of a variable in the following order:
1. Value in the parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-specified default value
4. Datatype default value
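Because a value in the parameter file is first in the search order above, you can seed a user-defined workflow variable's start value there. A hedged sketch with hypothetical folder, workflow, and variable names:

[MyFolder.WF:wf_daily_load]
$$RunCounter=0
$$LastProcessedDate=01/01/2025

If no value appears in a parameter file, the Integration Service falls back through the repository value and the defaults, as listed above.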
Parameters and Variables in Sessions
 Use user-defined session parameters in session or workflow properties and define the values in a parameter file.
 In the parameter file, folder and session names are case sensitive.
 User-defined session parameters do not have default values, so you must define them in a parameter file. If the Integration Service cannot find a value for a user-defined session parameter, it fails the session, takes an empty string as the default value, or fails to expand the parameter at run time.
 You can run a session with different parameter files when you use pmcmd to start the session. The parameter file you set with pmcmd overrides the parameter file in the session or workflow properties.
 You cannot define built-in session parameter values in the parameter file. The Integration Service expands these parameters when the session runs.

Rules and Guidelines for Creating File Parameters and Database Connection Parameters
- When you define the parameter file as a resource for a node, verify that the Integration Service runs the session on a node that can access the parameter file. Define the resource for the node, configure the Integration Service to check resources, and edit the session to require the resource.
- When you create a file parameter, use alphanumeric and underscore characters. For example, to name a source file parameter, use the $InputFileName format, such as $InputFile_Data.
- All session file parameters of a particular type must have distinct names. For example, if you create two source file parameters, you might name them $SourceFileAccts and $SourceFilePrices.
- When you define the parameter in the file, you can reference any directory local to the Integration Service.
- Use a parameter to define the location of a file. Clear the entry in the session properties that defines the file location and enter the full path of the file in the parameter file.
- You can change the parameter value in the parameter file between session runs, or you can create multiple parameter files. If you use multiple parameter files, use the pmcmd startworkflow command with the -paramfile or -localparamfile option to specify which parameter file to use.

Use the following rules and guidelines when you create database connection parameters:
- You can change connections for relational sources, targets, lookups, and stored procedures.
- When you define the parameter, you can reference any database connection in the repository.
- You can use the same $DBConnection parameter for more than one connection in a session.
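Putting the naming rules above together, a session section might define one source file parameter and one database connection parameter like this (hypothetical names and paths; the prefixes follow the $InputFile and $DBConnection conventions mentioned above):

[MyFolder.WF:wf_sales.ST:s_load_accounts]
$InputFile_Accts=/data/in/accounts_20250101.dat
$DBConnection_Target=DW_ORACLE_TGT

In the session properties you would reference $InputFile_Accts as the source file name (with the directory entry cleared, per the guideline above) and $DBConnection_Target as the target connection.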
Mapping Parameters and Variables in Sessions
 If you use mapping variables in a session, you can clear any of the variable values saved in the repository by editing the session.
 When you clear the variable values, the Integration Service uses the values in the parameter file the next time you run the session.
 If the session does not use a parameter file, the Integration Service uses the values assigned in the pre-session variable assignment. If there are no assigned values, the Integration Service uses the initial values defined in the mapping.

Assigning Parameter and Variable Values in a Session
 You can update the values of certain parameters and variables before or after a non-reusable session runs.
Note: You cannot assign parameters and variables in reusable sessions.
 You can update the following types of parameters and variables before or after a session runs:
Pre-session variable assignment - You can update mapping parameters, mapping variables, and session parameters before a session runs. You can assign these parameters and variables the values of workflow or worklet variables in the parent workflow or worklet. Therefore, if a session is in a worklet within a workflow, you can assign values from the worklet variables, but not the workflow variables. You cannot update mapplet variables in the pre-session variable assignment.
Post-session on success variable assignment - You can update workflow or worklet variables in the parent workflow or worklet after the session completes successfully. You can assign these variables the values of mapping parameters and variables.
Post-session on failure variable assignment - You can update workflow or worklet variables in the parent workflow or worklet when the session fails. You can assign these variables the values of mapping parameters and variables.

Passing Parameter and Variable Values between Sessions
To pass the mapping variable value from s_NewCustomers to s_MergeCustomers, complete the following steps:
1. Configure the mapping associated with session s_NewCustomers to use a mapping variable, for example, $$Count1.
2. Configure the mapping associated with session s_MergeCustomers to use a mapping variable, for example, $$Count2.
3. Configure the workflow to use a user-defined workflow variable, for example, $$PassCountValue.
4. Configure session s_NewCustomers to assign the value of mapping variable $$Count1 to workflow variable $$PassCountValue after the session completes successfully.
5. Configure session s_MergeCustomers to assign the value of workflow variable $$PassCountValue to mapping variable $$Count2 before the session starts.

Parameter Files
 A parameter file is a list of parameters and variables and their associated values.
 These values define properties for a service, service process, workflow, worklet, or session.
 The Integration Service applies these values when you run a workflow or session that uses the parameter file.
 The Integration Service reads the parameter file at the start of the workflow or session to determine the start values for the parameters and variables defined in the file.
 Consider the following information when you use parameter files:
Types of parameters and variables - You can define different types of parameters and variables in a parameter file. These include service variables, service process variables, workflow and worklet variables, session parameters, and mapping parameters and variables.
Properties you can set in parameter files - Use parameters and variables to define many properties in the Designer and Workflow Manager. For example, you can enter a session parameter as the update override for a relational target instance, and set this parameter to the UPDATE statement in the parameter file. The Integration Service expands the parameter when the session runs.
Parameter file structure - Assign a value for a parameter or variable in the parameter file by entering the parameter or variable name and value on a single line in the form name=value. Groups of parameters and variables must be preceded by a heading that identifies the service, service process, workflow, worklet, or session to which the parameters or variables apply.
Parameter file location - Specify the parameter file to use for a workflow or session. You can enter the parameter file name and directory in the workflow or session properties or on the pmcmd command line.
Parameter and Variable Types
 A parameter file can contain different types of parameters and variables. When you run a session or workflow that uses a parameter file, the Integration Service reads the parameter file and expands the parameters and variables defined in the file.
 You can define the following types of parameters and variables in a parameter file:
Service variables - Define general properties for the Integration Service such as email addresses, log file counts, and error thresholds. $PMSuccessEmailUser, $PMSessionLogCount, and $PMSessionErrorThreshold are examples of service variables. The service variable values you define in the parameter file override the values that are set in the Administrator tool.
Service process variables - Define the directories for Integration Service files for each Integration Service process. $PMRootDir, $PMSessionLogDir, and $PMBadFileDir are examples of service process variables. The service process variable values you define in the parameter file override the values that are set in the Administrator tool. If the Integration Service uses operating system profiles, the operating system user specified in the operating system profile must have access to the directories you define for the service process variables.
Workflow variables - Evaluate task conditions and record information in a workflow. For example, you can use a workflow variable in a Decision task to determine whether the previous task ran properly. In a workflow, $TaskName.PrevTaskStatus is a predefined workflow variable and $$VariableName is a user-defined workflow variable.
Worklet variables - Evaluate task conditions and record information in a worklet. You can use predefined worklet variables in a parent workflow, but you cannot use workflow variables from the parent workflow in a worklet. In a worklet, $TaskName.PrevTaskStatus is a predefined worklet variable and $$VariableName is a user-defined worklet variable.
Session parameters - Define values that can change from session to session, such as database connections or file names. $PMSessionLogFile and $ParamName are user-defined session parameters.
Mapping parameters - Define values that remain constant throughout a session, such as state sales tax rates. When declared in a mapping or mapplet, $$ParameterName is a user-defined mapping parameter.
Mapping variables - Define values that can change during a session. The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session. When declared in a mapping or mapplet, $$VariableName is a mapping variable.

You cannot define the following types of variables in a parameter file:
$Source and $Target connection variables - Define the database location for a relational source, relational target, lookup table, or stored procedure.
Email variables - Define session information in an email message, such as the number of rows loaded, the session completion time, and read and write statistics.
Local variables - Temporarily store data in variable ports in Aggregator, Expression, and Rank transformations.
Built-in variables - Variables that return run-time or system information, such as the Integration Service name or the system date.
Transaction control variables - Define conditions to commit or roll back transactions during the processing of database rows.
ABAP program variables - Represent SAP structures, fields in SAP structures, or values in the ABAP program.

Parameter File Structure
Warning: The Integration Service uses the period character (.) to qualify folder, workflow, and session names when you run a workflow with a parameter file. If the folder name contains a period (.), the Integration Service cannot qualify the names properly and fails the workflow.
 You can define parameters and variables in any section in the parameter file.
 If you define a service or service process variable in a workflow, worklet, or session section, the variable applies to the service process that runs the task.
 Similarly, if you define a workflow variable in a session section, the value of the workflow variable applies only when the session runs.
 The following table describes the parameter file headings that define each section in the parameter file and the scope of the parameters and variables that you define in each section (table not reproduced in these notes).

 If you specify the same heading more than once in a parameter file, the Integration Service uses the information in the section below the first heading and ignores the information in the sections below subsequent identical headings.
 If you define the same parameter or variable in multiple sections in the parameter file, the parameter or variable with the smallest scope takes precedence over parameters or variables with larger scope. For example, a parameter file contains the following sections:
[HET_TGTS.WF:wf_TGTS_ASC_ORDR]
$DBConnection_ora=Ora2
[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]
$DBConnection_ora=Ora3
In session s_TGTS_ASC_ORDR, the value for session parameter $DBConnection_ora is "Ora3." In all other sessions in the workflow, it is "Ora2."
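The example above uses the workflow-level and session-level heading forms. Because the headings table itself did not survive in these notes, the forms below are a reconstruction from the PowerCenter documentation and should be verified against your version before use:

[Global] - applies to all Integration Services, workflows, worklets, and sessions
[Service:service_name] - applies to the named Integration Service
[Service:service_name.ND:node_name] - applies to one Integration Service process
[folder_name.WF:workflow_name] - applies to the workflow and all tasks in it
[folder_name.WF:workflow_name.WT:worklet_name] - applies to the worklet and all tasks in it
[folder_name.WF:workflow_name.ST:session_name] - applies to that session only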
Using Variables to Specify Session Parameter Files
 When you define a workflow parameter file and a session parameter file for a session within the workflow, the Integration Service uses the workflow parameter file and ignores the session parameter file.
 To use a variable to define the session parameter file name, you must define the session parameter file name and set $PMMergeSessParamFile=TRUE in the workflow parameter file.
 The $PMMergeSessParamFile property causes the Integration Service to read both the session and workflow parameter files.
 For example, you configure a workflow that contains three sessions to run two concurrent instances. Create workflow variables to store the session parameter file names, for example $$s_1ParamFileName, $$s_2ParamFileName, and $$s_3ParamFileName. In the session properties for each session, set the parameter file name to a workflow variable.
 If you use a variable as the session parameter file name and you define the same parameter or variable in both the session and workflow parameter files, the Integration Service sets parameter and variable values according to the following rules:
- When a parameter or variable is defined in the same section of the workflow and session parameter files, the Integration Service uses the value in the workflow parameter file.
- When a parameter or variable is defined in both the session section of the session parameter file and the workflow section of the workflow parameter file, the Integration Service uses the value in the session parameter file.

Using a Parameter File with pmcmd
 The -localparamfile option defines a parameter file on a local machine that you can reference when you do not have access to parameter files on the Integration Service machine.
 The following command starts workflowA using the parameter file myfile.txt:
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' workflowA
 The following command starts taskA using the parameter file myfile.txt:
pmcmd starttask -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' taskA

Guidelines for Creating Parameter Files
List all session parameters - Session parameters do not have default values. If the Integration Service cannot find a value for a session parameter, it may fail the session, take an empty string as the default value, or fail to expand the parameter at run time. Session parameter names are not case sensitive.
List all necessary mapping parameters and variables - Mapping parameter and variable values become start values for parameters and variables in a mapping. Mapping parameter and variable names are not case sensitive.
Enter folder names for non-unique session names - When a session name exists more than once in a repository, enter the folder name to indicate the location of the session.
Precede parameters and variables in mapplets with the mapplet name - Use the following format:
mapplet_name.parameter_name=value
mapplet2_name.variable_name=value
Use multiple parameter files - You assign parameter files to workflows, worklets, and sessions individually. You can specify the same parameter file for all of these tasks or create multiple parameter files.
When defining parameter values, do not use unnecessary line breaks or spaces - The Integration Service interprets additional spaces as part of a parameter name or value.
Use correct date formats for datetime values - Use the following date formats for datetime values:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.MS
- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the "HH12" format for hour.
Do not enclose parameter or variable values in quotes - The Integration Service interprets everything after the first equals sign as part of the value.
Use a parameter or variable value of the proper length for the error log table name prefix - If you use a parameter or variable for the error log table name prefix, do not specify a prefix that exceeds 19 characters when naming Oracle, Sybase, or Teradata error log tables. The error table names can have up to 11 characters, and Oracle, Sybase, and Teradata databases have a maximum length of 30 characters for table names. The parameter or variable name itself can exceed 19 characters.
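Two of the guidelines above in one hypothetical entry: the datetime value uses one of the listed formats and is not enclosed in quotes, since everything after the first equals sign is taken as the value:

[MyFolder.WF:wf_daily_load.ST:s_load_orders]
$$ExtractStartTime=01/01/2025 00:00:00

Writing $$ExtractStartTime='01/01/2025 00:00:00' would make the quotes part of the value, and the session would likely fail to convert it to a datetime.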
Use pmcmd and multiple parameter files for sessions with regular cycles.
 Sometimes you reuse session parameters in a cycle. For example, you might run a session against a sales database every day, but run the same session against sales and marketing databases once a week. You can create separate parameter files for each session run. Instead of changing the parameter file in the session properties each time you run the weekly session, use pmcmd to specify the parameter file to use when you start the session.

Use reject file and session log parameters in conjunction with target file or target database connection parameters.
 When you use a target file or target database connection parameter with a session, you can keep track of reject files by using a reject file parameter. You can also use the session log parameter to write the session log to the target machine.

Use a resource to verify the session runs on a node that has access to the parameter file.
 In the Administrator tool, you can define a file resource for each node that has access to the parameter file and configure the Integration Service to check resources. Then, edit the session that uses the parameter file and assign the resource. When you run the workflow, the Integration Service runs the session with the required resource on a node that has the resource available.

You can override initial values of workflow variables for a session by defining them in a session section.
 If a workflow contains an Assignment task that changes the value of a workflow variable, the next session in the workflow uses the latest value of the variable as the initial value for the session. To override the initial value for the session, define a new value for the variable in the session section of the parameter file.

You can define parameters and variables using other parameters and variables.
 For example, in the parameter file, you can define session parameter $PMSessionLogFile using a service process variable as follows:
$PMSessionLogFile=$PMSessionLogDir/TestRun.txt

Incremental Aggregation
 If the source changes incrementally and you can capture changes, you can configure the session to process those changes.
 This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.
 For example, you might have a session using a source that receives new data every day.
 You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.
 When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source.
 This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2 (a sample filter condition appears below).
 The Integration Service then processes the new data and updates the target accordingly.

 Consider using incremental aggregation in the following circumstances:
You can capture new source data - Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target - Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.

Integration Service Processing for Incremental Aggregation
 The first time you run an incremental aggregation session, the Integration Service processes the entire source.
 At the end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file.
 The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.
 Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the session.
 For each input record, the Integration Service checks historical information in the index file for a corresponding group.
 If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change.
 If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.
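As a sketch of the filter condition mentioned in the March 1/March 2 example above, a Filter transformation could keep only the rows time-stamped with the current run date with something like the following (SALE_DATE is an invented column name; TRUNC and SESSSTARTTIME are standard transformation language elements):

TRUNC(SALE_DATE) = TRUNC(SESSSTARTTIME)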
 When writing to the target, the Integration Service applies the changes to the existing target.
 It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
 If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.
 Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files.
 The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.
 When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.
 The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:
- Save a new version of the mapping.
- Configure the session to reinitialize the aggregate cache.
- Move the aggregate files without correcting the configured path or directory for the files in the session properties.
- Change the configured path or directory for the aggregate files without moving the files to the new location.
- Delete cache files.
- Decrease the number of partitions.
 When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.

Reinitializing the Aggregate Files
 If the source tables change significantly, you might want the Integration Service to create new aggregate data, instead of using historical data.
 For example, you can reinitialize the aggregate cache if the source for a session changes incrementally every day and completely changes once a month.
 When you receive the new source data for the month, you might configure the session to reinitialize the aggregate cache, truncate the existing target, and use the new source table during the session.
 After you run a session that reinitializes the aggregate cache, edit the session properties to disable the Reinitialize Aggregate Cache option.

 Avoid moving or modifying the index and data files that store historical aggregate information.
 If you move the files into a different directory, the Integration Service rebuilds the files the next time you run the session.

High Precision – 28 digits
6. TRANSFORMATIONS

Active Transformations
 An active transformation can perform any of the following actions:
Change the number of rows that pass through the transformation - For example, the Filter transformation is active because it removes rows that do not meet the filter condition. All multi-group transformations are active because they might change the number of rows that pass through the transformation.
Change the transaction boundary - For example, the Transaction Control transformation is active because it defines a commit or roll back transaction based on an expression evaluated for each row.
Change the row type - For example, the Update Strategy transformation is active because it flags rows for insert, delete, update, or reject.

 The Designer does not allow you to connect multiple active transformations or an active and a passive transformation to the same downstream transformation or transformation input group because the Integration Service may not be able to concatenate the rows passed by active transformations.
 The Sequence Generator transformation is an exception to the rule.
 The Designer does allow you to connect a Sequence Generator transformation and an active transformation to the same downstream transformation or transformation input group.
 A Sequence Generator transformation does not receive data. It generates unique numeric values.
Ports
Port name - Use the following conventions while naming ports:
- Begin with a single- or double-byte letter or single- or double-byte underscore (_).
- Port names can contain any of the following single- or double-byte characters: a letter, number, underscore (_), $, #, or @.

 All multi-group transformations are active transformations. You cannot connect multiple active transformations or an active and a passive transformation to the same downstream transformation or transformation input group.
 Some multiple input group transformations require the Integration Service to block data at an input group while the Integration Service waits for a row from a different input group.
 A blocking transformation is a multiple input group transformation that blocks incoming data.
 The following transformations are blocking transformations:
- Custom transformation with the Inputs May Block property enabled
- Joiner transformation configured for unsorted input
 The Designer performs data flow validation when you save or validate a mapping.
 Some mappings that contain active or blocking transformations might not be valid.

Creating a Transformation
You can create transformations using the following Designer tools:
Mapping Designer - Create transformations that connect sources to targets. Transformations in a mapping cannot be used in other mappings unless you configure them to be reusable.
Transformation Developer - Create individual transformations, called reusable transformations, that you use in multiple mappings.
Mapplet Designer - Create and configure a set of transformations, called mapplets, that you use in multiple mappings.

 Using Expression Editor - The maximum number of characters that you can include in an expression is 32,767.
 Adding Expressions to a Port - In the Data Masking transformation, you can add an expression to an input port. For all other transformations, add the expression to an output port.
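As a small illustration of adding an expression to an output port in the Expression Editor, an output port named out_FULL_NAME (an invented example, not from these notes) might use the expression:

LTRIM(RTRIM(FIRST_NAME)) || ' ' || LAST_NAME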
Guidelines for Configuring Variable Ports
 Consider the following factors when you configure variable ports in a transformation:
Port order - The Integration Service evaluates ports by dependency. The order of the ports in a transformation must match the order of evaluation: input ports, variable ports, output ports.
Data type - The datatype you choose reflects the return value of the expression you enter.
Variable initialization - The Integration Service sets initial values in variable ports, where you can create counters.
Since variables can reference other variables, the display order for variable ports is the same as the order in which the Integration Service evaluates each variable.
 The display order for output ports does not matter since output ports cannot reference other output ports. Be sure output ports display at the bottom of the list of ports.

Variable Initialization
 The Integration Service does not set the initial value for variables to NULL. Instead, the Integration Service uses the following guidelines to set initial values for variables:
- Zero for numeric ports
- Empty strings for string ports
- 01/01/1753 for Date/Time ports with PMServer 4.0 date handling compatibility disabled
- 01/01/0001 for Date/Time ports with PMServer 4.0 date handling compatibility enabled
 Therefore, use variables as counters, which need an initial value. For example, you can create a numeric variable with the following expression:
VAR1 + 1
 This expression counts the number of rows in the VAR1 port. If the initial value of the variable were set to NULL, the expression would always evaluate to NULL. This is why the initial value is set to zero. (A fuller counter sketch appears below.)

Using Default Values for Ports
 All transformations use default values that determine how the Integration Service handles input null values and output transformation errors.
 Input, output, and input/output ports are created with a system default value that you can sometimes override with a user-defined default value.
 Default values have different functions in different types of ports:
Input port - The system default value for null input ports is NULL. It displays as a blank in the transformation. If an input value is NULL, the Integration Service leaves it as NULL.
Output port - The system default value for output transformation errors is ERROR. The default value appears in the transformation as ERROR('transformation error'). If a transformation error occurs, the Integration Service skips the row. The Integration Service notes all input rows skipped by the ERROR function in the session log file.

The following errors are considered transformation errors:
- Data conversion errors, such as passing a number to a date function.
- Expression evaluation errors, such as dividing by zero.
- Calls to an ERROR function.

Input/output port - The system default value for null input is the same as input ports, NULL. The system default value appears as a blank in the transformation. The default value for output transformation errors is the same as output ports. The default value for output transformation errors does not display in the transformation.

Note: The Java Transformation converts PowerCenter datatypes to Java datatypes, based on the Java Transformation port type. Default values for null input differ based on the Java datatype.

The following table shows the system default values for ports in connected transformations:

Entering User-Defined Default Values
 You can override the system default values with user-defined default values for supported input, input/output, and output ports within a connected transformation:
Input ports - You can enter user-defined default values for input ports if you do not want the Integration Service to treat null values as NULL.
Output ports - You can enter user-defined default values for output ports if you do not want the Integration Service to skip the row or if you want the Integration Service to write a specific message with the skipped row to the session log.
Input/output ports - You can enter user-defined default values to handle null input values for input/output ports in the same way you can enter user-defined default values for null input values for input ports. You cannot enter user-defined default values for output transformation errors in an input/output port.
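A sketch of the variable port guidelines above, as they might look in an Expression transformation (port names are invented; note the evaluation order of input, then variable, then output ports, and that the numeric variables start at zero):

PRICE                              (input port)
v_count = v_count + 1              (variable port - row counter)
v_total = v_total + PRICE          (variable port - running total)
o_avg   = v_total / v_count        (output port)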
Note: The Integration Service ignores user-defined default values for unconnected transformations. For example, if you call a Lookup or Stored Procedure transformation through an expression, the Integration Service ignores any user-defined default value and uses the system default value only.
 Use the ABORT function to abort a session when the Integration Service encounters null input values.

Entering User-Defined Default Output Values

General Rules for Default Values
Use the following rules and guidelines when you create default values (example default values appear below):
 The default value must be a NULL, a constant value, a constant expression, an ERROR function, or an ABORT function.
 For input/output ports, the Integration Service uses default values to handle null input values. The output default value of input/output ports is always ERROR('Transformation Error').
 Variable ports do not use default values.
 You can assign default values to group by ports in the Aggregator and Rank transformations.
 Not all port types in all transformations allow user-defined default values. If a port does not allow user-defined default values, the default value field is disabled.
 Not all transformations allow user-defined default values.
 If a transformation is not connected to the mapping data flow, the Integration Service ignores user-defined default values.
 If any input port is unconnected, its value is assumed to be NULL and the Integration Service uses the default value for that input port.
 If an input port default value contains the ABORT function and the input value is NULL, the Integration Service immediately stops the session. Use the ABORT function as a default value to restrict null input values. The first null value in an input port stops the session.
 If an output port default value contains the ABORT function and any transformation error occurs for that port, the session immediately stops. Use the ABORT function as a default value to enforce strict rules for transformation errors. The first transformation error for this port stops the session.
 The ABORT function, constant values, and constant expressions override ERROR functions configured in output port expressions.

Reusable Transformations
 The Designer stores each reusable transformation as metadata separate from any mapping that uses the transformation.
 If you review the contents of a folder in the Navigator, you see the list of all reusable transformations in that folder.
 You can create most transformations as non-reusable or reusable.
 However, you can only create the External Procedure transformation as a reusable transformation.
 When you add instances of a reusable transformation to mappings, you must be careful that changes you make to the transformation do not invalidate the mapping or generate unexpected data.

Instances and Inherited Changes
 Note that instances do not inherit changes to property settings, only modifications to ports, expressions, and the name of the transformation.
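For illustration, user-defined default values that satisfy the General Rules above might look like the following (the port roles and message text are invented):

Input port default value:   0                                 - replace null input with a constant
Input port default value:   ABORT('CUST_ID cannot be null')   - stop the session at the first null
Output port default value:  ERROR('Bad amount - row skipped') - skip the row and write a message to the session log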
A. AGGREGATOR
 The Integration Service performs aggregate calculations as it reads and stores data group and row data in an aggregate cache.
 It's unlike the Expression transformation, in that you use the Aggregator transformation to perform calculations on groups.
 The Expression transformation permits you to perform calculations on a row-by-row basis.
 After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.

Configuring Aggregator Transformation Properties
Modify the Aggregator Transformation properties on the Properties tab. Configure the following options:

Configuring Aggregate Caches
 When you run a session that uses an Aggregator transformation, the Integration Service creates the index and the data caches in memory to process the transformation.
 If the Integration Service requires more space, it stores overflow values in cache files.
 You can configure the index and the data caches in the Aggregator transformation or in the session properties. Or, you can configure the Integration Service to determine the cache size at run time.
 Note: The Integration Service uses memory to process an Aggregator transformation with sorted ports. The Integration Service does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.

 The result of an aggregate expression varies based on the group by ports in the transformation.
 For example, when the Integration Service calculates the following aggregate expression with no group by ports defined, it finds the total quantity of items sold:
SUM( QUANTITY )
 However, if you use the same expression, and you group by the ITEM port, the Integration Service returns the total quantity of items sold, by item.

Aggregate Functions
 Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function.
 The transformation language includes the following aggregate functions:
- AVG – COUNT – FIRST – LAST – MAX – MEDIAN – MIN – PERCENTILE – STDDEV – SUM – VARIANCE

Nested Aggregate Functions
 You can include multiple single-level or multiple nested functions in different output ports in an Aggregator transformation.
 However, you cannot include both single-level and nested functions in an Aggregator transformation.
 Therefore, if an Aggregator transformation contains a single-level function in any output port, you cannot use a nested function in any other port in that transformation.
 When you include single-level and nested functions in the same Aggregator transformation, the Designer marks the mapping or mapplet invalid.
 If you need to create both single-level and nested functions, create separate Aggregator transformations.

Group By Ports
 When you group values, the Integration Service produces one row for each group.
 If you do not group values, the Integration Service returns one row for all input rows.
 The Integration Service typically returns the last row of each group (or the last row received) with the result of the aggregation.
 However, if you specify a particular row to be returned (for example, by using the FIRST function), the Integration Service then returns the specified row.
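A sketch of typical Aggregator output port expressions, with ITEM as the group by port (the port names are invented; the syntax follows the SUM( QUANTITY ) example above):

TOTAL_QTY    = SUM( QUANTITY )
TOTAL_AMOUNT = SUM( QUANTITY * PRICE )
MAX_QTY      = MAX( QUANTITY )

Because these are all single-level functions, a nested expression such as MAX( COUNT( ITEM )) would have to go in a separate Aggregator transformation, per the nesting rule above.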
Sorted Input Conditions
 Do not use sorted input if either of the following conditions are true:
- The aggregate expression uses nested aggregate functions.
- The session uses incremental aggregation.

Tips for Aggregator Transformations
Limit connected input/output or output ports.
 Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter the data before aggregating it.
 If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.

B. CUSTOM TRANSFORMATION
 Custom transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality.
 You can create a Custom transformation and bind it to a procedure that you develop using the Custom transformation functions.
 Use the Custom transformation to create transformation applications, such as sorting and aggregation, which require all input rows to be processed before outputting any output rows.
 To support this process, the input and output functions occur separately in Custom transformations compared to External Procedure transformations.
 The Integration Service passes the input data to the procedure using an input function.
 The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service.
 In contrast, in the External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all the ports of the transformation.
 You can also use the Custom transformation to create a transformation that requires multiple input groups, multiple output groups, or both.

Rules and Guidelines for Custom Transformations
- Custom transformations are connected transformations. You cannot reference a Custom transformation in an expression.
- You can include multiple procedures in one module. For example, you can include an XML writer procedure and an XML parser procedure in the same module.
- You can bind one shared library or DLL to multiple Custom transformation instances if you write the procedure code to handle multiple Custom transformation instances.
- When you write the procedure code, you must make sure it does not violate basic mapping rules.
- The Custom transformation sends and receives high precision decimals as high precision decimals.
- Use multi-threaded code in Custom transformation procedures.

Creating Groups and Ports
 You can create multiple input groups and multiple output groups in a Custom transformation.
 You must create at least one input group and one output group.
 When you create a passive Custom transformation, you can only create one input group and one output group.

Working with Port Attributes
 Ports have attributes, such as datatype and precision. When you create a Custom transformation, you can create user-defined port attributes.
 User-defined port attributes apply to all ports in a Custom transformation.
 For example, you create an external procedure to parse XML data.
 You can create a port attribute called "XML path" where you can define the position of an element in the XML hierarchy.

Custom Transformation Properties
 Properties for the Custom transformation apply to both the procedure and the transformation.
 Configure the Custom transformation properties on the Properties tab of the Custom transformation.
 The following table describes the Custom transformation properties:
Setting the Update Strategy
 Use an active Custom transformation to set the update strategy for a mapping at the following levels:
Within the procedure - You can write the external procedure code to set the update strategy for output rows. The external procedure can flag rows for insert, update, delete, or reject.
Within the mapping - Use the Custom transformation in a mapping to flag rows for insert, update, delete, or reject. Select the Update Strategy Transformation property for the Custom transformation.
Within the session - Configure the session to treat the source rows as data driven.

 If you do not configure the Custom transformation to define the update strategy, or you do not configure the session as data driven, the Integration Service does not use the external procedure code to flag the output rows.
 Instead, when the Custom transformation is active, the Integration Service flags the output rows as insert.
 When the Custom transformation is passive, the Integration Service retains the row type.
 For example, when a row flagged for update enters a passive Custom transformation, the Integration Service maintains the row type and outputs the row as update.

Working with Transaction Control
 You can define transaction control for Custom transformations using the following transformation properties:
Transformation Scope - Determines how the Integration Service applies the transformation logic to incoming data.
Generate Transaction - Indicates that the procedure generates transaction rows and outputs them to the output groups.
 The following table describes how the Integration Service handles transaction boundaries at Custom transformations:

Blocking Input Data
 By default, the Integration Service concurrently reads sources in a target load order group.
 However, you can write the external procedure code to block input data on some input groups.
 Blocking is the suspension of the data flow into an input group of a multiple input group transformation.
 To use a Custom transformation to block input data, you must write the procedure code to block and unblock data.
 You must also enable blocking on the Properties tab for the Custom transformation.

Note: When the procedure blocks data and you configure the Custom transformation as a non-blocking transformation, the Integration Service fails the session.

Validating Mappings with Custom Transformations
 When you include a Custom transformation in a mapping, both the Designer and Integration Service validate the mapping.
 The Designer validates the mapping you save or validate and the Integration Service validates the mapping when you run the session.

Validating at Design Time
 When you save or validate a mapping, the Designer performs data flow validation.
 When the Designer does this, it verifies that the data can flow from all sources in a target load order group to the targets without blocking transformations blocking all sources.

Validating at Runtime
 When you run a session, the Integration Service validates the mapping against the procedure code at runtime.
 When the Integration Service does this, it tracks whether or not it allows the Custom transformations to block data.

C. DATA MASKING
 The Data Masking transformation modifies source data based on masking rules that you configure for each column.
 You can maintain data relationships in the masked data and maintain referential integrity between database tables.

 You can apply the following types of masking with the Data Masking transformation:
Key masking - Produces deterministic results for the same source data, masking rules, and seed value.
Random masking - Produces random, non-repeatable results for the same source data and masking rules.
Expression masking - Applies an expression to a port to change the data or create data.
Substitution - Replaces a column of data with similar but unrelated data from a dictionary.
Special mask formats - Applies special mask formats to change SSN, credit card number, phone number, URL, email address, or IP addresses.

Locale
 The locale identifies the language and region of the characters in the data.
 Choose a locale from the list.
 The Data Masking transformation masks character data with characters from the locale that you choose.
 The source data must contain characters that are compatible with the locale that you select.

Seed
 The seed value is a start number that enables the Data Masking transformation to return deterministic data with Key Masking.
 The Data Masking transformation creates a default seed value that is a random number between 1 and 1,000.
 You can enter a different seed value or apply a mapping parameter value.
 Apply the same seed value to a column to return the same masked data values in different source data.

Associated O/P
 The Associated O/P is the associated output port for an input port.
 The Data Masking transformation creates an output port for each input port.
 The naming convention is out_<port name>. The associated output port is a read-only port.
You can configure the following masking rules for key masking string values:
Seed - Apply a seed value to generate deterministic masked data for a column. Select one of the following options:
Value - Accept the default seed value or enter a number between 1 and 1,000.
Mapping Parameter - Use a mapping parameter to define the seed value. The Designer displays a list of the mapping parameters that you create for the mapping. Choose the mapping parameter from the list to use as the seed value.
Mask Format - Define the type of character to substitute for each character in the input data. You can limit each character to an alphabetic, numeric, or alphanumeric character type.
Source String Characters - Define the characters in the source string that you want to mask. For example, mask the number sign (#) character whenever it occurs in the input data. The Data Masking transformation masks all the input characters when Source String Characters is blank. The Data Masking transformation does not always return unique data if the number of source string characters is less than the number of result string characters.
Result String Characters - Substitute the characters in the target string with the characters you define in Result String Characters. For example, enter the following characters to configure each mask to contain all uppercase alphabetic characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Masking Numeric Values
 Configure key masking for numeric source data to generate deterministic output.
 When you configure a column for numeric key masking, the Designer assigns a random seed value to the column.
 When the Data Masking transformation masks the source data, it applies a masking algorithm that requires the seed.
 You can change the seed value for a column to produce repeatable results if the same source value occurs in a different column.
 For example, you want to maintain a primary-foreign key relationship between two tables.
 In each Data Masking transformation, enter the same seed value for the primary-key column as the seed value for the foreign-key column.
 The Data Masking transformation produces deterministic results for the same numeric values.
 The referential integrity is maintained between the tables.

Masking Datetime Values
 The Data Masking transformation can mask dates between 1753 and 2400 with key masking.
 If the source year is in a leap year, the Data Masking transformation returns a year that is also a leap year.
 If the source month contains 31 days, the Data Masking transformation returns a month that has 31 days.
 If the source month is February, the Data Masking transformation returns February.
 The Data Masking transformation always generates valid dates.

Masking with Mapping Parameters
 The Integration Service applies a default seed value in the following circumstances:
- The mapping parameter option is selected for a column but the session has no parameter file.
- You delete the mapping parameter.
- A mapping parameter seed value is not between 1 and 1,000.
 The Integration Service applies masked values from the default value file.
 You can edit the default value file to change the default values.
 The default value file is an XML file in the following location:
<PowerCenter Installation Directory>\infa_shared\SrcFiles\defaultValue.xml
 The name-value pair for the seed is default_seed = "500".
 If the seed value in the default value file is not between 1 and 1,000, the Integration Service assigns a value of 725 to the seed and writes a message in the session log.

Substitution Masking
 Substitution masking replaces a column of data with similar but unrelated data.
 When you configure substitution masking, define the relational or flat file dictionary that contains the substitute values.
 The Data Masking transformation performs a lookup on the dictionary that you configure.
 The Data Masking transformation replaces source data with data from the dictionary. Substitution is an effective way to replace production data with realistic test data.
 You can substitute data with repeatable or non-repeatable values.
 When you choose repeatable values, the Data Masking transformation produces deterministic results for the same source data and seed value.
 You must configure a seed value to substitute data with deterministic results.
 The Integration Service maintains a storage table of source and masked values for repeatable masking.

Dictionaries
 A dictionary is a flat file or relational table that contains the substitute data and a serial number for each row in the file.
 The Integration Service generates a number to retrieve a dictionary row by the serial number.
 The Integration Service generates a hash key for repeatable substitution masking or a random number for non-repeatable masking.
 You can configure an additional lookup condition if you configure repeatable substitution masking.
 You can configure a dictionary to mask more than one port in the Data Masking transformation.
 The following example shows a flat file dictionary that contains first name and gender:
SNO,GENDER,FIRSTNAME
1,M,Adam
2,M,Adeel
3,M,Adil
4,F,Alice
5,F,Alison

 Use the following rules and guidelines when you create a dictionary:
- Each record in the dictionary must have a serial number. The serial number does not have to be the key in a relational table.
- The serial numbers are sequential integers starting at one. The serial numbers cannot have a missing number in the sequence.
- The serial number column can be anywhere in a dictionary row. It can have any label.
- The first row of a flat file dictionary must have column labels to identify the fields in each record. The fields are separated by commas. If the first row does not contain column labels, the Integration Service takes the values of the fields in the first row as column names.
- A flat file dictionary must be in the $PMLookupFileDir lookup file directory. By default, this directory is in the following location: <PowerCenter_Installation_Directory>\server\infa_shared\LkpFiles
- If you create a flat file dictionary on Windows and copy it to a UNIX machine, verify that the file format is correct for UNIX. For example, Windows and UNIX use different characters for the end of line marker.
- If you configure substitution masking for more than one port, all relational dictionaries must be in the same database schema.
- You cannot change the dictionary type or the substitution dictionary name in session properties.

Storage Tables
 The Data Masking transformation maintains storage tables for repeatable substitution between sessions.
 A storage table row contains the source column and a masked value pair.
 Each time the Integration Service masks a value with a repeatable substitute value, it searches the storage table by dictionary name, locale, column name, input value, and seed.
 If it finds a row, it returns the masked value from the storage table to the Data Masking transformation.
 If the Integration Service does not find a row, it retrieves a row from the dictionary with a hash key.

Rules and Guidelines for Substitution Masking
- If a storage table does not exist for a repeatable substitution mask, the session fails.
- If the dictionary contains no rows, the Integration Service returns default masked values.
- When the Integration Service finds an input value with the locale, dictionary, and seed in the storage table, it retrieves the masked value, even if the row is no longer in the dictionary.
- If you delete a connection object or modify the dictionary, truncate the storage table. Otherwise, you might get unexpected results.
- If the number of values in the dictionary is less than the number of unique values in the source data, the Integration Service cannot mask the data with unique repeatable values. The Integration Service returns default masked values.

Random Masking
 Random masking generates random nondeterministic masked data.
 The Data Masking transformation returns different values when the same source value occurs in different rows.
 You can define masking rules that affect the format of data that the Data Masking transformation returns.
 Mask numeric, string, and date values with random masking.

Masking Numeric Values
 When you mask numeric data, you can configure a range of output values for a column.
 The Data Masking transformation returns a value between the minimum and maximum values of the range depending on port precision.
 To define the range, configure the minimum and maximum ranges or a blurring range based on a variance from the original source value.
 You can configure the following masking parameters for numeric data:
Range - Define a range of output values. The Data Masking transformation returns numeric data between the minimum and maximum values.
Blurring Range - Define a range of output values that are within a fixed variance or a percentage variance of the source data. The Data Masking transformation returns numeric data that is close to the value of the source data. You can configure a range and a blurring range.

Masking String Values
 Configure random masking to generate random output for string columns.
 To configure limitations for each character in the output string,
configure a mask format.
 Configure filter characters to define which source characters to
mask and the characters to mask them with.
 You can apply the following masking rules for a string port:
Range - Configure the minimum and maximum string length. The
Data Masking transformation returns a string of random
characters between the minimum and maximum string length.
Mask Format - Define the type of character to substitute for each
character in the input data. You can limit each character to an
alphabetic, numeric, or alphanumeric character type.
Source String Characters - Define the characters in the source
string that you want to mask. For example, mask the number sign
(#) character whenever it occurs in the input data. The Data
Masking transformation masks all the input characters when
Source String Characters is blank.
Result String Replacement Characters - Substitute the characters in the target string with the characters you define in Result String Characters. For example, enter the following characters to configure each mask to contain uppercase alphabetic characters A - Z:
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Masking Date Values
 To mask date values with random masking, either configure a range of output dates or choose a variance.
 When you configure a variance, choose a part of the date to blur.
 Choose the year, month, day, hour, minute, or second.
 The Data Masking transformation returns a date that is within the range you configure.
 You can configure the following masking rules when you mask a datetime value:
Range - Sets the minimum and maximum values to return for the selected datetime value.
Blurring - Masks a date based on a variance that you apply to a unit of the date. The Data Masking transformation returns a date that is within the variance. You can blur the year, month, day, or hour. Choose a low and high variance to apply.

Applying Masking Rules
 Apply masking rules based on the source datatype.
 When you click a column property on the Masking Properties tab, the Designer displays masking rules based on the datatype of the port.
 The following table describes the masking rules that you can configure based on the masking type and the source datatype:

Source String Characters
 Source string characters are source characters that you choose to mask or not mask.
 The position of the characters in the source string does not matter.
 The source characters are case sensitive.
 You can configure any number of characters.
 When Characters is blank, the Data Masking transformation replaces all the source characters in the column.
 Select one of the following options for source string characters:
Mask Only - The Data Masking transformation masks characters in the source that you configure as source string characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces A, B, or c with a different character when the character occurs in source data. A source character that is not an A, B, or c does not change. The mask is case sensitive.
Mask All Except - Masks all characters except the source string characters that occur in the source string. For example, if you enter the filter source character "-" and select Mask All Except, the Data Masking transformation does not replace the "-" character when it occurs in the source data. The rest of the source characters change.

Example
 A source file has a column named Dependents. The Dependents column contains more than one name separated by commas. You need to mask the Dependents column and keep the comma in the test data to delimit the names. For the Dependents column, select Source String Characters. Choose Don't Mask and enter "," as the source character to skip.

Result String Replacement Characters
 Result string replacement characters are characters you choose as substitute characters in the masked data.
 When you configure result string replacement characters, the Data Masking transformation replaces characters in the source string with the result string replacement characters.
 To avoid generating the same output for different input values, configure a wide range of substitute characters, or mask only a few source characters.
 The position of each character in the string does not matter.
 Select one of the following options for result string replacement characters:
Use Only - Mask the source with only the characters you define as result string replacement characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces every character in the source column with an A, B, or c. The word "horse" might be replaced with "BAcBA."
Use All Except - Mask the source with any characters except the characters you define as result string replacement characters. For example, if you enter A, B, and c result string replacement characters, the masked data never has the characters A, B, or c.

Example
To replace all commas in the Dependents column with semicolons, complete the following tasks:
1. Configure the comma as a source string character and select Mask Only. The Data Masking transformation masks only the comma when it occurs in the Dependents column.
2. Configure the semicolon as a result string replacement character and select Use Only. The Data Masking transformation replaces each comma in the Dependents column with a semicolon.

Range
 Define a range for numeric, date, or string data. When you define a range for numeric or date values the Data Masking transformation masks the source data with a value between the minimum and maximum values. When you configure a range for a string, you configure a range of string lengths.

Blurring
 Blurring creates an output value within a fixed or percent variance from the source data value. Configure blurring to return a random value that is close to the original value. You can blur numeric and date values.

Special Mask Formats
 The following types of masks retain the format of the original data:
- Social Security numbers
- Credit card numbers
- Phone numbers
- URL addresses
- Email addresses
- IP addresses
 The Data Masking transformation returns a masked value that has a realistic format, but is not a valid value.
 For example, when you mask an SSN, the Data Masking transformation returns an SSN that is the correct format but is not valid.
 You can configure repeatable masking for Social Security numbers.
 When the source data format or datatype is invalid for a mask, the Integration Service applies a default mask to the data.
 The Integration Service applies masked values from the default value file.

Default Value File
 The default value file is an XML file in the following location:
<PC Directory>\infa_shared\SrcFiles\defaultValue.xml
 The defaultValue.xml file contains the following name-value pairs:
<?xml version="1.0" standalone="yes" ?>
<defaultValue
default_char = "X"
default_digit = "9"
default_date = "11/11/1111 00:00:00"
default_email = "abc@xyz.com"
default_ip = "99.99.9.999"
default_url = "http://www.xyz.com"
default_phone = "999 999 999 9999"
default_ssn = "999-99-9999"
default_cc = "9999 9999 9999 9999"
default_seed = "500"
/>

Rules and Guidelines for Data Masking Transformations
- The Data Masking transformation does not mask null values. If the source data contains null values, the Data Masking transformation returns null values. To replace null values, add an upstream transformation that allows user-defined default values for input ports.
- When the source data format or datatype is invalid for a mask, the Integration Service applies a default mask to the data. The Integration Service applies masked values from a default values file.
- The Data Masking transformation returns an invalid Social Security number with the same format and area code as the source. If the Social Security Administration has issued more than half of the numbers for an area, the Data Masking transformation might not be able to return unique invalid Social Security numbers with key masking.
D. EXTERNAL PROCEDURE
 External Procedure transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality.
 If you are an experienced programmer, you may want to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.
 To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter.
 Using TX, you can create an Informatica External Procedure transformation and bind it to an external procedure that you have developed.
 You can bind External Procedure transformations to two kinds of external procedures:
- COM external procedures (available on Windows only)
- Informatica external procedures (available on Windows, AIX, HP-UX, Linux, and Solaris)

External Procedures and External Procedure Transformations
 There are two components to TX: external procedures and External Procedure transformations.
 An external procedure exists separately from the Integration Service.
 It consists of C, C++, or Visual Basic code written by a user to define a transformation.
 This code is compiled and linked into a DLL or shared library, which is loaded by the Integration Service at runtime.
 An external procedure is "bound" to an External Procedure transformation.
 An External Procedure transformation is created in the Designer. It is an object that resides in the Informatica repository and serves several purposes:
1. It contains the metadata describing the external procedure. It is through this metadata that the Integration Service knows the "signature" (number and types of parameters, type of return value, if any) of the external procedure.
2. It allows an external procedure to be referenced in a mapping. By adding an instance of an External Procedure transformation to a mapping, you call the external procedure bound to that transformation.
3. When you develop Informatica external procedures, the External Procedure transformation provides the information required to generate Informatica external procedure stubs.
 External Procedure transformations return one or no output rows for each input row.
Note: You can create a connected or unconnected External Procedure.

E. FILTER
 A filter condition returns TRUE or FALSE for each row that the Integration Service evaluates, depending on whether a row meets the specified condition.
 For each row that returns TRUE, the Integration Service passes the row through the transformation.
 For each row that returns FALSE, the Integration Service drops the row and writes a message to the session log.
 You cannot concatenate ports from more than one transformation into the Filter transformation.
 The input ports for the filter must come from a single transformation.
 If the filter condition evaluates to NULL, the row is treated as FALSE.
Note: The filter condition is case sensitive.

F. HTTP
 The HTTP transformation enables you to connect to an HTTP server to use its services and applications.
 When you run a session with an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or update data on the HTTP server, depending on how you configure the transformation:
Read data from an HTTP server - When the Integration Service reads data from an HTTP server, it retrieves the data from the HTTP server and passes the data to the target or a downstream transformation in the mapping. For example, you can connect to an HTTP server to read current inventory data, perform calculations on the data during the PowerCenter session, and pass the data to the target.
Update data on the HTTP server - When the Integration Service writes to an HTTP server, it posts data to the HTTP server and passes HTTP server responses to the target or a downstream transformation in the mapping. For example, you can post data providing scheduling information from upstream transformations to the HTTP server during a session.

G. IDENTITY RESOLUTION
 The Identity Resolution transformation is an active transformation that you can use to search and match data in Informatica Identity Resolution (IIR).
 The PowerCenter Integration Service uses the search definition that you specify in the Identity Resolution transformation to search and match data residing in the IIR tables.
 The input and output views in the system determine the input and output ports of the transformation.

Groups and Ports
 An Identity Resolution transformation contains an input group and an output group.
 The input group has ports that represent fields in the input view of the search definition.
 The output group has ports that represent fields in the output view of the search definition in addition to ports that describe the result of the search.

H. JAVA
 Extend PowerCenter functionality with the Java transformation.
 The Java transformation provides a simple native programming interface to define transformation functionality with the Java programming language.
 You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment.
 The PowerCenter Client uses the Java Development Kit (JDK) to compile the Java code and generate byte code for the transformation.
 The PowerCenter Client stores the byte code in the PowerCenter repository.
 The Integration Service uses the Java Runtime Environment (JRE) to execute generated byte code at run time.
 When the Integration Service runs a session with a Java transformation, the Integration Service uses the JRE to execute the byte code and process input rows and generate output rows.
 Create Java transformations by writing Java code snippets that define transformation logic.
 Define transformation behavior for a Java transformation based on the following events:
- The transformation receives an input row.
- The transformation has processed all input rows.
- The transformation receives a transaction notification such as commit or rollback.

Active and Passive Java Transformations
 When you create a Java transformation, you define its type as active or passive.
 After you set the transformation type, you cannot change it.
 A Java transformation runs the Java code that you define on the On Input Row tab one time for each row of input data.
 A Java transformation handles output rows based on the transformation type as follows:
- A passive Java transformation generates one output row for each input row in the transformation after processing each input row.
- An active Java transformation generates multiple output rows for each input row in the transformation.
 Use the generateRow method to generate each output row. For example, if the transformation contains two input ports that represent a start date and an end date, you can use the generateRow method to generate an output row for each date between the start date and the end date.

Datatype Conversion
 When a Java transformation reads input rows, it converts input port datatypes to Java datatypes.
 When a Java transformation writes output rows, it converts Java datatypes to output port datatypes.
 For example, the following processing occurs for an input port with the integer datatype in a Java transformation:
1. The Java transformation converts the integer datatype of the input port to the Java primitive int datatype.
2. In the transformation, the transformation treats the value of the input port as the Java primitive int datatype.
3. When the transformation generates the output row, it converts the Java primitive int datatype to the integer datatype.

 A Java transformation can have input ports, output ports, and input/output ports. You create and edit groups and ports on the Ports tab.
 A Java transformation always has one input group and one output group.
 The transformation is not valid if it has multiple input or output groups.

Compiling a Java Transformation
 The PowerCenter Client uses the Java compiler to compile the Java code and generate the byte code for the transformation.
 The Java compiler compiles the Java code and displays the results of the compilation in the Output window on the code entry tabs.
 The Java compiler installs with the PowerCenter Client in the java/bin directory.
 To compile the full code for the Java transformation, click Compile on the Java Code tab.
 When you create a Java transformation, it contains a Java class that defines the base functionality for a Java transformation.
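As a sketch of the kind of snippet that goes on the On Input Row tab, here is an active Java transformation that splits a delimited string into one output row per value (the DEPENDENTS and DEPENDENT port names and the splitting logic are invented; generateRow is the API method described above, and isNull is assumed to be the standard port-null check):

// DEPENDENTS: string input port, DEPENDENT: string output port
if (!isNull("DEPENDENTS"))
{
    String[] names = DEPENDENTS.split(",");
    for (int i = 0; i < names.length; i++)
    {
        DEPENDENT = names[i].trim();
        generateRow();   // emit one output row per value
    }
}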
 The full code for the Java class contains the template class code for the transformation, plus the Java code you define on the code entry tabs.
 When you compile a Java transformation, the PowerCenter Client adds the code from the code entry tabs to the template class for the transformation to generate the full class code for the transformation.
 The PowerCenter Client then calls the Java compiler to compile the full class code.
 The Java compiler compiles the transformation and generates the byte code for the transformation.
Note: The Java transformation is also compiled when you click OK in the transformation.

Java Expressions
 You can invoke PowerCenter expressions in a Java transformation with the Java programming language.
 Use expressions to extend the functionality of a Java transformation.
 For example, you can invoke an expression in a Java transformation to look up the values of input or output ports or look up the values of Java transformation variables.
 To invoke expressions in a Java transformation, you generate the Java code or use Java transformation API methods to invoke the expression.
 You invoke the expression and use the result of the expression on the appropriate code entry tab.
 You can generate the Java code that invokes an expression or use API methods to write the Java code that invokes the expression.

I. JOINER
 The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
 The Joiner transformation accepts input from most transformations. However, consider the following limitations on the pipelines you connect to the Joiner transformation:
- You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
- You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
 Joiner Data Cache Size - Default cache size is 2,000,000 bytes.
 Joiner Index Cache Size - Default cache size is 1,000,000 bytes.
 To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source.
 To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.
 By default, when you add ports to a Joiner transformation, the ports from the first source pipeline display as detail sources.
 Adding the ports from the second source pipeline sets them as master sources.
 If you use multiple ports in the join condition, the Integration Service compares the ports in the order you specify.
 If you join Char and Varchar datatypes, the Integration Service counts any spaces that pad Char values as part of the string:
Char(40) = "abcd"
Varchar(40) = "abcd"
 The Char value is "abcd" padded with 36 blank spaces, and the Integration Service does not join the two fields because the Char field contains trailing spaces.
 Note: The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows. To join rows with null values, replace null input with default values, and then join on the default values.

Using Sorted Input
 When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output.
 When you configure the sort order in a session, you can select a sort order associated with the Integration Service code page.
 When you run the Integration Service in Unicode mode, it uses the selected session sort order to sort character data.
 When you run the Integration Service in ASCII mode, it sorts all character data using a binary sort order.
 To ensure that data is sorted as the Integration Service requires, the database sort order must be the same as the user-defined session sort order.
 When you join sorted data from partitioned pipelines, you must configure the partitions to maintain the order of sorted data.
 If you pass unsorted or incorrectly sorted data to a Joiner transformation configured to use sorted data, the session fails and the Integration Service logs the error in the session log file.

Adding Transformations to the Mapping
 When you add transformations between the sort origin and the Joiner transformation, use the following guidelines to maintain sorted data:
- Do not place any of the following transformations between the sort origin and the Joiner transformation (CARNUXX):
- Custom
- Unsorted Aggregator
- Normalizer
- Rank
- Union
- XML Parser
- XML Generator
- Mapplet, if it contains one of the above
- You can place a sorted Aggregator transformation between the sort origin and the Joiner transformation if you use the following guidelines:
- Configure the Aggregator transformation for sorted input.
- Use the same ports for the group by columns in the Aggregator transformation as the ports at the sort origin.
- The group by ports must be in the same order as the ports at the sort origin.
- When you join the result set of a Joiner transformation with another pipeline, verify that the data output from the first Joiner transformation is sorted.
Tip: You can place the Joiner transformation directly after the sort origin to maintain sorted data.

Example of a Join Condition
 For example, you configure Sorter transformations in the master and detail pipelines with the following sorted ports:
1. ITEM_NO
2. ITEM_NAME
3. PRICE
 When you configure the join condition, use the following guidelines to maintain sort order:
- You must use ITEM_NO in the first join condition.
- If you add a second join condition, you must use ITEM_NAME.
- If you want to use PRICE in a join condition, you must also use ITEM_NAME in the second join condition.
 If you skip ITEM_NAME and join on ITEM_NO and PRICE, you lose the sort order and the Integration Service fails the session.

Joining Two Branches of the Same Pipeline
 When you join data from the same source, you can create two branches of the pipeline.
 When you branch a pipeline, you must add a transformation between the source qualifier and the Joiner transformation in at least one branch of the pipeline.
 You must join sorted data and configure the Joiner transformation for sorted input.
 The following figure shows a mapping that joins two branches of the same pipeline:
 Joining two branches might impact performance if the Joiner transformation receives data from one branch much later than the other branch.
 The Joiner transformation caches all the data from the first branch, and writes the cache to disk if the cache fills.
 The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.

Joining Two Instances of the Same Source
 You can also join same source data by creating a second instance of the source.
 After you create the second source instance, you can join the pipelines from the two source instances.
 If you want to join unsorted data, you must create two instances of the same source and join the pipelines.
 The following figure shows two instances of the same source joined with a Joiner transformation:
Note: When you join data using this method, the Integration Service reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.

Guidelines for Joining Data from a Single Source
 Use the following guidelines when deciding whether to join branches of a pipeline or join two instances of a source:
- Join two branches of a pipeline when you have a large source or if you can read the source data only once. For example, you can only read source data from a message queue once.
- Join two branches of a pipeline when you use sorted data. If the source data is unsorted and you use a Sorter transformation to sort the data, branch the pipeline after you sort the data.
- Join two instances of a source when you need to add a blocking transformation to the pipeline between the source and the Joiner transformation.
- Join two instances of a source if one pipeline may process slower than the other pipeline.
- Join two instances of a source if you need to join unsorted data.

Blocking the Source Pipelines
 When you run a session with a Joiner transformation, the Integration Service blocks and unblocks the source data, based on the mapping configuration and whether you configure the Joiner transformation for sorted input.
 You configure how the Integration Service applies transformation
Unsorted Joiner Transformation logic and handles transaction boundaries using the
 When the Integration Service processes an unsorted Joiner transformation scope property.
transformation, it reads all master rows before it reads the  You configure transformation scope values based on the mapping
detail rows. configuration and whether you want to preserve or drop
 To ensure it reads all master rows before the detail rows, the transaction boundaries.
Integration Service blocks the detail source while it caches rows
from the master source.  You can preserve transaction boundaries when you join the
 Once the Integration Service reads and caches all master rows, it following sources:
unblocks the detail source and reads the detail rows. You join two branches of the same source pipeline - Use the
 Some mappings with unsorted Joiner transformations violate Transaction transformation scope to preserve transaction
data flow validation. boundaries.
You join two sources, and you want to preserve transaction
Sorted Joiner Transformation boundaries for the detail source - Use the Row transformation
 When the Integration Service processes a sorted Joiner scope to preserve transaction boundaries in the detail pipeline.
transformation, it blocks data based on the mapping
configuration.  You can drop transaction boundaries when you join the following
 Blocking logic is possible if master and detail input to the Joiner sources:
transformation originate from different sources. You join two sources or two branches and you want to drop
 The Integration Service uses blocking logic to process the Joiner transaction boundaries - Use the All Input transformation scope
transformation if it can do so without blocking all sources in a to apply the transformation logic to all incoming data and drop
target load order group simultaneously. transaction boundaries for both pipelines.
 Otherwise, it does not use blocking logic.
 Instead, it stores more rows in the cache.  The following table summarizes how to preserve transaction
 When the Integration Service can use blocking logic to process boundaries using transformation scopes with the Joiner
the Joiner transformation, it stores fewer rows in the cache, transformation:
increasing performance.

Caching Master Rows


 When the Integration Service processes a Joiner transformation,
it reads rows from both sources concurrently and builds the
index and data cache based on the master rows.
 The Integration Service then performs the join based on the
detail source data and the cache data.
 The number of rows the Integration Service stores in the cache
depends on the partitioning scheme, the source data, and
whether you configure the Joiner transformation for sorted Preserving Transaction Boundaries for a Single Pipeline
input.  When you join data from the same source, use the Transaction
 To improve performance for an unsorted Joiner transformation, transformation scope to preserve incoming transaction
use the source with fewer rows as the master source. boundaries for a single pipeline.
 To improve performance for a sorted Joiner transformation, use  Use the Transaction transformation scope when the Joiner
the source with fewer duplicate key values as the master. transformation joins data from the same source, either two
branches of the same pipeline or two output groups of one
Working with Transactions transaction generator.
 When the Integration Service processes a Joiner transformation,  Use this transformation scope with sorted data and any join type.
it can apply transformation logic to all data in a transaction, all  When you use the Transaction transformation scope, verify that
incoming data, or one row of data at a time. master and detail pipelines originate from the same transaction
 The Integration Service can drop or preserve transaction control point and that you use sorted input.
boundaries depending on the mapping configuration and the  For example, in “Preserving Transaction Boundaries for a Single
transformation scope. Pipeline” on page 223 the Sorter transformation is the
transaction control point.

 You cannot place another transaction control point between the Sorter transformation and the Joiner transformation.
 In the mapping, the master and detail pipeline branches originate from the same transaction control point, and the Integration Service joins the pipeline branches with the Joiner transformation, preserving transaction boundaries.
 The following figure shows a mapping that joins two branches of a pipeline and preserves transaction boundaries:

Preserving Transaction Boundaries in the Detail Pipeline
 When you want to preserve the transaction boundaries in the detail pipeline, choose the Row transformation scope.
 The Row transformation scope allows the Integration Service to process data one row at a time.
 The Integration Service caches the master data and matches the detail data with the cached master data.
 When the source data originates from a real-time source, such as IBM MQ Series, the Integration Service matches the cached master data with each message as it is read from the detail source.
 Use the Row transformation scope with Normal and Master Outer join types that use unsorted data.

Dropping Transaction Boundaries for Two Pipelines
 When you want to join data from two sources or two branches and you do not need to preserve transaction boundaries, use the All Input transformation scope.
 When you use All Input, the Integration Service drops incoming transaction boundaries for both pipelines and outputs all rows from the transformation as an open transaction.
 At the Joiner transformation, the data from the master pipeline can be cached or joined concurrently, depending on how you configure the sort order.
 Use this transformation scope with sorted and unsorted data and any join type.

J. LOOKUP
 Use a Lookup transformation in a mapping to look up data in a flat file, relational table, view, or synonym.
 You can import a lookup definition from any flat file or relational database to which both the PowerCenter Client and Integration Service can connect.
 You can also create a lookup definition from a source qualifier.
Configure the Lookup transformation to perform the following types of lookups:
Relational or flat file lookup -
 Perform a lookup on a flat file or a relational table.
 When you create a Lookup transformation using a relational table as the lookup source, you can connect to the lookup source using ODBC and import the table definition as the structure for the Lookup transformation.
 Use the following options with relational lookups:
- Override the default SQL statement to add a WHERE clause or to query multiple tables.
- Sort null data high or low, based on database support.
- Perform case-sensitive comparisons based on the database support.
 When you create a Lookup transformation using a flat file as a lookup source, the Designer invokes the Flat File Wizard.
 Use the following options with flat file lookups:
- Use indirect files as lookup sources by configuring a file list as the lookup file name.
- Use sorted input for the lookup.
- Sort null data high or low.
- Use case-sensitive string comparison with flat file lookups.

Pipeline Lookups
 Create a pipeline Lookup transformation to perform a lookup on an application source that is not a relational table or flat file.
 A pipeline Lookup transformation has a source qualifier as the lookup source. The source qualifier can represent any type of source definition, including JMS and MSMQ.
 The source definition cannot have more than one group.
 When you configure a pipeline Lookup transformation, the lookup source and source qualifier are in a different pipeline from the Lookup transformation.
 The source and source qualifier are in a partial pipeline that contains no target.
 The Integration Service reads the source data in this pipeline and passes the data to the Lookup transformation to create the cache.
 You can create multiple partitions in the partial pipeline to improve performance.
 To improve performance when processing relational or flat file lookup sources, create a pipeline Lookup transformation instead of a relational or flat file Lookup transformation.
 You can create partitions to process the lookup source and pass it to the Lookup transformation.
 Create a connected or unconnected pipeline Lookup transformation.
Note: Do not enable HA recovery for sessions that have real-time sources for pipeline lookups. You might get unexpected results.
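As a sketch of the "add a WHERE clause" option above for a relational lookup, a lookup SQL override might look like the following (LKP_CUSTOMERS and its columns are hypothetical names, not taken from this document):

SELECT CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_STATUS
FROM LKP_CUSTOMERS
-- keep every lookup/output port from the generated SELECT; only the WHERE clause is added
WHERE CUSTOMER_STATUS = 'ACTIVE'

With a dynamic cache, place a Filter transformation before the Lookup transformation so that only rows matching the WHERE clause reach the cache (see the lookup query guidelines later in these notes).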
Configuring a Pipeline Lookup Transformation in a Mapping
 A mapping that contains a pipeline Lookup transformation
includes a partial pipeline that contains the lookup source and
source qualifier.
 The partial pipeline does not include a target. The Integration
Service retrieves the lookup source data in this pipeline and
passes the data to the lookup cache.
 The partial pipeline is in a separate target load order group in
session properties.
 You can create multiple partitions in the pipeline to improve
performance.
 You cannot configure the target load order with the partial
pipeline.
 The following mapping shows a mapping that contains a pipeline
Lookup transformation and the partial pipeline that processes
the lookup source:

Connected Lookup Transformation


 The following steps describe how the Integration Service
processes a connected Lookup transformation:
The mapping contains the following objects: 1. A connected Lookup transformation receives input values
- The lookup source definition and source qualifier are in a directly from another transformation in the pipeline.
separate pipeline. The Integration Service creates a lookup cache 2. For each input row, the Integration Service queries the lookup
after it processes the lookup source data in the pipeline. source or cache based on the lookup ports and the condition in the
- A flat file source contains new department names by employee transformation.
number. 3. If the transformation is uncached or uses a static cache, the
- The pipeline Lookup transformation receives Employee_Number Integration Service returns values from the lookup query.
and New_Dept from the source file. The pipeline Lookup  If the transformation uses a dynamic cache, the Integration
performs a lookup on Employee_ID in the lookup cache. It Service inserts the row into the cache when it does not find the
retrieves the employee first and last name from the lookup row in the cache. When the Integration Service finds the row in
cache. the cache, it updates the row in the cache or leaves it
- A flat file target receives the Employee_ID, First_Name, unchanged. It flags the row as insert, update, or no change.
Last_Name, and New_Dept from the Lookup transformation. 4. The Integration Service passes return values from the query
to the next transformation.
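For a relational Lookup transformation with ports like the ones described above, the statement the Integration Service issues against the lookup source typically has this shape. This is only a sketch; EMPLOYEES and the column names are illustrative, and the real statement is generated from the lookup ports:

SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME
FROM EMPLOYEES
-- the generated statement also carries an ORDER BY over the lookup ports,
-- which is not shown when you generate SQL from the Lookup SQL Override property
ORDER BY EMPLOYEE_ID, FIRST_NAME, LAST_NAME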

 If the transformation uses a dynamic cache, you can pass rows to


a Filter or Router transformation to filter new rows to the target.
Connected or unconnected lookup - Unconnected Lookup Transformation
 A connected Lookup transformation receives source data,  An unconnected Lookup transformation receives input values
performs a lookup, and returns data to the pipeline. from the result of a :LKP expression in another transformation.
 An unconnected Lookup transformation is not connected to a  You can call the Lookup transformation more than once in a
source or target. mapping.
 A transformation in the pipeline calls the Lookup transformation  A common use for unconnected Lookup transformations is to
with a :LKP expression. update slowly changing dimension tables.
 The unconnected Lookup transformation returns one column to  For more information about slowly changing dimension tables,
the calling transformation. visit the Informatica Knowledge Base at
http://mysupport.informatica.com.

 The following steps describe the way the Integration Service
processes an unconnected Lookup transformation:
1. An unconnected Lookup transformation receives input values
from the result of a :LKP expression in another transformation, such
as an Update Strategy transformation.
2. The Integration Service queries the lookup source or cache based
on the lookup ports and condition in the transformation.
3. The Integration Service returns one value into the return port of
the Lookup transformation.
4. The Lookup transformation passes the return value into the :LKP
expression.

 The lookup table can be a single table, or you can join multiple
tables in the same database using a lookup SQL override.
 The Integration Service queries the lookup table or an in-memory
cache of the table for all incoming rows into the Lookup
transformation
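A lookup SQL override that joins two tables, as described above, might look like the following sketch (ORDERS, CUSTOMERS, and the column names are hypothetical; the override still has to return every lookup/output port):

SELECT O.ORDER_ID AS ORDER_ID,
       O.ORDER_DATE AS ORDER_DATE,
       C.CUSTOMER_NAME AS CUSTOMER_NAME
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTOMER_ID = C.CUSTOMER_ID
ORDER BY O.ORDER_ID, O.ORDER_DATE, C.CUSTOMER_NAME --

The trailing two dashes are the comment notation that suppresses the ORDER BY clause the Integration Service generates; if you supply your own ORDER BY, it must list the lookup condition ports in the same order they appear in the lookup condition (see Overriding the Lookup Query below).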

Return port - Use only in unconnected Lookup transformations.


Designates the column of data you want to return based on the
lookup condition. You can designate one lookup port as the return
port

Use the following guidelines to configure lookup ports:


- If you delete lookup ports from a flat file lookup, the session fails.
- You can delete lookup ports from a relational lookup if the
mapping does not use the lookup port. This reduces the amount of
memory the Integration Service needs to run the session.

Lookup Properties
The following table describes the Lookup transformation properties:

Cached or un-cached lookup -
 Cache the lookup source to improve performance.
 If you cache the lookup source, you can use a dynamic or static cache.
 By default, the lookup cache remains static and does not change during the session.
 With a dynamic cache, the Integration Service inserts or updates rows in the cache.
 When you cache the target table as the lookup source, you can look up values in the cache to determine if the values exist in the target.
 The Lookup transformation marks rows to insert or update the target.

Lookup Query
 The Integration Service queries the lookup based on the ports and properties you configure in the Lookup transformation.
 The Integration Service runs a default SQL statement when the first row enters the Lookup transformation.
 If you use a relational lookup or a pipeline lookup against a relational table, you can customize the default query with the Lookup SQL Override property.
 If you configure both the Lookup SQL Override and the Lookup Source Filter properties, the Integration Service ignores the Lookup Source Filter property.

Default Lookup Query
 The default lookup query contains the following statements:
SELECT - The SELECT statement includes all the lookup ports in the mapping. You can view the SELECT statement by generating SQL using the Lookup SQL Override property. Do not add or delete any columns from the default SQL statement.
ORDER BY - The ORDER BY clause orders the columns in the same order they appear in the Lookup transformation. The Integration Service generates the ORDER BY clause. You cannot view this when you generate the default SQL using the Lookup SQL Override property.

Overriding the Lookup Query
 Override the lookup query in the following circumstances:
Override the ORDER BY clause - Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation.
Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
A lookup table name or column names contains a reserved word - If the table name or any column name in the lookup query contains a reserved word, you must ensure that all reserved words are enclosed in quotes.
Use parameters and variables - Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or you can use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file. The Designer cannot expand parameters and variables in the query override and does not validate it when you use a parameter or variable. The Integration Service expands the parameters and variables when you run the session.
A lookup column name contains a slash (/) character - When generating the default lookup query, the Designer and Integration Service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double quotes.
Add a WHERE clause - Use a lookup SQL override to add a WHERE clause to the default SQL statement. You might want to use the WHERE clause to reduce the number of rows included in the cache. When you add a WHERE clause to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
Note: The session fails if you include large object ports in a WHERE clause.
Other - Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.

Guidelines for Overriding the Lookup Query
 Use the following guidelines when you override the lookup SQL query:
- You can override the lookup SQL query for relational lookups.
- Generate the default query, and then configure the override. This ensures that all the lookup/output ports are included in the query. If you add or subtract ports from the SELECT statement, the session fails.
- Add a source lookup filter to filter the rows that are added to the lookup cache. This ensures the Integration Service inserts rows in the dynamic cache and target table that match the WHERE clause.
- To share the cache, use the same lookup SQL override for each Lookup transformation.
- If you override the ORDER BY clause, the session fails if the ORDER BY clause does not contain the condition ports in the
same order they appear in the Lookup condition or if you do not  When configuring a lookup cache, you can configure the
suppress the generated ORDER BY clause with the comment following options:
notation. - Persistent cache
- If you use pushdown optimization, you cannot override the - Re-cache from lookup source
ORDER BY clause or suppress the generated ORDER BY clause - Static cache
with comment notation. - Dynamic cache
- If the table name or any column name in the lookup query - Shared cache
contains a reserved word, you must enclose all reserved words in - Pre-build lookup cache
quotes. Note: You can use a dynamic cache for relational or flat file
- You must choose the Use Any Value Lookup Policy on Multiple lookups
Match condition to override the lookup query for an uncached
lookup. Rules and Guidelines for Returning Multiple Rows
- The Integration Service caches all rows from the lookup source
Handling Multiple Matches for cached lookups.
- Use the first matching value, or use the last matching value - You can configure an SQL override for a cached or uncached
- Use any matching value lookup that returns multiple rows.
- Use all values - You cannot enable dynamic cache for a Lookup transformation
- Return an error. When the Lookup transformation uses a static that returns multiple rows.
cache or no cache, the Integration Service marks the row as an - You cannot return multiple rows from an unconnected Lookup
error. The Lookup transformation writes the row to the session transformation.
log by default, and increases the error count by one. When the - You can configure multiple Lookup transformations to share a
Lookup transformation has a dynamic cache, the Integration named cache if the Lookup transformations have matching
Service fails the session when it encounters multiple matches. caching lookup on multiple match policies.
The session fails while the Integration Service is caching the - A Lookup transformation that returns multiple rows cannot
lookup table or looking up the duplicate key values. Also, if you share a cache with a Lookup transformation that returns one
configure the Lookup transformation to output old values on matching row for each input row.
updates, the Lookup transformation returns an error when it
encounters multiple matches. The transformation creates an Configuring Unconnected Lookup Transformations
index based on the key ports instead of all Lookup transformation  An unconnected Lookup transformation is a Lookup
ports. transformation that is not connected to a source or target.
 Call the lookup from another transformation with a :LKP
Lookup Caches expression.
 You can configure a Lookup transformation to cache the lookup  You can perform the following tasks when you call a lookup from
file or table. an expression:
 The Integration Service builds a cache in memory when it - Test the results of a lookup in an expression.
processes the first row of data in a cached Lookup - Filter rows based on the lookup results.
transformation. - Mark rows for update based on the result of a lookup and
 It allocates memory for the cache based on the amount you update slowly changing dimension tables.
configure in the transformation or session properties. - Call the same lookup multiple times in one mapping.
 The Integration Service stores condition values in the index cache
and output values in the data cache. Database Deadlock Resilience
 The Integration Service queries the cache for each row that  The Lookup transformation is resilient to a database deadlock for
enters the transformation. un-cached lookups.
 The Integration Service also creates cache files by default in the  When a database deadlock error occurs, the session does not fail.
$PMCacheDir.  The Integration Service attempts to re-execute the last statement
 If the data does not fit in the memory cache, the Integration for a specified retry period.
Service stores the overflow values in the cache files.  You can configure the number of deadlock retries and the
 When the session completes, the Integration Service releases deadlock sleep interval for an Integration Service.
cache memory and deletes the cache files unless you configure  These values also affect database deadlocks for the relational
the Lookup transformation to use a persistent cache. writer. You can override these values at the session level as
custom properties.

 Configure the following Integration Service properties:
NumOfDeadlockRetries - The number of times the PowerCenter Configure a pipeline Lookup transformation to improve
Integration Service retries a target write on a database deadlock. performance when processing a relational or flat file lookup
Minimum is 0. Default is 10. If you want the session to fail on source:
deadlock set NumOfDeadlockRetries to zero.  You can create partitions to process a relational or flat file lookup
DeadlockSleep - Number of seconds before the PowerCenter source when you define the lookup source as a source qualifier.
Integration Service retries a target write on database deadlock. Configure a non-reusable pipeline Lookup transformation and
If a deadlock occurs, the Integration Service attempts to run the create partitions in the partial pipeline that processes the
statement. The Integration Service waits for a delay period lookup source.
between each retry attempt. If all attempts fail due to deadlock,
the session fails. The Integration Service logs a message in the Lookup Caches
session log whenever it retries a statement.  The Integration Service builds a cache in memory when it
processes the first row of data in a cached Lookup
Tips for Lookup Transformations transformation.
Add an index to the columns used in a lookup condition:  It allocates memory for the cache based on the amount you
 If you have privileges to modify the database containing a lookup configure in the transformation or session properties.
table, you can improve performance for both cached and un-  The Integration Service stores condition values in the index cache
cached lookups. and output values in the data cache.
 This is important for very large lookup tables.  The Integration Service queries the cache for each row that
 Since the Integration Service needs to query, sort, and compare enters the transformation
values in these columns, the index needs to include every  If the data does not fit in the memory cache, the Integration
column used in a lookup condition. Service stores the overflow values in the cache files.
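For example, if a lookup condition compares ITEM_ID and then PRICE, an index covering both condition columns might be created like this (table and column names are illustrative only):

CREATE INDEX IDX_LKP_ITEMS
    ON ITEMS (ITEM_ID, PRICE);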
 When the session completes, the Integration Service releases
Place conditions with an equality operator (=) first: cache memory and deletes the cache files unless you configure
 If you include more than one lookup condition, place the the Lookup transformation to use a persistent cache.
conditions in the following order to optimize lookup  If you use a flat file or pipeline lookup, the Integration Service
performance: always caches the lookup source.
- Equal to (=)  If you configure a flat file lookup for sorted input, the Integration
- Less than (<), greater than (>), less than or equal to (<=), greater Service cannot cache the lookup if the condition columns are
than or equal to (>=) not grouped.
- Not equal to (!=)  If the columns are grouped, but not sorted, the Integration
Service processes the lookup as if you did not configure sorted
Cache small lookup tables: input
 Improve session performance by caching small lookup tables.
The result of the lookup query and processing is the same, When you configure a lookup cache, you can configure the
whether or not you cache the lookup table. following cache settings:

Join tables in the database: Building caches -


 If the lookup table is on the same database as the source table in  You can configure the session to build caches sequentially or
the mapping and caching is not feasible, join the tables in the concurrently.
source database rather than using a Lookup transformation.  When you build sequential caches, the Integration Service creates
caches as the source rows enter the Lookup transformation.
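A sketch of the "join tables in the database" tip above, written as a Source Qualifier user-defined join or custom query instead of a Lookup transformation (ORDERS, CUSTOMERS, and the columns are hypothetical):

SELECT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME
FROM ORDERS, CUSTOMERS
WHERE ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID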
 When you configure the session to build concurrent caches, the
Integration Service does not wait for the first row to enter the
Lookup transformation before it creates caches.
Use a persistent lookup cache for static lookups:  Instead, it builds multiple caches concurrently.
 If the lookup source does not change between sessions,
configure the Lookup transformation to use a persistent lookup Persistent cache -
cache.  You can save the lookup cache files and reuse them the next time
 The Integration Service then saves and reuses cache files from the Integration Service processes a Lookup transformation
session to session, eliminating the time required to read the configured to use the cache.
lookup source.
 If the lookup table does not change between sessions, you can
configure the Lookup transformation to use a persistent lookup
cache

Re-cache from source -


 If the persistent cache is not synchronized with the lookup table,
you can configure the Lookup transformation to rebuild the
lookup cache.

Static cache -
 You can configure a static, or read-only, cache for any lookup
source.
 By default, the Integration Service creates a static cache.
 It caches the lookup file or table and looks up values in the cache
for each row that comes into the transformation.
 When the lookup condition is true, the Integration Service
returns a value from the lookup cache.
 The Integration Service does not update the cache while it
 The Integration Service can build lookup caches for connected
processes the Lookup transformation.
Lookup transformations in the following ways:
- Sequential caches
Dynamic cache -
- Concurrent caches
 To cache a table, flat file, or source definition and update the
 The Integration Service builds caches for unconnected Lookup
cache, configure a Lookup transformation with dynamic cache.
transformations sequentially regardless of how you configure
 The Integration Service dynamically inserts or updates data in the
cache building.
lookup cache and passes the data to the target.
 The dynamic cache is synchronized with the target.
Sequential Caches
 By default, the Integration Service builds a cache in memory
Shared cache -
when it processes the first row of data in a cached Lookup
 You can share the lookup cache between multiple
transformation.
transformations.
 The Integration Service creates each lookup cache in the pipeline
 You can share an unnamed cache between transformations in the
sequentially.
same mapping.
 The Integration Service waits for any upstream active
 You can share a named cache between transformations in the
transformation to complete processing before it starts
same or different mappings.
processing the rows in the Lookup transformation.
 Lookup transformations can share unnamed static caches within
 The Integration Service does not build caches for a downstream
the same target load order group if the cache sharing rules
Lookup transformation until an upstream Lookup transformation
match. Lookup transformations cannot share dynamic cache
completes building a cache.
within the same target load order group.
 For example, the following mapping contains an unsorted
Aggregator transformation followed by two Lookup
 When you do not configure the Lookup transformation for
transformations:
caching, the Integration Service queries the lookup table for
each input row.
 The result of the Lookup query and processing is the same,
whether or not you cache the lookup table.
 However, using a lookup cache can increase session performance.

 Configuring sequential caching may allow you to avoid building


lookup caches unnecessarily.
 For example, a Router transformation might route data to one
pipeline if a condition resolves to true, and it might route data
to another pipeline if the condition resolves to false.

 In this case, a Lookup transformation might not receive data at  Configure the Teradata table as a relational target in the mapping
all. and pass the lookup cache changes back to the Teradata table.

Concurrent Caches Note - When you create multiple partitions in a pipeline that use
 You can configure the Integration Service to create lookup caches a dynamic lookup cache, the Integration Service creates one
concurrently. memory cache and one disk cache for each transformation.
 You may be able to improve session performance using However, if you add a partition point at the Lookup
concurrent caches. transformation, the Integration Service creates one memory
 Performance may especially improve when the pipeline contains cache for each partition.
an active transformation upstream of the Lookup
transformation. Dynamic Lookup Properties -
 You may want to configure the session to create concurrent A Lookup transformation with a dynamic cache has the following
caches if you are certain that you will need to build caches for properties:
each of the Lookup transformations in the session. NewLookupRow - The Designer adds this port to a Lookup
transformation configured to use a dynamic cache. Indicates with
Dynamic Lookup Cache a numeric value whether the Integration Service inserts or
 The following list describes some situations when you use a updates the row in the cache, or makes no change to the cache.
dynamic lookup cache: To keep the lookup cache and the target table synchronized, pass
Updating a master customer table with new and updated rows to the target when the NewLookupRow value is equal to 1
customer information - or 2.
 Use a Lookup transformation to perform a lookup on the
customer table to determine if a customer exists in the target. Associated Expression - Associate lookup ports or the associated
 The cache represents the customer table. ports with an expression, an input/output port, or a sequence ID.
 The Lookup transformation inserts and updates rows in the cache The Integration Service uses the data in the associated expression
as it passes rows to the target. to insert or update rows in the lookup cache. If you associate a
sequence ID, the Integration Service generates a primary key for
Inserting rows into a master customer table from multiple real- inserted rows in the lookup cache.
time sessions -
 Use a Lookup transformation in each session to perform a lookup Ignore Null Inputs for Updates - The Designer activates this port
on the same customer table. property for lookup/output ports when you configure the Lookup
 Each Lookup transformation inserts rows into the customer table transformation to use a dynamic cache. Select this property when
and it inserts them in the dynamic lookup cache. you do not want the Integration Service to update the column in
 For more information about synchronizing dynamic cache the cache when the data in this column contains a null value.
between multiple sessions, see “Synchronizing Cache with the
Lookup Source” on page 278. Ignore in Comparison - The Designer activates this port property
for lookup/output ports not used in the lookup condition when
Loading data into a slowly changing dimension table and a fact you configure the Lookup transformation to use a dynamic cache.
table - The Integration Service compares the values in all lookup ports
 Create two pipelines and configure a Lookup transformation that with the values in their associated input ports by default. Select
performs a lookup on the dimension table. this property if you want the Integration Service to ignore the
 Use a dynamic lookup cache to load data to the dimension table. port when it compares values before updating a row.
 Use a static lookup cache to load data to the fact table, and
specify the name of the dynamic cache from the first pipeline. Update Dynamic Cache Condition - Allow the Integration Service
to update the dynamic cache conditionally. You can create a
Boolean expression that determines whether to update the cache
for an input row. Or, you can enable the Integration Service to
Reading a flat file that is an export from a relational table - update the cache with an expression result for an input row. The
 Read data from a Teradata table when the ODBC connection is expression can contain values from the input row or the lookup
slow. cache
 You can export the Teradata table contents to a flat file and use
the file as a lookup source. Rules and Guidelines for Dynamic Lookup Caches

Use the following guidelines when you use a dynamic lookup - Select Insert and Update as Update for the target table options
cache: in the session properties
- You cannot share the cache between a dynamic Lookup
transformation and static Lookup transformation in the same K. NORMALIZER
target load order group.  The Normalizer transformation generates a key for each source
- You can create a dynamic lookup cache from a relational table, row.
flat file, or source qualifier transformation.  This generated key remains the same for the output group created
- The Lookup transformation must be a connected for each source row.
transformation.  The Integration Service increments the generated key sequence
- Use a persistent or a non-persistent cache. number each time it processes a source row
- If the dynamic cache is not persistent, the Integration Service
always rebuilds the cache from the database, even if you do not You can create a VSAM Normalizer transformation or a pipeline
enable Re-cache from Lookup Source. When you synchronize Normalizer transformation:
dynamic cache files with a lookup source table, the Lookup VSAM Normalizer transformation -
transformation inserts rows into the lookup source table and the  A non-reusable transformation that is a Source Qualifier
dynamic lookup cache. If the source row is an update row, the transformation for a COBOL source.
Lookup transformation updates the dynamic lookup cache only.  The Mapping Designer creates VSAM Normalizer columns from a
- You can only create an equality lookup condition. You cannot COBOL source in a mapping.
look up a range of data in dynamic cache.  The column attributes are read-only.
- Associate each lookup port that is not in the lookup condition  The VSAM Normalizer receives a multiple-occurring source
with an input port, sequence ID, or expression. column through one input port.
- Use a Router transformation to pass rows to the cached target
when the NewLookupRow value equals one or two. Use the Pipeline Normalizer transformation -
Router transformation to drop rows when the NewLookupRow  A transformation that processes multiple-occurring data from
value equals zero, or you can output those rows to a different relational tables or flat files.
target.  You create the columns manually and edit them in the
- Verify that you output the same values to the target that the Transformation Developer or Mapping Designer.
Integration Service writes to the lookup cache. When you choose  The pipeline Normalizer transformation represents multiple-
to output new values on update, only connect lookup/output occurring columns with one input port for each source column
ports to the target table instead of input/output ports. When you occurrence.
choose to output old values on update, add an Expression
transformation after the Lookup transformation and before the  When a Normalizer transformation receives more than one type
Router transformation. Add output ports in the Expression of data from a COBOL source, you need to connect the
transformation for each port in the target table and create Normalizer output ports to different targets based on the type
expressions to ensure you do not output null input values to the of data in each row.
target.
- When you use a lookup SQL override, map the correct columns
to the appropriate targets for lookup.
- When you add a WHERE clause to the lookup SQL override, use
a Filter transformation before the Lookup transformation. This
ensures the Integration Service inserts rows in the dynamic cache Troubleshooting Normalizer Transformations
and target table that match the WHERE clause. I cannot edit the ports in my Normalizer transformation when
- When you configure a reusable Lookup transformation to use a using a relational source.
dynamic cache, you cannot edit the condition or disable the When you create ports manually, add them on the Normalizer
Dynamic Lookup Cache property in a mapping. tab in the transformation, not the Ports tab.
- Use Update Strategy transformations after the Lookup
transformation to flag the rows for insert or update for the target. Importing a COBOL file failed with numerous errors. What should I
- Use an Update Strategy transformation before the Lookup do?
transformation to define some or all rows as update if you want Verify that the COBOL program follows the COBOL standard,
to use the Update Else Insert property in the Lookup including spaces, tabs, and end of line characters.
transformation.
- Set the row type to Data Driven in the session properties. The COBOL file headings should be similar to the following text:
identification division.  The Rank transformation differs from the transformation
program-id. mead. functions MAX and MIN, in that it lets you select a group of
environment division. top or bottom values, not just one value.
select file-one assign to "fname".  You can also write expressions to transform data or perform
data division. calculations. You can also create local variables and write non-
file section. aggregate expressions.
fd FILE-ONE.
 When the Integration Service runs in the ASCII data movement
The Designer does not read hidden characters in the COBOL mode, it sorts session data using a binary sort order.
program. Use a text-only editor to make changes to the COBOL  When the Integration Service runs in Unicode data movement
file. Do not use Word or Wordpad. Remove extra spaces. mode, the Integration Service uses the sort order configured for
the session
A session that reads binary data completed, but the information
in the target table is incorrect. Rank Caches
Edit the session in the Workflow Manager and verify that the  During a session, the Integration Service compares an input row
source file format is set correctly. The file format might be with rows in the data cache.
EBCDIC or ASCII. The number of bytes to skip between records  If the input row outranks a cached row, the Integration Service
must be set to 0. replaces the cached row with the input row.
 If you configure the Rank transformation to rank across multiple
I have a COBOL field description that uses a non-IBM COMP groups, the Integration Service ranks incrementally for each
type. How should I import the source? group it finds.
In the source definition, clear the IBM COMP option.  The Integration Service stores group information in an index
cache and row data in a data cache.
In my mapping, I use one Expression transformation and one  If you create multiple partitions in a pipeline, the Integration
Lookup transformation to modify two output ports from the Service creates separate caches for each partition.
Normalizer transformation. The mapping concatenates them
into a single transformation. All the ports are under the same  When you create a Rank transformation, you can configure the
level. When I check the data loaded in the target, it is incorrect. following properties:
Why is that? - Enter a cache directory.
You can only concatenate ports from level one. Remove the - Select the top or bottom rank.
concatenation. - Select the input/output port that contains values used to
determine the rank.
- You can select only one port to define a rank.
- Select the number of rows falling within a rank.
- Define groups for ranks, such as the 10 least expensive
products for each manufacturer
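The kind of result described above, such as the 10 least expensive products for each manufacturer, is roughly what the following SQL sketch computes. It is only an analogy for the Rank transformation, and the table and column names are hypothetical:

SELECT MANUFACTURER_ID, PRODUCT_NAME, PRICE
FROM (
    SELECT MANUFACTURER_ID, PRODUCT_NAME, PRICE,
           RANK() OVER (PARTITION BY MANUFACTURER_ID ORDER BY PRICE ASC) AS PRICE_RANK
    FROM PRODUCTS
) RANKED
WHERE PRICE_RANK <= 10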

M. ROUTER
L. RANK  A Filter transformation tests data for one condition and drops the
 You can select only the top or bottom rank of data with Rank rows of data that do not meet the condition.
transformation.  However, a Router transformation tests data for one or more
 Use a Rank transformation to return the largest or smallest conditions and gives you the option to route rows of data that
numeric value in a port or group. do not meet any of the conditions to a default output group
 You can also use a Rank transformation to return the strings at  When you use a Router transformation in a mapping, the
the top or the bottom of a session sort order. Integration Service processes the incoming data only once.
 During the session, the Integration Service caches input data until  When you use multiple Filter transformations in a mapping, the
it can perform the rank calculations. Integration Service processes the incoming data for each
transformation.
 You cannot modify or delete output ports or their properties

 The Integration Service determines the order of evaluation for  If the Sequence Generator is not configured to cycle through the
each condition based on the order of the connected output sequence, the NEXTVAL port generates sequence numbers up to
groups. the configured End Value.
 The Integration Service processes user-defined groups that are
connected to a transformation or a target in a mapping.  For example, you might connect NEXTVAL to two targets in a
 The Integration Service only processes user-defined groups that mapping to generate unique primary key values.
are not connected in a mapping if the default group is  The Integration Service creates a column of unique primary key
connected to a transformation or a target. values for each target table.
 If a row meets more than one group filter condition, the  The column of unique primary key values is sent to one target
Integration Service passes this row multiple times table as a block of sequence numbers.
 The Designer deletes the default group when you delete the last  The other target receives a block of sequence numbers from the
user-defined group from the list. Sequence Generator transformation after the first target
receives the block of sequence numbers.
N. SEQUENCE GENERATOR
 The Sequence Generator transformation generates numeric  For example, you configure the Sequence Generator
values. transformation as follows:
 Use the Sequence Generator to create unique primary key values, Current Value = 1, Increment By = 1.
replace missing primary keys, or cycle through a sequential
range of numbers.  The Integration Service generates the following primary key
 The Sequence Generator transformation is a connected values for the T_ORDERS_PRIMARY and T_ORDERS_FOREIGN
transformation. target tables:
 It contains two output ports that you can connect to one or more T_ORDERS_PRIMARY TABLE: T_ORDERS_FOREIGN TABLE:
transformations. PRIMARY KEY PRIMARY KEY
1 6
 The Integration Service generates a block of sequence numbers 2 7
each time a block of rows enters a connected transformation 3 8
 If you connect CURRVAL, the Integration Service processes one 4 9
row in each block. 5 10
 When NEXTVAL is connected to the input port of another
transformation, the Integration Service generates a sequence of  If you want the same values to go to more than one target that
numbers. receives data from a single transformation, you can connect a
 When CURRVAL is connected to the input port of another Sequence Generator transformation to that preceding
transformation, the Integration Service generates the NEXTVAL transformation.
value plus the Increment By value  The Integration Service processes the values into a block of
sequence numbers.
 You can make a Sequence Generator reusable, and use it in  This allows the Integration Service to pass unique values to the
multiple mappings transformation, and then route rows from the transformation to
 You can use a range of values from 1 to targets.
9,223,372,036,854,775,807 with the smallest interval of 1.
 The following figure shows a mapping with a Sequence Generator
 The Sequence Generator transformation has two output ports: that passes unique values to the Expression transformation.
NEXTVAL and CURRVAL. You cannot edit or delete these ports.  The Expression transformation populates both targets with
 Likewise, you cannot add ports to the transformation. identical primary key values.

NEXTVAL
 Connect NEXTVAL to multiple transformations to generate unique
values for each row in each transformation.
 Use the NEXTVAL port to generate sequence numbers by
connecting it to a downstream transformation or target.
 You connect the NEXTVAL port to generate the sequence based
on the Current Value and Increment By properties.

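The NEXTVAL behavior driven by the Current Value and Increment By properties is loosely analogous to a database sequence. The following Oracle-style sketch is only an analogy, not how the transformation is implemented:

CREATE SEQUENCE ORDER_KEY_SEQ START WITH 1 INCREMENT BY 1;
SELECT ORDER_KEY_SEQ.NEXTVAL FROM DUAL;   -- returns 1, then 2, 3, ... on each call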
CURRVAL
 CURRVAL is NEXTVAL plus the Increment By value. You typically
only connect the CURRVAL port when the NEXTVAL port is
already connected to a downstream transformation.
 When a row enters a transformation connected to the CURRVAL
port, the Integration Service passes the last created NEXTVAL
value plus one.
 The following figure shows connecting CURRVAL and NEXTVAL
ports to a target:

 For example, you configure the Sequence Generator


transformation as follows:
Current Value = 1, Increment By = 1.

 The Integration Service generates the following values for


NEXTVAL and CURRVAL:
NEXTVAL CURRVAL
1 2
2 3
3 4
4 5
5 6

 If you connect the CURRVAL port without connecting the


NEXTVAL port, the Integration Service passes a constant value
for each row.
 When you connect the CURRVAL port in a Sequence Generator
transformation, the Integration Service processes one row in
each block.
 You can optimize performance by connecting only the NEXTVAL
port in a mapping.

Sequence Generator Transformation Properties


 The following table describes the Sequence Generator
End Value
transformation properties you can configure:
 End Value is the maximum value you want the Integration Service
to generate.
 If the Integration Service reaches the end value and the
Sequence Generator is not configured to cycle through the
sequence, the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.

Number of Cached Values


 When you have a reusable Sequence Generator transformation in
several sessions and the sessions run at the same time, use
Number of Cached Values to ensure each session receives
unique values in the sequence.
 By default, Number of Cached Values is set to 1000 for reusable
Sequence Generators.
 For non-reusable Sequence Generator, Number of Cached
Values is set to 0 by default

Reset  Allocate at least 16 MB (16,777,216 bytes) of physical memory to
 If you select Reset for a non-reusable Sequence Generator sort data using the Sorter transformation.
transformation, the Integration Service generates values based  Sorter cache size is set to 16,777,216 bytes by default.
on the original current value each time it starts the session.
 Otherwise, the Integration Service updates the current value to  The Integration Service requires disk space of at least twice the
reflect the last-generated value plus one, and then uses the amount of incoming data when storing data in the work
updated value the next time it uses the Sequence Generator directory
transformation.
 For example, you might configure a Sequence Generator  Use the following formula to determine the size of incoming data:
transformation to create values from 1 to 1,000 with an number_of_input_rows * [(Σ column_size) + 16]
increment of 1, and a current value of 1 and choose Reset.
 During the first session run, the Integration Service generates P. SOURCE QUALIFIER
numbers 1 through 234.  The Source Qualifier transformation represents the rows that the
 Each subsequent time the session runs, the Integration Service Integration Service reads when it runs a session.
again generates numbers beginning with the current value of 1.  Use the Source Qualifier transformation to complete the
 If you do not select Reset, the Integration Service updates the following tasks:
current value to 235 at the end of the first session run. Join data originating from the same source database -
 The next time it uses the Sequence Generator transformation, You can join two or more tables with primary key foreign key
the first value generated is 235. relationships by linking the sources to one Source Qualifier
Note: Reset is disabled for reusable Sequence Generator transformation.
transformations. Filter rows when the Integration Service reads source data -
If you include a filter condition, the Integration Service adds a
O. SORTER WHERE clause to the default query.
 You can sort data from relational or flat file sources Specify an outer join rather than the default inner join -
 When you specify multiple ports for the sort key, the Integration If you include a user-defined join, the Integration Service replaces
Service sorts each port sequentially. the join information specified by the metadata in the SQL query.
 The order the ports appear in the Ports tab determines the Specify sorted ports -
succession of sort operations. If you specify a number for sorted ports, the Integration Service
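A worked instance of the sizing formula above, under assumed figures of 2,000,000 input rows and connected ports whose column sizes add up to 84 bytes:
2,000,000 * (84 + 16) = 200,000,000 bytes of incoming data (roughly 190 MB),
so the work directory should have at least about 380 MB free, because the Integration Service needs disk space of at least twice the incoming data.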
P. SOURCE QUALIFIER
 The Source Qualifier transformation represents the rows that the Integration Service reads when it runs a session.
 Use the Source Qualifier transformation to complete the following tasks:
Join data originating from the same source database - You can join two or more tables with primary key foreign key relationships by linking the sources to one Source Qualifier transformation.
Filter rows when the Integration Service reads source data - If you include a filter condition, the Integration Service adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join - If you include a user-defined join, the Integration Service replaces the join information specified by the metadata in the SQL query.
Specify sorted ports - If you specify a number for sorted ports, the Integration Service adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source - If you choose Select Distinct, the Integration Service adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Integration Service to read source data - For example, you might use a custom query to perform aggregate calculations

 If the datatypes in the source definition and Source Qualifier transformation do not match, the Designer marks the mapping invalid when you save it.
 You specify a target load order based on the Source Qualifier transformations in a mapping
 If one Source Qualifier transformation provides data for multiple targets, you can enable constraint-based loading in a session to have the Integration Service load data based on target table primary and foreign key relationships
 You can use parameters and variables in the SQL query, user-defined join, source filter, and pre- and post-session SQL commands of a Source Qualifier transformation
 The Integration Service first generates an SQL query and expands each parameter or variable.
 It replaces each mapping parameter, mapping variable, and workflow variable with its start value.
 Then it runs the query on the source database

Source Qualifier Transformation Properties

Default Query
 For relational sources, the Integration Service generates a query for each Source Qualifier transformation when it runs a session.
 The default query is a SELECT statement for each source column used in the mapping.
 In other words, the Integration Service reads only the columns that are connected to another transformation
 If any table name or column name contains a database reserved word, you can create and maintain a file, reswords.txt, containing reserved words.
 When the Integration Service initializes a session, it searches for reswords.txt in the Integration Service installation directory.
 If the file exists, the Integration Service places quotes around matching reserved words when it executes SQL against the database.
 If you override the SQL, you must enclose any reserved word in quotes.
 When a mapping uses related relational sources, you can join both sources in one Source Qualifier transformation.
 During the session, the source database performs the join before passing data to the Integration Service
 Use the Joiner transformation for heterogeneous sources and to join flat files.

Viewing the Default Query
 Do not connect to the source database. You only connect to the source database when you enter an SQL query that overrides the default query.
 You must connect the columns in the Source Qualifier transformation to another transformation or target before you can generate the default query

Default Join -
 When you join related tables in one Source Qualifier transformation, the Integration Service joins the tables based on the related keys in each table
 This default join is an inner equijoin, using the following syntax in the WHERE clause:
Source1.column_name = Source2.column_name
 The columns in the default join must have:
- A primary key-foreign key relationship
- Matching datatypes
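To illustrate the default query and default join described above, assume two related sources, CUSTOMERS and ORDERS (hypothetical names), joined in one Source Qualifier transformation with only four ports connected downstream. The generated default query would look roughly like this:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, ORDERS.ORDER_ID, ORDERS.ORDER_DATE
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
Only the connected columns appear in the SELECT list, and the primary key-foreign key columns appear in the WHERE clause as the inner equijoin.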
Custom Join -
You might need to override the default join under the following circumstances:
- Columns do not have a primary key-foreign key relationship.
- The datatypes of columns used for the join do not match.
- You want to specify a different type of join, such as an outer join.

Adding an SQL Query
 The Source Qualifier transformation provides the SQL Query option to override the default query.
 You can enter an SQL statement supported by the source database.
 Before entering the query, connect all the input and output ports you want to use in the mapping.

Entering a User-Defined Join
 Entering a user-defined join is similar to entering a custom SQL query.
 However, you only enter the contents of the WHERE clause, not the entire query.
 When you perform an outer join, the Integration Service may insert the join syntax in the WHERE clause or the FROM clause of the query, depending on the database syntax.
 When you add a user-defined join, the Source Qualifier transformation includes the setting in the default SQL query.
 However, if you modify the default query after adding a user-defined join, the Integration Service uses only the query
defined in the SQL Query property of the Source Qualifier transformation.
 When including a string mapping parameter or variable, use a string identifier appropriate to the source system.
 For most databases, you need to enclose the name of a string parameter or variable in single quotes.

Outer Join Support
 The Integration Service supports two kinds of outer joins:
Left - Integration Service returns all rows for the table to the left of the join syntax and the rows from both tables that meet the join condition
Right - Integration Service returns all rows for the table to the right of the join syntax and the rows from both tables that meet the join condition

Informatica Join Syntax
 When you enter join syntax, use the Informatica or database-specific join syntax.
 When you use the Informatica join syntax, the Integration Service translates the syntax and passes it to the source database during the session.
Note: Always use database-specific syntax for join conditions.
 When you use Informatica join syntax, enclose the entire join statement in braces ({Informatica syntax}).
 When you use database syntax, enter syntax supported by the source database without braces.

Normal Join Syntax
{ source1 INNER JOIN source2 on join_condition }

Left Outer Join Syntax
{ source1 LEFT OUTER JOIN source2 on join_condition }

Right Outer Join Syntax
{ source1 RIGHT OUTER JOIN source2 on join_condition }
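For example, a user-defined left outer join between two hypothetical sources written in the Informatica join syntax (note the braces):
{ CUSTOMERS LEFT OUTER JOIN ORDERS on CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID }
The same join written in database-specific syntax would be entered without the braces.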
Entering a Source Filter
 You can enter a source filter to reduce the number of rows the Integration Service queries.
 If you include the string 'WHERE' or large objects in the source filter, the Integration Service fails the session.
 The Source Qualifier transformation includes source filters in the default SQL query.
 If, however, you modify the default query after adding a source filter, the Integration Service uses only the query defined in the SQL query portion of the Source Qualifier transformation.
 You can use a parameter or variable as the source filter or include parameters and variables within the source filter
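A minimal sketch of a source filter, assuming an Oracle source named EMPLOYEES and a string mapping parameter $$StartDate defined in the parameter file (both hypothetical):
EMPLOYEES.HIRE_DATE >= TO_DATE('$$StartDate', 'MM/DD/YYYY')
The Integration Service appends this condition to the WHERE clause of the default query. The parameter is enclosed in single quotes because it is a string, and, as noted above, the filter text itself must not include the string 'WHERE'.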
Sorted Ports -
 When you use sorted ports, the Integration Service adds the ports to the ORDER BY clause in the default query.
 The sorted ports are applied on the connected ports rather than the ports that start at the top of the SQ
 Use sorted ports for relational sources only.
 When using sorted ports, the sort order of the source database must match the sort order configured for the session.
 To ensure data is sorted as the Integration Service requires, the database sort order must be the same as the user-defined session sort order
 The Source Qualifier transformation includes the number of sorted ports in the default SQL query.
 However, if you modify the default query after choosing the Number of Sorted Ports, the Integration Service uses only the query defined in the SQL Query property.

Pre and post-session SQL commands -
 You can add pre- and post-session SQL commands on the Properties tab in the Source Qualifier transformation
 The Integration Service runs pre-session SQL commands against the source database before it reads the source.
 It runs post-session SQL commands against the source database after it writes to the target

Guidelines for pre- and post-session SQL commands in SQ:
- Use any command that is valid for the database type. However, the Integration Service does not allow nested comments, even though the database might.
- You can use parameters and variables in source pre- and post-session SQL commands or you can use a parameter or variable as the command. Use any parameter or variable type that you can define in the parameter file.
- Use a semicolon (;) to separate multiple statements. The Integration Service issues a commit after each statement.
- The Integration Service ignores semicolons within /*...*/.
- If you need to use a semicolon outside of comments, you can escape it with a backslash (\). When you escape the semicolon, the Integration Service ignores the backslash, and it does not use the semicolon as a statement separator.
- The Designer does not validate the SQL.

Note: You can also enter pre- and post-session SQL commands on the Properties tab of the target instance in a mapping
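As a sketch of the separator and escape rules above (hypothetical table names), a pre-session SQL command with two statements:
TRUNCATE TABLE STG_ORDERS; INSERT INTO LOAD_AUDIT (NOTE) VALUES ('pre-load\; staging truncated')
The unescaped semicolon separates the two statements, and the Integration Service commits after each one; the semicolon inside the quoted literal is escaped as \; so it is not treated as a statement separator.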
Troubleshooting Source Qualifier Transformations
I cannot connect a source definition to a target definition.
 You cannot directly connect sources to targets. Instead, you need to connect them through a Source Qualifier transformation for relational and flat file sources, or through a Normalizer transformation for COBOL sources.
I cannot connect multiple sources to one target.
 The Designer does not allow you to connect multiple Source Qualifier transformations to a single target. There are two workarounds:
Reuse targets - Since target definitions are reusable, you can add the same target to the mapping multiple times. Then connect each Source Qualifier transformation to each target.
Join the sources in a Source Qualifier transformation. Then remove the WHERE clause from the SQL query.

The source has QNAN (not a number) values in some columns, but the target shows 1.#QNAN.
 Operating systems have different string representations of NaN. The Integration Service converts QNAN values to 1.#QNAN on Win64EMT platforms. 1.#QNAN is a valid representation of QNAN.

I entered a custom query, but it is not working when I run the workflow containing the session.
 Be sure to test this setting for the Source Qualifier transformation before you run the workflow. Reopen the dialog box in which you entered the custom query. You can connect to a database and click the Validate button to test the SQL. The Designer displays any errors.
 The most common reason a session fails is because the database login in both the session and Source Qualifier transformation is not the table owner. You need to specify the table owner in the session and when you generate the SQL Query in the Source Qualifier transformation.
 You can test the SQL Query by cutting and pasting it into the database client tool (such as Oracle Net) to see if it returns an error.

I used a mapping variable in a source filter and now the session fails.
 Try testing the query by generating and validating the SQL in the Source Qualifier transformation. If the variable or parameter is a string, you probably need to enclose it in single quotes. If it is a datetime variable or parameter, you might need to change its format for the source system.

Q. SQL TRANSFORMATION
 The SQL transformation processes SQL queries midstream in a pipeline.
 You can insert, delete, update, and retrieve rows from a database.
 You can pass the database connection information to the SQL transformation as input data at run time.
 The transformation processes external SQL scripts or SQL queries that you create in an SQL editor.
 The SQL transformation processes the query and returns rows and database errors

When you create an SQL transformation, you configure the following options:
Mode - The SQL transformation runs in one of the following modes:
Script mode: The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row.
Query mode: The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement.
Passive or active transformation - The SQL transformation is an active transformation by default. You can configure it as a passive transformation when you create the transformation.
Database type - The type of database the SQL transformation connects to.
Connection type - Pass database connection information to the SQL transformation or use a connection object.

Script Mode -
 An SQL transformation running in script mode runs SQL scripts from text files.
 You pass each script file name from the source to the SQL transformation ScriptName port.
 The script file name contains the complete path to the script file.
 When you configure the transformation to run in script mode, you create a passive transformation
 The transformation returns one row for each input row
 The Integration Service ignores the output of any SELECT statement you include in the SQL script.
 The SQL transformation in script mode does not output more than one row of data for each input row.
 You cannot use nested scripts where the SQL script calls another SQL script.
 A script cannot accept run-time arguments

Query Mode -
 It executes an SQL query that you define in the transformation.
 When you configure the SQL transformation to run in query mode, you create an active transformation
 The transformation can return multiple rows for each input row.
 When you create a query, the SQL Editor validates the port names in the query.
 It also verifies that the ports you use for string substitution are string datatypes.
 The SQL Editor does not validate the syntax of the SQL query
 You can create the following types of SQL queries in the SQL transformation:
Static SQL query - The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows.
Dynamic SQL query - You can change the query statements and the data. The Integration Service prepares a query for each input row.

Static Query -
 When you create a static query, the Integration Service prepares the SQL procedure once and executes it for each row.
 When you create a dynamic query, the Integration Service prepares the SQL for each input row.
 You can optimize performance by creating static queries
 Bind a parameter to an input port - SQL Editor encloses the name in question marks (?)
 When the SQL query contains a SELECT statement, the output ports must be in the same order as the columns in the SELECT statement.

Dynamic Query -
 To change a query statement, configure a string variable in the query for the portion of the query you want to change.
 To configure the string variable, identify an input port by name in the query and enclose the name with the tilde (~).
 The query changes based on the value of the data in the port.
 The transformation input port that contains the query parameter must be a string datatype.
 You can pass the full query or pass part of the query in an input port:
Full query - You can substitute the entire SQL query with query statements from source data.
Partial query - You can substitute a portion of the query statement, such as the table name.

 You can add pass-through ports to the SQL transformation
 When the source row contains a SELECT query statement, the SQL transformation returns the data in the pass-through port in each row it returns from the database.
 If the query result contains multiple rows, the SQL transformation repeats the pass-through data in each row
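A minimal sketch of the two query styles in query mode, assuming hypothetical input ports CUST_ID (bound as a parameter) and TABLE_PORT (used for string substitution):
Static query with a bound parameter:
SELECT CUST_NAME, CUST_CITY FROM CUSTOMERS WHERE CUST_ID = ?CUST_ID?
Dynamic partial query that substitutes the table name from the port value:
SELECT CUST_NAME, CUST_CITY FROM ~TABLE_PORT~
In the static case the Integration Service prepares the statement once; in the dynamic case it prepares a new statement for each input row because the substituted text can change.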
Passive Mode Configuration
 When you create a SQL transformation, you can configure the SQL transformation to run in passive mode instead of active mode.
 You cannot change the mode after you create the transformation

Guidelines to configure the SQL transformation to run in passive mode:
- If a SELECT query returns more than one row, the Integration Service returns the first row and an error to the SQLError port. The error states that the SQL transformation generated multiple rows.
- If the SQL query has multiple SQL statements, then the Integration Service executes all the statements. The Integration Service returns data for the first SQL statement only. The SQL transformation returns one row. The SQLError port contains the errors from all the SQL statements. When multiple errors occur, they are separated by semi-colons in the SQLError port.
- If the SQL query has multiple SQL statements and a statistics port is enabled, the Integration Service returns the data and statistics for the first SQL statement. The SQLError port contains the errors for all the SQL statements.

Guidelines to configure the SQL transformation to run in query mode:
- The number and the order of the output ports must match the number and order of the fields in the query SELECT clause.
- The native datatype of an output port in the transformation must match the datatype of the corresponding column in the database. The Integration Service generates a row error when the datatypes do not match.
- When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQLError port, the pass-through ports, and the NumRowsAffected port when it is enabled. If you add output ports the ports receive NULL data values.
- When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the output ports.
- You cannot add the "_output" suffix to output port names that you create.
- You cannot use the pass-through port to return data from a SELECT query.
- When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value.
- When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error.
- You can use string substitution instead of parameter binding in a query. However, the input ports must be string datatypes.

Ways to connect the SQL transformation to a database:
Static connection - Configure the connection object in the session. You must first create the connection object in Workflow Manager.
Logical connection - Pass a connection name to the SQL transformation as input data at run time. You must first create the connection object in Workflow Manager.
Full database connection - Pass the connect string, user name, password, and other connection information to the SQL transformation input ports at run time

Note: If a session has multiple partitions, the SQL transformation creates a separate database connection for each partition.

 The following transaction control SQL statements are not valid with the SQL transformation:
SAVEPOINT - Identifies a rollback point in the transaction.
SET TRANSACTION - Changes transaction options.

 When you have high availability, the SQL transformation provides database connection resiliency for static and dynamic connections. When the Integration Service fails to connect to the database, it retries the connection.
 You can configure the connection retry period for a connection
 When the Integration Service cannot connect to the database in the time period that you configure, it generates a row error for a dynamic connection or fails the session for a static connection.

Database Deadlock Resiliency
 The SQL transformation is resilient to database deadlock errors when you enable the Session Retry on Deadlock session property.
 The SQL transformation is resilient to database deadlock errors in Query mode but it is not resilient to deadlock errors in Script mode.
 If a deadlock occurs in Query mode, the Integration Service tries to reconnect to the database for the number of deadlock retries that you configure.
 When a deadlock occurs, the Integration Service retries the SQL statements in the current row if the current row has no DML statements.
 If the row contains a DML statement such as INSERT, UPDATE, or DELETE, the Integration Service does not process the current row again

For a dynamic connection, if the retry attempt fails, the Integration Service returns an error in the SQLError port. The Integration Service processes the next statement based on the Continue on SQL Error within Row property. If the property is disabled, the Integration Service skips the current row. If the current row contains a DML statement such as INSERT, UPDATE, or DELETE, the Integration Service increments the error counts.

For a static connection, if the retry attempts fail, the Integration Service returns an error in the SQLError port. If the current row contains a DML statement, then the Integration Service fails the session. The Integration Service processes the next statement based on the Continue on SQL Error within a Row property. If the property is disabled the Integration Service skips the current row.

<Print - 361 to 364>

R. STORED PROCEDURE
There are three types of data that pass between the Integration Service and the stored procedure:
- Input/output parameters
- Return values
- Status codes

 If a stored procedure returns a result set rather than a single return value, the Stored Procedure transformation takes only the first value returned from the procedure

Status Codes
 Status codes provide error handling for the Integration Service during a workflow.
 The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully.
 You cannot see this value.
 The Integration Service uses it to determine whether to continue running the session or stop

Connected - The flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.

Unconnected - The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
Specifying when the Stored Procedure Runs
The following list describes the options for running a Stored Procedure transformation:
Normal - The stored procedure runs where the transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source - Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source - After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.
Pre-load of the Target - Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target - After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database.

 You can run more than one Stored Procedure transformation in different modes in the same mapping.
 For example, a pre-load source stored procedure can check table integrity, a normal stored procedure can populate the table, and a post-load stored procedure can rebuild indexes in the database.
 However, you cannot run the same instance of a Stored Procedure transformation in both connected and unconnected mode in a mapping.
 You must create different instances of the transformation
 If the mapping calls more than one source or target pre- or post-load stored procedure in a mapping, the Integration Service executes the stored procedures in the execution order that you specify in the mapping

 The Integration Service opens the database connection when it encounters the first stored procedure.
 The database connection remains open until the Integration Service finishes processing all stored procedures for that connection.
 The Integration Service closes the database connections and opens a new one when it encounters a stored procedure using a different database connection.
 To run multiple stored procedures that use the same database connection, set these stored procedures to run consecutively.
 If you do not set them to run consecutively, you might have unexpected results in the target.
 For example, you have two stored procedures: Stored Procedure A and Stored Procedure B. Stored Procedure A begins a transaction, and Stored Procedure B commits the transaction.
 If you run Stored Procedure C before Stored Procedure B, using another database connection, Stored Procedure B cannot commit the transaction because the Integration Service closes the database connection when it runs Stored Procedure C.

 Use the following guidelines to run multiple stored procedures within a database connection:
- The stored procedures use the same database connect string defined in the stored procedure properties.
- You set the stored procedures to run in consecutive order.
- The stored procedures have the same stored procedure type:
- Source pre-load
- Source post-load
- Target pre-load
- Target post-load

Creating a Stored Procedure Transformation
 After you configure and test a stored procedure in the database, you must create the Stored Procedure transformation in the Mapping Designer. There are two ways to configure the Stored Procedure transformation:
- Use the Import Stored Procedure dialog box to configure the ports used by the stored procedure.
- Configure the transformation manually, creating the appropriate ports for any input or output parameters.

 Stored Procedure transformations are created as Normal type by default, which means that they run during the mapping, not before or after the session.
 New Stored Procedure transformations are not created as reusable transformations.
 To create a reusable transformation, click Make Reusable in the Transformation properties after creating the transformation.
 Note: Configure the properties of reusable transformations in the Transformation Developer, not the Mapping Designer, to make changes globally for the transformation.

Importing Stored Procedures
 When you import a stored procedure, the Designer creates ports based on the stored procedure input and output parameters.
 You should import the stored procedure whenever possible.
 There are three ways to import a stored procedure in the Mapping Designer:
- Select the stored procedure icon and add a Stored Procedure transformation.
- Click Transformation > Import Stored Procedure.
- Click Transformation > Create, and then select Stored Procedure.
 When you import a stored procedure containing a period (.) in the stored procedure name, the Designer substitutes an underscore (_) for the period in the Stored Procedure transformation name.

Manually Creating Stored Procedure Transformations
 To create a Stored Procedure transformation manually, you need to know the input parameters, output parameters, and return values of the stored procedure, if there are any.
 You must also know the datatypes of those parameters, and the name of the stored procedure.
 All these are configured through Import Stored Procedure.
 To create a Stored Procedure transformation, in the Mapping Designer, click Transformation > Create, and then select Stored Procedure

<Print – 384-385>

Changing the Stored Procedure
 If the number of parameters or the return value in a stored procedure changes, you can either re-import it or edit the Stored Procedure transformation manually.
 The Designer does not verify the Stored Procedure transformation each time you open the mapping.
 After you import or create the transformation, the Designer does not validate the stored procedure.
 The session fails if the stored procedure does not match the transformation.

Configuring an Unconnected Transformation
 An unconnected Stored Procedure transformation is not directly connected to the flow of data through the mapping.
Instead, the stored procedure runs either:
From an expression - Called from an expression written in the Expression Editor within another transformation in the mapping.
Pre- or post-session - Runs before or after a session

 When using an unconnected Stored Procedure transformation in an expression, you need a method of returning the value of output parameters to a port. Use one of the following methods to capture the output values:
- Assign the output value to a local variable.
- Assign the output value to the system variable PROC_RESULT.
 By using PROC_RESULT, you assign the value of the return parameter directly to an output port, which can apply directly to a target.
 You can also combine the two options by assigning one output parameter as PROC_RESULT, and the other parameter as a variable.
 Use PROC_RESULT only within an expression.
 If you do not use PROC_RESULT or a variable, the port containing the expression captures a NULL.
 You cannot use PROC_RESULT in a connected Lookup transformation or within the Call Text for a Stored Procedure transformation
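For example (hypothetical procedure and port names), an expression in another transformation can call the unconnected Stored Procedure transformation with the :SP reference qualifier and capture the output parameter through PROC_RESULT:
:SP.GET_CREDIT_LIMIT(CUSTOMER_ID, PROC_RESULT)
CUSTOMER_ID supplies the input parameter, and PROC_RESULT stands in for the output parameter, so the value returned by the procedure becomes the value of the port that contains the expression.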
Expression Rules
- A single output parameter is returned using the variable PROC_RESULT.
- When you use a stored procedure in an expression, use the :SP reference qualifier. To avoid typing errors, select the Stored Procedure node in the Expression Editor, and double-click the name of the stored procedure.
- However, the same instance of a Stored Procedure transformation cannot run in both connected and unconnected mode in a mapping. You must create different instances of the transformation.
- The input/output parameters in the expression must match the input/output ports in the Stored Procedure transformation. If the stored procedure has an input parameter, there must also be an input port in the Stored Procedure transformation.
- When you write an expression that includes a stored procedure, list the parameters in the same order that they appear in the stored procedure and the Stored Procedure transformation.
- The parameters in the expression must include all of the parameters in the Stored Procedure transformation. You cannot leave out an input parameter. If necessary, pass a dummy variable to the stored procedure.
- The arguments in the expression must be the same datatype and precision as those in the Stored Procedure transformation.
- Use PROC_RESULT to apply the output parameter of a stored procedure expression directly to a target. You cannot use a variable for the output parameter to pass the results directly to a target. Use a local variable to pass the results to an output port within the same transformation.
- Nested stored procedures allow passing the return value of one stored procedure as the input parameter of another stored procedure. For example, if you have the following two stored procedures:
- get_employee_id (employee_name)
- get_employee_salary (employee_id)
And the return value for get_employee_id is an employee ID number, the syntax for a nested stored procedure is:
:sp.get_employee_salary (:sp.get_employee_id (employee_name))
You can have multiple levels of nested stored procedures.
- Do not use single quotes around string parameters. If the input parameter does not contain spaces, do not use any quotes. If the input parameter contains spaces, use double quotes.

Tips for Stored Procedure Transformations
 Do not run unnecessary instances of stored procedures.
 Each time a stored procedure runs during a mapping, the session must wait for the stored procedure to complete in the database. You have two possible options to avoid this:
Reduce the row count - Use an active transformation prior to the Stored Procedure transformation to reduce the number of rows that must be passed to the stored procedure. Or, create an expression that tests the values before passing them to the stored procedure to make sure that the value does not really need to be passed.
Create an expression - Most of the logic used in stored procedures can be easily replicated using expressions in the Designer.

Troubleshooting Stored Procedures
The session did not have errors before, but now it fails on the stored procedure.
 The most common reason for problems with a Stored Procedure transformation results from changes made to the stored procedure in the database. If the input/output parameters or return value changes in a stored procedure, the Stored Procedure transformation becomes invalid. You must either import the stored procedure again, or manually configure the stored procedure to add, remove, or modify the appropriate ports.

The session has been invalidated since I last edited the mapping. Why?
 Any changes you make to the Stored Procedure transformation may invalidate the session. The most common reason is that you have changed the type of stored procedure, such as from a Normal to a Post-load Source type.

S. TRANSACTION CONTROL
 A transaction is the set of rows bound by commit or roll back rows. You can define a transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date.

 In PowerCenter, you define transaction control at the following levels:
Within a mapping - Within a mapping, you use the Transaction Control transformation to define a transaction. You define transactions using an expression in a Transaction Control transformation. Based on the return value of the expression, you can choose to commit, roll back, or continue without any transaction changes.
Within a session - When you configure a session, you configure it for user-defined commit. You can choose to commit or roll back a transaction if the Integration Service fails to transform or write any row to the target.

 If the mapping has a flat file target you can generate an output file each time the Integration Service starts a new transaction. You can dynamically name each target flat file.

Transaction Control Transformation Properties
 Use the Transaction Control transformation to define conditions to commit and roll back transactions from transactional targets.
 Transactional targets include relational, XML, and dynamic MQSeries targets
 The transaction control expression uses the IIF function to test each row against the condition.
Use the following syntax for the expression:
IIF (condition, value1, value2)
 The Integration Service evaluates the condition on a row-by-row basis.
 The return value determines whether the Integration Service commits, rolls back, or makes no transaction changes to the row.
 When the Integration Service issues a commit or roll back based on the return value of the expression, it begins a new transaction.
 Use the following built-in variables in the Expression Editor when you create a transaction control expression:
TC_CONTINUE_TRANSACTION - The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
TC_COMMIT_BEFORE - The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER - The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE - The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER - The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.

 If the transaction control expression evaluates to a value other than commit, roll back, or continue, the Integration Service fails the session.
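As an illustration (LAST_ROW_IN_GROUP is a hypothetical flag port computed upstream), the following transaction control expression commits after the last row of each group is written, so every group of rows becomes one transaction:
IIF (LAST_ROW_IN_GROUP = 1, TC_COMMIT_AFTER, TC_CONTINUE_TRANSACTION)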
Using Transaction Control Transformations in Mappings
 Transaction Control transformations are transaction generators.
 They define and redefine transaction boundaries in a mapping.
 They drop any incoming transaction boundary from an upstream active source or transaction generator, and they generate new transaction boundaries downstream.
 You can also use Custom transformations configured to generate transactions to define transaction boundaries.

 Transaction Control transformations can be effective or ineffective for the downstream transformations and targets in the mapping.
 The Transaction Control transformation becomes ineffective for downstream transformations or targets if you put a transformation that drops incoming transaction boundaries after it.
 This includes any of the following active sources or transformations:
- Aggregator with the All Input level transformation scope
- Joiner with the All Input level transformation scope
- Rank with the All Input level transformation scope
- Sorter with the All Input level transformation scope
- Custom with the All Input level transformation scope
- Custom transformation configured to generate transactions
- Transaction Control transformation
- A multiple input group transformation, such as a Custom transformation, connected to multiple upstream transaction control points

 Mappings with Transaction Control transformations that are ineffective for targets may be valid or invalid.
 When you save or validate the mapping, the Designer displays a message indicating which Transaction Control transformations are ineffective for targets.
 Although a Transaction Control transformation may be ineffective for a target, it can be effective for downstream transformations.
 Downstream transformations with the Transaction level transformation scope can use the transaction boundaries defined by an upstream Transaction Control transformation.
 The following figure shows a valid mapping with a Transaction Control transformation that is effective for a Sorter transformation, but ineffective for the target:
 In this example, TCT1 transformation is ineffective for the target, but effective for the Sorter transformation.
 The Sorter transformation's Transformation Scope property is Transaction. It uses the transaction boundaries defined by TCT1.
 The Aggregator Transformation Scope property is All Input.
 It drops transaction boundaries defined by TCT1.
 The TCT2 transformation is an effective Transaction Control transformation for the target.

Mapping Guidelines and Validation
- If the mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point.
- Transaction Control transformations connected to any target other than relational, XML, or dynamic MQSeries targets are ineffective for those targets.
- You must connect each target instance to a Transaction Control transformation. You can connect multiple targets to a single Transaction Control transformation.
- You can connect only one effective Transaction Control transformation to a target.
- You cannot place a Transaction Control transformation in a pipeline branch that starts with a Sequence Generator transformation.
- If you use a dynamic Lookup transformation and a Transaction Control transformation in the same mapping, a rolled-back transaction might result in unsynchronized target data.
- A Transaction Control transformation may be effective for one target and ineffective for another target. If each target is connected to an effective Transaction Control transformation, the mapping is valid.
- Either all targets or none of the targets in the mapping should be connected to an effective Transaction Control transformation.

T. UNION
 The Integration Service processes all input groups in parallel.
 It concurrently reads sources connected to the Union transformation and pushes blocks of data into the input groups of the transformation.
 You can connect heterogeneous sources to a Union transformation.
 The transformation merges sources with matching ports and outputs the data from one output group with the same ports as the input groups.
 The Union transformation is developed using the Custom transformation
 Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows

Rules and Guidelines for Union -
- You can create multiple input groups, but only one output group.
- All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
- The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation such as a Router or Filter transformation.
- You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
- The Union transformation does not generate transactions

 When a Union transformation in a mapping receives data from a single transaction generator, the Integration Service propagates transaction boundaries.
 When the transformation receives data from multiple transaction generators, the Integration Service drops all incoming transaction boundaries and outputs rows in an open transaction.

U. UPDATE STRATEGY
 In PowerCenter, you set the update strategy at two different levels:
Within a session - When you configure a session, you can instruct the Integration Service to either treat all rows in the same way (for example, treat all rows as inserts), or use instructions coded into the session mapping to flag rows for different database operations.
Within a mapping - Within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.
Note: You can also use the Custom transformation to flag rows for insert, delete, update, or reject

Flagging Rows Within a Mapping
 For the greatest degree of control over the update strategy, you add Update Strategy transformations to a mapping.
 The most important feature of this transformation is its update strategy expression, used to flag individual rows for insert, delete, update, or reject.
 The following table lists the constants for each database operation and their numeric equivalent:

Forwarding Rejected Rows
 You can configure the Update Strategy transformation to either pass rejected rows to the next transformation or drop them.
 By default, the Integration Service forwards rejected rows to the next transformation.
 The Integration Service flags the rows for reject and writes them to the session reject file.
 If you do not select Forward Rejected Rows, the Integration Service drops rejected rows and writes them to the session log file.
 If you enable row error handling, the Integration Service writes the rejected rows and the dropped rows to the row error logs.
 It does not generate a reject file. If you want to write the dropped rows to the session log in addition to the row error logs, you can enable verbose data tracing.

 Update strategy expression uses the IIF or DECODE function from the transformation language to test each row to see if it meets a particular condition
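A small sketch of an update strategy expression, assuming a hypothetical port ROW_EXISTS returned by an upstream Lookup transformation (NULL when no matching target row was found):
IIF (ISNULL(ROW_EXISTS), DD_INSERT, DD_UPDATE)
Rows with no match are flagged for insert and the rest for update. The constants DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT correspond to the numeric values 0, 1, 2, and 3, so the expression could return the numbers instead, but the named constants are easier to read.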
Aggregator and Update Strategy Transformations
 When you connect Aggregator and Update Strategy transformations as part of the same pipeline, you have the following options:
Position the Aggregator before the Update Strategy transformation -
 In this case, you perform the aggregate calculation, and then use the Update Strategy transformation to flag rows that contain the results of this calculation for insert, delete, or update.
Position the Aggregator after the Update Strategy transformation -
 Here, you flag rows for insert, delete, update, or reject before you perform the aggregate calculation.
 How you flag a particular row determines how the Aggregator transformation treats any values in that row used in the calculation.
 For example, if you flag a row for delete and then later use the row to calculate the sum, the Integration Service subtracts the value appearing in this row.
 If the row had been flagged for insert, the Integration Service would add its value to the sum.

Lookup and Update Strategy Transformations
 When you create a mapping with a Lookup transformation that uses a dynamic lookup cache, you must use Update Strategy transformations to flag the rows for the target tables.
 When you configure a session using Update Strategy transformations and a dynamic lookup cache, you must define certain session properties.
 You must define the Treat Source Rows As option as Data Driven.
 Specify this option on the Properties tab in the session properties.
 You must also define the following update strategy target table options:
- Select Insert
- Select Update as Update
- Do not select Delete
 These update strategy target table options ensure that the Integration Service updates rows marked for update and inserts rows marked for insert.
 If you do not choose Data Driven, the Integration Service flags all rows for the database operation you specify in the Treat Source Rows As option and does not use the Update Strategy transformations in the mapping to flag the rows.
 The Integration Service does not insert and update the correct rows.
 If you do not choose Update as Update, the Integration Service does not correctly update the rows flagged for update in the target table. As a result, the lookup cache and target table might become unsynchronized.

Only perform inserts into a target table.
 When you configure the session, select Insert for the Treat Source Rows As session property. Also, make sure that you select the Insert option for all target instances in the session.
Delete all rows in a target table.
 When you configure the session, select Delete for the Treat Source Rows As session property. Also, make sure that you select the Delete option for all target instances in the session.
Only perform updates on the contents of a target table.
 When you configure the session, select Update for the Treat Source Rows As session property. When you configure the update options for each target table instance, make sure you select the Update option for each target instance.
Perform different database operations with different rows destined for the same target table.
 Add an Update Strategy transformation to the mapping. When you write the transformation update strategy expression, use either the DECODE or IIF function to flag rows for different operations (insert, delete, update, or reject). When you configure a session that uses this mapping, select Data Driven for the Treat Source Rows As session property. Make sure that you select the Insert, Delete, or one of the Update options for each target table instance.
Reject data.
 Add an Update Strategy transformation to the mapping. When you write the transformation update strategy expression, use DECODE or IIF to specify the criteria for rejecting the row. When you configure a session that uses this mapping, select Data Driven for the Treat Source Rows As session property.

V. XML SOURCE QUALIFIER
 An XML Source Qualifier transformation always has one input or output port for every column in the XML source.
 When you create an XML Source Qualifier transformation for a source definition, the Designer links each port in the XML source definition to a port in the XML Source Qualifier transformation.
 You cannot remove or edit any of the links.
 If you remove an XML source definition from a mapping, the Designer also removes the corresponding XML Source Qualifier transformation.
 You can link one XML source definition to one XML Source Qualifier transformation
 You can link ports of one XML Source Qualifier group to ports of different transformations to form separate data flows.
 However, you cannot link ports from more than one group in an XML Source Qualifier transformation to ports in the same target transformation

W. XML PARSER
 Use an XML Parser transformation to extract XML inside a pipeline.
 The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases.
 The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline.
 For example, you might want to extract XML data from a TIBCO source and pass the data to relational targets.
 The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

X. XML GENERATOR
 Use an XML Generator transformation to create XML inside a pipeline.
 The XML Generator transformation lets you read data from messaging systems, such as TIBCO and MQ Series, or from other sources, such as files or databases.
 The XML Generator transformation functionality is similar to the XML target functionality, except it generates the XML in the pipeline.
 For example, you might want to extract data from relational sources and pass XML data to targets.
 The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.

7. TRANSFORMATION LANGUAGE REFERENCE

Rules and Guidelines for Expression Syntax
Use the following rules and guidelines when you write expressions:
- You cannot include both single-level and nested aggregate functions in an Aggregator transformation.
- If you need to create both single-level and nested functions, create separate Aggregator transformations.
- You cannot use strings in numeric expressions. For example, the expression 1 + '1' is not valid because you can only perform addition on numeric datatypes. You cannot add an integer and a string.
- You cannot use strings as numeric parameters. For example, the expression SUBSTR (TEXT_VAL, '1', 10) is not valid because the SUBSTR function requires an integer value, not a string, as the start position.
- You cannot mix datatypes when using comparison operators. For example, the expression 123.4 = '123.4' is not valid because it compares a decimal value with a string.
- You can pass a value from a port, literal string or number, variable, Lookup transformation, Stored Procedure transformation, External Procedure transformation, or the results of another expression.
- Use the ports tab in the Expression Editor to enter a port name into an expression. If you rename a port in a connected transformation, the Designer propagates the name change to expressions in the transformation.
- Separate each argument in a function with a comma.
- Except for literals, the transformation language is not case sensitive.
- Except for literals, the Designer and PowerCenter Integration Service ignore spaces.
- The colon (:), comma (,), and period (.) have special meaning and should be used only to specify syntax.
- The PowerCenter Integration Service treats a dash (-) as a minus operator.
- If you pass a literal value to a function, enclose literal strings within single quotation marks. Do not use quotation marks for literal numbers. The PowerCenter Integration Service treats any string value enclosed in single quotation marks as a character string.
- When you pass a mapping parameter or variable or a workflow variable to a function within an expression, do not use quotation marks to designate mapping parameters or variables or workflow variables.
- Do not use quotation marks to designate ports.
- You can nest multiple functions within an expression except aggregate functions, which allow only one nested aggregate function. The PowerCenter Integration Service evaluates the expression starting with the innermost function.

Reserved Words
Some keywords in the transformation language, such as constants, operators, and built-in variables, are reserved for specific functions. These include:
- :EXT
- :INFA
- :LKP
- :MCR  Use the Treat Null In Comparison Operators As property to
- :SD configure how the PowerCenter Integration Service handles null
- :SEQ values in comparison expressions.
- :SP
- :TD This PowerCenter Integration Service configuration property
- AND affects the behavior of the following comparison operators in
- DD_DELETE expressions:
- DD_INSERT =, !=, ^=, <>, >, >=, <, <=
- DD_REJECT
- DD_UPDATE For example, consider the following expressions:
- FALSE NULL > 1
- NOT NULL = NULL
- NULL
- OR
- PROC_RESULT
- SESSSTARTTIME
- SPOUTPUT
- SYSDATE
- TRUE
- WORKFLOWSTARTTIME

The following words are reserved for workflow expressions:


- ABORTED
- DISABLED
- FAILED
Transaction Control Variables
- NOTSTARTED
 The following example uses transaction control variables to
- STARTED
determine where to process a row:
- STOPPED IIF (NEWTRAN=1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
- SUCCEEDED  If NEWTRAN=1, the TC_COMMIT_BEFORE variable causes a
commit to occur before the current row processes.
Note: You cannot use a reserved word to name a port or local  Otherwise, the TC_CONTINUE_TRANSACTION variable forces the
variable. You can only use reserved words within transformation row to process in the current transaction.
and workflow expressions. Reserved words have predefined
meanings in expressions  Use the following variables in the Expression Editor when you
create a transaction control expression:
Working with Null Values in Boolean Expressions TC_CONTINUE_TRANSACTION - The PowerCenter Integration
 Expressions that combine a null value with a Boolean expression Service does not perform any transaction change for the current
produce results that are ANSI compliant. row. This is the default transaction control variable value.
 For example, the PowerCenter Integration Service produces the TC_COMMIT_BEFORE - The PowerCenter Integration Service
following results: commits the transaction, begins a new transaction, and writes
- NULL AND TRUE = NULL the current row to the target. The current row is in the new
- NULL AND FALSE = FALSE transaction.
TC_COMMIT_AFTER - The PowerCenter Integration Service
Working with Null Values in Comparison Expressions writes the current row to the target, commits the transaction,
 When you use a null value in an expression containing a and begins a new transaction. The current row is in the
comparison operator, the PowerCenter Integration Service committed transaction.
produces a null value. TC_ROLLBACK_BEFORE - The PowerCenter Integration Service
 However, you can also configure the PowerCenter Integration rolls back the current transaction, begin a new transaction, and
Service to treat null values as high or low in comparison write the current row to the target. The current row is in the new
operations. transaction.

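 A minimal sketch of the complementary pattern using TC_COMMIT_AFTER, assuming a hypothetical flag port END_OF_GROUP that is 1 on the last row of each group: the expression writes that last row and then commits, so every group ends on a transaction boundary.
IIF( END_OF_GROUP = 1, TC_COMMIT_AFTER, TC_CONTINUE_TRANSACTION )
 Rows with END_OF_GROUP = 0 stay in the current transaction, exactly as described for TC_CONTINUE_TRANSACTION above.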
DATES
 Date functions accept datetime values only. To pass a string to a date function, first use TO_DATE to convert it to a datetime value.
 For example, the following expression converts a string port to datetime values and then adds one month to each date:
ADD_TO_DATE( TO_DATE( STRING_PORT, 'MM/DD/RR'), 'MM', 1 )
 You can use dates between 1 A.D. and 9999 A.D. in the Gregorian calendar system.

Julian Day, Modified Julian Day, and the Gregorian calendar
 You can use dates in the Gregorian calendar system only.
 Dates in the Julian calendar are called Julian dates and are not supported in Informatica.
 This term should not be confused with Julian Day or with Modified Julian Day.

 You can manipulate Modified Julian Day (MJD) formats using the J format string.
 The MJD for a given date is the number of days to that date since Jan 1 4713 B.C. 00:00:00 (midnight).
 By definition, MJD includes a time component expressed as a decimal, which represents some fraction of 24 hours.
 The J format string does not convert this time component.

 For example, the following TO_DATE expression converts strings in the SHIP_DATE_MJD_STRING port to date values in the default date format:
TO_DATE (SHIP_DATE_MJD_STR, 'J')

SHIP_DATE_MJD_STR RETURN_VALUE
2451544 Dec 31 1999 00:00:00.000000000
2415021 Jan 1 1900 00:00:00.000000000

 Because the J format string does not include the time portion of a date, the return values have the time set to 00:00:00.000000000.
 You can also use the J format string in TO_CHAR expressions. For example, use the J format string in a TO_CHAR expression to convert date values to MJD values expressed as strings. For example: TO_CHAR(SHIP_DATE, 'J')

SHIP_DATE RETURN_VALUE
Dec 31 1999 23:59:59 2451544
Jan 1 1900 01:02:03 2415021

RR FORMAT STRING
 The transformation language provides the RR format string to convert strings with two-digit years to dates.
 Using TO_DATE and the RR format string, you can convert a string in the format MM/DD/RR to a date.
 The RR format string converts data differently depending on the current year.
 Current Year Between 0 and 49 - If the current year is between 0 and 49 (such as 2003) and the source string year is between 0 and 49, the PowerCenter Integration Service returns the current century plus the two-digit year from the source string. If the source string year is between 50 and 99, the Integration Service returns the previous century plus the two-digit year from the source string.
 Current Year Between 50 and 99 - If the current year is between 50 and 99 (such as 1998) and the source string year is between 0 and 49, the PowerCenter Integration Service returns the next century plus the two-digit year from the source string. If the source string year is between 50 and 99, the PowerCenter Integration Service returns the current century plus the specified two-digit year.

 The following table summarizes how the RR format string converts to dates:

Example
The following expression produces the same return values for any current year between 1950 and 2049:
TO_DATE( ORDER_DATE, 'MM/DD/RR' )

ORDER_DATE RETURN_VALUE
'04/12/98' 04/12/1998 00:00:00.000000000
'11/09/01' 11/09/2001 00:00:00.000000000

DIFFERENCE BETWEEN THE YY AND RR FORMAT STRINGS
 PowerCenter also provides a YY format string.
 Both the RR and YY format strings specify two-digit years.
 The YY and RR format strings produce identical results when used with all date functions except TO_DATE.
 In TO_DATE expressions, RR and YY produce different results.

The following table shows the different results each format string returns:

 For dates in the year 2000 and beyond, the YY format string produces less meaningful results than the RR format string. Use the RR format string for dates in the twenty-first century.
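 A hedged illustration of the TO_DATE difference, since the comparison table itself is not reproduced in these notes. Assuming a session that runs in the year 2003, the two format strings resolve the same two-digit source years differently:
TO_DATE( '04/12/98', 'MM/DD/RR' )  -- returns 04/12/1998 (98 falls in 50-99, so RR uses the previous century)
TO_DATE( '04/12/98', 'MM/DD/YY' )  -- returns 04/12/2098 (YY uses the current century)
TO_DATE( '11/09/01', 'MM/DD/RR' )  -- returns 11/09/2001
TO_DATE( '11/09/01', 'MM/DD/YY' )  -- returns 11/09/2001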
Default Date Format
 By default, the date format is MM/DD/YYYY HH24:MI:SS.US.
Note: The format string is not case sensitive. It must always be
enclosed within single quotation marks.

 The following table describes date functions that use date format strings to evaluate input dates:

TO_CHAR Format Strings
 TO_CHAR is generally used when the target is a flat file or a database that does not support a Date/Time datatype.
 You can convert the entire date or a part of the date to a string.
 The following table summarizes the format strings for dates in the function TO_CHAR:

TO_DATE and IS_DATE Format Strings
 The TO_DATE function converts a string with the format you
specify to a datetime value.
 TO_DATE is generally used to convert strings from flat files to
datetime values.
Note: TO_DATE and IS_DATE use the same set of format strings.
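 A small usage sketch, with SHIP_DATE_STR as an assumed string port: IS_DATE can guard a TO_DATE call so that strings that do not match the format become NULL instead of causing the row to be skipped.
IIF( IS_DATE( SHIP_DATE_STR, 'MM/DD/YYYY' ), TO_DATE( SHIP_DATE_STR, 'MM/DD/YYYY' ), NULL )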
 The source string format and the format string must match, including any date separator.
 If any part does not match, the PowerCenter Integration Service does not convert the string, and it skips the row.
 If you omit the format string, the source string must be in the date format specified in the session.
 The following table summarizes the format strings for the functions TO_DATE and IS_DATE:
<Same as the TO_CHAR formats>

RULES AND GUIDELINES FOR DATE FORMAT STRINGS
 The format of the TO_DATE string must match the format string, including any date separators. If it does not, the PowerCenter Integration Service might return inaccurate values or skip the row.
 The format string must be enclosed within single quotation marks.

ABORT
 When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row.
 It processes any rows read before the session aborts and loads them based on the source- or target-based commit interval and the buffer block size defined for the session.
 The PowerCenter Integration Service writes to the target up to the aborted row and then rolls back all uncommitted data to the last commit point.
 You can perform recovery on the session after rollback.
 If you use ABORT in an expression for an unconnected port, the PowerCenter Integration Service does not run the ABORT function.

Syntax
ABORT( string )
Return Value
NULL.

ABS
Returns the absolute value of a numeric value.

Syntax
ABS( numeric_value )

Return Value
Positive numeric value

ADD_TO_DATE
 Adds a specified amount to one part of a datetime value, and returns a date in the same format as the date you pass to the function.
 ADD_TO_DATE accepts positive and negative integer values. Use ADD_TO_DATE to change the following parts of a date:

Year - Enter a positive or negative integer in the amount argument. Use any of the year format strings: Y, YY, YYY, or YYYY. The following expression adds 10 years to all dates in the SHIP_DATE port: ADD_TO_DATE ( SHIP_DATE, 'YY', 10 )
Month - Enter a positive or negative integer in the amount argument. Use any of the month format strings: MM, MON, MONTH. The following expression subtracts 10 months from each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'MONTH', -10 )
Day - Enter a positive or negative integer in the amount argument. Use any of the day format strings: D, DD, DDD, DY, and DAY. The following expression adds 10 days to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'DD', 10 )
Hour - Enter a positive or negative integer in the amount argument. Use any of the hour format strings: HH, HH12, HH24. The following expression adds 14 hours to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'HH', 14 )
Minute - Enter a positive or negative integer in the amount argument. Use the MI format string to set the minute. The following expression adds 25 minutes to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'MI', 25 )
Seconds - Enter a positive or negative integer in the amount argument. Use the SS format string to set the second. The following expression adds 59 seconds to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'SS', 59 )
Milliseconds - Enter a positive or negative integer in the amount argument. Use the MS format string to set the milliseconds. The following expression adds 125 milliseconds to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'MS', 125 )
Microseconds - Enter a positive or negative integer in the amount argument. Use the US format string to set the microseconds. The following expression adds 2,000 microseconds to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'US', 2000 )
Nanoseconds - Enter a positive or negative integer in the amount argument. Use the NS format string to set the nanoseconds. The following expression adds 3,000,000 nanoseconds to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'NS', 3000000 )

 If you pass a value that creates a day that does not exist in a particular month, the PowerCenter Integration Service returns the last day of the month.
 For example, if you add one month to Jan 31 1998, the PowerCenter Integration Service returns Feb 28 1998.

ASCII
 The ASCII function returns the numeric ASCII value of the first character of the string passed to the function.
 You can pass a string of any size to ASCII, but it evaluates only the first character in the string.
 This function is identical in behavior to the CHRCODE function.
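 A minimal sketch of the behavior described above, with FIRST_NAME as an assumed string port: both functions evaluate only the first character, so
CHRCODE( FIRST_NAME )
returns 65 for a value such as 'Adam', exactly as ASCII( 'Adam' ) does.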
 If you use ASCII in existing expressions, they will still work correctly. However, when you create new expressions, use the CHRCODE function instead of the ASCII function.

AVG
 Returns the average of all values in a group of rows.
 Optionally, you can apply a filter to limit the rows you read to calculate the average.
 You can nest only one other aggregate function within AVG, and the nested function must return a Numeric datatype.

Syntax
AVG( numeric_value [, filter_condition ] )

 If a value is NULL, AVG ignores the row. However, if all values passed from the port are NULL, AVG returns NULL.
 AVG groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is not a group by port, AVG treats all rows as one group, returning one value.

Example:
 The following expression returns the average wholesale cost of flashlights: AVG( WHOLESALE_COST, ITEM_NAME='Flashlight' )
 You can perform arithmetic on the values passed to AVG before the function calculates the average. For example: AVG( QTY * PRICE - DISCOUNT )

CEIL
 Returns the smallest integer greater than or equal to the numeric value passed to this function.
 For example, if you pass 3.14 to CEIL, the function returns 4. If you pass 3.98 to CEIL, the function returns 4.
Likewise, if you pass -3.17 to CEIL, the function returns -3.

Syntax
CEIL( numeric_value )

CHOOSE
 Chooses a string from a list of strings based on a given position.
 You specify the position and the value.
 If the value matches the position, the PowerCenter Integration Service returns the value.

Syntax
CHOOSE( index, string1 [, string2, ..., stringN] )

 The following expression returns the string ‘flashlight’ based on an index value of 2: CHOOSE( 2, 'knife', 'flashlight', 'diving hood' )
 The following expression returns NULL based on an index value of 4: CHOOSE( 4, 'knife', 'flashlight', 'diving hood' )

CHR
 ASCII Mode - CHR returns the ASCII character corresponding to the numeric value you pass to this function.
 Unicode Mode - returns the Unicode character corresponding to the numeric value you pass to this function.
 ASCII values fall in the range 0 to 255.
 You can pass any integer to CHR, but only ASCII codes 32 to 126 are printable characters.

Syntax
CHR( numeric_value )

 Use the CHR function to concatenate a single quote onto a string.
 The single quote is the only character that you cannot use inside a string literal.
 Consider the following example: 'Joan' || CHR(39) || 's car'
The return value is: Joan's car

CHRCODE
 ASCII Mode - CHRCODE returns the numeric ASCII value of the first character of the string passed to the function.
 UNICODE Mode - returns the numeric Unicode value of the first character of the string passed to the function.

COMPRESS
 Compresses data using the zlib 1.2.1 compression algorithm.
 Use the COMPRESS function before you send large amounts of data over a wide area network.

Syntax
COMPRESS( value )

Return Value
 Compressed binary value of the input value.
 NULL if the input is a null value.

CONCAT
Syntax
CONCAT( first_string, second_string )

Return Value
String.

 If one of the strings is NULL, CONCAT ignores it and returns the other string.
 If both strings are NULL, CONCAT returns NULL.
 CONCAT does not add spaces to separate strings.
 If you want to add a space between two strings, you can write an expression with two nested CONCAT functions.

CONVERT_BASE
 Converts a number from one base value to another base value.

Syntax
CONVERT_BASE( value, source_base, dest_base )

 The following example converts 2222 from the decimal base value 10 to the binary base value 2: CONVERT_BASE( "2222", 10, 2 )
 The PowerCenter Integration Service returns 100010101110

COUNT
 Returns the number of rows that have non-null values in a group.
 Optionally, you can include the asterisk (*) argument to count all input values in a transformation.
 You can nest only one other aggregate function within COUNT.
 You can apply a condition to filter rows before counting them.

Syntax
COUNT( value [, filter_condition] )
COUNT( * [, filter_condition] )

 COUNT groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is no group by port, COUNT treats all rows as one group, returning one value.

CRC32
 Returns a 32-bit Cyclic Redundancy Check (CRC32) value.
 Use CRC32 to find data transmission errors.
 You can also use CRC32 if you want to verify that data stored in a file has not been modified.
 If you use CRC32 to perform a redundancy check on data in ASCII mode and Unicode mode, the PowerCenter Integration Service may generate different results on the same input value.
Note: CRC32 can return the same output for different input strings. If you want to generate keys in a mapping, use a Sequence Generator transformation. If you use CRC32 to generate keys in a mapping, you may receive unexpected results.

Syntax
CRC32( value )

CUME
 Returns a running total. A running total means CUME returns a total each time it adds a value.
 You can add a condition to filter rows out of the row set before calculating the running total.
 Use CUME and similar functions (such as MOVINGAVG and MOVINGSUM) to simplify reporting by calculating running values.

Syntax
CUME( numeric_value [, filter_condition] )

DATE_COMPARE
 Returns an integer indicating which of two dates is earlier. DATE_COMPARE returns an integer value rather than a date value.

Return Value
 -1 if the first date is earlier.
 0 if the two dates are equal.
 1 if the second date is earlier.
 NULL if one of the date values is NULL.

DATE_DIFF
 Returns the length of time between two dates. You can request the format to be years, months, days, hours, minutes, seconds, milliseconds, microseconds, or nanoseconds.
 The PowerCenter Integration Service subtracts the second date from the first date and returns the difference.

Syntax
DATE_DIFF( date1, date2, format )

Return Value
 Double value. If date1 is later than date2, the return value is a positive number. If date1 is earlier than date2, the return value is a negative number.
 0 if the dates are the same.
 NULL if one (or both) of the date values is NULL.

DECODE
Examples
 You might use DECODE in an expression that searches for a particular ITEM_ID and returns the ITEM_NAME:
DECODE( ITEM_ID, 10, 'Flashlight',
14, 'Regulator',
20, 'Knife',
40, 'Tank',
'NONE' )

ITEM_ID RETURN VALUE
10 Flashlight
14 Regulator
17 NONE
20 Knife
25 NONE
NULL NONE
40 Tank

 DECODE returns the default value of NONE for items 17 and 25 because the search values did not match the ITEM_ID.
 Also, DECODE returns NONE for the NULL ITEM_ID.

 The following expression tests multiple columns and conditions, evaluated in a top to bottom order for TRUE or FALSE:
DECODE( TRUE,
Var1 = 22, 'Variable 1 was 22!',
Var2 = 49, 'Variable 2 was 49!',
Var1 < 23, 'Variable 1 was less than 23.',
Var2 > 30, 'Variable 2 was more than 30.',
'Variables were out of desired ranges.')

Var1 Var2 RETURN VALUE
21 47 Variable 1 was less than 23.
22 49 Variable 1 was 22!
23 49 Variable 2 was 49!
24 27 Variables were out of desired ranges.
25 50 Variable 2 was more than 30.

ERROR
 Causes the PowerCenter Integration Service to skip a row and issue an error message, which you define.
 The error message displays in the session log.
 The PowerCenter Integration Service does not write these skipped rows to the session reject file.

 For example, you use the ERROR function in an expression, and you assign the default value, ‘1234’, to the output port.
 Each time the PowerCenter Integration Service encounters the ERROR function in the expression, it overrides the error with the value ‘1234’ and passes ‘1234’ to the next transformation.
 It does not skip the row, and it does not log an error in the session log.

IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )

SALARY RETURN VALUE
10000 10000
-15000 'Error. Negative salary found. Row skipped.'
NULL NULL
150000 150000

EXP
 Returns e raised to the specified power (exponent), where e=2.71828183.
 For example, EXP(2) returns 7.38905609893065.
 You might use this function to analyze scientific and technical data rather than business data.
 EXP is the reciprocal of the LN function, which returns the natural logarithm of a numeric value.

Syntax
EXP( exponent )

Return Value
 Double value.
 NULL if a value passed as an argument to the function is NULL.

FIRST
 Returns the first value found within a port or group.
 Optionally, you can apply a filter to limit the rows the PowerCenter Integration Service reads.
 You can nest only one other aggregate function within FIRST.

Syntax
FIRST( value [, filter_condition ] )

Return Value
First value in a group.

 If a value is NULL, FIRST ignores the row. However, if all values passed from the port are NULL, FIRST returns NULL.
 FIRST groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is no group by port, FIRST treats all rows as one group, returning one value.

FLOOR
 Returns the largest integer less than or equal to the numeric value you pass to this function.
 For example, if you pass 3.14 to FLOOR, the function returns 3.
 If you pass 3.98 to FLOOR, the function returns 3.
 Likewise, if you pass -3.17 to FLOOR, the function returns -4.

Syntax
FLOOR( numeric_value )

Return Value
 Integer if you pass a numeric value with declared precision between 0 and 28.
 Double if you pass a numeric value with declared precision greater than 28.
 NULL if a value passed to the function is NULL.
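 A small usage sketch building on CEIL and FLOOR, with PRICE as an assumed numeric port: dividing before the call and multiplying after it rounds to a coarser unit.
FLOOR( PRICE / 10 ) * 10   -- rounds 37 down to 30
CEIL( PRICE / 10 ) * 10    -- rounds 37 up to 40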
FV
 Returns the future value of an investment, where you make periodic, constant payments and the investment earns a constant interest rate.

Syntax
FV( rate, terms, payment [, present value, type] )

Example
 You deposit $2,000 into an account that earns 9% annual interest compounded monthly (monthly interest of 9%/12, or 0.75%).
 You plan to deposit $250 at the beginning of every month for the next 12 months.
 The following expression returns $5,337.96 as the account balance at the end of 12 months:
FV(0.0075, 12, -250, -2000, TRUE)

GET_DATE_PART
 Returns the specified part of a date as an integer value.
 Therefore, if you create an expression that returns the month portion of the date, and pass a date such as Apr 1 1997 00:00:00, GET_DATE_PART returns 4.

Syntax
GET_DATE_PART( date, format )

Return Value
 Integer representing the specified part of the date.
 NULL if a value passed to the function is NULL.

 The following expressions return the day for each date in the DATE_SHIPPED port:
GET_DATE_PART( DATE_SHIPPED, 'D' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'DDD' )
GET_DATE_PART( DATE_SHIPPED, 'DY' )
GET_DATE_PART( DATE_SHIPPED, 'DAY' )

DATE_SHIPPED RETURN VALUE
Mar 13 1997 12:00:00 AM 13
June 3 1997 11:30:44PM 3
Aug 22 1997 12:00:00PM 22
NULL NULL

GREATEST
 Returns the greatest value from a list of input values.
 Use this function to return the greatest string, date, or number.
 By default, the match is case sensitive.

Return Value
 Value1 if it is the greatest of the input values, value2 if it is the greatest of the input values, and so on.
 NULL if any of the arguments is null.

IIF
 Returns one of two values you specify, based on the results of a condition.

Syntax
IIF( condition, value1 [,value2] )

 Unlike conditional functions in some systems, the FALSE (value2) condition in the IIF function is not required.
 If you omit value2, the function returns the following when the condition is FALSE:
- 0 if value1 is a Numeric datatype.
- Empty string if value1 is a String datatype.
- NULL if value1 is a Date/Time datatype.

 For example, the following expression does not include a FALSE condition and value1 is a string datatype so the PowerCenter Integration Service returns an empty string for each row that evaluates to FALSE: IIF( SALES > 100, EMP_NAME )

SALES EMP_NAME RETURN VALUE
150 John Smith John Smith
50 Pierre Bleu '' (empty string)
120 Sally Green Sally Green
NULL Greg Jones '' (empty string)

 When you use IIF, the datatype of the return value is the same as the datatype of the result with the greatest precision.
 For example, you have the following expression:
IIF( SALES < 100, 1, .3333 )
 The TRUE result (1) is an integer and the FALSE result (.3333) is a decimal.
 The Decimal datatype has greater precision than Integer, so the datatype of the return value is always a Decimal.

 You can often use a Filter transformation instead of IIF to maximize session performance.

IN
 Matches input data to a list of values. By default, the match is case sensitive.

Syntax
IN( valueToSearch, value1, [value2, ..., valueN,] CaseFlag )
Example
 The following expression determines if the input value is a safety knife, chisel point knife, or medium titanium knife.
 The input values do not have to match the case of the values in the comma-separated list:
IN( ITEM_NAME, ‘Chisel Point Knife’, ‘Medium Titanium Knife’, ‘Safety Knife’, 0 )

ITEM_NAME RETURN VALUE
Stabilizing Vest 0 (FALSE)
Safety knife 1 (TRUE)
Medium Titanium knife 1 (TRUE)
NULL

INDEXOF
 Finds the index of a value among a list of values. By default, the match is case sensitive.

Syntax
INDEXOF( valueToSearch, string1, [string2, ..., stringN,] CaseFlag )

Example
 The following expression determines if values from the ITEM_NAME port match the first, second, or third string:
INDEXOF( ITEM_NAME, ‘diving hood’, ‘flashlight’, ‘safety knife’)

ITEM_NAME RETURN VALUE
Safety Knife 0
diving hood 1
Compass 0
safety knife 3
flashlight 2

 Safety Knife returns a value of 0 because it does not match the case of the input value.

INITCAP
 Capitalizes the first letter in each word of a string and converts all other letters to lowercase.
 Words are delimited by white space (a blank space, formfeed, newline, carriage return, tab, or vertical tab) and characters that are not alphanumeric.
 For example, if you pass the string ‘…THOMAS’, the function returns Thomas.

Syntax
INITCAP( string )

Example
 The following expression capitalizes all names in the FIRST_NAME port: INITCAP( FIRST_NAME )

FIRST_NAME RETURN VALUE
ramona Ramona
18-albert 18-Albert
NULL NULL
?!SAM ?!Sam
THOMAS Thomas
PierRe Pierre

INSTR
 Returns the position of a character set in a string, counting from left to right.

Syntax
INSTR( string, search_value [,start [,occurrence [,comparison_type ]]] )

Return Value
 Integer if the search is successful. Integer represents the position of the first character in the search_value, counting from left to right.
 0 if the search is unsuccessful.
 NULL if a value passed to the function is NULL.

 The following expression returns the position of the second occurrence of the letter ‘a’, starting at the beginning of each company name.
 Because the search_value argument is case sensitive, it skips the ‘A’ in ‘Blue Fin Aqua Center’, and returns 0:
INSTR( COMPANY, 'a', 1, 2 )

COMPANY RETURN VALUE
Blue Fin Aqua Center 0
Maco Shark Shop 8
Scuba Gear 9
Frank's Dive Shop 0
VIP Diving Club 0

 The following expression returns the position of the first character in the string ‘Blue Fin Aqua Center’ (starting from the last character in the company name):
INSTR( COMPANY, 'Blue Fin Aqua Center', -1, 1 )

COMPANY RETURN VALUE
Blue Fin Aqua Center 1
Maco Shark Shop 0
Scuba Gear 0
Frank's Dive Shop 0
VIP Diving Club 0

 You can nest the INSTR function within other functions to accomplish more complex tasks.
 The following expression evaluates a string, starting from the end of the string.
 The expression finds the last (rightmost) space in the string and then returns all characters to the left of it:
SUBSTR( CUST_NAME,1,INSTR( CUST_NAME,' ' ,-1,1 ))

CUST_NAME RETURN VALUE
PATRICIA JONES PATRICIA
MARY ELLEN SHAH MARY ELLEN

 The following expression removes the character '#' from a string:
SUBSTR( CUST_ID, 1, INSTR(CUST_ID, '#')-1 ) || SUBSTR( CUST_ID, INSTR(CUST_ID, '#')+1 )

CUST_ID RETURN VALUE
ID#33 ID33
#A3577 A3577
SS #712403399 SS 712403399

ISNULL
 Returns whether a value is NULL. ISNULL evaluates an empty string as FALSE.
Note: To test for empty strings, use LENGTH.

Syntax
ISNULL( value )

Example
 The following example checks for null values in the items table:
ISNULL( ITEM_NAME )

ITEM_NAME RETURN VALUE
Flashlight 0 (FALSE)
NULL 1 (TRUE)
Regulator system 0 (FALSE)
'' 0 (FALSE) Empty string is not NULL

IS_NUMBER
 Returns whether a string is a valid number. A valid number consists of the following parts:
- Optional space before the number
- Optional sign (+/-)
- One or more digits with an optional decimal point
- Optional scientific notation, such as the letter ‘e’ or ‘E’ (and the letter ‘d’ or ‘D’ on Windows) followed by an optional sign (+/-), followed by one or more digits
- Optional white space following the number

 The following numbers are all valid:
' 100 '
' +100'
'-100'
'-3.45e+32'
'+3.45E-32'
'+3.45d+32' (Windows only)
'+3.45D-32' (Windows only)
'.6804'

 The output port for an IS_NUMBER expression must be a String or Numeric datatype.

ITEM_PRICE RETURN VALUE
'123.00' 1 (True)
'-3.45e+3' 1 (True)
'-3.45D-3' 1 (True - Windows only)
'-3.45d-3' 0 (False - UNIX only)
'3.45E-' 0 (False) Incomplete number
'' 0 (False) Consists entirely of blanks
'' 0 (False) Empty string
'+123abc' 0 (False)
' 123' 1 (True) Leading white blanks
'123 ' 1 (True) Trailing white blanks
'ABC' 0 (False)
'-ABC' 0 (False)
NULL NULL

IS_SPACES
 Returns whether a string value consists entirely of spaces.
 A space is a blank space, a formfeed, a newline, a carriage return, a tab, or a vertical tab.
 IS_SPACES evaluates an empty string as FALSE because there are no spaces. To test for an empty string, use LENGTH.

Example
 The following expression checks the ITEM_NAME port for rows that consist entirely of spaces:
IS_SPACES( ITEM_NAME )

ITEM_NAME RETURN VALUE
Flashlight 0 (False)
 1 (True)
Regulator system 0 (False)
NULL NULL
'' 0 (FALSE) (Empty string does not contain spaces.)
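 A common cleanup sketch that combines the checks above (CUST_NAME is an assumed string port): replace values that consist entirely of spaces with NULL before loading the target.
IIF( IS_SPACES( CUST_NAME ), NULL, CUST_NAME )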
LAST
 Returns the last row in the selected port.
 Optionally, you can apply a filter to limit the rows the PowerCenter Integration Service reads.
 You can nest only one other aggregate function within LAST.

Syntax
LAST( value [, filter_condition ] )

Return Value
 Last row in a port.
 NULL if all values passed to the function are NULL, or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).

LAST_DAY
 Returns the date of the last day of the month for each date in a port.

Syntax
LAST_DAY( date )

Return Value
 Date. The last day of the month for that date value you pass to this function.

 If a value is NULL, LAST_DAY ignores the row. However, if all values passed from the port are NULL, LAST_DAY returns NULL.
 LAST_DAY groups values based on group by ports you define in the transformation, returning one result for each group. If there is no group by port, LAST_DAY treats all rows as one group, returning one value.

Examples
The following expression returns the last day of the month for each date in the ORDER_DATE port:
LAST_DAY( ORDER_DATE )

ORDER_DATE RETURN VALUE
Apr 1 1998 12:00:00AM Apr 30 1998 12:00:00AM
Jan 6 1998 12:00:00AM Jan 31 1998 12:00:00AM
Feb 2 1996 12:00:00AM Feb 29 1996 12:00:00AM (Leap year)
NULL NULL
Jul 31 1998 12:00:00AM Jul 31 1998 12:00:00AM

LEAST
 Returns the smallest value from a list of input values.
 By default, the match is case sensitive.

Syntax
LEAST( value1, [value2, ..., valueN,] CaseFlag )

Example
The following expression returns the smallest quantity of items ordered:
LEAST( QUANTITY1, QUANTITY2, QUANTITY3 )

QUANTITY1 QUANTITY2 QUANTITY3 RETURN VALUE
150 756 27 27
NULL
5000 97 17 17
120 1724 965 120

LENGTH
 Returns the number of characters in a string, including trailing blanks.

Return Value
 Integer representing the length of the string.
 NULL if a value passed to the function is NULL.

LN
 Returns the natural logarithm of a numeric value. For example, LN(3) returns 1.098612.
 You usually use this function to analyze scientific data rather than business data.
 This function is the reciprocal of the function EXP.

Syntax
LN( numeric_value )

Return Value
 Double value.
 NULL if a value passed to the function is NULL.

LOG
 Returns the logarithm of a numeric value.
 Most often, you use this function to analyze scientific data rather than business data.

Syntax
LOG( base, exponent )

Return Value
 Double value.
 NULL if a value passed to the function is NULL.

Example
 The following expression returns the logarithm for all values in the NUMBERS port:
LOG( BASE, EXPONENT )

BASE EXPONENT RETURN VALUE
15 1 0
.09 10 -0.956244644696599
NULL 18 NULL
35.78 NULL NULL
-9 18 Error. (PowerCenter Integration Service does not write the row.)
0 5 Error. (PowerCenter Integration Service does not write the row.)
10 -2 Error. (PowerCenter Integration Service does not write the row.)
 The PowerCenter Integration Service displays an error and does not write the row if you pass a negative number, 0, or 1 as a base value, or if you pass a negative value for the exponent.

LOOKUP
Note: This function is not supported in mapplets.
 Use the Lookup transformation rather than the LOOKUP function to look up values in PowerCenter mappings.
 If you use the LOOKUP function in a mapping, you need to enable the lookup caching option for 3.5 compatibility in the session properties.
 This option exists expressly for PowerMart 3.5 users who want to continue using the LOOKUP function, rather than creating a Lookup transformation.

Syntax
LOOKUP( result, search1, value1 [, search2, value2]... )

Return Value
 Result if all searches find matching values. If the PowerCenter Integration Service finds matching values, it returns the result from the same row as the search1 argument.
 NULL if the search does not find any matching values.
 Error if the search finds more than one matching value.

Example
 The following expression searches the lookup source :TD.SALES for a specific item ID and price, and returns the item name if both searches find a match:
LOOKUP( :TD.SALES.ITEM_NAME, :TD.SALES.ITEM_ID, 10, :TD.SALES.PRICE, 15.99 )

ITEM_NAME ITEM_ID PRICE
Regulator 5 100.00
Flashlight 10 15.99
Halogen Flashlight 15 15.99
NULL 20 15.99
RETURN VALUE: Flashlight

 When you compare char and varchar values, the LOOKUP function returns a result only if the two rows match.
 This means that both the value and the length for each row must match (use trim).

LOWER
 Converts uppercase string characters to lowercase.

Return Value
 Lowercase character string. If the data contains multibyte characters, the return value depends on the code page and data movement mode of the Integration Service.
 NULL if a value in the selected port is NULL.

LPAD
 Adds a set of blanks or characters to the beginning of a string to set the string to a specified length.

Syntax
LPAD( first_string, length [,second_string] )

Return Value
 String of the specified length.
 NULL if a value passed to the function is NULL or if length is a negative number.

Examples
 The following expression standardizes numbers to six digits by padding them with leading zeros.
LPAD( PART_NUM, 6, '0')

PART_NUM RETURN VALUE
702 000702
1 000001
0553 000553
484834 484834

 LPAD counts the length from left to right.
 If the first string is longer than the length, LPAD truncates the string from right to left.
 For example, LPAD(‘alphabetical’, 5, ‘x’) returns the string ‘alpha’.

 If the second string is longer than the total characters needed to return the specified length, LPAD uses a portion of the second string:
LPAD( ITEM_NAME, 16, '*..*' )

ITEM_NAME RETURN VALUE
Flashlight *..**.Flashlight
Compass *..**..**Compass
Regulator System Regulator System
Safety Knife *..*Safety Knife

LTRIM
 Removes blanks or characters from the beginning of a string.
 You can use LTRIM with IIF or DECODE in an Expression or Update Strategy transformation to avoid spaces in a target table.
 If you do not specify a trim_set parameter in the expression:
- In UNICODE mode, LTRIM removes both single- and double-byte spaces from the beginning of a string.
- In ASCII mode, LTRIM removes only single-byte spaces.
 If you use LTRIM to remove characters from a string, LTRIM compares the trim_set to each character in the string argument, character-by-character, starting with the left side of the string.
 If the character in the string matches any character in the trim_set, LTRIM removes it.
 LTRIM continues comparing and removing characters until it fails to find a matching character in the trim_set.
 Then it returns the string, which does not include matching characters.

Syntax
LTRIM( string [, trim_set] )

Return Value
 String. The string values with the specified characters in the trim_set argument removed.
 NULL if a value passed to the function is NULL. If the trim_set is NULL, the function returns NULL.

Example
 The following expression removes the characters ‘S’ and ‘.’ from the strings in the LAST_NAME port:
LTRIM( LAST_NAME, 'S.')

LAST_NAME RETURN VALUE
Nelson Nelson
Osborne Osborne
NULL NULL
S. MacDonald MacDonald
Sawyer awyer
H. Bender H. Bender
Steadman teadman

 LTRIM removes ‘S.’ from S. MacDonald and the ‘S’ from both Sawyer and Steadman, but not the period from H. Bender.
 This is because LTRIM searches, character-by-character, for the set of characters you specify in the trim_set argument.
 If the first character in the string matches the first character in the trim_set, LTRIM removes it.
 Then LTRIM looks at the second character in the string. If it matches the second character in the trim_set, LTRIM removes it, and so on.
 When the first character in the string does not match the corresponding character in the trim_set, LTRIM returns the string and evaluates the next row.
 In the example of H. Bender, H does not match either character in the trim_set argument, so LTRIM returns the string in the LAST_NAME port and moves to the next row.

RTRIM
 Removes blanks or characters from the end of a string.
 If you do not specify a trim_set parameter in the expression:
- In UNICODE mode, RTRIM removes both single- and double-byte spaces from the end of a string.
- In ASCII mode, RTRIM removes only single-byte spaces.

 If you use RTRIM to remove characters from a string, RTRIM compares the trim_set to each character in the string argument, character-by-character, starting with the right side of the string.
 If the character in the string matches any character in the trim_set, RTRIM removes it.
 RTRIM continues comparing and removing characters until it fails to find a matching character in the trim_set.
 It returns the string without the matching characters.

Syntax
RTRIM( string [, trim_set] )

Example
 The following expression removes the characters ‘re’ from the strings in the LAST_NAME port:
RTRIM( LAST_NAME, 're')

LAST_NAME RETURN VALUE
Nelson Nelson
Page Pag
Osborne Osborn
NULL NULL
Sawyer Sawy
H. Bender H. Bend
Steadman Steadman

 RTRIM removes ‘e’ from Page even though ‘r’ is the first character in the trim_set.
 This is because RTRIM searches, character-by-character, for the set of characters you specify in the trim_set argument.
 If the last character in the string matches the first character in the trim_set, RTRIM removes it.
 If, however, the last character in the string does not match, RTRIM compares the second character in the trim_set.
 If the second from last character in the string matches the second character in the trim_set, RTRIM removes it, and so on.
 When the character in the string fails to match the trim_set, RTRIM returns the string and evaluates the next row.
 In the last example, the last character in Nelson does not match any character in the trim_set argument, so RTRIM returns the string 'Nelson' and evaluates the next row.
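 A short usage sketch that follows from both trim functions (CUST_NAME is an assumed string port): nesting them strips leading and trailing blanks in one expression, a common step before loading character data.
LTRIM( RTRIM( CUST_NAME ) )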
MAKE_DATE_TIME
 Returns the date and time based on the input values.

Syntax
MAKE_DATE_TIME( year, month, day, hour, minute, second, nanosecond )

Return Value
Date as MM/DD/YYYY HH24:MI:SS. Returns a null value if you do not pass the function a year, month, or day.

Example
The following expression creates a date and time from the input ports:
MAKE_DATE_TIME( SALE_YEAR, SALE_MONTH, SALE_DAY, SALE_HOUR, SALE_MIN, SALE_SEC )

SALE_YR SALE_MTH SALE_DAY SALE_HR SALE_MIN SALE_SEC RETURN VALUE
2002 10 27 8 36 22 10/27/2002 08:36:22
2000 6 15 15 17 06/15/2000 15:17:00
2003 1 3 22 45 01/03/2003 00:22:45
04 3 30 12 5 10 03/30/0004 12:05:10
99 12 12 5 16 12/12/0099 05:00:16

MAX/MIN (Dates)
 Returns the latest date found within a port or group.
 You can apply a filter to limit the rows in the search.
 You can nest only one other aggregate function within MAX.
 You can also use MAX to return the largest numeric value or the highest string value in a port or group.

Syntax
MAX( date [, filter_condition] )

Return Value
 Date.
 NULL if all values passed to the function are NULL, or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).

MAX/MIN (Numbers)
 Returns the maximum numeric value found within a port or group.
 You can apply a filter to limit the rows in the search.
 You can nest only one other aggregate function within MAX.
 You can also use MAX to return the latest date or the highest string value in a port or group.

Syntax
MAX( numeric_value [, filter_condition] )

Return Value
 Numeric value.
 NULL if all values passed to the function are NULL or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).
 Note: If the return value is Decimal with precision greater than 15, you can enable high precision to ensure decimal precision up to 28 digits.

 If a value is NULL, MAX ignores it. However, if all values passed from the port are NULL, MAX returns NULL.
 MAX groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is no group by port, MAX treats all rows as one group, returning one value.

MAX/MIN (String)
 Returns the highest string value found within a port or group.
 You can apply a filter to limit the rows in the search.
 You can nest only one other aggregate function within MAX.
Note: The MAX function uses the same sort order that the Sorter transformation uses. However, the MAX function is case sensitive, and the Sorter transformation may not be case sensitive.
 You can also use MAX to return the latest date or the largest numeric value in a port or group.

Syntax
MAX( string [, filter_condition] )

Return Value
 String.
 NULL if all values passed to the function are NULL, or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).

 MAX groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is no group by port, MAX treats all rows as one group, returning one value.
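 A hedged example of the filter_condition argument shared by these MAX/MIN variants, assuming ORDER_DATE and ORDER_AMOUNT ports: the filter limits which rows are considered before the aggregate is computed.
MAX( ORDER_DATE, ORDER_AMOUNT > 1000 )   -- latest date among orders over 1000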
MD5
 Calculates the checksum of the input value. The function uses Message-Digest algorithm 5 (MD5).
 MD5 is a one-way cryptographic hash function with a 128-bit hash value.
 You can conclude that input values are different when the checksums of the input values are different. Use MD5 to verify data integrity.

Syntax
MD5( value )

Return Value
 Unique 32-character string of hexadecimal digits 0-9 and a-f.
 NULL if the input is a null value.

MEDIAN
 Returns the median of all values in a selected port.
 If there is an even number of values in the port, the median is the average of the middle two values when all values are placed ordinally on a number line. If there is an odd number of values in the port, the median is the middle number.
 You can nest only one other aggregate function within MEDIAN, and the nested function must return a Numeric datatype.

 The PowerCenter Integration Service reads all rows of data to perform the median calculation.
 The process of reading rows of data to perform the calculation may affect performance.
 Optionally, you can apply a filter to limit the rows you read to calculate the median.

Syntax
MEDIAN( numeric_value [, filter_condition ] )

Return Value
 Numeric value.
 NULL if all values passed to the function are NULL, or if no rows are selected. For example, the filter condition evaluates to FALSE or NULL for all rows.

Note: If the return value is Decimal with precision greater than 15, you can enable high precision to ensure decimal precision up to 28 digits.

 MEDIAN groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is no group by port, MEDIAN treats all rows as one group, returning one value.

METAPHONE
 Encodes string values. You can specify the length of the string that you want to encode.
 METAPHONE encodes characters of the English language alphabet (A-Z).
 It encodes both uppercase and lowercase letters in uppercase.

 METAPHONE encodes characters according to the following list of rules:
- Skips vowels (A, E, I, O, and U) unless one of them is the first character of the input string. METAPHONE(‘CAR’) returns ‘KR’ and METAPHONE(‘AAR’) returns ‘AR’.
- Uses special encoding guidelines

MOD
 Returns the remainder of a division calculation. For example, MOD(8,5) returns 3.

Syntax
MOD( numeric_value, divisor )

Return Value
 Numeric value of the datatype you pass to the function. The remainder of the numeric value divided by the divisor.
 NULL if a value passed to the function is NULL.

Examples
 The following expression returns the modulus of the values in the PRICE port divided by the values in the QTY port:
MOD( PRICE, QTY )

PRICE QTY RETURN VALUE
10.00 2 0
12.00 5 2
9.00 2 1
15.00 3 0
NULL 3 NULL
20.00 NULL NULL
25.00 0 Error. Integration Service does not write row.

 The last row (25, 0) produced an error because you cannot divide by 0.
 To avoid dividing by 0, you can create an expression similar to the following, which returns the modulus of Price divided by Quantity only if the quantity is not 0.
 If the quantity is 0, the function returns NULL:
MOD( PRICE, IIF( QTY = 0, NULL, QTY ))

PRICE QTY RETURN VALUE
10.00 2 0
12.00 5 2
9.00 2 1
15.00 3 0
NULL 3 NULL
20.00 NULL NULL
25.00 0 NULL
 The last row (25, 0) produced a NULL rather than an error because the IIF function replaces NULL with the 0 in the QTY port.

MOVINGAVG
 Returns the average (row-by-row) of a specified set of rows.
 Optionally, you can apply a condition to filter rows before calculating the moving average.

Syntax
MOVINGAVG( numeric_value, rowset [, filter_condition] )

Return Value
Numeric value.

 MOVINGAVG ignores null values when calculating the moving average. However, if all values are NULL, the function returns NULL.

Example
 The following expression returns the average of orders based on the first five rows in the Sales port, and thereafter, returns the average for the last five rows read:
MOVINGAVG( SALES, 5 )

ROW_NO SALES RETURN VALUE
1 600 NULL
2 504 NULL
3 36 NULL
4 100 NULL
5 550 358
6 39 245.8
7 490 243

 The function returns the average for a set of five rows: 358 based on rows 1 through 5, 245.8 based on rows 2 through 6, and 243 based on rows 3 through 7.

MOVINGSUM
 Returns the sum (row-by-row) of a specified set of rows.
 Optionally, you can apply a condition to filter rows before calculating the moving sum.

Syntax
MOVINGSUM( numeric_value, rowset [, filter_condition] )

Return Value
Numeric value.

 MOVINGSUM ignores null values when calculating the moving sum. However, if all values are NULL, the function returns NULL.

Example
 The following expression returns the sum of orders for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter, returns the sum for the last five rows read:
MOVINGSUM( SALES, 5 )

ROW_NO SALES RETURN VALUE
1 600 NULL
2 504 NULL
3 36 NULL
4 100 NULL
5 550 1790
6 39 1229
7 490 1215

 The function returns the sum for a set of five rows: 1790 based on rows 1 through 5, 1229 based on rows 2 through 6, and 1215 based on rows 3 through 7.

NPER
 Returns the number of periods for an investment based on a constant interest rate and periodic, constant payments.

Syntax
NPER( rate, present value, payment [, future value, type] )

Return Value
Numeric.

Example
 The present value of an investment is $2,000. Each payment is $500 and the future value of the investment is $20,000.
 The following expression returns 9 as the number of periods for which you need to make the payments:
NPER( 0.01, -2000, -500, 20000, TRUE )

PERCENTILE
 Calculates the value that falls at a given percentile in a group of numbers.
 You can nest only one other aggregate function within PERCENTILE, and the nested function must return a Numeric datatype.
 The PowerCenter Integration Service reads all rows of data to perform the percentile calculation.
 The process of reading rows to perform the calculation may affect performance.
 Optionally, you can apply a filter to limit the rows you read to calculate the percentile.

Syntax
PERCENTILE( numeric_value, percentile [, filter_condition ] )
Return Value
Numeric value.

 If a value is NULL, PERCENTILE ignores the row. However, if all values in a group are NULL, PERCENTILE returns NULL.
 PERCENTILE groups values based on group by ports you define in the transformation, returning one result for each group.
 If there is no group by port, PERCENTILE treats all rows as one group, returning one value.

Example
 The PowerCenter Integration Service calculates a percentile using the following logic:
 Use the following guidelines for this equation:
- x is the number of elements in the group of values for which you are calculating a percentile.
- If i < 1, PERCENTILE returns the value of the first element in the list.
- If i is an integer value, PERCENTILE returns the value of the ith element in the list.
- Otherwise PERCENTILE returns the value of n:

 The following expression returns the salary that falls at the 75th percentile of salaries greater than $50,000:
PERCENTILE( SALARY, 75, SALARY > 50000 )

SALARY
125000.0
27900.0
100000.0
NULL
55000.0
9000.0
85000.0
86000.0
48000.0
99000.0
RETURN VALUE: 106250.0

PMT
 Returns the payment for a loan based on constant payments and a constant interest rate.

Syntax
PMT( rate, terms, present value[, future value, type] )

Return Value
Numeric.

Example
 The following expression returns -2111.64 as the monthly payment amount of a loan:
PMT( 0.01, 10, 20000 )

POWER
 Returns a value raised to the exponent you pass to the function.

Syntax
POWER( base, exponent )

Return Value
Double value.

Example
 The following expression returns the values in the Numbers port raised to the values in the Exponent port:
POWER( NUMBERS, EXPONENT )

NUMBERS EXP RETURN VALUE
10.0 2.0 100
3.5 6.0 1838.265625
3.5 5.5 982.594307804838
NULL 2.0 NULL
10.0 NULL NULL
-3.0 -6.0 0.00137174211248285
3.0 -6.0 0.00137174211248285
-3.0 6.0 729.0
-3.0 5.5 729.0

 The value -3.0 raised to 6 returns the same results as -3.0 raised to 5.5.
 If the base is negative, the exponent must be an integer.
 Otherwise, the PowerCenter Integration Service rounds the exponent to the nearest integer value.

PV
 Returns the present value of an investment.

Syntax
PV( rate, terms, payment [, future value, type] )

Return Value
Numeric.

Example
 The following expression returns 12,524.43 as the amount you must deposit in the account today to have a future value of $20,000 in one year if you also deposit $500 at the beginning of each period:
PV( 0.0075, 12, -500, 20000, TRUE )
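 The percentile equation referenced in the PERCENTILE example above is not reproduced in these notes, so treat this as a hedged reconstruction: assuming the usual index formula i = (x + 1) * percentile / 100 with linear interpolation for a non-integer i, the example checks out. Six salaries exceed $50,000 (55000, 85000, 86000, 99000, 100000, 125000), so i = (6 + 1) * 75 / 100 = 5.25, which interpolates between the 5th and 6th sorted values: 100000 + 0.25 * (125000 - 100000) = 106250.0, matching the stated return value.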
RAND
 Returns a random number between 0 and 1. This is useful for probability scenarios.

Syntax
RAND( seed )

Return Value
Numeric.

 For the same seed, the PowerCenter Integration Service generates the same sequence of numbers.

Example
 The following expression may return a value of 0.417022004702574:
RAND (1)

RATE
 Returns the interest rate earned per period by a security.

Syntax
RATE( terms, payment, present value[, future value, type] )

Return Value
Numeric.

Example
 The following expression returns 0.0077 as the monthly interest rate of a loan:
RATE( 48, -500, 20000 )

 To calculate the annual interest rate of the loan, multiply 0.0077 by 12. The annual interest rate is 0.0924 or 9.24%.

REG_EXTRACT
 Extracts subpatterns of a regular expression within an input value.
 For example, from a regular expression pattern for a full name, you can extract the first name or last name.
 Note: Use the REG_REPLACE function to replace a character pattern in a string with another character pattern.

Syntax
REG_EXTRACT( subject, 'pattern', subPatternNum )

Return Value
 Returns the value of the nth subpattern that is part of the input value. The nth subpattern is based on the value you specify for subPatternNum.
 NULL if the input is a null value or if the pattern is null.

Example
 You might use REG_EXTRACT in an expression to extract middle names from a regular expression that matches first name, middle name, and last name.
 For example, the following expression returns the middle name of a regular expression:
REG_EXTRACT( Employee_Name, '(\w+)\s+(\w+)\s+(\w+)',2)

Employee_Name Return Value
Stephen Graham Smith Graham
Juan Carlos Fernando Carlos

REG_MATCH
 Returns whether a value matches a regular expression pattern.
 This lets you validate data patterns, such as IDs, telephone numbers, postal codes, and state names.
Note: Use the REG_REPLACE function to replace a character pattern in a string with a new character pattern.

Syntax
REG_MATCH( subject, pattern )

Return Value
 TRUE if the data matches the pattern.
 FALSE if the data does not match the pattern.
 NULL if the input is a null value or if the pattern is NULL.

Example
 You might use REG_MATCH in an expression to validate telephone numbers.
 For example, the following expression matches a 10-digit telephone number against the pattern and returns a Boolean value based on the match:
REG_MATCH (Phone_Number, '(\d\d\d-\d\d\d-\d\d\d\d)' )

Phone_Number Return Value
408-555-1212 TRUE
NULL
510-555-1212 TRUE
92 555 51212 FALSE
650-555-1212 TRUE
415-555-1212 TRUE
831 555 12123 FALSE

 You can also use REG_MATCH for the following tasks:
- To verify that a value matches a pattern. This use is similar to the SQL LIKE function.
- To verify that values are characters. This use is similar to the SQL IS_CHAR function.

 To verify that a value matches a pattern, use a period (.) and an asterisk (*) with the REG_MATCH function in an expression.
 A period matches any one character. An asterisk matches 0 or more instances of values that follow it.
 For example, use the following expression to find account numbers that begin with 1835:
REG_MATCH(ACCOUNT_NUMBER, ‘1835.*’)

 To verify that values are characters, use a REG_MATCH function with the regular expression [a-zA-Z]+. a-z matches all lowercase characters.
 A-Z matches all uppercase characters.
 The plus sign (+) indicates that there should be at least one character.
 For example, use the following expression to verify that a list of last names contain only characters:
REG_MATCH(LAST_NAME, ‘[a-zA-Z]+’)

REG_REPLACE
 Replaces characters in a string with another character pattern.
 By default, REG_REPLACE searches the input string for the character pattern you specify and replaces all occurrences with the replacement pattern.
 You can also indicate the number of occurrences of the pattern you want to replace in the string.

Syntax
REG_REPLACE( subject, pattern, replace, numReplacements )

Return Value
String

Example
 The following expression removes additional spaces from the Employee name data for each row of the Employee_name port:
REG_REPLACE( Employee_Name, ‘\s+’, ‘ ’)

Employee_Name RETURN VALUE
Adam Smith Adam Smith
Greg Sanders Greg Sanders
Sarah Fe Sarah Fe
Sam Cooper Sam Cooper

REPLACECHR
 Replaces characters in a string with a single character or no character.
 REPLACECHR searches the input string for the characters you specify and replaces all occurrences of all characters with the new character you specify.

Syntax
REPLACECHR( CaseFlag, InputString, OldCharSet, NewChar )

Return Value
 String.
 Empty string if REPLACECHR removes all characters in InputString.
 NULL if InputString is NULL.
 InputString if OldCharSet is NULL or empty.

REPLACESTR
 Replaces characters in a string with a single character, multiple characters, or no character.
 REPLACESTR searches the input string for all strings you specify and replaces them with the new string you specify.

Syntax
REPLACESTR ( CaseFlag, InputString, OldString1, [OldString2, ... OldStringN,] NewString )

Return Value
 String.
 Empty string if REPLACESTR removes all characters in InputString.
 NULL if InputString is NULL.
 InputString if all OldString arguments are NULL or empty.
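 Two hedged usage sketches, with WEBLOG and TITLE as assumed string ports: passing NULL as the new character or string removes the matched text, and the leading CaseFlag argument controls case sensitivity (0 treats the search as case insensitive).
REPLACECHR( 0, WEBLOG, '"', NULL )              -- strips all double quotes from the value
REPLACESTR( 0, TITLE, 'mrs.', 'miss', 'Ms.' )   -- replaces either old form of the title with 'Ms.'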
REVERSE
Reverses the input string.

Syntax
REVERSE( string )

<Print 123-127>

SETCOUNTVARIABLE
 Counts the rows evaluated by the function and increments the current value of a mapping variable based on the count.
 Increases the current value by one for each row marked for insertion.
 Decreases the current value by one for each row marked for deletion.
 Keeps the current value the same for each row marked for update or reject.
 Returns the new current value.

 At the end of a successful session, the PowerCenter Integration Service saves the last current value to the repository.
 When used with a session that contains multiple partitions, the PowerCenter Integration Service generates different current values for each partition.
 At the end of the session, it determines the total count for all partitions and saves the total to the repository.
 Unless overridden, it uses the saved value as the initial value of the variable for the next time you use this session.

 The PowerCenter Integration Service does not save the final value of a mapping variable to the repository when any of the following are true:
- The session fails to complete.
- The session is configured for a test load.
- The session is a debug session.
- The session runs in debug mode and is configured to discard session output.

Syntax
SETCOUNTVARIABLE( $$Variable )

SET_DATE_PART
 Sets one part of a Date/Time value to a value you specify. With SET_DATE_PART, you can change the following parts of a date:

Year - Change the year by entering a positive integer in the value argument. Use any of the year format strings: Y, YY, YYY, or YYYY to set the year. For example, the following expression changes the year to 2001 for all dates in the SHIP_DATE port:
SET_DATE_PART( SHIP_DATE, 'YY', 2001 )
Month - Change the month by entering a positive integer between 1 and 12 (January=1 and December=12) in the value argument. Use any of the month format strings: MM, MON, MONTH to set the month. For example, the following expression changes the month to October for all dates in the SHIP_DATE port:
SET_DATE_PART( SHIP_DATE, 'MONTH', 10 )
Day - Change the day by entering a positive integer between 1 and 31 (except for the months that have less than 31 days: February, April, June, September, and November) in the value argument. Use any of the day format strings (D, DD, DDD, DY, and DAY) to set the day. For example, the following expression changes the day to 10 for all dates in the SHIP_DATE port:
SET_DATE_PART( SHIP_DATE, 'DD', 10 )

Syntax
SET_DATE_PART( date, format, value )

Return Value
 Date in the same format as the source date with the specified part changed.
 NULL if a value passed to the function is NULL.

Examples
 The following expressions change the hour to 4PM for each date in the DATE_PROMISED port:
SET_DATE_PART( DATE_PROMISED, 'HH', 16 )
SET_DATE_PART( DATE_PROMISED, 'HH12', 16 )
SET_DATE_PART( DATE_PROMISED, 'HH24', 16 )

DATE_PROMISED RETURN VALUE
Jan 1 1997 12:15:56AM Jan 1 1997 4:15:56PM
Feb 13 1997 2:30:01AM Feb 13 1997 4:30:01PM
Mar 31 1997 5:10:15PM Mar 31 1997 4:10:15PM
Dec 12 1997 8:07:33AM Dec 12 1997 4:07:33PM
NULL NULL

SETMAXVARIABLE/SETMINVARIABLE
 Sets the current value of a mapping variable to the higher of two
values: the current value of the variable or the value you specify.
 Returns the new current value.
 The function executes only if a row is marked as insert.
 SETMAXVARIABLE ignores all other row types and the current
value remains unchanged.

 At the end of a successful session, the PowerCenter Integration
Service saves the final current value to the repository.
 When used with a session that contains multiple partitions, the
PowerCenter Integration Service generates different current
values for each partition.
 At the end of the session, it saves the highest current value
across all partitions to the repository.
 Unless overridden, it uses the saved value as the initial value of
the variable for the next session run.
 When used with a string mapping variable, SETMAXVARIABLE
returns the higher string based on the sort order selected for
the session.

Examples
 The following expression compares the number of items
purchased in each transaction with a mapping variable
$$MaxItems.
 It sets $$MaxItems to the higher of two values and returns the
historically highest number of items purchased in a single
transaction to the MAX_ITEMS port.
 The initial value of $$MaxItems from the previous session run is
22.
SETMAXVARIABLE ($$MaxItems, ITEMS)

TRANSACTION ITEMS MAX_ITEMS
0100002 12 22
0100003 5 22
0100004 18 22
0100005 35 35
0100006 5 35
0100007 14 35

 At the end of the session, the PowerCenter Integration Service
saves '35' to the repository as the maximum current value for
$$MaxItems.
 The next time the session runs, the PowerCenter Integration
Service evaluates the initial value of $$MaxItems to '35'.

 If the same session contains three partitions, the PowerCenter
Integration Service evaluates $$MaxItems for each partition.
 Then, it saves the largest value to the repository. For example,
the last evaluated value for $$MaxItems in each partition is as
follows:

Partition Final Current Value for $$MaxItems
Partition 1 35
Partition 2 23
Partition 3 22

SETVARIABLE
 Sets the current value of a mapping variable to a value you
specify.
 Returns the specified value.
 The SETVARIABLE function executes only if a row is marked as
insert or update. SETVARIABLE ignores all other row types and
the current value remains unchanged.

 At the end of a successful session, the PowerCenter Integration
Service compares the final current value of the variable to the
start value of the variable.
 Based on the aggregate type of the variable, it saves a final
current value to the repository.
 Unless overridden, it uses the saved value as the initial value of
the variable for the next session run.

Return Value
 Current value of the variable.
 When value is NULL, the PowerCenter Integration Service returns
the current value of $$Variable.

Examples
 The following expression sets a mapping variable $$Time to the
system date at the time the PowerCenter Integration Service
evaluates the row and returns the system date to the
SET_$$TIME port:
SETVARIABLE ($$Time, SYSDATE)

TRANSACTION TOTAL SET_$$TIME
0100002 534.23 10/10/2000 01:34:33
0100003 699.01 10/10/2000 01:34:34
0100004 97.50 10/10/2000 01:34:35
0100005 116.43 10/10/2000 01:34:36
0100006 323.95 10/10/2000 01:34:37

 At the end of the session, the PowerCenter Integration Service
saves 10/10/2000 01:34:37 to the repository as the last
evaluated current value for $$Time. The next time the session
runs, the PowerCenter Integration Service evaluates all
references to $$Time to 10/10/2000 01:34:37.

SIGN
 Returns whether a numeric value is positive, negative, or 0.

Syntax
SIGN( numeric_value )

Return Value
 -1 for negative values.
 0 for 0.
 1 for positive values.
 NULL if NULL.
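A quick illustration (the input values are assumed):
SIGN( -15.2 ) returns -1, SIGN( 0 ) returns 0, and SIGN( 8 ) returns 1.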
SOUNDEX
 Encodes a string value into a four-character string.
 SOUNDEX works for characters in the English alphabet (A-Z).
 It uses the first character of the input string as the first character
in the return value and encodes the remaining three unique
consonants as numbers.
 SOUNDEX encodes characters according to the following list of
rules:
- Uses the first character in string as the first character in the
return value and encodes it in uppercase. For example, both
SOUNDEX('John') and SOUNDEX('john') return 'J500'.
- Encodes the first three unique consonants following the first
character in string and ignores the rest. For example, both
SOUNDEX('JohnRB') and SOUNDEX('JohnRBCD') return 'J561'.
- Assigns a single code to consonants that sound alike.
 The following table lists SOUNDEX encoding guidelines for
consonants:
- Skips the characters A, E, I, O, U, H, and W unless one of them is
the first character in string. For example, SOUNDEX('A123')
returns 'A000' and SOUNDEX('MAeiouhwC') returns 'M000'.
- If string produces fewer than four characters, SOUNDEX pads
the resulting string with zeroes. For example, SOUNDEX('J')
returns 'J000'.
- If string contains a set of consecutive consonants that use the
same code listed in "SOUNDEX" on page 140, SOUNDEX encodes
the first occurrence and skips the remaining occurrences in the
set. For example, SOUNDEX('AbbpdMN') returns 'A135'.
- Skips numbers in string. For example, both SOUNDEX('Joh12n')
and SOUNDEX('1John') return 'J500'.
- Returns NULL if string is NULL or if all the characters in string are
not letters of the English alphabet.

Syntax
SOUNDEX( string )

Return Value
 String.
 NULL if one of the following conditions is true:
- The value passed to the function is NULL.
- No character in string is a letter of the English alphabet.
- string is empty.

Example
 The following expression encodes the values in the
EMPLOYEE_NAME port:
SOUNDEX( EMPLOYEE_NAME )

EMPLOYEE_NAME RETURN VALUE
John J500
William W450
jane J500
joh12n J500
1abc A120
NULL NULL

STDDEV
 Returns the standard deviation of the numeric values you pass to
this function.
 STDDEV is used to analyze statistical data.
 You can nest only one other aggregate function within STDDEV,
and the nested function must return a Numeric datatype.

Syntax
STDDEV( numeric_value [, filter_condition] )

Return Value
 Numeric value.
 NULL if all values passed to the function are NULL or if no rows
are selected (for example, the filter condition evaluates to FALSE
or NULL for all rows).

 STDDEV groups values based on group by ports you define in the
transformation, returning one result for each group.
 If there is no group by port, STDDEV treats all rows as one group,
returning one value.

SUBSTR
 Returns a portion of a string.
 SUBSTR counts all characters, including blanks, starting at the
beginning of the string.
Syntax
SUBSTR( string, start [, length] )

Return Value
 String.
 Empty string if you pass a negative or 0 length value.
 NULL if a value passed to the function is NULL.

Examples
 The following expressions return the area code for each row in
the Phone port:
SUBSTR( PHONE, 0, 3 )

PHONE RETURN VALUE
809-555-0269 809
357-687-6708 357
NULL NULL

 You can also pass a negative start value to return the phone
number for each row in the Phone port.
 The expression still reads the source string from left to right when
returning the result of the length argument:
SUBSTR( PHONE, -8, 3 )

PHONE RETURN VALUE
808-555-0269 555
809-555-3915 555
357-687-6708 687
NULL NULL

 You can nest INSTR in the start or length argument to search for a
specific string and return its position.
 The following expression evaluates a string, starting from the end
of the string.
 The expression finds the last (rightmost) space in the string and
then returns all characters preceding it:
SUBSTR( CUST_NAME, 1, INSTR( CUST_NAME, ' ', -1, 1 ) - 1 )

CUST_NAME RETURN VALUE
PATRICIA JONES PATRICIA
MARY ELLEN SHAH MARY ELLEN

 The following expression removes the character '#' from a string:
SUBSTR( CUST_ID, 1, INSTR(CUST_ID, '#')-1 ) || SUBSTR( CUST_ID,
INSTR(CUST_ID, '#')+1 )

 When the length argument is longer than the string, SUBSTR
returns all the characters from the start position to the end of
the string.
 Consider the following example:
SUBSTR('abcd', 2, 8)
 The return value is 'bcd'. Compare this result to the following
example:
SUBSTR('abcd', -2, 8)
 The return value is 'cd'.

SUM
 Returns the sum of all values in the selected port.
 Optionally, you can apply a filter to limit the rows you read to
calculate the total.
 You can nest only one other aggregate function within SUM, and
the nested function must return a Numeric datatype.

Syntax
SUM( numeric_value [, filter_condition ] )

Return Value
 Numeric value.
 NULL if all values passed to the function are NULL or if no rows
are selected (for example, the filter condition evaluates to FALSE
or NULL for all rows).

 You can perform arithmetic on the values passed to SUM before
the function calculates the total. For example:
SUM( QTY * PRICE - DISCOUNT )

SYSTIMESTAMP
 Returns the current date and time of the node hosting the
PowerCenter Integration Service with precision to the
nanosecond.
 The precision to which you display the date and time depends on
the platform.
 The return value of the function varies depending on how you
configure the argument:
- When you configure the argument of SYSTIMESTAMP as a
variable, the PowerCenter Integration Service evaluates the
function for each row in the transformation.
- When you configure the argument of SYSTIMESTAMP as a
constant, the PowerCenter Integration Service evaluates the
function once and retains the value for each row in the
transformation.

Syntax
SYSTIMESTAMP( [format] )

TO_BIGINT
 Converts a string or numeric value to a bigint value.
 TO_BIGINT syntax contains an optional argument that you can
choose to round the number to the nearest integer or truncate
the decimal portion.
 TO_BIGINT ignores leading blanks.

Syntax
TO_BIGINT( value [, flag] )

Return Value
 Bigint.
 NULL if the value passed to the function is NULL.
 0 if the value passed to the function contains alphanumeric
characters.

Examples
The following expressions use values from the port IN_TAX:
TO_BIGINT( IN_TAX, TRUE )

IN_TAX RETURN VALUE
'7245176201123435.6789' 7245176201123435
'7245176201123435.2' 7245176201123435
'7245176201123435.2.48' 7245176201123435
NULL NULL
'A12.3Grove' 0
' 176201123435.87' 176201123435
'-7245176201123435.2' -7245176201123435
'-7245176201123435.23' -7245176201123435
-9223372036854775806.9 -9223372036854775806
9223372036854775806.9 9223372036854775806

TO_CHAR (Dates)
 Converts dates to character strings.
 TO_CHAR also converts numeric values to strings.
 You can convert the date into any format using the TO_CHAR
format strings.

Syntax
TO_CHAR( date [, format] )

Return Value
 String.
 NULL if a value passed to the function is NULL.

Examples
 The following expression converts the dates in the
DATE_PROMISED port to text in the format MON DD YYYY:
TO_CHAR( DATE_PROMISED, 'MON DD YYYY' )

DATE_PROMISED RETURN VALUE
Apr 1 1998 12:00:10AM 'Apr 01 1998'
Feb 22 1998 01:31:10PM 'Feb 22 1998'
Oct 24 1998 02:12:30PM 'Oct 24 1998'
NULL NULL

 If you omit the format argument, TO_CHAR returns a string in the
date format specified in the session, by default MM/DD/YYYY
HH24:MI:SS.US:
TO_CHAR( DATE_PROMISED )

DATE_PROMISED RETURN VALUE
Apr 1 1998 12:00:10AM '04/01/1998 00:00:10.000000'
Feb 22 1998 01:31:10PM '02/22/1998 13:31:10.000000'
Oct 24 1998 02:12:30PM '10/24/1998 14:12:30.000000'
NULL NULL

TO_CHAR (Numbers)
 Converts numeric values to text strings. TO_CHAR also converts
dates to strings.
TO_CHAR converts numeric values to text strings as follows:
- Converts double values to strings of up to 16 digits and provides
accuracy up to 15 digits. If you pass a number with more than 15
digits, TO_CHAR rounds the number to the sixteenth digit.
- Returns decimal notation for numbers in the ranges (-1e16, -1e-16]
and [1e-16, 1e16). TO_CHAR returns scientific notation for
numbers outside these ranges.
Note: The PowerCenter Integration Service converts the values
1e16 and -1e16 to scientific notation, but returns the values
1e-16 and -1e-16 in decimal notation.

Syntax
TO_CHAR( numeric_value )

Return Value
 String.
 NULL if a value passed to the function is NULL.

TO_DATE
 Converts a character string to a Date/Time datatype. You use the
TO_DATE format strings to specify the format of the source
strings.
 The output port must be Date/Time for TO_DATE expressions.
 If you are converting two-digit years with TO_DATE, use either
the RR or YY format string. Do not use the YYYY format string.

Syntax
TO_DATE( string [, format] )

Return Value
 Date.
 TO_DATE always returns a date and time. If you pass a string that
does not have a time value, the date returned always includes
the time 00:00:00.000000000.
 You can map the results of this function to any target column
with a datetime datatype.
 If the target column precision is less than nanoseconds, the
PowerCenter Integration Service truncates the datetime value to
match the precision of the target column when it writes
datetime values to the target.
 NULL if you pass a null value to this function.
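A small illustration (the DATE_STR port and its values are assumed):
TO_DATE( DATE_STR, 'MM/DD/YYYY' )
For the string '04/01/1998' this returns the Date/Time value Apr 1 1998 00:00:00, because a source string without a time component always gets the time 00:00:00 as described above.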
TO_DECIMAL
 Converts a string or numeric value to a decimal value.
 TO_DECIMAL ignores leading blanks.

Syntax
TO_DECIMAL( value [, scale] )

Return Value
 If the string contains a non-numeric character, converts the
numeric portion of the string up to the first non-numeric
character.
 If the first numeric character is non-numeric, returns 0.
 Decimal of precision and scale between 0 and 28, inclusive.
 NULL if a value passed to the function is NULL.

Example
 This expression uses values from the port IN_TAX. The datatype is
decimal with precision of 10 and scale of 3:
TO_DECIMAL( IN_TAX, 3 )

IN_TAX RETURN VALUE
'15.6789' 15.679
'60.2' 60.200
'118.348' 118.348
NULL NULL
'A12.3Grove' 0
'711A1' 711

TO_FLOAT
 Converts a string or numeric value to a double-precision floating
point number (the Double datatype).
 TO_FLOAT ignores leading blanks.

Syntax
TO_FLOAT( value )

Return Value
 Double value.
 0 if the value in the port is blank or a non-numeric character.
 NULL if a value passed to this function is NULL.

Example
 This expression uses values from the port IN_TAX:
TO_FLOAT( IN_TAX )

IN_TAX RETURN VALUE
'15.6789' 15.6789
'60.2' 60.2
'118.348' 118.348
NULL NULL
'A12.3Grove' 0

TO_INTEGER
 Converts a string or numeric value to an integer. TO_INTEGER
syntax contains an optional argument that you can choose to
round the number to the nearest integer or truncate the
decimal portion. TO_INTEGER ignores leading blanks.

Syntax
TO_INTEGER( value [, flag] )

Return Value
 Integer.
 NULL if the value passed to the function is NULL.
 0 if the value passed to the function contains alphanumeric
characters.

Examples
 The following expressions use values from the port IN_TAX. The
PowerCenter Integration Service displays an error when the
conversion causes a numeric overflow:
TO_INTEGER( IN_TAX, TRUE )

IN_TAX RETURN VALUE
'15.6789' 15
'60.2' 60
'118.348' 118
'5,000,000,000' Error. Integration Service doesn't write the row.
NULL NULL
'A12.3Grove' 0
' 123.87' 123
'-15.6789' -15

TRUNC (Dates)
 Truncates dates to a specific year, month, day, hour, minute,
second, millisecond, or microsecond.
 You can also use TRUNC to truncate numbers.
 You can truncate the following date parts:
Year - If you truncate the year portion of the date, the function
returns Jan 1 of the input year with the time set to
00:00:00.000000000. For example, the following expression
returns 1/1/1997 00:00:00.000000000: TRUNC(12/1/1997
3:10:15, 'YY')
Month - If you truncate the month portion of a date, the function
returns the first day of the month with the time set to
00:00:00.000000000. For example, the following expression
returns 4/1/1997 00:00:00.000000000: TRUNC(4/15/1997
12:15:00, 'MM')
Day - If you truncate the day portion of a date, the function
returns the date with the time set to 00:00:00.000000000. For
example, the following expression returns 6/13/1997
00:00:00.000000000: TRUNC(6/13/1997 2:30:45, 'DD')
Syntax
TRUNC( date [, format] )

Return Value
 Date.
 NULL if a value passed to the function is NULL.

Examples
 The following expressions truncate the year portion of dates in
the DATE_SHIPPED port:
TRUNC( DATE_SHIPPED, 'Y' )
TRUNC( DATE_SHIPPED, 'YY' )
TRUNC( DATE_SHIPPED, 'YYY' )
TRUNC( DATE_SHIPPED, 'YYYY' )

DATE_SHIPPED RETURN VALUE
Jan 15 1998 2:10:30AM Jan 1 1998 12:00:00.000000000
Apr 19 1998 1:31:20PM Jan 1 1998 12:00:00.000000000
Jun 20 1998 3:50:04AM Jan 1 1998 12:00:00.000000000
Dec 20 1998 3:29:55PM Jan 1 1998 12:00:00.000000000
NULL NULL

TRUNC (Numbers)
 Truncates numbers to a specific digit. You can also use TRUNC to
truncate dates.

Syntax
TRUNC( numeric_value [, precision] )

 If precision is a positive integer, TRUNC returns numeric_value
with the number of decimal places specified by precision.
 If precision is a negative integer, TRUNC changes the specified
digits to the left of the decimal point to zeros.
 If you omit the precision argument, TRUNC truncates the decimal
portion of numeric_value and returns an integer.
 If you pass a decimal precision value, the PowerCenter
Integration Service rounds numeric_value to the nearest integer
before evaluating the expression.

Return Value
 Numeric value.
 NULL if one of the arguments is NULL.

Examples
 The following expressions truncate the values in the Price port:
TRUNC( PRICE, 3 )

PRICE RETURN VALUE
12.9995 12.999
-18.8652 -18.865
56.9563 56.956
15.9928 15.992
NULL NULL

TRUNC( PRICE, -1 )

PRICE RETURN VALUE
12.99 10.0
-187.86 -180.0
56.95 50.0
1235.99 1230.0

UPPER
 Converts lowercase string characters to uppercase.

Syntax
UPPER( string )

VARIANCE
 Returns the variance of a value you pass to it. VARIANCE is used
to analyze statistical data.
 You can nest only one other aggregate function within VARIANCE,
and the nested function must return a Numeric datatype.

Syntax
VARIANCE( numeric_value [, filter_condition ] )
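Two minimal illustrations for the functions above (port names assumed, not from the guide):
UPPER( FIRST_NAME ) returns 'RAMONA' for the value 'Ramona'.
VARIANCE( SALES ) in an Aggregator transformation returns one variance value for each group defined by the group by ports, or a single value if there is no group by port.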
8. PERFORMANCE TUNING

Complete the following tasks to improve session performance:
1. Optimize the target - Enables the Integration Service to write
to the targets efficiently.
2. Optimize the source - Enables the Integration Service to read
source data efficiently.
3. Optimize the mapping - Enables the Integration Service to
transform and move data efficiently.
4. Optimize the transformation - Enables the Integration Service
to process transformations in a mapping efficiently.
5. Optimize the session - Enables the Integration Service to run
the session more quickly.
6. Optimize the grid deployments - Enables the Integration
Service to run on a grid with optimal performance.
7. Optimize the PowerCenter components - Enables the
Integration Service and Repository Service to function optimally.
8. Optimize the system - Enables PowerCenter service processes
to run more quickly.

Bottlenecks -
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System

 Use the following methods to identify performance bottlenecks:
Run test sessions - You can configure a test session to read from
a flat file source or to write to a flat file target to identify source
and target bottlenecks.
Analyze performance details - Analyze performance details, such
as performance counters, to determine where session
performance decreases.
Analyze thread statistics - Analyze thread statistics to determine
the optimal number of partition points.
Monitor system performance - You can use system monitoring
tools to view the percentage of CPU use, I/O waits, and paging to
identify system bottlenecks. You can also use the Workflow
Monitor to view system resource usage.

Using Thread Statistics
 You can use thread statistics in the session log to identify source,
target, or transformation bottlenecks.
 By default, the Integration Service uses one reader thread, one
transformation thread, and one writer thread to process a
session.
 The thread with the highest busy percentage identifies the
bottleneck in the session.
 The session log provides the following thread statistics:
Run time - Amount of time the thread runs.
Idle time - Amount of time the thread is idle. It includes the time
the thread waits for other thread processing within the
application. Idle time includes the time the thread is blocked by
the Integration Service, but not the time the thread is blocked
by the operating system.
Busy time - Percentage of the run time the thread is busy,
according to the following formula:
(run time - idle time) / run time X 100
You can ignore high busy percentages when the total run time is
short, such as under 60 seconds. This does not necessarily
indicate a bottleneck.
Thread work time - The percentage of time the Integration
Service takes to process each transformation in a thread. The
session log shows the following information for the
transformation thread work time:
Thread work time breakdown:
<transformation name>: <number> percent
<transformation name>: <number> percent
<transformation name>: <number> percent

 If a transformation takes a small amount of time, the session log
does not include it.
 If a thread does not have accurate statistics because the session
ran for a short period of time, the session log reports that the
statistics are not accurate.

Eliminating Bottlenecks Based on Thread Statistics
Complete the following tasks to eliminate bottlenecks based on
thread statistics:
- If the reader or writer thread is 100% busy, consider using string
datatypes in the source or target ports. Non-string ports require
more processing.
- If a transformation thread is 100% busy, consider adding a
partition point in the segment. When you add partition points to
the mapping, the Integration Service increases the number of
transformation threads it uses for the session. However, if the
machine is already running at or near full capacity, do not add
more threads.
- If one transformation requires more processing time than the
others, consider adding a pass-through partition point to the
transformation.

Example
When you run a session, the session log lists run information and
thread statistics similar to the following text:
***** RUN INFO FOR TGT LOAD ORDER GROUP [1],
CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition
point [SQ_two_gig_file_32B_rows] has completed.
Total Run Time = [505.871140] secs
Total Idle Time = [457.038313] secs
Busy Percentage = [9.653215]

Thread [TRANSF_1_1_1] created for [the transformation stage] of
partition point [SQ_two_gig_file_32B_rows] has completed.
Total Run Time = [506.230461] secs
Total Idle Time = [1.390318] secs
Busy Percentage = [99.725359]
Thread work time breakdown:
LKP_ADDRESS: 25.000000 percent
SRT_ADDRESS: 21.551724 percent
RTR_ZIP_CODE: 53.448276 percent

Thread [WRITER_1_*_1] created for [the write stage] of partition
point [scratch_out_32B] has completed.
Total Run Time = [507.027212] secs
Total Idle Time = [384.632435] secs
Busy Percentage = [24.139686]

 In this session log, the total run time for the transformation
thread is 506 seconds and the busy percentage is 99.7%.
 This means the transformation thread was never idle for the 506
seconds.
 The reader and writer busy percentages were significantly
smaller, about 9.6% and 24%.
 In this session, the transformation thread is the bottleneck in the
mapping.
 To determine which transformation in the transformation thread
is the bottleneck, view the busy percentage of each
transformation in the thread work time breakdown.
 In this session log, the transformation RTR_ZIP_CODE had a busy
percentage of 53%.
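As a check on the busy time formula above, the reader thread's busy percentage follows directly from its run and idle times:
(505.87 - 457.04) / 505.87 X 100 ≈ 9.65 percent
which matches the Busy Percentage of [9.653215] reported for READER_1_1_1 in the log.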
Identifying Target Bottlenecks
To identify a target bottleneck, complete the following tasks:
- Configure a copy of the session to write to a flat file target. If
the session performance increases significantly, you have a target
bottleneck. If a session already writes to a flat file target, you
probably do not have a target bottleneck.
- Read the thread statistics in the session log. When the
Integration Service spends more time on the writer thread than
the transformation or reader threads, you have a target
bottleneck.

Eliminating Target Bottlenecks
Complete the following tasks to eliminate target bottlenecks:
- Have the database administrator optimize database
performance by optimizing the query.
- Increase the database network packet size.
- Configure index and key constraints.

Identifying Source Bottlenecks
 You can read the thread statistics in the session log to determine
if the source is the bottleneck.
 When the Integration Service spends more time on the reader
thread than the transformation or writer threads, you have a
source bottleneck.
 If the session reads from a relational source, use the following
methods to identify source bottlenecks:
- Filter transformation
- Read test mapping
- Database query

 If the session reads from a flat file source, you probably do not
have a source bottleneck.

Using a Filter Transformation
 You can use a Filter transformation in the mapping to measure
the time it takes to read source data.
 Add a Filter transformation after each source qualifier.
 Set the filter condition to false so that no data is processed past
the Filter transformation.
 If the time it takes to run the new session remains about the
same, you have a source bottleneck.

Using a Read Test Mapping
 You can create a read test mapping to identify source
bottlenecks. A read test mapping isolates the read query by
removing the transformations in the mapping.
 To create a read test mapping, complete the following steps:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source
qualifiers, and any custom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
 Run a session against the read test mapping. If the session
performance is similar to the original session, you have a source
bottleneck.

Using a Database Query
 To identify source bottlenecks, execute the read query directly
against the source database.
 Copy the read query directly from the session log. Execute the
query against the source database with a query tool such as isql.
 Measure the query execution time and the time it takes for the
query to return the first row.

Eliminating Source Bottlenecks
Complete the following tasks to eliminate source bottlenecks:
- Set the number of bytes the Integration Service reads per line
if the Integration Service reads from a flat file source.
- Have the database administrator optimize database
performance by optimizing the query.
- Increase the database network packet size.
- Configure index and key constraints.
- If there is a long delay between the two time measurements in
a database query, you can use an optimizer hint.

Identifying Mapping Bottlenecks
To identify mapping bottlenecks, complete the following tasks:
- Read the thread statistics and work time statistics in the session
log. When the Integration Service spends more time on the
transformation thread than the writer or reader threads, you
have a transformation bottleneck. When the Integration Service
spends more time on one transformation, that transformation is
the bottleneck in the transformation thread.
- Analyze performance counters. High errorrows and
rowsinlookupcache counters indicate a mapping bottleneck.
- Add a Filter transformation before each target definition. Set
the filter condition to false so that no data is loaded into the
target tables. If the time it takes to run the new session is the
same as the original session, you have a mapping bottleneck.

Eliminating Mapping Bottlenecks
- To eliminate mapping bottlenecks, optimize transformation
settings in mappings.

 If you do not have a source, target, or mapping bottleneck, you
may have a session bottleneck.
 Small cache size, low buffer memory, and small commit intervals
can cause session bottlenecks.

Identifying Session Bottlenecks
- To identify a session bottleneck, analyze the performance
details. Performance details display information about each
transformation, such as the number of input rows, output rows,
and error rows.

Eliminating Session Bottlenecks
- To eliminate session bottlenecks, optimize the session.

Using the Workflow Monitor to Identify System Bottlenecks
CPU% - The percentage of CPU usage includes other external
tasks running on the system.
Memory usage - To troubleshoot, use system tools to check the
memory usage before and after running the session and then
compare the results to the memory usage while running the
session.
Swap usage - Swap usage is a result of paging due to possible
memory leaks or a high number of concurrent tasks.

Identifying System Bottlenecks on Windows
Use the Windows Performance Monitor to create a chart that
provides the following information:
Percent processor time - If you have more than one CPU, monitor
each CPU for percent processor time.
Pages/second - If pages/second is greater than five, you may
have excessive memory pressure (thrashing).
Physical disks percent time - The percent of time that the
physical disk is busy performing read or write requests.
Physical disks queue length - The number of users waiting for
access to the same disk device.
Server total bytes per second - The number of bytes the server
has sent to and received from the network.

Identifying System Bottlenecks on UNIX
top - View overall system performance. This tool displays CPU
usage, memory usage, and swap usage for the system and for
individual processes running on the system.
iostat - Monitor the loading operation for every disk attached to
the database server. iostat displays the percentage of time that
the disk is physically active. If you use disk arrays, use utilities
provided with the disk arrays instead of iostat.
vmstat - Monitor disk swapping actions. Swapping should not
occur during the session.
sar - View detailed system activity reports of CPU, memory, and
disk usage. You can use this tool to monitor CPU loading; it
provides percent usage on user, system, idle time, and waiting
time. You can also use this tool to monitor disk swapping actions.

Eliminating System Bottlenecks
- If the CPU usage is more than 80%, check the number of
concurrent running tasks. Consider changing the load or using a
grid to distribute tasks to different nodes. If you cannot reduce
the load, consider adding more processors.
- If swapping occurs, increase the physical memory or reduce the
number of memory-intensive applications on the disk.
- If you have excessive memory pressure (thrashing), consider
adding more physical memory.
- If the percent of disk time is high, tune the cache for PowerCenter
to use in-memory cache instead of writing to disk. If you tune the
cache, requests are still in queue, and the disk busy percentage is
at least 50%, add another disk device or upgrade to a faster disk
device. You can also use a separate disk for each partition in the
session.
- If the physical disk queue length is greater than two, consider
adding another disk device or upgrading the disk device. You also
can use separate disks for the reader, writer, and transformation
threads.
- Consider improving network bandwidth.
- When you tune UNIX systems, tune the server for a major
database system.
- If the percent time spent waiting on I/O (%wio) is high, consider
using other under-utilized disks. For example, if the source data,
target data, lookup, rank, and aggregate cache files are all on the
same disk, consider putting them on different disks.

Optimizing Flat File Targets
If you use a shared storage directory for flat file targets, you can
optimize session performance by ensuring that the shared
storage directory is on a machine that is dedicated to storing and
managing files, instead of performing other tasks.
If the Integration Service runs on a single node and the session
writes to a flat file target, you can optimize session performance
by writing to a flat file target that is local to the Integration
Service process node.

Dropping Indexes and Key Constraints
 When you define key constraints or indexes in target tables, you
slow the loading of data to those tables.
 To improve performance, drop indexes and key constraints before
you run the session.
 You can rebuild those indexes and key constraints after the
session completes.
 If you decide to drop and rebuild indexes and key constraints on a
regular basis, you can use the following methods to perform
these operations each time you run the session:
- Use pre-load and post-load stored procedures.
- Use pre-session and post-session SQL commands.
Note: To optimize performance, use constraint-based loading
only if necessary.

Increasing Database Checkpoint Intervals
 The Integration Service performance slows each time it waits for
the database to perform a checkpoint.
 To decrease the number of checkpoints and increase
performance, increase the checkpoint interval in the database.
 Note: Although you gain performance when you reduce the
number of checkpoints, you also increase the recovery time if
the database shuts down unexpectedly.

Using Bulk Loads
 You can use bulk loading to improve the performance of a session
that inserts a large amount of data into a DB2, Sybase ASE,
Oracle, or Microsoft SQL Server database.
 Configure bulk loading in the session properties.
 When bulk loading, the Integration Service bypasses the
database log, which speeds performance.
 Without writing to the database log, however, the target
database cannot perform rollback.
 As a result, you may not be able to perform recovery. When you
use bulk loading, weigh the importance of improved session
performance against the ability to recover an incomplete
session.
 When bulk loading to Microsoft SQL Server or Oracle targets,
define a large commit interval to increase performance.
 Microsoft SQL Server and Oracle start a new bulk load transaction
after each commit.
 Increasing the commit interval reduces the number of bulk load
transactions, which increases performance.

Using External Loaders
To increase session performance, configure PowerCenter to use
an external loader for the following types of target databases:
- IBM DB2 EE or EEE
- Oracle
- Sybase IQ
- Teradata

Minimizing Deadlocks
 If the Integration Service encounters a deadlock when it tries to
write to a target, the deadlock only affects targets in the same
target connection group. The Integration Service still writes to
targets in other target connection groups.
 Encountering deadlocks can slow session performance.
 To improve session performance, you can increase the number of
target connection groups the Integration Service uses to write to
the targets in a session. To use a different target connection
group for each target in a session, use a different database
connection name for each target instance.
 You can specify the same connection information for each
connection name.

Increasing Database Network Packet Size
 If you write to Oracle, Sybase ASE, or Microsoft SQL Server
targets, you can improve the performance by increasing the
network packet size.
 Increase the network packet size to allow larger packets of data
to cross the network at one time.
 Increase the network packet size based on the database you
write to:
Oracle - You can increase the database server network packet size
in listener.ora and tnsnames.ora. Consult your database
documentation for additional information about increasing the
packet size, if necessary.
Sybase ASE and Microsoft SQL Server - Consult your database
documentation for information about how to increase the packet
size.
 For Sybase ASE or Microsoft SQL Server, you must also change
the packet size in the relational connection object in the
Workflow Manager to reflect the database server packet size.

Optimizing the Query
 If a session joins multiple source tables in one Source Qualifier,
you might be able to improve performance by optimizing the
query with optimizing hints. Also, single table select statements
with an ORDER BY or GROUP BY clause may benefit from
optimization such as adding indexes.
 Usually, the database optimizer determines the most efficient
way to process the source data.
 However, you might know properties about the source tables
that the database optimizer does not.
 The database administrator can create optimizer hints to tell the
database how to execute the query for a particular set of source
tables.
 The query that the Integration Service uses to read data appears
in the session log and in the Source Qualifier transformation.
 Have the database administrator analyze the query, and then
create optimizer hints and indexes for the source tables.
 Use optimizing hints if there is a long delay between when the
query begins executing and when PowerCenter receives the first
row of data.
 Configure optimizer hints to begin returning rows as quickly as
possible, rather than returning all rows at once.
 This allows the Integration Service to process rows in parallel with
the query execution.
 Once you optimize the query, use the SQL override option to take
full advantage of these modifications.
 You can also configure the source database to run parallel queries
to improve performance.

Using Conditional Filters
 A simple source filter on the source database can sometimes
negatively impact performance because of the lack of indexes.
 You can use the PowerCenter conditional filter in the Source
Qualifier to improve performance.
 Whether you should use the PowerCenter conditional filter to
improve performance depends on the session.
 For example, if multiple sessions read from the same source
simultaneously, the PowerCenter conditional filter may improve
performance.
 However, some sessions may perform faster if you filter the
source data on the source database.
 You can test the session with both the database filter and the
PowerCenter filter to determine which method improves
performance.

Increasing Database Network Packet Size
 If you read from Oracle, Sybase ASE, or Microsoft SQL Server
sources, you can improve the performance by increasing the
network packet size.
 Increase the network packet size to allow larger packets of data
to cross the network at one time.
 Increase the network packet size based on the database you read
from:
 Oracle - You can increase the database server network packet size
in listener.ora and tnsnames.ora. Consult your database
documentation for additional information about increasing the
packet size, if necessary.
 Sybase ASE and Microsoft SQL Server - Consult your database
documentation for information about how to increase the
packet size.
 For Sybase ASE or Microsoft SQL Server, you must also change
the packet size in the relational connection object in the
Workflow Manager to reflect the database server packet size.

Connecting to Oracle Database Sources
 If you are running the Integration Service on a single node and
the Oracle instance is local to the Integration Service process
node, you can optimize performance by using the IPC protocol to
connect to the Oracle database. You can set up an Oracle
database connection in listener.ora and tnsnames.ora.

Using tempdb to Join Sybase or Microsoft SQL Server Tables
 When you join large tables on a Sybase or Microsoft SQL Server
database, it is possible to improve performance by creating the
tempdb as an in-memory database to allocate sufficient
memory.

Optimizing Mappings Overview
 Focus on mapping-level optimization after you optimize the
targets and sources.
 Generally, you reduce the number of transformations in the
mapping and delete unnecessary links between transformations
to optimize the mapping.
 Configure the mapping with the least number of transformations
and expressions to do the most amount of work possible.
 Delete unnecessary links between transformations to minimize
the amount of data moved.

Optimizing Flat File Sources
- Optimize the line sequential buffer length.
- Optimize delimited flat file sources.
- Optimize XML and flat file sources.

Optimizing the Line Sequential Buffer Length
 If the session reads from a flat file source, you can improve
session performance by setting the number of bytes the
Integration Service reads per line.
 By default, the Integration Service reads 1024 bytes per line.
 If each line in the source file is less than the default setting, you
can decrease the line sequential buffer length in the session
properties.

Optimizing Delimited Flat File Sources
 If a source is a delimited flat file, you must specify the delimiter
character to separate columns of data in the source file.
 You must also specify the escape character.
 The Integration Service reads the delimiter character as a regular
character if you include the escape character before the
delimiter character.
 You can improve session performance if the source flat file does
not contain quotes or escape characters.

Optimizing XML and Flat File Sources
 XML files are usually larger than flat files because of the tag
information.
 The size of an XML file depends on the level of tagging in the XML
file.
 More tags result in a larger file size.
 As a result, the Integration Service may take longer to read and
cache XML sources.

Configuring Single-Pass Reading
 Single-pass reading allows you to populate multiple targets with
one source qualifier.
 Consider using single-pass reading if you have multiple sessions
that use the same sources.
 You can combine the transformation logic for each mapping in
one mapping and use one source qualifier for each source.
 The Integration Service reads each source once and then sends
the data into separate pipelines.
 A particular row can be used by all the pipelines, by any
combination of pipelines, or by no pipelines.

 For example, you have the Purchasing source table, and you use
that source daily to perform an aggregation and a ranking.
 If you place the Aggregator and Rank transformations in separate
mappings and sessions, you force the Integration Service to read
the same source table twice.
 However, if you include the aggregation and ranking logic in one
mapping with one source qualifier, the Integration Service reads
the Purchasing source table once, and then sends the
appropriate data to the separate pipelines.
 When changing mappings to take advantage of single-pass
reading, you can optimize this feature by factoring out common
functions from mappings.
 For example, if you need to subtract a percentage from the Price
ports for both the Aggregator and Rank transformations, you
can minimize work by subtracting the percentage before
splitting the pipeline.
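A sketch of that idea (the $$DiscountPct mapping parameter and the DISC_PRICE port are assumed, not from the guide): compute
DISC_PRICE = PRICE * (1 - $$DiscountPct)
once in an Expression transformation before the pipeline splits, and pass DISC_PRICE to both the Aggregator and the Rank transformations instead of repeating the calculation in each branch.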
Optimizing Pass-Through Mappings
 To pass data directly from source to target without any other
transformations, connect the Source Qualifier transformation
directly to the target.
 If you use the Getting Started Wizard to create a pass-through
mapping, the wizard creates an Expression transformation
between the SQ transformation and the target.

Optimizing Filters
 Use one of the following transformations to filter data:
 Source Qualifier transformation - The Source Qualifier
transformation filters rows from relational sources.
 Filter transformation - The Filter transformation filters data within
a mapping. The Filter transformation filters rows from any type
of source.

 If you filter rows from the mapping, you can improve efficiency by
filtering early in the data flow.
 Use a filter in the Source Qualifier transformation to remove the
rows at the source.
 The Source Qualifier transformation limits the row set extracted
from a relational source.

 If you cannot use a filter in the Source Qualifier transformation,
use a Filter transformation and move it as close to the SQ
transformation as possible to remove unnecessary data early in
the data flow.
 The Filter transformation limits the row set sent to a target.
 Avoid using complex expressions in filter conditions.
 To optimize Filter transformations, use simple integer or
true/false expressions in the filter condition (see the example
below).

Note: You can also use a Filter or Router transformation to drop
rejected rows from an Update Strategy transformation if you do
not need to keep rejected rows.
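For instance (an illustrative condition; the ports are assumed), a Filter transformation that keeps only shipped orders evaluates faster when it tests a numeric flag:
SHIPPED_FLAG = 1
than when it has to manipulate strings first:
UPPER( LTRIM( RTRIM( ORDER_STATUS ) ) ) = 'SHIPPED'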
Optimizing Datatype Conversions
 You can increase performance by eliminating unnecessary
datatype conversions.
 For example, if a mapping moves data from an Integer column to
a Decimal column, then back to an Integer column, the
unnecessary datatype conversion slows performance.
 Where possible, eliminate unnecessary datatype conversions
from mappings.

Use the following datatype conversions to improve system
performance:
- Use integer values in place of other datatypes when performing
comparisons using Lookup and Filter transformations. For
example, many databases store U.S. ZIP code information as a
Char or Varchar datatype. If you convert the zip code data to an
Integer datatype, the lookup database stores the zip code
94303-1234 as 943031234. This helps increase the speed of the
lookup comparisons based on zip code.
- Convert the source dates to strings through port-to-port
conversions to increase session performance. You can either
leave the ports in targets as strings or change the ports to
Date/Time ports.

Optimizing Expressions
Complete the following tasks to isolate the slow expressions:
1. Remove the expressions one-by-one from the mapping.
2. Run the mapping to determine the time it takes to run the
mapping without the transformation.

 If there is a significant difference in session run time, look for
ways to optimize the slow expression.

Factoring Out Common Logic
 If the mapping performs the same task in multiple places, reduce
the number of times the mapping performs the task by moving
the task earlier in the mapping.
 For example, you have a mapping with five target tables.
 Each target requires a Social Security number lookup.
 Instead of performing the lookup five times, place the Lookup
transformation in the mapping before the data flow splits.
 Next, pass the lookup results to all five targets.

Minimizing Aggregate Function Calls
 When writing expressions, factor out as many aggregate function
calls as possible.
 Each time you use an aggregate function call, the Integration
Service must search and group the data.
 For example, in the following expression, the Integration Service
reads COLUMN_A, finds the sum, then reads COLUMN_B, finds
the sum, and finally finds the sum of the two sums:
SUM(COLUMN_A) + SUM(COLUMN_B)
 If you factor out the aggregate function call, as below, the
Integration Service adds COLUMN_A to COLUMN_B, then finds
the sum of both:
SUM(COLUMN_A + COLUMN_B)

Replacing Common Expressions with Local Variables
 If you use the same expression multiple times in one
transformation, you can make that expression a local variable.
 You can use a local variable only within the transformation.
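A sketch of the idea (the port names are assumed, not from the guide): in an Expression transformation, define a variable port v_FULL_NAME with the expression
FIRST_NAME || ' ' || LAST_NAME
and reference v_FULL_NAME from several output ports. The concatenation is then evaluated once per row instead of once for every output port that needs it.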
Choosing Numeric versus String Operations
 The Integration Service processes numeric operations faster than
string operations.
 For example, if you look up large amounts of data on two
columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the
lookup around EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons
 When the Integration Service performs comparisons between
CHAR and VARCHAR columns, it slows each time it finds trailing
blank spaces in the row.
 You can use the TreatCHARasCHARonRead option when you
configure the Integration Service in the Informatica
Administrator so that the Integration Service does not trim
trailing spaces from the end of Char source fields.

Choosing DECODE Versus LOOKUP
 When you use a LOOKUP function, the Integration Service must
look up a table in a database.
 When you use a DECODE function, you incorporate the lookup
values into the expression so the Integration Service does not
have to look up a separate table. Therefore, when you want to
look up a small set of unchanging values, use DECODE to
improve performance.
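A hypothetical illustration, assuming a STATE_CODE port with a small, stable set of codes:
DECODE( STATE_CODE, 'CA', 'California', 'NY', 'New York', 'TX', 'Texas', 'Unknown' )
The code-to-name mapping lives inside the expression, so no lookup table or lookup cache is needed.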
Using Operators Instead of Functions
 The Integration Service reads expressions written with operators
faster than expressions with functions.
 Where possible, use operators to write expressions.
 For example, you have the following expression that contains
nested CONCAT functions:
CONCAT( CONCAT( CUSTOMERS.FIRST_NAME, ' ' ), CUSTOMERS.LAST_NAME )
 You can rewrite that expression with the || operator as follows:
CUSTOMERS.FIRST_NAME || ' ' || CUSTOMERS.LAST_NAME

Optimizing IIF Functions
 IIF functions can return a value and an action, which allows for
more compact expressions.
 For example, you have a source with three Y/N flags: FLG_A,
FLG_B, FLG_C. You want to return values based on the values of
each flag.
 You use the following expression:
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_A + VAL_B ,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_A + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N',
VAL_A ,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_B + VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_B ,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N',
0.0, ))))))))
 This expression requires 8 IIFs, 16 ANDs, and at least 24
comparisons.
 If you take advantage of the IIF function, you can rewrite that
expression as:
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) +
IIF(FLG_C='Y', VAL_C, 0.0)
 This results in three IIFs, two comparisons, two additions, and a
faster session.

Evaluating Expressions
If you are not sure which expressions slow performance, evaluate
the expression performance to isolate the problem.
1. Time the session with the original expressions.
2. Copy the mapping and replace half of the complex expressions
with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half
of the complex expressions with a constant.
5. Run and time the edited session.

Optimizing External Procedures
 For example, you need to create an external procedure with two
input groups.
 The external procedure reads a row from the first input group
and then reads a row from the second input group.
 If you use blocking, you can write the external procedure code to
block the flow of data from one input group while it processes
the data from the other input group.
 When you write the external procedure code to block data, you
increase performance because the procedure does not need to
copy the source data to a buffer. However, you could write the
external procedure to allocate a buffer and copy the data from
one input group to the buffer until it is ready to process the
data.
 Copying source data to a buffer decreases performance.

Optimizing Aggregator Transformations
 Aggregator transformations often slow performance because
they must group data before processing it.
 Aggregator transformations need additional memory to hold
intermediate group results.
 Use the following guidelines to optimize the performance of an
Aggregator transformation:
- Group by simple columns.
1. When possible, use numbers instead of strings and dates in the
columns used for the GROUP BY.
2. Avoid complex expressions in the Aggregator expressions.
- Use sorted input.
1. The Sorted Input option decreases the use of aggregate caches.
2. You can increase performance when you use the Sorted Input
option in sessions with multiple partitions.
- Use incremental aggregation.
1. If you can capture changes from the source that affect less
than half the target, you can use incremental aggregation to
optimize the performance of Aggregator transformations.
2. When you use incremental aggregation, you apply captured
changes in the source to aggregate calculations in a session.
3. The Integration Service updates the target incrementally,
rather than processing the entire source and recalculating the
same calculations every time you run the session.
4. You can increase the index and data cache sizes to hold all data
in memory without paging to disk.
- Filter data before you aggregate it.
- Limit port connections.

Optimizing Custom Transformations
 The Integration Service can pass a single row to a Custom
transformation procedure or a block of rows in an array.
 You can write the procedure code to specify whether the
procedure receives one row or a block of rows.
 You can increase performance when the procedure receives a
block of rows:
- You can decrease the number of function calls the Integration
Service and procedure make. The Integration Service calls the
input row notification function fewer times, and the procedure
calls the output notification function fewer times.
- You can increase the locality of memory access space for the
data.
- You can write the procedure code to perform an algorithm on a
block of data instead of each row of data.

Optimizing Joiner Transformations
Use the following tips to improve session performance with the
Joiner transformation:
Designate the master source as the source with fewer duplicate
key values - When the Integration Service processes a sorted
Joiner transformation, it caches rows for one hundred unique
keys at a time. If the master source contains many rows with the
same key value, the Integration Service must cache more rows,
and performance can be slowed.
Designate the master source as the source with fewer rows -
During a session, the Joiner transformation compares each row of
the detail source against the master source. The fewer rows in
the master, the fewer iterations of the join comparison occur,
which speeds the join process.
Perform joins in a database when possible - Performing a join in a
database is faster than performing a join in the session. The type
of database join you use can affect performance. Normal joins
are faster than outer joins and result in fewer rows. In some
cases, you cannot perform the join in the database, such as
joining tables from two different databases or flat file systems.

To perform a join in a database, use the following options:
- Create a pre-session stored procedure to join the tables in a
database.
- Use the Source Qualifier transformation to perform the join.

Join sorted data when possible - To improve session
performance, configure the Joiner transformation to use sorted
input. When you configure the Joiner transformation to use - When you enter the ORDER BY statement, enter the
sorted data, the Integration Service improves performance by columns in the same order as the ports in the lookup
minimizing disk input and output. You see the greatest condition.
performance improvement when you work with large data sets. - You must also enclose all database reserved words in
For an unsorted Joiner transformation, designate the source with quotes.
fewer rows as the master source. - Enter the following lookup query in the lookup SQL
override:
Optimizing Lookup Transformations SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE,
- Use the optimal database driver. ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY
- Cache lookup tables. ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
 Use the appropriate cache type.
 Enable concurrent caches.  Use a machine with more memory.
 Optimize Lookup condition matching.
- When the Lookup transformation matches lookup cache - Optimize the lookup condition.
data with the lookup condition, it sorts and orders the  If you include more than one lookup condition, place the
data to determine the first matching value and the last conditions in the following order to optimize lookup
matching value. performance:
- You can configure the transformation to return any value Equal to (=)
that matches the lookup condition. Less than (<), greater than (>), less than or equal to (<=),
- When you configure the Lookup transformation to greater than or equal to (>=)
return any matching value, the transformation returns Not equal to (! =)
the first value that matches the lookup condition. - Filter lookup rows.
- It does not index all ports as it does when you configure - Index the lookup table.
the transformation to return the first matching value or  The Integration Service needs to query, sort, and
the last matching value. compare values in the lookup condition columns.
 Use a machine with more memory.
- Optimize the lookup condition.
 If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
Equal to (=)
Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
Not equal to (!=)
- Filter lookup rows.
- Index the lookup table.
 The Integration Service needs to query, sort, and compare values in the lookup condition columns.
 The index needs to include every column used in a lookup condition.
- Optimize multiple lookups.
 If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can slow performance.
 Tune the Lookup transformations that query the largest amounts of data to improve overall performance.
 To determine which Lookup transformations process the most data, examine the Lookup_rowsinlookupcache counters for each Lookup transformation.
 The Lookup transformations that have a large number in this counter might benefit from tuning their lookup expressions.
 If those expressions can be optimized, session performance improves.
- Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source.
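The sketch below is a conceptual illustration only (plain Python with a throwaway SQLite table; the table and column names are made up). It ties together two of the points above: build the lookup cache once instead of querying the lookup table for every input row, and use a WHERE-style filter, as a Lookup SQL Override would, to reduce the number of cached rows.

# Conceptual sketch only: caching a lookup table once versus querying it per row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_dim (item_id INTEGER, item_name TEXT, price REAL)")
conn.executemany("INSERT INTO items_dim VALUES (?, ?, ?)",
                 [(1, "pen", 2.0), (2, "book", 9.5), (3, "lamp", 20.0)])

# Build the cache once; the WHERE clause plays the role of a Lookup SQL Override
# that reduces the number of cached rows.
cache = {item_id: (item_name, price)
         for item_id, item_name, price in
         conn.execute("SELECT item_id, item_name, price FROM items_dim WHERE price < 10")}

source_rows = [1, 2, 2, 1, 3]
for item_id in source_rows:
    match = cache.get(item_id)   # in-memory lookup instead of a query per row; filtered-out rows return None
    print(item_id, match)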
Optimizing Sequence Generator Transformations
 To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously.
 Also, configure the Number of Cached Values property.
 The Number of Cached Values property determines the number of values the Integration Service caches at one time.
 Make sure that the Number of Cached Values property is not set too small.
 Consider configuring the Number of Cached Values to a value greater than 1,000.
 If you do not have to cache values, set the Number of Cached Values to 0.
 Sequence Generator transformations that do not use cache are faster than those that require cache.

Optimizing Sorter Transformations
- Allocate enough memory to sort the data.
- Specify a different work directory for each partition in the Sorter transformation.

Optimizing SQL Transformations
 Each time the Integration Service processes a new query in a session, it calls a function called SQLPrepare to create an SQL procedure and pass it to the database.
 When the query changes for each input row, it has a performance impact (see the sketch at the end of this section).
 When an SQL query contains commit and rollback statements, the Integration Service must recreate the SQL procedure after each commit or rollback.
 To optimize performance, do not use transaction statements in an SQL transformation query.
 When you configure the transformation to use a static connection, you choose a connection from the Workflow Manager connections.
 The SQL transformation connects to the database once during the session.
 When you pass dynamic connection information, the SQL transformation connects to the database each time the transformation processes an input row.
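As a rough analogy for the SQLPrepare behavior described above (SQLite as a stand-in database, not the SQL transformation itself): a query whose text changes for every input row must be re-parsed each time, while a single parameterized statement is prepared once and bound repeatedly.

# Sketch of the idea behind avoiding per-row query changes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
rows = [(i, f"val{i}") for i in range(1000)]

# Query text changes for every row: the statement is re-created each time.
for i, v in rows:
    conn.execute(f"INSERT INTO t (id, val) VALUES ({i}, '{v}')")

# One parameterized statement: prepared once, bound for every row.
conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows)
conn.commit()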
Eliminating Transformation Errors
 In large numbers, transformation errors slow the performance of the Integration Service.
 With each transformation error, the Integration Service pauses to determine the cause of the error and to remove the row causing the error from the data flow.
 Next, the Integration Service typically writes the row into the session log file.
 Transformation errors occur when the Integration Service encounters conversion errors, conflicting mapping logic, and any condition set up as an error, such as null input.
 Check the session log to see where the transformation errors occur.
 If the errors center on particular transformations, evaluate those transformation constraints.

Optimizing Sessions

Grid -
 A grid is an alias assigned to a group of nodes that allows you to automate the distribution of workflows and sessions across nodes.
 When you use a grid, the Integration Service distributes workflow tasks and session threads across multiple nodes.
 A Load Balancer distributes tasks to nodes without overloading any node. Running workflows and sessions on the nodes of a grid provides the following performance gains:
- Balances the Integration Service workload.
- Processes concurrent sessions faster.
- Processes partitions faster.
 The Integration Service requires CPU resources for parsing input data and formatting the output data.
 A grid can improve performance when you have a performance bottleneck in the extract and load steps of a session.
 A grid can improve performance when memory or temporary storage is a performance bottleneck.
 When a PowerCenter mapping contains a transformation that has cache memory, deploying adequate memory and separate disk storage for each cache instance improves performance.
 Running a session on a grid can improve throughput because the grid provides more resources to run the session.
 Performance improves when you run a few sessions on the grid at a time.
 Running a session on a grid is more efficient than running a workflow over a grid if the number of concurrent session partitions is less than the number of nodes.
 When you run multiple sessions on a grid, session subtasks share node resources with subtasks of other concurrent sessions.
 Running a session on a grid requires coordination between processes running on different nodes.
 For some mappings, running a session on a grid requires additional overhead to move data from one node to another node.
 In addition to loading the memory and CPU resources on each node, running multiple sessions on a grid adds to network traffic.
 When you run a workflow on a grid, the Integration Service loads memory and CPU resources on nodes without requiring coordination between the nodes.

Pushdown Optimization -
 To increase session performance, push transformation logic to the source or target database.

Concurrent Sessions and Workflows -
 If possible, run sessions and workflows concurrently to improve performance.
 For example, if you load data into an analytic schema, where you have dimension and fact tables, load the dimensions concurrently.

Buffer Memory
 When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data.
 The Integration Service allocates at least two blocks for each source and target partition.
 Sessions that use a large number of sources and targets might require additional memory blocks.
 If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session.
 You can configure the amount of buffer memory, or you can configure the Integration Service to calculate buffer settings at run time.
 To increase the number of available memory blocks, adjust the following session properties:
DTM Buffer Size - Increase the DTM buffer size on the Properties tab in the session properties.
Default Buffer Block Size - Decrease the buffer block size on the Config Object tab in the session properties.

 Before you configure these settings, determine the number of memory blocks the Integration Service requires to initialize the session.
 Then, based on default settings, calculate the buffer size and the buffer block size to create the required number of session blocks.
 If you have XML sources or targets in a mapping, use the number of groups in the XML source or target in the calculation for the total number of sources and targets.

 For example, you create a session that contains a single partition using a mapping that contains 50 sources and 50 targets. Then, you make the following calculations:
1. You determine that the session requires a minimum of 200 memory blocks:
[(total number of sources + total number of targets) * 2] = (session buffer blocks)
100 * 2 = 200
2. Based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session buffer blocks) = (0.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)
200 = 0.9 * 14222222 / 64000 * 1
or
200 = 0.9 * 12000000 / 54000 * 1
Note: For a session that contains n partitions, set the DTM Buffer Size to at least n times the value for the session with one partition.
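The helper below simply re-runs the worked example with the formulas above; the function names are illustrative, not PowerCenter settings. It also shows why 15,000,000 is suggested: the formula yields a minimum of roughly 14,222,222 bytes, which is then rounded up.

# A small check of the worked example (single-partition session, 50 sources and 50 targets).
def session_buffer_blocks(num_sources, num_targets):
    # at least two blocks for each source and target
    return (num_sources + num_targets) * 2

def min_dtm_buffer_size(blocks_needed, buffer_block_size=64000):
    # rearranged from: blocks = 0.9 * DTM_buffer_size / buffer_block_size
    return blocks_needed * buffer_block_size / 0.9

blocks = session_buffer_blocks(50, 50)
print(blocks)                                  # 200
print(round(min_dtm_buffer_size(blocks)))      # 14222222 -> round up, e.g. 15,000,000
print(round(min_dtm_buffer_size(200, 54000)))  # 12000000 with a 54,000-byte block size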
 The Log Manager writes a warning message in the session log if the number of memory blocks is so small that it causes performance degradation.
 The Log Manager writes this warning message even if the number of memory blocks is enough for the session to run successfully.
 The warning message also gives a suggestion for the proper value.
 If you modify the DTM Buffer Size, increase the property by multiples of the buffer block size.

Increasing DTM Buffer Size
 The DTM Buffer Size setting specifies the amount of memory the Integration Service uses as DTM buffer memory.
 The Integration Service uses DTM buffer memory to create the internal data structures and buffer blocks used to bring data into and out of the Integration Service.
 When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance during momentary slowdowns.
 Increasing DTM buffer memory allocation generally causes performance to improve initially and then level off.
 When you increase the DTM buffer memory allocation, consider the total memory available on the Integration Service process system.
 If you do not see a significant increase in performance, DTM buffer memory allocation is not a factor in session performance.
 Note: Reducing the DTM buffer allocation can cause the session to fail early in the process because the Integration Service is unable to allocate memory to the required processes.
 To increase the DTM buffer size, open the session properties and click the Properties tab.
 Edit the DTM Buffer Size property in the Performance settings.
 Increase the property by multiples of the buffer block size, and then run and time the session after each increase.

Optimizing the Buffer Block Size
 Depending on the session source data, you might need to increase or decrease the buffer block size.
 If the machine has limited physical memory and the mapping in the session contains a large number of sources, targets, or partitions, you might need to decrease the buffer block size.
 If you are manipulating unusually large rows of data, increase the buffer block size to improve performance.
 If you do not know the approximate size of the rows, determine the configured row size by completing the following steps.
 To evaluate the needed buffer block size:
1. In the Mapping Designer, open the mapping for the session.
2. Open the target instance.
3. Click the Ports tab.
4. Add the precision for all columns in the target.
5. If you have more than one target in the mapping, repeat steps 2 to 4 for each additional target to calculate the precision for each target.
6. Repeat steps 2 to 5 for each source definition in the mapping.
7. Choose the largest precision of all the source and target precisions for the total precision in the buffer block size calculation.

The total precision represents the total bytes needed to move the largest row of data.
 For example, if the total precision equals 33,000, then the Integration Service requires 33,000 bytes in the buffers to move that row.
 If the buffer block size is 64,000 bytes, the Integration Service can move only one row at a time.
 Ideally, a buffer accommodates at least 100 rows at a time.
 So if the total precision is greater than 32,000, increase the size of the buffers to improve performance.
 To increase the buffer block size, open the session properties and click the Config Object tab.
 Edit the Default Buffer Block Size property in the Advanced settings.
 Increase the buffer block setting in relation to the size of the rows.
 As with DTM buffer memory allocation, increasing the buffer block size should improve performance.
 If you do not see an increase, buffer block size is not a factor in session performance.
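A small sketch of the sizing rule above, assuming the column precisions have been read from the Ports tab; the numbers and helper names are hypothetical.

# Hypothetical precisions; in practice they come from the source and target Ports tabs.
def row_precision(column_precisions):
    return sum(column_precisions)

def suggested_buffer_block_size(row_precisions, rows_per_block=100):
    # size the block so it holds about 100 of the largest rows
    return max(row_precisions) * rows_per_block

precisions = [row_precision([4000, 255, 10]),   # hypothetical target
              row_precision([2000, 120])]       # hypothetical source
print(suggested_buffer_block_size(precisions))  # 4265 * 100 = 426500 bytes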
Caches
 The Integration Service uses the index and data caches for XML targets and Aggregator, Rank, Lookup, and Joiner transformations.
 The Integration Service stores transformed data in the data cache before returning it to the pipeline.
 It stores group information in the index cache.
 Also, the Integration Service uses a cache to store data for Sorter transformations.
 To configure the amount of cache memory, use the cache calculator or specify the cache size.
 You can also configure the Integration Service to calculate cache memory settings at run time.
 If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary disk file, a cache file, as it processes the session data.
 Performance slows each time the Integration Service pages to a temporary file.
 Examine the performance counters to determine how often the Integration Service pages to a file.

Perform the following tasks to optimize caches:
- Limit the number of connected input/output and output only ports.
- Select the optimal cache directory location.
 If you run the Integration Service on a grid and only some Integration Service nodes have fast access to the shared cache file directory, configure each session with a large cache to run on the nodes with fast access to the directory.
 If all Integration Service processes in a grid have slow access to the cache files, set up a separate, local cache file directory for each Integration Service process.
 An Integration Service process may have faster access to the cache files if it runs on the same machine that contains the cache directory.
 Note: You may encounter performance degradation when you cache large quantities of data on a mapped or mounted drive.

Increase the cache sizes
 You configure the cache size to specify the amount of memory allocated to process a transformation.
 The amount of memory you configure depends on how much memory cache and disk cache you want to use.
 If you configure the cache size and it is not enough to process the transformation in memory, the Integration Service processes some of the transformation in memory and pages information to cache files to process the rest of the transformation.
 Each time the Integration Service pages to a cache file, performance slows.
 You can examine the performance details of a session to determine when the Integration Service pages to a cache file.
 The Transformation_readfromdisk or Transformation_writetodisk counters for any Aggregator, Rank, or Joiner transformation indicate the number of times the Integration Service pages to disk to process the transformation.

Use the 64-bit version of PowerCenter to run large cache sessions
 If you process large volumes of data or perform memory-intensive transformations, you can use the 64-bit PowerCenter version to increase session performance.
 The 64-bit version provides a larger memory space that can significantly reduce or eliminate disk input/output.
 This can improve session performance in the following areas:
Caching - With a 64-bit platform, the Integration Service is not limited to the 2 GB cache limit of a 32-bit platform.
Data throughput - With a larger available memory space, the reader, writer, and DTM threads can process larger blocks of data.
Target-Based Commit
 Each time the Integration Service commits, performance slows.
 Therefore, the smaller the commit interval, the more often the Integration Service writes to the target database, and the slower the overall performance.
 If you increase the commit interval, the number of times the Integration Service commits decreases and performance improves.
 When you increase the commit interval, consider the log file limits in the target database.
 If the commit interval is too high, the Integration Service may fill the database log file and cause the session to fail.
 Therefore, weigh the benefit of increasing the commit interval against the additional time you would spend recovering a failed session.
 Click the General Options settings in the session properties to review and adjust the commit interval.
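The following stand-alone sketch (SQLite as a stand-in target, not a PowerCenter session) illustrates the trade-off: a larger commit interval means fewer commits and less overhead, at the cost of more rows to redo if the load fails.

# Stand-in illustration of the commit-interval trade-off.
import sqlite3

def load(rows, commit_interval):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER)")
    commits = 0
    for i, row in enumerate(rows, start=1):
        conn.execute("INSERT INTO target (id) VALUES (?)", (row,))
        if i % commit_interval == 0:
            conn.commit()
            commits += 1
    conn.commit()
    return commits

rows = list(range(10000))
print(load(rows, 100))    # 100 commits
print(load(rows, 1000))   # 10 commits - fewer commits, less overhead per row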
Log Files
 A workflow runs faster when you do not configure it to write session and workflow log files.
 Workflows and sessions always create binary logs.
 When you configure a session or workflow to write a log file, the Integration Service writes logging events twice.
 You can access the binary session and workflow logs in the Administrator tool.

Error Tracing
 If a session contains a large number of transformation errors, and you do not need to correct them, set the session tracing level to Terse.
 At this tracing level, the Integration Service does not write error messages or row-level information for reject data.
 If you need to debug the mapping and you set the tracing level to Verbose, you may experience significant performance degradation when you run the session. Do not use Verbose tracing when you tune performance.
 The session tracing level overrides any transformation-specific tracing levels within the mapping.
 Reducing the tracing level is not recommended as a long-term response to high levels of transformation errors.

Post-Session Emails
 When you attach the session log to a post-session email, enable flat file logging.
 If you enable flat file logging, the Integration Service gets the session log file from disk.
 If you do not enable flat file logging, the Integration Service gets the log events from the Log Manager and generates the session log file to attach to the email.
 When the Integration Service retrieves the session log from the log service, workflow performance slows, especially when the session log file is large and the log service runs on a different node than the master DTM.
 For optimal performance, configure the session to write to a log file when you configure post-session email to attach a session log.

Optimizing Grid Deployments Overview
 When you run PowerCenter on a grid, you can configure the grid, sessions, and workflows to use resources efficiently and maximize scalability.
 To improve PowerCenter performance on a grid, complete the following tasks:
- Add nodes to the grid.
- Increase storage capacity and bandwidth.
- Use shared file systems.
- Use a high-throughput network when you complete the following tasks:
1. Access sources and targets over the network.
2. Transfer data between nodes of a grid when using the Session on Grid option.

Storing Files
 When you configure PowerCenter to run on a grid, you specify the storage location for different types of session files, such as source files, log files, and cache files.
 To improve performance, store files in optimal locations.
 For example, store persistent cache files on a high-bandwidth shared file system.
 Different types of files have different storage requirements.
 You can store files in the following types of locations:
Shared file systems - Store files on a shared file system to enable all Integration Service processes to access the same files. You can store files on low-bandwidth and high-bandwidth shared file systems.
Local - Store files on the local machine running the Integration Service process when the files do not have to be accessed by other Integration Service processes.

High Bandwidth Shared File System Files
 Because they can be accessed often during a session, place the following files on a high-bandwidth shared file system:
- Source files, including flat files for lookups.
- Target files, including merge files for partitioned sessions.
- Persistent cache files for lookup or incremental aggregation.
- Non-persistent cache files, but only for sessions that run on a grid.
 This allows the Integration Service to build the cache only once. If these cache files are stored on a local file system, the Integration Service builds a cache for each partition group.

Low Bandwidth Shared File System Files
 Because they are accessed less frequently during a session, store the following files on a low-bandwidth shared file system:
- Parameter files or other configuration-related files.
- Indirect source or target files.
- Log files.

Local Storage Files
 To avoid unnecessary file sharing when you use shared file systems, store the following files locally:
- Non-persistent cache files for sessions that are not enabled for a grid, including Sorter transformation temporary files.
- Individual target files for different partitions when performing a sequential merge for partitioned sessions.
- Other temporary files that are deleted at the end of a session run. In general, to establish this, configure $PmTempFileDir for a local file system.
 Avoid storing these files on a shared file system, even when the bandwidth is high.

OPTIMIZING THE POWERCENTER COMPONENTS
 You can optimize performance of the following PowerCenter components:
- PowerCenter repository
- Integration Service
 If you run PowerCenter on multiple machines, run the Repository Service and Integration Service on different machines.
 To load large amounts of data, run the Integration Service on the machine with the higher processing power.
 Also, run the Repository Service on the machine hosting the PowerCenter repository.

Optimizing PowerCenter Repository Performance
- Ensure the PowerCenter repository is on the same machine as the Repository Service process.
- Order conditions in object queries.
- Use a single-node tablespace for the PowerCenter repository if you install it on a DB2 database.
- Optimize the database schema for the PowerCenter repository if you install it on a DB2 or Microsoft SQL Server database.

Optimizing Integration Service Performance
- Use native drivers instead of ODBC drivers for the Integration Service.
- Run the Integration Service in ASCII data movement mode if character data is 7-bit ASCII or EBCDIC.
- Cache PowerCenter metadata for the Repository Service.
- Run the Integration Service with high availability.
 Note: When you configure the Integration Service with high availability, the Integration Service recovers workflows and sessions that may fail because of temporary network or machine failures.
 To recover a workflow or session, the Integration Service writes the states of each workflow and session to temporary files in a shared directory. This may decrease performance.

OPTIMIZING THE SYSTEM OVERVIEW
 Often performance slows because the session relies on inefficient connections or an overloaded Integration Service process system.
 System delays can also be caused by routers, switches, network protocols, and usage by many users.
 Slow disk access on source and target databases, source and target file systems, and nodes in the domain can slow session performance.
 Have the system administrator evaluate the hard disks on the machines.
 After you determine from the system monitoring tools that you have a system bottleneck, make the following global changes to improve the performance of all sessions:
Improve network speed - Slow network connections can slow session performance. Have the system administrator determine if the network runs at an optimal speed. Decrease the number of network hops between the Integration Service process and databases.
Use multiple CPUs - You can use multiple CPUs to run multiple sessions in parallel and run multiple pipeline partitions in parallel.
Reduce paging - When an operating system runs out of physical memory, it starts paging to disk to free physical memory. Configure the physical memory for the Integration Service process machine to minimize paging to disk.
Use processor binding - In a multi-processor UNIX environment, the Integration Service may use a large amount of system resources. Use processor binding to control processor usage by the Integration Service process. Also, if the source and target databases are on the same machine, use processor binding to limit the resources used by the database.

USING PIPELINE PARTITIONS
 If you have the partitioning option, perform the following tasks to manually set up partitions:
Increase the number of partitions -
Use the following tips when you add partitions to a session:
 Add one partition at a time - To best monitor performance, add one partition at a time, and note the session settings before you add each partition.
 Set DTM Buffer Memory - When you increase the number of partitions, increase the DTM buffer size. If the session contains n partitions, increase the DTM buffer size to at least n times the value for the session with one partition (see the sketch after these tips).
 Set cached values for Sequence Generator - If a session has n partitions, you should not need to use the "Number of Cached Values" property for the Sequence Generator transformation. If you set this value to a value greater than 0,
make sure it is at least n times the original value for the session with one partition.
 Partition the source data evenly - Configure each partition to extract the same number of rows.
 Monitor the system while running the session - If CPU cycles are available, you can add a partition to improve performance. For example, you may have CPU cycles available if the system has 20 percent idle time.
 Monitor the system after adding a partition - If the CPU utilization does not go up, the wait for I/O time goes up, or the total data transformation rate goes down, then there is probably a hardware or software bottleneck. If the wait for I/O time goes up by a significant amount, then check the system for hardware bottlenecks. Otherwise, check the database configuration.
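The two "n times" rules in the tips above amount to a simple scaling calculation; the helper names and values below are illustrative only.

# Quick scaling check for a partitioned session.
def scaled_dtm_buffer_size(single_partition_size, partitions):
    return single_partition_size * partitions

def scaled_seq_gen_cached_values(original_cached_values, partitions):
    # only relevant if the Number of Cached Values property is greater than 0
    return original_cached_values * partitions

print(scaled_dtm_buffer_size(15000000, 4))    # 60,000,000 for a 4-partition session
print(scaled_seq_gen_cached_values(1000, 4))  # at least 4,000 cached values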
Select the best performing partition types at particular points in a pipeline
- You can use multiple pipeline partitions and database partitions.
- To improve performance, ensure the number of pipeline partitions equals the number of database partitions.
- To increase performance, specify partition types at the following partition points in the pipeline:
Source Qualifier transformation - To read data from multiple flat files concurrently, specify one partition for each flat file in the Source Qualifier transformation. Accept the default partition type, pass-through.
Filter transformation - Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.
Sorter transformation - To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the Integration Service to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows. You can delete the default partition point at the Aggregator transformation.
Target - Since the target tables are partitioned by key range, specify key range partitioning at the target to optimize writing data to the target.
Use multiple CPUs.

Optimizing the Source Database for Partitioning
 You can add partitions to increase the speed of the query.
 Usually, each partition on the reader side represents a subset of the data to be processed.
 Complete the following tasks to optimize the source database for partitioning:
Tune the database - If the database is not tuned properly, creating partitions may not make sessions quicker.
Enable parallel queries - Some databases may have options that must be set to enable parallel queries. Check the database documentation for these options. If these options are off, the Integration Service runs multiple partition SELECT statements serially.
Separate data into different tablespaces - Each database provides an option to separate the data into different tablespaces. If the database allows it, use the PowerCenter SQL override feature to provide a query that extracts data from a single partition.
Group the sorted data - You can partition and group source data to increase performance for a sorted Joiner transformation.
Maximize single-sorted queries.

Optimizing the Target Database for Partitioning
 If a session contains multiple partitions, the throughput for each partition should be the same as the throughput for a single partition session.
 If you do not see this correlation, then the database is probably inserting rows into the database serially.
 To ensure that the database inserts rows in parallel, check the following configuration options in the target database:
- Set options in the database to enable parallel inserts. For example, set the db_writer_processes option in an Oracle database and the max_agents option in a DB2 database to enable parallel inserts. Some databases may enable these options by default.
- Consider partitioning the target table. If possible, try to have each pipeline partition write to a single database partition; you can use a Router transformation to do this. Also, have the database partitions on separate disks to prevent I/O contention among the pipeline partitions.
- Set options in the database to enhance database scalability. For example, disable archive logging and timed statistics in an Oracle database to enhance scalability.

PERFORMANCE COUNTERS OVERVIEW
 All transformations have counters. The Integration Service tracks the number of input rows, output rows, and error rows for each transformation.
 Some transformations have performance counters. You can use the following performance counters to increase session performance:
- Errorrows
- Readfromcache and Writetocache
- Readfromdisk and Writetodisk
- Rowsinlookupcache

Errorrows Counter
Transformation errors impact session performance. If a transformation has large numbers of error rows in any of the Transformation_errorrows counters, you can eliminate the errors to improve performance.
Readfromcache and Writetocache Counters
 If a session contains Aggregator, Rank, or Joiner transformations, examine the Transformation_readfromcache and Transformation_writetocache counters along with the Transformation_readfromdisk and Transformation_writetodisk counters to analyze how the Integration Service reads from or writes to disk.
 To analyze the disk access, first calculate the hit or miss ratio.
 The hit ratio indicates the number of read or write operations the Integration Service performs on the cache.
 The miss ratio indicates the number of read or write operations the Integration Service performs on the disk.
 Use the following formula to calculate the cache miss ratio:
[(# of reads from disk) + (# of writes to disk)] / [(# of reads from memory cache) + (# of writes to memory cache)]
 Use the following formula to calculate the cache hit ratio:
[1 - cache miss ratio]
 To minimize reads and writes to disk, increase the cache size. The optimal cache hit ratio is 1.
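The two formulas translate directly into code; the counter values below are hypothetical.

# Direct translation of the miss-ratio and hit-ratio formulas above.
def cache_miss_ratio(reads_from_disk, writes_to_disk, reads_from_cache, writes_to_cache):
    return (reads_from_disk + writes_to_disk) / (reads_from_cache + writes_to_cache)

def cache_hit_ratio(miss_ratio):
    return 1 - miss_ratio

miss = cache_miss_ratio(reads_from_disk=25, writes_to_disk=15,
                        reads_from_cache=900, writes_to_cache=700)
print(miss, cache_hit_ratio(miss))  # 0.025 and 0.975; a larger cache pushes the hit ratio toward 1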
Readfromdisk and Writetodisk Counters
 If a session contains Aggregator, Rank, or Joiner transformations, examine each Transformation_readfromdisk and Transformation_writetodisk counter.
 If these counters display any number other than zero, you can increase the cache sizes to improve session performance.
 The Integration Service uses the index cache to store group information and the data cache to store transformed data, which is typically larger.
 Therefore, although both the index cache and data cache sizes affect performance, you may need to increase the data cache size more than the index cache size.
 However, if the volume of data processed is greater than the memory available, you can increase the index cache size to improve performance.
 For example, the Integration Service needs 100 MB to store the index cache and 500 MB to store the data cache.
 With 200 randomly distributed accesses on each of the index and data caches, you can configure the cache in the following ways:
- To optimize performance, allocate 100 MB to the index cache and 200 MB to the data cache. The Integration Service accesses 100 percent of the data from the index cache and 40 percent of the data from the data cache. It always finds the index data in memory, and misses the data cache 120 of the 200 times. Therefore, the percentage of data that gets accessed in memory is 70 percent.
- Allocate 50 MB to the index cache and 250 MB to the data cache. The Integration Service accesses 50 percent of the data from the index cache and 50 percent of the data from the data cache. It misses the index cache and the data cache 100 times each. Therefore, the percentage of data that gets accessed in memory is 50 percent.
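A quick calculation shows where the 70 percent and 50 percent figures come from, under the assumptions of the example (the index needs 100 MB, the data needs 500 MB, and each cache receives 200 accesses).

# Assumptions taken from the example above; the function name is mine.
def accessed_in_memory(index_alloc_mb, data_alloc_mb,
                       index_needed_mb=100, data_needed_mb=500, accesses_per_cache=200):
    index_hits = accesses_per_cache * min(1.0, index_alloc_mb / index_needed_mb)
    data_hits = accesses_per_cache * min(1.0, data_alloc_mb / data_needed_mb)
    return (index_hits + data_hits) / (2 * accesses_per_cache)

print(accessed_in_memory(100, 200))  # 0.7 -> 70 percent
print(accessed_in_memory(50, 250))   # 0.5 -> 50 percent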
 If the session performs incremental aggregation, the Integration Service reads historical aggregate data from the local disk during the session and writes to disk when saving historical data.
 As a result, the Aggregator_readtodisk and Aggregator_writetodisk counters display numbers other than zero.
 However, since the Integration Service writes the historical data to a file at the end of the session, you can still evaluate the counters during the session.
 If the counters show numbers other than zero during the session run, you can tune the cache sizes to increase performance.
 However, there is a cost associated with allocating and deallocating memory, so if you know the volume of data the Integration Service will process, refrain from increasing the cache sizes beyond what that volume requires.

Rowsinlookupcache Counter
 Multiple lookups can decrease session performance. To improve session performance, tune the lookup expressions for the larger lookup tables.