Workflow Administration Guide

Informatica PowerCenter®
(Version 7.1.1)

Informatica PowerCenter Workflow Administration Guide
Version 7.1.1
August 2004

Copyright (c) 1998–2004 Informatica Corporation. All rights reserved. Printed in the USA.

This software and documentation contain proprietary information of Informatica Corporation. They are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Informatica Corporation does not warrant that this documentation is error free.

Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX, and SuperGlue are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software are copyrighted by DataDirect Technologies, 1999-2002.

Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and University of California, Irvine, Copyright (c) 1993-2002, all rights reserved.

Portions of this software contain copyrighted material from The JBoss Group, LLC. Your right to use such materials is set forth in the GNU Lesser General Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by Informatica, “as-is”, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration® is a registered trademark of Meta Integration Technology, Inc.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2004 The Apache Software Foundation. All rights reserved.

DISCLAIMER: Informatica Corporation provides this documentation “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in this documentation at any time without notice.

Table of Contents
List of Figures
List of Tables
Preface
  New Features and Enhancements
    PowerCenter 7.1.1
    PowerCenter 7.1
    PowerCenter 7.0
  About Informatica Documentation
  About this Book
    Document Conventions
  Other Informatica Resources
    Visiting Informatica Customer Portal
    Visiting the Informatica Webzine
    Visiting the Informatica Web Site
    Visiting the Informatica Developer Network
    Obtaining Technical Support

Chapter 1: Understanding the Server Architecture
  Overview
    Workflow Processes
    Pipeline Partitioning
  PowerCenter Server Connectivity
  Running a Workflow
  Load Manager Process
    Managing Workflow Scheduling
    Locking and Reading the Workflow
    Reading the Parameter File
    Creating the Workflow Log File
    Running Workflow Tasks
    Distributing Sessions to Worker Servers
    Starting the DTM
    Running Sessions from Master Servers
    Writing Historical Information to the Repository
    Sending Post-Session Email
  Data Transformation Manager (DTM) Process
    Reading the Session Information
    Expanding Variables and Parameters
    Creating the Session Log File
    Validating Code Pages
    Verifying Connection Object Permissions
    Running Pre-Session Operations
    Running the Processing Threads
    Running Post-Session Operations
    Sending Post-Session Email
  Understanding Processing Threads
    Thread Types
    Threads and Partitioning
  PowerCenter Server Processing
    Reading Source Data
    Blocking Data
    Block Processing
  System Resources
    CPU Usage
    Load Manager Shared Memory
    DTM Buffer Memory
    Cache Memory
  Code Pages and Data Movement Modes
    ASCII Mode
    Unicode Mode
  Output Files and Caches
    PowerCenter Server Log
    Workflow Log File
    Session Log File
    Session Details
    Performance Detail File
    Reject Files
    Row Error Logs
    Recovery Tables and Files
    Control File
    Email
    Indicator File
    Output File
    Cache Files

Chapter 2: Configuring the Workflow Manager
  Overview
    Setting the Date/Time Display Format
  Customizing the Workflow Manager Options
    Configuring General Options
    Configuring Format Options
    Configuring Miscellaneous Options
    Enabling Enhanced Security
  Registering the PowerCenter Server
    Server Variables
    Steps for Registering a PowerCenter Server
    Deleting a PowerCenter Server
  Configuring Connection Object Permissions
    Connection Object Permissions
  Setting Up a Relational Database Connection
    Database Connect Strings
    Database Connection Code Pages
    Configuring Environment SQL
    Configuring a Relational Database Connection
    Deleting Connection Objects
    Copying a Relational Database Connection
  Replacing a Relational Database Connection

Chapter 3: Using the Workflow Manager
  Overview
    Workflow Manager Tools
    Workflow Tasks
    Workflow Manager Windows
  Navigating the Workspace
    Customizing Workflow Manager Windows
    Using Toolbars
    Searching for Items
    Arranging Objects in the Workspace
    Zooming the Workspace
  Working with Repository Objects
    Viewing Object Properties
    Entering Descriptions for Repository Objects
    Renaming Repository Objects
  Checking Out and In Versioned Repository Objects
    Checking Out Objects
    Checking In Objects
  Searching For Versioned Objects
  Copying Repository Objects
    Copying Sessions
    Copying Workflow Segments
  Comparing Repository Objects
    Steps for Comparing Objects
  Working with Metadata Extensions
    Creating a Metadata Extension
    Editing a Metadata Extension
    Deleting a Metadata Extension
  Keyboard Shortcuts

Chapter 4: Working with Workflows
  Overview
    Workflow Privileges
  Developing Workflows
    Creating a New Workflow
    Adding Tasks to Workflows
    Working with Links
    Using the Expression Editor
    Deleting a Workflow
    Editing a Workflow
  Using the Workflow Wizard
    Step 1. Assign a Name and PowerCenter Server to the Workflow
    Step 2. Create a Session
    Step 3. Schedule a Workflow
  Using Workflow Variables
    Pre-Defined Workflow Variables
    User-Defined Workflow Variables
  Scheduling a Workflow
    Creating a Reusable Scheduler
    Configuring Scheduler Settings
    Editing Scheduler Settings
    Disabling Workflows
  Validating a Workflow
    Expression Validation
    Task Validation
    Workflow Properties Validation
    Running Validation
  Running the Workflow
    Selecting a Server to Run the Workflow
    Assigning the PowerCenter Server to a Workflow
    Running a Workflow
    Running a Part of a Workflow
    Running a Task in the Workflow
  Suspending the Workflow
    Configuring Suspension Email
  Stopping or Aborting the Workflow
    Server Handling of Stop and Abort
    Stopping or Aborting a Task

Chapter 5: Working with Tasks
  Overview
  Creating a Task
    Creating a Task in the Task Developer
    Creating a Task in the Workflow or Worklet Designer
  Configuring Tasks
    Reusable Workflow Tasks
    AND or OR Input Links
    Disabling Tasks
    Failing Parent Workflow or Worklet
  Validating Tasks
  Working with the Assignment Task
  Working with the Command Task
    Using Session Parameters
    Creating a Command Task
    Executing Commands in the Command Task
  Working with the Control Task
  Working with the Decision Task
    Using the Decision Task
    Creating a Decision Task
  Working with Event Tasks
    Example of User-Defined Events
    Working with Event-Raise Tasks
    Working With Event-Wait Tasks
  Working with the Timer Task

Chapter 6: Working with Worklets
  Overview
    Suspending Worklets
  Developing a Worklet
    Creating a Reusable Worklet
    Creating a Non-Reusable Worklet
    Configuring Worklet Properties
    Adding Tasks in Worklets
    Nesting Worklets
  Using Worklet Variables
    Persistent Worklet Variables
    Overriding Initial Value
  Validating Worklets

Chapter 7: Working with Sessions
  Overview
  Creating a Session Task
    Session Privileges
    Steps to Create a Session Task
  Editing a Session
    Edit Session Privilege
    Applying Attributes to All Instances
  Creating a Session Configuration Object
  Using Pre- and Post-Session SQL Commands
    Guidelines for Entering Pre- and Post-Session SQL Commands
    Error Handling
  Using Pre- or Post-Session Shell Commands
    Using Server and Session Variables
    Configuring Non-Reusable Shell Commands
    Configuring Reusable Shell Commands
    Using Server Variables
    Pre-Session Shell Command Errors
  Using Post-Session Email
  Validating a Session
    Validating Multiple Sessions
  Running the Session
    Selecting a Server to Run the Session
    Assigning the PowerCenter Server to a Session
  Stopping and Aborting a Session
    Threshold Errors
    Fatal Error
    ABORT Function
    User Command
    PowerCenter Server Handling for Session Failure
  Mapping Parameters and Variables in Sessions
  Handling High Precision Data

Chapter 8: Working with Sources
  Overview
    Globalization Features
    Source Connections
    Permissions and Privileges
    Allocating Buffer Memory
    Partitioning Sources
  Configuring Sources in a Session
    Configuring Readers
    Configuring Connections
    Configuring Properties
  Working with Relational Sources
    Selecting the Source Database Connection
    Defining the Treat Source Rows As Property
    Configuring the Table Owner Name
    Overriding the SQL Query
  Working with File Sources
    Configuring Source Properties
    Configuring Fixed-Width File Properties
    Configuring Delimited File Properties
    Configuring Line Sequential Buffer Length
  Server Handling for File Sources
    Character Set
    Multibyte Character Error Handling
    Null Character Handling
    Row Length Handling for Fixed-Width Flat Files
    Numeric Data Handling
  Using a File List
    Creating the File List
    Configuring a Session to Use a File List

Chapter 9: Working with Targets
  Overview
    Globalization Features
    Target Connections
    Partitioning Targets
    Permissions and Privileges
  Configuring Targets in a Session
    Configuring Writers
    Configuring Connections
    Configuring Properties
  Working with Relational Targets
    Target Database Connection
    Target Properties
    Truncating Target Tables
    Deadlock Retry
    Dropping and Recreating Indexes
    Constraint-Based Loading
    Bulk Loading
    Table Name Prefix
    Reserved Words
  Working with Target Connection Groups
  Working with Active Sources
  Working with File Targets
    Configuring Target Properties
    Configuring Fixed-Width Properties
    Configuring Delimited Properties
  Server Handling for File Targets
    Writing to Fixed-Width Flat Files with Relational Target Definitions
    Writing to Fixed-Width Files with Flat File Target Definitions
    Writing Multibyte Data to Fixed-Width Flat Files
    Null Characters in Fixed-Width Files
    Character Set
    Writing Metadata to Flat File Targets
  Working with Heterogeneous Targets

Chapter 10: Understanding Commit Points
Overview
Target-Based Commits
Source-Based Commits
Determining the Commit Source
Switching from Source-Based to Target-Based Commit
User-Defined Commits
Rolling Back Transactions
Understanding Transaction Control
Transformation Scope
Understanding Transaction Control Units
Rules and Guidelines
Setting Commit Properties

Chapter 11: Recovering Data
Overview
Preparing for Recovery
Configuring the Mapping
Configuring the Session
Configuring the Workflow
Configuring the Target Database
Creating pmcmd Scripts
Working with Repeatable Data
Recovering a Suspended Workflow
Recovering a Suspended Workflow with Sequential Sessions
Recovering a Suspended Workflow with Concurrent Sessions
Steps for Recovering a Suspended Workflow
Recovering a Failed Workflow
Recovering a Failed Workflow with Sequential Sessions
Recovering a Failed Workflow with Concurrent Sessions
Steps for Recovering a Failed Workflow
Recovering a Session Task
Recovering Sequential Sessions
Recovering Concurrent Sessions
Steps for Recovering a Session Task
Server Handling for Recovery
Verifying Recovery Tables
Running Recovery
Completing Unrecoverable Sessions

Chapter 12: Sending Email
Overview
Configuring Email on UNIX
Configuring Email on Windows
Step 1. Verify the Informatica Service Startup Account
Step 2. Configure a Microsoft Outlook User
Step 3. Configure Logon Network Security
Step 4. Create Distribution Lists
Step 5. Configure the PowerCenter Server Setup
Working with Email Tasks
Email Address Tips and Guidelines
Steps to Create an Email Task
Working with Post-Session Email
Using Server Variables
Email Variables and Format Tags
Configuring Post-Session Email
Sample Email
Working with Suspension Email
Using Email Tasks in a Workflow or Worklet
Tips

Chapter 13: Pipeline Partitioning
Overview
Partition Points
Number of Partitions
Partition Types
Configuring Partitioning Information
Adding and Deleting Partition Points
Adding and Deleting Partitions
Entering Partition Descriptions
Specifying Partition Types
Adding Keys and Key Ranges
Cache Partitioning
Round-Robin Partition Type
Hash Keys Partition Types
Hash Auto-Keys
Hash User Keys
Key Range Partition Type
Adding a Partition Key
Adding Key Ranges
Adding Filter Conditions
Pass-Through Partition Type
Database Partitioning Partition Type
Partitioning Relational Sources
Entering an SQL Query
Entering a Filter Condition
Partitioning File Sources
Guidelines for Partitioning File Sources
Using One Thread to Read a File Source
Using Multiple Threads to Read a File Source
Configuring for File Partitioning
Partitioning Relational Targets
Database Compatibility
Partitioning File Targets
Configuring Connection Settings
Configuring File Properties
Partitioning Joiner Transformations
Partitioning Sorted Joiner Transformations
Using Sorted Flat Files
Using Sorted Relational Data
Using Sorter Transformations
Optimizing Sorted Joiner Transformations with Partitions
Partitioning Lookup Transformations
Partitioning Sorter Transformations
Configuring Sorter Transformation Work Directories
Mapping Variables in Partitioned Pipelines
Partitioning Rules
Restrictions on the Number of Partitions
Partition Restrictions for Editing Objects
Partition Restrictions for Informatica Application Products
Partitioning Guidelines

Chapter 14: Monitoring Workflows
Overview
Permissions and Privileges
Using the Workflow Monitor
Opening the Workflow Monitor
Connecting to Repositories
Connecting to PowerCenter Servers
Filtering Tasks and Servers
Opening and Closing Folders
Viewing Statistics
Viewing Properties
Customizing Workflow Monitor Options
Configuring General Options
Configuring Gantt Chart View Options
Configuring Task View Options
Configuring Advanced Options
Using Workflow Monitor Toolbars
Working with Tasks and Workflows
Running a Task, Workflow, or Worklet
Resuming a Workflow or Worklet
Recovering a Workflow or Worklet
Stopping or Aborting Tasks and Workflows
Scheduling and Unscheduling Workflows
Viewing Session Logs and Workflow Logs
Viewing History Names
Workflow and Task Status
Using the Gantt Chart View
Organizing Tasks
Listing Tasks and Workflows
Navigating the Time Window in Gantt Chart View
Zooming the Gantt Chart View
Performing a Search
Opening All Folders
Using the Task View
Filtering in Task View
Opening All Folders
Monitoring Session Details
Creating and Viewing Performance Details
Enabling Monitoring
Viewing Session Performance Details
Memory Requirement for Performance Details
Understanding Performance Counters
Tips

Chapter 15: Using Multiple Servers
Overview
Using Server Variables
Using a File Server
Running Sessions with Cache Files
Working with Server Grids
Distributing Sessions
Server Grid Connectivity
Server Grid Guidelines and Requirements
Configuring Server Grids
Configuring Server Grid Properties
Configuring Workflow Properties
Configuring Session Properties
Override Examples
Steps for Creating a Server Grid

Chapter 16: Log Files
Overview
Workflow Logs
Workflow Log Messages
Configuring Workflow Logs
Viewing Workflow Logs
Session Logs
Session Log Messages
Load Summary
Detailed Transformation Statistics
Configuring Session Logs
Viewing Session Logs
Reject Files
Locating Reject Files
Reading Reject Files

Chapter 17: Row Error Logging
Overview
Error Log Code Pages
Understanding the Error Log Tables
PMERR_DATA
PMERR_MSG
PMERR_SESS
PMERR_TRANS
Understanding the Error Log File
Configuring Error Log Options

Chapter 18: Session Parameters
Overview
Session Log Parameter
Changing the Session Log Name
Changing the Session Log Name and Location
Steps for Using $PMSessionLogFile
Database Connection Parameters
Source File Parameters
Changing the Source File
Changing the Source File and Directory
Steps for Using a Source File Parameter
Target File Parameters
Changing the Target File
Changing the Target File and Directory
Steps for Using a Target File Parameter
Lookup File Parameters
Changing the Lookup File
Changing the Lookup File and Directory
Steps for Using a Lookup File Parameter
Reject File Parameters
Changing the Reject File Name
Changing the Reject File and Directory
Steps for Using a Reject File Parameter
Tips

Chapter 19: Parameter Files
Overview
Parameter File Format
Guidelines for Creating Parameter Files
Sample Parameter File
Configuring the Parameter File Location
Troubleshooting
Tips

Chapter 20: External Loading
Overview
External Loader Permissions
Permissions and Privileges
External Loader Behavior
Loading Data Using Named Pipes
Staging Data to Flat Files
Partitioning Sessions with External Loaders
Errors and Error Messages
Loading to DB2
Setting DB2 External Loader Operation Modes
Configuring Authorities, Privileges, and Permissions
Configuring DB2 EE External Loader Attributes
Configuring DB2 EEE External Loader Attributes
Loading to Oracle
Loading Multibyte Data to Oracle
Oracle External Loader Attributes
Reject File
Loading to Sybase IQ
Using Sybase IQ External Loader on UNIX
Loading Multibyte Data to Sybase IQ
Sybase IQ External Loader Attributes
Loading to Teradata
Overriding the Control File
Teradata MultiLoad External Loader Attributes
Teradata TPump External Loader Attributes
Teradata FastLoad External Loader Attributes
Teradata Warehouse Builder External Loader Attributes
Creating an External Loader Connection
Configuring External Loading in a Session
Configuring a Session to Write to a File
Configuring File Properties
Selecting an External Loader Connection
Troubleshooting

Chapter 21: Using FTP
Overview
Mainframe Notes
Creating an FTP Connection
FTP Permissions
Steps for Creating an FTP Connection
Creating an FTP Session
FTP File Sources
FTP File Targets

Chapter 22: Using Incremental Aggregation
Overview
PowerCenter Server Processing for Incremental Aggregation
Reinitializing the Aggregate Files
Moving or Deleting the Aggregate Files
Finding Index and Data Files
Partitioning Guidelines with Incremental Aggregation
Preparing for Incremental Aggregation
Configuring the Mapping
Configuring the Session

Chapter 23: Using pmcmd
Overview
Configuring Environment Variables
Configuring PM_CODEPAGENAME
Configuring PMTOOL_DATEFORMAT
Configuring Repository Username and Password
Configuring PM_HOME
Using the Command Line Mode
Connecting to the PowerCenter Server in the Command Line Mode
pmcmd Return Codes
Using the Interactive Mode
Connecting to the PowerCenter Server in the Interactive Mode
Setting Defaults in the Interactive Mode
pmcmd Reference
Command Parameters
Using Quotation Marks
Syntax Notation
Aborttask
Abortworkflow
Connect
Disconnect
Exit
Getrunningsessionsdetails
Getserverdetails
Getserverproperties
Getsessionstatistics
Gettaskdetails
Getworkflowdetails
Help
Pingserver
Quit
Resumeworkflow
Resumeworklet
Scheduleworkflow
Setfolder
Setnowait
Setwait
Showsettings
Shutdownserver
Starttask
Startworkflow
Stoptask
Stopworkflow
Unscheduleworkflow
Unsetfolder
Version
Waittask
Waitworkflow

Chapter 24: Session Caches
Overview
Memory Cache
Cache Files
Determining Cache Requirements
Cache Calculations
Cache Column Sizes
Cache Partitioning
Aggregator Caches
Calculating the Aggregator Index Cache
Calculating the Aggregator Data Cache
Joiner Caches
Calculating the Number of Master Rows
Calculating the Joiner Index Cache
Calculating the Joiner Data Cache
Lookup Caches
Static Cache
Dynamic Cache
Sharing Partitioned Caches
Calculating the Lookup Index Cache . . . 629
Calculating the Lookup Data Cache . . . 631
Rank Caches . . . 632
Calculating the Rank Index Cache . . . 632
Calculating the Rank Data Cache . . . 633

Chapter 25: Performance Tuning . . . 635
Overview . . . 636
Identifying the Performance Bottleneck . . . 637
Identifying Target Bottlenecks . . . 637
Identifying Source Bottlenecks . . . 637
Identifying Mapping Bottlenecks . . . 638
Identifying a Session Bottleneck . . . 639
Identifying a System Bottleneck . . . 640
Optimizing the Target Database . . . 642
Dropping Indexes and Key Constraints . . . 642
Increasing Checkpoint Intervals . . . 642
Bulk Loading . . . 642
External Loading . . . 643
Increasing Database Network Packet Size . . . 643
Optimizing Oracle Target Databases . . . 643
Optimizing the Source Database . . . 645
Optimizing the Query . . . 645
Using tempdb to Join Sybase and Microsoft SQL Server Tables . . . 646
Using Conditional Filters . . . 646
Increasing Database Network Packet Sizes . . . 646
Connecting to Oracle Source Databases . . . 646
Optimizing the Mapping . . . 647
Configuring Single-Pass Reading . . . 647
Optimizing Datatype Conversions . . . 648
Eliminating Transformation Errors . . . 648
Optimizing Lookup Transformations . . . 649
Optimizing Filter Transformations . . . 650
Optimizing Aggregator Transformations . . . 650
Optimizing Joiner Transformations . . . 651
Optimizing Sequence Generator Transformations . . . 652
Optimizing Expressions . . . 652
Optimizing the Session . . . 655
Pipeline Partitioning . . . 655
Allocating Buffer Memory . . . 655
Increasing the Cache Sizes . . . 658
Increasing the Commit Interval . . . 658
Disabling High Precision . . . 658
Reducing Error Tracing . . . 659
Removing Staging Areas . . . 659
Optimizing the System . . . 660
Improving Network Speed . . . 660
Using Multiple PowerCenter Servers . . . 661
Using Server Grids . . . 661
Running the PowerCenter Server in ASCII Data Movement Mode . . . 661
Using Additional CPUs . . . 661
Reducing Paging . . . 662
Using Processor Binding . . . 662
Pipeline Partitioning . . . 663
Optimizing the Source Database for Partitioning . . . 663
Optimizing the Target Database for Partitioning . . . 664

Appendix A: Session Properties Reference . . . 667
General Tab . . . 668
Properties Tab . . . 670
General Options Settings . . . 670
Performance Settings . . . 673
Config Object Tab . . . 675
Advanced Settings . . . 675
Log Options Settings . . . 677
Error Handling Settings . . . 678
Mapping Tab (Transformations View) . . . 681
Connections Node . . . 681
Sources Node . . . 683
Targets Node . . . 692
Mapping Tab (Partitions View) . . . 705
Partition Properties Node . . . 705
KeyRange Node . . . 706
HashKeys Node . . . 706
Partition Points Node . . . 706
Non-Partition Points Node . . . 709
Components Tab . . . 710
Reusable Pre- or Post-Session Commands . . . 711
Non-Reusable Pre- or Post-Session Commands . . . 712
Reusable Email . . . 714
Non-Reusable Email . . . 715
Email Properties . . . 715
Metadata Extensions Tab . . . 718

Appendix B: Workflow Properties Reference . . . 721
General Tab . . . 722
Properties Tab . . . 724
Scheduler Tab . . . 726
Edit Scheduler Settings . . . 727
Variables Tab . . . 731
Events Tab . . . 732
Metadata Extensions Tab . . . 733

Appendix C: Session Properties Comparison Reference . . . 735
Overview . . . 736
General Tab . . . 737
General Options . . . 737
Source Options . . . 738
Target Options . . . 743
Session Commands . . . 750
Performance Options . . . 752
Source Location Tab . . . 754
Time Tab . . . 755
Schedule Options . . . 755
Start Options . . . 756
Duration Options . . . 756
Use Absolute Time Option . . . 757
Log and Error Handling Tab . . . 758
Log File Options . . . 758
Parameter File Option . . . 759
Batch Handling Option . . . 759
Error Handling Options . . . 759
Transformations Tab . . . 761
Partitions Tab . . . 762

Index . . . 763

List of Figures
Figure 1-1. PowerCenter Server and Data Movement . . . 2
Figure 1-2. Partitioned Mapping . . . 3
Figure 1-3. PowerCenter Connectivity . . . 6
Figure 1-4. Thread Creation for a Simple Mapping . . . 15
Figure 1-5. Thread Creation for a Pass-through Pipeline . . . 15
Figure 1-6. Pipeline Stages in a Mapping With an Unsorted Aggregator Transformation . . . 17
Figure 1-7. Pipeline Stages in a Mapping with an Additional Partition Point . . . 18
Figure 1-8. Thread Creation for a Mapping with Three Partitions . . . 18
Figure 1-9. Thread Creation with Joiner Transformation . . . 20
Figure 1-10. Thread Creation with a Partition Point at a Joiner Transformation . . . 20
Figure 1-11. Target Load Order Groups and Source Pipelines . . . 22
Figure 1-12. Event Viewer Application Log Message . . . 29
Figure 1-13. Application Log Message Detail . . . 30
Figure 2-1. Workflow Manager General Options . . . 40
Figure 2-2. Workflow Manager Format Options . . . 42
Figure 2-3. Copy Wizard, Versioning, and Target Load Type Options . . . 43
Figure 3-1. Sample Workflow . . . 66
Figure 3-2. Workflow Manager Windows . . . 68
Figure 3-3. Check In Workflow Manager Objects . . . 75
Figure 3-4. Query Browser . . . 76
Figure 3-5. Diff Tool Window . . . 81
Figure 4-1. Sample Workflow . . . 88
Figure 4-2. Sample Workflow With Two Branches . . . 89
Figure 4-3. Valid Workflow . . . 93
Figure 4-4. Example of a Loop . . . 93
Figure 4-5. Setting Link Condition . . . 95
Figure 4-6. Displaying Link Condition in the Workflow . . . 95
Figure 4-7. Expression Editor . . . 96
Figure 4-8. Expression Editor . . . 104
Figure 4-9. Expression Using a Pre-Defined Workflow Variable . . . 107
Figure 4-10. Status Variable Example . . . 107
Figure 4-11. PrevTaskStatus Variable Example . . . 108
Figure 4-12. Sample Workflow Using Workflow Variable . . . 108
Figure 4-13. Schedule tab . . . 115
Figure 4-14. Customized Repeat Dialog Box . . . 116
Figure 4-15. Example Workflow - Validation . . . 120
Figure 4-16. Running Part of a Workflow - Example . . . 125
Figure 5-1. General Tab - Edit Tasks Dialog Box . . . 135
Figure 5-2. Revert Button in Session Properties . . . 137
Figure 5-3. Run If Previous Completed Option . . . 146
Figure 5-4. Example Workflow Using a Decision Task . . . 150
Figure 5-5. Example Workflow without a Decision Task . . . 150
Figure 5-6. Expanded Example Workflow Using a Decision Task . . . 151
Figure 5-7. Example of User-Defined Event . . . 153
Figure 5-8. Example Workflow Using the Timer Task . . . 161
Figure 6-1. Workflow with Multiple Worklets . . . 167
Figure 6-2. Workflow with Nested Worklets . . . 168
Figure 6-3. Example of Persistent Worklet Variable . . . 169
Figure 7-1. Session Properties . . . 177
Figure 7-2. Session Target Object Settings . . . 179
Figure 7-3. Connection Options . . . 181
Figure 7-4. Config Object Tab . . . 183
Figure 7-5. Session Configuration Browser . . . 184
Figure 7-6. Stop or Continue the Session on Pre- or Post-Session SQL Errors . . . 187
Figure 7-7. Make Reusable Option for Pre-Session Shell Commands . . . 189
Figure 7-8. Stop or Continue the Session on Pre-Session Shell Command Error . . . 193
Figure 7-9. Assign Server Dialog Box . . . 198
Figure 8-1. Sources Node of the Session Properties . . . 210
Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab . . . 211
Figure 8-3. Connections Settings in the Sources Node . . . 212
Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab . . . 213
Figure 8-5. Treat Source Rows As Property . . . 215
Figure 8-6. Source Table Owner Name Property . . . 216
Figure 8-7. SQL Query Override Property in the Session Properties . . . 217
Figure 8-8. Properties Settings in the Sources Node for a Flat File Source . . . 219
Figure 8-9. Flat Files Dialog Box . . . 221
Figure 8-10. Fixed-Width File Properties Dialog Box . . . 221
Figure 8-11. Flat Files Dialog Box . . . 223
Figure 8-12. Delimited File Properties Dialog Box . . . 223
Figure 8-13. Line Sequential Buffer Length Property for File Sources . . . 225
Figure 9-1. Defining Target Properties in the Session Properties . . . 236
Figure 9-2. Writers Settings on the Mapping Tab of the Session Properties . . . 237
Figure 9-3. Connections Settings on the Mapping Tab of the Session Properties . . . 238
Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties . . . 239
Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target . . . 242
Figure 9-6. Test Load Options . . . 244
Figure 9-7. Session Retry on Deadlock . . . 247
Figure 9-8. Mapping Using Constraint-Based Loading . . . 250
Figure 9-9. Properties Settings on the Mapping Tab for a Flat File Target . . . 262
Figure 9-10. Test Load Options . . . 264
Figure 9-11. Flat Files Dialog Box . . . 265
Figure 9-12. Fixed Width Properties Dialog Box . . . 265
Figure 9-13. Flat Files Dialog Box . . . 266
Figure 9-14. Delimited File Properties Dialog Box . . . 267
Figure 10-1. Mapping with a Single Commit Source . . . 279
Figure 10-2. Mapping with Multiple Commit Sources . . . 280
Figure 10-3. Mapping with Targets Connected to a Commit Source . . . 281
Figure 10-4. Mapping a Custom Transformation with a Commit Source . . . 282
Figure 10-5. Roll Back on Failed Commit Example . . . 286
Figure 10-6. Transaction Control Units . . . 290
Figure 10-7. Session Commit Properties . . . 292
Figure 11-1. Mapping You Can Enable for Recovery . . . 303
Figure 11-2. Mapping You Cannot Enable for Recovery . . . 304
Figure 11-3. Modified Mapping You Can Enable for Recovery . . . 304
Figure 11-4. Resuming a Suspended Workflow with Sequential Sessions . . . 306
Figure 11-5. Resuming a Suspended Workflow with Concurrent Sessions . . . 307
Figure 11-6. Recovering Part of a Workflow With Sequential Sessions . . . 308
Figure 11-7. Recovering Part of a Workflow with Concurrent Sessions . . . 309
Figure 11-8. Recovering Concurrent Sessions Individually . . . 312
Figure 12-1. Email Task . . . 328
Figure 12-2. Post-Session Email Properties . . . 332
Figure 12-3. Suspension Email . . . 339
Figure 12-4. Email Task in a Workflow . . . 341
Figure 12-5. Using Post-Session Commands to Generate Reports . . . 342
Figure 12-6. Using Email Variables to Attach Reports . . . 343
Figure 12-7. Sending Email without Microsoft Outlook . . . 343
Figure 13-1. Default Partition Points and Stages in a Sample Mapping . . . 347
Figure 13-2. Threads Created for a Sample Mapping with Three Partitions . . . 348
Figure 13-3. Sample Mapping . . . 349
Figure 13-4. Session Properties Partitions View on the Mapping Tab . . . 351
Figure 13-5. Edit Partition Point Dialog Box . . . 352
Figure 13-6. Sample Mapping Showing Valid Partition Points . . . 354
Figure 13-7. Mapping where Round-robin Partitioning Can Increase Performance . . . 360
Figure 13-8. Mapping where Hash Partitioning Can Increase Performance . . . 361
Figure 13-9. Edit Partition Key Dialog Box . . . 362
Figure 13-10. Mapping where Key Range Partitioning Can Increase Performance . . . 363
Figure 13-11. Edit Partition Key Dialog Box . . . 364
Figure 13-12. Adding Key Ranges . . . 365
Figure 13-13. Mapping where Pass-through Partitioning Can Increase Performance . . . 367
Figure 13-14. Overriding the SQL Query and Entering a Filter Condition . . . 371
Figure 13-15. Properties Settings for Relational Targets in the Session Properties . . . 378
Figure 13-16. Connections Settings for File Targets in the Session Properties . . . 381
Figure 13-17. Properties Settings for File Targets in the Session Properties . . . 382
Figure 13-18. Sorted File Data with 1:n Partitions . . . 386
Figure 13-19. Sorted File Data Passed Through a Single Partition . . . 387
Figure 13-20. Sorted Relational Data with 1:n Partitioning . . . 388
Figure 13-21. Sorted Relational Data Passed Through a Single Partition . . . 389
Figure 13-22. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order . . . 390
Figure 13-23. Session Properties - Configuring Sorter Transformations . . . 393
Figure 14-1. Workflow Monitor . . . 403
Figure 14-2. Workflow Monitor Statistics Dialog Box . . . 408
Figure 14-3. General Tab for Workflow Monitor Options . . . 410
Figure 14-4. Gantt Chart Options . . . 411
Figure 14-5. Task View Options . . . 412
Figure 14-6. Advanced Tab for Workflow Monitor Options . . . 413
Figure 14-7. Standard Toolbar . . . 415
Figure 14-8. Server Toolbar . . . 415
Figure 14-9. View Toolbar . . . 415
Figure 14-10. Filter Toolbar . . . 415
Figure 14-11. History Names Dialog Box . . . 420
Figure 14-12. Gantt Chart View . . . 423
Figure 14-13. Organizing Gantt Chart . . . 426
Figure 14-14. Zooming the Gantt Chart View . . . 427
Figure 14-15. Task View . . . 431
Figure 14-16. Session Properties Transformation Statistics . . . 434
Figure 15-1. Distributing Sessions in a Server Grid . . . 446
Figure 15-2. Running a Non-session Task on the Master Server . . . 447
Figure 16-1. Properties Settings on the Mapping Tab . . . 477
Figure 18-1. Using $PMSessionLogFile as the Name of the Session Log . . . 497
Figure 18-2. Using Parameters to Change the Session Source File . . . 502
Figure 18-3. Using Parameters to Change the Session Target File . . . 504
Figure 18-4. Using Parameters to Change the Session Lookup File . . . 506
Figure 18-5. Using Parameters to Change the Reject File Name . . . 508
Figure 20-1. Control File Editor Dialog Box for Teradata . . . 539
Figure 20-2. Writers Settings on the Mapping Tab . . . 553
Figure 20-3. Properties Settings on the Mapping Tab . . . 554
Figure 20-4. Connections Settings on the Mapping Tab . . . 556
Figure 22-1. Incremental Aggregation Session Properties . . . 580
Figure 25-1. Single-Pass Reading . . . 648
Figure A-1. General Tab . . . 668
Figure A-2. Properties Tab - General Options Settings . . . 670
Figure A-3. Properties Tab - Performance Settings . . . 673
Figure A-4. Config Object Tab - Advanced Settings . . . 676
Figure A-5. Config Object Tab - Log Option Settings . . . 677
Figure A-6. Config Object Tab - Error Handling Settings . . . 679
Figure A-7. Mapping Tab - Connections Settings . . . 681
Figure A-8. Mapping Tab - Sources Node - Readers Settings . . . 684
Figure A-9. Mapping Tab - Sources Node - Connections Settings . . . 685
Figure A-10. Mapping Tab - Sources Node - Properties Settings . . . 686
Figure A-11. Flat Files Dialog Box for Sources
Figure A-12. Fixed Width Properties
Figure A-13. Delimited Properties for File Sources
Figure A-14. Mapping Tab - Targets Node - Writers Settings
Figure A-15. Mapping Tab - Targets Node - Connections Settings
Figure A-16. Mapping Tab - Targets Node - Properties Settings (Relational)
Figure A-17. Mapping Tab - Targets Node - File Properties Settings
Figure A-18. Flat Files Dialog Box for Targets
Figure A-19. Fixed-Width Properties for File Targets
Figure A-20. Delimited Properties for File Targets
Figure A-21. Mapping Tab - Transformations Node
Figure A-22. Mapping Tab - Partitions Properties Node
Figure A-23. Mapping Tab - KeyRange Node
Figure A-24. Mapping Tab - Partition Points Node
Figure A-25. Edit Partition Point Dialog Box
Figure A-26. Edit Partition Key Dialog Box
Figure A-27. Components Tab
Figure A-28. Task Browser
Figure A-29. Edit Pre-Session Command Dialog Box
Figure A-30. Email Object Browser
Figure A-31. On-Success or On-Failure Email - General Tab
Figure A-32. On-Success or On-Failure Email - Properties Tab
Figure A-33. Metadata Extensions Tab
Figure B-1. Workflow Properties - General Tab
Figure B-2. Workflow Properties - Properties Tab
Figure B-3. Workflow Properties - Scheduler Tab
Figure B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box
Figure B-5. Workflow Properties - Customized Repeat Dialog Box
Figure B-6. Workflow Properties - Variables Tab
Figure B-7. Workflow Properties - Events Tab
Figure B-8. Workflow Properties - Metadata Extensions Tab
Figure C-1. Server Manager General Tab
Figure C-2. Server Manager Source Options Dialog Box for File Sources
Figure C-3. Server Manager Fixed-Width Properties Dialog Box
Figure C-4. Server Manager Delimited File Properties Dialog Box
Figure C-5. Server Manager Source Options Dialog Box (XML Sources)
Figure C-6. Server Manager FTP Properties Dialog Box
Figure C-7. Server Manager Targets Dialog Box
Figure C-8. Server Manager Output Files Dialog Box
Figure C-9. Server Manager External Loader Properties
Figure C-10. Server Manager Fixed-Width Dialog Box (Output Files)
Figure C-11. Server Manager Delimited File Properties Dialog Box (Output Files)
Figure C-12. Server Manager XML Target Dialog Box

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

688 689 690 693 694 696 699 701 702 702 704 705 706 707 708 709 710 712 713 715 716 717 718 722 724 726 727 729 731 732 733 737 739 740 741 741 742 744 745 747 747 748 748

List of Figures

xxix

Figure C-13. Server Manager Reject File Dialog Box
Figure C-14. Server Manager Pre-Session Commands Dialog Box
Figure C-15. Server Manager Post-Session Commands and Email
Figure C-16. Server Manager Configuration Parameter Dialog Box
Figure C-17. Server Manager Source Location Tab
Figure C-18. Server Manager Time Tab
Figure C-19. Server Manager Repeat Dialog Box
Figure C-20. Server Manager Log and Error Handling Tab
Figure C-21. Server Manager Transformations Tab

List of Tables
Table 1-1. PowerCenter Server Connectivity Requirements
Table 1-2. Processing Threads
Table 2-1. Workflow Manager General Options
Table 2-2. Workflow Manager Format Options
Table 2-3. Workflow Manager Miscellaneous Options
Table 2-4. Default Permissions for Connection Objects
Table 2-5. Server Variables
Table 2-6. TCP/IP Settings to Register a Server
Table 2-7. Native Connect String Syntax
Table 2-8. Source and Target Code Page Compatibility
Table 2-9. Relational Database Connection Information
Table 2-10. Relational Database Connection Attributes
Table 3-1. Metadata Extension Attributes in the Workflow Manager
Table 3-2. Workflow Manager Keyboard Shortcuts
Table 3-3. Keyboard Shortcuts for Navigating the Workspace
Table 4-1. Task-Specific Workflow Variables
Table 4-2. Datatype Default Values for User-defined Workflow Variables
Table 4-3. Schedule Tab Settings
Table 4-4. Repeat Dialog Box Options
Table 5-1. Workflow Tasks
Table 5-2. Timer Task Attributes
Table 7-1. Apply All Options
Table 7-2. PowerCenter Server Behavior for Failed Sessions
Table 8-1. Treat Source Rows As Options
Table 8-2. Flat File Source Properties
Table 8-3. Fixed-Width File Properties for File Sources
Table 8-4. Delimited File Properties for File Sources
Table 8-5. Support for ASCII and Unicode Data Movement Modes
Table 8-6. Null Character Handling
Table 9-1. Support for ASCII and Unicode Data Movement Modes
Table 9-2. Relational Target Properties
Table 9-3. Test Load Options
Table 9-4. PowerCenter Server Commands on Supported Databases
Table 9-5. Flat File Target Properties
Table 9-6. Test Load Options
Table 9-7. Writing to a Fixed-Width Target
Table 9-8. Delimited File Properties
Table 9-9. Datatype Modifications for File Target Columns
Table 9-10. Field Length Measurements for Fixed-Width Flat File Targets
Table 9-11. Characters to Include when Calculating Field Length for Fixed-Width Targets
Table 10-1. Transformation Scope Property Values
Table 10-2. Session Commit Properties
Table 11-1. PM_RECOVERY Table Definition
Table 11-2. PM_TGT_RUN_ID Table Definition
Table 11-3. pmcmd Return Codes for Recovery
Table 11-4. Transformations that Output Repeatable Data
Table 12-1. Email Variables for Post-Session Email
Table 12-2. Format Tags for Email Tasks
Table 13-1. Default Partition Points
Table 13-2. Options on Session Properties Partitions View on the Mapping Tab
Table 13-3. Edit Partition Point Dialog Box Options
Table 13-4. Valid Partition Types for Partition Points
Table 13-5. File Properties Settings for File Sources
Table 13-6. Configuring Source File Name for Single-Threaded Reading
Table 13-7. Configuring Source File Name for Multi-Threaded Reading
Table 13-8. Partitioning Relational Target Attributes
Table 13-9. File Targets Connection Options
Table 13-10. Target File Properties
Table 13-11. Variable Value Calculations with Partitioned Sessions
Table 13-12. Restrictions on the Number of Partitions for Transformations
Table 13-13. Partitioning Guidelines for Informatica Application Products
Table 14-1. Workflow Monitor General Options
Table 14-2. Gantt Chart Options
Table 14-3. Advanced Workflow Monitor Options
Table 14-4. Workflow and Task Status
Table 14-5. Session Details on the Transformation Statistics Tab
Table 14-6. Performance Counters
Table 15-1. Losing Connectivity in a Server Grid
Table 15-2. Override Workflow Properties
Table 15-3. Override Server Grid Properties
Table 16-1. Log File Default Locations
Table 16-2. Workflow Log Codes
Table 16-3. Session Log Codes
Table 16-4. Session Log Tracing Levels
Table 16-5. Row Indicators in Reject File
Table 16-6. Column Indicators in Reject File
Table 17-1. PMERR_DATA Table Schema
Table 17-2. PMERR_MSG Table Schema
Table 17-3. PMERR_SESS Table Schema
Table 17-4. PMERR_TRANS Table Schema
Table 17-5. Error Log File Column Headers
Table 17-6. Error Log Options
Table 18-1. Naming Conventions for User-Defined Session Parameters
Table 19-1. Parameters and Variables in Parameter File
Table 19-2. Naming Conventions for User-Defined Session Parameters
Table 20-1. Partitioning Guidelines for External Loaders
Table 20-2. DB2 EE External Loader Attributes
Table 20-3. DB2 EE External Loader Return Codes
Table 20-4. DB2 EEE External Loader Attributes
Table 20-5. Oracle External Loader Attributes
Table 20-6. Sybase IQ External Loader Attributes
Table 20-7. Teradata MultiLoad External Loader Attributes
Table 20-8. Teradata MultiLoad External Loader Attributes Defined at the Session Level
Table 20-9. Teradata TPump External Loader Attributes
Table 20-10. Teradata TPump External Loader Attributes Defined at the Session Level
Table 20-11. Teradata FastLoad External Loader Attributes
Table 20-12. Teradata FastLoad External Loader Attributes Defined at the Session Level
Table 20-13. Teradata Warehouse Builder Operators and Protocol
Table 20-14. Teradata Warehouse Builder External Loader Attributes
Table 20-15. Teradata Warehouse Builder External Loader Attributes Defined at the Session Level
Table 20-16. Properties Settings
Table 21-1. FTP Options
Table 23-1. pmcmd Commands
Table 23-2. Connection Information for the Command Line Mode
Table 23-3. pmcmd Return Codes
Table 23-4. Setting Defaults for the Interactive Mode
Table 23-5. Command Parameters
Table 23-6. pmcmd Syntax Notation
Table 24-1. Caching Storage Overview
Table 24-2. Cache File Names
Table 24-3. Aggregate Cache Calculation
Table 24-7. Column Sizes for Cache Calculations
Table 24-4. Rank Cache Calculation
Table 24-5. Joiner Cache Calculation
Table 24-6. Lookup Cache Calculation
Table 25-1. Session Tuning Parameters
Table A-1. General Tab
Table A-2. Properties Tab - General Options Settings
Table A-3. Properties Tab - Performance Settings
Table A-4. Config Object Tab - Advanced Settings
Table A-5. Config Object Tab - Log Options Settings
Table A-6. Config Object Tab - Error Handling Settings
Table A-7. Mapping Tab - Connections Settings
Table A-8. Mapping Tab - Sources Node - Connections Settings
Table A-9. Mapping Tab - Sources Node - Properties Settings (Relational Sources)
Table A-10. Mapping Tab - Sources Node - Properties Settings (File Sources)
Table A-11. Fixed-Width Properties for File Sources
Table A-12. Delimited Properties for File Sources
Table A-13. Mapping Tab - Targets Node - Writers Settings
Table A-14. Mapping Tab - Targets Node - Connections Settings
Table A-15. Mapping Tab - Targets Node - Properties Settings (Relational)
Table A-16. Mapping Tab - Targets Node - File Properties Settings
Table A-17. Fixed-Width Properties for File Targets
Table A-18. Delimited Properties for File Targets
Table A-19. Mapping Tab - Partition Points Node
Table A-20. Edit Partition Point Dialog Box Options
Table A-21. Components Tab
Table A-22. Components Tab Tasks
Table A-23. Pre- or Post-Session Commands - General Tab
Table A-24. Pre- or Post-Session Commands - Properties Tab
Table A-25. Pre- or Post-Session Commands - Commands Tab
Table A-26. On-Success or On-Failure Emails - General Tab
Table A-27. On-Success or On-Failure Emails - Properties Tab
Table A-28. Metadata Extensions Tab
Table B-1. Workflow Properties - General Tab
Table B-2. Workflow Properties - Properties Tab
Table B-3. Workflow Properties - Scheduler Tab
Table B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box
Table B-5. Workflow Properties - Repeat Dialog Box Options
Table B-6. Workflow Properties - Variables Tab
Table B-7. Workflow Properties - Events Tab
Table B-8. Workflow Properties - Metadata Extensions Tab
Table C-1. General Session Options Comparison
Table C-2. Source Options Comparison
Table C-3. File Source Options Comparison
Table C-4. XML Sources Options Comparison
Table C-5. FTP Properties Comparison
Table C-6. Target Options Comparison
Table C-7. Relational Target Options Comparison
Table C-8. File Target Output Options Comparison
Table C-9. XML Target Options Comparison
Table C-10. Reject Files Options Comparison
Table C-11. Pre-Session Commands Comparison
Table C-12. Post-Session Commands and Email Comparison
Table C-13. Performance Options Comparison
Table C-14. Configuration Parameters Comparison
Table C-15. Log File Options Comparison
Table C-16. Error Handling Options Comparison
Table C-17. Transformations Tab Options Comparison

Preface
Welcome to PowerCenter, Informatica’s software product that delivers an open, scalable data integration solution addressing the complete life cycle for all data integration projects including data warehouses and data marts, data migration, data synchronization, and information hubs. PowerCenter combines the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable, and efficient manner. The PowerCenter metadata repository coordinates and drives a variety of core functions, including extracting, transforming, loading, and managing data. The PowerCenter Server can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads. PowerCenter can simplify and accelerate the process of moving data warehouses from development to test to production.

New Features and Enhancements
This section describes new features and enhancements to PowerCenter 7.1.1, 7.1, and 7.0.

PowerCenter 7.1.1
This section describes new features and enhancements to PowerCenter 7.1.1.

Data Profiling

♦ Data sampling. You can create a data profile for a sample of source data instead of the entire source. You can view a profile from a random sample of data, a specified percentage of data, or a specified number of rows starting with the first row.
♦ Verbose data enhancements. You can specify the type of verbose data you want the PowerCenter Server to write to the Data Profiling warehouse. The PowerCenter Server can write all rows, the rows that meet the business rule, or the rows that do not meet the business rule.
♦ Session enhancement. You can save sessions that you create from the Profile Manager to the repository.
♦ Domain Inference function tuning. You can configure the Data Profiling Wizard to filter the Domain Inference function results. You can configure a maximum number of patterns and a minimum pattern frequency. You may want to narrow the scope of patterns returned to view only the primary domains, or you may want to widen the scope of patterns returned to view exception data.
♦ Row Uniqueness function. You can determine unique rows for a source based on a selection of columns for the specified source.
♦ Define mapping, session, and workflow prefixes. You can define default mapping, session, and workflow prefixes for the mappings, sessions, and workflows generated when you create a data profile.
♦ Profile mapping display in the Designer. The Designer displays profile mappings under a profile mappings node in the Navigator.
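The guide does not say how the Domain Inference function represents a pattern, so the following is only a minimal sketch of the filtering idea under stated assumptions. It assumes a hypothetical masking scheme (letters become `X`, digits become `9`) and two hypothetical parameters, `max_patterns` and `min_frequency`, standing in for the wizard's maximum number of patterns and minimum pattern frequency. None of these names come from PowerCenter.

```python
"""Hypothetical sketch of filtering Domain Inference results; not PowerCenter code."""
from collections import Counter


def to_pattern(value):
    """Assumed masking scheme: letters -> 'X', digits -> '9', other characters kept."""
    return "".join("X" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value)


def infer_patterns(values, max_patterns, min_frequency):
    """Return (pattern, frequency) pairs, most frequent first.

    min_frequency (0.0 to 1.0) drops patterns seen in too small a share of values;
    max_patterns caps how many of the remaining patterns are returned.
    """
    if not values:
        return []
    counts = Counter(to_pattern(v) for v in values)
    total = len(values)
    # Sort by descending count, then by pattern text so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [(pattern, n / total) for pattern, n in ranked if n / total >= min_frequency]
    return kept[:max_patterns]
```

With a high `min_frequency` only the dominant pattern survives (the "primary domain"); lowering it to 0.0 also returns rare patterns such as `XXX`, which is the kind of exception data the wider setting is meant to expose.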

PowerCenter Server
♦ Code page. PowerCenter supports additional Japanese language code pages, such as JIPSE-kana, JEF-kana, and MELCOM-kana.
♦ Flat file partitioning. When you create multiple partitions for a flat file source session, you can configure the session to create multiple threads to read the flat file source.
♦ pmcmd. You can use parameter files that reside on a local machine with the Startworkflow command in the pmcmd program. When you use a local parameter file, pmcmd passes variables and values in the file to the PowerCenter Server.
♦ SuSE Linux support. The PowerCenter Server runs on SuSE Linux. On SuSE Linux, you can connect to IBM DB2, Oracle, and Sybase sources, targets, and repositories using native drivers. Use ODBC drivers to access other sources and targets.
♦ Reserved word support. If any source, target, or lookup table name or column name contains a database reserved word, you can create and maintain a file, reswords.txt, containing reserved words. When the PowerCenter Server initializes a session, it searches for reswords.txt in the PowerCenter Server installation directory. If the file exists, the PowerCenter Server places quotes around matching reserved words when it executes SQL against the database.
♦ Teradata external loader. When you load to Teradata using an external loader, you can now override the control file. Depending on the loader you use, you can also override the error, log, and work table names by specifying different tables on the same or a different Teradata database.
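To illustrate the reserved word behavior described above, here is a minimal sketch of the quoting step, not the PowerCenter Server's implementation. It assumes a simplified word list with one reserved word per line (the actual reswords.txt layout may differ) and assumes double quotes as the SQL delimiter. The function names are hypothetical.

```python
"""Hypothetical model of reserved-word quoting; not PowerCenter code."""


def load_reserved_words(lines):
    """Read reserved words, assumed one per line; blank lines are skipped."""
    return {line.strip().upper() for line in lines if line.strip()}


def quote_identifier(name, reserved):
    """Wrap a table or column name in double quotes if it matches a reserved word."""
    return f'"{name}"' if name.upper() in reserved else name


def build_select(table, columns, reserved):
    """Build a SELECT statement, quoting any reserved table or column names."""
    cols = ", ".join(quote_identifier(c, reserved) for c in columns)
    return f"SELECT {cols} FROM {quote_identifier(table, reserved)}"


if __name__ == "__main__":
    words = load_reserved_words(["DATE", "order", "", "USER"])
    # prints: SELECT ID, "DATE", AMOUNT FROM "ORDER"
    print(build_select("ORDER", ["ID", "DATE", "AMOUNT"], words))
```

Matching is case-insensitive here because most databases treat unquoted identifiers that way; whether the PowerCenter Server matches case-insensitively is an assumption of this sketch.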

Repository

Exchange metadata with other tools. You can exchange source and target metadata with other BI or data modeling tools, such as Business Objects Designer. You can export or import multiple objects at a time. When you export metadata, the PowerCenter Client creates a file format recognized by the target tool.

Repository Server

♦ pmrep. You can use pmrep to perform the following functions:
  − Remove repositories from the Repository Server cache entry list.
  − Enable enhanced security when you create a relational source or target connection in the repository.
  − Update a connection attribute value when you update the connection.
♦ SuSE Linux support. The Repository Server runs on SuSE Linux. On SuSE Linux, you can connect to IBM DB2, Oracle, and Sybase repositories.

Security

Oracle OS Authentication. You can now use Oracle OS Authentication to authenticate database users. Oracle OS Authentication allows you to log on to an Oracle database if you have a logon to the operating system. You do not need to know a database user name and password. PowerCenter uses Oracle OS Authentication when the user name for an Oracle connection is PmNullUser.

Web Services Provider

Attachment support. When you import web service definitions with attachment groups, you can pass attachments through the requests or responses in a service session. The document type you can attach is based on the MIME content of the WSDL file. You can attach document types such as XML, JPEG, GIF, or PDF.

Pipeline partitioning. You can create multiple partitions in a session containing web service source and target definitions. The PowerCenter Server creates a connection to the Web Services Hub based on the number of sources, targets, and partitions in the session.

XML

Multi-level pivoting. You can now pivot more than one multiple-occurring element in an XML view. You can also pivot the view row.

PowerCenter 7.1
This section describes new features and enhancements to PowerCenter 7.1.

Data Profiling
♦ Data Profiling for VSAM sources. You can now create a data profile for VSAM sources.
♦ Support for verbose mode for source-level functions. You can now create data profiles with source-level functions and write data to the Data Profiling warehouse in verbose mode.
♦ Aggregator function in auto profiles. Auto profiles now include the Aggregator function.
♦ Creating auto profile enhancements. You can now select the columns or groups you want to include in an auto profile and enable verbose mode for the Distinct Value Count function.
♦ Purging data from the Data Profiling warehouse. You can now purge data from the Data Profiling warehouse.
♦ Source View in the Profile Manager. You can now view data profiles by source definition in the Profile Manager.
♦ PowerCenter Data Profiling report enhancements. You can now view PowerCenter Data Profiling reports in a separate browser window, resize columns in a report, and view verbose data for Distinct Value Count functions.
♦ Prepackaged domains. Informatica provides a set of prepackaged domains that you can include in a Domain Validation function in a data profile.

Documentation
♦ Web Services Provider Guide. This is a new book that describes the functionality of Real-time Web Services. It also includes information from the version 7.0 Web Services Hub Guide.
♦ XML User Guide. This book consolidates XML information previously documented in the Designer Guide, Workflow Administration Guide, and Transformation Guide.

Licensing
Informatica provides licenses for each CPU and each repository rather than for each installation. Informatica provides licenses for product, connectivity, and options. You store the license keys in a license key file. You can manage the license files using the Repository Server Administration Console, the PowerCenter Server Setup, and the command line program, pmlic.

PowerCenter Server
♦ 64-bit support. You can now run 64-bit PowerCenter Servers on AIX and HP-UX (Itanium).
♦ Partitioning enhancements. If you have the Partitioning option, you can define up to 64 partitions at any partition point in a pipeline that supports multiple partitions.
♦ PowerCenter Server processing enhancements. The PowerCenter Server now reads a block of rows at a time. This improves processing performance for most sessions.
♦ CLOB/BLOB datatype support. You can now read and write CLOB/BLOB datatypes.

PowerCenter Metadata Reporter
PowerCenter Metadata Reporter modified some report names and uses the PowerCenter 7.1 MX views in its schema.

Repository Server

♦ Updating repository statistics. PowerCenter now identifies and updates statistics for all repository tables and indexes when you copy, upgrade, or restore repositories. This improves performance when PowerCenter accesses the repository.
♦ Increased repository performance. You can increase repository performance by skipping information when you copy, back up, or restore a repository. You can choose to skip MX data, workflow and session log history, and deployment group history.
♦ pmrep. You can use pmrep to back up, disable, or enable a repository, delete a relational connection from a repository, delete repository details, truncate log files, and run multiple pmrep commands sequentially. You can also use pmrep to create, modify, and delete a folder.

Repository

♦ Exchange metadata with business intelligence tools. You can export metadata to and import metadata from other business intelligence tools, such as Cognos ReportNet and Business Objects.
♦ Object import and export enhancements. You can compare objects in an XML file to objects in the target repository when you import objects.
♦ MX views. New MX views help you analyze metadata stored in the repository. The REP_SERVER_NET and REP_SERVER_NET_REF views allow you to see information about server grids. The REP_VERSION_PROPS view allows you to see the version history of all objects in a PowerCenter repository.


Transformations

♦ Flat file lookup. You can now perform lookups on flat files. When you create a Lookup transformation using a flat file as a lookup source, the Designer invokes the Flat File Wizard. You can also use a lookup file parameter if you want to change the name or location of a lookup between session runs.
♦ Dynamic lookup cache enhancements. When you use a dynamic lookup cache, the PowerCenter Server can ignore some ports when it compares values in lookup and input ports before it updates a row in the cache. Also, you can choose whether the PowerCenter Server outputs old or new values from the lookup/output ports when it updates a row. You might want to output old values from lookup/output ports when you use the Lookup transformation in a mapping that updates slowly changing dimension tables.
♦ Union transformation. You can use the Union transformation to merge multiple sources into a single pipeline. The Union transformation is similar to using the UNION ALL SQL statement to combine the results from two or more SQL statements.
♦ Custom transformation API enhancements. The Custom transformation API includes new array-based functions that allow you to create procedure code that receives and outputs a block of rows at a time. Use these functions to take advantage of the PowerCenter Server processing enhancements.
♦ Midstream XML transformations. You can now create an XML Parser transformation or an XML Generator transformation to parse or generate XML inside a pipeline. The XML transformations enable you to extract XML data stored in relational tables, such as data stored in a CLOB column. You can also extract data from messaging systems, such as TIBCO or IBM MQSeries.
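The dynamic lookup cache behavior described above can be pictured with a small in-memory model. The following Python sketch is hypothetical. The class name `DynamicLookupCache`, the `ignore_ports` and `output_old_values` parameters, and the insert/update/unchanged statuses are illustrative assumptions, not PowerCenter settings or APIs. It shows how excluding some ports from the comparison avoids needless cache updates, and how returning the pre-update row supports slowly changing dimension logic.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicLookupCache:
    """Toy in-memory model of a dynamic lookup cache (hypothetical names)."""

    key_ports: tuple                       # ports that identify a cached row
    ignore_ports: frozenset = frozenset()  # ports skipped in the comparison
    output_old_values: bool = False        # return pre-update values on update
    rows: dict = field(default_factory=dict)

    def process(self, row: dict) -> tuple:
        """Insert, update, or leave a row unchanged; return (status, output row)."""
        key = tuple(row[p] for p in self.key_ports)
        cached = self.rows.get(key)
        if cached is None:
            self.rows[key] = dict(row)
            return "insert", dict(row)
        # Compare only non-key ports that are not listed in ignore_ports.
        changed = any(
            cached.get(port) != value
            for port, value in row.items()
            if port not in self.key_ports and port not in self.ignore_ports
        )
        if not changed:
            return "unchanged", dict(cached)
        old = dict(cached)
        cached.update(row)
        return "update", old if self.output_old_values else dict(cached)
```

For example, with `ignore_ports=frozenset({"last_seen"})`, a row that differs from the cache only in `last_seen` is reported as unchanged. A row with a new `city` value updates the cache, and it outputs the old `city` when `output_old_values=True`.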

Usability
♦ Viewing active folders. The Designer and the Workflow Manager highlight the active folder in the Navigator.
♦ Enhanced printing. The quality of the printed workspace has improved.

Version Control
You can run object queries that return shortcut objects. You can also run object queries based on the latest status of an object. The query can return local objects that are checked out, the latest version of checked in objects, or a collection of all older versions of objects.

Web Services Provider

♦ Real-time Web Services. Real-time Web Services allows you to create services using the Workflow Manager and make them available to web service clients through the Web Services Hub. The PowerCenter Server can perform parallel processing of both request-response and one-way services.
♦ Web Services Hub. The Web Services Hub now hosts Real-time Web Services in addition to Metadata Web Services and Batch Web Services. You can install the Web Services Hub on a JBoss application server.

Note: PowerCenter Connect for Web Services allows you to create sources, targets, and transformations to call web services hosted by other providers. For more information, see the PowerCenter Connect for Web Services User and Administrator Guide.

Workflow Monitor
The Workflow Monitor includes the following performance and usability enhancements:
♦ When you connect to the PowerCenter Server, you no longer distinguish between online and offline mode.
♦ You can open multiple instances of the Workflow Monitor on one machine.
♦ You can simultaneously monitor multiple PowerCenter Servers registered to the same repository.
♦ The Workflow Monitor includes improved options for filtering tasks by start and end time.
♦ The Workflow Monitor displays workflow runs in Task view chronologically, with the most recent run at the top. It displays folders alphabetically.
♦ You can remove the Navigator and Output window.

XML Support
PowerCenter XML support now includes the following features:
♦ Enhanced datatype support. You can use XML schemas that contain simple and complex datatypes.
♦ Additional options for XML definitions. When you import XML definitions, you can choose how you want the Designer to represent the metadata associated with the imported files. You can choose to generate XML views using hierarchy or entity relationships. In a view with hierarchy relationships, the Designer expands each element and reference under its parent element. When you create views with entity relationships, the Designer creates separate entities for references and multiple-occurring elements.
♦ Synchronizing XML definitions. You can synchronize one or more XML definitions when the underlying schema changes. You can synchronize an XML definition with any repository definition or file used to create the XML definition, including relational sources or targets, XML files, DTD files, or schema files.
♦ XML workspace. You can edit XML views and relationships between views in the workspace. You can create views, add or delete columns from views, and define relationships between views.
♦ Midstream XML transformations. You can now create an XML Parser transformation or an XML Generator transformation to parse or generate XML inside a pipeline. The XML transformations enable you to extract XML data stored in relational tables, such as data stored in a CLOB column. You can also extract data from messaging systems, such as TIBCO or IBM MQSeries.
♦ Support for circular references. Circular references occur when an element is a direct or indirect child of itself. PowerCenter now supports XML files, DTD files, and XML schemas that use circular definitions.
♦ Increased performance for large XML targets. You can create XML files of several gigabytes in a PowerCenter 7.1 XML session by using the following enhancements:
   − Spill to disk. You can specify the size of the cache used to store the XML tree. If the size of the tree exceeds the cache size, the XML data spills to disk to free up memory.
   − User-defined commits. You can define commits to trigger flushes for XML target files.
   − Support for multiple XML output files. You can output XML data to multiple XML targets. You can also define the file names for XML output files in the mapping.
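The spill-to-disk idea can be sketched generically. The following Python example is a simplified, hypothetical model and not how the PowerCenter Server stores its XML tree. The class `SpillingCache` and the item-count limit `max_items` are assumptions that stand in for a cache size in bytes. When the in-memory cache exceeds its limit, the cache writes its contents to a temporary file and frees the memory. Iteration then returns all items in insertion order.

```python
import os
import pickle
import tempfile


class SpillingCache:
    """Toy cache that keeps at most `max_items` in memory and spills the rest to disk."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items        # hypothetical stand-in for a cache size in bytes
        self.memory: list = []
        self.spilled = 0                  # number of items written to the spill file
        self.spill_file = tempfile.TemporaryFile()

    def add(self, item) -> None:
        self.memory.append(item)
        if len(self.memory) > self.max_items:
            # Cache is full: append everything in memory to disk, then free memory.
            self.spill_file.seek(0, os.SEEK_END)
            for old in self.memory:
                pickle.dump(old, self.spill_file)
            self.spilled += len(self.memory)
            self.memory.clear()

    def __iter__(self):
        """Yield spilled items first, then in-memory items, in insertion order."""
        self.spill_file.seek(0)
        for _ in range(self.spilled):
            yield pickle.load(self.spill_file)
        yield from self.memory

    def close(self) -> None:
        self.spill_file.close()
```

Memory use stays bounded by `max_items`. The trade-off is disk I/O when the spilled data is read back.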

PowerCenter 7.0
This section describes new features and enhancements to PowerCenter 7.0.

Data Profiling
If you have the Data Profiling option, you can profile source data to evaluate it and detect patterns and exceptions. For example, you can determine implicit datatypes, suggest candidate keys, detect data patterns, and evaluate join criteria. After you create a profiling warehouse, you can create profiling mappings and run sessions. Then you can view reports based on the profile data in the profiling warehouse. The PowerCenter Client provides a Profile Manager and a Profile Wizard to complete these tasks.
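To show what "suggest candidate keys" can mean, here is a minimal Python sketch of one simple heuristic. The heuristic treats a column, or a small combination of columns, as a candidate key if its values are non-null and unique across the rows examined. This is an illustrative assumption, not the algorithm the Profile Wizard uses. The function name `candidate_keys` and the `max_width` parameter are hypothetical.

```python
from itertools import combinations


def candidate_keys(rows: list, max_width: int = 2) -> list:
    """Return minimal column combinations whose values are non-null and unique."""
    if not rows:
        return []
    columns = sorted(rows[0])
    found: list = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip supersets of keys already found so every result stays minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            values = [tuple(row.get(col) for col in combo) for row in rows]
            if all(None not in v for v in values) and len(set(values)) == len(values):
                found.append(combo)
    return found
```

For example, if `id` is unique and `email` contains a null, `candidate_keys` reports `("id",)` and skips any combination that already contains `id`. A real profile works on much larger samples, so a key found this way is only a suggestion.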

Data Integration Web Services
You can use Data Integration Web Services to write applications that communicate with the PowerCenter Server. Data Integration Web Services is a web-enabled version of the PowerCenter Server functionality available through the Load Manager and Metadata Exchange. It consists of two services for communicating with the PowerCenter Server, the Load Manager and Metadata Exchange Web Services, which run on the Web Services Hub.

Documentation
♦ Glossary. The Installation and Configuration Guide contains a glossary of new PowerCenter terms.
♦ Installation and Configuration Guide. The connectivity information in the Installation and Configuration Guide is consolidated into two chapters. This book now contains chapters titled “Connecting to Databases from Windows” and “Connecting to Databases from UNIX.”
♦ Upgrading metadata. The Installation and Configuration Guide now contains a chapter titled “Upgrading Repository Metadata.” This chapter describes changes to repository objects impacted by the upgrade process. The change in functionality for existing objects depends on the version of the existing objects. Consult the upgrade information in this chapter for each upgraded object to determine whether the upgrade applies to your current version of PowerCenter.

Functions

♦ Soundex. The Soundex function encodes a string value into a four-character string. SOUNDEX works for characters in the English alphabet (A-Z). It uses the first character of the input string as the first character in the return value and encodes the remaining three unique consonants as numbers.
♦ Metaphone. The Metaphone function encodes string values. You can specify the length of the string that you want to encode. METAPHONE encodes characters of the English language alphabet (A-Z). It encodes both uppercase and lowercase letters in uppercase.
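The SOUNDEX description matches the classic American Soundex scheme. The function keeps the first letter, maps later consonants to digits, and pads the result to four characters. The following Python sketch implements that classic scheme for illustration only. It is not PowerCenter's implementation, and the SOUNDEX function may handle edge cases differently, such as the H/W rule or input with no letters.

```python
# Classic American Soundex digit groups; vowels, H, W, and Y have no digit.
_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def soundex(value: str) -> str:
    """Return a four-character code: first letter plus three digits, zero-padded."""
    letters = [ch for ch in value.upper() if "A" <= ch <= "Z"]  # A-Z only
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Adjacent letters with the same digit are encoded only once.
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        # H and W do not separate letters with the same digit; vowels do.
        if ch not in "HW":
            prev = digit
    return code.ljust(4, "0")
```

Under these rules, "Robert" and "Rupert" both encode to R163, and "Lee" pads to L000.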

Installation

Remote PowerCenter Client installation. You can create a control file containing installation information, and distribute it to other users to install the PowerCenter Client. You access the Informatica installation CD from the command line to create the control file and install the product.

PowerCenter Metadata Reporter
PowerCenter Metadata Reporter replaces Runtime Metadata Reporter and Informatica Metadata Reporter. PowerCenter Metadata Reporter includes the following features:

♦ Metadata browsing. You can use PowerCenter Metadata Reporter to browse PowerCenter 7.0 metadata, such as workflows, worklets, mappings, source and target tables, and transformations.
♦ Metadata analysis. You can use PowerCenter Metadata Reporter to analyze operational metadata, including session load time, server load, session completion status, session errors, and warehouse growth.

PowerCenter Server
♦ DB2 bulk loading. You can enable bulk loading when you load to IBM DB2 8.1.
♦ Distributed processing. If you purchase the Server Grid option, you can group PowerCenter Servers registered to the same repository into a server grid. In a server grid, PowerCenter Servers balance the workload among all the servers in the grid.
♦ Row error logging. The session configuration object has new properties that allow you to define error logging. You can choose to log row errors in a central location to help understand the cause and source of errors.
♦ External loading enhancements. When using external loaders on Windows, you can now choose to load from a named pipe. When using external loaders on UNIX, you can now choose to load from staged files.
♦ External loading using Teradata Warehouse Builder. You can use Teradata Warehouse Builder to load to Teradata. You can choose to insert, update, upsert, or delete data. Additionally, Teradata Warehouse Builder can simultaneously read from multiple sources and load data into one or more tables.
♦ Mixed mode processing for Teradata external loaders. You can now use data driven load mode with Teradata external loaders. When you select data driven loading, the PowerCenter Server flags rows for insert, delete, or update. It writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the target.
♦ Concurrent processing. The PowerCenter Server now reads data concurrently from sources within a target load order group. This enables more efficient joins with minimal usage of memory and disk cache.
♦ Real-time processing enhancements. You can now use real-time processing in sessions that also process active transformations, such as the Aggregator transformation. You can apply the transformation logic to rows defined by transaction boundaries.
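Data driven loading depends on an extra column that records the update strategy for each row. The following Python sketch models that round trip. It writes a pipe-delimited staged file with a leading indicator column, then applies the rows to a key-value target as a loader would. The indicator codes `I`, `U`, and `D`, the column position, and the functions `write_data_driven` and `apply_staged` are hypothetical. They are not the format that the PowerCenter Server or the Teradata control file actually uses.

```python
import csv
import io

# Hypothetical indicator values; the real codes depend on the loader
# configuration and control file, not on this sketch.
INDICATORS = {"insert": "I", "update": "U", "delete": "D"}


def write_data_driven(rows, out) -> None:
    """Write (strategy, record) pairs, prefixing each record with an indicator column."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for strategy, record in rows:
        writer.writerow([INDICATORS[strategy], *record])


def apply_staged(text: str, target: dict) -> None:
    """Mimic a loader that reads the indicator column to decide how to load each row."""
    for indicator, key, value in csv.reader(io.StringIO(text), delimiter="|"):
        if indicator in ("I", "U"):
            target[key] = value        # insert or update (upsert semantics)
        elif indicator == "D":
            target.pop(key, None)      # delete the row if it exists
```

Writing `[("insert", [1, "Austin"]), ("delete", [2, "Dallas"])]` produces the lines `I|1|Austin` and `D|2|Dallas`. Applying them to a target that holds key `"2"` leaves only key `"1"`.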

Repository Server

♦ Object export and import enhancements. You can now export and import objects using the Repository Manager and pmrep. You can export and import multiple objects and object types. You can export and import objects with or without their dependent objects. You can also export objects from a query result or object history.
♦ pmrep commands. You can use pmrep to perform change management tasks, such as maintaining deployment groups and labels, checking in, deploying, importing, exporting, and listing objects. You can also use pmrep to run queries. The deployment and object import commands require you to use a control file to define options and resolve conflicts.
♦ Trusted connections. You can now use a Microsoft SQL Server trusted connection to connect to the repository.

Security

♦ LDAP user authentication. You can now use default repository user authentication or Lightweight Directory Access Protocol (LDAP) to authenticate users. If you use LDAP, the repository maintains an association between your repository user name and your external login name. When you log in to the repository, the security module passes your login name to the external directory for authentication. The repository maintains a status for each user. You can now allow or prevent user access to the repository by changing the status, so you do not have to delete user names from the repository.
♦ Use Repository Manager privilege. The Use Repository Manager privilege allows you to perform tasks in the Repository Manager, such as copying objects, maintaining labels, and changing object status. You can perform the same tasks in the Designer and Workflow Manager if you have the Use Designer and Use Workflow Manager privileges.
♦ Audit trail. You can track changes to repository users, groups, privileges, and permissions through the Repository Server Administration Console. The Repository Agent logs security changes to a log file stored in the Repository Server installation directory. The audit trail log contains information such as changes to folder properties, adding or removing a user or group, and adding or removing privileges.

Transformations

♦ Custom transformation. Custom transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. The Custom transformation replaces the Advanced External Procedure transformation. You can create Custom transformations with multiple input and output groups, and you can compile the procedure with any C compiler. You can create templates that customize the appearance and available properties of a Custom transformation you develop. You can specify the icons used for the transformation, the colors, and the properties a mapping developer can modify. When you create a Custom transformation template, distribute the template with the DLL or shared library you develop.
♦ Joiner transformation. You can use the Joiner transformation to join two data streams that originate from the same source.

Version Control
The PowerCenter Client and repository introduce features that allow you to create and manage multiple versions of objects in the repository. Version control allows you to maintain multiple versions of an object, control development on the object, track changes, and use deployment groups to copy specific groups of objects from one repository to another. Version control in PowerCenter includes the following features:

♦ Object versioning. Individual objects in the repository are now versioned. This allows you to store multiple copies of a given object during the development cycle. Each version is a separate object with unique properties.
♦ Check out and check in versioned objects. You can check out and reserve an object you want to edit, and check in the object when you are ready to create a new version of the object in the repository.
♦ Compare objects. The Repository Manager and Workflow Manager allow you to compare two repository objects of the same type to identify differences between them. You can compare Designer objects and Workflow Manager objects in the Repository Manager. You can compare tasks, sessions, worklets, and workflows in the Workflow Manager. The PowerCenter Client tools allow you to compare objects across open folders and repositories. You can also compare different versions of the same object.
♦ Delete or purge a version. You can delete an object from view and continue to store it in the repository. You can recover or undelete deleted objects. If you want to permanently remove an object version, you can purge it from the repository.
♦ Deployment. Unlike copying a folder, copying a deployment group allows you to copy a select number of objects from multiple folders in the source repository to multiple folders in the target repository. This gives you greater control over the specific objects copied from one repository to another.
♦ Deployment groups. You can create a deployment group that contains references to objects from multiple folders across the repository. You can create a static deployment group that you manually add objects to, or create a dynamic deployment group that uses a query to populate the group.
♦ Labels. A label is an object that you can apply to versioned objects in the repository. This allows you to associate multiple objects in groups defined by the label. You can use labels to track versioned objects during development, improve query results, and organize groups of objects for deployment or export and import.
♦ Queries. You can create a query that specifies conditions to search for objects in the repository. You can save queries for later use. You can make a private query, or you can share it with all users in the repository.
♦ Track changes to an object. You can view a history that includes all versions of an object and compare any version of the object in the history to any other version. This allows you to see the changes made to an object over time.

XML Support
PowerCenter contains XML features that allow you to validate an XML file against an XML schema, declare multiple namespaces, use XPath to locate XML nodes, increase performance for large XML files, format your XML file output for increased readability, and parse or generate XML data from various sources. XML support in PowerCenter includes the following features:

♦ XML schema. You can use an XML schema to validate an XML file and to generate source and target definitions. XML schemas allow you to declare multiple namespaces so you can use prefixes for elements and attributes. XML schemas also allow you to define some complex datatypes.
♦ XPath support. The XML wizard allows you to view the structure of an XML schema. You can use XPath to locate XML nodes.
♦ Increased performance for large XML files. When you process an XML file or stream, you can set commits and periodically flush XML data to the target instead of writing all the output at the end of the session. You can choose to append the data to the same target file or create a new target file after each flush.
♦ XML target enhancements. You can format the XML target file so that you can easily view the XML file in a text editor. You can also configure the PowerCenter Server not to output empty elements to the XML target.
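The commit-and-flush option for large XML files can be sketched as follows. This Python model is hypothetical. The function `write_with_commits`, the `commit_interval` and `new_file_per_commit` parameters, and the `<rows>`/`<row>` element names are illustrative assumptions, not PowerCenter session properties. Rows are buffered and flushed to the current target every `commit_interval` rows, instead of all at the end. The model can append to one target or start a new target after each flush. In-memory strings stand in for target files.

```python
import io
from xml.sax.saxutils import escape


def write_with_commits(records, commit_interval: int, new_file_per_commit: bool = False) -> list:
    """Flush buffered <row> elements every `commit_interval` records.

    Each target is an in-memory stand-in for an XML file; the function
    returns one XML document string per target.
    """
    targets = [io.StringIO()]
    buffer = []

    def flush() -> None:
        targets[-1].write("".join(buffer))   # push buffered rows to the current target
        buffer.clear()

    for count, record in enumerate(records, start=1):
        buffer.append(f"<row>{escape(str(record))}</row>\n")
        if count % commit_interval == 0:
            flush()
            if new_file_per_commit:
                targets.append(io.StringIO())  # start a new target after each commit
    if buffer:
        flush()
    # A commit on the last record may leave an unused empty target behind.
    if len(targets) > 1 and targets[-1].getvalue() == "":
        targets.pop()
    return [f"<rows>\n{t.getvalue()}</rows>\n" for t in targets]
```

With three records and a commit interval of 2, appending produces one document with three rows. Creating a new file per commit produces two documents with two rows and one row.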

Usability

♦ Copying objects. You can now copy objects from all the PowerCenter Client tools, using the copy wizard to resolve conflicts. You can copy objects within folders, to other folders, and to different repositories. Within the Designer, you can also copy segments of mappings to a workspace in a new folder or repository.
♦ Comparing objects. You can compare workflows and tasks from the Workflow Manager. You can also compare all objects from within the Repository Manager.
♦ Change propagation. When you edit a port in a mapping, you can choose to propagate changed attributes throughout the mapping. The Designer propagates ports, expressions, and conditions based on the direction that you propagate and the attributes you choose to propagate.
♦ Enhanced partitioning interface. The Session Wizard is enhanced to provide a graphical depiction of a mapping when you configure partitioning.
♦ Revert to saved. You can now revert to the last saved version of an object in the Workflow Manager. When you do this, the Workflow Manager accesses the repository to retrieve the last-saved version of the object.
♦ Enhanced validation messages. The PowerCenter Client writes messages in the Output window that describe why it invalidates a mapping or workflow when you modify a dependent object.
♦ Validate multiple objects. You can validate multiple objects in the repository without fetching them into the workspace. You can save and optionally check in objects that change from invalid to valid status as a result of the validation. You can validate sessions, mappings, mapplets, workflows, and worklets.
♦ View dependencies. Before you edit or delete versioned objects, such as sources, targets, mappings, or workflows, you can view dependencies to see the impact on other objects. You can view parent and child dependencies and global shortcuts across repositories. Viewing dependencies helps you modify objects and composite objects without breaking dependencies.
♦ Refresh session mappings. In the Workflow Manager, you can refresh a session mapping.

About Informatica Documentation
The complete set of documentation for PowerCenter includes the following books:
♦ Data Profiling Guide. Provides information about how to profile PowerCenter sources to evaluate source data and detect patterns and exceptions.
♦ Designer Guide. Provides information needed to use the Designer. Includes information to help you create mappings, mapplets, and transformations. Also includes a description of the transformation datatypes used to process and transform source data.
♦ Getting Started. Provides basic tutorials for getting started.
♦ Installation and Configuration Guide. Provides information needed to install and configure the PowerCenter tools, including details on environment variables and database connections.
♦ PowerCenter Connect® for JMS® User and Administrator Guide. Provides information to install PowerCenter Connect for JMS, build mappings, extract data from JMS messages, and load data into JMS messages.
♦ Repository Guide. Provides information needed to administer the repository using the Repository Manager or the pmrep command line program. Includes details on functionality available in the Repository Manager and Administration Console, such as creating and maintaining repositories, folders, users, groups, and permissions and privileges.
♦ Transformation Language Reference. Provides syntax descriptions and examples for each transformation function provided with PowerCenter.
♦ Transformation Guide. Provides information on how to create and configure each type of transformation in the Designer.
♦ Troubleshooting Guide. Lists error messages that you might encounter while using PowerCenter. Each error message includes one or more possible causes and actions that you can take to correct the condition.
♦ Web Services Provider Guide. Provides information you need to install and configure the Web Services Hub. This guide also provides information about how to use the web services that the Web Services Hub hosts. The Web Services Hub hosts Real-time Web Services, Batch Web Services, and Metadata Web Services.
♦ Workflow Administration Guide. Provides information to help you create and run workflows in the Workflow Manager, as well as monitor workflows in the Workflow Monitor. Also contains information on administering the PowerCenter Server and performance tuning.
♦ XML User Guide. Provides information you need to create XML definitions from XML, XSD, or DTD files, and relational or other XML definitions. Includes information on running sessions with XML data. Also includes details on using the midstream XML transformations to parse or generate XML data within a pipeline.

About this Book
The Workflow Administration Guide is written for developers and administrators who are responsible for creating workflows and sessions, running workflows, and administering the PowerCenter Server. This guide assumes you have knowledge of your operating systems, relational database concepts, and the database engines, flat files or mainframe system in your environment. This guide also assumes you are familiar with the interface requirements for your supporting applications. The material in this book is available for online use.

Document Conventions
This guide uses the following formatting conventions:
If you see…                   It means…
italicized text               The word or set of words is especially emphasized.
boldfaced text                Emphasized subjects.
italicized monospaced text    This is the variable name for a value you enter as part of an operating system command. This is generic text that should be replaced with user-supplied values.
Note:                         The following paragraph provides additional facts.
Tip:                          The following paragraph provides suggested uses.
Warning:                      The following paragraph notes situations where you can overwrite or corrupt data, unless you follow the specified procedure.
monospaced text               This is a code example.
bold monospaced text          This is an operating system command you enter from a prompt to run a task.

Other Informatica Resources
In addition to the product manuals, Informatica provides these other resources:
♦ Informatica Customer Portal
♦ Informatica Webzine
♦ Informatica web site
♦ Informatica Developer Network
♦ Informatica Technical Support

Visiting Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica Knowledgebase, Informatica Webzine, and access to the Informatica user community.

Visiting the Informatica Webzine
The Informatica Documentation team delivers an online journal, the Informatica Webzine. This journal provides solutions to common tasks, detailed descriptions of specific features, and tips and tricks to help you develop data warehouses. The Informatica Webzine is a password-protected site that you can access through the Customer Portal. The Customer Portal has an online registration form for login accounts to its webzine and web support. To register for an account, go to http://my.informatica.com. If you have any questions, please email webzine@informatica.com.

Visiting the Informatica Web Site
You can access Informatica’s corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and how to locate your closest sales office. You will also find product information, as well as literature and partner information. The services area of the site includes important information on technical support, training and education, and implementation services.

Visiting the Informatica Developer Network
The Informatica Developer Network is a web-based forum for third-party software developers. You can access the Informatica Developer Network at the following URL: http://devnet.informatica.com

The site contains information on how to create, market, and support customer-oriented add-on solutions based on Informatica’s interoperability interfaces.

Obtaining Technical Support
There are many ways to access Informatica technical support. You can call or email your nearest Technical Support Center listed below or you can use our WebSupport Service. WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
North America / South America
Informatica Corporation
2100 Seaport Blvd.
Redwood City, CA 94063
Phone: 866.563.6332 or 650.385.5800
Fax: 650.213.9489
Hours: 6 a.m. - 6 p.m. (PST/PDT)
email: support@informatica.com

Africa / Asia / Australia / Europe
Informatica Software Ltd.
6 Waltham Park
Waltham Road, White Waltham
Maidenhead, Berkshire
SL6 3TN
Phone: 44 870 606 1525
Fax: +44 1628 511 411
Hours: 9 a.m. - 5:30 p.m. (GMT)
email: support_eu@informatica.com

Belgium
Phone: +32 15 281 702
Hours: 9 a.m. - 5:30 p.m. (local time)

France
Phone: +33 1 41 38 92 26
Hours: 9 a.m. - 5:30 p.m. (local time)

Germany
Phone: +49 1805 702 702
Hours: 9 a.m. - 5:30 p.m. (local time)

Netherlands
Phone: +31 306 082 089
Hours: 9 a.m. - 5:30 p.m. (local time)

Singapore
Phone: +65 322 8589
Hours: 9 a.m. - 5 p.m. (local time)

Switzerland
Phone: +41 800 81 80 70
Hours: 8 a.m. - 5 p.m. (local time)


Chapter 1

Understanding the Server Architecture
This chapter covers the following subjects:
♦ Overview, 2
♦ PowerCenter Server Connectivity, 5
♦ Running a Workflow, 7
♦ Load Manager Process, 8
♦ Data Transformation Manager (DTM) Process, 11
♦ Understanding Processing Threads, 14
♦ PowerCenter Server Processing, 22
♦ System Resources, 24
♦ Code Pages and Data Movement Modes, 27
♦ Output Files and Caches, 28


Overview
You can register multiple PowerCenter Servers to a repository. The PowerCenter Server moves data from sources to targets based on workflow and mapping metadata stored in a repository. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks. You can run a task by placing it in a workflow.

When you have multiple PowerCenter Servers, you can assign a server to start a workflow or a session. This allows you to distribute the workload. You can increase performance by using a server grid to balance the workload. A server grid is a server object that allows you to automate the distribution of sessions across multiple servers. For more information about server grids, see “Working with Server Grids” on page 446.

A session is a type of workflow task. A session is a set of instructions that describes how to move data from sources to targets using a mapping. Other workflow tasks include commands, decisions, timers, pre-session SQL commands, post-session SQL commands, and email notification. For details on workflow tasks, see “Working with Tasks” on page 131.

Use the Designer to import source and target definitions into the repository and to build mappings. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Use the Workflow Manager to develop and manage workflows. Use the Workflow Monitor to monitor workflows and stop the PowerCenter Server.

When a workflow starts, the PowerCenter Server retrieves mapping, workflow, and session metadata from the repository to extract data from the source, transform it, and load it into the target. It also runs the tasks in the workflow. The PowerCenter Server uses Load Manager and Data Transformation Manager (DTM) processes to run the workflow.
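The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks. The following Python sketch is a toy model of that idea and not the PowerCenter Server's scheduler. Tasks are callables that return a status string. Each link carries a condition over the statuses recorded so far, and the model follows only links whose condition is true. The names `Link`, `run_workflow`, and the task and status values in the usage note are hypothetical. The model also simplifies join behavior: a task starts as soon as any one incoming link is satisfied.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Link:
    """A link from one task to another, followed only if `condition` holds."""
    source: str
    target: str
    # Receives the statuses of tasks that have run so far; default: always true.
    condition: Callable[[Dict[str, str]], bool] = lambda statuses: True


def run_workflow(start: str, tasks: Dict[str, Callable[[], str]],
                 links: List[Link]) -> Dict[str, str]:
    """Run tasks reachable from `start`, following only links whose condition is true.

    Simplification: a task starts as soon as any one incoming link is satisfied.
    """
    statuses: Dict[str, str] = {}
    ready = deque([start])
    while ready:
        name = ready.popleft()
        if name in statuses:
            continue                      # each task runs at most once
        statuses[name] = tasks[name]()    # run the task and record its status
        for link in links:
            if link.source == name and link.condition(statuses):
                ready.append(link.target)
    return statuses
```

As a usage example, suppose a hypothetical session `s_load` fails. A link conditioned on its failure then runs a notification task, and a link conditioned on its success is skipped.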
Figure 1-1 shows the processing path between the PowerCenter Server, repository, source, and target:
Figure 1-1. PowerCenter Server and Data Movement (figure: source data flows from the sources to the PowerCenter Server and is loaded to the target as transformed data; the PowerCenter Server receives instructions from metadata in the repository)


Chapter 1: Understanding the Server Architecture

The PowerCenter Server can combine data from different platforms and source types. For example, you can join data from a flat file and an Oracle source. The PowerCenter Server can also load data to different platforms and target types. For example, you can load transformed data to both a flat file target and a Microsoft SQL Server database in the same session.

Workflow Processes
The PowerCenter Server uses both process memory and system shared memory to perform these tasks. It runs as a daemon on UNIX and a service on Windows. The PowerCenter Server uses the following processes to run a workflow:
♦ The Load Manager process. Starts and locks the workflow, runs workflow tasks, and starts the DTM to run sessions.
♦ The Data Transformation Manager (DTM) process. Performs session validations. Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

Pipeline Partitioning
When running sessions, the PowerCenter Server can achieve high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel. To accomplish this, use the following session and server configuration:
♦ Configure the session with multiple partitions.
♦ Install the PowerCenter Server on a machine with multiple CPUs.

You can configure the partition type at most transformations in the pipeline. The PowerCenter Server can partition data using round-robin, hash, key-range, database partitioning, or pass-through partitioning. For relational sources, the PowerCenter Server creates multiple database connections to a single source and extracts a separate range of data for each connection. For XML or file sources, the PowerCenter Server reads multiple files concurrently. The files must have the same structure or hierarchy. When the PowerCenter Server transforms the partitions concurrently, it passes data between the partitions as needed to perform operations such as aggregation. When the PowerCenter Server loads relational data, it creates multiple database connections to the target and loads partitions of data concurrently. When the PowerCenter Server loads data to file targets, it creates a separate file for each partition. You can choose to merge the target files. Figure 1-2 shows a mapping that contains two partitions:
Figure 1-2. Partitioned Mapping (labels: Source, Transformations, Target)


For more information about pipeline partitioning, see “Pipeline Partitioning” on page 345.
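To make the partition types more concrete, the following Python sketch shows one way rows could be assigned to partitions under round-robin, hash, key-range, and pass-through partitioning. It is a conceptual model under stated assumptions, not the PowerCenter Server's algorithm: rows are assumed to be Python dictionaries, the function names are hypothetical, and CRC-32 stands in for whatever hash function the server actually uses. Database partitioning depends on the source database and is not modeled.

```python
import zlib
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

def round_robin(rows: Sequence[Row], n: int) -> List[List[Row]]:
    """Deal rows to n partitions in turn so each gets a near-equal share."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_keys(rows: Sequence[Row], n: int, key: str) -> List[List[Row]]:
    """Send all rows with the same key value to the same partition."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        # crc32 is stable across runs (Python's str hash() is randomized per process)
        parts[zlib.crc32(str(row[key]).encode("utf-8")) % n].append(row)
    return parts

def key_range(rows: Sequence[Row], ranges: Sequence[Tuple[Any, Any]],
              key: str) -> List[List[Row]]:
    """Put each row in the first [low, high) range that contains its key.

    Rows that match no range are dropped in this sketch.
    """
    parts: List[List[Row]] = [[] for _ in ranges]
    for row in rows:
        for idx, (low, high) in enumerate(ranges):
            if low <= row[key] < high:
                parts[idx].append(row)
                break
    return parts

def pass_through(partitions: Sequence[Sequence[Row]]) -> List[List[Row]]:
    """Pass-through: every row stays in the partition it arrived in."""
    return [list(p) for p in partitions]
```

In this model, only `hash_keys` guarantees that rows sharing a key value land in the same partition, which is the property a grouping operation such as aggregation needs; `round_robin` only balances row counts.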


PowerCenter Server Connectivity
The PowerCenter Server connects to the following Informatica platform components:
♦ PowerCenter Client
♦ Other PowerCenter Servers
♦ Repository Server
♦ Repository Agent
♦ Source and target databases

The PowerCenter Server is a repository client application. It connects to the Repository Server and Repository Agent to retrieve workflow and mapping metadata from the repository database. When the PowerCenter Server requests a repository connection from the Repository Server, the Repository Server starts and manages the Repository Agent. The Repository Server then redirects the PowerCenter Server to connect directly to the Repository Agent. For details on repository connectivity, see “Understanding the Repository” in the Repository Guide.

The Workflow Manager communicates directly with the PowerCenter Server over a TCP/IP connection each time you schedule or edit a workflow, display workflow details, and request workflow and session logs. You create the connection by defining the port number in the Workflow Manager and the PowerCenter Server configuration. Use the Workflow Manager to register the PowerCenter Server in the repository.

In a server grid, the Workflow Manager communicates directly with multiple PowerCenter Servers over TCP/IP connections. Each PowerCenter Server retrieves a server grid object from the repository, which it uses to connect to the other PowerCenter Servers in the grid. Once connected, the PowerCenter Servers in the grid maintain constant communication with each other. For more information about creating and using server grids, see “Working with Server Grids” on page 446.

The PowerCenter Server connects to the source or target database using ODBC or native drivers. It uses TCP/IP to connect to the Repository Server. The PowerCenter Server maintains a database connection pool for stored procedures or lookup databases in a workflow. By default, the PowerCenter Server allows an unlimited number of connections to lookup or stored procedure databases. You can optionally set a parameter to limit the database connections. If a database user does not have permission for the number of connections a session requires, the session fails. For a session, the PowerCenter Server holds the connection as long as it needs to read data from source tables or write data to target tables.

To prevent loss of information during data transfer, the PowerCenter Server, PowerCenter Client, Repository Server, Repository Agent, and repository database must have compatible code pages.
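The connection pool behavior described above, where connections are reused, unlimited by default, and optionally capped, can be sketched as a small Python class. This is an illustration under stated assumptions: the class and parameter names are hypothetical, `connect` stands for any function that opens a database connection, and failing immediately when the cap is reached is a choice made for this sketch, not documented server behavior.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")

class ConnectionPool(Generic[T]):
    """Reuses idle connections and creates new ones up to an optional limit."""

    def __init__(self, connect: Callable[[], T],
                 max_connections: Optional[int] = None) -> None:
        self._connect = connect
        self._max = max_connections   # None means no limit
        self._idle: List[T] = []
        self._open = 0                # connections created so far

    def acquire(self) -> T:
        if self._idle:
            return self._idle.pop()   # reuse an idle connection first
        if self._max is not None and self._open >= self._max:
            # Assumption: this sketch fails fast instead of waiting for a release.
            raise RuntimeError(f"connection limit of {self._max} reached")
        self._open += 1
        return self._connect()

    def release(self, conn: T) -> None:
        self._idle.append(conn)       # keep the connection open for later reuse

    @property
    def open_count(self) -> int:
        return self._open
```

Reusing idle connections before opening new ones keeps the number of database connections close to the number in simultaneous use, which matters when the database user is only permitted a limited number of connections.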


Figure 1-3 shows the PowerCenter Server connectivity:
Figure 1-3. PowerCenter Connectivity (labels: PowerCenter Client, TCP/IP, PowerCenter Server, Native/ODBC, Sources and Targets, Repository Server, Repository Agent, PowerCenter Repository)

Table 1-1 summarizes the software you need to connect the PowerCenter Server to the platform components, source databases, and target databases:
Table 1-1. PowerCenter Server Connectivity Requirements

PowerCenter Server Connection    Connectivity Requirement
PowerCenter Client               TCP/IP
Other PowerCenter Servers        TCP/IP
Repository Server                TCP/IP
Repository Agent                 TCP/IP
Source and target databases      Native database drivers or ODBC

Note: Both the Windows and UNIX versions of the PowerCenter Server can use ODBC drivers to connect to databases. However, Informatica recommends using native drivers when possible to improve performance.


Running a Workflow
The PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.

For details on the Load Manager process, see “Load Manager Process” on page 8.

When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.

For details on the DTM process, see “Data Transformation Manager (DTM) Process” on page 11.


Load Manager Process
The Load Manager is the primary PowerCenter Server process. It accepts requests from the PowerCenter Client and from pmcmd. The Load Manager runs and monitors the workflow. It performs the following tasks:
♦ Manages workflow scheduling.
♦ Locks and reads the workflow.
♦ Reads the parameter file.
♦ Creates the workflow log file.
♦ Runs workflow tasks and evaluates the conditional links connecting tasks.
♦ Starts the DTM, which runs the session.
♦ Writes historical run information to the repository.
♦ Sends post-session email in the event of DTM failure.

Managing Workflow Scheduling
The Load Manager manages workflow scheduling in the following situations:

♦ When you start the PowerCenter Server. When you start the PowerCenter Server, the Load Manager launches and queries the repository for a list of workflows configured to run on the PowerCenter Server.
♦ When you save a workflow. When you save a workflow assigned to a PowerCenter Server to the repository, the Load Manager adds the workflow to or removes the workflow from the schedule queue.

Locking and Reading the Workflow
When the PowerCenter Server starts a workflow, the Load Manager requests an execute lock on the workflow from the repository. The execute lock allows the PowerCenter Server to run the workflow and prevents you from starting the workflow again until it completes. If the workflow is already locked, for example because it is already running, the PowerCenter Server cannot start the workflow.

The Load Manager also reads the workflow from the repository at workflow run time. It reads all links and tasks in the workflow except sessions and worklet instances. For sessions, the Load Manager reads only the session instance information; the DTM retrieves the session and mapping from the repository at session run time. The Load Manager reads worklets from the repository when the worklet starts. For more information on locking, see “Repository Security” in the Repository Guide.
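The execute-lock rule can be modeled with a small in-memory registry. This is a sketch under the assumption that a Python set guarded by a `threading.Lock` stands in for the repository; the PowerCenter Server actually requests the lock from the repository, and the class and function names here are hypothetical.

```python
import threading
from typing import Set

class ExecuteLockRegistry:
    """In-memory stand-in for repository execute locks (illustrative only)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()   # protects the set of locked workflow names
        self._locked: Set[str] = set()

    def try_lock(self, workflow: str) -> bool:
        """Take the lock and return True, or return False if it is already held."""
        with self._guard:
            if workflow in self._locked:
                return False
            self._locked.add(workflow)
            return True

    def unlock(self, workflow: str) -> None:
        """Release the lock when the workflow run completes."""
        with self._guard:
            self._locked.discard(workflow)

def start_workflow(locks: ExecuteLockRegistry, workflow: str) -> str:
    if not locks.try_lock(workflow):
        return f"cannot start {workflow}: already locked"
    return f"started {workflow}"
```

The check and the update happen under one lock, so two simultaneous start requests for the same workflow cannot both succeed.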


Reading the Parameter File
When the workflow starts, the Load Manager checks the workflow properties for use of a parameter file. If the workflow uses a parameter file, the Load Manager reads the parameter file and expands the variable values for the workflow and any worklets invoked by the workflow. The parameter file can also contain mapping variables, mapping parameters, session parameters, and session variables for sessions in the workflow. When starting the DTM, the Load Manager passes the parameter file name to the DTM. For more information on the parameter file, see “Session Parameters” on page 495.
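The following Python sketch illustrates reading a parameter file and expanding variable references. It assumes a simplified layout of `[heading]` lines followed by `name=value` lines; the heading text, folder name, and parameter names in the example are hypothetical, so see “Session Parameters” on page 495 for the actual file format. The `$` and `$$` prefixes follow the PowerCenter convention for session parameters and for mapping parameters and variables.

```python
import re
from typing import Dict, Optional

def parse_parameter_file(text: str) -> Dict[str, Dict[str, str]]:
    """Group name=value lines under the most recent [heading] (simplified format)."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                                  # skip blank lines
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)          # split on the first '=' only
            current[name.strip()] = value.strip()
    return sections

# Matches $Name (session parameter style) and $$Name (mapping parameter/variable style).
_REF = re.compile(r"\$\$?\w+")

def expand(text: str, values: Dict[str, str]) -> str:
    """Substitute references that have a value; leave unknown references unchanged."""
    return _REF.sub(lambda m: values.get(m.group(0), m.group(0)), text)
```

For example, with a hypothetical heading `[Orders.WF:w_OrdersBooked]` followed by `$$LoadDate=2004-08-01`, expanding `WHERE ORDER_DATE >= '$$LoadDate'` yields `WHERE ORDER_DATE >= '2004-08-01'`.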

Creating the Workflow Log File
The Load Manager creates a log file for the workflow. The workflow log file contains a history of the workflow run, including initialization, workflow task status, and error messages. You can use information in the workflow log file in conjunction with the PowerCenter Server log and session log to troubleshoot system, workflow, or session problems. You can view the workflow log file in the Workflow Manager or open it in a text editor. The following sample shows the first few lines of a log file:
INFO : LM_36215 : (2076|2224) Starting execution of workflow [w_OrdersBooked].
INFO : LM_36255 : (2076|2224) Link [StartWorkflow --> s_BOOKINGS]: empty expression string, evaluated to TRUE.
INFO : LM_36224 : (2076|2224) Starting execution of session instance [s_BOOKINGS].
INFO : LM_36302 : (2076|2224) Started DTM process [pid = 508] for session instance [s_BOOKINGS].

For more information on workflow log files, see “Log Files” on page 455.
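When you search a workflow log in a text editor or script, the line layout in the sample above can be split into its parts. The following Python sketch parses lines of exactly that shape; treating the two numbers in parentheses as process and thread IDs is an assumption, and lines in other formats are simply rejected.

```python
import re
from typing import NamedTuple, Optional, Tuple

class LogEntry(NamedTuple):
    severity: str            # e.g. INFO
    code: str                # message code, e.g. LM_36215
    ids: Tuple[int, int]     # the "(a|b)" pair; assumed to be process and thread IDs
    message: str

_LINE = re.compile(
    r"^(?P<severity>\w+) : (?P<code>[A-Z]+_\d+) : "
    r"\((?P<a>\d+)\|(?P<b>\d+)\) (?P<message>.*)$"
)

def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one workflow log line in the sample's layout, or return None."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["severity"], m["code"], (int(m["a"]), int(m["b"])), m["message"])
```

Object names in the message, such as `[s_BOOKINGS]`, can then be pulled out with `re.findall(r"\[([^\]]+)\]", entry.message)`.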

Running Workflow Tasks
The Load Manager runs workflow tasks according to the conditional links connecting the tasks. Links define the order of execution for workflow tasks. When a task in the workflow completes, the Load Manager evaluates the completed task according to specified conditions, such as success or failure. Based on the result of the evaluation, the Load Manager runs successive links and tasks. For more information on workflows and workflow tasks, see “Working with Workflows” on page 87.
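The link evaluation described above can be sketched as a breadth-first walk: run a task, evaluate each outgoing link condition against its status, and queue the tasks whose links evaluate to true. This Python model assumes each task has at most one incoming link, so it does not model how a task with several incoming links decides to run; the task names and status strings are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# A link: (from_task, to_task, condition evaluated on from_task's status).
Link = Tuple[str, str, Callable[[str], bool]]

def run_workflow(start: str, links: List[Link],
                 run_task: Callable[[str], str]) -> List[str]:
    """Run tasks in link order and return the names of the tasks that ran.

    Assumes each task has at most one incoming link.
    """
    outgoing: Dict[str, List[Link]] = {}
    for link in links:
        outgoing.setdefault(link[0], []).append(link)
    ran: List[str] = []
    queue = deque([start])
    while queue:
        task = queue.popleft()
        status = run_task(task)            # e.g. "SUCCEEDED" or "FAILED"
        ran.append(task)
        for _, target, condition in outgoing.get(task, []):
            if condition(status):          # follow only links that evaluate to true
                queue.append(target)
    return ran
```

For example, if `s_BOOKINGS` fails, a link conditioned on success is skipped and a link conditioned on failure is followed, so only the failure branch runs.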

Distributing Sessions to Worker Servers
When you run a workflow in a server grid, the master server distributes session tasks to the worker servers in a round-robin fashion to balance the workload. When the master server distributes a session to a worker server, the Load Manager on the worker server machine starts a DTM process to run the session. For more information about creating and using server grids, see “Working with Server Grids” on page 446.
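Round-robin distribution means each session goes to the next server in turn, wrapping around at the end of the list. The Python sketch below shows only that rotation. It assumes the rotation starts at the first server for every call and that the master server needs no special treatment; the server and session names are hypothetical, and how the real master server orders or tracks its workers is not described here.

```python
from itertools import cycle
from typing import Dict, List

def distribute_sessions(sessions: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Assign sessions to servers in turn, wrapping around the server list."""
    if not servers:
        raise ValueError("a server grid needs at least one server")
    assignment: Dict[str, List[str]] = {s: [] for s in servers}
    # zip stops when the finite session list runs out; cycle() never does.
    for session, server in zip(sessions, cycle(servers)):
        assignment[server].append(session)
    return assignment
```

With five sessions and two servers, the first server receives sessions 1, 3, and 5 and the second receives sessions 2 and 4.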

Starting the DTM
When the workflow reaches a session, the Load Manager starts the DTM. The Load Manager provides the DTM with session and parameter file information that allows the DTM to retrieve the session and mapping metadata from the repository. For more information on the DTM process, see “Data Transformation Manager (DTM) Process” on page 11.

Running Sessions from Master Servers
If a PowerCenter Server is part of a server grid, it can run sessions assigned from other master servers. When the server is also the master server for a workflow, it runs the tasks in that workflow before it runs sessions assigned from other master servers. For more information about creating and using server grids, see “Working with Server Grids” on page 446.

Writing Historical Information to the Repository
The Load Manager monitors the status of workflow tasks during the workflow run. When workflow tasks start or finish, the Load Manager writes historical run information to the repository. Historical run information for tasks includes start and completion times and completion status. Historical run information for sessions also includes source read statistics, target load statistics, and number of errors. You can view this information using the Workflow Monitor. For details on using the Workflow Monitor, see “Monitoring Workflows” on page 401.

Sending Post-Session Email
The Load Manager sends post-session email if the DTM terminates abnormally. The DTM sends post-session email in all other cases. For details on post-session email, see “Sending Email” on page 319.


Data Transformation Manager (DTM) Process
When the workflow reaches a session, the Load Manager starts the DTM process. The DTM process is the process associated with the session task. The Load Manager creates one DTM process for each session in the workflow. The DTM process performs the following tasks:
♦ Reads session information from the repository.
♦ Expands the server, session, and mapping variables and parameters.
♦ Creates the session log file.
♦ Validates source and target code pages.
♦ Verifies connection object permissions.
♦ Runs pre-session shell commands, stored procedures, and SQL.
♦ Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
♦ Runs post-session stored procedures, SQL, and shell commands.
♦ Sends post-session email.

Reading the Session Information
The Load Manager provides the DTM with session instance information when it starts the DTM. The DTM retrieves the mapping and session metadata from the repository.

Expanding Variables and Parameters
If the workflow uses a parameter file, the Load Manager passes the parameter file name to the DTM when it starts the DTM. The DTM creates and expands session-level, server-level, and mapping-level variables and parameters. For more information on the parameter file, see “Session Parameters” on page 495.

Creating the Session Log File
The DTM creates a log file for the session. The log file contains a complete history of the session run, including initialization, transformation, status, and error messages. You can use information in the log file in conjunction with the PowerCenter Server log and the workflow log file to troubleshoot system or session problems. You can view the log file in the Workflow Monitor or open it in a text editor. The following sample shows the first few lines of a log file:
MASTER> CMN_1010 System shared memory [2338661387] allocated for [12000000] bytes.
MASTER> PETL_24000 Parallel Pipeline Engine initializing.
MASTER> PETL_24001 Parallel Pipeline Engine running.
MASTER> PETL_24003 Initializing session run.
MAPPING> TM_6014 Initializing session [s_Customers] at [Tue Nov 04 16:55:06 2003]

For more information on session log files, see “Log Files” on page 455.

Validating Code Pages
When the PowerCenter Server runs in Unicode mode with data code page validation enabled, the DTM validates the following code pages:
♦ Source code pages. Must be a subset of the PowerCenter Server code page.
♦ Target code pages. Must be a superset of the PowerCenter Server code page.
♦ Repository Agent code page. Must be compatible with the PowerCenter Server code page.
♦ Repository Server code page. Must be compatible with the PowerCenter Server code page.
♦ Lookup database code page. Must be compatible with the PowerCenter Server code page.
♦ Stored procedure database code page. Must be compatible with the PowerCenter Server code page.
♦ PowerCenter Server code page. Must be registered with the Workflow Manager.
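The subset and superset rules for source and target code pages can be illustrated by treating a code page as the set of characters it can represent. The Python sketch below builds those sets from single-byte Python codecs. This is an assumption for illustration only: Python codec names (such as `ascii` and `latin-1`) are not PowerCenter code page names, multibyte code pages are not handled, and the "compatible" checks for repository, lookup, and stored procedure code pages are not modeled.

```python
from typing import FrozenSet, List

def charset(codec: str) -> FrozenSet[str]:
    """Characters a single-byte Python codec can represent (multibyte not handled)."""
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            pass                       # byte value undefined in this code page
    return frozenset(chars)

def validate(source: str, server: str, target: str) -> List[str]:
    """Return error messages; an empty list means both rules hold."""
    errors: List[str] = []
    src, srv, tgt = charset(source), charset(server), charset(target)
    if not src <= srv:                 # every source character must exist on the server
        errors.append(f"source code page {source} is not a subset of {server}")
    if not tgt >= srv:                 # every server character must exist in the target
        errors.append(f"target code page {target} is not a superset of {server}")
    return errors
```

The direction of each rule follows the data flow: data that enters the server must be representable there, and data that leaves the server must be representable in the target.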

If the DTM cannot validate the code pages, it writes the error into the session log and fails the session.

If you disable data code page validation, the PowerCenter Server does not enforce code page compatibility. The PowerCenter Server processes data internally using the UCS-2 character set. When you disable data code page validation, the PowerCenter Server verifies that the source query, target query, lookup database query, and stored procedure call text convert from the source, target, lookup, or stored procedure data code page to the UCS-2 character set without loss of data in conversion. If the PowerCenter Server encounters an error when converting data, it writes an error message to the session log. For more information about code pages, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.
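A lossless conversion check of this kind can be sketched in Python: decode the text strictly from its code page, then confirm that every character fits in UCS-2, which is a fixed 16-bit encoding and therefore cannot represent characters above U+FFFF. The function name is hypothetical, Python codec names stand in for PowerCenter code pages, and the sketch checks one byte string rather than each query or call the server examines.

```python
def converts_to_ucs2(raw: bytes, codec: str) -> bool:
    """True if raw decodes strictly from codec and every character fits in UCS-2."""
    try:
        text = raw.decode(codec)       # strict decoding: undefined bytes raise an error
    except UnicodeDecodeError:
        return False
    # UCS-2 has no surrogate pairs, so anything above U+FFFF would be lost.
    return all(ord(ch) <= 0xFFFF for ch in text)
```

A query written entirely in ASCII passes, while a byte that the declared code page does not define, or a character outside the Basic Multilingual Plane, fails.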

Verifying Connection Object Permissions
After validating the session code pages, the DTM verifies permissions for connection objects used in the session. The DTM verifies that both the user who started the PowerCenter Server and the user who started or scheduled the workflow have execute permission for the connection objects associated with the session.
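Because both users must hold execute permission on every connection object, the check covers every user and connection pair. The following Python sketch reports the pairs that fail. The user names, connection names, and the dictionary used to represent granted permissions are hypothetical stand-ins for the repository's permission data.

```python
from typing import Dict, Iterable, List, Set, Tuple

def missing_execute_permissions(connections: Iterable[str],
                                users: Iterable[str],
                                grants: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """List (user, connection) pairs where the user lacks execute permission.

    grants maps each user to the connection objects that user may execute.
    """
    user_list = list(users)            # iterate the users once per connection
    return [(user, conn)
            for conn in connections
            for user in user_list
            if conn not in grants.get(user, set())]
```

An empty result means every required pair passes the check.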

Running Pre-Session Operations
After verifying connection object permissions, the DTM runs pre-session shell commands. The DTM then runs pre-session stored procedures and SQL commands.


Running the Processing Threads
After initializing the session, the DTM uses reader, transformation, and writer threads to extract, transform, and load data. The number of threads the DTM uses to run the session depends on the number of partitions configured for the session. For a detailed discussion of reader, transformation, and writer threads, see “Understanding Processing Threads” on page 14.

Running Post-Session Operations
After the DTM runs the processing threads, it runs post-session SQL commands and stored procedures. The DTM then runs post-session shell commands.

Sending Post-Session Email
When the session finishes, the DTM composes and sends email reporting session completion or failure. If the DTM terminates abnormally, the Load Manager sends post-session email. For details on post-session email, see “Sending Email” on page 319.


Understanding Processing Threads
The DTM allocates process memory for the session and divides it into buffers. This memory is also known as buffer memory. The default memory allocation is 12,000,000 bytes.

The DTM uses multiple threads to process data. The main DTM thread is called the master thread. The master thread creates and manages other threads. The master thread for a session can create mapping, pre-session, post-session, reader, transformation, and writer threads. For more information, see “Thread Types” on page 14.

For each target load order group in a mapping, the master thread can create several threads. The types of threads depend on the session properties and the transformations in the mapping. The number of threads depends on the partitioning information for each target load order group in the mapping. For more information on target load order groups, see “Reading Source Data” on page 22.
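Dividing buffer memory into buffers is integer division. The Python sketch below computes how many whole buffer blocks fit in the default 12,000,000-byte allocation. The 64,000-byte block size is a hypothetical example value chosen for illustration; this section states only the total allocation, not the size of each buffer.

```python
def buffer_block_count(buffer_memory: int = 12_000_000, block_size: int = 64_000) -> int:
    """Whole buffer blocks that fit in the session buffer memory.

    block_size is a hypothetical example value, not a documented default.
    """
    if buffer_memory < 0 or block_size <= 0:
        raise ValueError("buffer_memory must be >= 0 and block_size must be > 0")
    # Floor division: a remainder smaller than one block does not form a block.
    return buffer_memory // block_size
```

With these example values, 12,000,000 / 64,000 = 187.5, so 187 whole blocks fit; doubling the buffer memory to 24,000,000 bytes gives 375.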

Thread Types
The master thread creates different types of threads for a session. The types of threads the master thread creates depend on the following factors:
♦ Pre- and post-session properties
♦ Types of transformations in the mapping

Table 1-2 lists the types of threads that the master thread can create:
Table 1-2. Processing Threads

Thread Type                      Description
Mapping Thread                   One thread for each session. Fetches session and mapping information. Compiles the mapping. Cleans up after session execution.
Pre- and Post-Session Threads    One thread each to perform pre- and post-session operations.
Reader Thread                    One thread for each partition for each source pipeline. Reads from sources. Relational sources use relational reader threads, and file sources use file reader threads.
Transformation Thread            One or more transformation threads for each partition. Processes data according to the transformation logic in the mapping.
Writer Thread                    One thread for each partition, if a target exists in the source pipeline. Writes to targets. Relational targets use relational writer threads, and file targets use file writer threads.


Figure 1-4 shows the threads the master thread creates for a simple mapping that contains one target load order group:
Figure 1-4. Thread Creation for a Simple Mapping (1 reader thread, 1 transformation thread, 1 writer thread)

The mapping in Figure 1-4 contains a single partition. In this case, the master thread creates one reader, one transformation, and one writer thread to process the data. The reader thread controls how the PowerCenter Server extracts source data and passes it to the source qualifier, the transformation thread controls how the PowerCenter Server processes the data, and the writer thread controls how the PowerCenter Server loads data to the target. When the pipeline contains only a source definition, source qualifier, and a target definition, the data bypasses the transformation threads, proceeding directly from the reader buffers to the writer. This type of pipeline is a pass-through pipeline. Figure 1-5 shows the threads for a pass-through pipeline with one partition:
Figure 1-5. Thread Creation for a Pass-through Pipeline (1 reader thread, bypassed transformation thread, 1 writer thread)

Note: The previous examples assume that each session contains a single partition. For information on how partitions and partition points affect thread creation, see “Threads and Partitioning” on page 16.

Reader Threads
The master thread creates reader threads to extract source data. The number of reader threads depends on the partitioning information for each pipeline. The number of reader threads equals the number of partitions. For more information, see “Threads and Partitioning” on page 16. The PowerCenter Server creates an SQL statement for each reader thread to extract data from a relational source. For file sources, the PowerCenter Server can create multiple threads to read a single source.


Transformation Threads
The master thread creates transformation threads to transform data received in buffers by the reader thread, move the data from transformation to transformation, and create memory caches when necessary. The number of transformation threads depends on the partitioning information for each pipeline. For more information, see “Threads and Partitioning” on page 16. The transformation threads store fully transformed data in a buffer drawn from the memory pool for subsequent access by the writer thread. If the pipeline contains a Rank, Joiner, Aggregator, Sorter, or a cached Lookup transformation, the transformation thread uses cache memory until it reaches the configured cache size limits. If the transformation thread requires more space, it pages to local cache files to hold additional data. When the PowerCenter Server runs in ASCII mode, the transformation threads pass character data in single bytes. When the PowerCenter Server runs in Unicode mode, the transformation threads use double bytes to move character data.

Writer Threads
The master thread creates writer threads to load target data. The number of writer threads depends on the partitioning information for each pipeline. If the pipeline contains one partition, the master thread creates one writer thread. If it contains multiple partitions, the master thread creates multiple writer threads. For more information, see “Threads and Partitioning” on page 16. Each writer thread creates connections to the target databases to load data. If the target is a file, each writer thread creates a separate file. You can configure the session to merge these files. If the target is relational, the writer thread takes data from buffers and commits it to session targets. When loading targets, the writer commits data based on the commit interval in the session properties. You can configure a session to commit data based on the number of source rows read, the number of rows written to the target, or the number of rows that pass through a transformation that generates transactions, such as a Transaction Control transformation.
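The commit interval arithmetic can be sketched in Python: commit each time the chosen row count (source rows read or target rows written) reaches a multiple of the interval, then commit whatever remains at the end. This is a simplified model under the assumption that commits land exactly on interval multiples; it ignores buffering, which can move actual commit points, and does not model commits generated by a Transaction Control transformation. The 10,000-row interval is an example value.

```python
from typing import List

def commit_points(row_count: int, interval: int) -> List[int]:
    """Cumulative row counts at which a commit occurs, including a final commit."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    points = list(range(interval, row_count + 1, interval))
    # Commit any rows left over after the last full interval.
    if row_count > 0 and (not points or points[-1] != row_count):
        points.append(row_count)
    return points
```

For 25,000 rows and a 10,000-row interval, this model commits at 10,000, 20,000, and 25,000 rows.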

Threads and Partitioning
The master thread creates different numbers of threads for different mappings. The number of threads depends on the partitioning information for each target load order group. This includes the following factors:
♦ The partition points. Controls the thread boundaries and pipeline stages.
♦ The number of partitions. Controls the number of threads the master thread creates for each pipeline stage.
♦ The number of source pipelines. Controls the number of reader threads and the number of transformation threads downstream from the sources.


Partition Points
By default, the Workflow Manager places partition points at certain transformations in each source pipeline. Partition points mark the thread boundaries in a source pipeline and divide the pipeline into stages. A pipeline stage is the section of a pipeline executed between any two partition points. When you set a partition point at a transformation, the new pipeline stage includes that transformation. The PowerCenter Server can redistribute rows of data at partition points. For example, if you place a partition point at a Sorter transformation and specify multiple partitions, the PowerCenter Server redistributes rows among all partitions before the rows enter the Sorter transformation. The rows stay in the same partitions until they reach the next partition point. For more information, see “Pipeline Partitioning” on page 345. By default, the Workflow Manager places a partition point at each of the following transformations:
♦ Source qualifier. Marks the reader stage. You cannot delete this partition point.
♦ Rank and unsorted Aggregator transformations. Mark the transformation stage boundaries and create a new transformation stage. This is necessary to ensure that rows are grouped properly before the Rank and Aggregator transformations process them. You can delete these partition points under certain circumstances. For more information, see “Adding and Deleting Partition Points” on page 353.
♦ Target instance. Marks the writer stage. You cannot delete this partition point.

Figure 1-6 shows the pipeline stages for a mapping that contains an unsorted Aggregator transformation:
Figure 1-6. Pipeline Stages in a Mapping With an Unsorted Aggregator Transformation (default partition points, marked *, divide the pipeline into a first, second, third, and fourth stage)

The mapping in Figure 1-6 contains four stages by default. The partition point at the source qualifier marks the boundary between the first (reader) and second (transformation) stages. The partition point at the Aggregator transformation marks the boundary between the second and third (transformation) stages. The partition point at the target instance marks the boundary between the third (transformation) and the fourth (writer) stages. If you use PowerCenter, you can add and delete partition points at other transformations. For information on valid partition points, see “Pipeline Partitioning” on page 345. When you add a partition point, you increase the number of pipeline stages by one. When you remove a partition point, you decrease the number of pipeline stages by one.


Figure 1-7 shows the pipeline stages if you add a partition point at the Filter transformation:
Figure 1-7. Pipeline Stages in a Mapping with an Additional Partition Point (partition points, marked *, divide the pipeline into a first, second, third, fourth, and fifth stage)

Number of Partitions
The number of threads that process each pipeline stage depends on the number of partitions. A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage. If you do not specify otherwise, the PowerCenter Server creates one partition in every pipeline stage. If you purchased the partitioning option, you can configure multiple partitions for a single pipeline stage. You can specify the number of partitions at any partition point. The number of partitions must be consistent across a pipeline. Therefore, if you define two partitions at the source qualifier, the Workflow Manager sets two partitions at all transformations that are partition points, and two partitions at the target instances. For example, suppose you need to use the mapping in Figure 1-6 on page 17 to read data from three flat files. To do this, you need to specify three partitions at the source qualifier. When you do this, the Workflow Manager sets three partitions at all other partition points in the pipeline. The master thread creates three sets of threads. Figure 1-8 shows thread creation for a mapping with three partitions:
Figure 1-8. Thread Creation for a Mapping with Three Partitions (default partition points marked *; threads for partitions #1, #2, and #3: 3 reader threads in the first stage, 6 transformation threads in the second and third stages, 3 writer threads in the fourth stage)


When you define three partitions across the mapping in Figure 1-8, the master thread creates three threads at each pipeline stage, for a total of 12 threads. If you need to read data from four file sources, you would specify four partitions at the source qualifier. The master thread would create a fourth thread at each stage, for a total of 16 threads.

The PowerCenter Server processes partitions concurrently. When you run a session with multiple partitions, the threads run as follows:
1. The reader threads run concurrently to extract data from the source.
2. The transformation threads run concurrently in each transformation stage to process data. The PowerCenter Server redistributes data among the partitions at each partition point.
3. The writer threads run concurrently to write data to the target.

Note: Increasing the number of partitions or partition points increases the number of threads. Therefore, increasing the number of partitions or partition points also increases the load on the server machine. If the server machine contains ample CPU bandwidth, processing rows of data in a session concurrently can increase session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system.
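The thread arithmetic in this section can be checked with a short Python sketch: each partition point adds one stage to the pipeline, and each stage runs one thread per partition. The function names are hypothetical. The sketch counts only reader, transformation, and writer threads for a single source pipeline; it leaves out the master, mapping, and pre- and post-session threads and ignores pass-through pipelines, where the transformation thread is bypassed.

```python
def pipeline_stage_count(partition_points: int) -> int:
    """Each partition point adds one stage to the pipeline."""
    return partition_points + 1

def thread_count(partitions: int, partition_points: int) -> int:
    """Reader, transformation, and writer threads for one source pipeline."""
    if partitions < 1 or partition_points < 0:
        raise ValueError("need at least one partition and a non-negative point count")
    # One thread per partition in every stage.
    return partitions * pipeline_stage_count(partition_points)
```

The mapping in Figure 1-6 has three default partition points (source qualifier, Aggregator, target), so it has four stages; three partitions give 3 × 4 = 12 threads and four partitions give 4 × 4 = 16, matching the totals above. Adding the Filter partition point from Figure 1-7 raises the stage count to five.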

Number of Source Pipelines
The master thread creates a reader and transformation thread for each source pipeline in the target load order group. For more information on source pipelines and target load order groups, see “Reading Source Data” on page 22. When you connect multiple pipelines to a multiple input group transformation, such as a Joiner or Custom transformation, the PowerCenter Server maintains the transformation threads or creates a new transformation thread depending on the partitioning information:

♦ You add a partition point at the multiple input group transformation. The PowerCenter Server creates a new pipeline stage and creates one transformation thread downstream from the partition point. The PowerCenter Server creates one transformation thread regardless of the number of output groups the transformation contains.
♦ You do not add a partition point at the multiple input group transformation. The PowerCenter Server maintains the same number of transformation threads downstream from the multiple input group transformation until it reaches the next partition point. However, for each partition at the multiple input group transformation and its downstream transformations, only one thread actively processes a row of data at any given time.
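The two cases above reduce to a per-partition rule. The following Python sketch is a simplified model, not PowerCenter code, and the function name `transformation_threads_after` is hypothetical. It returns the number of transformation threads downstream from a multiple input group transformation, and the two calls correspond to Figures 1-9 and 1-10:

```python
def transformation_threads_after(incoming_threads: int, partition_point: bool) -> int:
    """Transformation threads per partition downstream of a multiple input group transformation.

    incoming_threads: transformation threads arriving from the connected source pipelines.
    partition_point: True if a partition point is added at the transformation.
    """
    if partition_point:
        # A new pipeline stage starts with a single transformation thread,
        # regardless of how many output groups the transformation has.
        return 1
    # Without a partition point, the upstream threads continue to the next
    # partition point, but only one of them processes a given row at a time.
    return incoming_threads


print(transformation_threads_after(2, partition_point=False))  # 2, as in Figure 1-9
print(transformation_threads_after(2, partition_point=True))   # 1, as in Figure 1-10
```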


Figure 1-9 shows the thread creation for a mapping that contains a Joiner transformation configured for sorted input:
Figure 1-9. Thread Creation with Joiner Transformation
(Two source pipelines, each with 1 reader thread and 1 transformation thread; 1 writer thread. Asterisks mark partition points.)

Each source pipeline in Figure 1-9 contains a transformation thread. The Joiner transformation is not a partition point, so both transformation threads can process data at the Joiner and Expression transformations. However, only one transformation thread processes a row at any given time. The target load order group contains one target, so the master thread creates only one writer thread. Suppose you add a partition point at the Joiner transformation in Figure 1-9. Figure 1-10 shows the mapping in Figure 1-9 with a partition point at the Joiner transformation:
Figure 1-10. Thread Creation with a Partition Point at a Joiner Transformation
(Two source pipelines, each with 1 reader thread and 1 transformation thread; 1 transformation thread created after the partition point; 1 writer thread. Asterisks mark partition points.)


Each source pipeline in Figure 1-10 contains a transformation thread. However, the transformation threads end at the Joiner transformation. The Joiner transformation is a partition point, so the master thread creates a new transformation thread starting at the partition point.
Note: If any source qualifier in either Figure 1-9 or Figure 1-10 feeds a target other than the target associated with the Joiner transformation, the master thread creates an additional writer thread.


PowerCenter Server Processing
When you run a session, the PowerCenter Server reads source data and passes it to the transformations for processing. To help understand PowerCenter Server processing, consider the following PowerCenter Server actions:

♦ Reading source data. The PowerCenter Server reads the sources in a mapping at different times depending on how you configure the sources, transformations, and targets in the mapping. For more information on reading data, see “Reading Source Data” on page 22.
♦ Blocking data. The PowerCenter Server sometimes blocks the flow of data at a transformation in the mapping while it processes a row of data from a different source. For more information on blocking data, see “Blocking Data” on page 23.
♦ Block processing. The PowerCenter Server reads and processes a block of rows at a time. For more information, see “Block Processing” on page 23.

Reading Source Data
You create a session based on a mapping. Mappings contain one or more target load order groups. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. Each target load order group contains one or more source pipelines. A source pipeline consists of a source qualifier and all of the transformations and target instances that receive data from that source qualifier. By default, the PowerCenter Server reads sources in a target load order group concurrently, and it processes target load order groups sequentially. You can configure the order that the PowerCenter Server processes target load order groups. For more information on setting the target load order, see “Mappings” in the Designer Guide. Figure 1-11 shows a mapping that contains two target load order groups and three source pipelines:
Figure 1-11. Target Load Order Groups and Source Pipelines
(Target Load Order Group 1: Pipelines A and B, reading Sources A and B and loading targets T1 and T2. Target Load Order Group 2: Pipeline C, reading Source C and loading target T3.)


In the mapping shown in Figure 1-11, the PowerCenter Server processes the target load order groups sequentially. It first processes Target Load Order Group 1 by reading Source A and Source B at the same time. When it finishes processing Target Load Order Group 1, the PowerCenter Server begins to process Target Load Order Group 2 by reading Source C.
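The processing order in Figure 1-11, with groups handled one after another and the sources within a group read together, can be sketched in Python with `concurrent.futures.ThreadPoolExecutor`. This is an illustration under simplified assumptions: `read_source` and `run_session` are hypothetical names, `read_source` only returns the source name, and the PowerCenter Server does not use Python threads:

```python
from concurrent.futures import ThreadPoolExecutor


def read_source(name: str) -> str:
    # Hypothetical placeholder for extracting rows from one source.
    return name


def run_session(load_order_groups: list[list[str]]) -> list[list[str]]:
    """Process target load order groups sequentially, reading the
    sources inside each group concurrently."""
    results = []
    for group in load_order_groups:  # groups run one after another
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            # map() submits one read per source in the group at the same time
            results.append(list(pool.map(read_source, group)))
        # leaving the with-block waits for every read in this group to finish
    return results


# Figure 1-11: group 1 reads Sources A and B together, then group 2 reads Source C.
print(run_session([["A", "B"], ["C"]]))  # [['A', 'B'], ['C']]
```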

Blocking Data
You can include multiple input group transformations in a mapping. The PowerCenter Server passes data to the input groups concurrently. However, sometimes the transformation logic of a multiple input group transformation requires that the PowerCenter Server block data on one input group while it waits for a row from a different input group. Blocking is the suspension of the data flow into an input group of a multiple input group transformation. When the PowerCenter Server blocks data, it reads data from the source connected to the input group until it fills the reader and transformation buffers. Once the PowerCenter Server fills the buffers, it does not read more source rows until the transformation logic allows the PowerCenter Server to stop blocking the source. When the PowerCenter Server stops blocking a source, it processes the data in the buffers and continues to read from the source. The PowerCenter Server blocks data at one input group when it needs a specific row from a different input group to perform the transformation logic. Once the PowerCenter Server reads and processes the row it needs, it stops blocking the source.
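Blocking behaves like a reader writing into a bounded buffer: once the buffer is full, the reader waits until the transformation consumes rows. The Python sketch below models this with `queue.Queue(maxsize=...)`, whose `put()` waits while the queue is full. The join logic, which reads every row from one input group before the other, and all names are hypothetical examples of transformation logic that requires blocking; this is not the Joiner transformation's actual implementation:

```python
import queue
import threading

END = object()  # end-of-data marker placed after the last row


def reader(rows, buffer):
    """Reader thread: put() blocks whenever the bounded buffer is full."""
    for row in rows:
        buffer.put(row)
    buffer.put(END)


def drain(buffer):
    """Yield rows from a buffer until the end-of-data marker arrives."""
    while (row := buffer.get()) is not END:
        yield row


def blocking_join(master_rows, detail_rows, capacity=2):
    master_buf = queue.Queue(maxsize=capacity)
    detail_buf = queue.Queue(maxsize=capacity)
    readers = [
        threading.Thread(target=reader, args=(master_rows, master_buf)),
        threading.Thread(target=reader, args=(detail_rows, detail_buf)),
    ]
    for t in readers:
        t.start()
    # Block the detail input: read every master row first. The detail reader
    # fills its buffer, then waits until the transformation stops blocking it.
    master = dict(drain(master_buf))
    # Stop blocking: consuming detail rows lets the detail reader resume.
    joined = [(key, master[key], value)
              for key, value in drain(detail_buf) if key in master]
    for t in readers:
        t.join()
    return joined


print(blocking_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y"), (2, "z")]))
# [(1, 'a', 'x'), (2, 'b', 'z')]
```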

Block Processing
The PowerCenter Server reads and processes a block of rows at a time. The number of rows in the block depends on the row size and the DTM buffer size. In the following circumstances, the PowerCenter Server processes one row in a block:
♦ Log row errors. When you log row errors, the PowerCenter Server processes one row in a block.
♦ Connect CURRVAL. When you connect the CURRVAL port in a Sequence Generator transformation, the session processes one row in a block. For optimal performance, Informatica recommends that you connect only the NEXTVAL port in mappings. For more information, see “Sequence Generator Transformation” in the Transformation Guide.
♦ Configure row-based mode for Custom transformation procedure. When you configure the data access mode for a Custom transformation procedure to be row-based, the PowerCenter Server processes one row in a block. By default, the data access mode is array-based, and the PowerCenter Server processes multiple rows in a block. For more information, see “Custom Transformation Functions” in the Transformation Guide.
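The rule above can be expressed as a small function. The Python sketch below assumes, for illustration only, that a block otherwise holds as many whole rows as fit in one buffer block; this guide states only that the count depends on the row size and the DTM buffer size. The function and parameter names are hypothetical:

```python
def rows_per_block(block_size: int, row_size: int, *, log_row_errors: bool = False,
                   currval_connected: bool = False,
                   custom_row_based: bool = False) -> int:
    """Rows processed per block under a simplified model.

    Any of the three listed conditions forces one row per block; otherwise the
    block holds as many whole rows as fit (assumed), and at least one.
    """
    if log_row_errors or currval_connected or custom_row_based:
        return 1
    return max(1, block_size // row_size)


print(rows_per_block(64_000, 500))                          # 128
print(rows_per_block(64_000, 500, currval_connected=True))  # 1
```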


System Resources
To allocate system resources for read, transformation, and write processing, you should understand how the PowerCenter Server allocates and uses system resources. The PowerCenter Server uses the following system resources:
♦ CPU
♦ Load Manager shared memory
♦ DTM buffer memory
♦ Cache memory

CPU Usage
The PowerCenter Server performs read, transformation, and write processing for a pipeline in parallel. It can process multiple partitions of a pipeline within a session, and it can process multiple sessions in parallel. If you have a symmetric multi-processing (SMP) platform, you can use multiple CPUs to concurrently process session data or partitions of data. This increases performance because the processing is truly parallel. On a single processor platform, these tasks share the CPU, so there is no parallelism. The PowerCenter Server can use multiple CPUs to process a session that contains multiple partitions. The number of CPUs used depends on factors such as the number of partitions, the number of threads, the number of available CPUs, and the amount of resources required to process the mapping. For more information about partitioning, see “Pipeline Partitioning” on page 345.

Load Manager Shared Memory
The Load Manager uses both process and shared memory. The Load Manager keeps a list of workflows and the schedule queue in process memory. The Load Manager shared memory is organized as an array of session slots that store session instance and status information. The DTM retrieves the session object and mapping object from the repository for processing. Session instance information does not occupy the shared memory slot until session run time. When you start a workflow, the Load Manager retrieves session instance information from the repository with other workflow tasks. At session runtime, the Load Manager places the session instance information into a shared memory slot and starts the DTM. The DTM connects to the shared memory and uses the session instance information to retrieve the session and mapping from the repository. When the session completes, the Load Manager releases the session instance from the shared memory slot and writes session run information to the repository. If the PowerCenter Server shuts down, it releases all sessions from shared memory.


You can configure three parameters in the PowerCenter Server configuration that control how the Load Manager allocates shared memory to sessions and the number of sessions the PowerCenter Server runs simultaneously:

♦ MaxSessions. The maximum sessions parameter indicates the maximum number of session slots available to the Load Manager at one time for running or repeating sessions. For example, if you select the default MaxSessions of 10, the Load Manager allocates 10 session slots. This parameter helps you control the number of sessions the PowerCenter Server can run simultaneously.
♦ LMSharedMemory. Set the Load Manager shared memory parameter in conjunction with the Maximum Sessions parameter to ensure that the Load Manager has enough memory for each session. The Load Manager requires approximately 200,000 bytes of shared memory for each session slot. The default setting is 2,000,000 bytes. For each increase of 10 sessions in the MaxSessions setting, you need to increase LMSharedMemory by 2,000,000 bytes.
♦ FailSessionIfMaxSessionsReached. The Fail Session If Max Sessions Reached option determines how the Load Manager handles a session when the number of sessions already running equals the number specified for maximum sessions. By default, this option is disabled, and the Load Manager holds sessions waiting to run in a ready queue until a session slot becomes available.
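The sizing rule and the slot behavior can be sketched together in Python. `required_lm_shared_memory` applies the approximate 200,000 bytes per slot stated above, and `LoadManagerSlots` is a hypothetical toy model of MaxSessions and FailSessionIfMaxSessionsReached; it is not how the Load Manager is implemented:

```python
from collections import deque

BYTES_PER_SLOT = 200_000  # approximate per-slot requirement stated above


def required_lm_shared_memory(max_sessions: int) -> int:
    """Approximate LMSharedMemory (bytes) needed for a MaxSessions value."""
    return max_sessions * BYTES_PER_SLOT


class LoadManagerSlots:
    """Toy model of session slots; not the PowerCenter implementation."""

    def __init__(self, max_sessions: int = 10, fail_if_max_reached: bool = False):
        self.max_sessions = max_sessions
        self.fail_if_max_reached = fail_if_max_reached
        self.running: set[str] = set()
        self.ready_queue: deque[str] = deque()

    def start(self, session: str) -> str:
        if len(self.running) < self.max_sessions:
            self.running.add(session)
            return "running"
        if self.fail_if_max_reached:
            return "failed"
        self.ready_queue.append(session)  # default: wait for a free slot
        return "queued"

    def complete(self, session: str) -> None:
        self.running.discard(session)
        if self.ready_queue:  # a freed slot goes to the next waiting session
            self.running.add(self.ready_queue.popleft())


print(required_lm_shared_memory(10))  # 2000000
print(required_lm_shared_memory(20))  # 4000000
```

With `LoadManagerSlots(max_sessions=1)`, a second `start()` call returns "queued" by default and "failed" when `fail_if_max_reached=True`.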

DTM Buffer Memory
The Load Manager launches the DTM. The DTM allocates buffer memory to the session based on the DTM Buffer Size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session. The DTM divides the memory into buffer blocks as configured in the Buffer Block Size setting in the session properties (64,000 bytes per block, by default). The reader, transformation, and writer threads use buffer blocks to move data from sources to targets. You can sometimes improve session performance by increasing buffer memory when you run a session handling a large volume of character data and the PowerCenter Server runs in Unicode mode. In Unicode mode, the PowerCenter Server uses double bytes to move characters, so increasing buffer memory might improve session performance. If the DTM cannot allocate the configured amount of buffer memory for the session, the session cannot initialize. Informatica recommends you allocate no more than 1 GB for DTM buffer memory.
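The default settings above divide into a fixed number of buffer blocks. The Python sketch below computes that count with integer division and shows why Unicode mode needs more buffer memory for the same character data. It assumes the DTM uses only whole blocks and reads 1 GB as 2^30 bytes; the function names are hypothetical:

```python
DEFAULT_DTM_BUFFER_SIZE = 12_000_000  # bytes, default stated above
DEFAULT_BUFFER_BLOCK_SIZE = 64_000    # bytes per block, default stated above
ONE_GB = 1024 ** 3                    # assumption: 1 GB read as 2**30 bytes


def buffer_blocks(dtm_buffer_size: int = DEFAULT_DTM_BUFFER_SIZE,
                  block_size: int = DEFAULT_BUFFER_BLOCK_SIZE) -> int:
    """Number of whole buffer blocks in the DTM buffer memory (simplified model)."""
    if dtm_buffer_size > ONE_GB:
        print("warning: above the recommended 1 GB of DTM buffer memory")
    return dtm_buffer_size // block_size


def character_bytes(chars: int, unicode_mode: bool) -> int:
    """Bytes used to move character data: one per character in ASCII mode, two in Unicode mode."""
    return chars * (2 if unicode_mode else 1)


print(buffer_blocks())                # 187
print(character_bytes(1_000, False))  # 1000
print(character_bytes(1_000, True))   # 2000
```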


Cache Memory
The DTM process creates in-memory index and data caches to temporarily store data used by the following transformations:
♦ Aggregator transformation (without sorted input)
♦ Rank transformation
♦ Joiner transformation
♦ Lookup transformation (with caching enabled)

You configure memory size for the index and data cache in the transformation properties. By default, the PowerCenter Server allocates 1,000,000 bytes for the index cache and 2,000,000 bytes for the data cache. By default, the DTM creates cache files in the directory configured for the $PMCacheDir server variable. If the DTM requires more space than it allocates, it pages to local index and data files.

The DTM process also creates an in-memory cache to store data used by a Sorter transformation. You configure the memory size for the cache in the transformation properties. By default, the PowerCenter Server allocates 8,388,608 bytes for the cache, and the DTM creates cache files in the directory configured for the $PMTempDir server variable. If the DTM requires more cache space than it allocates, it pages to local cache files.

When processing large amounts of data, the DTM may create multiple index and data files. The session does not fail if it runs out of cache memory and pages to the cache files. It does fail, however, if the local directory for cache files runs out of disk space.

After the session completes, the DTM releases memory used by the index and data caches and deletes any index and data files. However, if the session is configured to perform incremental aggregation or if a Lookup transformation is configured for a persistent lookup cache, the DTM saves all index and data cache information to disk for the next session run. For more information about caching, see “Session Caches” on page 613.
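The default cache sizes and directories above can be captured in a small table, together with a rough estimate of how much data the DTM pages to disk. The Python sketch assumes that paging covers exactly the bytes that exceed the configured cache size, which is a simplification; `paged_bytes` and the dictionary keys are hypothetical names:

```python
from typing import Optional

# Default cache sizes (bytes) and directory variables, as stated above.
CACHE_DEFAULTS = {
    "index": (1_000_000, "$PMCacheDir"),
    "data": (2_000_000, "$PMCacheDir"),
    "sorter": (8_388_608, "$PMTempDir"),
}


def paged_bytes(cache: str, bytes_needed: int, configured: Optional[int] = None) -> int:
    """Estimate bytes paged to local cache files; 0 means the data fits in memory.

    configured overrides the default cache size for this cache type.
    """
    default_size, _directory = CACHE_DEFAULTS[cache]
    size = default_size if configured is None else configured
    return max(0, bytes_needed - size)


print(paged_bytes("index", 800_000))      # 0
print(paged_bytes("sorter", 10_000_000))  # 1611392
```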


Code Pages and Data Movement Modes
You can configure PowerCenter to move multibyte data. The PowerCenter Server can move data in either ASCII or Unicode data movement mode. These modes determine how the PowerCenter Server handles character data. You choose the data movement mode in the PowerCenter Server configuration settings. If you want to move multibyte data, choose Unicode data movement mode. To ensure that data is not lost during conversion from one machine to another, you must also choose the appropriate code pages for your connections. In the Workflow Manager, you select code pages for the PowerCenter Server and the database connections the PowerCenter Server uses to connect to the source and target machines. The Workflow Manager validates code page compatibility when you add or edit a session. For more information, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.

ASCII Mode
Use ASCII mode when all sources and targets are 7-bit ASCII or EBCDIC character sets. In ASCII mode, the PowerCenter Server recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte. When the PowerCenter Server runs in ASCII mode, it does not validate session code pages. It reads all character data as ASCII characters and does not perform code page conversions. It also treats all numerics as U.S. Standard and all dates as binary data.

Unicode Mode
Use Unicode mode when sources or targets use 8-bit or multibyte character sets and contain character data. In Unicode mode, the PowerCenter Server recognizes multibyte character sets as defined by supported code pages. If you configure the PowerCenter Server to validate data code pages, the PowerCenter Server validates source and target code page compatibility when you run a session. If you configure the PowerCenter Server for relaxed data code page validation, the PowerCenter Server lifts source and target compatibility restrictions. When reading a source, the PowerCenter Server converts data from the source character set to Unicode based on the source code page. The PowerCenter Server allots two bytes for each character when moving data through a mapping. The PowerCenter Server converts data from Unicode to the target character set based on the target code page when writing to the target. It also treats all numerics as U.S. Standard and all dates as binary data. The PowerCenter Server code page must be compatible with the code pages of the PowerCenter Client. For details on code page compatibility and validation, see “Globalization Overview” in the Installation and Configuration Guide.
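The two modes can be contrasted with Python's codec functions, using Python codec names such as "latin-1" and "utf-8" as stand-ins for PowerCenter code pages (an assumption for illustration). In this sketch, Unicode mode decodes with the source code page and encodes with the target code page, so characters the target code page cannot represent raise an error; ASCII mode passes bytes through unchanged. The function names are hypothetical:

```python
def move_unicode_mode(data: bytes, source_codepage: str, target_codepage: str) -> bytes:
    """Source code page -> Unicode -> target code page, as described for Unicode mode."""
    text = data.decode(source_codepage)  # read: convert source bytes to Unicode
    return text.encode(target_codepage)  # write: raises UnicodeEncodeError if unrepresentable


def move_ascii_mode(data: bytes) -> bytes:
    """ASCII mode: no code page conversion; each character stays a single byte."""
    return data


print(move_unicode_mode(b"caf\xe9", "latin-1", "utf-8"))  # b'caf\xc3\xa9'
try:
    move_unicode_mode(b"caf\xe9", "latin-1", "ascii")
except UnicodeEncodeError:
    print("target code page cannot represent the data")
```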

Output Files and Caches
Once launched, the PowerCenter Server logs status and error messages to a UNIX log file or to the Windows Application log. During each workflow run, the PowerCenter Server creates a workflow log file. During each session, the PowerCenter Server creates a session log file and reject file. Depending on transformation cache settings and target types, the PowerCenter Server may create additional files as well. The PowerCenter Server uses the PowerCenter Server code page to generate log files. When you directly access a log file generated by the PowerCenter Server, it appears in the character set of the PowerCenter Server code page. When you use the Workflow Manager to access a file generated by the PowerCenter Server, such as a session log, the Workflow Manager uses the PowerCenter Client code page to translate and display the session log in the character set of the PowerCenter Client code page. The PowerCenter Server creates the following output files:
♦ PowerCenter Server log
♦ Workflow log file
♦ Session log file
♦ Session details file
♦ Performance details file
♦ Reject files
♦ Row error logs
♦ Recovery tables and files
♦ Control file
♦ Post-session email
♦ Output file
♦ Cache files

When the PowerCenter Server on UNIX creates any file other than a recovery file, it sets the file permissions according to the umask of the shell that starts the PowerCenter Server. For example, when the umask of the shell that starts the PowerCenter Server is 022, the PowerCenter Server creates files with rw-r--r-- permissions. To change the file permissions, you must change the umask of the shell that starts the PowerCenter Server and then restart it. The PowerCenter Server on UNIX creates recovery files with rw------- permissions. The PowerCenter Server on Windows creates files with read and write permissions.
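The umask arithmetic above can be checked with Python's `stat.filemode`. The sketch assumes the server requests the conventional rw-rw-rw- (0o666) mode for new files and lets the umask clear bits, which reproduces the rw-r--r-- result for a umask of 022; the function name is hypothetical:

```python
import stat


def created_file_mode(umask: int, recovery_file: bool = False) -> str:
    """Permission string for a new file under the UNIX behavior described above."""
    if recovery_file:
        mode = 0o600           # recovery files: rw------- regardless of umask
    else:
        mode = 0o666 & ~umask  # assumed rw-rw-rw- request; the umask clears bits
    # stat.filemode adds a leading file-type character; drop it.
    return stat.filemode(stat.S_IFREG | mode)[1:]


print(created_file_mode(0o022))                      # rw-r--r--
print(created_file_mode(0o022, recovery_file=True))  # rw-------
```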

PowerCenter Server Log
The PowerCenter Server creates a log for all status and error messages. You can troubleshoot PowerCenter Server problems by examining error messages sent to this log.


On UNIX, the default name of the PowerCenter Server log file is pmserver.log. You configure the PowerCenter Server log file name with the LogFileName option in the PowerCenter Server setup program. On Windows, the PowerCenter Server logs status and error messages in the event log. Use the Event Viewer to access those messages. You can also configure the PowerCenter Server on Windows to write status and error messages to a file.

PowerCenter Server Messages
The PowerCenter Server associates a message code with the text of every message. The code uses a text prefix, such as LM, CMN, or RR, with a code number, such as CMN_1039. In PowerCenter Server error logs, the codes appear before the text as follows:
LM_34003 Server initialization completed.
LM_36802 Workflow <workflow name> scheduled to run at <time>.

Some message codes are embedded within other codes, for example:
CMN_1050 [LM 2041 Received request to start session]

You can also configure the PowerCenter Server on Windows to write error messages to the Application Log, which you can view with the Event Viewer. Messages sent from the PowerCenter Server display PowerCenter in the Source column, the code prefix in the Category column, and the code number in the Event column. However, since some message codes are embedded within other codes, to ensure you are viewing the true message code, you must view the text of the message. Figure 1-12 shows a sample application log:
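Because codes can be embedded, a script that scans log text should collect every code on a line rather than only the first. The Python sketch below uses a regular expression that assumes a code is two or more capital letters, an underscore or space, and digits, which matches both examples above. The function name is hypothetical, and treating the last code on the line as the embedded one is an assumption based on the example:

```python
import re

# Capital-letter prefix, then an underscore or space, then digits (CMN_1050, LM 2041).
CODE = re.compile(r"\b([A-Z]{2,})[_ ](\d+)\b")


def message_codes(text: str) -> list[tuple[str, str]]:
    """Return every (prefix, number) message code in a log line, in order of appearance."""
    return CODE.findall(text)


line = "CMN_1050 [LM 2041 Received request to start session]"
print(message_codes(line))      # [('CMN', '1050'), ('LM', '2041')]
print(message_codes(line)[-1])  # ('LM', '2041'), the embedded code
```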
Figure 1-12. Event Viewer Application Log Message


Figure 1-13 shows how you can view the text of the message by selecting the message and using the Enter key:
Figure 1-13. Application Log Message Detail

Error Messages
Using the listed error code, consult the Troubleshooting Guide for probable causes and actions to correct the problem.

Workflow Log File
The PowerCenter Server creates a workflow log file for each workflow it runs. It writes information in the workflow log such as initialization of processes, workflow task run information, errors encountered, and workflow run summary. Workflow log error messages are categorized into severity levels. You can configure the PowerCenter Server to suppress writing messages to the workflow log file. You can also configure the workflow to write workflow messages to the session log file. As with PowerCenter Server logs and session logs, the PowerCenter Server enters a code number into the workflow log file message along with message text. You can find information on error messages in the Troubleshooting Guide.

By default, the PowerCenter Server saves workflow logs in a directory entered for the server variable $PMWorkflowLogDir in the PowerCenter Server registration and names the workflow log workflow_name.log. By default, the PowerCenter Server saves only one workflow log for each workflow. If you want to save multiple logs for different workflow runs, you can configure the workflow to save a workflow log file in two different ways:
♦ By timestamp, permitting an unlimited number of workflow logs.
♦ By cycle, saving the configured number of workflow logs, replacing the older logs with new logs. You can use the server variable $PMWorkflowLogCount to set the number of logs the PowerCenter Server archives for the workflow.

For more information about the workflow log, see “Log Files” on page 455.

Session Log File
The PowerCenter Server creates a session log file for each session it runs. It writes information in the session log such as initialization of processes, session validation, creation of SQL commands for reader and writer threads, errors encountered, and load summary. The amount of detail in the session log depends on the tracing level that you set. As with PowerCenter Server logs and workflow logs, the PowerCenter Server enters a code number along with message text. You can find information on error messages in the Troubleshooting Guide. By default, the PowerCenter Server saves session logs in a directory entered for the server variable $PMSessionLogDir in the PowerCenter Server registration and names the session log session_name.log. By default, the PowerCenter Server saves only one session log for each session. If you want to save multiple logs for different session runs, you can configure the session to save a session log file in two different ways:
♦ By timestamp, permitting an unlimited number of session logs.
♦ By cycle, saving the configured number of session logs, replacing the older logs with new logs. You can use the server variable $PMSessionLogCount to set the number of logs the PowerCenter Server archives for the session.

For more information about the session log, see “Log Files” on page 455.
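The two archiving options for workflow and session logs can be modeled with a bounded `collections.deque`. This Python sketch assumes that the log count variable equals the number of logs kept when saving by cycle; it tracks run numbers only, not actual log file names, and the function name is hypothetical:

```python
from collections import deque


def retained_logs(run_count: int, save_by: str, log_count: int = 0) -> list[int]:
    """Run numbers (1-based) whose logs remain after run_count runs.

    save_by: "timestamp" keeps every log; "cycle" keeps the newest log_count logs,
    modeling $PMWorkflowLogCount or $PMSessionLogCount (assumed semantics).
    """
    runs = range(1, run_count + 1)
    if save_by == "timestamp":
        return list(runs)
    if save_by == "cycle":
        return list(deque(runs, maxlen=log_count))  # older logs are replaced
    raise ValueError("save_by must be 'timestamp' or 'cycle'")


print(retained_logs(5, "timestamp"))  # [1, 2, 3, 4, 5]
print(retained_logs(5, "cycle", 3))   # [3, 4, 5]
```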

Session Details
When you run a session, the Workflow Manager creates session details that provide load statistics for each target in the mapping. You can monitor session details during the session or after the session completes. Session details include information such as table name, number of rows written or rejected, and read and write throughput. You can view this information by double-clicking the session in the Workflow Monitor. For more information on session details file, see “Monitoring Session Details” on page 434.

Performance Detail File
The PowerCenter Server can create a set of information known as session performance details to help determine where performance can be improved. Performance details provide transformation-by-transformation information on the flow of data through the session. To generate this information for a session, select the performance detail option in the session properties. You can view performance details in the Workflow Monitor, or open the text file that contains the information in a text editor. The PowerCenter Server names the file session_name.perf, and stores it in the same directory as the session log (in the PowerCenter Server variable directory $PMSessionLogDir, by default). For more information on performance details, see “Creating and Viewing Performance Details” on page 436.

Reject Files
By default, the PowerCenter Server creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets. The writer may reject a row in the following circumstances:
♦ It is flagged for reject by an Update Strategy or Custom transformation.
♦ It violates a database constraint, such as a primary key constraint.
♦ A field in the row was truncated or overflowed, and the target database is configured to reject truncated or overflowed data.

By default, the PowerCenter Server saves the reject file in the directory entered for the server variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad.
Note: If you enable row error logging, the PowerCenter Server does not create a reject file.

For more information about the reject file, see “Log Files” on page 455.

Row Error Logs
When you configure a session, you can choose to log row errors in a central location. When a row error occurs, the PowerCenter Server logs error information that allows you to determine the cause and source of the error. The PowerCenter Server logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information. For more information about row error logging, see “Row Error Logging” on page 481.

Recovery Tables and Files
You can recover failed sessions that write to relational targets. The PowerCenter Server creates recovery tables on the target database system when it runs a session enabled for recovery. When you run a session in recovery mode, the PowerCenter Server uses information in the recovery tables to complete the session. For more information about recovery, see “Recovering Data” on page 295.

Control File
When you run a session that uses an external loader, the PowerCenter Server creates a control file and a target flat file. The control file contains information about the target flat file such as data format and loading instructions for the external loader. The control file has an extension of .ctl. You can view the control file and the target flat file in the target file directory (default: $PMTargetFileDir). For more information about external loading and control files, see “External Loading” on page 523.

Email
You can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer. You can place the Email task in a workflow, or you can associate it with a session. The Email task allows you to automatically communicate information about a workflow or session run to designated recipients. Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, you can create two different messages, one to be sent if the session completes successfully, the other if the session fails. You can also use variables to generate information about the session name, status, and total rows loaded. For example, if your database administrator wants to track how long a session takes to complete, you can configure the session to send an email containing the time and date the session starts and completes. Or, if you want to notify your Informatica administrator when a session fails, you can configure the session to send an email only if it fails and attach the session log to the email. For more information, see “Sending Email” on page 319.

Indicator File
If you use a flat file as a target, you can configure the PowerCenter Server to create an indicator file for target row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject. The PowerCenter Server names this file target_name.ind and stores it in the same directory as the target file. For more information about configuring the PowerCenter Server, see the Installation and Configuration Guide.

Output File
If the session writes to a target file, the PowerCenter Server creates the target file based on a file target definition. By default, the PowerCenter Server names the target file based on the target definition name. If a mapping contains multiple instances of the same target, the PowerCenter Server names the target files based on the target instance name.


The PowerCenter Server creates this file in the PowerCenter Server variable directory, $PMTargetFileDir, by default. For more information about working with target files, see “Working with Targets” on page 233.

Cache Files
When the PowerCenter Server creates a memory cache, it also creates cache files. The PowerCenter Server creates index and data cache files for the following transformations in a mapping:
♦ Aggregator transformation
♦ Joiner transformation
♦ Rank transformation
♦ Lookup transformation
♦ Sorter transformation

By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations in the directory configured for the $PMCacheDir server variable. The PowerCenter Server names the index file PM*.idx, and the data file PM*.dat. The PowerCenter Server creates the index and data files for the Sorter transformation in the $PMTempDir server variable directory. The PowerCenter Server writes to the cache files during the session in the following cases:
♦ The mapping contains one or more Aggregator transformations configured without sorted ports.
♦ The session is configured for incremental aggregation.
♦ The mapping contains a Lookup transformation that is configured to use a persistent lookup cache, and the PowerCenter Server runs the session for the first time.
♦ The mapping contains a Lookup transformation that is configured to initialize the persistent lookup cache.
♦ The DTM runs out of cache memory and pages to the local cache files. The DTM may create multiple files when processing large amounts of data. The session fails if the local directory runs out of disk space.

After the session completes, the DTM generally deletes the overflow index and data files. It does not delete the cache files under the following circumstances:
♦ The session is configured to perform incremental aggregation.
♦ The session is configured with a persistent lookup cache.

Incremental Aggregation Files
If the session performs incremental aggregation, the PowerCenter Server saves index and data cache information to disk when the session finishes. The next time the session runs, the PowerCenter Server uses this historical information to perform the incremental aggregation.


The PowerCenter Server names these files PMAGG*.dat and PMAGG*.idx and saves them to the cache directory. For more information about incremental aggregation, see “Using Incremental Aggregation” on page 573.

Persistent Lookup Cache
If a session uses a Lookup transformation, you can configure the transformation to use a persistent lookup cache. With this option selected, the PowerCenter Server saves the lookup cache to disk the first time it runs the session, then uses this lookup cache during subsequent session runs. These files are saved in the cache directory. If you do not name the files in the transformation properties, these files are named PMLKUP*.idx and PMLKUP*.dat. For more information about lookup caching, see “Session Caches” on page 613 and “Lookup Transformation” in the Transformation Guide.
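The retention rules for cache files can be sketched in Python with `fnmatch` and the default file name patterns above. This assumes the default names (persistent lookup files that you name in the transformation properties would need their own patterns); the function and the sample file names are hypothetical:

```python
from fnmatch import fnmatch


def cache_files_kept(files: list[str], incremental_aggregation: bool,
                     persistent_lookup: bool) -> list[str]:
    """Cache files kept after a session, matched against the default name patterns."""
    keep = []
    if incremental_aggregation:
        keep += ["PMAGG*.idx", "PMAGG*.dat"]
    if persistent_lookup:
        keep += ["PMLKUP*.idx", "PMLKUP*.dat"]
    return [name for name in files if any(fnmatch(name, pattern) for pattern in keep)]


# Hypothetical file names that follow the default patterns.
files = ["PMAGG1_0.idx", "PMAGG1_0.dat", "PMLKUP2_0.idx", "PMLKUP2_0.dat", "PM3_0.dat"]
print(cache_files_kept(files, incremental_aggregation=True, persistent_lookup=False))
# ['PMAGG1_0.idx', 'PMAGG1_0.dat']
```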

Chapter 2

Configuring the Workflow Manager
This chapter covers the following topics:
♦ Overview, 38
♦ Customizing the Workflow Manager Options, 39
♦ Registering the PowerCenter Server, 46
♦ Configuring Connection Object Permissions, 51
♦ Setting Up a Relational Database Connection, 53
♦ Replacing a Relational Database Connection, 62

Overview
Before you can use the Workflow Manager to create workflows and sessions, you must configure the Workflow Manager. You can configure display options and connection information in the Workflow Manager. You must register a PowerCenter Server before you can start it or create a workflow to run against it. You can configure the following information in the Workflow Manager:

♦ Configure Workflow Manager options. You can configure options such as grouping sessions or docking and undocking windows. For details, see “Customizing the Workflow Manager Options” on page 39.
♦ Register PowerCenter Servers. Before you can start a PowerCenter Server, you must register it with the repository. For details, see “Registering the PowerCenter Server” on page 46.
♦ Create a server grid. When you have multiple PowerCenter Servers registered to the same repository, you can create a server grid to balance workloads. For details, see “Working with Server Grids” on page 446.
♦ Create source and target database connections. Create connections to each source and target database. You must create connections to a database before you can create a session that accesses the database. For details, see “Setting Up a Relational Database Connection” on page 53.
♦ Create connection objects. Create connection objects in the repository when you define database, FTP, and external loader connections. For details, see “Configuring Connection Object Permissions” on page 51.

Setting the Date/Time Display Format
The Workflow Manager displays the date and time formats configured in the Windows Control Panel of the PowerCenter Client machine. To modify the date and time formats, open the Control Panel and then Regional Settings. Set the date and time formats on the Date and Time tabs.
Note: For the Timer task and schedule settings, the Workflow Manager displays the date in short date format and the time in 24-hour format (HH:mm).

Customizing the Workflow Manager Options
You can customize the Workflow Manager default options to control the behavior and look of the Workflow Manager tools. To configure Workflow Manager options, choose Tools-Options. You can configure the following options:

♦ General. You can configure workspace options, display options, and other general options on the General tab. For more information about the General tab, see “Configuring General Options” on page 39.
♦ Format. You can configure font, color, and other format options on the Format tab. For more information about the Format tab, see “Configuring Format Options” on page 42.
♦ Miscellaneous. You can configure Copy Wizard and Versioning options on the Miscellaneous tab. For more information about the Miscellaneous tab, see “Configuring Miscellaneous Options” on page 43.
♦ Advanced. You can configure enhanced security for connection objects on the Advanced tab. For more information about the Advanced tab, see “Enabling Enhanced Security” on page 44.

Configuring General Options
General options control tool behavior such as whether or not a tool retains its view when you close it, how the Overview window behaves, and where the Workflow Manager stores workspace files.

Figure 2-1 shows the Workflow Manager General Options:
Figure 2-1. Workflow Manager General Options

Table 2-1 describes general options you can configure in the Workflow Manager:
Table 2-1. Workflow Manager General Options
Option | Description
Reload Tasks/Workflows When Opening a Folder | Reloads the last view of a tool when you open it. For example, if you have a workflow open when you disconnect from a repository, select this option so that the same workflow displays the next time you open the folder and Workflow Designer. Enabled by default.
Ask Whether to Reload the Tasks/Workflows | Appears only when you select Reload tasks/workflows when opening a folder. Select this option if you want the Workflow Manager to prompt you to reload tasks, workflows, and worklets each time you open a folder. Disabled by default.
Overview Window Pans Delay | By default, when you drag the focus of the Overview window, the focus of the workbook moves concurrently. When you select this option, the focus of the workspace does not change until you release the mouse button. Disabled by default.
Arrange Workflows/Worklets Vertically By Default | Arranges tasks in workflows vertically by default. Disabled by default.
Allow Invoking In-Place Editing Using the Mouse | By default, you can press F2 to edit objects directly in the workspace instead of opening the Edit Task dialog box. Select this option so you can also click the object name in the workspace to edit the object. Disabled by default.
Open Editor When Task Is Created | Opens the Edit Task dialog box when you create a task. By default, the Workflow Manager creates the task in the workspace. If you do not enable this option, double-click the task to open the Edit Task dialog box. Disabled by default.
Workspace File Directory | The directory for workspace files created by the Workflow Manager. Workspace files maintain the last task or workflow you saved. This directory should be local to the PowerCenter Client to prevent file corruption or overwrites by multiple users. By default, the Workflow Manager creates files in the PowerCenter Client installation directory.
Display Tool Names On Views | Displays the name of the tool in the upper left corner of the workspace or workbook. Enabled by default.
Always Show the Full Name of Selected Task | Shows the full name of a task when you select it. By default, the Workflow Manager abbreviates the task name in the workspace. Enabled by default.
Show the Expression On a Link | Shows the link condition in the workspace. If you do not enable this option, the Workflow Manager abbreviates the link condition in the workspace. Enabled by default.
Launch Workflow Monitor when Workflow is Started | The Workflow Monitor launches when you start a workflow or a task. Enabled by default.
Receive Notifications from Server | Allows you to receive notification messages from the Repository Server. The Repository Server sends notification about actions performed on repository objects. Enabled by default. For details, see “Understanding the Repository” in the Repository Guide.

Configuring Format Options
Format options control colors and fonts. To configure format options, select the appropriate Workflow Manager tool. Figure 2-2 shows the Workflow Manager Format Options:
Figure 2-2. Workflow Manager Format Options

Table 2-2 describes the format options for the Workflow Manager:
Table 2-2. Workflow Manager Format Options
Option | Description
Show Solid Lines for Links | Displays links as solid lines. By default, the Workflow Manager displays links as dotted lines.
Workspace Colors | Displays all items that you can customize in the selected tool. Select an item to change its color.
Color | Choose the color of the selected item in Workspace Colors.
Font Categories | Select the Workflow Manager tool for which you want to customize the display font.
Change Font | Select to change the display font and language script for the Workflow Manager tool you choose from the Categories menu.
Reset All | Resets all format options to their original default values.

Configuring Miscellaneous Options
Copy Wizard options control the display settings and available functions for the Copy Wizard. Versioning options control how the Workflow Manager displays checked out objects. Target loading options control how the PowerCenter Server loads targets. To configure Copy Wizard, Versioning, or Target Load Type options, choose Tools-Options and select the Miscellaneous tab. Figure 2-3 shows the Workflow Manager Miscellaneous Options:
Figure 2-3. Copy Wizard, Versioning, and Target Load Type Options

Table 2-3 describes the options for the Copy Wizard, Versioning, and Target Load Type:
Table 2-3. Workflow Manager Miscellaneous Options
Option | Description
Validate Copied Objects | Validates the copied object. Enabled by default.
Generate Unique Name When Resolved to “Rename” | Generates unique names for copied objects if you select the Rename option. For example, if the workflow wf_Sales has the same name as a workflow in the destination folder, the Rename option generates the unique name wf_Sales1. Enabled by default.
Get Default Object When Resolved to “Choose” | Uses the object with the same name in the destination folder if you select the Choose option.
Show Check Out Image in Navigator | Displays the Check Out icon when an object has been checked out. Enabled by default.
Reset All | Resets all Copy Wizard and Versioning options to their default values.
Target Load Type | Sets the default load type for sessions. You can choose normal or bulk loading. Any change you make takes effect after you restart the Workflow Manager. You can override this setting in the session properties. Default is Bulk. For more information on normal and bulk loading, see Table A-15 on page 697.

Enabling Enhanced Security
The Workflow Manager has an enhanced security option that allows you to specify a default set of privileges that applies to restricted access controls for connection objects. When you enable enhanced security, the Workflow Manager automatically assigns default permissions for connection objects to the object owner, owner group, and all other users. You can assign read, write, and execute permissions to an object, and specify permission for users and groups you add in the Permissions dialog box when you edit a connection. Table 2-4 lists the default permissions to a connection object:
Table 2-4. Default Permissions for Connection Objects
User | Default Connection Object Permissions
Owner | Read/Write/Execute
Owner Group | Read/Execute
World | No permissions

If you do not enable enhanced security, the Workflow Manager assigns Read, Write, and Execute permissions to all users or groups for the connection. Enabling enhanced security does not lock the restricted access settings for connection objects. You can continue to change the permissions for connection objects after enabling enhanced security. If you delete the Owner from the repository, the Workflow Manager automatically assigns ownership of the object to Administrator.
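The default-permission rules above can be summarized as a small lookup. This Python sketch is only a model of Table 2-4 and the paragraph that follows it; the constant and function names are hypothetical, and the Workflow Manager does not expose such an interface.

```python
from typing import Dict, FrozenSet

READ, WRITE, EXECUTE = "read", "write", "execute"
ALL_PERMISSIONS: FrozenSet[str] = frozenset({READ, WRITE, EXECUTE})

# Table 2-4: defaults assigned when enhanced security is enabled.
ENHANCED_SECURITY_DEFAULTS: Dict[str, FrozenSet[str]] = {
    "Owner": ALL_PERMISSIONS,
    "Owner Group": frozenset({READ, EXECUTE}),
    "World": frozenset(),
}


def default_connection_permissions(enhanced_security: bool) -> Dict[str, FrozenSet[str]]:
    """Return the initial permissions for a new connection object (illustrative model)."""
    if enhanced_security:
        return dict(ENHANCED_SECURITY_DEFAULTS)
    # Without enhanced security, all users and groups receive Read, Write, and Execute.
    return {principal: ALL_PERMISSIONS for principal in ENHANCED_SECURITY_DEFAULTS}
```

Because enabling enhanced security does not lock these settings, the returned values are only a starting point that the owner can change later.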
To enable enhanced security for connection objects:
1. Choose Tools-Options.
2. Click the Advanced tab.
3. Select Enable Enhanced Security.
4. Click OK.

Registering the PowerCenter Server
Before you can start the PowerCenter Server or create or run workflows, you need to register the PowerCenter Server in the repository. Use the Workflow Manager to register the PowerCenter Server. To register, edit, or delete the PowerCenter Server, you must have Administer Server, Administrator, or Super User privileges. In addition, to register a PowerCenter Server, you need the following information:
♦ PowerCenter Server name.
♦ Host name.
♦ TCP/IP address used to access the PowerCenter Server. Use the IP address or host name of the machine on which the PowerCenter Server runs, and the port number the PowerCenter Server uses on that machine.
♦ Code page identifying the character set associated with the PowerCenter Server.
♦ Default directories you want the PowerCenter Server to use for workflow files and caches.

You can perform the following registration tasks for a PowerCenter Server:

♦ Register a PowerCenter Server. When you register a PowerCenter Server, specify information such as the code page and directories for session output. This information is stored in the repository. When you register multiple PowerCenter Servers, you can choose the PowerCenter Server to run a workflow or a session. You can also create a server grid to distribute workloads across multiple servers.
♦ Edit a PowerCenter Server. When you edit a PowerCenter Server, all workflows and sessions using that PowerCenter Server use the updated server connection information, including the updated code page settings. You do not need to restart the Workflow Manager to use the updated information.
♦ Delete a PowerCenter Server. When you delete a PowerCenter Server, you must assign another PowerCenter Server to the workflows and sessions using the deleted server before you can run them. To assign a PowerCenter Server to a workflow or to a session, choose Connections-Assign.

Server Variables
You can define server variables for each PowerCenter Server you register. Some server variables define the path and directories for workflow output files and caches. By default, the PowerCenter Server places output files in these directories when you run a workflow. Other server variables define server attributes such as log file count. In a server grid, you must use the same server variables for each server. The installation process creates directories in the location where you install the PowerCenter Server. To use these directories as the default location for the session output files, you must first set the server variable $PMRootDir to define the path to the directories.
By using server variables, you simplify the process of changing the PowerCenter Server that runs a workflow. If each workflow in a folder uses server variables, then when you copy the folder to a production repository, the PowerCenter Server in production can run the workflows with the same server variables they used with the PowerCenter Server running against the test repository. The PowerCenter Server reads and writes the files in the directories under its $PMRootDir path. To ensure a workflow completes successfully, relocate any necessary source files or incremental aggregation files to the default directories of the new PowerCenter Server. Table 2-5 lists the server variables you configure when you register a PowerCenter Server:
Table 2-5. Server Variables
Server Variable | Required/Optional | Description
$PMRootDir | Required | A root directory to be used by any or all other server variables. Informatica recommends you use the PowerCenter Server installation directory as the root directory.
$PMSessionLogDir | Required | Default directory for session logs. Defaults to $PMRootDir/SessLogs.
$PMBadFileDir | Required | Default directory for reject files. Defaults to $PMRootDir/BadFiles.
$PMCacheDir | Required | Default directory for the index and data cache files. Defaults to $PMRootDir/Cache. To avoid performance problems, always use a drive local to the PowerCenter Server for the cache directory. Do not use a mapped or mounted drive for cache files.
$PMTargetFileDir | Required | Default directory for target files. Defaults to $PMRootDir/TgtFiles.
$PMSourceFileDir | Required | Default directory for source files. Defaults to $PMRootDir/SrcFiles.
$PMExtProcDir | Required | Default directory for external procedures. Defaults to $PMRootDir/ExtProc.
$PMTempDir | Required | Default directory for temporary files. Defaults to $PMRootDir/Temp.
$PMSuccessEmailUser | Optional | Email address to receive post-session email when the session completes successfully. Use to address post-session email. The default value is an empty string. For details, see “Sending Email” on page 319.
$PMFailureEmailUser | Optional | Email address to receive post-session email when the session fails. Use to address post-session email. The default value is an empty string.
$PMSessionLogCount | Optional | Number of session logs the PowerCenter Server archives for the session. Use to archive session logs. For details, see “Viewing Session Logs” on page 474. Defaults to 0.
$PMSessionErrorThreshold | Optional | Number of non-fatal errors the PowerCenter Server allows before failing the session. Non-fatal errors include reader, writer, and DTM errors. If you want to stop the session on errors, enter the number of non-fatal errors you want to allow before stopping the session. The PowerCenter Server maintains an independent error count for each source, target, and transformation. Use to configure the Stop On option in the session properties. Defaults to 0. If you use the default setting, non-fatal errors do not cause the session to stop.
$PMWorkflowLogDir | Required | Default directory for workflow logs. Defaults to $PMRootDir/WorkflowLogs.
$PMWorkflowLogCount | Optional | Number of workflow logs the PowerCenter Server archives for the workflow. Defaults to 0.
$PMLookupFileDir | Optional | Default directory for lookup files. Defaults to $PMRootDir/LkpFiles.
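The directory defaults in Table 2-5 all hang off $PMRootDir, which is what makes workflows portable between servers. The following Python sketch expands those defaults for one registered server. It assumes simple text substitution of $PMRootDir and a caller-chosen path separator; the function name resolve_server_directories is hypothetical, and this is a model of the table rather than how the PowerCenter Server resolves variables internally.

```python
from typing import Dict, Optional

# Directory defaults from Table 2-5, expressed as subdirectories of $PMRootDir.
DEFAULT_SUBDIRECTORIES: Dict[str, str] = {
    "$PMSessionLogDir": "SessLogs",
    "$PMBadFileDir": "BadFiles",
    "$PMCacheDir": "Cache",
    "$PMTargetFileDir": "TgtFiles",
    "$PMSourceFileDir": "SrcFiles",
    "$PMExtProcDir": "ExtProc",
    "$PMTempDir": "Temp",
    "$PMWorkflowLogDir": "WorkflowLogs",
    "$PMLookupFileDir": "LkpFiles",
}


def resolve_server_directories(
    root_dir: str,
    overrides: Optional[Dict[str, str]] = None,
    sep: str = "/",
) -> Dict[str, str]:
    """Expand the Table 2-5 directory defaults for one registered server (illustrative).

    Values in `overrides` may themselves reference $PMRootDir.
    """
    resolved = {"$PMRootDir": root_dir}
    for variable, subdirectory in DEFAULT_SUBDIRECTORIES.items():
        resolved[variable] = root_dir + sep + subdirectory
    for variable, value in (overrides or {}).items():
        resolved[variable] = value.replace("$PMRootDir", root_dir)
    return resolved
```

For example, a test server and a production server registered with different $PMRootDir values resolve the same variable names to their own directories, so a copied workflow needs no edits as long as its files have been relocated.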

Steps for Registering a PowerCenter Server
You can register one or more PowerCenter Servers with a PowerCenter repository, allowing you to run workflows and sessions on different servers. In a multiple server environment, it is important to enter descriptive server names for each registered server to help users differentiate between servers. When you register multiple servers you must have a unique server name and a unique combination of host name and port number for each server in the repository. For more information on using multiple servers, see “Using Multiple Servers” on page 443.
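The two uniqueness rules above, a unique server name and a unique host name and port combination, can be checked before you open the Server dialog box. This Python sketch is hypothetical: the ServerRegistration type and the sample host names are assumptions, server names are compared exactly, host names are compared case-insensitively, and no name resolution is done, so a host name and an IP address that point to the same machine are not detected as a conflict.

```python
from typing import Iterable, List, NamedTuple


class ServerRegistration(NamedTuple):
    name: str
    host: str
    port: int


def registration_conflicts(existing: Iterable[ServerRegistration], new: ServerRegistration) -> List[str]:
    """List the uniqueness rules a new registration would break (illustrative check)."""
    problems: List[str] = []
    for server in existing:
        if server.name == new.name:
            problems.append(f"server name {new.name!r} is already registered")
        # Host names are compared case-insensitively, as DNS names are.
        if (server.host.lower(), server.port) == (new.host.lower(), new.port):
            problems.append(f"{new.host}:{new.port} is already used by {server.name!r}")
    return problems
```

An empty result means the proposed name and host/port pair do not collide with any server already in the list you supply.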
To register the PowerCenter Server:
1. In the Workflow Manager, connect to the repository.
   Note: The first time you connect to the repository, use the database user name and password used to create the repository.
2. Choose Server-Server Configuration.
   The Server Browser dialog box appears.
3. Click New to register a new server.
   The Server dialog box appears.
4. Enter a new server name.
5. Configure the TCP/IP connectivity settings.
6. If you do not know the IP address, enter the host name and use the Resolve Server button to resolve the IP address.
   You can also enter the IP address in the Host Name/IP Address field and use the Resolve Server button to resolve the host name. The Workflow Manager can resolve the host name or IP address only if you enter the information in the Host Name/IP Address field. The Workflow Manager also resolves the host name or IP address when you click OK.
   Table 2-6 describes the settings required to register a PowerCenter Server using TCP/IP:

Table 2-6. TCP/IP Settings to Register a Server
TCP/IP Option | Required/Optional | Description
Server Name | Required | The name of the PowerCenter Server. This name must be unique to the repository.
Host Name or IP Address | Required | Server host name or IP address of the PowerCenter Server machine.
Resolved IP Address | n/a (read-only) | The IP address resolved by the Workflow Manager. This is a read-only field.
Port Number | Required | Port number the PowerCenter Server uses. Must be the same port listed in the PowerCenter Server configuration parameters.
Timeout | Required | Number of seconds the Workflow Manager waits for a response from the PowerCenter Server.
Code Page | Required | Character set associated with the PowerCenter Server. Select the code page identical to the PowerCenter Server operating system code page. Must be identical to or compatible with the repository code page.

7. For $PMRootDir, enter a valid root directory for the PowerCenter Server platform.
   Informatica recommends using the PowerCenter Server installation directory as the root directory because the PowerCenter Server installation creates the default server directories there. If you enter a different root directory, make sure to create the necessary directories.
8. Enter the server variables, as desired.
   Do not use trailing delimiters. A trailing delimiter might invalidate the directory used by the PowerCenter Server. For example, enter c:\data\sessionlog, not c:\data\sessionlog\. See Table 2-5 on page 47 for a list of server variables.
9. Click OK.
   The new PowerCenter Server appears in the Navigator below the repository.
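Step 8 warns against trailing delimiters in directory values. If you keep server variable values in a script or checklist before typing them into the Server dialog box, a helper like the following Python sketch can strip them. The function names are hypothetical, and treating every variable whose name ends in "Dir" as a directory is an assumption based on the names in Table 2-5.

```python
from typing import Dict


def strip_trailing_delimiter(path: str) -> str:
    """Drop trailing '/' or '\\' characters from a directory value.

    A bare root such as "/" or "c:\\" is returned unchanged, because stripping it
    would change which directory the value names.
    """
    stripped = path.rstrip("/\\")
    if stripped == "" or stripped.endswith(":"):
        return path
    return stripped


def clean_server_variables(values: Dict[str, str]) -> Dict[str, str]:
    """Apply strip_trailing_delimiter to every directory-style server variable."""
    return {
        name: strip_trailing_delimiter(value) if name.endswith("Dir") else value
        for name, value in values.items()
    }
```

Non-directory variables such as $PMSessionLogCount pass through unchanged.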

Deleting a PowerCenter Server
When you delete a PowerCenter Server with associated workflows, assign another server to the workflows. For details, see “Assigning the PowerCenter Server to a Workflow” on page 122. To delete a PowerCenter Server, you must have one of the following privileges:
♦ Administer Server privilege
♦ Super User privilege

To delete a server:
1. In the Workflow Manager, choose Server-Server Configuration.
2. Select the PowerCenter Server you want to delete.
3. Click Delete.
4. Click OK.
Configuring Connection Object Permissions
You create connection objects in the repository when you define the following connections:

♦ Relational. Database connections for relational source or target databases. For more information about relational database connections, see “Setting Up a Relational Database Connection” on page 53.
♦ Queue. Database connections for message queues. For more information about message queues, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.
♦ FTP. Connection to access source or target files using File Transfer Protocol (FTP). For more information about using FTP, see “Using FTP” on page 559.
♦ Application. Database connection to access databases such as SAP R/3 and PeopleSoft. For more information, see your PowerCenter Connect documentation.
♦ Loader. Connection to access target databases using external loaders. For more information about using external loaders, see “External Loading” on page 523.

With correct permissions, you can access these objects from all folders in the repository and use them in any session.

Connection Object Permissions
You can configure and manage permissions within each connection object. The Workflow Manager assigns Owner permissions to the user who creates the connection. The Workflow Manager grants Owner Group permissions to the first group in the Group Memberships list of the owner. The Workflow Manager automatically assigns default permissions for connection objects to the object owner, owner’s group, and all other users if you enable enhanced security. For more information about enhanced security, see “Enabling Enhanced Security” on page 44. You can specify read, write, and execute permissions for each user and group in the list. You can perform the following types of tasks with different connection object permissions, in combination with user privileges and folder permissions:

♦ Read. View the connection object in the Workflow Manager and Repository Manager. When you have read permission, you can perform tasks in which you view, copy, or edit repository objects associated with the connection object.
♦ Write. Edit the connection object.
♦ Execute. Run sessions that use the connection object.

For information on tasks you can perform with user privileges, folder permissions, and connection object permissions, see “Repository Security” in the Repository Guide. To manage connection permissions, you must have Super User privileges or be the owner of the connection. If you do not have the privilege to manage connection permissions, the Permissions dialog box is read-only. You can change the owner of the object, add or remove users and groups in the permissions list, and change the permissions for each user or group.

To view or delete a connection, you must have at least read permission for the connection. To edit a connection, you must have read and write permissions for the connection. You add permissions from the Connection Browser dialog box.
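The permission requirements in this section can be expressed as a table of required permissions per action. The following Python sketch models only the connection object permissions; user privileges and folder permissions, which also apply, are not modeled. The action labels and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

# Connection object permissions each action needs, per this section.
REQUIRED_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "view": frozenset({"read"}),
    "delete": frozenset({"read"}),
    "edit": frozenset({"read", "write"}),
    "run session": frozenset({"execute"}),
}


def has_connection_permission(action: str, granted: Iterable[str]) -> bool:
    """True if the granted permissions cover what the action requires."""
    return REQUIRED_PERMISSIONS[action] <= frozenset(granted)


def can_manage_permissions(user: str, owner: str, is_super_user: bool) -> bool:
    """Only a Super User or the connection owner can change connection permissions."""
    return is_super_user or user == owner
```

For example, the Owner Group defaults from Table 2-4 (Read/Execute) allow a user to view the connection and run sessions with it, but not to edit it.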
To configure permissions for connection objects:
1. Open the Connection Browser dialog box for the connection object. For example, choose Connections-Relational to open the Connection Browser dialog box for a relational database connection.
2. Select the connection object you want to configure in the Connection Browser dialog box.
3. Click Permissions to open the Permissions dialog box.
   Configure permissions for connection objects.
4. Select the owner and group for the connection object.
5. Add the users or groups you want to assign permissions for the connection, and click OK.
Setting Up a Relational Database Connection
Before the PowerCenter Server can access a source or target database in a session, you must configure the database connections in the Workflow Manager. When you create or modify a session that reads from or writes to a relational database, you can select only configured source and target databases. Database connections are saved in the repository. When you create a connection, you must have the following information available:
♦ Database name. Name for the connection.
♦ Database type. Type of the source or target database.
♦ Database username. Name of a user who has the appropriate database permissions to read from and write to the database.
♦ Password. Database password (7-bit ASCII only).
♦ Connect string. Connect string used to communicate with the database.
♦ Database code page. Code page associated with the database.

Some database drivers, such as ISG Navigator, do not allow user names and passwords. Since the Workflow Manager requires a database user name and password, PowerCenter provides two reserved words to register databases that do not allow user names and passwords:
♦ PmNullUser
♦ PmNullPasswd

Use the PmNullUser user name if you are using Oracle OS Authentication. Oracle OS Authentication allows you to log on to an Oracle database if you have a logon to the operating system. You do not need to know a database user name and password. PowerCenter uses Oracle OS Authentication when the connection user name is PmNullUser and the connection is for an Oracle database. You can change connection information at any time. If you edit a Workflow Manager connection used by a workflow, the PowerCenter Server uses the updated connection information the next time the workflow runs. You might use this functionality when moving from test to production.
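The connection requirements above, together with the naming and connect-string rules in Table 2-9, can be gathered into one pre-check. The following Python sketch is hypothetical and not part of PowerCenter. It assumes that "special characters" means anything other than letters, digits, and underscores, and that the reserved words are matched exactly as spelled.

```python
import re
from dataclasses import dataclass
from typing import List

NULL_USER = "PmNullUser"        # reserved user name
NULL_PASSWORD = "PmNullPasswd"  # reserved password
# Assumption: "special characters" means anything except letters, digits, and "_".
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")
# Table 2-9: the connect string is optional only for these database types.
CONNECT_STRING_OPTIONAL = {"microsoft sql server", "sybase"}


@dataclass
class RelationalConnection:
    name: str
    db_type: str
    user_name: str
    password: str
    connect_string: str = ""


def validation_errors(conn: RelationalConnection) -> List[str]:
    """Collect the rule violations for one connection definition (illustrative)."""
    errors: List[str] = []
    if not VALID_NAME.fullmatch(conn.name):
        errors.append("name may contain only letters, digits, and underscores")
    if not conn.user_name or not conn.password:
        errors.append(f"user name and password are required (use {NULL_USER}/{NULL_PASSWORD})")
    if any(ord(ch) > 0x7F for ch in conn.password):
        errors.append("password must be 7-bit ASCII")
    if not conn.connect_string and conn.db_type.lower() not in CONNECT_STRING_OPTIONAL:
        errors.append(f"a connect string is required for {conn.db_type}")
    return errors


def uses_oracle_os_authentication(conn: RelationalConnection) -> bool:
    """Oracle OS Authentication applies when an Oracle connection uses PmNullUser."""
    return conn.db_type.lower() == "oracle" and conn.user_name == NULL_USER
```

For example, an Oracle connection named ora_src with user PmNullUser, password PmNullPasswd, and connect string oracle.world passes every check and uses Oracle OS Authentication.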
Tip: If you edit a database connection, all sessions using the named connection then use the updated connection.

To create a database connection, you must have one of the following privileges:
♦ Use Workflow Manager
♦ Super User

Database Connect Strings
When you create a database connection, specify a connect string for that connection. The PowerCenter Server uses connect strings to communicate with a database.

Table 2-7 lists the native connect string syntax for each supported database when you create or update connections:
Table 2-7. Native Connect String Syntax
Database | Connect String Syntax | Example
IBM DB2 | dbname | mydatabase
Informix | dbname@servername | mydatabase@informix
Microsoft SQL Server | servername@dbname | sqlserver@mydatabase
Oracle | dbname.world (same as TNSNAMES entry) | oracle.world
Sybase | servername@dbname | sambrown@mydatabase
Teradata* | ODBC_data_source_name or ODBC_data_source_name@db_name or ODBC_data_source_name@db_user_name | TeradataODBC or TeradataODBC@mydatabase or TeradataODBC@jsmith

*Use Teradata ODBC drivers to connect to source and target databases.
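The patterns in Table 2-7 are simple enough to generate from their parts. The following Python sketch builds a native connect string for each listed database type. The function name, its parameters, and the database-type labels are assumptions for illustration. For Oracle it expects the TNSNAMES alias as dbname, and for Teradata the optional qualifier is either a database name or a database user name.

```python
from typing import Optional


def native_connect_string(
    db_type: str,
    dbname: Optional[str] = None,
    servername: Optional[str] = None,
    odbc_dsn: Optional[str] = None,
    qualifier: Optional[str] = None,
) -> str:
    """Build a native connect string following the Table 2-7 patterns (illustrative)."""

    def need(value: Optional[str], field: str) -> str:
        if not value:
            raise ValueError(f"{db_type} connect strings need {field}")
        return value

    kind = db_type.strip().lower()
    if kind == "ibm db2":
        return need(dbname, "dbname")  # dbname
    if kind == "informix":
        return f"{need(dbname, 'dbname')}@{need(servername, 'servername')}"  # dbname@servername
    if kind in ("microsoft sql server", "sybase"):
        return f"{need(servername, 'servername')}@{need(dbname, 'dbname')}"  # servername@dbname
    if kind == "oracle":
        # dbname.world: the same alias as the TNSNAMES entry, for example oracle.world
        return need(dbname, "a TNSNAMES alias")
    if kind == "teradata":
        dsn = need(odbc_dsn, "an ODBC data source name")
        # ODBC_data_source_name, optionally followed by @db_name or @db_user_name
        return f"{dsn}@{qualifier}" if qualifier else dsn
    raise ValueError(f"unsupported database type: {db_type}")
```

Note that Table 2-9 marks the connect string as not required for Microsoft SQL Server and Sybase; this helper only formats a string when you decide to supply one.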

Database Connection Code Pages
When you create a database connection, select a code page for that connection. Code pages must be compatible for accurate data movement. If you configure the PowerCenter Server and PowerCenter Client for data code page validation, the PowerCenter Server enforces code page compatibility at session runtime. Use the following guidelines to determine code page compatibility:
♦ The target database code page must be a superset of the source database code page and the PowerCenter Server code page.
♦ The source database code page must be a subset of the target database code page and the PowerCenter Server code page.

For example, if the source database code page is 7-bit ASCII and the PowerCenter Server code page is Latin 1, the target database code page must be a superset of both, such as Latin 1, which is a superset of 7-bit ASCII. Table 2-8 summarizes code page compatibility between the source and target code pages when you configure the PowerCenter Client and PowerCenter Server for data code page validation:
Table 2-8. Source and Target Code Page Compatibility
Component Code Page | Code Page Compatibility
Source | Subset of target and PowerCenter Server.
Target | Superset of source and PowerCenter Server. The PowerCenter Server creates external loader data and control files using the target flat file code page.

When you change the code page in a database connection, you must choose one that is compatible with the previous code page. If the code pages are incompatible, the Workflow Manager invalidates all sessions using that database connection. If you configure the PowerCenter Client and PowerCenter Server for relaxed data code page validation, you can select any supported code page for source and target database connections. If you are familiar with your data and are confident that it will convert safely from one code page to another, you can run sessions with incompatible source and target data code pages. It is your responsibility to ensure your data will convert properly. For details, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.
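The two strict-validation guidelines are set relationships, so they can be illustrated with a toy model. In this Python sketch, a code page is represented by the set of Unicode code points it can encode. This representation is an assumption for illustration only; PowerCenter checks compatibility against its own code page definitions, and the names below are hypothetical.

```python
from typing import FrozenSet, List

# Toy model (assumption): a code page is the set of Unicode code points it can encode.
ASCII_7BIT: FrozenSet[int] = frozenset(range(0x00, 0x80))
LATIN_1: FrozenSet[int] = frozenset(range(0x00, 0x100))


def code_page_problems(
    source: FrozenSet[int], server: FrozenSet[int], target: FrozenSet[int]
) -> List[str]:
    """Check the two strict-validation guidelines for one source/target pair."""
    problems: List[str] = []
    if not target >= (source | server):
        problems.append("target code page is not a superset of the source and server code pages")
    if not source <= (target & server):
        problems.append("source code page is not a subset of the target and server code pages")
    return problems
```

With the example from this section (7-bit ASCII source, Latin 1 server, Latin 1 target), code_page_problems returns an empty list; swapping the source and target code pages violates both guidelines.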

Configuring Environment SQL
For relational databases, you may need to execute some SQL commands in the database environment when you connect to the database. For example, you might want to set isolation levels on the source and target systems to avoid deadlocks. You configure environment SQL in the database connection. You can use environment SQL for source, target, lookup, and stored procedure connections. If the SQL syntax is not valid, the PowerCenter Server does not connect to the database, and the session fails. The PowerCenter Server executes the SQL each time it connects to the database. For example, if you configure environment SQL in a target connection, and you configure three partitions for the pipeline, the PowerCenter Server executes the SQL three times, once for each connection to the target database.

Guidelines for Entering Environment SQL
Consider the following guidelines when creating the SQL statements:

♦ You can enter any SQL command that is valid in the database associated with the connection object. The PowerCenter Server does not allow nested comments, even though the database might.
♦ When you enter SQL in the SQL Editor, you type the SQL statements manually.
♦ Use a semicolon (;) to separate multiple statements.
♦ The PowerCenter Server ignores semicolons within single quotes, double quotes, or within /* ...*/.
♦ If you need to use a semicolon outside of quotes or comments, you can escape it with a backslash (\).
♦ You cannot use session or mapping variables in the environment SQL.
♦ You can configure the table owner name using sqlid in the environment SQL for a DB2 connection. However, the table owner name in the target instance overrides the SET sqlid statement in environment SQL. To use the table owner name specified in the SET sqlid statement, do not enter a name in the target name prefix.
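The semicolon rules in the guidelines above describe a small tokenizer. The following Python sketch splits environment SQL the way those rules describe. It is an illustration, not the PowerCenter Server's parser. It assumes that an escaping backslash is removed and the semicolon kept as literal text. Because nested comments are not allowed, the first */ always ends a comment. Doubled single quotes inside a string work naturally, because the quote state toggles twice.

```python
from typing import List


def split_environment_sql(text: str) -> List[str]:
    """Split environment SQL into statements on unquoted, uncommented semicolons."""
    statements: List[str] = []
    current: List[str] = []
    quote = ""          # active quote character, or "" when outside quotes
    in_comment = False  # inside /* ... */ (nesting is not supported)
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if in_comment:
            current.append(ch)
            if ch == "*" and nxt == "/":
                current.append(nxt)
                i += 1
                in_comment = False
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = ""
        elif ch == "/" and nxt == "*":
            current.append("/*")
            i += 1
            in_comment = True
        elif ch in ("'", '"'):
            current.append(ch)
            quote = ch
        elif ch == "\\" and nxt == ";":
            current.append(";")  # escaped semicolon kept as literal text
            i += 1
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Because the PowerCenter Server runs the environment SQL once for each connection it opens, every statement in the returned list would run once per partition that connects to the database.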

Configuring a Relational Database Connection
Use the following procedure to configure a relational database connection.
To create a relational database connection:
1. In the Workflow Manager, connect to a repository.
2. Choose Connections-Relational.
A dialog box appears, listing all the registered source and target database connections.
3. Select the type of database connection you want to create.
4. Click New.


Chapter 2: Configuring the Workflow Manager

The Connection Object Definition dialog box appears.

5. For relational database connections, enter the connection information listed in Table 2-9:

Table 2-9. Relational Database Connection Information
Database Connection Option | Required/Optional | Description
Name | Required | Connection name used by the Workflow Manager. Connection name cannot contain spaces or other special characters, except for the underscore.
Type | Required | Type of database.
User Name | Required | Database user name with the appropriate read and write database permissions to access the database. If you are using Oracle OS Authentication, or you are using databases such as ISG Navigator that do not allow user names, enter PmNullUser. For Teradata connections, this overrides the default database user name in the ODBC entry.
Password | Required | Password for the database user name. For Oracle OS Authentication, or for databases such as ISG Navigator that do not allow passwords, enter PmNullPassword. For Teradata connections, this overrides the database password in the ODBC entry. Passwords must be in 7-bit ASCII only.
Connect String | Required for all databases, except Microsoft SQL Server and Sybase | Connect string used to communicate with the database. For syntax, see “Database Connect Strings” on page 53.
Code Page | Required | Specifies the code page the PowerCenter Server uses to read from a source database or write to a target database or file.

6. For each type of relational database connection, enter the attributes listed in Table 2-10:

Table 2-10. Relational Database Connection Attributes
Attribute Name | Relational Database Type | Description
Rollback Segment | Oracle | The name of the rollback segment. A rollback segment records database transactions in the event that you want to undo the transaction.
Enable Parallel Mode | Oracle | Enables parallel processing when loading data into a table in bulk mode.
Environment SQL | All relational databases | Enter SQL commands to set the database environment when you connect to the database.
Database Name | Sybase, Microsoft SQL Server, and Teradata | The name of the database. For Teradata connections, this overrides the default database name in the ODBC entry. If you do not enter a database name here for a Teradata connection, the PowerCenter Server uses the default database name in the ODBC entry.
Data Source Name | Teradata | The name of the Teradata ODBC data source.
Server Name | Sybase and Microsoft SQL Server | Database server name. Used to configure workflows.
Packet Size | Sybase and Microsoft SQL Server | Used to optimize the ODBC connection to Sybase and Microsoft SQL Server.
Domain Name | Microsoft SQL Server | The name of the domain. Used for Microsoft SQL Server on Windows.
Use Trusted Connection | Microsoft SQL Server | If selected, the PowerCenter Server uses Windows authentication to access the Microsoft SQL Server database. The user name that starts the PowerCenter Server must be a valid Windows user with access to the Microsoft SQL Server database.

7. Click OK.
The new database connection appears in the Connection Browser list.
8. To add more database connections, repeat steps 3-7.
9. Click OK to save all changes.
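The field rules in Table 2-9 can be checked before you click OK. The following Python sketch is illustrative only and is not part of PowerCenter. The function name `check_connection_fields` is hypothetical, and the sketch assumes that "special characters" means anything other than ASCII letters, digits, and the underscore.

```python
import re
from typing import List

# Assumption: only ASCII letters, digits, and the underscore are allowed.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")

# Placeholder credentials from Table 2-9 for Oracle OS Authentication and for
# databases such as ISG Navigator that do not allow user names or passwords.
NULL_USER = "PmNullUser"
NULL_PASSWORD = "PmNullPassword"


def check_connection_fields(name: str, user: str, password: str) -> List[str]:
    """Return problems with the Table 2-9 fields (an empty list if none)."""
    problems: List[str] = []
    if not _VALID_NAME.fullmatch(name):
        problems.append("name: only letters, digits, and underscores allowed")
    if not user:
        problems.append(f"user name: required (use {NULL_USER} if not applicable)")
    if not password:
        problems.append(f"password: required (use {NULL_PASSWORD} if not applicable)")
    elif any(ord(c) > 127 for c in password):
        # Passwords must be 7-bit ASCII only.
        problems.append("password: must be 7-bit ASCII")
    return problems
```

For example, `check_connection_fields("Dev Target", "etl_user", "secret")` reports the space in the name, while `Dev_Target` passes.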

Deleting Connection Objects
When you delete relational, queue, FTP, Application, and external loader connections, the Workflow Manager marks all sessions that use these connections invalid. To make the sessions valid, you must edit them and replace the missing connections.

Copying a Relational Database Connection
After you set up a relational database connection, you can make a copy of it by clicking the Copy As button. When you copy a relational database connection, you can choose the relational database type of the copy. The Workflow Manager retains the connection properties that apply to the relational database type you select. If a required connection property is missing, the copy of the connection is invalid, and you must edit the connection properties manually to make it valid. The Workflow Manager appends an underscore and the first three letters of the relational database type to the name of the new database connection. For example, you make a copy of the Microsoft SQL Server database connection called Dev_Target and choose Oracle as the type of the new database connection. The Workflow Manager names the new database connection Dev_Target_Ora.
To copy a relational database connection:
1. Choose Connections-Relational.
The Relational Connection Browser appears.
2. Choose the relational connection you want to copy.
Tip: Hold the Shift key to select more than one connection to copy.
3. Click Copy As.
The Select Subtype dialog box appears.
4. Select a relational database type for the copy of the connection.
5. Click OK.
The Workflow Manager retains connection properties that apply to the relational database type. If a required connection property does not exist, the Workflow Manager displays a warning message.
6. Click OK to close the warning dialog box.
The copy of the connection appears in the Relational Connection Browser.
7. If the copied connection is invalid, click the Edit button to enter required connection properties.
8. Click Close to close the Relational Connection Browser dialog box.
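The Copy As behavior described above can be modeled in a few lines. This Python sketch is an assumption-based illustration, not the Workflow Manager's implementation. The function names `applies_to` and `copy_as` are hypothetical, the applicability data is taken from Table 2-10, and the only required-property check it performs is the Table 2-9 rule that a connect string is required for all databases except Microsoft SQL Server and Sybase.

```python
from typing import Dict, List, Set, Tuple

ALL_TYPES = "*"  # stands for "All relational databases" in Table 2-10

# Table 2-10: the database types each attribute applies to. Options that are
# not listed (User Name, Password, Connect String, Code Page from Table 2-9)
# apply to every relational type and are always kept.
ATTRIBUTE_TYPES: Dict[str, Set[str]] = {
    "Rollback Segment": {"Oracle"},
    "Enable Parallel Mode": {"Oracle"},
    "Environment SQL": {ALL_TYPES},
    "Database Name": {"Sybase", "Microsoft SQL Server", "Teradata"},
    "Data Source Name": {"Teradata"},
    "Server Name": {"Sybase", "Microsoft SQL Server"},
    "Packet Size": {"Sybase", "Microsoft SQL Server"},
    "Domain Name": {"Microsoft SQL Server"},
    "Use Trusted Connection": {"Microsoft SQL Server"},
}

# Table 2-9: a connect string is required for all databases except these.
NO_CONNECT_STRING: Set[str] = {"Microsoft SQL Server", "Sybase"}


def applies_to(option: str, db_type: str) -> bool:
    types = ATTRIBUTE_TYPES.get(option)
    return types is None or ALL_TYPES in types or db_type in types


def copy_as(name: str, props: Dict[str, str],
            new_type: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Return the copy's name, its retained properties, and any warnings."""
    # Append an underscore and the first three letters of the new type.
    new_name = f"{name}_{new_type[:3]}"
    kept = {k: v for k, v in props.items() if applies_to(k, new_type)}
    warnings: List[str] = []
    if new_type not in NO_CONNECT_STRING and not kept.get("Connect String"):
        warnings.append("Connect String is required; edit the copy")
    return new_name, kept, warnings
```

Copying a Microsoft SQL Server connection named Dev_Target as Oracle yields Dev_Target_Ora, drops Server Name and Database Name, and warns about the missing connect string, which corresponds to the invalid copy you fix with the Edit button.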

Replacing a Relational Database Connection
You can replace a relational database connection with another relational database connection. For example, you might have several sessions that you want to write to another target database. Instead of editing the properties for each session, you can replace the relational database connection for all sessions in the repository that use the connection. When you replace database connections, the Workflow Manager replaces the relational database connections in the following locations for all sessions using the connection:
♦ Source connection
♦ Target connection
♦ Connection Information property in Lookup and Stored Procedure transformations
♦ $Source Connection Value session property
♦ $Target Connection Value session property

If the repository contains both a relational connection and an application connection with the same name, the Workflow Manager replaces the relational connection only if every location in the repository specifies the connection type as relational. For example, you have a relational connection and an application connection, each called ITEMS. In one session, you entered ITEMS for a source connection instead of Relational:ITEMS. When you replace the relational connection ITEMS with another relational connection, the Workflow Manager does not replace the relational connection anywhere in the repository, because it cannot determine the connection type of the source connection entered as ITEMS.

The PowerCenter Server uses the updated connection information the next time the workflow runs. To replace connections in the Workflow Manager, you must have the Super User privilege, and you must close all folders before you replace a relational database connection.
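The replacement rule for same-named connections can be sketched as follows. This is a hypothetical Python model, not a Workflow Manager API: the function `replace_relational` and the string form of connection references ("Relational:ITEMS" versus a bare "ITEMS") are assumptions based on the ITEMS example above.

```python
from typing import Iterable, List, Set


def replace_relational(refs: Iterable[str], old: str, new: str,
                       application_names: Set[str]) -> List[str]:
    """Sketch of the Connections-Replace rule for one relational connection.

    refs: connection values as entered in sessions, either qualified
    ("Relational:ITEMS", "Application:ITEMS") or bare ("ITEMS").
    """
    refs = list(refs)
    # If an application connection shares the name, a bare reference is
    # ambiguous, so nothing is replaced anywhere in the repository.
    if old in application_names and old in refs:
        return refs
    result: List[str] = []
    for ref in refs:
        if ref == f"Relational:{old}":
            result.append(f"Relational:{new}")
        elif ref == old:
            result.append(new)  # bare reference with no name collision
        else:
            result.append(ref)  # other connections are left unchanged
    return result
```

With references `["Relational:ITEMS", "ITEMS", "Application:ITEMS"]` and an application connection named ITEMS, the function returns the references unchanged, which mirrors the behavior described above.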
To replace a relational database connection:
1. Close all folders in the repository.
2. Choose Connections-Replace.
The Replace Connections dialog box appears.
3. Click the Add button to replace a connection.
4. In the From list, choose a relational database connection you want to replace.
5. In the To list, choose the replacement relational database connection.
6. Click Replace.
All sessions in the repository that use the From connection now use the connection you choose in the To list.

Chapter 3

Using the Workflow Manager
This chapter covers the following topics:
♦ Overview
♦ Navigating the Workspace
♦ Working with Repository Objects
♦ Checking Out and In Versioned Repository Objects
♦ Searching For Versioned Objects
♦ Copying Repository Objects
♦ Comparing Repository Objects
♦ Working with Metadata Extensions

Overview
In the Workflow Manager, you define a set of instructions called a workflow to execute mappings you build in the Designer. Generally, a workflow contains a session and any other task you may want to perform when you execute a session. Tasks can include a session, email notification, or scheduling information. You connect each task with links in the workflow. You can also create a worklet in the Workflow Manager. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. You can execute a batch of worklets inside a workflow. After you create a workflow, you run the workflow in the Workflow Manager and monitor it in the Workflow Monitor. For details on the Workflow Monitor, see “Monitoring Workflows” on page 401.

Workflow Manager Tools
To create a workflow, you first create tasks such as a session, which contains the mapping you build in the Designer. You then connect tasks with conditional links to specify the order of execution for the tasks you created. The Workflow Manager consists of three tools to help you develop a workflow:
♦ Task Developer. Use the Task Developer to create tasks you want to execute in the workflow.
♦ Workflow Designer. Use the Workflow Designer to create a workflow by connecting tasks with links. You can also create tasks in the Workflow Designer as you develop the workflow.
♦ Worklet Designer. Use the Worklet Designer to create a worklet.

Figure 3-1 shows what a workflow might look like if you want to run a session, perform a shell command after the session completes, and then stop the workflow:
Figure 3-1. Sample Workflow

Workflow Tasks
You can create the following types of tasks in the Workflow Manager:
♦ Assignment. Assigns a value to a workflow variable. For details, see “Working with the Assignment Task” on page 140.
♦ Command. Specifies a shell command to run during the workflow. For details, see “Using Workflow Variables” on page 103.
♦ Control. Stops or aborts the workflow. For details on the Control task, see “Stopping or Aborting the Workflow” on page 129.
♦ Decision. Specifies a condition to evaluate. For details, see “Working with the Decision Task” on page 149.
♦ Email. Sends email during the workflow. For details on the Email task, see “Sending Email” on page 319.
♦ Event-Raise. Notifies the Event-Wait task that an event has occurred. For details, see “Working with Event Tasks” on page 153.
♦ Event-Wait. Waits for an event to occur before executing the next task. For details, see “Working with Event Tasks” on page 153.
♦ Session. Runs a mapping you create in the Designer. For details on the Session task, see “Working with Sessions” on page 173.
♦ Timer. Waits for a timed event to trigger. For details, see “Scheduling a Workflow” on page 112.

Workflow Manager Windows
The Workflow Manager displays the following windows to help you create and organize workflows:
♦ Navigator. Allows you to connect to and work in multiple repositories and folders. In the Navigator, the Workflow Manager displays a red icon over invalid objects.
♦ Workspace. Allows you to create, edit, and view tasks, workflows, and worklets.
♦ Output. Contains tabs to display different types of output messages. The Output window contains the following tabs:
   − Save. Displays messages when you save a workflow, worklet, or task. The Save tab displays a validation summary when you save a workflow or a worklet.
   − Fetch Log. Displays messages when the Workflow Manager fetches objects from the repository.
   − Validate. Displays messages when you validate a workflow, worklet, or task.
   − Copy. Displays messages when you copy repository objects.
   − Server. Displays messages from the PowerCenter Server.
   − Notifications. Displays messages from the Repository Server.
♦ Overview. An optional window that allows you to easily view large workflows in the workspace. It outlines the visible area in the workspace and highlights selected objects in color. Choose View-Overview Window to display this window.

You can view a list of open windows and switch from one window to another in the Workflow Manager. To view the list of open windows, choose Window-Windows. The Workflow Manager also displays a status bar that shows the status of the operation you perform.

Figure 3-2 shows the Workflow Manager windows:
Figure 3-2. Workflow Manager Windows (callouts: Navigator, Workspace, Overview, Output, Status Bar)

Navigating the Workspace
The Workflow Manager allows you to perform the following operations to navigate the workspace:
♦ Customize windows.
♦ Customize toolbars.
♦ Search for tasks, links, events, and variables.
♦ Arrange objects in the workspace.
♦ Zoom and pan the workspace.

Customizing Workflow Manager Windows
You can customize the following options for the Workflow Manager windows:
♦ Display a window. From the menu, choose View. Then select the window you want to open.
♦ Close a window. Click the small x in the upper right corner of the window.
♦ Dock or undock a window. Double-click the title bar, or drag the title bar toward or away from the workspace.

Using Toolbars
The Workflow Manager can display the following toolbars to help you select tools and perform operations quickly:
♦ Standard. Contains buttons to connect to and disconnect from repositories and folders, toggle windows, zoom in and out, pan the workspace, and find objects.
♦ Connections. Contains buttons to open connection browsers and to assign servers.
♦ Repository. Contains buttons to connect to, disconnect from, and add repositories, open folders, close tools, save changes to repositories, and print the workspace.
♦ View. Contains buttons to customize toolbars, toggle the status bar and windows, toggle full-screen view, create a new workbook, and view the properties of objects.
♦ Layout. Contains buttons to arrange and restore objects in the workspace, find objects, zoom in and out, and pan the workspace.
♦ Tasks. Contains buttons to create tasks.
♦ Workflow. Contains buttons to edit workflow properties.
♦ Run. Contains buttons to schedule the workflow, start the workflow, or start a task.

You can perform the following operations with toolbars:
♦ Display or hide a toolbar.
♦ Create a new toolbar.
♦ Add or remove buttons.

For details on how to perform these toolbar operations, see “Using the Designer” in the Designer Guide.

Searching for Items
The Workflow Manager includes search features to help you find tasks, links, variables, and events in the workspace as well as text in the Output window. You can search for items in any Workflow Manager tool or Output window. There are two ways to search for items in the workspace:
♦ Find in Workspace. Searches multiple items at once and returns a list of all task names, link conditions, event names, or variable names that contain the search string.
♦ Find Next. Searches through items one at a time and highlights the first task, link, event, variable, or text string that contains the search string. If you repeat the search, the Workflow Manager highlights the next item that contains the search string.

To find a task, link, event, or variable in the workspace:
1. In any Workflow Manager tool, click the Find in Workspace toolbar button or choose Edit-Find in Workspace.
The Find in Workspace dialog box opens.
2. Choose whether you want to search for tasks, links, variables, or events.
3. Enter a search string, or select a string from the list.
The Workflow Manager saves the last 10 search strings in the list.
4. Specify whether or not to match whole words and whether or not to perform a case-sensitive search.
5. Click Find Now.
The Workflow Manager lists task names, link conditions, event names, or variable names that match the search string at the bottom of the dialog box.
6. Click Close.

To find a single object:
1. To search for a task, link, event, or variable, open the appropriate Workflow Manager tool and click a task, link, or event. To search for text in the Output window, click the appropriate tab in the Output window.
2. Enter a search string in the Find field on the Standard toolbar.
The search is not case-sensitive.
3. Choose Edit-Find Next, click the Find Next button on the toolbar, or press Enter or F3 to search for the string.
The Workflow Manager highlights the first task name, link condition, event name, or variable name that contains the search string, or the first string in the Output window that matches the search string.
4. To search for the next item, press Enter or F3 again.
The Workflow Manager alerts you when you have searched through all items in the workspace or Output window before it highlights the same objects a second time.
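The difference between the Find in Workspace options and Find Next can be sketched in Python. The functions `find_in_workspace` and `find_next` are hypothetical. The sketch assumes that "match whole words" behaves like regular-expression word boundaries, so an underscore counts as part of a word, which may not match the Workflow Manager exactly.

```python
import re
from typing import Iterable, List


def find_in_workspace(names: Iterable[str], search: str,
                      whole_word: bool = False,
                      case_sensitive: bool = False) -> List[str]:
    """Return every name that contains the search string."""
    pattern = re.escape(search)
    if whole_word:
        # Assumption: whole words are delimited by regex word boundaries.
        pattern = rf"\b{pattern}\b"
    regex = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    return [name for name in names if regex.search(name)]


def find_next(items: List[str], search: str, start: int = 0) -> int:
    """Return the index of the next item containing the search string,
    ignoring case as Find Next does, or -1 once every item has been
    searched (the point at which the Workflow Manager alerts you)."""
    needle = search.lower()
    for index in range(start, len(items)):
        if needle in items[index].lower():
            return index
    return -1
```

With the names `s_LoadOrders`, `cmd_archive`, `Load orders`, and `wf_load`, a plain search for `load` matches three names, while a whole-word search matches only `Load orders` under this assumption.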

Arranging Objects in the Workspace
The Workflow Manager can arrange objects in the workspace horizontally or vertically. In the Task Developer, you can also arrange tasks evenly in the workspace by choosing Tile. To arrange objects in the workspace, choose Layout-Arrange and then Horizontal, Vertical, or Tile.

Zooming the Workspace
You can zoom in and out as well as pan the workspace to adjust the view. Use the following toolbar or Layout menu options to set zoom levels:
♦ Zoom Center In/Out by 10%. Increases or decreases the magnification by 10% increments while maintaining the center of the view.
♦ Zoom Point In/Out by 10%. Uses a point you select as the center point and increases or decreases the magnification by 10% increments.
♦ Zoom Rectangle. Increases the current magnification of a rectangular area you select. Degree of magnification depends upon the size of the area you select, workspace size, and current magnification.
♦ Zoom Normal. Sets the zoom level to 100%.
♦ Scale to Fit. Scales all workspace objects to fit the workspace.
♦ Zoom Percent. Sets the zoom level to the percent you choose while maintaining the center of the view.

To maximize the size of the workspace window, choose View-Full Screen. To go back to normal view, click the Close Full Screen button or press Esc. To pan the workspace, choose Layout-Pan or click the Pan button on the toolbar. Drag the focus of the workspace window and release the mouse button when it is in the appropriate position. Double-click the workspace to stop panning.

Working with Repository Objects
The Workflow Manager allows you to perform the following general operations with repository objects:
♦ View properties for each object.
♦ Enter descriptions for each object.
♦ Rename an object.

To edit any repository object, you must first add a repository in the Navigator so you can access the repository object. To add a repository in the Navigator, choose Repository-Add or click the Add Repository button on the Repository toolbar. Enter the repository name and user name and click OK.

Viewing Object Properties
To view properties of a repository object, select the object in the Navigator and choose View-Properties, or right-click the object and choose Properties. You can view properties of a folder, task, worklet, or workflow. For folders, the Workflow Manager displays the folder name and whether the folder is shared. Object properties are read-only. You can also view dependencies for repository objects. For more information about viewing object dependencies, see the Repository Guide.

Entering Descriptions for Repository Objects
When you edit an object in the Workflow Manager, you can enter descriptions and comments for that object. The maximum number of characters you can enter is 2,000 bytes/K, where K is the maximum number of bytes a character contains in the selected repository code page. For example, if the repository code page is a Japanese code page where each character can contain up to two bytes (K=2), each description and comment field allows you to enter up to 1,000 characters.
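The description limit is simple arithmetic on K. This Python sketch is illustrative only. The function names are hypothetical, and rounding down when 2,000 is not evenly divisible by K is an assumption.

```python
def max_description_chars(bytes_per_char: int, limit_bytes: int = 2000) -> int:
    """Characters allowed in a description or comment field: 2,000 bytes / K.

    bytes_per_char is K, the maximum number of bytes one character can take
    in the repository code page. Rounding down is an assumption.
    """
    if bytes_per_char < 1:
        raise ValueError("K must be at least 1")
    return limit_bytes // bytes_per_char


def description_fits(text: str, bytes_per_char: int) -> bool:
    """Check a description's length against the 2,000 bytes / K limit."""
    return len(text) <= max_description_chars(bytes_per_char)
```

For the Japanese code page example (K=2), `max_description_chars(2)` gives 1,000 characters. A single-byte code page (K=1) allows the full 2,000.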

Renaming Repository Objects
You can rename repository objects by clicking the Rename button in the Edit Tasks dialog box or the Edit Workflow dialog box. You can also rename repository objects by clicking the object name in the workspace and typing in the new name.


Checking Out and In Versioned Repository Objects
When you work with versioned objects, you check out an object when you want to change it, and check it in when you want to commit your changes to the repository. Checking in new objects adds a new version to the object history. For more information, see “Working with Versioned Objects” in the Repository Guide.

Checking Out Objects
When you open an object in the workspace, the repository checks out the object and locks the object for your use. No other user can check out the object. If another user has checked out the object, you can open the object as read-only. You can view objects you and other users have checked out. You might want to view checkouts to see if an object is available for you to work with, or if you need to check in all of the objects you have worked with. For more information on viewing object checkouts, see “Working with Versioned Objects” in the Repository Guide.

Checking In Objects
You commit changes to the repository by checking in objects. When you check in an object, the repository creates a new version of the object and assigns it a version number. The repository increments the version number by one each time it creates a new version. You can check in an object from the Workflow Manager workspace. To do this, select the object and choose Versioning-Check in. You can check in an object when you review the results of the following tasks:
♦ View object history. You can check in an object from the View History window when you view the history of an object.
♦ View checkouts. You can check in an object from the View Checkouts window when you search for checked out objects.
♦ View query results. You can check in an object from the Query Results window when you search for object dependencies or run an object query.

To check in an object, select the object or objects and choose Versioning-Check in. Enter text into the comment field in the Check In dialog box.

Figure 3-3 shows the Check In dialog box:
Figure 3-3. Check In Workflow Manager Objects (callout: Apply the check in comment to multiple objects.)

When you check in an object, the repository creates a new version of the object and increments the version number by one.

Searching For Versioned Objects
You can use an object query to search for versioned objects in the repository that meet specified conditions. When you run a query, the repository returns results based on those conditions. You may want to create an object query to perform the following tasks:

♦ Track repository objects during development. You can add Label, User, Last saved, or Comments parameters to queries to track objects during development. For more information about creating object queries, see “Grouping Versioned Objects” in the Repository Guide.
♦ Associate a query with a deployment group. When you create a dynamic deployment group, you can associate an object query with it. For more information about working with deployment groups, see “Copying Folders and Deployment Groups” in the Repository Guide.

To create an object query, choose Versioning-Queries to open the Query Browser. Figure 3-4 shows the Query Browser:
Figure 3-4. Query Browser (callouts: Edit a query, Delete a query, Create a query, Configure permissions, Run a query)

From the Query Browser, you can create, edit, and delete queries. You can also configure permissions for each query from the Query Browser. You can run any queries for which you have read permissions from the Query Browser. For information about working with object queries, see “Grouping Versioned Objects” in the Repository Guide.

Copying Repository Objects
You can copy repository objects (such as workflows, worklets, or tasks) within the same folder, to a different folder, or to a different repository. If you want to copy the object to another folder, you must open the destination folder before you copy the object into the folder.

The Workflow Manager provides a Copy Wizard that allows you to copy objects. When you copy a workflow or a worklet, the Copy Wizard copies all of the worklets, sessions, and tasks in the workflow. You must resolve all conflicts that occur. Conflicts occur when the Copy Wizard finds a workflow or worklet with the same name in the target folder, or when the server connection does not exist in the target repository. If a server connection does not exist, you can skip the conflict and choose a server connection after you copy the workflow. You cannot copy server connections. Conflicts may also occur when you copy Session tasks. For more details on the Copy Wizard, see “Copying Objects” in the Repository Guide.

You can configure display settings and functions of the Copy Wizard by choosing Tools-Options. For details, see “Configuring Miscellaneous Options” on page 43.

Note: The Workflow Manager provides an Import Wizard that allows you to import objects from an XML file. The Import Wizard provides the same options to resolve conflicts as the Copy Wizard. For details, see “Exporting and Importing Objects” in the Repository Guide.

Copying Sessions
When you copy a Session task, the Copy Wizard looks for the database connection and associated mapping in the destination folder. If the mapping or connection does not exist in the destination folder, you can select a new mapping or connection. If the destination folder does not contain any mapping, you must first copy a mapping to the destination folder in the Designer before you can copy the session. When you copy a session that has mapping variable values saved in the repository, the Workflow Manager either copies or retains the saved variable values.

Copying Workflow Segments
You can copy segments of workflows and worklets when you want to reuse a portion of workflow or worklet logic. A segment consists of one or more tasks, the links between the tasks, and any condition in the links. You can copy reusable and non-reusable objects when copying and pasting segments. You can copy segments of workflows or worklets into workflows and worklets within the same folder, within another folder, or within a folder in a different repository. You can also paste segments of workflows or worklets into an empty Workflow Designer or Worklet Designer workspace.


To copy a segment from a workflow or worklet:
1. Open the workflow or worklet.
2. Select a segment by highlighting each task you want to copy. You can select multiple reusable or non-reusable objects. You can also select segments by dragging the pointer in a rectangle around objects in the workspace.
3. Choose Edit-Copy or press Ctrl+C to copy the segment to the clipboard.
4. Open the workflow or worklet into which you want to paste the segment. You can also paste the segment into an empty Workflow Designer or Worklet Designer workspace.
5. Choose Edit-Paste or press Ctrl+V.
The Copy Wizard opens and notifies you if it finds copy conflicts.
Note: You can copy individual non-reusable tasks by selecting the individual task and following the instructions for copying and pasting segments.

Comparing Repository Objects
The Workflow Manager allows you to compare two repository objects of the same type to identify differences between the objects. For example, if you have two similar Email tasks in a folder, you can compare them to see which one contains the attributes you need. When you compare two objects, the Workflow Manager displays their attributes in detail. You can compare objects across folders and repositories. To do this, you must have both folders open. You can compare a reusable object with a non-reusable object. You can also compare two versions of the same object. For more information about versioned objects, see “Working with Versioned Objects” in the Repository Guide. To compare objects, you must have read permission on each folder that contains the objects you want to compare. You can compare the following types of objects:
♦ Tasks
♦ Sessions
♦ Worklets
♦ Workflows

You can also compare instances of the same type. For example, if the workflows you compare contain worklet instances with the same name, you can compare the instances to see if they differ. The Workflow Manager also allows you to compare the following instances and attributes:
♦ Instances of sessions and tasks in a workflow or worklet comparison. For example, when you compare workflows, you can compare task instances that have the same name.
♦ Instances of mappings and transformations in a session comparison. For example, when you compare sessions, you can compare mapping instances.
♦ The attributes of instances of the same type within a mapping comparison. For example, when you compare flat file sources, you can compare attributes, such as file type (delimited or fixed), delimiters, escape characters, and optional quotes.

You can compare schedulers and session configuration objects in the Repository Manager. You cannot compare objects of different types. For example, you cannot compare an Email task with a Session task. When you compare objects, the Workflow Manager displays the results in the Diff Tool window. The Diff Tool output contains different nodes for different types of objects. When you import Workflow Manager objects, you can compare object conflicts. For more information, see “Exporting and Importing Objects” in the Repository Guide.

Steps for Comparing Objects
Use the following procedure to compare objects.
To compare two objects: 1. 2. 3.

Open the folders that contain the objects you want to compare. Open the appropriate Workflow Manager tool. Choose Tasks-Compare, Worklets-Compare, or Workflow-Compare. A dialog box similar to the following one opens:

4. 5.

Click Browse to select an object. Click Compare.
Tip: You can also compare objects from the Navigator or workspace. In the Navigator,

select the objects, right-click and choose Compare Objects. In the workspace, select the objects, right-click and choose Compare Objects.

80

Chapter 3: Using the Workflow Manager

Figure 3-5 shows the result of comparing two objects:
Figure 3-5. Diff Tool Window Filter nodes that have same attribute values. Drill down to further compare objects.

Differences between objects are highlighted and the nodes are flagged. Differences between object properties are marked. Displays the properties of the node you select.

You can further compare differences between object properties by clicking the Compare Further icon or by right-clicking the differences.

6.

If you want to save the comparison as a text or HTML file, choose File-Save to File.

Working with Metadata Extensions
You can extend the metadata stored in the repository by associating information with individual repository objects. For example, you may wish to store your name with the worklets you create. If you create a session, you can store your telephone extension with that session. You associate information with repository objects using metadata extensions. Repository objects can contain both vendor-defined and user-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them. You can create, edit, delete, and view user-defined metadata extensions, as well as change their values. You can create metadata extensions for the following objects in the Workflow Manager:
♦ Sessions
♦ Workflows
♦ Worklets

You can create both reusable and non-reusable metadata extensions. You associate reusable metadata extensions with all repository objects of a certain type such as all sessions or all worklets. You associate non-reusable metadata extensions with a single repository object such as one workflow. For more information about metadata extensions, see “Metadata Extensions” in the Repository Guide. To create, edit, and delete user-defined metadata extensions in the Workflow Manager, you must have read and write permissions on the folder.

Creating a Metadata Extension
You can create user-defined, reusable and non-reusable metadata extensions for repository objects using the Workflow Manager. To create a metadata extension, you edit the object for which you want to create the metadata extension, and then add the metadata extension to the Metadata Extensions tab. If you need to create multiple reusable metadata extensions, it is easier to create them using the Repository Manager. For details, see “Metadata Extensions” in the Repository Guide.
To create a metadata extension:
1. Open the appropriate Workflow Manager tool.
2. Drag the appropriate object into the workspace.
3. Double-click the title bar of the object to edit it.


4. Click the Metadata Extensions tab.
This tab lists the existing user-defined and vendor-defined metadata extensions. User-defined metadata extensions appear in the User Defined Metadata Domain. If they exist, vendor-defined metadata extensions appear in their own domains.
5. Click the Add button. A new row appears in the User Defined Metadata Extension Domain.

6. Enter the information in Table 3-1:

Table 3-1. Metadata Extension Attributes in the Workflow Manager
Field | Required/Optional | Description
Extension Name | Required | Name of the metadata extension. Metadata extension names must be unique for each type of object in a domain. Metadata extension names cannot contain any special characters except underscores and cannot begin with numbers.
Datatype | Required | The datatype: numeric (integer), string, or boolean.
Precision | Required for string objects | The maximum length for string metadata extensions.
Value | Optional | An optional value. For a numeric metadata extension, the value must be an integer between -2,147,483,647 and 2,147,483,647. For a boolean metadata extension, choose true or false. For a string metadata extension, click the Open button in the Value field to enter a value of more than one line, up to 2,147,483,647 bytes.
Reusable | Required | Makes the metadata extension reusable or non-reusable. Check to apply the metadata extension to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable). Note: If you make a metadata extension reusable, you cannot change it back to non-reusable. The Workflow Manager makes the extension reusable as soon as you confirm the action.
UnOverride | Optional | Restores the default value of the metadata extension when you click Revert. This column appears only if the value of one of the metadata extensions was changed.
Description | Optional | Description of the metadata extension.

7. Click OK.
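The naming and value rules in Table 3-1 can be expressed as a few small checks. The following Python sketch is illustrative only: the function names are hypothetical, it assumes that "special characters" means anything other than ASCII letters, digits, and underscores (so spaces are rejected), it assumes case-sensitive name matching, and it does not reproduce how the Workflow Manager itself validates input.

```python
import re

# Assumption: "special characters" means anything other than ASCII letters,
# digits, and underscores, so spaces and hyphens are rejected too.
_EXTENSION_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Numeric range for metadata extension values as stated in Table 3-1.
NUMERIC_MIN = -2_147_483_647
NUMERIC_MAX = 2_147_483_647

def is_valid_extension_name(name: str) -> bool:
    """True if name uses only letters, digits, and underscores and does not start with a digit."""
    return _EXTENSION_NAME.fullmatch(name) is not None

def is_valid_numeric_value(value: int) -> bool:
    """True if value falls inside the documented integer range."""
    return NUMERIC_MIN <= value <= NUMERIC_MAX

def is_unique_name(name: str, object_type: str, existing: set) -> bool:
    """True if no extension with this name exists for this object type in the domain.
    existing holds (object_type, name) pairs; case-sensitive matching is an assumption."""
    return (object_type, name) not in existing

print(is_valid_extension_name("owner_phone"))  # True
print(is_valid_extension_name("1st_owner"))    # False
```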

Editing a Metadata Extension
You can edit user-defined, reusable, and non-reusable metadata extensions for repository objects using the Workflow Manager. To edit a metadata extension, you edit the repository object, and then make changes to the Metadata Extensions tab. What you can edit depends on whether the metadata extension is reusable or non-reusable. You can promote a non-reusable metadata extension to reusable, but you cannot change a reusable metadata extension to non-reusable.

Editing Reusable Metadata Extensions
If the metadata extension you want to edit is reusable and editable, you can change the value of the metadata extension, but not any of its properties. However, if the vendor or user who created the metadata extension did not make it editable, you cannot edit the metadata extension or its value. For details, see “Metadata Extensions” in the Repository Guide. To edit the value of a reusable metadata extension, click the Metadata Extensions tab and modify the Value field. To restore the default value for a metadata extension, click Revert in the UnOverride column.


Editing Non-Reusable Metadata Extensions
If the metadata extension you want to edit is non-reusable, you can change the value of the metadata extension as well as its properties. You can also promote the metadata extension to a reusable metadata extension. To edit a non-reusable metadata extension, click the Metadata Extensions tab. You can update the Datatype, Value, Precision, and Description fields. For a description of these fields, see Table 3-1 on page 83. If you wish to make the metadata extension reusable, check Reusable. If you make a metadata extension reusable, you cannot change it back to non-reusable. The Workflow Manager makes the extension reusable as soon as you confirm the action. To restore the default value for a metadata extension, click Revert in the UnOverride column.
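The editing rules above (value-only edits for reusable extensions, full edits for non-reusable ones, and one-way promotion to reusable) can be summarized as a small state model. The following Python sketch is a hypothetical model written for this guide's rules; the class, method, and exception names are invented for illustration and do not correspond to any PowerCenter API.

```python
from dataclasses import dataclass
from typing import Any, Optional

class ExtensionEditError(Exception):
    """Raised when an edit is not permitted by the rules described above."""

@dataclass
class MetadataExtension:
    # Hypothetical model of a user-defined metadata extension.
    name: str
    datatype: str                    # "numeric", "string", or "boolean"
    value: Any = None
    precision: Optional[int] = None  # used for string extensions
    description: str = ""
    reusable: bool = False
    editable: bool = True            # whether the creator allowed edits to a reusable extension

    def set_value(self, value: Any) -> None:
        # Reusable extensions: the value can change only if the extension is editable.
        if self.reusable and not self.editable:
            raise ExtensionEditError(f"{self.name} is not editable")
        self.value = value

    def set_property(self, field_name: str, new_value: Any) -> None:
        # Only non-reusable extensions allow changes to Datatype, Precision, and Description.
        if field_name not in ("datatype", "precision", "description"):
            raise ValueError(f"unknown property: {field_name}")
        if self.reusable:
            raise ExtensionEditError("properties of a reusable extension cannot change")
        setattr(self, field_name, new_value)

    def set_reusable(self, reusable: bool) -> None:
        # Promotion is one-way: a reusable extension never goes back to non-reusable.
        if self.reusable and not reusable:
            raise ExtensionEditError("a reusable extension cannot become non-reusable")
        self.reusable = reusable

ext = MetadataExtension("owner", "string", precision=40)
ext.set_property("description", "Developer who owns the object")
ext.set_reusable(True)  # allowed: non-reusable to reusable
```

After promotion, set_property and set_reusable(False) both raise ExtensionEditError, while set_value still succeeds because the extension was left editable.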

Deleting a Metadata Extension
You can delete metadata extensions for repository objects. You delete reusable metadata extensions using the Repository Manager. You can delete non-reusable metadata extensions using the Workflow Manager. To do this, edit the repository object, and then delete the metadata extension from the Metadata Extensions tab.


Keyboard Shortcuts
When editing a repository object or moving around the Workflow Manager, use the following keyboard shortcuts to complete operations quickly. Table 3-2 lists the Workflow Manager keyboard shortcuts for editing a repository object:

Table 3-2. Workflow Manager Keyboard Shortcuts
To | Press
Cancel editing in a cell. | Esc
Check and uncheck a check box. | Space Bar
Copy text from a cell onto the clipboard. | Ctrl+C
Cut text from a cell onto the clipboard. | Ctrl+X
Edit the text of a cell. | F2. Then move the cursor to the desired location.
Find all combination and list boxes. | Type the first letter on the list.
Find tables or fields in the workspace. | Ctrl+F
Move around cells in a dialog box. | Ctrl+directional arrows
Paste copied or cut text from the clipboard into a cell. | Ctrl+V
Select the text of a cell. | F2

Table 3-3 lists the Workflow Manager keyboard shortcuts for navigating in the workspace:

Table 3-3. Keyboard Shortcuts for Navigating the Workspace
To | Press
Create links. | Press Ctrl+F2 to select the first task you want to link. Press Tab to select the rest of the tasks you want to link. Press Ctrl+F2 again to link all the tasks you selected.
Edit task name in the workspace. | F2
Expand selected node and all its children. | SHIFT + * (use the asterisk on the numeric keypad)
Move across to select tasks in the workspace. | Tab
Select multiple tasks. | Ctrl+mouse click


Chapter 4

Working with Workflows
This chapter covers the following topics:
♦ Overview, 88
♦ Developing Workflows, 91
♦ Using the Workflow Wizard, 99
♦ Using Workflow Variables, 103
♦ Scheduling a Workflow, 112
♦ Validating a Workflow, 119
♦ Running the Workflow, 122
♦ Suspending the Workflow, 127
♦ Stopping or Aborting the Workflow, 129


Overview
A workflow is a set of instructions that tells the PowerCenter Server how to execute tasks such as sessions, email notifications, and shell commands. After you create tasks in the Task Developer and Workflow Designer, you connect the tasks with links to create a workflow. In the Workflow Designer, you can specify conditional links and use workflow variables to create branches in the workflow. The Workflow Manager also provides Event-Wait and Event-Raise tasks so you can control the sequence of task execution in the workflow. You can also create worklets and nest them inside the workflow. Every workflow contains a Start task, which represents the beginning of the workflow. Figure 4-1 shows a sample workflow:
Figure 4-1. Sample Workflow (labeled workflow tasks: Start task, Session task, Assignment task, and Command task, connected by links)

You can create workflows with branches to execute tasks concurrently.


Figure 4-2 shows a sample workflow with two branches:
Figure 4-2. Sample Workflow With Two Branches

After you create a workflow, select a PowerCenter Server to run the workflow. You can then start the workflow using the Workflow Manager, Workflow Monitor, or pmcmd. Use the Workflow Monitor to see the progress of a workflow during its run. The Workflow Monitor can also show the history of a workflow. For more information about the Workflow Monitor, see “Monitoring Workflows” on page 401.

Use the following guidelines when you develop a workflow:
1. Create a new workflow. Create a new workflow in the Workflow Designer. For details on creating a new workflow, see “Creating a New Workflow” on page 91.
2. Add tasks to the workflow. You might have already created tasks in the Task Developer, or you can add tasks to the workflow as you develop it in the Workflow Designer. For details on workflow tasks, see “Working with Tasks” on page 131.
3. Connect tasks with links. After you add tasks to the workflow, connect them with links to specify the order of execution in the workflow. For details on links, see “Working with Links” on page 92.
4. Specify conditions for each link. You can specify conditions on the links to create branches and dependencies. For details, see “Working with Links” on page 92.
5. Validate the workflow. Validate the workflow in the Workflow Designer to identify errors. For details on validation rules, see “Validating a Workflow” on page 119.
6. Save the workflow. When you save the workflow, the Workflow Manager validates the workflow and updates the repository.
7. Run the workflow. In the workflow properties, select a PowerCenter Server to run the workflow. Run the workflow from the Workflow Manager, Workflow Monitor, or pmcmd. You can monitor the workflow in the Workflow Monitor. For details on starting a workflow, see “Running the Workflow” on page 122.
For a complete list of workflow properties, see “Workflow Properties Reference” on page 721.

Workflow Privileges
You need one of the following privileges to create a workflow:
♦ Use Workflow Manager privilege with read and write folder permissions
♦ Super User privilege

You need one of the following privileges to run, schedule, and monitor the workflow:
♦ Workflow Operator privilege
♦ Super User privilege


Developing Workflows
The first step to develop a workflow is to create a new workflow in the Workflow Designer. A workflow must contain a Start task. The Start task represents the beginning of a workflow. When you create a workflow, the Workflow Designer creates a Start task and adds it to the workflow. You cannot delete the Start task. After you create a new workflow, the next step is to add tasks to the workflow. The Workflow Manager includes tasks such as the Session task, the Command task, and the Email task so you can design your workflow. Finally, you connect workflow tasks with links to specify the order of execution in the workflow. You can add conditions to links.

Creating a New Workflow
You must create a workflow before you can add tasks such as Session, Command, or Email tasks. When you add a session and the workspace in the Workflow Designer is empty, you can create a workflow automatically.

To create a workflow manually:
1. Open the Workflow Designer.
2. Choose Workflows-Create.
3. Enter a name for the new workflow.
4. Click OK. The Workflow Designer creates a Start task in the new workflow.

For information on using the Workflow Wizard, see “Using the Workflow Wizard” on page 99.


To create a workflow automatically:
1. Open the Workflow Designer. Close any open workflow.
2. Click the session button on the Tasks toolbar.
3. Click in the Workflow Designer workspace. The Mappings dialog box appears.
4. Select a mapping to associate with the session and click OK. The Create Workflow dialog box appears. The Workflow Designer names the workflow wf_MappingName by default. You can rename the workflow or change other workflow properties. For more information on workflow properties, see “Workflow Properties Reference” on page 721.
5. Click OK. The Workflow Designer creates a workflow for the session.

Adding Tasks to Workflows
After you create a new workflow, add the tasks you want to execute in the workflow. You may already have created tasks in the Task Developer, or you may want to create tasks in the Workflow Designer as you develop the workflow. If you have already created tasks in the Task Developer, add them to the workflow by dragging the tasks from the Navigator window to the Workflow Designer workspace. To create and add tasks as you develop the workflow, choose Tasks-Create in the Workflow Designer. You can also use the Tasks toolbar to create and add tasks: click the button on the Tasks toolbar for the task you want to create, and then click in the Workflow Designer workspace to create and add the task. Tasks you create in the Workflow Designer are non-reusable. Tasks you create in the Task Developer are reusable. For more information about reusable tasks, see “Reusable Workflow Tasks” on page 135.

Working with Links
Use links to connect each workflow task. You can specify conditions with links to create branches in the workflow. The Workflow Manager does not allow you to use links to create loops in the workflow. Each link in the workflow can execute only once. The workflow in Figure 4-3 is not a loop because each task runs at most once.


Figure 4-3 shows a valid workflow:
Figure 4-3. Valid Workflow

The Workflow Manager does not allow you to create a workflow that contains a loop, such as the loop shown in Figure 4-4. Figure 4-4 shows a loop where the three sessions may be run multiple times:
Figure 4-4. Example of a Loop
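One way to see why the workflow in Figure 4-4 is rejected is to treat tasks as nodes and links as directed edges: a workflow without loops is a directed acyclic graph. The following Python sketch detects a loop with Kahn's topological sort. It is a standard graph technique chosen for illustration, not the Workflow Manager's validation code, and the task names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Link = Tuple[str, str]

def contains_loop(links: List[Link]) -> bool:
    """Return True if the links form a loop (a directed cycle) among the tasks."""
    tasks = {task for link in links for task in link}
    incoming: Dict[str, int] = {task: 0 for task in tasks}
    successors: Dict[str, List[str]] = {task: [] for task in tasks}
    for src, dst in links:
        successors[src].append(dst)
        incoming[dst] += 1
    # Repeatedly remove tasks that have no remaining incoming links.
    ready = deque(task for task in tasks if incoming[task] == 0)
    removed = 0
    while ready:
        task = ready.popleft()
        removed += 1
        for nxt in successors[task]:
            incoming[nxt] -= 1
            if incoming[nxt] == 0:
                ready.append(nxt)
    # Any task left over sits on, or downstream of, a loop.
    return removed < len(tasks)

valid = [("Start", "s_a"), ("s_a", "s_b"), ("Start", "s_c")]
looped = [("Start", "s_a"), ("s_a", "s_b"), ("s_b", "s_c"), ("s_c", "s_a")]
print(contains_loop(valid))   # False
print(contains_loop(looped))  # True
```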

Use the following procedure to link tasks in the Workflow Designer or the Worklet Designer.
To link two tasks:
1. In the Tasks toolbar, click the link button.
2. In the workspace, click the first task you want to connect and drag it to the second task. A link appears between the two tasks.

If you have a number of tasks that you want to link concurrently, you may not wish to connect each link manually. To quickly link tasks concurrently, use the following procedure.
To link several tasks concurrently:
1. In the workspace, click the first task you want to connect.
2. Ctrl-click all other tasks you want to connect.
Note: Do not use Ctrl+A or Edit-Select to choose tasks.
3. Choose Tasks-Link concurrent. A link appears between the first task you selected and each task you added. The first task you selected links to each task concurrently.

If you have a number of tasks that you want to link sequentially, you may not wish to connect each link manually. To quickly link tasks sequentially, use the following procedure.
To link several tasks sequentially:
1. In the workspace, click the first task you want to connect.
2. Ctrl-click the next task you want to connect.
3. Continue to add tasks in the order you want them to run.
4. Choose Tasks-Link sequential. Links appear between each task and the next task you selected, in the order you selected them.
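The difference between the two commands is which pairs of selected tasks get linked. The following Python sketch models each command as a function from the selection order to a list of (from, to) links. The function names and task names are hypothetical; the sketch only illustrates the link patterns described above.

```python
from typing import List, Tuple

Link = Tuple[str, str]

def link_concurrent(selected: List[str]) -> List[Link]:
    """Link the first selected task to every other selected task (needs at least one task)."""
    first, *rest = selected
    return [(first, task) for task in rest]

def link_sequential(selected: List[str]) -> List[Link]:
    """Link each selected task to the task selected after it."""
    return list(zip(selected, selected[1:]))

tasks = ["s_extract", "s_transform", "s_load"]
print(link_concurrent(tasks))  # [('s_extract', 's_transform'), ('s_extract', 's_load')]
print(link_sequential(tasks))  # [('s_extract', 's_transform'), ('s_transform', 's_load')]
```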

Specifying Link Conditions
Once you create links between tasks, you can specify conditions for each link to determine the order of execution in the workflow. If you do not specify conditions for each link, the PowerCenter Server executes the next task in the workflow by default. You can use pre-defined or user-defined workflow variables in the link condition. If the link condition evaluates to True, the PowerCenter Server executes the next task in the workflow. If the link condition evaluates to False, the PowerCenter Server does not execute the next task in the workflow. You can view results of link evaluation during workflow runs in the workflow log file.

Example of Link Conditions
You can use link conditions to specify the order of execution in the workflow or to create branches in the workflow. For example, you may have two Session tasks in the workflow, s_STORES_CA and s_STORES_AZ. You want the PowerCenter Server to run the second Session task only if the first Session task has no target failed rows. To accomplish this, you can set the link condition between the two sessions so that s_STORES_AZ executes only if the number of failed target rows for s_STORES_CA is zero.
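The decision the PowerCenter Server makes for each link can be modeled in a few lines. In the following Python sketch, a link condition is a Python function standing in for the expression you would enter in the Expression Editor. The dictionary layout of task-specific variables and the function names are assumptions for illustration only.

```python
from typing import Callable, Dict, Optional

# Hypothetical snapshot of task-specific variables: task name -> variable -> value.
TaskVariables = Dict[str, Dict[str, object]]
LinkCondition = Callable[[TaskVariables], bool]

def should_run_next(condition: Optional[LinkCondition], variables: TaskVariables) -> bool:
    """Return True if the task at the end of the link should run."""
    if condition is None:
        # No condition on the link: the next task runs by default.
        return True
    return bool(condition(variables))

def no_failed_target_rows(variables: TaskVariables) -> bool:
    # Stand-in for the link condition between s_STORES_CA and s_STORES_AZ.
    return variables["s_STORES_CA"]["TgtFailedRows"] == 0

run_vars = {"s_STORES_CA": {"TgtFailedRows": 0}}
print(should_run_next(no_failed_target_rows, run_vars))  # True: s_STORES_AZ runs
```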


Figure 4-5 shows how to set the link condition using the target failed rows variable for s_STORES_CA:
Figure 4-5. Setting Link Condition

After you specify the link condition in the Expression Editor, the Workflow Manager validates the link condition and displays it next to the link in the workflow. Figure 4-6 shows the link condition displayed in the workspace:
Figure 4-6. Displaying Link Condition in the Workflow


To specify a condition for a link:
1. In the Workflow Designer workspace, double-click the link for which you want to specify a condition, or right-click the link and choose Edit. The Expression Editor appears.
2. In the Expression Editor, enter the link condition. The Expression Editor provides pre-defined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.
3. Validate the expression using the Validate button. The Workflow Manager displays error messages in the Output window.

Tip: Click and drag the end point of a link to move it from one task to another without losing the link condition.

Using the Expression Editor
The Workflow Manager provides an Expression Editor for any expressions in the workflow. You can enter expressions using the Expression Editor for the following:
♦ Link conditions
♦ Decision task
♦ Assignment task

Figure 4-7 shows the Expression Editor:
Figure 4-7. Expression Editor

The Expression Editor displays system variables and user-defined and pre-defined workflow variables, such as $Session.Status. For details on workflow variables, see “Using Workflow Variables” on page 103. The Expression Editor also displays a list of functions. PowerCenter uses a SQL-like language that contains many functions designed to handle common expressions. For example, you can use the ABS function to find the absolute value. For a complete list of functions, see the Transformation Language Reference.


Adding Comments
The Expression Editor also allows you to add comments using -- or // comment indicators. You can use comments to give descriptive information about the expression, or you can specify a valid URL to access business documentation about the expression. For examples on adding comments to expressions, see “The Transformation Language” in the Transformation Language Reference.

Validating Expressions
You can use the Validate button to validate an expression. If you do not validate an expression, the Workflow Manager validates it when you close the Expression Editor. You cannot run a workflow with invalid expressions. Expressions in link conditions and Decision task conditions must evaluate to a numerical value. Workflow variables used in expressions must exist in the workflow.

Expression Editor Display
The Expression Editor can display syntax expressions in different colors for better readability. If you have the latest Rich Edit control, riched20.dll, installed on your system, the Expression Editor displays expression functions in blue, comments in grey, and quoted strings in green. You can resize the Expression Editor. Expand the dialog box by dragging from the borders. The Workflow Manager saves the new size for the dialog box as a client setting.

Deleting a Workflow
You may decide to delete a workflow that you no longer use. When you delete a workflow, you delete all non-reusable tasks and reusable task instances associated with the workflow. Reusable tasks used in the workflow remain in the folder when you delete the workflow. If you delete a workflow that is running, the PowerCenter Server aborts the workflow. If you delete a workflow that is scheduled to run, the PowerCenter Server removes the workflow from the schedule. You can delete a workflow in the Navigator window, or you can delete the workflow currently displayed in the Workflow Designer workspace.
♦ To delete a workflow from the Navigator window, open the folder, select the workflow, and press the Delete key.
♦ To delete a workflow currently displayed in the Workflow Designer workspace, choose Workflows-Delete.


Editing a Workflow
When you edit a workflow, the repository updates the workflow information when you save the workflow. If a workflow is running when you make edits, the PowerCenter Server uses the updated information the next time you run the workflow.

Viewing Links in Workflow or Worklet
When you edit a workflow or worklet, you can view the forward or backward link paths to other tasks. You can highlight paths to see links in the workflow branch from the Start task to the last task in the branch.
Note: You can configure the color the Workflow Manager uses to display links. When you configure the format options, choose the Link Selection option.

To view link paths:
1. In the Worklet Designer or Workflow Designer, right-click a task and choose Highlight Path.
2. Choose Forward Path, Backward Path, or Both. The Workflow Manager highlights all links in the branch you select.
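Highlighting a forward or backward path amounts to collecting every link reachable from a task by following links downstream or upstream. The following Python sketch shows that reachability search with an explicit stack. The function name, the "forward"/"backward"/"both" strings, and the task names are assumptions for illustration; the sketch is not the Workflow Manager's implementation.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]

def path_links(links: List[Link], task: str, direction: str) -> Set[Link]:
    """Return the links reachable from task going forward, backward, or both."""
    if direction == "both":
        return path_links(links, task, "forward") | path_links(links, task, "backward")
    if direction not in ("forward", "backward"):
        raise ValueError(f"unknown direction: {direction}")
    highlighted: Set[Link] = set()
    seen = {task}
    stack = [task]
    while stack:
        current = stack.pop()
        for src, dst in links:
            # Forward follows links out of a task; backward follows links into it.
            near, far = (src, dst) if direction == "forward" else (dst, src)
            if near == current:
                highlighted.add((src, dst))
                if far not in seen:
                    seen.add(far)
                    stack.append(far)
    return highlighted

links = [("Start", "s_a"), ("s_a", "s_b"), ("s_a", "s_c"), ("Start", "s_d")]
print(sorted(path_links(links, "s_a", "forward")))   # [('s_a', 's_b'), ('s_a', 's_c')]
print(sorted(path_links(links, "s_b", "backward")))  # [('Start', 's_a'), ('s_a', 's_b')]
```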

Deleting Links in a Workflow or Worklet
When you edit a workflow or worklet, you can delete multiple links at once without deleting the connected tasks.
To delete multiple links:
1. In the Worklet Designer or Workflow Designer, select all links you want to delete.
Tip: You can use the mouse to click and drag the selection, or you can Ctrl-click the tasks and links.
2. Choose Edit-Delete Links. The Workflow Manager removes all selected links.


Using the Workflow Wizard
You can use the Workflow Wizard to automate the process of creating sessions, adding sessions to a workflow, and linking sessions to create a workflow. The Workflow Wizard creates sessions from mappings and adds them to the workflow. It also creates a Start task and allows you to schedule the workflow. You can add tasks and edit other workflow properties after the Workflow Wizard completes. If you want to create concurrent sessions, use the Workflow Designer to manually build a workflow. Before you create a workflow, verify that the folder contains a valid mapping for the Session task. Complete the following steps to build a workflow using the Workflow Wizard:
1. Assign a name and PowerCenter Server to the workflow.
2. Create a session.
3. Schedule the workflow.

Step 1. Assign a Name and PowerCenter Server to the Workflow
In the first step of the Workflow Wizard, you add the name and description of the workflow and choose the PowerCenter Server to run the workflow.
To create the workflow:
1. In the Workflow Manager, open the folder containing the mapping you want to use in the workflow.
2. Open the Workflow Designer.
3. Choose Workflows-Wizard.


The Workflow Wizard appears.

4. Enter a name for the workflow. The convention for naming workflows is wf_WorkflowName. For a complete list of naming conventions for repository objects, see “Naming Conventions” in Getting Started.
5. Enter a description for the workflow.
6. Choose the PowerCenter Server to run the workflow, and click Next.

The next step is to create a session.

Step 2. Create a Session
In the second step of the Workflow Wizard, you create a session based on a mapping. You can add tasks later in the Workflow Designer workspace. For details on working with tasks, see “Working with Tasks” on page 131.
To create a session:
1. In the second step of the Workflow Wizard, select a valid mapping and click the right arrow button. The Workflow Wizard creates a Session task in the right pane using the selected mapping and names it s_MappingName by default.


The following figure shows a mapping selected for a session:

2. You can select additional mappings to create more Session tasks in the workflow. When you add multiple mappings to the list, the Workflow Wizard creates sequential sessions in the order you add them.
3. Use the arrow buttons to change the session order.
4. Specify whether the session should be reusable. When you create a reusable session, you can use the session in other workflows. For details on reusable sessions, see “Working with Tasks” on page 131.
5. Specify how you want the PowerCenter Server to run the workflow. You can specify that the PowerCenter Server runs sessions only if previous sessions complete, or you can specify that the PowerCenter Server always runs each session. This option applies to all sessions you create using the Workflow Wizard.
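The effect of the run option in step 5 on a chain of sequential sessions can be modeled as follows. This Python sketch is a hypothetical model: it applies the default s_MappingName naming, treats each session as either completing or not, and does not show which links or conditions the Workflow Wizard actually creates to implement the option. The mapping names are invented.

```python
from typing import Dict, List

def wizard_run_order(mapping_names: List[str], completed: Dict[str, bool],
                     run_only_if_previous_completes: bool) -> List[str]:
    """Return the sessions that run, in order, for the wizard's sequential sessions.
    completed maps a session name to whether it completed; unlisted sessions count as completed."""
    sessions = ["s_" + name for name in mapping_names]  # default s_MappingName naming
    ran = []
    for session in sessions:
        ran.append(session)
        if run_only_if_previous_completes and not completed.get(session, True):
            break  # later sessions do not run
    return ran

mappings = ["m_Orders", "m_Customers", "m_Items"]
outcomes = {"s_m_Customers": False}
print(wizard_run_order(mappings, outcomes, True))   # ['s_m_Orders', 's_m_Customers']
print(wizard_run_order(mappings, outcomes, False))  # ['s_m_Orders', 's_m_Customers', 's_m_Items']
```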

Step 3. Schedule a Workflow
In the third step of the Workflow Wizard, you can schedule a workflow to run continuously, repeat at a given time or interval, or start manually. The PowerCenter Server runs a scheduled workflow as configured unless a prior run of the workflow fails. When a workflow fails, the PowerCenter Server removes the workflow from the schedule, and you must reschedule it. You can do this in the Workflow Manager or using pmcmd.
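The scheduling rule above means a failed run ends the series of scheduled runs until you reschedule the workflow. The following Python sketch counts how many scheduled runs start before that happens. The function name and the list-of-outcomes input are assumptions for illustration; it does not model run times, intervals, or the rescheduling step.

```python
from typing import Iterable

def scheduled_runs_started(run_succeeded: Iterable[bool]) -> int:
    """Count the scheduled runs that start before the workflow leaves the schedule.
    Each item is the outcome of one scheduled run, True for success."""
    started = 0
    for succeeded in run_succeeded:
        started += 1
        if not succeeded:
            # A failed run removes the workflow from the schedule; no further
            # scheduled runs occur until you reschedule it.
            break
    return started

print(scheduled_runs_started([True, True, False, True]))  # 3
```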


To schedule a workflow:
1. In the third step of the Workflow Wizard, configure the scheduling and run options. For more information about scheduling a workflow, see “Scheduling a Workflow” on page 112.
2. Click Next. The Workflow Wizard displays the settings for the workflow.
3. Verify the workflow settings and click Finish. To edit settings, click Back. The completed workflow opens in the Workflow Designer workspace. From the workspace, you can add tasks, create concurrent sessions, add conditions to links, or modify properties.
4. When you finish modifying the workflow, choose Repository-Save.


Using Workflow Variables
You can create and use variables in a workflow to reference values and record information. For example, you can use a variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task. If not, you can stop the workflow. You can use the following types of workflow variables:

♦ Pre-defined workflow variables. The Workflow Manager provides pre-defined workflow variables for tasks within a workflow. For more information, see “Pre-Defined Workflow Variables” on page 105.
♦ User-defined workflow variables. You create user-defined workflow variables when you create a workflow. For more information, see “User-Defined Workflow Variables” on page 108.

You can use workflow variables when you configure the following types of tasks:
♦ Assignment tasks. You can use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1. For information on using workflow variables in Assignment tasks, see “Working with the Assignment Task” on page 140.
♦ Decision tasks. Decision tasks determine how the PowerCenter Server executes a workflow. For example, you can use the Status variable to run a second session only if the first session completes successfully. For information on using workflow variables in Decision tasks, see “Working with the Decision Task” on page 149.
♦ Links. Links connect each workflow task. You can use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false. For information on using workflow variables in links, see “Working with Links” on page 92.
♦ Timer tasks. Timer tasks specify when the PowerCenter Server begins to execute the next task in the workflow. You can use a user-defined date/time variable to specify the exact time the PowerCenter Server starts to execute the next task. For information on using workflow variables in Timer tasks, see “Working with the Timer Task” on page 161.

You can use the Expression Editor to create an expression that uses variables.
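The Assignment, Decision, and link examples above fit together as a small flow: increment a counter, test a session's status, and follow the true or false link. The following Python sketch models that flow with plain dictionaries. The class, functions, variable name RunCount, and link labels are hypothetical, and the code is not PowerCenter expression syntax; the counter is given an explicit initial value because the sketch does not model variable defaults.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class WorkflowRun:
    # Hypothetical stand-in for the variable values of one workflow run.
    user_variables: Dict[str, int] = field(default_factory=dict)
    task_status: Dict[str, str] = field(default_factory=dict)

def assignment_increment(run: WorkflowRun, variable: str) -> None:
    """Assignment task: set the counter to its current value plus 1."""
    run.user_variables[variable] += 1

def decision_succeeded(run: WorkflowRun, session: str) -> bool:
    """Decision task: true only if the session's status is SUCCEEDED."""
    return run.task_status.get(session, "NOTSTARTED") == "SUCCEEDED"

def choose_link(decision_result: bool) -> str:
    """Follow one link when the decision is true and another when it is false."""
    return "link_on_true" if decision_result else "link_on_false"

run = WorkflowRun(user_variables={"RunCount": 0}, task_status={"s_first": "SUCCEEDED"})
assignment_increment(run, "RunCount")
print(run.user_variables["RunCount"])                   # 1
print(choose_link(decision_succeeded(run, "s_first")))  # link_on_true
```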


Figure 4-8 shows the Expression Editor:
Figure 4-8. Expression Editor

When you build an expression, you can select pre-defined variables on the Pre-Defined tab. You can select user-defined variables on the User-Defined tab. The Functions tab contains functions that you can use with workflow variables. Use the point-and-click method to enter an expression using a variable. For information on using the Expression Editor, see “Using the Expression Editor” on page 96. You can use the following keywords to write expressions for user-defined and pre-defined workflow variables:
♦ AND
♦ OR
♦ NOT
♦ TRUE
♦ FALSE
♦ NULL
♦ SYSDATE


Pre-Defined Workflow Variables
Each workflow contains a set of pre-defined variables that you can use to evaluate workflow and task conditions. You can use the following types of pre-defined variables:

♦ Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. You can use task-specific variables in a link condition to control the path the PowerCenter Server takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
♦ System variables. You can use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow. For more information on system variables, see “Variables” in the Transformation Language Reference. The Workflow Manager lists system variables under the Built-in node in the Expression Editor.

Table 4-1 lists the task-specific workflow variables available in the Workflow Manager:
Table 4-1. Task-Specific Workflow Variables

Task-Specific Variable | Task Types | Datatype | Description
Condition | Decision | Integer | Evaluation result of decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null.
EndTime | All tasks | Date/time | Date and time the associated task ended.
ErrorCode | All tasks | Integer | Last error code for the associated task. If there is no error, the PowerCenter Server sets ErrorCode to 0 when the task completes.
ErrorMsg | All tasks | Nstring* | Last error message for the associated task. If there is no error, the PowerCenter Server sets ErrorMsg to an empty string when the task completes.
FirstErrorCode | Session | Integer | Error code for the first error message in the session. If there is no error, the PowerCenter Server sets FirstErrorCode to 0 when the session completes.
FirstErrorMsg | Session | Nstring* | The first error message in the session. If there is no error, the PowerCenter Server sets FirstErrorMsg to an empty string when the task completes.
PrevTaskStatus | All tasks | Integer | Status of the previous task in the workflow that the PowerCenter Server ran. Statuses include ABORTED, FAILED, STOPPED, and SUCCEEDED. Use these key words when writing expressions to evaluate the status of the previous task. For more information, see “Evaluating Task Status in a Workflow” on page 107.
SrcFailedRows | Session | Integer | Total number of rows the PowerCenter Server failed to read from the source.
SrcSuccessRows | Session | Integer | Total number of rows successfully read from the sources.

Using Workflow Variables

105

Table 4-1. Task-Specific Workflow Variables (continued)

Task-Specific Variable | Task Types | Datatype | Description
StartTime | All tasks | Date/time | Date and time the associated task started.
Status | All tasks | Integer | Status of the previous task in the workflow. Task statuses include ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, and SUCCEEDED. Use these key words when writing expressions to evaluate the status of the current task. For more information, see “Evaluating Task Status in a Workflow” on page 107.
TgtFailedRows | Session | Integer | Total number of rows the PowerCenter Server failed to write to the target.
TgtSuccessRows | Session | Integer | Total number of rows successfully written to the targets.
TotalTransErrors | Session | Integer | Total number of transformation errors.

* Variables of type Nstring can have a maximum length of 600 characters.

All pre-defined workflow variables except Status have a default value of null. The PowerCenter Server uses the default value of null when it encounters a pre-defined variable from a task that has not yet run in the workflow. Therefore, expressions and link conditions that depend upon tasks not yet run are valid. The default value of Status is NOTSTARTED.

Using Pre-Defined Workflow Variables in Expressions
When you use a workflow variable in an expression, the PowerCenter Server evaluates the expression and returns True or False. If the condition evaluates to true, the PowerCenter Server runs the next task. The PowerCenter Server writes an entry in the workflow log similar to the following message:
INFO : LM_36506 : (1980|1040) Link [Session2 --> Session3]: condition is TRUE for the expression [$Session2.PrevTaskStatus = SUCCEEDED].

The Expression Editor displays the pre-defined workflow variables on the Pre-defined tab. The Workflow Manager groups task-specific variables by task and lists system variables under the Built-in node. To use a variable in an expression, double-click the variable. The Expression Editor displays task-specific variables in the Expression field in the following format:
$<TaskName>.<Pre-definedVariable>
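To make the evaluation rules above concrete, the following Python sketch models how a simple link condition in the $<TaskName>.<Pre-definedVariable> format might be resolved. It is a hypothetical model, not the PowerCenter Server's evaluator: it stores status keywords as strings, supports only `=` comparisons, resolves variables from tasks that have not run to None (null) while Status defaults to NOTSTARTED, and assumes a comparison against null is never true. The printed line imitates the LM_36506 message shown above, without the process and thread identifiers.

```python
import re

# Hypothetical model: task-specific variables keyed by task name, with status
# keywords stored as strings for readability.
CONDITION_RE = re.compile(r"^\$(\w+)\.(\w+)\s*=\s*(\w+)$")


def lookup(tasks, task_name, variable):
    """Resolve $<TaskName>.<Pre-definedVariable>; tasks not yet run yield defaults."""
    values = tasks.get(task_name, {})
    if variable == "Status":
        return values.get("Status", "NOTSTARTED")  # Status defaults to NOTSTARTED
    return values.get(variable)  # None stands in for the null default


def evaluate_link(tasks, link, expression):
    """Evaluate a '$Task.Variable = VALUE' link condition and return True or False."""
    match = CONDITION_RE.match(expression.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expression}")
    task_name, variable, expected = match.groups()
    actual = lookup(tasks, task_name, variable)
    # Assumption: a comparison involving null is never true.
    result = actual is not None and str(actual) == expected
    print(f"INFO : LM_36506 : Link [{link}]: condition is "
          f"{'TRUE' if result else 'FALSE'} for the expression [{expression}].")
    return result


tasks = {"Session2": {"Status": "SUCCEEDED", "PrevTaskStatus": "SUCCEEDED", "ErrorCode": 0}}
evaluate_link(tasks, "Session2 --> Session3", "$Session2.PrevTaskStatus = SUCCEEDED")
```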

106

Chapter 4: Working with Workflows

Figure 4-9 shows the Expression Editor with an expression using a task-specific workflow variable and keyword:
Figure 4-9. Expression Using a Pre-Defined Workflow Variable

Evaluating Task Status in a Workflow
You can use Status and PrevTaskStatus in link conditions to test the status of tasks in a workflow. Use Status to test the status of the previous task in the workflow. Use PrevTaskStatus to test the status of the previous task in the workflow that the PowerCenter Server ran. Use PrevTaskStatus if you disable a task in the workflow. Status and PrevTaskStatus return the same value unless the condition uses a disabled task. Figure 4-10 shows a workflow with link conditions using Status:
Figure 4-10. Status Variable Example (Session2 is the previous task in the workflow)

Link condition:
$Session2.Status = SUCCEEDED

When you run the workflow, the PowerCenter Server evaluates the link condition and returns the value based on the status of Session2.


Figure 4-11 shows a workflow with link conditions using PrevTaskStatus:
Figure 4-11. PrevTaskStatus Variable Example (Session1 is the previous task run; Session2 is a disabled task)

Link condition:
$Session2.PrevTaskStatus = SUCCEEDED

When you run the workflow, the PowerCenter Server skips Session2 because the session is disabled. When the PowerCenter Server evaluates the link condition, it returns the value based on the status of Session1.
Tip: If you do not disable Session2, the PowerCenter Server returns the value based on the status of Session2. You do not need to change the link condition when you enable and disable Session2.
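The difference between Status and PrevTaskStatus can be sketched in Python with a simplified, linear model of the tasks leading up to a link. This is an illustration under stated assumptions, not the server's implementation: the `TaskRun` class and the linear `path` are hypothetical, and the real server evaluates links within the workflow graph. Status reports the named task's own state (DISABLED when it is skipped), while PrevTaskStatus falls back to the closest earlier task that actually ran.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Hypothetical record of one task on the path leading to a link."""
    name: str
    status: str            # for example "SUCCEEDED" or "FAILED"
    disabled: bool = False


def status_of(path, name):
    """Status: the named task's own status, DISABLED if the server skipped it."""
    for task in path:
        if task.name == name:
            return "DISABLED" if task.disabled else task.status
    return "NOTSTARTED"  # default for a task that has not run


def prev_task_status_of(path, name):
    """PrevTaskStatus: the named task's status or, if it is disabled,
    the status of the closest earlier task that the server ran."""
    names = [task.name for task in path]
    if name not in names:
        return None  # null for a task that has not run
    for task in reversed(path[: names.index(name) + 1]):
        if not task.disabled:
            return task.status
    return None


path = [TaskRun("Session1", "FAILED"), TaskRun("Session2", "SUCCEEDED", disabled=True)]
print(status_of(path, "Session2"), prev_task_status_of(path, "Session2"))
```

With Session2 disabled, Status reports DISABLED while PrevTaskStatus reports Session1's FAILED status, which is why the same PrevTaskStatus link condition keeps working whether or not Session2 is enabled.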

User-Defined Workflow Variables
You can create your own variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. You can use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables. You can use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, suppose you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, perhaps every tenth time you update the local orders database. You create separate sessions to update the local database and the one at headquarters. The workflow looks like Figure 4-12:
Figure 4-12. Sample Workflow Using Workflow Variable


You can use a user-defined variable to determine when to run the session that updates the orders database at headquarters. To do this, set up the workflow as follows:
1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.
2. Add a Start task and both sessions to the workflow.
3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. You can use the modulus (MOD) function to do this.
4. Create an Assignment task to increment the $$WorkflowCount variable by one.
5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false.

When you do this, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
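The counting logic can be checked with a short Python simulation. It is a sketch, not PowerCenter code: `mod` mirrors the intent of a decision condition such as MOD($$WorkflowCount, 10) = 0, and the loop assumes $$WorkflowCount starts at the Integer default of 0 and is incremented exactly once on every run and persisted between runs, which is what the "every 10th time" behavior requires.

```python
def mod(value, divisor):
    """Mirror of the MOD function for non-negative integers."""
    return value % divisor


def simulate_runs(runs, start_count=0):
    """Return the 1-based run numbers in which the headquarters session runs.

    Assumption: $$WorkflowCount is persistent and incremented once per run,
    so each run starts from the previous run's final value.
    """
    workflow_count = start_count  # Integer datatype default is 0
    hq_runs = []
    for run in range(1, runs + 1):
        # The session that updates the local orders database runs every time.
        if mod(workflow_count, 10) == 0:  # decision condition
            hq_runs.append(run)           # headquarters session runs
        workflow_count += 1               # Assignment task
    return hq_runs


print(simulate_runs(25))  # [1, 11, 21]
```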

Start and Current Values
Conceptually, the PowerCenter Server holds two different values for a workflow variable during a workflow run:
♦ Start value of a workflow variable
♦ Current value of a workflow variable

The start value is the value of the variable at the start of the workflow. The start value could be a value defined in the parameter file for the variable, a value saved in the repository from the previous run of the workflow, a user-defined initial value for the variable, or the default value based on the variable datatype. The PowerCenter Server looks for the start value of a variable in the following order:
1. Value in parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-specified default value
4. Datatype default value

For a list of datatype default values, see Table 4-2 on page 110. For example, you create a workflow variable in a workflow and enter a default value, but you do not define a value for the variable in a parameter file. The first time the PowerCenter Server runs the workflow, it evaluates the start value of the variable to the user-defined default value.
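The lookup order can be sketched as a Python function. The function and parameter names are hypothetical; the datatype defaults come from Table 4-2. A sentinel object marks "not supplied" so that a user-specified default of null (None) is still distinguishable from having no default at all.

```python
from datetime import datetime

# Datatype default values from Table 4-2.
DATATYPE_DEFAULTS = {
    "Date/time": datetime(1753, 1, 1),
    "Double": 0.0,
    "Integer": 0,
    "Nstring": "",
}

_MISSING = object()  # sentinel: value not supplied


def resolve_start_value(datatype, param_file_value=_MISSING,
                        repository_value=_MISSING, persistent=False,
                        user_default=_MISSING):
    """Return a workflow variable's start value using the documented order."""
    if param_file_value is not _MISSING:                 # 1. parameter file
        return param_file_value
    if persistent and repository_value is not _MISSING:  # 2. repository (persistent only)
        return repository_value
    if user_default is not _MISSING:                     # 3. user-specified default
        return user_default
    return DATATYPE_DEFAULTS[datatype]                   # 4. datatype default


print(resolve_start_value("Integer", repository_value=7, persistent=True, user_default=5))  # 7
```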


If you declare the variable as persistent, the PowerCenter Server saves the value of the variable to the repository at the end of the workflow run. The next time the workflow runs, the PowerCenter Server evaluates the start value of the variable as the value saved in the repository. If the variable is non-persistent, the PowerCenter Server does not save the value of the variable. The next time the workflow runs, the PowerCenter Server evaluates the start value of the variable as the user-specified default value.

To override the value saved in the repository before running a workflow, define a value for the variable in a parameter file. When you define a workflow variable in the parameter file, the PowerCenter Server uses this value instead of the value saved in the repository or the configured initial value for the variable.

The current value is the value of the variable as the workflow progresses. When a workflow starts, the current value of a variable is the same as the start value. The value of the variable can change as the workflow progresses if you create an Assignment task that updates the value of the variable.

If the variable is persistent, the PowerCenter Server saves the current value of the variable to the repository at the end of a successful workflow run. If the workflow fails to complete, the PowerCenter Server does not update the value of the variable in the repository. The PowerCenter Server writes the value saved to the repository for each workflow variable to the workflow log.
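The save-on-success rule for persistent variables can be illustrated with a small Python simulation. It is a sketch under stated assumptions: a plain dictionary stands in for the repository, each Assignment task is modeled as a function applied to the current value, and parameter file overrides are not modeled.

```python
from typing import Callable, Dict, List


def run_workflow(repository: Dict[str, int], name: str, default: int,
                 assignments: List[Callable[[int], int]], succeeded: bool,
                 persistent: bool = True) -> int:
    """Simulate one workflow run for a single Integer variable.

    `repository` is a hypothetical stand-in for values saved in the repository.
    Returns the current value of the variable at the end of the run.
    """
    # Start value: the saved value for a persistent variable, else the default.
    current = repository.get(name, default) if persistent else default
    for assign in assignments:  # each Assignment task updates the current value
        current = assign(current)
    if persistent and succeeded:  # saved only at the end of a successful run
        repository[name] = current
    return current


def increment(value: int) -> int:
    return value + 1


repo: Dict[str, int] = {}
run_workflow(repo, "$$WorkflowCount", 0, [increment], succeeded=True)   # saves 1
run_workflow(repo, "$$WorkflowCount", 0, [increment], succeeded=False)  # not saved
print(repo)  # {'$$WorkflowCount': 1}
```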

Datatype Default Values
If the PowerCenter Server cannot determine the start value of a variable by any other means, it uses a default value for the variable based on its datatype. For more information on how the PowerCenter Server determines start values for a variable, see “Start and Current Values” on page 109. Table 4-2 lists the datatype default values for user-defined workflow variables:
Table 4-2. Datatype Default Values for User-defined Workflow Variables

Datatype | Workflow Manager Default Value
Date/time | 1/1/1753 A.D.
Double | 0
Integer | 0
Nstring | Empty string

Creating User-Defined Workflow Variables
You can create workflow variables for a workflow in the workflow properties.


To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add and enter a name for the variable.
   The correct format for a user-defined workflow variable is $$VariableName. Do not use a single $ for a user-defined workflow variable. The single $ is reserved for system variables and pre-defined workflow variables. Workflow variable names are not case-sensitive.
4. In the Datatype field, select the datatype for the new variable. You can select from the following datatypes:
   ♦ Date/time
   ♦ Double
   ♦ Integer
   ♦ Nstring
   Variables of type Nstring can have a maximum length of 600 characters.
5. Enable the Persistent option if you want the value of the variable retained from one execution of the workflow to the next. For more information, see “Start and Current Values” on page 109.
6. Enter the default value for the variable in the Default field. If the default value is a null value, enable the Is Null option.
7. To validate the default value of the new workflow variable, click the Validate button.
8. Click Apply to save the new workflow variable.
9. Click OK to close the workflow properties.
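The naming and datatype rules in this procedure can be expressed as a Python check. This is a hypothetical helper, not the Workflow Manager's Validate button: the character set allowed after the $$ prefix is an assumption, and treating names that differ only in case as duplicates is an inference from the rule that workflow variable names are not case-sensitive.

```python
import re

ALLOWED_DATATYPES = {"Date/time", "Double", "Integer", "Nstring"}
NSTRING_MAX_LENGTH = 600
# Assumption: after the $$ prefix, names use letters, digits, and underscores.
NAME_RE = re.compile(r"^\$\$[A-Za-z_][A-Za-z0-9_]*$")


def validate_variable(name, datatype, default=None, existing_names=()):
    """Return a list of problems with a user-defined workflow variable definition."""
    problems = []
    if not NAME_RE.match(name):  # a single $ is reserved for built-in variables
        problems.append("name must use the $$VariableName format")
    # Names are not case-sensitive, so compare them case-insensitively.
    if name.lower() in {existing.lower() for existing in existing_names}:
        problems.append("a variable with this name already exists")
    if datatype not in ALLOWED_DATATYPES:
        problems.append(f"unsupported datatype: {datatype}")
    elif (datatype == "Nstring" and isinstance(default, str)
          and len(default) > NSTRING_MAX_LENGTH):
        problems.append("Nstring values are limited to 600 characters")
    return problems


print(validate_variable("$WorkflowCount", "Integer"))
```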


Scheduling a Workflow
You can schedule a workflow to run continuously, repeat at a given time or interval, or you can manually start a workflow. The PowerCenter Server runs a scheduled workflow as configured.

By default, the workflow runs on demand. You can change the schedule settings by editing the scheduler. If you change schedule settings, the PowerCenter Server reschedules the workflow according to the new settings.

Each workflow has an associated scheduler. A scheduler is a repository object that contains a set of schedule settings. You can create a non-reusable scheduler for the workflow. Or, you can create a reusable scheduler so you can use the same set of schedule settings for workflows in the folder. The Workflow Manager marks a workflow invalid if you delete the scheduler associated with the workflow.

If you choose a different PowerCenter Server for the workflow or restart the PowerCenter Server, it reschedules all workflows. This includes workflows that are scheduled to run continuously but whose start time has passed. You must manually reschedule workflows whose start time has passed if they are not scheduled to run continuously.

The PowerCenter Server does not run the workflow if:
♦ The prior workflow run fails. When a workflow fails, the PowerCenter Server removes the workflow from the schedule, and you must manually reschedule it. You can reschedule the workflow in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Schedule Workflow. For more information about the pmcmd scheduleworkflow command, see “Scheduleworkflow” on page 604.
♦ You remove the workflow from the schedule. You can remove the workflow from the schedule in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Unschedule Workflow. For more information about the pmcmd unscheduleworkflow command, see “Unscheduleworkflow” on page 610.

Note: The PowerCenter Server schedules the workflow in the time zone of the PowerCenter Server machine. For example, the PowerCenter Client is in your current time zone and the PowerCenter Server is in a time zone two hours later. If you schedule the workflow to start at 9 a.m., it starts at 9 a.m. in the time zone of the PowerCenter Server machine and 7 a.m. current time.
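The time zone arithmetic in this note can be checked with Python's standard `datetime` module. Only the two-hour difference comes from the example; the fixed offsets (UTC-5 for the client, UTC-3 for the server) and the date are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets: the server clock runs two hours ahead of the client.
CLIENT_TZ = timezone(timedelta(hours=-5))
SERVER_TZ = timezone(timedelta(hours=-3))


def client_time_for(server_wall_clock: datetime) -> datetime:
    """Convert a start time scheduled in server-local time to client-local time."""
    scheduled = server_wall_clock.replace(tzinfo=SERVER_TZ)
    return scheduled.astimezone(CLIENT_TZ)


start = client_time_for(datetime(2004, 8, 2, 9, 0))
print(start.strftime("%H:%M"))  # 07:00
```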
To schedule a workflow:
1. In the Workflow Designer, open the workflow.
2. Choose Workflows-Edit.
3. In the Scheduler tab, choose Non-reusable if you want to create a non-reusable set of schedule settings for the workflow. Choose Reusable if you want to select an existing reusable scheduler for the workflow.
   Note: If you do not have a reusable scheduler in the folder, you must create one before you choose Reusable. The Workflow Manager displays a warning message if you do not have an existing reusable scheduler.
4. Click the right side of the Scheduler field to edit scheduling settings for the scheduler.
   For a complete list of scheduler options, see “Configuring Scheduler Settings” on page 114.
5. If you select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click OK.

To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unschedule Workflow.


To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

Creating a Reusable Scheduler
For each folder, the Workflow Manager allows you to create reusable schedulers so you can reuse the same set of scheduling settings for workflows in the folder. Use a reusable scheduler so you do not need to configure the same set of scheduling settings in each workflow. When you delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, you must edit them and replace the missing scheduler.
To create a reusable scheduler:
1. In the Workflow Designer, choose Workflows-Schedulers.
2. Click Add to add a new scheduler.
3. In the General tab, enter a name for the scheduler.
4. Configure the scheduler settings in the Scheduler tab. For a complete list of scheduler settings, see Table 4-3 on page 115.

Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.


Figure 4-13 shows the Schedule tab:
Figure 4-13. Schedule tab

Table 4-3 describes the settings on the Schedule tab:
Table 4-3. Schedule Tab Settings

Scheduler Options | Required/Optional | Description
Run Options: Run On Server Initialization/Run On Demand/Run Continuously | Optional | Indicates the workflow schedule type. If you select Run On Server Initialization, the PowerCenter Server runs the workflow as soon as the server is initialized. The PowerCenter Server then starts the next run of the workflow according to settings in Schedule Options. If you select Run On Demand, the PowerCenter Server runs the workflow when you start the workflow manually. If you select Run Continuously, the PowerCenter Server runs the workflow as soon as the server initializes. The PowerCenter Server then starts the next run of the workflow as soon as it finishes the previous run.
Schedule Options: Run Once/Run Every/Customized Repeat | Optional | Required if you select Run On Server Initialization, or if you do not choose any setting in Run Options. If you select Run Once, the PowerCenter Server runs the workflow once, as scheduled in the scheduler. If you select Run Every, the PowerCenter Server runs the workflow at regular intervals, as configured. If you select Customized Repeat, the PowerCenter Server runs the workflow on the dates and times specified in the Repeat dialog box. When you select Customized Repeat, click Edit to open the Repeat dialog box. The Repeat dialog box allows you to schedule specific dates and times for the workflow run. The selected scheduler appears at the bottom of the page.
Start Options: Start Date/Start Time | Optional | Start Date indicates the date on which the PowerCenter Server begins the workflow schedule. Start Time indicates the time at which the PowerCenter Server begins the workflow schedule.
End Options: End On/End After/Forever | Required/Optional | Required if the workflow schedule is Run Every or Customized Repeat. If you select End On, the PowerCenter Server stops scheduling the workflow on the selected date. If you select End After, the PowerCenter Server stops scheduling the workflow after the set number of workflow runs. If you select Forever, the PowerCenter Server schedules the workflow as long as the workflow does not fail.
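How Run Every interacts with the End options can be sketched in Python. The function is hypothetical and only lists start times; it does not model workflow failures, which end a Forever schedule. It assumes End On still allows runs through the end of the selected date, and the `limit` parameter exists only to keep a Forever schedule finite in this illustration.

```python
from datetime import date, datetime, timedelta
from typing import List, Optional


def run_every(start: datetime, interval: timedelta, end_after: Optional[int] = None,
              end_on: Optional[date] = None, limit: int = 1000) -> List[datetime]:
    """List run start times for a Run Every schedule.

    end_after: End After, stop after this many runs.
    end_on:    End On, assumed here to allow runs through the end of that date.
    neither:   Forever; `limit` only caps this illustration.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs: List[datetime] = []
    current = start
    while len(runs) < limit:
        if end_after is not None and len(runs) >= end_after:
            break
        if end_on is not None and current.date() > end_on:
            break
        runs.append(current)
        current += interval
    return runs


print(len(run_every(datetime(2004, 8, 2, 22, 0), timedelta(hours=1), end_on=date(2004, 8, 2))))  # 2
```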

Customizing Repeat Option
You can schedule the workflow to run once, run at an interval, or customize your own repeat option. Click the Edit button to open the Customized Repeat dialog box. Figure 4-14 shows the Customized Repeat dialog box:
Figure 4-14. Customized Repeat Dialog Box


Table 4-4 describes options in the Customized Repeat dialog box:
Table 4-4. Repeat Dialog Box Options

Repeat Option | Required/Optional | Description
Repeat Every | Required | Enter the numeric interval at which you would like the PowerCenter Server to schedule the workflow, and then select Days, Weeks, or Months, as appropriate. If you select Days, select the appropriate Daily Frequency settings. If you select Weeks, select the appropriate Weekly and Daily Frequency settings. If you select Months, select the appropriate Monthly and Daily Frequency settings.
Weekly | Required/Optional | Required to enter a weekly schedule. Select the day or days of the week on which you would like the PowerCenter Server to run the workflow.
Monthly | Required/Optional | Required to enter a monthly schedule. If you select Run On Day, select the dates on which you want the workflow scheduled on a monthly basis. The PowerCenter Server schedules the workflow to run on the selected dates. If you select a numeric date exceeding the number of days within a given month, the PowerCenter Server schedules the workflow for the last day of the month, including leap years. For example, if you schedule the workflow to run on the 31st of every month, the PowerCenter Server schedules the workflow on the 30th of the following months: April, June, September, and November. If you select Run On The, select the week(s) of the month, then the day of the week on which you want the workflow to run. For example, if you select Second and Last, then select Wednesday, the PowerCenter Server schedules the workflow to run on the second and last Wednesday of every month.
Daily | Optional | Enter the number of times you would like the PowerCenter Server to run the workflow on any day the workflow is scheduled. If you select Run Once, the PowerCenter Server schedules the workflow once on the selected day, at the time entered on the Start Time setting on the Time tab. If you select Run Every, enter Hours and Minutes to define the interval at which the PowerCenter Server runs the workflow. The PowerCenter Server then schedules the workflow at regular intervals on the selected day. The PowerCenter Server uses the Start Time setting for the first scheduled workflow of the day.
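The Monthly and Daily rules in Table 4-4 can be sketched with Python's standard `calendar` and `datetime` modules. The helper names are hypothetical, weeks are numbered 1 to 4 plus "Last", and weekdays use Python's calendar constants (calendar.MONDAY through calendar.SUNDAY). The sketch shows the date arithmetic only, not how the PowerCenter Server stores or triggers schedules.

```python
import calendar
from datetime import date, datetime, time, timedelta
from typing import List, Union


def run_on_day(year: int, month: int, day: int) -> date:
    """Run On Day: a day past the end of the month moves to the month's last day."""
    last_day = calendar.monthrange(year, month)[1]  # 28 to 31, leap years included
    return date(year, month, min(day, last_day))


def run_on_the(year: int, month: int, week: Union[int, str], weekday: int) -> date:
    """Run On The: week is 1-4 (First to Fourth) or "Last"."""
    last_day = calendar.monthrange(year, month)[1]
    matches = [d for d in range(1, last_day + 1)
               if calendar.weekday(year, month, d) == weekday]
    chosen = matches[-1] if week == "Last" else matches[week - 1]
    return date(year, month, chosen)


def daily_run_every(day: date, start_time: time, hours: int, minutes: int) -> List[datetime]:
    """Daily Frequency, Run Every: first run at Start Time, then every interval that day."""
    interval = timedelta(hours=hours, minutes=minutes)
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    current = datetime.combine(day, start_time)
    while current.date() == day:
        runs.append(current)
        current += interval
    return runs


print(run_on_day(2004, 4, 31))  # 2004-04-30
```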

Editing Scheduler Settings
You can edit scheduler settings for both non-reusable and reusable schedulers.

Non-reusable schedulers. When you configure or edit a non-reusable scheduler, check in the workflow to allow the schedule to automatically take effect. You can update the schedule manually with the workflow checked out. Right-click the workflow in the Navigator, and select Schedule Workflow. Note that the changes are applied only to the latest checked-in version of the workflow.


Reusable schedulers. When you edit settings for a reusable scheduler, the repository creates a new version of the scheduler and increments the version number by one. To update a workflow with the latest schedule, check in the scheduler after you edit it. When you configure a reusable scheduler for a new workflow, you must check in both the workflow and the scheduler to enable the schedule to take effect. Thereafter, when you check in the scheduler after revising it, the workflow schedule is updated automatically even if it is checked out. You need to update the workflow schedule manually if you do not check in the scheduler. To update a workflow schedule manually, right-click the workflow in the Navigator, and select Schedule Workflow. Note that the new schedule is implemented only for the latest version of the workflow that is checked in. Workflows that are checked out are not updated with the new schedule.

Disabling Workflows
You may want to disable the workflow while you edit it. This prevents the PowerCenter Server from running the workflow on its schedule. Select the Disable Workflows option on the General tab of the workflow properties. The PowerCenter Server does not run disabled workflows until you clear the Disable Workflows option. Once you clear the Disable Workflows option, the PowerCenter Server reschedules the workflow.


Validating a Workflow
Before you can run a workflow, you must validate it. When you validate the workflow, you validate all task instances in the workflow, including nested worklets. The Workflow Manager validates the following properties:
♦ Expressions. Expressions in the workflow must be valid.
♦ Tasks. Non-reusable tasks and reusable task instances in the workflow must follow validation rules.
♦ Scheduler. If the workflow uses a reusable scheduler, the Workflow Manager verifies that the scheduler exists.

The Workflow Manager also verifies that you linked each task properly. For example, you must link the Start task to at least one task in the workflow.
Note: The Workflow Manager validates Session tasks separately. If a session is invalid, the workflow may still be valid. For more information about session validation, see “Validating a Session” on page 195.

Expression Validation
The Workflow Manager validates all expressions in the workflow. You can enter expressions in the Assignment task, Decision task, and link conditions. The Workflow Manager writes any error message to the Output window. Expressions in link conditions and Decision task conditions must evaluate to a numerical value. Workflow variables used in expressions must exist in the workflow. The Workflow Manager marks the workflow invalid if a link condition is invalid.

Task Validation
The Workflow Manager validates each task in the workflow as you create it. When you save or validate the workflow, the Workflow Manager validates all tasks in the workflow except Session tasks. It marks the workflow invalid if it detects any invalid task in the workflow. The Workflow Manager verifies that attributes in the tasks follow validation rules. For example, the user-defined event you specify in an Event task must exist in the workflow. The Workflow Manager also verifies that you linked each task properly. For example, you must link the Start task to at least one task in the workflow. For details on task validation rules, see “Validating Tasks” on page 139. When you delete a reusable task, the Workflow Manager removes the instance of the deleted task from workflows. The Workflow Manager also marks the workflow invalid when you delete a reusable task used in a workflow. The Workflow Manager verifies that there are no duplicate task names in a folder, and that there are no duplicate task instances in the workflow.

Workflow Properties Validation
The Workflow Manager marks the workflow invalid if the scheduler you specify for the workflow does not exist in the folder.

Running Validation
When you validate a workflow, you validate worklet instances, worklet objects, and all other nested worklets in the workflow. You validate task instances and worklets, regardless of whether you have edited them. The Workflow Manager validates the worklet object using the same validation rules for workflows. The Workflow Manager validates the worklet instance by verifying attributes in the Parameter tab of the worklet instance. For details on validating worklets, see “Validating Worklets” on page 171. If the workflow contains nested worklets, you can select a worklet to validate the worklet and all other worklets nested under it. To validate a worklet and its nested worklets, right-click the worklet and choose Validate.

Example
For example, you have a workflow that contains a non-reusable worklet called Worklet_1. Worklet_1 contains a nested worklet called Worklet_a. The workflow also contains a reusable worklet instance called Worklet_2. Worklet_2 contains a nested worklet called Worklet_b. In the example workflow in Figure 4-15, the Workflow Manager validates links, conditions, and tasks in the workflow. The Workflow Manager validates all tasks in the workflow, including tasks in Worklet_1, Worklet_2, Worklet_a, and Worklet_b. You can validate a part of the workflow. Right-click Worklet_1 and choose Validate. The Workflow Manager validates all tasks in Worklet_1 and Worklet_a. Figure 4-15 shows the example workflow:
Figure 4-15. Example Workflow - Validation
Worklet_1: Non-reusable worklet. Contains a nested worklet called Worklet_a.
Worklet_2: Reusable worklet. Contains a nested worklet called Worklet_b.
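The scope of a validation request in this example can be modeled as a depth-first walk over nested worklets. The sketch below is hypothetical: the `Container` class and the task names are invented for illustration, and only the worklet nesting follows the example. It also ignores the separate validation of Session tasks. Validating the workflow covers every task in all four worklets, while validating Worklet_1 covers only Worklet_1 and Worklet_a.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    """A workflow or worklet: its own tasks plus any worklets nested in it."""
    name: str
    tasks: List[str] = field(default_factory=list)
    worklets: List["Container"] = field(default_factory=list)


def validation_scope(container: Container) -> List[str]:
    """Depth-first list of task paths checked when `container` is validated."""
    scope = [f"{container.name}.{task}" for task in container.tasks]
    for child in container.worklets:
        scope.extend(f"{container.name}.{path}" for path in validation_scope(child))
    return scope


# Hypothetical task names; only the worklet structure follows the example.
worklet_a = Container("Worklet_a", ["cmd_a"])
worklet_1 = Container("Worklet_1", ["s_task1"], [worklet_a])
worklet_b = Container("Worklet_b", ["cmd_b"])
worklet_2 = Container("Worklet_2", ["s_task2"], [worklet_b])
workflow = Container("wf_example", ["Start"], [worklet_1, worklet_2])

print(validation_scope(worklet_1))  # ['Worklet_1.s_task1', 'Worklet_1.Worklet_a.cmd_a']
```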

Validating Multiple Workflows
You can validate multiple workflows or worklets without fetching them into the workspace. To validate multiple workflows, you must select and validate the workflows from a query results view or a view dependencies list. When you validate multiple workflows, the validation does not include sessions, nested worklets, or reusable worklet objects in the workflows.
Note: If you are using the Repository Manager, you can select and validate multiple workflows from the Repository Navigator. You can save and optionally check in workflows that change from invalid to valid status. For more information about validating multiple objects, see “Validating Multiple Objects” in the Repository Guide.
To validate multiple workflows:
1. Select workflows from either a query list or a view dependencies list.
2. Right-click one of the selected workflows and choose Validate. The Validate Objects dialog box displays.
3. Choose whether to save objects and check in objects that you validate.


Running the Workflow
Before you can run a workflow, you must save changes in the folder and select a PowerCenter Server to run the workflow. You can manually start a workflow configured to run on demand or to run on a schedule. Use the Workflow Manager, Workflow Monitor, or pmcmd to run a workflow. You can choose to run the entire workflow, part of a workflow, or a task in the workflow.

Selecting a Server to Run the Workflow
You must choose a server to run the workflow. If you only register one server, the Workflow Manager lists the single registered PowerCenter Server that runs the workflow. For PowerCenter repositories with multiple servers, the Workflow Manager lists all servers.
To select a server to run a workflow:
1. In the Workflow Designer, open the workflow.
2. Choose Workflows-Edit. The Edit Workflow dialog box appears.
3. Click the Select Server button on the General tab. A list of registered servers appears.
4. Select the server on which you want to run the workflow.
5. Click OK twice to select the server for the workflow.

Assigning the PowerCenter Server to a Workflow
After you register the PowerCenter Server, you can assign it to workflows you want to run on that server. This allows you to assign the PowerCenter Server to multiple workflows without editing each workflow property individually. To assign the PowerCenter Server to multiple workflows, you must first close all folders in the repository. You can also choose a PowerCenter Server to run a specific workflow by editing the workflow property. For details, see “Running a Workflow” on page 124. To assign the PowerCenter Server to workflows, you must have Super User privilege.
To assign the PowerCenter Server:
1. Close all folders in the repository.
2. Choose Server-Assign Server.
   or
   Right-click the server name in the Navigator and choose Assign Server.
   The Assign Server dialog box opens.
3. From the Choose Server list, select the server you want to assign.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.
5. Select the Select check box for each workflow you want to run on the PowerCenter Server.
6. Click Assign.

Removing an Assigned Server from a Workflow
You can remove an assigned server from a workflow in the Assign Server dialog box. Perform the following steps to remove an assigned server from a workflow.


To remove an assigned server:
1. Close all folders in the repository.
2. Choose Server-Assign Server.
3. From the Choose Server list, select None.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.
5. Select the workflows from which you want to remove the assigned server.
6. Click Assign.

Running a Workflow
When you choose Workflows-Start, the PowerCenter Server runs the entire workflow. To run a workflow from pmcmd, use the startworkflow command. For details on using pmcmd, see “Using pmcmd” on page 581.
To start a workflow with the Workflow Manager:
1. Connect to a repository and open the folder containing the workflow.
2. From the Navigator, select the workflow that you want to start.
3. Right-click the workflow in the Navigator and choose Start Workflow. The PowerCenter Server starts running the entire workflow.

When you choose Start Workflow, the workflow runs on the PowerCenter Server you selected in the workflow properties. You can also use the Choose Server toolbar button to run the workflow on a different server. After the Workflow Manager sends a request to the PowerCenter Server, the Output window displays the PowerCenter Server response. If an error displays, check the workflow log or session log for error messages. You can also manually start a workflow by right-clicking in the Workflow Designer workspace and choosing Start Workflow.

Running a Part of a Workflow
You can choose to run only part of the workflow. To run part of the workflow, right-click the task that you want the PowerCenter Server to begin running and choose Start Workflow From Task. The PowerCenter Server runs the workflow from the selected task to the end of the workflow. When you run a workflow from a selected task, the PowerCenter Server runs the workflow on the registered server you choose in the workflow properties. The PowerCenter Server logs messages in the workflow log when you start a workflow from a task.


Chapter 4: Working with Workflows

To run a part of a workflow from pmcmd, use the startfrom flag of the startworkflow command. For details on using pmcmd, see “Using pmcmd” on page 581.
To run a part of a workflow:
1. Connect to the folder containing the workflow.
2. In the Navigator window, drill down the Workflow folder to show the tasks in the workflow.
   or
   In the Workflow Designer workspace, select the task from which you want the PowerCenter Server to begin running.
3. Right-click the task on which you want the PowerCenter Server to begin running.
4. Choose Start Workflow From Task.

For example, suppose you have a workflow with multiple tasks. The example workflow in Figure 4-16 contains two branches. To run the tasks commandtask2, e_email2, and command3, start the workflow from commandtask2. All subsequent tasks in the branch run.
Figure 4-16. Running Part of a Workflow - Example
[Figure: when you start the workflow from commandtask2, the PowerCenter Server runs commandtask2 and the tasks that follow it in the same branch.]
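Starting a workflow from a task amounts to running that task plus everything downstream of it. The following Python sketch illustrates the idea on a hypothetical two-branch workflow graph. Only commandtask2, e_email2, and command3 come from the example; the other task names are invented, and the sketch ignores link conditions, which the PowerCenter Server still evaluates at run time.

```python
from collections import deque

# Hypothetical two-branch workflow: each task maps to the tasks linked after it.
# commandtask2, e_email2, and command3 come from the example; the other
# names are invented for illustration.
WORKFLOW = {
    "Start": ["s_branch1", "s_branch2"],
    "s_branch1": ["commandtask1"],
    "commandtask1": [],
    "s_branch2": ["commandtask2"],
    "commandtask2": ["e_email2"],
    "e_email2": ["command3"],
    "command3": [],
}


def tasks_run_from(graph, start_task):
    """Return start_task plus every task reachable downstream of it,
    in breadth-first order. Link conditions are not modeled."""
    order = [start_task]
    seen = {start_task}
    queue = deque([start_task])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(tasks_run_from(WORKFLOW, "commandtask2"))
# ['commandtask2', 'e_email2', 'command3']
```

Tasks in the other branch (s_branch1, commandtask1) are never reached, which matches the guide's statement that only the subsequent tasks in the selected branch run.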

Running a Task in the Workflow
When you start a task in the workflow, the Workflow Manager locks the entire workflow so another user cannot start the workflow. The PowerCenter Server runs the selected task. It does not run the rest of the workflow. To run a task using the Workflow Manager, select the task in the Workflow Designer workspace. Right-click the task and choose Start Task. You can select a task to start using menu commands in the Workflow Manager. In the Navigator window, drill down the Workflow folder to show the tasks in the workflow you want to start. Right-click the task you want to start and choose Start Task.


To start a task in a workflow from pmcmd, use the starttask command. For details on using pmcmd, see “Using pmcmd” on page 581.


Suspending the Workflow
When a task in the workflow fails, you might want to suspend the workflow, fix the error, and resume or recover the workflow. The PowerCenter Server suspends the workflow if you enable the Suspend On Error option in the workflow properties. You can optionally set a suspension email so the PowerCenter Server sends an email when it suspends a workflow. When you enable the Suspend On Error option, the PowerCenter Server suspends the workflow when one of the following fails:
♦ Session
♦ Command
♦ Worklet
♦ Email

When a task fails in the workflow, the PowerCenter Server stops running tasks in its path. The PowerCenter Server does not evaluate the output link of the failed task. If no other task is running in the workflow, the Workflow Monitor displays the status of the workflow as “Suspended.” If one or more tasks are still running in the workflow when a task fails, the PowerCenter Server stops running the failed task and continues running tasks in other paths. The Workflow Monitor displays the status of the workflow as “Suspending.” When the status of the workflow is “Suspended” or “Suspending,” you can fix the error, such as a target database error, and resume or recover the workflow in the Workflow Monitor. When you resume or recover a workflow, the PowerCenter Server restarts the failed tasks and continues evaluating the rest of the tasks in the workflow. The PowerCenter Server does not run any task that already completed successfully.
Note: Do not edit a workflow or the tasks inside a workflow when the PowerCenter Server suspends a workflow.

For details about resuming the workflow, see “Resuming a Workflow or Worklet” on page 417. For details about recovering the workflow, see “Recovering a Workflow or Worklet” on page 417.
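The suspend rules above can be summarized as a small state model. This Python sketch is illustrative only: the task names and the TaskState values are hypothetical, and it simply encodes the two rules stated in the guide, namely that the workflow is "Suspending" while other tasks still run and "Suspended" otherwise, and that resume or recovery restarts failed tasks without rerunning tasks that already succeeded.

```python
from enum import Enum


class TaskState(Enum):
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    RUNNING = "Running"
    NOT_STARTED = "Not started"


def status_after_failure(states):
    """Workflow status right after a task fails with Suspend On Error enabled.

    states maps each task name to its TaskState, including the failed task.
    """
    still_running = any(s is TaskState.RUNNING for s in states.values())
    return "Suspending" if still_running else "Suspended"


def resume_plan(states):
    """On resume or recovery, failed tasks restart, tasks that have not started
    are evaluated, and tasks that already succeeded are never rerun."""
    return {
        "restart": sorted(t for t, s in states.items() if s is TaskState.FAILED),
        "evaluate": sorted(t for t, s in states.items() if s is TaskState.NOT_STARTED),
    }


# Hypothetical task names.
states = {
    "s_orders": TaskState.SUCCEEDED,
    "s_customers": TaskState.FAILED,
    "s_items": TaskState.RUNNING,
}
print(status_after_failure(states))  # Suspending
```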
To suspend a workflow:
1. In the Workflow Designer, open the workflow.
2. Choose Workflows-Edit.
3. In the General tab, enable Suspend On Error.
4. Click OK.

Configuring Suspension Email
You can configure the workflow so that the PowerCenter Server sends an email when it suspends a workflow. Select an existing reusable email task for the suspension email. When a task fails, the PowerCenter Server starts suspending the workflow and sends the suspension email. If another task fails while the PowerCenter Server is suspending the workflow, you do not get the suspension email again. The PowerCenter Server sends out a suspension email if another task fails after you resume the workflow. For details on configuring suspension emails, see “Working with Suspension Email” on page 339.
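The rule for when the suspension email goes out is easy to misread, so here is a minimal Python model of it. The class and method names are hypothetical; the sketch only encodes the behavior described above: the failure that starts a suspension sends one email, further failures during that suspension send nothing, and a failure after you resume sends a new email.

```python
class SuspensionEmailModel:
    """Hypothetical model of when the suspension email is sent."""

    def __init__(self):
        self.suspending = False
        self.emails_sent = 0

    def task_failed(self):
        # Only the failure that starts a suspension triggers an email;
        # further failures during the same suspension do not.
        if not self.suspending:
            self.suspending = True
            self.emails_sent += 1

    def resume(self):
        # After you resume, the next failure starts a new suspension.
        self.suspending = False


wf = SuspensionEmailModel()
wf.task_failed()
wf.task_failed()   # second failure while suspending: no new email
wf.resume()
wf.task_failed()   # failure after resume: new email
print(wf.emails_sent)  # 2
```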


Stopping or Aborting the Workflow
You can specify when and how you want the PowerCenter Server to stop or abort a workflow by using the Control task in the workflow. After you start a workflow, you can stop or abort it through the Workflow Monitor or pmcmd. You can issue the stop or abort command at any time during the execution of a workflow. You can stop or abort a workflow by performing one of the following actions:
♦ Use a Control task in the workflow. For details, see “Working with the Control Task” on page 147.
♦ Issue a stop or abort command in the Workflow Monitor. For details, see “Monitoring Workflows” on page 401.
♦ Issue a stop or abort command in pmcmd. For details, see “pmcmd Reference” on page 594.

You can also stop or abort a task within a workflow. For details on stopping the Session task, see “Stopping and Aborting a Session” on page 200.

Server Handling of Stop and Abort
When you stop a workflow, the PowerCenter Server tries to stop all the tasks that are currently running in the workflow. If the workflow contains a worklet, the PowerCenter Server also tries to stop all the tasks that are currently running in the worklet. If it cannot stop the workflow, you need to abort the workflow. The PowerCenter Server can stop the following tasks completely:
♦ Session
♦ Command
♦ Timer
♦ Event-Wait
♦ Worklet

When you stop a Command task that contains multiple commands, the PowerCenter Server finishes executing the current command and does not execute the rest of the commands. The PowerCenter Server cannot stop tasks such as the Email task. For example, if the PowerCenter Server has already started sending an email when you issue the stop command, the PowerCenter Server finishes sending the email before it stops running the workflow. The PowerCenter Server aborts the workflow if the Repository Server process shuts down.
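The stop behavior for a Command task with several commands (finish the current command, skip the rest) corresponds to checking a stop flag only between commands. This Python sketch models that pattern with threading.Event; the command names are hypothetical and each "command" is a plain callable rather than a real shell command.

```python
import threading


def run_until_stopped(commands, stop_requested):
    """Run (name, action) pairs in order. The stop flag is checked only
    between commands, so a command that has started always finishes."""
    completed = []
    for name, action in commands:
        if stop_requested.is_set():
            break  # remaining commands are skipped
        action()
        completed.append(name)
    return completed


stop = threading.Event()
commands = [
    ("copy_files", lambda: None),
    ("archive_targets", stop.set),   # the stop arrives while this command runs
    ("delete_rejects", lambda: None),
]
print(run_until_stopped(commands, stop))  # ['copy_files', 'archive_targets']
```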

Stopping or Aborting a Task
You can stop or abort a task within a workflow from the Workflow Monitor. When you stop or abort a task, the PowerCenter Server stops processing the task. The PowerCenter Server does not process other tasks in the path of the stopped or aborted task. The PowerCenter Server continues processing concurrent tasks in the workflow. If the PowerCenter Server cannot stop the task, you can abort the task. When you abort a task, the PowerCenter Server kills the process for the task. The PowerCenter Server continues processing concurrent tasks in the workflow when you abort a task.

You can also stop or abort a worklet. The PowerCenter Server stops and aborts a worklet in a similar way to stopping and aborting a task: it stops the worklet while it continues executing concurrent tasks in the workflow. You can also stop or abort tasks within a worklet.

Stopping or Aborting a Session Task
If the PowerCenter Server is executing a Session task when you issue the stop command, the PowerCenter Server stops reading data. It continues processing, writing, and committing data to targets. If the PowerCenter Server cannot finish processing and committing data, you can issue the abort command. The PowerCenter Server handles the abort command for the Session task like the stop command, except it has a timeout period of 60 seconds. If the PowerCenter Server cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session. For details on stopping or aborting a session, see “Stopping and Aborting a Session” on page 200.
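The abort timeout works like "wait for the work to finish, but not longer than N seconds." This Python sketch shows that pattern with Thread.join(timeout). It is a model, not PowerCenter internals: finish_processing stands in for the session's remaining processing and commits, and because Python cannot kill a thread, the sketch only reports that the DTM process would be killed.

```python
import threading

ABORT_TIMEOUT_S = 60  # timeout period the guide gives for aborting a session


def abort_session(finish_processing, timeout_s=ABORT_TIMEOUT_S):
    """Let the session keep processing and committing data; if it has not
    finished within timeout_s seconds, report that the DTM would be killed.

    Python cannot kill a thread, so this sketch only reports the outcome.
    """
    worker = threading.Thread(target=finish_processing, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return "DTM process killed" if worker.is_alive() else "session stopped"


print(abort_session(lambda: None, timeout_s=1))  # session stopped

stuck = threading.Event()
print(abort_session(stuck.wait, timeout_s=0.1))  # DTM process killed
stuck.set()  # release the demo thread
```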


Chapter 5

Working with Tasks
This chapter covers the following topics:
♦ Overview, 132
♦ Creating a Task, 133
♦ Configuring Tasks, 135
♦ Validating Tasks, 139
♦ Working with the Assignment Task, 140
♦ Working with the Command Task, 143
♦ Working with the Control Task, 147
♦ Working with the Decision Task, 149
♦ Working with Event Tasks, 153
♦ Working with the Timer Task, 161


Overview
The Workflow Manager contains many types of tasks to help you build workflows and worklets. You can create reusable tasks in the Task Developer. Or, create and add tasks in the Workflow or Worklet Designer as you develop the workflow. Table 5-1 summarizes workflow tasks available in the Workflow Manager:
Table 5-1. Workflow Tasks

Task Name | Tool | Reusable | Description
Assignment | Workflow Designer, Worklet Designer | No | Assigns a value to a workflow variable. For details, see “Working with the Assignment Task” on page 140.
Command | Task Developer, Workflow Designer, Worklet Designer | Yes | Specifies shell commands to run during the workflow. You can choose to run the Command task only if the previous task in the workflow completes. For details, see “Working with the Command Task” on page 143.
Control | Workflow Designer, Worklet Designer | No | Stops, aborts, or fails the workflow. For details, see “Working with the Control Task” on page 147.
Decision | Workflow Designer, Worklet Designer | No | Specifies a condition to evaluate in the workflow. Use the Decision task to create branches in a workflow. For details, see “Working with the Decision Task” on page 149.
Email | Task Developer, Workflow Designer, Worklet Designer | Yes | Sends email during the workflow. For details, see “Sending Email” on page 319.
Event-Raise | Workflow Designer, Worklet Designer | No | Represents the location of a user-defined event. The Event-Raise task triggers the user-defined event when the PowerCenter Server runs the Event-Raise task. For details, see “Working with Event Tasks” on page 153.
Event-Wait | Workflow Designer, Worklet Designer | No | Waits for a user-defined or a pre-defined event to occur. Once the event occurs, the PowerCenter Server completes the rest of the workflow. For details, see “Working with Event Tasks” on page 153.
Session | Task Developer, Workflow Designer, Worklet Designer | Yes | Set of instructions to run a mapping. For details, see “Working with Sessions” on page 173.
Timer | Workflow Designer, Worklet Designer | No | Waits for a specified period of time to run the next task. For details, see “Working with the Timer Task” on page 161.

The Workflow Manager validates task attributes and links. If a task is invalid, the workflow becomes invalid. Workflows containing invalid sessions may still be valid. For details on validating tasks, see “Validating Tasks” on page 139.


Chapter 5: Working with Tasks

Creating a Task
You can create tasks in the Task Developer, or you can create them in the Workflow Designer or the Worklet Designer as you develop the workflow or worklet. Tasks you create in the Task Developer are reusable. Tasks you create in the Workflow Designer and Worklet Designer are non-reusable by default. For details on reusable tasks, see “Reusable Workflow Tasks” on page 135.

Creating a Task in the Task Developer
You can create the following three types of tasks in the Task Developer:
♦ Command
♦ Session
♦ Email

Perform the following steps to create tasks in the Task Developer.
To create a task in the Task Developer:
1. In the Task Developer, choose Tasks-Create. The Create Task dialog box appears.
2. Select the task type you want to create: Command, Session, or Email.
3. Enter a name for the task.
4. For session tasks, select the mapping you want to associate with the session.
5. Click Create. The Task Developer creates the workflow task.
6. Click Done to close the Create Task dialog box.

Creating a Task in the Workflow or Worklet Designer
You can create and add tasks in the Workflow Designer or Worklet Designer as you develop the workflow or worklet. You can create any type of task in the Workflow Designer or Worklet Designer. Tasks you create in the Workflow Designer or Worklet Designer are non-reusable. Edit the General tab of the task properties to promote a non-reusable task to a reusable task.


Perform the following steps to create tasks in the Workflow Designer or Worklet Designer.
To create tasks in the Workflow Designer or Worklet Designer:
1. In the Workflow Designer or Worklet Designer, open a workflow or worklet.
2. Choose Tasks-Create.
3. Select the type of task you want to create.
4. Enter a name for the task.
5. Click Create. The Workflow Designer or Worklet Designer creates the task and adds it to the workspace.
6. Click Done.

You can also use the Tasks toolbar to create and add tasks to the workflow. Click the button on the Tasks toolbar for the task you want to create. Click again in the Workflow Designer or Worklet Designer workspace to create and add the task. The Workflow Designer or Worklet Designer creates the task with a default task name when you use the Tasks toolbar.


Configuring Tasks
After you create the task, you can configure general task options on the General tab. For each task instance in the workflow, you can configure how the PowerCenter Server runs the task and the other objects associated with the selected task. You can also disable the task so you can run the rest of the workflow without the selected task. Figure 5-1 displays the General tab in the Edit Tasks dialog box:
Figure 5-1. General Tab - Edit Tasks Dialog Box

When you use a task in the workflow, you can edit the task in the Workflow Designer and configure the following task options in the General tab:
♦ Treat input link as AND or OR. Choose to have the PowerCenter Server run the task when all or one of the input link conditions evaluates to True.
♦ Disable this task. Choose to disable the task so you can run the rest of the workflow without the task.
♦ Fail parent if this task fails. Choose to fail the workflow or worklet containing the task if the task fails.
♦ Fail parent if this task does not run. Choose to fail the workflow or worklet containing the task if the task does not run.

Reusable Workflow Tasks
Workflows can contain reusable task instances and non-reusable tasks. Non-reusable tasks exist within a single workflow. Reusable tasks can be used in multiple workflows in the same folder.


You have the option to create any task as non-reusable or reusable. Tasks you create in the Task Developer are reusable. Tasks you create in the Workflow Designer are non-reusable by default. However, you can edit the general properties of a task to promote it to a reusable task. The Workflow Manager stores each reusable task separately from the workflows that use the task. You can view a list of reusable tasks in the Tasks node in the Navigator window. You can see a list of all reusable Session tasks in the Sessions node in the Navigator window.
To promote a non-reusable workflow task:
1. In the Workflow Designer, double-click the task you want to make reusable.
2. In the General tab of the Edit Task dialog box, check the Make Reusable option.
3. When prompted whether you are sure you want to promote the task, click Yes.
4. Click OK to return to the workflow.
5. Choose Repository-Save.

The newly promoted task appears in the list of reusable tasks in the Tasks node in the Navigator window.

Instances and Inherited Changes
When you add a reusable task to a workflow, you add an instance of the task. The definition of the task exists outside the workflow, while an instance of the task exists in the workflow. You can edit the task instance in the Workflow Designer. Changes you make in the task instance exist only in the workflow; the task definition remains unchanged in the Task Developer. When you change a reusable task definition in the Task Developer, the changes are reflected in the task instance in the workflow only if you have not edited the instance.

Reverting Changes in Reusable Task Instances
When you edit an instance of a reusable task in the workflow, you can revert to the settings in the task definition. The Revert button appears after you override task properties in the task instance. You cannot use the Revert button for settings that are read-only or locked by another user.


Figure 5-2 displays the Revert button in the Mapping tab of a Session task:
Figure 5-2. Revert Button in Session Properties
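The relationship between a reusable task definition, its instances, and the Revert button can be sketched as copy-on-write. This Python model reads the guide literally: an unedited instance follows the definition, an edited instance stops picking up definition changes, and Revert reconnects it to the definition. Whether PowerCenter tracks edits per instance or per property is not stated here, so treat the per-instance granularity as an assumption; the property names are hypothetical.

```python
class ReusableTask:
    """Task definition kept in the Task Developer (hypothetical model)."""

    def __init__(self, **properties):
        self.properties = dict(properties)


class TaskInstance:
    """Instance of a reusable task in one workflow.

    Assumption: the first edit snapshots the whole definition, so later
    definition changes no longer reach this instance until it is reverted.
    """

    def __init__(self, definition):
        self.definition = definition
        self.local = None  # None means "not edited: follow the definition"

    def get(self, name):
        source = self.local if self.local is not None else self.definition.properties
        return source[name]

    def edit(self, name, value):
        if self.local is None:
            self.local = dict(self.definition.properties)  # snapshot on first edit
        self.local[name] = value

    def revert(self):
        # Like the Revert button: go back to the settings in the definition.
        self.local = None


definition = ReusableTask(fail_parent=False, treat_links_as="AND")
inst = TaskInstance(definition)
definition.properties["fail_parent"] = True
print(inst.get("fail_parent"))  # True: unedited instance follows the definition
inst.edit("treat_links_as", "OR")
definition.properties["fail_parent"] = False
print(inst.get("fail_parent"))  # True: edited instance no longer follows
inst.revert()
print(inst.get("fail_parent"))  # False
```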

AND or OR Input Links
For each task, you can choose to treat the input link as an AND link or an OR link. When a task has one input link, the PowerCenter Server processes the task when the previous object completes and the link condition evaluates to True. If you have multiple links going into one task, you can choose to have an AND input link so that the PowerCenter Server runs the task when all the link conditions evaluate to True. Or, you can choose to have an OR input link so that the PowerCenter Server runs the task as soon as any link condition evaluates to True. To set the type of input links, double-click the task to open the Edit Tasks dialog box. Select AND or OR for the input link type. For details on working with links and link conditions, see “Working with Links” on page 92.
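The difference between AND and OR input links is the difference between all() and any() over the link conditions. This Python sketch adds a "wait" outcome for links whose upstream task has not finished yet. The early "does not run" answer for an AND link with a False condition is plain boolean logic and an assumption of this sketch; the guide does not say exactly when the PowerCenter Server gives up waiting on the remaining links.

```python
def input_link_decision(link_results, link_type):
    """Decide whether a task with several input links runs.

    link_results holds True/False for each link condition already evaluated,
    or None when the upstream task has not finished yet.
    """
    if link_type == "AND":
        if any(r is False for r in link_results):
            return "does not run"  # all links can no longer be True
        if any(r is None for r in link_results):
            return "wait"
        return "run"
    if link_type == "OR":
        if any(r is True for r in link_results):
            return "run"  # one True link is enough
        if any(r is None for r in link_results):
            return "wait"
        return "does not run"
    raise ValueError(f"unknown input link type: {link_type}")


print(input_link_decision([True, None, True], "OR"))   # run
print(input_link_decision([True, None, True], "AND"))  # wait
```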

Disabling Tasks
In the Workflow Designer, you can disable a workflow task so that the PowerCenter Server runs the workflow without the disabled task. The status of a disabled task is DISABLED. Disable a task in the workflow by selecting the Disable This Task option in the Edit Tasks dialog box.


Failing Parent Workflow or Worklet
You can choose to fail the workflow or worklet if a task fails or does not run. The workflow or worklet that contains the task instance is called the parent. A task might not run when the input condition for the task evaluates to False. To fail the parent workflow or worklet if the task fails, double-click the task and select the Fail Parent If This Task Fails option in the General tab. When you select this option and a task fails, it does not prevent the other tasks in the workflow or worklet from running. Instead, the PowerCenter Server marks the status of the workflow or worklet as failed. If you have a session nested within multiple worklets, you must select the Fail Parent If This Task Fails option for each worklet instance to see the failure at the workflow level. To fail the parent workflow or worklet if the task does not run, double-click the task and select the Fail Parent If This Task Does Not Run option in the General tab. When you choose this option, the PowerCenter Server fails the parent workflow if a task did not run.
Note: The PowerCenter Server does not fail the parent workflow if you disable a task.
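For a session nested in several worklets, a failure climbs one level for each object that has Fail Parent If This Task Fails selected and stops at the first one that does not. This Python sketch models that propagation; the session, worklet, and workflow names are hypothetical.

```python
def levels_marked_failed(path, fail_parent_flags):
    """Return the objects marked failed, from the inside out.

    path lists the failed task and its enclosing objects, innermost first,
    e.g. ["s_load", "wl_inner", "wl_outer", "wf_main"] (hypothetical names).
    fail_parent_flags[i] is True when path[i] has Fail Parent If This Task
    Fails selected. Failure propagates one level per selected option.
    """
    failed = [path[0]]
    for child_flag, parent in zip(fail_parent_flags, path[1:]):
        if not child_flag:
            break  # the failure is not visible above this level
        failed.append(parent)
    return failed


path = ["s_load", "wl_inner", "wl_outer", "wf_main"]
print(levels_marked_failed(path, [True, True, True]))
# ['s_load', 'wl_inner', 'wl_outer', 'wf_main']
print(levels_marked_failed(path, [True, False, True]))
# ['s_load', 'wl_inner']
```

The second call shows why the option must be selected on every worklet instance: a single unselected level hides the failure from the workflow.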


Validating Tasks
You can validate reusable tasks in the Task Developer. Or, you can validate task instances in the Workflow Designer. When you validate a task, the Workflow Manager validates task attributes and links. For example, the user-defined event you specify in an Event task must exist in the workflow. The Workflow Manager uses the following rules to validate tasks:

♦ Assignment. The Workflow Manager validates the expression you enter for the Assignment task. For example, the Workflow Manager verifies that you assigned a matching datatype value to the workflow variable in the assignment expression.
♦ Command. The Workflow Manager does not validate the shell command you enter for the Command task.
♦ Event-Wait. If you choose to wait for a pre-defined event, the Workflow Manager verifies that you specified a file to watch. If you choose to use the Event-Wait task to wait for a user-defined event, the Workflow Manager verifies that you specified an event.
♦ Event-Raise. The Workflow Manager verifies that you specified a user-defined event for the Event-Raise task.
♦ Timer. The Workflow Manager verifies that the variable you specified for the Absolute Time setting has the Date/Time datatype.
♦ Start. The Workflow Manager verifies that you linked the Start task to at least one task in the workflow.

When a task instance is invalid, the workflow using the task instance becomes invalid. When a reusable task is invalid, it does not affect the validity of the task instance used in the workflow. However, if a Session task instance is invalid, the workflow may still be valid. The Workflow Manager validates sessions differently. For details, see “Validating a Session” on page 195. To validate a task, select the task in the workspace and choose Tasks-Validate. Or, right-click the task in the workspace and choose Validate.
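The validation rules listed above can be expressed as a small checker. This Python sketch uses a hypothetical, simplified TaskSpec record (not a PowerCenter API) and encodes only the Event-Wait, Event-Raise, Timer, Start, and Command rules; the Assignment datatype check is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TaskSpec:
    """Hypothetical, simplified task settings; not a PowerCenter API."""
    kind: str                                     # "Event-Wait", "Timer", "Start", ...
    name: str
    waits_for_predefined_event: bool = False      # Event-Wait: file watch?
    watch_file: Optional[str] = None              # Event-Wait: indicator file
    event: Optional[str] = None                   # Event-Wait/Event-Raise: user-defined event
    absolute_time_var_type: Optional[str] = None  # Timer: datatype of the variable, if used
    links_to: List[str] = field(default_factory=list)  # Start: outgoing links


def validate_task(task: TaskSpec, workflow_events: Set[str]) -> List[str]:
    """Apply the validation rules and return a list of problems."""
    problems = []
    if task.kind == "Event-Wait":
        if task.waits_for_predefined_event:
            if not task.watch_file:
                problems.append("no file to watch")
        elif task.event not in workflow_events:
            problems.append("user-defined event missing or not defined in the workflow")
    elif task.kind == "Event-Raise":
        if task.event not in workflow_events:
            problems.append("user-defined event missing or not defined in the workflow")
    elif task.kind == "Timer":
        if task.absolute_time_var_type not in (None, "Date/Time"):
            problems.append("Absolute Time variable must have the Date/Time datatype")
    elif task.kind == "Start":
        if not task.links_to:
            problems.append("Start task is not linked to any task")
    # Command tasks: shell commands are not validated.
    # Assignment tasks: expression and datatype checks are omitted here.
    return problems


print(validate_task(TaskSpec("Start", "Start"), set()))
# ['Start task is not linked to any task']
```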


Working with the Assignment Task
The Assignment task allows you to assign a value to a user-defined workflow variable. To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables. After you assign a value to a variable using the Assignment task, the PowerCenter Server uses the assigned value for the variable during the remainder of the workflow. You must create a variable before you can assign values to it. You cannot assign values to predefined workflow variables.
To create an Assignment task:
1. In the Workflow Designer, click the Assignment icon on the Tasks toolbar.
   or
   Choose Tasks-Create. Select Assignment Task for the task type.
2. Enter a name for the Assignment task. Click Create. Then click Done.
   The Workflow Designer creates and adds the Assignment task to the workflow.
3. Double-click the Assignment task to open the Edit Task dialog box.
4. On the Expressions tab, click Add to add an assignment.
5. Click the Open button in the User Defined Variables field.
   The Select Variable dialog box appears.
6. Select the variable for which you want to assign a value. Click OK.
7. Click the Edit button in the Expression field to open the Expression Editor.
   The Expression Editor shows pre-defined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.
8. Enter the value or expression you want to assign. For example, if you want to assign the value 500 to the user-defined variable $$custno1, enter the number 500 in the Expression Editor.
   Validate the expression before you close the Expression Editor.
9. Repeat steps 4-8 to add more variable assignments as necessary. Use the up and down arrows in the Expressions tab to change the order of the variable assignments.
10. Click OK.
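The Assignment task rules (the variable must already exist, predefined variables cannot be assigned, the value must match the variable's datatype, and assignments run in the listed order) can be modeled in a few lines. In this Python sketch, Python types stand in for PowerCenter datatypes, and the predefined variable name $s_load.Status is hypothetical; $$custno1 and the value 500 come from the example above.

```python
class WorkflowVariables:
    """Hypothetical model of the Assignment task rules."""

    def __init__(self, user_defined, predefined):
        self.values = dict(user_defined)   # user-defined variable -> current value
        self.predefined = set(predefined)  # names you cannot assign to

    def assign(self, name, value):
        if name in self.predefined:
            raise ValueError(f"{name} is a predefined variable")
        if name not in self.values:
            raise KeyError(f"create {name} before assigning to it")
        if type(value) is not type(self.values[name]):
            # Python types stand in for PowerCenter datatypes here.
            raise TypeError(f"{name} expects {type(self.values[name]).__name__}")
        self.values[name] = value

    def run_assignment_task(self, assignments):
        # Assignments run in the order listed on the Expressions tab.
        for name, value in assignments:
            self.assign(name, value)


variables = WorkflowVariables({"$$custno1": 0}, predefined={"$s_load.Status"})
variables.run_assignment_task([("$$custno1", 500)])
print(variables.values["$$custno1"])  # 500
```

After the Assignment task runs, the rest of the workflow sees the new value, which is why the model updates the stored value in place.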


Working with the Command Task
The Command task allows you to specify one or more shell commands to run during the workflow. For example, you can specify shell commands in the Command task to delete reject files, copy a file, or archive target files. You can use a Command task in the following ways:
♦ Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
♦ Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task. For more information about specifying pre-session and post-session shell commands, see “Using Pre- or Post-Session Shell Commands” on page 188.

Note: You can use server variables or session variables in pre- and post-session shell commands. You cannot use server variables or session variables in standalone Command tasks; the PowerCenter Server does not expand them there.

Use any valid UNIX command or shell script for UNIX servers, or any valid DOS command or batch file for Windows servers. For example, you might use a shell command to copy a file from one directory to another. For a Windows server, you would use the following shell command to copy the SALES_ADJ file from the source directory on drive L to the target directory on drive H:
copy L:\sales\sales_adj H:\marketing\

For a UNIX server, you would use the following command to perform a similar operation:
cp sales/sales_adj marketing/

Each shell command runs in the same environment (UNIX or Windows) as the PowerCenter Server. Environment settings in one shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script that invokes other scripts.
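The point about environment settings not carrying over between shell commands can be demonstrated directly. This Python sketch assumes a POSIX system, where subprocess.run with shell=True uses /bin/sh; the SALES_DIR variable and its value are hypothetical. Each call to subprocess.run starts a fresh shell, just as each shell command in the task does.

```python
import os
import subprocess

# Hypothetical commands: the first sets a variable, the second reads it.
COMMANDS = ["export SALES_DIR=/data/sales", 'echo "dir=$SALES_DIR"']


def clean_env():
    """Current environment without SALES_DIR, so the demo starts clean."""
    return {k: v for k, v in os.environ.items() if k != "SALES_DIR"}


def run_separately(commands, env):
    """Each command gets its own shell, so settings do not carry over."""
    return [
        subprocess.run(c, shell=True, env=env, capture_output=True, text=True).stdout.strip()
        for c in commands
    ]


def run_as_one_script(commands, env):
    """One shell runs every command, so settings do carry over."""
    script = "\n".join(commands)
    result = subprocess.run(script, shell=True, env=env, capture_output=True, text=True)
    return result.stdout.strip()


print(run_separately(COMMANDS, clean_env()))     # ['', 'dir=']
print(run_as_one_script(COMMANDS, clean_env()))  # dir=/data/sales
```

The second function mirrors the guide's advice: put the commands that must share settings into one script and call that script from the Command task.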

Using Session Parameters
You can use session parameters in pre- or post-session shell commands. For example, you might use an input file parameter instead of hard-coding the name of a source file.

Creating a Command Task
Perform the following steps to create a Command task.


To create a Command task:
1. In the Workflow Designer or the Task Developer, click the Command Task icon on the Tasks toolbar.
   or
   Choose Tasks-Create. Select Command Task for the task type.
2. Enter a name for the Command task. Click Create. Then click Done.
3. Double-click the Command task in the workspace to open the Edit Tasks dialog box.
4. In the Commands tab, click the Add button to add a command.
5. In the Name field, enter a name for the new command.
6. In the Command field, click the Edit button to open the Command Editor.
7. Enter the command you want to perform. Enter only one command in the Command Editor.
8. Click OK to close the Command Editor.
9. Repeat steps 4-8 to add more commands in the task.
10. Click OK.

If you specify non-reusable shell commands for a session, you can promote the non-reusable shell commands to a reusable Command task. For details, see “Creating a Reusable Command Task from Pre- or Post-Session Commands” on page 191.

Executing Commands in the Command Task
The PowerCenter Server processes the shell commands in the order you specify them. You can choose to run a command only if the previous command completed successfully. Or, you can choose to run all commands in the Command task, regardless of the result of the previous command. If you configure multiple commands in a Command task to run on UNIX, each command runs in a separate shell. To run the next command only if the previous command completes successfully, select the “Run If Previous Completed” option in the Properties tab of the Command task. If you select the Run If Previous Completed option, when one of the commands in the Command task fails, the PowerCenter Server stops running the rest of the commands and fails the task. If you do not select the Run If Previous Completed option, the PowerCenter Server runs all the commands in the Command task and treats the task as completed, even if a command fails.


Figure 5-3 shows the Run If Previous Completed option:
Figure 5-3. Run If Previous Completed Option
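The effect of Run If Previous Completed can be sketched with real shell commands, each started in its own shell. This Python sketch assumes a POSIX shell and assumes that "fails" means a non-zero exit status; the command names c1 to c3 are hypothetical.

```python
import subprocess


def run_command_task(commands, run_if_previous_completed):
    """Run (name, shell_command) pairs in order, each in its own shell.

    Returns the names that ran and the task status. With Run If Previous
    Completed, the first non-zero exit status stops the task and fails it;
    without it, every command runs and the task counts as completed.
    """
    ran = []
    for name, command in commands:
        ran.append(name)
        exit_code = subprocess.run(command, shell=True).returncode
        if exit_code != 0 and run_if_previous_completed:
            return ran, "Failed"
    return ran, "Completed"


commands = [("c1", "exit 0"), ("c2", "exit 1"), ("c3", "exit 0")]
print(run_command_task(commands, run_if_previous_completed=True))
# (['c1', 'c2'], 'Failed')
print(run_command_task(commands, run_if_previous_completed=False))
# (['c1', 'c2', 'c3'], 'Completed')
```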


Working with the Control Task
You can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow or worklet based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task.
To create a Control task:
1. In the Workflow Designer, click the Control Task icon on the Tasks toolbar.
   or
   Choose Tasks-Create. Select Control Task for the task type.
2. Enter a name for the Control task. Click Create. Then click Done.
   The Workflow Manager creates and adds the Control task to the workflow.
3. Double-click the Control task in the workspace to open it.
4. Configure control options on the Properties tab.

You can choose from the following control options:
Control Option | Description
Fail Me | Marks the Control task as “Failed.” The PowerCenter Server fails the Control task if you choose this option. If you choose Fail Me in the Properties tab and choose Fail Parent If This Task Fails in the General tab, the PowerCenter Server fails the parent workflow.
Fail Parent | Marks the status of the workflow or worklet that contains the Control task as failed after the workflow or worklet completes.
Stop Parent | Stops the workflow or worklet that contains the Control task.
Abort Parent | Aborts the workflow or worklet that contains the Control task.
Fail Top-Level Workflow | Fails the workflow that is running.
Stop Top-Level Workflow | Stops the workflow that is running.
Abort Top-Level Workflow | Aborts the workflow that is running.
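The main distinction in the control options is which object they act on: the Control task itself, its parent workflow or worklet, or the top-level workflow. This Python lookup restates the table above; the object names in the example (ctl_check, wl_inner, wf_main) are hypothetical.

```python
# Which object each Control option acts on, and what it does (from the table above).
CONTROL_OPTIONS = {
    "Fail Me": ("control task", "fail"),
    "Fail Parent": ("parent", "mark failed after it completes"),
    "Stop Parent": ("parent", "stop"),
    "Abort Parent": ("parent", "abort"),
    "Fail Top-Level Workflow": ("top-level workflow", "fail"),
    "Stop Top-Level Workflow": ("top-level workflow", "stop"),
    "Abort Top-Level Workflow": ("top-level workflow", "abort"),
}


def control_effect(option, control_task, parent, top_level_workflow):
    """Return (object name, action) for a Control task option."""
    scope, action = CONTROL_OPTIONS[option]
    target = {
        "control task": control_task,
        "parent": parent,
        "top-level workflow": top_level_workflow,
    }[scope]
    return target, action


# A Control task inside worklet wl_inner, which runs in workflow wf_main.
print(control_effect("Stop Parent", "ctl_check", "wl_inner", "wf_main"))
# ('wl_inner', 'stop')
print(control_effect("Abort Top-Level Workflow", "ctl_check", "wl_inner", "wf_main"))
# ('wf_main', 'abort')
```

When the Control task sits directly in a workflow rather than in a worklet, the parent and the top-level workflow are the same object.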


Working with the Decision Task
The Decision task allows you to enter a condition that determines the execution of the workflow, similar to a link condition. The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition. The PowerCenter Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0). You can specify one decision condition per Decision task. After the PowerCenter Server evaluates the Decision task, you can use the pre-defined condition variable in other expressions in the workflow to help you develop the workflow. Depending on the workflow, you might use link conditions instead of a Decision task. However, the Decision task simplifies the workflow. For details on link conditions, see “Working with Links” on page 92. If you do not specify a condition in the Decision task, the PowerCenter Server evaluates the Decision task to True.

Using the Decision Task
You can use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the pre-defined Condition variable in a Decision task to simplify link conditions.

Example
For example, suppose you have a Command task that depends on the status of the three sessions in the workflow. You want the PowerCenter Server to run the Command task when any of the three sessions fails. To accomplish this, use a Decision task with the following decision condition:
$Q1_session.status = FAILED OR $Q2_session.status = FAILED OR $Q3_session.status = FAILED

You can then use the pre-defined condition variable in the input link condition of the Command task. Configure the input link with the following link condition:
$Decision.condition = True


Figure 5-4 shows the example workflow using a Decision task:
Figure 5-4. Example Workflow Using a Decision Task

You can configure the same logic in the workflow without the Decision task. Without the Decision task, you need to use three link conditions and treat the input links to the Command task as OR links. Figure 5-5 shows the example workflow without the Decision task:
Figure 5-5. Example Workflow without a Decision Task
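The two designs in Figures 5-4 and 5-5 express the same logic. This Python sketch models both, storing the decision result as 1 or 0 like the pre-defined condition variable, and checks that they agree on every combination of session outcomes. Limiting outcomes to SUCCEEDED and FAILED is a simplification of this sketch.

```python
from itertools import product

SESSIONS = ("Q1_session", "Q2_session", "Q3_session")


def decision_condition(status):
    """$Q1_session.status = FAILED OR $Q2_session.status = FAILED OR
    $Q3_session.status = FAILED, stored as 1 (True) or 0 (False)."""
    return 1 if any(status[s] == "FAILED" for s in SESSIONS) else 0


def command_runs_with_decision(status):
    # One input link on the Command task: $Decision.condition = True
    return decision_condition(status) == 1


def command_runs_with_or_links(status):
    # Three input links, one per session, with the input treated as OR
    return any(status[s] == "FAILED" for s in SESSIONS)


def approaches_agree(outcomes=("SUCCEEDED", "FAILED")):
    """Check both designs on every combination of session outcomes."""
    for combo in product(outcomes, repeat=len(SESSIONS)):
        status = dict(zip(SESSIONS, combo))
        if command_runs_with_decision(status) != command_runs_with_or_links(status):
            return False
    return True


print(approaches_agree())  # True
```

The Decision version keeps the logic in one expression, so later links such as the one to the Email task can reuse the same condition variable.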

You can further expand the example workflow in Figure 5-4. In Figure 5-4, the PowerCenter Server runs the Command task if any of the three Session tasks fails. Suppose now you want the PowerCenter Server to also run an Email task if all three Session tasks succeed.


To do this, add an Email task and use the decision condition variable in the link condition. Figure 5-6 shows the expanded example workflow using a Decision task:
Figure 5-6. Expanded Example Workflow Using a Decision Task
[Figure link labels: $Decision.condition = True on the link to the Command task; $Decision.condition = False on the link to the Email task.]
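The expanded workflow routes on the single condition variable: True leads to the Command task, False to the Email task. This Python sketch models the routing. One caveat follows from the logic itself: False means "no session has status FAILED," which matches "all three sessions succeeded" only if every session ends as SUCCEEDED or FAILED. A disabled session, whose status is DISABLED, would also send the workflow down the Email branch.

```python
def route_after_decision(status):
    """Return the task that runs after the Decision task in the expanded example.

    Link to the Command task: $Decision.condition = True  (some session FAILED)
    Link to the Email task:   $Decision.condition = False (no session FAILED)
    """
    condition = any(s == "FAILED" for s in status.values())
    return "Command" if condition else "Email"


print(route_after_decision(
    {"Q1_session": "SUCCEEDED", "Q2_session": "FAILED", "Q3_session": "SUCCEEDED"}))
# Command
print(route_after_decision(
    {"Q1_session": "SUCCEEDED", "Q2_session": "SUCCEEDED", "Q3_session": "SUCCEEDED"}))
# Email
# No session FAILED, but not all succeeded: the Email branch still runs.
print(route_after_decision(
    {"Q1_session": "SUCCEEDED", "Q2_session": "DISABLED", "Q3_session": "SUCCEEDED"}))
# Email
```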

Creating a Decision Task
Perform the following steps to create a Decision task.
To create a Decision task: 1.

In the Workflow Designer, click the Decision Task icon on the Tasks toolbar.

Decision Task Icon

or Choose Tasks-Create. Select Decision Task for the task type.
2.

Enter a name for the Decision task. Click Create. Then click Done. The Workflow Designer creates and adds the Decision task to the workspace.


3. Double-click the Decision task to open it.
4. Click the Open button in the Value field to open the Expression Editor.
5. In the Expression Editor, enter the condition you want the PowerCenter Server to evaluate. Validate the expression before you close the Expression Editor.
6. Click OK.


Working with Event Tasks
You can define events in the workflow to specify the sequence of task execution. The event is triggered based on the completion of the sequence of tasks. Use the following tasks to help you use events in the workflow:

♦ Event-Raise task. The Event-Raise task represents a user-defined event. When the PowerCenter Server runs the Event-Raise task, the Event-Raise task triggers the event. Use the Event-Raise task with the Event-Wait task to define events.
♦ Event-Wait task. The Event-Wait task waits for an event to occur. Once the event triggers, the PowerCenter Server continues executing the rest of the workflow.

To coordinate the execution of the workflow, you may specify the following types of events for the Event-Wait and Event-Raise tasks:

♦ Pre-defined event. A pre-defined event is a file-watch event. For pre-defined events, use an Event-Wait task to instruct the PowerCenter Server to wait for the specified indicator file to appear before continuing with the rest of the workflow. When the PowerCenter Server locates the indicator file, it starts the next task in the workflow.
♦ User-defined event. A user-defined event is a sequence of tasks in the workflow. Use an Event-Raise task to specify the location of the user-defined event in the workflow. A user-defined event is the sequence of tasks in the branch from the Start task leading to the Event-Raise task. When all the tasks in the branch from the Start task to the Event-Raise task complete, the Event-Raise task triggers the event. The Event-Wait task waits for the Event-Raise task to trigger the event before continuing with the rest of the tasks in its branch.

Example of User-Defined Events
Say you have four sessions you want to run in a workflow. You want Q1_session and Q2_session to run concurrently to save time. You also want to run Q3_session after Q1_session completes. You want to run Q4_session only when Q1_session, Q2_session, and Q3_session complete. Figure 5-7 shows how to accomplish this using the Event-Raise and Event-Wait tasks:
Figure 5-7. Example of User-Defined Event (user-defined event: Q1Q3_Complete)


Perform the following steps to configure the workflow shown in Figure 5-7:
1. Link Q1_session and Q2_session concurrently.
2. Add Q3_session after Q1_session.
3. Declare an event called Q1Q3_Complete in the Events tab of the workflow properties.
4. In the workspace, add an Event-Raise task after Q3_session.
5. Specify the Q1Q3_Complete event in the Event-Raise task properties. This allows the Event-Raise task to trigger the event when Q1_session and Q3_session complete.
6. Add an Event-Wait task after Q2_session.
7. Specify the Q1Q3_Complete event for the Event-Wait task.
8. Add Q4_session after the Event-Wait task. When the PowerCenter Server processes the Event-Wait task, it waits until the Event-Raise task triggers Q1Q3_Complete before it runs Q4_session.

The PowerCenter Server runs the workflow shown in Figure 5-7 in the following order:
1. The PowerCenter Server runs Q1_session and Q2_session concurrently.
2. When Q1_session completes, the PowerCenter Server runs Q3_session.
3. The PowerCenter Server finishes executing Q2_session.
4. The Event-Wait task waits for the Event-Raise task to trigger the event.
5. The PowerCenter Server completes Q3_session.
6. The Event-Raise task triggers the event, Q1Q3_Complete.
7. The PowerCenter Server runs Q4_session because the event, Q1Q3_Complete, has been triggered.
8. The PowerCenter Server runs the Email task.
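The ordering above can be sketched with Python threads, as an analogy only: each branch of Figure 5-7 becomes a thread, and the user-defined event Q1Q3_Complete becomes a `threading.Event`. The sleep durations, the `log` list, and all function names are hypothetical. The PowerCenter Server manages real task execution; the sketch only shows why Q4_session cannot start before the Event-Raise task fires.

```python
import threading
import time

# Hypothetical model of Figure 5-7: each branch is a thread, and the
# user-defined event Q1Q3_Complete is a threading.Event.
log = []
log_lock = threading.Lock()
q1q3_complete = threading.Event()


def record(message: str) -> None:
    with log_lock:
        log.append(message)


def run_session(name: str, seconds: float) -> None:
    time.sleep(seconds)  # stand-in for the session's work
    record(f"{name} completed")


def branch_q1_q3() -> None:
    run_session("Q1_session", 0.05)
    run_session("Q3_session", 0.05)  # Q3_session follows Q1_session
    record("Event-Raise: Q1Q3_Complete")
    q1q3_complete.set()  # the Event-Raise task triggers the event


def branch_q2_q4() -> None:
    run_session("Q2_session", 0.01)
    record("Event-Wait: waiting for Q1Q3_Complete")
    q1q3_complete.wait()  # the Event-Wait task blocks until the event is raised
    run_session("Q4_session", 0.01)


branches = [threading.Thread(target=branch_q1_q3), threading.Thread(target=branch_q2_q4)]
for branch in branches:  # Q1_session and Q2_session start concurrently
    branch.start()
for branch in branches:
    branch.join()
print(log)
```

The relative order of Q2_session and Q3_session depends on timing. Q4_session, however, always appears after the Event-Raise entry.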

Working with Event-Raise Tasks
The Event-Raise task represents the location of a user-defined event. A user-defined event is the sequence of tasks in the branch from the Start task to the Event-Raise task. When the PowerCenter Server runs the Event-Raise task, the Event-Raise task triggers the user-defined event. To use an Event-Raise task, you must first declare the user-defined event. Then, create an Event-Raise task in the workflow to represent the location of the user-defined event you just declared. In the Event-Raise task properties, specify the name of a user-defined event.


Declaring a User-Defined Event
Perform the following steps to declare a name for a user-defined event.
To declare a user-defined event:
1. In the Workflow Designer, select Workflow-Edit to open the workflow properties.
2. Select the Events tab in the Edit Workflow dialog box.
3. Click Add to add an event name. Event names are not case-sensitive.
4. Click OK.

Using the Event-Raise Task For a User-Defined Event
After you declare a user-defined event, use the Event-Raise task to represent the location of the event and to trigger the event. Perform the following steps to use an Event-Raise task.
To use an Event-Raise task:
1. In the Workflow Designer workspace, create an Event-Raise task and place it in the workflow to represent the user-defined event you want to trigger. A user-defined event is the sequence of tasks in the branch from the Start task to the Event-Raise task.


2. Double-click the Event-Raise task to open it.
3. Click the Open button in the Value field on the Properties tab to open the Events Browser for user-defined events.
4. Choose an event in the Events Browser.
5. Click OK twice to return to the workspace.

Working With Event-Wait Tasks
The Event-Wait task waits for a pre-defined event or a user-defined event. A pre-defined event is a file-watch event. When you use the Event-Wait task to wait for a pre-defined event, you specify an indicator file for the PowerCenter Server to watch. The PowerCenter Server waits for the indicator file to appear. Once the indicator file appears, the PowerCenter Server continues executing tasks after the Event-Wait task. Do not use the Event-Raise task to trigger the event when you wait for a pre-defined event.

You can also use the Event-Wait task to wait for a user-defined event. To use the Event-Wait task for a user-defined event, you specify the name of the user-defined event in the Event-Wait task properties. The PowerCenter Server waits for the Event-Raise task to trigger the user-defined event. Once the user-defined event is triggered, the PowerCenter Server continues running tasks after the Event-Wait task.

Waiting for User-Defined Events
You can use the Event-Wait task to wait for a user-defined event. A user-defined event is triggered by the Event-Raise task. To wait for a user-defined event, you must first use an Event-Raise task to trigger the user-defined event.
To wait for a user-defined event:
1. In the workflow, create an Event-Wait task and double-click the Event-Wait task to open the Edit Task dialog box.
2. In the Events tab of the Edit Tasks dialog box, select User-Defined.


3. Click the Event button to open the Events Browser dialog box.
4. Select a user-defined event for the PowerCenter Server to wait for.
5. Click OK twice.

Waiting for Pre-Defined Events
To use a pre-defined event, you need a shell command, script, or batch file to create an indicator file. The file must be created in or sent to a directory local to the PowerCenter Server. The file can be in any format recognized by the PowerCenter Server operating system. You can choose to have the PowerCenter Server delete the indicator file after it detects the file, or you can manually delete the indicator file. The PowerCenter Server marks the status of the Event-Wait task as failed if it cannot delete the indicator file.

When you specify the indicator file in the Event-Wait task, enter the directory in which the file will appear and the name of the indicator file. You must provide the absolute path for the file. The directory must be local to the PowerCenter Server. If you only specify the file name and not the directory, the PowerCenter Server looks for the indicator file in the system directory. For example, on Windows 2000, the system directory is c:\winnt\system32. You can enter the actual name of the file or use server variables to specify the location of the files. For more information on server variables, see “Server Variables” on page 46. The PowerCenter Server writes the time the file appears in the workflow log.
Note: Do not use a source or target file name as the indicator file name.
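A pre-defined event depends on an external process that creates the indicator file. The Python sketch below plays both roles. `create_indicator_file` stands in for the shell command, script, or batch file. `wait_for_indicator` approximates the Event-Wait task: it watches an absolute path, optionally deletes the file (the Delete Filewatch File option), and reports failure if the delete fails. The polling interval, the optional timeout, and the file name `orders_loaded.ind` are assumptions, because this guide does not describe how the Server polls.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def create_indicator_file(path: Path) -> None:
    """Stand-in for the shell command, script, or batch file that signals readiness."""
    path.touch()


def wait_for_indicator(path: Path, delete_after: bool,
                       poll_seconds: float = 0.5, timeout: Optional[float] = None) -> str:
    """Approximate the Event-Wait task for a pre-defined (file-watch) event."""
    if not path.is_absolute():
        raise ValueError("give the absolute path of the indicator file")
    deadline = None if timeout is None else time.monotonic() + timeout
    while not path.exists():
        if deadline is not None and time.monotonic() >= deadline:
            # Assumption for the sketch only; the guide describes no timeout.
            raise TimeoutError(f"{path} did not appear")
        time.sleep(poll_seconds)
    if delete_after:  # the Delete Filewatch File option
        try:
            path.unlink()
        except OSError:
            return "FAILED"  # the task fails if the file cannot be deleted
    return "SUCCEEDED"


with tempfile.TemporaryDirectory() as work_dir:
    indicator = Path(work_dir).resolve() / "orders_loaded.ind"  # hypothetical name
    create_indicator_file(indicator)
    status = wait_for_indicator(indicator, delete_after=True)
    still_there = indicator.exists()

print(status, still_there)  # SUCCEEDED False
```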

Perform the following steps to wait for a pre-defined event in the workflow.
To wait for a pre-defined event:
1. Create an Event-Wait task and double-click the Event-Wait task to open it.


2. In the Events tab of the Edit Task dialog box, select Pre-defined.
3. Enter the path of the indicator file.
4. If you want the PowerCenter Server to delete the indicator file after it detects the file, select the Delete Filewatch File option in the Properties tab.
5. Click OK.

Enabling Past Events
By default, the Event-Wait task waits for the Event-Raise task to trigger the event and does not check whether the event has already occurred. You can select the Enable Past Events option so that the PowerCenter Server checks whether the event has already occurred.

When you select Enable Past Events, the PowerCenter Server continues executing the next tasks if the event already occurred. Select the Enable Past Events option in the Properties tab of the Event-Wait task.
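The difference between the default behavior and Enable Past Events can be modeled in a few lines of Python. The class `EventWaitTask` and its methods are hypothetical. They assume that "already occurred" means the Event-Raise task fired before the Event-Wait task began waiting.

```python
import threading


class EventWaitTask:
    """Hypothetical model of the Event-Wait task's Enable Past Events option."""

    def __init__(self, event_already_raised: bool, enable_past_events: bool) -> None:
        self._released = threading.Event()
        # With Enable Past Events, a trigger that happened before the task
        # started counts. Without it, that earlier trigger is ignored.
        if event_already_raised and enable_past_events:
            self._released.set()

    def event_raised(self) -> None:
        """Called when the Event-Raise task triggers the event after the wait starts."""
        self._released.set()

    def wait(self, timeout=None) -> bool:
        """Return True once the task may continue to the next task."""
        return self._released.wait(timeout)


print(EventWaitTask(True, enable_past_events=True).wait(timeout=0))   # True
print(EventWaitTask(True, enable_past_events=False).wait(timeout=0))  # False
```

In the second case the task stays blocked until a new trigger arrives through `event_raised()`.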


Working with the Timer Task
The Timer task allows you to specify the period of time to wait before the PowerCenter Server runs the next task in the workflow. You can choose to start the next task in the workflow at an exact time and date. You can also choose to wait a period of time after the start time of another task, workflow, or worklet before starting the next task. The Timer task has two types of settings:

♦ Absolute time. You specify the exact time that the PowerCenter Server starts running the next task in the workflow. You may specify the exact date and time, or you can choose a user-defined workflow variable to specify the exact time.
♦ Relative time. You instruct the PowerCenter Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.

For example, you may have two sessions in the workflow. You want the PowerCenter Server to wait ten minutes after the first session completes before it runs the second session. Use a Timer task after the first session. In the Relative Time setting of the Timer task, specify ten minutes from the start time of the Timer task. Figure 5-8 shows the example workflow using the Timer task:
Figure 5-8. Example Workflow Using the Timer Task

You can use a Timer task anywhere in the workflow after the Start task.
To create a Timer task:
1. In the Workflow Designer, click the Timer task icon on the Tasks toolbar.
   -or-
   Choose Tasks-Create. Select Timer Task for the task type.
2. Double-click the Timer task to open it.
3. On the General tab, enter a name for the Timer task.


4. Click the Timer tab to specify when the PowerCenter Server starts the next task in the workflow.
   Specify attributes for Absolute Time or Relative Time described in Table 5-2:

Table 5-2. Timer Task Attributes

Timer Attribute | Description
Absolute Time: Specify the exact time to start | The PowerCenter Server starts the next task in the workflow at the exact date and time you specify.
Absolute Time: Use this workflow date-time variable to calculate the wait | Specify a user-defined date-time workflow variable. The PowerCenter Server starts the next task in the workflow at the time you choose. The Workflow Manager verifies that the variable you specify has the Date/Time datatype. The Timer task fails if the date-time workflow variable evaluates to NULL.
Relative time: Start after | Specify the period of time the PowerCenter Server waits to start executing the next task in the workflow.
Relative time: from the start time of this task | Choose this option to wait a specified period of time after the start time of the Timer task to run the next task.
Relative time: from the start time of the parent workflow/worklet | Choose this option to wait a specified period of time after the start time of the parent workflow/worklet to run the next task.
Relative time: from the start time of the top-level workflow | Choose this option to wait a specified period of time after the start time of the top-level workflow to run the next task.
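The arithmetic behind the two Timer settings is simple date-time addition. The Python sketch below assumes the three reference start times are already known. `absolute_start`, `relative_start`, the reference-point labels, and the sample start time are hypothetical. The sketch mirrors Table 5-2, including the failure when a date-time variable evaluates to NULL.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical model of the Timer task settings in Table 5-2.
RELATIVE_TO = ("this task", "parent workflow/worklet", "top-level workflow")


def absolute_start(value: Optional[datetime]) -> datetime:
    """Absolute time: the exact date/time, or a Date/Time workflow variable's value."""
    if value is None:
        raise RuntimeError("Timer task fails: the date-time variable evaluated to NULL")
    return value


def relative_start(wait: timedelta, relative_to: str, start_times: dict) -> datetime:
    """Relative time: the chosen start time plus the Start after period."""
    if relative_to not in RELATIVE_TO:
        raise ValueError(f"unknown reference point: {relative_to}")
    return start_times[relative_to] + wait


# The ten-minute example: the Timer task directly follows the first session,
# so its own start time is when that session completed.
starts = {"this task": datetime(2004, 8, 2, 9, 0)}  # hypothetical start time
print(relative_start(timedelta(minutes=10), "this task", starts))  # 2004-08-02 09:10:00
```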


Chapter 6

Working with Worklets
This chapter covers the following topics:
♦ Overview
♦ Developing a Worklet
♦ Using Worklet Variables
♦ Validating Worklets


Overview
A worklet is an object that represents a set of tasks. It can contain any task available in the Workflow Manager. You can run worklets inside a workflow. The workflow that contains the worklet is called the parent workflow. You can also nest a worklet in another worklet. Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to create and edit worklets. When the PowerCenter Server runs a worklet, it expands the worklet. The PowerCenter Server then runs the worklet as it would any other workflow, executing tasks and evaluating links in the worklet. The worklet does not contain any scheduling or server information. To run a worklet, include the worklet in a workflow. The worklet runs on the PowerCenter Server you choose for the workflow. The Workflow Manager does not provide a parameter file or log file for worklets. The PowerCenter Server writes information about worklet execution in the workflow log.

Suspending Worklets
When you choose Suspend On Error for the parent workflow, the PowerCenter Server also suspends the worklet if a task in the worklet fails. When a task in the worklet fails, the PowerCenter Server stops executing the failed task and other tasks in its path. If no other task is running in the worklet, the worklet status is “Suspended.” If one or more tasks are still running in the worklet, the worklet status is “Suspending.” The PowerCenter Server suspends the parent workflow when the status of the worklet is “Suspended” or “Suspending.” For details on suspending workflows, see “Suspending the Workflow” on page 127.
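The worklet status rules above reduce to a small decision function. This Python sketch assumes Suspend On Error is enabled on the parent workflow. The function names and the "Running" label for a worklet with no failed task are hypothetical.

```python
# Hypothetical model of worklet suspension, assuming Suspend On Error is
# enabled on the parent workflow. "Running" is an illustrative label.
def worklet_status(task_failed: bool, tasks_still_running: int) -> str:
    if not task_failed:
        return "Running"
    # The failed task and the tasks in its path stop; other tasks may still run.
    return "Suspending" if tasks_still_running > 0 else "Suspended"


def parent_workflow_suspends(status: str) -> bool:
    """The parent workflow suspends for either suspension state of the worklet."""
    return status in ("Suspended", "Suspending")


print(worklet_status(task_failed=True, tasks_still_running=2))  # Suspending
print(worklet_status(task_failed=True, tasks_still_running=0))  # Suspended
```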


Developing a Worklet
To develop a worklet, you must first create a worklet. After you create a worklet, configure worklet properties and add tasks to the worklet. You can create reusable worklets in the Worklet Designer. You can also create non-reusable worklets in the Workflow Designer as you develop the workflow.

Creating a Reusable Worklet
Create reusable worklets in the Worklet Designer. You can view a list of reusable worklets in the Navigator Worklets node.
To create a reusable worklet:
1. In the Worklet Designer, choose Worklets-Create.
   The Create Worklet dialog box appears.
2. Enter a name for the worklet.
3. Click OK.
   The Worklet Designer creates a Start task in the worklet.

Creating a Non-Reusable Worklet
You can create non-reusable worklets in the Workflow Designer as you develop the workflow. Non-reusable worklets only exist in the workflow. You cannot use a non-reusable worklet in another workflow. After you create the worklet in the Workflow Designer, open the worklet to edit it in the Worklet Designer.


You can promote non-reusable worklets to reusable worklets by selecting the Reusable option in the worklet properties. To rename non-reusable worklets, open the worklet properties in the Workflow Designer.
To create a non-reusable worklet:
1. In the Workflow Designer, open a workflow.
2. Choose Tasks-Create.
3. Select Worklet for the Task type.
4. Enter a name for the worklet.
5. Click Create.
   The Workflow Designer creates the worklet and adds it to the workspace.
6. Click Done.

Configuring Worklet Properties
When you use a worklet in a workflow, you can configure the same set of general task settings on the General tab as any other task. For example, you can make a worklet reusable, disable a worklet, configure the input link to the worklet, or fail the parent workflow based on the worklet. For details on these task settings, see “Configuring Tasks” on page 135. In addition to general task settings, you can configure the following worklet properties:

♦ Worklet variables. Use worklet variables to reference values and record information. You use worklet variables the same way you use workflow variables. You can assign a workflow variable to a worklet variable to override its initial value. For details on worklet variables, see “Using Worklet Variables” on page 169.
♦ Events. To use the Event-Wait and Event-Raise tasks in the worklet, you must first declare an event in the worklet properties.
♦ Metadata extension. Extend the metadata stored in the repository by associating information with repository objects. For details, see “Working with Metadata Extensions” on page 82.

Adding Tasks in Worklets
After you create a new worklet, add tasks by opening the worklet in the Worklet Designer. A worklet must contain a Start task. The Start task represents the beginning of a worklet. When you create a worklet, the Worklet Designer automatically creates a Start task for you.
To add tasks to a non-reusable worklet:
1. Create a non-reusable worklet in the Workflow Designer workspace.
2. Right-click the worklet and choose Open Worklet.


3. The Worklet Designer opens so you can add tasks in the worklet.
4. Add tasks in the worklet by using the Tasks toolbar or choose Tasks-Create in the Worklet Designer.
5. Connect tasks with links.

Declaring Events in Worklets
Similar to workflows, you can use Event-Wait and Event-Raise tasks in a worklet. To use the Event-Raise task, you first declare a user-defined event in the worklet. Events in one instance of a worklet do not affect events in other instances of the worklet. You cannot specify worklet events in the Event tasks in the parent workflow. For more information about using event tasks, see “Working with Event Tasks” on page 153.

Viewing Links in a Worklet
When you edit a workflow or worklet, you can view the forward or backward link paths to other tasks. You can highlight paths to see links in the workflow branch from the Start task to the last task in the branch. For details, see “Developing Workflows” on page 91.

Nesting Worklets
You can nest a worklet within another worklet. When you run a workflow containing nested worklets, the PowerCenter Server runs the nested worklet from within the parent worklet. You can group several worklets together by function or simplify the design of a complex workflow when you nest worklets. You might choose to nest worklets to load data to fact and dimension tables. Create a nested worklet to load fact and dimension data into a staging area. Then, create a nested worklet to load the fact and dimension data from the staging area to the data warehouse. You might choose to nest worklets to simplify the design of a complex workflow. Nest worklets that can be grouped together within one worklet. In the workflow in Figure 6-1, two worklets relate to regional sales and two worklets relate to quarterly sales. Figure 6-1 shows a workflow that uses multiple worklets:
Figure 6-1. Workflow with Multiple Worklets


The workflow in Figure 6-2 shows the same workflow with the worklets grouped and nested in parent worklets. Figure 6-2 shows a workflow that uses nested worklets:
Figure 6-2. Workflow with Nested Worklets

Creating Nested Worklets
From the Worklet Designer, open the parent worklet. To nest an existing reusable worklet, choose Tasks-Insert Worklet. To create a non-reusable nested worklet, choose Tasks-Create, and select worklet.


Using Worklet Variables
Worklet variables are similar to workflow variables. A worklet has the same set of pre-defined variables as any task. You can also create user-defined worklet variables. Like user-defined workflow variables, user-defined worklet variables can be persistent or non-persistent. For details on workflow variables, see “Using Workflow Variables” on page 103. You cannot use variables from the parent workflow in the worklet. Similarly, you cannot use user-defined worklet variables in the parent workflow. However, you can use pre-defined worklet variables in the parent workflow, just as you can use pre-defined variables for other tasks in the workflow.

Persistent Worklet Variables
User-defined worklet variables can be persistent or non-persistent. To create a persistent worklet variable, select Persistent when you create the variable. When you create a persistent worklet variable, the worklet variable retains its value the next time the PowerCenter Server executes the worklet instance in the parent workflow. For example, you might have a worklet with a persistent variable. Use two instances of the worklet in a workflow to run the worklet twice. You name the first instance of the worklet Worklet1 and the second instance Worklet2. Figure 6-3 shows the example workflow:
Figure 6-3. Example of Persistent Worklet Variable

When you run the example workflow shown in Figure 6-3, the persistent worklet variable retains its value from Worklet1 and becomes the initial value in Worklet2. After the PowerCenter Server executes Worklet2, it retains the value of the persistent variable in the repository and uses the value the next time you run the workflow. Worklet variables only persist when you run the same workflow. A worklet variable does not retain its value when you use instances of the worklet in different workflows.
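The persistence behavior described above can be sketched in Python with a dictionary standing in for the repository. The key includes the workflow name, so a value carries from Worklet1 to Worklet2 and into the next run of the same workflow, but not into a different workflow. The worklet name `wklt_Example`, the variable `$$RunCount`, the workflow names, and the increment-by-one logic are hypothetical.

```python
# Hypothetical model of a persistent user-defined worklet variable.
# "repository" stands in for the PowerCenter repository; keys include the
# workflow name, so a value persists only within the same workflow.
repository = {}


def run_workflow(workflow: str, worklet_instances: list,
                 worklet: str = "wklt_Example", variable: str = "$$RunCount",
                 initial_value: int = 0) -> int:
    key = (workflow, worklet, variable)
    value = repository.get(key, initial_value)
    for _instance in worklet_instances:
        # Each instance starts from the value the previous instance left;
        # the increment is a hypothetical stand-in for the worklet's logic.
        value += 1
    repository[key] = value  # retained in the repository for the next run
    return value


print(run_workflow("wf_Sales", ["Worklet1", "Worklet2"]))  # 2
print(run_workflow("wf_Sales", ["Worklet1", "Worklet2"]))  # 4
print(run_workflow("wf_Other", ["Worklet1"]))              # 1
```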

Overriding Initial Value
For each worklet instance, you can override the initial value of the worklet variable by assigning a workflow variable to it.
To override the initial value of a worklet variable:
1. Double-click the worklet instance in the Workflow Designer workspace.


2. On the Parameters tab, click the Add button.
3. Click the Open button in the User-Defined Worklet Variables field to select a worklet variable.
4. Click the Open button in the Parent Workflow Variable field to select a workflow variable to assign to the worklet variable.
5. Click Apply.
   The worklet variable in this worklet instance now has the selected workflow variable as its initial value.


Validating Worklets
The Workflow Manager validates worklets when you save the worklet in the Worklet Designer. In addition, when you use worklets in a workflow, the PowerCenter Server validates the workflow according to the following validation rules at runtime:
♦ You cannot run two instances of the same worklet concurrently in the same workflow.
♦ You cannot run two instances of the same worklet concurrently across two different workflows.
♦ Each worklet instance in the workflow can run only once.
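The PowerCenter Server enforces these runtime rules itself. As an illustration only, the following Python sketch checks a made-up schedule of worklet runs against the same three rules. The `WorkletRun` record, its integer time units, and all workflow and worklet names are hypothetical.

```python
from typing import NamedTuple


class WorkletRun(NamedTuple):
    workflow: str
    instance: str
    worklet: str  # the worklet object the instance is based on
    start: int    # hypothetical time units
    end: int


def rule_violations(runs: list) -> list:
    problems = []
    seen = set()
    # Each worklet instance in a workflow can run only once.
    for run in runs:
        key = (run.workflow, run.instance)
        if key in seen:
            problems.append(f"{run.instance} runs more than once in {run.workflow}")
        seen.add(key)
    # No two runs of the same worklet may overlap, in one workflow or across two.
    for i, a in enumerate(runs):
        for b in runs[i + 1:]:
            overlap = a.start < b.end and b.start < a.end
            if a.worklet == b.worklet and overlap:
                problems.append(f"{a.instance} and {b.instance} run {a.worklet} concurrently")
    return problems


sequential = [WorkletRun("wf_A", "Worklet1", "wklt_Load", 0, 5),
              WorkletRun("wf_A", "Worklet2", "wklt_Load", 5, 9)]
print(rule_violations(sequential))  # []
```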

When a worklet instance is invalid, the workflow using the worklet instance remains valid. For details on workflow validation rules, see “Validating a Workflow” on page 119. The Workflow Manager displays a red invalid icon if the worklet object is invalid. The Workflow Manager validates the worklet object using the same validation rules for workflows. The Workflow Manager displays a blue invalid icon if the worklet instance in the workflow is invalid. The worklet instance may be invalid when any of the following conditions occurs:
♦ The parent workflow or worklet variable you assign to the user-defined worklet variable does not have a matching datatype.
♦ The user-defined worklet variable you used in the worklet properties does not exist.
♦ You do not specify the parent workflow or worklet variable you want to assign.
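The three invalid-instance conditions can be expressed as a simple check over a worklet instance's variable assignments. In this Python sketch, `instance_problems`, the variable names, and the datatype labels are hypothetical. Treating an undeclared parent variable as a datatype mismatch is also an assumption, because the list above does not cover that case.

```python
def instance_problems(assignments: dict, worklet_vars: dict, parent_vars: dict) -> list:
    """Hypothetical check of a worklet instance's variable assignments.

    assignments:  user-defined worklet variable -> parent variable (or None)
    worklet_vars: worklet variable -> datatype
    parent_vars:  parent workflow/worklet variable -> datatype
    """
    problems = []
    for worklet_var, parent_var in assignments.items():
        if worklet_var not in worklet_vars:
            problems.append(f"{worklet_var}: user-defined worklet variable does not exist")
        elif parent_var is None:
            problems.append(f"{worklet_var}: no parent variable assigned")
        elif parent_vars.get(parent_var) != worklet_vars[worklet_var]:
            # An undeclared parent variable also lands here (an assumption).
            problems.append(f"{worklet_var}: datatype does not match {parent_var}")
    return problems


worklet_vars = {"$$LoadDate": "Date/Time", "$$RowCount": "Integer"}
parent_vars = {"$$RunDate": "Date/Time", "$$Region": "Nstring"}
print(instance_problems({"$$LoadDate": "$$RunDate"}, worklet_vars, parent_vars))  # []
```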

For non-reusable worklets, you may see both red and blue invalid icons displayed over the worklet icon in the Navigator.


Chapter 7

Working with Sessions
This chapter covers the following topics:
♦ Overview
♦ Creating a Session Task
♦ Editing a Session
♦ Creating a Session Configuration Object
♦ Using Pre- and Post-Session SQL Commands
♦ Using Pre- or Post-Session Shell Commands
♦ Using Post-Session Email
♦ Validating a Session
♦ Running the Session
♦ Stopping and Aborting a Session
♦ Mapping Parameters and Variables in Sessions
♦ Handling High Precision Data


Overview
A session is a set of instructions that tells the PowerCenter Server how and when to move data from sources to targets. A session is a type of task, similar to other tasks available in the Workflow Manager. In the Workflow Manager, you configure a session by creating a Session task. To run a session, you must first create a workflow to contain the Session task. When you create a Session task, you enter general information such as the session name, session schedule, and the PowerCenter Server to run the session. You can also select options to execute pre-session shell commands, send On-Success or On-Failure email, and use FTP to transfer source and target files. Using session properties, you can also override parameters established in the mapping, such as source and target location, source and target type, error tracing levels, and transformation attributes. When you assign a server in a server grid to a session, the server you specify at the session level overrides the server you specify at the workflow level. You can run as many sessions in a workflow as you need. You can run the Session tasks sequentially or concurrently, depending on your needs. The PowerCenter Server creates several files and in-memory caches depending on the transformations and options used in the session. For more details on session output files and caches, see “Output Files and Caches” on page 28.


Creating a Session Task
You create a Session task for each mapping you want the PowerCenter Server to run. The PowerCenter Server uses the instructions configured in the session to move data from sources to targets. You can create a reusable Session task in the Task Developer. You can also create non-reusable Session tasks in the Workflow Designer as you develop the workflow. After you create the session, you can edit the session properties at any time.
Note: Before you create a Session task, you must configure the Workflow Manager to communicate with databases and the PowerCenter Server. You must assign appropriate permissions for any database, FTP, or external loader connections you configure. For details on configuring the Workflow Manager, see “Configuring the Workflow Manager” on page 37.

Session Privileges
To create sessions, you must have one of the following sets of privileges and permissions:
♦ Use Workflow Manager privilege with read, write, and execute permissions
♦ Super User privilege

You must have read permission for connection objects associated with the session in addition to the above privileges and permissions. PowerCenter allows you to set a read-only privilege for sessions. The Workflow Operator privilege allows a user to view, start, stop, and monitor sessions without being able to edit session properties.
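The privilege rules for creating a session combine two conditions. This Python sketch encodes them as the text states them, including the read-permission requirement that applies "in addition to" either privilege set. The function name and string labels are hypothetical. The assumption that the read, write, and execute permissions apply to the folder comes from the edit-session rules later in this chapter.

```python
def can_create_session(privileges: set, folder_permissions: set,
                       connection_permissions: list) -> bool:
    """Hypothetical check of the session-creation rules described above."""
    has_required_privilege = "Super User" in privileges or (
        "Use Workflow Manager" in privileges
        and {"read", "write", "execute"} <= folder_permissions
    )
    # Read permission on every connection object associated with the session.
    can_read_connections = all("read" in perms for perms in connection_permissions)
    return has_required_privilege and can_read_connections


print(can_create_session({"Use Workflow Manager"}, {"read", "write", "execute"}, [{"read"}]))  # True
print(can_create_session({"Use Workflow Manager"}, {"read", "write"}, [{"read"}]))             # False
```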

Steps to Create a Session Task
Create the Session task in the Task Developer or the Workflow Designer. Session tasks created in the Task Developer are reusable. For more information about reusable tasks and other general information about workflow tasks, see “Reusable Workflow Tasks” on page 135.
To create a Session task:
1. In the Workflow Designer, click the Session Task icon on the Tasks toolbar.
   -or-
   Choose Tasks-Create. Select Session Task for the task type.
2. Enter a name for the Session task.
3. Click Create.
   The Mappings dialog box appears.


4. Select the mapping you want to use in the Session task and click OK.
5. Click Done.
   The Session task appears in the workspace.


Editing a Session
After you create a session, you can edit it. For example, you might need to adjust the buffer and cache sizes, modify the update strategy, or clear a variable value saved in the repository. Double-click the Session task to open the session properties. The session has the following tabs, and each of those tabs has multiple settings:
♦ General tab. Enter the session name, mapping name, and description for the Session task, specify a PowerCenter Server override, and configure additional task options.
♦ Properties tab. Enter session log information, test load settings, and performance configuration.
♦ Config Object tab. Enter advanced settings, log options, and error handling configuration.
♦ Mapping tab. Enter source and target information, override transformation properties, and configure the session for partitioning.
♦ Components tab. Configure pre- or post-session shell commands and emails.
♦ Metadata Extension tab. Configure metadata extension options.

For a detailed description of the session properties tabs and associated options, see “Session Properties Reference” on page 667. Figure 7-1 shows the session properties:
Figure 7-1. Session Properties


You can edit session properties at any time. The repository updates the session properties immediately. If the session is running when you edit the session, the repository updates the session when the session completes. If the mapping changes, the Workflow Manager might issue a warning that the session is invalid. The Workflow Manager then allows you to continue editing the session properties. After you edit the session properties, the PowerCenter Server validates the session and reschedules the session as necessary. For details on session validation, see “Validating a Session” on page 195.

Edit Session Privilege
To edit a session, you must have one of the following sets of privileges and permissions:
♦ Use Workflow Manager privilege with read and write permissions on the folder
♦ Super User privilege

Applying Attributes to All Instances
When you edit the session properties, you can apply source, target, and transformation settings to all instances of the same type in the session. You can also apply settings to all partitions in a pipeline. You can apply reader or writer settings, connection settings, and properties settings. For example, you might need to change a relational connection from a test to a production database for all the target instances in a session. You can change the connection value for one target in a session and apply the connection to the other relational target objects.


Figure 7-2 shows the writers, connections, and properties settings for a target instance in a session:
Figure 7-2. Session Target Object Settings

For a target instance, you can change writers, connections, and properties settings.

Table 7-1 shows the options you can use to apply attributes to objects in a session. You can apply different options depending on whether the setting is a reader or writer, connection, or an object property.
Table 7-1. Apply All Options

Setting | Option | Description
Reader, Writer | Apply Type to All Instances | Applies a reader or writer type to all instances of the same object type in the session. For example, you can apply a relational reader type to all the other readers in your session.
Reader, Writer | Apply Type to All Partitions | Applies a reader or writer type to all the partitions in a pipeline. For example, if you have four partitions, you can change the writer type in one partition for a target instance. Then you can use this option to apply the change to the other three partitions.
Connections | Apply Connection Type | Applies the same type of connection to all instances. Connection types are relational, FTP, queue, application, or external loader.
Connections | Apply Connection Value | Applies a connection value to all instances or partitions. The connection value defines a specific connection that you can view in the connection browser. You can only apply a connection value that is valid for the existing connection type.
Connections | Apply Connection Attributes | Applies only the connection attribute values to all instances or partitions. Each type of connection has different attributes. You can apply connection attributes separately from connection values. To view sample connection attributes, see Figure 7-3 on page 181.
Connections | Apply Connection Data | Applies the connection value and its connection attributes to all the other instances that have the same connection type. This option combines the connection option and the connection attribute option.
Connections | Apply All Connection Information | Applies the connection value and its attributes to all the other instances even if they do not have the same connection type. This option is similar to Apply Connection Data, but it allows you to change the connection type.
Properties | Apply Attribute to all Instances | Applies an attribute value to all instances of the same object type in the session. For example, if you have a relational target you can choose to truncate a table before you load data. You can apply the attribute value to all the relational targets in your session.
Properties | Apply Attribute to all Partitions | Applies an attribute value to all partitions in a pipeline. For example, you can change the reject file name in one partition for a target instance, then apply the file name change to the other partitions.

Applying Connection Settings
When you apply connection settings, you can apply the connection type, connection value, and connection attributes. You can only apply a connection value that is valid for the connection type unless you choose the Apply All Connection Information option. For example, if a target instance uses an FTP connection, you can only apply an FTP connection value to it. The Apply All Connection Information option enables you to apply a new connection type, connection value, and connection attributes.
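The rules for the connection options can be modeled in a few lines. The following is a minimal sketch, not a PowerCenter API: the `Connection` class, the `apply_connection` function, and the connection names are hypothetical. It also assumes that the value, attribute, and data options leave instances of a different connection type unchanged, since only Apply All Connection Information may change the type.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

# Option names as listed in Table 7-1 (Connections settings).
OPTIONS = {
    "Apply Connection Value",
    "Apply Connection Attributes",
    "Apply Connection Data",
    "Apply All Connection Information",
}


@dataclass
class Connection:
    conn_type: str  # relational, FTP, queue, application, or external loader
    value: str      # a specific connection, e.g. a hypothetical "ORA_PROD"
    attributes: Dict[str, str] = field(default_factory=dict)


def apply_connection(src: Connection, others: List[Connection], option: str) -> List[Connection]:
    """Return new settings for `others` after applying `src` with `option`."""
    if option not in OPTIONS:
        raise ValueError(f"unknown option: {option!r}")
    result: List[Connection] = []
    for other in others:
        if option == "Apply All Connection Information":
            # Copies type, value, and attributes, even across connection types.
            result.append(replace(src, attributes=dict(src.attributes)))
        elif other.conn_type != src.conn_type:
            # Assumption: a value or attribute is only valid within one type,
            # so instances of another type keep their current settings.
            result.append(other)
        elif option == "Apply Connection Value":
            result.append(replace(other, value=src.value))
        elif option == "Apply Connection Attributes":
            result.append(replace(other, attributes=dict(src.attributes)))
        else:  # "Apply Connection Data": value plus attributes, same type only
            result.append(replace(other, value=src.value, attributes=dict(src.attributes)))
    return result
```

For the test-to-production example earlier in this section, applying a relational `ORA_PROD` value with Apply Connection Value would update the other relational targets and leave an FTP target untouched.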


Figure 7-3 illustrates the connection options by showing where they display on a connection browser:
Figure 7-3. Connection Options

The connection type can be relational, FTP, queue, application, or external loader.

The connection value defines a specific connection.

Connection attributes are different for each connection type.

Applying Attributes to Partitions or Instances
When you apply attributes to all instances or partitions in a session, you must open the session and edit one of the session objects. You apply attributes or properties to other instances by choosing an attribute in that object and selecting to apply its value to the other instances or partitions.
To apply attributes to all instances or partitions:
1. Open a session in the workspace.
2. Click the Mappings tab.
3. Choose a source, target, or transformation instance from the Navigator.
Settings for properties, connections, and readers or writers might display, depending on the object you choose.
4. Right-click a reader, writer, property, or connection value. A list of options displays.
5. Select an option from the list and choose to apply it to all instances or all partitions.
6. Click OK to apply the attribute or property.


Creating a Session Configuration Object
The Config Object tab in the session properties includes commit and load settings, log options, and error handling settings. The Workflow Manager allows you to create a reusable set of attributes for the Config Object tab. When you configure attributes in the Config Object tab, you can specify a session configuration object you already created. Or, you can specify the default session configuration object called default_session_config. You can override the attributes of the session configuration object in the Config Object tab. Figure 7-4 shows the Config Object tab of the session properties:
Figure 7-4. Config Object Tab

Select a session configuration object.

Click the Browse button in the Config Name field to choose a session configuration. Select a user-defined or default session configuration object from the browser.
To create a session configuration object:
1. In the Workflow Manager, click Tasks-Session Configuration.
The Session Configuration Browser appears. Figure 7-5 shows the Session Configuration Browser:
Figure 7-5. Session Configuration Browser
2. Click New to create a new session configuration object.
3. Enter a name for the session configuration object.
4. In the Properties tab, configure advanced settings, log options, and error handling options.
5. Click OK.

For session configuration object settings descriptions, see “Config Object Tab” on page 675.


Using Pre- and Post-Session SQL Commands
You can specify pre- and post-session SQL in the Source Qualifier transformation and the target instance when you create a mapping. When you create a Session task in the Workflow Manager, you can override the SQL commands on the Mapping tab. You might want to use these commands to drop indexes on the target before the session runs, and then recreate them when the session completes. The PowerCenter Server executes pre-session SQL commands before it reads the source. It executes post-session SQL commands after it writes to the target.

Guidelines for Entering Pre- and Post-Session SQL Commands
Remember the following guidelines when creating the SQL statements:
♦ You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.
♦ You can use mapping parameters and variables in SQL executed against the source, but not the target.
♦ Use a semicolon (;) to separate multiple statements.
♦ The PowerCenter Server ignores semicolons within single quotes, double quotes, or within /* ... */.
♦ If you need to use a semicolon outside of quotes or comments, you can escape it with a backslash (\).
♦ The Workflow Manager does not validate the SQL.
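The semicolon rules above determine how one SQL entry is split into separate statements. The following is a minimal sketch of those rules, not the PowerCenter Server's own parser; the function name `split_session_sql` is hypothetical. A `/*` inside an open comment does not start a nested comment, which mirrors the guideline that nested comments are not allowed.

```python
from typing import List, Optional


def split_session_sql(text: str) -> List[str]:
    """Split a pre- or post-session SQL entry into statements.

    Hypothetical model of the documented rules:
    - ';' separates statements,
    - ';' inside '...', "...", or /* ... */ is kept as text,
    - '\\;' outside quotes and comments is a literal semicolon.
    """
    statements: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # active quote character, if any
    in_comment = False           # inside /* ... */ (no nesting)
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_comment:
            current.append(ch)
            if ch == "*" and i + 1 < n and text[i + 1] == "/":
                current.append("/")
                i += 1
                in_comment = False
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "/" and i + 1 < n and text[i + 1] == "*":
            in_comment = True
            current.append("/*")
            i += 1
        elif ch == "\\" and i + 1 < n and text[i + 1] == ";":
            current.append(";")  # escaped semicolon stays in the statement
            i += 1
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

For example, the entry `DROP INDEX idx_a; INSERT INTO log VALUES ('a;b') /* x;y */` splits into two statements, because only the first semicolon is outside quotes and comments.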

Error Handling
You can configure error handling on the Config Object tab. You can choose to stop or continue the session if the PowerCenter Server encounters an error issuing the pre- or post-session SQL command.


Figure 7-6 shows how to configure error handling for pre- or post-session SQL commands:
Figure 7-6. Stop or Continue the Session on Pre- or Post-Session SQL Errors

Stop or continue the session on pre- or post-session SQL error.


Using Pre- or Post-Session Shell Commands
The PowerCenter Server can perform shell commands at the beginning of the session or at the end of the session. Shell commands are operating system commands. You can use pre- or post-session shell commands, for example, to delete a reject file or session log, or to archive target files before the session begins. The Workflow Manager provides the following types of shell commands for each Session task:

♦ Pre-session command. The PowerCenter Server performs pre-session shell commands at the beginning of a session. You can configure a session to stop or continue if a pre-session shell command fails.
♦ Post-session success command. The PowerCenter Server performs post-session success commands only if the session completed successfully.
♦ Post-session failure command. The PowerCenter Server performs post-session failure commands only if the session failed to complete.

Use the following guidelines to call a shell command:
♦ Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch file for Windows servers.
♦ Configure the session to execute the pre- or post-session shell commands.

The Workflow Manager provides a task called the Command task that allows you to specify shell commands anywhere in the workflow. You can choose a reusable Command task for the pre- or post-session shell command. Or, you can create non-reusable shell commands for the pre- or post-session shell commands. For details on the Command task, see "Working with the Command Task" on page 143. If you create a non-reusable pre- or post-session shell command, you can make it into a reusable Command task. The Workflow Manager allows you to choose from the following options when you configure shell commands:
♦ Create non-reusable shell commands. Create a non-reusable set of shell commands for the session. Other sessions in the folder cannot use this set of shell commands.
♦ Use an existing reusable Command task. Select an existing Command task to run as the pre- or post-session shell command.

Configure pre- and post-session shell commands in the Components tab of the session properties.

Using Server and Session Variables
You can include any server variable, such as $PMTargetFileDir, or any session variable in pre- and post-session shell commands. When you use a server variable instead of entering a specific directory, you can run the same workflow on different PowerCenter Servers without changing session properties. You cannot use server variables or session variables in standalone Command tasks in the workflow. The PowerCenter Server does not expand server variables or session variables used in standalone Command tasks.

Configuring Non-Reusable Shell Commands
When you create non-reusable pre- or post-session shell commands, the commands are only visible in session properties. The Workflow Manager does not create Command tasks from these non-reusable commands. You can make non-reusable shell commands into a reusable Command task. Figure 7-7 shows the Make Reusable option for a pre-session shell command:
Figure 7-7. Make Reusable Option for Pre-Session Shell Commands

Make this shell command reusable.

Perform the following steps to create pre- or post-session shell commands for a specific session.


To create non-reusable shell commands:
1. In the Components tab of the session properties, select Non-reusable for the pre- or post-session shell command.
2. Click the Edit button in the Value field to open the Edit Pre- or Post-Session Command dialog box.
3. Enter a name for the command in the General tab.
4. If you want the PowerCenter Server to perform the next command only if the previous command completed successfully, select Run If Previous Completed in the Properties tab.
5. In the Commands tab, click the Add button to add shell commands. Enter one command for each line.
6. Click OK.

Creating a Reusable Command Task from Pre- or Post-Session Commands
If you create non-reusable pre- or post-session shell commands, you can make them into a reusable Command task. Once you make the pre- or post-session shell commands into a reusable Command task, you cannot revert the change.


To create a Command task from non-reusable pre- or post-session shell commands, click the Edit button to open the Edit dialog box for the shell commands. In the General tab, select the Make Reusable check box. After you select the Make Reusable check box and click OK, a new Command task appears in the Tasks folder in the Navigator window. You can use this Command task in other workflows, just as you do with any other reusable workflow task.

Configuring Reusable Shell Commands
Perform the following steps to call an existing reusable Command task as the pre- or post-session shell command for the Session task.
To select an existing Command task as the pre- or post-session shell command:
1. In the Components tab of the session properties, click Reusable for the pre- or post-session shell command.
2. Click the Edit button in the Value field to open the Task Browser dialog box.
3. Select the Command task you want to run as the pre- or post-session shell command.
4. Click the Override button in the Task Browser dialog box if you want to change the order of the commands, or if you want to specify whether to run the next command when the previous command fails. Changes you make to the Command task from the session properties only apply to the session. In the session properties, you cannot edit the commands in the Command task.
5. Click OK to select the Command task for the pre- or post-session shell command. The name of the Command task you select appears in the Value field for the shell command.


Using Server Variables
You can include any server variable, such as $PMTargetFileDir, in pre- or post-session shell commands. When you use a server variable instead of entering a specific directory, you can run the same workflow on different PowerCenter Servers without changing session properties.

Pre-Session Shell Command Errors
You can configure the session to stop or continue if a pre-session shell command fails. If you select Stop, the PowerCenter Server stops the session, but continues with the rest of the workflow. If you select Continue, the PowerCenter Server ignores the errors and continues the session. By default, the PowerCenter Server stops the session upon shell command errors. Configure the session to stop or continue if a pre-session shell command fails in the Error Handling settings on the Config Object tab. Figure 7-8 shows how to configure the session to stop or continue when a pre-session shell command fails:
Figure 7-8. Stop or Continue the Session on Pre-Session Shell Command Error

Stop or continue the session on pre-session shell command error.


Using Post-Session Email
The PowerCenter Server can send emails after the session completes. You can send an email when the session completes successfully. Or, you can send an email when the session fails. The PowerCenter Server can send the following types of emails for each Session task:
♦ On-Success Email. The PowerCenter Server sends the email when the session completes successfully.
♦ On-Failure Email. The PowerCenter Server sends the email when the session fails.

You can also use an Email task to send email anywhere in the workflow. If you already created a reusable Email task, you can select it as the On-Success or On-Failure email for the session. Or, you can create non-reusable emails that exist only within the Session task. For more information about sending post-session emails, see “Sending Email” on page 319.


Validating a Session
The Workflow Manager validates a Session task when you save it. You can also manually validate Session tasks and session instances. Validate reusable Session tasks in the Task Developer. Validate non-reusable sessions and reusable session instances in the Workflow Designer. The Workflow Manager marks a reusable session or session instance invalid if you perform one of the following tasks:

♦ Edit the mapping in a way that might invalidate the session. You can edit the mapping used by a session at any time. When you edit and save a mapping, the repository might invalidate sessions that already use the mapping. The PowerCenter Server does not execute invalid sessions. You must reconnect to the folder to see the effect of mapping changes on Session tasks. For details on validating mappings, see "Mappings" in the Designer Guide. When you edit a session based on an invalid mapping, the Workflow Manager displays a warning message:
The mapping [mapping_name] associated with the session [session_name] is invalid.
♦ Delete a database, FTP, or external loader connection used by the session.
♦ Leave session attributes blank. For example, the session is invalid if you do not specify the source file name.
♦ Change the code page of a session database connection to an incompatible code page.

If you delete an object associated with a Session task, such as a session configuration object, Email task, or Command task, the Workflow Manager marks a reusable session invalid. However, the Workflow Manager does not mark a non-reusable session invalid if you delete an object associated with the session. If you delete a shortcut to a source or target from the mapping, the Workflow Manager does not mark the session invalid.

The Workflow Manager does not validate SQL overrides or filter conditions entered in the session properties when you validate a session. You must validate SQL overrides and filter conditions in the SQL Editor.

If a reusable Session task is invalid, the Workflow Manager displays an invalid icon over the Session task in the Navigator and in the Task Developer workspace. This does not affect the validity of the session instance or the workflows using the session instance. If a reusable or non-reusable session instance is invalid, the Workflow Manager marks it invalid in the Navigator and in the Workflow Designer workspace. Workflows using the session instance remain valid.

To validate a session, select the session in the workspace and choose Tasks-Validate. Or, right-click the session instance in the workspace and choose Validate.


Validating Multiple Sessions
You can validate multiple sessions without fetching them into the workspace. You must select and validate the sessions from a query results view or a view dependencies list. You can save and optionally check in sessions that change from invalid to valid status. For more information about validating multiple objects, see “Validating Multiple Objects” in the Repository Guide.
Note: If you are using the Repository Manager, you can select and validate multiple sessions from the Navigator.
To validate multiple sessions:
1. Select sessions from either a query list or a view dependencies list.
2. Right-click one of the selected sessions and choose Validate. The Validate Objects dialog box displays.
3. Choose whether to save objects and check in objects that you validate.


Running the Session
By default, the PowerCenter Server you assign to a workflow runs all tasks. If you register multiple servers to a repository, you can override the PowerCenter Server at the session level. In a server grid, the master server distributes the sessions to available worker servers. You can assign a PowerCenter Server to a session. The session always runs on the server you assigned to it. For more information about how a server grid distributes sessions, see “Distributing Sessions” on page 446.

Selecting a Server to Run the Session
You can choose a server to run the session. If you only register one server, the Workflow Manager lists the single registered PowerCenter Server that runs the workflow and session. For PowerCenter repositories with multiple servers, the Workflow Manager lists all servers.
To select a server to run a session:
1. Open a session in a workflow.
2. Double-click the session in the workflow. The Edit Tasks dialog box appears.
3. Click the Select Server button on the General tab. A list of registered servers appears.
4. Select a server to run the session.
5. Click OK twice to select the server for the session.

Instead of choosing a server for each session in the folder, you can assign multiple sessions to a server.

Assigning the PowerCenter Server to a Session
After you register the PowerCenter Server, you can assign it to sessions you want to run on that server. This allows you to assign the PowerCenter Server to multiple sessions without editing each session property individually. To assign the PowerCenter Server to multiple sessions, you must first close all folders in the repository. To assign the PowerCenter Server to sessions, you must have the Super User privilege. Figure 7-9 shows the Assign Server dialog box:
Figure 7-9. Assign Server Dialog Box

Select a server to assign. Select a folder. Show sessions.

Assign a server to a session.

To assign the PowerCenter Server:
1. Close all folders in the repository.
2. Choose Server-Assign Server. Or, right-click the server name in the Navigator and choose Assign Server. The Assign Server dialog box opens.
3. From the Choose Server list, select the server you want to assign.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.
5. Select the Show Sessions check box.
6. Select each session you want to run on the PowerCenter Server.
7. Click Assign.

You can remove an assigned server from a session in the Assign Server dialog box. Perform the following steps to remove an assigned server from a session.
To remove an assigned server:
1. Close all folders in the repository.
2. Choose Server-Assign Server.
3. From the Choose Server list, select None.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.
5. Select the sessions from which you want to remove the assigned server.
6. Click Assign.


Stopping and Aborting a Session
You can stop or abort a session just as you can stop or abort any task. You can also abort a session by using the ABORT() function in the mapping logic. Session errors can cause the PowerCenter Server to stop a session early. You can control the stopping point by setting an error threshold in a session, using the ABORT function in mappings, or requesting the PowerCenter Server to stop the session. You cannot control the stopping point when the PowerCenter Server encounters fatal errors, such as loss of connection to the target database. If a session fails as a result of an error, you can consider performing session recovery. For more information on recovery, see "Recovering a Session Task" on page 311. For more information on row error logging, see "Overview" on page 482.

Threshold Errors
You can choose to stop a session on a designated number of non-fatal errors. A non-fatal error is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session properties with the Stop On option. When you enable this option, the PowerCenter Server counts non-fatal errors that occur in the reader, writer, and transformation threads. The PowerCenter Server maintains an independent error count when reading sources, transforming data, and writing to targets. The PowerCenter Server counts the following non-fatal errors when you set the Stop On option in the session properties:

♦ Reader errors. Errors encountered by the PowerCenter Server while reading the source database or source files. Reader threshold errors can include alignment errors while running a session in Unicode mode.
♦ Writer errors. Errors encountered by the PowerCenter Server while writing to the target database or target files. Writer threshold errors can include key constraint violations, loading nulls into a not null field, and database trigger responses.
♦ Transformation errors. Errors encountered by the PowerCenter Server while transforming data. Transformation threshold errors can include conversion errors and any condition set up as an ERROR, such as null input.

When you create multiple partitions in a pipeline, the PowerCenter Server maintains a separate error threshold for each partition. When the PowerCenter Server reaches the error threshold for any partition, it stops the session. The writer may continue writing data from one or more partitions, but it does not affect your ability to perform a successful recovery.
Note: If alignment errors occur in a non line-sequential VSAM file, the PowerCenter Server sets the error threshold to 1 and stops the session.

Fatal Error
A fatal error occurs when the PowerCenter Server cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or Sequence Generator transformation, the PowerCenter Server cannot update the sequence values in the repository, and a fatal error occurs. If the session does not use a Normalizer or Sequence Generator transformation, and the PowerCenter Server loses connection to the repository, the PowerCenter Server does not stop the session. The session completes, but the PowerCenter Server cannot log session statistics into the repository.

ABORT Function
Use the ABORT function in the mapping logic to abort a session when the PowerCenter Server encounters a designated transformation error. For more information about ABORT, see “Functions” in the Transformation Language Reference.

User Command
You can stop or abort the session from the Workflow Manager. You can also stop the session using pmcmd.

PowerCenter Server Handling for Session Failure
The PowerCenter Server handles session errors in different ways, depending on the error or event that causes the session to fail. Table 7-2 describes the PowerCenter Server behavior when a session fails:
Table 7-2. PowerCenter Server Behavior for Failed Sessions

| Cause for Session Errors | PowerCenter Server Behavior |
|---|---|
| Error threshold met due to reader errors; Stop command using Workflow Manager or pmcmd | The PowerCenter Server stops reading, continues processing data, and continues writing and committing data to targets. If the PowerCenter Server cannot finish processing and committing data, you need to issue the Abort command to stop the session. |
| Abort command using Workflow Manager | The PowerCenter Server stops reading, continues processing data, and continues writing and committing data to targets. If the PowerCenter Server cannot finish processing and committing data within 60 seconds, it kills the PowerCenter Server process. |
| Fatal error from database; Error threshold met due to writer errors | The PowerCenter Server stops reading and writing, and rolls back all data not committed to the target database. If the session stops due to a fatal error, the commit or rollback may or may not be successful. |
| Error threshold met due to transformation errors; ABORT(); Invalid evaluation of transaction control expression | The PowerCenter Server stops reading, flags the row as an abort row and continues processing data, continues to write to the target database until it hits the abort row, issues commits based on commit intervals, and rolls back all data not committed to the target database. |

Mapping Parameters and Variables in Sessions
You can use mapping parameters in the session properties to alter certain mapping attributes. For example, you can use a mapping parameter in a transformation override to override a filter or user-defined join in a Source Qualifier transformation. If you use mapping variables in a session, you can clear any of the variable values saved in the repository by editing the session. When you clear the variable values, the PowerCenter Server uses the values in the parameter file the next time you run a session. If the session does not use a parameter file, the PowerCenter Server uses the initial values defined in the mapping. For more information on mapping variables, see “Mapping Parameters and Variables” in the Designer Guide.
To view or delete values for mapping variables saved in the repository:
1. In the Navigator window of the Workflow Manager, right-click the Session task and select View Persistent Values.
2. Click Delete Values to delete existing variable values.
3. To save changes, click OK.


Handling High Precision Data
The PowerCenter Server processes decimal values as Doubles or Decimals. When you create a session, you choose to enable the Decimal datatype or let the PowerCenter Server process the data as a Double (precision of 15). To enable high precision data handling:
♦ Use the Decimal datatype with a precision of 16 to 28 in the mapping.
♦ Select Enable High Precision in the session properties.

The precision attributed to a number also includes the scale of the number. For example, the value 11.47 has a precision of 4 and a scale of 2.

Suppose you have a mapping with Decimal (20,0) that passes the number 40012030304957666903. If you enable high precision, the PowerCenter Server passes the number as is. If you do not enable high precision, the PowerCenter Server passes 4.00120303049577 x 10^19.

If you process a Decimal value with a precision greater than 28 digits, the PowerCenter Server automatically treats it as a Double value. For example, the number 2345678904598383902092.1927658 has a precision of 29 digits, so the PowerCenter Server treats it as a Double value of 2.34567890459838 x 10^21.
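The examples above can be reproduced with Python's `decimal` module and its built-in float type, which is an IEEE 754 double. This is a minimal sketch that models only the documented outcomes: a value passes unchanged when high precision is enabled and its precision is at most 28, and otherwise it becomes a Double that keeps about 15 significant digits. The function names are hypothetical. The sketch counts every stored digit as precision, so it does not model leading zeros such as those in 0.05.

```python
from decimal import Decimal
from typing import Tuple

MAX_HIGH_PRECISION = 28  # maximum Decimal precision stated in this guide


def precision_and_scale(value: str) -> Tuple[int, int]:
    """Return (precision, scale) of a decimal literal, counting all digits."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    scale = -exponent if exponent < 0 else 0
    return len(digits), scale


def simulate_pass_through(value: str, high_precision: bool) -> str:
    """Hypothetical model of the value a session passes downstream."""
    precision, _scale = precision_and_scale(value)
    if high_precision and precision <= MAX_HIGH_PRECISION:
        return str(Decimal(value))  # passed as is
    # Treated as a Double: show 15 significant digits.
    return f"{float(value):.14e}"


print(precision_and_scale("11.47"))                                  # (4, 2)
print(simulate_pass_through("40012030304957666903", True))           # 40012030304957666903
print(simulate_pass_through("40012030304957666903", False))          # 4.00120303049577e+19
print(simulate_pass_through("2345678904598383902092.1927658", True)) # 2.34567890459838e+21
```

The last call shows why enabling high precision does not help for values beyond 28 digits: the precision check fails, so the value is handled as a Double.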
To use high precision data handling in a session:
1. In the Workflow Manager, open the session properties.
2. On the Properties tab, select Enable High Precision.
3. Click OK twice to save changes.


Chapter 8

Working with Sources
This chapter covers the following topics:
♦ Overview, 208
♦ Configuring Sources in a Session, 210
♦ Working with Relational Sources, 214
♦ Working with File Sources, 218
♦ Server Handling for File Sources, 226
♦ Using a File List, 230


Overview
In the Workflow Manager, you can create sessions with the following sources:

♦ Relational. You can extract data from any relational database that the PowerCenter Server can connect to. When extracting data from relational sources and Application sources, you must configure the database connection to the data source before you configure the session.
♦ File. You can create a session to extract data from a flat file, COBOL, or XML source. The PowerCenter Server can extract data from any local directory or FTP connection for the source file. If the file source requires an FTP connection, you need to configure the FTP connection to the host machine before you create the session.
♦ Heterogeneous. You can extract data from multiple sources in the same session. You can extract from multiple relational sources, such as Oracle and SQL Server. Or, you can extract from multiple source types, such as relational and flat file. When you configure a session with heterogeneous sources, configure each source instance separately.

Globalization Features
You can choose a code page that you want the PowerCenter Server to use for relational sources and flat files. You specify code pages for relational sources when you configure database connections in the Workflow Manager. You can set the code page for file sources in the session properties. For more information about code pages, see “Globalization Overview” in the Installation and Configuration Guide.

Source Connections
Before you can extract data from a source, you must configure the connection properties the PowerCenter Server uses to connect to the source file or database. You can configure source database and FTP connections in the Workflow Manager. For more information on creating database connections, see “Configuring the Workflow Manager” on page 37. For more information on creating FTP connections, see “Using FTP” on page 559.

Permissions and Privileges
You must have read permissions for the connections you use in the session. For example, if the source requires database connections or FTP connections, you must have permission to read those connections in the session.

Allocating Buffer Memory
When the PowerCenter Server initializes a session, it allocates blocks of memory to hold source and target data. The PowerCenter Server allocates at least two blocks for each source and target partition. Sessions that use a large number of sources or targets might require additional memory blocks. If the PowerCenter Server cannot allocate enough memory blocks to hold the data, it fails the session. For more information on allocating buffer memory, see “Optimizing the Session” on page 655.

208 Chapter 8: Working with Sources

Partitioning Sources
You can create multiple partitions for relational, Application, and file sources. For relational or Application sources, the PowerCenter Server creates a separate connection to the source database for each partition you set in the session properties. For file sources, you can configure the session to read the source with one thread or multiple threads. For more information on partitioning data, see “Pipeline Partitioning” on page 345.
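The server allocates at least two blocks for each source and target partition, so adding partitions raises the minimum number of blocks a session needs. The following Python sketch computes only that lower bound. It assumes that every source and target uses the same number of partitions, and the function name is hypothetical.

```python
def min_buffer_blocks(num_sources: int, num_targets: int, partitions: int) -> int:
    """Lower bound on buffer blocks: at least two per source and target partition.

    Assumes (hypothetically) that every source and target has the same
    number of partitions. The server may allocate more than this minimum.
    """
    blocks_per_partition = 2
    return blocks_per_partition * (num_sources + num_targets) * partitions


# One source, one target, three partitions: at least 2 * 2 * 3 blocks.
print(min_buffer_blocks(1, 1, 3))
```

The real requirement also depends on buffer and block sizes, which this sketch ignores.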


Configuring Sources in a Session
Configure source properties for sessions in the Sources node of the Mapping tab of the session properties. When you configure source properties for a session, you define properties for each source instance in the mapping. Figure 8-1 shows the Sources node on the Mapping tab:
Figure 8-1. Sources Node of the Session Properties

The Sources node lists the sources used in the session and displays their settings. To view and configure settings for a source, select the source from the list. You can configure the following settings for a source:
♦ Readers
♦ Connections
♦ Properties

Configuring Readers
You can click the Readers settings on the Sources node to view the reader the PowerCenter Server uses with each source instance. The Workflow Manager specifies the necessary reader for each source instance in the Readers settings on the Sources node.


Figure 8-2 shows the Readers settings in the Sources node of the Mapping tab:
Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab

Configuring Connections
Click the Connections settings on the Sources node to define source connection information.


Figure 8-3 shows the Connections settings in the Sources node of the Mapping tab:
Figure 8-3. Connections Settings in the Sources Node


For relational sources, choose a configured database connection in the Value column for each relational source instance. By default, the Workflow Manager displays the source type for relational sources. For details on configuring database connections, see “Selecting the Source Database Connection” on page 214. For flat file and XML sources, choose one of the following source connection types in the Type column for each source instance:

♦ FTP. If you want to read data from a flat file or XML source using FTP, you must specify an FTP connection when you configure source options. You must define the FTP connection in the Workflow Manager prior to configuring the session. You must have read permission for any FTP connection you want to associate with the session. The user starting the session must have execute permission for any FTP connection associated with the session. For details on using FTP, see “Using FTP” on page 559.
♦ None. Choose None when you want to read from a local flat file or XML file.

Configuring Properties
Click the Properties settings in the Sources node to define source property information. The Workflow Manager displays properties, such as source file name and location, for flat file, COBOL, and XML source file types. You do not need to define any properties on the Properties settings for relational sources, although you can set optional properties such as the table owner name and the SQL override there. Figure 8-4 shows the Properties settings in the Sources node of the Mapping tab:
Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab

For more information on configuring sessions with relational sources, see “Working with Relational Sources” on page 214. For more information on configuring sessions with flat file sources, see “Working with File Sources” on page 218. For more information on configuring sessions with XML sources, see the XML User Guide.


Working with Relational Sources
When you configure a session to read data from a relational source, you can configure the following properties for sources:
♦ Source database connection. Select the database connection for each relational source. For more information, see “Selecting the Source Database Connection” on page 214.
♦ Treat source rows as. Define how the PowerCenter Server treats each source row as it reads it from the source table. For more information, see “Defining the Treat Source Rows As Property” on page 214.
♦ Table owner name. Define the table owner name for each relational source. For more information, see “Configuring the Table Owner Name” on page 216.
♦ Override SQL query. You can override the default SQL query to extract source data. For more information, see “Overriding the SQL Query” on page 216.

Selecting the Source Database Connection
Before you can run a session to read data from a source database, the PowerCenter Server must connect to the source database. Database connections must exist in the repository to appear on the source database list. You must define them prior to configuring a session. For details on configuring a database connection, see “Setting Up a Relational Database Connection” on page 53. On the Connections settings in the Sources node, select the database connection from the list. You must have read permission for the source database connection to configure the session to use it. The user starting the configured session must have execute permission for source database connections.

Defining the Treat Source Rows As Property
When the PowerCenter Server reads a source, it marks each row with an indicator to specify which operation to perform when the row reaches the target. You can define how the PowerCenter Server marks each row using the Treat Source Rows As property in the General Options settings on the Properties tab.


Figure 8-5 shows the Treat Source Rows As property on the General Options settings:
Figure 8-5. Treat Source Rows As Property


Table 8-1 describes the options you can choose for the Treat Source Rows As property:
Table 8-1. Treat Source Rows As Options

Treat Source Rows As Option | Description
Insert | The PowerCenter Server marks all rows to insert into the target.
Delete | The PowerCenter Server marks all rows to delete from the target.
Update | The PowerCenter Server marks all rows to update the target. You can further define the update operation in the target options. For more information, see “Target Properties” on page 241.
Data Driven | The PowerCenter Server uses the Update Strategy transformations in the mapping to determine the operation on a row-by-row basis. You define the update operation in the target options. If the mapping contains an Update Strategy transformation, this option defaults to Data Driven. You can also use this option when the mapping contains Custom transformations configured to set the update strategy.

Once you determine how to treat all rows in the session, you also need to set update strategy options for individual targets. For more information on setting the target update strategy options, see “Target Properties” on page 241. For more information on setting the update strategy for a session, see “Update Strategy Transformation” in the Transformation Guide.
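The difference between the fixed options and Data Driven in Table 8-1 can be modeled in a few lines. This Python sketch is only a model. The `RowOp` enum and `mark_rows` function are hypothetical, and the per-row function stands in for an Update Strategy transformation. The sketch does not represent how the PowerCenter Server marks rows internally.

```python
from enum import Enum


class RowOp(Enum):
    """Row operations from Table 8-1 (names chosen for this sketch)."""
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


def mark_rows(rows, treat_source_rows_as, update_strategy=None):
    """Pair each row with the operation implied by the session setting.

    Insert, Delete, and Update mark every row the same way. Data Driven
    calls a per-row function (a stand-in for an Update Strategy
    transformation) to choose the operation for each row.
    """
    if treat_source_rows_as == "Data Driven":
        if update_strategy is None:
            # This sketch requires an explicit per-row rule for Data Driven.
            raise ValueError("Data Driven needs an update strategy function")
        return [(update_strategy(row), row) for row in rows]
    op = RowOp(treat_source_rows_as.lower())  # e.g. "Insert" -> RowOp.INSERT
    return [(op, row) for row in rows]


rows = [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}]
rule = lambda r: RowOp.DELETE if r["status"] == "closed" else RowOp.UPDATE
print(mark_rows(rows, "Data Driven", rule))
```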

Configuring the Table Owner Name
You can define the owner name of the source table in the session properties. For some databases such as DB2, tables can have different owners. If the database user specified in the database connection is not the owner of the source tables in a session, specify the table owner for each source instance. A session can fail if the database user is not the owner and you do not specify the table owner name. Specify the table owner name in the Owner Name field in the Properties settings in the Sources node. Figure 8-6 shows the Properties settings where you define the table owner name for relational sources:
Figure 8-6. Source Table Owner Name Property


Overriding the SQL Query
You can alter or override the default query in the mapping by entering an SQL override in the Properties settings in the Sources node. You can enter any SQL statement supported by the source database. The Workflow Manager does not validate the SQL override. The following errors could cause the session to fail and possibly cause data errors:
♦ Fields with incompatible datatypes or unknown fields
♦ Typing mistakes or other errors


Figure 8-7 shows the Properties settings in the Sources node where you can override the SQL query:
Figure 8-7. SQL Query Override Property in the Session Properties


To override the default query for a relational source:
1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab and open the Transformations view.
3. Click the Sources node and open the Properties settings.
4. Click the Open button in the SQL Query field to open the SQL Editor.
5. Enter the SQL override.
6. Click OK to return to the session properties.
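The Workflow Manager does not validate the SQL override, so you might want to test the statement against a database before the session runs. This Python sketch uses an in-memory `sqlite3` table as a stand-in for the source database. The `check_sql_override` function and the `expected_ports` list of Source Qualifier ports are hypothetical. Against a real source, you would use that database's own driver and SQL dialect. SQLite is dynamically typed, so this check catches unknown fields, typing mistakes, and a wrong column count, but it does not catch incompatible datatypes.

```python
import sqlite3


def check_sql_override(conn, sql, expected_ports):
    """Return a list of problems found when preparing an SQL override.

    The query is wrapped in "LIMIT 0", so the database compiles it and
    reports unknown columns or syntax errors without returning any rows.
    """
    try:
        cursor = conn.execute(f"SELECT * FROM ({sql}) LIMIT 0")
    except sqlite3.Error as exc:
        return [f"query rejected: {exc}"]
    columns = [col[0] for col in cursor.description]
    if len(columns) != len(expected_ports):
        return [f"expected {len(expected_ports)} columns, got {len(columns)}"]
    return []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
print(check_sql_override(conn, "SELECT cust_id, nmae FROM customers", ["CUST_ID", "NAME"]))
```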


Working with File Sources
You can create a session to extract data from flat file or COBOL sources. When you create a session to read data from a flat file or COBOL file, you can configure the following information in the session properties:

♦ Source properties. You can define source properties on the Properties settings in the Sources node, such as source file options. For more information, see “Configuring Source Properties” on page 218.
♦ Flat file properties. You can edit fixed-width and delimited source file properties. For more information, see “Configuring Fixed-Width File Properties” on page 220 and “Configuring Delimited File Properties” on page 222.
♦ Line sequential buffer length. You can change the buffer length for flat files on the Advanced settings on the Config Object tab. For more information, see “Configuring Line Sequential Buffer Length” on page 225.
♦ Treat source rows as. Define how the PowerCenter Server treats each source row as it reads it from the source. For more information, see “Defining the Treat Source Rows As Property” on page 214.

Configuring Source Properties
You can define session source properties on the Properties settings in the Sources node.


Figure 8-8 shows the flat file source properties you define in the Properties settings of the Sources node on the Mapping tab:
Figure 8-8. Properties Settings in the Sources Node for a Flat File Source


Table 8-2 describes the properties you define on the Properties settings for flat file source definitions:
Table 8-2. Flat File Source Properties

File Source Option | Required/Optional | Description
Source File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server looks in the server variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The PowerCenter Server concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.
Source Filename | Required | Enter the file name, or file name and path. Optionally use the $InputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Source File Directory field when it runs the session. For example, if you have “C:\data\” in the Source File Directory field, then enter “filename.dat” in the Source Filename field. When the PowerCenter Server begins the session, it looks for “C:\data\filename.dat”. By default, the Workflow Manager enters the file name configured in the source definition. For details on session parameters, see “Session Parameters” on page 495.
Source Filetype | Required | Allows you to configure multiple file sources using a file list. Indicates whether the source file contains the source data, or whether it contains a list of files with the exact same file properties. Choose Direct if the source file contains the source data. Choose Indirect if the source file contains a list of files. When you select Indirect, the PowerCenter Server finds the file list and reads each listed file when it runs the session. For details on file lists, see “Using a File List” on page 230.
Set File Properties link | Optional | Opens a dialog box that allows you to override source file properties. By default, the Workflow Manager displays file properties as configured in the source definition. For more information, see “Configuring Fixed-Width File Properties” on page 220 and “Configuring Delimited File Properties” on page 222.

Configuring Fixed-Width File Properties
When you read data from a fixed-width file, you can edit file properties in the session, such as the null character or code page. You can configure fixed-width properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure fixed-width properties for instances of reusable sessions in the Workflow Designer. Click Set File Properties to open the Flat Files dialog box.


Figure 8-9 shows the Flat Files dialog box:
Figure 8-9. Flat Files Dialog Box

To edit the fixed-width properties, select Fixed Width and click Advanced. The Fixed-Width Properties dialog box appears. By default, the Workflow Manager displays file properties as configured in the mapping. Edit these settings to override those configured in the source definition. Figure 8-10 shows the Fixed-Width Properties dialog box:
Figure 8-10. Fixed-Width File Properties Dialog Box


Table 8-3 describes options you can define in the Fixed Width Properties dialog box for file sources:
Table 8-3. Fixed-Width File Properties for File Sources

Fixed-Width Properties Option | Required/Optional | Description
Null Character (Text/Binary) | Required | Indicates the character representing a null value in the file. This can be any valid character in the file code page, or any binary value from 0 to 255. For more information about specifying null characters, see “Null Character Handling” on page 227.
Repeat Null Character | Optional | If selected, the PowerCenter Server reads repeating null characters in a single field as a single null value. If you do not select this option, the PowerCenter Server reads a single null character at the beginning of a field as a null field. Important: For multibyte code pages, Informatica recommends that you specify a single-byte null character if you are using repeating non-binary null characters. This ensures that repeating null characters fit into the column exactly. For more information about specifying null characters, see “Null Character Handling” on page 227.
Code Page | Required | Select the code page of the fixed-width file. The default setting is the client code page.
Number of Initial Rows to Skip | Optional | The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip header rows. One row may contain multiple records. If you select the Line Sequential File Format option, the PowerCenter Server ignores this option.
Number of Bytes to Skip Between Records | Optional | The PowerCenter Server skips the specified number of bytes between records. For example, you have an ASCII file on Windows with one record on each line, and a carriage return and line feed appear at the end of each line. If you want the PowerCenter Server to skip these two single-byte characters, enter 2. If you have an ASCII file on UNIX with one record for each line, ending in a line feed, skip the single character by entering 1.
Strip Trailing Blanks | Optional | If selected, the PowerCenter Server strips trailing blank spaces from records before passing them to the Source Qualifier transformation.
Line Sequential File Format | Optional | Select this option if the file uses a carriage return at the end of each record, shortening the final column.
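The Number of Bytes to Skip Between Records setting determines where each record starts. This Python sketch reads a non-line-sequential fixed-width buffer. It assumes ASCII data and known column widths, and the function name is hypothetical. It models the rule in Table 8-3, not the PowerCenter reader itself. The sketch ignores a short final record. For that case, see “Row Length Handling for Fixed-Width Flat Files”.

```python
def read_fixed_width(data: bytes, widths, bytes_between_records=0):
    """Split a fixed-width byte buffer into records of text fields."""
    record_len = sum(widths)
    records, pos = [], 0
    while pos + record_len <= len(data):
        record = data[pos:pos + record_len]
        fields, start = [], 0
        for width in widths:
            fields.append(record[start:start + width].decode("ascii"))
            start += width
        records.append(fields)
        # Jump past the record plus the separator bytes (e.g. CR LF = 2).
        pos += record_len + bytes_between_records
    return records


windows_file = b"AB123\r\nCD456\r\n"   # CR + LF after each record: skip 2
unix_file = b"AB123\nCD456\n"          # LF after each record: skip 1
print(read_fixed_width(windows_file, [2, 3], 2))
print(read_fixed_width(unix_file, [2, 3], 1))
```

With the wrong skip value, for example 0 on the UNIX file, the second record starts at the line feed and every later field shifts by one byte.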

Configuring Delimited File Properties
When you read data from a delimited file, you can edit file properties in the session, such as the delimiter or code page. You can configure delimited properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure delimited properties for instances of reusable sessions in the Workflow Designer. Click Set File Properties to open the Flat Files dialog box.


Figure 8-11 shows the Flat Files dialog box:
Figure 8-11. Flat Files Dialog Box

To edit the delimited properties, select Delimited and click Advanced. The Delimited File Properties dialog box appears. By default, the Workflow Manager displays file properties as configured in the mapping. Edit these settings to override those configured in the source definition. Figure 8-12 shows the Delimited File Properties dialog box:
Figure 8-12. Delimited File Properties Dialog Box


Table 8-4 describes options you can define in the Delimited File Properties dialog box for file sources:
Table 8-4. Delimited File Properties for File Sources

Delimited File Properties Option | Required/Optional | Description
Delimiters | Required | Character used to separate columns of data in the source file. Use the button to the right of this field to enter a different delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters. The delimiter must be in the same code page as the flat file code page.
Treat Consecutive Delimiters as One | Optional | By default, the PowerCenter Server reads pairs of delimiters as a null value. If selected, the PowerCenter Server reads any number of consecutive delimiter characters as one. For example, a source file uses a comma as the delimiter character and contains the following record: 56, , , Jane Doe. By default, the PowerCenter Server reads that record as four columns separated by three delimiters: 56, NULL, NULL, Jane Doe. If you select this option, the PowerCenter Server reads the record as two columns separated by one delimiter: 56, Jane Doe.
Optional Quotes | Required | Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the PowerCenter Server ignores delimiter characters within the quote characters. Therefore, the PowerCenter Server uses quote characters to escape the delimiter. For example, a source file uses a comma as a delimiter and contains the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the PowerCenter Server ignores the commas within the quotes and reads the row as four fields. If you do not select the optional single quote, the PowerCenter Server reads six separate fields. When the PowerCenter Server reads two optional quote characters within a quoted string, it treats them as one quote character. For example, in the row 2353, 'I''m going tomorrow.', MD, the PowerCenter Server reads the quoted string as I'm going tomorrow. Additionally, if you select an optional quote character, the PowerCenter Server only reads a string as a quoted string if the quote character is the first character of the field. Note: You can improve session performance if the source file does not contain quotes or escape characters.
Code Page | Required | Select the code page of the delimited file. The default setting is the client code page.
Escape Character | Optional | Character immediately preceding a delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string. When you specify an escape character, the PowerCenter Server reads the delimiter character as a regular character (called escaping the delimiter or quote character). Note: You can improve session performance for mappings containing Sequence Generator transformations if the source file does not contain quotes or escape characters.
Remove Escape Character From Data | Optional | This option is selected by default. Clear this option to include the escape character in the output string.
Number of Initial Rows to Skip | Optional | The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip title or header rows in the file.
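The interaction of delimiters, optional quotes, doubled quotes, and escape characters in Table 8-4 is easier to follow in code. This Python sketch models those rules for a single record. It is an illustration, not the PowerCenter reader, and the function and parameter names are hypothetical. In this sketch, a quote opens a quoted string only when it is the very first character of a field. For that reason, the test records below are written without spaces after each comma. A NULL column is represented as `None`.

```python
def split_delimited(line, delimiter=",", quote=None, escape=None,
                    consecutive_as_one=False, remove_escape=True):
    """Split one delimited record into fields; None stands for NULL."""
    fields, buf = [], []
    in_quotes = False
    was_quoted = False   # a quoted empty string is "", not NULL
    at_start = True      # True until the current field has consumed a character
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if escape is not None and ch == escape and i + 1 < n:
            # Take the next character literally; optionally keep the escape.
            if not remove_escape:
                buf.append(ch)
            buf.append(line[i + 1])
            i += 2
            at_start = False
            continue
        if in_quotes:
            if ch == quote and i + 1 < n and line[i + 1] == quote:
                buf.append(quote)          # two quotes -> one quote character
                i += 1
            elif ch == quote:
                in_quotes = False          # closing quote
            else:
                buf.append(ch)             # delimiters are plain text here
        elif quote is not None and ch == quote and at_start:
            in_quotes = was_quoted = True  # quote must be the first character
        elif ch == delimiter:
            fields.append("".join(buf) if buf or was_quoted else None)
            buf, was_quoted = [], False
            if consecutive_as_one:
                while i + 1 < n and line[i + 1] == delimiter:
                    i += 1                 # collapse a run of delimiters
            at_start = True
            i += 1
            continue
        else:
            buf.append(ch)
        at_start = False
        i += 1
    fields.append("".join(buf) if buf or was_quoted else None)
    return fields


print(split_delimited("56,,,Jane Doe"))
print(split_delimited("342-3849,'Smith, Jenna','Rockville, MD',6", quote="'"))
```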

Configuring Line Sequential Buffer Length
You can configure the line buffer length for file sources. By default, the PowerCenter Server reads a file record into a buffer that holds 1024 bytes. If the source file records are larger than 1024 bytes, increase the Line Sequential Buffer Length property in the session properties accordingly. Figure 8-13 shows the Advanced settings on the Config Object tab in the session properties where you define the line buffer length:
Figure 8-13. Line Sequential Buffer Length Property for File Sources
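To choose a value for Line Sequential Buffer Length, you need the length of the longest record in the file. This Python sketch scans lines read in binary mode and returns the larger of the 1024-byte default and the longest line. It counts the line terminator bytes as part of the record. That is a conservative assumption, because this guide does not say whether the terminator counts toward the buffer. The function name is hypothetical.

```python
def required_line_buffer(lines, default=1024):
    """Return a buffer length large enough for the longest record.

    `lines` is any iterable of bytes, such as a file opened with "rb".
    Terminator bytes (\\n or \\r\\n) are counted, which errs on the large side.
    """
    longest = max((len(line) for line in lines), default=0)
    return max(default, longest)


# Usage sketch (the path is hypothetical):
# with open("/data/source.dat", "rb") as f:
#     print(required_line_buffer(f))
print(required_line_buffer([b"x" * 2000 + b"\r\n", b"short\n"]))
```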



Server Handling for File Sources
When you create mappings and configure sessions with file sources, consider how the PowerCenter Server handles the following features:
♦ Character set
♦ Multibyte character error handling
♦ Null character handling
♦ Row length handling for fixed-width flat files
♦ Numeric data handling
♦ Tab handling

Character Set
You can configure the PowerCenter Server to run sessions in either ASCII or Unicode data movement mode. Table 8-5 describes the source file character sets supported in each data movement mode:
Table 8-5. Support for ASCII and Unicode Data Movement Modes

Character Set | Unicode Mode | ASCII Mode
7-bit ASCII | Supported | Supported
US-EBCDIC (COBOL sources only) | Supported | Supported
8-bit ASCII | Supported | Supported
8-bit EBCDIC (COBOL sources only) | Supported | Supported
ASCII-based MBCS | Supported | The PowerCenter Server generates a warning message.
EBCDIC-based MBCS | Supported | Not supported. The PowerCenter Server terminates the session.

If you configure a session to run in ASCII data movement mode, delimiters, escape characters, and null characters must be valid in the ISO Western European Latin 1 code page. Any 8-bit characters you specified in previous versions of PowerCenter are still valid. In Unicode data movement mode, delimiters, escape characters, and null characters must be valid in the specified code page of the flat file. For more information about configuring and working with data movement modes, see “Globalization Overview” in the Installation and Configuration Guide.
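Before you run a session, you can check whether the delimiter, escape, and null characters are representable in the required code page. This Python sketch uses Python's `latin-1` codec, which is ISO 8859-1, for ASCII mode. For Unicode mode, it uses a Python codec name that you supply for the flat file code page. Mapping a PowerCenter code page name to a Python codec is an assumption you must make yourself, and the function name is hypothetical.

```python
def invalid_special_chars(chars, data_movement_mode, file_codec):
    """Return the names of special characters that the code page cannot encode.

    chars: mapping such as {"delimiter": ",", "escape": "\\\\"}.
    ASCII mode checks ISO 8859-1 (Latin 1); Unicode mode checks file_codec,
    a Python codec name standing in for the flat file code page.
    """
    codec = "latin-1" if data_movement_mode == "ASCII" else file_codec
    bad = []
    for name, ch in chars.items():
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(name)
    return bad


# The euro sign is not in Latin 1, so it fails in ASCII mode.
print(invalid_special_chars({"delimiter": "\u20ac", "escape": "\\"}, "ASCII", "utf-8"))
```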


Multibyte Character Error Handling
Misalignment of multibyte data in a file causes session errors. Data becomes misaligned when you place column breaks incorrectly in a file, resulting in multibyte characters that extend beyond the last byte in a column. When you import a fixed-width flat file, you can create, move, or delete column breaks using the Flat File Wizard. Incorrect positioning of column breaks can create alignment errors when you run a session containing multibyte characters. The PowerCenter Server handles alignment errors in fixed-width flat files according to the following guidelines:

♦ Non-line sequential file. The PowerCenter Server skips rows containing misaligned data and resumes reading the next row. The skipped row appears in the session log with a corresponding error message. If an alignment error occurs at the end of a row, the PowerCenter Server skips both the current row and the next row, and writes them to the session log.
♦ Line sequential file. The PowerCenter Server skips rows containing misaligned data and resumes reading the next row. The skipped row appears in the session log with a corresponding error message.
♦ Reader error threshold. You can configure a session to stop after a specified number of non-fatal errors. A row containing an alignment error increases the error count by 1. The session stops if the number of rows containing errors reaches the threshold set in the session properties. Errors and corresponding error messages appear in the session log file.

Fixed-width COBOL sources are always byte-oriented and can be line sequential. The PowerCenter Server handles COBOL files according to the following guidelines:

♦ Line sequential files. The PowerCenter Server skips rows containing misaligned data and writes the skipped rows to the session log. The session stops if the number of error rows reaches the error threshold.
♦ Non-line sequential files. The session stops at the first row containing misaligned data.
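A misplaced column break can cut a multibyte character in two. This Python sketch shows how that produces an alignment error and how a reader error threshold stops processing. It uses UTF-8 as a stand-in for a multibyte code page, and it treats a threshold of 0 as "no limit". Both are assumptions of the sketch. The sketch models line sequential handling only: it does not model the end-of-row case that skips two rows. The function name is hypothetical.

```python
def read_rows(rows, widths, codec="utf-8", error_threshold=0):
    """Split byte rows into fields, skipping rows with misaligned characters.

    A field that ends in the middle of a multibyte character fails to
    decode; the row is skipped (the server would log it). When the number
    of skipped rows reaches error_threshold, processing stops.
    """
    good, skipped = [], []
    for row in rows:
        fields, pos = [], 0
        try:
            for width in widths:
                fields.append(row[pos:pos + width].decode(codec))
                pos += width
        except UnicodeDecodeError:
            skipped.append(row)
            if error_threshold and len(skipped) >= error_threshold:
                raise RuntimeError(f"reader error threshold {error_threshold} reached")
            continue
        good.append(fields)
    return good, skipped


# "\u00e9" is two bytes in UTF-8; a break after byte 1 splits it.
print(read_rows([b"abcd", "\u00e9ab".encode("utf-8")], [1, 3]))
```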

Null Character Handling
You can specify single-byte or multibyte null characters for fixed-width flat files. The PowerCenter Server uses these characters to determine if a column is null.


Table 8-6 describes how the PowerCenter Server uses the Null Character and Repeat Null Character properties to determine if a column is null:
Table 8-6. Null Character Handling

Null Character | Repeat Null Character | PowerCenter Server Behavior
Binary | Disabled | A column is null if the first byte in the column is the binary null character. The PowerCenter Server reads the rest of the column as text data only to determine the column alignment and track the shift state for shift-sensitive code pages. If data in the column is misaligned, the PowerCenter Server skips the row and writes the skipped row and a corresponding error message to the session log.
Non-binary | Disabled | A column is null if the first character in the column is the null character. The PowerCenter Server reads the rest of the column only to determine the column alignment and track the shift state for shift-sensitive code pages. If data in the column is misaligned, the PowerCenter Server skips the row and writes the skipped row and a corresponding error message to the session log.
Binary | Enabled | A column is null if it contains only the specified binary null character. The next column inherits the initial shift state of the code page.
Non-binary | Enabled | A column is null if the repeating null character fits into the column exactly, with no bytes left over. For example, a five-byte column is not null if you specify a two-byte repeating null character. In shift-sensitive code pages, shift bytes do not affect the null value of a column. A column is still null if it contains a shift byte at the beginning or end of the column. Informatica recommends you specify a single-byte null character if you use repeating non-binary null characters. This ensures that repeating null characters fit into a column exactly.
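The two Repeat Null Character cases in Table 8-6 reduce to a simple byte test. This Python sketch works on raw column bytes. It ignores shift bytes and misalignment, which a real reader must also handle, and the function name is hypothetical.

```python
def is_null_column(column: bytes, null: bytes, repeat: bool) -> bool:
    """Decide whether a fixed-width column is null (shift states ignored)."""
    if not repeat:
        # Repeat Null Character disabled: only the leading null matters.
        return column.startswith(null)
    # Repeat Null Character enabled: the null must fill the column exactly.
    if not column or len(column) % len(null) != 0:
        return False
    return column == null * (len(column) // len(null))


# A five-byte column cannot be null with a two-byte repeating null character.
print(is_null_column(b"*****", b"**", repeat=True))
```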

Row Length Handling for Fixed-Width Flat Files
For fixed-width flat files, data in a row can be shorter than the row length in the following situations:
♦ The file is fixed-width line sequential with a carriage return or line feed that appears sooner than expected.
♦ The file is fixed-width non-line sequential, and the last line in the file is shorter than expected.

In these cases, the PowerCenter Server reads the data but does not append any blanks to fill the remaining bytes. The PowerCenter Server reads subsequent fields as NULL. Fields containing repeating null characters that do not fill the entire field length are not considered NULL.
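This Python sketch shows the effect on a short row. The field that holds the last bytes keeps only the data that is present, with no blank padding. Every field that starts after the data ends becomes NULL, shown here as `None`. The sketch assumes ASCII data, and the function name is hypothetical.

```python
def split_short_row(row: bytes, widths):
    """Split a fixed-width row that may be shorter than the full row length."""
    fields, pos = [], 0
    for width in widths:
        if pos >= len(row):
            fields.append(None)  # field starts after the data ends: NULL
        else:
            # Keep whatever bytes exist; no blanks are appended.
            fields.append(row[pos:pos + width].decode("ascii"))
        pos += width
    return fields


print(split_short_row(b"ABCDE", [3, 3, 3]))
```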


Numeric Data Handling
Sometimes, file sources contain non-numeric data in numeric columns. When the PowerCenter Server reads non-numeric data, it treats the row differently, depending on the source type. When the PowerCenter Server reads non-numeric data from numeric columns in a flat file source or an XML source, it drops the row and writes the row to the session log. When the PowerCenter Server reads non-numeric data for numeric columns in a COBOL source, it reads a null value for the column.
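The rule differs by source type, as this Python sketch shows. It uses a simple regular expression as a stand-in definition of numeric text and represents rows as dictionaries of strings. Both choices are assumptions for illustration: real COBOL numeric fields are often packed or zoned decimal, not text. The function name is hypothetical.

```python
import re

# Stand-in definition of "numeric" for this sketch only.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def read_numeric_column(rows, column, source_type):
    """Apply the non-numeric rule: COBOL reads NULL, flat file/XML drops the row."""
    kept, dropped = [], []
    for row in rows:
        value = row[column].strip()
        if NUMERIC.fullmatch(value):
            kept.append({**row, column: float(value)})
        elif source_type == "cobol":
            kept.append({**row, column: None})  # COBOL: null value for the column
        else:
            dropped.append(row)                 # flat file or XML: drop and log
    return kept, dropped


rows = [{"id": "1", "amt": "12.5"}, {"id": "2", "amt": "n/a"}]
print(read_numeric_column(rows, "amt", "flat file"))
```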


Using a File List
You can create a session to read multiple source files for one source instance in the mapping. You might use this feature if, for example, your company collects data at several locations that you then want to move through the same session. When you create a mapping to use multiple source files for one source instance, the properties of all files must exactly match the source definition. To use multiple source files, you create a file containing the names and directories of each source file you want the PowerCenter Server to use. This file is referred to as a file list. When you configure the session properties, enter the file name of the file list in the Source Filename field and enter the location of the file list in the Source File Directory field. When the session starts, the PowerCenter Server reads the file list, then locates and reads the first file source in the list. After the PowerCenter Server reads the first file, it locates and reads the next file in the list. The PowerCenter Server writes the path and name of the file list to the session log. If the PowerCenter Server encounters an error while accessing a source file, it logs the error in the session log and stops the session.
Note: When you use a file list and the session performs incremental aggregation, the PowerCenter Server performs incremental aggregation across all listed source files.

Creating the File List
The file list contains the names of all the source files you want the PowerCenter Server to use for the source instance in the session. Create the file list in an editor appropriate to the PowerCenter Server platform and save it as a text file. For example, you can create a file list for a PowerCenter Server on Windows with any text editor and then save it as ASCII. The PowerCenter Server interprets the file list using the PowerCenter Server code page. Each file in the list must use the user-defined code page configured in the source definition. This code page must be a subset of the repository code page. Each file in the file list must share the same file properties as configured in the source definition or as entered for the source instance in the session property sheet. You can enter different paths for each file in the list, but for the session to complete successfully, the paths must be local to the PowerCenter Server machine. Map the drives on a PowerCenter Server on Windows or mount the drives on a PowerCenter Server on UNIX, as necessary. If you do not specify a path for a file, the PowerCenter Server assumes the file is in the same directory as the file list. The file list must follow these guidelines:
♦ Text file
♦ One file name, or path and file name, for each line


The PowerCenter Server skips blank lines and ignores leading blank spaces. Any characters indicating a new line, such as \n in ASCII files, must be valid in the code page of the PowerCenter Server. The following example shows a valid file list created for a PowerCenter Server on Windows. Each of the drives listed are mapped on the server machine. The western_trans.dat file is located in the same directory as the file list.
western_trans.dat
d:\data\eastern_trans.dat
e:\data\midwest_trans.dat
f:\data\canada_trans.dat

Once you create the file list, place it in a directory local to the PowerCenter Server.
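The rules above (one entry per line, blank lines skipped, leading blank spaces ignored, bare file names resolved against the directory that holds the file list) can be illustrated with a short sketch. It assumes a Windows-style server and a file list that has already been decoded from the PowerCenter Server code page into text. The function name read_file_list and the directory c:\lists are hypothetical; this is not how the PowerCenter Server itself is implemented.

```python
from pathlib import PureWindowsPath


def read_file_list(text: str, file_list_dir: str) -> list[str]:
    """Return the source file paths named in a file list, in order (illustrative only)."""
    sources = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are skipped
        entry = line.lstrip(" ")  # leading blank spaces are ignored
        # An entry with its own drive (such as d:\data\...) keeps its own path when joined;
        # a bare file name resolves to the directory that holds the file list.
        sources.append(str(PureWindowsPath(file_list_dir) / entry))
    return sources


file_list = """western_trans.dat
d:\\data\\eastern_trans.dat

  e:\\data\\midwest_trans.dat
f:\\data\\canada_trans.dat
"""

for path in read_file_list(file_list, "c:\\lists"):
    print(path)
```

PureWindowsPath is used so the sketch behaves the same on any host; a UNIX server would use PurePosixPath instead.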

Configuring a Session to Use a File List
After you create a file list for multiple source files, you can configure the session to access those files.
To use multiple source files for one source instance in a session:
1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab and open the Transformations view.
3. Click the Properties settings in the Sources node.
4. In the Source Filetype field, choose Indirect.
5. In the Source Filename field, replace the file name with the name of the file list. If necessary, also enter the path in the Source File Directory field.
   If you enter only a file name in the Source Filename field and you have specified a path in the Source File Directory field, the PowerCenter Server looks for the named file in the listed directory. If you enter only a file name in the Source Filename field and you do not specify a path in the Source File Directory field, the PowerCenter Server looks for the named file in the directory where the PowerCenter Server is installed on UNIX or in the system directory on Windows.
6. Click OK.
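The lookup rules in step 5 can be summarized as a small decision function. This is a sketch under stated assumptions: Windows-style paths, a hypothetical server_default_dir argument standing in for the installation directory (UNIX) or system directory (Windows), and the assumption that a full path typed into Source Filename is used as-is, which the steps above do not spell out.

```python
from pathlib import PureWindowsPath


def locate_file_list(source_filename: str, source_file_directory: str,
                     server_default_dir: str) -> str:
    """Return where the server would look for the file list (hypothetical helper)."""
    name = PureWindowsPath(source_filename)
    if name.is_absolute():
        # Assumption: a full path entered in Source Filename is used as-is.
        return str(name)
    # A bare file name resolves against Source File Directory when it is set,
    # otherwise against the server's default directory.
    base = source_file_directory or server_default_dir
    return str(PureWindowsPath(base) / name)


print(locate_file_list("trans_list.txt", "d:\\lists", "c:\\winnt\\system32"))
```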


Chapter 9

Working with Targets
This chapter covers the following topics:
♦ Overview, 234
♦ Configuring Targets in a Session, 236
♦ Working with Relational Targets, 240
♦ Working with Target Connection Groups, 257
♦ Working with Active Sources, 259
♦ Working with File Targets, 261
♦ Server Handling for File Targets, 268
♦ Working with Heterogeneous Targets, 274


Overview
In the Workflow Manager, you can create sessions with the following targets:

♦ Relational. You can load data to any relational database that the PowerCenter Server can connect to. When loading data to relational targets, you must configure the database connection to the target before you configure the session.
♦ File. You can load data to a flat file or XML target. The PowerCenter Server can load data to any local directory or FTP connection for the target file. If the file target requires an FTP connection, you need to configure the FTP connection to the host machine before you create the session.
♦ Heterogeneous. You can output data to multiple targets in the same session. You can output to multiple relational targets, such as Oracle and Microsoft SQL Server. Or, you can output to multiple target types, such as relational and flat file. For more information, see “Working with Heterogeneous Targets” on page 274.

Globalization Features
You can configure the PowerCenter Server to run sessions in either ASCII or Unicode data movement mode. Table 9-1 describes target character sets supported by each data movement mode in PowerCenter:
Table 9-1. Support for ASCII and Unicode Data Movement Modes

Character Set | Unicode Mode | ASCII Mode
7-bit ASCII | Supported | Supported
8-bit ASCII | Supported | Supported
ASCII-based MBCS | Supported | PowerCenter Server generates a warning message, but does not terminate the session.
UTF-8 | Supported (targets only) | PowerCenter Server generates a warning message, but does not terminate the session.

PowerCenter allows you to work with targets that use multibyte character sets. You can choose a code page that you want the PowerCenter Server to use for relational objects and flat files. You specify code pages for relational objects when you configure database connections in the Workflow Manager. The code page for a database connection used as a target must be a superset of the repository code page. When you change the database connection code page to one that is not two-way compatible with the old code page, the Workflow Manager generates a warning and invalidates all sessions that use that database connection.


Code pages you select for a file represent the code page of the data contained in these files. If you are working with flat files, you can also specify delimiters and null characters supported by the code page you have specified for the file. Target code pages must be a superset of the repository code page. They must also be a superset of the source code page and the PowerCenter Server code page. However, if you configure the PowerCenter Server and Client for relaxed code page validation, you can select any code page supported by PowerCenter for the target database connection. When using relaxed code page validation, select compatible code pages for the source and target data to prevent data inconsistencies. For more information about code page compatibility, see “Globalization Overview” in the Installation and Configuration Guide. If the target contains multibyte character data, configure the PowerCenter Server to run in Unicode mode. When the PowerCenter Server runs a session in Unicode mode, it uses the database code page to translate data. If the target contains only single-byte characters, configure the PowerCenter Server to run in ASCII mode. When the PowerCenter Server runs a session in ASCII mode, it does not validate code pages.
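To make the superset requirement concrete, the sketch below checks whether every character defined in a single-byte source code page can be represented in a target code page. PowerCenter performs its own code page validation; here Python codec names (ascii, latin-1, utf-8) stand in for PowerCenter code pages, and the function name covers_code_page is hypothetical. The check only enumerates the 256 byte values of a single-byte source, so it does not handle multibyte source code pages.

```python
def covers_code_page(target: str, source: str) -> bool:
    """True if every character of a single-byte source code page encodes in target."""
    for value in range(256):
        try:
            char = bytes([value]).decode(source)
        except UnicodeDecodeError:
            continue  # byte value not defined in the source code page
        try:
            char.encode(target)
        except UnicodeEncodeError:
            return False  # the target cannot represent this source character
    return True


print(covers_code_page("utf-8", "latin-1"))  # True
print(covers_code_page("ascii", "latin-1"))  # False
```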

Target Connections
Before you can load data to a target, you must configure the connection properties the PowerCenter Server uses to connect to the target file or database. You can configure target database and FTP connections in the Workflow Manager. For details on creating database connections, see “Setting Up a Relational Database Connection” on page 53. For details on creating FTP connections, see “Using FTP” on page 559.

Partitioning Targets
When you create multiple partitions in a session with a relational target, the PowerCenter Server creates multiple connections to the target database to write target data concurrently. When you create multiple partitions in a session with a file target, the PowerCenter Server creates one target file for each partition. You can configure the session properties to merge these target files. For details on configuring a session for pipeline partitioning, see “Pipeline Partitioning” on page 345.

Permissions and Privileges
You must have execute permissions for connection objects associated with the session. For example, if the target requires database connections or FTP connections, you must have read permission on the connections to configure the session, and execute permission to run the session.


Configuring Targets in a Session
Configure target properties for sessions in the Transformations view on the Mapping tab of the session properties. Click the Targets node to view the target properties. When you configure target properties for a session, you define properties for each target instance in the mapping. Figure 9-1 shows where you define target properties in a session:
Figure 9-1. Defining Target Properties in the Session Properties

The Targets node contains the following settings where you define properties:
♦ Writers
♦ Connections
♦ Properties

Configuring Writers
Click the Writers settings in the Transformations view to define the writer to use with each target instance.


Figure 9-2 shows where you define the writer to use with each target instance:
Figure 9-2. Writers Settings on the Mapping Tab of the Session Properties

When the mapping target is a flat file, an XML file, an SAP BW target, or an IBM MQSeries target, the Workflow Manager specifies the necessary writer in the session properties. However, when the target in the mapping is relational, you can change the writer type to File Writer if you plan to use an external loader.
Note: You can change the writer type for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot change the writer type for instances of reusable sessions in the Workflow Designer.

When you override a relational target to use the file writer, the Workflow Manager changes the properties for that target instance on the Properties settings. It also changes the connection options you can define in the Connections settings. After you override a relational target to use a file writer, define the file properties for the target. Click Set File Properties and choose the target to define. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266.

Configuring Connections
View the Connections settings on the Mapping tab to define target connection information.


Figure 9-3 shows the Connections settings on the Mapping tab of the session properties:
Figure 9-3. Connections Settings on the Mapping Tab of the Session Properties

For relational targets, the Workflow Manager displays Relational as the target type by default. In the Value column, choose a configured database connection for each relational target instance. For details on configuring database connections, see “Target Database Connection” on page 241. For flat file and XML targets, choose one of the following target connection types in the Type column for each target instance:

♦ FTP. If you want to load data to a flat file or XML target using FTP, you must specify an FTP connection when you configure target options. FTP connections must be defined in the Workflow Manager before you configure sessions. You must have read permission for any FTP connection you want to associate with the session. The user starting the session must have execute permission for any FTP connection associated with the session. For details on using FTP, see “Using FTP” on page 559.
♦ Loader. You can use the external loader option to improve the load speed to Oracle, DB2, Sybase IQ, or Teradata target databases. To use this option, you must use a mapping with a relational target definition and choose File as the writer type on the Writers settings for the relational target instance. The PowerCenter Server uses an external loader to load target files to the Oracle, DB2, Sybase IQ, or Teradata database. You cannot choose external loader if the target is defined in the mapping as a flat file, XML, MQ, or SAP BW target. For details on using the external loader feature, see “External Loading” on page 523.
♦ Queue. Choose Queue when you want to output to an IBM MQSeries message queue. For details, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.
♦ None. Choose None when you want to write to a local flat file or XML file.

Configuring Properties
View the Properties settings on the Mapping tab to define target property information. The Workflow Manager displays different properties for the different target types: relational, flat file, and XML. Figure 9-4 shows the Properties settings on the Mapping tab:
Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties

For more information on relational target properties, see “Working with Relational Targets” on page 240. For more information on flat file target properties, see “Working with File Targets” on page 261. For more information on XML target properties, see “Working with Heterogeneous Targets” on page 274. For more information on configuring sessions with multiple target types, see “Working with Heterogeneous Targets” on page 274.


Working with Relational Targets
When you configure a session to load data to a relational target, you define most properties in the Transformations view on the Mapping tab. You also define some properties on the Properties tab and the Config Object tab. You can configure the following properties for relational targets:
♦ Target database connection. Define database connection information. For more information, see “Target Database Connection” on page 241.
♦ Target properties. You can define target properties such as target load type, target update options, and reject options. For more information, see “Target Properties” on page 241.
♦ Truncate target tables. The PowerCenter Server can truncate target tables before loading data. For more information, see “Truncating Target Tables” on page 245.
♦ Deadlock retry. You can configure the session to retry deadlocks when writing to targets. For more information, see “Deadlock Retry” on page 246.
♦ Drop and recreate indexes. Use pre- and post-session SQL to drop and recreate an index on a relational target table to optimize query speed. For more information, see “Dropping and Recreating Indexes” on page 248.
♦ Constraint-based loading. The PowerCenter Server can load data to targets based on primary key-foreign key constraints and active sources in the session mapping. For more information, see “Constraint-Based Loading” on page 248.
♦ Bulk loading. You can specify bulk mode when loading to DB2, Microsoft SQL Server, Oracle, and Sybase databases. For more information, see “Bulk Loading” on page 252.

You can define the following properties in the session and override the properties you define in the mapping:

♦ Table name prefix. You can specify the target owner name or prefix in the session properties to override the table name prefix in the mapping. For more information, see “Table Name Prefix” on page 254.
♦ Pre-session SQL. You can create SQL commands and execute them in the target database before loading data to the target. For example, you might want to drop the index for the target table before loading data into it. For more information, see “Using Pre- and Post-Session SQL Commands” on page 186.
♦ Post-session SQL. You can create SQL commands and execute them in the target database after loading data to the target. For example, you might want to recreate the index for the target table after loading data into it. For more information, see “Using Pre- and Post-Session SQL Commands” on page 186.

If any target table or column name contains a database reserved word, you can create and maintain a reserved words file containing database reserved words. When the PowerCenter Server executes SQL against the database, it places quotes around the reserved words. For more information, see “Reserved Words” on page 255.

When the PowerCenter Server runs a session with at least one relational target, it performs database transactions per target connection group. For example, it commits all data to targets in a target connection group at the same time. For more information, see “Working with Target Connection Groups” on page 257.

Target Database Connection
Before you can run a session to load data to a target database, the PowerCenter Server must connect to the target database. Database connections must exist in the repository to appear on the target database list. You must define them prior to configuring a session. For details on configuring a database connection, see “Configuring the Workflow Manager” on page 37. You can choose the target connections in the Transformations view of the Mapping tab. Click either the Targets or Connections node and select the database connection from the list for each target instance. You must have read permission for the target database connection to configure the session to use it. The user starting the configured session must have execute permission for target database connections.

Target Properties
You can configure session properties for relational targets in the Transformations view on the Mapping tab, and in the General Options settings on the Properties tab. Define the properties for each target instance in the session. When you click the Transformations view on the Mapping tab, you can view and configure the settings of a specific target. Select the target under the Targets node.


Figure 9-5 shows the relational target properties you define in the Properties settings on the Mapping tab:
Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target

Table 9-2 describes the properties available in the Properties settings on the Mapping tab of the session properties:
Table 9-2. Relational Target Properties

Target Property | Required/Optional | Description
Target Load Type | Required | You can choose Normal or Bulk. If you select Normal, the PowerCenter Server loads targets normally. You can only choose Bulk when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you specify Bulk for other database types, the PowerCenter Server reverts to a normal load. Note: Choose Normal mode if the mapping contains an Update Strategy transformation. For more information, see “Bulk Loading” on page 252.
Insert* | Optional | If selected, the PowerCenter Server inserts all rows flagged for insert. By default, this option is selected.
Update (as Update)* | Optional | If selected, the PowerCenter Server updates all rows flagged for update. By default, this option is selected.
Update (as Insert)* | Optional | If selected, the PowerCenter Server inserts all rows flagged for update. By default, this option is not selected.
Update (else Insert)* | Optional | If selected, the PowerCenter Server updates rows flagged for update if they exist in the target, then inserts any remaining rows marked for insert. By default, this option is not selected.
Delete* | Optional | If selected, the PowerCenter Server deletes all rows flagged for delete. By default, this option is selected.
Truncate Table | Optional | If selected, the PowerCenter Server truncates the target before loading. By default, this option is not selected. For details on this feature, see “Truncating Target Tables” on page 245.
Reject File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.
Reject Filename | Required | Enter the file name, or file name and path. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. Optionally use the $BadFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Reject File Directory field when it runs the session. For example, if you have “C:\reject_file\” in the Reject File Directory field, and enter “filename.bad” in the Reject Filename field, the PowerCenter Server writes rejected rows to C:\reject_file\filename.bad. For details on session parameters, see “Session Parameters” on page 495.

*For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.
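The reject file naming described in Table 9-2 can be sketched as two small functions. Both function names are hypothetical, and the sketch models only what the table states: the default file name is the target instance name with a .bad extension, and the two fields are concatenated. It assumes plain string concatenation with no separator inserted, which is why the example directory value ends with a backslash.

```python
def default_reject_filename(target_instance: str) -> str:
    """Default Reject Filename: the target instance name with a .bad extension."""
    return f"{target_instance}.bad"


def reject_file_path(reject_file_directory: str, reject_filename: str) -> str:
    """Concatenate the Reject File Directory and Reject Filename fields."""
    # This sketch inserts no separator, so the directory value carries its own
    # trailing backslash. If Reject Filename already holds a full path, the
    # directory field is expected to be cleared (empty string here).
    return reject_file_directory + reject_filename


print(reject_file_path("C:\\reject_file\\", "filename.bad"))
```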


Figure 9-6 shows the test load options in the General Options settings on the Properties tab:
Figure 9-6. Test Load Options

Table 9-3 describes the test load options on the General Options settings on the Properties tab:
Table 9-3. Test Load Options

Property | Required/Optional | Description
Enable Test Load | Optional | You can configure the PowerCenter Server to perform a test load. With a test load, the PowerCenter Server reads and transforms data without permanently writing it to targets. The PowerCenter Server generates all session files and performs all pre- and post-session functions, as if running the full session. The PowerCenter Server writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the PowerCenter Server does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. Note: You can perform a test load for relational targets only when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
Number of Rows to Test | Optional | Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the exact number you configure for the test load.
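For relational targets, a test load writes rows and then rolls them back. The sketch below illustrates that behavior with Python's built-in sqlite3 module as a stand-in database. It is not how the PowerCenter Server performs a test load; the function name run_test_load and the table t_orders are hypothetical.

```python
import sqlite3


def run_test_load(conn: sqlite3.Connection, table: str, rows: list, rows_to_test: int) -> int:
    """Write at most rows_to_test rows to a relational table, then roll them back."""
    sample = rows[:rows_to_test]  # read no more than the configured number of rows
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", sample)
    conn.rollback()  # nothing written during the test load is committed
    return len(sample)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_orders (order_id INTEGER, status TEXT)")
written = run_test_load(conn, "t_orders", [(1, "new"), (2, "new"), (3, "open")], 2)
remaining = conn.execute("SELECT COUNT(*) FROM t_orders").fetchone()[0]
print(written, remaining)  # 2 0
```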

Truncating Target Tables
The PowerCenter Server can truncate target tables before running a session. You can choose to truncate tables on a target-by-target basis. If you have more than one target instance, you only have to select the truncate target table option for one target instance. Depending on the target database and primary key-foreign key relationships in the session target, the PowerCenter Server might issue a delete or truncate command. Table 9-4 lists the commands that the PowerCenter Server issues for each database:
Table 9-4. PowerCenter Server Commands on Supported Databases

Target Database | Table contains a primary key referenced by a foreign key | Table does not contain a primary key referenced by a foreign key
DB2 | truncate table <table_name>* | truncate table <table_name>*
Informix | delete from <table_name> | delete from <table_name>
ODBC | delete from <table_name> | delete from <table_name>
Oracle | delete from <table_name> unrecoverable | truncate table <table_name>
Microsoft SQL Server | delete from <table_name> | truncate table <table_name>**
Sybase 11.x | truncate table <table_name> | truncate table <table_name>

*If you use a DB2 database on AS/400, the PowerCenter Server issues a clrpfm command.
**If you use the Microsoft SQL Server ODBC driver, the PowerCenter Server issues a delete statement.
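Table 9-4 is essentially a lookup keyed on the database and on whether the table's primary key is referenced by a foreign key. The sketch below encodes the table and its two footnotes as data; the dictionary and function names are hypothetical. For DB2 on AS/400 it returns only the command name clrpfm, because the table does not document that command's arguments.

```python
# Rows of Table 9-4: (command if the primary key is referenced by a foreign key,
#                     command otherwise)
TRUNCATE_COMMANDS = {
    "DB2": ("truncate table {t}", "truncate table {t}"),
    "Informix": ("delete from {t}", "delete from {t}"),
    "ODBC": ("delete from {t}", "delete from {t}"),
    "Oracle": ("delete from {t} unrecoverable", "truncate table {t}"),
    "Microsoft SQL Server": ("delete from {t}", "truncate table {t}"),
    "Sybase 11.x": ("truncate table {t}", "truncate table {t}"),
}


def truncate_command(database: str, table: str, pk_referenced_by_fk: bool,
                     db2_on_as400: bool = False,
                     sql_server_odbc_driver: bool = False) -> str:
    """Return the command Table 9-4 lists for this database and key situation."""
    if database == "DB2" and db2_on_as400:
        return "clrpfm"  # footnote *: the table names the command only
    referenced, unreferenced = TRUNCATE_COMMANDS[database]
    template = referenced if pk_referenced_by_fk else unreferenced
    if database == "Microsoft SQL Server" and sql_server_odbc_driver and not pk_referenced_by_fk:
        template = "delete from {t}"  # footnote **: ODBC driver issues a delete
    return template.format(t=table)


print(truncate_command("Oracle", "ORDERS", pk_referenced_by_fk=True))
```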

If the PowerCenter Server issues a truncate target table command and the target table instance specifies a table name prefix, the PowerCenter Server verifies the database user privileges for the target table by issuing a truncate command. If the database user is not specified as the target owner name or does not have the database privilege to truncate the target table, the PowerCenter Server automatically issues a delete command instead and writes the following error message to the session log:
WRT_8208 Error truncating target table <target table name> trying DELETE FROM query.
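The fallback described above (try a truncate, and if it fails, log WRT_8208 and issue a delete) can be sketched with sqlite3 as a stand-in database. In PowerCenter the truncate fails because of missing privileges or owner name; SQLite has no TRUNCATE statement at all, so in this sketch the first attempt always fails, which conveniently exercises the fallback path. The function name truncate_or_delete and the list used as a session log are hypothetical, and the table name is interpolated directly, so the sketch is not safe for untrusted input.

```python
import sqlite3


def truncate_or_delete(conn: sqlite3.Connection, table: str, session_log: list) -> None:
    """Try TRUNCATE first; on a database error, log WRT_8208 and use DELETE FROM."""
    try:
        conn.execute(f"TRUNCATE TABLE {table}")
    except sqlite3.DatabaseError:
        session_log.append(
            f"WRT_8208 Error truncating target table {table} trying DELETE FROM query."
        )
        conn.execute(f"DELETE FROM {table}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_sales (id INTEGER)")
conn.executemany("INSERT INTO t_sales VALUES (?)", [(1,), (2,), (3,)])
conn.commit()
session_log = []
truncate_or_delete(conn, "t_sales", session_log)
print(session_log[0])
```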

If the PowerCenter Server issues a delete command and the database has logging enabled, the database saves all deleted records to the log for rollback. If you do not want to save deleted records for rollback, you can disable logging to improve the speed of the delete. For all databases, if the PowerCenter Server fails to truncate or delete any selected table because the user lacks the necessary privileges, the session fails.

If you enable truncate target tables together with one of the following functions, the PowerCenter Server does not truncate target tables for the session:
♦ Incremental aggregation. When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.
♦ Test load. When you enable both truncate target tables and test load, the PowerCenter Server disables the truncate table function, runs a test load session, and writes the following message to the session log:
WRT_8105 Truncate target tables option turned off for test load session.

To truncate a target table:
1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab, and then click the Transformations view.
3. Click the Targets node.
4. In the Properties settings, select Truncate Target Table Option for each target table you want the PowerCenter Server to truncate before it runs the session.
5. Click OK.

Deadlock Retry
Select the Session Retry on Deadlock option in the session properties if you want the PowerCenter Server to retry target writes on a deadlock. A deadlock might occur when the PowerCenter Server attempts to take control of the same lock for a row when loading partitioned targets or when running two sessions simultaneously to the same target.


If the PowerCenter Server encounters a deadlock when it tries to write to a target, the deadlock only affects targets in the same target connection group. The PowerCenter Server still writes to targets in other target connection groups. Encountering deadlocks can slow session performance. To improve session performance, you can increase the number of target connection groups the PowerCenter Server uses to write to the targets in a session. To use a different target connection group for each target in a session, use a different database connection name for each target instance. If you want, you can specify the same connection information for each connection name. For more information, see “Working with Target Connection Groups” on page 257. You can only retry sessions on deadlock for targets configured for normal load. If you select this option and configure a target for bulk mode, the PowerCenter Server does not retry target writes on a deadlock for that target. You can also configure the PowerCenter Server to set the number of deadlock retries and the deadlock sleep time period. For more information on configuring the PowerCenter Server, see the Installation and Configuration Guide. To retry a session on deadlock, click the Properties tab in the session properties and then scroll down to the Performance settings. Figure 9-7 shows how to retry sessions on deadlock:
Figure 9-7. Session Retry on Deadlock
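The retry behavior described above (retry a fixed number of times, sleeping between attempts, and never retry targets configured for bulk mode) can be sketched as a small loop. The PowerCenter Server reads the retry count and sleep time from its own configuration; the parameters retries and sleep_seconds, the DeadlockError class, and the function name below are all hypothetical stand-ins.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the error a database raises on deadlock."""


def write_with_deadlock_retry(write, retries: int, sleep_seconds: float,
                              bulk_mode: bool = False, sleep=time.sleep):
    """Call write(); on a deadlock, sleep and retry up to `retries` times."""
    remaining = 0 if bulk_mode else retries  # bulk-mode targets are not retried
    while True:
        try:
            return write()
        except DeadlockError:
            if remaining == 0:
                raise  # out of retries: surface the deadlock
            remaining -= 1
            sleep(sleep_seconds)
```

Passing a different sleep function (for example, list.append) lets a test record the pauses instead of actually waiting.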

Dropping and Recreating Indexes
After you insert significant amounts of data into a target, you normally need to drop and recreate indexes on that table to optimize query speed. You can drop and recreate indexes by:

♦ Using pre- and post-session SQL. The preferred method for dropping and re-creating indexes is to define a SQL statement in the Pre SQL property that drops indexes before loading data to the target. You can use the Post SQL property to recreate the indexes after loading data to the target. Define the Pre SQL and Post SQL properties for relational targets in the Transformations view on the Mapping tab in the session properties. For more information, see “Using Pre- and Post-Session SQL Commands” on page 186.
♦ Using the Designer. The same dialog box you use to generate and execute DDL code for table creation can drop and recreate indexes. However, this process is not automatic. Every time you run a session that modifies the target table, you need to launch the Designer and use this feature.
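The Pre SQL/Post SQL pattern can be illustrated outside PowerCenter. The sketch below uses sqlite3 as a stand-in target database: it drops an index, loads rows, and then recreates the index. The table orders, the index idx_orders_customer, and both SQL strings are hypothetical examples of what you might enter in the Pre SQL and Post SQL properties; the exact DDL syntax depends on your target database.

```python
import sqlite3

# Hypothetical Pre SQL and Post SQL values for a target table named orders.
PRE_SQL = "DROP INDEX IF EXISTS idx_orders_customer"
POST_SQL = "CREATE INDEX idx_orders_customer ON orders (customer_id)"


def load_with_index_rebuild(conn: sqlite3.Connection, rows: list) -> None:
    conn.execute(PRE_SQL)  # Pre SQL: drop the index before any rows are loaded
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    conn.execute(POST_SQL)  # Post SQL: recreate the index after the load
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute(POST_SQL)  # the index already exists before the session runs
load_with_index_rebuild(conn, [(1, 10), (2, 20), (3, 10)])
```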

Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the PowerCenter Server orders the target load on a row-by-row basis. For every row generated by an active source, the PowerCenter Server loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
♦ Active source. Related target tables must have the same active source.
♦ Key relationships. Target tables must have key relationships.
♦ Target connection groups. Targets must be in one target connection group.
♦ Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source
When target tables receive rows from different active sources, the PowerCenter Server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the PowerCenter Server reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the PowerCenter Server performs constraint-based loading: loading the primary key table first, then the foreign key table. For more information on active sources, see “Working with Active Sources” on page 259.

Key Relationships
When target tables have no key relationships, the PowerCenter Server does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the PowerCenter Server reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The PowerCenter Server cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Groups
The PowerCenter Server enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the PowerCenter Server cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, perform the following tasks:
♦ Verify all targets are in the same target load order group and receive data from the same active source.
♦ Use the default partition properties and do not add partitions or partition points.
♦ Define the same target type for all targets in the session properties.
♦ Define the same database connection name for all targets in the session properties.
♦ Choose normal mode for the target load type for all targets in the session properties.

For more information, see “Working with Target Connection Groups” on page 257.
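The five tasks above can be read as a checklist. The sketch below encodes them against a hypothetical TargetConfig record and reports which items fail. It is not a PowerCenter API, and passing it only means these guidelines are met; the full definition of a target connection group is in “Working with Target Connection Groups” on page 257.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetConfig:
    """Hypothetical record of the session settings the checklist looks at."""
    name: str
    load_order_group: str
    active_source: str
    target_type: str
    connection_name: str
    load_type: str  # "Normal" or "Bulk"
    default_partitioning: bool


def connection_group_problems(targets: list[TargetConfig]) -> list[str]:
    """List the checklist items that the given targets do not satisfy."""
    problems = []
    # These settings must be identical across all targets.
    for field in ("load_order_group", "active_source", "target_type", "connection_name"):
        values = sorted({getattr(t, field) for t in targets})
        if len(values) > 1:
            problems.append(f"{field} differs: {values}")
    # These settings must hold for every target individually.
    for t in targets:
        if t.load_type != "Normal":
            problems.append(f"{t.name}: load type is {t.load_type}, not Normal")
        if not t.default_partitioning:
            problems.append(f"{t.name}: partitions or partition points were added")
    return problems
```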

Treat Rows as Insert
Use constraint-based loading only when the session option Treat Source Rows As is set to Insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading. When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:
♦ Load primary key table in one mapping and dependent tables in another mapping. You can use constraint-based loading to load the primary table.
♦ Perform inserts in one mapping and updates in another mapping.

For more information about update strategies, see “Update Strategy Transformation” in the Transformation Guide. Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the PowerCenter Server reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint-based loading establishes the order in which the PowerCenter Server loads individual targets within a set of targets receiving data from a single source qualifier.


Example
The session for the mapping in Figure 9-8 is configured to perform constraint-based loading. In the first pipeline, target T_1 has a primary key, T_2 and T_3 contain foreign keys referencing the T1 primary key. T_3 has a primary key that T_4 references as a foreign key. Since these four tables receive records from a single active source, SQ_A, the PowerCenter Server loads rows to the target in the following order:
♦ T_1
♦ T_2 and T_3 (in no particular order)
♦ T_4

The PowerCenter Server loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The PowerCenter Server then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The PowerCenter Server loads T_4 last, because it has a foreign key that references a primary key in T_3.
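The load order in this example amounts to a level-by-level topological sort of the key relationships. The following Python sketch illustrates that idea only; it is not the PowerCenter Server's algorithm. The function name load_levels and the dictionary of foreign key references are hypothetical, and the sketch orders whole tables, whereas the server orders the load row by row.

```python
from typing import Dict, Iterable, List, Set


def load_levels(tables: Iterable[str], fk_refs: Dict[str, Set[str]]) -> List[List[str]]:
    """Group targets into load levels: a table loads only after every
    table its foreign keys reference (a level-by-level topological sort)."""
    table_set = set(tables)
    # Keep only references to tables in this set of targets.
    pending = {t: set(fk_refs.get(t, set())) & table_set for t in table_set}
    levels: List[List[str]] = []
    while pending:
        # Tables with no unloaded parents can load now, in no particular order.
        ready = sorted(t for t, parents in pending.items() if not parents)
        if not ready:
            raise ValueError("circular key relationship among targets")
        levels.append(ready)
        for t in ready:
            del pending[t]
        for parents in pending.values():
            parents.difference_update(ready)
    return levels


# Key relationships from the first pipeline in Figure 9-8 (source SQ_A).
first_pipeline = load_levels(
    ["T_1", "T_2", "T_3", "T_4"],
    {"T_2": {"T_1"}, "T_3": {"T_1"}, "T_4": {"T_3"}},
)
print(first_pipeline)  # [['T_1'], ['T_2', 'T_3'], ['T_4']]
```

Tables in the same inner list have no dependencies on each other, which matches "T_2 and T_3 (in no particular order)". With no key relationships at all, every table lands in one level, which corresponds to a normal load.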
Figure 9-8. Mapping Using Constraint-Based Loading

After loading the first set of targets, the PowerCenter Server begins reading source B. If there are no key relationships between T_5 and T_6, the PowerCenter Server reverts to a normal load for both targets. T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS. If T_6 has a foreign key that references a primary key in T_5, the PowerCenter Server loads rows to the tables in the following order:
♦ T_5
♦ T_6


Chapter 9: Working with Targets

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The PowerCenter Server includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.
To enable constraint-based loading:

1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.

Bulk Loading
You can enable bulk loading when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you enable bulk loading for other database types, the PowerCenter Server reverts to a normal load. Bulk loading improves the performance of a session that inserts a large amount of data to the target database. Configure bulk loading on the Mapping tab. When bulk loading, the PowerCenter Server invokes the database bulk utility and bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. Therefore, you must weigh the importance of improved session performance against the ability to recover an incomplete session. For more information on increasing session performance when bulk loading, see “Bulk Loading” on page 642.
Note: When loading to DB2, Microsoft SQL Server, and Oracle targets, you must specify a normal load for data driven sessions. When you specify bulk mode for a data driven session, the PowerCenter Server reverts to normal load.


Committing Data
When bulk loading to Sybase and DB2 targets, the PowerCenter Server ignores the commit interval you define in the session properties and commits data when the writer block is full. When bulk loading to Microsoft SQL Server and Oracle targets, the PowerCenter Server commits data at each commit interval. Also, Microsoft SQL Server and Oracle start a new bulk load transaction after each commit.
Tip: When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to reduce the number of bulk load transactions and increase performance.
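The commit behavior described above can be approximated with a small model. The following Python sketch is an estimate under stated assumptions, not PowerCenter code: the function bulk_commit_count is hypothetical, and its rows_per_writer_block parameter is a simplification, since the real writer block is sized in bytes rather than rows.

```python
import math

# Databases whose bulk loads honor the session commit interval versus those
# that commit whenever the writer block fills (per the text above).
COMMIT_AT_INTERVAL = {"Microsoft SQL Server", "Oracle"}
COMMIT_AT_FULL_BLOCK = {"Sybase", "DB2"}


def bulk_commit_count(database: str, rows: int, commit_interval: int,
                      rows_per_writer_block: int) -> int:
    """Rough count of bulk load commits for a session (simplified model).

    rows_per_writer_block is a hypothetical simplification: the real
    writer block is sized in bytes, not rows.
    """
    if database in COMMIT_AT_INTERVAL:
        rows_per_commit = commit_interval
    elif database in COMMIT_AT_FULL_BLOCK:
        rows_per_commit = rows_per_writer_block  # commit interval is ignored
    else:
        raise ValueError(f"the server reverts to a normal load for {database}")
    return math.ceil(rows / rows_per_commit)


# A larger commit interval means fewer bulk load transactions on Oracle ...
print(bulk_commit_count("Oracle", 1_000_000, 10_000, 5_000))   # 100
print(bulk_commit_count("Oracle", 1_000_000, 100_000, 5_000))  # 10
# ... but it has no effect on DB2, which commits when the writer block is full.
print(bulk_commit_count("DB2", 1_000_000, 100_000, 5_000))     # 200
```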

Oracle Guidelines
Oracle allows bulk loading for the following software versions:
♦ Oracle server version 8.1.5 or higher
♦ Oracle client version 8.1.7.2 or higher

You can use the Oracle client 8.1.7 if you install the Oracle Threaded Bulk Mode patch. Use the following guidelines when bulk loading to Oracle:
♦ Do not define CHECK constraints in the database.
♦ Do not define primary and foreign keys in the database. However, you can define primary and foreign keys for the target definitions in the Designer.
♦ To bulk load into indexed tables, choose non-parallel mode. To do this, you must disable the Enable Parallel Mode option. For more information, see “Configuring a Relational Database Connection” on page 56. Note that when you disable parallel mode, you cannot load multiple target instances, partitions, or sessions into the same table. To bulk load in parallel mode, you must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, you can use pre- and post-session SQL to drop and rebuild indexes and key constraints.
♦ When you use the LONG datatype, verify it is the last column in the table.
♦ Specify the Table Name Prefix for the target when you use Oracle client 9i. If you do not specify the table name prefix, the PowerCenter Server uses the database login as the prefix.

For more information, see your Oracle documentation.
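When you drop and rebuild indexes with pre- and post-session SQL, the two statement lists have to stay in step. The following Python sketch generates both lists from one set of index definitions. The helper names and the index, table, and column names are hypothetical; the sketch covers indexes only (not key constraints), and you should check how your version separates multiple statements in the pre- and post-session SQL properties.

```python
from typing import List, NamedTuple, Tuple


class IndexDef(NamedTuple):
    name: str                 # index name (hypothetical example values below)
    table: str
    columns: Tuple[str, ...]


def index_session_sql(indexes: List[IndexDef]) -> Tuple[List[str], List[str]]:
    """Build DROP statements for pre-session SQL and matching CREATE
    statements for post-session SQL."""
    pre = [f"DROP INDEX {ix.name}" for ix in indexes]
    post = [
        f"CREATE INDEX {ix.name} ON {ix.table} ({', '.join(ix.columns)})"
        for ix in indexes
    ]
    return pre, post


pre_sql, post_sql = index_session_sql(
    [IndexDef("IDX_ORDERS_CUST", "ORDERS", ("CUSTOMER_ID",))]
)
print(pre_sql)   # ['DROP INDEX IDX_ORDERS_CUST']
print(post_sql)  # ['CREATE INDEX IDX_ORDERS_CUST ON ORDERS (CUSTOMER_ID)']
```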

DB2 Guidelines
Use the following guidelines when bulk loading to DB2:

♦ You must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, you can use pre- and post-session SQL to drop and rebuild indexes and key constraints.
♦ You cannot use source-based or user-defined commit when you run bulk load sessions on DB2.
♦ If you create multiple partitions for a DB2 bulk load session, you must use database partitioning for the target partition type. If you choose any other partition type, the PowerCenter Server reverts to normal load and writes the following message to the session log:

ODL_26097 Only database partitioning is support for DB2 bulk load. Changing target load type variable to Normal.

♦ When you bulk load to DB2, the DB2 database writes non-fatal errors and warnings to a message log file in the session log directory. The message log file name is <session_log_name>.<target_instance_name>.<partition_index>.log. You can check both the message log file and the session log when you troubleshoot a DB2 bulk load session.

For more information, see your DB2 documentation.
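To find the DB2 message log files while troubleshooting, you can build or match names that follow the pattern given above. The following Python sketch does both; the function names and the example session and target names are hypothetical, and the sketch assumes the session log name is used without its own file extension.

```python
from pathlib import Path
from typing import List


def db2_message_log_name(session_log_name: str, target_instance_name: str,
                         partition_index: int) -> str:
    """Build <session_log_name>.<target_instance_name>.<partition_index>.log."""
    return f"{session_log_name}.{target_instance_name}.{partition_index}.log"


def find_db2_message_logs(session_log_dir: str, session_log_name: str) -> List[Path]:
    """List every DB2 message log written for one session."""
    return sorted(Path(session_log_dir).glob(f"{session_log_name}.*.*.log"))


print(db2_message_log_name("s_m_load_orders", "T_ORDERS", 1))
# s_m_load_orders.T_ORDERS.1.log
```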

Table Name Prefix
The table name prefix is the owner of the target table. For some databases, such as DB2, tables can have different owners. If the database user specified in the database connection is not the owner of the target tables in a session, specify the table owner for each target instance. A session can fail if the database user is not the owner and you do not specify the table owner name. You can specify the table owner name in the target instance or in the session properties. When you specify the table owner name in the session properties, you override the table owner name in the transformation properties. For more information about specifying the table owner name in the mapping properties, see “Mappings” in the Designer Guide.
Note: When you specify the table owner name and you set the sqlid for a DB2 database in the environment SQL, the PowerCenter Server uses the table owner name in the target instance. To use the table owner name specified in the SET sqlid statement, do not enter a name in the Table Name Prefix field.
To specify the target owner name or prefix at the session level:

1. In the Workflow Manager, open the session properties and click the Transformations view on the Mapping tab.
2. Select the target instance under the Targets node.
3. In the Properties settings, enter the table owner name or prefix in the Table Name Prefix field, and click OK.
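The precedence rule for the table owner name can be summarized in a few lines. The following Python sketch is illustrative only: qualified_table_name and the owner names are hypothetical, and leaving the name unqualified when no prefix is set is an assumption. It does not model the Oracle 9i case, where the server uses the database login as the prefix, or the DB2 SET sqlid case.

```python
from typing import Optional


def qualified_table_name(table: str,
                         session_prefix: Optional[str] = None,
                         instance_prefix: Optional[str] = None) -> str:
    """Return the owner-qualified table name a session would target.

    A Table Name Prefix set in the session properties overrides the one set
    on the target instance in the mapping. With no prefix, this sketch leaves
    the name unqualified (an assumption).
    """
    prefix = session_prefix or instance_prefix
    return f"{prefix}.{table}" if prefix else table


print(qualified_table_name("ORDERS", session_prefix="SALES", instance_prefix="DW"))  # SALES.ORDERS
print(qualified_table_name("ORDERS", instance_prefix="DW"))                           # DW.ORDERS
```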

Reserved Words
If any table name or column name contains a database reserved word, such as MONTH or YEAR, the session fails with database errors when the PowerCenter Server executes SQL against the database. You can create and maintain a reserved words file, reswords.txt, in the PowerCenter Server installation directory. When the PowerCenter Server initializes a session, it searches for reswords.txt. If the file exists, the PowerCenter Server places quotes around matching reserved words when it executes SQL against the database. Use the following rules and guidelines when working with reserved words.
♦ The PowerCenter Server searches the reserved words file when it generates SQL to connect to source, target, and lookup databases.
♦ If you override the SQL for a source, target, or lookup, you must enclose any reserved word in quotes.
♦ You may need to enable some databases, such as Microsoft SQL Server and Sybase, to use SQL-92 standards regarding quoted identifiers. You can use environment SQL to issue the command. For example, with Microsoft SQL Server, you can use the following command:

SET QUOTED_IDENTIFIER ON


Sample reswords.txt File
To use a reserved words file, create a file named reswords.txt and place it in the PowerCenter Server installation directory. Create a section for each database that you need to store reserved words for. Add reserved words used in any table or column name. You do not need to store all reserved words for a database in this file. Database names and reserved words in reswords.txt are not case sensitive. Following is a sample reswords.txt file:
[Teradata]
MONTH
DATE
INTERVAL
[Oracle]
OPTION
START
[DB2]
[SQL Server]
CURRENT
[Informix]
[ODBC]
MONTH
[Sybase]
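The lookup the PowerCenter Server performs against this file can be pictured as a case-insensitive match per database section. The following Python sketch parses a file in the sample format and quotes matching identifiers. It is an illustration, not the server's code: the function names are hypothetical, and wrapping identifiers in double quotes (SQL-92 quoted identifiers) is an assumption about which quote character is used.

```python
from typing import Dict, Optional, Set

SAMPLE = """\
[Teradata]
MONTH
DATE
INTERVAL
[Oracle]
OPTION
START
[DB2]
[SQL Server]
CURRENT
[Informix]
[ODBC]
MONTH
[Sybase]
"""


def parse_reswords(text: str) -> Dict[str, Set[str]]:
    """Map each [Database] section to its reserved words, upper-cased
    because names and words in reswords.txt are not case sensitive."""
    words: Dict[str, Set[str]] = {}
    section: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()
            words.setdefault(section, set())
        elif section is not None:
            words[section].add(line.upper())
    return words


def quote_if_reserved(identifier: str, database: str, words: Dict[str, Set[str]]) -> str:
    """Wrap an identifier in double quotes when it is listed for the database."""
    if identifier.upper() in words.get(database.upper(), set()):
        return f'"{identifier}"'
    return identifier


reserved = parse_reswords(SAMPLE)
print(quote_if_reserved("Month", "teradata", reserved))   # "Month"
print(quote_if_reserved("AMOUNT", "Teradata", reserved))  # AMOUNT
```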


Working with Target Connection Groups
When you create a session with at least one relational target, SAP BW target, or dynamic MQSeries target, you need to consider target connection groups. A target connection group is a group of targets that the PowerCenter Server uses to determine commits and loading. When the PowerCenter Server performs a database transaction, such as a commit, it performs the transaction to all targets in a target connection group. The PowerCenter Server performs the following database transactions per target connection group:

♦ Deadlock retry. If the PowerCenter Server encounters a deadlock when it writes to a target, the deadlock only affects targets in the same target connection group. The PowerCenter Server still writes to targets in other target connection groups. For more information, see “Deadlock Retry” on page 246.
♦ Constraint-based loading. The PowerCenter Server enforces constraint-based loading for targets in a target connection group. If you want to specify constraint-based loading, you must verify the primary table and foreign table are in the same target connection group. For more information, see “Constraint-Based Loading” on page 248.

Targets in the same target connection group meet the following criteria:

♦ Belong to the same partition.
♦ Belong to the same target load order group.
♦ Have the same target type in the session.
♦ Have the same database connection name for relational targets, and Application connection name for SAP BW targets. For more information, see the PowerCenter Connect for SAP BW User and Administrator Guide.
♦ Have the same target load type, either normal or bulk mode.

For example, suppose you create a session based on a mapping that reads data from one source and writes to two Oracle target tables. In the Workflow Manager, you do not create multiple partitions in the session. You use the same Oracle database connection for both target tables in the session properties. You specify normal mode for the target load type for both target tables in the session properties. The targets in the session belong to the same target connection group. Suppose you create a session based on the same mapping. In the Workflow Manager, you do not create multiple partitions. However, you use one Oracle database connection name for one target, and you use a different Oracle database connection name for the other target. You specify normal mode for the target load type for both target tables. The targets in the session belong to different target connection groups.
Note: When you define the target database connections for multiple targets in a session using session parameters, the targets may or may not belong to the same target connection group. The targets belong to the same target connection group if all session parameters resolve to the same target connection name. For example, you create a session with two targets and specify the session parameter $DBConnection1 for one target, and $DBConnection2 for the other target. In the parameter file, you define $DBConnection1 as Sales1 and you define $DBConnection2 as Sales1 and run the workflow. Both targets in the session belong to the same target connection group.
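The grouping criteria can be read as a composite key: targets whose keys match share a target connection group. The following Python sketch models that idea for relational targets, including resolution of session parameters such as $DBConnection1. The class, function names, and target names are hypothetical, and SAP BW Application connections are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SessionTarget:
    name: str
    partition: int
    load_order_group: int
    target_type: str      # for example "Oracle"
    connection: str       # connection name or a $DBConnection session parameter
    load_type: str        # "normal" or "bulk"


def resolve_connection(connection: str, parameters: Dict[str, str]) -> str:
    """Replace a session parameter with its value from the parameter file."""
    return parameters.get(connection, connection) if connection.startswith("$") else connection


def target_connection_groups(targets: List[SessionTarget],
                             parameters: Dict[str, str]) -> List[List[str]]:
    """Group targets that match on every target connection group criterion."""
    groups: Dict[tuple, List[str]] = {}
    for t in targets:
        key = (t.partition, t.load_order_group, t.target_type,
               resolve_connection(t.connection, parameters), t.load_type)
        groups.setdefault(key, []).append(t.name)
    return list(groups.values())


targets = [
    SessionTarget("T_A", 1, 1, "Oracle", "$DBConnection1", "normal"),
    SessionTarget("T_B", 1, 1, "Oracle", "$DBConnection2", "normal"),
]
print(target_connection_groups(targets, {"$DBConnection1": "Sales1", "$DBConnection2": "Sales1"}))
# [['T_A', 'T_B']]
```

If the parameter file instead resolved $DBConnection2 to a different connection name, the same call would return two groups.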


Working with Active Sources
An active source is an active transformation the PowerCenter Server uses to generate rows. An active source can be any of the following transformations:
♦ Aggregator
♦ Application Source Qualifier
♦ Custom, configured as an active transformation
♦ Joiner
♦ MQ Source Qualifier
♦ Normalizer (VSAM or pipeline)
♦ Rank
♦ Sorter
♦ Source Qualifier
♦ XML Source Qualifier
♦ Mapplet, if it contains any of the above transformations

Note: Although the Filter, Router, Transaction Control, and Update Strategy transformations are active transformations, the PowerCenter Server does not use them as active sources in a pipeline.

Active sources affect how the PowerCenter Server processes a session when you use any of the following transformations or session properties:

♦ XML targets. The PowerCenter Server can load data from different active sources to an XML target when each input group receives data from one active source. For more information on XML targets, see “Working with XML Targets” in the XML User Guide.
♦ Transaction generators. Transaction generators, such as Transaction Control transformations, become ineffective for downstream transformations or targets if you put a transaction control point after them. Transaction control points are transaction generators and active sources that generate commits. For more information on effective and ineffective transaction generators, see “Transaction Control Transformation” in the Transformation Guide. For a list of transaction control points, see “Transformation Scope” on page 287.
♦ Mapplets. An Input transformation must receive data from a single active source. For more information on connecting mapplets to active sources in mappings, see “Mapplets” in the Designer Guide.
♦ Source-based commit. Some active sources generate commits. When you run a source-based commit session, the PowerCenter Server generates a commit from these active sources at every commit interval. For more information on source-based commit sessions, see “Source-Based Commits” on page 278.
♦ Constraint-based loading. To use constraint-based loading, you must connect all related targets to the same active source. The PowerCenter Server orders the target load on a row-by-row basis based on rows generated by an active source. For more information on constraint-based loading, see “Constraint-Based Loading” on page 248.
♦ Row error logging. If an error occurs downstream from an active source that is not a source qualifier, the PowerCenter Server cannot identify the source row information for the logged error row. For more information on logging errors, see “Overview” on page 482.
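To check which active source feeds a target, you can walk upstream from the target until you reach a transformation that counts as an active source, passing through transformations such as Filter that are active but not active sources. The following Python sketch shows that walk on a hypothetical mapping described as a dictionary. It assumes each transformation has a single upstream link and leaves out Custom and Mapplet, whose status depends on their configuration or contents.

```python
from typing import Dict, Optional, Tuple

# Transformation types treated as active sources in this sketch. Custom
# (when active) and Mapplet (when it contains one) are left out for brevity.
ACTIVE_SOURCE_TYPES = {
    "Aggregator", "Application Source Qualifier", "Joiner", "MQ Source Qualifier",
    "Normalizer", "Rank", "Sorter", "Source Qualifier", "XML Source Qualifier",
}

# Hypothetical mapping model: name -> (transformation type, upstream name or None).
Mapping = Dict[str, Tuple[str, Optional[str]]]


def nearest_active_source(mapping: Mapping, node: str) -> Optional[str]:
    """Walk upstream from a target until the first active source."""
    upstream = mapping[node][1]
    while upstream is not None:
        kind, next_upstream = mapping[upstream]
        if kind in ACTIVE_SOURCE_TYPES:
            return upstream
        upstream = next_upstream  # e.g. a Filter: active, but not an active source
    return None


mapping: Mapping = {
    "SQ_A": ("Source Qualifier", None),
    "FIL_A": ("Filter", "SQ_A"),
    "T_1": ("Target", "FIL_A"),
    "T_2": ("Target", "SQ_A"),
    "SQ_B": ("Source Qualifier", None),
    "AGGTRANS": ("Aggregator", "SQ_B"),
    "T_5": ("Target", "AGGTRANS"),
}
print(nearest_active_source(mapping, "T_1"))  # SQ_A
print(nearest_active_source(mapping, "T_5"))  # AGGTRANS
```

Related targets qualify for constraint-based loading only if this walk returns the same active source for each of them.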


Working with File Targets
You can output data to a flat file in either of the following ways:

♦ Use a flat file target definition. Create a mapping with a flat file target definition. Create a session using the flat file target definition. When the PowerCenter Server runs the session, it creates the target flat file based on the flat file target definition.
♦ Use a relational target definition. Use a relational definition to write to a flat file when you want to use an external loader to load the target. Create a mapping with a relational target definition. Create a session using the relational target definition. Configure the session to output to a flat file by specifying the File Writer in the Writers settings on the Mapping tab. For details on using the external loader feature, see “External Loading” on page 523.

You can configure the following properties for flat file targets:

♦ Target properties. You can define target properties such as partitioning options, output file options, and reject options. For more information, see “Configuring Target Properties” on page 261.
♦ Flat file properties. You can choose to create delimited or fixed-width files, and define their properties. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266.

Configuring Target Properties
You can configure session properties for flat file targets in the Properties settings on the Mapping tab, and in the General Options settings on the Properties tab. Define the properties for each target instance in the session.


Figure 9-9 shows the flat file target properties you define in the Properties settings on the Mapping tab in the session properties:
Figure 9-9. Properties Settings on the Mapping Tab for a Flat File Target


Table 9-5 describes the properties you define in the Properties settings for flat file target definitions:
Table 9-5. Flat File Target Properties

Merge Partitioned Files (Optional). When selected, the PowerCenter Server merges the partitioned target files into one file when the session completes, and then deletes the individual output files. If the PowerCenter Server fails to create the merged file, it does not delete the individual output files. You cannot merge files if the session uses FTP, an external loader, or a message queue. For details on configuring a session for partitioning, see “Pipeline Partitioning” on page 345.

Merge File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes the merged file in the server variable directory, $PMTargetFileDir. If you enter a full directory and file name in the Merge File Name field, clear this field.

Merge File Name (Optional). Name of the merge file. Default is target_name.out. This property is required if you select Merge Partitioned Files.

Output File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes output files in the server variable directory, $PMTargetFileDir. If you specify both the directory and file name in the Output Filename field, clear this field. The PowerCenter Server concatenates this field with the Output Filename field when it runs the session. You can also use the $OutputFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.

Output Filename (Required). Enter the file name, or file name and path. By default, the Workflow Manager names the target file based on the target definition used in the mapping: target_name.out. If the target definition contains a slash character, the Workflow Manager replaces the slash character with an underscore. When you use an external loader to load to an Oracle database, you must specify a file extension. If you do not specify a file extension, the Oracle loader cannot find the flat file and the PowerCenter Server fails the session. For more information about external loading, see “Loading to Oracle” on page 533. Optionally use the $OutputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Output File Directory field when it runs the session. For details on session parameters, see “Session Parameters” on page 495.
Note: If you specify an absolute path file name when using FTP, the PowerCenter Server ignores the Default Remote Directory specified in the FTP connection. When you specify an absolute path file name, do not use single or double quotes.

Reject File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.

Reject Filename (Required). Enter the file name, or file name and path. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. Optionally use the $BadFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Reject File Directory field when it runs the session. For example, if you have “C:\reject_file\” in the Reject File Directory field, and enter “filename.bad” in the Reject Filename field, the PowerCenter Server writes rejected rows to C:\reject_file\filename.bad. For details on session parameters, see “Session Parameters” on page 495.

Set File Properties Link (Optional). Opens a dialog box that allows you to define flat file properties. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266. When you output to a flat file using a relational target definition in the mapping, make sure you define the flat file properties by clicking the Set File Properties link.
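The file naming rules in Table 9-5 can be expressed as a few string operations. The following Python sketch is an illustration with hypothetical function names. It assumes the "concatenation" of the directory and file name fields is plain string concatenation, so the directory value needs its own trailing separator, as in the C:\reject_file\ example.

```python
def default_output_filename(target_definition_name: str) -> str:
    """target_name.out, with any slash in the name replaced by an underscore."""
    return target_definition_name.replace("/", "_") + ".out"


def default_reject_filename(target_instance_name: str) -> str:
    """target_name.bad, named after the target instance."""
    return target_instance_name + ".bad"


def session_file_path(directory: str, filename: str) -> str:
    """Concatenate the directory field and the file name field.

    Assumed to be plain concatenation, so the directory value carries
    its own trailing separator.
    """
    return directory + filename


print(session_file_path("C:\\reject_file\\", "filename.bad"))  # C:\reject_file\filename.bad
print(default_output_filename("ORDERS/2004"))                  # ORDERS_2004.out
```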


Figure 9-10 shows the test load options in the General Options settings on the Properties tab:
Figure 9-10. Test Load Options


Table 9-6 describes the test load options in the General Options settings on the Properties tab:
Table 9-6. Test Load Options

Enable Test Load (Optional). You can configure the PowerCenter Server to perform a test load. With a test load, the PowerCenter Server reads and transforms data without writing to targets. The PowerCenter Server generates all session files and performs all pre- and post-session functions, as if running the full session. The PowerCenter Server writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the PowerCenter Server does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources.
Note: You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.

Number of Rows to Test (Optional). Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the number of rows you configure for the test load.


Configuring Fixed-Width Properties
When you output data to a fixed-width file, you can edit file properties in the session properties, such as the null character or code page. You can configure fixed-width properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure fixed-width properties for instances of reusable sessions in the Workflow Designer. In the Transformations view on the Mapping tab, click the Targets node and then click Set File Properties to open the Flat Files dialog box. Figure 9-11 shows the Flat Files dialog box:
Figure 9-11. Flat Files Dialog Box

To edit the fixed-width properties, select Fixed Width and click Advanced. Figure 9-12 shows the Fixed Width Properties dialog box:
Figure 9-12. Fixed Width Properties Dialog Box


Table 9-7 describes the options you define in the Fixed Width Properties dialog box:
Table 9-7. Writing to a Fixed-Width Target

Null Character (Required). Enter the character you want the PowerCenter Server to use to represent null values. You can enter any valid character in the file code page. For more information about using null characters for target files, see “Null Characters in Fixed-Width Files” on page 272.

Repeat Null Character (Optional). Select this option to indicate a null value by repeating the null character to fill the field. If you do not select this option, the PowerCenter Server enters a single null character at the beginning of the field to represent a null value. For more information about specifying null characters for target files, see “Null Characters in Fixed-Width Files” on page 272.

Code Page (Required). Select the code page of the fixed-width file. The default setting is the client code page.
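The difference between the two null settings is easiest to see on a single field. The following Python sketch covers single-byte null characters only. The function name and the asterisk null character are hypothetical, and padding the rest of a non-repeating field with spaces is an assumption; the table above only says the single null character starts the field.

```python
def fixed_width_null(width: int, null_char: str = "*", repeat: bool = False) -> str:
    """Represent a null in a single-byte fixed-width field.

    repeat=True fills the whole field with the null character. Otherwise a
    single null character starts the field; padding the rest with spaces is
    an assumption of this sketch.
    """
    if len(null_char) != 1:
        raise ValueError("null character must be a single character")
    if repeat:
        return null_char * width
    return null_char + " " * (width - 1)


print(repr(fixed_width_null(5, "*", repeat=True)))   # '*****'
print(repr(fixed_width_null(5, "*", repeat=False)))  # '*    '
```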

Configuring Delimited Properties
When you output data to a delimited file, you can edit file properties in the session properties, such as the delimiter or code page. You can configure delimited properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure delimited properties for instances of reusable sessions in the Workflow Designer. In the Transformations view on the Mapping tab, click the Targets node and then click Set File Properties to open the Flat Files dialog box. Figure 9-13 shows the Flat Files dialog box:
Figure 9-13. Flat Files Dialog Box

To edit the delimited properties, select Delimited and click Advanced.


Figure 9-14 shows the Delimited File Properties dialog box:
Figure 9-14. Delimited File Properties Dialog Box

Table 9-8 describes the options you can define in the Delimited File Properties dialog box:
Table 9-8. Delimited File Properties

Delimiters (Required). Character used to separate columns of data. Use the button to the right of this field to enter a non-printable delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters.

Optional Quotes (Required). Select None, Single, or Double. If you select a quote character, the PowerCenter Server does not treat delimiter characters within the quote characters as a delimiter. For example, suppose an output file uses a comma as a delimiter and the PowerCenter Server receives the following row: 3423849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the PowerCenter Server ignores the commas within the quotes and writes the row as four fields. If you do not select the optional single quote, the PowerCenter Server writes six separate fields.

Code Page (Required). Select the code page of the delimited file. The default setting is the client code page.
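The four-fields versus six-fields example for Optional Quotes can be reproduced with Python's standard csv module, used here only as an analogy for how a quote character changes field boundaries; it is not the PowerCenter Server's writer. The helper name split_fields is hypothetical.

```python
import csv
from typing import List

ROW = "3423849, 'Smith, Jenna', 'Rockville, MD', 6"


def split_fields(line: str, single_quote: bool) -> List[str]:
    """Split one comma-delimited line, honoring single quotes only if asked."""
    if single_quote:
        reader = csv.reader([line], delimiter=",", quotechar="'",
                            skipinitialspace=True)
    else:
        # QUOTE_NONE: quote characters get no special treatment.
        reader = csv.reader([line], delimiter=",", quoting=csv.QUOTE_NONE,
                            skipinitialspace=True)
    return next(reader)


print(split_fields(ROW, single_quote=True))
# ['3423849', 'Smith, Jenna', 'Rockville, MD', '6']
print(len(split_fields(ROW, single_quote=False)))  # 6
```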


Server Handling for File Targets
When you configure a session to write to file targets, you need to know how the PowerCenter Server loads data. In the mapping, you must correctly configure your flat file target definitions and the relational target definitions you use to write to flat files. The PowerCenter Server loads data to flat files based on the following criteria:
♦ Writing to fixed-width flat files from relational target definitions. The PowerCenter Server adds spaces to target columns based on transformation datatype.
♦ Writing to fixed-width flat files from flat file target definitions. You must configure the precision and field width for flat file target definitions to accommodate the total length of the target field.
♦ Writing multibyte data to fixed-width files. You must configure the precision of string columns to accommodate character data. When writing shift-sensitive data to a fixed-width flat file target, the PowerCenter Server adds shift characters and spaces to meet file requirements.
♦ Null characters in fixed-width files. The PowerCenter Server writes repeating or non-repeating null characters to fixed-width target file columns differently depending on whether the characters are single- or multibyte.
♦ Character set. You can write ASCII or Unicode data to a flat file target.
♦ Writing metadata to flat file targets. You can configure the PowerCenter Server to write the column header information when you write to flat file targets.

Writing to Fixed-Width Flat Files with Relational Target Definitions
When you want to output to a fixed-width file based on a relational target definition in the mapping, consider how the PowerCenter Server handles spacing in the target file. When the PowerCenter Server writes to a fixed-width flat file based on a relational target definition in the mapping, it adds spaces to columns based on the transformation datatype connected to the target. This allows the PowerCenter Server to write optional symbols necessary for the datatype, such as a negative sign or decimal point, without sending the row to the reject file. For example, you connect a transformation Integer(10) port to a Number(10) column in a relational target definition. In the session properties, you override the relational target definition to use the File Writer and you specify to output a fixed-width flat file. In the target flat file, the PowerCenter Server appends an additional byte to the Number(10) column to allow for negative signs that might be associated with Integer data.


Table 9-9 describes the number of bytes the PowerCenter Server adds to the target column and optional characters it uses for each datatype:
Table 9-9. Datatype Modifications for File Target Columns

Each entry lists the transformation datatype connected to the fixed-width flat file target column, the bytes added by the PowerCenter Server, and the optional characters for the datatype.

Decimal (2 bytes). Negative sign (-) for the mantissa. Decimal point (.).
Double (7 bytes). Negative sign for the mantissa. Decimal point. Negative sign, e, and three digits for the exponent, for example, -4.2-e123.
Float (7 bytes). Negative sign for the mantissa. Decimal point. Negative sign, e, and three digits for the exponent.
Integer (1 byte). Negative sign for the mantissa.
Money (2 bytes). Negative sign for the mantissa. Decimal point.
Numeric (2 bytes). Negative sign for the mantissa. Decimal point.
Real (7 bytes). Negative sign for the mantissa. Decimal point. Negative sign, e, and three digits for the exponent.
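Table 9-9 amounts to a lookup of extra bytes per datatype. The following Python sketch, with a hypothetical function name, computes the width of the file column the PowerCenter Server writes from a relational target column, using the Integer(10) to Number(10) example from the text.

```python
# Bytes the PowerCenter Server adds per transformation datatype (Table 9-9).
BYTES_ADDED = {
    "Decimal": 2, "Double": 7, "Float": 7, "Integer": 1,
    "Money": 2, "Numeric": 2, "Real": 7,
}


def file_column_width(datatype: str, target_column_width: int) -> int:
    """Width of the fixed-width file column written from a relational target column."""
    return target_column_width + BYTES_ADDED[datatype]


# Integer(10) port connected to a Number(10) target column.
print(file_column_width("Integer", 10))  # 11
```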

Writing to Fixed-Width Files with Flat File Target Definitions
When you want to output to a fixed-width flat file based on a flat file target definition, you must configure precision and field width for the target field to accommodate the total length of the target field. If the data for a target field is too long for the total length of the field, the PowerCenter Server performs one of the following actions:
♦ Truncates the row for string columns.
♦ Writes the row to the reject file for numeric and datetime columns.

Note: When the PowerCenter Server writes a row to the reject file, it writes a message in the session log.

When a session writes to a fixed-width flat file based on a fixed-width flat file target definition in the mapping, the PowerCenter Server defines the total length of a field by the precision or field width defined in the target. Fixed-width files are byte-oriented, which means the total length of a field is measured in bytes.
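The overflow rules above (truncate strings, reject numeric and datetime rows) can be sketched per field. In the following Python sketch, the function and exception names are hypothetical, and left-justifying with space padding is an assumption. Because the field is measured in bytes, truncation happens on encoded bytes; the default single-byte encoding avoids splitting a multibyte character.

```python
class RejectedRow(Exception):
    """Raised when a numeric or datetime value does not fit its field."""


def fit_fixed_width(value: str, kind: str, total_length: int,
                    encoding: str = "latin-1") -> bytes:
    """Fit a formatted value into a byte-oriented fixed-width field.

    Too-long string values are truncated; too-long numeric and datetime
    values reject the row. Left-justifying and padding with spaces is an
    assumption of this sketch.
    """
    data = value.encode(encoding)
    if len(data) > total_length:
        if kind == "string":
            data = data[:total_length]
        else:
            raise RejectedRow(f"{kind} value {value!r} exceeds {total_length} bytes")
    return data.ljust(total_length, b" ")


print(fit_fixed_width("Rockville, MD", "string", 9))  # b'Rockville'
print(fit_fixed_width("42", "number", 5))              # b'42   '
```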


Table 9-10 describes how the PowerCenter Server measures the total field length for fields in a fixed-width flat file target definition:

Table 9-10. Field Length Measurements for Fixed-Width Flat File Targets

| Datatype | Target Field Property That Determines Total Field Length |
|---|---|
| Number | Field width |
| String | Precision |
| Datetime | Field width |

Table 9-11 lists the characters you must include when you configure the precision or field width for flat file target definitions to accommodate the total length of the target field:

Table 9-11. Characters to Include when Calculating Field Length for Fixed-Width Targets

| Datatype | Characters to Accommodate |
|---|---|
| Number | Decimal separator. Thousands separators. Negative sign (-) for the mantissa. |
| String | Multibyte data. Shift-in and shift-out characters. For more information, see “Writing Multibyte Data to Fixed-Width Flat Files” on page 270. |
| Datetime | Date and time separators, such as slashes (/), dashes (-), and colons (:). For example, the format MM/DD/YYYY HH24:MI:SS has a total length of 19 bytes. |

When you edit the flat file target definition in the mapping, define the precision or field width large enough to accommodate both the target data and the characters in Table 9-11. For example, suppose you have a mapping with a fixed-width flat file target definition. The target definition contains a number column with a precision of 10 and a scale of 2. You use a comma as the decimal separator and a period as the thousands separator. You know some rows of data might have a negative value. Based on this information, you know the longest possible number is formatted with the following format:
-NN.NNN.NNN,NN

Open the flat file target definition in the mapping and define the field width for this number column as a minimum of 14 bytes. For more information on formatting numeric and datetime values, see “Working with Flat Files” in the Designer Guide.
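The arithmetic in this example can be checked with a short Python sketch. The hypothetical helper builds the longest possible number pattern from the precision, scale, and separators. The last lines measure the MM/DD/YYYY HH24:MI:SS datetime format from Table 9-11 using the equivalent strftime codes. The sketch assumes single-byte separators, so the character count equals the byte count.

```python
from datetime import datetime

def longest_number_pattern(precision: int, scale: int, decimal_sep: str = ".",
                           thousands_sep: str = "", allow_negative: bool = True) -> str:
    """Return the widest value a number column can hold, with N for each digit."""
    digits = "N" * (precision - scale)  # digits left of the decimal separator
    if thousands_sep:
        groups = []
        while len(digits) > 3:          # group by threes, counting from the right
            groups.insert(0, digits[-3:])
            digits = digits[:-3]
        groups.insert(0, digits)
        digits = thousands_sep.join(groups)
    pattern = digits
    if scale > 0:
        pattern += decimal_sep + "N" * scale
    if allow_negative:
        pattern = "-" + pattern         # negative sign for the mantissa
    return pattern

if __name__ == "__main__":
    number = longest_number_pattern(10, 2, decimal_sep=",", thousands_sep=".")
    print(number, len(number))          # -NN.NNN.NNN,NN 14
    # MM/DD/YYYY HH24:MI:SS written with Python's strftime codes
    stamp = datetime(2004, 8, 31, 23, 59, 59).strftime("%m/%d/%Y %H:%M:%S")
    print(stamp, len(stamp))            # 08/31/2004 23:59:59 19
```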

Writing Multibyte Data to Fixed-Width Flat Files
If you plan to load multibyte data into a fixed-width flat file, configure the precision to accommodate the multibyte data. Fixed-width files are byte-oriented, not character-oriented. So, when you configure the precision for a fixed-width target, you need to consider the number of bytes you load into the target, rather than the number of characters.

For string columns, the PowerCenter Server truncates the data if the precision is not large enough to accommodate the multibyte data. You might work with the following types of multibyte data:

Non-shift-sensitive multibyte data. The file contains all multibyte data. Configure the precision in the target definition to allow for the additional bytes. For example, you know that the target data contains four double-byte characters, so you define the target definition with a precision of 8 bytes. If you configure the target definition with a precision of 4, the PowerCenter Server truncates the data before writing to the target.

Shift-sensitive multibyte data. The file contains single-byte and multibyte data. When writing to a shift-sensitive flat file target, the PowerCenter Server adds shift characters and spaces to meet file requirements. You must configure the precision in the target definition to allow for the additional bytes and the shift characters. For more information, see “Writing Shift-Sensitive Multibyte Data” on page 271.

Note: Delimited files are character-oriented, and you do not need to allow for additional precision for multibyte data.
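The following Python sketch combines the overflow rules from “Writing to Fixed-Width Files with Flat File Target Definitions” with byte-oriented precision. String data is truncated, number and datetime data is rejected, and lengths are counted in encoded bytes. It is an illustration, not PowerCenter code. The function name is hypothetical, Shift_JIS stands in for a non-shift-sensitive multibyte code page, and truncation is assumed to stop on a whole-character boundary.

```python
def fit_fixed_width(value: str, datatype: str, width: int, encoding: str = "utf-8"):
    """Return ("write", data) or ("reject", None) for one fixed-width field.

    datatype is "string", "number", or "datetime"; width is the total field
    length in bytes (precision for strings, field width otherwise).
    """
    data = value.encode(encoding)
    if len(data) <= width:
        return "write", data            # fits; padding is not modeled here
    if datatype.lower() != "string":
        return "reject", None           # numbers and datetimes go to the reject file
    # Strings are truncated. Stop on a whole character so that no multibyte
    # character is split (an assumption of this sketch).
    kept = b""
    for ch in value:
        piece = ch.encode(encoding)
        if len(kept) + len(piece) > width:
            break
        kept += piece
    return "write", kept

if __name__ == "__main__":
    four_chars = "あいうえ"  # four double-byte characters in Shift_JIS
    print(len(four_chars.encode("shift_jis")))                             # 8
    print(len(fit_fixed_width(four_chars, "string", 4, "shift_jis")[1]))   # 4
    print(fit_fixed_width("123456", "number", 4))                          # ('reject', None)
```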

Writing Shift-Sensitive Multibyte Data
When writing to a shift-sensitive flat file target, the PowerCenter Server adds shift characters and spaces if the data going into the target does not meet file requirements. You need to allow at least two extra bytes in each data column containing multibyte data so the output data precision matches the byte width of the target column. The PowerCenter Server writes shift characters and spaces in the following ways:
♦ If a column begins or ends with a double-byte character, the PowerCenter Server adds shift characters so the column begins and ends with a single-byte shift character.
♦ If the data is shorter than the column width, the PowerCenter Server pads the rest of the column with spaces.
♦ If the data is longer than the column width, the PowerCenter Server truncates the data so the column ends with a single-byte shift character.

To illustrate how the PowerCenter Server handles a fixed-width file containing shift-sensitive data, suppose you want to output the following data to the target:

SourceCol1: AAAA
SourceCol2: aaaa

A is a double-byte character, and a is a single-byte character.

The first target column contains eight bytes and the second target column contains four bytes.


The PowerCenter Server must add shift characters to handle shift-sensitive data. Since the first target column can only handle eight bytes, the PowerCenter Server truncates the data before it can add the shift characters.
TargetCol1: -oAAA-i
TargetCol2: aaaa

The following table describes the notation used in this example:

| Notation | Description |
|---|---|
| A | Double-byte character |
| -o | Shift-out character |
| -i | Shift-in character |

For the first target column, the PowerCenter Server writes only three of the double-byte characters to the target. It cannot write any additional double-byte characters to the output column because the column must end in a single-byte character. If you add two more bytes to the first target column definition, the PowerCenter Server can add shift characters and write all the data without truncation. For the second target column, the PowerCenter Server writes all four single-byte characters to the target. It does not write shift characters to the column because the column begins and ends with single-byte characters.
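The example can be reproduced with a small Python simulation of the three rules listed above. It follows the example's notation: uppercase letters stand for double-byte characters, lowercase letters for single-byte characters, and -o and -i for the one-byte shift-out and shift-in characters. The names and the character convention are hypothetical. The sketch models only the layout rules described here, not a real shift-sensitive code page.

```python
SHIFT_OUT, SHIFT_IN = "-o", "-i"  # notation from the example; each is one byte

def is_double_byte(ch: str) -> bool:
    # Hypothetical convention from the example: uppercase = double-byte.
    return ch.isupper()

def write_shift_sensitive(data: str, width: int) -> str:
    """Lay out one fixed-width column of shift-sensitive data.

    The byte count, not the length of the returned string, equals width.
    """
    out = []
    used = 0           # bytes written so far
    in_double = False  # True between a shift-out and its shift-in
    for ch in data:
        double = is_double_byte(ch)
        cost = 2 if double else 1
        shift = 1 if double != in_double else 0  # shift-out or shift-in needed first
        closing = 1 if double else 0             # room for the final shift-in
        if used + shift + cost + closing > width:
            break  # truncate so the column still ends in a single-byte character
        if shift:
            out.append(SHIFT_OUT if double else SHIFT_IN)
            in_double = double
        out.append(ch)
        used += shift + cost
    if in_double:
        out.append(SHIFT_IN)
        used += 1
    out.append(" " * (width - used))  # pad the rest of the column with spaces
    return "".join(out)

if __name__ == "__main__":
    print(repr(write_shift_sensitive("AAAA", 8)))   # '-oAAA-i'
    print(repr(write_shift_sensitive("AAAA", 10)))  # '-oAAAA-i'
    print(repr(write_shift_sensitive("aaaa", 4)))   # 'aaaa'
```

The second call shows the effect of adding two more bytes to the first target column: all four double-byte characters fit between the shift characters.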

Null Characters in Fixed-Width Files
You can specify any valid single-byte or multibyte character as a null character for a fixed-width target. You can also use a space as a null character. The null character can be repeating or non-repeating. If the null character is repeating, the PowerCenter Server writes as many null characters as possible into a target column. If you specify a multibyte null character and there are extra bytes left after writing null characters, the PowerCenter Server pads the column with single-byte spaces. If a column is smaller than the multibyte character specified as the null character, the session fails at initialization.
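A minimal Python sketch of these null-character rules follows. Two points are assumptions rather than statements from this guide. First, a non-repeating null character is written once and the rest of the column is padded with single-byte spaces. Second, Python codec names stand in for PowerCenter code pages. The function name is hypothetical.

```python
def null_field(width: int, null_char: str, repeating: bool, encoding: str) -> bytes:
    """Build the bytes for a NULL value in a fixed-width column of `width` bytes."""
    unit = null_char.encode(encoding)
    if width < len(unit):
        # The guide states that the session fails at initialization here.
        raise ValueError("column is narrower than the null character")
    count = width // len(unit) if repeating else 1  # as many as fit, or just one
    field = unit * count
    return field + b" " * (width - len(field))     # pad leftover bytes with spaces

if __name__ == "__main__":
    print(null_field(5, "*", True, "latin-1"))   # b'*****'
    print(null_field(5, "*", False, "latin-1"))  # b'*    '
```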

Character Set
You can configure the PowerCenter Server to run sessions with flat file targets in either ASCII or Unicode data movement mode. If you configure a session with a flat file target to run in Unicode data movement mode, the target file code page must be a superset of the PowerCenter Server code page and the source code page. Delimiters, escape, and null characters must be valid in the specified code page of the flat file. If you configure a session to run in ASCII data movement mode, delimiters, escape, and null characters must be valid in the ISO Western European Latin1 code page. Any 8-bit character you specified in previous versions of PowerCenter is still valid.

For more information about configuring and working with data movement modes and code pages, see “Globalization Overview” in the Installation and Configuration Guide.
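To check delimiter, escape, and null characters before running a session, you could try to encode each one in the relevant code page. This Python sketch assumes that Python's latin-1 codec matches ISO Western European Latin1 and that the flat file code page can be named with a Python codec. PowerCenter's own code page names differ, and the helper is hypothetical. It does not check the superset requirement between code pages.

```python
def invalid_characters(chars, data_movement_mode, file_codec="utf-8"):
    """Return the characters in `chars` that the applicable code page cannot encode."""
    # ASCII mode: characters must be valid in ISO Western European Latin1.
    # Unicode mode: characters must be valid in the flat file's code page.
    codec = "latin-1" if data_movement_mode.lower() == "ascii" else file_codec
    bad = []
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

if __name__ == "__main__":
    print(invalid_characters([",", "|", "\\"], "ASCII"))     # []
    print(len(invalid_characters(["|", "\u3042"], "ASCII")))  # 1
```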

Writing Metadata to Flat File Targets
When you write to flat file targets, you can configure the PowerCenter Server to write the column header information. When you enable the Output Metadata For Flat File Target option, the PowerCenter Server writes column headers to flat file targets. It writes the target definition port names to the flat file target in the first line, starting with the # symbol. By default, this option is disabled. When writing to fixed-width files, the PowerCenter Server truncates the target definition port name if it is longer than the column width. For example, you have the following fixed-width flat file target definition:

The column width for ITEM_ID is six. When you enable the Output Metadata For Flat File Target option, the PowerCenter Server writes the following text to a flat file:

#ITEM_ITEM_NAME PRICE
100001Screwdriver 9.50
100002Hammer 12.90
100003Small nails 3.00
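The header line in this example can be reproduced with the following Python sketch. As the sample output suggests, it assumes that the # symbol occupies the first byte of the first column and that each port name is padded or truncated to its column width. Only the ITEM_ID width of six comes from the example. The other widths and the function name are hypothetical.

```python
def metadata_header(ports):
    """Build the header line for a fixed-width target from (port_name, width) pairs."""
    columns = []
    for position, (name, width) in enumerate(ports):
        text = "#" + name if position == 0 else name  # # takes the first byte
        columns.append(text[:width].ljust(width))      # truncate or pad to the width
    return "".join(columns)

if __name__ == "__main__":
    # ITEM_NAME and PRICE widths are hypothetical.
    print(repr(metadata_header([("ITEM_ID", 6), ("ITEM_NAME", 12), ("PRICE", 6)])))
    # '#ITEM_ITEM_NAME   PRICE '
```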

For information about configuring the PowerCenter Server to output flat file metadata, see the Installation and Configuration Guide.


Working with Heterogeneous Targets
You can output data to multiple targets in the same session. When the target types or database types of those targets differ from each other, you have a session with heterogeneous targets. To create a session with heterogeneous targets, you can create a session based on a mapping with heterogeneous targets. Or, you can create a session based on a mapping with homogeneous targets and select different database connections. A heterogeneous target has one of the following characteristics:
♦ Multiple target types. You can create a session that writes to both relational and flat file targets.
♦ Multiple target connection types. You can create a session that writes to a target on an Oracle database and to a target on a DB2 database. Or, you can create a session that writes to multiple targets of the same type, but you specify different target connections for each target in the session.

All database connections you define in the Workflow Manager are unique to the PowerCenter Server, even if you define the same connection information. For example, you define two database connections, Sales1 and Sales2. You define the same user name, password, connect string, code page, and attributes for both Sales1 and Sales2. Even though both Sales1 and Sales2 define the same connection information, the PowerCenter Server treats them as different database connections. When you create a session with two relational targets and specify Sales1 for one target and Sales2 for the other target, you create a session with heterogeneous targets. You can create a session with heterogeneous targets in one of the following ways:
♦ Create a session based on a mapping with targets of different types or different database types. In the session properties, keep the default target types and database types.
♦ Create a session based on a mapping with the same target types. However, in the session properties, specify different target connections for the different target instances, or override the target type to a different type.
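Connections are compared as distinct objects rather than by their settings. As a result, you can classify a session's targets as heterogeneous by comparing each target's type and connection name. The following Python sketch models that rule. The class, function, and target instance names are hypothetical, and target types are plain strings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SessionTarget:
    instance: str
    target_type: str                  # for example "relational" or "flat file"
    connection: Optional[str] = None  # connection object name; None for file targets

def is_heterogeneous(targets) -> bool:
    """True when the targets differ in type or in connection object."""
    # Connections are compared by name only, so Sales1 and Sales2 count as
    # different even when every setting they define is identical.
    kinds = {(t.target_type, t.connection) for t in targets}
    return len(kinds) > 1

if __name__ == "__main__":
    sales = [SessionTarget("T_ORDERS", "relational", "Sales1"),
             SessionTarget("T_ITEMS", "relational", "Sales2")]
    print(is_heterogeneous(sales))  # True
```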

You can override the target type in the session properties. However, you can only perform certain overrides. You can specify the following target type overrides in a session:
♦ Relational target to flat file.
♦ Relational target to any other relational database type. Verify the datatypes used in the target definition are compatible with both databases.
♦ SAP BW target to a flat file target type.
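The permitted overrides can be expressed as a small lookup. In this Python sketch, the set of relational type names is illustrative rather than a complete list of supported databases. The function does not check datatype compatibility, which you still need to verify yourself.

```python
# Illustrative relational type names; not a complete list of supported databases.
RELATIONAL_TYPES = {"oracle", "db2", "informix", "microsoft sql server",
                    "sybase", "teradata", "odbc"}

def override_allowed(original: str, override: str) -> bool:
    """Return True if a session may override target type `original` to `override`."""
    original, override = original.lower(), override.lower()
    if original == override:
        return True                  # not an override at all
    if original in RELATIONAL_TYPES:
        # Relational to flat file, or to any other relational database type.
        # Datatype compatibility between the databases is not checked here.
        return override == "flat file" or override in RELATIONAL_TYPES
    if original == "sap bw":
        return override == "flat file"
    return False                     # no other overrides are listed

if __name__ == "__main__":
    print(override_allowed("Oracle", "DB2"))         # True
    print(override_allowed("flat file", "Oracle"))   # False
```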

Note: When the PowerCenter Server runs a session with at least one relational target, it performs database transactions per target connection group. For example, it orders the target load for targets in a target connection group when you enable constraint-based loading. For more information, see “Working with Target Connection Groups” on page 257.


Chapter 10

Understanding Commit Points
This chapter covers the following topics:
♦ Overview, 276
♦ Target-Based Commits, 277
♦ Source-Based Commits, 278
♦ User-Defined Commits, 283
♦ Understanding Transaction Control, 287
♦ Setting Commit Properties, 292


Overview
A commit interval is the interval at which the PowerCenter Server commits data to targets during a session. The commit point can depend on the commit interval, the commit interval type, and the size of the buffer blocks. The commit interval is the number of rows you want to use as a basis for the commit point. The commit interval type is the type of rows that you want to use as a basis for the commit point. You can choose between the following commit types:

♦ Target-based commit. The PowerCenter Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size, the commit interval, and the PowerCenter Server configuration for writer timeout.
♦ Source-based commit. The PowerCenter Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
♦ User-defined commit. The PowerCenter Server commits data based on transactions defined in the mapping properties. You can also configure some commit and rollback options in the session properties.

Source-based and user-defined commit sessions have partitioning restrictions. If you configure a session with multiple partitions to use source-based or user-defined commit, you can only choose pass-through partitioning at certain partition points in a pipeline. For more information, see “Specifying Partition Types” on page 356.


Target-Based Commits
During a target-based commit session, the PowerCenter Server commits rows based on the number of target rows and the key constraints on the target table. The commit point depends on the following factors:
♦ Commit interval. The number of rows you want to use as a basis for commits. Configure the target commit interval in the session properties.
♦ Writer wait timeout. The amount of time the writer waits before it issues a commit. Configure the writer wait timeout in the PowerCenter Server setup.
♦ Buffer blocks. Blocks of memory that hold rows of data during a session. You can configure the buffer block size in the session properties, but you cannot configure the number of rows the block holds.

When you run a target-based commit session, the PowerCenter Server may issue a commit before, on, or after the configured commit interval. The PowerCenter Server uses the following process to issue commits:

♦ When the PowerCenter Server reaches a commit interval, it continues to fill the writer buffer block. When the writer buffer block fills, the PowerCenter Server issues a commit.
♦ If the writer buffer fills before the commit interval, the PowerCenter Server writes to the target, but waits to issue a commit. It issues a commit when one of the following conditions is true:
  − The writer is idle for the amount of time specified by the PowerCenter Server writer wait timeout option.
  − The PowerCenter Server reaches the commit interval and fills another writer buffer.

For more information about configuring the writer wait timeout, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.
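You can approximate the commit timing described above with a short Python simulation. It is a simplified model. Rows arrive in full buffer blocks of an assumed number of rows, because you configure the block size, not the rows per block. The writer wait timeout, which can trigger a commit before the interval, is not modeled. The function name is hypothetical.

```python
def target_commit_points(total_rows: int, commit_interval: int, rows_per_block: int):
    """Return the row counts at which a target-based commit would be issued."""
    commits = []
    written = 0
    since_commit = 0
    while written < total_rows:
        block = min(rows_per_block, total_rows - written)  # the last block may be partial
        written += block
        since_commit += block
        # Commit only after a block is written and the interval has been reached.
        if since_commit >= commit_interval:
            commits.append(written)
            since_commit = 0
    if since_commit:
        commits.append(written)  # final commit for any remaining rows
    return commits

if __name__ == "__main__":
    print(target_commit_points(40000, 10000, 7500))  # [15000, 30000, 40000]
```

With 7,500-row blocks, the first commit lands at 15,000 rows, after the 10,000-row interval, because the server waits for the next block to fill.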
Note: When you choose target-based commit for a session containing an XML target, the Workflow Manager disables the On Commit session property on the Transformations view of the Mapping tab.


Source-Based Commits
During a source-based commit session, the PowerCenter Server commits data to the target based on the number of rows from some active sources in a target load order group. These rows are referred to as source rows. When the PowerCenter Server runs a source-based commit session, it identifies the commit source for each pipeline in the mapping. The PowerCenter Server generates a commit row from these active sources at every commit interval. The PowerCenter Server writes the name of the transformation used for source-based commit intervals into the session log:
Source-based commit interval based on... TRANSFORMATION_NAME

The PowerCenter Server might commit fewer rows to the target than the number of rows produced by the active source. For example, you have a source-based commit session that passes 10,000 rows through an active source, and 3,000 rows are dropped due to transformation logic. The PowerCenter Server issues a commit to the target when the 7,000 remaining rows reach the target.

The number of rows held in the writer buffers does not affect the commit point for a source-based commit session. For example, you have a source-based commit session with a commit interval of 10,000 rows. When 10,000 rows pass through the active source and reach the targets, the PowerCenter Server issues a commit. If the session processes 40,000 source rows and completes successfully, the PowerCenter Server issues commits after 10,000, 20,000, 30,000, and 40,000 source rows.

If the targets are in the same transaction control unit, the PowerCenter Server commits data to the targets at the same time. If the session fails or aborts, the PowerCenter Server rolls back all uncommitted data in a transaction control unit to the same source row.

If the targets are in different transaction control units, the PowerCenter Server performs the commit when each target receives the commit row. If the session fails or aborts, the PowerCenter Server rolls back each target to the last commit point. It might not roll back to the same source row for targets in separate transaction control units. For more information on transaction control units, see “Understanding Transaction Control Units” on page 289.
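The following Python sketch models the first two examples above. Commits fall on every commit interval of source rows, and the number of rows committed to the target depends only on how many of those source rows survive the transformation logic. The predicate and function names are hypothetical. The final commit for a partial interval at the end of the session is an assumption of the sketch.

```python
def source_commit_points(source_rows, commit_interval, reaches_target):
    """Return (source_row_count, target_rows_committed) for each commit.

    reaches_target(i) says whether source row i (1-based) survives the
    transformation logic and reaches the target.
    """
    commits = []
    pending = 0  # rows that reached the target since the last commit
    for i in range(1, source_rows + 1):
        if reaches_target(i):
            pending += 1
        # A commit row is generated every commit_interval source rows;
        # the last, partial interval is committed at the end of the session.
        if i % commit_interval == 0 or i == source_rows:
            commits.append((i, pending))
            pending = 0
    return commits

if __name__ == "__main__":
    # Drop 3 of every 10 rows: 7,000 of each 10,000 source rows reach the target.
    print(source_commit_points(40000, 10000, lambda i: i % 10 >= 3))
    # [(10000, 7000), (20000, 7000), (30000, 7000), (40000, 7000)]
```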
Note: Source-based commit may slow session performance if the session uses a one-to-one mapping. A one-to-one mapping is a mapping that moves data from a Source Qualifier, XML Source Qualifier, or Application Source Qualifier transformation directly to a target. For more information about performance, see “Performance Tuning” on page 635.

Determining the Commit Source
When you run a source-based commit session, the PowerCenter Server generates commits at all source qualifiers and transformations that do not propagate transaction boundaries. This includes the following active sources:
♦ Source Qualifier
♦ Application Source Qualifier
♦ MQ Source Qualifier
♦ XML Source Qualifier when you only connect ports from one output group
♦ Normalizer (VSAM)
♦ Aggregator with the All Input transformation scope
♦ Joiner with the All Input transformation scope
♦ Rank with the All Input transformation scope
♦ Sorter with the All Input transformation scope
♦ Custom with one output group and with the All Input transformation scope
♦ A multiple input group transformation with one output group connected to multiple upstream transaction control points
♦ Mapplet, if it contains one of the above transformations

For more information on transformation scope and transaction control, see “Understanding Transaction Control” on page 287. For more information on active sources, see “Working with Active Sources” on page 259. A mapping can have one or more target load order groups, and a target load order group can have one or more active sources that generate commits. The PowerCenter Server uses the commits generated by the active source that is closest to the target definition. This is known as the commit source. For example, you have the mapping in Figure 10-1:
Figure 10-1. Mapping with a Single Commit Source
[Figure: the Aggregator transformation's Transformation Scope property is All Input.]

The mapping contains a Source Qualifier transformation and an Aggregator transformation with the All Input transformation scope. The Aggregator transformation is closer to the targets than the Source Qualifier transformation and is therefore used as the commit source for the source-based commit session.


Also, suppose you have the mapping in Figure 10-2:
Figure 10-2. Mapping with Multiple Commit Sources
[Figure: the Aggregator transformation's Transformation Scope property is All Input.]

The mapping contains a target load order group with one source pipeline that branches from the Source Qualifier transformation to two targets. One pipeline branch contains an Aggregator transformation with the All Input transformation scope, and the other contains an Expression transformation. The PowerCenter Server identifies the Source Qualifier transformation as the commit source for t_monthly_sales and the Aggregator as the commit source for T_COMPANY_ALL. It performs a source-based commit for both targets, but uses a different commit source for each.
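Choosing the commit source amounts to walking upstream from each target to the nearest transformation that generates commits. This Python sketch assumes that each node has a single upstream node. The names AGG, EXP, and SQ are hypothetical stand-ins for the Aggregator, Expression, and Source Qualifier transformations in Figure 10-2.

```python
def find_commit_source(target, upstream, generates_commits):
    """Return the commit-generating active source closest to `target`, or None."""
    node = upstream.get(target)
    while node is not None:
        if node in generates_commits:
            return node            # the first generator found is the closest one
        node = upstream.get(node)  # keep walking toward the source
    return None                    # no commit source: target-based commit applies

if __name__ == "__main__":
    # Figure 10-2: one pipeline branching from the Source Qualifier.
    upstream = {
        "T_COMPANY_ALL": "AGG",
        "AGG": "SQ",
        "t_monthly_sales": "EXP",
        "EXP": "SQ",
    }
    generators = {"SQ", "AGG"}  # EXP is passive and only propagates commits
    print(find_commit_source("T_COMPANY_ALL", upstream, generators))    # AGG
    print(find_commit_source("t_monthly_sales", upstream, generators))  # SQ
```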

Switching from Source-Based to Target-Based Commit
If the PowerCenter Server identifies a target in the target load order group that does not receive commits from an active source that generates commits, it reverts to target-based commit for that target only. The PowerCenter Server writes the name of the transformation used for source-based commit intervals into the session log. When the PowerCenter Server switches to target-based commit, it writes a message in the session log. A target might not receive commits from a commit source in the following circumstances:

♦ The target receives data from the XML Source Qualifier transformation, and you connect multiple output groups from an XML Source Qualifier transformation to downstream transformations. An XML Source Qualifier transformation does not generate commits when you connect multiple output groups downstream.
♦ The target receives data from an active source with multiple output groups other than an XML Source Qualifier transformation. For example, the target receives data from a Custom transformation that you do not configure to generate transactions. Multiple output group active sources neither generate nor propagate commits.


Connecting XML Sources in a Mapping
An XML Source Qualifier transformation does not generate commits when you connect multiple output groups downstream. When you use an XML Source Qualifier transformation in a mapping, the PowerCenter Server can use different commit types for targets in this session depending on the transformations used in the mapping:

♦ You put a commit source between the XML Source Qualifier transformation and the target. The PowerCenter Server uses source-based commit for the target because it receives commits from the commit source. The active source is the commit source for the target.
♦ You do not put a commit source between the XML Source Qualifier transformation and the target. The PowerCenter Server uses target-based commit for the target because it receives no commits.

Suppose you have the mapping in Figure 10-3:
Figure 10-3. Mapping with Targets Connected to a Commit Source
[Figure: targets connected to an XML Source Qualifier transformation with multiple connected output groups use target-based commit. The target connected to AGG_Sales (Transformation Scope = All Input), an active source that generates commits, uses source-based commit.]

This mapping contains an XML Source Qualifier transformation with multiple output groups connected downstream. Because you connect multiple output groups downstream, the XML Source Qualifier transformation does not generate commits. You connect the XML Source Qualifier transformation to two relational targets, T_STORE and T_PRODUCT. Therefore, these targets do not receive any commit generated by an active source. The PowerCenter Server uses target-based commit when loading to these targets. However, the mapping includes an active source that generates commits, AGG_Sales, between the XML Source Qualifier transformation and T_YTD_SALES. The PowerCenter Server uses source-based commit when loading to T_YTD_SALES.
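You can sketch the per-target choice between source-based and target-based commit by scanning each target's pipeline backward. The Python function below is hypothetical. XML_SQ is a placeholder name for the XML Source Qualifier transformation in Figure 10-3, and paths are simplified to a single chain of transformations.

```python
def commit_type(path, generates_commits, drops_commits):
    """Decide source-based or target-based commit for the target at path[-1].

    path lists transformations from the source to the target. Scanning
    backward from the target, the first commit-generating active source gives
    source-based commit. Reaching a transformation that drops commits first
    (for example, an XML Source Qualifier with several connected output
    groups) means the target receives no commits.
    """
    for node in reversed(path[:-1]):
        if node in generates_commits:
            return "source-based"
        if node in drops_commits:
            return "target-based"
    return "target-based"

if __name__ == "__main__":
    generators, droppers = {"AGG_Sales"}, {"XML_SQ"}
    print(commit_type(["XML_SQ", "T_STORE"], generators, droppers))                   # target-based
    print(commit_type(["XML_SQ", "AGG_Sales", "T_YTD_SALES"], generators, droppers))  # source-based
```

The same function covers Figure 10-4 if you place CT_XML_Parser in the set of transformations that drop commits and AGG_store_orders in the set that generates them.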


Connecting Multiple Output Group Custom Transformations in a Mapping
Multiple output group Custom transformations that you do not configure to generate transactions neither generate nor propagate commits. Therefore, the PowerCenter Server can use different commit types for targets in this session depending on the transformations used in the mapping:

♦ You put a commit source between the Custom transformation and the target. The PowerCenter Server uses source-based commit for the target because it receives commits from the active source. The active source is the commit source for the target.
♦ You do not put a commit source between the Custom transformation and the target. The PowerCenter Server uses target-based commit for the target because it receives no commits.

Suppose you have the mapping in Figure 10-4:
Figure 10-4. Mapping a Custom Transformation with a Commit Source
[Figure: targets connected to the multiple output group active source CT_XML_Parser use target-based commit. The target connected to AGG_store_orders (Transformation Scope is All Input), an active source that generates commits, uses source-based commit.]

The mapping contains a multiple output group Custom transformation, CT_XML_Parser, which drops the commits generated by the Source Qualifier transformation. Therefore, targets T_store_name and T_store_addr do not receive any commits generated by an active source. The PowerCenter Server uses target-based commit when loading to these targets. However, the mapping includes an active source that generates commits, AGG_store_orders, between the Custom transformation and T_store_orders. The PowerCenter Server uses source-based commit when loading to T_store_orders.
Note: You can configure a Custom transformation to generate transactions when the Custom transformation procedure outputs transactions. When you do this, configure the session for user-defined commit. For more information on user-defined commit sessions, see “User-Defined Commits” on page 283.


User-Defined Commits
During a user-defined commit session, the PowerCenter Server commits and rolls back transactions based on a row or set of rows that pass through a Transaction Control transformation. The PowerCenter Server evaluates the transaction control expression for each row that enters the transformation. The return value of the transaction control expression defines the commit or rollback point. You can also create a user-defined commit session when the mapping contains a Custom transformation configured to generate transactions. When you do this, the procedure associated with the Custom transformation defines the transaction boundaries.

When the PowerCenter Server evaluates a commit row, it commits all rows in the transaction to the target or targets. When it evaluates a rollback row, it rolls back all rows in the transaction from the target or targets. The PowerCenter Server writes a message to the session log at each commit and rollback point. The session details are cumulative. The following message is a sample commit message from the session log:
WRITER_1_1_1> WRT_8317 USER-DEFINED COMMIT POINT Wed Oct 15 08:15:29 2003
===================================================
WRT_8036 Target: TCustOrders (Instance Name: [TCustOrders])
WRT_8038 Inserted rows - Requested: 1003 Rejected: 0 Affected: 1023 Applied: 1003

When the PowerCenter Server writes all rows in a transaction to all targets, it issues commits sequentially for each target. The PowerCenter Server rolls back data based on the return value of the transaction control expression or error handling configuration. If the transaction control expression returns a rollback value, the PowerCenter Server rolls back the transaction. If an error occurs, you can choose to roll back or commit at the next commit point. If the transaction control expression evaluates to a value other than commit, rollback, or continue, the PowerCenter Server fails the session. For more information about valid values, see “Transaction Control Transformation” in the Transformation Guide. When the session completes, the PowerCenter Server may write data to the target that was not bound by commit rows. You can choose to commit at end of file or to roll back that open transaction.
Note: If you use bulk loading with a user-defined commit session, the target may not recognize the transaction boundaries. If the target connection group does not support transactions, the PowerCenter Server writes the following message to the session log:

WRT_8234 Warning: Target Connection Group’s connection doesn’t support transactions. Targets may not be loaded according to specified transaction boundaries rules.
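To make the transaction boundaries concrete, the following Python sketch applies a transaction control expression to each row and tracks which rows are committed or rolled back for a single target. It assumes the Transaction Control transformation return values TC_CONTINUE_TRANSACTION, TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE, and TC_ROLLBACK_AFTER. It also assumes that BEFORE values close the transaction before the current row and AFTER values close it after. See “Transaction Control Transformation” in the Transformation Guide for the authoritative definitions. The class and function names are hypothetical.

```python
VALID_ACTIONS = {
    "TC_CONTINUE_TRANSACTION",
    "TC_COMMIT_BEFORE",
    "TC_COMMIT_AFTER",
    "TC_ROLLBACK_BEFORE",
    "TC_ROLLBACK_AFTER",
}

class SessionFailed(Exception):
    """Raised when the expression returns a value outside VALID_ACTIONS."""

def run_user_defined_commits(rows, expression, commit_on_end_of_file=True):
    """Return (committed_rows, rolled_back_rows) for one target."""
    committed, rolled_back, open_tx = [], [], []
    for row in rows:
        action = expression(row)
        if action not in VALID_ACTIONS:
            raise SessionFailed(f"invalid transaction control value {action!r}")
        if action == "TC_COMMIT_BEFORE":      # commit, then this row starts a new transaction
            committed += open_tx
            open_tx = [row]
        elif action == "TC_COMMIT_AFTER":     # include this row, then commit
            committed += open_tx + [row]
            open_tx = []
        elif action == "TC_ROLLBACK_BEFORE":  # roll back, then this row starts a new transaction
            rolled_back += open_tx
            open_tx = [row]
        elif action == "TC_ROLLBACK_AFTER":   # include this row, then roll back
            rolled_back += open_tx + [row]
            open_tx = []
        else:                                 # TC_CONTINUE_TRANSACTION
            open_tx.append(row)
    # Rows not bound by a commit or rollback row form the open transaction.
    if commit_on_end_of_file:
        committed += open_tx
    else:
        rolled_back += open_tx
    return committed, rolled_back

if __name__ == "__main__":
    expr = lambda row: {3: "TC_COMMIT_AFTER", 5: "TC_ROLLBACK_AFTER"}.get(row, "TC_CONTINUE_TRANSACTION")
    print(run_user_defined_commits(range(1, 7), expr))         # ([1, 2, 3, 6], [4, 5])
    print(run_user_defined_commits(range(1, 7), expr, False))  # ([1, 2, 3], [4, 5, 6])
```

The commit_on_end_of_file flag plays the role of the Commit On End of File session property for row 6, which is left in an open transaction.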


Rolling Back Transactions
The PowerCenter Server rolls back transactions in the following circumstances:
♦ Rollback evaluation. The transaction control expression returns a rollback value.
♦ Open transaction. You choose to roll back at the end of file.
♦ Roll back on error. You choose to roll back commit transactions if the PowerCenter Server encounters a non-fatal error.
♦ Roll back on failed commit. If any target connection group in a transaction control unit fails to commit, the PowerCenter Server rolls back all uncommitted data to the last successful commit point.

For more information on transaction control units, see “Understanding Transaction Control Units” on page 289.

Rollback Evaluation
If the transaction control expression returns a rollback value, the PowerCenter Server rolls back the transaction and writes a message to the session log indicating that the transaction was rolled back. It also indicates how many rows were rolled back. The following message is a sample message that the PowerCenter Server writes to the session log when the transaction control expression returns a rollback value:
WRITER_1_1_1> WRT_8326 User-defined rollback processed
WRITER_1_1_1> WRT_8331 Rollback statistics
WRT_8162 ===================================================
WRT_8330 Rolled back [333] inserted, [0] deleted, [0] updated rows for the target [TCustOrders]

Roll Back Open Transaction
If the transaction control expression evaluates to TC_CONTINUE_TRANSACTION for the last row, the session completes with an open transaction. If you choose to roll back that open transaction, the PowerCenter Server rolls back the transaction and writes a message to the session log indicating that the transaction was rolled back. The following message is a sample message indicating that Commit on End of File is disabled in the session properties:

WRITER_1_1_1> WRT_8168 End loading table [TCustOrders] at: Wed Nov 05 10:21:56 2003
WRITER_1_1_1> WRT_8325 Final rollback executed for the target [TCustOrders] at end of load

The following message is a sample message indicating that Commit on End of File is enabled in the session properties:
WRITER_1_1_1> WRT_8143 Commit at end of Load Order Group Wed Nov 05 08:15:29 2003


Roll Back on Error
You can choose to roll back a transaction at the next commit point if the PowerCenter Server encounters a non-fatal error. When the PowerCenter Server encounters a non-fatal error, it processes the error row and continues processing the transaction. If the transaction boundary is a commit row, the PowerCenter Server rolls back the entire transaction and writes it to the reject file. The following table describes row indicators in the reject file for rolled-back transactions:
| Row Indicator | Description |
|---|---|
| 4 | Rolled-back insert |
| 5 | Rolled-back update |
| 6 | Rolled-back delete |

Note: The PowerCenter Server does not roll back a transaction if it encounters an error before it processes any row through the Transaction Control transformation.

Roll Back on Failed Commit
When the PowerCenter Server reaches the commit point for all targets in a transaction control unit, it issues commits sequentially for each target. If the commit fails for any target connection group within a transaction control unit, the PowerCenter Server rolls back all data to the last successful commit point. The PowerCenter Server cannot roll back committed transactions, but it does write the transactions to the reject file.

For example, use the mapping in Figure 10-5 on page 286 to read through the following scenario. This mapping has one transaction control unit and three target connection groups. The target names contain information about the target connection group. For example, TCG1_T1 represents the first target connection group and the first target.

1. The PowerCenter Server reaches the third commit point for all targets.
2. It begins to issue commits sequentially for each target.
3. The PowerCenter Server successfully commits to TCG1_T1 and TCG1_T2.
4. The commit fails for TCG2_T3.
5. The PowerCenter Server does not issue a commit for TCG3_T4.
6. The PowerCenter Server rolls back TCG2_T3 and TCG3_T4 to the second commit point, but it cannot roll back TCG1_T1 and TCG1_T2 to the second commit point because it successfully committed at the third commit point.
7. The PowerCenter Server writes the rows to the reject file from TCG2_T3 and TCG3_T4. These are the rollback rows associated with the third commit point.
8. The PowerCenter Server writes the rows to the reject file from TCG1_T1 and TCG1_T2. These are the commit rows associated with the third commit point.



Figure 10-5 illustrates PowerCenter Server behavior when it rolls back on a failed commit:
Figure 10-5. Roll Back on Failed Commit Example
[Figure callouts: TCG1: third commit is successful (3); rows appear in the reject file (8). TCG2: third commit fails (4); the PowerCenter Server rolls back to the second commit (6); rows appear in the reject file (7). TCG3: the PowerCenter Server does not issue the third commit (5); it rolls back to the second commit (6); rows appear in the reject file (7).]

The following table describes row indicators in the reject file for committed transactions in a failed transaction control unit:
Row Indicator   Description
7               Committed insert
8               Committed update
9               Committed delete
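The failed-commit scenario in Figure 10-5 can be sketched in Python as follows. This is a simplified model, not server code. It assumes the caller supplies the targets of one transaction control unit in commit order, the rows pending for each target as `(row_id, operation)` pairs, and the name of the target whose commit fails. The function names and data shapes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

ROLLED_BACK = {"insert": 4, "update": 5, "delete": 6}  # rollback rows
COMMITTED = {"insert": 7, "update": 8, "delete": 9}    # committed rows in a failed unit


def commit_unit(
    targets: List[str],
    rows: Dict[str, List[Tuple[int, str]]],
    failing_target: Optional[str] = None,
) -> Tuple[List[str], List[Tuple[str, int, int]]]:
    """Return (targets that committed, reject entries as (target, indicator, row_id))."""
    committed: List[str] = []
    for target in targets:  # commits are issued sequentially
        if target == failing_target:
            break  # no commit is issued for this or any later target
        committed.append(target)
    rejects: List[Tuple[str, int, int]] = []
    if len(committed) == len(targets):
        return committed, rejects  # every commit succeeded
    # The failing target and the uncommitted targets roll back to the last commit point.
    for target in targets[len(committed):]:
        rejects.extend((target, ROLLED_BACK[op], row_id) for row_id, op in rows[target])
    # Committed targets cannot roll back, but their rows are still written to the reject file.
    for target in committed:
        rejects.extend((target, COMMITTED[op], row_id) for row_id, op in rows[target])
    return committed, rejects
```

When the sketch is called with the targets TCG1_T1 through TCG3_T4 and `failing_target="TCG2_T3"`, it reproduces steps 3 through 8 of the scenario.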


Chapter 10: Understanding Commit Points

Understanding Transaction Control
PowerCenter allows you to define transactions that the PowerCenter Server uses when it processes transformations and when it commits and rolls back data at a target. You can define a transaction based on a varying number of input rows. A transaction is a set of rows bound by commit or rollback rows, called transaction boundaries. Some rows may not be bound by transaction boundaries. This set of rows is an open transaction. When you configure the session, you can choose to commit open transactions at end of file or to roll them back. For more information on the Commit On End of File session property, see “Setting Commit Properties” on page 292.

The PowerCenter Server can process a transformation one row at a time, for all rows in a transaction, or for all source rows together. Processing a transformation for all rows in a transaction allows you to include transformations such as an Aggregator in a real-time session. For more information on configuring how the PowerCenter Server processes a transformation, see “Transformation Scope” on page 287.

Transaction boundaries originate from transaction control points. A transaction control point is a transformation that defines or redefines the transaction boundary in the following ways:

Generates transaction boundaries. The transformations that define transaction boundaries differ, depending on the session commit type:

Target-based and user-defined commit. Transaction generators generate transaction boundaries. A transaction generator is a transformation that generates both commit and rollback rows. The Transaction Control and Custom transformations are transaction generators.

Source-based commit. Some active sources generate commits. They do not generate rollback rows. Transaction generators also generate commit and rollback rows. For a list of active sources that generate commits, see “Determining the Commit Source” on page 278.

Drops incoming transaction boundaries. When a transformation drops incoming transaction boundaries and does not generate commits, the PowerCenter Server outputs all rows into an open transaction. All active sources that generate commits and all transaction generators drop incoming transaction boundaries.

For a list of transaction control points, see Table 10-1 on page 288.

Transformation Scope
You can configure how the PowerCenter Server applies the transformation logic to incoming data with the Transformation Scope transformation property. When the PowerCenter Server processes a transformation, it either drops transaction boundaries or preserves transaction boundaries, depending on the transformation scope and the mapping configuration. You can choose one of the following values for the transformation scope:

Row. Applies the transformation logic to one row of data at a time. Choose Row when a row of data does not depend on any other row. When you choose Row for a transformation connected to multiple upstream transaction control points, the PowerCenter Server drops transaction boundaries and outputs all rows from the transformation as an open transaction. When you choose Row for a transformation connected to a single upstream transaction control point, the PowerCenter Server preserves transaction boundaries.

Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions. When you choose Transaction, the PowerCenter Server preserves incoming transaction boundaries. It resets any cache, such as an aggregator or lookup cache, when it receives a new transaction. When you choose Transaction for a multiple input group transformation, you must connect all input groups to the same upstream transaction control point.

All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter Server drops incoming transaction boundaries and outputs all rows from the transformation as an open transaction. Choose All Input when a row of data depends on all rows in the source.
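To make the three scope values concrete, the following Python sketch applies a toy "sum" transformation under each scope. The summing logic and the function name are illustrative assumptions and not a PowerCenter API. Incoming rows are grouped into lists, one list per transaction. Transaction scope resets its cache for each transaction and keeps the boundaries. All Input aggregates everything and emits one open transaction. Row passes each row through independently and assumes a single upstream transaction control point.

```python
from typing import List


def aggregate_by_scope(transactions: List[List[int]], scope: str) -> List[List[int]]:
    """Apply a toy sum under a transformation scope; output is grouped by outgoing transaction."""
    if scope == "Row":
        # Each row is handled on its own; boundaries are preserved
        # (assuming a single upstream transaction control point).
        return [list(txn) for txn in transactions]
    if scope == "Transaction":
        out: List[List[int]] = []
        for txn in transactions:
            cache = 0  # the cache resets when a new transaction arrives
            for value in txn:
                cache += value
            out.append([cache])
        return out
    if scope == "All Input":
        # Boundaries are dropped; all rows form a single open transaction.
        return [[sum(value for txn in transactions for value in txn)]]
    raise ValueError(f"unknown transformation scope: {scope}")
```

With the incoming transactions `[[1, 2], [3, 4, 5]]`, Transaction scope yields one total per transaction. All Input yields a single total for all rows.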

Table 10-1 lists the transformation scope values available for each transformation:
Table 10-1. Transformation Scope Property Values Transformation Aggregator Application Source Qualifier Custom* n/a. Transaction control point. Optional. Transaction control point or when configured to generate commits. Default. Does not display. Default. Does not display. Default. Does not display. Optional. Default. Does not display. n/a. Transaction control point. n/a. Transaction control point. Default. Does not display. Optional. Default. Transaction control point. Default. Transaction control point. Optional. Transaction control point or when configured to generate commits. Default. Transaction control point when it has one output group or when configured to generate commits. Row Transaction Optional. All Input Default. Transaction control point.

Expression External Procedure Filter Joiner Lookup MQ Source Qualifier Normalizer (VSAM) Normalizer (relational) Rank


Table 10-1. Transformation Scope Property Values Transformation Router Sorter Sequence Generator Source Qualifier Stored Procedure Transaction Control Union Update Strategy XML Generator Default. Does not display. n/a. Transaction control point. Default. Does not display. Default. Does not display. Transaction control point. Default. Does not display. Default. Does not display. Optional. Transaction when the flush on commit is set to create a new document, Default. Does not display. n/a. Transaction control point. Default. Does not display. Row Default. Does not display. Optional. Default. Transaction control point. Transaction All Input

XML Parser XML Source Qualifier

*For more information on how the Transformation Scope property affects the Custom transformation, see “Custom Transformation” in the Transformation Guide.

Understanding Transaction Control Units
A transaction control unit is the group of targets connected to an active source that generates commits or an effective transaction generator. A transaction control unit may contain multiple target connection groups. For more information on target connection groups, see “Working with Target Connection Groups” on page 257. When the PowerCenter Server reaches the commit point for all targets in a transaction control unit, it issues commits sequentially for each target.


Figure 10-6 illustrates transaction control units with a Transaction Control transformation:
Figure 10-6. Transaction Control Units

[Figure labels: Transaction Control Unit 1, Transaction Control Unit 2, Target Connection Groups 1 through 4.]

Note that T5_ora1 uses the same connection name as T1_ora1 and T2_ora1. Because T5_ora1 is connected to a separate Transaction Control transformation, it is in a separate transaction control unit and target connection group. If you connect T5_ora1 to tc_TransactionControlUnit1, it will be in the same transaction control unit as all other targets and in the same target connection group as T1_ora1 and T2_ora1.
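The following Python sketch shows the grouping idea from the T5_ora1 example. It is a deliberate simplification. Targets are assigned to a transaction control unit by the transaction control point that feeds them, and within a unit they are grouped by connection name. The full target connection group rules also depend on the pipeline layout (see “Working with Target Connection Groups” on page 257). The connection name `ora1` and the control point `tc_Other` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical target description: (target name, transaction control point, connection name).
Target = Tuple[str, str, str]


def group_targets(targets: List[Target]) -> Dict[str, Dict[str, List[str]]]:
    """Group targets by transaction control point, then by connection name (simplified)."""
    units: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name, control_point, connection in targets:
        units[control_point][connection].append(name)
    return {point: dict(groups) for point, groups in units.items()}
```

In this model, T5_ora1 shares a connection name with T1_ora1 and T2_ora1 but lands in its own group because its control point differs.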

Rules and Guidelines
Consider the following rules and guidelines when you work with transaction control:
Transformations with Transaction transformation scope must receive data from a single transaction control point.

The PowerCenter Server uses the transaction boundaries defined by the first upstream transaction control point for transformations with Transaction transformation scope.

Transaction generators can be effective or ineffective for a target. The PowerCenter Server uses the transaction generated by an effective transaction generator when it loads data to a target. For more information on effective and ineffective transaction generators, see “Transaction Control Transformation” in the Transformation Guide.

The Workflow Manager prevents you from using incremental aggregation in a session with an Aggregator transformation with Transaction transformation scope.

Transformations with All Input transformation scope cause a transaction generator to become ineffective for a target in a user-defined commit session. For more information on using transaction generators in mappings, see “Transaction Control Transformation” in the Transformation Guide.

The PowerCenter Server resets any cache at the beginning of each transaction for Aggregator, Joiner, Rank, and Sorter transformations with Transaction transformation scope.

You can only choose the Transaction transformation scope for Joiner transformations when you use sorted input.

When you add a partition point at a transformation with Transaction transformation scope, the Workflow Manager uses the pass-through partition type by default. You cannot change the partition type.
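Several of these rules can be checked mechanically before you run a session. The following Python sketch validates a hypothetical description of one transformation against three of them: a single upstream transaction control point, sorted input for Joiners, and no incremental aggregation for Aggregators. The `TransformationConfig` class and its fields are illustrative assumptions, not part of PowerCenter.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TransformationConfig:
    """Hypothetical description of one transformation in a mapping."""
    name: str
    kind: str    # for example "Joiner" or "Aggregator"
    scope: str   # "Row", "Transaction", or "All Input"
    upstream_control_points: Set[str] = field(default_factory=set)
    sorted_input: bool = False
    incremental_aggregation: bool = False  # session setting, copied here for convenience


def check_transaction_rules(t: TransformationConfig) -> List[str]:
    """Return rule violations for a transformation with Transaction scope."""
    problems: List[str] = []
    if t.scope != "Transaction":
        return problems
    if len(t.upstream_control_points) != 1:
        problems.append(f"{t.name}: needs exactly one upstream transaction control point")
    if t.kind == "Joiner" and not t.sorted_input:
        problems.append(f"{t.name}: Joiner requires sorted input for Transaction scope")
    if t.kind == "Aggregator" and t.incremental_aggregation:
        problems.append(f"{t.name}: incremental aggregation is not allowed with Transaction scope")
    return problems
```

A Joiner with Transaction scope and unsorted input, for example, reports one violation.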


Setting Commit Properties
When you create a session, you can configure commit properties. The properties you set depend on the type of mapping and the type of commit you want the PowerCenter Server to perform. Figure 10-7 shows the session commit properties that you set in the General Options settings of the Properties tab:
Figure 10-7. Session Commit Properties

[Figure fields: Commit Type, Commit Interval, Commit on End of File, Roll Back Transactions on Error.]

Table 10-2 describes the session commit properties that you set in the General Options settings of the Properties tab:
Table 10-2. Session Commit Properties

Commit Type
Target-Based: Selected by default if no transaction generator or only ineffective transaction generators are in the mapping.
Source-Based: Choose for source-based commit if no transaction generator or only ineffective transaction generators are in the mapping.
User-Defined: Selected by default if effective transaction generators are in the mapping.

Commit Interval*
Target-Based: Default is 10,000.
Source-Based: Default is 10,000.
User-Defined: n/a

Commit on End of File
Target-Based: Commits data at the end of the file. Enabled by default. You cannot disable this option.
Source-Based: Commits data at the end of the file. Clear this option if you want the PowerCenter Server to roll back open transactions.
User-Defined: Commits data at the end of the file. Clear this option if you want the PowerCenter Server to roll back open transactions.

Roll Back Transactions on Errors
Target-Based: n/a
Source-Based: If the PowerCenter Server encounters a non-fatal error, you can choose to roll back the transaction at the next commit point. When the PowerCenter Server encounters a transformation error, it only rolls back the transaction if the error occurs after the effective transaction generator for the target.
User-Defined: If the PowerCenter Server encounters a non-fatal error, you can choose to roll back the transaction at the next commit point. When the PowerCenter Server encounters a transformation error, it only rolls back the transaction if the error occurs after the effective transaction generator for the target.

*Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions and increases performance.
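The effect of the commit interval on bulk loading is simple arithmetic. The following Python sketch assumes that a commit occurs after every full interval and that a final partial batch also needs its own transaction. Actual commit timing for target-based commits can vary, so treat the result as an estimate. The function name is hypothetical.

```python
def bulk_load_transactions(total_rows: int, commit_interval: int) -> int:
    """Estimate the number of bulk load transactions for one target."""
    if commit_interval <= 0:
        raise ValueError("commit_interval must be positive")
    # Ceiling division: a final partial batch still needs its own transaction.
    return (total_rows + commit_interval - 1) // commit_interval
```

For 1,000,000 rows, the default interval of 10,000 yields 100 bulk load transactions. An interval of 100,000 yields 10.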


Chapter 11

Recovering Data
This chapter covers the following topics:
Overview, 296
Preparing for Recovery, 297
Recovering a Suspended Workflow, 305
Recovering a Failed Workflow, 308
Recovering a Session Task, 311
Server Handling for Recovery, 314
Completing Unrecoverable Sessions, 316


Overview
If you stop a session or if an error causes a session to stop unexpectedly, refer to the session logs to determine the cause of the failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the configuration of the mapping and the session, the specific failure, and how much progress the session made before it failed. If the PowerCenter Server did not commit any data, run the session again. If the session issued at least one commit and is recoverable, consider running the session in recovery mode. Recovery allows you to restart a failed session and complete it as if the session had run without pause. When the PowerCenter Server runs in recovery mode, it continues to commit data from the point of the last successful commit. For more information on PowerCenter Server processing during recovery, see “Server Handling for Recovery” on page 314. All recovery sessions run as part of a workflow. When you recover a session, you also have the option to run part of the workflow. Consider the configuration and design of the workflow and the status of other tasks in the workflow before you choose a method of recovery. Depending on the configuration and status of the workflow and session, you can choose one or more of the following recovery methods:

Recover a suspended workflow. If the workflow suspends due to session failure, you can recover the failed session and resume the workflow. For details, see “Recovering a Suspended Workflow” on page 305.

Recover a failed workflow. If the workflow fails as a result of session failure, you can recover the session and run the rest of the workflow. For details, see “Recovering a Failed Workflow” on page 308.

Recover a session task. If the workflow completes but a session fails, you can recover the session alone without running the rest of the workflow. You can also use this method to recover multiple failed sessions in a branched workflow. For details, see “Recovering a Session Task” on page 311.

For more information on session failure, see “Stopping and Aborting a Session” on page 200.
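The decision described in this overview can be summarized as a small Python function. This is only a sketch. It assumes you already know how many commits the failed session issued, whether the session is recoverable, and the workflow state, which is given here with the hypothetical labels `suspended`, `failed`, and `completed`. It returns the name of the method to use. It does not perform recovery.

```python
def plan_completion(commits_issued: int, recoverable: bool, workflow_state: str) -> str:
    """Suggest how to complete a failed session (workflow_state labels are hypothetical)."""
    if commits_issued == 0:
        return "Run the session again"  # nothing was committed, so a plain rerun is enough
    if not recoverable:
        return "See Completing Unrecoverable Sessions"
    methods = {
        "suspended": "Recover a suspended workflow",
        "failed": "Recover a failed workflow",
        "completed": "Recover a session task",  # workflow finished, but a session failed
    }
    if workflow_state not in methods:
        raise ValueError(f"unexpected workflow state: {workflow_state}")
    return methods[workflow_state]
```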


Preparing for Recovery
Before you perform recovery, you must configure the mapping, session, workflow, and target database to ensure that the recovery session will consistently read, transform, and write data as though the session had not failed. Under certain circumstances, you cannot recover the session and must run it again. For more information on completing unrecoverable sessions, see “Completing Unrecoverable Sessions” on page 316.

Configuring the Mapping
When you design a mapping, consider requirements for session recovery. Configure the mapping so that the PowerCenter Server can extract, transform, and load data with the same results each time it runs the session. Use the following guidelines when you configure the mapping:

Sort the data from the source. This guarantees that the PowerCenter Server always receives source rows in the same order. You can do this by configuring the Sorted Ports option in the Source Qualifier or Application Source Qualifier transformation, or by adding a Sorter transformation configured for distinct output rows to the mapping after the source qualifier.

Verify that all targets receive data from transformations that produce repeatable data. Some transformations produce repeatable data. You can enable a session for recovery in the Workflow Manager when all targets in the mapping receive data from transformations that produce repeatable data. For more information on repeatable data, see “Working with Repeatable Data” on page 301.

Also, to perform consistent data recovery, the source, target, and transformation properties for the recovery session must be the same as those for the failed session. Do not change the properties of objects in the mapping before you run the recovery session.

Configuring the Session
To perform recovery on a failed session, the session must meet the following criteria:
The session is enabled for recovery.

The previous session run failed and the recovery information is accessible.

To enable recovery, select the Enable Recovery option in the Error Handling settings of the Configuration tab in the session properties. If you enable recovery and also choose to truncate the target for a relational normal load session, the PowerCenter Server does not truncate the target when you run the session in recovery mode. Use the following guidelines when you enable recovery for a partitioned session:


The Workflow Manager configures all partition points to use the default partitioning scheme for each transformation when you enable recovery.

The Workflow Manager sets the partition type to pass-through unless the transformation receiving the data is an Aggregator transformation, a Rank transformation, or a sorted Joiner transformation.

You can only enable recovery for unsorted Joiner transformations with one partition.

For Custom transformations, you can enable recovery only for transformations with one input group.

The PowerCenter Server disables test load when you enable the session for recovery.

To perform consistent data recovery, the session properties for the recovery session must be the same as the session properties for the failed session. This includes the partitioning configuration and the session sort order.
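The partitioning guidelines above can be written as two small Python checks. They are a sketch only, and the function names and parameters are hypothetical. The guide states that Aggregator, Rank, and sorted Joiner transformations do not get pass-through partitioning, but it does not name the type they receive, so the sketch returns a generic label for them.

```python
def partition_type_when_recovery_enabled(receiving: str, sorted_joiner: bool = False) -> str:
    """Partition type the Workflow Manager sets at a partition point once recovery is enabled."""
    if receiving in ("Aggregator", "Rank") or (receiving == "Joiner" and sorted_joiner):
        # The exact type is not stated in this section, only that it is not pass-through.
        return "transformation default"
    return "pass-through"


def can_enable_recovery(kind: str, partitions: int = 1, sorted_input: bool = False,
                        input_groups: int = 1) -> bool:
    """Apply the Joiner and Custom transformation restrictions for recovery."""
    if kind == "Joiner" and not sorted_input:
        return partitions == 1  # unsorted Joiners allow recovery only with one partition
    if kind == "Custom":
        return input_groups == 1  # Custom transformations need a single input group
    return True
```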

Configuring the Workflow
The recovery method you choose for the workflow depends on the design and configuration of the workflow. As with sessions, you can configure a workflow so that you can correct errors and complete the workflow as though it ran without error. If other tasks or workflows in your environment depend on the successful completion of a session, configure the workflow containing the session to suspend on error. This is useful for sequential and concurrent sessions because it prevents the PowerCenter Server from continuing the workflow after the session fails. This is also useful if multiple concurrent sessions fail or if other workflows depend on the successful completion of the workflow. For details on recovering a suspended workflow, see “Recovering a Suspended Workflow” on page 305. If you do not want to configure the workflow to suspend on error, you can configure recoverable sessions to fail the workflow if the session fails. This prevents the PowerCenter Server from continuing to run the workflow after the session fails. In this case, you may want to perform recovery by running the part of the workflow that did not yet run. For more information, see “Recovering a Failed Workflow” on page 308. You can also allow the workflow to complete even if sessions or other tasks fail. You can then choose to recover only the failed session tasks. This allows you to recover the sessions without running previously successful tasks. For more information, see “Recovering a Session Task” on page 311.

Configuring the Target Database
When the PowerCenter Server runs a session in recovery mode, it uses information in recovery tables that it creates on the target database system. The PowerCenter Server creates the recovery tables when it runs a session enabled for recovery. If the tables already exist, the PowerCenter Server writes information to them.


The PowerCenter Server creates the following recovery tables in the target database:

PM_RECOVERY. This table records target load information during the session run. The PowerCenter Server removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.

PM_TGT_RUN_ID. This table records information the PowerCenter Server uses to identify each target on the database. The information remains in the table between session runs.

If you want the PowerCenter Server to create the recovery tables, you must grant table creation privileges to the database user name for the target database connection. If you do not want the PowerCenter Server to create the recovery tables, you must create them manually. Do not edit or drop the recovery tables while recovery is enabled. If you disable recovery, the PowerCenter Server does not remove the recovery tables from the target database. You must remove them manually. Table 11-1 describes the format of PM_RECOVERY:
Table 11-1. PM_RECOVERY Table Definition
Column Name     Datatype
REP_GID         VARCHAR(240)
WFLOW_ID        NUMBER
SUBJ_ID         NUMBER
TASK_INST_ID    NUMBER
TGT_INST_ID     NUMBER
PARTITION_ID    NUMBER
TGT_RUN_ID      NUMBER
RECOVERY_VER    NUMBER
CHECK_POINT     NUMBER
ROW_COUNT       NUMBER

Table 11-2 describes the format of PM_TGT_RUN_ID:
Table 11-2. PM_TGT_RUN_ID Table Definition
Column Name      Datatype
LAST_TGT_RUN_ID  NUMBER

Note: If you manually create the PM_TGT_RUN_ID table, you must specify a value other than zero in the LAST_TGT_RUN_ID column to ensure that the session runs successfully in recovery mode.
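If you create the recovery tables manually, the following Python sketch shows the shape of the DDL. It uses the column lists from Tables 11-1 and 11-2. SQLite (the standard `sqlite3` module) stands in for the target database only so that the example runs anywhere. On an actual Oracle, Microsoft SQL Server, or other target, issue the equivalent DDL for that system. The seed value 1 is an arbitrary non-zero choice, and any keys or indexes the server may expect are not described here.

```python
import sqlite3

# Column names and datatypes from Table 11-1 and Table 11-2.
PM_RECOVERY_DDL = """
CREATE TABLE PM_RECOVERY (
    REP_GID      VARCHAR(240),
    WFLOW_ID     NUMBER,
    SUBJ_ID      NUMBER,
    TASK_INST_ID NUMBER,
    TGT_INST_ID  NUMBER,
    PARTITION_ID NUMBER,
    TGT_RUN_ID   NUMBER,
    RECOVERY_VER NUMBER,
    CHECK_POINT  NUMBER,
    ROW_COUNT    NUMBER
)
"""
PM_TGT_RUN_ID_DDL = "CREATE TABLE PM_TGT_RUN_ID (LAST_TGT_RUN_ID NUMBER)"


def create_recovery_tables(conn: sqlite3.Connection) -> None:
    """Create both recovery tables and seed LAST_TGT_RUN_ID with a non-zero value."""
    conn.execute(PM_RECOVERY_DDL)
    conn.execute(PM_TGT_RUN_ID_DDL)
    # A zero value here would prevent the session from running in recovery mode.
    conn.execute("INSERT INTO PM_TGT_RUN_ID (LAST_TGT_RUN_ID) VALUES (1)")
    conn.commit()
```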


Creating pmcmd Scripts
You can use pmcmd to perform recovery from the command line or in a script. When you use pmcmd commands in a script, pmcmd indicates the success or failure of the command with a return code. The following return codes apply to recovery sessions. Table 11-3 describes the return codes for pmcmd that relate to recovery:
Table 11-3. pmcmd Return Codes for Recovery
Code   Description
12     The PowerCenter Server cannot start recovery because the session or workflow is scheduled, suspending, waiting for an event, waiting, initializing, aborting, stopping, disabled, or running.
19     The PowerCenter Server cannot start the session in recovery mode because the workflow is configured to run continuously.

For details on additional pmcmd return codes, see “pmcmd Return Codes” on page 590.
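A script that drives pmcmd can branch on these return codes. The following Python sketch is a wrapper and makes no assumptions about pmcmd syntax: the caller passes the full argument list, so no command or option names are assumed here. The wrapper runs pmcmd through the standard `subprocess` module and interprets the two recovery-related codes from Table 11-3. Any other code is referred to the full return code list.

```python
import subprocess
from typing import Sequence

# Recovery-related return codes from Table 11-3.
RECOVERY_RETURN_CODES = {
    12: "Cannot start recovery: the session or workflow is scheduled, suspending, waiting "
        "for an event, waiting, initializing, aborting, stopping, disabled, or running.",
    19: "Cannot start the session in recovery mode: the workflow is configured to run continuously.",
}


def describe_recovery_return_code(code: int) -> str:
    """Translate a pmcmd return code into a message for recovery scripts."""
    if code in RECOVERY_RETURN_CODES:
        return RECOVERY_RETURN_CODES[code]
    return f"Return code {code}: see pmcmd Return Codes"


def run_pmcmd(args: Sequence[str]) -> str:
    """Run pmcmd with caller-supplied arguments and describe its return code."""
    result = subprocess.run(["pmcmd", *args])
    return describe_recovery_return_code(result.returncode)
```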


Working with Repeatable Data
You can enable a session for recovery in the Workflow Manager when all targets in the mapping receive data from transformations that produce repeatable data. All transformations have a property that determines when the transformation produces repeatable data. For most transformations, this property is hidden. However, you can write the Custom transformation procedure to output repeatable data, and then configure the Custom transformation Output Is Repeatable property to match the procedure behavior. Transformations can produce repeatable data under the following circumstances:
Never. The order of the output data is inconsistent between session runs. This is the default for active Custom transformations.

Based on input order. The output order is consistent between session runs when the input data order for all input groups is consistent between session runs. This is the default for passive Custom transformations.

Always. The order of the output data is consistent between session runs even if the order of the input data is inconsistent between session runs.

Based on transformation configuration. The transformation produces repeatable data depending on how you configure the transformation. You can always enable the session for recovery, but you may get inconsistent results depending on how you configure the transformation.

Table 11-4 lists which transformations produce repeatable data:
Table 11-4. Transformations that Output Repeatable Data
Transformation: Output Is Repeatable

Source Qualifier (relational): Based on transformation configuration. Use sorted ports to produce repeatable data, or add a transformation that produces repeatable data immediately after the Source Qualifier transformation. If you do neither, you might get inconsistent results.
Source Qualifier (flat file): Always.
Application Source Qualifier: Based on transformation configuration. Use sorted ports for relational sources, such as Siebel sources, to produce repeatable data, or add a transformation that produces repeatable data immediately after the Application Source Qualifier transformation. If you do neither, you might get inconsistent results.
MQ Source Qualifier: Always.
XML Source Qualifier: Always.
Aggregator: Always.
Custom: Based on transformation configuration. Configure the Output Is Repeatable property according to the Custom transformation procedure behavior.
Expression: Based on input order.
External Procedure: Based on input order.
Filter: Based on input order.
Joiner: Based on input order.
Lookup: Based on input order.
Normalizer (VSAM): Always. You can enable the session for recovery; however, you might get inconsistent results if you run the session in recovery mode. The Normalizer transformation generates source data in the form of primary keys. Recovering a session might generate different values than if the session completed successfully. However, the PowerCenter Server continues to produce unique key values.
Normalizer (pipeline): Based on input order.
Rank: Always.
Router: Based on input order.
Sequence Generator: Based on transformation configuration. You must reset the sequence value to the value set in the failed session run. If you do not, you might get inconsistent results.
Sorter, configured for distinct output rows: Always.
Sorter, not configured for distinct output rows: Based on input order.
Stored Procedure: Based on input order.
Transaction Control: Based on input order.
Union: Never.
Update Strategy: Based on input order.
XML Generator: Always.
XML Parser: Always.

To run a session in recovery mode, you must first enable the failed session for recovery. To enable a session for recovery, the Workflow Manager verifies that all targets in the mapping receive data from transformations that produce repeatable data. The Workflow Manager uses the values in Table 11-4 to determine whether you can enable a session for recovery. However, the Workflow Manager cannot verify whether you configured some transformations, such as the Sequence Generator transformation, correctly, so it always allows you to enable these sessions for recovery. You may get inconsistent results if you do not configure these transformations correctly.


You cannot enable a session for recovery in the Workflow Manager under the following circumstances:

You connect a transformation that never produces repeatable data directly to a target. To enable this session for recovery, you can add a transformation that always produces repeatable data between the transformation that never produces repeatable data and the target.

You connect a transformation that never produces repeatable data directly to a transformation that produces repeatable data based on input order. To enable this session for recovery, you can add a transformation that always produces repeatable data immediately after the transformation that never produces repeatable data.
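The check the Workflow Manager performs can be approximated by propagating the Table 11-4 categories through the mapping. The following Python sketch makes several assumptions. The mapping is an acyclic graph described by two dictionaries: the category of each transformation and the upstream nodes of each transformation or target. The node names are hypothetical. Configuration-dependent transformations are treated as repeatable, because the Workflow Manager cannot verify them. A transformation that is repeatable based on input order is repeatable only when all of its inputs are.

```python
from typing import Dict, List

NEVER = "Never"
INPUT_ORDER = "Based on input order"
ALWAYS = "Always"
CONFIG = "Based on transformation configuration"


def repeatable_outputs(kinds: Dict[str, str], inputs: Dict[str, List[str]]) -> Dict[str, bool]:
    """Decide for each transformation whether its output is repeatable."""
    memo: Dict[str, bool] = {}

    def visit(node: str) -> bool:
        if node not in memo:
            kind = kinds[node]
            if kind in (ALWAYS, CONFIG):
                # CONFIG is treated optimistically, like the Workflow Manager.
                memo[node] = True
            elif kind == NEVER:
                memo[node] = False
            else:
                # Based on input order: repeatable only if every input is repeatable.
                memo[node] = all(visit(up) for up in inputs.get(node, []))
        return memo[node]

    for node in kinds:
        visit(node)
    return memo


def session_recoverable(kinds: Dict[str, str], inputs: Dict[str, List[str]],
                        targets: List[str]) -> bool:
    """A session can be enabled for recovery only if every target input is repeatable."""
    out = repeatable_outputs(kinds, inputs)
    return all(out[up] for target in targets for up in inputs[target])
```

If you model a mapping like Figure 11-2 (two Source Qualifiers feeding a Union, then a Custom transformation and a Lookup), the check fails. It passes once a Sorter configured for distinct output rows follows the Custom transformation, as in Figure 11-3.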

When a mapping contains a transformation that never produces repeatable data, you can add a transformation that always produces repeatable data immediately after it.
Note: In some cases, you might get inconsistent data if you run some sessions in recovery mode. For a description of circumstances that might lead to inconsistent data, see “Completing Unrecoverable Sessions” on page 316.

Figure 11-1 illustrates a mapping you can enable for recovery:
Figure 11-1. Mapping You Can Enable for Recovery

The mapping contains an Aggregator transformation that always produces repeatable data. The Aggregator transformation provides data for the Lookup and Expression transformations. Lookup and Expression transformations produce repeatable data if they receive repeatable data. Therefore, the target receives repeatable data, and you can enable this session for recovery.


Figure 11-2 illustrates a mapping you cannot enable for recovery:
Figure 11-2. Mapping You Cannot Enable for Recovery

[Figure callouts: Never produces repeatable data. Configured for distinct output rows. Always produces repeatable data.]

The mapping contains two Source Qualifier transformations that produce repeatable data. However, the mapping contains a Union transformation and a Custom transformation downstream that never produce repeatable data. The Lookup transformation only produces repeatable data if it receives repeatable data. Therefore, the target does not receive repeatable data, and you cannot enable this session for recovery. You can modify this mapping to enable the session for recovery by adding a Sorter transformation configured for distinct output rows immediately after transformations that never output repeatable data. Because the Union transformation is connected directly to another transformation that never produces repeatable data, you only need to add a Sorter transformation after the Custom transformation, as shown in the mapping in Figure 11-3:
Figure 11-3. Modified Mapping You Can Enable for Recovery

[Figure callouts: Never produces repeatable data. Configured for distinct output rows. Always produces repeatable data. Produces repeatable data based on input order.]


Recovering a Suspended Workflow
You can configure the workflow to suspend if a task fails. If a session that is enabled for recovery fails, you can correct the error that caused the session to fail and resume the suspended workflow in recovery mode. When the PowerCenter Server resumes the workflow, it runs the failed session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow. You can recover a suspended workflow with sequential or concurrent sessions. For workflows with either sequential or concurrent sessions, suspending the workflow on error is useful if successive tasks in the workflow depend on the success of the previous sessions. For a workflow with concurrent sessions, resuming a suspended workflow in recovery mode also allows you to recover concurrent failed sessions simultaneously. You can only resume a suspended workflow in recovery mode if a session that is enabled for recovery fails. If a session that is not enabled for recovery fails, you can resume the workflow normally. When you resume the workflow, the PowerCenter Server restarts the session. If the session succeeds, the PowerCenter Server runs the rest of the workflow. To configure the workflow to suspend on error, enable the Suspend On Error option on the General tab of the workflow properties. For more information about suspending the workflow, see “Suspending the Workflow” on page 127. For steps on recovering a suspended workflow, see “Steps for Recovering a Suspended Workflow” on page 307.

Recovering a Suspended Workflow with Sequential Sessions
When a sequential session enabled for recovery fails, the PowerCenter Server places the workflow in a suspended state. While the workflow is suspended, you can correct the error that caused the session to fail. After you correct the error, you can resume the workflow in recovery mode. When it resumes the workflow, the PowerCenter Server starts the failed session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow. If the recovery session fails, the PowerCenter Server suspends the workflow again.

Example
Suppose the workflow w_ItemOrders contains two sequential sessions. In this workflow, s_ItemSales is enabled for recovery, and the workflow is configured to suspend on error.

Figure 11-4 illustrates w_ItemOrders:
Figure 11-4. Resuming a Suspended Workflow with Sequential Sessions
(Callouts: Workflow configured to suspend on error. Session enabled for recovery.)

Suppose s_ItemSales fails, and the PowerCenter Server suspends the workflow. You correct the error and resume the workflow in recovery mode. The PowerCenter Server recovers the session successfully and then runs s_UpdateOrders. If s_UpdateOrders also fails, the PowerCenter Server suspends the workflow again. You correct the error, but you cannot resume the workflow in recovery mode because s_UpdateOrders is not enabled for recovery. Instead, you resume the workflow normally. The PowerCenter Server starts s_UpdateOrders from the beginning, completes the session successfully, and then runs the StopWorkflow control task.

Recovering a Suspended Workflow with Concurrent Sessions
When a concurrent session enabled for recovery fails, the PowerCenter Server places the workflow in a suspending state while it completes any other concurrently running tasks. After the concurrent tasks succeed or fail, the PowerCenter Server places the workflow in a suspended state. While the workflow is suspended, you can correct the error that caused the session to fail. If concurrent tasks failed, you can also correct those errors. After you correct the errors, you can resume the workflow in recovery mode, and the PowerCenter Server runs the failed session in recovery mode. If multiple concurrent sessions failed, the PowerCenter Server starts all failed sessions that are enabled for recovery in recovery mode, and restarts the other failed tasks and sessions that are not enabled for recovery. After all failed sessions and tasks are recovered or complete successfully, the PowerCenter Server runs the rest of the workflow. If a recovered or restarted session or task fails again, the PowerCenter Server suspends the workflow.
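The rule for concurrent sessions reduces to a per-task choice between recovery mode and a restart. The following hypothetical Python sketch (the function names and the plain dictionaries are illustrative assumptions, not a PowerCenter API) maps each failed task to its resume mode and then decides whether the workflow continues or suspends again, using the w_ItemsDaily sessions from the example that follows.

```python
from typing import Dict


def resume_plan(failed: Dict[str, bool]) -> Dict[str, str]:
    """Map each failed concurrent task to the mode it runs in when the workflow resumes.

    `failed` maps a task name to whether that task is enabled for recovery.
    """
    return {name: ("recovery" if enabled else "restart") for name, enabled in failed.items()}


def after_resume(results: Dict[str, bool]) -> str:
    """Decide what happens once every resumed task has finished."""
    if all(results.values()):
        return "run the rest of the workflow"
    return "suspend the workflow again"


# w_ItemsDaily: all three sessions failed; only two are enabled for recovery.
plan = resume_plan({"s_SupplierInfo": True, "s_PromoItems": True, "s_ItemSales": False})
print(plan)
print(after_resume({name: True for name in plan}))
```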

Example
Suppose the workflow w_ItemsDaily contains three concurrent sessions: s_SupplierInfo, s_PromoItems, and s_ItemSales. In this workflow, s_SupplierInfo and s_PromoItems are enabled for recovery, and the workflow is configured to suspend on error.

Figure 11-5 illustrates w_ItemsDaily:
Figure 11-5. Resuming a Suspended Workflow with Concurrent Sessions
(Callouts: Sessions enabled for recovery. Workflow configured to suspend on error.)

Suppose s_SupplierInfo fails while the PowerCenter Server is running the three sessions. The PowerCenter Server places the workflow in a suspending state and continues running the other two sessions. s_PromoItems and s_ItemSales also fail, and the PowerCenter Server then places the workflow in a suspended state. You correct the errors that caused each session to fail and then resume the workflow in recovery mode. The PowerCenter Server starts s_SupplierInfo and s_PromoItems in recovery mode. Because s_ItemSales is not enabled for recovery, the PowerCenter Server restarts that session from the beginning. The PowerCenter Server runs the three sessions concurrently. After all sessions succeed, the PowerCenter Server runs the Command task.

Steps for Recovering a Suspended Workflow
You can use the Workflow Monitor to resume a workflow in recovery mode. If the workflow or session is currently scheduled, waiting, or disabled, the PowerCenter Server cannot run the session in recovery mode. You must stop or unschedule the workflow or stop the session.
To resume a workflow or worklet in recovery mode:
1. In the Navigator, select the suspended workflow you want to resume.
2. Choose Task-Resume/Recover.
The PowerCenter Server resumes the workflow.

You can also use pmcmd to resume a workflow in recovery mode. For more information, see “Using pmcmd” on page 581.
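The precondition at the start of these steps (no recovery run while the workflow or session is scheduled, waiting, or disabled) can be written as a simple check. This is a hypothetical Python sketch: the state strings and the `recovery_blockers` function are illustrative assumptions, not values reported by the Workflow Monitor or pmcmd.

```python
from typing import List

# States that, per this section, prevent a recovery run.
BLOCKING_STATES = {"scheduled", "waiting", "disabled"}


def recovery_blockers(workflow_state: str, session_state: str) -> List[str]:
    """Return the actions needed before a session can run in recovery mode.

    An empty list means nothing blocks recovery.
    """
    actions = []
    if workflow_state.lower() in BLOCKING_STATES:
        actions.append(f"stop or unschedule the workflow (state: {workflow_state})")
    if session_state.lower() in BLOCKING_STATES:
        actions.append(f"stop the session (state: {session_state})")
    return actions


print(recovery_blockers("Scheduled", "failed"))
```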

Recovering a Failed Workflow
You can configure a session to fail the workflow if the session fails. If the session is also enabled for recovery, you can correct the error that caused the session to fail and recover the workflow from the failed session. When the PowerCenter Server recovers the workflow from the failed session, it runs the failed session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow. You can recover a workflow from a failed sequential or concurrent session. You might want to fail a workflow as a result of session failure if successive tasks in the workflow depend on the success of the previous sessions. To configure a session to fail the workflow if the session fails, enable the Fail Parent If This Task Fails option on the General tab of the session properties. For more information, see “Working with Tasks” on page 131. For steps on recovering a failed workflow, see “Steps for Recovering a Failed Workflow” on page 310.

Recovering a Failed Workflow with Sequential Sessions
If a sequential session fails, and that session is enabled for recovery and configured to fail the workflow, the PowerCenter Server fails the workflow. You can correct the error that caused the session to fail and recover the workflow from the failed session. When the PowerCenter Server recovers the workflow from the session, it runs the session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow. If the recovery session fails, the PowerCenter Server fails the workflow again.

Example
Suppose the workflow w_ItemOrders contains two sequential sessions, s_ItemSales and s_UpdateOrders. Both sessions are configured to fail the parent workflow if they fail, and s_ItemSales is also enabled for recovery. Figure 11-6 illustrates w_ItemOrders:
Figure 11-6. Recovering Part of a Workflow With Sequential Sessions
(Callouts: Session enabled for recovery. Sessions configured to fail workflow if either session fails.)

Suppose s_ItemSales fails, and the PowerCenter Server fails the workflow. You correct the error and recover the workflow from s_ItemSales. The PowerCenter Server successfully recovers the session and then runs the next task in the workflow, s_UpdateOrders. Suppose s_UpdateOrders also fails, and the PowerCenter Server fails the workflow again. You correct the error, but you cannot recover the workflow from s_UpdateOrders because that session is not enabled for recovery. Instead, you start the workflow from the session. The PowerCenter Server starts s_UpdateOrders from the beginning, completes the session successfully, and then runs the StopWorkflow control task.

Recovering a Failed Workflow with Concurrent Sessions
If a concurrent session fails, and that session is enabled for recovery and configured to fail the workflow, the PowerCenter Server fails the workflow. You can then correct the error that caused the session to fail and recover the workflow from the failed session. When the PowerCenter Server recovers the workflow, it runs the session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the successive tasks that are in the same workflow path as the session. The PowerCenter Server does not recover or restart concurrent tasks when you recover a workflow from a failed session.

If several concurrent sessions fail, and each is enabled for recovery and configured to fail the workflow, the PowerCenter Server fails the workflow when the first session fails. The other concurrent sessions continue to run until they succeed or fail. After all concurrent sessions complete, you can correct the errors that caused the failures and then recover the workflow. Because recovering the workflow from one session does not recover the other failed sessions, individually recover all but one of the failed sessions, and then recover the workflow from the remaining failed session. This ensures that the PowerCenter Server recovers all failed concurrent sessions before it runs the rest of the workflow. For details on recovering a session individually, see “Recovering a Session Task” on page 311.
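The "all but one" procedure above, combined with restarting failed sessions that are not enabled for recovery, can be expressed as an ordered action list. This hypothetical Python sketch uses illustrative action labels rather than exact Workflow Manager menu names, and it arbitrarily keeps the first recovery-enabled session for last; any recovery-enabled session would do.

```python
from typing import Dict, List, Tuple


def plan_failed_workflow_recovery(failed: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Order the actions for several concurrent sessions that failed the workflow.

    `failed` maps each failed session name to whether it is enabled for recovery.
    Every failed session except one is handled individually (recovered if it is
    enabled for recovery, otherwise started from the beginning); the workflow is
    then recovered from the remaining recovery-enabled session.
    """
    enabled = [name for name, ok in failed.items() if ok]
    if not enabled:
        raise ValueError("no failed session is enabled for recovery")
    last = enabled[0]  # any recovery-enabled session can be saved for last
    steps = [
        ("recover task" if ok else "start task", name)
        for name, ok in failed.items()
        if name != last
    ]
    steps.append(("recover workflow from task", last))
    return steps


steps = plan_failed_workflow_recovery(
    {"s_SupplierInfo": True, "s_PromoItems": True, "s_ItemSales": False}
)
for action, session in steps:
    print(action, session)
```

With the w_ItemsDaily sessions, the plan matches the example that follows: recover s_PromoItems, start s_ItemSales, then recover the workflow from s_SupplierInfo.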

Example
Suppose the workflow w_ItemsDaily contains three concurrent sessions: s_SupplierInfo, s_PromoItems, and s_ItemSales. In this workflow, s_SupplierInfo and s_PromoItems are enabled for recovery, and each session is configured to fail the parent workflow if the session fails. Figure 11-7 illustrates w_ItemsDaily:
Figure 11-7. Recovering Part of a Workflow with Concurrent Sessions
(Callouts: Sessions enabled for recovery. Sessions configured to fail parent workflow if the session fails.)

Suppose s_SupplierInfo fails while the three concurrent sessions are running, and the PowerCenter Server fails the workflow. s_PromoItems and s_ItemSales also fail. You correct the errors that caused each session to fail. In this case, you must combine two recovery methods to run all sessions before completing the workflow. You recover s_PromoItems individually. You cannot recover s_ItemSales because it is not enabled for recovery, so you start that session from the beginning instead. After the PowerCenter Server successfully completes s_PromoItems and s_ItemSales, you recover the workflow from s_SupplierInfo. The PowerCenter Server runs the session in recovery mode and then runs the Command task.

Steps for Recovering a Failed Workflow
You can use the Workflow Manager or Workflow Monitor to recover a failed workflow. If the workflow or session is currently scheduled, waiting, or disabled, the PowerCenter Server cannot run the session in recovery mode. You must stop or unschedule the workflow or stop the session.
To recover a failed workflow using the Workflow Manager:
1. Select the failed session in the Navigator or in the Workflow Designer workspace.
2. Right-click the failed session and choose Recover Workflow from Task.
The PowerCenter Server runs the failed session in recovery mode, and then runs the rest of the workflow.

To recover a failed workflow using the Workflow Monitor:
1. Select the failed session in the Navigator.
2. Right-click the session and choose Recover Workflow From Task.
Or, choose Task-Recover Workflow From Task.
The PowerCenter Server runs the session in recovery mode.

You can also use pmcmd to recover a failed workflow. For more information, see “Using pmcmd” on page 581.

Recovering a Session Task
If you do not configure the workflow to suspend on error, and you do not configure the workflow to fail if sessions or tasks fail, the PowerCenter Server completes the workflow even if it encounters errors. If a session fails, but other tasks in the workflow complete successfully, you may want to recover only the failed session. When the PowerCenter Server recovers a session, it runs the session in recovery mode. You can recover sequential or concurrent sessions. For workflows with sequential sessions, individually recovering a session is useful if the rest of the workflow succeeded and you need to recover the failed session. This allows you to recover the session without restarting successful tasks. For workflows with concurrent sessions, this method is useful if multiple concurrent sessions fail and also cause the workflow to fail. You can individually recover concurrent sessions and individually start subsequent tasks in the workflow paths until the paths converge at a single task. In other complex, branched workflows, individually recovering multiple failed sessions allows you to specify the order in which the sessions run.

Recovering Sequential Sessions
When a sequential session enabled for recovery fails, and the workflow is not configured to suspend or fail on error, the PowerCenter Server continues to run the workflow. You can correct the error that caused the session to fail. After you correct the error, you can individually recover the failed session. When the PowerCenter Server individually recovers a session, it runs the session in recovery mode. It does not run other tasks in the workflow.

Recovering Concurrent Sessions
When a concurrent session enabled for recovery fails, the PowerCenter Server continues to run the workflow. Other tasks and the workflow may succeed. You can correct the error that caused the session to fail. If concurrent tasks failed, you can also correct those errors. After you correct the errors, you can individually recover each session without running the rest of the workflow.

If multiple concurrent sessions fail, and each session is enabled for recovery and configured to fail the workflow on session failure, the PowerCenter Server fails the workflow. You can correct the errors that caused the sessions to fail. After you correct the errors, you can individually recover each session. Once all concurrent tasks are recovered or complete, you can start the workflow from the task where the concurrent paths converge.

Example
Suppose the workflow w_ItemsDaily contains three concurrently running sessions. Each session is enabled for recovery and configured to fail the workflow if the session fails. Figure 11-8 illustrates w_ItemsDaily:
Figure 11-8. Recovering Concurrent Sessions Individually
(Callouts: Sessions enabled for recovery. Sessions configured to fail parent workflow if the session fails.)

Suppose s_ItemSales fails and the PowerCenter Server fails the workflow. s_PromoItems and s_SupplierInfo also fail. You correct the errors that caused the sessions to fail. After you correct the errors, you individually recover each failed session. The PowerCenter Server successfully recovers the sessions. The workflow paths after the sessions converge at the Command task, allowing you to start the workflow from the Command task and complete the workflow. Alternatively, after you correct the errors, you could also individually recover two of the three failed sessions. After the PowerCenter Server successfully recovers the sessions, you can recover the workflow from the third session. The PowerCenter Server then recovers the third session and, on successful recovery, runs the rest of the workflow.

Steps for Recovering a Session Task
You can use the Workflow Manager or Workflow Monitor to recover a failed session in a workflow. If the workflow or session is currently scheduled, waiting, or disabled, the PowerCenter Server cannot run the session in recovery mode. You must stop or unschedule the workflow or stop the session.
To recover a failed session using the Workflow Manager:
1. Select the failed session in the Navigator or in the Workflow Designer workspace.
2. Right-click the failed session and choose Recover Task.
The PowerCenter Server runs the session in recovery mode.

To recover a failed session using the Workflow Monitor:
1. Select the failed session in the Navigator.
2. Right-click the session and choose Recover Task.
Or, choose Task-Recover Task.
The PowerCenter Server runs the session in recovery mode.

You can also use pmcmd to recover a failed session. For more information, see “Using pmcmd” on page 581.

Server Handling for Recovery
The PowerCenter Server writes recovery data to relational target databases when you run a session enabled for recovery. If the session fails, the PowerCenter Server uses the recovery data to determine the point at which it continues to commit data during the recovery session.

Verifying Recovery Tables
The PowerCenter Server creates recovery information in cache files for all sessions enabled for recovery. For relational targets, it also creates recovery tables on the target database during the initial session run.

The PowerCenter Server creates the recovery cache files during the normal session run and stores them in the directory specified for $PMCacheDir. It generates file names in the format PMGMD_METADATA_*.dat. Do not alter these files or remove them from the PowerCenter Server cache directory. The PowerCenter Server cannot run the recovery session if you delete the recovery cache files.

If the session writes to a relational database and is enabled for recovery, the PowerCenter Server also verifies the recovery tables on the target database for all relational targets at the beginning of a normal session run. If the tables do not exist, the PowerCenter Server creates them. If the database user account that the PowerCenter Server uses to connect to the target database does not have permission to create the recovery tables, you must create them manually. For information about recovery table structure, see “Configuring the Target Database” on page 298.

During the session run, the PowerCenter Server writes target load information for normal load targets into the recovery tables. If the session fails, the PowerCenter Server uses this information to complete the session in recovery mode. If the session is configured to write to relational targets in bulk mode, the PowerCenter Server does not write recovery information to the recovery tables.

If the session completes successfully, the PowerCenter Server deletes all recovery cache files and removes the recovery table entries related to the session. The PowerCenter Server initializes the information in the recovery tables at the beginning of the next session run.
The PowerCenter Server also uses the recovery cache files to store messages from real-time sources. For more information, see your PowerCenter Connect documentation.
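Because the recovery session fails when the cache files are missing, it can help to confirm they are present before you start recovery. This is a hypothetical, read-only Python sketch: it assumes you pass the actual directory that $PMCacheDir resolves to on the PowerCenter Server machine, and it only checks for files matching PMGMD_METADATA_*.dat; it cannot tell which file belongs to which session.

```python
from pathlib import Path
from typing import List


def recovery_cache_files(cache_dir: str) -> List[Path]:
    """List the recovery cache files (PMGMD_METADATA_*.dat) in a cache directory.

    This only reads the directory; it never modifies or deletes the files.
    """
    return sorted(Path(cache_dir).glob("PMGMD_METADATA_*.dat"))


def ready_for_recovery(cache_dir: str) -> bool:
    """Rough check: True if at least one recovery cache file is present."""
    return bool(recovery_cache_files(cache_dir))
```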

Running Recovery
If a session enabled for recovery fails, you can run the session in recovery mode. The PowerCenter Server moves a recovery session through the states of a normal session: scheduled, waiting, running, succeeded, and failed. When the PowerCenter Server starts the recovery session, it runs all pre-session tasks.

For relational normal load targets, the PowerCenter Server performs incremental load recovery. It uses the recovery information created during the normal session run to determine the point at which the session stopped committing data to the target, and then continues writing data to the target from that point. On successful recovery, the PowerCenter Server removes the recovery information from the tables. For example, if the PowerCenter Server commits 10,000 rows before the session fails, when you run the session in recovery mode, the PowerCenter Server skips the first 10,000 rows and starts loading with row 10,001.

If the session writes to a relational target in bulk mode, the PowerCenter Server performs the entire writer run. If the Truncate Target Table option is enabled in the session properties, the PowerCenter Server truncates the target before loading data.

If the session writes to a flat file or XML file, the PowerCenter Server performs full load recovery. It overwrites the existing output file and performs the entire writer run.

If the session writes to heterogeneous targets, the PowerCenter Server performs incremental load recovery for all relational normal load targets and full load recovery for all other target types.

On successful recovery, the PowerCenter Server deletes the recovery cache files associated with the session. It also performs all post-session tasks.
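The per-target rules and the 10,000-row example can be summarized in a short hypothetical Python sketch. The target type labels and target names are illustrative assumptions. The row-skipping step is only correct if the recovery run reads source rows in the same order as the failed run, which is why unsorted source data can produce inconsistent results.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")


def recovery_type(target_type: str, load_mode: str = "normal") -> str:
    """Return the kind of recovery described above for one target.

    target_type: "relational", "flat file", or "xml" (illustrative labels).
    load_mode: "normal" or "bulk"; only meaningful for relational targets.
    """
    if target_type == "relational" and load_mode == "normal":
        return "incremental"
    # Bulk-mode relational, flat file, and XML targets get the entire writer run.
    return "full"


def rows_to_load(rows: Iterable[T], committed: int) -> Iterator[T]:
    """Incremental load recovery: skip rows already committed, then continue."""
    return islice(rows, committed, None)


# Heterogeneous targets (hypothetical names): each target gets its own kind of recovery.
targets = {
    "T_ORDERS": ("relational", "normal"),
    "T_AUDIT": ("relational", "bulk"),
    "orders.out": ("flat file", "normal"),
}
plan: Dict[str, str] = {name: recovery_type(*spec) for name, spec in targets.items()}
print(plan)
```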

Completing Unrecoverable Sessions
In some cases, you cannot perform recovery for a session. There may also be circumstances that cause a recovery session to fail or produce inconsistent data. If you cannot recover a session, you can run the session again.

You cannot run sessions in recovery mode under the following circumstances:
♦ You change the number of partitions. If you change the number of partitions after the session fails, the recovery session fails.
♦ The recovery table is empty or missing from the target database. The PowerCenter Server fails the recovery session under the following circumstances:
− You deleted the table after the PowerCenter Server created it.
− The session enabled for recovery succeeded, and the PowerCenter Server removed the recovery information from the table.
♦ The recovery cache file is missing. The PowerCenter Server fails the recovery session if the recovery cache file is missing from the PowerCenter Server cache directory.
♦ The PowerCenter Server performing recovery is on a different operating system. The operating system of the PowerCenter Server that runs the recovery session must be the same as the operating system of the PowerCenter Server that ran the failed session.

You might get inconsistent data if you perform recovery under the following circumstances:
♦ You change the partitioning configuration. If you change any partitioning options after the session fails, you may get inconsistent data.
♦ Source data is not sorted. To perform a successful recovery, the PowerCenter Server must process source rows during recovery in the same order it processed them during the initial session. Use the Sorted Ports option in the Source Qualifier transformation or add a Sorter transformation directly after the Source Qualifier transformation.
♦ The sources or targets change after the initial session failure. If you drop or create indexes, or edit data in the source or target tables before recovering a session, the PowerCenter Server may return missing or repeated rows.
♦ The session writes to a relational target in bulk mode, but the session is not configured to truncate the target table. The PowerCenter Server may load duplicate rows to the target during the recovery session.
♦ The mapping uses a Normalizer transformation. The Normalizer transformation generates source data in the form of primary keys. Recovering a session might generate different values than if the session completed successfully. However, the PowerCenter Server continues to produce unique key values.
♦ The mapping uses a Sequence Generator transformation. The Sequence Generator transformation generates source data in the form of sequence values. Recovering a session might generate different values than if the session completed successfully. If you want the recovery session to generate the same sequence data, reset the Current Value in the Sequence Generator transformation properties to the value used when you ran the failed session. If you do not reset the Current Value, the PowerCenter Server continues to generate unique sequence values.
♦ The session performs incremental aggregation and the PowerCenter Server stops unexpectedly. If the PowerCenter Server stops unexpectedly while running an incremental aggregation session, the recovery session cannot use the incremental aggregation cache files. Rename the backup cache files for the session from PMAGG*.idx.bak and PMAGG*.dat.bak to PMAGG*.idx and PMAGG*.dat before you perform recovery.
♦ The PowerCenter Server data movement mode changes after the initial session failure. If you change the data movement mode before recovering the session, the PowerCenter Server might return incorrect data.
♦ The PowerCenter Server code page or source and target code pages change after the initial session failure. If you change the source, target, or PowerCenter Server code pages, the PowerCenter Server might return incorrect data. You can perform recovery if the new code pages are two-way compatible with the original code pages.
♦ The PowerCenter Server runs in Unicode mode and you change the session sort order. When the PowerCenter Server runs in Unicode mode, it sorts character data based on the sort order selected for the session. Do not perform recovery if you change the session sort order after the session fails.
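The backup-file rename for incremental aggregation sessions can be scripted. This is a hypothetical Python sketch: it assumes you pass the cache directory that holds the session's PMAGG files, and it overwrites any existing PMAGG*.idx or PMAGG*.dat file with the same name, on the assumption that those are the unusable files left behind when the server stopped.

```python
from pathlib import Path
from typing import List


def restore_aggregation_cache(cache_dir: str) -> List[Path]:
    """Rename PMAGG*.idx.bak and PMAGG*.dat.bak back to PMAGG*.idx and PMAGG*.dat.

    Existing PMAGG*.idx / PMAGG*.dat files with the same names are overwritten.
    Returns the restored file paths.
    """
    restored: List[Path] = []
    for pattern in ("PMAGG*.idx.bak", "PMAGG*.dat.bak"):
        for backup in sorted(Path(cache_dir).glob(pattern)):
            target = backup.with_suffix("")  # drops only the trailing ".bak"
            backup.replace(target)  # rename, replacing any file already at target
            restored.append(target)
    return restored
```

Run it only after the PowerCenter Server has stopped and before you start the recovery session.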


Chapter 12

Sending Email
This chapter covers the following topics:
♦ Overview, 320
♦ Configuring Email on UNIX, 321
♦ Configuring Email on Windows, 322
♦ Working with Email Tasks, 328
♦ Working with Post-Session Email, 332
♦ Working with Suspension Email, 339
♦ Using Email Tasks in a Workflow or Worklet, 341
♦ Tips, 342


Overview
You can send email to designated recipients when the PowerCenter Server runs a workflow. For example, if you want to track how long a session takes to complete, you can configure the session to send an email containing the time and date the session starts and completes. Or, if you want the PowerCenter Server to notify you when a workflow suspends, you can configure the workflow to send email when it suspends. When you create a workflow or worklet, you can include the following types of email:

♦ Email task. You can include reusable and non-reusable Email tasks anywhere in the workflow or worklet. For more information, see “Using Email Tasks in a Workflow or Worklet” on page 341.
♦ Post-session email. You can configure the session so the PowerCenter Server sends an email when the session completes or fails. You create an Email task and use it for post-session email. For more information, see “Working with Post-Session Email” on page 332.
When you configure the subject and body of post-session email, you can use email variables to include information about the session run, such as session name, status, and the total number of records loaded. You can also use email variables to attach the session log or other files to email messages. For more information, see “Email Variables and Format Tags” on page 333.
♦ Suspension email. You can configure the workflow so the PowerCenter Server sends an email when the workflow suspends. You create an Email task and use it for suspension email. For more information, see “Working with Suspension Email” on page 339.

Before you can configure a session or workflow to send email, you need to create an Email task. For more information, see “Working with Email Tasks” on page 328. The PowerCenter Server on Windows sends email in MIME format. This allows you to include characters in the subject and body that are not in 7-bit ASCII. For more information on the MIME format or the MIME decoding process, see your email documentation. Before creating Email tasks, configure the PowerCenter Server to send email. For more information, see “Configuring Email on UNIX” on page 321 and “Configuring Email on Windows” on page 322.
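To illustrate what MIME encoding does with characters outside 7-bit ASCII, the following Python sketch uses the standard `email.header` module to encode a non-ASCII subject as an RFC 2047 encoded word and decode it again. It shows the general MIME mechanism only; it is not how the PowerCenter Server builds its messages, and the subject text is made up.

```python
from email.header import Header, decode_header, make_header

# A subject line containing a character outside 7-bit ASCII (hypothetical text).
subject = "Session s_ItemSales terminée"

# RFC 2047 encoded word: the non-ASCII subject is carried as 7-bit ASCII text.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# A MIME-aware mail client decodes it back to the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```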

Configuring Email on UNIX
The PowerCenter Server on UNIX uses rmail to send email. To send email, the UNIX user who starts the PowerCenter Server must have the rmail tool installed in the path. If you want to send email to more than one person, separate the email address entries with a comma. Do not put spaces between addresses.
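If you build the recipient string from a list, for example in a script that generates session properties, the comma-and-no-spaces rule is easy to enforce. This is a hypothetical Python helper; the function name and the addresses are illustrative, and it checks only the separator rule, not whether each address is valid.

```python
from typing import Iterable


def unix_recipient_list(addresses: Iterable[str]) -> str:
    """Join addresses the way the UNIX PowerCenter Server expects: commas, no spaces."""
    cleaned = [address.strip() for address in addresses if address.strip()]
    for address in cleaned:
        # An entry that still contains whitespace or a comma is not a single address.
        if "," in address or any(ch.isspace() for ch in address):
            raise ValueError(f"invalid address entry: {address!r}")
    return ",".join(cleaned)


print(unix_recipient_list([" ops@example.com", "dba@example.com "]))
```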
To verify the rmail tool is accessible on AIX:
1. Log on to the UNIX system as the Informatica user who starts the PowerCenter Server.
2. Type the following lines at the prompt and press Enter:
rmail <your fully qualified email address>,<second fully qualified email address>
From <your_user_name>
3. To indicate the end of the message, type ^D.
You should receive a blank email from the email account of the user you specify in the From line. If not, locate the directory where rmail resides and add that directory to the path.

To verify the rmail tool is accessible on all other UNIX machines:
1. Log on to the UNIX system as the Informatica user who starts the PowerCenter Server.
2. Type the following line at the prompt and press Enter:
rmail <your fully qualified email address>,<second fully qualified email address>
3. To indicate the end of the message, type . on a line of its own and press Enter. Or, type ^D.
You should receive a blank email from the email account of the Informatica user. If not, locate the directory where rmail resides and add that directory to the path.

Once you verify that rmail is installed correctly, you can send email. For more information on configuring email, see “Working with Email Tasks” on page 328.
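The "locate the directory where rmail resides" step can be automated with Python's `shutil.which`. This is a hypothetical sketch: run it as the user who starts the PowerCenter Server, and treat the candidate directories in the commented usage line as assumptions to adjust for your system.

```python
import shutil
from typing import Iterable, Optional


def rmail_dir_to_add(candidates: Iterable[str], path: Optional[str] = None) -> Optional[str]:
    """Return None if rmail is already on the search path; otherwise return the
    first candidate directory that contains an executable rmail.

    `path` defaults to the current PATH, as in shutil.which.
    """
    if shutil.which("rmail", path=path):
        return None
    for directory in candidates:
        if shutil.which("rmail", path=directory):
            return directory
    raise FileNotFoundError("rmail not found in PATH or in any candidate directory")


# Candidate directories are assumptions; adjust them for your system.
# print(rmail_dir_to_add(["/usr/bin", "/bin", "/usr/sbin", "/usr/lib"]))
```

If the function returns a directory, add that directory to the PATH of the user who starts the PowerCenter Server.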

Configuring Email on Windows
The PowerCenter Server on Windows uses Microsoft Outlook to send email using the MAPI interface. You must meet the following requirements to send email on a PowerCenter Server on Windows:
♦ Install the Microsoft Outlook mail client on the PowerCenter Server machine.
♦ Run Microsoft Outlook on a Microsoft Exchange Server.
♦ Create a Windows user account that has Log on as a service rights and a Microsoft Outlook profile.

To configure the PowerCenter Server on Windows to send email, you must perform the following steps:
1. Verify the Informatica Service startup account.
2. Configure a Microsoft Outlook profile for the Informatica Service startup account.
3. Configure Logon network security.
4. Create distribution lists in the Personal Address Book in Microsoft Outlook.
5. Configure the PowerCenter Server to send email using the Microsoft Outlook profile you created in step 2.

Step 1. Verify the Informatica Service Startup Account
You must have an Informatica Service startup account, which grants a user the Log on as a service right to start the Informatica Service. Verify the Informatica Service startup account so that you can create a Microsoft Outlook profile for the user who has the Log on as a service right for the Informatica Service startup account. For details on verifying service rights, see the Troubleshooting section of “Installing and Configuring the PowerCenter Server on Windows” in the Installation and Configuration Guide.

Step 2. Configure a Microsoft Outlook User
You must set up a Microsoft Outlook user for the Informatica Service startup account before configuring the PowerCenter Server to send email. The user profile must contain the following services:
♦ Microsoft Exchange Server
♦ Personal Address Book

Use the same logon name for both the Microsoft Outlook account you create and the user you grant the Log on as a service right in the Informatica Service startup account.
Note: If you do not already have a Microsoft Outlook mailbox for the Informatica Service startup account user, ask your network administrator to create one.

To configure a Microsoft Outlook user:
1. Open the Control Panel on the machine running the PowerCenter Server.
2. Double-click the Mail (or Mail and Fax) icon.
3. On the Services tab of the user Properties dialog box, click Show Profiles.
The Mail dialog box displays the list of profiles configured for the computer.
4. If you have a Microsoft Outlook profile set up for the Informatica Service startup account, skip to “Step 3. Configure Logon Network Security” on page 325. If you do not already have a Microsoft Outlook profile set up for the Informatica Service startup account, continue to the next step.
5. Click Add in the mail properties window.
The Microsoft Outlook Setup Wizard appears.
6. Select Use The Following Information Services and then select Microsoft Exchange Server. Click Next.
7. Enter a profile name. You can enter any name, but Informatica recommends that you enter a text string that matches the Informatica Service startup account. Click Next.
8. Enter the name of the Microsoft Exchange Server. Enter your mailbox name. Click Next.
9. Indicate whether you travel with your computer. Click Next.
10. Enter the path to your personal address book. Click Next.
11. Indicate whether you want to run Outlook when you start Windows.
12. Click Next.
The Setup Wizard indicates that you have successfully configured an Outlook profile.
13. Click Finish.

Step 3. Configure Logon Network Security
You must configure the Logon Network Security before you run the Microsoft Exchange Server Service.
To configure Logon Network Security for the Microsoft Exchange Server:
1. Open the Control Panel on the machine running the PowerCenter Server.
2. Double-click the Mail (or Mail and Fax) icon.
The User Properties sheet appears.
3. On the Services tab, select Microsoft Exchange Server and click Properties.
4. Click the Advanced tab. Set the Logon network security option to NT Password Authentication.
5. Click OK.

Step 4. Create Distribution Lists
When the PowerCenter Server runs on Windows, you can enter only one email address in the Workflow Manager. If you want to send email to multiple recipients, create a distribution list containing these addresses in the Personal Address Book in Microsoft Outlook. Enter the distribution list name as the recipient when configuring email. For more information about working with your Personal Address Book, refer to Microsoft Outlook documentation.

Step 5. Configure the PowerCenter Server Setup
After you create the Microsoft Outlook profile, configure the PowerCenter Server to send email as that Microsoft Outlook user.
To configure the PowerCenter Server as a Microsoft Outlook user:
1. From the PowerCenter Server Setup, click the Configuration tab.
2. In the MS Exchange Profile field, enter the name of the Microsoft Outlook profile you created for the Informatica Service startup account.

Working with Email Tasks
The Workflow Manager provides an Email task that allows you to send email during a workflow. You can create reusable Email tasks in the Task Developer for any type of email, or you can create non-reusable Email tasks in the Workflow Designer and Worklet Designer. You can use Email tasks in any of the following locations:
♦ Session properties. You can configure the session to send email when the session completes or fails. For more information, see “Working with Post-Session Email” on page 332.
♦ Workflow properties. You can configure the workflow to send email when the workflow suspends. For more information, see “Working with Suspension Email” on page 339.
♦ Workflow or worklet. You can include an Email task anywhere in the workflow or worklet to send email based on a condition you define. For more information, see “Using Email Tasks in a Workflow or Worklet” on page 341.

Figure 12-1 shows the Edit Tasks dialog box for an Email task in the Task Developer:
Figure 12-1. Email Task

Email Address Tips and Guidelines
Consider the following tips and guidelines when you enter the email address in an Email task:
♦ Enter the email address using 7-bit ASCII characters only.
♦ You can enter either the $PMSuccessEmailUser or $PMFailureEmailUser server variable for post-session email. For more information, see “Using Server Variables” on page 333.
♦ If the PowerCenter Server runs on Windows, you can enter a Microsoft Exchange Profile name. The mail recipient must have an entry in the Global Address book of the Microsoft Outlook profile.
♦ If the PowerCenter Server runs on Windows, you can send email to multiple recipients by creating a distribution list in your Personal Address book. All recipients must also be in the Global Address book. You cannot enter multiple addresses separated by commas or semicolons.
♦ If the PowerCenter Server runs on UNIX, you can enter multiple email addresses separated by a comma. Do not include spaces between email addresses.
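The UNIX guidelines above can be expressed as a small check. The following Python sketch is illustrative only: format_unix_recipients is a hypothetical helper, not part of PowerCenter. It builds the value you would type into the Email User Name field by joining addresses with commas and no spaces, and it rejects any address that is not 7-bit ASCII.

```python
def format_unix_recipients(addresses: list[str]) -> str:
    """Build an Email User Name value for a PowerCenter Server on UNIX.

    Hypothetical helper: joins addresses with commas and no spaces, and
    rejects characters outside 7-bit ASCII, following the guidelines above.
    """
    cleaned = []
    for address in addresses:
        address = address.strip()          # drop stray spaces around each entry
        if not address:
            continue                       # skip empty entries
        if not address.isascii():          # 7-bit ASCII only
            raise ValueError(f"non-ASCII character in {address!r}")
        if "," in address or " " in address:
            raise ValueError(f"embedded comma or space in {address!r}")
        cleaned.append(address)
    return ",".join(cleaned)


print(format_unix_recipients([" ops@example.com", "dba@example.com "]))
# ops@example.com,dba@example.com
```

On a PowerCenter Server that runs on Windows this value would not work, because the guidelines above rule out comma-separated addresses there; use a distribution list instead.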

Steps to Create an Email Task
You can create Email tasks in the Task Developer, Worklet Designer, and Workflow Designer. Use the following steps to create an Email task.
To create an Email task in the Task Developer:
1. In the Task Developer, choose Tasks-Create. The Create Task dialog box appears.
2. Select an Email task and enter a name for the task. Click Create. The Workflow Manager creates an Email task in the workspace.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click Rename to enter a name for the task.
6. Optionally, enter a description for the task in the Description field.
7. Click the Properties tab.
8. Enter the fully qualified email address of the mail recipient in the Email User Name field. For more information on entering the email address, see “Email Address Tips and Guidelines” on page 328.
9. Enter the subject of the email in the Email Subject field, or leave this field blank.
10. Click the Open button in the Email Text field to open the Email Editor.
11. Enter the text of the email message in the Email Editor. You can include format tags in the message. For more information, see “Email Variables and Format Tags” on page 333. You can also leave the Email Text field blank.
12. Click OK twice to save your changes.

Working with Post-Session Email
You can configure a session so that the PowerCenter Server sends email when it completes or fails the session. You can create two Email tasks: one that the PowerCenter Server sends if it completes the session, and another that it sends if it fails the session. The PowerCenter Server sends post-session email at the end of a session, after executing post-session shell commands or stored procedures. If the PowerCenter Server encounters an error sending the email, it writes a message to the server or event log, but it does not fail the session. The Workflow Manager includes the following session properties to send post-session email:
♦ On-Success Email
♦ On-Failure Email

Figure 12-2 shows the On-Success and On-Failure email properties on the Components tab of the session properties:
Figure 12-2. Post-Session Email Properties

You can specify a reusable Email task that you create in the Task Developer for either success email or failure email. Alternatively, you can create a non-reusable Email task for each session property. When you create a non-reusable Email task for a session property, you create the Email task for that session only, and you cannot use it in the workflow or worklet.
You cannot specify a non-reusable Email task that you create in the Workflow or Worklet Designer for post-session email.
Tip: When you configure an Email task for post-session email, use the email server variables $PMSuccessEmailUser or $PMFailureEmailUser for the email recipient. Verify that you specify values for these server variables on the PowerCenter Server that runs the session.

Using Server Variables
You can use server variables to address post-session email. When you register the PowerCenter Server, you can configure its server variables. You can use the following server variables for sending post-session email:

♦ $PMSuccessEmailUser. Email address of the user to receive email when the session completes successfully. Use this variable for the Email User Name for success email only. The PowerCenter Server does not expand this variable when you use it for any other email type.
♦ $PMFailureEmailUser. Email address of the user to receive email when the session fails to complete. Use this variable for the Email User Name for failure email only. The PowerCenter Server does not expand this variable when you use it for any other email type.

When you use one of these server variables, the PowerCenter Server sends email to the address configured for the server variable. You might use this functionality when you have an administrator who troubleshoots all failed sessions. Instead of entering the administrator email address for each session, you can use the email variable $PMFailureEmailUser. If the administrator changes, you can correct all sessions by editing the $PMFailureEmailUser server variable, instead of editing the email address in each session. You might also use this functionality when you have different administrators for different PowerCenter Servers. If you deploy a folder from one repository to another or otherwise change the PowerCenter Server that runs the session, the new server automatically sends email to users associated with the new server when you use server variables instead of hard-coded email addresses.
Note: $PMSuccessEmailUser and $PMFailureEmailUser are optional server variables. Verify you define a variable before using it to address email.
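The expansion rules above can be modeled in a few lines. In this Python sketch, SERVER_VARIABLES, EXPANDED_FOR, and resolve_recipient are hypothetical names, and treating an undefined variable as an error is an assumption; the guide only says to define the variable before you use it.

```python
# Hypothetical server variable values registered for one PowerCenter Server.
SERVER_VARIABLES = {
    "$PMSuccessEmailUser": "etl-team@example.com",
    "$PMFailureEmailUser": "etl-admin@example.com",
}

# Each post-session email type expands only its own variable.
EXPANDED_FOR = {
    "success": "$PMSuccessEmailUser",
    "failure": "$PMFailureEmailUser",
}


def resolve_recipient(email_user_name, email_type, server_vars):
    """Return the address to use for one Email User Name value (sketch)."""
    variable = EXPANDED_FOR.get(email_type)
    if email_user_name != variable:
        # A literal address, or a variable used for the wrong email type:
        # neither is expanded.
        return email_user_name
    value = server_vars.get(variable)
    if not value:
        # Assumption: this sketch treats an undefined variable as an error.
        raise LookupError(f"{variable} is not defined for this server")
    return value


print(resolve_recipient("$PMFailureEmailUser", "failure", SERVER_VARIABLES))
# etl-admin@example.com
```

Moving the same sessions to another server amounts to passing that server's variable values; the Email User Name stored in each session does not change.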

Email Variables and Format Tags
You can use email variables and format tags in an email message for post-session emails. You can use some email variables in the subject of the email. With email variables, you can include important session information in the email, such as the number of rows loaded, the session completion time, or read and write statistics. You can also attach the session log or other relevant files to the email. Use format tags in the body of the message to make the message easier to read.

Note: The PowerCenter Server does not limit the type or size of attached files. However, because large attachments can cause problems with your email system, avoid attaching excessively large files, such as session logs generated using verbose tracing. The PowerCenter Server generates an error message in the email if an error occurs attaching the file.
Table 12-1 describes the email variables you can use in a post-session email:
Table 12-1. Email Variables for Post-Session Email
Email Variable | Description
%s | Session name.
%e | Session status.
%b | Session start time.
%c | Session completion time.
%i | Session elapsed time (session completion time - session start time).
%l | Total rows loaded.
%r | Total rows rejected.
%t | Source and target table details, including read throughput in bytes per second and write throughput in rows per second. The PowerCenter Server includes all information displayed in the session detail dialog box.
%m | Name of the mapping used in the session.
%n | Name of the folder containing the session.
%d | Name of the repository containing the session.
%g | Attach the session log to the message.
%a<filename> | Attach the named file. The file must be local to the PowerCenter Server. The following are valid file names: %a<c:\data\sales.txt> or %a</users/john/data/sales.txt>. Note: The file name cannot include the greater than character (>) or a line break.

Note: The PowerCenter Server ignores %a, %g, or %t when you include them in the email subject. Include these variables in the email message only.
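The following Python sketch illustrates how the variables in Table 12-1 behave, including the subject-line rule in the note above. It is not the PowerCenter Server's implementation: the details dictionary and its keys are hypothetical, the real server supplies and formats these values itself, and dropping ignored variables from the subject is an assumption about what "ignores" means.

```python
import re

# Text variables from Table 12-1, mapped to hypothetical keys in a dict of
# session details.
TEXT_VARIABLES = {
    "%s": "session_name", "%e": "status", "%b": "start_time",
    "%c": "completion_time", "%i": "elapsed_time", "%l": "rows_loaded",
    "%r": "rows_rejected", "%t": "table_details", "%m": "mapping_name",
    "%n": "folder_name", "%d": "repository_name",
}
# Variables the server ignores in the email subject.
IGNORED_IN_SUBJECT = {"%a", "%g", "%t"}

# Match %a<filename> first (the name cannot contain '>' or a line break),
# then any other two-character variable.
TOKEN = re.compile(r"%a<([^>\r\n]*)>|%[a-z]")


def expand(template, details, in_subject=False):
    """Return (expanded_text, attachment_paths) for one email field."""
    attachments = []

    def substitute(match):
        variable = match.group(0)[:2]
        if in_subject and variable in IGNORED_IN_SUBJECT:
            return ""  # assumption: ignored variables are simply dropped
        if variable == "%a" and match.group(1) is not None:
            attachments.append(match.group(1))  # %a<filename>
            return ""
        if variable == "%g":
            attachments.append(details["session_log"])
            return ""
        key = TEXT_VARIABLES.get(variable)
        # Unknown variables are left exactly as typed.
        return str(details[key]) if key else match.group(0)

    return TOKEN.sub(substitute, template), attachments
```

For example, expand("Session name: %s %g", details) with a hypothetical details dictionary returns the message text with the session name filled in and the session log path in the attachment list.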

Table 12-2 lists the format tags you can use in an Email task:
Table 12-2. Format Tags for Email Tasks
Formatting | Format Tag
tab | \t
new line | \n
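A minimal sketch of what the two tags do to a message, assuming they are typed literally in the Email Editor as a backslash followed by t or n. The apply_format_tags helper is hypothetical and only illustrates the effect; the PowerCenter Server does this formatting itself.

```python
def apply_format_tags(text: str) -> str:
    """Replace the literal tags \\t and \\n with a tab and a newline."""
    return text.replace("\\t", "\t").replace("\\n", "\n")


# Text as typed in the Email Editor (a raw string keeps the backslashes).
typed = r"Session:\t%s\nRows loaded:\t%l"
print(apply_format_tags(typed))
```

The printed message has the label and variable separated by a tab on each of two lines.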

Configuring Post-Session Email
You can configure post-session email to use a reusable or non-reusable Email task.

Using a Reusable Email Task
Use the following steps to configure post-session email to use a reusable Email task.
To configure post-session email to use a reusable Email task:
1. Open the session properties and click the Components tab.
2. Select Reusable in the Type column for the success email or failure email field.
3. Click the Open button in the Value column to select the reusable Email task.
4. Select the Email task in the Object Browser dialog box and click OK.
5. Optionally, edit the Email task for this session property by clicking the Edit button in the Value column. If you edit the Email task for either success email or failure email, the edits apply only to this session.
6. Click OK to close the session properties.

Using a Non-Reusable Email Task
Follow these steps to configure success email or failure email to use a non-reusable Email task.
To configure success email or failure email to use a non-reusable Email task:
1. Open the session properties and click the Components tab.
2. Select Non-Reusable in the Type column for the success email or failure email field.
3. Open the email editor using the Open button.
4. Edit the Email task and click OK. For more information on editing Email tasks, see “Working with Email Tasks” on page 328.
5. Click OK to close the session properties.

Sample Email
The following is user-entered text from a sample post-session email configuration using variables:
Session complete. Session name: %s %l %r %e %b %c %i %g

The following is sample output from the configuration above:
Session complete. Session name: sInstrTest Total Rows Loaded = 1 Total Rows Rejected = 0 Completed Start Time: Tue Nov 17 12:26:31 2003 Completion Time: Tue Nov 17 12:26:41 2003 Elapsed time: 0:00:10 (h:m:s)

Working with Suspension Email
You can configure a workflow to send email when the PowerCenter Server suspends the workflow. For example, when a task fails, the PowerCenter Server suspends the workflow and sends the suspension email. You can fix the error and resume the workflow. If another task fails while the PowerCenter Server is suspending the workflow, you do not get the suspension email again. However, the PowerCenter Server sends another suspension email if another task fails after you resume the workflow. For more information, see “Suspending the Workflow” on page 127. Configure suspension email on the General tab of the workflow properties. Figure 12-3 shows the Suspension Email workflow options:
Figure 12-3. Suspension Email
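The suspension email rules at the start of this section behave like a single flag that is set on the first failure and cleared when you resume the workflow. The Python model below is a hypothetical illustration, not PowerCenter code; it treats a workflow that is suspending and one that is already suspended the same way, and the task names are made up.

```python
class SuspensionEmailModel:
    """Hypothetical model of when a workflow sends suspension email."""

    def __init__(self, send_email):
        self.send_email = send_email   # callable that delivers one message
        self.suspended = False         # True while suspending or suspended

    def task_failed(self, task_name):
        if self.suspended:
            return                     # already suspending: no second email
        self.suspended = True
        self.send_email(f"Workflow suspended after task {task_name} failed")

    def resume(self):
        self.suspended = False         # a later failure sends email again


sent = []
workflow = SuspensionEmailModel(sent.append)
workflow.task_failed("s_load_orders")
workflow.task_failed("s_load_items")   # fails while suspending: no email
workflow.resume()
workflow.task_failed("s_load_items")   # fails after resume: email sent
print(len(sent))  # 2
```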


To configure suspension email:
1. In the Workflow Designer, open the workflow.
2. Choose Workflows-Edit to open the workflow properties.
3. On the General tab, select Suspend on Error.
4. Click the Browse Emails button to select a reusable Email task.
Note: The Workflow Manager returns an error message if you do not have any reusable Email tasks in the folder. Create a reusable Email task in the folder before you configure suspension email.
5. Choose a reusable Email task and click OK.
6. Click OK to close the workflow properties.

Using Email Tasks in a Workflow or Worklet
You can use Email tasks anywhere in a workflow or worklet. For example, you can include an Email task in a workflow after a Command task that executes a shell script. You can configure the links in the workflow or worklet so the PowerCenter Server sends you email if the Command task fails. You might want the PowerCenter Server to generate a report during a workflow and email the report to you after generating it.
Note: When you use an Email task outside of a Session task, the PowerCenter Server reads variables related to the session as text. For example, if you use the variable %s in an Email task in the workflow, the PowerCenter Server cannot provide a session name, because the Email task is not within a session.
Figure 12-4 shows a workflow that generates a report and emails it:
Figure 12-4. Email Task in a Workflow

Configure the gen_report Command task to execute a shell script that generates the report. Verify the shell script saves the report to a directory local to the PowerCenter Server. Configure the em_report Email task to attach the file generated from the shell script.

Tips
The following suggestions can extend the capabilities of Email tasks.

Create a generic user for sending email. Often there are multiple users who can start sessions on a PowerCenter Server. If you want to avoid entering the Microsoft Outlook profile each time the PowerCenter user changes, create a generic Microsoft Outlook profile, such as “PowerCenter,” then grant each PowerCenter user rights to send mail through this profile.

Use server variables to address post-session emails. When the server variables $PMSuccessEmailUser and $PMFailureEmailUser are configured for the PowerCenter Server, use them to address post-session emails. This allows you to change the recipient of post-session emails for all sessions the server runs by editing the server variables. It can also make deploying sessions into production easier when the variables are defined for both development and production servers.

Generate and send post-session reports. You can use a post-session success command to generate a report file and attach that file to a success email. For example, you create a batch file called Q3rpt.bat that generates a sales report, and you are running Microsoft Outlook on Windows. Figure 12-5 shows how you can configure the post-session success command to generate a report:
Figure 12-5. Using Post-Session Commands to Generate Reports

Figure 12-6 shows how you can configure success email to attach a report file:
Figure 12-6. Using Email Variables to Attach Reports

Use email variable %a to attach the report.
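When you type the %a variable for a generated report, the file name rules from Table 12-1 apply. This Python sketch, with a hypothetical attachment_token helper and a made-up report path, checks the characters the guide forbids. It cannot verify that the file is local to the PowerCenter Server.

```python
def attachment_token(filename: str) -> str:
    """Build an %a<filename> email variable for a report file.

    Enforces the file name rule from Table 12-1: no '>' and no line break.
    Locality of the file on the PowerCenter Server is not checked.
    """
    if ">" in filename or "\n" in filename or "\r" in filename:
        raise ValueError("file name cannot contain '>' or a line break")
    return f"%a<{filename}>"


# Hypothetical output path for the report that Q3rpt.bat writes.
print("Q3 sales report attached. " + attachment_token(r"c:\reports\q3rpt.txt"))
# Q3 sales report attached. %a<c:\reports\q3rpt.txt>
```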

Use other mail programs. If you do not have Microsoft Outlook, you can use a post-session success command to invoke a command line email program, such as WindMail. In this case, you do not have to enter the email user name or subject, because the recipients, email subject, and body text are contained in the batch file, sendmail.bat. Figure 12-7 shows how you can configure the post-session success command to invoke a command line email program:
Figure 12-7. Sending Email without Microsoft Outlook
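If no command line mailer such as WindMail is available, a script in another language could play the same role. The sketch below swaps in Python's standard smtplib and email packages; it is not WindMail and not part of PowerCenter. The function names, the SMTP host, and the port are assumptions, and you would still need to call the script from the post-session success command.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipients, subject, body, attachment=None):
    """Assemble a plain-text message, optionally attaching one file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    if attachment is not None:
        path = Path(attachment)
        msg.add_attachment(
            path.read_bytes(),
            maintype="application",
            subtype="octet-stream",
            filename=path.name,
        )
    return msg


def send_message(msg, host="localhost", port=25):
    """Hand the message to an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

As with sendmail.bat, the recipients, subject, and body live in the script rather than in the session, so you would leave the email user name and subject empty in the Workflow Manager.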

Chapter 13

Pipeline Partitioning
This chapter covers the following subjects:
♦ Overview, 346
♦ Configuring Partitioning Information, 351
♦ Cache Partitioning, 359
♦ Round-Robin Partition Type, 360
♦ Hash Keys Partition Types, 361
♦ Key Range Partition Type, 363
♦ Pass-Through Partition Type, 367
♦ Database Partitioning Partition Type, 369
♦ Partitioning Relational Sources, 371
♦ Partitioning File Sources, 374
♦ Partitioning Relational Targets, 378
♦ Partitioning File Targets, 380
♦ Partitioning Joiner Transformations, 384
♦ Partitioning Lookup Transformations, 391
♦ Partitioning Sorter Transformations, 392
♦ Mapping Variables in Partitioned Pipelines, 394
♦ Partitioning Rules, 395

Overview
You create a session for each mapping you want the PowerCenter Server to run. Every mapping contains one or more source pipelines. A source pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier. If you purchase the Partitioning option, you can specify partitioning information for each source pipeline in a mapping. The partitioning information for a pipeline controls the following factors:

♦ The number of reader, transformation, and writer threads that the master thread creates for the pipeline. For more information, see “Understanding Processing Threads” on page 14.
♦ How the PowerCenter Server reads data from the source, including the number of connections to the source.
♦ How the PowerCenter Server distributes rows of data to each transformation as it processes the pipeline.
♦ How the PowerCenter Server writes data to the target, including the number of connections to each target in the pipeline.

You can specify partitioning information for a pipeline by setting the following attributes:
♦ Location of partition points. Partition points mark the thread boundaries in a pipeline and divide the pipeline into stages. The PowerCenter Server sets partition points at several transformations in a pipeline by default. If you have the Partitioning option, you can define other partition points. When you add partition points, you increase the number of transformation threads, which can improve session performance. The PowerCenter Server can redistribute rows of data at partition points, which can also improve session performance. For more information on partition points, see “Partition Points” on page 346.
♦ Number of partitions. A partition is a pipeline stage that executes in a single thread. If you purchase the Partitioning option, you can set the number of partitions at any partition point. When you add partitions, you increase the number of processing threads, which can improve session performance. For more information, see “Number of Partitions” on page 348.
♦ Partition types. The PowerCenter Server specifies a default partition type at each partition point. If you purchase the Partitioning option, you can change the partition type. The partition type controls how the PowerCenter Server redistributes data among partitions at partition points. For more information, see “Partition Types” on page 348.

Partition Points
By default, the PowerCenter Server sets partition points at various transformations in the pipeline. Partition points mark thread boundaries as well as divide the pipeline into stages. A stage is a section of a pipeline between any two partition points. When you set a partition point at a transformation, the new pipeline stage includes that transformation.

Table 13-1 lists the partition points that the Workflow Manager creates by default:
Table 13-1. Default Partition Points
Transformation (Partition Point) | Default Partition Type | Description
Source Qualifier or Normalizer transformation | Pass-through | Controls how the PowerCenter Server reads data from the source and passes data into the source qualifier.
Rank and unsorted Aggregator transformations | Hash auto-keys | Ensures that the PowerCenter Server groups rows properly before it sends them to the transformation.
Target instances | Pass-through | Controls how the target instances pass data to the targets.

If you purchase the Partitioning option, you can add partition points at other transformations and delete some partition points. Figure 13-1 shows the default partition points and pipeline stages for a simple mapping with one source pipeline:
Figure 13-1. Default Partition Points and Stages in a Sample Mapping

The mapping in Figure 13-1 contains four stages. The partition point at the source qualifier marks the boundary between the first (reader) and second (transformation) stages. The partition point at the Aggregator transformation marks the boundary between the second and third (transformation) stages. The partition point at the target instance marks the boundary between the third (transformation) and fourth (writer) stages. When you add a partition point, you increase the number of pipeline stages by one. Similarly, when you delete a partition point, you reduce the number of stages by one. For more information, see “Understanding Processing Threads” on page 14.

Besides marking stage boundaries, partition points also mark the points in the pipeline where the PowerCenter Server can redistribute data across partitions. For example, if you place a partition point at a Filter transformation and define multiple partitions, the PowerCenter Server can redistribute rows of data among the partitions before the Filter transformation processes the data. The partition type you set at this partition point controls the way in which the PowerCenter Server passes rows of data to each partition. For more information, see “Partition Types” on page 348. For more information on adding and deleting partition points, see “Adding and Deleting Partition Points” on page 353.

Number of Partitions
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. By default, the PowerCenter Server defines a single partition in the source pipeline. If you purchase the Partitioning option, you can increase the number of partitions. This increases the number of processing threads, which can improve session performance. For example, you need to use the mapping in Figure 13-1 to extract data from three flat files of various sizes. To do this, you define three partitions at the source qualifier to read the data simultaneously. When you do this, the Workflow Manager defines three partitions in the pipeline. Figure 13-2 shows the threads that the master thread creates for this mapping:
Figure 13-2. Threads Created for a Sample Mapping with Three Partitions

Figure 13-2 labels: threads for partitions #1, #2, and #3; 3 reader threads (first stage); 6 transformation threads (second and third stages); 3 writer threads (fourth stage).

By default, the PowerCenter Server sets the number of partitions to one. You can generally define up to 64 partitions at any partition point. However, there are situations in which you can define only one partition in the pipeline. For more information, see “Restrictions on the Number of Partitions” on page 395.
Note: Increasing the number of partitions or partition points increases the number of threads. Therefore, increasing the number of partitions or partition points also increases the load on the server machine. If the server machine contains ample CPU bandwidth, processing rows of data in a session concurrently can increase session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system. For more information on adding and deleting partitions, see “Adding and Deleting Partitions” on page 356.
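The thread arithmetic behind Figure 13-2 can be checked with a short calculation. This Python sketch assumes, as the figures in this section show, that each partition point adds one stage, that the first stage reads and the last stage writes, and that every stage runs one thread per partition; thread_counts is a hypothetical helper.

```python
MAX_PARTITIONS = 64  # general per-partition-point limit stated in this chapter


def thread_counts(partition_points: int, partitions: int) -> dict:
    """Estimate threads per stage for one source pipeline (sketch)."""
    if partition_points < 2:
        # The source qualifier and target partition points cannot be deleted.
        raise ValueError("a pipeline has at least two partition points")
    if not 1 <= partitions <= MAX_PARTITIONS:
        raise ValueError(f"partitions must be between 1 and {MAX_PARTITIONS}")
    stages = partition_points + 1          # each partition point adds a stage
    return {
        "reader": partitions,                           # first stage
        "transformation": partitions * (stages - 2),    # middle stages
        "writer": partitions,                           # last stage
        "total": partitions * stages,
    }


# Figure 13-1 mapping: three default partition points, three partitions.
print(thread_counts(partition_points=3, partitions=3))
# {'reader': 3, 'transformation': 6, 'writer': 3, 'total': 12}
```

Adding one partition point, for example at a Filter transformation, would add one transformation stage and therefore three more threads in this three-partition pipeline, which is why the note above warns about system load.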

Partition Types
When you configure the partitioning information for a pipeline, you must specify a partition type at each partition point in the pipeline. The partition type determines how the PowerCenter Server redistributes data across partition points.

The Workflow Manager allows you to specify the following partition types:

♦ Round-robin. The PowerCenter Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows. For more information, see “Round-Robin Partition Type” on page 360.
♦ Hash. The PowerCenter Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the PowerCenter Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the PowerCenter Server processes groups of rows with the same partition key in the same partition. For more information, see “Hash Keys Partition Types” on page 361.
♦ Key range. You specify one or more ports to form a compound partition key. The PowerCenter Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. For more information, see “Key Range Partition Type” on page 363.
♦ Pass-through. The PowerCenter Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions. For more information, see “Pass-Through Partition Type” on page 367.
♦ Database partitioning. The PowerCenter Server queries the IBM DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database. Use database partitioning with IBM DB2 targets stored on a multi-node tablespace. For more information, see “Database Partitioning Partition Type” on page 369.
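The following Python sketch shows how four of these partition types could distribute rows. It is a simplified illustration rather than the PowerCenter Server's algorithm: crc32 stands in for the undocumented hash function, key range handling uses a single port with start-inclusive, end-exclusive ranges (an assumption), and database partitioning is omitted because it depends on partition information read from IBM DB2.

```python
import zlib


def round_robin(rows, n):
    """Deal rows to partitions in turn so each gets about the same count."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_keys(rows, n, key_ports):
    """Send rows with equal values in key_ports to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # Join the key values with a separator that is unlikely in the data.
        key = "\x1f".join(str(row[p]) for p in key_ports)
        parts[zlib.crc32(key.encode("utf-8")) % n].append(row)
    return parts


def key_range(rows, port, ranges):
    """Route each row by the range its port value falls in.

    ranges holds one (start, end) pair per partition; None leaves a side
    open. Assumption: rows outside every range are returned separately.
    """
    parts = [[] for _ in ranges]
    unmatched = []
    for row in rows:
        value = row[port]
        for part, (start, end) in zip(parts, ranges):
            if (start is None or value >= start) and (end is None or value < end):
                part.append(row)
                break
        else:
            unmatched.append(row)
    return parts, unmatched


def pass_through(parts):
    """Keep the existing distribution; only a new stage is created."""
    return parts


print(round_robin(list(range(7)), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```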

You can specify different partition types at different points in the pipeline. Figure 13-3 shows a mapping where you can specify different partition types to increase session performance:
Figure 13-3. Sample Mapping

The mapping in Figure 13-3 reads data about items and calculates average wholesale costs and prices. The mapping must read item information from three flat files of various sizes, and then filter out discontinued items. It sorts the active items by description, calculates the average prices and wholesale costs, and writes the results to a relational database in which the target tables are partitioned by key range. When you use this mapping in a session, you can increase session performance by specifying different partition types at the following partition points in the pipeline:

♦ Source qualifier. To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.
♦ Filter transformation. Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.
♦ Sorter transformation. To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the PowerCenter Server to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows. You can delete the default partition point at the Aggregator transformation.
♦ Target. Since the target tables are partitioned by key range, specify key range partitioning at the target to optimize writing data to the target.

For more information on specifying partition types, see “Specifying Partition Types” on page 356.
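To see why hash auto-keys partitioning at the Sorter transformation matters, compare per-partition averages under round-robin and hash partitioning. In this Python sketch, the item rows are made up, crc32 stands in for the server's hash function, sorting is skipped, and each partition averages independently, as a partitioned Aggregator transformation would.

```python
import zlib
from collections import defaultdict


def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_auto_keys(rows, n, group_port):
    # crc32 stands in for the server's undocumented hash function.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[group_port]).encode("utf-8")) % n].append(row)
    return parts


def average_per_partition(parts, group_port, value_port):
    """Each partition averages on its own, like a partitioned Aggregator."""
    results = []
    for part in parts:
        totals, counts = defaultdict(float), defaultdict(int)
        for row in part:
            totals[row[group_port]] += row[value_port]
            counts[row[group_port]] += 1
        results.extend((g, totals[g] / counts[g]) for g in totals)
    return sorted(results)


items = [
    {"description": "bolt", "price": 1.0},
    {"description": "bolt", "price": 3.0},
    {"description": "nut", "price": 2.0},
    {"description": "nut", "price": 4.0},
]
print(average_per_partition(round_robin(items, 2), "description", "price"))
# [('bolt', 1.0), ('bolt', 3.0), ('nut', 2.0), ('nut', 4.0)]
print(average_per_partition(hash_auto_keys(items, 2, "description"), "description", "price"))
# [('bolt', 2.0), ('nut', 3.0)]
```

With round-robin partitioning, rows for the same item land in different partitions, so each partition emits its own partial average: the overlapping groups described above. With hash partitioning, each item appears in exactly one partition and its average is correct for any number of partitions.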

Configuring Partitioning Information
When you create or edit a session, you can change the partitioning information for each pipeline in a mapping. If the mapping contains multiple pipelines, you can specify multiple partitions in some pipelines and single partitions in others. You update partitioning information using the Partitions view on the Mapping tab in the session properties. You can configure the following information in the Partitions view on the Mapping tab:
♦ Add and delete partition points.
♦ Enter a description for each partition.
♦ Specify the partition type at each partition point.
♦ Add a partition key and key ranges for certain partition types.

Figure 13-4 shows the configuration options on the Partitions view on the Mapping tab:
Figure 13-4. Session Properties Partitions View on the Mapping Tab

Table 13-2 describes the configuration options for the Partitions view on the Mapping tab:
Table 13-2. Options on Session Properties Partitions View on the Mapping Tab
Partitions View Option | Description
Add Partition Point | Click to add a new partition point in the mapping. When you add a partition point, the transformation name appears under the Partition Points node.
Delete Partition Point | Click to delete the selected partition point. You cannot delete certain partition points. For details, see “Adding and Deleting Partition Points” on page 353.
Edit Partition Point | Click to edit the selected partition point. This opens the Edit Partition Point dialog box. For more information on the options in this dialog box, see Table 13-3 on page 353.
Key Range | Displays the key and key ranges for the partition point, depending on the partition type. For key range partitioning, you specify the key ranges. For hash user keys partitioning, this field displays the partition key. The Workflow Manager does not display this area for other partition types.
Edit Keys | Click to add or remove the partition key for key range or hash user keys partitioning. You cannot create a partition key for hash auto-keys, round-robin, or pass-through partitioning.

You can configure the following information when you edit or add a partition point:
♦ Specify the partition type at the partition point.
♦ Add and delete partitions.
♦ Enter a description for each partition.

Figure 13-5 shows the configuration options in the Edit Partition Point dialog box:
Figure 13-5. Edit Partition Point Dialog Box

Table 13-3 describes the configuration options in the Edit Partition Point dialog box:
Table 13-3. Edit Partition Point Dialog Box Options
Partition Options | Description
Select Partition Type | Changes the partition type.
Partition Names | Selects individual partitions from this dialog box to configure.
Add a Partition | Adds a partition. You can add up to 64 partitions at any partition point. The number of partitions must be consistent across the pipeline. Therefore, if you define three partitions at one partition point, the Workflow Manager defines three partitions at all partition points in the pipeline.
Delete a Partition | Deletes the selected partition. Each partition point must contain at least one partition.
Description | Enter an optional description for the current partition.

Adding and Deleting Partition Points
When you create a session, the Workflow Manager creates one partition point at the following transformations in the pipeline:

♦ Source Qualifier or Normalizer. This partition point controls how the PowerCenter Server extracts data from the source and passes it to the source qualifier. You cannot delete this partition point.
♦ Rank and unsorted Aggregator transformations. These partition points ensure that the PowerCenter Server groups rows properly before it sends them to the transformation. You can delete these partition points if the pipeline contains only one partition or if the PowerCenter Server passes all rows in a group to a single partition before they enter the transformation. For example, in the mapping in Figure 13-3 on page 349, you can delete the default partition point at the Aggregator transformation because hash auto-keys partitioning at the Sorter transformation sends all rows that contain items with the same description to the same partition. Therefore, the Aggregator transformation receives all rows for a given item description in one partition and can calculate the average cost and price for that item correctly.
♦ Target instances. This partition point controls how the writer passes data to the targets. You cannot delete this partition point.

Rules for Adding and Deleting Partition Points
You can add and delete partition points at other transformations in the pipeline according to the following rules:
♦ You cannot create partition points at source instances.
♦ You cannot create partition points at Sequence Generator transformations or unconnected transformations.
♦ You can add partition points at any other transformation provided that no partition point receives input from more than one pipeline stage.

Figure 13-6 shows the valid partition points in a mapping:
Figure 13-6. Sample Mapping Showing Valid Partition Points

In this mapping, the Workflow Manager creates partition points at the source qualifier and target instance by default. You can place an additional partition point at Expression transformation EXP_3. If you place a partition point at EXP_3 and define one partition, the master thread creates the following threads:

♦ Reader thread (first stage)
♦ Transformation thread (second stage)
♦ Transformation thread (third stage)
♦ Writer thread (fourth stage)

In this case, each partition point receives data from only one pipeline stage, so EXP_3 is a valid partition point. The following transformations are not valid partition points:
Transformation | Reason
Source | This is a source instance.
SG_1 | This is a Sequence Generator transformation.
EXP_1 and EXP_2 | If you could place a partition point at EXP_1 or EXP_2, you would create an additional pipeline stage that processes data from the source qualifier to EXP_1 or EXP_2. In this case, EXP_3 would receive data from two pipeline stages, which is not allowed.

For more information about processing threads, see “Understanding Processing Threads” on page 14.

Steps for Adding Partition Points
You add partition points from the Mapping tab of the session properties.

To add a partition point:
1. On the Partitions view of the Mapping tab, select a transformation that is not already a partition point, and click the Add a Partition Point button.
   Tip: You can select a transformation from the Non-Partition Points node.
2. Select the partition type for the partition point or accept the default value. For information on specifying a valid partition type, see “Specifying Partition Types” on page 356.
3. Click OK.
   The transformation appears in the Partition Points node in the Partitions view on the Mapping tab of the session properties.

Adding and Deleting Partitions
In general, you can define up to 64 partitions at any partition point in a source pipeline. In certain circumstances, the number of partitions in the pipeline must be set to one. For more information, see “Restrictions on the Number of Partitions” on page 395.

The number of partitions you specify equals the number of connections to the source or target. If the pipeline contains a relational source or target, the number of partitions at the source qualifier or target instance equals the number of connections to the database. If the pipeline contains file sources, you can configure the session to read the source with one thread or with multiple threads. For more information on connecting to relational sources and targets, see “Partitioning Relational Sources” on page 371 and “Partitioning Relational Targets” on page 378. For more information on connecting to file sources and targets, see “Partitioning File Sources” on page 374 and “Partitioning File Targets” on page 380.

The number of partitions you specify remains consistent throughout the pipeline. For example, if you specify three partitions at any partition point, the PowerCenter Server creates three partitions at all other partition points in the pipeline.

Entering Partition Descriptions
You can enter a description for each partition you create. To enter a description, select the partition in the Edit Partition Point dialog box, and then enter the description in the Description field.

Specifying Partition Types
The Workflow Manager sets a default partition type for each partition point in the pipeline. At the source qualifier and target instance, the Workflow Manager specifies pass-through partitioning. For Rank and unsorted Aggregator transformations, for example, the Workflow Manager specifies hash auto-keys partitioning when the transformation scope is All Input. When you create a new partition point, the Workflow Manager sets the partition type to the default partition type for that transformation. You can change the default type. You must specify pass-through partitioning for all transformations that are downstream from a transaction generator or an active source that generates commits, and upstream from a target or a transformation with Transaction transformation scope. Also, if you configure the session to use constraint-based loading, you must specify pass-through partitioning for all transformations that are downstream from the last active source.


Table 13-4 lists valid partition types and the default partition type for different partition points in the pipeline:
Table 13-4. Valid Partition Types for Partition Points Transformation (Partition Point) Source definition Source Qualifier (relational sources) Source Qualifier (flat file sources) XML Source Qualifier Normalizer (COBOL sources) Normalizer (relational) Aggregator (sorted) Aggregator (unsorted) Custom Expression External Procedure Filter Joiner Lookup Rank Router Sequence Generator Sorter Stored Procedure Transaction Control Union Update Strategy X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X RoundRobin Hash Auto-Keys Hash User Keys Key Range PassThrough Database Partitioning Default Partition Type Not a valid partition point Pass-through Pass-through Pass-through Pass-through Pass-through Pass-through Based on transformation scope* Pass-through Pass-through Pass-through Pass-through Based on transformation scope* Pass-through Based on transformation scope* Pass-through Not a valid partition point Based on transformation scope* Pass-through Pass-through Pass-through Pass-through


Table 13-4. Valid Partition Types for Partition Points Transformation (Partition Point) Unconnected transformation Relational target definition X X X X X (DB2 targets only) RoundRobin Hash Auto-Keys Hash User Keys Key Range PassThrough Database Partitioning Default Partition Type Not a valid partition point Pass-through The default for DB2 targets is database partitioning Pass-through Not a valid partition point

Flat file target definition XML target definition

X

X

X

X

* The default partition type is pass-through when the transformation scope is Transaction, and hash auto-keys when the transformation scope is All Input.
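The Default Partition Type column of Table 13-4, together with its footnote, can be read as a lookup rule. The following Python sketch encodes only that column for the transformation rows; the DEFAULT_PARTITION_TYPE dictionary and the default_partition_type function are hypothetical illustrations, not part of any PowerCenter API, and the columns that mark which partition types are valid are not modeled.

```python
from typing import Dict, Optional

# Default partition type per Table 13-4 (transformation rows only).
# None means "Not a valid partition point"; "scope" means the default
# depends on the transformation scope (see the table footnote).
DEFAULT_PARTITION_TYPE: Dict[str, Optional[str]] = {
    "Source definition": None,
    "Source Qualifier (relational sources)": "pass-through",
    "Source Qualifier (flat file sources)": "pass-through",
    "XML Source Qualifier": "pass-through",
    "Normalizer (COBOL sources)": "pass-through",
    "Normalizer (relational)": "pass-through",
    "Aggregator (sorted)": "pass-through",
    "Aggregator (unsorted)": "scope",
    "Custom": "pass-through",
    "Expression": "pass-through",
    "External Procedure": "pass-through",
    "Filter": "pass-through",
    "Joiner": "scope",
    "Lookup": "pass-through",
    "Rank": "scope",
    "Router": "pass-through",
    "Sequence Generator": None,
    "Sorter": "scope",
    "Stored Procedure": "pass-through",
    "Transaction Control": "pass-through",
    "Union": "pass-through",
    "Update Strategy": "pass-through",
    "Unconnected transformation": None,
}


def default_partition_type(transformation: str, scope: Optional[str] = None) -> Optional[str]:
    """Return the default partition type, or None if the transformation is not
    a valid partition point. Raises KeyError for names not listed above."""
    default = DEFAULT_PARTITION_TYPE[transformation]
    if default != "scope":
        return default
    # Footnote: pass-through for Transaction scope, hash auto-keys for All Input.
    if scope == "Transaction":
        return "pass-through"
    if scope == "All Input":
        return "hash auto-keys"
    raise ValueError(f"{transformation}: scope must be 'Transaction' or 'All Input'")


print(default_partition_type("Rank", "All Input"))  # hash auto-keys
```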

Adding Keys and Key Ranges
If you select key range or hash user keys partitioning at any partition point, you need to specify a partition key. The PowerCenter Server uses the key to pass rows to the appropriate partition.

For example, if you specify key range partitioning at a Source Qualifier transformation, the PowerCenter Server uses the key and ranges to create the WHERE clause when it selects data from the source. This lets you have the PowerCenter Server pass all rows that contain customer IDs less than 135000 to one partition and all rows that contain customer IDs greater than or equal to 135000 to another partition. For more information, see “Key Range Partition Type” on page 363.

If you specify hash user keys partitioning at a transformation, the PowerCenter Server uses the key to group data based on the ports you select as the key. For example, if you specify ITEM_DESC as the hash key, the PowerCenter Server distributes data so that all rows that contain items with the same description go to the same partition. For more information, see “Hash Keys Partition Types” on page 361.


Cache Partitioning
When you create a session with multiple partitions, the PowerCenter Server can partition caches for the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate cache for each partition, and each partition works with only the rows needed by that partition. As a result, the PowerCenter Server requires only a portion of total cache memory for each partition. When you run a session, the PowerCenter Server accesses the cache in parallel for each partition.

After you configure the session for partitioning, you can configure memory requirements and cache directories for each transformation in the Transformations view on the Mapping tab of the session properties. To configure the memory requirements, calculate the total requirements for a transformation, and divide by the number of partitions. To further improve performance, you can configure separate directories for each partition. The guidelines for cache partitioning are different for each cached transformation:
♦ Aggregator transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with an Aggregator transformation. You do not have to set a partition point at the Aggregator transformation.
♦ Joiner transformation. The PowerCenter Server uses cache partitioning when you create a partition point at the Joiner transformation. For more information about partitioning with Joiner transformations, see “Partitioning Joiner Transformations” on page 384.
♦ Lookup transformation. The PowerCenter Server uses cache partitioning when you create a hash auto-keys partition point at the Lookup transformation. For more information about partitioning with Lookup transformations, see “Partitioning Lookup Transformations” on page 391.
♦ Rank transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with a Rank transformation. You do not have to set a partition point at the Rank transformation.

For more caching information, see “Session Caches” on page 613.
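The memory rule above (total requirement divided by the number of partitions) is simple arithmetic. The following Python sketch shows it with hypothetical figures; cache_per_partition is an illustrative helper, not a PowerCenter setting, and rounding up is an assumption chosen so that the per-partition values together never fall short of the total.

```python
MAX_PARTITIONS = 64  # upper limit per partition point stated in this chapter


def cache_per_partition(total_bytes: int, num_partitions: int) -> int:
    """Split a transformation's total cache requirement across partitions.

    Rounds up so that the per-partition settings together cover the total."""
    if not 1 <= num_partitions <= MAX_PARTITIONS:
        raise ValueError("a partition point holds between 1 and 64 partitions")
    return -(-total_bytes // num_partitions)  # ceiling division


# Hypothetical figures: an Aggregator that needs 24,000,000 bytes of cache
# in total, run with three partitions.
print(cache_per_partition(24_000_000, 3))  # 8000000
```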


Round-Robin Partition Type
In round-robin partitioning, the PowerCenter Server distributes rows of data evenly to all partitions. Each partition processes approximately the same number of rows. Table 13-4 on page 357 lists the partition points where you can specify round-robin partitioning. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions. In a pipeline that reads data from file sources of different sizes, you can use round-robin partitioning to ensure that each partition receives approximately the same number of rows. Figure 13-7 shows a mapping where round-robin partitioning helps distribute rows before they enter a Filter transformation:
Figure 13-7. Mapping where Round-robin Partitioning Can Increase Performance [callout: Round-robin partitioning distributes data evenly at the Filter transformation.]

The session based on this mapping reads item information from three flat files of different sizes:
♦ Source file 1: 80,000 rows
♦ Source file 2: 5,000 rows
♦ Source file 3: 15,000 rows

When the PowerCenter Server reads the source data, the first partition processes 80% of the data, the second partition processes 5%, and the third partition processes 15%. To distribute the workload more evenly, set a partition point at the Filter transformation and set the partition type to round-robin. The PowerCenter Server then distributes the data so that each partition processes approximately one third of the data.
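The effect of the round-robin partition point on the three files above can be sketched in a few lines of Python. This is a simplified model, not the PowerCenter Server's implementation: it assumes each reader partition deals its rows to the Filter partitions in turn, starting with the first partition. The function name round_robin_counts is hypothetical.

```python
from itertools import cycle
from typing import List, Sequence


def round_robin_counts(upstream_row_counts: Sequence[int], num_partitions: int) -> List[int]:
    """Count the rows each downstream partition receives when every upstream
    partition deals its rows to partitions 0, 1, ..., n-1 in turn."""
    received = [0] * num_partitions
    for row_count in upstream_row_counts:
        targets = cycle(range(num_partitions))  # restart the rotation per upstream partition
        for _ in range(row_count):
            received[next(targets)] += 1
    return received


# One reader partition per source file: 80,000, 5,000, and 15,000 rows.
print(round_robin_counts([80_000, 5_000, 15_000], 3))  # [33334, 33334, 33332]
```

Row content plays no part in the assignment, which is why round-robin balances the load but cannot keep groups of related rows together.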


Hash Keys Partition Types
In hash partitioning, the PowerCenter Server uses a hash function to group rows of data among partitions. The PowerCenter Server groups the data based on a partition key. Use hash partitioning when you want the PowerCenter Server to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number. There are two types of hash partitioning:

♦ Hash auto-keys. The PowerCenter Server uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.
♦ Hash user keys. You specify one or more ports to generate the partition key.

Table 13-4 on page 357 lists the partition points where you can specify hash partitioning.

Hash Auto-Keys
You can use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations. Figure 13-8 shows a mapping where hash auto-keys partitioning causes the PowerCenter Server to distribute rows to each partition according to group before they enter the Sorter and Aggregator transformations:
Figure 13-8. Mapping where Hash Partitioning Can Increase Performance [callout: Hash auto-keys partitioning groups data at the Sorter.]

In this mapping, the Sorter transformation sorts items by item description. If items with the same description exist in more than one source file, rows for the same item description can arrive in more than one partition. Without hash auto-keys partitioning, the Aggregator transformation might calculate average costs and prices for each item incorrectly. To prevent errors in the cost and price calculations, set a partition point at the Sorter transformation and set the partition type to hash auto-keys. When you do this, the PowerCenter Server redistributes the data so that all items with the same description reach the Sorter and Aggregator transformations in a single partition.
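The grouping guarantee of hash partitioning comes from computing the partition only from the key values. The following Python sketch illustrates the idea with hypothetical rows; it uses zlib.crc32 as a stand-in hash function, which is an assumption (the hash function the PowerCenter Server uses is not described here). Passing ["ITEM_DESC"] models hash auto-keys on the sort key, and passing ["ITEM_ID"] would model the hash user keys choice described in the next section.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence, Set

Row = Dict[str, object]


def hash_partition(row: Row, key_ports: Sequence[str], num_partitions: int) -> int:
    """Map a row to a partition index using only the values of its key ports."""
    # crc32 returns the same value on every run (unlike Python's built-in hash()
    # for strings), so identical keys always land in the same partition.
    key = "\x1f".join(str(row[port]) for port in key_ports).encode("utf-8")
    return zlib.crc32(key) % num_partitions


def distribute(rows: List[Row], key_ports: Sequence[str], num_partitions: int) -> List[List[Row]]:
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash_partition(row, key_ports, num_partitions)].append(row)
    return partitions


# Hypothetical rows as they might arrive from several source files.
rows: List[Row] = [
    {"ITEM_ID": 1001, "ITEM_DESC": "Widget", "PRICE": 4.0},
    {"ITEM_ID": 2002, "ITEM_DESC": "Gadget", "PRICE": 9.5},
    {"ITEM_ID": 1001, "ITEM_DESC": "Widget", "PRICE": 4.5},
    {"ITEM_ID": 3003, "ITEM_DESC": "Sprocket", "PRICE": 1.25},
    {"ITEM_ID": 2002, "ITEM_DESC": "Gadget", "PRICE": 10.0},
]

partitions = distribute(rows, ["ITEM_DESC"], 3)

# Record which partitions each item description ended up in.
where: Dict[object, Set[int]] = defaultdict(set)
for index, part in enumerate(partitions):
    for row in part:
        where[row["ITEM_DESC"]].add(index)
print(all(len(indexes) == 1 for indexes in where.values()))  # True
```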


Hash User Keys
In hash user keys partitioning, the PowerCenter Server uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key. In the mapping in Figure 13-8 on page 361, if you specify hash auto-keys partitioning, the Sorter transformation receives rows of data grouped by the sort key, such as ITEM_DESC. If the item descriptions are long, and you know that each item has a unique ID number, you can specify hash user keys partitioning at the Sorter transformation and select ITEM_ID as the hash key. This may improve the performance of the session since the hash function usually processes numerical data more quickly than string data.

Adding a Hash Key
If you select hash user keys partitioning at any partition point, you must specify a hash key. The PowerCenter Server uses the hash key to distribute rows to the appropriate partition according to group. To specify the hash key, select the partition point on the Partitions view of the Mapping tab, and click Edit Keys. This displays the Edit Partition Key dialog box. The Available Ports list displays the connected input and input/output ports in the transformation. To specify the hash key, select one or more ports from this list, and then click Add. Figure 13-9 shows one port selected as the hash key for a Filter transformation:
Figure 13-9. Edit Partition Key Dialog Box [callout: Rearrange selected ports.]

To rearrange the order of the ports that make up the key, select a port in the Selected Ports list and click the up or down arrow.


Key Range Partition Type
With key range partitioning, the PowerCenter Server distributes rows of data based on a port or set of ports that you specify as the partition key. For each port, you define a range of values. The PowerCenter Server uses the key and ranges to send rows to the appropriate partition. Table 13-4 on page 357 lists the partition points where you can specify key range partitioning. Use key range partitioning in mappings where the source and target tables are partitioned by key range. Figure 13-10 shows a mapping where key range partitioning can optimize writing to the target table:
Figure 13-10. Mapping where Key Range Partitioning Can Increase Performance [callout: Key range partitioning at the target optimizes writing to the target tables.]

The target table in the database is partitioned by ITEM_ID as follows:
♦ Partition 1: 0001–2999
♦ Partition 2: 3000–5999
♦ Partition 3: 6000–9999

To optimize writing to the target table, perform the following tasks:
1. Set the partition type at the target instance to key range.
2. Create three partitions.
3. Choose ITEM_ID as the partition key. The PowerCenter Server uses this key to pass data to the appropriate partition.
4. Set the key ranges as follows:

ITEM_ID | Start Range | End Range
Partition #1 | (blank) | 3000
Partition #2 | 3000 | 6000
Partition #3 | 6000 | (blank)

When you do this, the PowerCenter Server sends all items with IDs less than 3000 to the first partition. It sends all items with IDs between 3000 and 5999 to the second partition. Items with IDs greater than or equal to 6000 go to the third partition. For more information on key ranges, see “Adding Key Ranges” on page 365.
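The routing described above can be sketched as a range lookup in Python. This is a minimal single-port model, assuming each range includes its start value and excludes its end value (which matches the behavior described for IDs 3000 and 6000) and that a blank boundary is unbounded. key_range_partition and ITEM_ID_RANGES are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# (start, end) for one partition; None stands for a blank range boundary.
KeyRange = Tuple[Optional[int], Optional[int]]


def key_range_partition(value: int, ranges: Sequence[KeyRange]) -> Optional[int]:
    """Return the index of the first partition whose range contains value,
    or None if value falls outside every range.

    Start is inclusive and end is exclusive. A blank start or end stands
    for the minimum or maximum data value."""
    for index, (start, end) in enumerate(ranges):
        if (start is None or value >= start) and (end is None or value < end):
            return index
    return None


# Key ranges from the ITEM_ID example (partition #1 is index 0).
ITEM_ID_RANGES = [(None, 3000), (3000, 6000), (6000, None)]

for item_id in (1, 2999, 3000, 5999, 6000, 9999):
    print(f"{item_id} -> partition #{key_range_partition(item_id, ITEM_ID_RANGES) + 1}")
# 1 and 2999 go to #1, 3000 and 5999 go to #2, 6000 and 9999 go to #3
```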

Adding a Partition Key
To specify the partition key for key range partitioning, select the partition point on the Partitions view of the Mapping tab, and click Edit Keys. This displays the Edit Partition Key dialog box. The Available Ports list displays the connected input and input/output ports in the transformation. To specify the partition key, select one or more ports from this list, and then click Add. Figure 13-11 shows one port selected as the partition key for the target table T_ITEM_PRICES:
Figure 13-11. Edit Partition Key Dialog Box [callout: Rearrange the selected ports.]

To rearrange the order of the ports that make up the partition key, select a port in the Selected Ports list and click the up or down arrow. In key range partitioning, the order of the ports does not affect how the PowerCenter Server redistributes rows among partitions, but it can affect session performance. For example, you might configure the following compound partition key:
Selected Ports:
ITEMS.DESCRIPTION
ITEMS.DISCONTINUED_FLAG

Since boolean comparisons are usually faster than string comparisons, the session may run faster if you arrange the ports in the following order:
Selected Ports:
ITEMS.DISCONTINUED_FLAG
ITEMS.DESCRIPTION


Adding Key Ranges
After you identify the ports that make up the partition key, you must enter the ranges for each port on the Partitions view of the Mapping tab. Figure 13-12 shows where you enter key ranges on the Partitions view of the Mapping tab:
Figure 13-12. Adding Key Ranges [callout: Specify key ranges.]

You can leave the start or end range blank for a partition. When you leave the start range blank, the PowerCenter Server uses the minimum data value as the start range. When you leave the end range blank, the PowerCenter Server uses the maximum data value as the end range. For example, you can add the following ranges for a key based on CUSTOMER_ID in a pipeline that contains two partitions:
CUSTOMER_ID | Start Range | End Range
Partition #1 | (blank) | 135000
Partition #2 | 135000 | (blank)

When the PowerCenter Server reads the Customers table, it sends all rows that contain customer IDs less than 135000 to the first partition, and all rows that contain customer IDs equal to or greater than 135000 to the second partition. The PowerCenter Server eliminates rows that contain null values or values that fall outside the key ranges.


When you configure a pipeline to load data to a relational target, if a row contains null values in any column that makes up the partition key or if a row contains a value that falls outside all of the key ranges, the PowerCenter Server sends that row to the first partition.

When you configure a pipeline to read data from a relational source, the PowerCenter Server reads rows that fall within the key ranges. It does not read rows with null values in any partition key column. If you want to read rows with null values in the partition key, use pass-through partitioning and create an SQL override.

Consider the following guidelines when you create key ranges:
♦ The partition key must contain at least one port.
♦ You must specify a range for each port.
♦ Use the standard PowerCenter date format to enter dates in key ranges.
♦ The Workflow Manager does not validate overlapping string or numeric ranges.
♦ The Workflow Manager does not validate gaps or missing ranges.
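Because the Workflow Manager does not check for overlapping ranges or gaps, you may want to check numeric key ranges yourself before you run the session. The following Python sketch is a hypothetical helper, not a PowerCenter feature. It assumes each range includes its start value and excludes its end value, as in the ITEM_ID and CUSTOMER_ID examples, treats a blank boundary as unbounded, and handles only numeric ranges for a single-port key.

```python
from typing import List, Optional, Sequence, Tuple

# (start, end) for one partition; None stands for a blank boundary.
KeyRange = Tuple[Optional[float], Optional[float]]


def check_key_ranges(ranges: Sequence[KeyRange]) -> List[str]:
    """Report overlaps and gaps between numeric key ranges.

    Each range covers start <= value < end; a blank start or end is treated
    as unbounded. Values below the lowest start or at or above the highest
    end are not reported."""
    if not ranges:
        return ["no key ranges defined"]
    spans = sorted(
        (
            float("-inf") if start is None else start,
            float("inf") if end is None else end,
            number,
        )
        for number, (start, end) in enumerate(ranges, start=1)
    )
    problems: List[str] = []
    # covered_to: highest end value seen so far; owner: the partition that set it.
    covered_to, owner = spans[0][1], spans[0][2]
    for start, end, number in spans[1:]:
        if start < covered_to:
            problems.append(f"partition #{number} overlaps partition #{owner}")
        elif start > covered_to:
            problems.append(f"gap: values from {covered_to} up to {start} fall in no range")
        if end > covered_to:
            covered_to, owner = end, number
    return problems


print(check_key_ranges([(None, 3000), (3000, 6000), (6000, None)]))  # []
print(check_key_ranges([(None, 3000), (2500, 6000), (6500, None)]))
# ['partition #2 overlaps partition #1', 'gap: values from 6000 up to 6500 fall in no range']
```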

Adding Filter Conditions
If you specify key range partitioning for a relational source, you can specify optional filter conditions or override the SQL query. For details, see “Partitioning Relational Sources” on page 371.


Pass-Through Partition Type
In pass-through partitioning, the PowerCenter Server processes data without redistributing rows among partitions. Therefore, all rows in a single partition stay in that partition after crossing a pass-through partition point. When you add a partition point to a pipeline, the master thread creates an additional pipeline stage. Use pass-through partitioning when you want to increase data throughput, but you cannot or do not want to increase the number of partitions. You can specify pass-through partitioning at any valid partition point in a pipeline. Figure 13-13 shows a mapping where pass-through partitioning can increase data throughput:
Figure 13-13. Mapping where Pass-through Partitioning Can Increase Performance [stages: Reader Thread (First Stage); Transformation Thread (Second Stage); Writer Thread (Third Stage)]

By default, this mapping contains partition points only at the source qualifier and target instance. Since this mapping contains an XML target, you can configure only one partition at any partition point. In this case, the master thread creates one reader thread to read data from the source, one transformation thread to process the data, and one writer thread to write data to the target. Each pipeline stage processes the rows as follows:
Time step | Source Qualifier (First Stage) | Transformations (Second Stage) | Target Instance (Third Stage)
1 | Row Set 1 | – | –
2 | Row Set 2 | Row Set 1 | –
3 | Row Set 3 | Row Set 2 | Row Set 1
4 | Row Set 4 | Row Set 3 | Row Set 2
... | ... | ... | ...
n | Row Set n | Row Set n-1 | Row Set n-2

Because the pipeline contains three stages, the PowerCenter Server can process three sets of rows concurrently. If the Expression transformations are very complicated, processing the second (transformation) stage can take a long time and cause low data throughput. To improve performance, set a partition point at Expression transformation EXP_2 and set the partition type to pass-through. This creates an additional pipeline stage. The master thread creates an additional transformation thread:

♦ Reader thread (first stage)
♦ Transformation thread (second stage)
♦ Transformation thread (third stage)
♦ Writer thread (fourth stage)

The PowerCenter Server can now process four sets of rows concurrently as follows:
Time step | Source Qualifier (First Stage) | FIL_1 & EXP_1 Transformations (Second Stage) | EXP_2 & LKP_1 Transformations (Third Stage) | Target Instance (Fourth Stage)
1 | Row Set 1 | – | – | –
2 | Row Set 2 | Row Set 1 | – | –
3 | Row Set 3 | Row Set 2 | Row Set 1 | –
4 | Row Set 4 | Row Set 3 | Row Set 2 | Row Set 1
... | ... | ... | ... | ...
n | Row Set n | Row Set n-1 | Row Set n-2 | Row Set n-3

By adding a partition point at Expression transformation EXP_2, you replace one long-running transformation stage with two shorter-running transformation stages. Because data throughput depends on the longest-running stage, throughput increases in this case. For more information about processing threads, see “Understanding Processing Threads” on page 14.
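Why the longest-running stage sets throughput can be shown with a small pipeline model in Python. The stage times below are hypothetical units, and the model is a simplification: each stage works on one row set at a time, hands it to the next stage when done, and never waits for buffer space. Under these assumptions the elapsed time for n row sets is the sum of the stage times plus (n − 1) times the longest stage time. pipeline_elapsed is a hypothetical name.

```python
from typing import List, Sequence


def pipeline_elapsed(stage_times: Sequence[float], row_sets: int) -> float:
    """Elapsed time for row_sets row sets to pass through a pipeline of stages.

    finish[k] holds the time at which stage k finished its previous row set."""
    finish: List[float] = [0.0] * len(stage_times)
    for _ in range(row_sets):
        ready = 0.0  # time at which this row set leaves the previous stage
        for k, duration in enumerate(stage_times):
            # A stage starts once the row set has arrived and the stage is free.
            start = max(ready, finish[k])
            finish[k] = start + duration
            ready = finish[k]
    return finish[-1]


# Three stages with a slow transformation stage (4 units per row set) ...
print(pipeline_elapsed([1, 4, 1], 100))     # 402.0
# ... versus the same work split across two transformation stages.
print(pipeline_elapsed([1, 2, 2, 1], 100))  # 204.0
```

The total work per row set is the same (6 units) in both runs; only the longest stage changes, and the elapsed time roughly halves with it.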


Database Partitioning Partition Type
When you load to an IBM DB2 table stored on a multi-node tablespace, you can optimize session performance by using the database partitioning partition type instead of the pass-through partition type for IBM DB2 targets. When you use database partitioning, the PowerCenter Server queries the DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database. You can specify database partitioning only for relational targets. You can specify database partitioning for the target partition type with any number of pipeline partitions and any number of database nodes. However, you can improve load performance further when the number of pipeline partitions equals the number of database nodes. Use the following rules and guidelines when you use database partitioning:

♦ By default, the PowerCenter Server fails the session when you use database partitioning for non-DB2 targets. However, you can configure the PowerCenter Server to default to pass-through partitioning when you use database partitioning for non-DB2 relational targets:
  − On Windows. Select the Treat Database Partitioning as Pass-Through option on the Configuration tab of the PowerCenter Server setup. By default, this option is disabled.
  − On UNIX. Add the following entry to the file pmserver.cfg:
    TreatDBPartitionAsPassThrough=Yes
♦ You cannot use database partitioning when you configure the session to use source-based or user-defined commit, constraint-based loading, or session recovery.
♦ The target table must contain a partition key. Also, you must link all not-null partition key columns in the target instance to a transformation in the mapping.
♦ You must use high precision mode when the IBM DB2 table partitioning key uses a Bigint field. The PowerCenter Server fails the session when the IBM DB2 table partitioning key uses a Bigint field and you use low precision mode.
♦ If you create multiple partitions for a DB2 bulk load session, you must use database partitioning for the target partition type. If you choose any other partition type, the PowerCenter Server reverts to normal load and writes the following message to the session log:
  ODL_26097 Only database partitioning is support for DB2 bulk load. Changing target load type variable to Normal.

If you configure a session for database partitioning, the PowerCenter Server reverts to pass-through partitioning under the following circumstances:
♦ The DB2 target table is stored on one node.
♦ You run the session in debug mode using the Debugger.
♦ You configure the PowerCenter Server to treat the database partitioning partition type as pass-through partitioning and you use database partitioning for a non-DB2 relational target.
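The fallback rules above can be summarized as a decision function. The following Python sketch models only the behaviors this section states for a relational target set to database partitioning: session failure on non-DB2 targets by default, and reverting to pass-through for a single-node DB2 table, a Debugger run, or the treat-as-pass-through setting. The DatabasePartitionedTarget class and effective_partition_type function are hypothetical; the other restrictions (commit type, constraint-based loading, recovery, Bigint keys) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class DatabasePartitionedTarget:
    """Hypothetical description of a relational target set to database partitioning."""
    is_db2: bool
    db2_node_count: int = 1
    debug_mode: bool = False             # session run in debug mode using the Debugger
    treat_as_pass_through: bool = False  # Windows option or TreatDBPartitionAsPassThrough=Yes


def effective_partition_type(target: DatabasePartitionedTarget) -> str:
    """Partition type the server ends up using, per the rules in this section."""
    if not target.is_db2:
        if target.treat_as_pass_through:
            return "pass-through"
        # Default behavior: the session fails for non-DB2 targets.
        raise RuntimeError("session fails: database partitioning on a non-DB2 target")
    if target.db2_node_count == 1 or target.debug_mode:
        return "pass-through"
    return "database partitioning"


print(effective_partition_type(DatabasePartitionedTarget(is_db2=True, db2_node_count=4)))
# database partitioning
```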


Partitioning Relational Sources
When you run a session that partitions relational or Application sources, the PowerCenter Server creates a separate connection to the source database for each partition. It then creates an SQL query for each partition. You can customize the query for each source partition by entering filter conditions in the Transformations view on the Mapping tab. You can also override the SQL query for each source partition using the Transformations view on the Mapping tab. Figure 13-14 shows where you can override the SQL query for each source partition:
Figure 13-14. Overriding the SQL Query and Entering a Filter Condition [callouts: Browse button; Enter SQL overrides; Enter filter conditions; Transformations view]

For more information about partitioning Application sources, refer to the PowerCenter Connect documentation.

Entering an SQL Query
You can enter an SQL override if you want to customize the SELECT statement in the SQL query. The SQL statement you enter on the Transformations view of the Mapping tab overrides any customized SQL query that you set in the Designer when you configure the Source Qualifier transformation. For more information, see “Source Qualifier Transformation” in the Transformation Guide.


The SQL query also overrides any key range and filter condition that you enter for a source partition. So, if you also enter a key range and source filter, the PowerCenter Server uses the SQL query override to extract source data. If you create a key that contains null values, you can extract the nulls by creating another partition and entering an SQL query or filter to extract null values. To enter an SQL query for each partition, click the Browse button in the SQL Query field. Enter the query in the SQL Editor dialog box, and then click OK. If you entered an SQL query in the Designer when you configured the Source Qualifier transformation, that query appears in the SQL Query field for each partition. To override this query, click the Browse button in the SQL Query field, revise the query in the SQL Editor dialog box, and then click OK.

Entering a Filter Condition
If you specify key range partitioning at a relational source qualifier, you can enter an additional filter condition. When you do this, the PowerCenter Server generates a WHERE clause that includes the filter condition you enter in the session properties. The filter condition you enter on the Transformations view of the Mapping tab overrides any filter condition that you set in the Designer when you configure the Source Qualifier transformation. For more information, see “Source Qualifier Transformation” in the Transformation Guide. If you use key range partitioning, the filter condition works in conjunction with the key ranges. For example, you want to select data based on customer ID, but you do not want to extract information for customers outside the USA. Define the following key ranges:
CUSTOMER_ID | Start Range | End Range
Partition #1 | (blank) | 135000
Partition #2 | 135000 | (blank)

If you know that the IDs for customers outside the USA fall within the range for a particular partition, you can enter a filter in that partition to exclude them. Therefore, you enter the following filter condition for the second partition:
CUSTOMERS.COUNTRY = 'USA'

When the session runs, the following queries for the two partitions appear in the session log:
READER_1_1_1> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.CUSTOMER_ID < 135000]
[...]
READER_1_1_2> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.COUNTRY = 'USA' AND 135000 <= CUSTOMERS.CUSTOMER_ID]
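The per-partition queries in the log follow a simple pattern: the partition's filter condition, if any, is combined with AND and the comparisons implied by its key range. The following Python sketch rebuilds both logged queries from the key ranges and the filter. The function name partition_query is hypothetical, and the sketch only imitates the form of the logged SQL; it is not how the PowerCenter Server generates queries internally. It handles numeric key values only, since string values would need quoting. Note that in SQL a comparison with NULL is never true, so range predicates like these do not return rows with a null key.

```python
from typing import List, Optional


def partition_query(columns: List[str], table: str, key: str,
                    start: Optional[int], end: Optional[int],
                    source_filter: Optional[str] = None) -> str:
    """Build a SELECT for one key range partition (numeric keys only).

    start is inclusive, end is exclusive; None means the boundary was left blank."""
    conditions: List[str] = []
    if source_filter:
        conditions.append(source_filter)
    if start is not None:
        conditions.append(f"{start} <= {key}")
    if end is not None:
        conditions.append(f"{key} < {end}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


cols = ["CUSTOMERS.CUSTOMER_ID", "CUSTOMERS.COMPANY", "CUSTOMERS.LAST_NAME"]
key = "CUSTOMERS.CUSTOMER_ID"
print(partition_query(cols, "CUSTOMERS", key, None, 135000))
print(partition_query(cols, "CUSTOMERS", key, 135000, None, "CUSTOMERS.COUNTRY = 'USA'"))
```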


To enter a filter condition, click the Browse button in the Source Filter field. Enter the filter condition in the SQL Editor dialog box, and then click OK. If you entered a filter condition in the Designer when you configured the Source Qualifier transformation, that filter condition appears in the Source Filter field for each partition. To override this filter, click the Browse button in the Source Filter field, change the filter condition in the SQL Editor dialog box, and then click OK.


Partitioning File Sources
When a session uses a file source, you can configure it to read the source with one thread or with multiple threads. The PowerCenter Server creates one connection to the file source when you configure the session to read with one thread, and it creates multiple concurrent connections to the file source when you configure the session to read with multiple threads. Configure the source file name property for partitions 2-n to specify single- or multi-threaded reading. To configure for single-threaded reading, pass empty data through partitions 2-n. To configure for multi-threaded reading, leave the source file name blank for partitions 2-n. For more information about configuring file properties with multiple partitions, see “Configuring for File Partitioning” on page 375.

Guidelines for Partitioning File Sources
Use the following guidelines when you configure a file source session with multiple partitions:
♦ You can use pass-through partitioning at the source qualifier.
♦ You can use single- or multi-threaded reading with flat file or COBOL sources.
♦ You can use single-threaded reading with XML sources.
♦ You cannot use multi-threaded reading if the source files are non-disk files, such as FTP files or IBM MQSeries sources.
♦ If you use a shift-sensitive code page, you can use multi-threaded reading only if the following conditions are true:
  − The file is fixed-width.
  − The file is not line sequential.
  − You did not enable user-defined shift state in the source definition.
♦ If you configure a session for multi-threaded reading, and the PowerCenter Server cannot create multiple threads to a file source, it writes a message to the session log and reads the source with one thread.
♦ When the PowerCenter Server uses multiple threads to read a source file, it may not read the rows in the file sequentially. If sort order is important, configure the session to read the file with a single thread. For example, sort order may be important if the mapping contains a sorted Joiner transformation and the file source is the sort origin.
♦ You can also use a combination of direct and indirect files to balance the load.
♦ Session performance for multi-threaded reading is optimal with large source files. Although the PowerCenter Server can create multiple connections to small source files, performance may not be optimal.
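The eligibility guidelines above reduce to a small decision rule. The following Python sketch encodes them as a function; the FileSource class and its fields are hypothetical and do not correspond to actual session attributes. The sketch also ignores the runtime fallback, in which the PowerCenter Server reads with one thread when it cannot create multiple threads.

```python
# Hypothetical model of the file-source guidelines. The class and field
# names are illustrative; they do not correspond to session settings.
from dataclasses import dataclass


@dataclass
class FileSource:
    kind: str                        # "flat", "cobol", or "xml"
    is_disk_file: bool               # False for FTP files or MQSeries sources
    shift_sensitive_code_page: bool
    fixed_width: bool = False
    line_sequential: bool = False
    user_defined_shift_state: bool = False


def can_read_multithreaded(src: FileSource) -> bool:
    """Return True if the guidelines allow multi-threaded reading."""
    if src.kind not in ("flat", "cobol"):
        return False                 # XML sources: single-threaded only
    if not src.is_disk_file:
        return False                 # non-disk files: single-threaded only
    if src.shift_sensitive_code_page:
        # All three shift-sensitive conditions must hold.
        return (src.fixed_width
                and not src.line_sequential
                and not src.user_defined_shift_state)
    return True
```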


Using One Thread to Read a File Source
When the PowerCenter Server uses one thread to read a file source, it creates one connection to the source. The PowerCenter Server reads the rows in the file or file list sequentially. You can configure single-threaded reading for direct or indirect file sources in a session:

♦ Reading direct files. You can configure the PowerCenter Server to read from one or more direct files. If you configure the session with more than one direct file, the PowerCenter Server creates a concurrent connection to each file. It does not create multiple connections to a file.
♦ Reading indirect files. When the PowerCenter Server reads an indirect file, it reads the file list and reads the files in the list sequentially. If the session has more than one file list, the PowerCenter Server reads the file lists concurrently, and it reads the files in each list sequentially.
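The single-threaded rules for indirect files can be sketched as one worker per file list. In this hypothetical Python model, in-memory lists stand in for files, and the helper names are illustrative. It shows only the ordering guarantees (file lists in parallel, files and rows within one list in sequence), not how the PowerCenter Server manages connections.

```python
# Hypothetical sketch of single-threaded reading of indirect files: each
# file list gets its own thread, and the files inside one list are read in
# order. In-memory dictionaries stand in for files on disk.
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_file_list(file_list: List[str],
                   files: Dict[str, List[str]]) -> List[str]:
    rows: List[str] = []
    for name in file_list:           # files in one list: sequential
        rows.extend(files[name])     # rows in one file: sequential
    return rows


def read_indirect(file_lists: List[List[str]],
                  files: Dict[str, List[str]]) -> List[List[str]]:
    # One worker per file list, so separate lists are read concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(file_lists))) as pool:
        futures = [pool.submit(read_file_list, fl, files) for fl in file_lists]
        return [f.result() for f in futures]


FILES = {"a1.dat": ["r1", "r2"], "a2.dat": ["r3"], "b1.dat": ["s1"]}
result = read_indirect([["a1.dat", "a2.dat"], ["b1.dat"]], FILES)
print(result)
```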

Using Multiple Threads to Read a File Source
When the PowerCenter Server uses multiple threads to read a source file, it creates multiple concurrent connections to the source. The PowerCenter Server may or may not read the rows in a file sequentially. You can configure multi-threaded reading for direct or indirect file sources in a session:

♦ Reading direct files. When the PowerCenter Server reads a direct file, it creates multiple reader threads to read the file concurrently. You can configure the PowerCenter Server to read from one or more direct files. For example, if a session reads from two files and you create five partitions, the PowerCenter Server may distribute one file among two partitions and one file among three partitions.
♦ Reading indirect files. When the PowerCenter Server reads an indirect file, it creates multiple threads to read the file list concurrently. It also creates multiple threads to read the files in the list concurrently. The PowerCenter Server may use more than one thread to read a single file.
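This guide does not describe how the PowerCenter Server divides one direct file among reader threads. A common technique, shown here in Python purely as an assumption, is to split the file into byte ranges and let each reader own every line that starts inside its range. Because each range can be read independently, rows from different ranges may reach the pipeline in any order, which is consistent with the warning about sort order.

```python
# Generic byte-range technique, assumed for illustration only: PowerCenter
# does not document how it divides a file among reader threads.
from typing import List, Tuple


def split_ranges(size: int, readers: int) -> List[Tuple[int, int]]:
    """Divide the byte offsets [0, size) into contiguous ranges."""
    step = size // readers
    bounds = [i * step for i in range(readers)] + [size]
    return list(zip(bounds[:-1], bounds[1:]))


def read_range(data: bytes, start: int, end: int) -> List[bytes]:
    """Return the lines whose first byte falls in [start, end)."""
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        # The line containing `start` began earlier; the previous reader owns it.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines: List[bytes] = []
    while pos < end:
        nl = data.find(b"\n", pos)
        if nl == -1:                 # last line without a trailing newline
            lines.append(data[pos:])
            break
        lines.append(data[pos:nl + 1])
        pos = nl + 1
    return lines


DATA = b"a\nbb\nccc\ndddd\n"
chunks = [read_range(DATA, s, e) for s, e in split_ranges(len(DATA), 3)]
print(chunks)
```

Every line is read by exactly one reader, so concatenating the chunks in range order reproduces the file.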

Configuring for File Partitioning
After you create partition points and configure partitioning information, you can configure source connection settings and file properties on the Transformations view of the Mapping tab. Click the source instance name you want to configure under the Sources node. When you click the source instance name for a file source, the Workflow Manager displays connection and file properties in the session properties. You can configure the source file names and directories for each source partition. The Workflow Manager generates a file name and location for each partition.


Table 13-5 describes the file properties settings for file sources in a mapping:
Table 13-5. File Properties Settings for File Sources
Attribute               Description
Source File Directory   Enter the local source file directory. The default location is $PMSourceFileDir.
Source File Name        Enter the local source file name. You can also use the session variable, $InputFileName, as defined in the parameter file. If you use a file list, enter the name of the list. By default, the Workflow Manager uses the source file name for each partition. Edit the file name property for partitions 2-n based on how you want the PowerCenter Server to read the files.
Source File Type        Choose Direct to use source files or Indirect to use a file list.

Configuring Sessions to Use a Single Thread
To configure a session to read a file with a single thread, pass empty data through partitions 2-n. To pass empty data, create a file with no data, such as “empty.txt,” and put it in the source file directory. Then, use “empty.txt” as the source file name. Table 13-6 describes the session configuration and the PowerCenter Server behavior when it uses a single thread to read source files:
Table 13-6. Configuring Source File Name for Single-Threaded Reading
Source File Name   Value           PowerCenter Server Behavior
Partition #1       ProductsA.txt   The PowerCenter Server creates one thread to read ProductsA.txt. It reads rows in the file sequentially. After it reads the file, it passes the data to three partitions in the transformation pipeline.
Partition #2       empty.txt
Partition #3       empty.txt

Partition #1       ProductsA.txt   The PowerCenter Server creates two threads. It creates one thread to read ProductsA.txt, and it creates one thread to read ProductsB.txt. It reads the files concurrently, and it reads rows in the files sequentially.
Partition #2       empty.txt
Partition #3       ProductsB.txt

If you use FTP to access source files, you can choose a different connection for each direct file. For more information about using FTP to access source files, see “Using FTP” on page 559.

Configuring Sessions to Use Multiple Threads
To configure a session to read a file with multiple threads, leave the source file name blank for partitions 2-n. The PowerCenter Server uses partitions 2-n to read a portion of the file or file list named in the previous partition. It ignores the directory field of those partitions.


Table 13-7 describes the session configuration and the PowerCenter Server behavior when it uses multiple threads to read source files:
Table 13-7. Configuring Source File Name for Multi-Threaded Reading
Attribute      Value           PowerCenter Server Behavior
Partition #1   ProductsA.txt   The PowerCenter Server creates three threads to concurrently read ProductsA.txt.
Partition #2   <blank>
Partition #3   <blank>

Partition #1   ProductsA.txt   The PowerCenter Server creates three threads to read ProductsA.txt and ProductsB.txt concurrently. Two threads read ProductsA.txt and one thread reads ProductsB.txt.
Partition #2   <blank>
Partition #3   ProductsB.txt
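Tables 13-6 and 13-7 can be summarized as a rule for counting the threads that read data from each file. The Python sketch below is a hypothetical model of that rule, not PowerCenter code. A blank name adds a thread to the previous partition's file, a named file gets its own thread, and threads on the empty file are dropped because they read no rows. The EMPTY_FILE constant reuses the example name empty.txt.

```python
# Hypothetical model of Tables 13-6 and 13-7: given the source file name
# entered for each partition, count the threads that read real data from
# each file. An empty string stands for a blank Source File Name.
from typing import Dict, List, Optional

EMPTY_FILE = "empty.txt"             # zero-byte file from the example


def reader_threads(partition_file_names: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    previous: Optional[str] = None
    for name in partition_file_names:
        if name == "":
            if previous is None:
                raise ValueError("partition #1 must name a source file")
            counts[previous] += 1    # blank: another thread on the previous file
        else:
            counts[name] = counts.get(name, 0) + 1
            previous = name
    counts.pop(EMPTY_FILE, None)     # threads on the empty file read no rows
    return counts


print(reader_threads(["ProductsA.txt", "", "ProductsB.txt"]))
```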


Partitioning Relational Targets
When you configure a pipeline to load data to a relational target, the PowerCenter Server creates a separate connection to the target database for each partition at the target instance. It concurrently loads data for each partition into the target database. Configure partition attributes for targets in the pipeline on the Transformations view of the Mapping tab in the session properties. For relational targets, you configure the reject file names and directories. The PowerCenter Server creates one reject file for each target partition. Figure 13-15 shows the Properties settings for relational targets:
Figure 13-15. Properties Settings for Relational Targets in the Session Properties
[Screenshot of the Transformations view. Callouts: Selected Target Instance; Properties Settings; Enter reject file directories; Enter reject file names.]


Table 13-8 describes the partitioning attributes for relational targets in a pipeline:
Table 13-8. Partitioning Relational Target Attributes
Attribute               Description
Reject File Directory   Location for the target reject files. Default is $PMBadFileDir.
Reject File Name        Name of reject file. Default is <target name><partition number>.bad. You can also use the session variable, $BadFileName, as defined in the parameter file.

Database Compatibility
When you configure a session with multiple partitions at the target instance, the PowerCenter Server creates one connection to the target for each partition. If you configure multiple target partitions in a session that loads to a database or ODBC target that does not support multiple concurrent connections to tables, the session fails. When you create multiple target partitions in a session that loads data to an Informix database, you must create the target table with row-level locking. If you insert data from a session with multiple partitions into an Informix target configured for page-level locking, the session fails and returns the following message:
WRT_8206 Error: The target table has been created with page level locking. The session can only run with multi partitions when the target table is created with row level locking.

Sybase IQ does not allow multiple concurrent connections to tables. If you create multiple target partitions in a session that loads to Sybase IQ, the PowerCenter Server loads all of the data in one partition.


Partitioning File Targets
When you configure a session to write to a file target, the PowerCenter Server writes the output to a separate file for each partition at the target instance. When you run the session, the PowerCenter Server writes to the files concurrently. You can configure connection settings and file properties for each target partition. You configure these settings in the Transformations view on the Mapping tab.

Configuring Connection Settings
The Connections settings in the Transformations view on the Mapping tab allow you to configure the connection type for all target partitions. You can choose different connection objects for each partition, but they must all be of the same type. You can use one of the following connection types with target files:
♦ Local. Write the partitioned target files to the local machine.
♦ FTP. Transfer the partitioned target files to another machine. You can transfer the files to any machine to which the PowerCenter Server can connect. For more information about using FTP to load to target files, see “Using FTP” on page 559.
♦ Loader. Use an external loader that can load from multiple output files. This option appears if the pipeline loads data to a relational target and you choose a file writer in the Writers settings on the Mapping tab. If you choose a loader that cannot load from multiple output files, the PowerCenter Server fails the session. For more information about configuring external loaders for partitioning, see “Partitioning Sessions with External Loaders” on page 526.
♦ Message Queue. Transfer the partitioned target files to an IBM MQSeries message queue. For more information about loading to message queues, refer to the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

You can merge target files only if you choose local connections for all target partitions.
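The connection rules for file targets (one connection type for all partitions, merging only with local connections) can be checked mechanically. This Python sketch is hypothetical; the connection type labels are illustrative and are not values stored by the Workflow Manager.

```python
# Hypothetical validation of the target connection rules. The type labels
# ("local", "ftp", "loader", "message_queue") are illustrative only.
from typing import List

CONNECTION_TYPES = {"local", "ftp", "loader", "message_queue"}


def check_target_connections(types: List[str], merge_files: bool) -> List[str]:
    """Return a list of rule violations (empty if the settings are valid)."""
    problems: List[str] = []
    unknown = sorted(set(types) - CONNECTION_TYPES)
    if unknown:
        problems.append(f"unknown connection type(s): {', '.join(unknown)}")
    if len(set(types)) > 1:
        problems.append("all target partitions must use the same connection type")
    if merge_files and any(t != "local" for t in types):
        problems.append("merging target files requires local connections")
    return problems
```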


Figure 13-16 shows the Connections settings for file targets:
Figure 13-16. Connections Settings for File Targets in the Session Properties
[Screenshot of the Transformations view. Callouts: Selected Target Instance; Connections Settings; Connection Type.]

Table 13-9 describes the connection options for file targets in a mapping:
Table 13-9. File Targets Connection Options
Attribute         Description
Connection Type   Choose a local, FTP, external loader, or message queue connection. Select None for a local connection. The connection type is the same for all partitions.
Value             For an FTP, external loader, or message queue connection, click the button in this field to select the connection object. You can specify a different connection object for each partition.

Configuring File Properties
The Properties settings in the Transformations view on the Mapping tab allow you to configure file properties such as the reject file names and directories, the output file names and directories, and whether to merge the target files.


Figure 13-17 shows the Properties settings for file targets:
Figure 13-17. Properties Settings for File Targets in the Session Properties
[Screenshot of the Transformations view. Callouts: Selected Target Instance; Properties Settings; Select to merge target files; Enter output file directories; Enter output file names; Enter reject file directories; Enter reject file names.]

Table 13-10 describes the file properties for file targets in a mapping:
Table 13-10. Target File Properties
Attribute                 Description
Merge Partitioned Files   If you select this option, the PowerCenter Server merges the partitioned target files into one file when the session completes, and then deletes the individual output files. It does not delete the individual files if it fails to create the merged file. You cannot merge files if the session uses FTP, an external loader, or an MQSeries message queue.
Merge File Directory      Location for the merge file. Default is $PMTargetFileDir.
Merge File Name           Name of the merge file. Default is <target name>.out.
Output File Directory     Location for the target file. Default is $PMTargetFileDir.
Output File Name          Name of target file. Default is <target name><partition number>.out. You can also use the session variable, $OutputFileName, as defined in the parameter file.
Reject File Directory     Location for the target reject files. Default is $PMBadFileDir.
Reject File Name          Name of reject file. Default is <target name><partition number>.bad.
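The Merge Partitioned Files behavior can be sketched as follows. This is a hypothetical Python illustration, not the PowerCenter Server's implementation. It concatenates the partition files in partition order (an assumption) and deletes them only if the merged file was written without error. The example file names are made up.

```python
# Hypothetical sketch of Merge Partitioned Files: concatenate the partition
# output files, and delete them only after the merged file is written.
import tempfile
from pathlib import Path
from typing import List


def merge_partition_files(parts: List[Path], merged: Path) -> bool:
    try:
        with merged.open("wb") as out:
            for part in parts:
                out.write(part.read_bytes())
    except OSError:
        # Keep every partition file on failure; a partial merge file may remain.
        return False
    for part in parts:
        part.unlink()
    return True


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    p1, p2 = d / "t_orders1.out", d / "t_orders2.out"
    p1.write_text("row from partition 1\n")
    p2.write_text("row from partition 2\n")
    merged_ok = merge_partition_files([p1, p2], d / "t_orders.out")
    merged_text = (d / "t_orders.out").read_text()
    parts_deleted = not p1.exists() and not p2.exists()

    # A missing partition file makes the merge fail; the other file is kept.
    p3 = d / "t_orders3.out"
    p3.write_text("row from partition 3\n")
    failed = merge_partition_files([p3, d / "missing.out"], d / "t_orders_b.out")
    p3_kept = p3.exists()
```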


Partitioning Joiner Transformations
When you create a partition point at the Joiner transformation, the Workflow Manager sets the partition type to hash auto-keys when the transformation scope is All Input. The Workflow Manager sets the partition type to pass-through when the transformation scope is Transaction. You must create the same number of partitions for the master and detail source. If you configure the Joiner transformation for sorted input, you can change the partition type to pass-through. See the Transformation Guide for more information about configuring the Joiner transformation for sorted input. To use cache partitioning with a Joiner transformation, you must create a partition point at the Joiner transformation. This allows you to create multiple partitions for both the master and detail source of a Joiner transformation. For more information about cache partitioning, see “Cache Partitioning” on page 359.
Note: If you do not create a partition point at the Joiner transformation, you can create n partitions for the detail source, but only one partition for the master source (1:n).

Partitioning Sorted Joiner Transformations
When you include a Joiner transformation that uses sorted input in the mapping, you must verify the Joiner transformation receives sorted data. If your sources contain large amounts of data, you may want to configure partitioning to improve performance. However, partitions that redistribute rows can rearrange the order of sorted data, so it is important to configure partitions to maintain sorted data. For example, when you use a hash auto-keys partition point, the PowerCenter Server uses a hash function to determine the best way to distribute the data among the partitions. However, it does not maintain the sort order, so you must follow specific partitioning guidelines to use this type of partition point. When you join data, you can partition data for the master and detail pipelines in the following ways:

♦ 1:n. Use one partition for the master source and multiple partitions for the detail source. The PowerCenter Server maintains the sort order because it does not redistribute master data among partitions.
♦ n:n. Use an equal number of partitions for the master and detail sources. When you use n:n partitions, the PowerCenter Server processes multiple partitions concurrently. You may need to configure the partitions to maintain the sort order depending on the type of partition you use at the Joiner transformation.

Note: When you use 1:n partitions, do not add a partition point at the Joiner transformation.

If you add a partition point at the Joiner transformation, the Workflow Manager adds an equal number of partitions to both master and detail pipelines. Use different partitioning guidelines, depending on where you sort the data:


♦ Using sorted flat files. Use one of the following partitioning configurations:
  − Use 1:n partitions when you have one flat file in the master pipeline and multiple flat files in the detail pipeline. Configure the session to use one reader-thread for each file.
  − Use n:n partitions when you have one large flat file in the master and detail pipelines. Configure partitions to pass all sorted data in the first partition, and pass empty file data in the other partitions.
♦ Using sorted relational data. Use one of the following partitioning configurations:
  − Use 1:n partitions for the master and detail pipeline.
  − Use n:n partitions. If you use a hash auto-keys partition, configure partitions to pass all sorted data in the first partition.
♦ Using the Sorter transformation. Use n:n partitions. If you use a hash auto-keys partition at the Joiner transformation, configure each Sorter transformation to use hash auto-keys partition points as well.

Note: Add only pass-through partition points between the sort origin and the Joiner transformation.

Using Sorted Flat Files
Use 1:n partitions when you have one flat file in the master pipeline and multiple flat files in the detail pipeline. When you use 1:n partitions, the PowerCenter Server maintains the sort order because it does not redistribute data among partitions. When you have one large flat file in each master and detail pipeline, you can use n:n partitions and add a pass-through or hash auto-keys partition at the Joiner transformation. When you add a hash auto-keys partition point, you must configure partitions to pass all sorted data in the first partition to maintain the sort order.

Using 1:n Partitions
If the session uses one flat file in the master pipeline and multiple flat files in the detail pipeline, you can use one partition for the master source and n partitions for the detail file sources (1:n). Add a pass-through partition point at the detail Source Qualifier transformation. Do not add a partition point at the Joiner transformation. The PowerCenter Server maintains the sort order when you create one partition for the master source because it does not redistribute sorted data among partitions. When you have multiple files in the detail pipeline that have the same structure, pass the files to the Joiner transformation using the following guidelines:
♦ Configure the mapping with one source and one Source Qualifier transformation in each pipeline.
♦ Specify the path and file name for each flat file in the Properties settings of the Transformations view on the Mapping tab of the session properties.
♦ Each file must use the same file properties as configured in the source definition.

The range of sorted data in the flat files can overlap. You do not need to use a unique range of data for each file.

Figure 13-18 shows sorted file data joined using 1:n partitioning:
Figure 13-18. Sorted File Data with 1:n Partitions
[Diagram. Callouts: Flat File and Source Qualifier (master pipeline); Flat File 1, Flat File 2, and Flat File 3 with a Source Qualifier with pass-through partition (detail pipeline); Joiner transformation; Sorted Data; sorted output depends on join type.]

The Joiner transformation may output unsorted data depending on the join type. If you use a full outer or detail outer join, the PowerCenter Server processes unmatched master rows last, which can result in unsorted data.
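To see why a full outer or detail outer join can break the sort order, consider this simplified Python model. It is a sketch under stated assumptions, not the Joiner transformation's algorithm: master keys are assumed unique, detail rows are read in sorted order, and unmatched master rows are appended at the end, as the text describes.

```python
# Simplified model, assumed for illustration: detail rows are processed in
# sorted order, and unmatched master rows are emitted only at the end.
from typing import List, Optional, Tuple

Row = Tuple[int, str]
Joined = Tuple[int, Optional[str], Optional[str]]


def sorted_join(master: List[Row], detail: List[Row], join_type: str) -> List[Joined]:
    keep_master = join_type in ("detail_outer", "full_outer")
    keep_detail = join_type in ("master_outer", "full_outer")
    master_by_key = dict(master)             # assumes unique master keys
    matched = set()
    out: List[Joined] = []
    for key, dval in detail:                 # detail rows arrive sorted
        if key in master_by_key:
            matched.add(key)
            out.append((key, master_by_key[key], dval))
        elif keep_detail:
            out.append((key, None, dval))
    if keep_master:
        for key, mval in master:             # unmatched master rows come last
            if key not in matched:
                out.append((key, mval, None))
    return out


MASTER = [(1, "m1"), (2, "m2"), (3, "m3")]
DETAIL = [(1, "d1"), (3, "d3")]
normal_keys = [row[0] for row in sorted_join(MASTER, DETAIL, "normal")]
outer_keys = [row[0] for row in sorted_join(MASTER, DETAIL, "detail_outer")]
print(normal_keys, outer_keys)   # [1, 3] [1, 3, 2]
```

The normal join output stays sorted, while the detail outer join emits key 2 after key 3.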

Using n:n Partitions
If the session uses sorted flat file data, you can use n:n partitions for the master and detail pipelines. You can add a pass-through partition or hash auto-keys partition at the Joiner transformation. If you add a pass-through partition at the Joiner transformation, follow the instructions in the Transformation Guide for maintaining the sort order in mappings. If you add a hash auto-keys partition point at the Joiner transformation, you can maintain the sort order by passing all sorted data to the Joiner transformation in a single partition. When you pass sorted data in one partition, the PowerCenter Server maintains the sort order when it redistributes data using a hash function. To allow the PowerCenter Server to pass all sorted data in one partition, configure the session to use the sorted file for the first partition and empty files for the remaining partitions. The PowerCenter Server redistributes the rows among multiple partitions and joins the sorted data.
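The reason a single sorted partition survives hash redistribution can be shown with a short Python sketch. It assumes that rows are routed by a hash of the join key and keep their arrival order within each partition. zlib.crc32 is a stand-in for the PowerCenter Server's own hash function, which this guide does not specify. Each output partition is a subsequence of the sorted input, so it is still sorted, and all rows with one key land in the same partition.

```python
# Sketch, assuming rows are routed by a hash of the join key and keep their
# arrival order inside each partition. zlib.crc32 is a stand-in hash.
import zlib
from typing import List, Tuple

Row = Tuple[int, str]


def hash_redistribute(rows: List[Row], partitions: int) -> List[List[Row]]:
    out: List[List[Row]] = [[] for _ in range(partitions)]
    for row in rows:                          # rows arrive in one sorted stream
        idx = zlib.crc32(str(row[0]).encode()) % partitions
        out[idx].append(row)                  # arrival order is preserved
    return out


sorted_rows = [(k, f"row{k}") for k in (1, 1, 2, 3, 5, 5, 8, 13)]
parts = hash_redistribute(sorted_rows, 3)
print(parts)
```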


Figure 13-19 shows sorted file data passed through a single partition to maintain sort order:
Figure 13-19. Sorted File Data Passed Through a Single Partition
[Diagram. Callouts: Source Qualifier (master and detail pipelines); Joiner transformation with hash auto-keys partition point; Sorted Data; No Data.]

The example in Figure 13-19 shows sorted data passed in a single partition to maintain the sort order. The first partition contains sorted file data while all other partitions pass empty file data. At the Joiner transformation, the PowerCenter Server distributes the data among all partitions while maintaining the order of the sorted data.

Using Sorted Relational Data
When you join relational data, you can use 1:n partitions for the master and detail pipeline. When you use 1:n partitions, you cannot add a partition point at the Joiner transformation. If you use n:n partitions, you can add a pass-through or hash auto-keys partition at the Joiner transformation. If you use a hash auto-keys partition point, you must configure partitions to pass all sorted data in the first partition to maintain sort order.

Using 1:n Partitions
If the session uses sorted relational data, you can use one partition for the master source and n partitions for the detail source (1:n). Add a key-range or pass-through partition point at the Source Qualifier transformation. Do not add a partition point at the Joiner transformation. The PowerCenter Server maintains the sort order when you create one partition for the master source because it does not redistribute data among partitions.


Figure 13-20 shows sorted relational data with 1:n partitioning:
Figure 13-20. Sorted Relational Data with 1:n Partitioning
[Diagram. Callouts: Relational Source and Source Qualifier transformation (master pipeline); Relational Source and Source Qualifier transformation with key-range or pass-through partition point (detail pipeline); Joiner transformation; Sorted Data; Unsorted Data; sorted output depends on join type.]

The Joiner transformation may output unsorted data depending on the join type. If you use a full outer or detail outer join, the PowerCenter Server processes unmatched master rows last, which can result in unsorted data.

Using n:n Partitions
If the session uses sorted relational data, you can use n:n partitions for the master and detail pipelines and add a pass-through or hash auto-keys partition point at the Joiner transformation. When you use a pass-through partition at the Joiner transformation, follow the instructions in the Transformation Guide for maintaining sorted data in mappings. When you use a hash auto-keys partition point, you maintain the sort order by passing all sorted data to the Joiner transformation in a single partition. To do this, add a key-range partition point at the Source Qualifier transformation and define the key ranges so that the first partition contains all source data. When you pass sorted data in one partition, the PowerCenter Server redistributes data among multiple partitions using a hash function and joins the sorted data.


Figure 13-21 shows sorted relational data passed through a single partition to maintain the sort order:
Figure 13-21. Sorted Relational Data Passed Through a Single Partition
[Diagram. Callouts: two Relational Sources, each with a Source Qualifier transformation with key-range partition point; Joiner transformation with hash auto-keys partition point; Sorted Data; No Data.]

The example in Figure 13-21 shows sorted relational data passed in a single partition to maintain the sort order. The first partition contains sorted relational data while all other partitions pass empty data. At the Joiner transformation, the PowerCenter Server redistributes the data among multiple partitions and joins the sorted data.

Using Sorter Transformations
If the session uses Sorter transformations to sort data, you can use n:n partitions for the master and detail pipelines. Use a hash auto-keys partition point at the Sorter transformation to group the data. You can add a pass-through or hash auto-keys partition point at the Joiner transformation. The PowerCenter Server groups data into partitions of the same hash values, and the Sorter transformation sorts the data before passing it to the Joiner transformation. When the PowerCenter Server processes the Joiner transformation configured with a hash auto-keys partition, it maintains the sort order by processing the sorted data using the same partitions it uses to route the data from each Sorter transformation.
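The following Python sketch illustrates why hash auto-keys partitioning before the Sorter transformations lets each partition be joined on its own. It is hypothetical: zlib.crc32 stands in for the hash function, sorted() stands in for the Sorter transformation, and inner_join is a simple stand-in for the Joiner transformation. Because matching master and detail keys hash to the same partition, the union of the per-partition joins equals one unpartitioned join.

```python
# Sketch, assuming hash auto-keys routing on the join key (zlib.crc32 is a
# stand-in hash) followed by an independent sort in each partition.
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]


def hash_partition(rows: List[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[0]).encode()) % n].append(row)
    return parts


def inner_join(master: List[Row], detail: List[Row]) -> List[Tuple[int, str, str]]:
    """Stand-in for the Joiner transformation (normal join)."""
    by_key: Dict[int, List[str]] = defaultdict(list)
    for key, mval in master:
        by_key[key].append(mval)
    return [(key, mval, dval) for key, dval in detail for mval in by_key.get(key, [])]


def partitioned_join(master: List[Row], detail: List[Row], n: int) -> List[Tuple[int, str, str]]:
    result: List[Tuple[int, str, str]] = []
    for m_part, d_part in zip(hash_partition(master, n), hash_partition(detail, n)):
        # Sorter transformation: each partition is sorted on its own.
        result.extend(inner_join(sorted(m_part), sorted(d_part)))
    return result


MASTER = [(7, "m7"), (2, "m2"), (9, "m9"), (4, "m4")]
DETAIL = [(9, "d9a"), (2, "d2"), (4, "d4"), (9, "d9b"), (5, "d5")]
print(sorted(partitioned_join(MASTER, DETAIL, 3)))
```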


Figure 13-22 shows Sorter transformations used with hash auto-keys to maintain sort order:
Figure 13-22. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order
[Diagram. Callouts: two sources with unsorted data, each followed by a Source Qualifier transformation and a Sorter transformation with hash auto-keys partition point; Joiner transformation with hash auto-keys or pass-through partition point; Unsorted Data; Sorted Data.]

Note: For best performance, use sorted flat files or sorted relational data. You may want to calculate the processing overhead for adding Sorter transformations to your mapping.

Optimizing Sorted Joiner Transformations with Partitions
When you use partitions with a sorted Joiner transformation, you may optimize performance by grouping data and using n:n partitions.

Add a Hash Auto-keys Partition Upstream of the Sort Origin
To obtain the expected results and the best performance when partitioning a sorted Joiner transformation, you must group and sort data. To group data, ensure that rows with the same key value are routed to the same partition. The best way to ensure that data is grouped and distributed evenly among partitions is to add a hash auto-keys or key-range partition point before the sort origin. Placing the partition point before you sort the data ensures that you maintain grouping and sort the data within each group.
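Grouping is the property that matters here: every row with a given key must reach the same partition. This hypothetical Python comparison uses round-robin distribution only as an example of routing that ignores the key, and hash routing (zlib.crc32 as a stand-in hash) to represent hash auto-keys. The split_keys helper reports keys whose rows ended up in more than one partition.

```python
# Hypothetical comparison: round-robin stands in for routing that ignores
# the key; hash routing stands in for hash auto-keys partitioning.
import zlib
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]


def round_robin(rows: List[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_auto_keys(rows: List[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row[0].encode()) % n].append(row)
    return parts


def split_keys(parts: List[List[Row]]) -> Set[str]:
    """Return keys that appear in more than one partition."""
    first_seen: Dict[str, int] = {}
    split: Set[str] = set()
    for idx, part in enumerate(parts):
        for key, _ in part:
            if first_seen.setdefault(key, idx) != idx:
                split.add(key)
    return split


ROWS = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]
print(split_keys(round_robin(ROWS, 2)), split_keys(hash_auto_keys(ROWS, 2)))
```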

Use n:n Partitions
You may be able to improve performance for a sorted Joiner transformation by using n:n partitions. When you use n:n partitions, the Joiner transformation reads master and detail rows concurrently and does not need to cache all of the master data. This reduces memory usage and speeds processing. When you use 1:n partitions, the Joiner transformation caches all the data from the master pipeline and writes the cache to disk if the memory cache fills. When the Joiner transformation receives the data from the detail pipeline, it must then read the data from disk to compare the master and detail pipelines.


Partitioning Lookup Transformations
You can use cache partitioning for static and dynamic caches, and named and unnamed caches. When you create a partition point at a connected Lookup transformation, you can use cache partitioning under the following conditions:
♦ You use the hash auto-keys partition type for the Lookup transformation.
♦ The lookup condition contains only equality operators.
♦ The database is configured for case-sensitive comparison.

For example, if the lookup condition contains a string port and the database is not configured for case-sensitive comparison, the PowerCenter Server does not perform cache partitioning and writes the following message to the session log:
CMN_1799 Cache partitioning requires case sensitive string comparisons. Lookup will not use partitioned cache as the database is configured for case insensitive string comparisons.
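The three conditions for cache partitioning can be written as one check. This Python sketch is hypothetical; the LookupPartitioning fields are illustrative names rather than session attributes, and the function follows the listed conditions literally.

```python
# Hypothetical check of the three cache-partitioning conditions. Field
# names and values are illustrative, not session attributes.
from dataclasses import dataclass
from typing import List


@dataclass
class LookupPartitioning:
    partition_type: str              # e.g. "hash auto-keys" or "pass-through"
    condition_operators: List[str]   # operators used in the lookup condition
    db_case_sensitive: bool          # database string comparison setting


def uses_cache_partitioning(cfg: LookupPartitioning) -> bool:
    return (cfg.partition_type == "hash auto-keys"
            and all(op == "=" for op in cfg.condition_operators)
            and cfg.db_case_sensitive)
```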

For more information about cache partitioning, see “Cache Partitioning” on page 359.


Partitioning Sorter Transformations
If you configure multiple partitions in a session that uses a Sorter transformation, the PowerCenter Server sorts data in each partition separately. The Workflow Manager allows you to choose hash auto-keys, key-range, or pass-through partitioning when you add a partition point at the Sorter transformation.

Use hash auto-keys partitioning when you place the Sorter transformation before an Aggregator transformation configured to use sorted input. Hash auto-keys partitioning groups rows with the same values into the same partition based on the partition key. After grouping the rows, the PowerCenter Server passes the rows through the Sorter transformation. The PowerCenter Server processes the data in each partition separately, but hash auto-keys partitioning accurately sorts all of the source data because rows with matching values are processed in the same partition.

Use key-range partitioning when you want to send all rows in a partitioned session from multiple partitions into a single partition for sorting. When you merge all rows into a single partition for sorting, the PowerCenter Server can process all of your data together.

Use pass-through partitioning if you already used hash partitioning in the pipeline. This ensures that the data passing into the Sorter transformation is correctly grouped among the partitions. Pass-through partitioning increases session performance without increasing the number of partitions in the pipeline.

For more information on Sorter transformations, see “Sorter Transformation” in the Transformation Guide.

Configuring Sorter Transformation Work Directories
The PowerCenter Server creates temporary files for each Sorter transformation in a pipeline. It reads and writes data to these files while it performs the sort. The PowerCenter Server stores these files in the Sorter transformation work directories. By default, the Workflow Manager sets the work directories for all partitions at Sorter transformations to $PMTempDir. You can specify a different work directory for each partition in the session properties.


Chapter 13: Pipeline Partitioning

Figure 13-23 shows where you specify the work directories in the session properties:
Figure 13-23. Session Properties - Configuring Sorter Transformations (callouts: Selected Sorter transformation; Enter Sorter transformation work directories)

Partitioning Sorter Transformations


Mapping Variables in Partitioned Pipelines
When you specify multiple partitions in a target load order group that uses mapping variables, the PowerCenter Server evaluates the value of a mapping variable in each partition separately. The PowerCenter Server uses the following process to evaluate variable values:
1. It updates the current value of the variable separately in each partition according to the variable function used in the mapping.
2. After loading all the targets in a target load order group, the PowerCenter Server combines the current values from each partition into a single final value based on the aggregation type of the variable.
3. If there is more than one target load order group in the session, the final current value of a mapping variable in a target load order group becomes the current value in the next target load order group.
4. When the PowerCenter Server completes loading the last target load order group, the final current value of the variable is saved into the repository.
For more information about mapping variables, see “Mapping Parameters and Variables” in the Designer Guide. For more information about target load order groups, see “Reading Source Data” on page 22.
Use one of the following variable functions in the mapping to set the variable value:
♦ SetCountVariable
♦ SetMaxVariable
♦ SetMinVariable

For more information about the variable functions, see “Functions” in the Transformation Language Reference. Table 13-11 describes how the PowerCenter Server calculates variable values across partitions:
Table 13-11. Variable Value Calculations with Partitioned Sessions
Variable Function: Variable Value Calculation Across Partitions
♦ SetCountVariable. The PowerCenter Server calculates the final count values from all partitions.
♦ SetMaxVariable. The PowerCenter Server compares the final variable value for each partition and saves the highest value.
♦ SetMinVariable. The PowerCenter Server compares the final variable value for each partition and saves the lowest value.
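The four-step evaluation process and Table 13-11 can be modeled in a short Python sketch. Everything here is hypothetical: the function names, the list-of-lists input layout, and the +1/0/-1 encoding of rows seen by SetCountVariable are illustrative only. The way counts are combined (adding each partition's net change to the group's starting value) is an assumption about what "calculates the final count values from all partitions" means, not documented server behavior.

```python
def partition_final(function, start, inputs):
    """Current value of the variable in one partition after all of its rows."""
    current = start
    for x in inputs:
        if function == "SetMaxVariable":
            current = max(current, x)
        elif function == "SetMinVariable":
            current = min(current, x)
        elif function == "SetCountVariable":
            current += x  # hypothetical encoding: +1 insert, -1 delete, 0 otherwise
        else:
            raise ValueError(f"unsupported variable function: {function}")
    return current


def combine(function, start, finals):
    """Combine per-partition values into one final value (see Table 13-11)."""
    if function == "SetMaxVariable":
        return max(finals)
    if function == "SetMinVariable":
        return min(finals)
    # SetCountVariable (assumption): add every partition's net change to the start value.
    return start + sum(f - start for f in finals)


def evaluate(function, start, groups):
    """groups: one entry per target load order group; each entry lists the
    per-row inputs seen by each partition in that group."""
    value = start
    for partitions in groups:
        finals = [partition_final(function, value, rows) for rows in partitions]
        value = combine(function, value, finals)  # becomes the next group's current value
    return value  # the value saved to the repository after the last group


print(evaluate("SetMaxVariable", 0, [[[3, 8], [5]], [[2], [10, 1]]]))  # 10
```

The loop over groups mirrors steps 2 and 3: each group's combined value becomes the starting current value in every partition of the next group.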

Note: You should use the SetVariable function only once for each mapping variable in a pipeline. When you create multiple partitions in a pipeline, the PowerCenter Server uses multiple threads to process that pipeline. If you use this function more than once for the same variable, the current value of the mapping variable may be nondeterministic.


Partitioning Rules
You can create multiple partitions in a pipeline if the PowerCenter Server can maintain data consistency when it processes the partitioned data. When you create a session, the Workflow Manager validates each pipeline for partitioning. You can change the partitioning information for a pipeline as long as it conforms to the rules and restrictions listed in this section. There are several types of partitioning rules and restrictions. These include restrictions on the number of partitions, partitioning restrictions when you change a mapping, restrictions that apply to other Informatica products, and general guidelines.

Restrictions on the Number of Partitions
In general, you can create up to 64 partitions at any partition point in each pipeline in a mapping. Under certain circumstances, however, you should or must limit the number of partitions.

Restrictions for Numerical Functions
The numerical functions CUME, MOVINGSUM, and MOVINGAVG calculate running totals and averages on a row-by-row basis. Depending on how you partition a pipeline, the order in which rows of data pass through a transformation containing one of these functions can change. Therefore, a session with multiple partitions that uses CUME, MOVINGSUM, or MOVINGAVG functions may not always return the same calculated result.
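The order sensitivity of running totals can be shown with a small Python sketch. The cume function is a hypothetical stand-in built on itertools.accumulate, not the PowerCenter CUME function, and the last line assumes each partition keeps its own running total (one transformation instance per partition). The same four rows produce different per-row results depending on arrival order and on how they are split across partitions.

```python
from itertools import accumulate


def cume(values):
    """Running total with one output per input row, similar in spirit to CUME."""
    return list(accumulate(values))


rows = [10, 20, 30, 40]

# Same rows, two arrival orders: the per-row running totals differ.
print(cume(rows))              # [10, 30, 60, 100]
print(cume([30, 10, 40, 20]))  # [30, 40, 80, 100]

# Same rows split across two partitions, each keeping its own running total
# (assumption: one transformation instance per partition).
print(cume(rows[0::2]), cume(rows[1::2]))  # [10, 40] [20, 60]
```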

Restrictions for Relational Targets
When you configure a session to load data to relational targets, the PowerCenter Server can create one or more connections to each target. If you configure multiple target partitions in a session that writes to a database or ODBC target that does not support multiple connections, the session fails. When you create multiple target partitions in a session that loads data to an Informix database, you must create the target table with row-level locking. For more information, see “Database Compatibility” on page 379. Sybase IQ does not allow multiple concurrent connections to tables. If you create multiple target partitions in a session that loads to Sybase IQ, the PowerCenter Server loads all of the data in one partition.

Restrictions for Transformations
Some restrictions on the number of partitions depend on the types of transformations in the pipeline. These restrictions apply to all transformations, including reusable transformations, transformations created in mappings and mapplets, and transformations, mapplets, and mappings referenced by shortcuts.


Table 13-12 describes the restrictions on the number of partitions for transformations:
Table 13-12. Restrictions on the Number of Partitions for Transformations
♦ Custom transformation. By default, you can specify only one partition if the pipeline contains a Custom transformation. However, this transformation contains an option on the Properties tab to allow multiple partitions. If you enable this option, you can specify multiple partitions at this transformation. Do not select Is Partitionable if the Custom transformation procedure performs the procedure based on all the input data together, such as data cleansing.
♦ External Procedure transformation. By default, you can specify only one partition if the pipeline contains an External Procedure transformation. This transformation contains an option on the Properties tab to allow multiple partitions. If this option is enabled, you can specify multiple partitions at this transformation.
♦ Joiner transformation. You can specify only one partition if the pipeline contains the master source for a Joiner transformation and you do not add a partition point at the Joiner transformation.
♦ XML target instance. You can specify only one partition if the pipeline contains XML targets.

Sequence numbers generated by Normalizer and Sequence Generator transformations might not be sequential for a partitioned source, but they are unique.

Restrictions when Running the Debugger
You can run the Debugger on a session only if all pipelines in the mapping contain one partition.

Partition Restrictions for Editing Objects
When you edit object properties, you can affect your ability to create multiple partitions in a session or to run an existing session with multiple partitions.

Before You Create a Session
When you create a session, the Workflow Manager checks the mapping properties. Mappings dynamically pick up changes to shortcuts, but not to reusable objects, such as reusable transformations and mapplets. Therefore, if you edit a reusable object in the Designer after you save a mapping and before you create a session, you must open and resave the mapping for the Workflow Manager to recognize the changes to the object.

After You Create a Session with Multiple Partitions
When you edit a mapping after you create a session with multiple partitions, the Workflow Manager does not invalidate the session even if the changes violate partitioning rules. The PowerCenter Server fails the session the next time it runs unless you edit the session so that it no longer violates partitioning rules.


The following changes to mappings can cause session failure:
♦ You delete a transformation that was a partition point.
♦ You add a transformation that is a default partition point.
♦ You move a transformation that is a partition point to a different pipeline.
♦ You change a transformation that is a partition point in any of the following ways:
   − The existing partition type is invalid.
   − The transformation can no longer support multiple partitions.
   − The transformation is no longer a valid partition point.
♦ You disable partitioning in an External Procedure transformation after you create a pipeline with multiple partitions.
♦ You switch the master and detail source for the Joiner transformation after you create a pipeline with multiple partitions.

Partition Restrictions for Informatica Application Products
You can specify multiple partitions in Informatica Application products, but there are some additional restrictions with these products. Table 13-13 describes the partitioning restrictions that apply to Informatica Application products:
Table 13-13. Partitioning Guidelines for Informatica Application Products
♦ PowerCenter Connect for PeopleSoft. If the pipeline contains an Application Source Qualifier transformation for PeopleSoft when it is connected to or associated with a PeopleSoft tree, you can specify only one partition and the partition type must be pass-through.
♦ PowerCenter Connect for IBM MQSeries. For MQSeries sources, you can specify multiple partitions only if there is no associated source qualifier in the pipeline. You cannot merge output files from sessions with multiple partitions if you use an MQSeries message queue as the target connection type.
♦ PowerCenter Connect for SAP R/3. If the mapping contains hierarchies or IDOCs, you can specify only one partition and the partition type must be pass-through. If you generate the ABAP program using exec SQL, you can specify only one partition and the partition type must be pass-through. You must use the Informatica default date format to enter dates in key ranges.
♦ PowerCenter Connect for SAP BW. You can specify only one partition when the target load order group contains an SAP BW target.


♦ PowerCenter Connect for Siebel. When you use a source filter in a join override, always use the following syntax for Siebel business components:
   SiebelBusinessComponentName.SiebelFieldName
   When you create a source filter for a Siebel business component, always use the following syntax:
   SiebelBusinessComponentName.SiebelFieldName
♦ PowerCenter Connect SDK. If the mapping contains a multi-group target that receives data from more than one pipeline, you can specify only one partition. If the mapping contains a multi-group target that receives data from multiple groups, the partition type must be pass-through.

For more information about these products, see the product documentation.

Partitioning Guidelines
This section summarizes the other guidelines that appear throughout this chapter.

Guidelines for Adding and Deleting Partition Points
The following guidelines apply to adding and deleting partition points:
♦ You cannot delete a partition point at a Source Qualifier transformation, a Normalizer transformation for COBOL sources, or a target instance.
♦ You cannot create a partition point at a source instance.
♦ You cannot create a partition point at a Sequence Generator transformation or an unconnected transformation.
♦ You can add a partition point at any other transformation provided that no partition point receives input from more than one pipeline stage.

For more information, see “Adding and Deleting Partition Points” on page 353.

Guidelines for Specifying the Partition Type
You must choose pass-through partitioning at certain partition points in a pipeline if the session uses a source-based commit or constraint-based loading, or if the mapping contains a transaction generator, such as a Transaction Control transformation. For more information, see Table 13-4 on page 357. If recovery is enabled, the Workflow Manager sets pass-through as the partition type unless the partition point is either an Aggregator transformation or a Rank transformation.

Guidelines for Adding and Deleting Partition Keys
The following guidelines apply to creating and deleting partition keys:

♦ A partition key must contain at least one port.
♦ If you choose key range partitioning at any partition point, you must specify a range for each port in the partition key.
♦ If you choose key range partitioning and need to enter a date range for any port, use the standard PowerCenter date format. For details on the default date format, see “Dates” in the Transformation Language Reference.
♦ The Workflow Manager does not validate overlapping string ranges, overlapping numeric ranges, gaps, or missing ranges.
♦ If a row contains a null value in any column that makes up the partition key, or if a row contains values that fall outside all of the key ranges, the PowerCenter Server sends that row to the first partition.

For more information, see “Adding Key Ranges” on page 365.
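The routing rules for key range partitioning can be sketched as a Python function. This is a hypothetical model, not server code: the function name and input layout are illustrative, and it assumes a start value is inclusive, an end value is exclusive, None leaves a bound open, and the first matching partition wins when ranges overlap. It reflects the two documented rules: a null in any key column, or values outside every range, send the row to the first partition. Like the Workflow Manager, it does not check for overlaps or gaps.

```python
def key_range_partition(key_values, ranges_per_partition):
    """Return the index of the partition that receives a row (sketch).

    key_values: one value per port in the partition key.
    ranges_per_partition[p][i]: (start, end) range of port i for partition p.
    Assumptions: start inclusive, end exclusive, None means an open bound,
    and the first matching partition wins when ranges overlap.
    """
    # A null in any key column sends the row to the first partition.
    if any(v is None for v in key_values):
        return 0
    for p, port_ranges in enumerate(ranges_per_partition):
        if all(
            (start is None or v >= start) and (end is None or v < end)
            for v, (start, end) in zip(key_values, port_ranges)
        ):
            return p
    # Values outside every key range also go to the first partition.
    return 0


# Hypothetical ranges on one numeric port, leaving a gap from 200 upward.
ranges = [[(1, 100)], [(100, 200)]]
print(key_range_partition([50], ranges))    # 0
print(key_range_partition([150], ranges))   # 1
print(key_range_partition([250], ranges))   # 0 (outside every range)
print(key_range_partition([None], ranges))  # 0 (null key)
```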

Guidelines for Partitioning File Sources and Targets
The following guidelines apply to partitioning file sources and targets:

♦ When connecting to file sources or targets, you must choose the same connection type for all partitions. You may choose different connection objects as long as each object is of the same type. For more information, see “Partitioning File Sources” on page 374 and “Partitioning File Targets” on page 380.
♦ You cannot merge output files from sessions with multiple partitions if you use FTP, an external loader, or an MQSeries message queue as the target connection type. For more information, see “Partitioning File Targets” on page 380.


Chapter 14

Monitoring Workflows
This chapter covers the following topics:
♦ Overview, 402
♦ Using the Workflow Monitor, 404
♦ Customizing Workflow Monitor Options, 409
♦ Using Workflow Monitor Toolbars, 415
♦ Working with Tasks and Workflows, 416
♦ Workflow and Task Status, 421
♦ Using the Gantt Chart View, 423
♦ Using the Task View, 430
♦ Monitoring Session Details, 434
♦ Creating and Viewing Performance Details, 436
♦ Tips, 441


Overview
You can monitor workflows and tasks in the Workflow Monitor. View details about a workflow or task in Gantt Chart view or Task view. You can run, stop, abort, and resume workflows from the Workflow Monitor. The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the PowerCenter Server and Repository Server. It also fetches information from the repository to display historic information. The Workflow Monitor consists of the following windows:
♦ Navigator window. Displays monitored repositories, servers, and repository objects.
♦ Output window. Displays messages from the PowerCenter Server and the Repository Server.
♦ Time window. Displays progress of workflow runs.
♦ Gantt Chart view. Displays details about workflow runs in chronological (Gantt Chart) format.
♦ Task view. Displays details about workflow runs in a report format, organized by workflow run.

The Workflow Monitor displays time relative to the time configured on the PowerCenter Server machine. For example, a folder contains two workflows. One workflow runs on a PowerCenter Server in your local time zone, and the other runs on a PowerCenter Server in a time zone two hours later. If you start both workflows at 9 a.m. local time, the Workflow Monitor displays the start time as 9 a.m. for one workflow and as 11 a.m. for the other workflow.
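The arithmetic in this example can be checked with Python's standard datetime module. The UTC offsets and the date below are hypothetical, and server_display_time is an illustrative helper, not a Workflow Monitor function; the sketch only shows that one instant, 9 a.m. in the local zone, reads as 11 a.m. on a machine whose clock is two hours later.

```python
from datetime import datetime, timedelta, timezone


def server_display_time(start, server_tz):
    """Express a start instant in the server machine's local time."""
    return start.astimezone(server_tz)


# Hypothetical offsets: local zone UTC-5, remote server two hours later (UTC-3).
local_tz = timezone(timedelta(hours=-5))
remote_tz = timezone(timedelta(hours=-3))

start = datetime(2004, 8, 2, 9, 0, tzinfo=local_tz)  # both workflows start at 9 a.m. local
print(server_display_time(start, local_tz).strftime("%H:%M"))   # 09:00
print(server_display_time(start, remote_tz).strftime("%H:%M"))  # 11:00
```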


Chapter 14: Monitoring Workflows

Figure 14-1 shows the Workflow Monitor in Gantt Chart view:
Figure 14-1. Workflow Monitor (callouts: Navigator window, Gantt Chart view, Task view, Output window, Time window)

Toggle between Gantt Chart view and Task view by clicking the tabs at the bottom of the Workflow Monitor.
Note: You can view and hide the Output window in the Workflow Monitor. To toggle back and forth, choose View-Output.

Permissions and Privileges
To use the Workflow Monitor, you must have one of the following sets of permissions and privileges:
♦ Use Workflow Manager privilege with the execute permission on the folder
♦ Workflow Operator privilege with the read permission on the folder
♦ Super User privilege

You must also have execute permission for connection objects to restart, resume, stop, or abort a workflow containing a session. For more information on permissions and privileges necessary to use the Workflow Monitor, see “Permissions and Privileges by Task” in the Repository Guide.


Using the Workflow Monitor
The Workflow Monitor provides options to view information about workflow runs. After you open the Workflow Monitor and connect to a repository, you can view dynamic information about workflow runs by connecting to a PowerCenter Server. You can customize the Workflow Monitor display by configuring the maximum days or workflow runs the Workflow Monitor shows. You can also filter tasks and servers in both Gantt Chart and Task view. Complete the following steps to monitor workflows:
1. Open the Workflow Monitor.
2. Connect to the repository containing the workflow.
3. Connect to the PowerCenter Server.
4. Select the workflow you want to monitor.
5. Choose Gantt Chart view or Task view.

Opening the Workflow Monitor
You can open the Workflow Monitor in the following ways:
♦ From the Windows Start menu
♦ From the Workflow Manager Navigator
♦ By configuring the Workflow Manager to open the Workflow Monitor when you run a workflow from the Workflow Manager

You can open multiple instances of the Workflow Monitor on one machine using the Windows Start menu.
To open the Workflow Monitor when you start a workflow:
1. In the Workflow Manager, choose Tools-Options.
2. In the General tab, select Launch Workflow Monitor When Workflow Is Started.

To open the Workflow Monitor from the Workflow Manager:
1. In the Workflow Manager, connect to a repository.
2. In the Navigator, right-click a server or a repository and choose Run Monitor.
The Workflow Monitor appears.


Connecting to Repositories
When you open the Workflow Monitor, you must connect to a repository to monitor the objects in it. Connect to repositories by choosing Repository-Connect. Enter the repository name and connection information. Once you connect to a repository, the Workflow Monitor displays a list of servers available for the repository. The Workflow Monitor can monitor multiple repositories, PowerCenter Servers, and workflows at the same time.
Note: If you are not connected to a repository, you can remove the repository from the Navigator. Select the repository in the Navigator and choose Edit-Delete. The Workflow Monitor displays a message verifying that you want to remove the repository from the Navigator list. Click Yes to remove the repository. You can connect to the repository again at any time.

Connecting to PowerCenter Servers
When you connect to a repository, the Workflow Monitor displays all registered PowerCenter Servers and deleted PowerCenter Servers. To monitor tasks and workflows that run on a server, you must connect to the server. In the Navigator, the Workflow Monitor displays a red icon over deleted servers. To connect to a server, right-click it and choose Connect. When you connect to a server, you can view all folders on which you have read permission. You can disconnect from a server by right-clicking it and selecting Disconnect. When you disconnect from a server, or when the Workflow Monitor cannot connect to a server, the Workflow Monitor displays “disconnected” as the server status. You can also verify whether a PowerCenter Server is running by pinging it. Right-click the server in the Navigator and select Ping Server. You can view the ping response time in the Output window.
Note: You can also open a PowerCenter Server node in the Navigator without connecting to it.

When you open a PowerCenter Server, the Workflow Monitor gets workflow run information stored in the repository. It does not get dynamic workflow run information from currently running workflows.

Filtering Tasks and Servers
You can filter tasks and servers in both Gantt Chart view and Task view. Use the Filters menu to hide tasks and servers you do not want to view in the Workflow Monitor.

Filtering Tasks
You can view all or some workflow tasks. You can filter out tasks to view only tasks you want. For example, if you want to view only Session tasks, you can hide all other tasks. You can view all tasks at any time.


You can also filter deleted tasks. To filter deleted tasks, choose Filters-Deleted Tasks.
To filter tasks:
1. Choose Filters-Tasks.
The Filter Tasks dialog box appears.
2. Clear the tasks you want to hide, and select the tasks you want to view.
3. Click OK.
Note: When you filter a task, the Gantt Chart view displays a red link between tasks to indicate a filtered task. You can double-click the link to view the tasks you hid.

Filtering Servers
When you connect to a repository, the Workflow Monitor displays a list of registered servers and deleted servers. When you register multiple servers, you can filter out servers to view only servers you want to monitor. When you hide a server, the Workflow Monitor hides the server from the Navigator for both Gantt Chart and Task view. You can show the server at any time. You can hide unconnected servers. When you hide a connected server, the Workflow Monitor asks if you want to disconnect from the server and then filter it. You must disconnect from a server before hiding it.
To filter servers:
1. In the Navigator, right-click a repository and select Filter Servers. Or, choose Filters-Servers.
The Filter Servers dialog box appears.
2. Select the servers you want to view, and clear the servers you want to filter. Click OK.
If you are connected to a server that you clear, the Workflow Monitor prompts you to disconnect from the server before filtering.
3. Click Yes to disconnect from the server and filter it. The Workflow Monitor hides the server from the Navigator.
Click No to remain connected to the server. If you click No, you cannot filter the server.

Tip: You can also filter a server in the Navigator by right-clicking it and selecting Filter Server.

Opening and Closing Folders
You can choose which folders to open and close in the Workflow Monitor. When you open a folder, the Workflow Monitor displays the number of workflow runs that you configured in the Workflow Monitor options. For more information, see “Configuring General Options” on page 409. You can open and close folders in both Gantt Chart and Task view. When you open a folder, it opens in both views. To open a folder, right-click it in the Navigator and select Open. Or, you can double-click the folder. To view folder contents in the Workflow Monitor, you must have one of the following sets of permissions and privileges:
♦ Workflow Operator privilege with read permission on the folder
♦ Super User privilege


Viewing Statistics
You can view statistics about the objects you monitor in the Workflow Monitor by choosing View-Statistics. The Statistics dialog box displays the following information:
♦ Number of opened repositories. Number of repositories you are connected to in the Workflow Monitor.
♦ Number of connected servers. Number of servers you connected to since you opened the Workflow Monitor.
♦ Number of fetched tasks. Number of tasks the Workflow Monitor fetched from the repository during the period specified in the Time window.

Figure 14-2 shows the Statistics dialog box:
Figure 14-2. Workflow Monitor Statistics Dialog Box

Viewing Properties
You can view properties for the following items:
♦ Tasks. You can view properties such as task name, start time, and status.
♦ Sessions. You can view properties about the Session task and session run, such as mapping name and number of rows successfully loaded. You can also view load statistics about the session run. For more information on session details, see “Monitoring Session Details” on page 434. You can also view performance details about the session run. For more information, see “Creating and Viewing Performance Details” on page 436.
♦ Workflows. You can view properties such as start time, status, and run type.
♦ Links. When you double-click a link between tasks in Gantt Chart view, you can view the tasks you hid.
♦ Servers. You can view properties such as server version and startup time. You can also view the sessions and workflows running on the PowerCenter Server.
♦ Folders. You can view properties such as the number of workflow runs displayed in the Time window.

To view properties for any object, right-click the object and select Properties. You can right-click items in the Navigator or the Time window in either Gantt Chart view or Task view. To view link properties, double-click the link in the Time window of Gantt Chart view. When you view link properties, you can double-click a task in the Link Properties dialog box to view the properties for the filtered task.


Customizing Workflow Monitor Options
You can configure how the Workflow Monitor displays general information, workflows, and tasks. You can configure general options such as the maximum number of days or runs that the Workflow Monitor displays. You can also configure options specific to Gantt Chart and Task view. Choose Tools-Options to configure Workflow Monitor options. You can configure the following options in the Workflow Monitor:

♦ General. Customize general options such as the maximum number of workflow runs to display and whether to receive messages from the Workflow Manager. See “Configuring General Options” on page 409.
♦ Gantt Chart view. Configure Gantt Chart view options such as workspace color, status colors, and time format. See “Configuring Gantt Chart View Options” on page 411.
♦ Task view. Configure which columns to display in Task view. See “Configuring Task View Options” on page 412.
♦ Advanced. Configure advanced options such as the number of workflow runs the Workflow Monitor holds in memory for each server. See “Configuring Advanced Options” on page 412.

Configuring General Options
You can customize general options such as the maximum number of days to display and which text editor to use for viewing session and workflow logs.


Figure 14-3 shows the General Options tab:
Figure 14-3. General Tab for Workflow Monitor Options

Table 14-1 describes the options you can configure on the General tab:
Table 14-1. Workflow Monitor General Options
♦ Maximum Days. Specifies the maximum number of days for which the Workflow Monitor displays tasks. The default is 5.
♦ Maximum Workflow Runs per Folder. Specifies the maximum number of workflow runs the Workflow Monitor displays for each folder. The default is 200.
♦ Receive Messages from Workflow Manager. Select this option to receive messages from the Workflow Manager. The Workflow Manager sends messages when you start or schedule a workflow in the Workflow Manager. The Workflow Monitor displays these messages in the Output window.
♦ Receive Notifications from Repository Server. Select this option to receive notifications from the Repository Server. Notifications from the Repository Server display on the Notifications tab of the Output window.
♦ Log File Editor. Enter the path and file name of the text editor to view and edit workflow and session logs. You can browse to select an editor. By default, the Workflow Monitor uses WordPad.
♦ Location. The location where the Workflow Monitor stores temporary versions of log files when you open session or workflow logs from the Workflow Monitor.


Configuring Gantt Chart View Options
You can configure Gantt Chart view options such as workspace color, status colors, and time format. Figure 14-4 shows the Gantt Chart Options tab:
Figure 14-4. Gantt Chart Options

Table 14-2 describes the options you can configure on the Gantt Chart Options tab:
Table 14-2. Gantt Chart Options
♦ Status Color. Choose a status and configure the color for the status. The Workflow Monitor displays tasks with the selected status in the colors you choose. You can choose two colors to display a gradient.
♦ Recovery Color. Configure the color for the recovery sessions. The Workflow Monitor uses the status color for the body of the status bar, and it uses the recovery color as a gradient in the status bar.
♦ Workspace Color. Choose a color for each workspace component.
♦ Time Format. Select a display format for the Time window.


Configuring Task View Options
You can choose the columns you want to display in Task view. You can also reorder the columns and specify a default column width. Figure 14-5 shows the Task View Options tab:
Figure 14-5. Task View Options

Configuring Advanced Options
You can configure advanced options such as the number of workflow runs the Workflow Monitor holds in memory for each server.


Figure 14-6 shows the Advanced Options tab:
Figure 14-6. Advanced Tab for Workflow Monitor Options

Table 14-3 describes the options you can configure on the Advanced tab:
Table 14-3. Advanced Workflow Monitor Options
♦ Expand Running Workflows Automatically. Expands running workflows in the Navigator.
♦ Hide Folders/Workflows That Do Not Contain Any Runs When Filtering By Running/Schedule Runs. Hides folders or workflows under the Workflow Run column in the Time window when you filter running or scheduled tasks.
♦ Highlight the Entire Row When an Item Is Selected. Highlights the entire row in the Time window for selected items. When you disable this option, the Workflow Monitor highlights the item in the Workflow Run column in the Time window.


♦ Open Latest 20 Runs At a Time. Allows you to open the number of workflow runs of your choice. The default is 20.
♦ Minimum Number of Workflow Runs (Per Server) the Workflow Monitor Will Accumulate in Memory. Specifies the minimum number of workflow runs per server that the Workflow Monitor holds in memory before it starts releasing older runs from memory. When you connect to a server, the Workflow Monitor fetches the number of workflow runs specified on the General tab for each folder you connect to. When the number of runs is less than the number specified in this option, the Workflow Monitor stores new runs in memory until it reaches this number. Then it releases the oldest run from memory when it fetches a new run. When the number of workflow runs the Workflow Monitor initially fetches exceeds the number specified in this option, the Workflow Monitor stores all those runs and then releases the oldest run from memory when it fetches a new run.
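The retention rule for the minimum number of workflow runs held in memory can be modeled with a bounded queue. This is a hypothetical sketch, not Workflow Monitor code: the class WorkflowRunCache and its method are illustrative, and the sketch assumes runs arrive oldest first. The effective capacity is the larger of the configured minimum and the number of runs fetched initially; once that capacity is reached, each new run releases the oldest one.

```python
from collections import deque


class WorkflowRunCache:
    """Hypothetical model of the per-server run retention rule."""

    def __init__(self, min_runs, initial_runs):
        # If the initial fetch already exceeds min_runs, keep all of those runs;
        # otherwise let the cache grow until it holds min_runs runs.
        self.capacity = max(min_runs, len(initial_runs))
        self.runs = deque(initial_runs)  # oldest run on the left

    def fetch_new_run(self, run):
        self.runs.append(run)
        # Once capacity is reached, release the oldest run for each new one.
        if len(self.runs) > self.capacity:
            self.runs.popleft()


cache = WorkflowRunCache(min_runs=3, initial_runs=["run1"])
for run in ["run2", "run3", "run4"]:
    cache.fetch_new_run(run)
print(list(cache.runs))  # ['run2', 'run3', 'run4']
```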


Using Workflow Monitor Toolbars
The Workflow Monitor toolbars allow you to select tools and tasks quickly. You can perform the following toolbar operations:
♦ Display or hide a toolbar.
♦ Create a new toolbar.
♦ Add or remove buttons.

For details on how to perform these toolbar operations, see “Using the Designer” in the Designer Guide. By default, the Workflow Monitor displays the following toolbars:

Standard. Contains buttons to connect to and disconnect from repositories, and to zoom and print the workspace. Figure 14-7 displays the Standard toolbar:
Figure 14-7. Standard Toolbar

Server. Contains buttons to connect to and disconnect from PowerCenter Servers, to ping the server, and to start and stop workflows, worklets, and tasks. Figure 14-8 displays the Server toolbar:
Figure 14-8. Server Toolbar

View. Contains buttons to refresh the view and to open workflow and session logs. Figure 14-9 displays the View toolbar:
Figure 14-9. View Toolbar

Filter. Contains buttons to display most recent runs, and to filter tasks, servers, and folders. Figure 14-10 displays the Filter toolbar:
Figure 14-10. Filter Toolbar

Working with Tasks and Workflows
You can perform the following tasks with objects in the Workflow Monitor:
♦ Run a task or workflow.
♦ Resume a suspended workflow.
♦ Stop or abort a task or workflow.
♦ Schedule and unschedule a workflow.
♦ View session logs and workflow logs.
♦ View history names.

Running a Task, Workflow, or Worklet
The Workflow Monitor displays workflows that have run at least once. In the Workflow Monitor, you can run a workflow or any task or worklet in the workflow. To run a workflow or part of a workflow, right-click the workflow or task and choose a restart option. When you choose restart, the task, workflow, or worklet runs on the PowerCenter Server you specify in the workflow properties. You can also run part of a workflow. When you run part of a workflow, the PowerCenter Server runs the workflow from the selected task to the end of the workflow. For details on running workflows and tasks in the Workflow Manager, see “Running the Workflow” on page 122.
To run a workflow from the Workflow Monitor:
1. In the Navigator, select the workflow you want to run.
2. Right-click the workflow in the Navigator and choose Restart.
   or
   Choose Task-Restart.
   The PowerCenter Server runs the workflow you specify.

To run a task from the Workflow Monitor:
1. In the Navigator, select the task or worklet you want to run.
2. Right-click the task or worklet in the Navigator and choose Restart Task.
   The PowerCenter Server runs the task or worklet you specify. It does not run the rest of the workflow.

To run part of a workflow from the Workflow Monitor:
1. In the Navigator, select the task from which you want to run the workflow.
2. Right-click the task and choose Restart Workflow from Task.
   or
   Choose Task-Restart.
   The PowerCenter Server runs the workflow starting with the task you specify.
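
Restart Workflow from Task runs the selected task and everything after it. Treating the workflow as a directed graph of links, the set of tasks that run can be sketched as a breadth-first walk from the selected task. This is a hypothetical illustration: the `links` dictionary and the task names are invented, and the sketch does not model how the PowerCenter Server evaluates link conditions or tasks with several incoming links.

```python
from collections import deque


def tasks_from(links, start_task):
    """Return start_task followed by every task reachable from it, in BFS order.

    links maps each task name to the names of the tasks it links to.
    """
    order = [start_task]
    seen = {start_task}
    queue = deque([start_task])
    while queue:
        task = queue.popleft()
        for successor in links.get(task, []):
            if successor not in seen:
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order


# Hypothetical workflow: Start branches to two sessions that both feed s_merge.
links = {
    "Start": ["s_load_a", "s_load_b"],
    "s_load_a": ["s_merge"],
    "s_load_b": ["s_merge"],
}
print(tasks_from(links, "s_load_a"))  # ['s_load_a', 's_merge']
```

By contrast, Restart Task corresponds to running only `start_task` itself, without walking its successors.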

Resuming a Workflow or Worklet
In the workflow properties, you can choose to suspend the workflow or worklet if a task fails. After you fix the failed task, resume the workflow in the Workflow Monitor. When you resume a workflow, the PowerCenter Server finds the failed task, runs the task again, and continues running the rest of the tasks in the workflow path. For details on suspending a workflow, see “Suspending the Workflow” on page 127.
To resume a workflow or worklet:
1. In the Navigator, select the workflow or worklet you want to resume.
2. Choose Tasks-Resume.
   or
   Right-click the workflow or worklet in the Navigator and choose Resume.
   The Workflow Monitor displays server messages about the resume command in the Output window.

Recovering a Workflow or Worklet
In the workflow properties, you can choose to suspend the workflow or worklet if a session fails. After you fix the errors that caused the session to fail, recover the workflow in the Workflow Monitor. When you recover a workflow, the PowerCenter Server recovers the failed session and continues running the rest of the tasks in the workflow path. For details on suspending a workflow, see “Suspending the Workflow” on page 127.
To recover a workflow or worklet:
1. In the Navigator, select the workflow or worklet you want to recover.
2. Choose Tasks-Resume/Recover.
   or
   Right-click the workflow or worklet in the Navigator and choose Resume/Recover.
   The Workflow Monitor displays server messages about the recover command in the Output window.

Stopping or Aborting Tasks and Workflows
You can stop or abort a task, workflow, or worklet in the Workflow Monitor at any time. When you stop a task in the workflow, the PowerCenter Server stops processing the task and all other tasks in its path. The PowerCenter Server continues running concurrent tasks. If the PowerCenter Server cannot stop processing the task, you need to abort the task. When the PowerCenter Server aborts a task, it kills the DTM process and terminates the task. For details on server handling of stop and abort, see “Server Handling of Stop and Abort” on page 129.
To stop or abort workflows, tasks, or worklets in the Workflow Monitor:
1. In the Navigator, select the task, workflow, or worklet you want to stop or abort.
2. Choose Tasks-Stop or Tasks-Abort.
   or
   Right-click the task, workflow, or worklet in the Navigator and choose Stop or Abort.
3. The Workflow Monitor displays the status of the stop or abort command in the Output window.

Scheduling and Unscheduling Workflows
You can schedule and unschedule workflows in the Workflow Monitor. You can schedule any workflow that is not configured to run on demand. When you try to schedule a workflow that is configured to run on demand, the Workflow Monitor displays an error message in the Output window. When you schedule an unscheduled workflow, the workflow uses its original schedule specified in the workflow properties. If you want to specify a different schedule for the workflow, you must edit the scheduler in the Workflow Manager.
To schedule an unscheduled workflow in the Workflow Monitor:
♦ Right-click the workflow and choose Schedule.
  The Workflow Monitor displays the workflow status as Scheduled, and displays a message in the Output window.

To unschedule a scheduled workflow in the Workflow Monitor:
♦ Right-click the workflow and choose Unschedule.
  The Workflow Monitor displays the workflow status as Unscheduled, and displays a message in the Output window.

For details on scheduling workflows, see “Scheduling a Workflow” on page 112.

Viewing Session Logs and Workflow Logs
You can open and edit session and workflow log files from the Workflow Monitor. To view workflow or session logs, connect to the server. You can view the most recent session or workflow log. Or, select a particular workflow run and view the log for that run. If a past session or workflow log is not available, the Workflow Monitor opens the most recent log file. You can view log files in any text editor on the PowerCenter Client. To change the log file editor, choose Tools-Options. Enter the path and file name of the text editor in the Log File Editor field on the General tab. When you open a session or workflow log, the Workflow Monitor copies the log file from the PowerCenter Server machine to the directory specified on the General tab of the Options dialog box. The Workflow Monitor opens the file from the temporary directory on the client machine. When you open a session or workflow log, you can cancel the operation at any time.
Note: To view past session or workflow log files, you must configure the session or workflow to save logs by timestamp. For more information on workflow and session logs, see “Log Files” on page 455.

Viewing Dynamic Log Files
When you open a session or workflow log, the Workflow Monitor opens the most recent version of the log file, even if the PowerCenter Server is currently writing to the log file. Each time you choose Get Session Log or Get Workflow Log, the Workflow Monitor opens a new text file with the most recent version of the log file. If you choose to open the log file after the session completes, the Workflow Monitor opens the entire log in a new text file.

Steps to View Log Files
Perform the following steps to view a session or workflow log.
To view a session or workflow log file:
1. Right-click a Session task or workflow in the Navigator or Time window.
2. Choose Get Session Log, or choose Get Workflow Log.
   The most recent session or workflow log file opens in the log file editor you specify for the Workflow Monitor.
Tip: When the Workflow Monitor retrieves the session or workflow log, you can press the Esc key to cancel the process.

Viewing History Names
If you rename a task, workflow, or worklet, the Workflow Monitor can show a history of names. When you start a renamed task, workflow, or worklet, the Workflow Monitor displays the current name. To view a list of historical names, select the task, workflow, or worklet in the Navigator. Right-click and choose Show History Names.

Figure 14-11 shows the History Names dialog box:
Figure 14-11. History Names Dialog Box

Workflow and Task Status
The Workflow Monitor displays the status of workflows and tasks. Table 14-4 describes the different statuses for workflows and tasks:
Table 14-4. Workflow and Task Status

Aborted (Workflows, Tasks): The PowerCenter Server aborted the workflow or task. The PowerCenter Server kills the DTM process when you abort a workflow or task.
Aborting (Workflows, Tasks): The PowerCenter Server is in the process of aborting the workflow or task.
Disabled (Workflows, Tasks): You select the Disabled option in the workflow or task properties. The PowerCenter Server does not run the disabled workflow or task until you clear the Disabled option.
Failed (Workflows, Tasks): The PowerCenter Server failed the workflow or task due to errors.
Running (Workflows, Tasks): The PowerCenter Server is running the workflow or task.
Scheduled (Workflows): You schedule the workflow to run at a future date. The PowerCenter Server runs the workflow for the duration of the schedule.
Stopped (Workflows, Tasks): You choose to stop the workflow or task in the Workflow Monitor. The PowerCenter Server stopped the workflow or task.
Stopping (Workflows, Tasks): The PowerCenter Server is in the process of stopping a workflow or task.
Succeeded (Workflows, Tasks): The PowerCenter Server successfully completed the workflow or task.
Suspended (Workflows, Worklets): The PowerCenter Server suspends the workflow because a task fails and no other tasks are running in the workflow. This status is available only when you choose the Suspend on Error option.
Suspending (Workflows, Worklets): A task fails in the workflow when other tasks are still running. The PowerCenter Server stops executing the failed task and continues executing tasks in other paths. This status is available only when you choose the Suspend on Error option.
Terminated (Workflows): The PowerCenter Server terminated unexpectedly when it was running this workflow or task.
Unscheduled (Workflows): You removed a workflow from the schedule. Or, the workflow is scheduled and the PowerCenter Server is about to run the scheduled workflow.
Waiting (Workflows, Tasks): The PowerCenter Server is waiting for available resources so it can execute the workflow or task. For example, you may set the maximum number of concurrent sessions to 10. If the PowerCenter Server is already executing 10 concurrent sessions, all other workflows and tasks have the Waiting status until the PowerCenter Server is free to execute more tasks.
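
The Waiting example can be sketched as a simple admission rule: with a limit of 10 concurrent sessions, the first 10 sessions run and the rest wait. This is a hypothetical model; `assign_statuses` and the session names are invented, and the sketch ignores sessions finishing and freeing slots.

```python
def assign_statuses(session_names, max_concurrent):
    """Mark sessions beyond the concurrency limit as Waiting (hypothetical model)."""
    statuses = {}
    for position, name in enumerate(session_names):
        # Positions 0..max_concurrent-1 get a slot; everything after waits.
        statuses[name] = "Running" if position < max_concurrent else "Waiting"
    return statuses


sessions = [f"s_session_{n}" for n in range(1, 13)]  # 12 hypothetical sessions
statuses = assign_statuses(sessions, max_concurrent=10)
print(list(statuses.values()).count("Waiting"))  # 2
```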

To see a list of tasks by status, view the workflow in Task view and sort by status. Or, choose Edit-List Tasks in Gantt Chart view. For details, see “Listing Tasks and Workflows” on page 424.

Using the Gantt Chart View
The Gantt Chart view allows you to view chronological details of workflow runs. The Gantt Chart view displays the following information:
♦ Task name. Name of the task in the workflow.
♦ Duration. The length of time the PowerCenter Server spends running the most recent task or workflow.
♦ Status. The status of the most recent task or workflow. For more information about status, see “Workflow and Task Status” on page 421.
♦ Connection between objects. The Workflow Monitor shows links between objects in the Time window.

Figure 14-12 displays the Gantt Chart view:
Figure 14-12. Gantt Chart View

Organizing Tasks
In Gantt Chart view, you can organize tasks in the Navigator. You can drag and drop tasks within a workflow to change the order in which they appear in the Navigator.

For example, the Workflow Monitor usually displays the Decision task as the first task in the following workflow:

Decision task displays first.

You can drag and drop the Decision task within the Navigator so the Decision task is in the middle or at the bottom of the list of tasks for that workflow:

Decision task displays between other tasks.

Listing Tasks and Workflows
The Workflow Monitor lists tasks and workflows in all repositories you connect to. You can view tasks and workflows by status, such as failed or succeeded. You can highlight the task in Gantt Chart view by double-clicking the task in the list.

To view a list of tasks and workflows by status:
1. Open the Gantt Chart view and choose Edit-List Tasks.
   The List Tasks dialog box appears.
2. In the List What field, select the type of task status you want to list.
   For example, select Failed to view a list of failed tasks and workflows.
3. Click List to view the list.
Tip: Double-click the task name in the List Tasks dialog box to highlight the task in Gantt Chart view.

Navigating the Time Window in Gantt Chart View
You can scroll through the Time window in Gantt Chart view to monitor the workflow runs. To scroll the Time window, you can use any of the following methods:
♦ Use the scroll bars.
♦ Right-click the task or workflow and choose Go To Next Run, or choose Go To Previous Run.
♦ Choose View-Organize to select the date you want to display.

When you choose View-Organize, the Go To field appears above the Time window. Click the Go To field to view a calendar and select the date you want to display. When you choose a date, the Workflow Monitor displays that date beginning at 12:00 a.m.

Figure 14-13 shows the Go To field:
Figure 14-13. Organizing Gantt Chart

Zooming the Gantt Chart View
You can change the zoom settings in Gantt Chart view. By default, the Workflow Monitor shows the Time window in increments of one hour. You can change the time increments to zoom the Time window.

Figure 14-14 shows the Time window in 30-minute increments:
Figure 14-14. Zooming the Gantt Chart View
Figure callouts: Zoom; 30-minute increments; solid line for hour increments; dotted line for half-hour increments.

To zoom the Time window in Gantt Chart view, choose View-Zoom and then choose the desired time increment. You can also choose the time increment from the Zoom button on the toolbar.

Performing a Search
Use the search tool in the Gantt Chart view to search for tasks, workflows, and worklets in all repositories you connect to. The Workflow Monitor searches for the word you specify in task names, workflow names, and worklet names. You can highlight the task in Gantt Chart view by double-clicking the task after searching.

To perform a search:
1. Open the Gantt Chart view and choose Edit-Find.
   The Find Object dialog box appears.
2. In the Find What field, enter the keyword you want to find.
3. Click Find Now.
   The Workflow Monitor displays a list of tasks, workflows, and worklets that match the keyword.
Tip: Double-click the task name in the Find Object dialog box to highlight the task in Gantt Chart view.
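
The Find Now behavior can be approximated as a keyword filter over object names. The guide does not say whether matching is case-sensitive or requires whole words; this sketch assumes a case-insensitive substring match, and `find_objects` and the object names are hypothetical.

```python
def find_objects(object_names, keyword):
    """Return the names that contain keyword, ignoring case (assumed behavior)."""
    needle = keyword.lower()
    return [name for name in object_names if needle in name.lower()]


names = ["wf_Sales_Daily", "s_m_load_sales", "wkl_inventory"]
print(find_objects(names, "sales"))  # ['wf_Sales_Daily', 's_m_load_sales']
```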

Opening All Folders
You can open all folders on which you have read permission in a repository. To open all the folders in the Gantt Chart view, right-click the server you want to view, and then choose Open All Folders. The Workflow Monitor displays workflows and tasks in the folders.

Using the Task View
The Task view displays information about workflow runs in a report format. The Task view provides a convenient way to compare and filter details of workflow runs. Task view displays the following information:

♦ Workflow run list. The list of workflow runs. The workflow run list contains folder, workflow, worklet, and task names. The Workflow Monitor displays workflow runs chronologically with the most recent run at the top. It displays folders and servers alphabetically.
♦ Status. The status of the task or workflow.
♦ Start time. The time that the PowerCenter Server starts executing the task or workflow.
♦ Completion time. The time that the PowerCenter Server finishes executing the task or workflow.
♦ Status message. Message from the PowerCenter Server regarding the status of the task or workflow.
♦ Run type. The method you used to start the workflow. You might manually start the workflow or schedule the workflow to start.
♦ Worker server. The PowerCenter Server that ran the task.

You can perform the following tasks in Task view:
♦ Filter tasks. Use the Filter menu to select the tasks you want to display or hide. For more information on filtering tasks in Task view, see “Filtering in Task View” on page 431.
♦ Hide and view columns. Hide or view an entire column in Task view. For details on hiding and viewing columns in Task view, see “Configuring Task View Options” on page 412.
♦ Hide and view the Navigator. You can hide the Navigator in Task view. Choose View-Navigator to hide or view the Navigator.

To view the tasks in Task view, select the server you want to monitor in the Navigator.
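
The ordering rules for the workflow run list can be sketched as two sorts. This assumes runs are grouped by server and then folder in alphabetical order, with the newest run first inside each group; the record fields, server names, and folder names are hypothetical, not a Workflow Monitor export format.

```python
from datetime import datetime


def order_runs(runs):
    """Sort runs alphabetically by server and folder, newest first within each."""
    # Sort by start time (newest first), then stable-sort by server and folder
    # so the time order is preserved inside each server/folder group.
    newest_first = sorted(runs, key=lambda run: run["start"], reverse=True)
    return sorted(newest_first, key=lambda run: (run["server"], run["folder"]))


runs = [
    {"server": "pc_server_b", "folder": "Sales", "start": datetime(2004, 8, 2, 9, 0)},
    {"server": "pc_server_a", "folder": "Sales", "start": datetime(2004, 8, 1, 9, 0)},
    {"server": "pc_server_a", "folder": "Sales", "start": datetime(2004, 8, 3, 9, 0)},
    {"server": "pc_server_a", "folder": "Finance", "start": datetime(2004, 8, 2, 9, 0)},
]
for run in order_runs(runs):
    print(run["server"], run["folder"], run["start"].date())
```

The design relies on Python's sort being stable: the second sort regroups the runs without disturbing the newest-first order established by the first.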

Figure 14-15 displays the Task view:
Figure 14-15. Task View

Figure callouts: Navigator window, workflow run list, Time window, Task view, Output window.

Filtering in Task View
In Task view, you can view all or some workflow tasks. You can filter tasks in the following ways:

♦ By task type. You can filter out tasks to view only tasks you want. For example, if you want to view only session task types, you can filter out all other tasks. For more information on filtering task types and servers, see “Filtering Tasks and Servers” on page 405.
♦ By nodes in the Navigator. You can filter the workflow runs the Workflow Monitor displays in the Time window by selecting different nodes in the Navigator. For example, when you select a repository name in the Navigator, the Time window displays all workflow runs that ran on the PowerCenter Servers registered to that repository. When you select a folder name in the Navigator, the Time window displays all workflow runs in that folder.
♦ By the most recent runs. To display by the most recent runs, choose Filters-Most Recent Runs and choose the number of runs you want to display.
♦ By Time window columns. You can choose Filters-Auto Filter and filter by properties you specify in the Time window columns.

To filter by Time window columns:
1. Choose Filters-Auto Filter.
   The Filter button appears in some columns of the Time window in Task view.
   Figure callouts: Filter button. Select the workflows you want to display.
2. Click the Filter button in a column in the Time window.
3. Choose the properties you want to filter.
   Tip: To view all tasks, select All.
   When you click the Filter button in either the Start Time or Completion Time column, you can choose a custom time to filter.
4. Select Custom for either Start Time or Completion Time.
   The Filter Start Time or Custom Completion Time dialog box appears.
5. Choose to show tasks before, after, or between the time you specify. Select the date and time. Click OK.
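
The before, after, and between choices in the custom time filter amount to a comparison on the Start Time or Completion Time column. A minimal sketch, assuming run records with a hypothetical `start` field (a `completion` field would work the same way) and inclusive bounds for between; the guide does not state how the Workflow Monitor treats runs that fall exactly on a boundary, and `filter_by_time` is an invented name.

```python
from datetime import datetime


def filter_by_time(runs, column, mode, moment, until=None):
    """Keep runs whose column value is before, after, or between the given times."""
    if mode == "before":
        return [run for run in runs if run[column] < moment]
    if mode == "after":
        return [run for run in runs if run[column] > moment]
    if mode == "between":
        if until is None:
            raise ValueError("'between' needs an end time")
        # Inclusive bounds are an assumption of this sketch.
        return [run for run in runs if moment <= run[column] <= until]
    raise ValueError(f"unknown mode: {mode}")


runs = [
    {"name": "wf_morning", "start": datetime(2004, 8, 2, 8, 0)},
    {"name": "wf_noon", "start": datetime(2004, 8, 2, 12, 0)},
    {"name": "wf_evening", "start": datetime(2004, 8, 2, 18, 0)},
]
noon = datetime(2004, 8, 2, 12, 0)
print([run["name"] for run in filter_by_time(runs, "start", "before", noon)])  # ['wf_morning']
```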

Opening All Folders
You can open all folders on which you have read permission in a repository. To open all folders in the Task view, right-click the server with the folders you want to view, and then choose Open All Folders. The Workflow Monitor displays workflows and tasks in the folders.

Monitoring Session Details
When the PowerCenter Server runs a Session task, the Workflow Monitor creates session details that provide load statistics for each target in the mapping. You can view session details when the session runs or after the session completes. To view session details, right-click the session in the Workflow Monitor and choose Properties. Click the Transformation Statistics tab in the Properties dialog box. Figure 14-16 shows the session details on the Transformation Statistics tab:
Figure 14-16. Session Properties Transformation Statistics

When you create multiple partitions in a session, the PowerCenter Server provides session details for each partition. You can use these details to determine if the data is evenly distributed among the partitions. For example, if the PowerCenter Server moves more rows through one target partition than another, or if the throughput is not evenly distributed, you might want to adjust the data range for the partitions. When you load data to a target with multiple groups, such as an XML target, the PowerCenter Server provides session details for each group. Table 14-5 lists the information on the Transformation Statistics tab:
Table 14-5. Session Details on the Transformation Statistics Tab

Instance Name: Name of the source qualifier instance or the target instance in the mapping. If you create multiple partitions in the source or target, the Instance Name displays the partition number. If the source or target contains multiple groups, the Instance Name displays the group name.
Transformation Name: Name of the source qualifier or target.
Applied Rows: For targets, shows the number of rows the PowerCenter Server successfully applied to the target (that is, the target returned no errors). For sources, shows the number of rows the PowerCenter Server successfully read from the source. Note: The number of applied rows equals the number of affected rows for sources.
Affected Rows: For targets, shows the number of rows affected by the specified operation. For example, suppose a table has one column called SALES_ID and five rows containing the values 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer affects three rows, even though there was only one update request. Or, if you mark rows for update where SALES_ID is 4, the writer affects 0 rows. For sources, shows the number of rows the PowerCenter Server successfully read from the source. Note: The number of applied rows equals the number of affected rows for sources.
Rejected Rows: Number of rows the PowerCenter Server dropped when reading from the source, or the number of rows the PowerCenter Server rejected when writing to the target.
Throughput (Rows/Sec): Rate at which the PowerCenter Server read rows from the source or wrote rows into the target, in rows per second.
Last Error Message: The most recent error message written to the session log. If you view details after the session completes, this field displays the last error message.
Last Error Code: The error message code of the most recent error message written to the session log. If you view details after the session completes, this field displays the last error code.
Start Time: The time the PowerCenter Server started to read from the source or write to the target. The Workflow Monitor displays time relative to the PowerCenter Server.
End Time: The time the PowerCenter Server finished reading from the source or writing to the target. The Workflow Monitor displays time relative to the PowerCenter Server.
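
The SALES_ID example can be checked with a few lines of code: the number of affected rows is the number of target rows that satisfy the update condition, not the number of update requests. This illustrates the arithmetic only, using the example values from the table; `affected_rows` is a hypothetical helper, not part of PowerCenter.

```python
def affected_rows(column_values, update_key):
    """Count rows an update keyed on update_key would touch."""
    return sum(1 for value in column_values if value == update_key)


sales_ids = [1, 2, 3, 2, 2]  # the five SALES_ID values from the example
print(affected_rows(sales_ids, 2))  # 3: one update request, three affected rows
print(affected_rows(sales_ids, 4))  # 0: no row has SALES_ID 4
```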

Creating and Viewing Performance Details
The performance details provide counters that help you understand the session and mapping efficiency. Each source qualifier, target definition, and individual transformation appears in the performance details, along with counters that display performance information about each transformation. You can view performance details through the Workflow Monitor as the session runs, or you can open the resulting file in a text editor. You create performance details by selecting Collect Performance Data in the session properties before running the session. By evaluating the final performance details, you can determine where session performance slows down. Monitoring also provides session-specific details that can help tune the following:
♦ Buffer block size
♦ Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
♦ Lookup transformations

Before using performance details to improve session performance, you must do the following:
♦ Enable monitoring
♦ Increase Load Manager shared memory
♦ Understand performance counters

Enabling Monitoring
To view performance details, you must enable monitoring in the session properties before running the session.
To enable monitoring:
1. In the Workflow Manager, open the selected session properties.
2. In the Performance settings of the Properties tab, select Collect Performance Data, and click OK.
3. Run the session.

Viewing Session Performance Details
You can view session performance details in the Workflow Monitor or by locating and opening the performance details file. In the Workflow Monitor, you can watch performance details during the session run.

To view performance details in the Workflow Monitor:
1. While the session is running, right-click the session in the Workflow Monitor and choose Properties.
2. Click the Performance tab in the Properties dialog box.
3. Click OK.

To view the performance details file:
1. Locate the performance details file.
   The PowerCenter Server names the file session_name.perf and stores it in the same directory as the session log. If there is no session-specific directory for the session log, the PowerCenter Server saves the file in the default log files directory.
2. Open the file in any text editor.

Memory Requirement for Performance Details
When you enable monitoring, you must increase the size of the Load Manager Shared Memory. For each session in shared memory that you configure to create performance details, the Load Manager requires 200,000 bytes of additional shared memory. If you create performance details for all sessions, multiply the MaxSessions parameter by 200,000 bytes to calculate the additional shared memory requirements.
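
The shared memory arithmetic is a single multiplication. In the sketch below, the 200,000-byte figure comes from this section, while the function name and the example MaxSessions value of 10 are hypothetical.

```python
BYTES_PER_MONITORED_SESSION = 200_000  # additional Load Manager shared memory per session


def extra_shared_memory(monitored_sessions):
    """Additional shared memory, in bytes, for sessions that collect performance data."""
    return monitored_sessions * BYTES_PER_MONITORED_SESSION


# If every session creates performance details, use the MaxSessions value.
max_sessions = 10
print(f"{extra_shared_memory(max_sessions):,} bytes")  # 2,000,000 bytes
```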

Understanding Performance Counters
All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Source Qualifiers, Normalizers, and targets have additional counters that indicate the efficiency of data moving into and out of buffers. You can use these counters to locate performance bottlenecks.

Some transformations have counters specific to their functionality. For example, each Lookup transformation has a counter that indicates the number of rows stored in the lookup cache. When you read performance details, the first column displays the transformation name as it appears in the mapping, the second column contains the counter name, and the third column holds the resulting number or efficiency percentage. When you create multiple partitions in a pipeline, the PowerCenter Server generates one set of counters for each partition. The following performance counters illustrate two partitions for an Expression transformation:
Transformation    Counter                   Value
EXPTRANS [1]      Expression_input rows     8
                  Expression_output rows    8
EXPTRANS [2]      Expression_input rows     16
                  Expression_output rows    16
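
Because each partition reports its own counters, you can add them up to get totals for the transformation, or compare them to spot uneven distribution. A minimal sketch over the EXPTRANS values shown for two partitions; the tuple layout and the `totals_across_partitions` name are assumptions for illustration, not the format of the .perf file.

```python
import re
from collections import defaultdict

# (instance, counter, value) rows copied from the two-partition example.
rows = [
    ("EXPTRANS [1]", "Expression_input rows", 8),
    ("EXPTRANS [1]", "Expression_output rows", 8),
    ("EXPTRANS [2]", "Expression_input rows", 16),
    ("EXPTRANS [2]", "Expression_output rows", 16),
]


def totals_across_partitions(rows):
    """Sum each counter over all partitions of a transformation."""
    totals = defaultdict(int)
    for instance, counter, value in rows:
        # Drop the partition suffix, e.g. "EXPTRANS [2]" -> "EXPTRANS".
        transformation = re.sub(r"\s*\[\d+\]$", "", instance)
        totals[(transformation, counter)] += value
    return dict(totals)


print(totals_across_partitions(rows))
# {('EXPTRANS', 'Expression_input rows'): 24, ('EXPTRANS', 'Expression_output rows'): 24}
```

In this example, partition 2 processed twice as many rows as partition 1 (16 versus 8), the kind of imbalance that per-partition details can reveal.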

Note: When you increase the number of partitions, the number of aggregate or rank input rows may be different from the number of output rows from the previous transformation.

Table 14-6 lists the counters that may appear in the Session Performance Details dialog box or in the performance details file:
Table 14-6. Performance Counters

Aggregator and Rank Transformations
Aggregator/Rank_inputrows: Number of rows passed into the transformation.
Aggregator/Rank_outputrows: Number of rows sent out of the transformation.
Aggregator/Rank_errorrows: Number of rows in which the PowerCenter Server encountered an error.
Aggregator/Rank_readfromcache: Number of times the PowerCenter Server read from the index or data cache.
Aggregator/Rank_writetocache: Number of times the PowerCenter Server wrote to the index or data cache.
Aggregator/Rank_readfromdisk: Number of times the PowerCenter Server read from the index or data file on the local disk, instead of using cached data.
Aggregator/Rank_writetodisk: Number of times the PowerCenter Server wrote to the index or data file on the local disk, instead of using cached data.
Aggregator/Rank_newgroupkey: Number of new groups the PowerCenter Server created.
Aggregator/Rank_oldgroupkey: Number of times the PowerCenter Server used existing groups.

Lookup Transformation
Lookup_inputrows: Number of rows passed into the transformation.
Lookup_outputrows: Number of rows sent out of the transformation.
Lookup_errorrows: Number of rows in which the PowerCenter Server encountered an error.
Lookup_rowsinlookupcache: Number of rows stored in the lookup cache.

Joiner Transformation
Joiner_inputMasterRows: Number of rows the master source passed into the transformation.
Joiner_inputDetailRows: Number of rows the detail source passed into the transformation.
Joiner_outputrows: Number of rows sent out of the transformation.
Joiner_errorrows: Number of rows in which the PowerCenter Server encountered an error.
Joiner_readfromcache: Number of times the PowerCenter Server read from the index or data cache.
Joiner_writetocache: Number of times the PowerCenter Server wrote to the index or data cache.
Joiner_readfromdisk*: Number of times the PowerCenter Server read from the index or data files on the local disk, instead of using cached data.
Joiner_writetodisk*: Number of times the PowerCenter Server wrote to the index or data files on the local disk, instead of using cached data.
Joiner_readBlockFromDisk**: Number of times the PowerCenter Server read from the index or data files on the local disk, instead of using cached data.
Joiner_writeBlockToDisk**: Number of times the PowerCenter Server wrote to the index or data cache.
Joiner_seekToBlockInDisk**: Number of times the PowerCenter Server accessed the index or data files on the local disk.
Joiner_insertInDetailCache*: Number of times the PowerCenter Server wrote to the detail cache. The PowerCenter Server generates this counter only if you join data from a single source.
Joiner_duplicaterows: Number of duplicate rows the PowerCenter Server found in the master relation.
Joiner_duplicaterowsused: Number of times the PowerCenter Server used the duplicate rows in the master relation.

All Other Transformations
Transformation_inputrows: Number of rows passed into the transformation.
Transformation_outputrows: Number of rows sent out of the transformation.
Transformation_errorrows: Number of rows in which the PowerCenter Server encountered an error.

*The PowerCenter Server generates this counter when you use sorted input for the Joiner transformation.
**The PowerCenter Server generates this counter when you do not use sorted input for the Joiner transformation.

If you have multiple source qualifiers and targets, evaluate them as a whole. For the efficiency counters of source qualifiers and targets, a high value is 80-100 percent and a low value is 0-20 percent.
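
The thresholds can be expressed as a small classification function. This is a sketch under stated assumptions: values between 20 and 80 percent are simply reported as neither high nor low, "evaluate them as a whole" is approximated with an unweighted average (the guide does not prescribe a method), and both function names are hypothetical.

```python
def classify_efficiency(percent):
    """Label a source qualifier or target efficiency percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("efficiency must be between 0 and 100 percent")
    if percent >= 80:
        return "high"
    if percent <= 20:
        return "low"
    return "neither high nor low"


def overall_efficiency(percentages):
    """Unweighted average across several source qualifiers and targets (assumption)."""
    return sum(percentages) / len(percentages)


print(classify_efficiency(overall_efficiency([95, 70, 85])))  # high
```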

Tips
Reduce the size of the Time window.
When you reduce the size of the Time window, the Workflow Monitor refreshes the screen faster, reducing flicker.

Use the Repository Manager to truncate the list of workflow logs.
If the Workflow Monitor takes a long time to refresh from the repository or to open folders, truncate the list of workflow logs. When you configure a session or workflow to archive session logs or workflow logs, the PowerCenter Server saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If you move or delete a session log or workflow log from the workflow log directory or session log directory, truncate the lists of workflow and session logs to remove the entries from the repository. The repository always retains the most recent workflow log entry for each workflow.


Chapter 15

Using Multiple Servers
This chapter covers the following topics:
♦ Overview, 444
♦ Using Server Variables, 445
♦ Working with Server Grids, 446
♦ Configuring Server Grids, 450

Overview
You can register and run multiple PowerCenter Servers against a local or global repository. When you register multiple PowerCenter Servers to the same repository, you can distribute the workload across the servers to increase performance. You have the following options to run workflows and sessions using multiple servers:

♦ Use a server grid to run workflows. You can use a server grid to automate the distribution of sessions. A server grid is a server object that distributes sessions in a workflow to servers based on server availability. The grid maintains connections to multiple servers in the grid. For more information about using server grids, see “Working with Server Grids” on page 446.
♦ Change the assigned server for a workflow. When you configure a workflow, you assign a server to run that workflow. Each time the scheduled workflow runs, it runs on the assigned server. You can change the assigned server for a workflow in the workflow properties.
♦ Change the assigned server for a session. When you configure a session, by default it runs on the server assigned to the workflow. You can change the assigned server for a session in the session properties.
♦ Start a workflow on a non-assigned server. By default, each workflow runs on its assigned PowerCenter Server. You can run a workflow on a non-assigned server if the workflow is not currently running. Use the Start Workflow button on the Standard toolbar, and choose a PowerCenter Server.

You can use the Workflow Monitor to monitor workflows running on multiple servers. For server grids, the Workflow Monitor shows the individual status of each server in a grid. You can identify the server grid that a server is assigned to by right-clicking the server in the Workflow Monitor and selecting Properties. For more information about using the Workflow Monitor, see “Monitoring Workflows” on page 401.
Tip: You might want to place the most CPU-intensive sessions on the more powerful servers.


Using Server Variables
In a multiple server environment, each server must have access to input files and directories used by the session it runs. You can use server variables to simplify the process of changing the server that runs a session or workflow. Server variables set the paths for files and caches created during a session. If you override a server variable in a workflow or session, you may need to manually edit the session or workflow properties. If the new PowerCenter Server cannot locate the override directory, it cannot run the session.

Using a File Server
Consider setting up a central location or using a file server accessible to all the PowerCenter Servers. This allows you to run sessions on different servers without moving cache files and input files.
♦ Configure $PMRootDir for each server to point to the central location.
♦ Use the same variables on each machine.
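To illustrate why a shared $PMRootDir helps, the following minimal sketch expands directory values such as $PMRootDir/SessLogs for two servers. It assumes every other directory is defined relative to $PMRootDir, as in the defaults listed in Table 16-1. The server names, the file-server share, and the expand helper are hypothetical; this is not how the PowerCenter Server itself resolves variables.

```python
import re

# Hypothetical settings for two servers registered to the same repository.
# Both point $PMRootDir at the same (hypothetical) file-server share; the other
# directories use $PMRootDir-relative defaults such as those in Table 16-1.
SERVER_VARIABLES = {
    "ServerA": {
        "$PMRootDir": "//fileserver/pmroot",
        "$PMWorkflowLogDir": "$PMRootDir/WorkflowLogs",
        "$PMSessionLogDir": "$PMRootDir/SessLogs",
        "$PMBadFileDir": "$PMRootDir/BadFiles",
    },
    "ServerB": {
        "$PMRootDir": "//fileserver/pmroot",
        "$PMWorkflowLogDir": "$PMRootDir/WorkflowLogs",
        "$PMSessionLogDir": "$PMRootDir/SessLogs",
        "$PMBadFileDir": "$PMRootDir/BadFiles",
    },
}

# Matches a server variable reference such as "$PMRootDir" (letters only).
_VARIABLE = re.compile(r"\$PM[A-Za-z]+")


def expand(server: str, value: str) -> str:
    """Replace every $PM... reference in `value` using one server's settings."""
    settings = SERVER_VARIABLES[server]
    # Recurse so that variables defined in terms of $PMRootDir also expand.
    return _VARIABLE.sub(lambda m: expand(server, settings[m.group(0)]), value)


for name in SERVER_VARIABLES:
    # Both servers resolve to //fileserver/pmroot/SessLogs/s_PhoneList.log.
    print(name, expand(name, "$PMSessionLogDir/s_PhoneList.log"))
```

Because only $PMRootDir carries a machine-specific value, moving a session from ServerA to ServerB does not change where it finds its files.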

If you do not use a central file server, you need to relocate input files to the default directories of the new PowerCenter Server. Input files can include parameter files, cache files, external procedures, and flat file sources.

Running Sessions with Cache Files
In a multiple server environment, each PowerCenter Server needs access to the index and data cache files created during previous sessions. This can include incremental aggregation files and persistent lookup cache files. If the PowerCenter Server cannot locate the cache files, it rebuilds them. When the PowerCenter Server rebuilds incremental aggregation files, it loses aggregate history. Use one of the following methods to save aggregate history in a multiple server environment:
♦ Use consistent server variables. Use the same $PMCacheDir value for each PowerCenter Server running incremental aggregation sessions.
♦ Run incremental aggregation sessions on the same machine. When you run large incremental aggregation sessions, consider assigning a server to the session and overriding the server variable to write to a drive local to the assigned PowerCenter Server.
♦ Move incremental aggregation files. If you cannot make files accessible to each PowerCenter Server, or if the files are very large, you must move them to the server running the session.

Note: Since aggregate files can become very large, make sure the directory can accommodate the necessary files.


Working with Server Grids
You can increase workflow performance by using a server grid to balance the server workload. When you create a server grid, you can add PowerCenter Servers to the grid. When you run a workflow against a PowerCenter Server in the grid, that server becomes the master server for the workflow. The master server runs all non-session tasks and assigns session tasks to run on other servers in the grid. The other servers become worker servers for that workflow run. You can specify server grid distribution options at the server level, workflow level, and session level. PowerCenter Servers specified at the session level override both server level and workflow level properties. For more information about these overrides, see “Configuring Server Grids” on page 450.
Note: You cannot run a single session on multiple servers.

Distributing Sessions
In a server grid, the master server starts the workflow and then distributes sessions to worker servers. The master server is the server that starts a workflow. A worker server is a server that runs sessions assigned to it by a master server. By default, each PowerCenter Server in a server grid is both a master server and a worker server. This means that a server in a grid can distribute sessions to and receive sessions from every server in the grid. The master server distributes sessions that are ready to run to available worker servers in a round-robin fashion based on server availability. The starting point for the session assignment is random. If a worker server is running the maximum number of concurrent sessions, the master server assigns another worker server to run the session. If all worker servers are running the maximum number of concurrent sessions, the master server places the session in its own ready queue. For information about configuring the maximum number of concurrent sessions, see “Installing and Configuring the PowerCenter Server on Windows” and “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide. Figure 15-1 shows how a master server distributes the sessions in Workflow1 among the servers in a grid. The server grid contains Server A, Server B, and Server C. Server A is the master server, and Server B and Server C are worker servers.
Figure 15-1. Distributing Sessions in a Server Grid (In Workflow1, Server A is the master server.)

Figure 15-2 shows how a master server distributes sessions in a workflow where a non-session task exists. Server C is the master server, and Server A and Server B are worker servers. Server C runs all non-session tasks it encounters and assigns sessions in a round-robin fashion.
Figure 15-2. Running a Non-session Task on the Master Server (Server C is the master server.)
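The distribution rules described above can be modeled in a few lines. The following is a simplified in-memory sketch, not the PowerCenter Server implementation: non-session tasks stay on the master, the round-robin starts at a random worker, workers at their concurrency limit are skipped, and a session goes to the master's ready queue when every worker is full. The Server and MasterServer classes and the max_sessions limit are hypothetical; the master appears in its own worker list because, by default, every server in a grid is both a master and a worker.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    name: str
    max_sessions: int                          # hypothetical concurrency limit
    running: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.running) < self.max_sessions


@dataclass
class MasterServer:
    name: str
    workers: List[Server]                      # grid servers that accept tasks
    rng: random.Random = field(default_factory=random.Random)
    ready_queue: List[str] = field(default_factory=list)
    _next: int = -1                            # round-robin position; -1 = not started

    def run_task(self, task: str, is_session: bool) -> Optional[str]:
        """Return the server that runs `task`, or None if the session is queued."""
        if not is_session:
            return self.name                   # non-session tasks run on the master
        if self._next < 0:
            self._next = self.rng.randrange(len(self.workers))  # random start
        for offset in range(len(self.workers)):
            index = (self._next + offset) % len(self.workers)
            worker = self.workers[index]
            if worker.has_capacity():          # skip workers at their limit
                worker.running.append(task)
                self._next = (index + 1) % len(self.workers)
                return worker.name
        self.ready_queue.append(task)          # every worker is at its limit
        return None


servers = [Server("Server A", 2), Server("Server B", 2), Server("Server C", 2)]
master = MasterServer("Server A", servers)
for task, is_session in [("s1", True), ("decision1", False), ("s2", True), ("s3", True)]:
    print(task, "->", master.run_task(task, is_session))
```

Because the starting worker is random, the first session may land on any of the three servers, but three consecutive sessions always reach three different servers while each has capacity.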

Server Grid Connectivity
PowerCenter Servers in a server grid create and maintain connections to each other. The server grid object contains information about the other servers in the grid. When you start a PowerCenter Server, it fetches the server grid object and creates a TCP/IP connection to each of the other servers in the grid. Each server in the grid monitors the other servers to check their connectivity status. The grid also notifies each server when you add, edit, or delete any server in the grid.

You can add servers to a server grid at any time. When a server starts up, it connects to the grid and can then run sessions from master servers and distribute sessions to worker servers in the grid. The Workflow Monitor communicates with the master server to monitor the progress of workflows, get session statistics, retrieve performance details, and stop or abort workflow or task instances.

If a PowerCenter Server loses its connection to the grid, it tries to reestablish the connection. You do not need to restart the server for it to reconnect to the grid. While a PowerCenter Server is not connected to the server grid, the other PowerCenter Servers in the grid do not send it tasks. When a PowerCenter Server cannot reestablish a connection to the grid, session and workflow completion depends on factors such as the shutdown mode and which server loses connectivity.


Table 15-1 lists scenarios where a server grid can lose connectivity:
Table 15-1. Losing Connectivity in a Server Grid

Connectivity loss: Worker server shuts down unexpectedly or you shut it down before it receives a session.
Server behavior: The worker server is not available to the master servers in the server grid. Master servers do not assign a session to the unavailable worker server and proceed with the round-robin distribution of sessions.

Connectivity loss: Worker server shuts down unexpectedly while running a session.
Server behavior: The master server marks the status of the session as terminated. The worker server stops running all sessions. The session settings you specify determine if the workflow fails. For more information about the Fail parent if this task fails option, Fail parent if this task does not run option, or Disable this task option, see “Configuring Tasks” on page 135.

Connectivity loss: You shut down a worker server while it is running a session.
Server behavior: The shutdown mode you specify determines how the worker server handles sessions when it shuts down. When you shut down the worker server in complete mode, it continues to run the sessions it started until they complete, but does not accept sessions from master servers. For more information about shutdown modes, see “pmcmd Reference” on page 594.

Connectivity loss: Worker server loses its network connection and cannot connect to the server grid.
Server behavior: The worker server continues to run the session and writes its status to the session log. However, the master server marks the status of the session as terminated. You must resume the workflow or resume from the failed task to continue running the workflow and update the session status. If you do not need the session status of the previous run, you can restart the workflow or restart the workflow from a task to start a new workflow run. For more information, see “Working with Tasks and Workflows” on page 416.

Connectivity loss: Master server shuts down unexpectedly.
Server behavior: Workflow fails. You must restart the workflow on another server or wait for the master server to become available.

Connectivity loss: You shut down the master server while running a workflow or session.
Server behavior: The shutdown mode you specify determines how the master server handles workflows and sessions when it shuts down. When you shut down the master server in complete mode, it continues to run the workflows and sessions it started until they complete, but does not accept tasks from other master servers. For more information about shutdown modes, see “pmcmd Reference” on page 594.

Connectivity loss: Master server loses its network connection and cannot connect to the server grid.
Server behavior: The master server continues to run workflows as a standalone PowerCenter Server. If a worker server is assigned to a session, the session fails because the master server cannot distribute the session to the worker server. The session settings you specify determine if the workflow fails. For more information about the Fail parent if this task fails option, Fail parent if this task does not run option, or Disable this task option, see “Configuring Tasks” on page 135.

Server Grid Guidelines and Requirements
Informatica recommends that each PowerCenter Server in a server grid use the same operating system. While you can specify different session log directories, workflow log directories, and temp directories for the PowerCenter Servers, each PowerCenter Server in a server grid must meet the following requirements:
♦ Register each PowerCenter Server to the same repository.
♦ Use the same database connectivity for each PowerCenter Server.
♦ Use the same server variables for each server in a grid, except for the $PMTempDir, $PMSessionLogDir, and $PMWorkflowLogDir variables.
♦ Use the same cache directory.
♦ Configure the following PowerCenter Server parameters with the same values on every server:
  − Fail session if maximum number of concurrent sessions is reached
  − PMServer 4.0 date handling compatibility
  − Aggregate treat null as zero
  − Aggregate treat rows as insert
  − Treat CHAR as CHAR on read
  − Data Movement Mode
  − Validate Data Code Pages
  − Output Session Log In UTF8
  − Export Session Log Lib Name
  − Treat Null in comparison operator as
  − Data Display Format
♦ PowerCenter Servers must be the same product version.
♦ The DB2 EEE loader must be on the same machine as the PowerCenter Server.
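Before adding servers to a grid, you might compare their settings programmatically. The following minimal sketch assumes you have copied each server's settings by hand into a dictionary; the setting names, values, and the grid_mismatches helper are hypothetical. It reports every setting that differs between servers, ignoring the three variables the requirements above allow to differ.

```python
from typing import Dict, Optional

# Variables the requirements above allow to differ between servers in a grid.
MAY_DIFFER = {"$PMTempDir", "$PMSessionLogDir", "$PMWorkflowLogDir"}


def grid_mismatches(
    servers: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return each setting whose value is not identical on every server."""
    keys = set()
    for settings in servers.values():
        keys.update(settings)
    problems = {}
    for key in sorted(keys - MAY_DIFFER):
        values = {name: settings.get(key) for name, settings in servers.items()}
        if len(set(values.values())) > 1:      # a missing setting also counts
            problems[key] = values
    return problems


# Hypothetical settings copied by hand from two server registrations.
grid = {
    "ServerA": {
        "Repository": "SALES",
        "Product version": "7.1.1",
        "$PMCacheDir": "//fileserver/pmroot/Cache",
        "$PMTempDir": "/opt/pm/Temp",
        "Data Movement Mode": "ASCII",
    },
    "ServerB": {
        "Repository": "SALES",
        "Product version": "7.1.1",
        "$PMCacheDir": "//fileserver/pmroot/Cache",
        "$PMTempDir": "/data/pm/Temp",
        "Data Movement Mode": "Unicode",
    },
}
print(grid_mismatches(grid))  # only Data Movement Mode is reported
```

The differing $PMTempDir values are not reported because that variable is one of the allowed exceptions.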

Configuring Server Grids
When you work with server grids, you can configure properties in the grid, workflow, and session. When you run a session using a server grid, the server grid evaluates session properties first, then workflow properties, and then grid properties.

Configuring Server Grid Properties
By default, each PowerCenter Server you add to the server grid can be both a master server and a worker server. Each server accepts tasks from the grid. You can configure a server to be only a master server by clearing Accept tasks from Server Grid. A PowerCenter Server that is only a master server does not run sessions from other servers in the grid, but it can distribute sessions to other servers in the grid.

Configuring Workflow Properties
When you configure a workflow, you can configure the following server properties:
♦ You can assign a server to run the workflow. When you assign a server to a workflow, the server becomes the master server for the workflow.
♦ You can configure the entire workflow to run only on the master server. By default, the master server distributes sessions to worker servers. You can configure the session to override this workflow configuration.

Configuring Session Properties
You can assign a server to run a session. When you assign a server to a session, you override workflow and grid server assignments. You might want to assign a server to sessions that use the following features:

♦ Caching. When you run sessions that access large cache files, such as incremental aggregation files, you can increase performance by using a drive local to the PowerCenter Server for the cache directory. Assign a server to a session and override the server variable to write to a drive local to the PowerCenter Server.
♦ External loader. Assign a server to run DB2 EEE external loader sessions. DB2 EEE loaders require that the loader process runs on the PowerCenter Server running the session.

Note: If you assign a session to a server that is not in the grid, and the master server cannot connect to the assigned server, the session fails.

Override Examples
Table 15-2 shows a configuration where the session properties override the workflow properties. Although you select the workflow option to run all tasks on Server A, the session runs on Server B because the session is assigned to Server B.
Table 15-2. Override Workflow Properties

Level      Configuration
Grid       - Server A accepts tasks from server grid.
           - Server B accepts tasks from server grid.
Workflow   - Run on Server A.
           - Tasks must run on server.
Session    - Run on Server B.

Table 15-3 shows a configuration where the session properties override the server grid properties. Although you configure Server B not to accept tasks from the grid, the session runs on Server B because you assigned the session to Server B.
Table 15-3. Override Server Grid Properties

Level      Configuration
Grid       - Server A accepts tasks from server grid.
           - Server B does not accept tasks from server grid.
Workflow   - Run on Server A.
           - Tasks can run on other servers in the grid.
Session    - Run on Server B.
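The evaluation order (session properties first, then workflow properties, then grid properties) can be expressed as a small decision function. The following is a simplified sketch under stated assumptions: the dataclasses and the eligible_servers helper are hypothetical, and when neither the session nor the workflow pins a server, the sketch simply returns every server that accepts tasks from the grid rather than modeling round-robin selection.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class GridConfig:
    accepts_tasks: FrozenSet[str]    # servers with Accept tasks from Server Grid selected


@dataclass(frozen=True)
class WorkflowConfig:
    server: str                      # assigned server, which becomes the master server
    run_only_on_master: bool         # "Tasks must run on server"


@dataclass(frozen=True)
class SessionConfig:
    server: Optional[str] = None     # server assigned in the session properties


def eligible_servers(grid: GridConfig, workflow: WorkflowConfig,
                     session: SessionConfig) -> FrozenSet[str]:
    """Servers that may run the session: session, then workflow, then grid."""
    if session.server is not None:
        return frozenset({session.server})
    if workflow.run_only_on_master:
        return frozenset({workflow.server})
    return grid.accepts_tasks


# Table 15-2: the session assignment overrides "Tasks must run on server".
print(eligible_servers(GridConfig(frozenset({"Server A", "Server B"})),
                       WorkflowConfig("Server A", run_only_on_master=True),
                       SessionConfig("Server B")))
```

Both override examples resolve to Server B because the session-level assignment is checked before anything else.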

Steps for Creating a Server Grid
Use the Server Grid Browser to create and edit server grids. When you create or edit a server grid, you can choose servers from the list of available servers. A server is available if it is registered in the same repository and is not part of another server grid. You can add up to 64 PowerCenter Servers to a grid. Use the following procedure to create a server grid.
To create a server grid:
1. Choose Server-Server Grid.
   The Server Grid Browser opens.
2. Click New.
   The Server Grid Editor opens with a list of available PowerCenter Servers.
3. Enter a server grid name and description.
4. Select the server you want to include in the server grid, and click Add.
   The selected server appears in the Selected Servers column.
5. Clear Accept tasks from Server Grid if you want the server to be only a master server. Leave it selected to configure the server as both a master and worker server.
6. Repeat steps 4 and 5 until you have chosen all the servers for the grid.
7. Click OK.
   The server grid name appears in the Server Grid Browser. Select Show servers in grid to view the servers in the grid.
8. Click Close.

Chapter 16

Log Files
This chapter covers the following topics:
♦ Overview, 456
♦ Workflow Logs, 457
♦ Session Logs, 463
♦ Reject Files, 476

Overview
The PowerCenter Server can create log files for each workflow it runs. These files contain information about the tasks the PowerCenter Server performs, plus statistics about the workflow and all sessions in the workflow. If the writer or target database rejects data during a session run, the PowerCenter Server creates a file that contains the rejected rows. The PowerCenter Server can create the following types of log files:

♦ Workflow log. Contains information about the workflow run such as workflow name, tasks executed, and workflow errors. By default, the PowerCenter Server writes this information to the server log or Windows Event Log, depending on how you configure the PowerCenter Server. To create a workflow log, enter a workflow log file name in the workflow properties. For more information, see “Workflow Logs” on page 457.
♦ Session log. Contains information about the tasks that the PowerCenter Server performs during a session, plus load summary and transformation statistics. By default, the PowerCenter Server creates one session log for each session it runs. If a workflow contains multiple sessions, the PowerCenter Server creates a separate session log for each session in the workflow. For more information, see “Session Logs” on page 463.
♦ Reject file. Contains rows rejected by the writer or target file during a session run. If the writer or target does not reject any data during a session, the PowerCenter Server does not generate a reject file for that session. For more information, see “Reject Files” on page 476.

By default, the PowerCenter Server saves each type of log file in its own directory. The PowerCenter Server represents these directories using server variables. Table 16-1 shows the default location for each type of log file:
Table 16-1. Log File Default Locations

Log File Type    Default Directory (Server Variable)    Value
Workflow logs    $PMWorkflowLogDir                      $PMRootDir/WorkflowLogs
Session logs     $PMSessionLogDir                       $PMRootDir/SessLogs
Reject files     $PMBadFileDir                          $PMRootDir/BadFiles

You can change the default directories at the server level by editing the server connection in the Workflow Manager. You can also override these values for individual workflows or sessions by updating the workflow or session properties.
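As an illustration of how the three levels combine, the following minimal sketch picks the directory for one log file type: a workflow or session override wins, then a value changed at the server level, and otherwise the Table 16-1 default. The log_directory helper and its arguments are hypothetical, and the returned defaults are left unexpanded (they still contain $PMRootDir).

```python
from typing import Dict, Optional

# Table 16-1: log file type -> (server variable, default value).
LOG_DEFAULTS = {
    "workflow": ("$PMWorkflowLogDir", "$PMRootDir/WorkflowLogs"),
    "session": ("$PMSessionLogDir", "$PMRootDir/SessLogs"),
    "reject": ("$PMBadFileDir", "$PMRootDir/BadFiles"),
}


def log_directory(log_type: str,
                  server_values: Optional[Dict[str, str]] = None,
                  override: Optional[str] = None) -> str:
    """Pick the directory for one log file type.

    override: directory set in the workflow or session properties.
    server_values: values changed in the server connection (server level).
    """
    variable, default = LOG_DEFAULTS[log_type]
    if override:                                      # most specific setting wins
        return override
    if server_values and variable in server_values:   # then the server level
        return server_values[variable]
    return default                                    # else the default value


print(log_directory("session"))                                  # $PMRootDir/SessLogs
print(log_directory("reject", {"$PMBadFileDir": "/data/bad"}))   # /data/bad
```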


Workflow Logs
You can configure a workflow to create a workflow log. When you do this, the PowerCenter Server writes information such as process initialization, workflow task run information, errors encountered, and workflow run summary to the workflow log. In general, a workflow log contains the following information about the workflow:
♦ Workflow name
♦ Workflow status
♦ Status of tasks and worklets in the workflow
♦ Start and end times for tasks and worklets
♦ Results of link conditions
♦ Some session messages and errors
♦ Errors encountered during the workflow

The PowerCenter Server categorizes workflow log error messages into severity levels. The PowerCenter Server either writes or does not write an error message to the log file based on the error severity level. You can set the Error Severity Level for Log Files in the PowerCenter Server setup program. For more information, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide. You can also configure the PowerCenter Server to suppress writing messages to the workflow log file completely.

As with PowerCenter Server logs and session logs, the PowerCenter Server enters a code number into the workflow log file message along with the message text. You can find information on error messages in the Troubleshooting Guide.

You configure a workflow to create a workflow log by entering a workflow log file name in the workflow properties. If you choose to create a workflow log, the PowerCenter Server saves the workflow log in the directory entered for the server variable $PMWorkflowLogDir in the PowerCenter Server registration. You can override the workflow log directory at the server level or at the workflow level. By default, the PowerCenter Server saves one workflow log for each workflow. If you want to save multiple logs for different workflow runs, you can configure the workflow to save workflow log files by timestamp, which permits an unlimited number of workflow logs, or by run, which saves a specified number of logs. To view previous workflow logs, save log files by timestamp.

If you choose not to create workflow logs, the PowerCenter Server writes the workflow log messages to the server log or Windows Event Log, depending on how you configure the PowerCenter Server.
For more information on configuring the PowerCenter Server, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.


Workflow Log Messages
The PowerCenter Server precedes each message in the log file with a code and number. It also precedes some messages with a timestamp. The code defines a group of messages for a specific process. The number defines a specific message. The message can provide general information or it can be an error message. You can configure the PowerCenter Server to append a time stamp to every message it writes to the workflow log. To do this, enable the Time Stamp Workflow Log option in the PowerCenter Server setup program. For more information, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.

Workflow Log Codes
You can use the workflow log to determine the cause of workflow problems. To resolve workflow problems, locate the relevant log file codes and text prefixes in the workflow log, then see the Troubleshooting Guide for details. You can find workflow-related server messages in the UNIX server log (default name: pmserver.log) or in the Windows Event Log (viewed with the Event Viewer). Table 16-2 describes the codes that can appear in workflow logs:
Table 16-2. Workflow Log Codes

Error Code   Description
CMN          Messages related to databases, memory allocation, Lookup and Joiner transformations, and internal errors.
LM           Messages related to the Load Manager.
REP          Messages related to repository functions.
TM           Messages related to Data Transformation Manager (DTM).
VAR          Messages related to mapping variables.

Workflow Log Sample
The following sample is a workflow log from a simple workflow that shows log file codes:
INFO : LM_36315 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of workflow [wf_PhoneList].
INFO : LM_36330 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of start instance [StartWorkflow].
INFO : LM_36333 [Tue Nov 18 11:16:38 2003] : (270|305) Execution of start instance [StartWorkflow] succeeded.
INFO : LM_36505 : (270|305) Link [StartWorkflow --> s_PhoneList]: empty expression string, evaluated to TRUE.
INFO : LM_36330 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of session instance [s_PhoneList].
INFO : LM_36522 : (270|305) Started DTM process [pid = 273] for session instance [s_PhoneList].
INFO : CMN_1760 : (273|255) Message from session: LM_36033 [Connected to repository [SALES] running on server:port [monster]:[5001] user [Administrator]].
INFO : CMN_1760 : (273|255) Message from session: TM_6228 [Writing session output to log file [d:\pcserver\SessLogs\s_PhoneList.log].].
INFO : LM_36333 [Tue Nov 18 11:16:43 2003] : (270|306) Execution of session instance [s_PhoneList] succeeded.
INFO : LM_36318 [Tue Nov 18 11:16:43 2003] : (270|306) Execution of workflow [wf_PhoneList] succeeded.
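When you search a long workflow log for the codes in Table 16-2, a small parser can help. The following sketch assumes every message follows the shape of the sample above: a severity, a code such as LM_36315, an optional bracketed timestamp, and a parenthesized number pair. The regular expression and the parse_line helper are hypothetical and not an Informatica tool. The meaning of the (270|305) pair is not documented here; the first number appears to be a process ID (compare "pid = 273" in the sample), so the sketch keeps the pair uninterpreted.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional, Tuple

# Code prefixes from Table 16-2.
CODE_GROUPS = {
    "CMN": "databases, memory allocation, Lookup and Joiner transformations, internal errors",
    "LM": "Load Manager",
    "REP": "repository functions",
    "TM": "Data Transformation Manager (DTM)",
    "VAR": "mapping variables",
}

LINE = re.compile(
    r"^(?P<severity>[A-Z]+) : "
    r"(?P<prefix>[A-Z]+)_(?P<number>\d+)"
    r"(?: \[(?P<timestamp>[^\]]+)\])?"          # the timestamp is optional
    r" : \((?P<id1>\d+)\|(?P<id2>\d+)\) "
    r"(?P<text>.*)$"
)


class LogMessage(NamedTuple):
    severity: str
    code: str
    group: str
    timestamp: Optional[datetime]
    ids: Tuple[int, int]      # the "(270|305)" pair, left uninterpreted
    text: str


def parse_line(line: str) -> Optional[LogMessage]:
    """Split one workflow log message into its parts, or return None."""
    m = LINE.match(line)
    if m is None:
        return None
    ts = m["timestamp"]
    return LogMessage(
        severity=m["severity"],
        code=f"{m['prefix']}_{m['number']}",
        group=CODE_GROUPS.get(m["prefix"], "unknown"),
        timestamp=datetime.strptime(ts, "%a %b %d %H:%M:%S %Y") if ts else None,
        ids=(int(m["id1"]), int(m["id2"])),
        text=m["text"],
    )


sample = [
    "INFO : LM_36315 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of workflow [wf_PhoneList].",
    "INFO : LM_36522 : (270|305) Started DTM process [pid = 273] for session instance [s_PhoneList].",
]
for line in sample:
    msg = parse_line(line)
    print(msg.code, msg.group, msg.timestamp)
```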

Configuring Workflow Logs
You can configure workflow log options in the workflow properties. You can configure the following information for a workflow log:

♦ Location. You can configure the directory where you want the workflow log created. By default, the PowerCenter Server creates the workflow log in the directory configured for the $PMWorkflowLogDir server variable. You can enter a different directory, but if the directory does not exist or is not local to the PowerCenter Server that runs the workflow, the workflow fails.
♦ Name. To create a workflow log, enter a name for the workflow log file. If you do not enter a file name, the PowerCenter Server does not create a workflow log. Instead, the PowerCenter Server writes workflow log messages to the Windows Event Log or UNIX server log.
♦ Archive. You can configure the number of workflow logs you want the PowerCenter Server to archive for each workflow. By default, the PowerCenter Server does not archive workflow logs.

Archiving Workflow Logs
By default, the PowerCenter Server does not save multiple logs for a single workflow. It creates one workflow log for each workflow and overwrites the existing log with the latest workflow log. If you wish to save multiple logs for a workflow, you can configure the PowerCenter Server to do this. The PowerCenter Server can save workflow logs in two ways:
♦ Save a selected number of logs
♦ Save all logs by timestamp

If you configure the workflow to save a specific number of workflow logs, it names the most recent log filename.log. It then cycles through a closed naming sequence for historical logs as follows: filename.log.0, filename.log.1, filename.log.2, …, filename.log.n-1, where n represents the number of workflow logs. Because the PowerCenter Server cycles through the numeric naming sequence, check the workflow log file timestamp to determine the chronological order of those files.
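The following in-memory sketch shows why the numeric suffixes do not reflect chronological order. It assumes the historical slots are reused in cyclic order 0, 1, ..., n-1, 0, ..., and that the previous most recent log moves into the next slot each time a run writes a new filename.log; the guide does not spell out the exact slot order, and the RunArchive class is hypothetical. With n set to 5, the archive holds at most six files, matching the Save Workflow Log for These Runs description.

```python
from typing import Dict


class RunArchive:
    """Hypothetical model of save-by-runs archiving, for illustration only.

    Assumes historical slots are reused cyclically: 0, 1, ..., n-1, 0, ...
    """

    def __init__(self, filename: str, runs: int):
        self.filename = filename          # for example "wf_PhoneList.log"
        self.runs = runs                  # number of historical logs (n)
        self.files: Dict[str, str] = {}   # file name -> contents (here, a run label)
        self._next_slot = 0

    def write_log(self, contents: str) -> None:
        current = self.filename
        if current in self.files and self.runs > 0:
            # Move the previous "most recent" log into the next historical
            # slot, overwriting whatever older log used that slot before.
            self.files[f"{current}.{self._next_slot}"] = self.files[current]
            self._next_slot = (self._next_slot + 1) % self.runs
        self.files[current] = contents    # the newest log is always filename.log


archive = RunArchive("wf.log", runs=5)
for i in range(1, 11):
    archive.write_log(f"run{i}")
for name in sorted(archive.files):
    print(name, archive.files[name])
```

After ten runs, wf.log.4 holds run 5 while wf.log.0 holds run 6, which is why you check file timestamps rather than suffixes.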

Instead of entering a specific number of workflow logs to save, you can use the server variable $PMWorkflowLogCount. When you use the $PMWorkflowLogCount server variable, the PowerCenter Server archives the number of workflow logs configured for the server variable. If you use $PMWorkflowLogCount for all workflows, you can increase the number of archived workflow logs for all workflows by changing the server variable.
Note: By default, $PMWorkflowLogCount is set to 0. To archive workflow logs using $PMWorkflowLogCount, configure it for a larger number of workflow logs. For details on configuring server variables, see “Registering the PowerCenter Server” on page 46.

You can also save all workflow logs by configuring a workflow to save logs by timestamp. When timestamping workflow logs, the PowerCenter Server appends the year, month, day, hour, and minute of the workflow completion to the log file name. The resulting log file name is filename.log.yyyymmddhhmi, where:
♦ yyyy = year
♦ mm = month, ranging from 1-12
♦ dd = day, ranging from 1-31
♦ hh = hour, ranging from 0-23
♦ mi = minute, ranging from 0-59
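The naming pattern can be reproduced with a standard date format string. This sketch assumes each field is zero-padded to a fixed width, as the yyyymmddhhmi pattern suggests; the timestamped_log_name helper is hypothetical. Under that assumption, sorting the archived file names alphabetically also sorts them chronologically.

```python
from datetime import datetime


def timestamped_log_name(log_file_name: str, completed: datetime) -> str:
    """Append the completion time as yyyymmddhhmi (assumed zero-padded)."""
    return f"{log_file_name}.{completed:%Y%m%d%H%M}"


print(timestamped_log_name("wf_PhoneList.log", datetime(2003, 11, 18, 11, 16, 43)))
# wf_PhoneList.log.200311181116
```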

To prevent filling the workflow log directory, periodically delete or back up log files when using the timestamp option.
Note: You can also truncate workflow and session log entries from the repository. For more information, see “Using the Repository Manager” in the Repository Guide.

Steps for Configuring Workflow Logs
You can configure workflow log information on the Properties tab of the workflow properties.
To configure workflow log information:
1. In the Workflow Manager, open the workflow properties.
2. Select the Properties tab.
3. Enter the following workflow log options:

   Parameter File Name
     Designates the name and directory for the parameter file. Use the parameter file to define workflow parameters. For details on parameter files, see “Parameter Files” on page 511.

   Workflow Log File Name
     Optionally enter a file name, or a file name and directory. If you leave this field blank, the PowerCenter Server does not create a workflow log. Instead, the PowerCenter Server writes workflow log messages to the server log or Windows Event Log, depending on how you configure the PowerCenter Server. If you fill in this field, the PowerCenter Server appends the information in this field to the information entered in the Workflow Log File Directory field. For example, if the Workflow Log File Directory field contains "C:\workflow_logs\" and you enter "logname.txt" in the Workflow Log File Name field, the PowerCenter Server writes logname.txt to the C:\workflow_logs\ directory.

   Workflow Log File Directory
     Designates a location for the workflow log file. By default, the PowerCenter Server writes the log file in the server variable directory, $PMWorkflowLogDir. If you enter a full directory and file name in the Workflow Log File Name field, clear this field.

   Save Workflow Log By
     If you select Save Workflow Log by Timestamp, the PowerCenter Server saves all workflow logs, appending a timestamp to each log. If you select Save Workflow Log by Runs, the PowerCenter Server saves a designated number of workflow logs. Configure the number of workflow logs in the Save Workflow Log for These Runs option. For details on these options, see “Archiving Workflow Logs” on page 459. You can also use the $PMWorkflowLogCount server variable to save the configured number of workflow logs for the PowerCenter Server.

   Save Workflow Log for These Runs
     The number of historical workflow logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent workflow log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent workflow log, plus historical logs 0 to 4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent workflow log.

4. Click OK to save the workflow.
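The way the Workflow Log File Name and Workflow Log File Directory fields combine can be sketched as follows. This hypothetical helper follows the field descriptions literally: a blank name means no workflow log, and the name is appended to the directory text by plain concatenation, so the directory value carries its own trailing separator. Whether the PowerCenter Server adds a missing separator is not stated here, so the sketch does not add one.

```python
from typing import Optional


def workflow_log_path(file_name: str, directory: str) -> Optional[str]:
    """Combine the Workflow Log File Name and Workflow Log File Directory fields.

    Returns None when the name is blank: no workflow log is created and the
    messages go to the server log or Windows Event Log instead.
    """
    if not file_name.strip():
        return None
    # Plain concatenation, as the field descriptions state; the directory text
    # is expected to end with its own path separator.
    return directory + file_name


print(workflow_log_path("logname.txt", "C:\\workflow_logs\\"))  # C:\workflow_logs\logname.txt
print(workflow_log_path("C:\\logs\\wf.log", ""))                # full path, directory cleared
```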

Viewing Workflow Logs
Workflow logs are text files that you can open with any text editor. The PowerCenter Server saves workflow logs in the directory you specify in the Workflow Log File Directory field in the workflow properties. You can also view workflow logs through the Workflow Monitor. When you do this, the Workflow Manager creates a temporary file that stores the workflow log. You can view the temporary file through the Workflow Monitor. The PowerCenter Server generates the workflow log based on the PowerCenter Server code page. You can specify the language in which you want to view the workflow log based on the locale of the machine hosting the PowerCenter Server.
To use the Workflow Monitor to view the most recent workflow log:
1. In the Navigator window, connect to the server on which the workflow runs.
2. Open the folder that contains the workflow.
3. Right-click the workflow and choose Get Workflow Log.

If you save workflow logs by timestamp, you can also use the Workflow Monitor to view past workflow logs. To do this, right-click the workflow in the Gantt chart view and choose Get Workflow Log. For more information about the Workflow Monitor, see “Using the Workflow Monitor” on page 404.


Chapter 16: Log Files

Session Logs
The session log file contains information about all tasks the PowerCenter Server performs, plus the load summary and transformation statistics. The amount of detail in the session log depends on the tracing level that you set. You can define the tracing level for each transformation or for the entire session. The session-level tracing overrides any transformation-level tracing levels. In general, the session log contains the following information about the session:
♦ Allocation of system shared memory
♦ Execution of pre-session commands
♦ Creation of SQL commands for reader and writer threads
♦ Start and end times for target loading
♦ Errors encountered during session
♦ Execution of post-session commands
♦ Load summary of reader, writer, and Data Transformation Manager (DTM) statistics

By default, the PowerCenter Server saves session logs in the directory for the PowerCenter Server variable $PMSessionLogDir, which you define in the Workflow Manager. The default name for the session log is s_mapping name.log. You can override the session log name and location in the session properties. The PowerCenter Server does not archive session logs by default. Instead, it creates one log for each session and overwrites the existing log with the latest session log. However, you can configure the session to archive session logs. For more information, see “Archiving Session Logs” on page 471. By default, the PowerCenter Server generates session log files based on the PowerCenter Server code page. However, if you enable the Output Session Log in UTF-8 option on the Configuration tab of the PowerCenter Server setup program, the PowerCenter Server writes to the session log using the UTF-8 character set.
Note: By default, the PowerCenter Server writes row errors to the session log. However, if you enable row error logging in the session properties, the PowerCenter Server does not write dropped rows to the session log. When you enable row error logging, you can configure the PowerCenter Server to write row errors to the session log in addition to the row error log by enabling verbose data tracing.

Session Log Messages
The PowerCenter Server precedes each message in the log file with a thread identification and a message code. The message code consists of a prefix, which identifies a group of messages for a specific process, and a number, which identifies a specific message. For example, in DBG_21438, DBG is the prefix and 21438 is the number. The message can provide general information or it can be an error message.


You can configure the PowerCenter Server to write session log messages to an external library as well as to the session log. To do this, you can set the Export Session Log Lib Name in the PowerCenter Server setup program. For more information, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.

Session Log Codes
You can use the session log to determine the cause of session problems. To resolve session problems, locate the relevant log file codes and text prefixes in the session log, then see the Troubleshooting Guide for details. You can find session-related server messages in the UNIX server log (default name: pmserver.log) or in the Windows Event Log (viewed with the Event Viewer). Table 16-3 describes the codes that can appear in session logs:
Table 16-3. Session Log Codes
Message Code | Description
BLKR | Messages related to reader process, including Application, relational, or flat file.
CNX | Messages related to the Repository Agent connections.
CMN | Messages related to databases, memory allocation, Lookup and Joiner transformations, and internal errors.
DBG | Messages related to PowerCenter Server loading and debugging.
DBGR | Messages related to the Debugger.
EP | Messages related to external procedures.
ES | Messages related to the Repository Server.
FR | Messages related to file sources.
FTP | Messages related to File Transfer Protocol operations.
HIER | Messages related to reading XML sources.
LM | Messages related to the Load Manager.
NTSERV | Messages related to Windows server operations.
OBJM | Messages related to the Repository Agent.
ODL | Messages related to database functions.
PETL | Messages related to pipeline partitioning.
PMF | Messages related to caching Aggregator, Rank, Joiner, or Lookup transformations.
RAPP | Messages related to the Repository Agent.
REP | Messages related to repository functions.
RR | Messages related to relational sources.
SF | Messages related to server framework, used by Load Manager and Repository Server.
SORT | Messages related to the Sorter transformation.
TE | Messages related to transformations.
TM | Messages related to Data Transformation Manager (DTM).
TT | Messages related to transformations.
VAR | Messages related to mapping variables.
WRT | Messages related to the Writer.
XMLR | Messages related to the XML Reader.
XMLW | Messages related to the XML Writer.
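When you scan a long session log, it can help to split each message code into its prefix and number and look the prefix up in Table 16-3. The following Python sketch is hypothetical tooling, not part of PowerCenter; it copies only a few rows of the table, and it assumes message codes always take the PREFIX_NUMBER form seen in the samples in this chapter, such as BLKR_16019.

```python
import re
from typing import Optional, Tuple

# A subset of Table 16-3, keyed by message code prefix.
CODE_GROUPS = {
    "BLKR": "Messages related to reader process, including Application, relational, or flat file.",
    "CMN": "Messages related to databases, memory allocation, Lookup and Joiner transformations, and internal errors.",
    "DBG": "Messages related to PowerCenter Server loading and debugging.",
    "PETL": "Messages related to pipeline partitioning.",
    "TM": "Messages related to Data Transformation Manager (DTM).",
    "WRT": "Messages related to the Writer.",
}

# A message code is an uppercase prefix, an underscore, and a number, standing
# alone as a word. Thread identifications such as READER_1_1_1 do not match
# because another underscore follows the first number.
CODE_PATTERN = re.compile(r"\b([A-Z]+)_(\d+)\b")


def classify(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (prefix, number, description) for the first message code in a log line."""
    match = CODE_PATTERN.search(line)
    if match is None:
        return None
    prefix, number = match.group(1), int(match.group(2))
    return prefix, number, CODE_GROUPS.get(prefix, "Prefix not in this subset of Table 16-3.")


print(classify("READER_1_1_1> BLKR_16019 Read [1] rows, read [0] error rows for source table [EMP_SRC]"))
```

Once the prefix is known, the Troubleshooting Guide entry for the full code gives the details.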

Thread Identification
The thread identification consists of the thread type and a series of numbers separated by underscores. The numbers following a thread name indicate the following information:
♦ Target load order group number
♦ Partition point number
♦ Partition number

Note: The PowerCenter Server writes an asterisk (*) as the partition point number for writer threads.

The PowerCenter Server prints the thread identification before the log file code and the message text in the session log. The following example illustrates a reader thread from target load order group one, partition point one, and partition one:
READER_1_1_1> DBG_21438 Reader: Source is [p152636], user [jennie]

For more information on partitioning, see “Pipeline Partitioning” on page 345. When you configure the PowerCenter Server to read Joiner transformation sources sequentially, the PowerCenter Server writes numbers with the following information after the thread name:
♦ Target load order group number
♦ Concurrent source set number
♦ Partition point number
♦ Partition number

A concurrent source set is the group of sources in a target load order group the PowerCenter Server reads concurrently. A target load order group might contain multiple concurrent source sets if it contains a Joiner transformation and you configure the PowerCenter Server to read Joiner transformation sources sequentially.


To configure the PowerCenter Server to read Joiner transformation sources sequentially, enable the PMServer 6.X Joiner source order compatibility option for the PowerCenter Server.
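The two numbering schemes above can be told apart by counting the numbers after the thread name. The Python sketch below is a hypothetical parser, not a PowerCenter utility; the ThreadId field names are illustrative, and it assumes the thread type itself contains no underscore, which holds for the READER, TRANSF, and WRITER threads shown in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreadId:
    thread_type: str
    target_load_order_group: int
    concurrent_source_set: Optional[int]  # present only when Joiner sources are read sequentially
    partition_point: Optional[int]        # None when the log shows "*" (writer threads)
    partition: int


def parse_thread_id(text: str) -> Optional[ThreadId]:
    """Parse identifiers such as READER_1_1_1, WRITER_1_*_1, or READER_1_2_1_1.

    Returns None for identifiers that carry no numbers, such as MASTER or MAPPING.
    """
    thread_type, *numbers = text.strip().rstrip(">").split("_")
    if len(numbers) not in (3, 4):
        return None
    values = [None if n == "*" else int(n) for n in numbers]
    if len(values) == 3:
        group, point, partition = values
        source_set = None
    else:
        group, source_set, point, partition = values
    return ThreadId(thread_type, group, source_set, point, partition)


print(parse_thread_id("READER_1_1_1>"))
```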

Session Log Sample
The following sample is an excerpt from a session log file that illustrates log file codes and thread identifications:
TM_6703 Session [s_m_SampleSessionLog] is run by PowerCenter Server [sarao].
MASTER> CMN_1688 Allocated [12000000] bytes from process memory for [DTM Buffer Pool].
MASTER> PETL_24000 Parallel Pipeline Engine initializing.
MASTER> PETL_24001 Parallel Pipeline Engine running.
MASTER> PETL_24003 Initializing session run.
MAPPING> TM_6014 Initializing session [s_m_SampleSessionLog] at [Tue Aug 03 11:29:57 2004]
. . .
*****START LOAD SESSION*****

Load Start Time: Tue Aug 03 11:30:00 2004

Target tables:

Emp_target

READER_1_1_1> BLKR_16019 Read [1] rows, read [0] error rows for source table [EMP_SRC] instance name [EMP_SRC]
READER_1_1_1> BLKR_16008 Reader run completed.
TRANSF_1_1_1> DBG_21216 Finished transformations for Source Qualifier [SQ_EMP_SRC]. Total errors [0]
WRITER_1_*_1> WRT_8167 Start loading table [Emp_target] at: Tue Aug 03 11:30:00 2004
. . .
MASTER> PETL_24002 Parallel Pipeline Engine finished.
MASTER> PETL_24012 Session run completed successfully.


Some messages are embedded within other messages. For example, message code CMN_1039 contains informational messages from Microsoft SQL Server as it changes to the source database used in the session.
Note: If you configure the PowerCenter Server to run in ASCII mode, the session log file reports the sort order as Binary, even if you select a different sort order in the session properties.

Load Summary
The session log includes a load summary that reports the number of rows inserted, updated, deleted, and rejected for each target as of the last commit point. The PowerCenter Server reports the load summary for each session by default. However, you can set the tracing level to Verbose Initialization or Verbose Data to report the load summary for each transformation. The following sample is an excerpt from a load summary:
*****START LOAD SESSION*****

Load Start Time: Tue Aug 03 11:30:00 2004

Target tables:

Emp_target
Commit on end-of-data Aug 03 11:30:07 2004

===================================================

WRT_8036 Target: Emp_target (Instance Name: [Emp_target])
WRT_8038 Inserted rows - Requested: 1 Rejected: 0 Affected: 1 Applied: 1

WRITER_1_*_1> WRT_8035 Load complete time: Tue Aug 03 11:30:07 2004

LOAD SUMMARY
============

WRT_8036 Target: Emp_target (Instance Name: [Emp_target])
WRT_8038 Inserted rows - Requested: 1 Rejected: 0 Affected: 1 Applied: 1
. . .
WRITER_1_*_1> WRT_8043 *****END LOAD SESSION*****

The PowerCenter Server reports statistics for each of the following operations performed on the target:
♦ Inserted. Shows the number of rows the PowerCenter Server marked for insert into the target. The number of affected rows cannot be larger than the number of requested rows for this operation.
♦ Updated. Shows the number of rows the PowerCenter Server marked for update in the target. The number of affected rows can be different from the number of requested rows.
♦ Deleted. Shows the number of rows the PowerCenter Server marked to remove from the target. The number of affected rows can be different from the number of requested rows.
♦ Rejected. Shows the number of rows the PowerCenter Server rejected during the writing process. These rows cannot be applied to the target. For the Rejected rows category, the number of affected and applied rows is always zero because these rows are not written to the target.

The load summary provides the following statistics:
♦ Requested rows. Shows the number of rows the writer actually received for the specified operation.
♦ Applied rows. Shows the number of rows the writer successfully applied to the target (that is, the target returned no errors).
♦ Affected rows. Shows the number of rows affected by the specified operation. Depending on the operation, the number of affected rows can be different from the number of requested rows. For example, a table has one column, SALES_ID, and five rows containing the values 1, 2, 3, 2, and 2. If you mark rows for update where SALES_ID is 2, the writer affects three rows, even though there was only one update request. If you mark rows for update where SALES_ID is 4, the writer affects 0 rows.
♦ Rejected rows. Shows the number of rows the writer could not apply to the target. For example, the target database rejects a row if the PowerCenter Server attempts to insert NULL into a not-null field. The PowerCenter Server writes all rejected rows to the session reject file, or to the row error log, depending on how you configure the session.
♦ Mutated from update. Shows the number of rows originally flagged for update that are instead inserted into the target when the session is configured for Update Else Insert.
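The SALES_ID example shows why requested and affected counts can differ. The sketch below reproduces it with Python's built-in sqlite3 module standing in for the target database; the table name SALES is an assumption, and cursor.rowcount plays the role of the affected-row count. SQLite counts every row matched by the WHERE clause as changed, even though this UPDATE writes the same value back.

```python
import sqlite3


def affected_by_update(sales_id: int) -> int:
    """Send one UPDATE request and return the number of rows it affected."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE SALES (SALES_ID INTEGER)")
        # Five rows containing the values 1, 2, 3, 2, and 2.
        conn.executemany("INSERT INTO SALES (SALES_ID) VALUES (?)", [(v,) for v in (1, 2, 3, 2, 2)])
        # One request; the WHERE clause decides how many rows it touches.
        cursor = conn.execute("UPDATE SALES SET SALES_ID = ? WHERE SALES_ID = ?", (sales_id, sales_id))
        return cursor.rowcount
    finally:
        conn.close()


print(affected_by_update(2))  # 3
print(affected_by_update(4))  # 0
```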

If the number of rows requested, applied, rejected, and affected are all zero for any of these four operations, the operation does not appear as a line in the load summary. If no data is passed to the target, the writer reports the following message:
No data loaded for this target.
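If you post-process session logs, the omission rule above matters: a missing operation line means all four counters were zero, not that parsing failed. The following Python sketch parses operation lines and applies that rule. The function names are hypothetical, and it assumes the Updated, Deleted, and Rejected lines use the same "<operation> rows - " layout as the Inserted line in the sample; counters are matched by name, so their order does not matter.

```python
import re
from typing import Dict, Optional, Tuple

OPERATION_LINE = re.compile(r"\b(Inserted|Updated|Deleted|Rejected) rows - (.*)")
COUNTER = re.compile(r"\b(Requested|Applied|Rejected|Affected): (\d+)")
COUNTER_NAMES = ("Requested", "Applied", "Rejected", "Affected")


def parse_operation_line(line: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (operation, counters) for a load summary operation line, else None."""
    match = OPERATION_LINE.search(line)
    if match is None:
        return None
    counters = {name: int(value) for name, value in COUNTER.findall(match.group(2))}
    return match.group(1), counters


def appears_in_summary(counters: Dict[str, int]) -> bool:
    """An operation is listed unless requested, applied, rejected, and affected are all zero."""
    return any(counters.get(name, 0) != 0 for name in COUNTER_NAMES)


print(parse_operation_line("WRT_8038 Inserted rows - Requested: 1 Rejected: 0 Affected: 1 Applied: 1"))
```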

Detailed Transformation Statistics
The DTM enables transformation statistics in the session log for two levels of tracing, Verbose Initialization and Verbose Data. Transformation statistics appear after the load summary in the log file. The PowerCenter Server reports the following details for each transformation in the mapping:
♦ The name of the transformation
♦ The number of input rows and the name of the input source
♦ The number of output rows and the name of the output transformation or target
♦ The number of rows dropped

The following sample is an excerpt from the transformation statistics in a session log file:
MAPPING> DETAILED TRANSFORMATION ROW STATISTICS for DSQ [SQ_EMPLOYEES], Partition[1]
MAPPING> --------------------------------
MAPPING> TT_11031 Transformation [SQ_EMPLOYEES]:
MAPPING> TT_11035 Input - 12 (__READER__)
MAPPING> TT_11037 [T_EMPLOYEES]: Output - 12, Dropped - 0
MAPPING> . . .
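The statistics lines follow a regular pattern, so a short script can pull the counts out for comparison across runs. This Python sketch is an assumption-based parser written against the sample above only; the RowStatistics class is hypothetical, and other transformations may print additional line types that it ignores.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

INPUT_LINE = re.compile(r"Input - (\d+) \((.+?)\)")
OUTPUT_LINE = re.compile(r"\[(.+?)\]: Output - (\d+), Dropped - (\d+)")


@dataclass
class RowStatistics:
    inputs: List[Tuple[str, int]] = field(default_factory=list)        # (input source, rows)
    outputs: List[Tuple[str, int, int]] = field(default_factory=list)  # (output, rows, dropped)


def parse_row_statistics(lines: List[str]) -> RowStatistics:
    """Collect the input and output counts reported for one transformation."""
    stats = RowStatistics()
    for line in lines:
        match = INPUT_LINE.search(line)
        if match:
            stats.inputs.append((match.group(2), int(match.group(1))))
            continue
        match = OUTPUT_LINE.search(line)
        if match:
            stats.outputs.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return stats


sample = [
    "MAPPING> TT_11031 Transformation [SQ_EMPLOYEES]:",
    "MAPPING> TT_11035 Input - 12 (__READER__)",
    "MAPPING> TT_11037 [T_EMPLOYEES]: Output - 12, Dropped - 0",
]
print(parse_row_statistics(sample))
```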

Configuring Session Logs
Configure session log options in the session properties. You can configure the following information for a session log:

♦ Location. You can configure the directory where you want the session log created. By default, the PowerCenter Server creates the session log in the directory configured for the $PMSessionLogDir server variable. You can enter a different directory, but if the directory does not exist or is not local to the PowerCenter Server that runs the session, the session fails.
♦ Name. You can name the session log or accept the default name. The default name for the session log is s_mapping name.log.
♦ Archive. You can configure the number of session logs you want the PowerCenter Server to archive for each session. By default, the PowerCenter Server does not archive session logs.
♦ Tracing levels. You can control the type of information the PowerCenter Server includes in the session log by setting a tracing level for the session. By default, the PowerCenter Server uses tracing levels configured in the mapping.

Configuring Session Log Locations and Filenames
You can configure the name and location of the session log on the Properties tab of the session properties.
To configure session log information:
1. In the Workflow Manager, open the session properties.
2. Select the General Options settings on the Properties tab.

3. Enter the following session log options:

Option Name | Description
Session Log File Name | By default, the PowerCenter Server uses the session name for the log file name: s_mapping name.log. For a debug session, it uses DebugSession_mapping name.log. Optionally enter a file name or a file name and directory. The PowerCenter Server appends the value of this field to the value of the Session Log File Directory field. For example, if you enter “C:\session_logs\” in the Session Log File Directory field and “logname.txt” in the Session Log File Name field, the PowerCenter Server writes logname.txt to the C:\session_logs\ directory. You can also use the $PMSessionLogFile session parameter to represent the name of the session log or the name and location of the session log. For details on session parameters, see “Session Parameters” on page 495.
Session Log File Directory | Location of the log file. Enter a valid directory local to the PowerCenter Server. By default, the PowerCenter Server creates session logs in the directory configured for the $PMSessionLogDir server variable.

4. Click OK to save the session.

Archiving Session Logs
You can archive session logs on a session-by-session basis. The PowerCenter Server can save session logs in the following ways:
♦ Save a selected number of logs
♦ Save all logs by timestamp

By default, the PowerCenter Server does not archive session logs. It creates one session log for each session and overwrites the existing log with the latest session log. If you configure the session to save a specific number of session logs, the PowerCenter Server names the most recent log s_mapping name.log. It then cycles through a closed naming sequence for historical logs as follows: s_mapping name.log.0, s_mapping name.log.1, s_mapping name.log.2, …, s_mapping name.log.n-1, where n is the number of historical session logs you configure. Because the PowerCenter Server cycles through the numeric naming sequence, check the session log file timestamps to determine the chronological order of those files.

Instead of entering a specific number of session logs to save, you can use the server variable $PMSessionLogCount. When you use the $PMSessionLogCount server variable, the PowerCenter Server archives the number of session logs configured for the server variable. If you use $PMSessionLogCount for all sessions, you can increase the number of archived session logs for all sessions by changing the server variable.
Note: By default, $PMSessionLogCount is set to 0. To archive session logs using $PMSessionLogCount, configure it for a larger number of session logs. For details on configuring server variables, see “Registering the PowerCenter Server” in the Installation and Configuration Guide.
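Because the historical names cycle, the set of files on disk is predictable even though their suffixes say nothing about age. The Python sketch below lists the names kept for a given number of historical logs (that number plus the most recent log) and orders the existing files by modification time, as the text recommends. The session name s_m_orders and both function names are hypothetical.

```python
import os
from typing import List


def archived_log_names(base_name: str, runs: int) -> List[str]:
    """Names kept when logs are saved by runs.

    The most recent log keeps the base name; historical logs cycle through
    base_name.0 to base_name.(runs - 1), so at most runs + 1 files exist.
    """
    return [base_name] + [f"{base_name}.{i}" for i in range(runs)]


def chronological(directory: str, names: List[str]) -> List[str]:
    """Return the log files that exist, oldest first, ordered by modification time.

    The numeric suffix cycles, so it says nothing about age; the file timestamp does.
    """
    paths = {name: os.path.join(directory, name) for name in names}
    existing = [name for name, path in paths.items() if os.path.isfile(path)]
    return sorted(existing, key=lambda name: os.path.getmtime(paths[name]))


print(len(archived_log_names("s_m_orders.log", 5)))  # 6: the current log plus .0 through .4
```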


You can also save all session logs by configuring a session to save logs by timestamp. When timestamping session logs, the PowerCenter Server appends the year, month, day, hour, and minute of the session completion to the log file name. The resulting log file name is s_mapping name.log.yyyymmddhhmi, where:
♦ yyyy = year
♦ mm = month, ranging from 1-12
♦ dd = day, ranging from 1-31
♦ hh = hour, ranging from 0-23
♦ mi = minute, ranging from 0-59

To prevent filling the session log directory, periodically delete or back up log files when using the timestamp option.
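The timestamp suffix can be built and read back with Python's datetime module, which also makes it easy to pick out old logs when you clean up the directory. This is a sketch under two assumptions: each field is zero-padded to the width shown in yyyymmddhhmi, and the session name s_m_orders is hypothetical. It only identifies candidates; deleting or backing up the files is left to your own procedure.

```python
import re
from datetime import datetime
from typing import Optional

# Twelve digits at the end of the name: yyyymmddhhmi.
SUFFIX = re.compile(r"\.(\d{12})$")


def timestamped_name(base_name: str, completed: datetime) -> str:
    """Build base_name.yyyymmddhhmi from the session completion time."""
    return f"{base_name}.{completed:%Y%m%d%H%M}"


def completion_time(file_name: str) -> Optional[datetime]:
    """Recover the completion time from a timestamped log name, if it has one."""
    match = SUFFIX.search(file_name)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M") if match else None


def is_older_than(file_name: str, cutoff: datetime) -> bool:
    """True for timestamped logs that completed before the cutoff (cleanup candidates)."""
    stamp = completion_time(file_name)
    return stamp is not None and stamp < cutoff


print(timestamped_name("s_m_orders.log", datetime(2004, 8, 3, 11, 30)))  # s_m_orders.log.200408031130
```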
Note: You can also truncate workflow and session log entries from the repository. For more information, see “Using the Repository Manager” in the Repository Guide.
To specify archiving information:
1. In the Workflow Manager, open the session properties.
2. Select the Log Options settings on the Config Object tab.


3. Enter the following session log options:

Option Name | Description
Save Session Log By | If you select Save Session Log by Timestamp, the PowerCenter Server saves all session logs, appending a timestamp to each log. If you select Save Session Log by Runs, the PowerCenter Server saves a designated number of session logs. Configure the number of session logs in the Save Session Log for These Runs option. You can also use the $PMSessionLogCount server variable to save the configured number of session logs for the PowerCenter Server.
Save Session Log for These Runs | The number of historical session logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent session log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent session log plus historical logs 0 to 4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent session log.

4. Click OK to save the session.

Setting Tracing Levels
The amount of detail in the session log depends on the tracing level that you set. You can define tracing levels for each transformation or for the entire session. By default, the PowerCenter Server uses tracing levels configured in the mapping. Setting a tracing level for the session overrides the tracing levels configured for each transformation in the mapping. If you select a normal tracing level or higher, the PowerCenter Server writes row errors into the session log, including the transformation in which the error occurred and complete row data. If you configure the session for row error logging, the PowerCenter Server writes row errors to the error log instead of the session log. If you want the PowerCenter Server to write dropped rows to the session log as well, configure the session with Verbose Data tracing level. Table 16-4 describes the session log tracing levels:
Table 16-4. Session Log Tracing Levels
Tracing Level | Description
None | The PowerCenter Server uses the tracing level set in the mapping.
Terse | The PowerCenter Server logs initialization information as well as error messages and notification of rejected data.
Normal | The PowerCenter Server logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.
Verbose Initialization | In addition to normal tracing, the PowerCenter Server logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose Data | In addition to verbose initialization tracing, the PowerCenter Server logs each row that passes into the mapping. It also notes where the PowerCenter Server truncates string data to fit the precision of a column and provides detailed transformation statistics. When you configure the tracing level to verbose data, the PowerCenter Server writes row data for all rows in a block when it processes a transformation.

You can also enter tracing levels for individual transformations in the mapping. When you enter a tracing level in the session properties, you override tracing levels configured for transformations in the mapping.
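The override rule and its effect on row errors can be summarized in code. This Python sketch is illustrative only: the TracingLevel enum is hypothetical, its numbers express only the ordering from Terse to Verbose Data, and the None option is modeled as Python's None. The row error rules come from the paragraph before Table 16-4.

```python
from enum import IntEnum
from typing import Optional


class TracingLevel(IntEnum):
    # Values only express order, from least to most detail.
    TERSE = 1
    NORMAL = 2
    VERBOSE_INITIALIZATION = 3
    VERBOSE_DATA = 4


def effective_level(session_override: Optional[TracingLevel], transformation_level: TracingLevel) -> TracingLevel:
    """A session-level setting overrides the mapping; None keeps the level set in the mapping."""
    return transformation_level if session_override is None else session_override


def row_errors_in_session_log(level: TracingLevel, row_error_logging: bool) -> bool:
    """Whether row errors, with complete row data, are written to the session log."""
    if row_error_logging:
        # Row errors go to the error log; Verbose Data also writes them to the session log.
        return level >= TracingLevel.VERBOSE_DATA
    # Normal tracing or higher writes row errors to the session log.
    return level >= TracingLevel.NORMAL


for mapping_level in (TracingLevel.TERSE, TracingLevel.VERBOSE_DATA):
    print(mapping_level.name, effective_level(TracingLevel.NORMAL, mapping_level).name)
```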
To set the tracing level:
1. Select the Error Handling settings on the Config Object tab.
2. Select a tracing level from the Override Tracing list. Table 16-4 on page 473 describes the session log tracing levels.
3. Click OK to save the session.

Viewing Session Logs
Session logs are text files that you can open with any text editor. The PowerCenter Server saves session logs in the directory you specify in the Session Log File Directory field in the session properties.


You can also view session logs through the Workflow Monitor. When you do this, the Workflow Monitor creates a temporary file that stores the session log. You can view the temporary file through the Workflow Monitor. If a session fails, you can still view the session log file. The PowerCenter Server generates the session log based on the PowerCenter Server code page. You can specify the language in which you want to view the session log based on the locale of the machine hosting the PowerCenter Server.
To use the Workflow Monitor to view the most recent session log:
1. In the Navigator window, connect to the server on which the workflow runs.
2. Open the folder that contains the workflow.
3. Open the workflow that contains the session whose log you wish to view.
4. Right-click the session and choose Get Session Log.

If you save session logs by timestamp, you can also use the Workflow Monitor to view past session logs. To do this, right-click the session in the Gantt chart view and choose Get Session Log. For more information about the Workflow Monitor, see “Using the Workflow Monitor” on page 404.


Reject Files
During a session, the PowerCenter Server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the PowerCenter Server writes the rejected row into the reject file. The reject file and session log contain information that helps you determine the cause of the reject. Each time you run a session, the PowerCenter Server appends rejected data to the reject file. Depending on the source of the problem, you can correct the mapping and target database to prevent rejects in subsequent sessions.
Note: If you enable row error logging in the session properties, the PowerCenter Server does not create a reject file. It writes the reject rows to the row error tables or file.

Locating Reject Files
The PowerCenter Server creates reject files for each target instance in the mapping. It creates reject files in the session reject file directory, as configured on the Properties settings of the Targets node on the Mapping tab (Transformation view). By default, the PowerCenter Server creates reject files in the $PMBadFileDir server variable directory. The PowerCenter Server names reject files after the target instance name. The default name for reject files is target instance partition number.bad. You can view or edit reject file names in the session properties. The Workflow Manager replaces slash characters in the target instance name with underscore characters. To find the location and name of the reject files, view the properties settings of the Targets node on the Mapping tab (Transformation view).


Figure 16-1 shows the properties settings on the Mapping tab:
Figure 16-1. Properties Settings on the Mapping Tab


When you run a session that contains multiple partitions, the PowerCenter Server creates a separate reject file for each partition.

Reading Reject Files
After you locate a reject file, you can read it using a text editor that supports the reject file code page. Reject files contain rows of data rejected by the writer or the target database. Though the PowerCenter Server writes the entire row in the reject file, the problem generally centers on one column within the row. To help you determine which column caused the row to be rejected, the PowerCenter Server adds row and column indicators to give you more information about each column:

Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells whether the row was marked for insert, update, delete, or reject. If the session is a user-defined commit session, the row indicator might tell whether the transaction was rolled back due to a non-fatal error or if the committed transaction was in a failed target connection group. For more information about user-defined commit sessions and rejected rows, see “User-Defined Commits” on page 283.

Column indicator. Column indicators appear after every column of data. The alphabetical character indicators tell whether the data was valid, overflow, null, or truncated.

The following sample reject file shows the row and column indicators:
0,D,1921,D,Nelson,D,William,D,415-541-5145,D
0,D,1922,D,Page,D,Ian,D,415-541-5145,D
0,D,1923,D,Osborne,D,Lyle,D,415-541-5145,D
0,D,1928,D,De Souza,D,Leo,D,415-541-5145,D
0,D,2001,D,S. MacDonald,D,Ira,D,415-541-5145,D

Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do with the row of data. Table 16-5 describes the row indicators in a reject file:
Table 16-5. Row Indicators in Reject File
Row Indicator | Meaning | Rejected By
0 | Insert | Writer or target
1 | Update | Writer or target
2 | Delete | Writer or target
3 | Reject | Writer
4 | Rolled-back insert | Writer
5 | Rolled-back update | Writer
6 | Rolled-back delete | Writer
7 | Committed insert | Writer
8 | Committed update | Writer
9 | Committed delete | Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject. If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To narrow down the reason why rows marked 0, 1, or 2 were rejected, review the column indicators and consult the session log.

Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of the data preceding it.


Table 16-6 describes the column indicators in a reject file:
Table 16-6. Column Indicators in Reject File
Column Indicator | Type of Data | Writer Treats As
D | Valid data. | Good data. Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key.
O | Overflow. Numeric data exceeded the specified precision or scale for the column. | Bad data, if you configured the mapping target to reject overflow or truncated data.
N | Null. The column contains a null value. | Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.
T | Truncated. String data exceeded a specified precision for the column, so the PowerCenter Server truncated it. | Bad data, if you configured the mapping target to reject overflow or truncated data.

Null columns appear in the reject file with commas marking their column. An example of a null column surrounded by good data appears as follows:
5,D,,N,5,D

Because either the writer or target database can reject a row, and because they can reject the row for a number of reasons, you need to evaluate the row carefully and consult the session log to determine the cause for reject.
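Because indicators alternate with values, a reject line can be split into value and indicator pairs and checked column by column. The Python sketch below does that and maps the codes from Tables 16-5 and 16-6. It is hypothetical tooling and assumes that no data value contains a comma; a value with an embedded comma would shift the pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Table 16-5: row indicator meanings.
ROW_INDICATORS = {
    0: "Insert", 1: "Update", 2: "Delete", 3: "Reject",
    4: "Rolled-back insert", 5: "Rolled-back update", 6: "Rolled-back delete",
    7: "Committed insert", 8: "Committed update", 9: "Committed delete",
}
# Table 16-6: column indicator types.
COLUMN_INDICATORS = {"D": "Valid data", "O": "Overflow", "N": "Null", "T": "Truncated"}


@dataclass
class RejectedRow:
    row_indicator: int
    columns: List[Tuple[str, str]]  # (value, column indicator) for each data column

    @property
    def meaning(self) -> str:
        return ROW_INDICATORS.get(self.row_indicator, "Unknown")

    def suspect_columns(self) -> List[Tuple[int, str]]:
        """0-based positions and types of data columns not marked D (valid)."""
        return [(i, COLUMN_INDICATORS.get(ind, "Unknown"))
                for i, (_, ind) in enumerate(self.columns) if ind != "D"]


def parse_reject_line(line: str) -> RejectedRow:
    """Split a reject file line into the row indicator and value/indicator pairs."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating values and indicators")
    pairs = list(zip(fields[0::2], fields[1::2]))
    # The first pair is the row indicator followed by its own column indicator.
    (row_value, _), data = pairs[0], pairs[1:]
    return RejectedRow(int(row_value), data)


row = parse_reject_line("0,D,1921,D,Nelson,D,William,D,415-541-5145,D")
print(row.meaning, row.suspect_columns())  # Insert []
```

A row whose suspect_columns() list is empty was most likely rejected by the target database, so the session log is the next place to look.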


Chapter 17

Row Error Logging
This chapter includes the following topics:
♦ Overview, 482
♦ Understanding the Error Log Tables, 483
♦ Understanding the Error Log File, 489
♦ Configuring Error Log Options, 493


Overview
When you configure a session, you can choose to log row errors in a central location. When a row error occurs, the PowerCenter Server logs error information that allows you to determine the cause and source of the error. The PowerCenter Server logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.

You can log row errors into relational tables or flat files. When you enable error logging, the PowerCenter Server creates the error tables or an error log file the first time it runs the session. Error logs are cumulative: if the error logs exist, the PowerCenter Server appends error data to them.

You can choose to log source row data. Source row data includes row data, source row ID, and source row type from the source qualifier where an error occurs. The PowerCenter Server cannot identify the row in the source qualifier that contains an error if the error occurs after a non-pass-through partition point with more than one partition, or after one of the following active sources:
♦ Aggregator
♦ Custom, configured as an active transformation
♦ Joiner
♦ Normalizer (pipeline)
♦ Rank
♦ Sorter
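The rule above can be written as a check over the transformations between the source qualifier and the point where the error occurs. This Python sketch is a hypothetical aid for reasoning about a mapping, not a PowerCenter API; the transformation type strings mirror the list above, and the partitioning condition is passed in as a flag because the sketch does not model pipelines.

```python
from typing import Iterable, Tuple

# Active sources listed above after which the source row cannot be identified.
# Custom is handled separately because it counts only when configured as active.
IDENTIFICATION_BLOCKERS = {"Aggregator", "Joiner", "Normalizer (pipeline)", "Rank", "Sorter"}


def can_identify_source_row(
    upstream: Iterable[Tuple[str, bool]],
    after_multi_partition_non_pass_through_point: bool,
) -> bool:
    """upstream: (transformation type, is_active) for each transformation between
    the source qualifier and the transformation where the error occurs."""
    if after_multi_partition_non_pass_through_point:
        return False
    for transformation_type, is_active in upstream:
        if transformation_type in IDENTIFICATION_BLOCKERS:
            return False
        if transformation_type == "Custom" and is_active:
            return False
    return True


print(can_identify_source_row([("Expression", False), ("Sorter", True)], False))  # False
```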

By default, the PowerCenter Server logs transformation errors in the session log and reject rows in the reject file. When you enable error logging, the PowerCenter Server does not generate a reject file or write dropped rows to the session log. Without a reject file, the PowerCenter Server does not log Transaction Control transformation rollback or commit errors. If you want to write rows to the session log in addition to the row error log, you can enable verbose data tracing.
Note: When you log row errors, session performance may decrease because the PowerCenter Server processes one row at a time instead of a block of rows at once.

Error Log Code Pages
The code page for the error log must match the code page for the session log. By default, the error log code page matches the server code page, and you can set the server configuration parameter to use UTF-8. The code page for the relational database where the error tables exist needs to be one-way compatible with the server code page. For more information about code pages, see “Globalization Overview” in the Installation and Configuration Guide.


Chapter 17: Row Error Logging

Understanding the Error Log Tables
When you choose relational database error logging, the PowerCenter Server creates four error tables the first time you run a session. You specify the database connection to the database where the PowerCenter Server creates these tables. If the error tables exist for a session, the PowerCenter Server appends row errors to these tables.

Relational database error logging allows you to collect row errors from multiple sessions in one set of error tables. To do this, specify the same error log table name prefix for all sessions. You can issue SELECT statements on the generated error tables to retrieve error data for a particular session.

You can specify a prefix for the error tables. The PowerCenter Server appends the error table name, which can have up to eleven characters, to the prefix. Do not specify a prefix that exceeds 19 characters when naming Oracle, Sybase, or Teradata error log tables, as these databases have a maximum length of 30 characters for table names.

The PowerCenter Server creates the error tables without specifying primary and foreign keys. However, you can specify key columns. The PowerCenter Server generates the following tables to help you track row errors:
♦ PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.
♦ PMERR_MSG. Stores metadata about an error and the error message.
♦ PMERR_SESS. Stores metadata about the session.
♦ PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
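The prefix limit follows from simple arithmetic: 30 characters minus the 11 characters the PowerCenter Server appends leaves 19. The following Python sketch applies that rule before you run a session. It assumes each table name is formed by placing the prefix directly in front of one of the generated names listed above; the function name and the SALES_ prefix are hypothetical.

```python
"""Sketch: derive error log table names from a prefix and check their length.

Assumption: each table name is the prefix followed directly by one of the
four generated names. The longest generated name, PMERR_TRANS, has 11
characters, which is why 19 characters is the longest prefix for a
30-character limit.
"""
from typing import List

ERROR_TABLES = ["PMERR_DATA", "PMERR_MSG", "PMERR_SESS", "PMERR_TRANS"]


def error_table_names(prefix: str, max_len: int = 30) -> List[str]:
    """Return the four table names, or raise ValueError if any is too long."""
    names = [prefix + table for table in ERROR_TABLES]
    too_long = [name for name in names if len(name) > max_len]
    if too_long:
        raise ValueError(
            f"prefix {prefix!r} has {len(prefix)} characters; "
            f"{too_long[0]} would exceed {max_len} characters"
        )
    return names


print(error_table_names("SALES_"))
# ['SALES_PMERR_DATA', 'SALES_PMERR_MSG', 'SALES_PMERR_SESS', 'SALES_PMERR_TRANS']
print(30 - max(len(table) for table in ERROR_TABLES))  # 19
```

Checking the longest generated name, rather than each table separately, is enough because all four tables share the same prefix.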

PMERR_DATA
When the PowerCenter Server encounters a row error, it inserts an entry into the PMERR_DATA table. This table stores data and metadata about a transformation row error and its corresponding source row. Table 17-1 describes the structure of the PMERR_DATA table:
Table 17-1. PMERR_DATA Table Schema

♦ REPOSITORY_GID (Varchar). A unique identifier for the repository.
♦ WORKFLOW_RUN_ID (Integer). A unique identifier for the workflow.
♦ WORKLET_RUN_ID (Integer). A unique identifier for the worklet. If a session is not part of a worklet, this value is “0”.
♦ SESS_INST_ID (Integer). A unique identifier for the session.
♦ TRANS_MAPPLET_INST (Varchar). Name of the mapplet where an error occurred.
♦ TRANS_NAME (Varchar). Name of the transformation where an error occurred.
♦ TRANS_GROUP (Varchar). Name of the input group or output group where an error occurred. Defaults to either “input” or “output” if the transformation does not have a group.
♦ TRANS_PART_INDEX (Integer). Specifies the partition number of the transformation where an error occurred.
♦ TRANS_ROW_ID (Integer). Specifies the row ID generated by the last active source.
♦ TRANS_ROW_DATA (Long Varchar). Delimited string containing all column data, including the column indicator. Column indicators are: D - valid, O - overflow, N - null, T - truncated, B - binary, U - data unavailable. The fixed delimiter between column data and column indicator is colon ( : ). The delimiter between the columns is pipe ( | ). You can override the column delimiter in the error handling settings. The PowerCenter Server converts all column data to text string in the error table. For binary data, the PowerCenter Server uses only the column indicator. This value can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row. The line number for each row error entry is stored in the LINE_NO column.
♦ SOURCE_ROW_ID (Integer). Value that the source qualifier assigns to each row it reads. If the PowerCenter Server cannot identify the row, the value is -1.
♦ SOURCE_ROW_TYPE (Integer). The row indicator that tells whether the row was marked for insert, update, delete, or reject: 0 - Insert, 1 - Update, 2 - Delete, 3 - Reject.
♦ SOURCE_ROW_DATA (Long Varchar). Delimited string containing all column data, including the column indicator. Column indicators are: D - valid, O - overflow, N - null, T - truncated, B - binary, U - data unavailable. The fixed delimiter between column data and column indicator is colon ( : ). The delimiter between the columns is pipe ( | ). You can override the column delimiter in the error handling settings. The PowerCenter Server converts all column data to text string in the error table or error file. For binary data, the PowerCenter Server uses only the column indicator. This value can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row. The line number for each row error entry is stored in the LINE_NO column.
♦ LINE_NO (Integer). Specifies the line number for each row error entry in SOURCE_ROW_DATA and TRANS_ROW_DATA that spans multiple rows.

Informatica recommends using the fields in bold to join tables.
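Because TRANS_ROW_DATA and SOURCE_ROW_DATA can span several rows, a query that retrieves the errors for one session should order the pieces by LINE_NO and join them. The following Python sketch shows the idea with an in-memory SQLite table that holds a reduced, hypothetical subset of the PMERR_DATA columns and sample values. It filters on the four identifier columns that every error table shares, and it assumes the pieces concatenate directly, with no separator. Against a real error database you would use your own database driver and the prefixed table name.

```python
"""Sketch: retrieve one session's transformation row data from PMERR_DATA.

The table is a reduced, hypothetical subset of the PMERR_DATA columns,
loaded into SQLite only so the example is self-contained.
Assumption: pieces of a value that spans rows concatenate directly when
read in LINE_NO order.
"""
import sqlite3
from typing import Dict, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE PMERR_DATA (
        REPOSITORY_GID TEXT, WORKFLOW_RUN_ID INTEGER, WORKLET_RUN_ID INTEGER,
        SESS_INST_ID INTEGER, TRANS_NAME TEXT, TRANS_ROW_ID INTEGER,
        TRANS_ROW_DATA TEXT, LINE_NO INTEGER)"""
)
rows = [
    # One error whose row data was split across two rows (LINE_NO 1 and 2).
    ("repo-1", 1310, 0, 19, "agg_REL_basic", 1, "D:1221|N:|", 1),
    ("repo-1", 1310, 0, 19, "agg_REL_basic", 1, "D:Kauai Dive Shoppe", 2),
    # An error from a different session, which the query must filter out.
    ("repo-1", 1311, 0, 20, "exp_other", 7, "D:42", 1),
]
conn.executemany("INSERT INTO PMERR_DATA VALUES (?,?,?,?,?,?,?,?)", rows)


def session_row_data(repo: str, run_id: int, worklet_id: int,
                     sess_id: int) -> Dict[Tuple[str, int], str]:
    """Map (transformation, row ID) to the reassembled TRANS_ROW_DATA."""
    cursor = conn.execute(
        """SELECT TRANS_NAME, TRANS_ROW_ID, TRANS_ROW_DATA
           FROM PMERR_DATA
           WHERE REPOSITORY_GID = ? AND WORKFLOW_RUN_ID = ?
             AND WORKLET_RUN_ID = ? AND SESS_INST_ID = ?
           ORDER BY TRANS_NAME, TRANS_ROW_ID, LINE_NO""",
        (repo, run_id, worklet_id, sess_id),
    )
    result: Dict[Tuple[str, int], str] = {}
    for trans_name, row_id, piece in cursor:
        key = (trans_name, row_id)
        result[key] = result.get(key, "") + piece  # append in LINE_NO order
    return result


print(session_row_data("repo-1", 1310, 0, 19))
# {('agg_REL_basic', 1): 'D:1221|N:|D:Kauai Dive Shoppe'}
```

The grouping key here is deliberately small. A fuller query would probably also group by TRANS_MAPPLET_INST, TRANS_GROUP, and TRANS_PART_INDEX so that errors from different groups or partitions are not merged.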

PMERR_MSG
When the PowerCenter Server encounters a row error, it inserts an entry into the PMERR_MSG table. This table stores metadata about the error and the error message. Table 17-2 describes the structure of the PMERR_MSG table:
Table 17-2. PMERR_MSG Table Schema

♦ REPOSITORY_GID (Varchar). A unique identifier for the repository.
♦ WORKFLOW_RUN_ID (Integer). A unique identifier for the workflow.
♦ WORKLET_RUN_ID (Integer). A unique identifier for the worklet. If a session is not part of a worklet, this value is “0”.
♦ SESS_INST_ID (Integer). A unique identifier for the session.
♦ MAPPLET_INST_NAME (Varchar). Mapplet to which the transformation belongs. If the transformation is not part of a mapplet, this value is N/A.
♦ TRANS_NAME (Varchar). Name of the transformation where an error occurred.
♦ TRANS_GROUP (Varchar). Name of the input group or output group where an error occurred. Defaults to either “input” or “output” if the transformation does not have a group.
♦ TRANS_PART_INDEX (Integer). Specifies the partition number of the transformation where an error occurred.
♦ TRANS_ROW_ID (Integer). Specifies the row ID generated by the last active source.
♦ ERROR_SEQ_NUM (Integer). Counter for the number of errors per row in each transformation group. If a session has multiple partitions, the PowerCenter Server maintains this counter for each partition. For example, if a transformation generates three errors in partition 1 and two errors in partition 2, ERROR_SEQ_NUM generates the values 1, 2, and 3 for partition 1, and values 1 and 2 for partition 2.
♦ ERROR_TIMESTAMP (Date/Time). Timestamp of the PowerCenter Server when the error occurred.
♦ ERROR_UTC_TIME (Integer). The Coordinated Universal Time, also known as Greenwich Mean Time, of when an error occurred.
♦ ERROR_CODE (Integer). The error code that the error generates.
♦ ERROR_MSG (Long Varchar). Error message, which can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row. The line number for each row error entry is stored in the LINE_NO column.
♦ ERROR_TYPE (Integer). The type of error that occurred. The PowerCenter Server uses the following values: 1 - Reader error, 2 - Writer error, 3 - Transformation error.
♦ LINE_NO (Integer). Specifies the line number for each row error entry in ERROR_MSG that spans multiple rows.

Informatica recommends using the fields in bold to join tables.
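The per-partition numbering of ERROR_SEQ_NUM can be modeled with one counter per key. The following Python sketch reproduces the example in the table, three errors in partition 1 and two in partition 2, and maps ERROR_TYPE codes to their names. It assumes the counter is keyed by partition, transformation group, and row, as the description implies; the function name and the sample keys are hypothetical.

```python
"""Sketch: number row errors the way ERROR_SEQ_NUM is described.

Assumption: one counter exists per (partition, transformation group, row).
"""
from collections import defaultdict
from typing import DefaultDict, List, Tuple

ERROR_TYPES = {1: "Reader error", 2: "Writer error", 3: "Transformation error"}

ErrorKey = Tuple[int, str, int]  # (partition, transformation group, row ID)


def assign_error_seq(errors: List[ErrorKey]) -> List[int]:
    """Return the ERROR_SEQ_NUM value for each error, in arrival order."""
    counters: DefaultDict[ErrorKey, int] = defaultdict(int)
    sequence = []
    for key in errors:
        counters[key] += 1
        sequence.append(counters[key])
    return sequence


# Three errors for a row in partition 1, two for a row in partition 2.
errors = [(1, "input", 5)] * 3 + [(2, "input", 9)] * 2
print(assign_error_seq(errors))  # [1, 2, 3, 1, 2]
print(ERROR_TYPES[3])            # Transformation error
```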

PMERR_SESS
When you choose relational database error logging, the PowerCenter Server inserts entries into the PMERR_SESS table. This table stores metadata about the session where an error occurred.


Table 17-3 describes the structure of the PMERR_SESS table:
Table 17-3. PMERR_SESS Table Schema

♦ REPOSITORY_GID (Varchar). A unique identifier for the repository.
♦ WORKFLOW_RUN_ID (Integer). A unique identifier for the workflow.
♦ WORKLET_RUN_ID (Integer). A unique identifier for the worklet. If a session is not part of a worklet, this value is “0”.
♦ SESS_INST_ID (Integer). A unique identifier for the session.
♦ SESS_START_TIME (Date/Time). Timestamp of the PowerCenter Server when a session starts.
♦ SESS_START_UTC_TIME (Integer). The Coordinated Universal Time, also known as Greenwich Mean Time, of when the session starts.
♦ REPOSITORY_NAME (Varchar). The repository name where sessions are stored.
♦ FOLDER_NAME (Varchar). Specifies the folder where the mapping and session are located.
♦ WORKFLOW_NAME (Varchar). Specifies the workflow that runs the session being logged.
♦ TASK_INST_PATH (Varchar). Fully qualified session name that can span multiple rows. The PowerCenter Server creates a new line for the session name. The PowerCenter Server also creates a new line for each worklet in the qualified session name. For example, you have a session named WL1.WL2.S1. Each component of the name appears on a new line: WL1, then WL2, then S1. The PowerCenter Server writes the line number in the LINE_NO column.
♦ MAPPING_NAME (Varchar). Specifies the mapping that the session uses.
♦ LINE_NO (Integer). Specifies the line number for each row error entry in TASK_INST_PATH that spans multiple rows.

Informatica recommends using the fields in bold to join tables.
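To display the fully qualified session name, read the TASK_INST_PATH rows for a session in LINE_NO order and join the components. The following Python sketch assumes the rows have already been fetched as (LINE_NO, TASK_INST_PATH) pairs and uses the dot separator from the WL1.WL2.S1 example; the function name is hypothetical.

```python
"""Sketch: rebuild a fully qualified session name from PMERR_SESS rows.

Each row holds one component of TASK_INST_PATH and its LINE_NO. The dot
separator follows the WL1.WL2.S1 example in Table 17-3.
"""
from typing import Iterable, Tuple


def qualified_session_name(rows: Iterable[Tuple[int, str]]) -> str:
    """rows: (LINE_NO, TASK_INST_PATH) pairs, in any order."""
    # Sorting the tuples orders them by LINE_NO first.
    return ".".join(part for _, part in sorted(rows))


# Rows for session S1 inside worklets WL1 and WL2, fetched out of order.
rows = [(2, "WL2"), (1, "WL1"), (3, "S1")]
print(qualified_session_name(rows))  # WL1.WL2.S1
```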

PMERR_TRANS
When the PowerCenter Server encounters a transformation error, it inserts an entry into the PMERR_TRANS table. This table stores metadata, such as the name and datatype of the source and transformation ports. Table 17-4 describes the structure of the PMERR_TRANS table:
Table 17-4. PMERR_TRANS Table Schema

♦ REPOSITORY_GID (Varchar). A unique identifier for the repository.
♦ WORKFLOW_RUN_ID (Integer). A unique identifier for the workflow.
♦ WORKLET_RUN_ID (Integer). A unique identifier for the worklet. If a session is not part of a worklet, this value is “0”.
♦ SESS_INST_ID (Integer). A unique identifier for the session.
♦ TRANS_MAPPLET_INST (Varchar). Specifies the instance of a mapplet.
♦ TRANS_NAME (Varchar). Name of the transformation where an error occurred.
♦ TRANS_GROUP (Varchar). Name of the input group or output group where an error occurred. Defaults to either “input” or “output” if the transformation does not have a group.
♦ TRANS_ATTR (Varchar). Lists the port names and datatypes of the input or output group where the error occurred. Port name and datatype pairs are separated by commas, for example: portname1:datatype, portname2:datatype. This value can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row for the transformation attributes and writes the line number in the LINE_NO column.
♦ SOURCE_MAPPLET_INST (Varchar). Name of the mapplet in which the source resides.
♦ SOURCE_NAME (Varchar). Name of the source qualifier. N/A appears when a row error occurs downstream of an active source that is not a source qualifier or a non pass-through partition point with more than one partition. For a list of active sources that can affect row error logging, see “Overview” on page 482.
♦ SOURCE_ATTR (Varchar). Lists the connected field(s) in the source qualifier where an error occurred. When an error occurs in multiple fields, each field name is entered on a new line. Writes the line number in the LINE_NO column.
♦ LINE_NO (Integer). Specifies the line number for each row error entry in TRANS_ATTR and SOURCE_ATTR that spans multiple rows.

Informatica recommends using the fields in bold to join tables.
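Once the TRANS_ATTR pieces for an error are joined in LINE_NO order, the value can be split into port name and datatype pairs. The following Python sketch follows the portname1:datatype, portname2:datatype format described above. It assumes that neither port names nor datatype strings contain commas or colons; the function name and the sample value are hypothetical.

```python
"""Sketch: split a PMERR_TRANS TRANS_ATTR value into (port, datatype) pairs.

Assumptions: pieces from rows with the same key were already joined in
LINE_NO order, and no port name or datatype contains a comma or colon.
"""
from typing import List, Tuple


def parse_trans_attr(value: str) -> List[Tuple[str, str]]:
    """Return (port name, datatype) pairs in the order they appear."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty value or a trailing comma
        port, _, datatype = item.partition(":")
        pairs.append((port.strip(), datatype.strip()))
    return pairs


print(parse_trans_attr("CUST_ID:integer, CITY_IN:string"))
# [('CUST_ID', 'integer'), ('CITY_IN', 'string')]
```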


Understanding the Error Log File
You can create an error log file to collect all errors that occur in a session. The error log file is a column-delimited, line-sequential file. By specifying a unique error log file name, you can create a separate log file for each session in a workflow. When you want to analyze the row errors for only one session, use an error log file.

In an error log file, double pipes “||” delimit error logging columns. By default, a pipe “|” delimits row data. You can change this row data delimiter by setting the Data Column Delimiter error log option.

The code page for the error file is the same as the code page for the session log file. If the session log uses a UTF-8 code page, the error file also uses a UTF-8 code page. For more information about code pages, see “Globalization Overview” in the Installation and Configuration Guide.

Error log files have the following structure:
[Session Header]
[Column Header]
[Column Data]

♦ Session header. Contains session run information. Information in the session header is similar to the information stored in the PMERR_SESS table.
♦ Column header. Contains data column names.
♦ Column data. Contains actual row data and error message information.

The following sample error log file contains a session header, column header, and column data:
**********************************************************************
Repository GID: fe4817ab-7d87-465f-9110-354222424df0
Repository: CustomerInfo
Folder: Row_Error_Logging
Workflow: wf_basic_REL_errors_AGG_case
Session: s_m_basic_REL_errors_AGG_case
Mapping: m_basic_REL_errors_AGG_case
Workflow Run ID: 1310
Worklet Run ID: 0
Session Instance ID: 19
Session Start Time: 08/03/2004 16:57:01
Session Start Time (UTC): 1067126221
**********************************************************************


Transformation||Transformation Mapplet Name||Transformation Group||Partition Index||Transformation Row ID||Error Sequence||Error Timestamp||Error UTC Time||Error Code||Error Message||Error Type||Transformation Data||Source Mapplet Name||Source Name||Source Row ID||Source Row Type||Source Data
agg_REL_basic||N/A||Input||1||1||1||08/03/2004 16:57:03||1067126223||11019||Port [CUST_ID_NULL]: Default value is: ERROR(<<Expression Error>> [ERROR]: [AGG] CUST_ID - NULL detected on input.\n... nl:ERROR(s:'[AGG] CUST_ID - NULL detected on input.')).||3||D:1221|N:|N:|N:|D:Kauai Dive Shoppe|D:4-976 Sugarloaf Hwy|D:Kapaa Kauai|D:HI|D:94766|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||1||0||D:1221|D:Kauai Dive Shoppe|D:4-976 Sugarloaf Hwy|D:Kapaa Kauai|D:HI|D:94766
agg_REL_basic||N/A||Input||1||4||1||08/03/2004 16:57:03||1067126223||11019||Port [CITY_IN]: Default value is: ERROR(<<Expression Error>> [ERROR]: [AGG] Null detected for City_IN.\n... nl:ERROR(s:'[AGG] Null detected for City_IN.')).||3||D:1354|N:|N:|D:1354|T:Cayman Divers World|D:PO Box 541|N:|D:Gr|N:|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||4||0||D:1354|D:Cayman Divers World Unlim|D:PO Box 541|N:|D:Gr|N:
agg_REL_basic||N/A||Input||1||5||1||08/03/2004 16:57:03||1067126223||11131||Transformation [agg_REL_basic] had an error evaluating variable column [Var_Divide_by_Price]. Error message is [<<Expression Error>> [/]: divisor is zero\n... f:(f:2 / f:(f:1 f:TO_FLOAT(i:1)))].||3||D:1356|N:|N:|D:1356|T:Tom Sawyer Diving C|T:632-1 Third Frydenh|D:Christiansted|D:St|D:00820|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||5||0||D:1356|D:Tom Sawyer Diving Centre|D:632-1 Third Frydenho|D:Christiansted|D:St|D:00820
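Because the error log file uses fixed delimiters, a line of column data can be split with a few string operations. The following Python sketch maps each field of a column data line to its column header, then splits Transformation Data and Source Data into indicator and value pairs. It assumes the default delimiters (double pipes between error logging columns and a single pipe between row data columns) and that no field value contains either delimiter. The function names are hypothetical, and the sample line is shortened from the first data row above.

```python
"""Sketch: split one column data line of a row error log file.

Assumptions: default delimiters ("||" between error logging columns,
"|" between row data columns) and no field value containing either one.
"""
from typing import Dict, List, Tuple, Union

COLUMNS = [
    "Transformation", "Transformation Mapplet Name", "Transformation Group",
    "Partition Index", "Transformation Row ID", "Error Sequence",
    "Error Timestamp", "Error UTC Time", "Error Code", "Error Message",
    "Error Type", "Transformation Data", "Source Mapplet Name",
    "Source Name", "Source Row ID", "Source Row Type", "Source Data",
]
INDICATORS = {"D": "valid", "O": "overflow", "N": "null",
              "T": "truncated", "B": "binary", "U": "data unavailable"}
Field = Union[str, List[Tuple[str, str]]]


def parse_row_data(value: str, delimiter: str = "|") -> List[Tuple[str, str]]:
    """Split 'D:1221|N:|...' into (indicator, value) pairs.

    Only the first colon separates the indicator, so values such as
    'D:01/01/2001 00:00:00' keep their own colons.
    """
    pairs = []
    for item in value.split(delimiter):
        indicator, _, data = item.partition(":")
        pairs.append((indicator, data))
    return pairs


def parse_error_line(line: str, data_delimiter: str = "|") -> Dict[str, Field]:
    """Map a column data line to {column header: value}."""
    fields = line.rstrip("\n").split("||")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record: Dict[str, Field] = dict(zip(COLUMNS, fields))
    for column in ("Transformation Data", "Source Data"):
        record[column] = parse_row_data(fields[COLUMNS.index(column)], data_delimiter)
    return record


# First data row of the sample, with the message and row data shortened.
sample = "||".join([
    "agg_REL_basic", "N/A", "Input", "1", "1", "1", "08/03/2004 16:57:03",
    "1067126223", "11019", "Port [CUST_ID_NULL]: Default value is: ERROR(...)",
    "3", "D:1221|N:|N:|N:|D:Kauai Dive Shoppe", "mplt_add_NULLs_to_QACUST3",
    "SQ_QACUST3", "1", "0", "D:1221|D:Kauai Dive Shoppe",
])
record = parse_error_line(sample)
print(record["Error Code"])  # 11019
print([(INDICATORS[i], v) for i, v in record["Transformation Data"][:2]])
# [('valid', '1221'), ('null', '')]
```

If you change the Data Column Delimiter option, pass the new delimiter as data_delimiter; the double pipe between error logging columns stays fixed.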

Table 17-5 describes the columns in an error log file:
Table 17-5. Error Log File Column Headers

♦ Transformation. The name of the transformation used by a mapping where an error occurred.
♦ Transformation Mapplet Name. Name of the mapplet that contains the transformation. N/A appears when this information is not available.
♦ Transformation Group. Name of the input or output group where an error occurred. Defaults to either “input” or “output” if the transformation does not have a group.
♦ Partition Index. Specifies the partition number of the transformation partition where an error occurred.
♦ Transformation Row ID. Specifies the row ID for the error row.
♦ Error Sequence. Counter for the number of errors per row in each transformation group. If a session has multiple partitions, the PowerCenter Server maintains this counter for each partition. For example, if a transformation generates three errors in partition 1 and two errors in partition 2, ERROR_SEQ_NUM generates the values 1, 2, and 3 for partition 1, and values 1 and 2 for partition 2.
♦ Error Timestamp. Timestamp of the PowerCenter Server when the error occurred.
♦ Error UTC Time. The Coordinated Universal Time, also known as Greenwich Mean Time, when the error occurred.
♦ Error Code. The error code that corresponds to the error message.
♦ Error Message. Error message.
♦ Error Type. The type of error that occurred. The PowerCenter Server uses the following values: 1 - Reader error, 2 - Writer error, 3 - Transformation error.
♦ Transformation Data. Delimited string containing all column data, including the column indicator. Column indicators are: D - valid, O - overflow, N - null, T - truncated, B - binary, U - data unavailable. The fixed delimiter between column data and column indicator is a colon ( : ). The delimiter between the columns is a pipe ( | ). You can override the column delimiter in the error handling settings. The PowerCenter Server converts all column data to text string in the error file. For binary data, the PowerCenter Server uses only the column indicator.
♦ Source Name. Name of the source qualifier. N/A appears when a row error occurs downstream of an active source that is not a source qualifier or a non pass-through partition point with more than one partition. For a list of active sources that can affect row error logging, see “Overview” on page 482.
♦ Source Row ID. Value that the source qualifier assigns to each row it reads. If the PowerCenter Server cannot identify the row, the value is -1.
♦ Source Row Type. The row indicator that tells whether the row was marked for insert, update, delete, or reject: 0 - Insert, 1 - Update, 2 - Delete, 3 - Reject.
♦ Source Data. Delimited string containing all column data, including the column indicator. Column indicators are: D - valid, O - overflow, N - null, T - truncated, B - binary, U - data unavailable. The fixed delimiter between column data and column indicator is a colon ( : ). The delimiter between the columns is a pipe ( | ). You can override the column delimiter in the error handling settings. The PowerCenter Server converts all column data to text string in the error table or error file. For binary data, the PowerCenter Server uses only the column indicator.


Configuring Error Log Options
You configure error logging for each session in a workflow. You can find error handling options in the Config Object tab of the session properties.
Tip: You can use the Workflow Manager to create a reusable set of attributes for the Config Object tab. For more information on creating a session configuration object, see “Creating a Session Configuration Object” on page 183.
To configure error logging options:
1. Double-click the Session task to open the session properties.
2. Select the Config Object tab.
3. Choose error handling options.


Table 17-6 describes the error logging settings of the Config Object tab:
Table 17-6. Error Log Options

♦ Error Log Type (Required). Specifies the type of error log to create. You can specify relational database, flat file, or no log. By default, the PowerCenter Server does not create an error log.
♦ Error Log DB Connection (Required/Optional). Specifies the database connection for a relational log. This option is required when you enable relational database logging.
♦ Error Log Table Name Prefix (Optional). Specifies the table name prefix for relational logs. The PowerCenter Server appends 11 characters to the prefix name. Oracle and Sybase have a 30 character limit for table names. If a table name exceeds 30 characters, the session fails.
♦ Error Log File Directory (Required/Optional). Specifies the directory where errors are logged. By default, the error log file directory is $PMBadFilesDir\. This option is required when you enable flat file logging.
♦ Error Log File Name (Required/Optional). Specifies error log file name. The character limit for the error log file name is 255. By default, the error log file name is PMError.log. This option is required when you enable flat file logging.
♦ Log Row Data (Optional). Specifies whether or not to log transformation row data. By default, the PowerCenter Server logs transformation row data. If you disable this property, N/A or -1 appears in transformation row data fields.
♦ Log Source Row Data (Optional). If you choose not to log source row data, or if source row data is unavailable, the PowerCenter Server writes an indicator such as N/A or -1, depending on the column datatype. If you do not need to capture source row data, consider disabling this option to increase PowerCenter Server performance.
♦ Data Column Delimiter (Required). Delimiter for string type source row data and transformation group row data. By default, the PowerCenter Server uses a pipe ( | ) delimiter. Verify that you do not use the same delimiter for the row data as the error logging columns. If you use the same delimiter, you may find it difficult to read the error log file.

4. Click OK.
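The required and optional settings in Table 17-6 can be expressed as a pre-run check. The following Python sketch validates a hypothetical dictionary of error log settings: relational logging needs a database connection, flat file logging needs a directory and a file name of at most 255 characters (the defaults from the table fill in missing values), and the data column delimiter must differ from the double pipe that separates error logging columns. The dictionary keys mirror the option names in the table, but the log type values and the function are assumptions, not a PowerCenter configuration format.

```python
"""Sketch: check error log settings against the rules in Table 17-6.

Hypothetical pre-run check; the keys mirror the option names in the
table, and the Error Log Type values are assumed labels.
"""
from typing import Dict, List

COLUMN_DELIMITER = "||"  # delimiter between error logging columns in the file


def check_error_log_options(options: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    log_type = options.get("Error Log Type", "None")
    if log_type == "Relational Database":
        if not options.get("Error Log DB Connection"):
            problems.append("relational logging needs a database connection")
    elif log_type == "Flat File":
        # Defaults from the table apply when an option is not set.
        directory = options.get("Error Log File Directory", "$PMBadFilesDir\\")
        file_name = options.get("Error Log File Name", "PMError.log")
        if not directory:
            problems.append("flat file logging needs a directory")
        if not file_name or len(file_name) > 255:
            problems.append("error log file name must be 1-255 characters")
    delimiter = options.get("Data Column Delimiter", "|")
    if not delimiter:
        problems.append("a data column delimiter is required")
    elif delimiter == COLUMN_DELIMITER:
        problems.append("data column delimiter matches the column delimiter")
    return problems


print(check_error_log_options({"Error Log Type": "Flat File"}))  # []
print(check_error_log_options({"Error Log Type": "Relational Database"}))
# ['relational logging needs a database connection']
```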


Chapter 18

Session Parameters
This chapter contains information on the following topics:
♦ Overview, 496
♦ Session Log Parameter, 497
♦ Database Connection Parameters, 499
♦ Source File Parameters, 502
♦ Target File Parameters, 504
♦ Lookup File Parameters, 506
♦ Reject File Parameters, 508
♦ Tips, 510


Overview
Session parameters, like mapping parameters, represent values you might want to change between sessions, such as a database connection or source file. Use session parameters in the session properties, and then define the parameters in a parameter file. You can specify the parameter file for the session to use in the session properties. You can also specify it when you use pmcmd to start the session. The Workflow Manager provides one built-in session parameter, $PMSessionLogFile. With $PMSessionLogFile, you can change the name of the session log generated for the session. The Workflow Manager also allows you to create user-defined session parameters. Table 18-1 describes required naming conventions for the session parameters you can define:
Table 18-1. Naming Conventions for User-Defined Session Parameters
♦ Database Connection: $DBConnectionName
♦ Source File: $InputFileName
♦ Target File: $OutputFileName
♦ Lookup File: $LookupFileName
♦ Reject File: $BadFileName
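The naming conventions in Table 18-1 amount to a type prefix followed by letters, digits, or underscores. The following Python sketch classifies a parameter name by its prefix. It applies the rule that this chapter states for database connection, source file, and target file parameters to lookup and reject file parameters as well, which is an assumption, and it matches names case-sensitively. The function name and the sample parameter names are hypothetical.

```python
"""Sketch: classify user-defined session parameter names by prefix.

Each name is a prefix from Table 18-1 followed by letters, digits, or
underscores. Illustrative only; not part of PowerCenter.
"""
import re
from typing import Optional

PREFIXES = {
    "Database Connection": "$DBConnection",
    "Source File": "$InputFile",
    "Target File": "$OutputFile",
    "Lookup File": "$LookupFile",
    "Reject File": "$BadFile",
}


def parameter_type(name: str) -> Optional[str]:
    """Return the parameter type for name, or None if no convention matches."""
    for param_type, prefix in PREFIXES.items():
        # re.ASCII limits \w to [A-Za-z0-9_].
        if re.fullmatch(re.escape(prefix) + r"\w*", name, flags=re.ASCII):
            return param_type
    return None


print(parameter_type("$DBConnection_Source"))  # Database Connection
print(parameter_type("$InputFile_customers"))  # Source File
print(parameter_type("$SourceFile1"))          # None
```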

Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables. Instead of creating two sessions for the same mapping, you can create a database connection parameter, $DBConnectionSource, and use it as the source database connection for the session. When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After the session completes, you set $DBConnectionSource to TransDB2 and run the session again.

You might use several session parameters together to make session management easier. For example, you might use source file and database connection parameters to configure a session to read data from different source files and write the results to different target databases. You can then use reject file parameters to write the session reject files to the target machine. You can also use the session log parameter, $PMSessionLogFile, to write to different session logs on the target machine.

When you use session parameters, you must define the parameters in the parameter file. Session parameters do not have default values. When the PowerCenter Server cannot find a value for a session parameter, it fails to initialize the session.
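Because session parameters have no default values, every parameter a session uses must appear in the parameter file. The following Python sketch models that rule with the TransDB1 and TransDB2 example. It assumes a simplified file format of one name=value definition per line, with bracketed heading lines skipped; see “Parameter Files” on page 511 for the actual format. The function names and the placeholder heading are hypothetical.

```python
"""Sketch: resolve session parameters, failing when a value is missing.

Assumed file format (simplified): one name=value definition per line;
lines enclosed in square brackets are headings and are skipped.
"""
from typing import Dict, Iterable


def load_parameters(text: str) -> Dict[str, str]:
    """Collect name=value definitions from parameter file text."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or (line.startswith("[") and line.endswith("]")):
            continue  # blank line or heading
        name, sep, value = line.partition("=")
        if sep:
            params[name.strip()] = value.strip()
    return params


def resolve(used: Iterable[str], params: Dict[str, str]) -> Dict[str, str]:
    """Return a value for every parameter the session uses, or raise."""
    names = list(used)
    missing = [name for name in names if name not in params]
    if missing:
        # No default values exist, so the session cannot initialize.
        raise KeyError("no value for session parameter(s): " + ", ".join(missing))
    return {name: params[name] for name in names}


first_run = load_parameters("[hypothetical heading]\n$DBConnectionSource=TransDB1\n")
second_run = load_parameters("[hypothetical heading]\n$DBConnectionSource=TransDB2\n")
print(resolve(["$DBConnectionSource"], first_run))   # {'$DBConnectionSource': 'TransDB1'}
print(resolve(["$DBConnectionSource"], second_run))  # {'$DBConnectionSource': 'TransDB2'}
```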


Session Log Parameter
The Workflow Manager provides a built-in session parameter named $PMSessionLogFile. Use $PMSessionLogFile in the session properties to change the name or location of the session log between runs. When you use $PMSessionLogFile in the session properties, define the parameter in the parameter file.

Changing the Session Log Name
You can use $PMSessionLogFile to change the session log name between sessions. In the General Options settings of the Properties tab, enter $PMSessionLogFile in the Session Log Filename field. Then define $PMSessionLogFile in the parameter file. When the PowerCenter Server runs the session, it creates a session log in the directory listed in the Session Log File Directory field and names the session log as instructed by the parameter file. If a session log with the same name already exists, the PowerCenter Server overwrites the existing file. Figure 18-1 illustrates how to use the session log parameter with a directory:
Figure 18-1. Using $PMSessionLogFile as the Name of the Session Log
[Figure: session properties with callouts for the session log parameter, the session log directory, and the parameter filename.]

For example, in a session, you leave Session Log File Directory set to its default value, the $PMSessionLogDir server variable. For Session Log File Name, you enter the session parameter $PMSessionLogFile. In the parameter file, you set $PMSessionLogFile to “TestRun.txt”. When you registered the PowerCenter Server, you defined $PMSessionLogDir as C:/Program Files/Informatica/PowerCenter Server/SessLogs. When the PowerCenter Server runs the session, it creates a session log named TestRun.txt in the C:/Program Files/Informatica/PowerCenter Server/SessLogs directory.

Changing the Session Log Name and Location
You can also use $PMSessionLogFile to change both the directory and the session log name between sessions. If you do this, you also need to clear the Session Log File Directory field. The PowerCenter Server concatenates both fields to determine where to write the session log and what to name it.

For example, you have one session writing target files to different systems. You want each session log written to the target machine so the local administrator can review the file. In the session, you configure a target file session parameter, $OutputFile1. You then use $PMSessionLogFile to define the session log file name and clear the Session Log File Directory field. In the parameter file, you configure both the target file and session log file parameters to write to the same machine. Set $OutputFile1 to E:/target files/Marketing.out, and $PMSessionLogFile to E:/session logs/Marketing.txt.

After you run the session, you can edit the parameter file to change the directory and file names for both the target file and session log parameters. Alternatively, you can create a different parameter file for each target. You can then use pmcmd to specify which parameter file to use when you start the session.
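The following Python sketch models how the Session Log File Directory field and the value of $PMSessionLogFile combine in the two cases above. It is a hypothetical illustration, not PowerCenter code: it expands a server variable such as $PMSessionLogDir from a dictionary, assumes forward slashes, and adds a separator when the directory does not already end with one. When the directory field is empty, the parameter value is used as the full path.

```python
"""Sketch: combine the Session Log File Directory field with the value of
$PMSessionLogFile. Hypothetical model of the concatenation described in
this section; assumes forward slashes.
"""
from typing import Dict


def session_log_path(directory_field: str, file_value: str,
                     server_vars: Dict[str, str]) -> str:
    """Return the session log path for the given field values."""
    # A server variable such as $PMSessionLogDir expands to its registered value.
    directory = server_vars.get(directory_field, directory_field)
    if not directory:
        return file_value  # directory cleared: the parameter holds the full path
    if not directory.endswith(("/", "\\")):
        directory += "/"
    return directory + file_value


server_vars = {"$PMSessionLogDir":
               "C:/Program Files/Informatica/PowerCenter Server/SessLogs"}
print(session_log_path("$PMSessionLogDir", "TestRun.txt", server_vars))
# C:/Program Files/Informatica/PowerCenter Server/SessLogs/TestRun.txt
print(session_log_path("", "E:/session logs/Marketing.txt", server_vars))
# E:/session logs/Marketing.txt
```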

Steps for Using $PMSessionLogFile
Use $PMSessionLogFile when you want to change the name and/or location of a session log between session runs.
To use the session log parameter:
1. In the session properties, click the General Options settings of the Properties tab.
2. Enter $PMSessionLogFile in the Session Log File field.
3. If you want $PMSessionLogFile to represent both the session log name and directory, clear the Session Log File Directory field.
4. Enter a parameter file and directory in the Parameter File Name field.
5. Click OK.

Before you run the session, create the parameter file in the specified directory and define $PMSessionLogFile. For details, see “Parameter Files” on page 511.


Database Connection Parameters
You can create user-defined database connection session parameters to reuse sessions for different relational sources, targets, or lookups. You can create a database connection parameter in the session properties of any session that uses a relational source, target, or lookup. Name all database connection session parameters with the prefix $DBConnection, followed by any alphanumeric and underscore characters. When you define the parameter in the parameter file, you can reference any database connection in the repository.

For example, you have a session you want to use with two relational sources. You access the first source with a database connection named “Marketing” and the second with a connection named “Sales.” In the session, you create a source database connection parameter named $DBConnection_Source. In the parameter file, you define $DBConnection_Source as Marketing and run the session. After the session completes, you set $DBConnection_Source to Sales in the parameter file, and then run the session again. Alternatively, you can create two different parameter files, one for each source database connection. You can then use pmcmd to specify which parameter file to use when you start the session.

If you want to use the same database connection for more than one connection, such as source and target, you can enter the same $DBConnection parameter for both the source and target database connections. In the parameter file, enter one value for the $DBConnection parameter. The PowerCenter Server uses the same database connection when accessing the source and target. Similarly, heterogeneous sources can also use the same $DBConnection parameter.
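The following Python sketch models how connection settings resolve when a session uses $DBConnection parameters. Each role (source, target, lookup) holds either a literal connection name or a parameter name, and parameters are looked up in the values from the active parameter file. Because source and target share $DBConnection_Source, they always resolve to the same connection, and switching parameter files switches both. The data structures and the function name are hypothetical.

```python
"""Sketch: resolve the database connection for each session role.

Hypothetical model: a role's setting is either a literal connection name
or a $DBConnection parameter looked up in the parameter file's values.
"""
from typing import Dict


def resolve_connections(settings: Dict[str, str],
                        params: Dict[str, str]) -> Dict[str, str]:
    """Return {role: connection name}; raise KeyError for undefined parameters."""
    resolved = {}
    for role, setting in settings.items():
        if setting.startswith("$DBConnection"):
            resolved[role] = params[setting]  # no default value exists
        else:
            resolved[role] = setting  # literal connection name
    return resolved


# Source and target share one parameter, so both use the same connection.
settings = {"source": "$DBConnection_Source", "target": "$DBConnection_Source",
            "lookup": "Sales"}
marketing_run = {"$DBConnection_Source": "Marketing"}
sales_run = {"$DBConnection_Source": "Sales"}
print(resolve_connections(settings, marketing_run))
# {'source': 'Marketing', 'target': 'Marketing', 'lookup': 'Sales'}
print(resolve_connections(settings, sales_run)["source"])  # Sales
```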
To configure a database connection parameter:
1. In the session properties, click the Mapping tab (Transformation view) and click Connections settings for the sources or targets node.
2. Click the Open button in the Value field.
3. In the Relational Connection Browser, select Use Connection Variable.
4. Enter a name for the database connection parameter. Name the connection parameter $DBConnectionName.
5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field. The directory must be local to the PowerCenter Server.
6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the database connection parameter. For details, see “Parameter Files” on page 511.


Source File Parameters
You can create user-defined source file session parameters. Use a source file parameter when you want to change the name or location of a session source file between session runs. Name all source file session parameters with the prefix $InputFile, followed by any alphanumeric and underscore characters. All source file session parameters within a session must have distinct names. You can create a source file parameter in any session that reads from file sources. When you define the parameter in the parameter file, you can reference any source file local to the PowerCenter Server. You can use a user-defined source file session parameter in either the Source File Directory or Source Filename session property.

Changing the Source File
You can use a source file parameter to change the name of the source file a session uses. In the Properties settings of the Mapping tab, enter the source file parameter in the Source Filename field. Then define the parameter in a parameter file. When the PowerCenter Server runs the session, it connects to the directory listed in the Source File Directory field and reads the source file listed in the parameter file. Figure 18-2 shows how to use a source file parameter with a source directory:
Figure 18-2. Using Parameters to Change the Session Source File
[Figure: session properties with callouts for the Source File Directory, the Source Filename, and the value set in the parameter file.]


For example, in a session, you leave Source File Directory set to its default, the $PMSourceFileDir server variable. For the source file name, you create a session parameter named $InputFile_products. In the parameter file, you set $InputFile_products to “products.txt”. When you registered the PowerCenter Server, you set $PMSourceFileDir to C:/Program Files/Informatica/PowerCenter Server/SrcFiles. When the PowerCenter Server runs the session, it reads the products.txt file in the C:/Program Files/Informatica/PowerCenter Server/SrcFiles directory.

Changing the Source File and Directory
You can use a source file parameter to change both the source file and directory used by a session. When you specify both the source file and directory in the Source Filename field, you need to clear the Source File Directory field. The PowerCenter Server concatenates both fields to determine where to find the indicated source file.
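The two cases above can be modeled with a short Python sketch. It is a hypothetical illustration, not PowerCenter code: it assumes that a "/" delimiter separates the directory from the file name, and it hard-codes the $PMSourceFileDir value from the example above.

```python
# Hypothetical model of how the directory field and the file name field
# combine into one path; it is not PowerCenter code.
SERVER_VARIABLES = {
    # Value from the example above; set when the server is registered.
    "$PMSourceFileDir": "C:/Program Files/Informatica/PowerCenter Server/SrcFiles",
}


def resolve_source_path(directory_field, filename_field, params):
    """Return the path the session would read, given the Source File
    Directory and Source Filename properties and the parameter file values."""
    # A session parameter in the Source Filename field is replaced by its value.
    filename = params.get(filename_field, filename_field)
    # A server variable in the Source File Directory field is expanded.
    directory = SERVER_VARIABLES.get(directory_field, directory_field)
    if not directory:
        # Directory field cleared: the parameter value carries the full path.
        return filename
    # Assumption: a "/" delimiter separates the directory from the file name.
    return directory.rstrip("/") + "/" + filename


params = {"$InputFile_products": "products.txt"}
print(resolve_source_path("$PMSourceFileDir", "$InputFile_products", params))
```

If you leave a directory in Source File Directory and also put a full path in the parameter, the sketch produces a doubled path, which mirrors why the guide tells you to clear the directory field.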

Steps for Using a Source File Parameter
Use a source file parameter when you want to change the source file and/or location between session runs.
To use a source file parameter:
1. Select a source under the Sources node on the Mapping tab.
2. Go to the Properties settings.
3. In the Source Filename field, enter the source file parameter name. Name all source file parameters with the $InputFile prefix, as in $InputFileName.
4. If you want the parameter to represent both the source file name and location, clear the Source File Directory field.
5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.
6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the source file parameter. For details, see “Parameter Files” on page 511.


Target File Parameters
You can create user-defined target file session parameters. Use a target file parameter when you want to change the name or location of a session target file between session runs. Name all target file session parameters with the prefix $OutputFile, followed by any alphanumeric and underscore characters. All target file session parameters within a session need to have distinct names. You can create a target file parameter in any session that writes to file targets. When you define the parameter in a parameter file, you can write the target file to any directory local to the PowerCenter Server. You can use a user-defined target file session parameter in either the Output File Directory or Output Filename session property.

Changing the Target File
You can use a target file parameter to change the name of the target file the PowerCenter Server creates when it runs a session. In the Properties settings of the Mapping tab, enter the target file parameter in the Output Filename field. Then define the parameter in a parameter file. When the PowerCenter Server runs the session, it connects to the directory listed in the Output File Directory field and creates the target file listed in the parameter file. If the target file exists, the PowerCenter Server overwrites the existing target file. Figure 18-3 shows how to use a target file parameter with a target file directory:
Figure 18-3. Using Parameters to Change the Session Target File (callouts: target file directory; target file name defined in the parameter file)


For example, you want to name the target file based on the month in which the session runs. In the session, you leave the target directory set to its default, the $PMTargetFileDir server variable. For the target file name, you create a session parameter named $OutputFileName. In the parameter file, you set $OutputFileName to “Nov2000.out”. When you registered the PowerCenter Server, you set $PMTargetFileDir to C:/Program Files/Informatica/PowerCenter Server/TgtFiles. When the PowerCenter Server runs the session, it creates Nov2000.out in the C:/Program Files/Informatica/PowerCenter Server/TgtFiles directory.
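If you generate the parameter file with a script before each run, the month-based name can be computed rather than typed. The following Python sketch is a hypothetical helper, not a PowerCenter feature; it only builds the $OutputFileName line used in the example above.

```python
from datetime import date


def monthly_target_entry(run_date, param="$OutputFileName"):
    """Return the parameter file line that names the target file after
    the month and year of run_date (hypothetical helper)."""
    # %b is the abbreviated month name ("Nov" in the default C locale)
    # and %Y is the four-digit year.
    return f"{param}={run_date.strftime('%b%Y')}.out"


print(monthly_target_entry(date(2000, 11, 1)))  # $OutputFileName=Nov2000.out
```

Because %b depends on the locale, a script that must always produce English month names should run in the default C locale.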

Changing the Target File and Directory
You can use a target file parameter to change both the target file and directory used by a session. When you specify both the target file and directory in the Output Filename field, you need to clear the Output File Directory field. The PowerCenter Server concatenates both fields to determine where to create the target file.

For example, a session uses a source file parameter to read both internal and external weblogs on different session runs. You want to write the results of the internal weblog session to one system and the external weblog session to another. In the session, you name the target file $OutputFileName and clear the Output File Directory field. In the parameter file, you set $OutputFileName to “E:/internal_weblogs/November_int.txt” to create a target file for the internal weblog session. After the session completes, you change $OutputFileName to “F:/external_weblogs/November_ex.txt” for the external weblog session. Alternatively, you can create a different parameter file for each target. You can then use pmcmd to specify which parameter file to use when you start the session.
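The alternative of one parameter file per target can be sketched in Python as follows. The helper, the Weblogs folder, the s_weblogs session, and the generated file names are hypothetical; the sketch only produces the file contents. You would still save each file and name it on the pmcmd command line when you start the session (see “Using pmcmd” on page 581), which the sketch does not show.

```python
def target_param_files(folder, session, targets):
    """Return {file name: parameter file text}, one parameter file per target.

    targets maps a label to the full target path. The session clears the
    Output File Directory field, so each value is a complete path.
    (Hypothetical helper; file names and labels are illustrative.)
    """
    files = {}
    for label, target_path in targets.items():
        text = f"[{folder}.{session}]\n$OutputFileName={target_path}\n"
        files[f"{session}_{label}.txt"] = text
    return files


files = target_param_files("Weblogs", "s_weblogs", {
    "internal": "E:/internal_weblogs/November_int.txt",
    "external": "F:/external_weblogs/November_ex.txt",
})
for name, text in files.items():
    print(name)
    print(text)
```

Keeping one file per target avoids editing a shared parameter file between runs.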

Steps for Using a Target File Parameter
Use a target file parameter when you want to change the name and/or location of a target file between session runs.
To use a target file parameter:
1. Select a target under the Targets node on the Mapping tab.
2. Go to the Properties settings.
3. In the Output Filename field, enter the target file parameter name. Name all target file parameters with the $OutputFile prefix, as in $OutputFileName.
4. If you want the parameter to represent both the target file name and location, clear the Output File Directory field.
5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.
6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the target file parameter you created. For details, see “Parameter Files” on page 511.


Lookup File Parameters
You can create user-defined lookup file session parameters. Use a lookup file parameter when you want to change the name or location of a session lookup file between session runs. Name all lookup file session parameters with the prefix $LookupFile, followed by any alphanumeric and underscore characters. All lookup file session parameters within a session must have distinct names. You can create a lookup file parameter in any session that performs lookups on flat files. When you define the parameter in the parameter file, you can reference any lookup file local to the PowerCenter Server. You can use a user-defined lookup file session parameter in either the Lookup Source File Directory or Lookup Source Filename session property.
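The distinct-names rule can be checked before you save a parameter file. The following Python sketch is a hypothetical helper. It treats names as case-insensitive because the parameter file guidelines state that session parameter names are not case-sensitive.

```python
def duplicate_params(names):
    """Return parameter names that occur more than once in a session.

    Session parameter names are not case-sensitive, so $LookupFile_orders
    and $LOOKUPFILE_ORDERS count as the same name (hypothetical helper).
    """
    seen, reported, duplicates = set(), set(), []
    for name in names:
        key = name.lower()
        if key in seen and key not in reported:
            duplicates.append(name)  # report each repeated name once
            reported.add(key)
        seen.add(key)
    return duplicates


print(duplicate_params(["$LookupFile_orders", "$LookupFile_items",
                        "$LOOKUPFILE_ORDERS"]))  # ['$LOOKUPFILE_ORDERS']
```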

Changing the Lookup File
You can use a lookup file parameter to change the name of the lookup file a session uses. In the Properties settings of the Mapping tab, enter the lookup file parameter in the Lookup Source Filename field. Then define the parameter in a parameter file. When the PowerCenter Server runs the session, it connects to the directory listed in the Lookup Source File Directory field and reads the lookup file listed in the parameter file. Figure 18-4 shows how to use a lookup file parameter with a lookup directory:
Figure 18-4. Using Parameters to Change the Session Lookup File (callouts: Lookup File Directory; lookup file name defined in the parameter file)


For example, in a session, you leave Lookup Source File Directory set to its default, the $PMLookupFileDir server variable. For the lookup file name, you create a session parameter named $LookupFile_orders. In the parameter file, you set $LookupFile_orders to “orders.txt”. When you registered the PowerCenter Server, you set $PMLookupFileDir to C:/Program Files/Informatica/PowerCenter Server/LkpFiles. When the PowerCenter Server runs the session, it reads the orders.txt file in the C:/Program Files/Informatica/PowerCenter Server/LkpFiles directory.

Changing the Lookup File and Directory
You can use a lookup file parameter to change both the lookup file and directory used by a session. When you specify both the lookup file and directory in the Lookup Source Filename field, you need to clear the Lookup Source File Directory field. The PowerCenter Server concatenates both fields to determine where to find the indicated lookup file.

Steps for Using a Lookup File Parameter
Use a lookup file parameter when you want to change the lookup file and/or location between session runs.
To use a lookup file parameter:
1. Select a Lookup transformation on the Mapping tab.
2. Go to the Properties settings.
3. In the Lookup Source Filename field, enter the lookup file parameter name. Name all lookup file parameters with the $LookupFile prefix, as in $LookupFileName.
4. If you want the parameter to represent both the lookup file name and location, clear the Lookup Source File Directory field.
5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.
6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the lookup file parameter. For details, see “Parameter Files” on page 511.


Reject File Parameters
You can create user-defined reject file session parameters. Use a reject file parameter when you want to change the name or location of session reject files between session runs. Name all reject file session parameters with the prefix $BadFile, followed by any alphanumeric and underscore characters. All reject file parameters within a session need to have distinct names. You can create a reject file parameter for any target in a session. When you define the parameter in a parameter file, you can reference any directory local to the PowerCenter Server. You can use a user-defined reject file session parameter in either the Reject File Directory or Reject Filename session property.

Changing the Reject File Name
You can use a reject file parameter to change the name of a reject file a session uses. In the Properties settings of the Mapping tab, enter the reject file parameter in the Reject Filename field. Then define the parameter in the parameter file. When the PowerCenter Server runs the session, it locates the directory listed in the Reject File Directory field and creates the reject file listed in the parameter file. If the reject file already exists, it appends rejected data to the existing reject file. Figure 18-5 shows how to use a reject file parameter with a reject file directory:
Figure 18-5. Using Parameters to Change the Reject File Name (callouts: reject file directory; reject file name defined in the parameter file)


For example, you want to rename reject files between session runs so that rejected data from each run goes to a different file. In a session, you leave Reject File Directory set to its default, the $PMBadFileDir server variable. For the reject file name, you create a session parameter named $BadFileName. In the parameter file, you set $BadFileName to “FirstRun.bad”. When you registered the PowerCenter Server, you set $PMBadFileDir to C:/Program Files/Informatica/PowerCenter Server/BadFiles. When the PowerCenter Server runs the session, it creates the FirstRun.bad file in the C:/Program Files/Informatica/PowerCenter Server/BadFiles directory.

Changing the Reject File and Directory
You can use a reject file parameter to change both the directory and name for session reject files. When you specify both the reject file and directory in the Reject Filename field, you need to clear the Reject File Directory field. The PowerCenter Server concatenates both fields to determine where to find the indicated reject file. For example, you use a database connection parameter to configure a session to write to different target databases. Instead of having the PowerCenter Server append rejected data from all sessions to the same reject file, you want to have a reject file for each target system. In the session, you name the reject file $BadFileName and clear the Reject File Directory field. In the parameter file, you set $BadFileName to the reject filename and directory for the target database used in the session. When you change the database connection parameter to a different database, you can also change the reject filename and directory. Alternatively, you can create a different parameter file for each target system. You can then use pmcmd to specify which parameter file to use when you start the session.
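To keep each target system's connection and reject file in step, you can generate both values together. The following Python sketch is hypothetical: the helper, the Sales folder, the s_load_orders session, the $DBConnection_Target parameter name, the connection names, and the paths are all illustrative. Only the $DBConnection and $BadFile prefixes and the cleared Reject File Directory come from the text above.

```python
def system_sections(folder, session, systems):
    """Return {system: parameter file text} that keeps each target system's
    database connection and reject file together (hypothetical helper).

    systems maps a label to (connection name, full reject file path). The
    session clears the Reject File Directory field, so each path is complete.
    """
    sections = {}
    for label, (connection, reject_path) in systems.items():
        sections[label] = (
            f"[{folder}.{session}]\n"
            f"$DBConnection_Target={connection}\n"
            f"$BadFileName={reject_path}\n"
        )
    return sections


sections = system_sections("Sales", "s_load_orders", {
    "east": ("ora_east", "D:/rejects/east/orders.bad"),
    "west": ("ora_west", "D:/rejects/west/orders.bad"),
})
print(sections["east"], end="")
```

Each returned text can be saved as its own parameter file, which matches the alternative of one parameter file per target system.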

Steps for Using a Reject File Parameter
Use a reject file parameter when you want to change the reject file and/or location between session runs.
To use a reject file parameter:
1. Go to the Properties settings of the Mapping tab.
2. In the Reject Filename field, enter the reject file parameter name. Name all reject file parameters with the $BadFile prefix, as in $BadFileName.
3. If you want the parameter to represent both the reject file name and location, clear the Reject File Directory field.
4. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.
5. Click OK.

Before you run the session, create the parameter file in the specified directory and define the reject file parameter. For details, see “Parameter Files” on page 511.


Tips
Use reject file and session log parameters in conjunction with target file or target database connection parameters. When you use a target file or target database connection parameter with a session, you can keep track of reject files by using a reject file parameter to write the reject file to the target machine. You can also use the session log parameter to write the session log to the target machine.


Chapter 19

Parameter Files
This chapter covers the following topics:
♦ Overview, 512
♦ Parameter File Format, 513
♦ Guidelines for Creating Parameter Files, 515
♦ Sample Parameter File, 517
♦ Configuring the Parameter File Location, 518
♦ Troubleshooting, 520
♦ Tips, 521


Overview
You can use a parameter file to define the values for parameters and variables used in a workflow, worklet, or session. You can create a parameter file using a text editor such as WordPad or Notepad. You list the parameters or variables and their values in the parameter file. Parameter files can contain the following types of parameters and variables:
♦ Workflow variables
♦ Worklet variables
♦ Session parameters
♦ Mapping parameters and variables

When you use parameters or variables in a workflow, worklet, or session, the PowerCenter Server checks the parameter file to determine the start value of the parameter or variable. You can use a parameter file to initialize workflow variables, worklet variables, mapping parameters, and mapping variables. If you do not define start values for these parameters and variables, the PowerCenter Server checks for the start value of the parameter or variable in other places. For more information, see “Using Workflow Variables” on page 103 and “Mapping Parameters and Variables” in the Designer Guide.

You can place parameter files on the PowerCenter Server machine or on a local machine. Use a local parameter file if you do not have access to parameter files on the PowerCenter Server machine. When you use a local parameter file, pmcmd passes variables and values in the file to the PowerCenter Server. Local parameter files are used with the startworkflow pmcmd command. For more information, see “pmcmd Reference” on page 594.

You must define session parameters in a parameter file. Because session parameters do not have default values, the PowerCenter Server fails to initialize the session when it cannot locate the value of a session parameter in the parameter file.

You can include parameter or variable information for more than one workflow, worklet, or session in a single parameter file by creating separate sections for each object within the parameter file. You can also create multiple parameter files for a single workflow, worklet, or session and change the file that these tasks use as needed. To specify the parameter file the PowerCenter Server uses with a workflow, worklet, or session, you can do either of the following:
♦ Enter the parameter file name and directory in the workflow, worklet, or session properties.
♦ Start the workflow, worklet, or session using pmcmd and enter the parameter filename and directory in the command line. For details, see “Using pmcmd” on page 581.

If you enter a parameter file name and directory in both the workflow, worklet, or session properties and in the pmcmd command line, the PowerCenter Server uses the information you enter in the pmcmd command line.
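The precedence rule above is small enough to state as code. This Python function is a hypothetical model of the rule, not a PowerCenter API, and the file paths in the usage line are illustrative.

```python
def effective_parameter_file(properties_value=None, pmcmd_value=None):
    """Return the parameter file the server uses, or None if neither is set.

    A parameter file entered on the pmcmd command line takes precedence over
    one entered in the workflow, worklet, or session properties.
    (Hypothetical helper that models the rule above.)
    """
    if pmcmd_value:
        return pmcmd_value
    return properties_value or None


print(effective_parameter_file("C:/params/sales.txt", "C:/params/marketing.txt"))
```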


Parameter File Format
When you enter values in a parameter file, you must precede the entries with a heading that identifies the workflow, worklet, or session whose parameters and variables you want to assign. You assign individual parameters and variables directly below this heading, entering each parameter or variable on a new line. You can list parameters and variables in any order for each task. You can define the following heading formats:

Workflow variables:
[folder name.WF:workflow name]

Worklet variables:
[folder name.WF:workflow name.WT:worklet name]

Worklet variables in nested worklets:
[folder name.WF:workflow name.WT:worklet name.WT:worklet name...]

Session parameters, plus mapping parameters and variables:
[folder name.WF:workflow name.ST:session name]

or
[folder name.session name]

or
[session name]

Below each heading, you define parameter and variable values as follows:
parameter name=value
parameter2 name=value
variable name=value
variable2 name=value

For example, you have a session, s_MonthlyCalculations, in the Production folder. The session uses a string mapping parameter, $$State, that you want to set to “MA”, and a datetime mapping variable, $$Time. $$Time already has an initial value of “9/30/2000 00:00:00” saved in the repository, but you want to override this value with “10/1/2000 00:00:00”. The session also uses session parameters to connect to source files and target databases, as well as to write the session log to the appropriate session log file. Table 19-1 shows the parameters and variables that you define in the parameter file:
Table 19-1. Parameters and Variables in Parameter File
Parameter and Variable Type | Parameter and Variable Name | Desired Definition
String Mapping Parameter | $$State | MA
Datetime Mapping Variable | $$Time | 10/1/2000 00:00:00
Source File (Session Parameter) | $InputFile1 | Sales.txt
Database Connection (Session Parameter) | $DBConnection_Target | Sales (database connection)
Session Log File (Session Parameter) | $PMSessionLogFile | d:/session logs/firstrun.txt

The parameter file for the session includes the folder and session name, as well as each parameter and variable:
[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt

The next time you run the session, you might edit the parameter file to change the state to MD and delete the $$Time variable. This allows the PowerCenter Server to use the value for the variable that was set in the previous session run.
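To check a parameter file before a run, you could read it back into a dictionary. The following Python sketch is a minimal reader written for illustration under stated assumptions. It is not the PowerCenter Server's parser. It keeps each value exactly as written after the first equal sign, because extra spaces may be read as part of the value. It also keeps the first definition when a name repeats under one heading, following the first-instance rule in the guidelines.

```python
def parse_parameter_file(text):
    """Parse parameter file text into {heading: {name: value}}.

    A sketch, not the PowerCenter parser. Values are kept exactly as written
    after the first "=" because extra spaces may be read as part of the value.
    If a name repeats under the same heading, the first definition is kept.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("[") and stripped.endswith("]"):
            # A heading such as [Production.s_MonthlyCalculations]
            current = sections.setdefault(stripped[1:-1], {})
        elif current is not None and "=" in line:
            name, value = line.split("=", 1)
            current.setdefault(name, value)  # first definition wins
        else:
            raise ValueError(f"line outside a section or without '=': {line!r}")
    return sections


sample = """[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
"""
print(parse_parameter_file(sample))
```

Names are compared exactly as written here, although the guide states that mapping parameter and session parameter names are not case-sensitive.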


Guidelines for Creating Parameter Files
Use the following guidelines when creating parameter files:
♦ Capitalize folder and session names as necessary. Folder and session names are case-sensitive in the parameter file.
♦ Enter folder names for non-unique session names. When a session name exists more than once in a repository, enter the folder name to indicate the location of the session.
♦ Create one or more parameter files. You assign parameter files to workflows, worklets, and sessions individually. You can specify the same parameter file for all of these tasks or create several parameter files. When you want to include parameter and variable information for more than one session in the file, create a new section for each session as follows. The folder name is optional.
[folder_name.session_name]
parameter_name=value
variable_name=value
mapplet_name.parameter_name=value
[folder2_name.session_name]
parameter_name=value
variable_name=value
mapplet_name.parameter_name=value
♦ Specify headings in any order. You can place headings in any order in the parameter file. However, if you define the same parameter or variable more than once in the file, the PowerCenter Server assigns the parameter or variable value using the first instance of the parameter or variable.
♦ Specify parameters and variables in any order. Below each heading, you can specify the parameters and variables in any order. When defining parameter values, do not use unnecessary line breaks or spaces. The PowerCenter Server might interpret additional spaces as part of the value.
♦ List all necessary mapping parameters and variables. Values entered for mapping parameters and variables become the start value for parameters and variables in a mapping. Mapping parameter and variable names are not case-sensitive.
♦ List all session parameters. Session parameters do not have default values. An undefined session parameter can cause the session to fail. Session parameter names are not case-sensitive.
♦ Use correct date formats for datetime values. When entering datetime values, use the following date formats:
− MM/DD/RR
− MM/DD/RR HH24:MI:SS
− MM/DD/YYYY
− MM/DD/YYYY HH24:MI:SS
♦ Do not enclose parameters or variables in quotes. The PowerCenter Server interprets everything after the equal sign as part of the value.
♦ Precede parameters and variables created in mapplets with the mapplet name as follows:
mapplet_name.parameter_name=value
mapplet2_name.variable_name=value
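The date format and quoting guidelines lend themselves to a quick pre-flight check. In the following Python sketch, the listed formats are mapped to strptime codes as an assumption of the sketch. RR is approximated with %y, whose century pivot differs from RR, so the function checks only the shape of a value, not the year the server will infer. A value wrapped in quotes fails the check, which also catches the quoting mistake.

```python
from datetime import datetime

# Assumed Python equivalents of MM/DD/RR, MM/DD/RR HH24:MI:SS,
# MM/DD/YYYY, and MM/DD/YYYY HH24:MI:SS. %y only approximates RR.
DATE_FORMATS = (
    "%m/%d/%y",
    "%m/%d/%y %H:%M:%S",
    "%m/%d/%Y",
    "%m/%d/%Y %H:%M:%S",
)


def check_datetime_value(value):
    """Return True if value matches one of the accepted date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue  # try the next format
    return False


print(check_datetime_value("10/1/2000 00:00:00"))    # True
print(check_datetime_value('"10/1/2000 00:00:00"'))  # False: quoted value
```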


Sample Parameter File
The following text is an excerpt from a parameter file:
[HET_TGTS.WF:wf_TCOMMIT_INST_ALIAS]
$$platform=unix
[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]
$$platform=unix
$DBConnection_ora=qasrvrk2_hp817
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1]
$$DT_WL_lvl_1=02/01/2000 00:00:00
$$Double_WL_lvl_1=2.2
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1.WT:NWL_PARAM_Lvl_2]
$$DT_WL_lvl_2=03/01/2000 00:00:00
$$Int_WL_lvl_2=3
$$String_WL_lvl_2=ccccc
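The headings in this sample follow the formats listed in “Parameter File Format.” The following Python sketch splits a heading into its folder, workflow, worklet, and session parts. It is a hypothetical helper for checking your own files. It assumes that folder, workflow, worklet, and session names contain no periods.

```python
def parse_heading(heading):
    """Split a parameter file heading into its parts (hypothetical helper).

    Handles [folder.WF:workflow], [folder.WF:workflow.WT:worklet...],
    [folder.WF:workflow.ST:session], [folder.session], and [session].
    Assumes folder, workflow, worklet, and session names contain no periods.
    """
    text = heading.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a heading: {heading!r}")
    parts = {"folder": None, "workflow": None, "worklets": [], "session": None}
    untagged = []
    for piece in text[1:-1].split("."):
        if piece.startswith("WF:"):
            parts["workflow"] = piece[3:]
        elif piece.startswith("WT:"):
            parts["worklets"].append(piece[3:])  # nested worklets stay in order
        elif piece.startswith("ST:"):
            parts["session"] = piece[3:]
        else:
            untagged.append(piece)
    if parts["workflow"] is not None and len(untagged) == 1:
        parts["folder"] = untagged[0]
    elif parts["workflow"] is None and len(untagged) == 2:
        parts["folder"], parts["session"] = untagged
    elif parts["workflow"] is None and len(untagged) == 1:
        parts["session"] = untagged[0]
    else:
        raise ValueError(f"unrecognized heading: {heading!r}")
    return parts


print(parse_heading("[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1.WT:NWL_PARAM_Lvl_2]"))
```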


Configuring the Parameter File Location
You can specify the parameter filename and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Select Workflows-Edit.
2. Click the Properties tab.
3. Enter the parameter directory and name in the Parameter Filename field. You can enter either a direct path or a server variable directory. Use the appropriate delimiter for the PowerCenter Server operating system.
4. Click OK.

To enter a parameter file in the session properties:
1. Click the Properties tab and open the General Options settings.
2. Enter the parameter directory and name in the Parameter Filename field. You can enter either a direct path or a server variable directory. Use the appropriate delimiter for the PowerCenter Server operating system.
3. Click OK.
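When a script builds the Parameter Filename value, the delimiter must match the PowerCenter Server operating system rather than the client machine. The following Python sketch uses the standard ntpath and posixpath modules to join with "\" or "/". The helper is hypothetical. $PMRootDir and wf_sales.txt are illustrative values only; substitute the direct path or server variable directory you actually use. The variable is left unexpanded for the server.

```python
import ntpath
import posixpath


def parameter_file_path(server_os, directory, filename):
    """Join a parameter file directory and name with the delimiter of the
    PowerCenter Server operating system (hypothetical helper).

    directory may be a direct path or a server variable directory; the
    variable is left for the server to expand.
    """
    # ntpath joins with "\", posixpath joins with "/".
    joiner = ntpath if server_os.lower() == "windows" else posixpath
    return joiner.join(directory, filename)


print(parameter_file_path("windows", "$PMRootDir", "wf_sales.txt"))  # $PMRootDir\wf_sales.txt
print(parameter_file_path("unix", "$PMRootDir", "wf_sales.txt"))     # $PMRootDir/wf_sales.txt
```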


Troubleshooting
I have a section in a parameter file for a session, but the PowerCenter Server does not seem to read it.
In the parameter file, folder and session names are case-sensitive. Make sure to enter folder and session names exactly as they appear in the Workflow Manager. Also, use the appropriate prefix for all user-defined session parameters. Table 19-2 describes required naming conventions for user-defined session parameters:

Table 19-2. Naming Conventions for User-Defined Session Parameters
Parameter Type | Naming Convention
Database Connection | $DBConnectionName
Reject File | $BadFileName
Source File | $InputFileName
Target File | $OutputFileName
Lookup File | $LookupFileName

I am trying to use a source file parameter to specify a source file and location, but the PowerCenter Server cannot find the source file.
Make sure to clear the Source File Directory field in the session properties. The PowerCenter Server concatenates the source file directory with the source file name to locate the source file. Also, make sure to enter a directory local to the PowerCenter Server and to use the appropriate delimiter for the operating system.

I am trying to run a workflow with a parameter file and one of the sessions keeps failing.
The session might contain a parameter that is not listed in the parameter file. The PowerCenter Server uses the parameter file to start all sessions in the workflow. Check the session properties, then verify that all session parameters are defined correctly in the parameter file.
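The first and last problems above can be caught before a run. The following Python sketch is a hypothetical checker. It flags user-defined session parameter names that lack one of the prefixes in Table 19-2. It also lists required session parameters that the parameter file does not define. Names are compared without regard to case, because session parameter names are not case-sensitive. Built-in session parameters such as $PMSessionLogFile are outside its scope and would be flagged by misnamed().

```python
import re

# Prefixes for user-defined session parameters, from Table 19-2.
PREFIXES = ("$DBConnection", "$BadFile", "$InputFile", "$OutputFile", "$LookupFile")
# A prefix followed by any letters, digits, or underscores; names are not
# case-sensitive, so the match ignores case.
NAME_PATTERN = re.compile(
    "(?:" + "|".join(re.escape(p) for p in PREFIXES) + ")[A-Za-z0-9_]*",
    re.IGNORECASE,
)


def misnamed(names):
    """Return user-defined session parameter names with no valid prefix."""
    return [n for n in names if not NAME_PATTERN.fullmatch(n)]


def undefined(required, defined):
    """Return required session parameters missing from the parameter file."""
    have = {n.lower() for n in defined}
    return [n for n in required if n.lower() not in have]


print(misnamed(["$InputFile1", "$SourceFile1"]))             # ['$SourceFile1']
print(undefined(["$InputFile1", "$BadFileName"], ["$inputfile1"]))  # ['$BadFileName']
```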


Tips
Use a single parameter file to group parameter information for related sessions.
When sessions are likely to use the same database connection or directory, you might want to include them in the same parameter file. When existing systems are upgraded, you can update information for all sessions by editing one parameter file.

Use pmcmd and multiple parameter files for sessions with regular cycles.
When the parameter values for a session change on a regular cycle, you reuse the same sets of values on a regular basis. For example, if you run a session against both the sales and marketing databases once a week, you might want to create a separate parameter file for each regular session run. Then, instead of changing the parameter file in the session properties each time you run the session, use pmcmd to specify the parameter file to use when you start the session.


Chapter 20

External Loading
This chapter covers the following topics:
♦ Overview, 524
♦ External Loader Permissions, 525
♦ External Loader Behavior, 526
♦ Loading to DB2, 528
♦ Loading to Oracle, 533
♦ Loading to Sybase IQ, 535
♦ Loading to Teradata, 538
♦ Creating an External Loader Connection, 551
♦ Configuring External Loading in a Session, 553
♦ Troubleshooting, 557


Overview
You can configure a session to use DB2, Oracle, Sybase IQ, and Teradata external loaders to load session target files into the respective databases. External loaders can increase session performance because these databases can load information directly from files faster than they can run the SQL commands to insert the same data into the database. To use an external loader for a session, you must perform the following tasks:
1. Create an external loader connection in the Workflow Manager and configure the external loader attributes. For details on creating external loader connections, see “Creating an External Loader Connection” on page 551.
2. Configure the session to write to a flat file instead of to a relational database. For more information, see “Configuring a Session to Write to a File” on page 553.
3. Choose an external loader connection for each target file in the session properties. For more information, see “Selecting an External Loader Connection” on page 555.

When you run a session that uses an external loader, the PowerCenter Server creates a control file and a target flat file. The control file contains information about the target flat file, such as data format and loading instructions for the external loader. The control file has an extension of .ctl. You can view the control file and the target flat file in the target file directory (default: $PMTargetFileDir). The PowerCenter Server waits for all external loading to complete before performing post-session commands, external procedures, and sending post-session email. Before you run external loaders, consider the following issues:

♦ Disable constraints. Normally, you disable constraints built into the tables receiving the data before performing the load. Consult your database documentation for instructions on how to disable constraints.
♦ Performance issues. To preserve high performance, you can increase commit intervals and turn off database logging. However, to perform database recovery on failed sessions, you must have database logging turned on.
♦ Code page requirements. DB2, Oracle, Sybase IQ, and Teradata database servers must run in the same code page as the target flat file code page. The external loaders start in the target flat file code page. The PowerCenter Server creates the control and target flat files using the target flat file code page. If you are using a code page other than 7-bit ASCII for the target flat file, run the PowerCenter Server in Unicode data movement mode.

The PowerCenter Server can use multiple external loaders within one session. For example, if the mapping contains two targets, you can create a session that uses different connection types: one uses an Oracle external loader connection and the other uses a Sybase IQ external loader connection.


External Loader Permissions
You can set external loader connection permissions in the connection object in the Workflow Manager. The Workflow Manager assigns Owner permissions to the user who registers the connection. The Workflow Manager grants Owner Group permissions to the first group in the Group Memberships list of the owner. You can manage external loader permissions if you are the owner of the external loader connection or if you have Super User privileges. If you want to edit an external loader connection, you must have read and write permissions for the connection. If you want to run sessions that use a target external loader connection, you must have at least execute permission for the connection.

Permissions and Privileges
To create an external loader connection, you must have one of the following privileges:
♦ Use Workflow Manager
♦ Super User

To configure a session to use an external loader, you must have one of the following sets of privileges and permissions:
♦ Use Workflow Manager privilege and folder read and write permissions
♦ Super User

If you enabled enhanced security, you must also have read permission for external loader connections associated with the session.


External Loader Behavior
The behavior of the external loader depends on how you choose to load the data. You can load data in the following ways:
♦ Loading to named pipes. When you load data to named pipes, the external loader starts to load data to the target database as soon as the data appears in the named pipe.
♦ Staging data using flat files. When you stage data in flat files, the external loader starts to load data to the target databases only after the PowerCenter Server completes writing to the target flat files.

Loading Data Using Named Pipes
On UNIX, the PowerCenter Server writes to a named pipe, which is named after the configured target file name. The external loader starts to load data to the database as soon as the data appears in the named pipe. When you use external loaders on UNIX, the loader deletes the named pipe as soon as it completes the load. On Windows, when you load data using named pipes, the PowerCenter Server writes data to a named pipe using the specified format: \\.\pipe\<pipename> where the pipename is the same as the configured target name. If the PowerCenter Server finds a file or named pipe that uses the same name as the target flat file, it deletes the file or named pipe and recreates it. If the PowerCenter Server on UNIX finds a file or named pipe (with the same name as the session target flat file) in the target directory, it deletes the file or named pipe and recreates the named pipe.
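The pipe naming rules above can be summarized in a small Python sketch. The helper is hypothetical and only builds the path strings. It does not create pipes. Placing the UNIX pipe in the target file directory is an assumption drawn from the statement that the server checks the target directory for a file or pipe with the same name.

```python
import posixpath


def loader_pipe_path(server_os, target_dir, target_name):
    r"""Return the path of the named pipe an external loader reads from.

    Hypothetical helper that models the rules above. On Windows the pipe
    is \\.\pipe\<pipename>, where pipename is the configured target name.
    On UNIX the pipe takes the configured target file name; placing it in
    the target file directory is an assumption of this sketch.
    """
    if server_os.lower() == "windows":
        # "\\\\.\\pipe\\" is the literal prefix \\.\pipe\
        return "\\\\.\\pipe\\" + target_name
    return posixpath.join(target_dir, target_name)


print(loader_pipe_path("windows", "", "orders.out"))        # \\.\pipe\orders.out
print(loader_pipe_path("unix", "/data/tgt", "orders.out"))  # /data/tgt/orders.out
```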
Tip: You may not be able to create a named pipe or file if another file exists that uses the same name. You can rename the output file in the session that uses the external loader.
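The following Python sketch illustrates the UNIX named-pipe behavior described above on a POSIX system. Any existing file with the target name is deleted and replaced by a FIFO. A reader thread, standing in for the external loader, consumes each row as soon as it is written and deletes the pipe when the load completes. This is not how the PowerCenter Server is implemented. The function names and the sample file name are hypothetical, and os.mkfifo is available only on POSIX systems. The windows_pipe_name helper only builds the \\.\pipe\<pipename> string and does not create a Windows pipe.

```python
import os
import tempfile
import threading


def windows_pipe_name(target_file_name):
    # Windows named pipes use the \\.\pipe\<pipename> namespace.
    return "\\\\.\\pipe\\" + target_file_name


def recreate_pipe(path):
    """Delete any file or pipe already at path, then create a new FIFO (POSIX only)."""
    if os.path.lexists(path):
        os.remove(path)
    os.mkfifo(path)


def loader(path, loaded):
    # Stand-in for the external loader: reads each row as soon as it appears,
    # then deletes the pipe once the writer closes its end (EOF).
    with open(path) as pipe:
        for line in pipe:
            loaded.append(line.rstrip("\n"))
    os.remove(path)


def load_through_pipe(directory, target_file_name, rows):
    path = os.path.join(directory, target_file_name)
    recreate_pipe(path)
    loaded = []
    reader = threading.Thread(target=loader, args=(path, loaded))
    reader.start()
    # Opening a FIFO for writing blocks until the reader opens it for reading.
    with open(path, "w") as pipe:
        for row in rows:
            pipe.write(row + "\n")
    reader.join()
    return loaded


if __name__ == "__main__":
    target_dir = tempfile.mkdtemp()
    print(load_through_pipe(target_dir, "orders.out", ["1,a", "2,b"]))
```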

Staging Data to Flat Files
When you stage data using flat files, the external loader starts loading data to target databases only after the PowerCenter Server completes writing to the target flat files. The external loader does not delete the target flat files after loading them to the database. Make sure the target file directory can accommodate the size of the target flat files. If the session contains fatal errors, the PowerCenter Server does not finish writing data to the target files, and the external loader does not start.
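For comparison, this sketch models the staged path. Every row is written to the target flat file before the loader callback runs, a failure while writing means the loader never starts, and the staged file stays on disk. The has_room check reflects the advice to make sure the target file directory can hold the files. The function names and the load callback are hypothetical stand-ins for the external loader.

```python
import os
import shutil
import tempfile


def has_room(directory, expected_bytes):
    """Return True if the target file directory has at least expected_bytes free."""
    return shutil.disk_usage(directory).free >= expected_bytes


def stage_then_load(path, rows, load):
    """Write all rows to the staging file, then hand them to load.

    If writing fails, the file is incomplete and load is never called.
    The staged file is not deleted after the load.
    """
    try:
        with open(path, "w") as staged:
            for row in rows:
                staged.write(row + "\n")
    except Exception:
        return False
    with open(path) as staged:
        load([line.rstrip("\n") for line in staged])
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "orders.out")
    loaded = []
    if has_room(os.path.dirname(target), 1024):
        stage_then_load(target, ["1,a", "2,b"], loaded.extend)
    print(loaded, os.path.exists(target))
```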

Partitioning Sessions with External Loaders
When you configure multiple partitions in a session with a flat file target, the PowerCenter Server creates a separate flat file for each partition. Some external loaders cannot load data from multiple files into the target. When you use an external loader in a session with multiple partitions, you must configure partitioning according to the external loader you use.

Chapter 20: External Loading

When you use an external loader that can load data from multiple files, you can create multiple partitions in the session. You choose an external loader connection for each partition. The PowerCenter Server creates an output file for each partition, and the external loader loads the output from each target file to the database. If you use a loader that cannot load from multiple files, the session fails. Table 20-1 lists the external loaders and loader behavior:
Table 20-1. Partitioning Guidelines for External Loaders

External Loader | Load Behavior
DB2 EE db2load | Cannot load from multiple output files.
DB2 EEE autoloader | Cannot load from multiple output files.*
Oracle | Behavior based on parallel load configuration: - Disabled. Cannot load from multiple output files. - Enabled. Can load from multiple output files.
Sybase IQ | Cannot load from multiple output files.
Teradata MultiLoad | Cannot load from multiple output files.
Teradata TPump | Can load from multiple output files.
Teradata FastLoad | Cannot load from multiple output files.
Teradata Warehouse Builder | Can load from multiple output files.

*The PowerCenter Server cannot pass multiple output files to the DB2 EEE autoloader.
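Table 20-1 can be encoded as a lookup that checks a session's partition count before the session runs. This is a hypothetical validation helper, not a PowerCenter API. The dictionary keys are informal names for the rows in the table.

```python
# Load behavior from Table 20-1: True means the loader can read multiple output files.
MULTI_FILE_SUPPORT = {
    "DB2 EE": False,
    "DB2 EEE": False,
    "Sybase IQ": False,
    "Teradata MultiLoad": False,
    "Teradata TPump": True,
    "Teradata FastLoad": False,
    "Teradata Warehouse Builder": True,
}


def session_can_run(loader, partitions, oracle_parallel_load=False):
    """Return False when a multi-partition session would fail with this loader."""
    if partitions <= 1:
        return True
    if loader == "Oracle":
        # Oracle depends on the Enable Parallel Load setting of the connection.
        return oracle_parallel_load
    return MULTI_FILE_SUPPORT[loader]
```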

Errors and Error Messages
The PowerCenter Server writes external loader initialization and completion messages in the session log. For details on external loader performance, check the external loader log. The loader saves the log in the same directory as the target flat files (default location: $PMTargetFileDir). The default extension for external loader logs is .ldrlog.

Loading to DB2
The DB2 EE external loader and DB2 EEE external loader can perform insert and replace operations on targets. The external loaders can also restart or terminate load operations.

The DB2 EE external loader invokes the db2load executable located in the PowerCenter Server installation directory. The DB2 EE external loader can load data to a DB2 server on a machine that is remote to the PowerCenter Server.

The DB2 EEE external loader invokes the IBM DB2 Autoloader program, which uses the db2atld executable, to load data. The DB2 EEE external loader can partition data and load the partitioned data simultaneously to the corresponding database partitions. When you use the DB2 EEE external loader, the PowerCenter Server and the DB2 EEE server must be on the same machine.

The DB2 external loaders load from a delimited flat file. Verify that the target table columns are wide enough to store all of the data. If you select a DB2 loader in a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see “Partitioning Sessions with External Loaders” on page 526.

If you configure multiple targets in the same pipeline to use DB2 external loaders, each loader must load to a different tablespace on the target database. For information on selecting external loaders, see “Configuring External Loading in a Session” on page 553.
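The DB2 rules above can be checked mechanically: no multiple partitions, a distinct tablespace for each DB2 loader in the same pipeline, and a DB2 EEE server on the PowerCenter Server machine. The Db2Target record and its fields are hypothetical. PowerCenter does not expose such an object, so this is only a sketch of a pre-flight check.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Db2Target:
    # Hypothetical description of one session target in a pipeline.
    name: str
    loader: str      # "DB2 EE" or "DB2 EEE"
    tablespace: str
    db_host: str


def db2_config_errors(targets, powercenter_host, partitions):
    errors = []
    if partitions > 1:
        errors.append("DB2 external loaders do not support multiple partitions")
    # Each DB2 loader in the same pipeline must use its own tablespace.
    for space, count in Counter(t.tablespace for t in targets).items():
        if count > 1:
            errors.append(f"{count} targets load to tablespace {space}")
    # DB2 EE may be remote; DB2 EEE must share the PowerCenter Server machine.
    for t in targets:
        if t.loader == "DB2 EEE" and t.db_host != powercenter_host:
            errors.append(f"{t.name}: DB2 EEE server must be on the PowerCenter Server machine")
    return errors
```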

Setting DB2 External Loader Operation Modes
DB2 operation modes specify the type of load the external loader runs. You can configure the DB2 EE or DB2 EEE external loader to run in one of the following operation modes:
♦ Insert. Adds loaded data to the table without changing existing table data.
♦ Replace. Deletes all existing data from the table, and inserts the loaded data. The table and index definitions do not change.
♦ Restart. Restarts a previously interrupted load operation.
♦ Terminate. Terminates a previously interrupted load operation and rolls back the operation to the starting point, even if consistency points were passed. The tablespaces return to normal state, and all table objects are made consistent.

Configuring Authorities, Privileges, and Permissions
When you load data to a DB2 database using the DB2 EE or DB2 EEE external loader, you must have the correct authority levels and privileges to load data to the database tables.

DB2 privileges allow you to create or access database resources. Authority levels provide a method of grouping privileges and higher-level database manager maintenance and utility operations. Together, these act to control access to the database manager and its database objects. You can access objects for which you have the required privilege or authority. To load data into a table, you must have one of the following authorities:
♦ SYSADM authority
♦ DBADM authority
♦ LOAD authority on the database, and one of the following privileges:
  - INSERT privilege on the table when the load utility is invoked in INSERT mode, TERMINATE mode (to terminate a previous load insert operation), or RESTART mode (to restart a previous load insert operation)
  - INSERT and DELETE privilege on the table when the load utility is invoked in REPLACE mode, TERMINATE mode (to terminate a previous load replace operation), or RESTART mode (to restart a previous load replace operation)
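Here is a minimal sketch of the authority rules listed above. It assumes that authorities and privileges are given as sets of names. For RESTART and TERMINATE, the required privileges follow the mode of the interrupted load, which the caller passes as previous_mode. The function name and parameters are illustrative only. DB2 itself enforces these checks.

```python
def can_load(authorities, privileges, mode, previous_mode=None):
    """Check DB2 load authority for mode INSERT, REPLACE, RESTART, or TERMINATE.

    previous_mode is the mode (INSERT or REPLACE) of the interrupted load
    when mode is RESTART or TERMINATE.
    """
    if "SYSADM" in authorities or "DBADM" in authorities:
        return True
    if "LOAD" not in authorities:
        return False
    effective = previous_mode if mode in ("RESTART", "TERMINATE") else mode
    if effective == "INSERT":
        needed = {"INSERT"}
    elif effective == "REPLACE":
        needed = {"INSERT", "DELETE"}
    else:
        raise ValueError(f"cannot determine required privileges for {mode!r}")
    return needed <= set(privileges)
```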

In addition, you must have proper read access and read/write permissions:
♦ The database instance owner must have read access to the external loader input files.
♦ If you run DB2 as a service on Windows, you must configure the service start account with a user account that has read/write permissions to use LAN resources, including drives, directories, and files.
♦ If you load to DB2 EEE, the database instance owner must have write access to the load dump file and the load temporary file.

For more information, consult your IBM DB2 database documentation.

Configuring DB2 EE External Loader Attributes
Table 20-2 describes attributes for DB2 EE external loader connections:
Table 20-2. DB2 EE External Loader Attributes

Attribute | Default Value | Description
Opmode | Insert | The DB2 external loader operation mode. Choose one of the following operation modes: Insert, Replace, Restart, or Terminate. For more information about DB2 operation modes, see “Setting DB2 External Loader Operation Modes” on page 528.
External Loader Executable | db2load | The name of the DB2 EE external loader executable file.
DB2 Server Location | Remote | The location of the DB2 EE database server relative to the PowerCenter Server. Select Local if the DB2 EE database server resides on the PowerCenter Server machine. Select Remote if the DB2 EE Server resides on another machine.
Is Staged | Disabled | The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see “Loading Data Using Named Pipes” on page 526 or “Staging Data to Flat Files” on page 526.
Recoverable | Enabled | Sets tablespaces in backup pending state if forward recovery is enabled. If you disable forward recovery, the DB2 tablespace is not set to backup pending state. If the DB2 tablespace is in backup pending state, you must fully back up the database before you perform any other operation on the tablespace.

DB2 EE External Loader Return Codes
The DB2 EE external loader indicates the success or failure of a load operation with a return code. The PowerCenter Server writes the external loader return code to the session log. Return code 0 indicates that the load operation succeeded. The PowerCenter Server writes the following message to the session log if the external loader successfully completes the load operation:
WRT_8029 External loader process <external loader name> exited successfully.

Any other return code indicates that the load operation failed. The PowerCenter Server writes the following error message to the session log:
WRT_8047 Error: External loader process <external loader name> exited with error <return code>.

Table 20-3 describes the return codes for the DB2 EE external loader:
Table 20-3. DB2 EE External Loader Return Codes

Code | Description
0 | The external loader operation completed successfully.
1 | The external loader cannot locate the control file.
2 | The external loader could not open the external loader log file.
3 | The external loader could not access the control file because the control file is locked by another process.
4 | The DB2 database returned an error.
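The return-code handling can be sketched as a lookup plus the two session log messages shown above. The dictionary restates Table 20-3. The function names and the "Unknown return code." fallback for codes the table does not list are hypothetical additions.

```python
# Return codes from Table 20-3.
DB2_EE_RETURN_CODES = {
    0: "The external loader operation completed successfully.",
    1: "The external loader cannot locate the control file.",
    2: "The external loader could not open the external loader log file.",
    3: "The external loader could not access the control file because "
       "the control file is locked by another process.",
    4: "The DB2 database returned an error.",
}


def session_log_message(loader_name, return_code):
    # Same text as the WRT_8029 and WRT_8047 messages shown above.
    if return_code == 0:
        return f"WRT_8029 External loader process {loader_name} exited successfully."
    return (f"WRT_8047 Error: External loader process {loader_name} "
            f"exited with error {return_code}.")


def explain(return_code):
    return DB2_EE_RETURN_CODES.get(return_code, "Unknown return code.")
```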

Configuring DB2 EEE External Loader Attributes
You can configure the DB2 EEE external loader to use different loading modes when loading to the database. Loading modes determine how the DB2 EEE external loader loads data across partitions in the database. You can configure the DB2 EEE external loader to use the following loading modes:
♦ Split and load. The DB2 EEE external loader partitions the data and loads it simultaneously on the corresponding database partitions.
♦ Split only. The DB2 EEE external loader partitions the data and writes the output to files in the specified split file directory.
♦ Load only. The DB2 EEE external loader does not partition the data. It loads data in existing split files on the corresponding database partitions.
♦ Analyze. The DB2 EEE external loader generates an optimal partitioning map with even distribution across all database partitions. If you run the external loader in split and load mode after you run it in analyze mode, the external loader uses the optimal partitioning map to partition the data.

For more information about DB2 loading modes, consult your DB2 database documentation.

The DB2 EEE external loader also writes multiple external loader logs. The number of external loader logs depends on the number of database partitions to which the external loader loads data. For each partition, the external loader appends a number corresponding to the partition number to the external loader log file name. The DB2 EEE external loader log file format is file_name.ldrlog.partition_number.

The PowerCenter Server does not archive or overwrite DB2 EEE external loader logs. If an external loader log of the same name exists when the external loader runs, the external loader appends new external loader log messages to the end of the existing external loader log file. You must manually archive or delete the external loader log files. For details on log files generated by the DB2 Autoloader, consult your DB2 documentation. For information on DB2 EEE external loader return codes, consult your DB2 documentation.

Table 20-4 describes attributes for DB2 EEE external loader connections:
Table 20-4. DB2 EEE External Loader Attributes

Attribute | Default Value | Description
Opmode | Insert | The DB2 external loader operation mode. Choose one of the following operation modes: Insert, Replace, Restart, or Terminate. For more information about DB2 operation modes, see “Setting DB2 External Loader Operation Modes” on page 528.
External Loader Executable | db2atld | The name of the DB2 EEE external loader executable file.
Split File Location | n/a | The location of the split files. The external loader creates split files if you configure SPLIT_ONLY loading mode.
Output Nodes | n/a | The database partitions on which the load operation is to be performed.
Split Nodes | n/a | The database partitions that determine how to split the data. If you do not specify this attribute, the external loader automatically determines an optimal splitting method.
Mode | Split and load | The loading mode the external loader uses to load the data. Choose one of the following loading modes: Split and load, Split only, Load only, or Analyze.
Max Num Splitters | 25 | Maximum number of splitter processes.
Force | No | Forces the external loader operation to continue even if it determines at startup time that some target partitions or tablespaces are offline.
Status Interval | 100 | Number of megabytes of data the external loader loads before writing a progress message to the external loader log. You can specify a value between 1 and 4,000 MB.
Ports | 6000-6063 | The range of TCP ports the external loader uses to create sockets for internal communications with the DB2 server.
Check Level | Nocheck | Specifies whether the external loader should check for record truncation during input or output.
Map File Input | n/a | The name of the file that specifies the partitioning map. If you want to use a customized partitioning map, you must specify this attribute. You can generate a customized partitioning map when you run the external loader in Analyze loading mode.
Map File Output | n/a | The name of the partitioning map when you run the external loader in Analyze loading mode. You must specify this attribute if you want to run the external loader in Analyze loading mode.
Trace | 0 | The number of rows the external loader traces when you need to review a dump of the data conversion process and output of hashing values.
Is Staged | Disabled | The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see “Loading Data Using Named Pipes” on page 526 or “Staging Data to Flat Files” on page 526.
Date Format | mm/dd/yyyy | The date format. The date format in the Connection Object definition must match the date format you define in the target definition. DB2 supports the following date formats: mm/dd/yyyy, yyyy-mm-dd, and dd.mm.yyyy.
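The following sketch shows the DB2 EEE log naming and append behavior described earlier, together with two attribute checks from Table 20-4: the Status Interval range and the supported date formats. The helper names are hypothetical, and file_name stands for whatever log file name the loader uses.

```python
import os
import tempfile

# Date formats listed for the Date Format attribute in Table 20-4.
DB2_DATE_FORMATS = {"mm/dd/yyyy", "yyyy-mm-dd", "dd.mm.yyyy"}


def partition_log_names(file_name, partition_numbers):
    """One log per database partition: file_name.ldrlog.partition_number."""
    return [f"{file_name}.ldrlog.{n}" for n in partition_numbers]


def append_log(path, message):
    # Mode "a" appends to an existing log instead of overwriting it,
    # which is why old logs must be archived or deleted by hand.
    with open(path, "a") as log:
        log.write(message + "\n")


def eee_attribute_errors(status_interval_mb, date_format):
    errors = []
    if not 1 <= status_interval_mb <= 4000:
        errors.append("Status Interval must be between 1 and 4,000 MB")
    if date_format not in DB2_DATE_FORMATS:
        errors.append(f"unsupported date format {date_format!r}")
    return errors


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    for name in partition_log_names("orders", [0, 1]):
        append_log(os.path.join(log_dir, name), "load started")
    print(sorted(os.listdir(log_dir)))
```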

Loading to Oracle
The Oracle SQL loader can perform insert, update, and delete operations on targets. The target flat file for an Oracle external loader can be fixed-width or delimited.

Loading Multibyte Data to Oracle
When you load multibyte data to Oracle, data precision is measured in bytes for fixed-width files and in characters for delimited files. Make sure the target table columns are wide enough to store all the data without risking data truncation. To widen the columns, increase the column size in the target table definition. Oracle supports character-oriented datatypes, such as Nchar, where the precision is measured in characters. If you use the Nchar datatype, multiply the maximum number of characters by K, where K is the maximum number of bytes a character contains in the selected target code page. This ensures that the PowerCenter Server does not truncate data before loading the target file.
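Here is a worked example of the sizing rule. If each character can take up to K bytes in the target code page, a column that must hold N characters needs N * K bytes. The sketch uses K = 3 purely for illustration and UTF-8 for the byte count. Use the actual maximum character width of your target code page. The function names are hypothetical.

```python
def required_bytes(max_chars, k):
    # Bytes needed to store max_chars characters when each character
    # can take up to k bytes in the target code page.
    return max_chars * k


def fits(value, precision, fixed_width, encoding="utf-8"):
    # Fixed-width files measure precision in bytes; delimited files in characters.
    size = len(value.encode(encoding)) if fixed_width else len(value)
    return size <= precision


# Example: two Japanese characters are 2 characters but 6 bytes in UTF-8.
sample = "日本"
```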
Note: If you configure a session to write to an Oracle 8 table in bulk mode with NOT NULL constraints on any columns, the session may write null data into a NOT NULL column.

Oracle External Loader Attributes
Use the following guidelines when you enter attributes for the Oracle external loader connection:

♦ If you select an Oracle external loader, the default external loader executable name is SQLLOAD. This is accurate for most UNIX platforms, but if you use Windows, check your Oracle documentation to find the name of the external loader executable.
♦ Select Do Not Enable Parallel Load to write to a non-partitioned Oracle target table.
♦ To write to a partitioned Oracle target using Direct Path, you must select Enable Parallel Load and Append load mode.
♦ To write to a partitioned Oracle target using Conventional Path, select Enable Parallel Load for best performance.

Tip: For optimal performance, select Direct Path when writing to a partitioned Oracle target.

For details, see your Oracle documentation.
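The parallel-load guidelines above reduce to a small decision function. This hypothetical helper returns the setting names used in this section. It does not configure anything in the Workflow Manager.

```python
def oracle_loader_settings(target_partitioned, direct_path):
    """Pick Parallel Load (and Load Mode when required) per the guidelines above."""
    if not target_partitioned:
        return {"parallel": "Do Not Enable Parallel Load"}
    if direct_path:
        # Direct Path to a partitioned target requires parallel load and Append mode.
        return {"parallel": "Enable Parallel Load", "load_mode": "Append"}
    # Conventional Path to a partitioned target: parallel load for best performance.
    return {"parallel": "Enable Parallel Load"}
```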

Table 20-5 describes the attributes for Oracle external loader connections:
Table 20-5. Oracle External Loader Attributes

Attribute | Default Value | Description
Error Limit | 1 | Number of errors to allow before the external loader stops the load operation.
Load Mode | Append | The loading mode the external loader uses to load data. Choose from one of the following loading modes: Append, Insert, Replace, or Truncate.
Load Method | Use Conventional Path | The method the external loader uses to load data. Choose from one of the following load methods: Use Conventional Path, Use Direct Path (Recoverable), or Use Direct Path (Unrecoverable).
Enable Parallel Load | Enable Parallel Load | Determines whether the Oracle external loader loads data in parallel to a partitioned Oracle target table. Choose either Enable Parallel Load or Do Not Enable Parallel Load. You can create multiple partitions in a session if you use a loader configured to enable parallel load. Sessions with multiple partitions fail if you use a loader configured not to enable parallel load. For more information, see “Partitioning Sessions with External Loaders” on page 526.
Rows Per Commit | 10000 | For Conventional Path load method, this attribute specifies the number of rows in the bind array for load operations. For Direct Path load methods, this attribute specifies the number of rows the external loader reads from the target flat file before it saves the data to the database.
External Loader Executable | sqlload | The name of the external loader executable file.
Log File Name | n/a | The path and name of the external loader log file.
Is Staged | Disabled | The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see “Loading Data Using Named Pipes” on page 526 or “Staging Data to Flat Files” on page 526.
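The sketch below relates the attributes in Table 20-5 to SQL*Loader itself. It builds the header of a SQL*Loader control file for a delimited target using standard SQL*Loader keywords: ERRORS and ROWS in the OPTIONS clause, DIRECT=TRUE for the direct path methods, UNRECOVERABLE for the unrecoverable direct path, and the load mode keyword before INTO TABLE. It is an assumption-based illustration, not the control file the PowerCenter Server generates. The function name is hypothetical; verify the syntax against the SQL*Loader documentation for your Oracle version.

```python
def control_file_header(data_file, table, error_limit=1, rows_per_commit=10000,
                        load_mode="Append", load_method="Use Conventional Path",
                        delimiter=","):
    direct = load_method.startswith("Use Direct Path")
    options = [f"ERRORS={error_limit}", f"ROWS={rows_per_commit}"]
    if direct:
        options.append("DIRECT=TRUE")
    # UNRECOVERABLE applies only to direct path loads.
    recover = "UNRECOVERABLE " if load_method.endswith("(Unrecoverable)") else ""
    return "\n".join([
        f"OPTIONS ({', '.join(options)})",
        f"{recover}LOAD DATA",
        f"INFILE '{data_file}'",
        load_mode.upper(),            # APPEND, INSERT, REPLACE, or TRUNCATE
        f"INTO TABLE {table}",
        f"FIELDS TERMINATED BY '{delimiter}'",
    ])
```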

Reject File
The Oracle external loader creates a reject file for data rejected by the database. The reject file has an extension of .ldrreject. The loader saves the reject file in the target files directory (default location: $PMTargetFileDir).

Loading to Sybase IQ
The Sybase external loader can perform insert operations on Sybase IQ targets. It cannot perform update or delete operations on targets. Use the following rules and guidelines when you work with a Sybase IQ external loader:
♦ Ensure that target tables do not violate primary key constraints.
♦ Configure a Sybase IQ user with read/write access before you use a Sybase IQ external loader.
♦ Target flat files for a Sybase IQ external loader can be fixed-width or delimited.
♦ The PowerCenter Server can load multibyte data to Sybase IQ targets.
♦ If you select a Sybase IQ external loader in a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see “Partitioning Sessions with External Loaders” on page 526.
♦ If the PowerCenter Server and Sybase IQ Server are on different machines, map a drive from the machine hosting the PowerCenter Server to the machine hosting the Sybase IQ Server. In a UNIX environment, mount the drive.

Using Sybase IQ External Loader on UNIX
For Sybase IQ external loaders, the PowerCenter Server can write to a named pipe if the PowerCenter Server is local to the Sybase IQ database. Use pmconfig to enable the SybaseIQLocaltoPMServer option. If the PowerCenter Server is not local to the Sybase IQ database server or if you do not enable the option, the PowerCenter Server writes to a flat file.

Loading Multibyte Data to Sybase IQ
When you load multibyte data to Sybase IQ targets, consider the following issues involving data precision and delimiters.

Fixed-Width Flat File Targets
If you plan to load multibyte data into a fixed-width flat file target, configure the precision to accommodate the multibyte data. Fixed-width files are byte-oriented, not character-oriented, so when you configure the precision for a fixed-width target, you need to consider the number of bytes you load into the target rather than the number of characters. The PowerCenter Server writes the row to the reject file if the precision is not large enough to accommodate the multibyte data. For more information about writing to flat files, see “Working with File Targets” on page 261.

Delimited Flat File Targets
For delimited flat files, data precision is measured in characters. When you insert multibyte character data in the target, you do not need to allow for additional precision for multibyte data. Sybase IQ does not allow optional quotes. You must choose None for Optional Quotes if you have a delimited target flat file. When you load multibyte data to Sybase IQ targets, null characters and delimiters can be up to four bytes each. To avoid reading the delimiters as regular characters, each byte of the delimiter must have an ASCII value of less than 0x40. For details on loading multibyte data to targets, see “Working with File Targets” on page 261.
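The two Sybase IQ multibyte rules can be sketched as checks, assuming UTF-8 as the target encoding. A fixed-width value must fit its precision in bytes. A delimiter may be at most four bytes, and every byte must be below 0x40. The function names are hypothetical.

```python
def fixed_width_fits(value, precision, encoding="utf-8"):
    # Fixed-width precision counts bytes; rows that do not fit go to the reject file.
    return len(value.encode(encoding)) <= precision


def delimiter_ok(delimiter, encoding="utf-8"):
    """A delimiter of one to four bytes, each with a value below 0x40."""
    raw = delimiter.encode(encoding)
    return 1 <= len(raw) <= 4 and all(b < 0x40 for b in raw)
```

For example, a comma (0x2C) or a tab (0x09) passes, while a pipe (0x7C) fails because its byte value is above 0x40.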

Sybase IQ External Loader Attributes
Use the following guidelines when you enter attributes for the Sybase IQ external loader connection:

♦ The connect string must contain the following attributes:
  uid=user ID; pwd=password; eng=Sybase IQ database server name; links=tcpip; (host=host name; port=port number)
♦ The server datafile directory is relative to the database server. If the directory is on a Windows system, use backslashes (\) in the directory path:
  D:\mydirectory\inputfile.out
  If the directory is on a UNIX system, use forward slashes (/):
  /mydirectory/inputfile.out
♦ When you create a Sybase IQ external loader connection, the Workflow Manager sets the name of the external loader executable file to dbisql by default. If you use an executable file with a different name, for example, dbisqlc, you must update the External Loader Executable field. If the external loader executable file directory is not in the system path, you must enter the file path and file name in this field.
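The sketch below assembles the connect string attributes listed above. It also builds the server datafile path with the path syntax of the machine that hosts the Sybase IQ server. The links value is written in the tcpip(host=...;port=...) form used by Sybase connection strings, which is an assumption here, so confirm the exact punctuation in your Sybase IQ documentation. The function names and sample values are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def sybase_connect_string(user, password, engine, host, port):
    # Assumed punctuation for the links attribute; verify against Sybase IQ docs.
    return f"uid={user};pwd={password};eng={engine};links=tcpip(host={host};port={port})"


def server_datafile_path(directory, file_name, server_os):
    # Use the syntax of the Sybase IQ server machine, not the PowerCenter Server machine.
    path_type = PureWindowsPath if server_os == "windows" else PurePosixPath
    return str(path_type(directory) / file_name)
```

For example, if the PowerCenter Server runs on Windows but the Sybase IQ Server runs on UNIX, you would call server_datafile_path with server_os="unix" to get a forward-slash path.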

Table 20-6 describes the attributes for Sybase IQ external loader connections:
Table 20-6. Sybase IQ External Loader Attributes

Attribute | Default Value | Description
Block Factor | 10000 | The number of records per block in the target Sybase table. The external loader applies the Block Factor attribute to load operations for fixed-width flat file targets only.
Block Size | 50000 | The size of blocks used in Sybase database operations. The external loader applies the Block Size attribute to load operations for delimited flat file targets only.
Checkpoint | Enabled | If enabled, the Sybase IQ database issues a checkpoint after successfully loading the table. If disabled, the database issues no checkpoints.
Notify Interval | 1000 | The number of rows the Sybase IQ external loader loads before it writes a status message to the external loader log.
Server Datafile Directory | n/a | The location of the flat file target. You must specify this attribute relative to the database server installation directory. Enter the target file directory path using the syntax for the machine hosting the database server installation. For example, if the PowerCenter Server is on a Windows machine and the Sybase IQ Server is on a UNIX machine, use UNIX syntax.
External Loader Executable | dbisql | The name of the Sybase IQ external loader executable.
Is Staged | Enabled | The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see “Loading Data Using Named Pipes” on page 526 or “Staging Data to Flat Files” on page 526.

Loading to Teradata
When you load to Teradata, you can use the following external loaders:

♦ MultiLoad. Performs insert, update, delete, and upsert operations for large volume incremental loads. You can use this loader when you run a session with a single partition. MultiLoad acquires table level locks, making it appropriate for offline loading. For more information about configuring the MultiLoad external loader connection object, see “Teradata MultiLoad External Loader Attributes” on page 540.
♦ TPump. Performs insert, update, delete, and upsert operations for relatively low volume updates. You can use this loader when you run a session with multiple partitions. TPump acquires row-hash locks on the table, allowing other users to access the table as TPump loads to it. For more information about configuring the TPump external loader connection object, see “Teradata TPump External Loader Attributes” on page 542.
♦ FastLoad. Performs insert operations for high volume initial loads, or for high volume truncate and reload operations. You can use this loader when you run a session with a single partition. You can only use this loader on empty tables with no secondary indexes. For more information about configuring the FastLoad external loader connection object, see “Teradata FastLoad External Loader Attributes” on page 545.
♦ Warehouse Builder. Performs insert, update, upsert, and delete operations on targets. You can use this loader when you run a session with multiple partitions. You can achieve the functionality of the other loaders based on the operator you use. For more information about configuring the Warehouse Builder external loader connection object, see “Teradata Warehouse Builder External Loader Attributes” on page 547.
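The loader descriptions above can be turned into a rough filter that lists the Teradata loaders compatible with an operation and a partition count. This hypothetical helper ignores load volume and locking behavior, which the descriptions also weigh. Treat its output as a starting point, not a recommendation.

```python
# Capabilities summarized from the loader descriptions above.
TERADATA_LOADERS = {
    "MultiLoad": {"ops": {"insert", "update", "delete", "upsert"}, "multi_partition": False},
    "TPump": {"ops": {"insert", "update", "delete", "upsert"}, "multi_partition": True},
    "FastLoad": {"ops": {"insert"}, "multi_partition": False},
    "Warehouse Builder": {"ops": {"insert", "update", "delete", "upsert"}, "multi_partition": True},
}


def candidate_loaders(operation, partitions, table_empty=False, has_secondary_indexes=False):
    result = []
    for name, caps in TERADATA_LOADERS.items():
        if operation not in caps["ops"]:
            continue
        if partitions > 1 and not caps["multi_partition"]:
            continue
        if name == "FastLoad" and (not table_empty or has_secondary_indexes):
            continue  # FastLoad needs an empty table with no secondary indexes
        result.append(name)
    return result
```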

If you use a Teradata external loader to perform update or upsert, you can use the Target Update Override option in the Mapping Designer to override the UPDATE statement in the external loader control file. For upsert, the INSERT statement in the external loader control file remains unchanged. For details on using the Target Update Override option, see “Mappings” in the Designer Guide. Use the following guidelines when you use the Teradata external loaders:
♦ The PowerCenter Server can use Teradata external loaders to load fixed-width flat files to a Teradata database.
♦ The target output file name, including the file extension, must not exceed 27 characters. If the session contains multiple partitions, the target output file name, including the file extension, must not exceed 25 characters.
♦ You cannot use spaces as null characters.
♦ You can use the Teradata external loaders to load multibyte data.
♦ You cannot use the Teradata external loaders to load binary data.
♦ When you load to Teradata using named pipes, set the checkpoint value to 0 to prevent external loaders from performing checkpoint operations.
♦ When you edit a session, you can specify error, log, or work table names, depending on the loader you use. You can also specify error, log, or work database names.
♦ When you edit a session, you can override the control file in the loader connection properties.
♦ You can view the Teradata control file in the target directory.

See the Teradata documentation for more information about the loaders.
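Several of the Teradata guidelines are mechanical and can be checked before a session runs. The sketch below is a hypothetical pre-flight check for three of them: the output file name length, the null character, and the checkpoint value when loading through named pipes.

```python
def teradata_file_errors(output_file_name, partitions, null_char, use_named_pipe, checkpoint):
    errors = []
    # File name limit includes the extension: 27 characters, or 25 with multiple partitions.
    limit = 25 if partitions > 1 else 27
    if len(output_file_name) > limit:
        errors.append(f"output file name exceeds {limit} characters")
    if null_char == " ":
        errors.append("spaces cannot be used as null characters")
    if use_named_pipe and checkpoint != 0:
        errors.append("set checkpoint to 0 when loading through named pipes")
    return errors
```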

Overriding the Control File
When you edit the loader connection in a session, you can override the control file. You might want to override the control file to change some loader properties that you