Informatica – the basics

Trainer: Muhammed Naufal

© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Purpose of the training
• The training is designed to get you started using PowerCenter, not to make you an expert
• You'll know how to:
− Create logical and physical data flows
− Design some simple transformations
− Choose the appropriate transformation for your processing
− Schedule and execute jobs
− Examine runtime log files
− Debug transformations and solve data quality issues
− Run Informatica logic from the command line
− Manage security in PowerCenter

Agenda – Day 1
• Introduction
• Software Installation and Configuration
• DW/BI Basics – Database, Data Warehouse, OLAP/BI, Enterprise Data Warehouse Architecture, ETL & Data Integration, fitting Informatica ETL into the data warehouse architecture
• Informatica Architecture
• Brief description of the Informatica tool & components (Repository Administrator, Repository Manager, Workflow Manager, Workflow Monitor, Designers, Repository Server, Repository Agent, Informatica Server)
• LAB: Informatica Installation & Configuration

Agenda – Day 2 & 3

• ETL Components
− Designer (Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer & Mapping Designer)
− Workflow Manager (Task Developer, Worklet Designer, Workflow Designer) and Workflow Monitor
− Repository Manager & Repository Server
• Informatica Administration – Basics
• LAB: Informatica Administration

Agenda – Day 3 & 4
• Transformation Classifications
− Active/Passive
− Re-usable
− Connected/Un-Connected
• Transformations and Properties
− Source Qualifier, Expression, Lookup, Router, SeqGen, Update Strategy, Targets, Joiner, Filter, Aggregator, Sorter
• LAB: Transformations – Demo

Agenda – Day 5
• Mapplets and Mappings
− Mapping Design
− Mapping Development
− Mapping Parameters and Variables
− Incorporating Shortcuts
− Using the Debugger
− Re-usable Transformations & Mapplets
− Importing Sources & Targets
− Versioning – Overview
• LAB: Mapplet & Mapping Designing

Agenda – Day 6
• Development of Mappings – Sample mappings
• LAB: Mapping designing for flat file loading, DB table loading, etc.

Agenda – Day 7
• Workflow and Properties
− Workflow Manager
− Tasks (Assignment, Command, Decision, Control, Event, Timer)
− Session (Re-usable or Local)
− Worklets
− Workflow Design
− Session Configuration (Pre/Post Sessions, Parameters & Variables, Memory Properties, Files & Directories, Log/Error Handling, Override/revert properties)
− Workflow Execution & Monitoring
− Workflow recovery principles
− Task recovery strategy
− Workflow recovery options
− Command line execution
• LAB: Workflow Designing, Task Designing, Worklet Designing, Scheduling, Emailing, Source/Target Connections, Sample Workflows, etc.

Agenda – Day 8
• Advanced Topics
− Revisit Informatica Architecture
− Server Configurations (pmServer/pmRepServer)
− Memory Management
− Caches (Lookups, Aggregators, Types of Caches)
− Performance Tuning
− Release Management & Versioning
− Repository Metadata Overview
• LAB: Performance Tuning demo, Release Management demo, Metadata querying, etc.

Agenda – Day 9
• GMAS – ETL Process & Development Methodology
− Design/Development Guidelines, Checklists, Best Practices & references
− my.Informatica.com, Informatica discussion groups, etc.
• Question & Clarification sessions
• LAB Sessions – Sample mappings/workflows

Overview, Software Setup & Configuration

Informatica PowerCenter
• PowerCenter is an ETL tool: Extract, Transform, Load
• A number of connectivity options (DB-specific, ODBC, Flat file, XML, other)
• Metadata Repository/Versioning built in
• Integrated scheduler (with the possibility to use an external one)
• A number of cool features – XML Imports/Exports, integrity reports, Save As Picture, etc.

ODBC. other) 13 . Targets can be over any network (local. processing • • Clients (Designer. workflow manager etc): manage the Repository Sources.PowerCenter Concepts • • • Repository: stores all information about definition of processes and execution flow Repository server: provides information from the Repository Server: executes operations on the data − Must have access to sources/targets − Has memory allocated for cache. FTP.

PowerCenter Concepts II
[Architecture diagram: Clients and the Rep Server talk to the Repository; the Server(s) move data from Sources to Targets]

Software installation
• Copy the folder Pcenter 7.1.1 from Share??
• Install the Client. Do not install the servers or ODBC
• After the installation you may delete your local "Pcenter 7.1.1" folder

Registering the Repository
• You need to tell your client tools where the Repository Server is
• Go to the Repository Manager, choose Repository -> Add Repository
• Repository is <Server Name>, user is your name

Registering the Repository II
• Right-click on the newly created repository, choose Connect
• Fill in additional details: password, port, etc.

Register the Oracle database instance
• Add the connection information to tnsnames.ora on your machine
• Verify the connection using SQL*Plus
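A sketch of the kind of tnsnames.ora entry this step needs — the host, port and SID below are placeholders, not the actual class values:

WARETL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = waretl-host.example.com)(PORT = 1521))
    (CONNECT_DATA = (SID = WARETL))
  )

You can then verify it from the command line, e.g. sqlplus your_user/your_password@WARETL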

Define the ODBC connection to waretl
• This connection will be used to connect from client tools directly to a database (e.g. for imports)
• Go to Control Panel -> Administrative Tools -> Data Sources (ODBC)
• On the "System DSN" tab click "Add" and choose "Microsoft ODBC for Oracle". Fill in the details for your TNumberOL

Environment – misc info
• Log in to Informatica with your User/Pwd
• Log in to Oracle with your User/Pwd
• Work in your INFA folder
− The first folder you open becomes your working folder

STOP – Wait for the class to finish the connectivity setup

Sources, Targets

Client tools – overview
• Designer – defines Sources, Targets, Transformations and their groups (Mapplets/Mappings)
• Workflow Manager – defines Tasks and Workflows (+ scheduling)
• Workflow Monitor – monitors job execution
• Repository Manager – defines the connection to the repository
• Server Administration Console – for Admins only
• You must have English locale settings on your PC

The Designer tool
[Screenshot: the Designer window with the Transformation selector, Tool selector, Navigator, Workspace and Messages panes]

Definition of the data flow
• The data flow structure is defined in the Designer client tool
• The following concepts are used:
− Sources: structure of the source data
− Targets: structure of the destination data
− Transformation: an operation on the data
− Mapplet: reusable combination of Transformations
− Mapping: complete data flow from Source to Target

Mapping: definition of an e2e flow
[Diagram: Source(s) -> Transformation(s) -> Target(s)]

Sources – overview
• Sources define the structure of the source data (not where the data is)
• Source + Connection = complete information
• It is only internal Informatica information, not e.g. the physical structure in the database
• Sources are created using the Source Analyzer in the Designer client tool
• You can either create or import Source definitions

Sources – how to create a DB source
• Go to Tools -> Source Analyzer
• Choose "Import from the database"
• Choose the waretl ODBC connection and fill in the remaining details
• If you get an "unable to resolve TNS name" error, make sure you have added the waretl server to all tnsnames.ora files

Sources – how to create a DB source
• Choose table SRC_TNumber and press OK

Sources – Creating cont.
• A Source is created
• You can see its definition on the Workspace

Editing Source properties
• To edit a Source, double-click on it in the Workspace
• You can manually add/delete/edit columns
• You can define PK/FK/other relations in INFA
• You can load based on PK (Unit). See the "Update Strategy" Transformation

Sources: Exercise 1
• Create a comma-delimited file
− Define header columns: DATE_SHIPPED, CUST, PROD, SALES
− Date format: dd-mmm-yyyy
− Use REAL IDs for CUST and PROD that exist in some hierarchies. CUST and PROD tables available on waretl
− Have at least 10 data rows
• Define this comma-delimited source in Source Analyzer. Use name SRC_TNumber_Flat
• Preview the data in your file in the Source Analyzer
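A hypothetical sketch of the first few lines of such a file — the CUST and PROD IDs below are placeholders, replace them with real IDs from the CUST and PROD tables on waretl:

DATE_SHIPPED,CUST,PROD,SALES
05-JAN-2004,9900000003,82100000,150.25
06-JAN-2004,9900000003,82100000,99.00
07-JAN-2004,9900000003,82100000,0.50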

STOP – Wait for the rest of the class to finish. If you finish early, read the Designer Guide -> Chapters 2 and 3

Targets – overview
• Targets define the structure of the destination objects
• Target + Connection = complete information
• It is only internal Informatica information, not e.g. the physical structure in the database
• Targets are created using the Warehouse Designer in the Designer client tool
• You can either create or import Target definitions
• Defining Targets works the same way as defining Sources

Targets – Columns
[Screenshot: a Target definition's Columns tab]

Targets: Exercise I
• Import table TGT_TNumber_1 into the Warehouse Designer using the "Import from Database" function
• Compare your Target with the flat file Source SRC_TNumber_Flat
• Modify your Target to be able to load all data from your flat file. Remember to keep your Oracle and INFA definitions synchronized!
− This can be done in literally <1 minute :)

Targets: Exercise II
• Define a new Target called TGT_TNumber_2
• Define all columns from our flat file source plus new ones:
− FACT_ID (PK)
− GEO_ID
• Create the Target in Oracle too! (and don't forget grants to dev_rol1...)
− This too can be done in <1 minute :>
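A minimal sketch of the Oracle side of this exercise — the column types are assumptions derived from the flat-file layout, not a prescribed schema:

CREATE TABLE TGT_TNumber_2 (
  FACT_ID      NUMBER PRIMARY KEY,  -- will be fed from a sequence later on
  DATE_SHIPPED DATE,
  CUST         NUMBER,
  PROD         NUMBER,
  SALES        NUMBER,
  GEO_ID       NUMBER
);
GRANT SELECT, INSERT, UPDATE, DELETE ON TGT_TNumber_2 TO dev_rol1;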

STOP – Wait for the rest of the class to finish. If you finish early, read the Designer Guide -> Chapter 4

Transformations, Mappings

Transformations – Overview
• Transformations modify the data during processing. They are internal INFA objects only
• Transformations can be reusable
• A large number of transformations is available
− E.g. Filter, Aggregator, Lookup...
− Every transformation has different configuration options
• If none is applicable, write your own
− PL/SQL execution straight from INFA
− Can write custom COM transformations
− "Custom" transformation executes C code

Transformations – Overview II
• Transformations can be created in:
− Transformation Developer: reusable
− Mapplet/Mapping Designer: processing-specific
• Usually you can override the default SQL generated by a Transformation
− E.g. in the Lookup transformation…
• Very good online help available – the Transformation Guide. Use it for self-study!

Transformations – Concepts
• Transformations work on Ports:
− Input, Output, Variable, other (Lookup, Sort, Group...)

Transformations – Concepts II
• Transformations are configured using the Properties tab. That's THE job!
− HUGE number of properties in total…

Mappings – Overview
• A Mapping is a definition of an actual end-to-end processing (from Source to Target)
• You connect Transformations by Ports – defining a data flow (SH30 -> CHAIN_TYPE/CHAIN)
• The data flows internally in INFA on the Server
• You can visualize the data flowing row by row from a Source to a Target
− Exceptions: Sorters, Aggregators, etc.
• A number of things can make the Mapping invalid…

Mappings – Creating
• Choose Mapping Designer, Mappings -> Create
• Use name m_TNumberLoad
• Now we need to define the data flow in the Mapping

Transformations: Source Qualifier
• Begins the process flow from the physical Source onwards
• Can select only a couple of columns from the Source (will SELECT only those!)
• Can join multiple sources (PK/FK relationship inside INFA, not Oracle)
• For relational sources:
− You can override the "where" condition
− Can sort the input data
− Can filter the input data

Transformations: SQ II
• For relational sources you can completely override the SQL statement
• Is created by default when you drag a Source to the Mapping Designer
• The standard naming convention is SQ
• Some options are available only for relational sources (e.g. sorting, distinct, etc.)
• As usual – more info in the Transformation Guide
− Self-study on overriding the default SQL and WHERE conditions

SQ: Creating
• With the Mapping Designer open, drag your flat file SRC_TNumber_Flat onto the Workspace
• An SQ is created automatically for you
• An SQ is often a non-reusable component

SQ: Ports + Workspace operations
• Sources have only Output ports
• The SQ has I/O ports
• Experiment: drag around your Source/SQ
• Experiment: select, delete and connect Port connectors
• Right-click on the Workspace, see the Arrange/Arrange Iconic options…

Workspace
• Objects are named and color-coded
• The Iconic view is very useful for large flows. Use Zoom!

Exercise: SQ
• Delete the automatically created SQ
• Create an SQ for your Source manually; the ports and links should be connected automatically
• Hint: there's a Transformations menu available when in the Mapping Designer
• Hint: you can drag Ports to create them in the target Transformation
• Save your work – the Mapping is Invalid. Why?

Exercise: a complete Mapping
• With the Mapping Designer open, drag the TGT_TNumber_1 Target onto the Workspace
• Connect the appropriate Ports from the SQ to the Target
• Save – the Mapping is now Valid! :)
• Our Mapping is actually a SQL*Loader equivalent

STOP – Wait for the rest of the class to finish

Connections, Sessions

Execution of a Mapping
• To execute a Mapping you need to specify WHERE it runs
− Remember, Sources/Targets define only the structure
• A single Mapping can be executed over different connections
• An executable instance of a Mapping is called a Session
• A series of Sessions is a Workflow
• Workflows are defined in the Workflow Designer

Workflow Designer
[Screenshot: the Workflow Designer window]

Workflows
• Workflows are series of Tasks linked together
• Workflows can be executed on demand or scheduled
• There are multiple workflow variables defined for every server (Server -> Server Configuration)
− E.g. $PMSessionLogDir, $PMSourceFileDir, etc.
• These parameters are used to define physical locations for files, logs, etc.
• They are relative to the Server, not your local PC!
• Directories must be accessible from the Server

Connections
• Connections define WHERE to connect for a number of purposes (Sources, Targets, Lookups...)
• There are many Connection types available (any relational, local/flat file, FTP...)
• Connections are defined in the Workflow Manager
• Connections have their own permissions! (owner/group/others)

Connections: Exercise
• Define a Shared connection to your own Oracle schema on <Schema_Name>
− In the Workflow Manager click on Create/Edit Relational Connection
− Choose a "New" Connection of type Oracle and fill in the necessary info

Tasks
• Tasks are definitions of the actual actions executed by the Informatica Server
− Sessions (instances of Mappings)
− Email
− Command
− More Tasks available within a Workflow (e.g. Event Raise, Event Wait, Timer, etc.)
• Tasks have a huge number of attributes that define WHERE and HOW the task is executed
• Remember, the online manual is your friend :)

Sessions: Creating
• Set the Workspace to Task Developer: go to Tools -> Task Developer
• Go to Tasks -> Create, choose Session as the Task Type
• Create a task called tskTNumber_Load_1

Sessions: Creating II
• Choose your Mapping

Sessions: important parameters
• Access the Task properties by double-clicking on it
• General tab
− Name of your Task
• Properties tab:
− Session log file directory (default $PMSessionLogDir\) and file name
− $Source and $Target variables: for Sources/Targets, Lookups, etc. Use Connections here (also variables :) )
− Treat Source Rows As: defines how the data is loaded. This is a very useful property, able to substitute the concept of a Unit
• Read about the "Update Strategy" Transformation to understand how to use this property

Sessions: important parameters II
• Config Object tab
− Constraint-based load ordering
− Cache LOOKUP() function
• Multiple error handling options
• Error Log File/Directory ($PMBadFileDir)
• All options in the Config Object tab are based on the Session Configuration (a set of predefined options). To predefine, go to Tasks -> Session Configuration

.) and details 65 .Sessions : important parameters III • • Mapping tab – defines the WHERE – Connections. Relational. Logs… For every Source and Target define type of connection (File Writer.

Sessions: relational connections
• For relational Sources and Targets you can/must define the owner (schema)
• Click on the Source/Target on the Connection tab
• The attribute is:
− "Owner Name" for Sources
− "Table Name Prefix" for Targets
• If you use a shared Connection and access private objects (without public synonyms) you MUST populate this attribute

Sessions: important parameters IV
• Components tab
− Pre- and post-session commands
− Email settings

Sessions: Summary
• Lots of options – again, read the online manual
• Most importantly, you define all the "WHERE"s: $Source, $Target, Connections for all Sources, Targets, Lookups, etc.
• Here is also the definition of flat file locations
• Define error handling/log locations for every Task
• Use Session Configs
• The majority of the Session options can be overwritten in Workflows :)
− This allows e.g. executing the same Session over different Connections!

Sessions: Exercise
• For your tskTNumber_Load_1 session define:
− $Source, $Target variables
− Source and Target locations
− Remember, your Source is a flat file and your Target is Oracle
− Source filetype must be set to "Direct". This means that the file contains the actual data. "Indirect" = ? :>
− Enable the "Truncate target table" option for your relational Target. This purges the target table before every load
− During all Exercises use only Normal load mode (NOT Bulk)
− The $Source variable is a tricky one :>
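As a hint for the "Indirect" teaser: an indirect source file is not data but a list of data files, one path per line, which the Server loads one after another. A hypothetical file list might look like this — all paths are placeholders and must be visible from the Server, not your PC:

/infa/srcfiles/tnumber_part1.txt
/infa/srcfiles/tnumber_part2.txt
/infa/srcfiles/tnumber_part3.txt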

STOP – Wait for the class to finish

Workflow execution

Workflows: Creating
• Choose Tools -> Workflow Designer
• Choose Workflows -> Create
• Create a Workflow wrkTNumber_Load_1

Workflows: Properties
• Available when creating a Workflow or using Workflows -> Edit
• General tab:
− Name of the workflow
− Server where the workflow will run
• Also available from Server -> Assign Server

Workflows: Properties II
• Properties tab:
− Parameter filename: holds the list of Parameters used in Mappings. See the online manual
− Workflow log (different from Session logs!)

Workflows: Properties III
• Scheduler tab:
− Allows you to define complete reusable calendars
− Explore on your own! First read the online manual, then schedule some jobs and see what happens :)

Workflows: Properties IV
• Variables tab
− Variables are also used during Mapping execution
− Quick task: find out what the difference is between a Parameter and a Variable
• Events tab
− Add user-defined Events here
− These events are used later on to Raise or Wait for a signal (Event)

Workflow: Adding tasks
• With your Workflow open, drag the tskTNumber_Load_1 Session onto the Workspace
• Go to Tasks and choose Link Tasks
• Link the Start Task with tskTNumber_Load_1
− The Start Task does not have any interesting properties

Workflow: editing Tasks
• You can edit Task properties in a Workflow the same way as you do it for a single Task
− Editing the task properties in a Workflow overwrites the default Task properties
− Overwrite only to change the default Task behavior
− Use system variables if a Task will be executed e.g. every time on a different instance

Workflows: Running
• You can run a Workflow automatically (scheduled) or on-demand
− We'll run on-demand only in this course
• Before you run a Workflow, run the Workflow Monitor and connect to the Server first!

Workflows: Running II
• In the Workflow Designer right-click on the Workflow wrkTNumber_Load_1 and choose "Start Workflow"
• Go to the Workflow Monitor to monitor your Workflow

Workflows: Running III
• The Workflow Monitor displays Workflow status by Repository/Server/Workflow/Session
• Two views available: GANTT view and TASK view

Workflows: Logs
• You can get the Workflow/Session log by right-clicking on the Workflow/Session and choosing the log
− Remember, the Session log is different from the Workflow log!

Workflows: Logs II
• The most interesting information is in the Session logs (e.g. Oracle errors, etc.)
• Exercise:
− Where do you define the location of the Session/Workflow logs? Where are they?
− Manually locate and open the logs for your Workflow run
− Find out why your Workflow has failed :>

Logs: Session Log missing for a failed Session
• Why would you get an error like that?

Workflows: Restarting
• Restart your Workflow using "Restart Workflow from Task"
• (More about job restarting: Workflow Administration Guide -> Chapter 14 -> Working with Tasks and Workflows)
• Debug until your Workflow finishes OK

Workflow: verifying
• Check that the data is in your Target table. The table will be empty initially – why?
• Why is there an Oracle error in the Session Log about truncating the Target table?
• Hint: when you modify a Mapping that is already used in a Session, you need to refresh and save the Session
• Warning! PowerCenter has problems refreshing objects between tools! Use Task -> Edit or File -> "Close All Tools"

E2E flow: Exercise
• Create a new Mapping m_TNumberLoad_2 that will move all data from TGT_TNumber_1 to TGT_TNumber_2
• Create a new session tskTNumber_Load_2 for this Mapping. Define connection information to waretl, check the "Truncate target" option and use Normal load mode (NOT Bulk)
• Create a new Workflow wrkTNumber_Load_2 that will run tskTNumber_Load_2
• Run your Workflow, make sure it finishes OK
• Check that the data is in the TGT_TNumber_2 table

STOP – Wait for the class to finish

Summary: what you have learned so far
• Now you can:
− Define structures of Sources and Targets
− Define where the Sources and Targets are
− Create a simple Workflow
− Run a Workflow
− Debug the Workflow when it fails
• You have all the basic skills to learn further by yourself!

Transformations: Expressions, Sequences

Transformations – what can you do?
• We'll be going through a number of Transformations
• Only some (important) properties will be mentioned
• Read the Transformation Guide to learn more

Transformations: Expression (EXP)
• Expression modifies port values
• This is a pure Server Transformation
• Remember Ports? Input, Output, I/O, Variables
− Convention: name ports IN_ and OUT_
• The only Property of EXP is Tracing Level

Expression Editor
• Available in almost every Transformation
• Allows you to easily access Ports and Functions

EXP: Example
• Let's create a non-reusable Expression that will change customer "Bolek" into customer "Lolek"
• You can drag ports to copy them
• IIF function – similar to DECODE
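A sketch of the output-port expression this example needs, assuming the incoming port is named IN_CUST_NAME (the port name is illustrative):

IIF(IN_CUST_NAME = 'Bolek', 'Lolek', IN_CUST_NAME)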

Transformations: Sequence Generator
• SEQ can generate a sequence of numbers
• Very similar to an Oracle sequence
• Can generate batches of sequences
• Each target is populated from a batch of cached values

SEQ: Ports
• Nextval
• Currval = Nextval + Increment By. No clue how this is useful

0 = caching disabled Reset: Rewind to Start Value every time a Session runs Disabled for reusable SEQs 97 . Cycle Number of Cached Values: The number of sequential values cached at a time. Use when multiple sessions use the same reusable SEQ at the same time to ensure each session receives unique values.SEQ: Properties • • • • • • • Start Value Increment By The difference between two consecutive values from the NEXTVAL port. End Value: The maximum value the PowerCenter Server generates. Current Value: The current value of the sequence.

SEQ: Batch processing
• Guess what happens here: Start = 1, Increment By = 2, Cache = 1000?

Transformations: Exercise
• Create a Target called TGT_TNumber_Tmp that will hold FACT_ID, SALES and SALESx2 columns
• Create the appropriate table in Oracle (remember: grants, synonyms if needed...)
• Add this Target to your Mapping m_TNumberLoad_2 (so you should have two Targets). Save. The Mapping is valid even though the Target is not connected – why?
• In the Workflow define the Connection for this Target (remember, use Normal load mode). Choose the "Truncate Target" option

EXP: Exercise
• Create a reusable expression called EXP_X2 that will multiply an integer number by two. The input number must be accessible after the transformation
• Use this EXP to multiply the SALES field when passing it to TGT_TNumber_Tmp. The original SALES field goes to SALES and the multiplied one to SALESx2

SEQ: Exercise
• Create a reusable SEQ called SEQ_TNumber_FACT_ID. Start = 1, Increment By = 6, Cache = 2000
• Populate the FACT_ID field in both targets to have the same value from the sequence (parent/detail)

Exercise: verify
• Run your modified Mapping m_TNumberLoad_2
• Remember to refresh your Task:
− Try Task Editor -> Edit
− If you get an error, go to Repository -> "Close All Tools"
• You may need to modify the Workflow before you run it – what information needs to be added?
• Verify that the data is in both TGT_TNumber_2 and TGT_TNumber_Tmp, the SALESx2 column equals SALES*2 and the same FACT_ID is used for the same row in the _2 and _TMP tables
• Rerun the workflow a couple of times. What happens with the FACT_ID field on every run?

Solutions:
• Wrong! SEQ initialized for every target. The _TMP and _2 tables will have different FACT_IDs


The correct solution
• SEQ initialized once (one ID for every row)

STOP – Wait for the class to finish

The Debugger

The Debugger
• The Debugger allows you to see every row that passes through a Mapping
− Every transformation in your Mapping can be debugged
− Nice feature available: breakpoints
− The Debugger is available from the Mapping Designer
• To start the debugger, open your Mapping in the Mapping Designer and go to Mappings -> Debugger -> Start Debugger
• We'll be debugging the m_TNumberLoad_2 Mapping
• Warning: the Debugger uses a lot of server resources!

The Debugger: Setting up
• You need a Session definition that holds a Mapping to run the Debugger
− The best way is to use an existing Session
− You may create a temporary Debug session. This limits debug capabilities

The Debugger: Setting up II
• Select the Session you want to use
• All properties of this session (Connections etc.) will be used in the debug session

The Debugger: Setting up III
• Select the Targets you want to debug. This defines the flow in the Debugger
• You have an option to discard the loaded data

The Debugger: running
• The Mapping Designer is now in debug mode
[Screenshot: debug layout with the Flow Monitor, the Transformation you're debugging, and the Target data panes]

The Debugger: Running II
• Select EXP_X2 as the Transformation you'll be monitoring
• Remember, you run the Debug session for one Target!
• The most efficient way to operate the Debugger:
− Choose your mapping on the Workspace
− Choose Debugger -> Step to Instance. Goes directly to your Transformation
− Or, choose Debugger -> Next Instance. This is actually step-by-step!

The Debugger: View of the data
• You can view individual Transformations in the Instance window. The data is shown with regard to the given data row
• "Continue" = run the Session (if there are no Breakpoints, the Session will run until OK/Failure)
• When the whole Source is read, the Debugger finishes

The Debugger: Breakpoints
• You can add breakpoints to have the Debugger stop on a given condition
− Mappings -> Debugger -> Edit Breakpoints

Breakpoints: Setting up
• Global breakpoints vs. Instance breakpoints
• Error breakpoints vs. Data breakpoints
• A number of conditions available...

The Debugger: Exercise
• Experiment on your own :)
• Check the difference between Step Into and Go To
• Set up breakpoints
• You can actually change the data in the Debugger flow! Check out this feature, it's really useful
− A number of restrictions apply: usually you can modify only output ports and group conditions

STOP – Wait for the class to finish

Transformations: Lookups

Transformations: Lookups (LKP)
• Lookups find values
• Connected and Unconnected
• Cached and Uncached
− PowerCenter has a number of ways to cache values
− The cache can be static or dynamic, persistent or not
− Lookup caching is a very large subject. To learn more about caching read the Transformation Guide
• Question: where is the lookup cache file created?

Lookups: Connected vs. Unconnected
• Connected lookups receive data from the flow while Unconnected ones are "inline" functions
• An Unconnected lookup has one port of type "R" = Return

                    Connected   Unconnected
Default values      Supported   Not supported (possible using NVL)
Caching             Any type    Static only
#returned columns   Multiple    Only one

Lookups: Creating
• First, import the definition of the Lookup table as either Source or Target
− REF_CTRL is owned by user infa_dev
• While in the Transformation Developer choose Transformations -> Create and create lkpRefCtrl

Lookups: Creating II
• Choose the Table for your Lookup
− Only one table allowed

Lookups: Creating III
• Delete the columns not used in the Lookup
− If this is a reusable lookup, be careful when deleting

Lookups: defining lookup ports and condition
• There are two important types of ports: Input and Lookup
− Combinations of I/O and O/L ports allowed
• Input ports define comparison columns in the INFA data flow
• Lookup ports are columns in the Lookup table
• For an Unconnected Lookup there is one "R" port – the return column

Lookups: Creating join ports
• Create two port groups:
− IN_ ports for data in the stream ("I"nput)
− Comparison columns in the physical table ("L"ookup)
− CTRL_PERD will be the "R"eturn column

Lookups: Creating join conditions

• Match the "I"nput ports with the "L"ookup ports on the Condition tab

Lookup: important Properties

• Lookup Sql Override: you can actually override the join. More on overriding in the Designer Guide
• Lookup table name: the join table
• Lookup caching enabled: disable for our Exercise
• Lookup policy on multiple match: first or last
• Connection Information: very important! Use $Source, $Target or other Connection information

Lookups: using Unconnected

• We've created an Unconnected lookup
− How can you tell?
• Use an Unconnected lookup in an Expression as an "O" port using the syntax: :LKP.lookup_name(parameters)
• For example: :LKP.lkpRefCtrl('BS',31,IN_CUST_STRCT,1)
− Parameters can of course be port names
− You must put the Unconnected lookup in the Mapping (floating)

Lookups: using Connected
• Connected Lookups don't have an "R"eturn port (they use "O"utput ports)
• Data for the lookup comes from the pipeline

Lookups: Exercise
• Modify Mapping m_TNumberLoad_2 to find an ISO_CNTRY_NUM for every customer. Column GEO_ID should be populated with ISO_CNTRY_NUM. Use default values if CUST_ID is not found
• Choose a Connected or Unconnected Lookup
• Run your modified Workflow
• Verify that when your Workflow finishes OK all rows have GEO_ID populated

STOP – Wait for the class to finish

Transformations: Stored Procedures

Transformations: Stored Procedure
• Used to execute an external (database) procedure
• The naming convention is the name of the Procedure
• A huge number of different configurations is possible
− Connected, Unconnected (same as Lookups)
− Pre- and post-session/load
− Returned parameters
• We'll do an example of a simple stored procedure
• Possible applications: e.g. Oracle partitioning for promotions in Informatica (post-session)
− PowerCenter has a hard time running inlines!

Stored Procedures: Creating
• The easiest way is to import the procedure from the DB: Transformation -> Import SP

Stored Procedures: Importing
• Importing creates the required Ports for you... and uses the correct function name :)

Stored Procedures: watchouts
• A number of watchouts must be taken into account when using the SP Transformation
− Pre/post SPs require an unconnected setup
− When more than one value is returned, use Mapping Variables
− Datatypes of return values must match the Ports
• The Transformation Guide is your friend

Stored Procedure: Exercise
• Create a procedure in Oracle
• Import the procedure in the Designer
• Use this procedure as a Connected transformation. Run your Workflow
• What does this function do? :>
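The procedure shown in class is not preserved here; below is a minimal hypothetical stand-in you could create for the exercise — the name, parameters and logic are invented for illustration:

CREATE OR REPLACE PROCEDURE SP_TNUMBER_X2 (
  p_in  IN  NUMBER,
  p_out OUT NUMBER
) AS
BEGIN
  -- trivial logic, so the round trip through INFA is easy to verify
  p_out := p_in * 2;
END;
/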

STOP – Wait for the class to finish

Transformations Overview: Aggregator, Joiner, Sorter, Router, Filter, Update Strategy, Transaction Control, Variables

Overviews
• A number of interesting Transformations and techniques are outside the scope of this training
• The overview gives you an idea that a possibility exists to do something
• If you want to learn more – self-study: read the Designer Guide and the Transformation Guide

Overview: Aggregation (AGG)
• Similar to Oracle's GROUP BY, aggregate functions available
• Active transformation – changes #rows
− A number of restrictions apply
• A number of caching mechanisms available

Overview: Filter (FL)
• Allows you to reject rows that don't meet specified criteria. Filtered rows are not in the reject file
− Active transformation
• Use as early in the flow as possible

Overview: Router (RTR)
• Used to send data to different Targets
− Active transformation
− Don't split processing using a Router and then join back!
− Often used with a preceding Update Strategy
• Typical usage:

Router: configuring
• Ports are only Input (in reality I/O)
• Define condition groups (a row is tested against all groups)

Router: using in a Mapping
• The Router receives the whole stream and sends it different ways depending on conditions

Overview: Joiner (JNR)
• Joins pipelines on a master/detail basis
− A special Port marks one of the pipeline sources as Master
− The Joiner reads ALL (including duplicate) rows for the Master and then looks up the detail rows
• Outer joins available (including full outer)
• Caching mechanisms available
− Sorted input speeds up processing
• Restrictions:
− Can't use if either input pipeline contains an Update Strategy transformation
− Can't use if one connects a Sequence Generator transformation directly before the Joiner transformation
− Allows joining only two pipelines. If more joins are needed, use consecutive JNRs

Joiner: Example
• Joining QDF with CUST_ASSOC_DNORM before an aggregation

Joiner: Ports
• One pipeline is master, the other one is detail. The "M" port denotes which one is which

Joiner: Properties
• The Condition tab defines the join
• The Properties tab lets you define the join type (amongst other properties)

g. :IIF( ( SALES_DATE > TODAY). Insert. DD_REJECT.Overview: Update Strategy (UPD) • • This transformation lets you mark a row as Update. 151 . Delete or Reject You do it by a conditional expression in the Properties tab of UPD − E. DD_UPDATE ) • UPD can replace the Unit concept from S1..

Update Strategy: setting up
• To set up, create I/O pass-through ports
• Then enter the conditional expression into the Properties tab of UPD. Use the constants:
− Insert: DD_INSERT (0)
− Update: DD_UPDATE (1)
− Delete: DD_DELETE (2)
− Reject: DD_REJECT (3)
• You must set the Session properties properly. Read more in the Transformation Guide
− For example you must set the "Treat Source Rows As" Session property to "Data Driven"
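A sketch of a typical Update Strategy expression built from these constants — the port names are illustrative, the DD_ constants are the real ones listed above:

IIF(ISNULL(IN_TRX_ID), DD_REJECT,
    IIF(IN_ROW_EXISTS = 1, DD_UPDATE, DD_INSERT))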

Overview: Transaction Control (TC)
• This transformation defines commit points in the pipeline
• To set up, populate the conditional clause in Properties
− For example IIF(value = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

TC: Defining commit points
• Use the following system variables:
− TC_CONTINUE_TRANSACTION
− TC_COMMIT_BEFORE
− TC_COMMIT_AFTER
− TC_ROLLBACK_BEFORE
− TC_ROLLBACK_AFTER
• There's a "transformation scope" in the majority of transformations
• Transaction control must be effective for every source, otherwise the mapping is invalid
• Read more in the Transformation Guide

Overview: Sorter (SRT)
• Allows sorting on multiple keys. Has only I/O Ports
− Option to output only distinct data (all Ports become Keys)
− Option for case-sensitive sorts
− Option to treat NULL as high/low
− Caching mechanism available
• The Sorter speeds up some Transformations (AGG, JNR...)

Variable ports
• You can use Variable ports to:
− Store interim results of complex transformations
− Capture multiple return values from Stored Procedures
− Store values from previous rows
• Remember, Ports are evaluated in order of dependency:
− Input
− Variable
− Output

Variables: storing values from previous rows
• Useful when e.g. running inlines for distinct values
• You need to create two variable Ports to store values from previous rows (why? :> )
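A sketch of the two-port pattern in expression terms — port names are illustrative. Variable ports are evaluated top-down and keep their value from the previous row, which is exactly why two of them are needed: V_PREV must read V_CURR before V_CURR is overwritten with the current row's value:

V_PREV  (variable) = V_CURR
V_CURR  (variable) = IN_CUST_ID
OUT_NEW (output)   = IIF(IN_CUST_ID = V_PREV, 0, 1)  -- assumes input sorted by IN_CUST_ID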

Transformations: Exercise
• Modify the m_TNumberLoad_2 mapping to use a JNR transformation for the geography lookup (instead of LKP)
• Count the number of distinct customers in the pipeline (modify the Target table to have a CNT column). Use Variables
• Run the modified workflow
• Verify that GEO_ID is derived (from CUST.ISO_CNTRY_NUM) and loaded into the Target table
• Once you finish, modify the Transformation to complete the task without Variables
− Change your mapping back to use the Lookup instead of the Joiner

Transformations: Exercise II
• Build a Loader mapping with just the following objects: Source, SQ, UPD, Target
• Add a PK on your Source (e.g. TRANX_ID)
• Add TRANX_ID to your flat file
• Use the UPD strategy to insert new rows and update already existing rows (based on the TRANX_ID field)
− Remember, set the correct Session parameters
• Load your flat file
• Verify: add and modify some rows in the flat file, load again, check that the Target is updated as needed (rows are added/modified/deleted)

STOP – Wait for the class to finish

Mapplets, Worklets

Mapplets: Overview
• Mapplets are reusable sets of Transformations put together into a logical unit
• Can contain Sources and some Transformations. Can't contain Targets or other Mapplets
• Special Transformations available:
− Input
− Output

Mapplets in a Mapping
• Use Mapplet Input and Output ports
• Connect at least one I and one O port

Worklets: Overview
• Worklets are sets of Tasks connected into a logical unit
− Can be nested
− Can be reusable
− Can pass on persistent variables
• Runtime restrictions apply:
− You cannot run two instances of the same Worklet concurrently in the same workflow
− You cannot run two instances of the same Worklet concurrently across two different workflows

Mapplets: Exercise
• Create a Mapplet mplt_EMP_NAME that:
− Has Input ports EMPNO and MGR
− Looks up the ENAME and DEPTNO fields from the EMP table for the input EMPNO
− Filters out all rows that have MGR <= 0
• The Mapplet should have three Output ports:
− EMPNO, MGR, and DEPTNO concatenated with ENAME as DEPT_NAME
• Create a table called EMP_DEPT which has EMPNO, MGR, DEPT_NAME as fields (take the structure of the fields from the EMP table)
• Create a mapping map_EMP_NAME which has EMP1 as Source and EMP_DEPT as Target and uses the above Mapplet inside
• Run the Mapping, verify results

STOP – Wait for the class to finish

Advanced Scheduling

Advanced Scheduling
• When you build Workflows in the Workflow Designer you can use a number of non-reusable components
• Different control techniques are available
• To use the non-reusable components go to Tasks -> Create when in the Workflow Designer

Workflows: Links
• Links have conditions that set them to True or False
− Double-click on a Link to get its properties
− If the Link condition evaluates to True (default) the Link executes its target Task
• Use the Expression Editor to modify Link conditions
− Access properties of Tasks, e.g. Start Time
− You can access the Workflow Variables here!
• A number of predefined Workflow Variables are available
• You can create Variables persistent between Workflow runs!
• Workflow Variables must be predefined for the Workflow (Workflow -> Edit and then the Variables tab)

Expression Editor
• Links (and some Tasks) can use the Expression Editor for Workflow events

Tasks: Command
• The "Command" Task executes any script on the Informatica Server
• It can be reusable
• The property "Run if previous completed" controls the execution flow when more than one script is defined

Tasks: Decision
• The Decision task sets a Workflow variable ("$Decision_task_name.Condition")
− On the Properties tab edit the "Decision Name" parameter. You can use the Expression Editor here
• You can use this variable later in a Link
− Of course you can evaluate the Decision condition directly in a Link, but use Decision for Workflow clarity

Workflow: Variables
• Variables are integers available from the Workflow
• They need to be defined upfront in the Workflow definition
− Go to Workflows -> Edit
• Variables can be persistent between Workflow runs

Workflow: Variables II
• Persistent variables are saved in the repository
• You can check the value of a Variable in the Workflow log
− Not in the session log!

Tasks: Assignment
• The Assignment task sets the Workflow variables
• You can use the Expression Editor
• One Assignment task can set multiple variables

Tasks: Email
• Use to … send emails! ;)
• Can be reusable
• The waretl server is not set up to send emails
• You can use all Workflow Variables and Session Properties
− Including the $PMSuccessEmailUser or $PMFailureEmailUser server variables

Emails: Advanced

• Every Session can send an email on success or failure
− Additional Email options available! Go to Edit Email -> Email Subject and click on the small arrow

Emails: Additional options

Additional options are available for Email body when using from within a Session

178

Events: Overview
• Events are a way of sending a signal from one Task to another
• Two types of Events supported:
− User-defined (define in Workflow -> Events)
− Filewatcher event
• Two Event Tasks available:
− Event Raise
− Event Wait
• Actually you can use Links to do the same...

Events: User-defined
• Create user-defined Events in the Workflow properties
− Workflows -> Edit
• Use them later in Event Raise/Wait Tasks

Events: Example
• Sending Events = Links (in a way...)

Tasks: Event Raise
• Raises a user-defined Event (sends a signal)
− Remember, the Event must be predefined for the Workflow
• Only one Event can be raised

Tasks: Event Wait
• Waits for an Event to be raised
− User-defined Event (signal), or
− Filewatcher event
• Properties available
− Enable Past Events!

Event Wait: filewatcher
• The filewatcher event is designed to wait for a marker file (e.g. *.end)
− There's an option to delete the file immediately after the filewatcher kicks in
− No wildcards are allowed
• Discussion: how to emulate the S1 filewatcher, waiting for and loading multiple files?

Tasks: Control
• The Control Task fails unconditionally
• The abort command can be sent to different levels
• Read more in the Workflow Administration Guide

Tasks: Timer
• The Timer Task executes
− On a date (absolute time)
− After a relative waiting time

Tasks: Exercise
• Create a new Workflow that will use two Sessions: ...Load_1 and ...Load_2
• Run the ...Load_2 Session after every 3 runs of ...Load_1
− Don't cycle the Sessions, rerun the whole Workflow
− How will you verify success?
• Obligatory tasks:
− Decision
− Event Raise/Wait

STOP – Wait for the class to finish

Command line execution

pmcmd – overview
• A command line tool to execute commands directly on the Informatica server
• Useful e.g. to:
− Schedule PowerCenter tasks using an external scheduler
− Get the status of the server and its tasks
• A big number of commands available – see the online manual:
− Workflow Administration Guide -> Chapter 23: Using pmcmd

pmcmd – example

pmcmd getserverdetails
  <-serveraddr|-s> [host:]portno
  <<-user|-u> username|<-uservar|-uv> userEnvVar>
  <<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
  [-all|-running|-scheduled]

pmcmd getserverdetails -s waretl.emea.cpqcorp.net:4001 -u bartek -p mypassword

pmrep – overview
• Updates session-related parameters and security information on the Repository
− E.g. create a new user, create a Connection...
• Very useful for e.g. import/export of objects, bulk operations (create 10 users)
• Usage: pmrep command_name [-option1] argument_1 [-option2] argument_2...

pmrep – usage
• The first pmrep command must be "connect":

pmrep connect -r repository_name -n repository_username
  <-x repository_password | -X repository_password_environment_variable>
  -h repserver_host_name -o repserver_port_number

• The last command must be "exit":

pmrep exit

• Full list of commands in the Repository Guide -> Chapter 16: Using pmrep
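A sketch of a complete pmrep session built only from the two commands above — the repository name, user, password, host and port are placeholders:

pmrep connect -r DEV_REP -n tnumber -x mypassword -h repserver-host -o 5001
pmrep exit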

pmcmd: exercise
• Run the workflow wrkTNumber_Load_2 from the command line:

pmcmd startworkflow
  <-serveraddr|-s> [host:]portno
  <<-user|-u> username|<-uservar|-uv> userEnvVar>
  <<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
  [<-folder|-f> folder] [<-startfrom> taskInstancePath]
  [-recovery] [-paramfile paramfile]
  [<-localparamfile|-lpf> localparamfile]
  [-wait|-nowait]
  workflow
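One possible concrete invocation, reusing the server address from the earlier getserverdetails example — the user, password and folder name are placeholders:

pmcmd startworkflow -s waretl.emea.cpqcorp.net:4001 -u bartek -p mypassword -f TNumber -wait wrkTNumber_Load_2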

STOP – Wait for the class to finish

Parameters and Variables

Parameters and Variables
• Parameters and Variables are used to make Mappings/Workflows/Sessions more flexible
− Example of a Session variable: the name of a file to load
− We had an exercise for Workflow variables already
− Don't confuse with port variables!
• Parameters don't change but Variables change between Session runs. The changes are persistent
• Both Variables and Parameters can be defined in the Parameter File
− Except port variables
− Variables can initialize without being defined upfront

Mapping Parameters and Variables
• Used inside a Mapping/Mapplet
− E.g. to load data incrementally (a week at a time)
− Can't mix Mapplet and Mapping parameters
• Described in the Designer Guide -> Chapter 8
• Use them inside transformations in regular expressions
− E.g. in SQs (WHERE), Filters, Routers…

Mapping Parameters and Variables II
• To use a Mapping parameter or variable:
− Declare them in Mappings -> Declare Parameters and Variables
− If required, define parameters and variables in the Parameter file (discussed later)
− For variables set the Aggregation type to define partitioning handling
− Change the values of variables using special functions
  • SetVariable, SetMaxVariable…

Session Parameters
• Very useful! Can be used to have the same Session work on different files/connections
• Must be defined in the Parameter File
• Conventions:

Parameter Type        Naming Convention
Database Connection   $DBConnectionName
Source File           $InputFileName
Target File           $OutputFileName
Lookup File           $LookupFileName
Reject File           $BadFileName

Session Parameters – usage
• You can replace the majority of the Session attributes with Parameters
• Described in detail in the Workflow Administration Guide -> Chapter 18: Session Parameters

Parameter File
• The Parameter file is used to define values for:
− Workflow variables
− Worklet variables
− Session parameters
− Mapping/Mapplet parameters and variables
• The variable values in the file take precedence over the values saved in the Repository
− This means that if a Variable is defined in a Parameter File, the change of its value in the Mapping will have no effect when the Session runs again!
• Described in detail in the Workflow Administration Guide -> Chapter 19: Parameter Files

Parameter Files II
• Parameter Files can be put on the Informatica Server machine or on a local machine
− Local files only for pmcmd use
• Parameter files can be defined in two places:
− Session Properties for Session/Mapping parameters
− Workflow properties
− Don't know why there are two places…
• A single parameter file can have sections to hold ALL parameters and variables

Parameter File Format
• You define headers for different sections of your parameter file:
− Workflow variables: [folder name.WF:workflow name]
− Worklet variables: [folder name.WF:workflow name.WT:worklet name]
− Worklet variables in nested worklets: [folder name.WF:workflow name.WT:worklet name.WT:worklet name...]
− Session parameters, plus mapping parameters and variables: [folder name.WF:workflow name.ST:session name] or [folder name.session name] or [session name]
• Values are defined as: name=value

Parameter File Example

[folder_Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales

[folder_Test.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales_test_file.txt
$DBConnection_target=sales_test_conn

Exercise: Mapping Variables & Parameters
• Modify the map_EMP_NAME Mapping to load only one Employee, specified by a Parameter
− Remember to define the Parameter in the Parameter File
• Modify the Mapping to store SAL as SAL+(SAL*30/100) (increase the salary by 30%). Use Mapping Variables
• Test!

Exercise: Session Parameters
• Modify the S_EMP_NAME Mapping to use a Parameter for the file name to be loaded
− Remember to define the Parameter in the Parameter File
• How would you load e.g. 10 files one after another using the same Session?

STOP – Wait for the class to finish

Security overview

Security in PowerCenter
• Security topics are described in the Repository Guide -> Chapter 5, Repository Security
• PowerCenter manages privileges internally
− Repository privileges (individual or group)
− Folder permissions
− Connection privileges
• Authentication can be either internal or using LDAP
• Security is managed through the Repository Manager
− You need to have appropriate privileges to manage security! :)

Users, Groups
• Individual (User) and Group privileges are combined to get the overall view of someone's permissions
• The group Administrators has all possible privs

Repository privileges
• Repository privileges are granted to Groups and Users
• Repository privileges work on Objects!
• A detailed description of Repository privileges is in the Repository Guide -> Chapter 5 -> Repository Privileges

Object permissions
• Object permissions apply in conjunction with Repository privileges
− Folders
− Connections
− Other...

Performance tuning (basics)

Performance tuning
• Workflow performance depends on a number of things:
− Mapping performance
  • Database performance
  • Lookups
  • Complex transformations (aggregators, sorters)
− Source/Target performance
− Power of the Informatica Server/Repository Server machines
• Good overview in the Workflow Administration Guide -> Chapter 25: Performance Tuning

Performance: What can we tune
• Eliminate source and target database bottlenecks
− Database/remote system throughput
− Lookup logic
• Eliminate mapping bottlenecks
− Transformation logic
• Eliminate session bottlenecks
− Performance-related Session parameters
− Increase #partitions
• Eliminate system bottlenecks
− Increase #CPUs, memory
• Evaluate bottlenecks in this order!

Bottlenecks: Identifying
• Target bottlenecks
− If the Target is relational or a remote location, change to a local flat file and compare run times
• Source bottlenecks – usually only if the source is relational or remote
− Use a Filter directly after the SQ
− Run the SQ query manually and direct output to /dev/null
• LAN speed can affect the performance dramatically for remote Sources/Targets
− Query remotely/locally to identify LAN problems

Bottlenecks: Identifying Mapping
• Mapping bottlenecks
− Put Filters just before Targets: if the run time is about the same you may have a Mapping bottleneck
− Some transformations are obvious candidates
  • Lookups
− Multiple Transformation Errors slow down transformations
− Use the Performance Details file

Performance Detail File
• Enable in Session Properties
• The Performance Detail File has very useful information about every single transformation
− The file is created in the SessionLog directory
− A big number of performance statistics is available
− Workflow Administration Guide -> Chapter 14 Monitoring Workflows -> Creating and Viewing Performance Details

Performance File – example

Transformation Name   Counter Name               Counter Value
LKP_CUST_GENERIC      Lookup_inputrows           107295
                      Lookup_outputrows          214590
                      Lookup_rowsinlookupcache   1239356

Bottlenecks: Identifying Session
• Usually related to insufficient cache or buffer sizes
• Use the Performance File
− Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck

Then.000 bytes. you can calculate the buffer size and/or the buffer block size to create the required number of session blocks. first determine the number of memory blocks the PowerCenter Server requires to initialize the session. ♦ Default Buffer Block Size. The default setting is 64.000. you can increase the number of available memory blocks by adjusting the following session parameters: ♦ DTM Buffer Size. If you run a session that has more than 83 sources and targets.Allocating Buffer Memory By default.000 bytes. Increase the DTM buffer size found in the Performance settings of the Properties tab. 222 . a session has enough buffer blocks for 83 sources and targets. To configure these settings. based on default settings. The default setting is 12. Decrease the buffer block size found in the Advanced settings of the Config Object tab.

Example – Buffer Size/Buffer Block
For example, you create a session that contains a single partition, using a mapping that contains 50 sources and 50 targets.
1. You determine that the session requires 200 memory blocks:
[(total number of sources + total number of targets) * 2] = (session buffer blocks)
100 * 2 = 200
2. Next, based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session buffer blocks) = (0.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)
200 = 0.9 * 14222222 / 64000 * 1
or
200 = 0.9 * 12000000 / 54000 * 1

Bottlenecks: Identifying System
• Obvious to spot on the hardware:
− 100% CPU
− High paging/second (low physical memory)
− High physical disk reads/writes

A balanced Session
• The Session Log has statistics on the Reader/Transformation/Writer threads (at the end of the file)
• [Log excerpt: MASTER> PETL_24018 [READER_1_1_1], PETL_24019 [TRANSF_1_1_1] and PETL_24022 [WRITER_1_*_1] lines, each reporting Total Run Time, Total Idle Time and Busy Percentage for its thread]

Increasing performance
• A huge subject in itself
• For every bottleneck there is a number of optimization techniques available
• Think creatively, having the overall architecture in mind
− Relational databases (Sources, Targets, Lookups...)
− Informatica server

Tuning Sources/Targets
• Increase the database throughput
• Limit SQs
− Limit incoming data (#rows)
− Tune SQ queries
− Prepare the data on the source side (if possible)
• For Targets use Bulk Loading and avoid PKs/Indexes
• Increase LAN speed for remote connections

Tuning Transformations
• Tune Lookups with regard to DB performance
• Use appropriate caching techniques
− For Lookups: static vs dynamic, persistent
• If possible use sorted transformations
− Aggregator, Joiner
• Use Filters as early in the pipeline as possible
• Use port variables for complex calculations (factor out common logic)
• Use single-pass reading

Optimizing Sessions/System

• Increase physical server capacity
− #CPUs
− Memory
− LAN
− HDD speed
• Use appropriate buffer sizes
− Big number of options available
• Use a bigger number of machines
− Informatica Grids
− Oracle's RACs

Pipeline Partitioning - Overview

• Pipeline Partitioning is a way to split a single pipeline into multiple processing threads
• Workflow Administration Guide -> Chapter 13: Pipeline Partitioning

Default Partition Points


Pipeline Partitioning

• In a way, one partition is a portion of the data
− A partition point is where you create "boundaries" between threads
− Different partition points can have different #partitions
• This means that there can be multiple:
− Reader threads
− Transformation threads
− Writer threads
• This requires multi-CPU machines and relational databases with parallel options enabled
• HUGE performance benefits can be achieved
− If you know what you're doing; otherwise you may actually lower system performance!

Understanding Pipeline Flow

• Pipeline partitions are added in the Mapping tab of the Session properties (Workflow Manager)

Partitioning Limitations
• You need to have a streamlined data flow to add partition points
− You can't add partition points to transformations where not all the columns flow through that part of the pipeline

Partition Types
♦ Round-robin. The PowerCenter Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
♦ Hash. The PowerCenter Server applies a hash function to a partition key to group data among partitions. Use hash partitioning where you want to ensure that the PowerCenter Server processes groups of rows with the same partition key in the same partition. If you select hash auto-keys, the PowerCenter Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key.
♦ Key range. You specify one or more ports to form a compound partition key. The PowerCenter Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
♦ Pass-through. The PowerCenter Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
♦ Database partitioning. The PowerCenter Server queries the IBM DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database. Use database partitioning with IBM DB2 targets stored on a multi-node tablespace.
For more information, refer to the Workflow Administration Guide.

Migration strategies

Migration Strategies
• There's always a need to migrate objects between stages
− E.g. Test -> QA -> Prod
• Usual problems with object synchronization
• There are two main types of migration:
− Repository per stage
− Folder per stage

Folder Migrations
• One folder per stage
− Complex directory structure (multiple stages per project folder)
  • Not allowed to nest directories
− Lower server requirements (one repository)
− Easier security management (one user login)
− Folders are created and managed in the Repository Manager
  • You need to have appropriate privs

Repository migrations
• In this case you have a separate Repository (not necessarily a Repository Server) per stage
• Reduces Repository size/complexity
• Streamlines the folder structure
[Diagram: Test repository -> Prod repository]

) Edit -> Paste 240 ..Copy Wizard • Copy Wizard assists you to copy Folders or Deployment Groups − Use Edit -> Copy (.

Copy Wizard II
• You can copy between repositories or within the same repository
• The Wizard helps you to resolve conflicts
− Connections
− Variables
− Folder names
− Other

g. one can use XML Export/Import 242 .XML Exports/Imports • If not possible to copy between folders or repositories (e. no access at all for Dev group to QA repository).

XML Imports/Exports II
• You can Export/Import any type of object
• When Exporting/Importing, all dependencies are exported (e.g. Sources for Mappings)
• When Importing, an Import Wizard will help you resolve any possible conflicts
− Different folder names
− Existing objects
− Other

XML – other uses
• How can one use XML data imports?
− Transfer of objects between repositories
− Automatic Transformation creation from existing processes
− Quick import of Source/Target definitions from a different format
− Backup of PowerCenter objects

Deployment Groups
• For versioned Repositories you can group objects into Deployment Groups
• Greater flexibility and reduced migration effort
• You can define a whole application or just a part of it
• No need to have one folder per application
• A Deployment Group can be Static or Dynamic
• Additional complexity (dependent child objects)
• Read more in the Repository Guide -> Chapter 9: Grouping Versioned Objects

Exercise: Copy Wizard
• Create a new folder TNumber_PROD
• Copy your entire folder TNumber to folder TNumber_PROD
• Modify the m_TNumberLoad_1 Mapping back to using a hardcoded file name (instead of a Parameter)
− In the TNumber folder
• Migrate your change to the TNumber_PROD folder
• Use the "Advanced" options

The Aggregator Test

The Test: Objectives
• This test checks some skills you should have learned during the course
• It's supposed to prove your knowledge, not your colleagues' or mine
• It's close to real-life development work
• The test requires from you:
− Application of gained knowledge – use the training materials and online guides!
− Creativity
− Persistence

The Test: Description
• Your task is to load some data into a target database, manipulating it on the way
− So, it's a typical ETL process
• You'll have to:
− Define and create Informatica and Oracle objects
− Modify the source information
− Run Informatica workflows
− Verify that the data has been correctly loaded and transformed

The Test: Workflow I
1. Define a workflow that will load the data from file agg_src_file_1.txt to Oracle
• Create your own target Oracle table called ODS_TNUMBER
• If a numerical value is not numerical then load the row anyway, using 0 for the numerical value
• Use an Update Strategy transformation based on the original transaction ID (ORIG_TRX_ID) to insert new rows and update existing rows
• Verify:
− #rows in = #rows out
− sum(values) in the source file = sum(values) in the target

The Test: Workflow II
• Move all the data from table ODS_TNUMBER to table QDF_TNUMBER, adding the following columns on the way:
− TRADE_CHANL_ID from the CUST table
− GEO_NAME from the GEO.NAME table, linking via CUST.ISO_CNTRY_NUM
• Filter out all rows with sales values <= 0
• Create your QDF table

The Test: Workflow III
• Create a report (Oracle table) showing the daily sales in each Sector
− Sector is level 2 in the 710 hierarchy
− Use the DNORM table to get SECTOR information
− Use the most recent CTRL_PERD
− Create an appropriate Oracle report table

Test rules
• No data row can be dropped
− #rows in the source file = #rows in the target, unless a source file row is supposed to update an already loaded row
• If an ID is not known it is supposed to be replaced with a replacement code
− Product ID replacement key: '82100000'
− Customer ID replacement key: '9900000003'
• Don't change any Oracle or source data; however, you may create your own objects

aggregations. joins… Use log files and the Debugger Use reusable components if feasible 254 .Task hints • • • • Some values may be a “bit” different than other – try to fix as many data issues as possible Remember about performance! Large lookups.

The End
© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
