1. The document describes various Informatica transformations and components used for data integration and ETL processes.
2. Transformations like Router, Filter, Expression, Lookup, Joiner, Aggregator, Sorter, Rank are used to transform, filter, aggregate, join and sort data from different sources.
3. Components like Sequence Generator, Stored Procedure, Normalizer, XML Source Qualifier are used for generating sequences, calling stored procedures, normalizing data, and qualifying XML sources.
ROUTER [active]
-Provides multiple output groups from a single source.
-Tests the data for one or more conditions.
-Groups are of two types: input and output; output groups are user-defined and default.
-Ex: to test data based on 3 conditions, use one Router transformation instead of 3 Filters.

FILTER [active]
-Can pass only one condition; tests the data for one condition.
-The filter condition returns TRUE or FALSE for each row that passes through it; rows that do not meet the condition are removed.
-To maximize session performance, place the Filter as close to the source as possible in the mapping.

SEQUENCE GENERATOR
-Is for generating unique serial numbers; we use it when the data is not coming from the source.
-Provides two output ports: NEXTVAL and CURRVAL.
-Generates numeric values and can replace missing values.
-Properties: start value, increment by, end value, current value, cycle, number of cached values, reset.

AGGREGATOR [active; ports: I/O/V]
-Performs calculations on groups of data (min, max, average, total).
-Use the Aggregator to eliminate duplicate rows in flat files (group by all ports).
-Components: aggregate expression, group-by port, sorted input, aggregate cache.
-Don't use sorted input when the session uses incremental aggregation or the input data is data driven.

EXPRESSION [connected/passive]
-Is for computations: it calculates a value based on values within a single row.
-Ex: based on the price and quantity of a particular item, you can calculate the total purchase price for that item.

STORED PROCEDURE [passive; connected/unconnected]
-Calls a stored procedure; we use stored procedures for maintaining databases.
-Generates a status code to know whether the stored procedure completed successfully; the status code provides error handling for the Informatica server during a workflow.
-Unconnected: used to run nested stored procedures.

SORTER [connected/active]
-Allows sorting the data from the source in ascending or descending order according to a specified sort key; the data passing through the Sorter is sorted by that key.

RANK [connected/active]
-Selects the TOP or BOTTOM rank of data; we can find the largest or smallest numeric value in a group.
-Ex: select the top 10 clients in a company.

NORMALIZER
-Use a Normalizer instead of a Source Qualifier when you normalize a COBOL source.
-Normalizes records from COBOL and relational sources.
-In the Mapping Designer: Transformation > Create > Normalizer.

XML SOURCE QUALIFIER [passive/connected]
-Use this only with an XML source definition; you can link only one XML Source Qualifier to one XML source definition.
-It has one input/output port for every column in the XML source.

ACTIVE vs PASSIVE
-Active: can change the number of rows that passes through it. Ex: Filter removes rows that do not meet the filter condition.
-Passive: does not change the number of rows that passes through it. Ex: Expression performs a calculation on data.

LOOKUP [ports: I/O/L/R]
-Mostly used for "if the target row is present, update; else insert" operations: it checks whether the row we are processing already exists in the target; if it exists, update the target; if not, do an insert.
-We use a lookup to search for a related value, to perform some calculations, or to search whether records exist in the target table.
-In a lookup, do the SQL override first and then go for the condition.

Lookup SQL override:
-Overrides the default SQL statement used to query the lookup table; specifies the SQL statement to use for querying lookup values.
-Use only with lookup caching enabled.

Persistent cache: if the cache is persistent, we can save the lookup cache files and reuse them; if non-persistent, the files are deleted.
Shared cache: we can share the lookup cache between multiple transformations; an unnamed cache can be shared between transformations in the same mapping, a named cache between transformations in the same or different mappings.

CONNECTED vs UNCONNECTED LOOKUP
Connected: 1. can use a dynamic or static cache 2. supports user-defined default values 3. used for updating slowly changing dimension tables 4. is connected in the mapping.
Unconnected: 1. can use a static cache only 2. does not support user-defined default values 3. is used to fetch values based on incoming values 4. is not connected in the mapping 5. the return port is used only in this transformation.

DYNAMIC vs STATIC CACHE
Dynamic: 1. we can insert rows into the cache as we pass them to the target 2. when the condition is FALSE (the row is not in the cache or target), the server inserts the row.
Static: 1. we cannot insert or update rows 2. when the condition is TRUE, the server returns the value from the lookup table; when FALSE, it returns the default value for a connected lookup and NULL for an unconnected one.

JOINER [connected/active]
-Joins two related heterogeneous sources in different locations: two relational sources existing in separate databases, two different ODBC sources, or a relational table and an XML source with at least one matching port.
-The Joiner allows joining sources that contain binary data.
-Cannot join in the following situation: both input pipelines originate from the same Source Qualifier.
Join types:
-Normal: based on the condition, the server discards all rows of data from the master and detail sources that do not match.
-Master outer: keeps all rows of data from the detail source and the matching rows from the master source; unmatched master rows are discarded.
-Detail outer: keeps all rows from the master source and the matching rows from the detail source.
-Full outer: keeps all rows of data from both the master and detail sources.

SOURCE QUALIFIER
-To filter records as the server reads source data, to specify sorted ports, and to select only distinct values from the source.
-Properties: SQL query, user-defined join, source filter, number of sorted ports, pre- and post-SQL.
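The Filter/Router contrast above can be sketched in plain Python: a Filter applies one condition and drops failing rows, while a Router tests each row against every group condition and sends unmatched rows to a default group. This is an illustrative analogue only; none of these names are Informatica APIs.

```python
# Illustrative sketch of Filter vs Router semantics (not Informatica APIs).

def filter_rows(rows, condition):
    """Filter: a single condition; rows that fail it are dropped."""
    return [r for r in rows if condition(r)]

def route_rows(rows, conditions):
    """Router: each row is tested against every group condition; a row can
    land in more than one group, and rows matching none go to 'default'."""
    groups = {name: [] for name in conditions}
    groups["default"] = []
    for r in rows:
        matched = False
        for name, cond in conditions.items():
            if cond(r):
                groups[name].append(r)
                matched = True
        if not matched:
            groups["default"].append(r)
    return groups

rows = [{"amount": 50}, {"amount": 500}, {"amount": 5000}]
groups = route_rows(rows, {
    "small": lambda r: r["amount"] < 100,
    "medium": lambda r: r["amount"] < 1000,
})
kept = filter_rows(rows, lambda r: r["amount"] < 1000)
```

Note how one Router replaces several Filters: a single pass over the data populates all groups, which is the performance argument the notes make.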
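The "if row present, update; else insert" pattern that the Lookup notes describe, together with the dynamic-cache behaviour (insert into the cache when the condition is false), can be sketched with a dict standing in for the lookup cache. The key name `cust_id` and the function names are assumptions for illustration.

```python
# Sketch of the "update else insert" decision a dynamic lookup cache supports.
# A dict keyed on the natural key stands in for the cache; illustrative only.

def upsert(target_cache, incoming_rows):
    """Condition TRUE (key present) -> update; FALSE (key absent) -> insert,
    and the dynamic cache immediately learns the new row."""
    inserts, updates = [], []
    for row in incoming_rows:
        key = row["cust_id"]
        if key in target_cache:            # lookup condition is TRUE
            target_cache[key].update(row)
            updates.append(key)
        else:                              # lookup condition is FALSE
            target_cache[key] = dict(row)  # dynamic cache: insert new row
            inserts.append(key)
    return inserts, updates

cache = {1: {"cust_id": 1, "city": "Pune"}}
ins, upd = upsert(cache, [{"cust_id": 1, "city": "Mumbai"},
                          {"cust_id": 2, "city": "Delhi"}])
```

Because the cache is updated as rows flow through, a second occurrence of the same key within one run is correctly treated as an update rather than a duplicate insert.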
MIGRATION
You need to check several things before making folder copies between environments:
-Have a backup of the dev, QA, and production repositories.
-No Informatica jobs should be running or scheduled to run during the folder copy; you must wait until jobs finish, or stop all jobs before doing the move.
-Remove locks, and make sure users are out of the development, QA, and production repositories before moving any folders; be sure all work has been saved.
-View locks and remove all old locks that still remain after all developers have logged out (no jobs are running).
-Copy shared folders first, then project folders.
-Stop and start Informatica services; be sure scheduled jobs come back up.

Moving mappings, sessions, etc. from development to production:
-Copy the development folder to the production folder using the folder copy option in Repository Manager; in this way all the source and target tables and mappings are copied to the production environment.
-The copy option gets all the objects to production; export/import has to be done separately for each mapping, session, etc.
-When we copy a folder from one repository to another repository, everything is copied into the target repository.

REPOSITORY
-Stores folder information, or metadata, used by the Informatica server and client tools.
-Metadata represents different types of objects such as mappings, transformations, source definitions, and target definitions.
Metadata Reporter:
-A web-based application that allows you to run reports against the repository.
-Provides a number of reports, including reports on executed sessions, mappings, and source and target schemas.
-Documentation covers how to install and use the web-based Metadata Reporter to generate reports on metadata.

TESTING MAPPINGS AND SESSIONS
-We have to write test cases to check business validity and to make sure that we are getting the right target data.
-Example: when extracting data from source into a staging area, one test case is to check the row count of the source and then the row count of the staging tables and compare the results; the row counts should match.
-Usually in DW testing we have to write SQL queries to do the testing against source, staging, and target.
-We have to check for: 1. source-target validation 2. the scheduling process 3. constraints 4. full load, delta load, and reload processes 5. business logic 6. all DDL definitions.
-We have a QA lead and coordinate with him; depending on the data, different validations are done (e.g., dates, concatenations).
-Unit testing: testing a single module or mapping.
-Integration testing: combining two modules or mappings and checking whether data is getting reflected in the other module.

BAD DATA HANDLING
-We can clean the bad data before the Informatica load by using scripts (e.g., shell scripts) which check the number of columns, date formats, numeric values, and sizes of the data files.

SURROGATE KEY
-A system-generated, artificial primary key value; it is just a unique identifier or number for each row.
-A substitution for the natural primary key; the SK is unique for each row in the table.
-It allows maintaining historical records in the DW.

SESSION FILES
-Session log file: the server creates one for each session; it writes information about the session into the log file, such as initialization, creation of SQL commands, errors, and the load summary.
-Session detail file: contains load statistics for each target in the mapping, including information such as table name and number of rows written or rejected.
-Performance detail file: contains information about performance.
-Reject file: contains the rows of data that the writer does not write to targets.
-Control files: contain information about the target flat files, such as data format and loading instructions.

RECOVERING SESSIONS
-If you stop a session, or if an error causes a session to stop, refer to the session and error logs to determine the cause of the failure; correct the errors and then complete the session.
-Use one of the following to complete the session: run the session again if the server has not issued a commit; consider performing recovery if the server has issued at least one commit.
-Recovering a stand-alone session (a session that is not nested in a batch): if a stand-alone session fails, we can recover by using a menu command or pmcmd. These options are not available for batched sessions.

MAPPING PARAMETERS AND VARIABLES
-Mapping parameter: represents a constant value; we define it before running a session, and it retains the same value throughout the entire session.
 Ex: in my project, instead of creating separate mappings for each client or customer, we created one mapping for a single customer; before running the session, we enter the value of the parameter in the parameter file.
-Mapping variable: represents a value that can change through the session.
-Parameter file: defines the values for the parameters and variables used in a session; it is created with a text editor such as WordPad or Notepad. In the parameter file we can define mapping parameters, mapping variables, and session parameters. The session uses the parameter file to start all sessions in the workflow.
-Session parameters:
 Source file name: use when you want to change the name or location of the session source file between session runs.
 Target file name: use when you want to change the name or location of the session target file between session runs.
 Reject file name: use when you want to change the name or location of the session reject file between session runs.

Q. I am trying to run a workflow with a parameter file and one of the sessions keeps failing?
Ans: In the parameter file, the parameter may not be listed. Check the session properties to see whether the session parameters are defined correctly in the parameter file, and use pmcmd to start the session.

TARGET LOAD ORDER [Designer]
-Sets the order in which the server sends rows to different target definitions in a mapping.
-To set the target load order: 1. create a mapping that contains multiple Source Qualifier transformations 2. when the mapping is completed, select Mappings > Target Load Plan (dialog box) 3. select a Source Qualifier from the list 4. click the up and down buttons to move the Source Qualifier within the load order, then OK and save.

CONSTRAINT-BASED LOADING [Workflow Manager]
Ex: 1. 'A' has a primary key; B and C have foreign keys referencing the 'A' primary key. 2. C has a primary key that D references as a foreign key. These four tables receive records from a single active source.

COBOL COPY BOOKS
-A copy book must be a text file; a period must exist at the end of each line to separate one line from the other; field names cannot have parentheses; the level of a record type from 1 to 99 must start with a zero (0).
-The Designer cannot recognize a COBOL copy book (.cpy file) as a COBOL file (.cbl), because it lacks the proper format.
-To import the COBOL copy book in the Mapping Designer, we can insert it into a COBOL file template by using the COBOL statement "COPY"; after we insert the copy book into the template, we can save the file as a .cbl file and import it in the Designer.
-If the .cbl file and the .cpy file are not in the same local directory, the Designer prompts for the location of the .cpy file.
-When the copy book file contains tabs, the Designer expands tabs into spaces; by default, a tab character expands into 8 spaces.
-The OCCURS clause specifies the number of repeated occurrences of data items with the same format, in a table.
-The REDEFINES clause is used to have multiple field definitions for the same storage.

AGGREGATOR PERFORMANCE
-To improve the performance of the Aggregator and the session, use SORTED INPUT; sorted input indicates that the input data is pre-sorted by groups.
-Aggregate cache: the server stores group values in an index cache and row data in the data cache.
-Filter before aggregation.
-We cannot use sorted input when the source is data driven.
-Incremental aggregation: if we run the session with incremental aggregation enabled, we should not modify the index which stores historical aggregate information.
-Data driven: the default option if the mapping for the session contains an Update Strategy transformation; if we do not choose Data driven when a mapping contains an Update Strategy, the Workflow Manager shows a warning.

LOAD MANAGER AND DTM
When the server runs a workflow, the Load Manager:
-Locks the workflow and reads the workflow properties.
-Reads the parameter file and expands workflow variables.
-Runs workflow tasks and starts the DTM to run the session.
When the server runs a session, the DTM:
-Fetches the session and mapping from the repository.
-Creates the session log file.
-Runs pre-session shell commands.
-Creates and runs the mapping reader, writer, and transformation threads to extract, transform, and load data.
-Runs post-session shell commands.

SLOWLY CHANGING DIMENSIONS
Those dimensions which change over time are SCDs.
TYPE 1 (current data): new row -> insert it; any update in the source -> update the target (insert or update). "To keep the recent values in the target."
TYPE 2 (history): new row -> insert it; any update in the source -> insert a new row. "To keep the full history of changes in the target (the PK is changed)."
TYPE 3 (history and current): new row -> insert it. "To keep the current and previous values in the target."

PARTITIONING [Workflow Manager]
-Round-robin partition: each partition processes approximately the same number of rows.
-Hash partition: to group rows of data among the partitions, the server uses a hash function. There are two types of keys: hash auto-keys (use at Rank and Sorter) and hash user keys (to generate the partition key you specify a number of ports).
-Key range partition: used when the sources or targets in the pipeline are partitioned by key range. Ex: 1. set the partition type at the target instance to key range 2. create 3 partitions 3. choose ITEM_ID as the partition key 4. set the key ranges as follows: partition #1: start range 1000, end range 3000; partition #2: start range 3000, end range 6000.
-Pass-through (default): does not increase performance; by default the mapping contains this type at the Source Qualifier and target instance, with one reader and one writer thread.

DEBUGGER
"I used the Debugger to find out the changes between the transformations and at what level the change is occurring."
-When we run the Debugger, the Designer displays: the debug log (view messages from the Debugger), the target window (view target data), and the instance window (view transformation data).
-The Debugger runs a workflow for each session type; we can run the Debugger before or after running the workflow.
Debug process: 1. create breakpoints in the mapping to find the error condition 2. configure the Debugger 3. run the Debugger: the server reads the breakpoints and pauses the Debugger when a breakpoint evaluates to TRUE.

POWER MART vs POWER CENTER
-PowerCenter allows controlling several systems from a central point, allows global repositories, gives you the ability to partition your loads for performance (this allows multiple reader and writer threads), is for the EDW, and can source data from mainframe, legacy, ERP, and EAI systems.
-PowerMart does not allow central control or global repositories, is for data mart needs, and can source data from relational and flat files.
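The Type 2 behaviour described above ("any update in source, insert a new row; the PK is changed") can be sketched in Python: history is kept by expiring the current dimension row and inserting a new version under a new surrogate key. The column names and the `current_flag` convention are assumptions for illustration, not a fixed Informatica structure.

```python
# Minimal SCD Type 2 sketch: expire the current row, insert a new version
# with a new surrogate key. Column names here are illustrative assumptions.

def scd2_apply(dim, src_row, next_sk):
    """dim: list of dimension rows; src_row: incoming row keyed on the
    natural key 'cust_id'. Returns the (possibly incremented) next_sk."""
    current = [r for r in dim
               if r["cust_id"] == src_row["cust_id"] and r["current_flag"]]
    if current and current[0]["city"] == src_row["city"]:
        return next_sk                       # no change: nothing to do
    for r in current:                        # change detected: expire old version
        r["current_flag"] = False
    dim.append({"sk": next_sk, "cust_id": src_row["cust_id"],
                "city": src_row["city"], "current_flag": True})
    return next_sk + 1

dim = []
sk = 1
sk = scd2_apply(dim, {"cust_id": 7, "city": "Pune"}, sk)    # first load: insert
sk = scd2_apply(dim, {"cust_id": 7, "city": "Mumbai"}, sk)  # change: new version
```

After the second call the dimension holds both versions of customer 7, with only the newest flagged current; that is exactly the "full history" property the notes attribute to Type 2.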
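The key-range partitioning idea, mirroring the ITEM_ID example above, can be sketched as assigning each row to the partition whose configured range contains its key. Treating the start of a range as inclusive and the end as exclusive is an assumption made here so the two ranges do not overlap; Informatica's exact boundary handling may differ.

```python
# Sketch of key-range partitioning: rows go to the partition whose range
# contains the partition key. Boundary handling is an illustrative choice.

RANGES = [(1000, 3000), (3000, 6000)]   # partition #1, partition #2

def key_range_partition(rows, ranges, key="item_id"):
    parts = [[] for _ in ranges]
    for r in rows:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= r[key] < hi:       # start inclusive, end exclusive
                parts[i].append(r)
                break
    return parts

parts = key_range_partition(
    [{"item_id": 1500}, {"item_id": 3000}, {"item_id": 4500}], RANGES)
```

Contrast with round-robin (even row counts regardless of values) and hash partitioning (a hash function groups related keys together): key range is the one that lets a partition line up with how the target itself is range-partitioned.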
MAPPLET
-If we are using the same set of transformations in N number of mappings, then we create that particular set as a single mapplet.
-You can't put a mapplet inside another mapplet.
-Mapplets are like pre-processor macros, not sub-routines: every mapplet instance is replaced by the contents of the mapplet.
-You cannot use the following objects in a mapplet: 1. Normalizer transformations 2. COBOL sources 3. XML Source Qualifier transformations 4. XML sources 5. target definitions 6. other mapplets.

WORKLET
-A worklet is like another task, but it actually contains a set of tasks.
-Create a worklet when you want to reuse a set of workflow logic in several workflows.
-"I used worklets within a workflow and just scheduled that one workflow."

WORKFLOW
-"My datamart is loaded with 3 batch incremental loads; we already have batch sessions running as workflows, so I have been assigned multiple assignments which are under the workflow."
-In the Workflow Manager, the Event-Wait and Event-Raise tasks are used to control the sequence of task execution in the workflow.
-Tasks we create in the Workflow Manager are non-reusable; tasks we create in the Task Developer are reusable.

SCHEDULING WORKFLOWS
There are 3 options:
1. Run on demand: manual; should not be checked if you have any of the scheduler settings on.
2. Run continuously: as per schedule, but a special option to keep the workflow in a loop.
3. Run on server initialization: as per schedule, but starts on server initialization; check this only if you want to run the workflow on initializing.

MAPPING STANDARDS
-Minimize the use of stored procedures.
-Naming conventions, like giving specific names.
-Error handling.
Audit maintenance / process tracking:
-How many input records you are getting when the session is loaded, how many records were loaded into the warehouse, and how many errors occurred.
-Which map, which session, and which table got invoked during the ETL process (because we are not touching the metadata).
-According to the audit, they need the statistics of the ETL process in their warehouse (found in the repository).

IF RUNNING CONCURRENT SESSIONS FAILS
-In the Workflow Manager, select the session and, on the General tab, select "Fail parent if this task fails."

PMCMD
-pmcmd is used to communicate with the Informatica server, and to start and stop workflows and tasks.
-Run a workflow by using the pmcmd command:
 pmcmd startworkflow -s <ip of infa server>:<port> -u <user> -p <password> -f <folder name> -wait <workflow name>
-Using pmcmd interactively at the Unix prompt:
 $ pmcmd
 pmcmd> connect -u <user name> -p <password> -s <server name>:<port no>
 pmcmd> getwfdetails -f <folder name> <WF name>
   (displays user name, start time, end time, failed/succeeded)
 pmcmd> getsessiondetails -f <folder name> <worklet.session name>
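A mapplet is described above as a reusable set of transformations whose instances are expanded like macros. A rough Python analogue: compose a fixed list of row-level transformations into one reusable callable and drop it into several "mappings". Purely illustrative; the function names are not Informatica APIs.

```python
# Mapplet-as-macro sketch: bundle transformations once, reuse everywhere.

def make_mapplet(*transforms):
    """Return a callable that applies the bundled transformations in order,
    like a mapplet instance expanded into a mapping."""
    def mapplet(rows):
        for t in transforms:
            rows = [t(r) for r in rows]
        return rows
    return mapplet

# the reusable set: trim a name field, then derive an upper-case code
clean_names = make_mapplet(
    lambda r: {**r, "name": r["name"].strip()},
    lambda r: {**r, "code": r["name"].strip().upper()},
)

# two different "mappings" reuse the same mapplet
out_a = clean_names([{"name": "  pune "}])
out_b = clean_names([{"name": "delhi"}])
```

The "macro, not sub-routine" point maps onto this sketch too: each call site gets the same logic expanded inline, so a change to the mapplet definition propagates to every mapping that uses it.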
SESSION PERFORMANCE TUNING
-If you look at the session log, you will come to know which part of the mapping took a long time.
-The SQL in the Source Qualifier might take a long time to return results if you are joining 3 large tables.
-The lookup cache creation will also take much time when caching 20 million records (this can be tuned by using a dynamic lookup or a persistent cache).
-If you feel that normalization takes much time, try to replace the Normalizer with an Expression using the right logic.
-Avoid using unwanted ports in the transformations.
Checklist:
1. Eliminate unnecessary ports in the Source Qualifier.
2. Add a WHERE clause wherever necessary.
3. Create indexes on the lookup condition ports.
4. Indexing each column in the lookup condition can improve session performance, particularly for large lookup tables.
5. Do not select a row into a lookup unless that row is needed.
6. Use bulk loading: disable the index updates during the bulk load and then refresh the indexes after the load; this is generally faster than dropping and rebuilding indexes.
7. Create appropriate indexes on the target table.

TARGET BOTTLENECKS
-We can identify target bottlenecks by configuring the session to write to a flat file target. If session performance increases when you write to a flat file, you have a target bottleneck.
-Tuning: drop indexes and key constraints, use bulk loading, increase the database network packet size, and optimize the target database.

SOURCE BOTTLENECKS
-Extract the query from the session log and run it; measure the time. If there is a difference, go for tuning.
-Tuning: optimize the query; use conditional filters.

READ AND WRITE PERFORMANCE
-Read performance: reduce the number of records processed; move the filters/aggregations to the beginning (the next steps then process fewer records).
-Write performance: drop indexes before loading (if there are no updates); disable constraints before the load and re-enable them after the load; consider increasing the commit interval; use hints.

DATABASE-SIDE TUNING
-Create indexes; make sure there is enough table space.
-Set SQL TRACE ON to display the statistics.
-Use the autotrace function in SQL*Plus to automatically see the explain plans.
-Use HINTS wherever needed.

EXPLAIN PLAN
-EXPLAIN PLAN gives you the plan of how your query will be executed, and also gives you the "cost" of the query.
 SQL> EXPLAIN PLAN SET statement_id = 'emp_sal' FOR
      SELECT ename, job, sal, dname
      FROM emp, dept
      WHERE emp.deptno = dept.deptno
      AND NOT EXISTS (SELECT * FROM salgrade
                      WHERE emp.sal BETWEEN losal AND hisal);
 SQL> EXPLAIN PLAN FOR SELECT * FROM emp;
 SQL> DESC plan_table;

TRACING AND TKPROF
-SET TRACE ON creates a trace file.
-TKPROF is just a translator that converts the trace file into a readable format (parsing/optimizing/executing statistics):
 $ TKPROF ora_12558.trc trace.txt
 (the formatted version is stored in trace.txt)
-TKPROF command-line options: EXPLAIN, TABLE, SYS, SORT, RECORD, PRINT, INSERT.

SESSION ERROR HANDLING
-Common errors: data type mismatches, column value truncated, table or view does not exist in the database, out of table space.
-Check the session log (run test loads with verbose mode ON to get more information about the data).
-Check the directory paths and the source file names.
-Check the source and target database connections.
-Remove the Test Load option after testing is done; run the Debugger.

Q. When you are running a session, your mapping is fine, your session is fine, and your session ran successfully, but the data is not loaded into your database?
Ans: It could be in test load. If it is an incremental load, the source record might already exist in the target table. Set the Target Load Option to "Normal".
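The notes above suggest a persistent lookup cache so the server need not re-cache millions of lookup rows on every run. A minimal Python analogue: build the lookup dict once, save it to disk, and reload it on later runs. The file name and the pickle format are illustrative choices, not Informatica's actual cache format.

```python
# Persistent-cache sketch: build once, persist, reuse on subsequent runs.
import os
import pickle

CACHE_FILE = "lkp_customer.cache"   # hypothetical cache file name

def load_or_build_cache(build_fn):
    """Reload the cache from disk if present; otherwise build and save it."""
    if os.path.exists(CACHE_FILE):             # persistent cache: reuse it
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f), True
    cache = build_fn()                         # first run: expensive build
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
    return cache, False

build_calls = []
def build():
    build_calls.append(1)                      # count expensive builds
    return {i: "cust-%d" % i for i in range(1000)}

cache1, reused1 = load_or_build_cache(build)   # builds and saves
cache2, reused2 = load_or_build_cache(build)   # reloads from disk
os.remove(CACHE_FILE)                          # clean up the demo file
```

The trade-off mirrors the persistent vs non-persistent distinction in the Lookup notes: a persisted cache saves the build cost but goes stale if the lookup table changes between runs.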
IMPORTING SOURCES
Importing Excel sheets as a source:
1. Open the Excel file and select the columns and rows that you need to use as the source.
2. In the first row, enter the column names as they should appear for the table (don't include spaces).
3. Select from that row onwards up to the columns you require, then click (on the menu) Insert > Name > Define. If you have many sheets, create ranges for each of the sheets.
4. Format numeric columns as 'Number' and save the file.
5. Create an ODBC DSN pointing to the file.
6. Now in the Informatica Designer, choose 'Import from Database', specify the DSN name, and then choose the tables to be imported.

Importing flat files:
-While importing a flat file into the source, it will ask for delimiters; at that place there is a check box to treat consecutive characters as delimiters. Just enable it if needed.
-When you click "Import from file" in the Source menu, in the Source Analyzer the second screen has a check box to treat consecutive delimiters as one; check or uncheck this as per your requirement.

INFORMATICA VERSION DIFFERENCES
Informatica 5x vs 6x:
1. In 5x we have to delete the existing mapping and session and copy the updated one; in 6x we can overwrite the existing mapping and session.
2. We cannot pass a parameter in a Lookup SQL override in 5x; from 6x it is possible.
3. 5x has the Server Manager; 6x has the Workflow Manager, and the Workflow Monitor to watch the execution.
4. In 6x you can create flat file target definitions in the Designer to output data to flat files.
5. In 6x you can include multiple Source Qualifier transformations in a mapplet.
6. 6x cannot do a lookup on flat files.

Informatica 7x:
-Adds the UNION, CUSTOM, and XML (midstream) transformations.
-Lookup on flat files.
-Grid technologies (server grid): servers running on different operating systems can coexist on the same server grid.
-You can import and export repository objects (both dependent and independent).
-You can even use pmcmd with the repository; in the web application we can move a mapping so that it is available to others.

Union transformation:
-Is used to merge data from pipeline branches into one pipeline branch; it merges data from multiple sources, similar to the UNION ALL SQL statement that combines the results from two or more SQL statements.
-The Union transformation does not remove duplicate rows.
-We can connect heterogeneous sources to this transformation.
-We can create multiple input groups, but only one output group.

Custom transformation:
-Creates transformation applications, such as sorting and aggregation.
-Is used to create a transformation that requires multiple input groups, output groups, or both.

UPGRADING FROM VERSION 6.2 TO 7.0
1. Make a backup of the repository on the server.
2. Repository > File > Export > to a location on the same server.
3. Uninstall Informatica 6.2 and install Informatica 7.
4. In Repository 7.1: Import > from the same server.
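The Union transformation above merges pipeline branches the way SQL's UNION ALL does: rows from every input group flow to one output group and duplicates are NOT removed. A plain-Python analogue makes that property easy to see; the function name is illustrative.

```python
# UNION ALL sketch: concatenate all input branches, keep duplicates.

def union_all(*branches):
    """Merge rows from all input branches into one output branch,
    keeping duplicates, as the Union transformation does."""
    out = []
    for b in branches:
        out.extend(b)
    return out

merged = union_all([{"id": 1}, {"id": 2}],
                   [{"id": 2}],          # the duplicate row survives
                   [{"id": 3}])
```

If duplicate removal were wanted, a separate step (an Aggregator grouping on all ports, per the earlier notes) would have to follow, since the Union itself never de-duplicates.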