INFORMATICA TRANSFORMATIONS :
Transformations in Informatica
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Rank Transformation
Router Transformation
Sorter Transformation
Union Transformation
Case Converter Transformation
INFORMATICA COMMANDS :
MISCELLANEOUS TOPICS :
Incremental Aggregation
Informatica PowerCenter is one of the Enterprise Data Integration products developed by Informatica Corporation.
Informatica PowerCenter is an ETL tool used for extracting data from the source, transforming it and loading it
into the target.
The Extraction part involves understanding, analyzing and cleaning of the source data.
Transformation part involves cleaning of the data more precisely and modifying the data as per the
business requirements.
The loading part involves assigning the dimensional keys and loading into the warehouse.
The problem with traditional programming languages is that you need to connect to multiple sources and
handle errors yourself, which requires writing complex code. ETL tools provide a ready-made solution for
this. You don't need to worry about handling these things and can concentrate only on coding the requirement
part.
Informatica is an ETL tool used for extracting data from various sources (flat files, relational databases, XML etc.),
transforming the data and finally loading the data into a centralized location such as a data warehouse or operational data
store. Informatica PowerCenter has a service-oriented architecture that provides the ability to scale services and
share resources across multiple machines.
The important components of Informatica PowerCenter are listed below:
Domain: The domain is the primary unit for management and administration of services in PowerCenter. The
components of a domain are one or more nodes, the service manager and application services.
Node: A node is a logical representation of a machine in a domain. A domain can have multiple nodes. The master gateway
node is the one that hosts the domain. You can configure nodes to run application services like the integration service
or repository service. All requests from other nodes go through the master gateway node.
Service Manager: Service manager is for supporting the domain and the application services. The Service Manager
runs on each node in the domain. The Service Manager starts and runs the application services on a machine.
Application services: Group of services which represents the informatica server based functionality. Application
services include powercenter repository service, integration service, Data integration service, Metadata manage
service etc.
Powercenter Repository: The metadata is stored in a relational database. The tables contain the instructions to
extract, transform and load data.
Powercenter Repository service: Accepts requests from the client to create and modify the metadata in the
repository. It also accepts requests from the integration service for metadata to run workflows.
Powercenter Integration Service: The integration service extracts data from the source, transforms the data as per
the instructions coded in the workflow and loads the data into the targets.
Informatica Administrator: Web application used to administer the domain and powercenter security.
Metadata Manager Service: Runs the metadata manager web application. You can analyze the metadata from
various metadata repositories.
6. Allows unified administration with a new admin console that enables you to manage PowerCenter and PowerExchange
from the same console.
7. Powerful new capabilities for data quality.
8. A single admin console for data quality, PowerCenter, PowerExchange and data services.
9. In Informatica 9, Informatica data quality (IDQ) has been further integrated with the Informatica Platform and
performance, manageability and reusability have all been significantly enhanced.
10. The mapping rules are shared between the browser-based tool for analysts and the Eclipse-based
developer tool, leveraging unified metadata underneath.
11. The data services capabilities in Informatica 9, both over SQL and web services, can be used for real-time
dashboarding.
12. Informatica Data Quality provides worldwide address validation support with integrated geocoding.
13. The ability to define rules and to view and run profiles is available in both Informatica Developer (thick client)
and Informatica Analyst (browser-based thin client). These tools sit on a unified metadata infrastructure. Both
tools incorporate security features like authentication and authorization.
14. The developer tool is now eclipse based and supports both data integration and data quality for enhanced
productivity. It provides browser based tool for analysts to support the types of tasks they engage in, such as
profiling data, specifying and validating rules & monitoring data quality.
15. A Velocity methodology is planned and will soon be introduced for Informatica 9.
16. Informatica has the capability to pull data from IMS, DB2 and several other legacy system (Mainframe)
environments like VSAM, Datacom, and IDMS.
17. There are separate tools available for different roles. The Mapping Architect for Visio tool is designed for
architects and developers to create templates for common data integration patterns, saving developers a
tremendous amount of time.
20. Informatica 9 complements existing BI architectures by providing immediate access to data through data
virtualization, which can supplement the data in existing data warehouse and operational data store.
21. Informatica 9 supports profiling of Mainframe data. Leveraging the Informatica platform’s connectivity to
Mainframe sources.
22. Informatica 9 continues to support running the same workflow simultaneously.
24. Browser based tool is a fully functional interface for business analysts.
26. There are 3 interfaces through which these capabilities can be accessed. The Analyst tool is a browser-based tool for
analysts and stewards. Developers can use the Eclipse-based Developer tool. Line-of-business managers can view
data quality scorecards.
The Advanced Workflow Guide discusses topics like Pipeline Partitioning, Pushdown Optimization, Real-Time
Processing, Grid Processing, External Loading etc.
The Data Profiling Guide helps you understand and analyze the content, quality and structure of data.
Designer Guide:
You can learn how to import or create Sources, Targets, create Transformations, Mappings, Mapplets and so on.
This document will help you on how to use the Informatica PowerCenter tool.
Mapping Analyst for Excel helps you to import PowerCenter mappings from Microsoft Office Excel, and to export PowerCenter mappings to Microsoft Office Excel.
Mapping Architect for Visio helps you to create mapping templates using Microsoft Office Visio.
Repository Guide:
Helps you in understanding the repository architecture, metadata and repository object locks.
Transformation Guide:
Web services describe a collection of operations that are network accessible through standardized XML messaging.
Aggregator Transformation
Expression Transformation
Lookup Transformation
Filter Transformation
Expression Transformation
Router Transformation
Sorter Transformation
Rank Transformation
Joiner Transformation
Union Transformation
Normalizer Transformation
SQL Transformation
Creating SCD Type 1, Type 2 and Type 3 without using the mapping wizard
Specifying the target load order and stored procedure execution order.
Creating a Workflow
Session
Assignment task
Timer task
Email task
Command task
Miscellaneous topics
Post session and Pre-session commands
Session recovery
Commit points
Pushdown optimization
Incremental aggregation
If you have any problems or logic that needs to be solved in Informatica, please drop an email
at vijaybhaskar184@gmail.com with a clear description of your problem, the source input and how the target output
should look. I will be available online from 7PM to 8PM IST.
Note: Right now I am providing training through gmail chat only. You need to have Informatica installed on your
PC. I will update here regarding the full-fledged training.
Transformations in Informatica 9
Active Transformations:
A transformation is called an active transformation if it performs any of the following actions.
Change the number of rows: For example, the filter transformation is active because it removes the rows
that do not meet the filter condition. All multi-group transformations are active because they might
change the number of rows that pass through the transformation.
Change the transaction boundary: The transaction control transformation is active because it defines a
commit or roll back transaction.
Change the row type: Update strategy is active because it flags the rows for insert, delete, update or
reject.
Note: You cannot connect multiple active transformations, or an active and a passive transformation, to the same
input group of a downstream transformation. This is because the integration service may not be able to
concatenate the rows generated by active transformations. This rule is not applicable to the sequence
generator transformation.
Passive Transformations:
Transformations which do not change the number of rows passed through them and maintain the transaction
boundary and row type are called passive transformations.
Connected Transformations:
Transformations which are connected to the other transformations in the mapping are called connected
transformations.
Unconnected Transformations:
An unconnected transformation is not connected to other transformations in the mapping; it is called within
another transformation and returns a value to that transformation.
Recommended Reading:
Select the Aggregator transformation, enter the name and click create. Then click Done. This will create an
aggregator transformation without ports.
To create ports, you can either drag the ports to the aggregator transformation or create in the ports tab
of the aggregator.
Aggregate Cache: The integration service stores the group values in the index cache and row data in the
data cache.
Aggregate Expression: You can enter expressions in the output port or variable port.
Group by Port: This tells the integration service how to create groups. You can configure input,
input/output or variable ports for the group.
Sorted Input: This option can be used to improve the session performance. You can use this option only
when the input to the aggregator transformation is sorted on the group by ports.
The following properties can be configured on the properties tab of the aggregator transformation:
Cache Directory: Directory where the Integration Service creates the index and data cache files.
Tracing Level: Amount of detail displayed in the session log for this transformation.
Sorted Input: Indicates input data is already sorted by groups. Select this option only if the input to the Aggregator transformation is sorted.
Aggregator Data Cache Size: Default cache size is 2,000,000 bytes. The data cache stores row data.
Aggregator Index Cache Size: Default cache size is 1,000,000 bytes. The index cache stores group by ports data.
Transformation Scope: Specifies how the Integration Service applies the transformation logic to incoming data.
Group By Ports:
The integration service performs aggregate calculations and produces one row for each group. If you do not specify
any group by ports, the integration service returns one row for all input rows. By default, the integration service
returns the last row received for each group along with the result of the aggregation. By using the FIRST function,
you can configure the integration service to return the first row of the group.
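This default behavior can be sketched in plain Python (an illustrative analogue, not Informatica code; the column names are made up):

```python
# Analogue of an Aggregator with a SUM expression and one group-by port:
# one output row per group, carrying the LAST input row received for that
# group along with the running aggregate result.

def aggregate(rows, group_key, agg_col):
    """Return {group: (last_row, sum_of_agg_col)}."""
    result = {}
    for row in rows:
        key = row[group_key]
        prev_sum = result[key][1] if key in result else 0
        # the last row received for the group replaces earlier ones
        result[key] = (row, prev_sum + row[agg_col])
    return result

rows = [
    {"dept": 10, "emp": "A", "salary": 1000},
    {"dept": 10, "emp": "B", "salary": 2000},
    {"dept": 20, "emp": "C", "salary": 1500},
]
out = aggregate(rows, "dept", "salary")
# dept 10: last row received is B, SUM(salary) = 3000
```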
Aggregate Expressions:
You can create the aggregate expressions only in the Aggregator transformation. An aggregate expression can
include conditional clauses and non-aggregate functions. You can use the following aggregate functions in the
Aggregator transformation,
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
You can nest one aggregate function within another aggregate function. You can either use single-level aggregate
functions or multiple nested functions in an aggregate transformation. You cannot use both single-level and nested
aggregate functions in an aggregator transformation. The Mapping designer marks the mapping as invalid if an
aggregator transformation contains both single-level and nested aggregate functions. If you want to create both
single-level and nested aggregate functions, create separate aggregate transformations.
Example: MAX(SUM(sales))
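An illustrative Python analogue of this nested example (the region/sales sample data is made up):

```python
# MAX(SUM(sales)): first SUM(sales) within each group, then MAX over the
# per-group sums -- the inner aggregate runs per group, the outer one runs
# over the inner results.
sales_by_region = {
    "east": [100, 200],
    "west": [50, 400],
}
per_group_sum = {g: sum(v) for g, v in sales_by_region.items()}  # SUM(sales)
max_of_sums = max(per_group_sum.values())                        # MAX(SUM(sales))
# east sums to 300, west to 450, so MAX(SUM(sales)) is 450
```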
Conditional clauses:
You can reduce the number of rows processed in the aggregation by specifying a conditional clause, for example
SUM(salary, salary > 1000). This includes only the salaries greater than 1000 in the SUM calculation.
Note: By default, the Integration Service treats null values as NULL in aggregate functions. You can change this by
configuring the integration service.
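A small Python sketch of how a conditional clause narrows the rows entering the calculation (here None stands in for NULL, which aggregate functions skip by default):

```python
# Analogue of SUM(salary, salary > 1000): only rows satisfying the condition
# enter the calculation, and NULL values (None) are ignored, mirroring the
# default treatment of nulls in aggregate functions.
salaries = [500, 1500, None, 3000, 800]
conditional_sum = sum(s for s in salaries if s is not None and s > 1000)
# only 1500 and 3000 qualify
```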
Incremental Aggregation:
After you create a session that includes an Aggregator transformation, you can enable the session option,
Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data
through the mapping and uses historical cache data to perform aggregation calculations incrementally.
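The idea can be sketched in Python (a simplified analogue; the real historical cache is the aggregator's index and data cache files):

```python
# Incremental aggregation: the cache holds per-group results from earlier
# session runs, and each new run folds only the NEW source rows into that
# cache instead of re-reading and re-aggregating all history.

def incremental_aggregate(cache, new_rows):
    """cache: {group: running_sum}; new_rows: (group, value) pairs."""
    for group, value in new_rows:
        cache[group] = cache.get(group, 0) + value
    return cache

cache = {"A": 100}                       # result of previous session runs
incremental_aggregate(cache, [("A", 50), ("B", 10)])
# cache now reflects history plus the new rows
```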
Sorted Input:
You can improve the performance of aggregator transformation by specifying the sorted input. The Integration
Service assumes all the data is sorted by group and it performs aggregate calculations as it reads rows for a group.
If you specify the sorted input option without actually sorting the data, then integration service fails the session.
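A Python sketch of why sortedness matters: like the sorted-input aggregator, `itertools.groupby` only merges adjacent rows with the same key, so each group can be emitted as soon as the key changes and only one group is held in memory at a time.

```python
from itertools import groupby

rows = [("A", 1), ("A", 2), ("B", 5)]    # already sorted by the group key
sums = [(key, sum(v for _, v in grp))
        for key, grp in groupby(rows, key=lambda r: r[0])]
# groupby merges only adjacent rows, so unsorted input would split groups --
# which is why the integration service fails the session on unsorted data
```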
4. You can add ports to the expression transformation either by selecting and dragging ports from other
transformations or by opening the expression transformation and creating ports manually.
Adding Expressions
Once you have created an expression transformation, you can add expressions in either a variable port or an output
port. Create a variable or output port in the expression transformation, open the Expression Editor in the
expression section of the port, enter an expression, and then click Validate to verify the
expression syntax. Now click OK.
Transformation: You can enter the name and description of the transformation. You can also make the
expression transformation reusable.
Ports: Create new ports and configure them.
Properties: Configure the tracing level to set the amount of transaction detail to be logged in session log
file.
Metadata Extensions: You can specify extension name, data type, precision, value and can also create
reusable metadata extensions.
Configuring Ports:
Precision and scale: set the precision and scale for each port.
Solution:
In the expression transformation, create a new output port (call it adj_sal) and enter the expression as
salary+salary*(10/100)
2. Create a mapping to concatenate the first and last names of the employee. Include a space between the names.
Solution:
Just create a new port in the expression transformation and enter the expression as CONCAT(CONCAT(first_name,'
'),last_name)
The above expression can be simplified as first_name||' '||last_name
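The two expression examples above translate to plain Python like this (an illustrative analogue; the Informatica expressions are the ones shown above):

```python
# Python analogues of the two expression-transformation examples:
# a 10% salary adjustment and name concatenation with a space.

def adj_sal(salary):
    return salary + salary * (10 / 100)   # salary + salary*(10/100)

def full_name(first_name, last_name):
    return first_name + " " + last_name   # first_name||' '||last_name
```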
4. You can add ports either by dragging from other transformations or manually creating the ports within
the transformation.
To configure the filter condition, go to the properties tab and in the filter condition section open the expression
editor. Enter the filter condition you want to apply. Click on validate button to verify the syntax and then click OK.
Transformation: You can enter the name and description of the transformation.
Properties: You can specify the filter condition to filter the rows. You can also configure the tracing levels.
Metadata Extensions: Specify the metadata details like name, datatype etc.
The following properties need to be configured on the ports tab of the filter transformation:
Datatype, precision, and scale: Configure the data type and set the precision and scale for each port.
Use the filter transformation as close as possible to the sources in the mapping. This will reduce the
number of rows to be processed in the downstream transformations.
In case of relational sources, if possible use the source qualifier transformation to filter the rows. This will
reduce the number of rows to be read from the source.
Note: The input ports to the filter transformation must come from a single transformation. You cannot connect
ports from more than one transformation to the filter.
1. Create a mapping to load the employees from department 50 into the target?
department_id=50
2. Create a mapping to load the employees whose salary is in the range of 10000 to 50000?
salary >= 10000 AND salary <= 50000
3. Create a mapping to load the employees who earn commission (commission should not be null)?
IIF(ISNULL(commission),FALSE,TRUE)
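The three filter conditions behave like these Python predicates (an illustrative analogue with made-up sample rows; a filter transformation passes a row only when its condition evaluates to TRUE):

```python
employees = [
    {"dept": 50, "salary": 20000, "commission": None},
    {"dept": 50, "salary": 5000,  "commission": 300},
    {"dept": 10, "salary": 30000, "commission": 500},
]
dept_50    = [e for e in employees if e["dept"] == 50]                # department_id=50
mid_salary = [e for e in employees if 10000 <= e["salary"] <= 50000]  # salary range
earns_comm = [e for e in employees if e["commission"] is not None]    # commission not null
```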
Recommended Reading:
The joiner transformation is an active and connected transformation used to join two heterogeneous sources. The
joiner transformation joins sources based on a condition that matches one or more pairs of columns between the
two sources. The two input pipelines include a master and a detail pipeline or branch. To join more than two
sources, you need to join the output of the joiner transformation with another source. To join n number of sources
in a mapping, you need n-1 joiner transformations.
Now drag the ports from the second source into the joiner transformation. By default the designer
configures the second source ports as master fields.
Edit the joiner transformation, go to the ports tab and check any box in the M column to switch the
master/detail relationship for the sources.
Go to the condition tab, click on the Add button to add a condition. You can add multiple conditions.
Go to the properties tab and configure the properties of the joiner transformation.
Case-Sensitive String Comparison: When performing joins on string columns, the integration service uses
this option. By default the case sensitive string comparison option is checked.
Cache Directory: Directory used to cache the master or detail rows. The default directory path is
$PMCacheDir. You can override this value.
Join Type: The type of join to be performed. Normal Join, Master Outer Join, Detail Outer Join or Full
Outer Join.
Joiner Data Cache Size: Size of the data cache. The default value is Auto.
Joiner Index Cache Size: Size of the index cache. The default value is Auto.
Sorted Input: If the input data is in sorted order, then check this option for better performance.
Master Sort Order: Sort order of the master source data. Choose Ascending if the master source data is
sorted in ascending order. You have to enable Sorted Input option if you choose Ascending. The default
value for this option is Auto.
Transformation Scope: You can choose the transformation scope as All Input or Row.
Join Condition
The integration service joins both the input sources based on the join condition. The join condition contains ports
from both the input sources that must match. You can specify only the equal (=) operator between the join
columns. Other operators are not allowed in the join condition. As an example, if you want to join the employees
and departments table then you have to specify the join condition as department_id1= department_id. Here
department_id1 is the port of departments source and department_id is the port of employees source.
Join Type
Normal Join
Assume that subjects source is the master and students source is the detail and we will join these sources on the
subject_id port.
Normal Join:
The joiner transformation outputs only the records that match the join condition and discards all the rows that do
not match the join condition. The output of the normal join is
In a detail outer join, the joiner transformation keeps all the records from the master source and only the matching
rows from the detail source. It discards the unmatched rows from the detail source. The output of detail outer join
is
The full outer join first brings the matching rows from both the sources and then it also keeps the non-matched
records from both the master and detail sources. The output of full outer join is
Master Ports             | Detail Ports
Subject_Id | Subject_Name | Student_Id | Subject_Id
-----------|--------------|------------|-----------
1          | Maths        | 10         | 1
2          | Chemistry    | 20         | 2
3          | Physics      | NULL       | NULL
NULL       | NULL         | 30         | NULL
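The normal and full outer joins that produce the output above can be sketched in Python (the subjects/students sample data mirrors the table; each result row is a (master subject_id, subject_name, student_id, detail subject_id) tuple):

```python
subjects = {1: "Maths", 2: "Chemistry", 3: "Physics"}   # master
students = [(10, 1), (20, 2), (30, None)]               # detail: (student_id, subject_id)

# Normal join: only rows that satisfy the join condition survive.
normal = [(sid, subjects[sid], st, sid) for st, sid in students if sid in subjects]

# Full outer join: matched rows plus the unmatched rows from BOTH sources,
# padded with NULL (None) on the missing side.
full_outer = list(normal)
full_outer += [(sid, name, None, None) for sid, name in subjects.items()
               if sid not in {s for _, s in students}]   # unmatched master rows
full_outer += [(None, None, st, sid) for st, sid in students
               if sid not in subjects]                   # unmatched detail rows
```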
Sorted Input
Use the sorted input option in the joiner properties tab when both the master and detail are sorted on the ports
specified in the join condition. You can improve the performance by using the sorted input option as the
integration service performs the join by minimizing the number of disk I/Os. You will see good performance when
working with large data sets.
Sort the master and detail source either by using the source qualifier transformation or sorter
transformation.
Sort both the source on the ports to be used in join condition either in ascending or descending order.
Specify the Sorted Input option in the joiner transformation properties tab.
The integration service blocks and unblocks the source data depending on whether the joiner transformation is
configured for sorted input or not.
Unsorted Joiner Transformation
In case of an unsorted joiner transformation, the integration service first reads all the master rows before it reads the
detail rows. The integration service blocks the detail source while it caches all the master rows. Once it reads
all the master rows, it unblocks the detail source and reads the detail rows.
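This behavior is essentially a hash join: build a cache from the master rows first (the detail source is blocked meanwhile), then stream the detail rows and probe the cache. A minimal Python sketch (not the actual integration service logic; the row layout is made up):

```python
def unsorted_join(master_rows, detail_rows, key):
    cache = {}                                   # master cache, built first
    for row in master_rows:
        cache.setdefault(row[key], []).append(row)
    joined = []                                  # detail unblocked: probe cache
    for d in detail_rows:
        for m in cache.get(d[key], []):
            joined.append({**m, **d})            # merge master and detail ports
    return joined

out = unsorted_join([{"id": 1, "name": "Maths"}],
                    [{"id": 1, "student": 10}], "id")
```

Choosing the source with fewer rows as the master keeps this cache small, which is exactly the performance tip given below.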
Blocking logic may or may not be possible in case of a sorted joiner transformation. The integration service uses
blocking logic if it can do so without blocking all sources in the target load order group. Otherwise, it does not use
blocking logic.
If possible, perform joins in a database. Performing joins in a database is faster than performing joins in a
session.
You can improve the session performance by configuring the Sorted Input option in the joiner
transformation properties tab.
Specify the source with fewer rows and with fewer duplicate keys as the master and the other source as
detail.
You cannot connect a sequence generator transformation directly to the joiner transformation.
Recommended Reading:
You can import the definition of lookup from any flat file or relational database or even from a source qualifier. The
integration service queries the lookup source based on the ports, lookup condition and returns the result to other
transformations or target in the mapping.
Get a Related Value: You can get a value from the lookup table based on the source value. As an example,
we can get the related value like city name for the zip code value.
Get Multiple Values: You can get multiple rows from a lookup table. As an example, get all the states in a
country.
Perform a Calculation: You can use the value from the lookup table in calculations.
Update Slowly Changing Dimension tables: Lookup transformation can be used to determine whether a
row exists in the target or not.
You can configure the lookup transformation in the following types of lookup:
Flat File or Relational lookup: You can perform the lookup on a flat file or a relational database. When
you create a lookup using a flat file as the lookup source, the designer invokes the flat file wizard. If you use a
relational table as the lookup source, you can connect to the lookup source using ODBC and import the
table definition.
Pipeline Lookup: You can perform lookup on application sources such as JMS, MSMQ or SAP. You have to
drag the source into the mapping and associate the lookup transformation with the source qualifier.
Improve the performance by configuring partitions to retrieve source data for the lookup cache.
Connected or Unconnected lookup: A connected lookup receives source data, performs a lookup and
returns data to the pipeline. An unconnected lookup is not connected to source or target or any other
transformation. A transformation in the pipeline calls the lookup transformation with the :LKP expression.
The unconnected lookup returns one column to the calling transformation.
Cached or Uncached Lookup: You can improve the performance of the lookup by caching the lookup
source. If you cache the lookup source, you can use a dynamic or static cache. By default, the lookup
cache is static and the cache does not change during the session. If you use a dynamic cache, the
integration service inserts or updates rows in the cache. You can look up values in the cache to determine
if the values exist in the target, and then mark the row for insert or update in the target.
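A minimal Python sketch of the dynamic-cache idea (the row layout and the route_row helper are hypothetical, not PowerCenter APIs): each source row is looked up in the cache; a miss inserts into the cache and flags the row for insert, a hit with changed data updates the cache and flags it for update.

```python
def route_row(cache, key, row):
    if key not in cache:
        cache[key] = row          # miss: insert into the cache
        return "insert"
    if cache[key] != row:
        cache[key] = row          # hit with changed data: update the cache
        return "update"
    return "no change"            # hit with identical data

cache = {}
flags = [route_row(cache, r["id"], r) for r in (
    {"id": 1, "city": "Pune"},
    {"id": 1, "city": "Delhi"},
    {"id": 1, "city": "Delhi"},
)]
# first row inserts, second updates, third is unchanged
```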
Recommended Reading:
This is a frequently asked question in informatica interview. Follow the below steps to tune a lookup
transformation:
Cache the lookup transformation: This queries the lookup source once and stores the data in the cache.
Whenever a row enters the lookup, the lookup retrieves the data from the cache rather than querying the
lookup source again. This improves the performance of the lookup a lot.
Restrict the ORDER BY columns: By default, the integration service generates an ORDER BY on all ports in the lookup
transformation. Override this default ORDER BY clause to include only a few ports in the lookup.
Persistent Cache: If your lookup source is not going to change at all (examples: countries, zip codes), use a
persistent cache.
Prefer Static Cache over Dynamic Cache: If you use dynamic cache, the lookup may update the cache. Updating
the lookup cache is overhead. Avoid dynamic cache.
Restrict the number of lookup ports: Make sure that you include only the required ports in the lookup transformation.
Unnecessary ports in the lookup make it take more time in querying the lookup source and building the lookup
cache.
Sort the flat file lookups: If the lookup source is a flat file, using the sorted input option improves the performance.
Indexing the columns: If you have used any columns in the where clause, creating any index (in case of relational
lookups) on these columns improves the performance of querying the lookup source.
Database level tuning: For relational lookups you can improve the performance by doing some tuning at database
level.
When creating the lookup transformation itself you have to specify whether the lookup transformation returns
multiple rows or not. Once you make the lookup transformation as active transformation, you cannot change it
back to passive transformation. The "Lookup Policy on Multiple Match" property value will become "Use All
Values". This property becomes read-only and you cannot change this property.
As an example, for each country you can configure the lookup transformation to return all the states in that
country. You can cache the lookup table to improve performance. If you configure the lookup transformation for
caching, the integration service caches all the rows from the lookup source. The integration service caches all rows
for a lookup key by the key index.
Follow the below guidelines when you configure the lookup transformation to return multiple rows:
You can cache all the rows from the lookup source for cached lookups.
You can customize the SQL override for both cached and uncached lookups that return multiple rows.
You cannot use dynamic cache for Lookup transformation that returns multiple rows.
You can configure multiple Lookup transformations to share a named cache if the Lookup transformations
have matching caching structures and lookup on multiple match policies.
Lookup transformation that returns multiple rows cannot share a cache with a Lookup transformation
that returns one matching row for each input row.
The following properties can be configured for the lookup transformation (the lookup types each property applies to are shown in parentheses):
Lookup SQL Override (Relational): Overrides the default SQL query generated by the lookup transformation. Use this option when lookup cache is enabled.
Lookup Table Name (Pipeline, Relational): You can choose a source, target or source qualifier as the lookup table name. This is the lookup source which will be used to query or cache the data. If you have overridden the SQL query, you can ignore this option.
Lookup Source Filter (Relational): Filters the rows looked up in the cache based on the value of data in the lookup ports. Works only when lookup cache is enabled.
Lookup Caching Enabled (Flat File, Pipeline, Relational): When lookup cache is enabled, the integration service queries the lookup source once and caches the entire data. Caching the lookup source improves the performance. If caching is disabled, the integration service queries the lookup source for each row. The integration service always caches flat file and pipeline lookups.
Lookup Policy on Multiple Match (Flat File, Pipeline, Relational): Which row to return when the lookup transformation finds multiple rows that match the lookup condition. Report Error: reports an error and does not return a row. Use Last Value: returns the last row that matches the lookup condition. Use All Values: returns all matched rows. Use Any Value: returns the first value that matches the lookup condition.
Lookup Condition (Flat File, Pipeline, Relational): The lookup condition you define in the condition tab is displayed here.
Connection Information (Relational): Specifies the database that contains the lookup table.
Source Type (Flat File, Pipeline, Relational): Indicates the lookup source type: flat file, relational table or source qualifier.
Tracing Level (Flat File, Pipeline, Relational): Sets the amount of detail to be included in the session log for the lookup.
Lookup Cache Directory Name (Flat File, Pipeline, Relational): Specifies the directory used to build the lookup cache files.
Lookup Cache Persistent (Flat File, Pipeline, Relational): Use when the lookup source data does not change at all. Examples: zip codes, countries, states. The lookup caches the data once and uses the cache even in multiple session runs.
Lookup Data Cache Size, Lookup Index Cache Size (Flat File, Pipeline, Relational): Cache sizes of the lookup data and lookup index.
Dynamic Lookup Cache (Flat File, Pipeline, Relational): Indicates to use a dynamic lookup cache. Inserts or updates rows in the lookup cache as it passes rows to the target table.
Output Old Value On Update (Flat File, Pipeline, Relational): Use with dynamic caching enabled. When you enable this property, the Integration Service outputs old values from the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a row in the cache, it outputs null values.
Update Dynamic Cache Condition (Flat File, Pipeline, Relational): An expression that indicates whether to update the dynamic cache. Create an expression using lookup ports or input ports. The expression can contain input values or values in the lookup cache. The Integration Service updates the cache when the condition is true and the data exists in the cache. Use with dynamic caching enabled. Default is true.
Cache File Name Prefix (Flat File, Pipeline, Relational): Use with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files.
Recache From Lookup Source (Flat File, Pipeline, Relational): The integration service rebuilds the lookup cache.
Insert Else Update (Flat File, Pipeline, Relational): Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of insert.
Update Else Insert (Flat File, Pipeline, Relational): Use with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of update.
Datetime Format (Flat File): Specifies the date format for the date fields in the file.
Thousand Separator (Flat File): Specifies the thousand separator for the port.
Decimal Separator (Flat File): Specifies the decimal separator for the port.
Case-Sensitive String Comparison (Flat File): The Integration Service uses case-sensitive string comparisons when performing lookups on string columns.
Null Ordering (Flat File, Pipeline): Specifies how to sort null data.
Sorted Input (Flat File, Pipeline): Indicates whether the lookup source data is in sorted order or not.
Lookup Source is Static (Flat File, Pipeline, Relational): The lookup source does not change in a session.
Pre-build Lookup Cache (Flat File, Pipeline, Relational): Allows the Integration Service to build the lookup cache before the Lookup transformation receives the data. The Integration Service can build multiple lookup cache files at the same time to improve performance.
Subsecond Precision (Relational): Specifies the subsecond precision for datetime ports.
1. Log in to the PowerCenter Designer. Open either the Transformation Developer or the Mapping Designer.
Click on the Transformation menu in the toolbar, and then click on Create.
2. Select the lookup transformation and enter a name for the transformation. Click Create.
3. Now you will get a "Select Lookup Table" dialog box for selecting the lookup source and choosing the active or passive option. This is shown in the below image:
4. You can choose one of the below options to import the lookup source definition:
Source qualifier in the mapping (applicable only for a non-reusable lookup transformation)
6. Click OK, or click Skip if you want to manually add ports to the lookup transformation.
8. For an unconnected lookup transformation, create a return port for the value you want to return from the lookup.
10. For a dynamic lookup transformation, you have to associate an input port, output port or sequence ID with each
lookup port.
Relational lookups:
When you want to use a relational table as a lookup source in the lookup transformation, you have to connect to
the lookup source using ODBC and import the table definition as the structure for the lookup transformation. You
can use the below options for relational lookups:
You can override the default SQL query and write your own customized SQL to add a WHERE clause or query
multiple tables.
When you want to use a flat file as a lookup source in the lookup transformation, select the flat file definition in
the repository or import the source when you create the transformation. When you want to import the flat file
lookup source, the designer invokes the flat file wizard. You can use the below options for flat file lookups:
You can use indirect files as lookup sources by configuring a file list as the lookup file name.
You can use case-sensitive string comparison with flat file lookups.
For a flat file lookup source, you can improve the performance by sorting the flat file on the columns which are
specified in the lookup condition. The condition columns in the lookup transformation must be treated as a group
when sorting the flat file. Sort the flat file on all the condition columns together for optimal performance.
By default, the rank transformation creates a RANKINDEX port. The RankIndex port is used to store the
ranking position of each row in the group.
You can add additional ports to the rank transformation either by selecting and dragging ports from other
transformations or by adding the ports manually in the ports tab.
In the ports tab, check the Rank (R) option for the port on which you want to do the ranking. You can check the
Rank (R) option for only one port. Optionally, you can create groups for the ranked rows by selecting the Group
By option for the ports that define the groups.
Cache Directory: Directory where the integration service creates the index and data cache files.
Top/Bottom: Specify whether you want to select the top or bottom rank of data.
Case-Sensitive String Comparison: Specifies whether string comparisons are case sensitive when ranking string data.
Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. You can set a numeric value,
or Auto for the data cache size. In case of Auto, the Integration Service determines the cache size at
runtime.
Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. You can set a numeric value,
or Auto for the index cache size. In case of Auto, the Integration Service determines the cache size at
runtime.
Solution:
Create a new mapping, Drag the source definition into the mapping.
Create a rank transformation and drag the ports of source qualifier transformation into the rank
transformation.
Now go to the ports tab of the rank transformation. Check the rank (R) option for the salary port and
Group By option for the Dept_Id port.
Go to the properties tab, select the Top/Bottom value as Top and the Number of Ranks property as 2.
Now connect the ports of rank transformation to the target definition.
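The configuration above (Rank on salary, Group By on Dept_Id, Top, Number of Ranks = 2) behaves like a top-N-per-group query. As a rough illustration, here is the equivalent logic in Python's sqlite3; the table layout and data are assumed for the sketch, and the RANK() window function stands in for the RANKINDEX port:

```python
import sqlite3

# In-memory table mirroring an assumed employees source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, dept_id INT, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 5000), (2, 10, 7000), (3, 10, 6000), (4, 20, 4000), (5, 20, 8000)],
)

# Equivalent of: Rank on salary, Group By dept_id, Top, Number of Ranks = 2.
# RANK() plays the role of the RANKINDEX port.
rows = conn.execute("""
    SELECT dept_id, salary, rnk AS rankindex
    FROM (SELECT dept_id, salary,
                 RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
          FROM employees)
    WHERE rnk <= 2
    ORDER BY dept_id, rnk
""").fetchall()
```

Each department keeps only its two highest salaries, with the rank index attached to each surviving row.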
4. Select the ports from the upstream transformation and drag them to the router transformation. You can
also create input ports manually on the ports tab.
The router transformation has input and output groups. You need to configure these groups.
Input group: The designer copies the input port properties to create a set of output ports for each
output group.
Output groups: Router transformation has two output groups. They are user-defined groups and default
group.
User-defined groups: Create a user-defined group to test a condition based on the incoming data. Each user-
defined group consists of output ports and a group filter condition. You can create or modify the user-defined
groups on the groups tab. Create one user-defined group for each condition you want to specify.
Default group: The designer creates only one default group when you create one new user-defined group. You
cannot edit or delete the default group. The default group does not have a group filter condition. If all the
conditions evaluate to FALSE, the integration service passes the row to the default group.
Specify the group filter condition on the groups tab using the expression editor. You can enter any expression that
returns a single value. The group filter condition returns TRUE or FALSE for each row that passes through the
transformation.
Use the router transformation to test multiple conditions on the same input data. If you use more than one filter
transformation, the integration service needs to process the input for each filter transformation. With a router
transformation, the integration service processes the input data only once, thereby improving the
performance.
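The flow described above can be sketched in plain Python. The group names and conditions below are hypothetical, but the semantics mirror the router: the input is read once, a row is copied to every user-defined group whose condition is true, and rows matching no condition fall through to the default group:

```python
# Hypothetical input rows and user-defined group conditions.
rows = [{"dept_id": 10}, {"dept_id": 20}, {"dept_id": 30}]

groups = {
    "DEPT_10": lambda r: r["dept_id"] == 10,
    "DEPT_20": lambda r: r["dept_id"] == 20,
}

output = {name: [] for name in groups}
output["DEFAULT"] = []

for row in rows:                 # the input is processed only once
    matched = False
    for name, condition in groups.items():
        if condition(row):       # a row can satisfy several group conditions
            output[name].append(row)
            matched = True
    if not matched:              # all conditions FALSE -> default group
        output["DEFAULT"].append(row)
```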
Router Transformation Examples
1. Load the employees data into two target tables. The first target table should contain employees with
department_id 10 and the second target table should contain employees with department_id 20.
What data will be loaded into the first and second target tables?
Solution: The first target table will have employees from department 30. The second table will have employees
whose department ids are less than or equal to 30.
Select the sequence generator transformation. Enter the name and then click on Create. Click Done.
Edit the sequence generator transformation, go to the properties tab and configure the options.
To generate sequence numbers, connect the NEXTVAL port to the transformations or target in the
mapping.
Increment By: Difference between two consecutive values from the NEXTVAL port. Default value is 1.
Maximum value you can specify is 2,147,483,647.
End Value: Maximum sequence value the integration service generates. If the integration service reaches
this value during the session and the sequence is not configured to cycle, the session fails. Maximum
value is 9,223,372,036,854,775,807.
Current Value: Current Value of the sequence. This value is used as the first value in the sequence. If cycle
option is configured, then this value must be greater than or equal to start value and less than end value.
Number of Cached Values: Number of sequential values the integration service caches at a time. Use this
option when multiple sessions use the same reusable generator. The default value for a non-reusable sequence
generator is 0 and for a reusable sequence generator it is 1000. Maximum value is 9,223,372,036,854,775,807.
Reset: The integration service generates values based on the original current value for each session.
Otherwise, the integration service updates the current value to reflect the last-generated value for the
session plus one.
Tracing level: The level of detail to be logged in the session log file.
Sequence Generator Transformation Ports:
The sequence generator transformation contains only two output ports. They are CURRVAL and NEXTVAL output
ports.
NEXTVAL Port:
You can connect the NEXTVAL port to multiple transformations to generate unique values for each row in the
transformation. The NEXTVAL port generates the sequence numbers based on the Current Value and Increment By
properties. If the sequence generator is not configured to Cycle, then the NEXTVAL port generates sequence
numbers up to the configured End Value.
The sequence generator transformation generates a block of numbers at a time. Once the block of numbers is used
up, it generates the next block of sequence numbers. As an example, let's say you connected the NEXTVAL port to
two targets in a mapping; the integration service generates a block of numbers (e.g. 1 to 10) for the first target and
then another block of numbers (e.g. 11 to 20) for the second target.
If you want the same sequence values to be generated for more than one target, then connect the sequence
generator to an expression transformation and connect the expression transformation ports to the targets. Another
option is to create a separate sequence generator transformation for each target.
CURRVAL Port:
The CURRVAL is the NEXTVAL plus the Increment By value. You rarely connect the CURRVAL port to other
transformations. When a row enters a transformation connected to the CURRVAL port, the integration service
passes the NEXTVAL value plus the Increment By value. For example, when you configure the Current Value=1 and
Increment By=1, then the integration service generates the following values for NEXTVAL and CURRVAL ports.
NEXTVAL  CURRVAL
-------  -------
   1        2
   2        3
   3        4
   4        5
   5        6
If you connect only the CURRVAL port without connecting the NEXTVAL port, then the integration service passes a
constant value for each row.
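The NEXTVAL/CURRVAL relationship above can be checked with a small simulation. This is only a sketch of the described semantics, not Informatica's implementation; with Current Value = 1 and Increment By = 1 it reproduces the table shown:

```python
# Simulate the sequence generator semantics described above:
# CURRVAL is always NEXTVAL plus the Increment By value.
current_value = 1
increment_by = 1

pairs = []
for _ in range(5):
    nextval = current_value
    currval = nextval + increment_by   # CURRVAL = NEXTVAL + Increment By
    pairs.append((nextval, currval))
    current_value += increment_by
```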
Start Value:
Specify the Start Value when you configure the sequence generator transformation for the Cycle option. If you
configure Cycle, the integration service cycles back to this value when it reaches the End Value. Use Cycle to
generate a repeating sequence of numbers, such as numbers 1 through 12 to correspond to the months in a year. To
cycle the integration service through a sequence:
Enter the lowest value in the sequence to use for the Start Value.
The Integration service generates sequence numbers based on the Current Value and the Increment By properties
in the sequence generator transformation. Increment By is the integer the integration service adds to the existing
value to create the new value in the sequence. The default value of Increment By is 1.
End Value:
End value is the maximum value that the integration service generates. If the integration service reaches the end
value and the sequence generator is not configured for the cycle option, then the session fails with an error
message.
If the sequence generator is configured for cycle option, then the integration service cycles back to the start value
and starts generating numbers from there.
Current Value:
The integration service uses the Current Value as the basis for generated values for each session. Specify the value
in "Current Value" you want the integration service as a starting value to generate sequence numbers. If you want
to cycle through a sequence of numbers, then the current value must be greater than or equal to the Start Value
and less than the End Value.
At the end of the session, the integration service updates the current value to the last generated sequence number
plus the Increment By value in the repository if the sequence generator Number of Cached Values is 0. When you
open the mapping after a session run, the current value displays the last sequence value generated plus the
Increment By value.
Reset:
The reset option is applicable only for a non-reusable sequence generator transformation and is disabled for a
reusable sequence generator. If you select the Reset option, the integration service generates values based on the
original current value each time it starts the session. Otherwise, the integration service updates the current value
in the repository with the last value generated plus the Increment By value.
The Number of Cached Values indicates the number of values that the integration service caches at one time.
When this value is configured greater than zero, then the integration service caches the specified number of values
and updates the current value in the repository.
The default value of Number of Cached Values is zero for non-reusable sequence generators. It means the
integration service does not cache the values. The integration service accesses the Current Value from the
repository at the start of the session, generates the sequence numbers, and then updates the current value at the
end of the session.
When you set the number of cached values greater than zero, the integration service caches the specified number
of cached values and updates the current value in the repository. Once the cached values are used, then the
integration service again accesses the current value from repository, caches the values and updates the repository.
At the end of the session, the integration service discards any unused cached values.
For a non-reusable sequence generator, setting the Number of Cached Values greater than zero increases the
number of times the Integration Service accesses the repository during the session, and unused cached values are
discarded at the end of the session.
As an example, say you set the Number of Cached Values to 100 and process only 70 records in a session. The
integration service first caches 100 values and updates the current value in the repository to 101. As there are only
70 rows to be processed, only the first 70 sequence numbers will be used and the remaining 30 sequence numbers
will be discarded. In the next run, the sequence numbers start from 101.
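The arithmetic of that example can be made concrete with a short sketch, using the values from the text above:

```python
# Number of Cached Values = 100, 70 input rows, starting current value = 1.
number_of_cached_values = 100
rows_to_process = 70
current_value = 1

# The Integration Service caches a whole block and immediately moves the
# repository current value past the block.
cached = list(range(current_value, current_value + number_of_cached_values))
repository_current_value = current_value + number_of_cached_values  # now 101

used = cached[:rows_to_process]        # sequence numbers actually assigned
discarded = cached[rows_to_process:]   # unused values thrown away at session end
```

The next session run therefore starts at 101, leaving a gap of 30 unused numbers.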
The disadvantages of setting Number of Cached Values greater than zero are: 1) the repository is accessed multiple
times during the session; 2) unused cached values are discarded, causing discontinuous sequence numbers.
Reusable Sequence Generators:
The default value of Number of Cached Values is 100 for reusable sequence generators. When you are using a
reusable sequence generator in multiple sessions which run in parallel, specify a Number of Cached Values
greater than zero. This will avoid generating the same sequence numbers in multiple sessions.
If you increase the Number of Cached Values for a reusable sequence generator transformation, the number of calls
to the repository decreases. However, more values may end up discarded. So, choose the Number of
Cached Values wisely.
Joins: You can join two or more tables from the same source database. By default the sources are joined
based on the primary key-foreign key relationships. This can be changed by explicitly specifying the join
condition in the "user-defined join" property.
Filter rows: You can filter the rows from the source database. The integration service adds a WHERE
clause to the default query.
Sorting input: You can sort the source data by specifying the Number of Sorted Ports. The Integration
Service adds an ORDER BY clause to the default SQL query.
Distinct rows: You can get distinct rows from the source by choosing the "Select Distinct" property. The
Integration Service adds a SELECT DISTINCT statement to the default SQL query.
Custom SQL Query: You can write your own SQL query to do calculations.
The easiest method to create a source qualifier transformation is to drag the source definition into a mapping. This
creates the source qualifier transformation automatically.
Follow the below steps to create the source qualifier transformation manually.
Click on create.
Now you can see in the below image how the source qualifier transformation is connected to the source definition.
Source Qualifier Transformation Properties:
We can configure the following source qualifier transformation properties on the properties tab. To go to the
properties tab, open the source qualifier transformation by double clicking on it and then click on the properties
tab.
SQL Query: Specifies a custom query which replaces the default query.
User-Defined Join: Condition used for joining multiple sources.
Source Filter: Specifies the filter condition the Integration Service applies when querying rows.
Number of Sorted Ports: Used for sorting the source data.
Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation.
Select Distinct: Selects only unique rows from the source.
Pre-SQL: Pre-session SQL commands to run against the source database before the Integration Service reads the source.
Post-SQL: Post-session SQL commands to run against the source database after the Integration Service writes to the target.
Output is Deterministic: Specify only when the source output does not change between session runs.
Output is Repeatable: Specify only when the order of the source output is the same between session runs.
Note: For flat file source definitions, all the properties except the Tracing level will be disabled.
To Understand the following, Please create the employees and departments tables in the source and emp_dept
table in the target database.
For relational sources, the Integration Service generates a query for each Source Qualifier transformation when it
runs a session. To view the default query generated, just follow the below steps:
Go to the Properties tab, select "SQL Query" property. Then open the SQL Editor, select the "ODBC data
source" and enter the username, password.
SELECT employees.employee_id,
employees.name,
employees.salary,
employees.manager_id,
employees.department_id
FROM employees
You can write your own SQL query rather than relying on the default query for performing calculations.
Note: You can generate the SQL query only if the output ports of the source qualifier transformation are connected
to another transformation in the mapping. The generated SQL query contains only the columns or ports which are
connected to downstream transformations.
Specifying the "Source Filter, Number Of Sorted Ports and Select Distinct" properties:
Follow the below steps for specifying the filter condition, sorting the source data and for selecting the distinct
rows.
Select "Source Filter" property, open the editor and enter the filter condition (Example:
employees.department_id=100) and click OK.
Go to the "Number Of Sorted Ports" property and enter a value (Example: 2). This value (2) means to sort
the data on the first two ports in the source qualifier transformation.
Observe the DISTINCT, WHERE and ORDER BY clauses in the generated SQL query. The ORDER BY clause contains the
first two ports of the source qualifier transformation. If you want to sort the data on the department_id and salary
ports, simply move these ports to the top in the source qualifier transformation and specify the "Number Of Sorted
Ports" property as 2.
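To make the property-to-SQL mapping concrete, here is a hypothetical sketch of how the generated query is assembled. `build_query` is not an Informatica API, just an illustration of the DISTINCT, WHERE, and ORDER BY rules described above:

```python
# Sketch: how Select Distinct, Source Filter, and Number Of Sorted Ports
# shape the default query generated by the source qualifier.
def build_query(table, ports, source_filter="", sorted_ports=0, distinct=False):
    select = "SELECT DISTINCT" if distinct else "SELECT"
    cols = ", ".join(f"{table}.{p}" for p in ports)
    query = f"{select} {cols} FROM {table}"
    if source_filter:                       # Source Filter -> WHERE clause
        query += f" WHERE {source_filter}"
    if sorted_ports > 0:                    # ORDER BY on the first N ports
        order_cols = ", ".join(f"{table}.{p}" for p in ports[:sorted_ports])
        query += f" ORDER BY {order_cols}"
    return query

q = build_query(
    "employees",
    ["department_id", "salary", "employee_id"],
    source_filter="employees.department_id = 100",
    sorted_ports=2,
)
```

With the ports ordered department_id, salary first, the ORDER BY picks up exactly those two columns.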
Joins:
The source qualifier transformation can be used to join sources from the same database. By default, it joins the
sources based on the primary-key, foreign-key relationships. To join heterogeneous sources, use the Joiner Transformation.
A foreign-key is created on the department_id column of the employees table, which references the primary-key
column, department_id, of the departments table.
Follow the below steps to see the default join
Create only one source qualifier transformation for both the employees and departments.
Go to the properties tab of the source qualifier transformation, select the "SQL QUERY" property and generate the
SQL query.
SELECT employees.employee_id,
employees.name,
employees.salary,
employees.manager_id,
employees.department_id,
departments.department_name
FROM employees,
departments
WHERE departments.department_id=employees.department_id
You can see the employees and departments tables are joined on the department_id column in the WHERE clause.
There might be cases where there is no relationship between the sources. In that case, we need to override
the default join by specifying the join condition in the "User Defined Join" property. Using this
property we can also specify outer joins. The join conditions entered here are database specific.
As an example, if we want to join the employees and departments table on the manager_id column, then in the
"User Defined Join" property specify the join condition as "departments.manager_id=employees.manager_id".
Now generate the SQL and observe the WHERE clause.
You can add the Pre-SQL and Post-SQL commands. The integration service runs the Pre-SQL and Post-SQL before
and after reading the source data respectively.
alter table EMPLOYEES add foreign key (DEPARTMENT_ID) references DEPARTMENTS (DEPARTMENT_ID);
1. Create a mapping to join the employees and departments tables on the "DEPARTMENT_ID" column using a source
qualifier transformation.
Solution:
1. Source qualifier transformation can be used to join sources only from the same database.
2. Connect the source definitions of departments and employees to the same qualifier transformation.
3. As there is a primary-key, foreign-key relationship between the source tables, the source qualifier
transformation by default joins the two sources on the DEPARTMENT_ID column.
2. Create a mapping to join employees and departments table on "MANAGER_ID" column using source qualifier
transformation?
Solution:
1. Connect the source definitions of departments and employees to the same qualifier transformation.
2. Go to the properties tab of source qualifier ->User Defined Join and then open the editor. Enter the join
condition as DEPARTMENTS.MANAGER_ID = EMPLOYEES.MANAGER_ID. Click Ok.
3. Now connect the required ports from the source qualifier transformation to the target.
Solution:
This is very simple. Go to the properties tab of source qualifier-> Source Filter. Open the editor and enter
EMPLOYEES.MANAGER_ID IS NOT NULL
4. Create a mapping to sort the data of employees table on DEPARTMENT_ID, SALARY?
Solution:
Make sure the ports in the source qualifier transformation are ordered as shown below:
DEPARTMENT_ID
SALARY
EMPLOYEE_ID
NAME
LAST_NAME
MANAGER_ID
The first two ports should be DEPARTMENT_ID, SALARY and the rest of the ports can be in any order.
Now go to the properties tab of source qualifier-> Number Of Sorted Ports. Make the Number Of Sorted Ports
value as 2.
Solution:
1. The source qualifier transformation should only contain the DEPARTMENT_ID port from EMPLOYEES
source definition.
2. Now go to the properties tab of source qualifier-> Select Distinct. Check the check box of Select Distinct
option.
If you are interested to solve complex problems on mappings, just go through Examples of Informatica Mappings.
Sorter Transformation in Informatica
The sorter transformation is used to sort the data from relational or flat file sources. The sorter transformation can
also be used for case-sensitive sorting and can be used to specify whether the output rows should be distinct or
not.
3. Select the Sorter Transformation, enter the name, click on create and then click on Done.
4. Select the ports from the upstream transformation and drag them to the sorter transformation. You can
also create input ports manually on the ports tab.
5. Now edit the transformation by double clicking on the title bar of the transformation.
6. Select the ports you want to use as the sort key. For each selected port, specify whether you want the
integration service to sort data in ascending or descending order.
Case Sensitive: The integration service considers the string case when sorting the data. The integration
service sorts the uppercase characters higher than the lowercase characters.
Work Directory: The integration service creates temporary files in the work directory when it is sorting
the data. After the integration service sorts the data, it deletes the temporary files.
Distinct Output Rows: The integration service produces distinct rows in the output when this option is
configured.
Tracing Level: Configure the amount of data needs to be logged in the session log file.
Null Treated Low: Enable this property to treat null values as lower than any other value when performing the
sort operation. When disabled, the integration service treats null values as higher than any other value.
Sorter Cache Size: The integration service uses the sorter cache size property to determine the amount of
memory it can allocate to perform the sort operation.
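The Null Treated Low and Distinct Output Rows options can be sketched as follows. This is a plain-Python illustration with assumed data, not Informatica internals:

```python
# Assumed sort key values, including a null.
rows = ["beta", None, "alpha", "beta"]

def sort_key(value, null_treated_low=True):
    # Null Treated Low enabled: nulls sort lower than any other value;
    # disabled: nulls sort higher than any other value.
    if value is None:
        return (0 if null_treated_low else 2, "")
    return (1, value)

ordered = sorted(rows, key=sort_key)     # null comes first when treated low
distinct = list(dict.fromkeys(ordered))  # Distinct Output Rows enabled
```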
Use the sorter transformation before the aggregator and joiner transformations to sort the data for better
performance.
SQL Transformation is a connected transformation used to process SQL queries in the midstream of a pipeline. We
can insert, update, delete and retrieve rows from the database at run time using the SQL transformation.
The SQL transformation processes external SQL scripts or SQL queries created in the SQL editor. You can also pass
the database connection information to the SQL transformation as an input data at run time.
Active/Passive: By default, the SQL transformation is an active transformation. You can configure it as a passive
transformation.
Database Type: The type of database that the SQL transformation connects to.
Connection type: You can pass database connection information or you can use a connection object.
We will see how to create an SQL transformation in script mode, query mode and passing the dynamic database
connection with examples.
Creating SQL Transformation in Query Mode
Query Mode: The SQL transformation executes a query that you define in the query editor. You can pass parameters
to the query to define dynamic queries. The SQL transformation can output multiple rows when the query has a
SELECT statement. In query mode, the SQL transformation acts as an active transformation.
Static SQL query: The SQL query statement does not change; however, you can pass parameters to the SQL query.
The integration service prepares the query once and runs the same query for all the input rows.
Dynamic SQL query: The SQL query statement and the data can change. The integration service prepares the query
for each input row and then runs the query.
Q1) Let’s say we have the products and Sales table with the below data.
Solution:
Just follow the below steps for creating the SQL transformation to solve the example
Create a new mapping, drag the products source definition to the mapping.
Go to the toolbar -> Transformation -> Create -> Select the SQL transformation. Enter a name and then
click create.
Select the execution mode as query mode, DB type as Oracle, connection type as static. This is shown in
the below image.Then click OK.
Edit the sql transformation, go to the "SQL Ports" tab and add the input and output ports as shown in the
below image. Here for all the ports, you have to define Data Type (informatica specific data types) and
Native Type (Database specific data types).
In the same "SQL Ports" Tab, go to the SQL query and enter the below sql in the SQL editor.
select product, quantity, price from sales where product = ?product?
Here ?product? is the parameter binding variable which takes its values from the input port. Now connect
the source qualifier transformation ports to the input ports of SQL transformation and target input ports
to the SQL transformation output ports. The complete mapping flow is shown below.
Create the workflow, session and enter the connections for source, target. For SQL transformation also
enter the source connection.
After you run the workflow, the integration service generates the following queries for sql transformation
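The ?product? binding works like a bound parameter in a prepared statement: the query text stays fixed and is re-run with each input row's value. A sqlite3 analogue of this behavior, with an assumed sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quantity INT, price INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("LG", 2, 100), ("Sony", 1, 300), ("LG", 5, 90)])

# One static query; the parameter is bound per input row, like ?product?.
query = "SELECT product, quantity, price FROM sales WHERE product = ?"
input_rows = ["LG", "Sony"]

results = {p: conn.execute(query, (p,)).fetchall() for p in input_rows}
```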
Dynamic SQL query: A dynamic SQL query can execute different query statements for each input row. You can pass
a full query or a partial query to the sql transformation input ports to execute the dynamic sql queries.
Q2) I have the below source table which contains the below data.
Solution:
Just follow the same steps for creating the sql transformation in the example 1.
Now go to the "SQL Ports" tab of SQL transformation and create the input port as "Query_Port". Connect
this input port to the Source Qualifier Transformation.
In the "SQL Ports" tab, enter the SQL query as ~Query_Port~. The tilde indicates a variable substitution in
the queries.
As we don’t need any output, just connect the SQLError port to the target.
Q3) In example 2, you can see the delete statements are similar except for the table name. Now we will pass only
the table name to the SQL transformation. The source table contains the below data.
Solution:
Create the input port in the sql transformation as Table_Name and enter the below query in the SQL Query
window.
DELETE FROM ~Table_Name~ WHERE Product = 'LG'
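Unlike ?...? parameter binding, the ~Table_Name~ form is a string substitution: the port value is spliced into the query text before execution. A minimal sketch of that substitution (the table name below is hypothetical):

```python
# Partial dynamic query: the ~Table_Name~ placeholder is replaced by the
# value of the input port before the statement is executed.
template = "DELETE FROM ~Table_Name~ WHERE Product = 'LG'"

def substitute(query, **ports):
    for name, value in ports.items():
        query = query.replace(f"~{name}~", str(value))
    return query

stmt = substitute(template, Table_Name="sales_2023")
```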
Script Mode
In a script mode, you have to create the sql scripts in a text file. The SQL transformation runs your sql scripts from
these text files. You have to pass each script file name from the source to the SQL transformation ScriptName port.
The script file name should contain a complete path to the script file. The SQL transformation acts as a passive
transformation in script mode and returns one row for each input row. The output row contains the results of the
query and any database error.
In script mode, by default, three ports are created in the SQL transformation. They are:
ScriptName (Input port) : Receives the name of the script to execute for the current row.
ScriptResult (output port) : Returns PASSED if the script execution succeeds for the row. Otherwise
FAILED.
ScriptError (Output port) : Returns errors that occur when a script fails for a row.
You can run only static SQL queries and cannot run dynamic SQL queries in script mode.
You can include multiple sql queries in a script. You need to separate each query with a semicolon.
The integration service ignores the output of select statements in the SQL scripts.
You cannot use procedural languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.
You cannot call a script from another script. Avoid using nested scripts.
You can use mapping variables or parameters in the script file name.
You can use static or dynamic database connection in the script mode.
Note: Use SQL transformation in script mode to run DDL (data definition language) statements like creating or
dropping the tables.
We will see how to create sql transformation in script mode with an example. We will create the following sales
table in oracle database and insert records into the table using the SQL transformation.
I created two script files in the $PMSourceFileDir directory. The sales_ddl.txt contains the sales table creation
statement and the sales_dml.txt contains the insert statements. These are the script files to be executed by SQL
transformation.
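The contents of the two script files are not shown in the text, so the following is only a plausible sketch with an assumed sales table layout. Python's sqlite3 `executescript`, like script mode, runs multiple semicolon-separated statements, which makes it convenient for sanity-checking the scripts:

```python
import sqlite3

# Assumed contents of sales_ddl.txt and sales_dml.txt; statements are
# separated by semicolons, as script mode requires.
sales_ddl = "CREATE TABLE sales (product TEXT, quantity INT, price INT);"
sales_dml = """
INSERT INTO sales VALUES ('LG', 2, 100);
INSERT INTO sales VALUES ('Sony', 1, 300);
"""

conn = sqlite3.connect(":memory:")
for script in (sales_ddl, sales_dml):   # one script per input row
    conn.executescript(script)

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```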
We need a source which contains the above script file names. So, I created another file in the $PMSourceFileDir
directory to store these script file names.
Now we will create a mapping to execute the script files using the SQL transformation. Follow the below steps to
create the mapping.
Go to the mapping designer tool, source analyzer and create the source file definition with the structure
as the $PMSourceFileDir/Script_names.txt file. The flat file structure is shown in the below image.
Go to the warehouse designer or target designer and create a target flat file with result and error ports.
This is shown in the below image.
Go to the mapping designer and create a new mapping.
Go to the Transformation in the toolbar, Create, select the SQL transformation, enter a name and click on
create.
Now select the SQL transformation options as script mode and DB type as Oracle and click ok.
Now connect the source qualifier transformation ports to the SQL transformation input port.
Drag the target flat file into the mapping and connect the SQL transformation output ports to the target.
Save the mapping. The mapping flow is shown in the below picture.
This will create the sales table in the Oracle database and insert the records.
In the informatica power center, you can define the transaction at the following levels:
Mapping level: Use the transaction control transformation to define the transactions.
Session level: You can specify the "Commit Type" option in the session properties tab. The different
options of "Commit Type" are Target, Source and User Defined. If you have used the transaction control
transformation in the mapping, then the "Commit Type" will always be "User Defined".
When you run a session, the integration service evaluates the expression for each row in the transaction control
transformation. When it evaluates the expression as commit, it commits all the rows in the transaction to the
target(s). When the integration service evaluates the expression as rollback, it rolls back all the rows in the
transaction from the target(s).
When you have a flat file as the target, the integration service creates an output file each time it commits a
transaction. You can dynamically name the target flat files. Look at the example for creating flat files
dynamically - Dynamic flat file creation.
Select the transaction control transformation, enter the name and click on Create and then Done.
You can drag the ports in to the transaction control transformation or you can create the ports manually
in the ports tab.
Go to the properties tab. Enter the transaction control expression in the Transaction Control Condition.
Configuring Transaction Control Transformation
You can configure the following components in the transaction control transformation:
Transformation Tab: You can rename the transformation and add a description.
Properties Tab: You can define the transaction control expression and tracing level.
You can enter the transaction control expression in the Transaction Control Condition option in the properties tab.
The transaction control expression uses the IIF function to test each row against the condition. Use the following
syntax for the expression
Syntax:
IIF (condition, value1, value2)
Example:
IIF(dept_id=10, TC_COMMIT_BEFORE,TC_ROLLBACK_BEFORE)
Use the following built-in variables in the expression editor of the transaction control transformation:
TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this
row. This is the default value of the expression.
TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and
writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the
transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new
transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the
transaction, and begins a new transaction. The current row is in the rolled back transaction.
If the transaction control expression evaluates to a value other than commit, rollback or continue, the
integration service fails the session.
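A common use of these variables is to commit after every fixed number of rows. This is a sketch of that pattern: it assumes an upstream expression transformation maintains a running row counter (the port names v_count and o_row_count are hypothetical).

```
-- in an upstream expression transformation (variable port + output port):
v_count     = v_count + 1
o_row_count = v_count

-- transaction control condition: commit after every 10000 rows,
-- keeping the current row in the committed transaction
IIF(MOD(o_row_count, 10000) = 0, TC_COMMIT_AFTER, TC_CONTINUE_TRANSACTION)
```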
Transaction control transformation defines or redefines the transaction boundaries in a mapping. It creates a new
transaction boundary or drops any incoming transaction boundary coming from upstream active source or
transaction control transformation.
A transaction control transformation can be effective or ineffective for the downstream transformations and targets
in the mapping. It becomes ineffective for downstream transformations or targets if you use a transformation
that drops the incoming transaction boundaries after it. The following transformations drop
transaction boundaries:
A multiple input group transformation, such as a Custom transformation, connected to multiple upstream
transaction control points.
Use the following rules and guidelines when you create a mapping with a Transaction Control transformation:
If the mapping includes an XML target, and you choose to append or create a new document on commit,
the input groups must receive data from the same transaction control point.
Transaction Control transformations connected to any target other than relational, XML, or dynamic
MQSeries targets are ineffective for those targets.
You can connect only one effective Transaction Control transformation to a target.
You cannot place a Transaction Control transformation in a pipeline branch that starts with a Sequence
Generator transformation.
If you use a dynamic Lookup transformation and a Transaction Control transformation in the same
mapping, a rolled-back transaction might result in unsynchronized target data.
A Transaction Control transformation may be effective for one target and ineffective for another target. If
each target is connected to an effective Transaction Control transformation, the mapping is valid.
Either all targets or none of the targets in the mapping should be connected to an effective Transaction
Control transformation.
When you want to maintain a history of the source in the target table, then for every change in a source record you
insert a new record in the target table.
When you want an exact copy of the source data to be maintained in the target table, then whenever the source data
changes you have to update the corresponding records in the target.
The design of the target table decides how to handle changes to existing rows. In Informatica, you can set
the update strategy at two different levels:
Session Level: Configuring at session level instructs the integration service to either treat all rows in the
same way (Insert or update or delete) or use instructions coded in the session mapping to flag for
different database operations.
Mapping Level: Use the update strategy transformation to flag rows for insert, update, delete or reject.
You have to flag each row for inserting, updating, deleting or rejecting. The constants and their numeric
equivalents for each database operation are listed below.
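The update strategy constants and their numeric equivalents are:

```
DD_INSERT -- 0
DD_UPDATE -- 1
DD_DELETE -- 2
DD_REJECT -- 3
```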
You have to flag rows by assigning the constant numeric values using the update strategy expression. The update
strategy expression property is available in the properties tab of the update strategy transformation.
Each row is tested against the condition specified in the update strategy expression and a constant value is
assigned to it. A sample expression is shown below:
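A typical expression flags a row for insert when the lookup finds no match and for update otherwise. The port name lkp_cust_key is hypothetical; it stands for the key returned by a lookup transformation.

```
IIF(ISNULL(lkp_cust_key), DD_INSERT, DD_UPDATE)
```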
The IIF and DECODE functions are most commonly used to test for a condition in the update strategy transformation.
Update Strategy and Lookup Transformations:
The update strategy transformation is mostly used with a lookup transformation. The row from the source qualifier is
compared with the row returned by the lookup transformation to determine whether it already exists or is a new record.
Based on this comparison, the row is flagged for insert or update using the update strategy transformation.
If you place an update strategy before an aggregator transformation, the way the aggregator transformation
performs aggregate calculations depends on the flagging of the row. For example, if you flag a row for delete and
then later use the row to calculate the sum, then the integration service subtracts the value appearing in this row.
If it’s flagged for insert, then the aggregator adds its value to the sum.
Important Note:
Update strategy works only when we have a primary key on the target table. If there is no primary key available on
the target table, then you have to specify a primary key in the target definition in the mapping for update strategy
transformation to work.
Union transformation contains only one output group and can have multiple input groups.
The input groups and output groups should have matching ports. The datatype, precision and scale must
be same.
Union transformation does not remove duplicates. To remove the duplicate rows use sorter
transformation with "select distinct" option after the union transformation.
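The behaviour of the union transformation is analogous to a SQL UNION ALL, which merges rows from both inputs without removing duplicates. The table and column names below are illustrative only.

```sql
SELECT employee_id, name FROM employees_us
UNION ALL
SELECT employee_id, name FROM employees_uk;
```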
3. Select the union transformation and enter the name. Now click on Done and then click on OK.
4. Go to the Groups Tab and then add a group for each source you want to merge.
Properties: Specify the tracing level to be recorded in the session log.
Groups Tab: You can create new input groups or delete existing input groups.
Group Ports Tab: You can create and delete ports for the input groups.
Note: The ports tab displays the groups and ports you create. You cannot edit the port or group information in the
ports tab. To make changes, use the groups tab and group ports tab.
Why the union transformation is active
Union is an active transformation because it combines two or more data streams into one. Though the total
number of rows passing into the union is the same as the total number of rows passing out of it, and the sequence
of rows from any given input stream is preserved in the output, the positions of the rows are not preserved; that is,
row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even
guarantee that the output is repeatable.
1. There are two tables in the source. The table names are employees_US and employees_UK and they have the
same structure. Create a mapping to load the data of these two tables into a single target table, employees.
The predefined case conversion types are uppercase, lowercase, toggle case, title case and sentence case.
Reference tables can also be used to control the case conversion. Use the "Valid" column in the reference table to
change the case of input strings. Use reference tables only when the case conversion type is title case or sentence
case.
You can create multiple case conversion strategies. Each strategy uses a single conversion type. Configure the
following properties on the strategies view in the case converter transformation:
Reference Tables: used to apply the capitalization format specified by a reference table. Reference tables work
only if the case conversion option is title case or sentence case. If a reference table match occurs at the start of a
string, the next character in that string changes to uppercase. For example, if the input string is vieditor and the
reference table has an entry for Vi, the output string is ViEditor.
Conversion Types: The conversion types are uppercase, lowercase, toggle case, title case and sentence case. The
default conversion type is uppercase.
Leave uppercase words unchanged: Overrides the chosen capitalization for uppercase strings.
Delimiters: Specifies how capitalization functions work for title case conversion. For example, choose a colon as a
delimiter to transform "james:bond" to "James:Bond". The default delimiter is the space character.
The mapping wizards in Informatica provide an easy way to create the different types of SCDs. We will see how to
create the SCDs using the mapping wizards step by step.
The below steps are common for creating the SCD type 1, type 2 and type 3
Open the mapping designer tool, go to the source analyzer tab and either create or import the source definition.
As an example, I am using the customer table as the source. The fields in the customer table are listed below.
Customers (Customer_Id, Customer_Name, Location)
Go to the mapping designer tab, in the tool bar click on Mappings, select Wizards and then click on Slowly
Changing Dimensions.
Now enter the mapping name and select the SCD mapping type you want to create. This is shown in the below
image. Then click on Next.
Select the source table name (Customers in this example) and enter the name for the target table to be created.
Then click on next.
Now you have to select the logical key fields and the fields to compare for changes. Logical key fields are the fields
on which the source qualifier and the lookup will be joined. Fields to compare for changes are the fields
used to determine whether the values have changed or not. Here I am using customer_id as the logical key field and
location as the field to compare.
So far we have seen the common steps for creating the SCDs. Now we will see the specific steps for creating
each SCD.
SCD Type 1 Mapping:
Once you have selected the logical key fields and the fields to compare for changes, simply click the
Finish button to create the SCD Type 1 mapping.
SCD Type 2 Mapping:
After selecting the logical key fields, click on the Next button. You will get a window where you can select the type of
SCD 2 you want to create. For
Effective Date, select "Mark the dimension records with their effective date."
Once you have selected the required type, then click on the finish button to create the SCD type 2 mapping.
SCD Type 3 Mapping:
Click on the Next button after selecting the logical key fields. You will get a window for selecting the optional
Effective Date. If you want the effective date to be created in the dimension table, check this box; otherwise
ignore it. Now click on the Finish button to create the SCD type 3 mapping.
The SCD Type 1 method is used when there is no need to store historical data in the Dimension table. The SCD type
1 method overwrites the old data with the new data in the dimension table.
We will see the implementation of SCD type 1 by using the customer dimension table as an example. The source table
looks as follows:
Now I have to load the data of the source into the customer dimension table using SCD Type 1. The Dimension
table structure is shown below.
Open the mapping designer tool, source analyzer and either create or import the source definition.
Go to the Warehouse designer or Target designer and import the target definition.
Select the lookup Transformation, enter a name and click on create. You will get a window as shown in
the below image.
Select the customer dimension table and click on OK.
Edit the LKP transformation, go to the ports tab, and add a new port IN_Customer_Id. This new port
needs to be connected to the Customer_Id port of the source qualifier transformation.
Go to the condition tab of lkp transformation and enter the lookup condition as Customer_Id =
IN_Customer_Id. Then click on OK.
Connect the customer_id port of source qualifier transformation to the IN_Customer_Id port of lkp
transformation.
Create the expression transformation with input ports as Cust_Key, Name, Location, Src_Name,
Src_Location and output ports as New_Flag, Changed_Flag
For the output ports of expression transformation enter the below expressions and click on ok
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (Name != Src_Name
OR Location != Src_Location),
1, 0 )
Now connect the ports of the LKP transformation (Cust_Key, Name, Location) to the expression
transformation ports (Cust_Key, Name, Location) and the ports of the source qualifier transformation (Name,
Location) to the expression transformation ports (Src_Name, Src_Location) respectively.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as New_Flag=1. Then
click on ok.
Now create an update strategy transformation and connect all the ports of the filter transformation
(except the New_Flag port) to the update strategy. Go to the properties tab of update strategy and enter
the update strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the appropriate ports from update strategy
to the target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target surrogate key
(cust_key) port.
The part of the mapping diagram for inserting a new row is shown below:
Now create another filter transformation and drag the ports from lkp transformation (Cust_Key), source
qualifier transformation (Name, Location), expression transformation (changed_flag) ports into the filter
transformation.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as Changed_Flag=1.
Then click on ok.
Now create an update strategy transformation and connect the ports of the filter transformation
(Cust_Key, Name, and Location) to the update strategy. Go to the properties tab of update strategy and
enter the update strategy expression as DD_Update
Now drag the target definition into the mapping and connect the appropriate ports from update strategy
to the target definition.
Recommended Reading
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
We will see how to implement the SCD Type 2 version in informatica. As an example consider the customer
dimension. The source and target table structures are shown below:
--Source Table
The basic steps involved in creating a SCD Type 2 version mapping are
Identifying the new records and inserting into the dimension table with version number as one.
Identifying the changed record and inserting into the dimension table by incrementing the version
number.
Let's divide the steps to implement the SCD type 2 version mapping into three parts.
Open the mapping designer tool, source analyzer and either create or import the source definition.
Go to the Warehouse designer or Target designer and import the target definition.
Select the lookup Transformation, enter a name and click on create. You will get a window as shown in
the below image.
Edit the lookup transformation, go to the ports tab and remove the unnecessary ports. Keep only the
Cust_Key, Customer_Id, Location and Version ports in the lookup transformation. Create a new port
(IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the customer_id
port of the source qualifier transformation.
Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
IN_Customer_Id
Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override.
Alternatively you can generate the SQL query by connecting the database in the Lookup SQL Override
expression editor and then add the order by clause.
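The override query itself is not reproduced in this text. A sketch consistent with the ports kept above would look like the following; the trailing two dashes are the usual way to suppress the ORDER BY clause that the integration service appends to a lookup query by default.

```sql
SELECT Customers_Dim.Cust_Key    AS Cust_Key,
       Customers_Dim.Customer_Id AS Customer_Id,
       Customers_Dim.Location    AS Location,
       Customers_Dim.Version     AS Version
FROM Customers_Dim
ORDER BY Customers_Dim.Version --
```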
You have to use an ORDER BY clause in the above query. If you sort the version column in ascending order,
then specify "Use Last Value" in the "Lookup policy on multiple match" property. If you sort the version
column in descending order, then set the "Lookup policy on multiple
match" option to "Use First Value".
Click on Ok in the lookup transformation. Connect the customer_id port of source qualifier transformation
to the In_Customer_Id port of the LKP transformation.
Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and
output ports as New_Flag, Changed_Flag. Enter the below expressions for output ports.
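The expressions for the output ports are not reproduced in this text. Following the same pattern as the SCD Type 1 example earlier, they would be along these lines:

```
New_Flag     = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != Src_Location, 1, 0)
```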
In this part, we will identify the new records and insert them into the target with version value as 1. The steps
involved are:
Now create a filter transformation to identify and insert new records into the dimension table. Drag the
ports of the expression transformation (New_Flag) and source qualifier transformation (Customer_Id,
Location) into the filter transformation.
Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id,
Location). Go to the properties tab and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update strategy
transformation to the target definition.
Create a sequence generator and an expression transformation. Call this expression transformation as
"Expr_Ver".
Drag and connect the NextVal port of sequence generator to the Expression transformation. In the
expression transformation create a new output port (Version) and assign value 1 to it.
Now connect the ports of expression transformation (Nextval, Version) to the Target definition ports
(Cust_Key, Version). The part of the mapping flow is shown in the below image.
In this part, we will identify the changed records and insert them into the target by incrementing the version
number. The steps involved are:
Create a filter transformation. This is used to find the changed records. Now drag the ports from the
expression transformation (Changed_Flag), source qualifier transformation (customer_id, location) and
LKP transformation (version) into the filter transformation.
Go to the filter transformation properties and enter the filter condition as changed_flag =1.
Create an expression transformation and drag the ports of filter transformation except the changed_flag
port into the expression transformation.
Go to the ports tab of expression transformation and create a new output port (O_Version) and assign the
expression as (version+1).
Now create an update strategy transformation and drag the ports of expression transformation
(customer_id, location,o_version) into the update strategy transformation. Go to the properties tab and
enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update strategy
transformation to the target definition.
Now connect the Next_Val port of expression transformation (Expr_Ver created in part 2) to the cust_key
port of the target definition. The complete mapping diagram is shown in the below image:
You can implement the SCD type 2 version mapping in your own way. Remember that the SCD type 2 version mapping
is rarely used in real-world projects.
Recommended Reading
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
SCD type 2 will store the entire history in the dimension table. Know more about SCDs at Slowly Changing
Dimensions Concepts.
We will see how to implement the SCD Type 2 Flag in informatica. As an example consider the customer
dimension. The source and target table structures are shown below:
--Source Table
The basic steps involved in creating a SCD Type 2 Flagging mapping are
Identifying the new records and inserting into the dimension table with flag column value as one.
Identifying the changed record and inserting into the dimension table with flag value as one.
Identify the changed record and update the existing record in dimension table with flag value as zero.
We will divide the steps to implement the SCD type 2 flagging mapping into four parts.
Here we will see the basic setup and mapping flow required for SCD type 2 Flagging. The steps involved are:
Go to the Warehouse designer or Target designer and import the target definition.
Select the lookup Transformation, enter a name and click on create. You will get a window as shown in
the below image.
Edit the lookup transformation, go to the ports tab and remove the unnecessary ports. Keep only the
Cust_Key, Customer_Id and Location ports in the lookup transformation. Create a new port
(IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the customer_id
port of the source qualifier transformation.
Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
IN_Customer_Id
Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override.
Alternatively you can generate the SQL query by connecting the database in the Lookup SQL Override
expression editor and then add the WHERE clause.
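The query itself is not reproduced in this text. Since the lookup should return only the current row for each customer, a sketch would restrict on the flag column (names follow this example):

```sql
SELECT Customers_Dim.Cust_Key    AS Cust_Key,
       Customers_Dim.Customer_Id AS Customer_Id,
       Customers_Dim.Location    AS Location
FROM Customers_Dim
WHERE Customers_Dim.Flag = 1
```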
Click on Ok in the lookup transformation. Connect the customer_id port of source qualifier transformation
to the In_Customer_Id port of the LKP transformation.
Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and
output ports as New_Flag, Changed_Flag. Enter the below expressions for output ports.
In this part, we will identify the new records and insert them into the target with flag value as 1. The steps involved
are:
Now create a filter transformation to identify and insert new records into the dimension table. Drag the
ports of the expression transformation (New_Flag) and source qualifier transformation (Customer_Id,
Location) into the filter transformation.
Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id,
Location). Go to the properties tab and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update strategy
transformation to the target definition.
Create a sequence generator and an expression transformation. Call this expression transformation as
"Expr_Flag".
Drag and connect the NextVal port of sequence generator to the Expression transformation. In the
expression transformation create a new output port (Flag) and assign value 1 to it.
Now connect the ports of expression transformation (Nextval, Flag) to the Target definition ports
(Cust_Key, Flag). The part of the mapping flow is shown in the below image.
SCD Type 2 Flag implementation - Part 3
In this part, we will identify the changed records and insert them into the target with flag value as 1. The steps
involved are:
Create a filter transformation. Call this filter transformation as FIL_Changed. This is used to find the
changed records. Now drag the ports from expression transformation (changed_flag), source qualifier
transformation (customer_id, location), LKP transformation (Cust_Key) into the filter transformation.
Go to the filter transformation properties and enter the filter condition as changed_flag =1.
Now create an update strategy transformation and drag the ports of Filter transformation (customer_id,
location) into the update strategy transformation. Go to the properties tab and enter the update strategy
expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update strategy
transformation to the target definition.
Now connect the Next_Val, Flag ports of expression transformation (Expr_Flag created in part 2) to the
cust_key, Flag ports of the target definition respectively. The part of the mapping diagram is shown
below.
Create an expression transformation and drag the Cust_Key port of filter transformation (FIL_Changed
created in part 3) into the expression transformation.
Go to the ports tab of expression transformation and create a new output port (Flag). Assign a value "0"
to this Flag port.
Now create an update strategy transformation and drag the ports of the expression transformation into it.
Go to the properties tab and enter the update strategy expression as DD_UPDATE.
Drag the target definition into the mapping and connect the appropriate ports of update strategy to it.
The complete mapping image is shown below.
Recommended Reading
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
SCD type 2 will store the entire history in the dimension table. In SCD type 2 effective date, the dimension table
will have Start_Date (Begin_Date) and End_Date as the fields. If the End_Date is Null, then it indicates the current
row. Know more about SCDs at Slowly Changing Dimensions Concepts.
We will see how to implement the SCD Type 2 Effective Date in informatica. As an example consider the customer
dimension. The source and target table structures are shown below:
--Source Table
The basic steps involved in creating a SCD Type 2 Effective Date mapping are
Identifying the new records and inserting into the dimension table with Begin_Date as the Current date
(SYSDATE) and End_Date as NULL.
Identifying the changed record and inserting into the dimension table with Begin_Date as the Current
date (SYSDATE) and End_Date as NULL.
Identify the changed record and update the existing record in the dimension table with End_Date as the current
date.
We will divide the steps to implement the SCD type 2 Effective Date mapping into four parts.
Here we will see the basic setup and mapping flow required for SCD type 2 Effective Date. The steps involved are:
Open the mapping designer tool, source analyzer and either create or import the source definition.
Go to the Warehouse designer or Target designer and import the target definition.
Select the lookup Transformation, enter a name and click on create. You will get a window as shown in
the below image.
Select the customer dimension table and click on OK.
Edit the lookup transformation, go to the ports tab and remove the unnecessary ports. Keep only the
Cust_Key, Customer_Id and Location ports in the lookup transformation. Create a new port
(IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the customer_id
port of the source qualifier transformation.
Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
IN_Customer_Id
Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override.
Alternatively you can generate the SQL query by connecting the database in the Lookup SQL Override
expression editor and then add the WHERE clause.
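The query itself is not reproduced in this text. Since a null End_Date marks the current row in this design, a sketch would restrict on that column (names follow this example):

```sql
SELECT Customers_Dim.Cust_Key    AS Cust_Key,
       Customers_Dim.Customer_Id AS Customer_Id,
       Customers_Dim.Location    AS Location
FROM Customers_Dim
WHERE Customers_Dim.End_Date IS NULL
```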
Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and
output ports as New_Flag, Changed_Flag. Enter the below expressions for output ports.
In this part, we will identify the new records and insert them into the target with Begin Date as the current date.
The steps involved are:
Now create a filter transformation to identify and insert new records into the dimension table. Drag the
ports of the expression transformation (New_Flag) and source qualifier transformation (Customer_Id,
Location) into the filter transformation.
Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id,
Location). Go to the properties tab and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update strategy
transformation to the target definition.
Create a sequence generator and an expression transformation. Call this expression transformation as
"Expr_Date".
Drag and connect the NextVal port of sequence generator to the Expression transformation. In the
expression transformation create a new output port (Begin_Date with date/time data type) and assign
value SYSDATE to it.
Now connect the ports of expression transformation (Nextval, Begin_Date) to the Target definition ports
(Cust_Key, Begin_Date). The part of the mapping flow is shown in the below image.
In this part, we will identify the changed records and insert them into the target with Begin Date as the current
date. The steps involved are:
Create a filter transformation. Call this filter transformation as FIL_Changed. This is used to find the
changed records. Now drag the ports from expression transformation (changed_flag), source qualifier
transformation (customer_id, location), LKP transformation (Cust_Key) into the filter transformation.
Go to the filter transformation properties and enter the filter condition as changed_flag =1.
Now create an update strategy transformation and drag the ports of Filter transformation (customer_id,
location) into the update strategy transformation. Go to the properties tab and enter the update strategy
expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update strategy
transformation to the target definition.
Now connect the Next_Val, Begin_Date ports of expression transformation (Expr_Date created in part 2)
to the cust_key, Begin_Date ports of the target definition respectively. The part of the mapping diagram is
shown below.
SCD Type 2 Effective Date implementation - Part 4
In this part, we will update the changed records in the dimension table with End Date as current date.
Create an expression transformation and drag the Cust_Key port of filter transformation (FIL_Changed
created in part 3) into the expression transformation.
Go to the ports tab of expression transformation and create a new output port (End_Date with date/time
data type). Assign a value SYSDATE to this port.
Now create an update strategy transformation and drag the ports of the expression transformation into it.
Go to the properties tab and enter the update strategy expression as DD_UPDATE.
Drag the target definition into the mapping and connect the appropriate ports of update strategy to it.
The complete mapping image is shown below.
Recommended Reading
SCD Type 1
SCD Type 3
SCD Type 2 version
SCD Type 2 Flag
SCD Type 2 Effective Date
The SCD Type 3 method is used to store partial historical data in the Dimension table. The dimension table contains
the current and previous data.
Identify the changed records and update the existing records in the dimension table.
We will see the implementation of SCD type 3 by using the customer dimension table as an example. The source
table looks as
Now I have to load the data of the source into the customer dimension table using SCD Type 3. The Dimension
table structure is shown below.
CREATE TABLE Customers_Dim (
Cust_Key Number,
Customer_Id Number,
Current_Location Varchar2(30),
Previous_Location Varchar2(30)
)
Open the mapping designer tool, source analyzer and either create or import the source definition.
Go to the Warehouse designer or Target designer and import the target definition.
Select the lookup Transformation, enter a name and click on create. You will get a window as shown in
the below image.
Go to the condition tab of LKP transformation and enter the lookup condition as Customer_Id =
IN_Customer_Id. Then click on OK.
Connect the customer_id port of source qualifier transformation to the IN_Customer_Id port of LKP
transformation.
Create the expression transformation with input ports Cust_Key, Prev_Location, Curr_Location and
output ports New_Flag, Changed_Flag.
For the output ports of the expression transformation, enter the below expressions and click on OK:
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND Prev_Location != Curr_Location,
1, 0 )
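The flag logic can also be sketched outside Informatica; a minimal Python rendering of the two expressions (an illustration only, with NULL modelled as None):

```python
def scd_flags(cust_key, prev_location, curr_location):
    # New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
    new_flag = 1 if cust_key is None else 0
    # Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND Prev_Location != Curr_Location, 1, 0)
    changed_flag = 1 if cust_key is not None and prev_location != curr_location else 0
    return new_flag, changed_flag

print(scd_flags(None, None, "NY"))  # (1, 0) -> new customer, insert path
print(scd_flags(7, "NY", "LA"))     # (0, 1) -> location changed, update path
print(scd_flags(7, "NY", "NY"))     # (0, 0) -> no change
```

A row can take at most one of the two paths, which is why the two downstream filters (New_Flag=1 and Changed_Flag=1) never both pass the same row.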
Now connect the ports of the LKP transformation (Cust_Key, Current_Location) to the expression
transformation ports (Cust_Key, Prev_Location), and the Location port of the source qualifier
transformation to the expression transformation port (Curr_Location).
Create a filter transformation and drag the ports of source qualifier transformation into it. Also drag the
New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as New_Flag=1. Then
click on ok.
Now create an update strategy transformation and connect all the ports of the filter transformation
(except the New_Flag port) to the update strategy. Go to the properties tab of update strategy and enter
the update strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the appropriate ports from update strategy
to the target definition. Connect Location port of update strategy to the Current_Location port of the
target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target surrogate key
(cust_key) port.
The part of the mapping diagram for inserting a new row is shown below:
Now create another filter transformation, go to the ports tab and create the ports Cust_Key,
Curr_Location, Prev_Location, Changed_Flag. Connect the LKP transformation ports (Cust_Key,
Current_Location) to the filter transformation ports (Cust_Key, Prev_Location), the source qualifier
transformation port (Location) to the filter transformation port (Curr_Location), and the expression
transformation port (Changed_Flag) to the Changed_Flag port of the filter transformation.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as Changed_Flag=1.
Then click on ok.
Now create an update strategy transformation and connect the ports of the filter transformation
(Cust_Key, Curr_Location, Prev_Location) to the update strategy. Go to the properties tab of the update
strategy and enter the update strategy expression as DD_UPDATE.
Now drag the target definition into the mapping and connect the appropriate ports from update strategy
to the target definition.
This article covers the process to load data into the fact table. Follow the below steps for loading data into the fact
table.
First implement the SCD type 2 method to load data into the dimension table. As an example choose the SCD type
2 effective date to load data into the customer dimension table. The data in the customer dimension table looks
as:
Let's say you want to load the sales fact table. Consider the following source transactional (sales) data:
When loading the data into the fact table, you have to get the relevant dimension keys (surrogate keys) from all
the dimension tables and then insert the records into the fact table. When getting the dimension keys from the
dimension table, we have to pick only the rows for which the End_Date column is null. The following SQL query
shows how to load the data into the fact table:
Price Cust_key
--------------
10000 3
50000 4
30000 5
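The lookup idea can be checked with a small sketch. Here SQLite stands in for the warehouse; the table layout and sample keys are assumptions modelled on the examples in this article, and the End_Date IS NULL predicate picks the current dimension row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers_dim (cust_key INTEGER, customer_id INTEGER,
                                location TEXT, begin_date TEXT, end_date TEXT);
    CREATE TABLE sales_src (customer_id INTEGER, price INTEGER);
    -- the current version of each customer has end_date NULL
    INSERT INTO customers_dim VALUES
        (1, 101, 'NY', '2011-01-01', '2012-01-01'),
        (3, 101, 'LA', '2012-01-01', NULL),
        (4, 102, 'DC', '2012-01-01', NULL),
        (5, 103, 'SF', '2012-01-01', NULL);
    INSERT INTO sales_src VALUES (101, 10000), (102, 50000), (103, 30000);
""")
rows = con.execute("""
    SELECT s.price, d.cust_key
    FROM sales_src s
    JOIN customers_dim d
      ON d.customer_id = s.customer_id
     AND d.end_date IS NULL
    ORDER BY d.cust_key
""").fetchall()
print(rows)  # [(10000, 3), (50000, 4), (30000, 5)]
```

Note that customer 101 has an expired row (cust_key 1) and a current row (cust_key 3); the NULL End_Date filter is what ensures the fact row picks up key 3.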
Please do comment here, if you are facing any issues in loading the fact table.
infacmd
infasetup
pmcmd
pmrep
This article covers only the pmcmd command. pmcmd is a command-line utility provided by
Informatica to perform the following tasks.
Start workflows.
The pmcmd command syntax for scheduling the workflow is shown below:
You cannot specify the scheduling options here. This command just schedules the workflow for the next run.
2. Start workflow
3. Stop workflow
4. Starting the workflow from a task
You can start the workflow from a specified task. This is shown below:
5. Stopping a task.
The following pmcmd commands are used to abort workflow and task in a workflow:
We can specify a single operation for all the rows using the "Treat Sources Rows As" setting in the session
properties tab. The different values you can specify for this option are:
Insert: The integration service treats all the rows for insert operation. If inserting a new row violates the
primary key or foreign key constraint in the database, then the integration service rejects the row.
Delete: The integration service treats all the rows for delete operation and deletes the corresponding row
in the target table. You must define a primary key constraint in the target definition.
Update: The integration service treats all the rows for update operation and updates the rows in the
target table that matches the primary key value. You must define a primary key in the target definition.
Data Driven: An update strategy transformation must be used in the mapping. The integration service
inserts, updates, or deletes a row in the target table based on the logic coded in the update strategy
transformation. If you do not specify the data driven option when you are using an update strategy
in the mapping, the workflow manager displays a warning, and the integration service does not follow
the instructions in the update strategy transformation.
You can also specify the update strategy options for each target table individually. Specify the update strategy
options for each target in the Transformations view on the Mapping tab of the session:
Truncate Table: check this option to truncate the target table before loading the data.
The below table illustrates how data in the target table is inserted, updated, or deleted for various combinations
of the row flagging and the individual target table settings.

Row flagged as Insert:
- Insert specified: the source row is inserted into the target.
- Insert not specified: the source row is not inserted into the target.

Row flagged as Delete:
- Delete specified: if the row exists in the target, it is deleted.
- Delete not specified: even if the row exists in the target, it is not deleted.

Row flagged as Update:
- Update as Update specified: if the row exists in the target, it is updated.
- Update as Insert specified and Insert specified: even if the row is flagged as update, it is not updated;
  instead, the row is inserted into the target.
- Update as Insert specified and Insert not specified: neither an update nor an insert happens.
- Update else Insert specified and Insert specified: if the row exists in the target, it is updated;
  otherwise it is inserted.
- Update else Insert specified and Insert not specified: if the row exists in the target, it is updated;
  the row is not inserted if it does not exist in the target.
The update strategy works when you want to update the target table based on the primary key.
What if you want to update the target table on a matching column other than the primary key? In this case the
update strategy won't work. Informatica provides a feature, "Target Update Override", to update even the
columns that are not part of the primary key.
You can find the Target Update Override option in the target definition properties tab. The syntax of update
statement to be specified in Target Update Override is
UPDATE TARGET_TABLE_NAME
SET TARGET_COLUMN1 = :TU.TARGET_PORT1,
[Additional update columns]
WHERE TARGET_COLUMN = :TU.TARGET_PORT
AND [Additional conditions]
Here TU means target update and is used to specify the target ports.
Example: Consider the employees table. In the employees table, the primary key is employee_id.
Let's say we want to update the salary of the employees whose employee name is MARK. In this case we have to use
the target update override. The update statement to be specified is
UPDATE EMPLOYEES
SET SALARY = :TU.SAL
WHERE EMPLOYEE_NAME = :TU.EMP_NAME
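The effect of such an override can be verified outside Informatica. A small SQLite sketch in which the :TU ports are modelled as bind values (table contents are made up for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
            "employee_name TEXT, salary INTEGER)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, 'MARK', 1000), (2, 'JOHN', 2000)])
# :TU.SAL and :TU.EMP_NAME become the bound values at run time
con.execute("UPDATE employees SET salary = ? WHERE employee_name = ?", (5000, 'MARK'))
print(con.execute("SELECT salary FROM employees "
                  "WHERE employee_name = 'MARK'").fetchone())  # (5000,)
```

The update matches on employee_name, not the primary key, which is exactly what the override makes possible.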
Here we will see how to generate the SQL query in the source qualifier transformation and the errors we may get
while generating it. The most frequent error is "Cannot generate query because there are no valid fields projected
from the Source Qualifier".
First we will simulate this error and then see how to avoid it. Follow the below steps for simulating
and fixing the error:
Create a new mapping and drag the relational source into it. For example drag the customers source
definition into the mapping.
Do not connect the source qualifier transformation to any of other transformations or target.
Edit the source qualifier and go to the properties tab and then open the SQL Query Editor.
Enter the ODBC data source name, user name, password and then click on Generate SQL.
Now we will get the error while generating the SQL query.
Informatica produces this error because the source qualifier transformation ports are not connected to
any other transformation or target. Informatica knows only the structure of the source; it does not
know which columns are to be read from the source table. It knows this only when the source qualifier is
connected to downstream transformations or a target.
To avoid this error, connect the source qualifier transformation to downstream transformation or target.
To explain this I am taking the customers table as the source. The source structure looks as below
Follow the below steps to generate the SQL query in source qualifier transformation.
Create a new mapping and drag the customers relational source into the mapping.
Now connect the source qualifier transformation to any other transformation or target. Here I have
connected the SQ to expression transformation. This is shown in the below image.
Edit the source qualifier transformation, go to the properties tab and then open the editor of SQL query.
Enter the username, password, data source name and click on Generate SQL query. Now the SQL query
will be generated. This is shown in the below image.
SELECT Customers.Customer_Id,
Customers.Name,
Customers.Email_Id,
Customers.Phone
FROM Customers
Now we will do a small change to understand more about the "Generating SQL query". Remove the link
(connection) between Name port of source qualifier and expression transformation.
Repeat the above steps to generate the SQL query and observe what SQL query will be generated.
SELECT Customers.Customer_Id,
Customers.Email_Id,
Customers.Phone
FROM Customers
The Name column is missing in the generated query. This means that only the ports connected from the Source
Qualifier transformation to downstream transformations or targets are included in the SQL query and read
from the database table.
We use the sequence generator transformation mostly in SCDs. Using a sequence generator transformation to
generate unique primary key values can cause performance issues, as it is an additional transformation for the
mapping to process.
You can use expression transformation to generate surrogate keys in a dimensional table. Here we will see the
logic on how to generate sequence numbers with expression transformation.
When you use the reset option in a sequence generator transformation, the sequence generator uses the original
value of Current Value to generate the numbers. The sequences will always start from the same number.
As an example, if the Current Value is 1 with reset option checked, then the sequences will always start from value
1 for multiple session runs. We will see how to implement this reset option with expression transformation.
Create a mapping parameter and call it as $$Current_Value. Assign the default value to this parameter,
which is the start value of the sequence numbers.
Now create an expression transformation and connect the source qualifier transformation ports to the
expression transformation.
In the expression transformation create the below additional ports and assign the expressions:
v_seq (variable port) = IIF(v_seq>0,v_seq+1,$$Current_Value)
o_key (output port) = v_seq
The v_seq port generates the numbers same as NEXTVAL port in sequence generator transformation.
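A minimal Python sketch of this reset behaviour (an illustration of the expression logic, not Informatica code):

```python
def make_sequence(current_value):
    # mirrors the variable port: v_seq = IIF(v_seq > 0, v_seq + 1, $$Current_Value)
    v_seq = 0  # variable ports start at 0 before the first row
    def next_val():
        nonlocal v_seq
        v_seq = v_seq + 1 if v_seq > 0 else current_value
        return v_seq
    return next_val

seq = make_sequence(1)
run1 = [seq() for _ in range(4)]
print(run1)  # [1, 2, 3, 4]
# a fresh session run "resets" by starting again from $$Current_Value
print(make_sequence(1)())  # 1
```

Because the mapping parameter is re-read at the start of every run, the sequence always restarts from $$Current_Value, just like the reset option of the sequence generator.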
We will see here how to generate the primary key values using the expression transformation and a parameter.
Follow the below steps:
Create a mapping to write the maximum value of primary key in the target to a parameter file. Assign the
maximum value to the parameter ($$MAX_VAL) in this mapping. Create a session for this mapping. This
should be the first session in the workflow.
Create another mapping where you want to generate the sequence numbers. In this mapping, connect
the required ports to the expression transformation, create the below additional ports in the expression
transformation and assign the below expressions:
The o_surrogate_key port generates the primary key values just as the sequence generator
transformation.
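Since the actual port definitions are not shown above, here is a hypothetical Python sketch of the logic, assuming $$MAX_VAL holds the maximum key value written by the first session:

```python
def surrogate_keys(max_val, n):
    # hypothetical port logic:
    #   v_key (variable) = IIF(v_key > 0, v_key + 1, $$MAX_VAL + 1)
    #   o_surrogate_key (output) = v_key
    keys, v_key = [], 0
    for _ in range(n):
        v_key = v_key + 1 if v_key > 0 else max_val + 1
        keys.append(v_key)
    return keys

print(surrogate_keys(100, 3))  # [101, 102, 103]
```

The first row continues from the parameter value plus one; every later row simply increments, so the keys pick up where the previous load left off.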
Follow the below steps to generate sequence numbers using expression and lookup transformations.
Create an unconnected lookup transformation with lookup table as target. Create a primary_key_column
port with type as output/lookup/return in the lookup ports tab. Create another port input_id with type as
input. Now overwrite the lookup query to get the maximum value of primary key from the target. The
query looks as
Now create an expression transformation and connect the required ports to it. Now we will call the
unconnected lookup transformation from this expression transformation. Create the below additional
port in the expression transformation:
The o_primary_key port generates the surrogate key values for the dimension table.
TT_11054 Normalizer Transformation: Initialization Error: [Cannot match OFOSid with IFOTid.]
Solution:
2. If the Normalizer has an OCCURS in it, make sure the number of input ports matches the number of OCCURS.
When you stop a task, the integration service first tries to stop processing it. The integration service does not
process other tasks that are in sequence; however, it processes the tasks that run in parallel to the task on which
the stop or abort command is issued. If the Integration Service cannot stop the task, you can try to abort it.
When you abort a task, the Integration Service kills the process running the task.
When you issue a stop command on a session, the integration service first stops reading the data from the sources.
It continues processing and writing data to the targets and then commits the data.
Abort command is handled the same way as the stop command, except that the abort command has timeout
period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout
period, it kills the DTM process and terminates the session.
When you run a session, it holds memory blocks in the OS. When you issue an abort on the session, it kills the
threads and leaves the memory blocks behind. This causes memory issues on the server and leads to poor
performance. Some operating systems clean up the lost memory blocks automatically; however, most operating
systems do not. Stop is a clean way of ending sessions and releases the memory blocks.
Let's see how to generate a list of all the days between two given dates using an Oracle SQL query.
Output:
CALENDAR_DATE
-------------
1/1/2000
1/2/2000
1/3/2000
.
.
.
12/31/2000
Now we can apply date functions on the Calendar date field and can derive the rest of the columns required in a
date dimension.
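The Oracle query itself is not reproduced above; as a runnable stand-in, the same calendar can be produced with a recursive CTE. This sketch uses SQLite through Python purely for illustration (Oracle would typically use a CONNECT BY LEVEL row generator instead):

```python
import sqlite3

con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH RECURSIVE cal(d) AS (
        SELECT DATE('2000-01-01')
        UNION ALL
        SELECT DATE(d, '+1 day') FROM cal WHERE d < '2000-12-31'
    )
    SELECT d FROM cal
""").fetchall()
print(len(rows), rows[0][0], rows[-1][0])  # 366 2000-01-01 2000-12-31
```

2000 is a leap year, hence 366 rows; each row is one calendar date that the date-dimension derivations can then be applied to.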
We will see how to get the list of days between two given dates in Informatica. Follow the below steps for creating
the mapping in Informatica.
Create a source with two ports (Start_Date and End_Date) in the source analyzer.
Create a new mapping in the mapping designer. Drag the source definition into the mapping.
Now edit the java transformation by double clicking on the title bar and go to the "Java Code" tab. Here
you will again find sub tabs. Go to the "Import Package" tab and enter the below java code:
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
Not all these packages are required. However, I included them in case you want to apply any formatting on the
dates. Go to the "On Input Row" tab and enter the following java code:
Compile the java code by clicking on the compile. This will generate the java class files.
Connect only the Start_Date output port from java transformation to expression transformation.
Connect the Start_Date port from expression transformation to target and save the mapping.
Now create a workflow and session. Enter the following oracle sql query in the Source SQL Query option:
Save the workflow and run. Now in the target you can see the list of dates loaded between the two given dates.
Note1: I have used relational table as my source. You can use a flat file instead.
Note2: In the expression transformation, create the additional output ports and apply date functions on the
Start_Date to derive the data required for date dimension.
The aggregator transformation will throw the below error if you do not sort the data but specify the "Sorted
Input" option in the properties of the aggregator transformation:
This error is due to data sorting issues in the mapping. This is expected, as the aggregator transformation
requires the data in sorted order but receives it unsorted. To avoid this error, follow
the below steps:
Be sure to sort the data using a sorter transformation or source qualifier transformation before passing to the
aggregator transformation.
The order of the ports is important while sorting the data. The order of the ports that you specify in the sorter
transformation should be exactly same as the order of the ports specified in "Group By" ports of aggregator
transformation. If the order of the ports does not match, then you will get this error.
If you are using string or varchar ports in the "Group By" of the aggregator transformation, then remove the
trailing and leading spaces in an expression transformation before passing the data to the sorter transformation.
Do not place transformations which change the sorting order before the aggregator transformation.
Example: I want to load the customers data into the target file on a daily basis. The source file name is in the
format customers_yyyymmdd.dat. How to load the data where the filename varies daily?
The solution to this kind of problems is using the parameters. You can specify session parameters for both the
source and target flat files. Then create a parameter file and assign the flat file names to the parameters.
Assume two session parameters, $InputFileName and $OutputFileName, for specifying the source and target flat file
names respectively. Now create a parameter file in the below format:
$InputFileName=customers_20120101.dat
$OutputFileName=customers_file.dat
Now you have to specify the parameters in the session. Edit the session and go to the Mapping tab. In the mapping
tab, select the source qualifier in the Sources folder and set the file property "Source Filename" to
$InputFileName. Similarly, for the target file set the "Output Filename" property to $OutputFileName.
The last step is specifying the parameter file name. You can specify the parameter file name either in the session
level or workflow level. To specify in the session level, go the properties tab of the session and set the property
"Parameter FileName".
To specify the parameter file at the workflow level, click on "Workflows" in the toolbar and then on Edit. Now go
to the properties tab and set the file property "Parameter Filename".
That's it, you are done with using the parameters as file names. Now you have to take care of changing the file
name in the parameter file daily.
Note: You can even specify the source and target directories as parameters.
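Updating the parameter file daily is easy to script. A hypothetical Python sketch that rewrites it with today's date (the file location and workflow names are made up for the example):

```python
from datetime import date
import os
import tempfile

# hypothetical location of the workflow's parameter file
param_file = os.path.join(tempfile.gettempdir(), "wf_customers.param")
today = date.today().strftime("%Y%m%d")
with open(param_file, "w") as f:
    f.write("$InputFileName=customers_%s.dat\n" % today)
    f.write("$OutputFileName=customers_file.dat\n")
print(open(param_file).read())
```

A small script like this, run from a daily scheduler before the workflow, keeps the session parameters in step with the incoming file name.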
Direct and Indirect Flat File Loading (Source File Type) - Informatica
When you want to load a single file into the target, you can use the direct source filetype option. You can
set the following source file properties in the mapping tab of the session:
Source File Directory: Enter the directory name where the source file resides.
Source Filename: Enter the name of the file to be loaded into the target.
Source Filetype: Specify the direct option when you want to load a single file into the target.
Example: Let's say we want to load the employees source file (employees.dat) in the directory $PMSourceFileDir
into the target; then the source file properties to be configured in the session are:
Let's say from each country we get the customers data in a separate file. These files have the same structure
and properties, and we want to load all of them into a single target. Creating a mapping for each source file
would be a tedious process. Informatica provides an easy option (indirect load) to handle this type of scenario.
The indirect source filetype option is used to load data from multiple source files that have the same structure
and properties. The integration service reads each file sequentially and loads the data into the target.
Specifying the indirect load involves two steps: 1. creating a list file, and 2. configuring the file
properties in the session.
You can create a list file manually and specify each source file you want to load into the target in a separate line. As
an example consider the following list file:
>cat customers_list.dat
$PMSourceFileDir/customers_us.dat
$PMSourceFileDir/customers_uk.dat
$PMSourceFileDir/customers_india.dat
Each file in the list must use the user-defined code page configured in the source definition.
Each file in the file list must share the same file properties as configured in the source definition or as
entered for the source instance in the session property sheet.
Enter one file name or one path and file name on a line. If you do not specify a path for a file, the
Integration Service assumes the file is in the same directory as the file list.
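Building the list file can also be scripted. A hypothetical Python sketch that writes one source file path per line (a temporary directory stands in for $PMSourceFileDir):

```python
import glob
import os
import tempfile

src_dir = tempfile.mkdtemp()  # stands in for $PMSourceFileDir
# create some empty sample source files for the illustration
for name in ("customers_us.dat", "customers_uk.dat", "customers_india.dat"):
    open(os.path.join(src_dir, name), "w").close()

# write one source file path per line, as the indirect filetype expects
list_file = os.path.join(src_dir, "customers_list.dat")
with open(list_file, "w") as lf:
    for path in sorted(glob.glob(os.path.join(src_dir, "customers_*.dat"))):
        if path != list_file:  # do not list the list file itself
            lf.write(path + "\n")

print(open(list_file).read())
```

Generating the list file just before the session run is a common way to pick up however many country files arrived that day.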
Configure the following source file properties in the session for indirect source filetype:
Source File Directory: Enter the directory name where the source file resides.
Source Filename: Enter the list file name in case of indirect load.
Source Filetype: Specify the indirect option when you want to load multiple files with the same properties.
Note: If you have multiple files with different properties, then you cannot use the indirect load option. You have
to use the direct load option in this case.
Target load order (or) Target load plan is used to specify the order in which the integration service loads the
targets. You can specify a target load order based on the source qualifier transformations in a mapping. If you have
multiple source qualifier transformations connected to multiple targets, you can specify the order in which the
integration service loads the data into the targets.
Target load order will be useful when the data of one target depends on the data of another target. For example,
the employees table data depends on the departments data because of the primary-key and foreign-key
relationship. So, the departments table should be loaded first and then the employees table. Target load order is
useful when you want to maintain referential integrity when inserting, deleting or updating tables that have the
primary key and foreign key constraints.
You can set the target load order or plan in the mapping designer. Follow the below steps to configure the target
load order:
1. Login to the powercenter designer and create a mapping that contains multiple target load order groups.
2. Click on the Mappings in the toolbar and then on Target Load Plan. The following dialog box will pop up listing all
the source qualifier transformations in the mapping and the targets that receive data from each source qualifier.
3. Select a source qualifier from the list.
4. Click the Up and Down buttons to move the source qualifier within the load order.
5. Repeat steps 3 and 4 for other source qualifiers you want to reorder.
6. Click OK.
Q) I want to load the data from a flat file into a target. The flat file has n records. The load should happen as
follows: in the first run, load the first 50 records; in the second run, the next 20 records; in the third run,
the next 20 records; and so on.
We will solve this problem with the help of mapping variables. Follow the below steps to implement this logic:
Create an expression transformation and drag the ports of source qualifier transformation into the
expression transformation.
Now create a filter transformation and drag the ports of the expression transformation into it. In the filter
transformation specify the condition as
IIF(v_check_rec=50,
IIF(o_cnt <= o_check_rec, TRUE, FALSE),
IIF(o_cnt <= o_check_rec AND o_cnt > o_check_rec - 20, TRUE, FALSE)
)
Drag the target definition into the mapping and connect the appropriate ports of filter transformation to
the target.
Create a workflow and run the workflow multiple times to see the effect.
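The intended batch windows can be sketched in plain Python (names and structure are assumptions, since the variable-port definitions are not reproduced above):

```python
def rows_to_load(loaded_so_far, first_batch=50, next_batch=20):
    # returns the 1-based range of source rows that pass the filter in this run
    if loaded_so_far == 0:
        return (1, first_batch)
    return (loaded_so_far + 1, loaded_so_far + next_batch)

print(rows_to_load(0))   # (1, 50)  first run
print(rows_to_load(50))  # (51, 70) second run
print(rows_to_load(70))  # (71, 90) third run
```

In the mapping, the persistent mapping variable plays the role of loaded_so_far: its value survives between session runs, so each run's filter window starts where the previous run stopped.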
We will create a simple pass through mapping to load the data and "file name" from a flat file into the target.
Assume that we have a source file "customers" and want to load this data into the target "customers_tgt". The
structures of source and target are
Target: Customers_TBL
Customer_Id
Location
FileName
You can create the flat file source definition or import the flat file.
Once you have created the flat file, edit the source and go to the properties tab. Check the option "Add Currently
Processed Flat File Name Port". This option is shown in the below image.
A new port, "CurrentlyProcessedFileName" is created in the ports tab.
Now go to the Target Designer or Warehouse Designer and create or import the target definition. Create a
"Filename" port in the target.
Drag the source and target into the mapping. Connect the appropriate ports of source qualifier
transformation to the target.
Now create a workflow and session. Edit the session and enter the appropriate values for source and
target connections.
The loading of the filename works for both Direct and Indirect Source filetype. After running the workflow, the
data and the filename will be loaded in to the target. The important point to note is the complete path of the file
will be loaded into the target. This means that the directory path and the filename will be loaded(example:
/informatica/9.1/SrcFiles/Customers.dat).
If you don’t want the directory path and just want the filename to be loaded in to the target, then follow the below
steps:
Create an expression transformation and drag the ports of source qualifier transformation into it.
Edit the expression transformation, go to the ports tab, create an output port and assign the below
expression to it.
REVERSE
(
SUBSTR
(
REVERSE(CurrentlyProcessedFileName),
1,
INSTR(REVERSE(CurrentlyProcessedFileName), '/') - 1
)
)
Now connect the appropriate ports of expression transformation to the target definition.
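The expression can be checked outside Informatica; a direct Python translation of the REVERSE/SUBSTR/INSTR logic:

```python
def extract_filename(path):
    # mirrors REVERSE(SUBSTR(REVERSE(path), 1, INSTR(REVERSE(path), '/') - 1))
    rev = path[::-1]
    # take everything before the first '/' of the reversed string, then reverse back;
    # assumes the path contains at least one '/'
    return rev[:rev.index('/')][::-1]

print(extract_filename("/informatica/9.1/SrcFiles/Customers.dat"))  # Customers.dat
```

Reversing the string turns "last slash" into "first slash", which is why the double REVERSE trick isolates the file name regardless of how deep the directory path is.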
You cannot update or delete rows using constraint-based load ordering.
You have to define the primary key and foreign key relationships for the targets in the warehouse or
target designer.
There is a workaround to do updates and deletes using constraint-based load ordering. Informatica
PowerCenter provides an option called complete constraint-based loading for inserts, updates and deletes in the
target tables. To enable complete constraint-based loading, specify FullCBLOSupport=Yes in the Custom Properties
attribute on the Config Object tab of the session. This is shown in the below image.
When you enable complete constraint based loading, the change data (inserts, updates and deletes) is loaded in
the same transaction control unit by using the row ID assigned to the data by the CDC reader. As a result the data
is applied to the target in the same order in which it was applied to the sources. You can also set this property in
the integration service, which makes it applicable for all the sessions and workflows. When you use complete
constraint based load ordering, mapping should not contain active transformations which change the row ID
generated by the CDC reader.
The following transformations can change the row ID value
Aggregator Transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Sorter Transformation
As an example, consider the following source table with data to be loaded into the target tables using
constraint-based load ordering.
dept_id dept_name
-----------------
10 Finance
20 Hr
Follow the below steps for creating the mapping using constraint based load ordering option.
Go to the mapping designer, source analyzer and import the source definition from the oracle database.
Now go to the warehouse designer or target designer and import the target definitions from the oracle
database.
Make sure that the foreign key relationship exists between the dept and emp targets. Otherwise create
the relationship as shown in the below images.
Now create a new mapping. Drag the source and targets into the mapping.
Connect the appropriate ports of source qualifier transformation to the target definition as shown in the
below image.
Go to the workflow manager tool, create a new workflow and then session.
Go to the Config object tab of session and check the option of constraint based load ordering.
Go to the mapping tab and enter the connections for source and targets.
Source:
YEAR PRICE
----------
2010 100
2010 200
2010 300
2011 500
2011 600
2012 700
For simplicity, I have used only the year and price columns of sales table. We need to do aggregation and find the
total price in each year.
When you run the session for the first time using incremental aggregation, the integration service processes the
entire source and stores the data in two cache files: an index file and a data file. The integration service
creates these files in the cache directory specified in the aggregator transformation properties.
After the aggregation, the target table will have the below data.
Target:
YEAR PRICE
----------
2010 600
2011 1100
2012 700
Now assume that the next day few more rows are added into the source table.
Source:
YEAR PRICE
----------
2010 100
2010 200
2010 300
2011 500
2011 600
2012 700
2010 400
2011 100
2012 200
2013 800
Now for the second run, you have to pass only the new data changes to the incremental aggregation. So, the source will contain only the last four records. The incremental aggregation uses the data stored in the cache and calculates the aggregation. Once the aggregation is done, the integration service writes the changes to the target and the cache. The target table will contain the below data.
Target:
YEAR PRICE
----------
2010 1000
2011 1200
2012 900
2013 800
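The merge that incremental aggregation performs can be sketched in Python: the first run aggregates the full source and the result plays the role of the cache files; the second run folds only the new rows into those cached totals. The numbers match the example above.

```python
# Incremental aggregation sketch: cached totals per year are reused and
# only the new rows are aggregated on the second run.
def aggregate(rows, cache=None):
    cache = dict(cache or {})        # copy so each run is independent
    for year, price in rows:
        cache[year] = cache.get(year, 0) + price
    return cache

first_run = [(2010, 100), (2010, 200), (2010, 300),
             (2011, 500), (2011, 600), (2012, 700)]
cache = aggregate(first_run)         # {2010: 600, 2011: 1100, 2012: 700}

new_rows = [(2010, 400), (2011, 100), (2012, 200), (2013, 800)]
cache = aggregate(new_rows, cache)   # {2010: 1000, 2011: 1200, 2012: 900, 2013: 800}
```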
Points to remember
1. When you use incremental aggregation, first time you have to run the session with complete source data
and in the subsequent runs you have to pass only the changes in the source data.
2. Use incremental aggregation only if the target is not going to change significantly. If the incremental aggregation process changes more than half of the data in the target, then the session performance may not benefit. In this case, go for normal aggregation.
Q) How to create a non-reusable transformation or session or task from a reusable transformation or session or
task?
I still remember my first project, in which I created many reusable transformations and developed a mapping. My project lead reviewed the code and told me that I had created unnecessary reusable transformations and should change them to non-reusable transformations. So I created non-reusable transformations and re-implemented the entire logic, which took me almost a full day. Many new Informatica developers still make the same mistake and re-implement the entire logic.
I found an easy way to create a non-reusable transformation from a reusable one. Follow the below steps to create a non-reusable transformation, session or task from a reusable transformation, session or task in Informatica:
1. Select the reusable transformation or session or task which you want to convert to non-reusable with the mouse.
2. Drag the object (transformation/session/task) to the work-space and, just before releasing the object on the work-space, hold the Ctrl key and then release the object.
Now you are done with creating a non-reusable transformation or session or task.
Make use of the Source Qualifier "Filter" property if the source type is relational.
If the subsequent sessions are doing lookup on the same table, use persistent cache in the first session.
Data remains in the Cache and available for the subsequent session for usage.
Use flags as integer, as the integer comparison is faster than the string comparison.
Use tables with lesser number of records as master table for joins.
While reading from Flat files, define the appropriate data type instead of reading as String and converting.
Have all Ports that are required connected to Subsequent Transformations else check whether we can
remove these ports.
Suppress ORDER BY using the '--' at the end of the query in Lookup Transformations.
Turn off the Verbose Logging while moving the workflows to Production environment.
For large volume of data drop index before loading and recreate indexes after load.
For large volumes of data, use bulk load and increase the commit interval to a higher value.
Aggregator Active/Connected
Expression Passive/Connected
Filter Active/Connected
Joiner Active/Connected
Lookup Passive/Connected or Unconnected
Normalizer Active/Connected
Rank Active/Connected
Router Active/Connected
Sequence Generator Passive/Connected
Sorter Active/Connected
Source Qualifier Active/Connected
SQL Active or Passive/Connected
Stored Procedure Passive/Connected or Unconnected
Transaction Control Active/Connected
Union Active/Connected
Update Strategy Active/Connected
1. What is a transformation?
12. Which transformation can be created only as reusable transformation but not as non-reusable transformation?
External procedure transformation.
2. As union transformation gives UNION ALL output, how you will get the UNION output?
Pass the output of union transformation to a sorter transformation. In the properties of sorter transformation
check the option select distinct. Alternatively you can pass the output of union transformation to aggregator
transformation and in the aggregator transformation specify all ports as group by ports.
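In plain terms, the Union transformation behaves like UNION ALL, and the downstream Sorter (with select distinct) or group-by Aggregator removes the duplicates. A rough Python sketch of that idea, with invented sample rows:

```python
def union_all(*pipelines):
    """Concatenate the input groups; duplicates are kept (UNION ALL)."""
    rows = []
    for pipeline in pipelines:
        rows.extend(pipeline)
    return rows

def select_distinct(rows):
    """What the Sorter's 'select distinct' option effectively does."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:          # drop repeated rows
            seen.add(row)
            out.append(row)
    return out

a = [(10, "Finance"), (20, "Hr")]
b = [(20, "Hr"), (30, "Sales")]
merged = select_distinct(union_all(a, b))   # UNION semantics: 3 rows
```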
The following rules and guidelines need to be taken care while working with union transformation:
You can create multiple input groups, but only one output group.
All input groups and the output group must have matching ports. The precision, datatype, and scale must
be identical across all groups.
The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add
another transformation such as a Router or Filter transformation.
You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union
transformation.
Union is an active transformation because it combines two or more data streams into one. Though the total
number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence
of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e.
row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.
A transaction is a set of rows bound by a commit or rollback of rows. The transaction control transformation is
used to commit or rollback a group of rows.
2. What is the commit type if you have a transaction control transformation in the mapping?
3. What are the different transaction levels available in transaction control transformation?
The following are the transaction levels or built-in variables:
TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this
row. This is the default value of the expression.
TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and
writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the
transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new
transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the
transaction, and begins a new transaction. The current row is in the rolled back transaction.
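The difference between the commit variables is only whether the current row lands in the committed transaction or in the new one. A minimal Python model of the commit flags (the names mirror the built-in variables; everything else is illustrative):

```python
TC_CONTINUE_TRANSACTION, TC_COMMIT_BEFORE, TC_COMMIT_AFTER = 0, 1, 2

def process(rows_with_flags):
    committed, current = [], []
    for row, flag in rows_with_flags:
        if flag == TC_COMMIT_BEFORE:
            committed.append(current)    # commit the open transaction first
            current = [row]              # current row starts the new one
        elif flag == TC_COMMIT_AFTER:
            current.append(row)          # current row joins the transaction...
            committed.append(current)    # ...which is then committed
            current = []
        else:
            current.append(row)          # no transaction change
    return committed, current

committed, open_txn = process([("r1", TC_CONTINUE_TRANSACTION),
                               ("r2", TC_COMMIT_BEFORE),
                               ("r3", TC_COMMIT_AFTER)])
# committed -> [["r1"], ["r2", "r3"]], nothing left open
```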
Sorter transformation is used to sort the data. You can sort the data either in ascending or descending order
according to a specified sort key.
As sorter transformation can suppress the duplicate records in the source, it is called an active transformation.
Sort the data using sorter transformation before passing in to aggregator or joiner transformation. As the data is
sorted, the integration service uses the memory to do aggregate and join operations and does not use cache files
to process the data.
Q1. Design a mapping to load the cumulative sum of salaries of employees into a target table.
The target table data should look as below:
Q2. Design a mapping to get the previous row salary for the current row. If no previous row exists for the current row, then the previous row salary should be displayed as null.
The output should look as below:
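Both questions are usually solved with variable ports in an Expression transformation, because a variable port can remember a value from the previous row. A Python sketch of that logic (the salary values are made up):

```python
def cumulative_and_previous(salaries):
    out, running, prev = [], 0, None
    for sal in salaries:
        running += sal                    # Q1: cumulative sum so far
        out.append((sal, running, prev))  # prev is None for the first row
        prev = sal                        # Q2: remembered for the next row
    return out

result = cumulative_and_previous([100, 200, 300])
# (salary, cumulative, previous):
# (100, 100, None), (200, 300, 100), (300, 600, 200)
```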
Department_no, Employee_name
----------------------------
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S
Q1. Design a mapping to load a target table with the following values from the above source?
Department_no, Employee_list
----------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Q2. Design a mapping to load a target table with the following values from the above source?
Department_no, Employee_list
----------------------------
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S
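Both variants are again a job for variable ports that carry a running string and the previous department. A Python sketch, under the assumption that the rows arrive sorted by department:

```python
def running_list(rows, reset_per_dept=True):
    out, prev_dept, acc = [], None, ""
    for dept, name in rows:                  # rows sorted by department
        if reset_per_dept and dept != prev_dept:
            acc = ""                         # Q2: restart for each department
        acc = name if acc == "" else acc + "," + name
        prev_dept = dept
        out.append((dept, acc))
    return out

rows = [(10, "A"), (10, "B"), (10, "C"), (10, "D"),
        (20, "P"), (20, "Q"), (20, "R"), (20, "S")]
q2 = running_list(rows)                          # last row: (20, 'P,Q,R,S')
q1 = running_list(rows, reset_per_dept=False)    # last row: (20, 'A,B,C,D,P,Q,R,S')
```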
Mode: Specifies the mode in which SQL transformation runs. SQL transformation supports two modes.
They are script mode and query mode.
Database type: The type of database that SQL transformation connects to.
Connection type: Pass database connection to the SQL transformation at run time or specify a connection
object.
Script mode: The SQL transformation runs scripts that are externally located. You can pass a script name
to the transformation with each input row. The SQL transformation outputs one row for each input row.
Query mode: The SQL transformation executes a query that you define in a query editor. You can pass
parameters to the query to define dynamic queries. You can output multiple rows when the query has a
SELECT statement.
4. In which cases the SQL transformation becomes a passive transformation and active transformation?
If you run the SQL transformation in script mode, then it becomes a passive transformation. If you run the SQL transformation in query mode and the query has a SELECT statement, then it becomes an active transformation.
5. When you configure an SQL transformation to run in script mode, what are the ports that the designer adds to
the SQL transformation?
The designer adds the following ports to the SQL transformation in script mode:
ScriptName: This is an input port. ScriptName receives the name of the script to execute for the current row.
ScriptResult: This is an output port. ScriptResult returns PASSED if the script execution succeeds for the
row. Otherwise it returns FAILED.
ScriptError: This is an output port. ScriptError returns the errors that occur when a script fails for a row.
6. What are the types of SQL queries you can specify in the SQL transformation when you use it in query mode.
Static SQL query: The query statement does not change, but you can use query parameters to change the
data. The integration service prepares the query once and runs the query for all input rows.
Dynamic SQL query: The query statement can be changed. The integration service prepares a query for
each input row.
7. What are the types of connections to connect the SQL transformation to the database available?
Static connection: Configure the connection object in the session. You must first create the connection object in workflow manager.
Logical connection: Pass a connection name to the SQL transformation as input data at run time. You must
first create the connection object in workflow manager.
Full database connection: Pass the connect string, user name, password and other connection information
to SQL transformation input ports at run time.
8. How do you find the number of rows inserted, updated or deleted in a table?
You can enable the NumRowsAffected output port to return the number of rows affected by the INSERT, UPDATE
or DELETE query statements in each input row. This NumRowsAffected option works in query mode.
10. When you enable the NumRowsAffected output port in script mode, what will be the output?
In script mode, the NumRowsAffected port always returns NULL.
11. How do you limit the number of rows returned by the select statement?
You can limit the number of rows by configuring the Max Output Row Count property. To configure unlimited
output rows, set Max Output Row Count to zero.
Get a related value: Retrieve a value from the lookup table based on a value in the source.
Perform a calculation: Retrieve a value from a lookup table and use it in a calculation.
Update slowly changing dimension tables: Determine whether rows exist in a target.
Pipeline lookup
A connected lookup transformation is connected to the transformations in the mapping pipeline. It receives source data, performs a lookup and returns data to the pipeline.
An unconnected lookup transformation is not connected to the other transformations in the mapping
pipeline. A transformation in the pipeline calls the unconnected lookup with a :LKP expression.
6. What are the differences between connected and unconnected lookup transformation?
Connected lookup transformation receives input values directly from the pipeline. Unconnected lookup
transformation receives input values from the result of a :LKP expression in another transformation.
Connected lookup transformation can be configured as dynamic or static cache. Unconnected lookup
transformation can be configured only as static cache.
Connected lookup transformation can return multiple columns from the same row or insert into the
dynamic lookup cache. Unconnected lookup transformation can return one column from each row.
If there is no match for the lookup condition, connected lookup transformation returns default value for
all output ports. If you configure dynamic caching, the Integration Service inserts rows into the cache or
leaves it unchanged. If there is no match for the lookup condition, the unconnected lookup
transformation returns null.
In a connected lookup transformation, the cache includes the lookup source columns in the lookup
condition and the lookup source columns that are output ports. In an unconnected lookup
transformation, the cache includes all lookup/output ports in the lookup condition and the lookup/return
port.
Connected lookup transformation passes multiple output values to another transformation. Unconnected
lookup transformation passes one output value to another transformation.
7. How do you handle multiple matches in lookup transformation? or what is "Lookup Policy on Multiple Match"?
"Lookup Policy on Multiple Match" option is used to determine which rows that the lookup transformation returns
when it finds multiple rows that match the lookup condition. You can select lookup to return first or last row or any
matching row or to report an error.
Insert Else Update option applies to rows entering the lookup transformation with the row type of insert. When this option is enabled, the integration service inserts new rows in the cache and updates existing rows. When disabled, the Integration Service does not update existing rows.
Update Else Insert option applies to rows entering the lookup transformation with the row type of
update. When this option is enabled, the Integration Service updates existing rows, and inserts a new row
if it is new. When disabled, the Integration Service does not insert new rows.
Persistent cache
Static cache
Dynamic cache
Shared Cache
Uncached lookup transformation: For each row that enters the lookup transformation, the Integration
Service queries the lookup source and returns a value. The integration service does not build a cache.
12. How the integration service builds the caches for connected lookup transformation?
The Integration Service builds the lookup caches for connected lookup transformation in the following ways:
Sequential cache: The Integration Service builds lookup caches sequentially. The Integration Service builds
the cache in memory when it processes the first row of the data in a cached lookup transformation.
Concurrent caches: The Integration Service builds lookup caches concurrently. It does not need to wait for
data to reach the Lookup transformation.
13. How the integration service builds the caches for unconnected lookup transformation?
The Integration Service builds caches for unconnected Lookup transformations sequentially.
15. When you use a dynamic cache, do you need to associate each lookup port with the input port?
Yes. You need to associate each lookup/output port with the input/output port or a sequence ID. The Integration
Service uses the data in the associated port to insert or update rows in the lookup cache.
0 - Integration Service does not update or insert the row in the cache.
Unnamed cache: When Lookup transformations in a mapping have compatible caching structures, the
Integration Service shares the cache by default. You can only share static unnamed caches.
Named cache: Use a persistent named cache when you want to share a cache file across mappings or
share a dynamic and a static cache. The caching structures must match or be compatible with a named
cache. You can share static and dynamic named caches.
Join tables in the database: If the source and the lookup table are in the same database, join the tables in
the database rather than using a lookup transformation.
Avoid ORDER BY on all columns in the lookup source. Specify explicitly the ORDER By clause on the
required columns.
Update strategy transformation is used to flag source rows for insert, update, delete or reject within a mapping.
Based on this flagging each row will be either inserted or updated or deleted from the target. Alternatively the row
can be rejected.
3. What are the constants used in update strategy transformation for flagging the rows?
4. If you place an aggregator after the update strategy transformation, how the output of aggregator will be
affected?
The update strategy transformation flags the rows for insert, update, delete or reject before you perform the aggregate calculation. How you flag a particular row determines how the aggregator transformation treats any
values in that row used in the calculation. For example, if you flag a row for delete and then later use the row to
calculate the sum, the integration service subtracts the value appearing in this row. If the row had been flagged for
insert, the integration service would add its value to the sum.
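This subtract-on-delete behavior can be sketched in Python. The flag values mirror the DD_INSERT (0) and DD_DELETE (2) update strategy constants; the sample rows are invented:

```python
DD_INSERT, DD_DELETE = 0, 2     # update strategy constants for these flags

def flagged_sum(rows):
    """SUM as an aggregator would compute it after an update strategy."""
    total = 0
    for value, flag in rows:
        # a delete-flagged row subtracts its value from the aggregate
        total += -value if flag == DD_DELETE else value
    return total

total = flagged_sum([(100, DD_INSERT), (50, DD_DELETE), (25, DD_INSERT)])
# 100 - 50 + 25 = 75
```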
5. How to update the target table without using update strategy transformation?
In the session properties, there is an option 'Treat Source Rows As'. Using this option you can specify whether all
the source rows need to be inserted, updated or deleted.
6. If you have an update strategy transformation in the mapping, what should be the value selected for 'Treat
Source Rows As' option in session properties?
The value selected for the option is 'Data Driven'. The integration service follows the instructions coded in the
update strategy transformation.
7. If you have an update strategy transformation in the mapping and you did not select the value 'Data Driven' for the 'Treat Source Rows As' option in the session, then how will the session behave?
If you do not choose Data Driven when a mapping contains an Update Strategy or Custom transformation, the
Workflow Manager displays a warning. When you run the session, the Integration Service does not follow
instructions in the Update Strategy transformation in the mapping to determine how to flag rows.
8. In which files the data rejected by update strategy transformation will be written?
If the update strategy transformation is configured to Forward Rejected Rows then the integration service
forwards the rejected rows to next transformation and writes them to the session reject file. If you do not select
the forward reject rows option, the integration service drops rejected rows and writes them to the session log file.
If you enable row error handling, the Integration Service writes the rejected rows and the dropped rows to the row
error logs. It does not generate a reject file.
A stored procedure is a precompiled collection of database procedural statements. Stored procedures are stored
and run within the database.
Check the status of a target database before loading data into it.
The stored procedure transformation is connected to the other transformations in the mapping pipeline.
Run a stored procedure every time a row passes through the mapping.
Pass parameters to the stored procedure and receive multiple output parameters.
The stored procedure transformation is not connected directly to the flow of the mapping. It either runs before or
after the session or is called by an expression in another transformation in the mapping.
7. What are the options available to specify when the stored procedure transformation needs to be run?
The following options describe when the stored procedure transformation runs:
Normal: The stored procedure runs where the transformation exists in the mapping on a row-by-row
basis. This is useful for calling the stored procedure for each row of data that passes through the mapping,
such as running a calculation against an input port. Connected stored procedures run only in normal
mode.
Pre-load of the Source: Before the session retrieves data from the source, the stored procedure runs. This
is useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source: After the session retrieves data from the source, the stored procedure runs. This
is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored procedure runs. This is
useful for verifying target tables or disk space on the target system.
Post-load of the Target: After the session sends data to the target, the stored procedure runs. This is
useful for re-creating indexes on the database.
A connected stored procedure transformation runs only in Normal mode. An unconnected stored procedure transformation can run in any of the above modes.
The order in which the Integration Service calls the stored procedure used in the transformation, relative to any
other stored procedures in the same mapping. Only used when the Stored Procedure Type is set to anything
except Normal and more than one stored procedure exists.
INOUT: Defines the parameter as both input and output. Only Oracle supports this parameter type.
A source qualifier represents the rows that the integration service reads when it runs a session. Source qualifier is
an active transformation.
The source qualifier transformation converts the source data types into informatica native data types.
Join two or more tables originating from the same source (homogeneous sources) database.
The source qualifier transformation joins the tables based on the primary key-foreign key relationship.
When there is no primary key-foreign key relationship between the tables, you can specify a custom join using the
'user-defined join' option in the properties tab of source qualifier.
SQL Query
User-Defined Join
Source Filter
Select Distinct
Pre-SQL
Post-SQL
A Sequence generator transformation generates numeric values. Sequence generator transformation is a passive
transformation.
A sequence generator is used to create unique primary key values, replace missing primary key values or cycle
through a sequential range of numbers.
A sequence generator contains two output ports. They are CURRVAL and NEXTVAL.
4. What is the maximum number that a sequence generator can generate?
5. When you connect both the NEXTVAL and CURRVAL ports to a target, what will be the output values of these
ports?
6. What will be the output value, if you connect only CURRVAL to the target without connecting NEXTVAL?
8. What is the number of cached values set to default for a sequence generator transformation?
For non-reusable sequence generators, the number of cached values is set to zero.
For reusable sequence generators, the number of cached values is set to 1000.
Start Value
Increment By
End Value
Current Value
Cycle
A router is used to filter the rows in a mapping. Unlike filter transformation, you can specify one or more
conditions in a router transformation. Router is an active transformation.
2. How to improve the performance of a session using router transformation?
Use router transformation in a mapping instead of creating multiple filter transformations to perform the same
task. The router transformation is more efficient in this case. When you use a router transformation in a mapping,
the integration service processes the incoming data only once. When you use multiple filter transformations, the
integration service processes the incoming data for each transformation.
Input
Output
User-defined group
Default group
You can create the group filter conditions in the groups tab using the expression editor.
6. Can you connect ports of two output groups from router transformation to a single target?
No. You cannot connect more than one output group to one target or a single input group transformation.
A rank transformation is used to select top or bottom rank of data. This means, it selects the largest or smallest
numeric value in a port or group. Rank transformation also selects the strings at the top or bottom of a session sort
order. Rank transformation is an active transformation.
The integration service compares input rows in the data cache. If the input row out-ranks a cached row, the
integration service replaces the cached row with the input row. If you configure the rank transformation to rank
across multiple groups, the integration service ranks incrementally for each group it finds. The integration service
stores group information in index cache and row data in data cache.
The designer creates RANKINDEX port for each rank transformation. The integration service uses the rank index
port to store the ranking position for each row in a group.
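The cache behavior amounts to keeping the current top N rows and replacing the weakest cached row whenever an incoming row out-ranks it. A Python sketch for a top ranking on a single numeric port, with a RANKINDEX-style position in the output:

```python
def top_n(values, n):
    cache = []
    for value in values:
        if len(cache) < n:
            cache.append(value)          # cache not full yet
        elif value > min(cache):         # incoming row out-ranks a cached row
            cache.remove(min(cache))
            cache.append(value)
    # assign RANKINDEX-style positions, 1 for the best-ranked row
    return [(i + 1, v) for i, v in enumerate(sorted(cache, reverse=True))]

ranked = top_n([40, 10, 90, 60, 30], 3)  # [(1, 90), (2, 60), (3, 40)]
```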
4. How do you specify the number of rows you want to rank in a rank transformation?
In the rank transformation properties, there is an option 'Number of Ranks' for specifying the number of rows you want to rank.
In the rank transformation properties, there is an option 'Top/Bottom' for selecting the top or bottom ranking for a
column.
The normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. This means it converts column data into row data. Normalizer is an active transformation.
Since COBOL sources contain denormalized data, the normalizer transformation is used to normalize the COBOL sources.
The integration service increments the generated key sequence number each time it processes a source
row. When the source row contains a multiple-occurring column or a multiple-occurring group of
columns, the normalizer transformation returns a row for each occurrence. Each row contains the same
generated key value.
The normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column.
The GCID is an index for the instance of the multiple-occurring data. For example, if a column occurs 3
times in a source record, the normalizer returns a value of 1, 2 or 3 in the generated column ID.
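A sketch of both bullets together: one source row with a column that occurs three times becomes three output rows that share a generated key (GK) and differ only in GCID. The column names and values are made up for illustration.

```python
def normalize(rows, occurs):
    out, gk = [], 0
    for row in rows:
        gk += 1                              # same generated key per source row
        key, values = row[0], row[1:1 + occurs]
        for gcid, value in enumerate(values, start=1):
            out.append((gk, gcid, key, value))   # GCID = occurrence index
    return out

# a store with a sales column occurring 3 times, flattened in the source row
src = [("store1", 100, 200, 300)]
norm = normalize(src, occurs=3)
# [(1, 1, 'store1', 100), (1, 2, 'store1', 200), (1, 3, 'store1', 300)]
```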
4. What is VSAM?
VSAM (Virtual Storage Access Method) is a file access method for an IBM mainframe operating system. VSAM organizes records in indexed or sequential flat files.
The VSAM normalizer transformation is the source qualifier transformation for a COBOL source definition. A
COBOL source is flat file that can contain multiple-occurring data and multiple types of records in the same file.
Pipeline normalizer transformation processes multiple-occurring data from relational tables or flat files.
An OCCURS clause is specified when the source row has a multiple-occurring column or group of columns.
A REDEFINES clause is specified when the source contains multiple record types that occupy the same storage.
A joiner transformation joins two heterogeneous sources. You can also join the data from the same source. The joiner transformation joins sources with at least one matching column. The joiner uses a condition that matches one or more pairs of columns between the two sources.
You cannot use a joiner transformation when input pipeline contains an update strategy transformation.
You cannot use a joiner if you connect a sequence generator transformation directly before the joiner.
Normal join: In a normal join, the integration service discards all the rows from the master and detail
source that do not match the join condition.
Master outer join: A master outer join keeps all the rows of data from the detail source and the matching
rows from the master source. It discards the unmatched rows from the master source.
Detail outer join: A detail outer join keeps all the rows of data from the master source and the matching
rows from the detail source. It discards the unmatched rows from the detail source.
Full outer join: A full outer join keeps all rows of data from both the master and detail rows.
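A tiny Python model of the four join types; note that in the Informatica naming a master outer join keeps every detail row, while a detail outer join keeps every master row. The sample rows are invented.

```python
def join(master, detail, kind):
    m_keys = {k for k, _ in master}
    d_keys = {k for k, _ in detail}
    # matched rows: present in both sources (the normal join result)
    out = [(k, mv, dv) for k, mv in master for k2, dv in detail if k == k2]
    if kind in ("master outer", "full outer"):   # keep unmatched detail rows
        out += [(k, None, dv) for k, dv in detail if k not in m_keys]
    if kind in ("detail outer", "full outer"):   # keep unmatched master rows
        out += [(k, mv, None) for k, mv in master if k not in d_keys]
    return out

master = [(10, "Finance"), (30, "Sales")]
detail = [(10, "John"), (20, "Mary")]
normal = join(master, detail, "normal")      # only dept 10 matches
full = join(master, detail, "full outer")    # match plus both unmatched rows
```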
When the integration service processes a joiner transformation, it reads the rows from the master source and builds the index and data caches. Then the integration service reads the detail source and performs the join. In case of a
sorted joiner, the integration service reads both sources (master and detail) concurrently and builds the cache
based on the master rows.
For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
For a sorted Joiner transformation, designate the source with fewer duplicate key values as the master
source.
When the integration service processes an unsorted joiner transformation, it reads all master rows before it reads
the detail rows. To ensure it reads all master rows before the detail rows, the integration service blocks the detail source while it caches rows from the master source. As it blocks the detail source, the unsorted joiner is
called a blocking transformation.
Type of join
Join condition
A filter transformation is used to filter out the rows in mapping. The filter transformation allows the rows that
meet the filter condition to pass through and drops the rows that do not meet the condition. Filter transformation
is an active transformation.
We can specify only one condition in the filter transformation. To specify more than one condition, we have to use a router transformation.
If the filter condition is set to TRUE, then it passes all the rows without filtering any data. In this case, the filter
transformation acts as passive transformation.
4. Can we concatenate ports from more than one transformation into the filter transformation?
No. The input ports for the filter must come from a single transformation.
Keep the filter transformation as close as possible to the sources in the mapping. This allows the unwanted data to
be discarded and the integration service processes only the required rows. If the source is relational source, use
the source qualifier to filter the rows.
Use sorted input: Sort the data before passing it into the aggregator. The integration service then uses memory to process the aggregator transformation and does not use cache files.
Limit the number of input/output or output ports to reduce the amount of data the aggregator
transformation stores in the data cache.
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
5. Why cannot you use both single level and nested aggregate functions in a single aggregate transformation?
The nested aggregate function returns only one output row, whereas the single level aggregate function returns more than one row. Since the number of rows returned is not the same, you cannot use both single level and nested
aggregate functions in the same transformation. If you include both the single level and nested functions in the
same aggregator, the designer marks the mapping or mapplet as invalid. So, you need to create separate
aggregator transformations.
6. Up to how many levels can you nest the aggregate functions?
The integration service performs aggregate calculations and then stores the data in historical cache. Next time
when you run the session, the integration service reads only new data and uses the historical cache to perform
new aggregation calculations incrementally.
In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, and the data in this cache need not be in sorted order. Even if you provide sorted input, the records are presorted only for that particular run; the data already in the historical cache may not be in sorted order. That is why the sorted input option is not allowed with incremental aggregation.
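The idea of a persisted historical cache can be sketched as follows (a minimal Python analogy, not Informatica's cache format): the first run aggregates everything, and later runs merge only the new rows into the stored totals.

```python
# Sketch: incremental aggregation against a "historical cache" of totals.
def incremental_sum(cache, new_rows):
    """cache: dict of key -> running total; new_rows: (key, value) pairs.
    Merges only the new rows instead of re-reading all history."""
    for key, value in new_rows:
        cache[key] = cache.get(key, 0) + value
    return cache

cache = {}
incremental_sum(cache, [("HR", 100), ("IT", 300)])  # first session run
incremental_sum(cache, [("HR", 50)])                # next run: only new data
print(cache)  # {'HR': 150, 'IT': 300}
```

Note that nothing in this merge depends on key order, which matches the point above: the historical cache is not kept sorted, so a sorted-input guarantee cannot be honored across runs.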
You can configure the integration service to treat null values in aggregator functions as NULL or zero. By default
the integration service treats null values as NULL in aggregate functions.
A session is a set of instructions that tells Informatica how and when to move data from the sources to the targets.
A session is a task, just like the other tasks we create in workflow manager. Any session you create must have a mapping associated with it.
A session can have only a single mapping at a time and, once assigned, it cannot be changed. To execute a session task, it must be added to a workflow.
A session can be a reusable or non-reusable object. When you create a session in the task developer, it can be reused, but when you create a session in the workflow designer, it is non-reusable.
Properties of session
Treat Source Rows as Property
How to Make Treat source rows as – Delete
Commit Interval – Property
Session Log File Name & Session Log File directory
Enable Test Load
Memory Properties
Log options
Error Handling
Mapping and source/target Properties
Connection Properties in Mapping
Source Properties
Target Properties
Success or failure of session task
Properties Of Session
Using the properties of the session you can configure various characteristics of the session like pre and
post SQL scripts, log file name and path, memory properties, etc.
You can also override mapping properties in the session properties. In this section, we will discuss the following
important properties of the session.
Step 1) Open the session “s_m_emp_emp_target” in task developer, which we created in the earlier tutorial.
Step 2) Double click on the session icon inside Task Developer to open edit task window.
Step 3) Inside the “Edit Task” window, click on the properties tab.
Step 4) The properties tab will show the properties of the session.
Treat Source Rows As Property
This property allows you to define how the source data affects the target table. For example, you can define that
the source record should be inserted or deleted from the target.
Insert
Update
Delete
Data-driven
When this property is set to insert, the source data will be marked to be inserted. It means the data will
only be inserted.
When the property is set to update, the target data will be updated by the source data. For updating of
data primary key needs to be defined in the target table.
When the property is set to delete, the source data which is already present in the target will be deleted from the target table. For this property to execute and apply the changes, the primary key should be defined in the target table.
With the property set to data driven, Informatica checks how the source records are marked. If in a mapping the source records are marked as insert, then the records will be inserted into the target. If records are marked as update in the mapping, then the records will be updated in the target. So the operation performed at the target depends on how records are handled inside the mapping.
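The data-driven behaviour can be sketched in Python (an illustrative model, not Informatica's writer): each row carries the flag set in the mapping, and the writer applies the matching operation against the target.

```python
# Sketch: "data driven" row handling - the per-row flag decides the operation.
def apply_row(target, row):
    """target: dict keyed by primary key; row: (operation, key, value)."""
    op, key, value = row
    if op == "INSERT":
        target[key] = value
    elif op == "UPDATE":
        if key in target:          # update needs an existing key in the target
            target[key] = value
    elif op == "DELETE":
        target.pop(key, None)      # delete also matches on the primary key
    return target

target = {}
for row in [("INSERT", 1, "a"), ("INSERT", 2, "b"),
            ("UPDATE", 2, "B"), ("DELETE", 1, None)]:
    apply_row(target, row)
print(target)  # {2: 'B'}
```

This also shows why update and delete require a primary key in the target table: without a key, there is nothing to match the incoming row against.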
1. In the property tab of the session task, select “Delete” option in “Treat Source Rows as”
2. Select OK Button
Step 2 – To define the primary key in the target table, open Informatica designer
1. For the EmpNo column, select the key type as “primary key” from the drop-down menu and
2. Select the OK button.
Step 4 – Save the changes in Informatica and execute the workflow for this mapping.
When you execute this mapping, the source records which are already present in the target will get deleted.
Commit Interval – Property
This property defines the interval after which Informatica performs a commit operation on the target table. For example, if you are inserting 20,000 records in a target table, and you define the commit interval as 5,000, then after every 5,000 insertions of records in the target, a commit operation will be performed.
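The arithmetic of the commit interval can be verified with a small sketch (a Python model of the counting, not the actual writer; the commit counter stands in for a database commit call):

```python
# Sketch: committing every N rows instead of once at the end of the load.
def load_with_commit_interval(rows, interval):
    commits = 0
    pending = 0
    for _ in rows:
        pending += 1
        if pending == interval:   # commit point reached
            commits += 1          # stands in for connection.commit()
            pending = 0
    if pending:                   # final commit for any remaining rows
        commits += 1
    return commits

print(load_with_commit_interval(range(20000), 5000))  # 4
```

So 20,000 rows with an interval of 5,000 yields exactly 4 commits; an extra partial batch at the end adds one more.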
Session Log File Name & Session Log File directory
$PMSessionLogDir\ is an Informatica variable; on Windows it points to the following default location: “C:\Informatica\9.6.1\server\infa_shared\SessLogs”.
Enable Test Load
Using this property, you can test your session and mappings. When you enable this feature and run the session, records are fetched from the sources but not loaded into the target. This helps in testing the correctness of mappings and parameter files, and the functioning of different transformations.
If you enable this feature, there is another property – No of Rows to Test; this property should be configured with the number of records you want to be fetched from the source for the test load.
Memory Properties
Memory properties give us the flexibility to fine-tune the memory allocated to Informatica for performance optimization. When there is a severe bottleneck and performance is poor, you can try to improve the performance using the memory properties.
To configure memory properties click on the “config object” tab of the edit task window. It will open another
window where you can configure the changes.
In this section, you can configure the memory properties. For example, default buffer block size, sequential buffer length, etc. Changes to these properties will determine how much memory is allocated to the Informatica services for their operation.
Log options
In this property section, you can configure the log properties of the session. You can set how many session logs you want to save for a session, as well as the maximum size of the session log file.
Error Handling
In this section, you can configure the error properties for the session.
Using the Stop on errors property, you can configure after how many errors the session has to be stopped.
Using Override tracing, you can override the tracing levels of the mapping.
You can also configure the behaviour of the session for various errors encountered, for example stored procedure errors, pre/post SQL errors, etc.
Mapping and source/target Properties
In the mapping tab of the session's edit task window, you can configure the properties related to the mapping and its sources, targets and transformations. In this section you can override the source and target properties; for example, you can add table name prefixes for the sources and targets that override the table names. You can also review and override, in a single place, the properties of the different transformations, sources and targets configured inside the mapping.
Connection Properties in Mapping
Using this property, you can define the source and target database connections.
Source Properties
In this section, you can configure the properties related to the source of the mapping. Using the SQL query property, you can override the SQL for the source. You can also override the source table name in this section.
Target Properties
In this section, you can configure the details of the target. You can define whether target load has to be a bulk load
or a normal mode.
In bulk load, the performance gain is achieved as during the load there are no redo log buffers managed by the
database.
On the other hand, normal load is slower as compared to bulk load, but in case of failure database recovery is
possible.
You can also define the property to truncate the target table before populating it. It means before loading any
records in the target, the target table will be truncated, and then the load will be performed. This property is
useful when we create mappings for stage load.
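The truncate-before-load sequence can be sketched against an in-memory SQLite target (table and column names here are illustrative, chosen to match the emp_target example used earlier in this tutorial):

```python
# Sketch: truncate the target, then load - as in a stage-load mapping.
import sqlite3

def load_target(rows, truncate=True):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp_target (empno INTEGER, ename TEXT)")
    con.execute("INSERT INTO emp_target VALUES (0, 'stale')")  # leftover data
    if truncate:                               # "truncate target table" option
        con.execute("DELETE FROM emp_target")  # SQLite has no TRUNCATE keyword
    con.executemany("INSERT INTO emp_target VALUES (?, ?)", rows)  # the load
    con.commit()
    return con.execute("SELECT COUNT(*) FROM emp_target").fetchone()[0]

print(load_target([(1, "A"), (2, "B")]))  # 2 - the stale row was removed first
```

With truncate disabled, the stale row would survive and the counts would drift between runs, which is exactly why the option matters for stage loads.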
We can also define target table pre SQL and post SQL. Pre SQL is the piece of SQL code which will be executed
before performing insert in the target table, and post SQL code will be executed after the load of target table is
completed.
Success or Failure Of Session Task
When you have multiple sessions inside a workflow, then there can be a scenario where one or more session fails.
In such condition, there comes a question of what would be the status of the workflow because you are having a
workflow in which few tasks have failed, and few task got succeeded. To handle such conditions, Informatica
provides the option to set this failure specific property inside the workflow. To configure such behaviour –
Step 1 – Open the workflow “wkf_run_command” which we created earlier.
Step 2 – Double click on the command task; this will open the edit task window.
When you execute this workflow after making the above changes, if any of the tasks fails, the workflow status will be marked as failed, so you can identify that during the execution of your workflow some of its tasks failed.
What is Workflow?
A workflow is a group of instructions/commands to the integration service in Informatica. The integration service is an entity which reads workflow information from the repository, fetches data from sources and, after performing transformations, loads it into the target.
Workflow – It defines how to run tasks like session task, command task, email task, etc.
To create a workflow
A Workflow is like an empty container, which has the capacity to store an object you want to execute. You add
tasks to the workflow that you want to execute. In this tutorial, we are going to do following things in workflow.
Workflow execution can be done in two ways
1. We are going to connect to repository “guru99”, so double click on the folder to connect.
2. Enter user name and password then select “Connect Button”.
For Example, in your mapping if you have source table in oracle database, then you will need oracle connection so
that integration service can connect to the oracle database to fetch the source data.
Relational Connection
Ftp Connection
Queue
Application
The choice of connection you will create, will depend on the type of source and target systems you want to
connect. More often, you would be using relational connections.
Task Developer
Worklet Designer
Workflow Designer
Task Developer – Task developer is a tool with the help of which you can create reusable objects. Reusable objects in the workflow manager are objects which can be reused in multiple workflows. For example, if you have created a command task in task developer, then you can reuse this task in any number of workflows.
The role of the Workflow designer is to execute the tasks that are added to it. You can add any number of tasks to a workflow.
Command task
Session task
Email task
Command task – A command task is used to execute different windows/unix commands during the execution of
the workflow. You can create command task to execute various command based tasks. With help of this task you
can execute commands to create files/folders, to delete files/folders, to do ftp of files etc.
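A command task's behaviour, running an OS command during the workflow and reporting success or failure, can be sketched in Python (this is an illustrative model, not Informatica's pmcmd API; the commands shown are arbitrary):

```python
# Sketch: a command task runs a shell command; a nonzero exit code
# marks the task as failed.
import subprocess
import sys

def run_command_task(command):
    """Run a shell command and map its exit code to a task status."""
    result = subprocess.run(command, shell=True)
    return "SUCCEEDED" if result.returncode == 0 else "FAILED"

# Use the Python interpreter itself as a portable stand-in command.
print(run_command_task(f'"{sys.executable}" -c "print(42)"'))
```

The same pattern covers the examples in the text, creating or deleting files and folders, since those are just shell commands with exit codes.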
Email task – With the help of email task you can send email to defined recipients when the Integration Service runs
a workflow. For example, if you want to monitor how long a session takes to complete, you can configure the
session to send an email containing the details of session start and end time. Or, if you want the Integration
Service to notify you when a workflow completes/fails, you can configure the email task for the same.
Step 2 – Once task developer is opened up, follow these steps
This will create command task folder. Now you have to configure the task to add command in it, that we will see in
next step.
Step 4 – To configure the task, double click on the command task icon and it will open an “edit task window”. On
the new edit task window
After this step, you will return to the edit task window and you will be able to see the command you added in the command text box.
Step 6 – Click OK on the edit task window,
The command task will be created in the task developer under “Guru99” repository.
Note – use ctrl+s shortcut to save the changes in repository
How to create workflow to execute command task
To execute a command task you have to switch to the workflow designer. A workflow designer is a parent or container object in which you can add multiple tasks, and when the workflow is executed, all the added tasks will execute. To create a workflow
Step 1 – Open the workflow designer by clicking on workflow designer menu
Step 2 – In workflow designer
Naming Convention – Workflow names are prefixed with ‘wkf_’; if you have a session named ‘s_m_employee_detail’, then the workflow for the same can be named ‘wkf_s_m_employee_detail’.
When you create a workflow, it does not consist of any tasks. So, to execute any task in a workflow you have to
add task in it.
Step 4 – To add the command task that we have created in Task developer to the workflow designer
Step 5 – Select the “link task option” from the toolbox from the top menu. (The link task option links various tasks
in a workflow to the start task, so that the order of execution of tasks can be defined).
Step 6 – Once you select the link task icon, it will allow you to drag the link between start task and command task.
Now select the start task and drag a link to the command task.
Now you are ready with the workflow having a command task to be executed.
Once the workflow is executed, it will execute the command task to create a folder (guru99 folder) in the defined
directory.
Session Task
A session task in Informatica is required to run a mapping.
Without a session task, you cannot execute or run a mapping and a session task can execute only a single mapping.
So, there is a one to one relationship between a mapping and a session. A session task is an object with the help of which Informatica gets to know how, where and when to execute a mapping. Sessions cannot be executed independently; a session must be added to a workflow. In the session object, cache properties and advanced performance optimization settings can be configured.
Step 6 – In this step you will create a workflow for the session task. Click on the workflow designer icon.
Step 7 – In the workflow designer tool
Step 10 – Click on the link task option in the tool box.
Step 11 – Link the start task and session task using the link.
Step 12 – Double click on the session object in workflow manager. It will open a task window to modify the task properties.
1. Parallel
2. Serial
In parallel linking, the tasks are linked directly to the start task and all tasks start executing in parallel at the same time.
Step 2 – In the workflow, add session task “s_m_emp_emp_target”. ( by selecting session and then drag and drop)
Step 3 – Select the link task option from the toolbox
Step 4 – Link the session task to the start task (by clicking on the start task, holding the click and connecting to the session task)
After linking the session task, the workflow will look like this.
The link between the start task and session task will be removed.
Step 3 – Now again go to top menu and select the link task option from the toolbox
If you start the workflow the command task will execute first and after its execution, session task will start.
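The difference between parallel and serial linking can be sketched as a scheduling model (illustrative Python, not workflow manager internals): each link names a task's predecessor, and a task may start once its predecessor has finished.

```python
# Sketch: how links decide execution order after the start task.
def execution_waves(links):
    """links: dict of task -> predecessor task. Returns tasks grouped
    into 'waves' that can run at the same time."""
    done, waves = {"start"}, []
    remaining = dict(links)
    while remaining:
        wave = sorted(t for t, pred in remaining.items() if pred in done)
        if not wave:          # guard against unreachable tasks
            break
        waves.append(wave)
        done.update(wave)
        for t in wave:
            del remaining[t]
    return waves

parallel = {"cmd_task": "start", "session_task": "start"}
serial = {"cmd_task": "start", "session_task": "cmd_task"}
print(execution_waves(parallel))  # [['cmd_task', 'session_task']]
print(execution_waves(serial))    # [['cmd_task'], ['session_task']]
```

With parallel links both tasks land in the same wave; with serial links the session task waits for the command task, matching the behaviour described above.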
Workflow Variable
Workflow variables allows different tasks in a workflow to exchange information with each other and also allows
tasks to access certain properties of other tasks in a workflow. For example, to get the current date you can use
the inbuilt variable “sysdate”.
Most common scenario is when you have multiple tasks in a workflow and in one task you access the variable of
another task. For example, if you have two tasks in a workflow and the requirement is to execute the second task
only when first task is executed successfully. You can implement such scenario using predefined variable in the
workflow.
Implementing the scenario
We had a workflow “wkf_run_command” having tasks added in serial mode. Now we will add a condition to the
link between session task and command task, so that, only after the success of command task the session task will
be executed.
Step 2 – Double click on the link between session and command task
When you execute this workflow, the command task executes first and only when it succeeds then only the session
task will get executed.
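A link condition of this kind, run the downstream task only when the predecessor's status variable equals SUCCEEDED, can be sketched as follows (the status strings and task names are illustrative):

```python
# Sketch: a conditional link - the session task runs only if the
# command task reports SUCCEEDED, like a "$taskname.Status" condition.
def run_workflow(cmd_task, session_task):
    executed = []
    status = cmd_task()
    executed.append(("cmd_task", status))
    if status == "SUCCEEDED":     # the link condition on the task status
        executed.append(("session_task", session_task()))
    return executed

ok = lambda: "SUCCEEDED"
fail = lambda: "FAILED"
print(run_workflow(ok, ok))    # both tasks run
print(run_workflow(fail, ok))  # session task is skipped
```

When the command task fails, the condition on the link is false and the session task never starts, which is the behaviour the scenario above describes.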
Workflow Parameter
Workflow parameters are values which remain constant throughout the run; once a value is assigned, it remains the same. Parameters can be used in workflow properties, and their values can be defined in parameter files.
For example, instead of using hard coded connection value you can use a parameter/variable in the connection
name and value can be defined in the parameter file.
Parameter files are the files in which we define the values of mapping/workflow variables or parameters. These files have the extension “.par”. As a general standard, a parameter file is created for each workflow.
Advantages of Parameter file
[folder_name.WF:Workflow_name]
$Parameter_name=Parameter_value
Folder_name is the name of repository folder, workflow name is the name of workflow for which you are creating
the parameter file.
We will be creating a parameter file for the database connection “guru99” which we assigned in our early sessions
for sources and targets.
In the file we have created a parameter “$DBConnection_SRC”, we will assign the same to a connection in our
workflow.
When we execute the workflow, it picks up the parameter file, looks for the values of its parameters/variables in the parameter file, and uses those values.
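How such a file is resolved at run time can be sketched in Python (a simplified parser for the documented format only; real parameter files support more section types, and the folder name "Guru99" here is taken from the repository used earlier in this tutorial):

```python
# Sketch: resolving parameters from a "[folder.WF:workflow]" section.
def parse_param_file(text, folder, workflow):
    header = f"[{folder}.WF:{workflow}]"
    values, in_section = {}, False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):            # a section header line
            in_section = (line == header)   # only read the matching workflow
        elif in_section and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value
    return values

pf = """[Guru99.WF:wkf_run_command]
$DBConnection_SRC=guru99
"""
print(parse_param_file(pf, "Guru99", "wkf_run_command"))
```

The workflow-specific section header is what lets one parameter file safely hold values for several workflows: only the matching section's assignments are picked up.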