
Lookup Transformation

By the end of this sub-section you will be familiar with:


• Lookup Basics
• How does a Lookup work
• Lookup Properties
• Lookup Conditions
• Lookup Cache Overview
• Lookup Cache Considerations
• Lookup Cache Types
• Lookup Techniques

idwbitraining@gmail.com 1
Lookup Basics
Purpose of the Lookup Transformation:
• Getting a related value: retrieve a value from the lookup table
based on a value in the source. The returned value can also be
used in a calculation, like any other port.
• Updating slowly changing dimension tables: determine whether
a row already exists in the target, then create a new record or
update the existing one.
A Lookup can be configured as Connected or Unconnected, and it
can be Passive or Active depending on the type of output it is
configured to deliver.
A lookup can be performed on flat files, relational tables, views,
or synonyms.
How a Lookup Transformation Works
• For each Mapping row, one or more port values are looked up
in a database table.
• If a match is found, one or more table values are returned to
the Mapping. If no match is found, NULL is returned.
[Figure: Look Up Transformation. The Source Qualifier SQ_TARGET_ITEMS_OR... passes look-up values to the Lookup Procedure LKP_OrderID, which passes return values to the Target Definition TARGET_ORDERS_COS...]
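
The row-by-row behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Informatica code: the `ORDERS` dictionary, its values, and the `lookup_order` helper are hypothetical stand-ins for a lookup table and a Lookup transformation.

```python
from typing import Optional

# Hypothetical lookup table keyed by ORDER_ID (illustrative values only).
ORDERS = {
    101: {"CUSTOMER_ID": 7, "STORE_ID": 3},
    102: {"CUSTOMER_ID": 9, "STORE_ID": 1},
}

def lookup_order(order_id: int) -> Optional[dict]:
    """Return the matching row's values, or None (NULL) when nothing matches."""
    return ORDERS.get(order_id)

# Each mapping row supplies the port value that is looked up.
rows = [{"IN_ORDER_ID": 101}, {"IN_ORDER_ID": 999}]
results = [lookup_order(r["IN_ORDER_ID"]) for r in rows]
```

The first row finds a match and gets the table values back; the second finds none and gets `None`, the stand-in here for NULL.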

Lookup Transformation
Looks up values in a database table or flat file and provides data to
downstream transformations in a Mapping

• Passive Transformation
• Connected / Unconnected
• Ports
– Mixed
– “L” denotes a Lookup port
– “R” denotes a port used as a return value (unconnected Lookup only)
• Specify the Lookup Condition
• Usage
– Get related values
– Verify whether records exist or whether data has changed
Lookup Properties

• Override Lookup SQL option
• Toggle caching
• Native Database Connection Object name

Additional Lookup Properties

• Set cache directory
• Make cache persistent
• Set Lookup cache sizes

Lookup Conditions
Multiple conditions are supported
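
As a rough illustration of several conditions applied together, the sketch below treats a lookup row as a match only when every condition holds. The table contents, the column names, and the `lookup_price` helper are hypothetical.

```python
# Hypothetical lookup rows; a row matches only if every condition holds.
LOOKUP_ROWS = [
    {"ITEM_ID": 1, "VALID_FROM": 10, "PRICE": 5.0},
    {"ITEM_ID": 1, "VALID_FROM": 20, "PRICE": 6.0},
    {"ITEM_ID": 2, "VALID_FROM": 10, "PRICE": 9.0},
]

def lookup_price(item_id, as_of):
    """Conditions: ITEM_ID = item_id AND VALID_FROM <= as_of."""
    matches = [
        r for r in LOOKUP_ROWS
        if r["ITEM_ID"] == item_id and r["VALID_FROM"] <= as_of
    ]
    return [r["PRICE"] for r in matches]
```

Calling `lookup_price(1, 15)` matches only the first row, while `lookup_price(1, 25)` matches two rows, which is the situation a multiple-match policy has to resolve.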

Connected Lookup
[Figure: the Lookup Procedure LKP_OrderID linked between the Source Qualifier SQ_TARGET_ITEMS_OR... and the Target Definition TARGET_ORDERS_COS...]

Connected Lookup
Part of the data flow pipeline

Unconnected Lookup
• Will be physically “unconnected” from other transformations
– There can be NO data flow arrows leading to or from an unconnected Lookup

• The Lookup function can be set within any transformation that supports expressions
• Lookup data is called from the point in the Mapping that needs it
• [Figure: a function in the Aggregator calls the unconnected Lookup]

Unconnected Lookup - Return Port
• The port designated as ‘R’ is the return port for the unconnected lookup
• There can be only one return port
• A look-up (L) / Output (O) port can be assigned as the Return (R) port
• The Unconnected Lookup can be called in any other transformation’s
expression editor using the expression
:LKP.Lookup_Transformation(argument1, argument2,..)

Connected vs. Unconnected Lookups

CONNECTED LOOKUP | UNCONNECTED LOOKUP
Part of the mapping data flow | Separate from the mapping data flow
Returns multiple values (by linking output ports to another transformation) | Returns one value (by checking the Return (R) port option for the output port that provides the return value)
Executed for every record passing through the transformation | Only executed when the lookup function is called
More visible, shows where the lookup values are used | Less visible, as the lookup is called from an expression within another transformation
Default values are used | Default values are ignored
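
The comparison above can be mimicked with two small Python functions. This is a sketch under stated assumptions: `LOOKUP`, `DEFAULTS`, and both function names are hypothetical, and real port default values are configured per port rather than in a dictionary.

```python
# Hypothetical lookup data and port default values.
LOOKUP = {10: {"NAME": "Acme", "REGION": "EU"}}
DEFAULTS = {"NAME": "UNKNOWN", "REGION": "N/A"}

def connected_lookup(key):
    """Returns every output port; falls back to the defaults on no match."""
    return LOOKUP.get(key, DEFAULTS)

def unconnected_lookup(key, return_port="NAME"):
    """Returns only the designated return port; None (NULL) on no match."""
    row = LOOKUP.get(key)
    return None if row is None else row[return_port]
```

The connected form hands back all ports and applies defaults, while the function-style form yields a single value and ignores defaults.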

Conditional Lookup Technique
Two requirements:
• Must be an Unconnected (or “function mode”) Lookup
• Lookup function used within a conditional statement

IIF ( ISNULL(customer_id), :lkp.MYLOOKUP(order_no), customer_id )

– Condition: ISNULL(customer_id)
– Lookup function: :lkp.MYLOOKUP
– Row keys (passed to Lookup): order_no

• The conditional statement is evaluated for each row
• The Lookup function is called only under the pre-defined condition
Conditional Lookup Advantage
• The data lookup is performed only for the rows that require it,
which can yield a substantial performance gain.

EXAMPLE: A Mapping will process 500,000 rows. For two percent of those rows
(10,000), the item_id value is NULL. Item_ID can be derived from
SKU_NUMB.

IIF ( ISNULL(item_id), :lkp.MYLOOKUP (sku_numb), item_id )

– Condition: ISNULL(item_id), true for 2 percent of all rows
– Lookup: :lkp.MYLOOKUP (sku_numb), called only when the condition is true

Net savings = 490,000 lookups
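
The arithmetic in this example can be checked with a short simulation. It is a sketch, not Informatica code: `my_lookup`, `derive_item_id`, and the `SKU_TO_ITEM` data are hypothetical, and the choice of making every 50th row NULL is just one way to get 2 percent.

```python
lookup_calls = 0
SKU_TO_ITEM = {"SKU-1": 1001}  # hypothetical lookup data

def my_lookup(sku_numb):
    """Stand-in for the unconnected lookup; counts how often it is called."""
    global lookup_calls
    lookup_calls += 1
    return SKU_TO_ITEM.get(sku_numb)

def derive_item_id(item_id, sku_numb):
    # Look up only when item_id is NULL (None); otherwise keep the value.
    return my_lookup(sku_numb) if item_id is None else item_id

TOTAL_ROWS = 500_000
# Every 50th row (2 percent) arrives with a NULL item_id.
for i in range(TOTAL_ROWS):
    item_id = None if i % 50 == 0 else i
    derive_item_id(item_id, "SKU-1")

savings = TOTAL_ROWS - lookup_calls
```

Only the rows with a missing `item_id` reach the lookup, so the number of calls avoided equals the total row count minus the NULL rows.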

To Cache or not to Cache?
Caching can significantly impact performance
• Cached
– Lookup table data is cached locally on the machine
– Mapping rows are looked up against the cache
– Only one SQL SELECT is needed

• Uncached
– Each Mapping row needs one SQL SELECT
• Rule of thumb: cache if the number (and size) of records in
the Lookup table is small relative to the number of mapping
rows requiring a lookup, or if ample cache memory is available
to the Integration Service
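
To make the difference in SELECT counts concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in database. The `items` table, its rows, and the function names are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "bolt"), (2, "nut")])

mapping_rows = [1, 2, 1, 1, 2, 3]  # keys arriving from the source
select_count = {"cached": 0, "uncached": 0}

def run_uncached():
    out = []
    for key in mapping_rows:
        select_count["uncached"] += 1  # one SELECT per mapping row
        row = conn.execute(
            "SELECT name FROM items WHERE item_id = ?", (key,)
        ).fetchone()
        out.append(row[0] if row else None)
    return out

def run_cached():
    select_count["cached"] += 1  # a single SELECT loads the whole table
    cache = dict(conn.execute("SELECT item_id, name FROM items"))
    return [cache.get(key) for key in mapping_rows]

uncached_result = run_uncached()
cached_result = run_cached()
```

Both runs return the same values; only the number of queries differs, which is why the relative sizes of the lookup table and the row stream matter.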
Lookup cache - overview

• Lookup transformations can be configured to use cache.

• The Integration Service builds the cache in memory when the first row is
processed. If the memory is inadequate, the data is paged into a cache file.

• If you use a flat file lookup, the Integration Service always caches the lookup rows.

• By default, the cache files are created under $PMCacheDir.

• Cache if the number (and size) of records in the Lookup table is small relative to
the number of mapping rows requiring the lookup.

Lookup cache - Types
• There are two types of lookup caches – Static and Dynamic
• Un-cached
– The lookup table is queried each time.
– Cannot use a flat file as the lookup source.
– When the condition matches, the lookup returns a row.
– If the condition is false, the default value is returned for connected lookups and NULL is returned for unconnected lookups.
• Static cache
– Cannot insert or update the cache once it is created.
– Can use relational and flat file lookups.
– When the condition matches, the lookup returns a row.
– If the condition is false, the default value is returned for connected lookups and NULL is returned for unconnected lookups.
• Dynamic cache
– Can insert or update rows in the cache for each row from the source (the previous transformation).
– Can use relational and flat file lookups.
– When the condition matches, rows are updated in the cache or left unchanged, depending on the row type.
– When the condition is false, rows are inserted into the cache or left unchanged, depending on the row type.

Lookup cache – for connected
• The Integration Service can build cache for connected lookups in two ways
• Sequential cache: The Integration Service builds the cache in memory when it processes the
first row of the data in a cached lookup transformation. It waits for upstream transformations
to complete before building a cache.
• Concurrent cache: The Integration Service does not wait for upstream active transformations
to complete. It starts building the cache as soon as the session starts. This may improve
performance if you are sure that the cache is needed each time the mapping is run.
• For example: if the transformation logic in a mapping is configured to route data to different
pipelines, the downstream lookup might not be hit each time. In this case, it is advisable to
go for sequential cache.
• Unconnected lookup caches cannot be processed concurrently.

Lookup cache: Static

• This is the default type of cache.

• Cache is built when the first lookup row is processed.

• For each row that passes the transformation, the cache is queried for specified
condition.

• If a match is available, the proper value is returned.

• If no match is available, either the default value (for connected lookups only) or
NULL is returned.

• If multiple matches are found, rows are returned based on the option specified in
“Lookup policy on multiple match” in the lookup properties.
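
A static cache with a multiple-match policy might be modelled as below. This is a sketch: the `"first"` and `"last"` policy names are hypothetical stand-ins for the property's options, and the sample rows are invented.

```python
from collections import defaultdict

def build_static_cache(rows, key):
    """Group lookup rows by key once; the cache is read-only afterwards."""
    cache = defaultdict(list)
    for row in rows:
        cache[row[key]].append(row)
    return dict(cache)

def static_lookup(cache, key_value, policy="first", default=None):
    matches = cache.get(key_value)
    if not matches:
        return default  # default value (connected) or None for NULL
    if policy == "first":
        return matches[0]
    if policy == "last":
        return matches[-1]
    raise ValueError(f"unsupported policy: {policy}")

rows = [
    {"CUST_ID": 1, "CITY": "Pune"},
    {"CUST_ID": 1, "CITY": "Delhi"},
    {"CUST_ID": 2, "CITY": "Goa"},
]
cache = build_static_cache(rows, "CUST_ID")
```

With two cached rows for `CUST_ID` 1, the policy alone decides which one the lookup returns.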

Lookup cache: Dynamic

• The cache file is constantly updated by the following actions

• Insert - Inserts the row into the cache if it is not present and you specified to insert
rows. You can configure to insert rows into cache based on input ports or
generated sequence IDs.

• Update – updates the row in cache if the row is already present and an update is
specified in the properties

• No change:
– Row exists in cache, but you have specified to insert new rows only

– Row does not exist in cache, but you have specified update existing rows only

– Row exists in the cache, but based on the lookup conditions nothing changes
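
The three outcomes above can be sketched as a function that updates a dictionary and reports what it did. This is a simplification that assumes both inserts and updates are enabled; `dynamic_lookup` and the sample data are hypothetical.

```python
NO_CHANGE, INSERT, UPDATE = 0, 1, 2  # result flags for each processed row

def dynamic_lookup(cache, key, data):
    """Insert or update the cache entry and return a flag for the action taken."""
    if key not in cache:
        cache[key] = data
        return INSERT
    if cache[key] != data:
        cache[key] = data
        return UPDATE
    return NO_CHANGE

cache = {1: "Alice"}
flags = [
    dynamic_lookup(cache, 2, "Bob"),     # new key
    dynamic_lookup(cache, 1, "Alicia"),  # existing key, changed data
    dynamic_lookup(cache, 2, "Bob"),     # existing key, same data
]
```

Because the cache changes as rows pass through, a key seen twice in the same run is inserted once and then treated as existing.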

Lookup cache – dynamic – when to use

• Some situations where dynamic lookups can be used

• Updating a master customer table with new and updated customer information.
– Use a Lookup transformation to perform a lookup on the customer table to determine if
a customer exists in the target. Use a dynamic lookup cache that inserts and updates
rows in the cache as it passes rows to the target.

• Loading data into a slowly changing dimension table and a fact table.
– Create two pipelines and configure a Lookup transformation that performs a lookup on
the dimension table. Use a dynamic lookup cache to load data to the dimension table.
Use a static lookup cache to load data to the fact table, and specify the name of the
dynamic cache from the first pipeline.

Lookup cache – dynamic – properties
• Dynamic lookup cache consists of the following properties
Property | Description
NewLookupRow | This port is added when the lookup is configured as dynamic. 0 = no change, 1 = insert, 2 = update.
Associated port | The data in the associated port is used to determine whether to insert or update rows in the cache. A sequence ID can also be used as the associated port, in which case Informatica generates and uses a primary key.
Ignore Null Inputs for Updates | Select this property for a port when you do not want to update the data in the cache when this column is NULL.
Ignore in Comparison | The Integration Service compares the values in all lookup ports with the values in their associated input ports by default. Select this property if you want the Integration Service to ignore the port when it compares values before updating a row.
Insert else Update | This affects only rows that enter the Lookup transformation flagged as insert. It inserts a row into the cache if the row is new. If the row exists in the index cache but the data cache is different, it updates the cache. If this option is not selected, Informatica inserts all new rows and ignores update rows.
Update else Insert | This affects only rows that enter the Lookup transformation flagged as update. If the row exists in the cache, Informatica updates the data cache. If the row does not exist in the cache, it inserts a new row. If this option is not selected, Informatica updates rows in the cache and ignores new rows.

Lookup cache – dynamic - behavior
• Dynamic lookup cache behavior for insert row type
Insert else update option | Row found in cache | Data cache is different | Lookup cache result | NewLookupRow value
Not selected | Yes | n/a | No change | 0
Not selected | No | n/a | Insert | 1
Selected | Yes | Yes | Update | 2 (0)
Selected | Yes | No | No change | 0
Selected | No | n/a | Insert | 1

• Dynamic lookup cache behavior for update row type
Update else insert option | Row found in cache | Data cache is different | Lookup cache result | NewLookupRow value
Not selected | Yes | Yes | Update | 2 (0)
Not selected | Yes | No | No change | 0
Not selected | No | n/a | No change | 0
Selected | Yes | Yes | Update | 2 (0)
Selected | Yes | No | No change | 0
Selected | No | n/a | Insert | 1
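
The two behavior tables can be encoded as one small decision function, which makes it easy to check any combination. This is a sketch: `new_lookup_row` and its parameter names are hypothetical, and the parenthesized "(0)" alternative shown for update results is not modelled.

```python
def new_lookup_row(row_type, option_selected, found, data_differs=False):
    """Return the NewLookupRow value (0 = no change, 1 = insert, 2 = update).

    row_type: "insert" or "update" (how the row is flagged on entry).
    option_selected: Insert Else Update for insert rows,
                     Update Else Insert for update rows.
    """
    if row_type == "insert":
        if not found:
            return 1
        return 2 if (option_selected and data_differs) else 0
    if row_type == "update":
        if not found:
            return 1 if option_selected else 0
        return 2 if data_differs else 0
    raise ValueError(f"unknown row type: {row_type}")
```

For example, `new_lookup_row("update", False, found=False)` yields 0, matching the row where an update arrives for a key that is not cached and Update Else Insert is off.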

Lookup cache – dynamic - guidelines
• The Lookup transformation must be a connected transformation.
• You can only create an equality lookup condition. You cannot look up a range of data in
dynamic cache.
• Associate each lookup port that is not in the lookup condition with an input port or a
sequence ID.
• When you use a lookup SQL override, make sure you map the correct columns to the
appropriate targets for lookup.
• When you add a WHERE clause to the lookup SQL override, use a Filter transformation before
the Lookup transformation.
• Use Update Strategy transformations after the Lookup transformation to flag the rows for
insert or update for the target.
• Use an Update Strategy transformation before the Lookup transformation to define some or
all rows as update if you want to use the Update Else Insert property in the Lookup
transformation.
• Set the row type to Data Driven in the session properties.
• Select Insert and Update as Update for the target table options in the session properties.
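
The guideline about flagging rows after the Lookup can be illustrated by mapping each NewLookupRow value to a row flag. This is a sketch in Python rather than expression language; `flag_for_target` and the sample rows are hypothetical, and dropping unchanged rows is an assumption about how the mapping would be built.

```python
DD_INSERT, DD_UPDATE = 0, 1  # numeric values of the Update Strategy constants

def flag_for_target(new_lookup_row):
    """Map a NewLookupRow value to a target row flag, or None to drop the row."""
    if new_lookup_row == 1:
        return DD_INSERT
    if new_lookup_row == 2:
        return DD_UPDATE
    return None  # 0 = no change; assumed to be filtered out before the target

rows = [("c1", 1), ("c2", 0), ("c3", 2)]  # (hypothetical key, NewLookupRow)
flagged = [
    (key, flag_for_target(n))
    for key, n in rows
    if flag_for_target(n) is not None
]
```

Only rows that changed the cache carry on to the target, each with the flag that matches what happened in the cache.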

Lookup cache – sharing unnamed cache

• When two Lookup transformations share an unnamed cache, the Integration
Service saves the cache for a Lookup transformation and uses it for subsequent
Lookup transformations that have the same lookup cache structure.

• For example, if you have two instances of the same reusable Lookup
transformation in one mapping and you use the same output ports for both
instances, the Lookup transformations share the lookup cache by default

• Shared transformations must use the same ports in the lookup condition. The
conditions can use different operators, but the ports must be the same.

Lookup cache – sharing named cache

• You can also share the cache between multiple Lookup transformations by using a
persistent lookup cache and naming the cache files.

• When the Integration Service processes the first Lookup transformation, it
searches the cache directory for cache files with the same file name prefix.

• If the Integration Service finds the cache files and you do not specify to recache
from source, the Integration Service uses the saved cache files.

• If the Integration Service does not find the cache files or if you specify to recache
from source, the Integration Service builds the lookup cache from the lookup source.

• The Integration Service saves the cache files to disk after it processes each target
load order group.

Lookup cache – sharing named cache

• The Integration Service fails the session if you configure subsequent Lookup transformations
to recache from source, but not the first one in the same target load order group.

• If the cache structures do not match, the Integration Service fails the session.

• The Integration Service processes multiple sessions simultaneously when the Lookup
transformations only need to read the cache files.

• The Integration Service fails the session if one session updates a cache file while another
session attempts to read or update the cache file.
– For example, Lookup transformations update the cache file if they are configured to use a dynamic
cache or recache from source.
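
The find-or-build logic for a named persistent cache can be approximated as below. This is a sketch under loud assumptions: the `.pkl` file, `load_named_cache`, and `build_from_source` are hypothetical, and the actual cache file format and locking rules are not modelled.

```python
import os
import pickle
import tempfile

def load_named_cache(cache_dir, prefix, build, recache=False):
    """Reuse <prefix>.pkl if present unless recache is requested; else build and save."""
    path = os.path.join(cache_dir, prefix + ".pkl")
    if os.path.exists(path) and not recache:
        with open(path, "rb") as f:
            return pickle.load(f), "reused"
    cache = build()
    with open(path, "wb") as f:
        pickle.dump(cache, f)
    return cache, "built"

def build_from_source():
    return {1: "bolt", 2: "nut"}  # stand-in for reading the lookup source

with tempfile.TemporaryDirectory() as cache_dir:
    first = load_named_cache(cache_dir, "items_lkp", build_from_source)
    second = load_named_cache(cache_dir, "items_lkp", build_from_source)
    third = load_named_cache(cache_dir, "items_lkp", build_from_source, recache=True)
```

The first call builds and saves the cache, the second reuses the saved file, and the third rebuilds it because a recache was requested.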

Lookup cache - Tips
• Cache small lookup tables.
• Improve session performance by caching small lookup tables. The result of the
lookup query and processing is the same, whether or not you cache the lookup
table.
• Use a persistent lookup cache for static lookup tables.
• If the lookup table does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache.
• The Integration Service then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
• Care should be taken to ensure that data does not become stale while using
persistent cache.
– For example: in a daily load, always rebuild a persistent lookup cache first (using the re-cache
from source option) before it is used in other mappings. Re-caching a persistent lookup
keeps the cache in step with any changes in the lookup table.

Lookup cache
• Enable caching
• Cache directory
• Using persistent cache
• Data cache size
• Index cache size
• Dynamic lookup
• Naming a persistent cache
• Recache for persistent cache
• Dynamic lookup options
