
Informatica - Complex Scenarios and their Solutions

Author(s)
Aatish Thulasee Das
Rohan Vaishampayan
Vishal Raj
Date written (MM/DD/YY): 07/01/2003
Declaration
We hereby declare that this document is based on our personal experiences and / or
experiences of our project members. To the best of our knowledge, this document
does not contain any material that infringes the copyrights of any other individual or
organization, including the customers of Infosys.
Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj
Project Details
Projects involved: REYREY
H/W Platform: 512 MB RAM, Microsoft Windows 2000
S/W Environment: Informatica
Appln. Type: ETL tool
Project Type: Data Warehousing
Target readers: Data warehousing team using ETL tools
Keywords
ETL Tools, Informatica, Data Warehousing

INDEX

INFORMATICA - COMPLEX SCENARIOS AND THEIR SOLUTIONS
Author(s)
Declaration
Project Details
Keywords
INTRODUCTION
SCENARIOS:
1. PERFORMANCE PROBLEMS WHEN A MAPPING CONTAINS MULTIPLE SOURCES AND TARGETS
   1.1 Background
   1.2 Problem Scenario
   1.3 Solution
2. WHEN SOURCE DATA IS FLAT FILE
   2.1 Background
   2.2 Problem Scenario
   2.3 Solution
3. EXTRACTING DATA FROM THE FLAT FILE CONTAINING NESTED RECORD SETS
4. TOO LARGE LOOKUP TABLES
5. COMPLEX LOGIC FOR SEQUENCE GENERATION
Introduction
This document is based upon the learning that we had while working on the Reynolds
and Reynolds project in CAPS PCC, Pune. We have come up with best practices to
overcome the complex scenarios we faced during the ETL process. This document also
describes some common best practices to follow while developing mappings.
Scenarios:
1. Performance problems when a mapping contains multiple sources and targets
1.1 Background
In Informatica, multiple sources can be mapped to multiple targets. This property is
quite useful for grouping related mappings in one place. It reduces the creation of
multiple sessions, and all the related loading takes place in one go. It is quite logical
to group the different sources and targets that share the same logic in the same
mapping.
1.2 Problem Scenario:
In the multiple-target scenario, if there are complex transformations in some of the
sub-mappings, performance degrades drastically, because a single database
connection is handling multiple database statements. It is also difficult to manage the
mapping: for example, if there is a performance problem due to one of the
sub-mappings, the other sub-mappings will also suffer the performance degradation.
1.3 Solution:
Divide and rule. It is always better to divide a complex mapping (i.e. multiple
sources and targets) into simple mappings with one source and one target. That
greatly helps in managing the mappings. Also, all the related mappings can be
executed in parallel in different sessions. Each session establishes its own
connection, and the server can handle all the requests in parallel against the
multiple targets. Each session can be placed into a batch and run in
CONCURRENT mode.
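
As an illustration (this is a Python analogue, not Informatica itself), the following sketch mimics the effect of running several one-source/one-target sessions concurrently in a batch; the load_table function and the table names are hypothetical placeholders.

    # Python analogue of a batch of simple sessions run in CONCURRENT mode.
    from concurrent.futures import ThreadPoolExecutor

    def load_table(source: str, target: str) -> str:
        # Placeholder for one simple mapping: one source, one target.
        # In Informatica, each such session would open its own connection.
        print(f"Loading {target} from {source}...")
        return target

    # Each (source, target) pair stands for one simple mapping/session.
    mappings = [("SRC_ORDERS", "TGT_ORDERS"),
                ("SRC_ITEMS", "TGT_ITEMS"),
                ("SRC_DEALERS", "TGT_DEALERS")]

    # Run the independent sessions in parallel, each with its own "connection",
    # instead of funneling all statements through one complex mapping.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda m: load_table(*m), mappings)
        print("Loaded:", list(results))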
2. When source data is a flat file
2.1 Background
What is a Flat File?
A flat file is one in which table data is gathered in lines of ASCII text, with the value
from each table cell separated by a delimiter or space and each row represented by a
new line.
Below is the sample flat file which was used during the project.
Fig 2.1: In_Daily - Flat File.
2.2 Problem Scenario
When the above flat file was loaded into Informatica, the Source Analyzer appeared as
shown below.
Fig 2.2: In_Daily - Flat File after loading into Informatica.
Two issues were encountered while loading the above flat file:
1. Data types of the fields from the flat file and the respective fields of the target tables
were not matching. For example, refer to Fig 2.1: in the first row, i.e. the record
corresponding to "BH", the fourth field has its data type as "Date"; also refer to the
third row, i.e. the record corresponding to "CR", where the fourth field is "Char", and
in the target table the corresponding field had its data type as "Char".
2. Sizes of the fields from the flat file and the respective fields of the target tables were
not matching. For example, refer to Fig 2.1: the eighth row, i.e. the record
corresponding to "FR", has its fifth field with size 100, but after the loading process
the Source Analyzer showed the size of the field as 45 (as shown in Fig 2.2); also the
fifth field corresponding to "CR" is 5, while in the target table the corresponding field
had size equal to 100.
2.3 Solution
Following is the solution which we incorporated to solve the above problems:
1. Since the data was so heterogeneous, we decided to keep all the data types in
the Source Qualifier as "String" and changed them as per the fields to which they
were getting mapped.
2. Regarding the size of the fields, we changed the size to the maximum possible
size, as illustrated in the sketch below.
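
A minimal Python sketch of this "read everything as String, convert later" approach follows; the field names, the date format, and the sample line are hypothetical.

    # Every field starts as a string, mirroring a Source Qualifier whose
    # ports are all String; conversions happen per target column afterwards.
    from datetime import datetime
    from decimal import Decimal

    line = "BH|001|DAILY|07/01/2003|100.50"   # a delimited line from the file
    fields = line.split("|")

    # Per-column conversions, as an Expression transformation would do
    # before the target load.
    record = {
        "rec_type": fields[0],                                 # stays Char
        "batch_no": int(fields[1]),                            # Char -> Number
        "run_type": fields[2],                                 # stays Char
        "run_date": datetime.strptime(fields[3], "%m/%d/%Y"),  # Char -> Date
        "amount":   Decimal(fields[4]),                        # Char -> Decimal
    }
    print(record)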

3. Extracting data from the flat file containing nested record sets
3.1 Background:
The flat file shown in the previous section (Fig 2.2) contains nested record sets. To
explain the nested formation, the records of the above file are restructured in Fig 3.1.
Fig 3.1: In_Daily - Flat File restructured in the nested form.
Here the data is in 3 levels. The first level of data contains the batch file
information, starting with a BH record and ending with a BT record. The second level
of data contains the dealer records in the batch file, starting with a DH record and
ending with a DT record. The third level of data contains information on the different
activities for a particular dealer.
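
Since Fig 3.1 is not reproduced here, the following is a hypothetical sketch of that layout. The BH/BT and DH/DT record codes come from the text above; the "AC" activity code, the field values, and the indentation (added only for readability) are illustrative.

    BH|...                 <- batch header (level 1 start)
      DH|D001|...          <- dealer header (level 2 start)
        AC|D001|...        <- activity records for that dealer (level 3)
        AC|D001|...
      DT|...               <- dealer trailer (level 2 end)
      DH|D002|...
        AC|D002|...
      DT|...
    BT|...                 <- batch trailer (level 1 end)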
3.2 Problem Scenario:
The data required for loading was in such a form that a single row should
consist of the dealer details as well as the different activities done by that particular
dealer. But only the second-level data (i.e. the 2nd and 14th rows in the flat file shown
above) contains the dealer details, and the third level of data contains the
activity details for the dealers. Both had to be concatenated to
form a single piece of information to load into a single row of the target table.
3.3 Solution:
In this particular kind of scenario, the dealer information data (second-level data)
should be stored into variables, using a condition that identifies the dealer
information rows; such a row is then filtered out in the next transformation. So, for that
particular row of the flat file (i.e. dealer information), the data is stored in the
variables. And for the dealer's activity data (third-level data), the row is
passed to the next transformation along with the dealer information that was stored in
the variables during the previous row's load.
The same is done here, as the following sketch illustrates.
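
This is only a Python analogue of the variable-and-filter technique described above; in the actual mapping it is done with variables and a filtering transformation. Only the DH/DT codes come from the text; the "AC" code and all field values are hypothetical.

    # Dealer header fields are remembered in "variables"; header/trailer
    # rows are filtered out; each activity row is emitted joined to the
    # most recent dealer information.
    rows = [
        ("BH", "batch-1", ""),
        ("DH", "D001", "Dealer One"),     # level 2: dealer header
        ("AC", "oil-change", "45.00"),    # level 3: activities
        ("AC", "tire-rotation", "20.00"),
        ("DT", "", ""),
        ("DH", "D002", "Dealer Two"),
        ("AC", "inspection", "30.00"),
        ("DT", "", ""),
        ("BT", "batch-1", ""),
    ]

    v_dealer_no, v_dealer_name = None, None   # the stored "variables"
    output = []
    for rec_type, f1, f2 in rows:
        if rec_type == "DH":
            # Store dealer info; this row itself goes no further.
            v_dealer_no, v_dealer_name = f1, f2
        elif rec_type == "AC":
            # Concatenate activity data with the stored dealer data
            # to form a single target row.
            output.append((v_dealer_no, v_dealer_name, f1, f2))
        # BH/BT/DT rows are filtered out.

    for row in output:
        print(row)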
4. Too Large Lookup Tables:
4.1 Background:
What is a Lookup Transformation?
A Lookup transformation is used in your mapping to look up data in a relational table,
view, or synonym (see Fig 4.1). You can import a lookup definition from any relational
database to which both the Informatica Client and Server can connect. Multiple Lookup
transformations can be used in a mapping.
The Informatica Server queries the lookup table based on the lookup ports in the
transformation (see Fig 4.2). It compares Lookup transformation port values to lookup
table column values based on the lookup condition. The result of the lookup can be
passed to other transformations and the target.
You can use the Lookup transformation to perform many tasks, including:
Get a related value. For example, if your source table includes an employee ID, but you
want to include the employee name in your target table to make your summary data
easier to read (see the sketch after this list).
Perform a calculation. Many normalized tables include values used in a calculation,
such as gross sales per invoice or sales tax, but not the calculated value (such as net
sales).
Update slowly changing dimension tables. You can use a Lookup transformation to
determine whether records already exist in the target.
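
For instance, the "get a related value" task above amounts to the following, sketched in Python with hypothetical employee data.

    # Enrich source rows that carry only an employee ID with the employee
    # name from a lookup; the lookup condition is equality on the ID.
    employees = {101: "A. Smith", 102: "B. Jones"}   # the lookup table
    source_rows = [(101, 2500.00), (102, 1800.00)]   # (employee_id, sales)

    # The looked-up name is passed through to the target row.
    target_rows = [(emp_id, employees.get(emp_id), sales)
                   for emp_id, sales in source_rows]
    print(target_rows)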
(The actual screens are attached for reference.)
Fig 4.1: LOOKUP is a kind of Transformation.
Lookup Conditions
Fig 4.2: The Lookup Conditions to be specified in order to get Lookup Values.
4.2 Problem Scenario:
In the project, one of the mappings had large lookup tables that were hampering the
performance of the mapping, as
a. they were consuming a lot of cache memory unnecessarily, and
b. more time was spent in searching for a relatively small number of values in a large
lookup table.
Thus the loading of data from the source table(s) to the target table(s) was consuming
unnecessarily more time than it normally should.
4.3 Our Solution:
We eliminated the first problem by simply using the lookup table as one of the source
tables itself. Source and target tables are not cached in Informatica, and hence it
made sense to use the large lookup table as a source (see Fig 4.3). This also ensured
that cache memory would not be wasted unnecessarily and could be used for other
tasks.
Multiple Source Tables joined in the Source Qualifier
Fig 4.3: The Mapping showing the use of Lookup table as a Source table.
SQL to join the tables (User Defined Join)
Fig 4.4: The use of Join condition in the Source Qualifier.
After using the lookup table as a source, we used a join condition in the Source
Qualifier. This reduced the search time taken by Informatica, as the number of rows
to be searched was drastically reduced: the join condition takes care of the excess
rows which would otherwise have been present in the Lookup transformation. Thus
the second problem was also successfully eliminated.
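
To make the idea concrete, here is a minimal, self-contained sketch of the join-at-source approach, using sqlite3 only as a stand-in for the source database; all table and column names are hypothetical.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_activity (dealer_no TEXT, activity TEXT);
        CREATE TABLE lkp_dealer   (dealer_no TEXT, dealer_name TEXT);
        INSERT INTO src_activity VALUES ('D001', 'oil-change'), ('D002', 'inspection');
        INSERT INTO lkp_dealer   VALUES ('D001', 'Dealer One'), ('D002', 'Dealer Two'),
                                        ('D999', 'Unused Dealer');
    """)

    # The user-defined join in the Source Qualifier plays this role: the
    # database returns only matching rows, so the excess lookup rows are
    # never read into Informatica (and never cached) at all.
    query = """
        SELECT s.dealer_no, l.dealer_name, s.activity
        FROM src_activity s
        JOIN lkp_dealer l ON l.dealer_no = s.dealer_no
    """
    for row in conn.execute(query):
        print(row)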
5. Complex logic for Sequence Generation:
5.1 Background:
What is a Sequence Generator?
A Sequence Generator is a transformation that generates a sequence of numbers once
you specify a starting value (see Fig 5.2) and the increment by which to increment this
starting value. (The actual screens are attached for reference.)
Fig 5.1: The Sequence Generator is a kind of Transformation.
Fig 5.2: The Transformation details to be filled in order to generate a sequence.
5.2 Problem Scenario: In the project, one of the mappings had two requirements, viz.:
a. During the transfer of data to a column of a target table, the Sequence Generator
was required to trigger only selectively. But as per its property, every time a row gets
loaded into the target table the Sequence Generator is triggered.
b. Another requirement was that the sequence of numbers generated by the Sequence
Generator was required to be in order.
For example: the values that were to be loaded in the column of the target table were
either sequence-generated or obtained from a lookup table. So whenever the lookup
condition returned a value, that value would populate the target table, but at the same
time the Sequence Generator would also trigger and hence increment by 1, so its
CURRVAL (current value, see Fig 5.1) would be incremented by 1. So when the next
value was loaded in the column of the target table, the difference between the
sequence-generated values would be 2 instead of 1. Thus the generated sequence
would not be continuous and there would be gaps or holes in the sequence.
5.3 Our Solution:
A basic rule for the Sequence Generator is that if a row gets loaded into the target table,
the Sequence Generator gets triggered. In order to prevent the Sequence Generator
from triggering, we created two instances of the same target table (see Fig 5.3).
Fig 5.3: The Mapping showing two instances of the same Target table (one instance
fed by the Sequence Generator, the other by the Lookup Table).

The Sequence Generator was mapped to the column in the target table in the first
instance (see Fig 5.3), whereas the value returned from the lookup table (if any) was
mapped to the same column in the target table in the second instance (see Fig 5.3).
All the other values for the remaining columns in the target table were filtered on the
basis of the value returned from the lookup table, i.e. if the lookup table returned a
value, then a row in the second instance of the target table would get populated, and
thus the Sequence Generator would not be triggered.
If the lookup table returned a null value, then a row would get populated in the first
instance of the target table, and in this case the Sequence Generator would trigger
and its value would get loaded in the column of the target table.
Thus, by achieving control over the triggering of the Sequence Generator, we could
avoid the "holes" or gaps in the sequence it generated.
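
A minimal Python sketch of this two-instance routing, with hypothetical keys and lookup values, might look like the following: rows whose lookup succeeds go to one target instance and never touch the sequence counter, so the generated numbers stay gap-free.

    lookup = {"D001": 501, "D003": 503}          # stand-in for the lookup table
    incoming = ["D001", "D002", "D003", "D004"]

    next_val = 1                                 # the sequence generator's value
    first_instance, second_instance = [], []     # the two target instances

    for key in incoming:
        found = lookup.get(key)
        if found is not None:
            # Lookup returned a value: route to the second instance;
            # the sequence generator is NOT triggered.
            second_instance.append((key, found))
        else:
            # Lookup returned NULL: route to the first instance and
            # draw the next sequence number.
            first_instance.append((key, next_val))
            next_val += 1

    print("sequence-fed rows:", first_instance)   # values 1, 2, ... with no holes
    print("lookup-fed rows:  ", second_instance)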
