377.Informatica - What are the main issues while working with flat files as sources and as targets?
We need to specify the correct path in the session and mention whether the file is 'direct' or 'indirect'. Keep
the file in the exact path that you have specified in the session.
-regards
rasmi
=======================================
1. We cannot use SQL override. We have to use transformations for all our requirements.
2. Testing flat files is a very tedious job.
3. The file format (source/target definition) should match exactly with the format of the data file. Most of the time, erroneous
results come when the data file layout is not in sync with the actual file.
(i) The data file may be fixed-width while the definition is delimited ----> truncated data
(ii) The data file and the definition are both delimited, but the wrong delimiter is specified: (a) a delimiter other than the one present in
the actual file, or (b) a delimiter that also appears as a character in some field of the file ---
-> wrong data again
(iii) Not specifying the NULL character properly may result in wrong data
(iv) There are other settings/attributes while creating the file definition about which one should be very careful
4. If you miss the link to any column of the target, all the data will be placed in the wrong fields. The
missed column will not exist in the target data file.
332.Informatica - Explain the Informatica server process: how does it work and how does it relate to mapping variables?
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation and
loading. The Load Manager reads the parameters and variables related to the session, mapping and server, and passes the mapping
parameter and variable information to the DTM. The DTM uses this information to perform the data movement from source
to target.
=======================================
The PowerCenter Server holds two different values for a mapping variable during a session run:
- Start value of a mapping variable
- Current value of a mapping variable
Start Value
The start value is the value of the variable at the start of the session. The start value could be a value defined in the
parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined initial
value for the variable, or the default value based on the variable datatype.
The PowerCenter Server looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value
The current value is the value of the variable as the session progresses. When a session starts, the current value of a
variable is the same as the start value. As the session progresses, the PowerCenter Server calculates the current value
using a variable function that you set for the variable. Unlike the start value of a mapping variable, the current value can
change, as the PowerCenter Server evaluates it for each row that passes through the mapping.
=======================================
First, the Load Manager starts the session; it performs verifications and validations of variables and manages post-
session tasks such as mail. Then it creates the DTM process.
The DTM in turn creates a master thread, which creates the remaining threads.
The master thread creates the
read thread,
write thread,
transformation thread,
pre- and post-session threads, etc.
Finally, the DTM hands control back to the Load Manager after writing into the target.
331.Informatica - Write a query to retrieve the latest records from a table sorted by version (SCD).
You can write a query with an inline view clause: compare the previous version to the new highest version,
and then you get your result.
=======================================
Hi Sunil,
Can you please explain your answer in somewhat more detail?
=======================================
Hi
Assume you put a surrogate key, say p_key, in the target (Dept table), and that
version, dno and loc fields are also there.
Then:
select a.p_key, a.dno, a.loc, a.version from t_dept a
where a.version = (select max(b.version) from t_dept b where a.dno = b.dno)
This is the query; if you write it in the lookup, the lookup retrieves only the latest (max)
version from the target. In this way performance increases.
=======================================
select * from
(select Acct.*, rank() over (partition by ch_key_id order by version desc) as rnk from Acct)
where rnk = 1
=======================================
select business_key, max(version) from tablename group by business_key
283.Informatica - How do we validate all the mappings in the repository at once?
You cannot validate all the mappings in one go, but you can validate all the mappings in a folder in one go and
repeat the process for all the folders.
To do this, log on to the Repository Manager, open the folder and then the Mappings node, select all or some of
the mappings (by pressing the Shift or Ctrl key; Ctrl+A does not work), and then right-click and choose Validate.
=======================================
Yes. We can validate all mappings using the Repo Manager.
236.Informatica - What are Bulk & Normal load? Where do we use Bulk and where Normal?
When we load data in bulk mode there is no entry in the database log files, so it will be tough to recover data if the
session fails at some point; whereas in normal mode every record is written to the database log file
and to the Informatica repository, so if the session fails it is easy for us to restart from the last committed
point.
Bulk mode is very fast compared with normal mode.
We use bulk mode to load data into databases; it won't work with text files used as targets, whereas normal mode works
fine with all types of targets.
=======================================
In bulk mode, a DML statement is created and executed for a group of records,
but in normal mode a DML statement is created and executed for every record.
If you select bulk mode, performance increases.
=======================================
Bulk mode is used for Oracle/SQL Server/Sybase. This mode improves performance by not writing to the database log.
As a result, when using this mode recovery is unavailable. Further, this mode doesn't work when an Update Strategy transformation is
used, and there shouldn't be any indexes or constraints on the table. Of course, one can use the pre-session and post-
session SQL to drop and rebuild indexes/constraints.
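For example, a minimal pre-/post-session SQL sketch for an Oracle target (the index and table names idx_sales_cust and sales_fact are hypothetical, not taken from the question):
-- pre-session SQL: drop the index so the bulk load is not slowed down
DROP INDEX idx_sales_cust;
-- post-session SQL: rebuild the index once the load completes
CREATE INDEX idx_sales_cust ON sales_fact (customer_id);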
234.Informatica - Explain in detail about Key Range & Round Robin partition with an example.
Key range: the Informatica server distributes the rows of data based on the set of ports that you specify as the partition key. For example (values purely illustrative), with CUSTOMER_ID as the partition key, partition 1 could take IDs 1-10000 and partition 2 IDs 10001-20000.
Round robin: the Informatica server distributes an equal number of rows to each partition, sending rows to the partitions in rotation regardless of their values.
224.Informatica - what are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation:
these transformations contain a check box on the Properties tab to allow
partitioning.
*Aggregator Transformation:
if you use sorted ports you cannot partition the associated source.
*Joiner Transformation:
you cannot partition the master source for a Joiner transformation.
*Normalizer Transformation
*XML targets.
=======================================
1) Source definition
2) Sequence Generator
3) Unconnected transformations
4) XML target definition
=======================================
Update Strategy is the most important of all the Informatica transformations.
The basic thing one should understand about it is that it is the essential transformation for performing DML operations on targets that are
already populated (i.e. targets which contain some records before this mapping loads data).
It is used to perform DML operations:
insert, update, delete and reject.
When records come to this transformation, depending on our requirement we can decide whether to insert, update or
reject the rows flowing through the mapping.
For example, take an input row: if it is already there in the target (we find this with a Lookup transformation) update it,
otherwise insert it.
We can also specify conditions based on which we derive which update strategy we have to use,
e.g. IIF(condition, DD_INSERT, DD_UPDATE):
if the condition is satisfied, DD_INSERT is applied, otherwise DD_UPDATE.
DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT are the decode options that perform the respective
DML operations; they correspond to the numeric codes 0, 1, 2 and 3, so rows can also be flagged with
0, 1, 2 and 3 for insert, update, delete and reject respectively.
Steps
1. Take the DATM (DATM means the document where all business rules are mapped to the corresponding source
columns) and check whether the data is loaded into the target table according to the DATM. If any data is not loaded
according to the DATM, then go and check the code and rectify it.
This is called qualitative testing.
This is what a developer will do in unit testing.
184.Informatica - What is the difference between constraint-based load ordering and target load plan?
Target load order is set in the Designer: click the Mappings menu in the Designer and then Target Load Plan. It shows
all the target load groups in the particular mapping. You specify the order there, and the server loads the targets
accordingly.
A target load group is a set of source, source qualifier, transformations and target.
Constraint-based loading, on the other hand, is a session property. Here the multiple targets must be generated from one source
qualifier. The target tables must possess primary/foreign key relationships, so that the server loads according to the key
relationships irrespective of the target load order plan.
=======================================
If you have only one source and it is loading into multiple targets, you have to use constraint-based loading. But the
target tables should have key relationships between them.
If you have multiple source qualifiers loading into multiple targets, you have to use target load order.
Constraint-based loading: if your mapping contains a single pipeline (flow) with more than one target (and the target tables
have a master-child relationship), you need to use constraint-based loading at the session level.
Target load plan: if your mapping contains multiple pipelines (flows), you specify the execution order one by one (for example,
pipeline 1 needs to execute first, then pipeline 2, then pipeline 3); this is purely based on pipeline dependency.
Keep an Aggregator between the source qualifier and the target and group by the key field; it will eliminate the duplicate
records.
=======================================
Hi, before loading to the target use an Aggregator transformation and make use of the group by function to eliminate the
duplicates on columns. Nanda
=======================================
Use a Sorter transformation. When you configure the Sorter transformation to treat output rows as distinct, it configures
all ports as part of the sort key. It therefore discards duplicate rows compared during the sort operation.
=======================================
Use a Sorter transformation and select the Distinct option; duplicate rows will be eliminated.
=======================================
If you want to delete the duplicate rows in flat files, we go for a Rank transformation or an Oracle external procedure
transformation:
select all the group-by ports and select one field for the rank; the duplicates are then easily identified.
=======================================
Using a Sorter transformation we can eliminate the duplicate rows from a flat file.
=======================================
To eliminate the duplicates in flat files we have the Distinct property in the Sorter transformation. If we enable that property,
it automatically removes the duplicate rows in flat files.
The Partitioning option increases PowerCenter's performance through parallel data processing, and this option provides
a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and
grid-based hardware environments.
=======================================
Partitions are used to optimize session performance.
We can select them in the session properties.
Types:
default ---- pass-through partition
key range partition
round robin partition
hash partition
=======================================
In Informatica we can tune performance at 5 different levels, that is, at source level, target level, mapping level, session
level and network level.
To tune performance at the session level we go for partitioning, and again we have 4 types of partitioning:
pass-through, hash, round robin and key range.
Pass-through is the default one.
Rank:
1
2<--2nd position
2<--3rd position
4
5
The same rank is assigned to the same totals/numbers; the rank is then followed by the position. Golf games usually
rank this way, so this is usually called golf ranking.
Dense Rank:
1
2<--2nd position
2<--3rd position
3
4
---------------------------------------------------------------------
151.Informatica - How do you configure mapping in informatica
You should configure the mapping with the least number of transformations and expressions to do the most amount of
work possible. You should minimize the amount of data moved by deleting unnecessary links between transformations.
For transformations that use a data cache (such as Aggregator, Joiner, Rank and Lookup transformations), limit connected
input/output or output ports. Limiting the number of connected input/output or output ports reduces the amount of data
the transformations store in the data cache.
You can also perform the following tasks to optimize the mapping:
- Configure single-pass reading.
- Optimize datatype conversions.
- Eliminate transformation errors.
- Optimize transformations.
- Optimize expressions.
149.Informatica - What are mapping parameters and variables, and in which situations can we use them?
Mapping parameters have a constant value throughout the session,
whereas mapping variable values change; the Informatica server saves the value in the repository and uses it the
next time you run the session.
=======================================
If we need to change certain attributes of a mapping every time the session is run, it would be very difficult to edit the
mapping and then change the attribute. So we use mapping parameters and variables and define the values in a
parameter file. Then we can edit the parameter file to change the attribute values. This makes the process simple.
Mapping parameter values remain constant. If we need to change the parameter value, then we need to edit the
parameter file.
But the value of a mapping variable can be changed by using a variable function. If we need to increment the attribute value
by 1 after every session run, then we can use a mapping variable.
For a mapping parameter, we need to manually edit the attribute value in the parameter file after every session run.
=======================================
How can you edit the parameter file? Once you set up a mapping variable, how can you define it in a
parameter file?
How to measure Performance of ETL load process
Dear All,
I am new to ETL testing. I have a testing requirement to capture the performance of an ETL system's load process for
various loads in order to size the product components and perform capacity planning. The measurements to be
captured include
Is there any tool which would help me in achieving this? Any pointers/guidance in this direction would be greatly
appreciated. Many thanks in advance.
3) Prepare a sheet to list the values for the parameters of the feeds you are testing; I have mentioned some of the
parameters in the above two points.
4) Sum up the figures from the sheet to find the problem area, if any.
139.Informatica - What are cost-based and rule-based approaches, and what is the difference?
Cost-based and rule-based approaches are optimization techniques used in relation to databases, where
we need to optimize a SQL query.
Basically, Oracle provides two types of optimizers (indeed three, but we use only these two techniques, because the third has
some disadvantages).
Whenever you process any SQL query in Oracle, what the Oracle engine internally does is read the query and decide
which will be the best possible way of executing the query. In this process Oracle follows these optimization
techniques.
1. Cost-based optimizer (CBO): if a SQL query can be executed in 2 different ways (say it has path 1 and path 2 for the
same query), then what the CBO does is calculate the cost of each path, analyse for which path the
cost of execution is less, and then execute that path, so that it can optimize the query execution.
2. Rule-based optimizer (RBO): this basically follows the rules that are needed for executing a query.
So, depending on the number of rules that are to be applied, the optimizer runs the query.
If the table you are trying to query has already been analyzed, then Oracle will go with the CBO.
If the table is not analyzed, Oracle follows the RBO.
For the first time, if the table is not analyzed, Oracle will go with a full table scan.
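As a quick illustration, the statistics the CBO relies on are gathered by analyzing the table; a minimal sketch (the table name emp is hypothetical):
ANALYZE TABLE emp COMPUTE STATISTICS;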
=======================================
While importing the flat file, the flat file wizard helps in configuring the properties of the file: select the numeric
column and just enter the precision value and the scale. Precision includes the scale; for example, if the number is
98888.654, enter the precision as 8 and the scale as 3, and the width as 10 for a fixed-width flat file.
=======================================
You can handle that by simply using the Source Analyzer window: go to the ports of that flat file definition
and change the precision and scale.
=======================================
While importing the flat file definition, just specify the scale for a numeric datatype. In the mapping, the flat file source
supports only the number datatype (no decimal and integer). The SQ associated with that source will have a
decimal datatype for that number port of the source.
source -> number datatype port -> SQ -> decimal datatype. Integer is not supported, hence decimal takes care of it.
Join the two sources by using the Joiner transformation and then apply a lookup on the resulting table.
=======================================
Whatever my friends have answered earlier is correct. To be more specific:
if the two tables are relational, then you can use the lookup SQL override option to join the two tables in the lookup
properties. You cannot join a flat file and a relational table this way.
E.g. the default lookup query will be: select <lookup table column names> from lookup_table. You can now extend this query:
add the column names of the 2nd table with the table qualifier and a where clause. If you want to use an order by, then put -- at the
end of the order by.
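A minimal sketch of such an override, assuming two hypothetical relational lookup tables CUST and CUST_ADDR joined on CUST_ID (the trailing -- comments out the ORDER BY that the server appends on its own):
SELECT CUST.CUST_ID, CUST.CUST_NAME, CUST_ADDR.CITY
FROM CUST, CUST_ADDR
WHERE CUST.CUST_ID = CUST_ADDR.CUST_ID
ORDER BY CUST.CUST_ID --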
120.Informatica - How to retrieve the records from a rejected file? Explain with syntax or an example.
There is one utility called the reject loader with which we can find the rejected records and are able to refine and reload the rejected
records.
=======================================
Yes. Every time you run the session one reject file will be created and all the rejected records will be there in the reject file. You can
modify the records, correct the things in the records, and load them to the target directly from the reject file
using the reject loader.
=======================================
Can you explain how to load rejected rows through Informatica?
=======================================
During the execution of a workflow, all the rejected rows will be stored in bad files (under the directory where your
Informatica server is installed, e.g. C:\Program Files\Informatica PowerCenter 7.1\Server). These bad files can be imported
as a flat file source, and then through a direct mapping we can load these files in the desired format.
97.Informatica - how to get the first 100 rows from the flat file into the target?
Please check this one:
task ----->(link) session (workflow manager)
Double-click on the link and type $$source success rows (a parameter in the session variables) = 100;
it should automatically stop the session.
82.Informatica - If I make any modifications to my table in the back end, do they reflect in the Informatica warehouse or
mapping?
Informatica is not at all concerned with the back-end database. It
displays all the information that is stored in the repository. If you want back-end changes to be reflected on the Informatica
screens, you have to import from the back end into Informatica again over a valid connection, and you have to replace the existing
definitions with the imported ones.
=======================================
Yes, it will be reflected once you refresh the mapping again.
=======================================
It does matter if you have a SQL override - say in the SQ, or in a Lookup where you override the default SQL. Then if you make a
change to the underlying table in the database that makes the override SQL incorrect for the modified table, the session
will fail.
If you change a table - say, rename a column that is in the SQL override statement - then the session will fail.
But if you added a column to the underlying table after the last column, then the SQL statement in the override will still be
valid. If you change the size of columns, the SQL will still be valid, although you may get truncation of data if the
database column has a larger size (more characters) than the SQ or a subsequent transformation.
When the Data Driven option is selected in the session properties, the code will consider the update strategy
(DD_UPDATE, DD_INSERT, DD_DELETE, DD_REJECT) used in the mapping and not the options
selected in the session properties.
Update as Insert:
this option specifies that all the update records from the source are to be flagged as inserts in the target. In other
words, instead of updating the records in the target, they are inserted as new records.
Update else Insert:
this option enables Informatica to flag the records either for update, if they are old, or for insert, if they are
new records from the source.
A batch fails when the sessions in the workflow are checked with the property
"Fail if parent fails"
and any of the sessions in the sequential batch fails.
41. What are Stored Procedure transformations? What is the purpose of the SP transformation? How did you go about using it in your
project?
A connected stored procedure is used at the Informatica level, for example passing one parameter as input and capturing the return
value from the stored procedure.
Normal - row-wise check
Pre-Load Source - (capture source incremental data for incremental aggregation)
Post-Load Source - (delete temporary tables)
Pre-Load Target - (check the available disk space)
Post-Load Target - (drop and recreate indexes)
60. Update Strategy is set to DD_UPDATE but the session level has Insert. What will happen?
Insert takes place, because the session-level option overrides the mapping-level option.
101. Variable v1 has values set as 5 in the Designer (default), 10 in the parameter file and 15 in the repository. While running the
session, which value will Informatica read?
Informatica reads 10, the value in the parameter file, because the server looks for the start value of a variable first in the parameter file, then in the repository, and only then falls back to the initial (default) value.
- Using this, you apply captured changes in the source to aggregate calculations in a session. If the source
changes only incrementally and you can capture the changes, you can configure the session to process only
those changes.
- This allows the server to update the target incrementally, rather than forcing it to process the entire source
and recalculate the same calculations each time you run the session.
Steps:
- The first time you run a session with incremental aggregation enabled, the server processes the entire source.
- At the end of the session, the server stores the aggregate data from that session run in two files, the index file
and the data file. The server creates the files in a local directory.
- The second time you run the session, use only the changes in the source as the source data for the session. The
server then performs the following actions:
(1) For each input record, the session checks the historical information in the index file for a corresponding
group, then:
If it finds a corresponding group,
the server performs the aggregate operation incrementally, using the aggregate data for that
group, and saves the incremental changes.
Else
the server creates a new group and saves the record data.
(2) When writing to the target, the server applies the changes to the existing target:
o Updates modified aggregate groups in the target
o Inserts new aggregate data
o Deletes removed aggregate data
o Ignores unchanged aggregate data
o Saves modified aggregate data in the index/data files to be used as historical data the next time you run
the session.
Each subsequent time you run the session with incremental aggregation, you use only the incremental source changes
in the session.
If the source changes significantly and you want the server to continue saving the aggregate data for future
incremental changes, configure the server to overwrite the existing aggregate data with new aggregate data.
Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a session.
(a) Target based commit
- The server commits data based on the number of target rows and the key constraints on the target table. The commit
point also depends on the buffer block size and the commit interval.
- During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the
buffer block is full, the Informatica server issues a commit command. As a result, the amount of data
committed at the commit point generally exceeds the commit interval.
- The server commits data to each target based on primary/foreign key constraints.
(b) Source based commit
- The server commits data based on the number of source rows. The commit point is the commit interval you
configure in the session properties.
- During a session, the server commits data to the target based on the number of rows from an active source
in a single pipeline. These rows are referred to as source rows.
- A pipeline consists of a source qualifier and all the transformations and targets that receive data from the
source qualifier.
- Although the Filter, Router and Update Strategy transformations are active transformations, the server does
not use them as active sources in a source-based commit session.
- When a server runs a session, it identifies the active source for each pipeline in the mapping. The server
generates a commit row from the active source at every commit interval.
- When each target in the pipeline receives the commit rows, the server performs the commit.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target
rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and reload it to relational targets using the reject loading utility. (You
cannot load rejected data into a flat file target.)
Each time you run a session, the server appends the rejected data to the reject file.
Locating the BadFiles
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us find the reason for the rejection, there are two main things:
(a) Row indicator
The row indicator tells the writer what to do with the row of wrong data.
Row indicator Meaning Rejected By
0 Insert Writer or target
1 Update Writer or target
2 Delete Writer or target
3 Reject Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
(b) Column indicator
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every
column of data and define the type of the data preceding them.
Column Indicator Meaning Writer Treats as
D Valid Data Good Data. The target accepts
it unless a database error
occurs, such as finding
duplicate key.
O Overflow Bad Data.
N Null Bad Data.
T Truncated Bad Data
NOTE
NULL columns appear in the reject file with commas marking their column.
Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data.
Keep in mind that correcting the reject file does not necessarily correct the source of the reject.
Correct the mapping and target database to eliminate some of the rejected data when you run the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may
contain misleading column indicators.
For example, a series of N indicators might lead you to believe the target database does not accept NULL values,
so you decide to change those NULL values to zero.
However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an
update strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0
values in place of NULL values.
When a session uses an external loader, the session creates a control file and a target flat file. The control file contains
information about the target flat file, such as the data format and loading instructions for the external loader. The control
file has an extension of *.ctl and you can view the file in $PMTargetFilesDir.
For using an External Loader:
The following must be done:
- configure an external loader connection in the server manager
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in session property sheet.
Issues with External Loader:
- Disable constraints
- Performance issues
o Increase commit intervals
o Turn off database logging
- Code page requirements
- The server can use multiple external loaders within one session (e.g. you have a session with two
target files, one with the Oracle external loader and another with the Sybase external loader)
Other Information:
- The external loader performance depends upon the platform of the server
- The server loads data at different stages of the session
- The server writes external loader initialization and completion messages in the session log. However,
details about external loader performance are written to the external loader log, which is stored in the same target directory.
- If the session contains errors, the server continues the external loader process. If the session fails, the server loads
partial target data using the external loader.
- The external loader creates a reject file for data rejected by the database. The reject file has an extension of *.ldr
reject.
- The external loader saves the reject file in the target file directory
- You can load corrected data from the file using the database reject loader, and not through the Informatica reject
load utility (for external loader reject files only)
Configuring EL in session
- In the server manager, open the session property sheet
- Select File target, and then click flat file options
Caches
- The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a
mapping.
- The server stores key values in the index caches and output values in the data caches; if the server requires more
memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory and, in most circumstances, deletes
the cache files.
Cache storage overflow:
Transformation - index cache - data cache
Aggregator - stores group values, as configured in the group-by ports - stores calculations based on the group-by ports
Rank - stores group values, as configured in the group-by ports - stores ranking information based on the group-by ports
Joiner - stores index values for the master source table, as configured in the join condition - stores master source rows
Lookup - stores lookup condition information - stores lookup data that is not stored in the index cache
Determining cache requirements
To calculate the cache size, you need to consider column and row requirements as well as processing
overhead.
- The server requires processing overhead to cache data and index information.
Column overhead includes a null indicator, and row overhead can include row-to-key information.
Steps:
- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache requirement.
- For the maximum requirement, multiply the minimum requirement by 2.
Location:
- By default, the server stores the index and data files in the directory $PMCacheDir.
- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find
multiple index and data files in the directory. The server appends a number to the end of the
filename (PMAGG*.idx1, .idx2, etc.).
Aggregator Caches
- When the server runs a session with an Aggregator transformation, it stores data in memory until it completes
the aggregation.
- When you partition a source, the server creates one memory cache and one disk
cache for each partition. It routes data from one partition to another based on the group key values of the
transformation.
- The server uses memory to process an Aggregator transformation with sorted ports. It doesn't use cache memory;
you don't need to configure the cache memory for Aggregators that use sorted ports.
Index cache:
#Groups * ((Σ column size) + 7)
Aggregate data cache:
#Groups * ((Σ column size) + 7)
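A rough worked example under assumed numbers (purely illustrative): if the group-by columns total 28 bytes and the session produces 10,000 groups, the minimum index cache is about 10,000 * (28 + 7) = 350,000 bytes, and the maximum requirement is twice that, roughly 700,000 bytes.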
Rank Cache
- When the server runs a session with a Rank transformation, it compares an input row with the rows
in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the
input row.
- If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for
each group it finds.
Index Cache:
#Groups * ((Σ column size) + 7)
Rank Data Cache:
#Groups * [(#Ranks * ((Σ column size) + 10)) + 20]
Joiner Cache:
- When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds
memory caches based on the master rows.
- After building these caches, the server reads rows from the detail source and performs the joins.
- The server creates the index cache as it reads the master source into the data cache. The server uses the
index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
- To improve Joiner performance, the server aligns all data for the Joiner cache on an eight-byte boundary.
Index Cache:
#Master rows * ((Σ column size) + 16)
Joiner Data Cache:
#Master rows * ((Σ column size) + 8)
Lookup cache:
- When the server runs a Lookup transformation, it builds a cache in memory when it processes the first
row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of memory for each partition.
If two Lookup transformations share the cache, the server does not allocate additional memory for the
second Lookup transformation.
- The server creates the index and data cache files in the lookup cache directory and uses the server code page
to create the files.
Index Cache:
#Rows in lookup table * ((Σ column size) + 16)
Lookup Data Cache:
#Rows in lookup table * ((Σ column size) + 8)
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it
would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping
using the mapplet.
You can create a non-reusable instance of a reusable transformation.
Mapplet Objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation
Mapplet Wont Support:
- Joiner
- Normalizer
- Pre/Post session stored procedure
- Target definitions
- XML source definitions
Types of Mapplets:
(a) Active Mapplets - Contains one or more active transformations
(b) Passive Mapplets - Contains only passive transformation
Copied mapplets are not instances of the original mapplet. If you make changes to the original, the copy does not inherit
your changes.
You can use a single mapplet more than once in a mapping.
Ports
Default value for I/P port- NULL
Default value for O/P port - ERROR
Default value for variables - Does not support default values
Session Parameters
These parameters represent values you might want to change between sessions, such as a DB connection or source file.
We can use session parameters in a session property sheet, then define the parameters in a session parameter file.
The user-defined session parameters are:
(a) DB Connection
(b) Source File directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameter to make sessions more flexible. For example, you have the same type of transactional data
written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the
databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like
$DBConnectionSource, and use it as the source database connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session.
After it completes set the value to TransDB2 and run the session again.
NOTE:
You can use several parameters together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to
initialize the session.
Session Parameter File
- A parameter file is created with a text editor.
- In it, we specify the folder and session name, then list the parameters and variables used in the
session and assign each a value.
- Save the parameter file in any directory and load it to the server.
- We can define the following values in a parameter file:
o Mapping parameters
o Mapping variables
o Session parameters
- You can include parameter and variable information for more than one session in a single parameter file by
creating separate sections for each session within the parameter file.
- You can override the parameter file for sessions contained in a batch by using a batch parameter file. A
batch parameter file has the same format as a session parameter file.
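A minimal sketch of such a parameter file (the folder name, session name, parameter names and values below are hypothetical, and the exact section-heading format can vary between versions):
[MyFolder.s_m_load_sales]
$DBConnectionSource=TransDB1
$InputFileSrc=/ftp_data/webrep/SrcFiles/abc.txt
$$LastLoadDate=01/01/2005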
Locale
Informatica server can transform character data in two modes
(a) ASCII
a. Default one
b. Passes 7-bit US-ASCII character data
(b) UNICODE
a. Passes 8-bit and multibyte character data
b. It uses 2 bytes for each character to move data and performs additional checks at the session level to
ensure data integrity.
Code pages contain the encoding to specify characters in a set of one or more languages. We can select a code page
based on the type of character data in the mappings.
Compatibility between code pages is essential for accurate data movement.
The various code page components are
- Operating system Locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page
Locale
(a) System locale - system default
(b) User locale - settings for date, time, display
(c) Input locale
Mapping Parameter and Variables
These represent values in mappings/mapplets.
If we declare mapping parameters and variables in a mapping, we can reuse the mapping by altering the parameter
and variable values in the session.
This can reduce the overhead of creating multiple mappings when only certain attributes of a mapping need to be
changed.
Use a mapping parameter when you want to use the same value each time you run the session.
Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The
server saves the value of a mapping variable to the repository at the end of each successful run and uses that value
the next time you run the session.
Mapping objects:
Source, Target, Transformation, Cubes, Dimension
Debugger
We can run the Debugger in two situations
(a) Before Session: After saving mapping, we can run some initial tests.
(b) After Session: real Debugging process
Metadata Reporter:
- A web-based application that allows you to run reports against repository metadata
- Reports include executed sessions, lookup table dependencies, mappings and source/target schemas.
Repository
Types of Repository
(a) Global Repository
a. This is the hub of the domain. Use the global repository to store common objects that multiple developers can use
through shortcuts. These may include operational or application source definitions, reusable
transformations, mapplets and mappings.
(b) Local Repository
a. A local repository is any repository within the domain that is not the global repository. Use the local repository for
development.
(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to other repositories.
NOTE:
- Once you create a global repository, you cannot change it to a local repository.
- However, you can promote a local repository to a global repository.
Batches
- Provide a way to group sessions for either serial or parallel execution by server
- Batches
o Sequential (Runs session one after another)
o Concurrent (Runs sessions at same time)
Nesting Batches
Each batch can contain any number of session/batches. We can nest batches several levels deep, defining batches
within batches
Nested batches are useful when you want to control a complex series of sessions that must run sequentially or
concurrently
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedules by default. However, we
can configure a batched session to run on its own schedule by selecting the Use Absolute Time Session option.
Server Behavior
A server configured to run a batch overrides the server configuration to run sessions within the batch. If you have
multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run 'if previous completes' and that
previous session fails.
Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a sequential batch, so that the
Informatica server can run them in consecutive order.
There are two ways of running sessions under this category:
(a) Run the session, only if the previous completes successfully
(b) Always run the session (this is default)
Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time.
Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the
sessions separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from running those batches in a
particular order, just like sessions, place them into a sequential batch.
The Informatica server uses both process memory and system shared memory to perform the ETL process.
It runs as a daemon on UNIX and as a service on Windows NT.
The following processes are used to run a session:
(a) Load Manager process: - starts a session and
creates the DTM process, which creates the session.
(b) DTM process: - creates threads to initialize the session
- reads, writes and transforms data
- handles pre-/post-session operations.
Load manager processes:
- manages session/batch scheduling.
- Locks session.
- Reads parameter file.
- Expands server/session variables, parameters .
- Verifies permissions/privileges.
- Creates session log file.
DTM process:
The primary purpose of the DTM is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is known as buffer
memory. The default memory allocation is 12,000,000 bytes .it creates the main thread, which is called master
thread .this manages all other threads.
Various thread functions:
Master thread - handles stop and abort requests from the Load Manager.
Mapping thread - one thread for each session;
fetches session and mapping information,
compiles the mapping, and
cleans up after execution.
Reader thread - one thread for each partition;
relational sources use relational threads and
flat files use file threads.
Writer thread - one thread for each partition; writes to the target.
Transformation thread - one or more transformation threads for each partition.
Note:
When you run a session, the threads for a partitioned source execute concurrently. The threads use buffers
to move/transform data.
Q. What are the advantages of having bitmap index for data warehousing applications? (KPIT Infotech, Pune)
Bitmap indexing benefits data warehousing applications which have large amounts of data and ad hoc queries but a low
level of concurrent transactions. For such applications, bitmap indexing provides:
1. Reduced response time for large classes of ad hoc queries
2. A substantial reduction of space usage compared to other indexing techniques
3. Dramatic performance gains even on very low end hardware
4. Very efficient parallel DML and loads
For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A
bitmap index on this column can out-perform a B-tree index, particularly when this column is often queried in
conjunction with other columns.
B-tree indexes are most effective for high-cardinality data: that is, data with many possible values, such as
CUSTOMER_NAME or PHONE_NUMBER. A regular B-tree index can be several times larger than the indexed data.
Used appropriately, bitmap indexes can be significantly smaller than a corresponding B-tree index.
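A minimal sketch of creating such an index (the table and column names sales and region_id are hypothetical):
CREATE BITMAP INDEX sales_region_bix ON sales (region_id);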
Q. What are clusters?
Clusters are an optional method of storing table data. A cluster is a group of tables that share the same data blocks
because they share common columns and are often used together.
For example, the EMP and DEPT table share the DEPTNO column. When you cluster the EMP and DEPT tables,
Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks.
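A minimal sketch of how the EMP/DEPT cluster described above could be created in Oracle (datatypes and sizes are illustrative only):
CREATE CLUSTER emp_dept (deptno NUMBER(2));
CREATE INDEX emp_dept_idx ON CLUSTER emp_dept;
CREATE TABLE dept (deptno NUMBER(2) PRIMARY KEY, dname VARCHAR2(14)) CLUSTER emp_dept (deptno);
CREATE TABLE emp (empno NUMBER(4) PRIMARY KEY, ename VARCHAR2(10), deptno NUMBER(2)) CLUSTER emp_dept (deptno);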
Another method, composite partitioning, partitions the data by range and further subdivides the data into sub partitions
using a hash function.
Read lock. Created when you open a repository object in a folder for which you do not have write permission.
Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you have write permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server starts a scheduled
session or batch.
Fetch lock. Created when the repository reads information about repository objects from the database.
Save lock. Created when you save information to the repository.
When you use event-based scheduling, the Informatica Server starts a session when it locates the specified indicator
file. To use event-based scheduling, you need a shell command, script, or batch file to create an indicator file when all
sources are available. The file must be created or sent to a directory local to the Informatica Server. The file can be of
any format recognized by the Informatica Server operating system. The Informatica Server deletes the indicator file
once the session starts.
Q: Why doesn't constraint based load order work with a mapplet? (08 May 2000)
If your mapplet has a sequence generator (reusable) that's mapped with data straight to an "OUTPUT" designation, and
then the map splits the output to two tables: parent/child - and your session is marked with "Constraint Based Load
Ordering" you may have experienced a load problem - where the constraints do not appear to be met? Well - the
problem is in the perception of what an "OUTPUT" designation is. The OUTPUT component is NOT an "object" that
collects a "row" as a row, before pushing it downstream. An OUTPUT component is merely a pass-through structural
object - as indicated, there are no data types on the INPUT or OUTPUT components of a mapplet - thus indicating
merely structure. To make the constraint based load order work properly, move all the ports through a single
expression, then through the OUTPUT component - this will force a single row to be "put together" and passed along to
the receiving mapplet. Otherwise - the sequence generator generates 1 new sequence ID for each split target on the
other side of the OUTPUT component.
If you don't care about "reporting" duplicates, use an aggregator. Set the Group By Ports to group by the primary key in
the parent target table. Keep in mind that using an aggregator causes the following: The last duplicate row in the file is
pushed through as the one and only row, loss of ability to detect which rows are duplicates, caching of the data before
processing in the map continues. If you wish to report duplicates, then follow the suggestions in the presentation slides
(available on this web site) to institute a staging table. See the pros and cons of staging tables, and what they can do
for you.
Q: What happens in a database when a cached LOOKUP object is created (during a session)?
The session generates a select statement with an Order By clause. Any time this is issued, the databases like Oracle
and Sybase will select (read) all the data from the table, in to the temporary database/space. Then the data will be
sorted, and read in chunks back to Informatica server. This means, that hot-spot contention for a cached lookup will
NOT be the table it just read from. It will be the TEMP area in the database, particularly if the TEMP area is being
utilized for other things. Also - once the cache is created, it is not re-read until the next running session re-creates it.
Q: Can you explain how "constraint based load ordering" works? (27 Jan 2000)
Constraint based load ordering in PowerMart / PowerCenter works like this: it controls the order in which the target
tables are committed to a relational database. It is of no use when sending information to a flat file. To construct the
proper constraint order: links between the TARGET tables in Informatica need to be constructed. Simply turning on
"constraint based load ordering" has no effect on the operation itself. Informatica does NOT read constraints from the
database when this switch is turned on. Again, to take advantage of this switch, you must construct primary / foreign
key relationships in the TARGET TABLES in the designer of Informatica. Creating primary / foreign key relationships is
difficult - you are only allowed to link a single port (field) to a single table as a primary / foreign key.
What is the method of loading 5 flat files of having same structure to a single target and which transformations
will you use?
This can be handled by using the file list in Informatica. If we have 5 files
in different locations on the server and we need to load them into a single target
table, in the session properties we need to change the file type to Indirect.
(Choose Direct if the source file contains the source data. Choose Indirect if the
source file contains a list of files.
When you select Indirect, the PowerCenter Server finds the file list and then reads
each listed file when it executes the session.)
I take a notepad, give the following paths and filenames in this notepad, and save this notepad as
emp_source.txt in the directory /ftp_data/webrep/
/ftp_data/webrep/SrcFiles/abc.txt
/ftp_data/webrep/bcd.txt
/ftp_data/webrep/srcfilesforsessions/xyz.txt
/ftp_data/webrep/SrcFiles/uvw.txt
/ftp_data/webrep/pqr.txt
If your session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local
to the Informatica Server.
If your session writes to a relational target, consider performing the following tasks to increase performance:
Drop indexes and key constraints.
Increase checkpoint intervals.
Use bulk loading.
Use external loading.
Turn off recovery.
Increase database network packet size.
Optimize Oracle target databases.
Dropping Indexes and Key Constraints
When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve
performance, drop indexes and key constraints before running your session. You can rebuild those indexes and key
constraints after the session completes.
If you decide to drop and rebuild indexes and key constraints on a regular basis, you can create pre- and post-
load stored procedures to perform these operations each time you run the session.
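As a hedged sketch of that idea for an Oracle target (the table and constraint names sales_fact and fk_sales_cust are hypothetical): the pre-load step could disable a foreign key constraint and the post-load step re-enable it:
-- pre-load
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_cust;
-- post-load
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_cust;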
Note: To optimize performance, use constraint-based loading only if necessary.
For other databases, even if you configure the bulk loading option, Informatica Server ignores the commit
interval mentioned and commits as needed.
Forcing the Informatica Server to make unnecessary datatype conversions slows performance.
For example, if your mapping moves data from an Integer column to a Decimal column, then back to an Integer column,
the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions
from mappings.
Some datatype conversions can improve system performance. Use integer values in place of other datatypes when
performing comparisons using Lookup and Filter transformations.
For example, many databases store U.S. zip code information as a Char or Varchar datatype. If you convert your zip
code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps
increase the speed of the lookup comparisons based on zip code.
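A hedged illustration of that conversion applied inside a lookup SQL override (the table and column names zip_ref, zip_code, city and state are hypothetical):
SELECT TO_NUMBER(REPLACE(zip_code, '-', '')) AS zip_num, city, state
FROM zip_ref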
Caching Lookups
If a mapping contains Lookup transformations, you might want to enable lookup caching. In general, you want to cache
lookup tables that need less than 300MB.
When you enable caching, the Informatica Server caches the lookup table and queries the lookup cache during the
session. When this option is not enabled, the Informatica Server queries the lookup table on a row-by-row basis. You
can increase performance using a shared or persistent cache:
Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache
between transformations in the same mapping. You can share a named cache between transformations in the same or
different mappings.
Persistent cache. If you want to save and reuse the cache files, you can configure the transformation to use a
persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a
persistent cache can improve performance because the Informatica Server builds the memory cache from the cache
files instead of from the database.
The PowerMart and PowerCenter repository has more than 80 tables and almost all tables use one or more indexes to
speed up queries. Most databases keep and use column distribution statistics to determine which index to use to
execute SQL queries optimally. Database servers do not update these statistics continuously.
In frequently-used repositories, these statistics can become outdated very quickly and SQL query optimizers may
choose a less than optimal query plan. In large repositories, the impact of choosing a sub-optimal query plan can affect
performance drastically. Over time, the repository becomes slower and slower.
To optimize SQL queries, you might update these statistics regularly. The frequency of updating statistics depends on
how heavily the repository is used. Updating statistics is done table by table. The database administrator can create
scripts to automate the task.
You can use the following information to generate scripts to update distribution statistics.
Note: All PowerMart/PowerCenter repository tables and index names begin with OPB_.
Oracle Database
You can generate scripts to update distribution statistics for an Oracle repository.
select 'analyze table ', table_name, ' compute statistics;' from user_tables where table_name like 'OPB_%'
select 'analyze index ', index_name, ' compute statistics;' from user_indexes where index_name like 'OPB_%'
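As a hedged sketch of how the generated statements might be captured and executed from SQL*Plus (assuming you are connected as the repository schema owner; the spool file name is arbitrary, and the columns are concatenated with || here so the spooled lines are directly runnable):
set heading off pagesize 0 feedback off
spool analyze_opb.sql
select 'analyze table ' || table_name || ' compute statistics;' from user_tables where table_name like 'OPB_%';
select 'analyze index ' || index_name || ' compute statistics;' from user_indexes where index_name like 'OPB_%';
spool off
@analyze_opb.sql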
Microsoft SQL Server Database
You can generate scripts to update distribution statistics for a Microsoft SQL Server repository.
select 'update statistics ', name from sysobjects where name like 'OPB_%'
Once you optimize your source database, target database, and mapping, you can focus on optimizing the session. You
can perform the following tasks to improve overall performance:
Run concurrent batches.
Partition sessions.
Reduce error tracing.
Remove staging areas.
Tune session parameters.
Table 19-1 lists the settings and values you can use to improve session performance. For example, the buffer block size defaults to 64,000 bytes (64 KB) and can be tuned within a suggested range of roughly 4,000 to 128,000 bytes.
How to correct and load the rejected files when the session completes
During a session, the Informatica Server creates a reject file for each target instance in the mapping. If the writer or the
target rejects data, the Informatica Server writes the rejected row into the reject file. By default, the Informatica Server
creates reject files in the $PMBadFileDir server variable directory.
The reject file and session log contain information that helps you determine the cause of the reject. You can correct
reject files and load them to relational targets using the Informatica reject loader utility. The reject loader also creates
another reject file for any data that the writer or target rejects during reject loading.
Complete the following tasks to load reject data into the target: locate the reject file, correct the bad data, rename the corrected file with a .in extension, and run the reject loader utility (pmrejldr).
NOTE: You cannot load rejected data into a flat file target.
After you locate a reject file, you can read it using a text editor that supports the reject file code page.
Reject files contain rows of data rejected by the writer or the target database. Though the Informatica Server writes the
entire row in the reject file, the problem generally centers on one column within the row. To help you determine which
column caused the row to be rejected, the Informatica Server adds row and column indicators to give you more
information about each column:
Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells
whether the row was marked for insert, update, delete, or reject.
Column indicator. Column indicators appear after every column of data. The alphabetical character indicators
tell whether the data was valid, overflow, null, or truncated.
The following sample reject file shows the row and column indicators:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D
0,D,1,D,April,D,2000,D,1,D,2560974.96,D,2560974.96,D
3,D,1,D,April,D,2000,D,0,D,0.00,D,0.00,D
0,D,1,D,August,D,1997,D,2,D,2283.76,D,4567.53,D
0,D,3,D,December,D,1999,D,1,D,273825.03,D,273825.03,D
0,D,1,D,September,D,1997,D,1,D,0.00,D,0.00,D
Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do
with the row of data.
Row Indicator  Meaning  Rejected By
0  Insert  Writer or target
1  Update  Writer or target
2  Delete  Writer or target
3  Reject  Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To narrow down the reason why
rows marked 0, 1, or 2 were rejected, review the column indicators and consult the session log. In the sample above, the
first and fourth rows carry row indicator 3 (marked for reject by the update strategy), while the remaining rows carry 0
(inserts rejected by the writer or the target).
Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column
indicators appear after every column of data and define the type of the data preceding it.
Table 15-2 describes the column indicators in a reject file:
D  Valid data. Good data; the writer passes it to the target.
O  Overflow. Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N  Null. The column contains a null value. Good data; the writer passes it to the target.
T  Truncated. String data exceeded a specified precision for the column, so the Informatica Server truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.
After you correct the target data in each of the reject files, append .in to each reject file you want to load into the
target database. For example, after you correct the reject file, t_AvgSales_1.bad, you can rename it
t_AvgSales_1.bad.in.
After you correct the reject file and rename it to reject_file.in, you can use the reject loader to send those files through
the writer to the target database.
Use the reject loader utility from the command line to load rejected files into target tables. The syntax for reject loading
differs on UNIX and Windows NT/2000 platforms.
pmrejldr [folder_name:]session_name
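For example, on UNIX the sequence might look like the following. The file, folder, and session names are hypothetical, and this sketch assumes the corrected file remains in the reject file directory ($PMBadFileDir):
# rename the corrected reject file and run the reject loader
mv t_AvgSales_1.bad t_AvgSales_1.bad.in
pmrejldr SalesFolder:s_m_AvgSales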
Recovering Sessions
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause
of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on
the properties of the mapping, session, and Informatica Server configuration.
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row
ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts
processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails,
when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001. The
commit point may be different for source- and target-based commits.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica
Server setup before you run a session so the Informatica Server can create and/or write entries in the
OPB_SRVR_RECOVERY table.
Fatal Error
A fatal error occurs when the Informatica Server cannot access the source, target, or repository. This can include loss of
connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or
Sequence Generator transformation, the Informatica Server cannot update the sequence values in the repository, and a
fatal error occurs.
When do we use a dynamic cache and when do we use a static cache in connected and unconnected Lookup transformations?
We use a dynamic cache only for a connected lookup. We use a dynamic cache to check whether the record already exists
in the target table or not, and depending on that we insert, update, or delete the records using an Update Strategy
transformation (see the sketch below). A static cache is the default cache in both connected and unconnected lookups. If
you select a static cache for the lookup table in Informatica, it won't update the cache, and the rows in the cache remain
constant. We use this to check the results and also to update slowly changing records.
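A minimal sketch of the Update Strategy expression that typically follows a dynamic Lookup. NewLookupRow is the output port the dynamic cache adds: 0 means the row already exists unchanged in the cache, 1 means the row was inserted into the cache, and 2 means the cache row was updated:
IIF(NewLookupRow = 1, DD_INSERT,
    IIF(NewLookupRow = 2, DD_UPDATE, DD_REJECT))
Rows with NewLookupRow = 0 are unchanged, so they are typically rejected or filtered out rather than written to the target.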
How do you get two targets, T1 containing distinct values and T2 containing duplicate values, from one source S1?
Use a transformation that removes duplicates (for example, an Aggregator, or a Sorter with the Distinct option) to load
the target that must contain no duplicates, and load the other target directly from the source.
How do you delete duplicate rows from a flat file source? Is there any option in Informatica?
Use a Sorter transformation; it has a Distinct option. Make use of it.
What are the basic needs to join two sources in a Source Qualifier?
The two sources should have a primary key/foreign key relationship.
The columns used to join the two sources should have matching datatypes.
What are the different options used to configure the sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.
Establishing conformity
Developing a set of shared, conformed dimensions is a significant challenge. Any
dimensions that are common across the business processes must represent the
dimension information in the same way. That is, it must be conformed. Each
business process will typically have its own schema that contains a fact table,
several conformed dimension tables, and dimension tables unique to the
specific business function. The same is true for facts.
Degenerate dimensions
Before we discuss degenerate dimensions in detail, it is important to understand
the following:
A fact table may consist of the following data:
_ Foreign keys to dimension tables
_ Facts which may be:
Additive
Semi-additive
Non-additive
Pseudo facts (such as 1 and 0 in case of attendance tracking)
Textual facts (rarely the case)
Derived facts
Year-to-date facts
_ Degenerate dimensions (one or more)
Non-additive facts
Non-additive facts are facts which cannot be added meaningfully across any
dimensions.
_ Textual facts: Adding textual facts does not result in any number. However, counting textual facts may result in a sensible number.
_ Per-unit prices: Adding unit prices does not produce any meaningful number; they must be weighted by the quantities they apply to.
_ Percentages and ratios: These cannot be added across dimensions; compute them at query time from their additive components (see the sketch after this list).
_ Measures of intensity: Measures of intensity, such as the room temperature, cannot be added meaningfully, although they can be averaged or counted.
_ Averages: An average is itself non-additive; store the underlying sum and count and derive the average when needed.
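A hedged SQL sketch of the point about percentages and ratios, assuming a hypothetical sales_fact table with additive columns gross_margin_amount and sales_amount plus a pre-computed row-level margin_ratio: store the additive components and derive the ratio at query time rather than aggregating a stored ratio.
-- correct: derive the ratio from its additive components
SELECT SUM(gross_margin_amount) / SUM(sales_amount) AS margin_ratio
FROM sales_fact;
-- incorrect: summing (or averaging) pre-computed row-level ratios
-- produces a different and generally meaningless figure
SELECT SUM(margin_ratio) FROM sales_fact;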
Semi-additive facts
Semi-additive facts are facts which can be summarized across some dimensions
but not others. Examples of semi-additive facts include the following:
_ Account balances
_ Quantity-on-hand
For example, adding an account's daily balances across the different days of January results in an incorrect balance
figure. However, if we average the account balance across the days of the month to find the average daily balance,
the result is valid.
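A hedged SQL sketch of the same idea, assuming a hypothetical daily_balance table with columns account_id, balance_date, and balance:
-- not meaningful: summing a balance across the days of January double-counts it
SELECT account_id, SUM(balance) AS january_total
FROM daily_balance
WHERE balance_date BETWEEN DATE '1999-01-01' AND DATE '1999-01-31'
GROUP BY account_id;
-- valid: the average daily balance for January
SELECT account_id, AVG(balance) AS avg_daily_balance
FROM daily_balance
WHERE balance_date BETWEEN DATE '1999-01-01' AND DATE '1999-01-31'
GROUP BY account_id;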
Standalone Repository: A repository that functions individually and is unrelated to any other repository.
Global Repository: This is a centralized repository in a domain. This repository can contain objects shared across the
repositories in the domain. The objects are shared through global shortcuts.
Local Repository: A local repository is within a domain and is not a global repository. A local repository can connect to
a global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository: This can be either a local or a global repository, but it allows version control for the repository.
A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop,
test, and deploy metadata into the production environment.
If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent
lookup cache. The Power Center Server then saves and reuses cache files from session to session, eliminating the
time required to read the lookup source.
When using a dynamic lookup with a WHERE clause in the SQL override, make sure that you add a filter before the
lookup. The filter should remove rows that do not satisfy the WHERE clause.
Reason
During dynamic lookups, while inserting records into the cache the WHERE clause is not evaluated; only the join
condition is evaluated, so the lookup cache and the table are not in sync. Records satisfying only the join condition are
inserted into the lookup cache. It is better to put a filter before the lookup that applies the WHERE clause condition, so
that the cache contains only records satisfying both the join condition and the WHERE clause.
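A hedged sketch of this recommendation; the CUSTOMER table, its columns, and the 'ACTIVE' status value are hypothetical. The lookup SQL override restricts the rows the lookup reads, and a Filter transformation placed before the dynamic Lookup applies the same condition to the rows flowing through the mapping, so rows that the WHERE clause would exclude never get inserted into the cache:
-- Lookup SQL override
SELECT cust_id, cust_name, status
FROM customer
WHERE status = 'ACTIVE'
-- Filter transformation condition upstream of the dynamic Lookup
STATUS = 'ACTIVE'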
If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st
record when you run the session the next time in Informatica 6.1?
Running the session in recovery mode will work, but the target load type should be Normal. If it is Bulk, then recovery
won't work as expected.
Nothing in this thread makes any sense. Nothing gets updated in a dynamic cache other than the cache itself. What
happens in the file is a matter of what your mapping does to it, not the cache.
A lookup (dynamic or otherwise) is loaded from a source. The source can be anything you have defined in your
environment... flat file, table, whatever.
The difference between a dynamic and a static cache is that in a dynamic cache, one of the columns in the source must
be identified as the primary key (separate from the lookup key), and it must be numeric. The lookup uses the values in
that column to figure out what the new key should be if you insert a new row into the cache.
If your flat file does not have such a column you cannot use it in a dynamic lookup.
Enable Test Load: You can configure the Integration Service to perform a test load. With a test load, the Integration
Service reads and transforms data without writing to targets. The Integration Service generates all session files and
performs all pre- and post-session functions, as if running the full session.
The Integration Service writes data to relational targets, but rolls back the data when the session
completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write
data to the targets.
Enter the number of source rows you want to test in the Number of Rows to Test field.
You cannot perform a test load on sessions that use XML sources.
Note: You can perform a test load when you configure a session for normal mode. If you configure the
session for bulk mode, the session fails.