377.informatica - What Are The Main Issues While Working With Flat Files As Source and As Targets ?
We need to specify the correct path in the session and mention whether the file is 'direct' or 'indirect'. Keep
the file in the exact path that you have specified in the session.
-regards
rasmi
=======================================
1. We cannot use SQL override. We have to use transformations for all our requirements.
2. Testing the flat files is a very tedious job.
3. The file format (source/target definition) should match exactly with the format of the data file. Most of the time erroneous
results come when the data file layout is not in sync with the actual file.
(i) Your data file may be fixed width but the definition is delimited ----> truncated data.
(ii) Your data file as well as the definition is delimited, but you specify a wrong delimiter: (a) a delimiter other than the one present in
the actual file, or (b) a delimiter that occurs as a character in some field of the file --> wrong data again.
(iii) Not specifying the NULL character properly may result in wrong data.
(iv) There are other settings/attributes while creating the file definition about which one should be very careful.
4. If you miss the link to any column of the target then all the data will be placed in the wrong fields. The
missed column won't exist in the target data file.
332.Informatica - Explain about Informatica server process that how it works relates to mapping variables?
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation and
loading. The Load Manager reads parameters and variables related to the session, mapping and server, and passes the mapping
parameter and variable information to the DTM. The DTM uses this information to perform the data movement from source
to target.
=======================================
The PowerCenter Server holds two different values for a mapping variable during a session run:
- Start value of a mapping variable
- Current value of a mapping variable
Start Value
The start value is the value of the variable at the start of the session. The start value could be a value defined in the
parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined initial
value for the variable, or the default value based on the variable datatype.
The PowerCenter Server looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value
The current value is the value of the variable as the session progresses. When a session starts, the current value of a
variable is the same as the start value. As the session progresses, the PowerCenter Server calculates the current value
using a variable function that you set for the variable. Unlike the start value of a mapping variable, the current value can
change: the PowerCenter Server evaluates the current value of the variable as each row passes through the mapping.
=======================================
First the Load Manager starts the session; it performs verifications and validations about variables and manages post-
session tasks such as mail. Then it creates the DTM process.
The DTM in turn creates a master thread, which creates the remaining threads.
The master thread creates the
read thread,
write thread,
transformation thread,
pre- and post-session threads, etc.
Finally the DTM hands over to the Load Manager after writing into the target.
331.Informatica - write a query to retrieve the latest records from the table sorted by version(scd).
You can write a query using an inline view: compare the previous version with the new highest version,
and then you can get your result.
=======================================
Hi Sunil,
Can you please explain your answer in somewhat more detail?
=======================================
Hi
Assume you put a surrogate key in the target (Dept table), say p_key, and the
version field, dno field and loc field are there.
Then:
select a.p_key, a.dno, a.loc, a.version from t_dept a
where a.version = (select max(b.version) from t_dept b where a.dno = b.dno)
If you write this query in the lookup, it retrieves the latest (max)
version from the target in the lookup. In this way performance increases.
=======================================
select * from (
  select Acct.*, rank() over (partition by ch_key_id order by version desc) as rnk
  from Acct
) where rnk = 1
=======================================
select business_key, max(version) from tablename group by business_key
You can configure the following information in the Partitions view on the Mapping tab:
- Add and delete partition points.
- Enter a description for each partition.
- Specify the partition type at each partition point.
- Add a partition key and key ranges for certain partition types.
=======================================
By default, when we create the session, the Workflow Manager creates pass-through partition points at Source Qualifier
transformations and target instances.
283.Informatica - hi, how do we validate all the mappings in the repository at once?
You cannot validate all the mappings in one go. But you can validate all the mappings in a folder in one go and
continue the process for all the folders.
For doing this, log on to the Repository Manager. Open the folder, then the mappings sub-folder, then select all or some of
the mappings (by pressing the Shift or Control key; Ctrl+A does not work) and then right-click and validate.
=======================================
Yes. We can validate all mappings using the Repo Manager.
Whenever any source data is changed, we need to capture it in the target system also. This can be done basically in 3 ways:
The target record is completely replaced with the new record (Type 1).
Complete changes can be captured as different records and stored in the target table (Type 2).
Only the last change and the present data are captured (Type 3).
CDC can be done generally by using a timestamp or version key, for example as sketched below.
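As an illustration (a minimal sketch, not from the original answer; the src_customer table, last_update_ts column and etl_load_control table are hypothetical names), changed rows since the previous load could be picked up with a timestamp filter and then routed to the Type 1/2/3 handling:
select cust_id, cust_name, cust_addr, last_update_ts
from src_customer
where last_update_ts > (select max(load_ts)
                        from etl_load_control
                        where tgt_table = 'DIM_CUSTOMER')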
228.Informatica - what is the repository agent?
The Repository Agent is a multi-threaded process that fetches, inserts and updates metadata in the repository database
tables. The Repository Agent uses object locking to ensure the consistency of metadata in the repository.
=======================================
The Repository Server uses a process called the Repository Agent to access the tables of the repository database. The
Repository Server uses multiple Repository Agent processes to manage multiple repositories on different machines on the
network, using native drivers.
=======================================
The name itself says it: the agent is a mediator between the repository server and the repository database tables.
Simply put, the repository agent is the process that speaks with the repository.
224.Informatica - what are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation:
these transformations contain a check box on the Properties tab to allow
partitioning.
*Aggregator Transformation:
if you use sorted ports, you cannot partition the associated source.
*Joiner Transformation:
you cannot partition the master source for a Joiner transformation.
*Normalizer Transformation
*XML targets.
=======================================
1) Source definition
2) Sequence Generator
3) Unconnected Transformation
4) XML Target definition
213.Informatica - How do you create single lookup transformation using multiple tables?
Write an override SQL query. Adjust the ports as per the SQL query.
=======================================
No, it is not possible to create a single lookup on multiple tables, because a lookup is created upon a single
table.
=======================================
For a connected lookup transformation: 1> create the lookup transformation; 2> go for Skip; 3> manually enter the
port names that you want to look up; 4> connect with the input ports from the source table; 5> give the condition; 6> go
for Generate SQL, then modify it according to your requirement and validate; it will work.
=======================================
We can just create a view using the two tables and then take that view as the lookup table.
=======================================
If you want single lookup values to be used in multiple target tables, this can be done!
For this we can use an unconnected lookup and can collect the values from the source table into any target table,
depending upon the business rule.
184.Informatica - what is the difference between constraint based load ordering and target load plan
Constraint based load ordering
example:
Table 1---Master
Table 2---Detail
If the data in table1 is dependent on the data in table2 then table2 should be loaded first. In such cases, to control the
load order of the tables we need some conditional loading, which is nothing but constraint based loading.
In Informatica this feature is implemented by just one check box at the session level.
Target load order comes in the Designer property. Click the Mappings tab in the Designer and then Target Load Plan. It will show
all the target load groups in the particular mapping. You specify the order there and the server will load to the targets
accordingly.
A target load group is a set of source, source qualifier, transformations and target.
Whereas constraint based loading is a session property. Here the multiple targets must be generated from one source
qualifier. The target tables must possess primary/foreign key relationships, so that the server loads according to the key
relations irrespective of the target load order plan.
=======================================
If you have only one source and it is loading into multiple targets, you have to use constraint based loading. But the
target tables should have key relationships between them.
If you have multiple source qualifiers that have to be loaded into multiple targets, you have to use target load order.
Constraint based loading: if your mapping contains a single pipeline (flow) with more than one target (and the target tables
have a master-child relationship), you need to use constraint based loading at the session level.
Target load plan: if your mapping contains multiple pipelines (flows), you specify the execution order one by one (for example,
pipeline 1 needs to execute first, then pipeline 2, then pipeline 3); this is purely based on pipeline dependency.
139.Informatica - what are cost based and rule based approaches and the difference
Cost based and rule based approaches are the optimization techniques used in relation to databases, where
we need to optimize a SQL query.
Basically Oracle provides two types of optimizers (indeed 3, but we use only these two techniques, because the third has
some disadvantages).
Whenever you process any SQL query in Oracle, what the Oracle engine internally does is read the query and decide
which will be the best possible way of executing the query. In this process Oracle follows these optimization
techniques.
1. Cost based optimizer (CBO): if a SQL query can be executed in 2 different ways (say it has path 1 and path 2 for the
same query), what the CBO does is calculate the cost of each path, analyse for which path the
cost of execution is less, and then execute that path, so that it can optimize the query execution.
2. Rule based optimizer (RBO): this basically follows the rules which are needed for executing a query.
So, depending on the number of rules which are to be applied, the optimizer runs the query.
If the table you are trying to query is already analysed, then Oracle will go with the CBO.
If the table is not analysed, Oracle follows the RBO.
For the first time, if the table is not analysed, Oracle will go with a full table scan.
If you want to look up data on multiple tables at a time, you can do one thing: join the tables that you want, then look up on that
joined table. Informatica provides lookups on joined tables; hats off to Informatica.
=======================================
You can do it.
When you create the lookup transformation, Informatica asks for a table name; you can choose Source, Target, Import, or Skip. Click
Skip and then use the SQL override property in the Properties tab to join the two tables for the lookup.
Alternatively, join the two sources by using a Joiner transformation and then apply a lookup on the resulting table.
=======================================
Whatever my friends have answered earlier is correct. To be more specific:
if the two tables are relational, then you can use the SQL lookup override option to join the two tables in the lookup
properties. You cannot join a flat file and a relational table.
e.g.: the lookup default query will be select the lookup-table column_names from lookup_table. You can now continue this query:
add the column_names of the 2nd table with the qualifier, and a where clause. If you want to use an order by, then use -- at the
end of the order by.
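For instance (a minimal sketch; the emp/dept tables and their columns are only illustrative), the lookup SQL override could look like the following, with the trailing -- used to comment out the ORDER BY that the server appends:
select e.empno, e.ename, e.deptno, d.dname
from emp e, dept d
where e.deptno = d.deptno
order by e.empno --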
120.Informatica - How to retrieve the records from a rejected file. Explain with syntax or example
There is one utility called the Reject Loader with which we can find the rejected records and can refine and reload the rejected
records.
=======================================
Yes. Every time you run the session one reject file will be created, and all the rejected rows will be there in the reject file. You can
modify the records, correct the things in the records, and load them to the target directly from the reject file
using the Reject Loader.
=======================================
Can you explain how to load rejected rows through Informatica?
=======================================
During the execution of a workflow, all the rejected rows will be stored in bad files (under the directory where your
Informatica server is installed, e.g. C:\Program Files\Informatica PowerCenter 7.1\Server). These bad files can be imported
as a flat file source, and then through a direct mapping we can load these files in the desired format.
98.Informatica - can we modify the data in flat file?
=======================================
Let's not discuss manually modifying the data of the flat file.
Let's assume that the target is a flat file. I want to update the data in the flat file target based on the input source rows.
Like we use the update strategy / target properties in the case of relational targets for updates, do we have any options in the
session or mapping to perform a similar task for a flat file target?
I have heard about the append option in INFA 8.x. This may be helpful for incremental loads into the flat file.
But this is not a workaround for updating the rows.
=======================================
You can modify the flat file using shell scripting in Unix (awk, grep, sed).
97.Informatica - how to get the first 100 rows from the flat file into the target?
Please check this one:
task ----->(link) session (Workflow Manager)
Double-click on the link and type $$source success rows (a parameter in session variables) = 100.
It should automatically stop the session.
82.Informatica - If I make any modifications to my table in the back end, does it reflect in the Informatica warehouse or
mapping?
Informatica is not at all concerned with the back-end database. It
displays all the information that is stored in the repository. If you want to reflect back-end changes on the Informatica
screens, you have to import from the back end into Informatica again through a valid connection, and you have to replace the existing
definitions with the imported ones.
=======================================
Yes, it will be reflected once you refresh the mapping again.
=======================================
It does matter if you have a SQL override - say in the SQ, or in a Lookup where you override the default SQL. Then if you make a
change to the underlying table in the database that makes the override SQL incorrect for the modified table, the session
will fail.
If you change a table - say, rename a column that is in the SQL override statement - then the session will fail.
But if you added a column to the underlying table after the last column, then the SQL statement in the override will still be
valid. If you make a change to the size of columns, the SQL will still be valid, although you may get truncation of data if the
database column has a larger size (more characters) than the SQ or subsequent transformation.
17.Informatica - What are the mapping parameters and mapping
variables?
17 A mapping parameter represents a constant value that you can define before running a session. A mapping parameter
retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet. Then you define the value of the
parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The
Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value
the next time
you run the session.
21.Informatica - What is the aggregate cache in the Aggregator
transformation?
21 The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session
that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the
transformation. If the Informatica server requires more space, it stores overflow values in cache files.
26.Informatica - What are the Joiner caches?
26 When a Joiner transformation occurs in a session, the
Informatica Server reads all the records from the master source and builds index
and data caches based on the master rows.
After building the caches, the Joiner transformation reads records from the detail
source and performs the joins.
30.Informatica - Differences between connected and unconnected
lookup?
32.Informatica - What are the types of lookup caches?
32 Persistent cache: you can save the lookup cache files and reuse
them the next time the Informatica server processes a lookup transformation
configured to use the cache.
Recache from database: if the persistent cache is not synchronized with the
lookup table, you can configure the lookup transformation to rebuild the lookup
cache.
Static cache: you can configure a static, or read-only, cache for any lookup table. By
default the Informatica server creates a static cache. It caches the lookup table and
lookup values in the cache for each row that comes into the transformation. When
the lookup condition is true, the Informatica server does not update the cache
while it processes the lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into the
cache and the target, you can create a lookup transformation to use a dynamic cache.
The Informatica server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping.
36.Informatica - What is the Rankindex in Ranktransformation?
36 The Designer automatically creates a RANKINDEX port for
each Rank transformation. The Informatica Server uses the Rank Index port to
store the ranking position for each record in a group. For example, if you create a
Rank transformation that ranks the top 5 salespersons for each quarter, the rank
index numbers the salespeople from 1 to 5:
Incremental Aggregation
Using this, you apply captured changes in the source to aggregate calculation in a session. If the source
changes only incrementally and you can capture changes, you can configure the session to process only
those changes
This allows the server to update the target incrementally, rather than forcing it to process the entire source
and recalculate the same calculations each time you run the session.
Steps:
- The first time you run a session with incremental aggregation enabled, the server processes the entire source.
- At the end of the session, the server stores the aggregate data from that session run in two files, the index file
and the data file. The server creates the files in a local directory.
- The second time you run the session, use only the changes in the source as the source data for the session. The
server then performs the following actions:
(1) For each input record, the session checks the historical information in the index file for a corresponding
group, then:
If it finds a corresponding group,
the server performs the aggregate operation incrementally, using the aggregate data for that
group, and saves the incremental changes.
Else,
the server creates a new group and saves the record data.
(2) When writing to the target, the server applies the changes to the existing target. It also
saves the modified aggregate data in the index/data files to be used as historical data the next time you run
the session.
Each subsequent time you run the session with incremental aggregation, you use only the incremental source changes
in the session.
If the source changes significantly, and you want the server to continue saving the aggregate data for the future
incremental changes, configure the server to overwrite the existing aggregate data with new aggregate data.
You can capture incremental changes. You might do this by filtering source data by timestamp, for example as sketched below.
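A minimal sketch of such a filter (the orders table, order_ts column and $$LastRunTime mapping variable are hypothetical names), usable for example as a Source Qualifier source filter or SQL override:
select order_id, cust_id, amount, order_ts
from orders
where order_ts > to_date('$$LastRunTime', 'YYYY-MM-DD HH24:MI:SS')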
SESSION LOGS
Information that resides in a session log:
- Session initialization
- Other information
By default, the server generates log files based on the server code page.
Thread Identifier
Ex: CMN_1039
Reader and writer thread codes have 3 digits and transformation codes have 4 digits. The numbers following a
thread name indicate the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/ Rank boundary number
BR -
CMN -
DBGR - Related to debugger
EP - External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer
Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected
Statistics details
(a) Requested rows shows the no of rows the writer actually received for the specified operation
(b) Applied rows shows the number of rows the writer successfully applied to the target (Without Error)
(c) Rejected rows show the no of rows the writer could not apply to the target
(d) Affected rows shows the no of rows affected by the specified operation
Detailed transformation statistics
The server reports the following details for each transformation in the mapping
(a) Name of Transformation
(b) No of I/P rows and name of the Input source
(c) No of O/P rows and name of the output target
(d) No of rows dropped
Tracing Levels
Normal - Initialization and status information, errors encountered, transformation errors, rows skipped,
summarized session details (not at the level of individual rows).
Terse -
Verbose Init - In addition to normal tracing: names of index and data files used, and detailed transformation
statistics.
Verbose Data - In addition to Verbose Init: each row that passes into the mapping, with detailed transformation statistics.
NOTE
When you enter tracing level in the session property sheet, you override tracing levels configured for
transformations in the mapping.
Non-Fatal
Fatal
Others
Usages of ABORT function in mapping logic, to abort a session when the server encounters a transformation
error.
Stopping the server using pmcmd (or) Server Manager
Performing Recovery
-
When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row id
of the last row committed to the target database. The server then reads all sources again and starts
processing from the next row id.
By default, Perform Recovery is disabled in the setup. Hence it won't make entries in the OPB_SRVR_RECOVERY
table.
The recovery session moves through the states of a normal session schedule: waiting to run, initializing,
running, completed and failed. If the initial recovery fails, you can run recovery as many times as needed.
The normal reject loading process can also be done in session recovery process.
Un recoverable Sessions
Under certain circumstances, when a session does not complete, you need to truncate the target and run the
session from the beginning.
Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a session.
(a) Target based commit
- The server commits data based on the number of target rows and the key constraints on the target table. The commit
point also depends on the buffer block size and the commit interval.
During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the
buffer block is full, the Informatica server issues a commit command. As a result, the amount of data
committed at the commit point generally exceeds the commit interval.
The server commits data to each target based on primary/foreign key constraints.
(b) Source based commit
- The server commits data based on the number of source rows. The commit point is the commit interval you
configure in the session properties.
During a session, the server commits data to the target based on the number of rows from an active source
in a single pipeline. The rows are referred to as source rows.
A pipeline consists of a source qualifier and all the transformations and targets that receive data from
source qualifier.
Although the Filter, Router and Update Strategy transformations are active transformations, the server does
not use them as active sources in a source based commit session.
When a server runs a session, it identifies the active source for each pipeline in the mapping. The server
generates a commit row from the active source at every commit interval.
When each target in the pipeline receives the commit rows the server performs the commit.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target
rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and re-load it to relational targets using the reject loading utility. (You
cannot load rejected data into a flat file target.)
Each time you run a session, the server appends rejected data to the reject file.
Locating the BadFiles
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading Rejected data
Ex:
3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us in finding the reason for rejecting, there are two main things.
(a) Row indicator
The row indicator tells the writer what to do with the row of wrong data.
Row Indicator   Meaning   Rejected By
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
(b) Column indicator
A column indicator follows the first column of data, and another column indicator follows every subsequent
column of data; it defines the type of data preceding it.
Column Indicator   Meaning     Writer Treats As
D                  Valid data  Good data
O                  Overflow    Bad data
N                  Null        Bad data
T                  Truncated   Bad data
NOTE
NULL columns appear in the reject file with commas marking their column.
For example, a series of N indicators might lead you to believe the target database does not accept NULL values,
so you decide to change those NULL values to zero.
However, if those rows also had a 3 in the row indicator column, the row was rejected by the writer because of an
update strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0
values in place of the NULL values.
After correcting the rejected data, rename the rejected file to reject_file.in.
The reject loader uses the data movement mode configured for the server. It also uses the code page of the
server/OS. Hence do not change these in the middle of the reject loading.
Other points
The server does not perform the following options when using the reject loader:
(a)
(b)
(c)
(d)
FTP targets
(e)
External Loading
External Loading
You can configure a session to use Sybase IQ, Teradata and Oracle external loaders to load session target files
into the respective databases.
The External Loader option can increase session performance, since these databases can load information directly
from files faster than they can run SQL commands to insert the same data into the database.
Method:
When a session uses an External Loader, the session creates a control file and a target flat file. The control file contains
information about the target flat file, such as the data format and loading instructions for the External Loader. The control
file has an extension of *.ctl and you can view the file in $PMTargetFileDir.
For using an External Loader:
The following must be done:
-
Configure the session to write to a target flat file local to the server.
Choose an external loader connection for each target file in session property sheet.
Disable constraints
Performance issues
o The server can use multiple External Loaders within one session (e.g. you have a session with two
target files, one with the Oracle External Loader and another with the Sybase External Loader).
Other Information:
-
The External Loader performance depends upon the platform of the server.
The server writes External Loader initialization and completion messages in the session log. However,
details about EL performance are generated in the EL log, which is stored in the same target directory.
If the session contains errors, the server continues the EL process. If the session fails, the server loads
partial target data using the EL.
The EL creates a reject file for data rejected by the database. The reject file has an extension of *.ldr
reject.
You can load corrected data from this file using the database reject loader, not the Informatica reject
load utility (for the EL reject file only).
Configuring EL in session
-
Caches
-
The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a
mapping.
The server stores key values in the index caches and output values in the data caches; if the server requires more
memory, it stores overflow values in cache files.
When the session completes, the server releases cache memory, and in most circumstances it deletes
the cache files.
Index cache - stores group values, as configured in the group-by ports.
Data cache - stores calculations (for example, rank information).
Column overhead includes a null indicator, and row overhead can include row to key information.
Steps:
- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache requirement.
Location:
- By default, the server stores the index and data files in the directory $PMCacheDir.
- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find
multiple index and data files in the directory. The server appends a number to the end of the
filename (PMAGG*.id*1, id*2, etc.).
Aggregator Caches
- When the server runs a session with an Aggregator transformation, it stores data in memory until it completes
the aggregation.
- When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes
data from one partition to another based on the group key values of the
transformation.
Index cache:
#Groups ((Σ column size) + 7)
Aggregate data cache:
#Groups ((Σ column size) + 7)
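As a rough worked example (assuming the formulas above, with hypothetical numbers): for 1,000 groups and a total group-by/aggregate column size of 100 bytes, the minimum index cache would be about 1,000 x (100 + 7) = 107,000 bytes, and the data cache would be of the same order.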
Rank Cache
- When the server runs a session with a Rank transformation, it compares an input row with the rows
in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the
input row.
- If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for
each group it finds.
Index cache:
#Groups ((Σ column size) + 7)
Rank data cache:
#Group [(#Ranks * (Σ column size + 10)) + 20]
Joiner Cache:
- When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds
memory caches based on the master rows.
- After building these caches, the server reads rows from the detail source and performs the joins.
- The server creates the index cache as it reads the master source into the data cache. The server uses the
index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
- To improve Joiner performance, the server aligns all data for the Joiner cache on an eight byte boundary.
Index cache:
#Master rows [(Σ column size) + 16]
Joiner data cache:
#Master rows [(Σ column size) + 8]
Lookup cache:
- When the server runs a lookup transformation, the server builds a cache in memory when it processes the first
row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of memory for each partition.
If two lookup transformations share the cache, the server does not allocate additional memory for the
second lookup transformation.
- The server creates the index and data cache files in the lookup cache directory and uses the server code page
to create the files.
Index cache:
#Rows in lookup table [(Σ column size) + 16]
Lookup data cache:
#Rows in lookup table [(Σ column size) + 8]
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it
would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping
using the mapplet.
You can create a non-reusable instance of a reusable transformation.
Mapplet Objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations
(d) Output transformation
Not supported in mapplets: Joiner, Normalizer, Target definitions.
Types of Mapplets:
(a) Active mapplets
(b) Passive mapplets
Copied mapplets are not instances of the original mapplets. If you make changes to the original, the copy does not inherit
your changes.
You can use a single mapplet more than once in a mapping.
Ports
Default value for an input port: NULL
Default value for an output port: ERROR
Session Parameters
These parameters represent values you might want to change between sessions, such as a DB connection or source file.
We can use a session parameter in a session property sheet, then define the parameters in a session parameter file.
The user-defined session parameters are:
(a) DB connection
(b) Source file
(c) Target file
(d) Reject file
Description:
Use session parameter to make sessions more flexible. For example, you have the same type of transactional data
written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the
databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like
$DBConnectionSource, and use it as the source database connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session.
After it completes set the value to TransDB2 and run the session again.
NOTE:
You can use several parameter together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to
initialize the session.
Session Parameter File
- In the parameter file, we can specify the folder and session name, then list the parameters and variables used in the
session and assign each a value:
Mapping parameters
Mapping variables
Session parameters
You can include parameter and variable information for more than one session in a single parameter file by
creating separate sections for each session within the parameter file.
You can override the parameter file for sessions contained in a batch by using a batch parameter file. A
batch parameter file has the same format as a session parameter file. A small sketch of such a file is shown below.
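A minimal sketch of a session parameter file (the folder, session, connection and parameter names are hypothetical, and the exact section-heading format depends on the PowerCenter version):
[MyFolder.s_m_load_customers]
$DBConnectionSource=TransDB1
$InputFile_customers=/ftp_data/webrep/SrcFiles/customers.txt
$$LastRunTime=2004-01-01 00:00:00
$$CountryFilter=US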
Locale
(a) System locale - system default
(b) Input locale
Mapping Parameters and Variables
(a) Before session: after saving the mapping, we can run some initial tests.
(b) After session:
Metadata Reporter:
-
Web based application that allows you to run reports against repository metadata.
Reports include executed sessions, lookup table dependencies, mappings and source/target schemas.
Repository
Types of Repository
(a) Global Repository
a. This is the hub of the domain. Use the global repository to store common objects that multiple developers can use
through shortcuts. These may include operational or application source definitions, reusable
transformations, mapplets and mappings.
(b) Local Repository
a. A local repository is any repository within a domain that is not the global repository. Use the local repository for
development.
(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to any other repository.
NOTE:
- Once you create a global repository, you cannot change it to a local repository.
Batches
- Provide a way to group sessions for either serial or parallel execution by the server.
Nesting Batches
Each batch can contain any number of session/batches. We can nest batches several levels deep, defining batches
within batches
Nested batches are useful when you want to control a complex series of sessions that must run sequentially or
concurrently
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedules by default. However, we
can configure a batched session to run on its own schedule by selecting the Use Absolute Time session option.
Server Behavior
The server configured to run a batch overrides the servers configured to run the sessions within the batch. If you have
multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if Previous completes and that
previous session fails.
Sequential Batch
If you have sessions with a dependent source/target relationship, you can place them in a sequential batch, so that
the Informatica server can run them in consecutive order.
There are two ways of running sessions under this category:
(a) Run the session only if the previous one completes successfully
(b) Always run the session (this is the default)
Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time.
Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the
sessions separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from running those batches in a
particular order, just like sessions, place them into a sequential batch.
If the session you want to stop is part of a batch, you must stop the batch.
When you issue the stop command, the server stops reading data. It continues processing and writing data
and committing data to targets.
If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar
to the stop command, except that it has a 60 second timeout. If the server cannot finish processing and committing
data within 60 seconds, it kills the DTM process and terminates the session.
Recovery:
-
After a session is stopped/aborted, the session results can be recovered. When recovery is
performed, the session continues from the point at which it stopped.
If you do not recover the session, the server runs the entire session the next time.
Hence, after stopping/aborting, you may need to manually delete targets before the session runs again.
NOTE:
ABORT command and ABORT function, both are different.
When can a Session Fail
-
Session exceeds the maximum no of sessions the server can run concurrently
Server cannot obtain an execute lock for the session (the session is already locked)
Server encounter Transformation row errors (Ex: NULL value in non-null fields)
Performance considerations:
- Performing ETL for each partition, in parallel (for this, multiple CPUs are needed).
- Adding indexes.
- Multiple lookups can reduce the performance. Verify the largest lookup table and tune the expressions.
- At session level, the causes are a small cache size, low buffer memory and a small commit interval.
- At system level:
Hierarchy of optimization:
- Target
- Source
- Mapping
- Session
- System
Source level -
Mapping level -
Session level - concurrent batches, partition sessions.
System level - reduce paging.
Session Process
The Informatica server uses both process memory and system shared memory to perform the ETL process.
It runs as a daemon on UNIX and as a service on Windows NT.
The following processes are used to run a session:
(a) Load Manager process: starts the session,
locks the session,
verifies permissions/privileges.
(b) DTM process:
The primary purpose of the DTM is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is known as buffer
memory. The default memory allocation is 12,000,000 bytes. It creates the main thread, which is called the master
thread; this manages all other threads.
Various threads and their functions:
Master thread -
Mapping thread -
Reader thread -
Writer thread -
Transformation thread -
Note:
When you run a session, the threads for a partitioned source execute concurrently. The threads use buffers
to move/transform data.
Read lock. Created when you open a repository object in a folder for which you do not have write permission.
Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you have write permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server starts a scheduled
session or batch.
Fetch lock. Created when the repository reads information about repository objects from the database.
Save lock. Created when you save information to the repository.
Q: What happens in a database when a cached LOOKUP object is created (during a session)?
The session generates a select statement with an Order By clause. Any time this is issued, the databases like Oracle
and Sybase will select (read) all the data from the table, in to the temporary database/space. Then the data will be
sorted, and read in chunks back to Informatica server. This means, that hot-spot contention for a cached lookup will
NOT be the table it just read from. It will be the TEMP area in the database, particularly if the TEMP area is being
utilized for other things. Also - once the cache is created, it is not re-read until the next running session re-creates it.
Q: Can you explain how "constraint based load ordering" works? (27 Jan 2000)
Constraint based load ordering in PowerMart / PowerCenter works like this: it controls the order in which the target
tables are committed to a relational database. It is of no use when sending information to a flat file. To construct the
proper constraint order: links between the TARGET tables in Informatica need to be constructed. Simply turning on
"constraint based load ordering" has no effect on the operation itself. Informatica does NOT read constraints from the
database when this switch is turned on. Again, to take advantage of this switch, you must construct primary / foreign
key relationships in the TARGET TABLES in the designer of Informatica. Creating primary / foreign key relationships is
difficult - you are only allowed to link a single port (field) to a single table as a primary / foreign key.
What is the method of loading 5 flat files of having same structure to a single target and which transformations
will you use?
This can be handled by using the file list in Informatica. If we have 5 files
in different locations on the server and we need to load them into a single target
table, in the session properties we need to change the file type to Indirect.
(Choose Direct if the source file contains the source data. Choose Indirect if the
source file contains a list of files.
When you select Indirect, the PowerCenter Server finds the file list, then reads
each listed file when it executes the session.)
I take a notepad (text file), give the following paths and filenames in it, and save this notepad as
emp_source.txt in the directory /ftp_data/webrep/:
/ftp_data/webrep/SrcFiles/abc.txt
/ftp_data/webrep/bcd.txt
/ftp_data/webrep/srcfilesforsessions/xyz.txt
/ftp_data/webrep/SrcFiles/uvw.txt
/ftp_data/webrep/pqr.txt
In the session properties I give /ftp_data/webrep/ as the
directory path, the file name as emp_source.txt, and the file type as Indirect.
Other methods to Improve Performance
Optimizing the Target Database
If your session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local
to the Informatica Server.
If your session writes to a relational target, consider performing the following tasks to increase performance:
Drop indexes and key constraints.
Increase checkpoint intervals.
Use bulk loading.
Use external loading.
Turn off recovery.
Increase database network packet size.
Optimize Oracle target databases.
When you write to Oracle target databases, the database uses rollback segments during loads. Make sure that
the database stores rollback segments in appropriate tablespaces, preferably on different disks. The rollback segments
should also have appropriate storage clauses.
You can optimize the Oracle target database by tuning the Oracle redo log. The Oracle database uses the redo
log to log loading operations. Make sure that redo log size and buffer size are optimal. You can view redo log properties
in the init.ora file.
If your Oracle instance is local to the Informatica Server, you can optimize performance by using IPC protocol to
connect to the Oracle database. You can set up Oracle database connection in listener.ora and tnsnames.ora.
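As a rough sketch of such an IPC entry in tnsnames.ora (the alias ORCL_IPC and the key/service name ORCL are hypothetical; the exact entries depend on your Oracle setup, and listener.ora needs a matching IPC address):
ORCL_IPC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = ORCL))
    (CONNECT_DATA = (SERVICE_NAME = ORCL))
  )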
Improving Performance at mapping level
Optimizing Datatype Conversions
Forcing the Informatica Server to make unnecessary datatype conversions slows performance.
For example, if your mapping moves data from an Integer column to a Decimal column, then back to an Integer column,
the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions
from mappings.
Some datatype conversions can improve system performance. Use integer values in place of other datatypes when
performing comparisons using Lookup and Filter transformations.
For example, many databases store U.S. zip code information as a Char or Varchar datatype. If you convert your zip
code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps
increase the speed of the lookup comparisons based on zip code.
Optimizing Lookup Transformations
If a mapping contains a Lookup transformation, you can optimize the lookup. Some of the things you can do to increase
performance include caching the lookup table, optimizing the lookup condition, or indexing the lookup table.
Caching Lookups
If a mapping contains Lookup transformations, you might want to enable lookup caching. In general, you want to cache
lookup tables that need less than 300MB.
When you enable caching, the Informatica Server caches the lookup table and queries the lookup cache during the
session. When this option is not enabled, the Informatica Server queries the lookup table on a row-by-row basis. You
can increase performance using a shared or persistent cache:
Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache
between transformations in the same mapping. You can share a named cache between transformations in the same or
different mappings.
Persistent cache. If you want to save and reuse the cache files, you can configure the transformation to use a
persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a
persistent cache can improve performance because the Informatica Server builds the memory cache from the cache
files instead of from the database.
Reducing the Number of Cached Rows
Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce
the number of rows included in the cache.
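For instance (a minimal sketch; the customers table, status column and value are hypothetical), the lookup SQL override could be limited like this so that only the rows of interest are cached:
select cust_id, cust_name, cust_status
from customers
where cust_status = 'ACTIVE'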
Optimizing the Lookup Condition
If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup
performance.
Indexing the Lookup Table
The Informatica Server needs to query, sort, and compare values in the lookup condition columns. The index needs to
include every column used in a lookup condition. You can improve performance for both cached and uncached lookups:
Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log
contains the ORDER BY statement.
Uncached lookups. Because the Informatica Server issues a SELECT statement for each row passing into the Lookup
transformation, you can improve performance by indexing the columns in the lookup condition.
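For example (a sketch; the index, table and column names are hypothetical), you might index the lookup condition column in the database like this:
create index idx_dim_customer_lkp on dim_customer (customer_id);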
Improving Performance at Repository level
Tuning Repository Performance
The PowerMart and PowerCenter repository has more than 80 tables and almost all tables use one or more indexes to
speed up queries. Most databases keep and use column distribution statistics to determine which index to use to
execute SQL queries optimally. Database servers do not update these statistics continuously.
In frequently-used repositories, these statistics can become outdated very quickly and SQL query optimizers may
choose a less than optimal query plan. In large repositories, the impact of choosing a sub-optimal query plan can affect
performance drastically. Over time, the repository becomes slower and slower.
To optimize SQL queries, you might update these statistics regularly. The frequency of updating statistics depends on
how heavily the repository is used. Updating statistics is done table by table. The database administrator can create
scripts to automate the task.
You can use the following information to generate scripts to update distribution statistics.
Note: All PowerMart/PowerCenter repository tables and index names begin with OPB_.
Oracle Database
You can generate scripts to update distribution statistics for an Oracle repository.
To generate scripts for an Oracle repository:
1. Run the following queries:
select 'analyze table ', table_name, ' compute statistics;' from user_tables where table_name like 'OPB_%'
select 'analyze index ', INDEX_NAME, ' compute statistics;' from user_indexes where INDEX_NAME like
'OPB_%'
This produces an output like the following:
'ANALYZETABLE'   TABLE_NAME          'COMPUTESTATISTICS;'
--------------   -----------------   --------------------
analyze table    OPB_ANALYZE_DEP     compute statistics;
analyze table    OPB_ATTR            compute statistics;
analyze table    OPB_BATCH_OBJECT    compute statistics;
2. Save the output to a file.
3. Edit the file and remove the header information.
4. Run this as an SQL script. This updates repository table statistics.
Microsoft SQL Server
You can generate scripts to update distribution statistics for a Microsoft SQL Server repository.
To generate scripts for a Microsoft SQL Server repository:
1. Run the following query:
select 'update statistics ', name from sysobjects where name like 'OPB_%'
This produces an output like the following:
name
------------------ ------------------
update statistics  OPB_ANALYZE_DEP
update statistics  OPB_ATTR
update statistics  OPB_BATCH_OBJECT
2. Save the output to a file.
3. Edit the file and remove the header information.
Headers are like the following:
name
------------------ ------------------
4. Add a 'go' at the end of the file.
5. Run this as a SQL script. This updates repository table statistics.
Session setting      Default Value              Minimum Suggested Value   Maximum Suggested Value
DTM Buffer Pool      12,000,000 bytes [12 MB]   6,000,000 bytes           128,000,000 bytes
Buffer block size    64,000 bytes [64 KB]       4,000 bytes               128,000 bytes
Index cache          1,000,000 bytes            1,000,000 bytes           12,000,000 bytes
Data cache           2,000,000 bytes            2,000,000 bytes           24,000,000 bytes
Commit interval      10,000 rows                N/A                       N/A
Decimal arithmetic   Disabled                   N/A                       N/A
Tracing Level        Normal                     Terse                     N/A
How to correct and load the rejected files when the session completes
During a session, the Informatica Server creates a reject file for each target instance in the mapping. If the writer or the
target rejects data, the Informatica Server writes the rejected row into the reject file. By default, the Informatica Server
creates reject files in the $PMBadFileDir server variable directory.
The reject file and session log contain information that helps you determine the cause of the reject. You can correct
reject files and load them to relational targets using the Informatica reject loader utility. The reject loader also creates
another reject file for the data that the writer or target reject during the reject loading.
Complete the following tasks to load reject data into the target:
NOTE: You cannot load rejected data into a flat file target
After you locate a reject file, you can read it using a text editor that supports the reject file code page.
Reject files contain rows of data rejected by the writer or the target database. Though the Informatica Server writes the
entire row in the reject file, the problem generally centers on one column within the row. To help you determine which
column caused the row to be rejected, the Informatica Server adds row and column indicators to give you more
information about each column:
Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells
whether the row was marked for insert, update, delete, or reject.
Column indicator. Column indicators appear after every column of data. The alphabetical character indicators
tell whether the data was valid, overflow, null, or truncated.
The following sample reject file shows the row and column indicators:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D
0,D,1,D,April,D,2000,D,1,D,2560974.96,D,2560974.96,D
3,D,1,D,April,D,2000,D,0,D,0.00,D,0.00,D
0,D,1,D,August,D,1997,D,2,D,2283.76,D,4567.53,D
0,D,3,D,December,D,1999,D,1,D,273825.03,D,273825.03,D
0,D,1,D,September,D,1997,D,1,D,0.00,D,0.00,D
Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do
with the row of data.
Table 15-1 describes the row indicators in a reject file:
Table 15-1. Row Indicators in Reject File
Row Indicator   Meaning   Rejected By
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To narrow down the reason why
rows marked 0, 1, or 2 were rejected, review the column indicators and consult the session log.
Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column
indicators appear after every column of data and define the type of the data preceding it.
Column Indicator - Type of data - Writer Treats As
D - Valid data. - Good data.
O - Overflow. Numeric data exceeded the specified precision or scale for the column. - Bad data, if you configured the mapping target to reject overflow or truncated data.
T - Truncated. String data exceeded a specified precision for the column, so the Informatica Server truncated it. - Bad data, if you configured the mapping target to reject overflow or truncated data.
After you correct the target data in each of the reject files, append .in to each reject file you want to load into the
target database. For example, after you correct the reject file, t_AvgSales_1.bad, you can rename it
t_AvgSales_1.bad.in.
After you correct the reject file and rename it to reject_file.in, you can use the reject loader to send those files through
the writer to the target database.
Use the reject loader utility from the command line to load rejected files into target tables. The syntax for reject loading
differs on UNIX and Windows NT/2000 platforms.
Use the following syntax for UNIX:
pmrejldr pmserver.cfg [folder_name:]session_name
Use the following syntax for Windows NT/2000:
pmrejldr [folder_name:]session_name
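For example, on UNIX (a sketch; the folder and session names are hypothetical):
pmrejldr pmserver.cfg myfolder:s_m_load_customers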
Recovering Sessions
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause
of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on
the properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
Run the session again if the Informatica Server has not issued a commit.
Truncate the target tables and run the session again if the session is not recoverable.
Consider performing recovery if the Informatica Server has issued at least one commit.
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row
ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts
processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails,
when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001. The
commit point may be different for source- and target-based commits.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica
Server setup before you run a session so the Informatica Server can create and/or write entries in the
OPB_SRVR_RECOVERY table.
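The recovery behaviour described above amounts to re-reading the source and bypassing everything at or below the last committed row ID. The following Python sketch only simulates that idea and is not Informatica code; the row IDs and the load callback are illustrative.

    # Simulation of the recovery behaviour described above: skip every source
    # row whose ID is <= the last committed row ID, then resume loading.
    def recover_load(source_rows, last_committed_row_id, load):
        """source_rows yields (row_id, row); load writes one row to the target."""
        for row_id, row in source_rows:
            if row_id <= last_committed_row_id:
                continue  # committed before the failure; bypass it
            load(row)

    # Example: 10,000 rows committed before the failure -> resume at row 10,001.
    rows = ((i, {"id": i}) for i in range(1, 20001))
    loaded = []
    recover_load(rows, last_committed_row_id=10000, load=loaded.append)
    print(len(loaded), loaded[0]["id"])  # 10000 10001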
Causes for Session Failure
Reader errors. Errors encountered by the Informatica Server while reading the source database or source files.
Reader threshold errors can include alignment errors while running a session in Unicode mode.
Writer errors. Errors encountered by the Informatica Server while writing to the target database or target files.
Writer threshold errors can include key constraint violations, loading nulls into a not null field, and database
trigger responses.
Transformation errors. Errors encountered by the Informatica Server while transforming data. Transformation
threshold errors can include conversion errors, and any condition set up as an ERROR, such as null input.
Fatal Error
A fatal error occurs when the Informatica Server cannot access the source, target, or repository. This can include loss of
connection or target database errors, such as lack of database space to load data. If a fatal error occurs in a session that uses a Normalizer or Sequence Generator transformation, the Informatica Server cannot update the sequence values in the repository.
What is target load order?
You specify the target load order based on source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.
Can we use aggregator/active transformation after update strategy transformation?
You can use an Aggregator after an Update Strategy transformation. The problem is that once you perform the update strategy, say you have flagged some rows for delete, those rows still flow into the downstream Aggregator; if you use the SUM function, the rows flagged for delete are subtracted from the aggregate result (see the sketch below).
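As a plain-Python illustration of the effect described above (this only mimics the behaviour stated in the answer; it is not Informatica code):

    # Rows flagged for delete by the Update Strategy are subtracted when a
    # downstream Aggregator computes SUM, per the behaviour described above.
    rows = [("insert", 10), ("insert", 20), ("delete", 30)]
    total = sum(v if flag == "insert" else -v for flag, v in rows)
    print(total)  # 0, not the 60 you might expect from adding every value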
How can we join the tables if the tables have no primary and foreign key relation and no matching port to join?
Without a common column or a common data type, we can join two sources using dummy ports:
1. Add a dummy port to each source.
2. In an Expression transformation, assign the constant '1' to each dummy port.
3. Use a Joiner transformation to join the two sources on the dummy ports (set the join condition accordingly).
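Conceptually, giving both sources the same constant dummy value and joining on it produces the cross product of the two row sets. A rough Python equivalent of that idea (not an Informatica mapping; all names are illustrative):

    # Rough equivalent of the dummy-port join: give every row on both sides the
    # same constant key, then join on it, which yields the cross product.
    def add_dummy_port(rows, value=1):
        return [dict(row, dummy=value) for row in rows]

    def join_on_dummy(left, right):
        return [dict(l, **{f"r_{k}": v for k, v in r.items() if k != "dummy"})
                for l in left for r in right if l["dummy"] == r["dummy"]]

    src_a = add_dummy_port([{"name": "A"}, {"name": "B"}])
    src_b = add_dummy_port([{"qty": 10}, {"qty": 20}])
    print(join_on_dummy(src_a, src_b))  # 2 x 2 = 4 joined rows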
In which circumstances does the Informatica server create reject files?
- When it encounters DD_Reject in an Update Strategy transformation.
- When a row violates a database constraint.
- When a field in a row is truncated or overflowed.
When do we use a dynamic cache and when do we use a static cache in connected and unconnected lookup transformations?
We use a dynamic cache only for a connected lookup. We use a dynamic cache to check whether the record already exists in the target table or not, and depending on that we insert, update, or delete the records using an Update Strategy. Static cache is the default cache in both connected and unconnected lookups. If you select a static cache on the lookup table, Informatica will not update the cache and the rows in the cache remain constant. We use this to check the results and also to update slowly changing records.
How to get two targets, T1 containing distinct values and T2 containing duplicate values, from one source S1?
Load T1 through a transformation that removes duplicates, such as a Sorter with the Distinct option or an Aggregator grouping on all ports, and load T2 directly from the source so that it keeps the duplicate values.
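As a plain-Python illustration of the split (the mapping itself would use the transformations described above; here T2 receives the extra occurrences of repeated values):

    # One pass over the source: first occurrences go to T1 (distinct values),
    # repeated occurrences go to T2 (duplicates). Illustrative only.
    def split_distinct_and_duplicates(source_rows):
        seen = set()
        t1_distinct, t2_duplicates = [], []
        for row in source_rows:
            if row in seen:
                t2_duplicates.append(row)  # already seen -> duplicate occurrence
            else:
                seen.add(row)
                t1_distinct.append(row)    # first occurrence -> distinct target
        return t1_distinct, t2_duplicates

    t1, t2 = split_distinct_and_duplicates(["A", "B", "A", "C", "B", "A"])
    print(t1)  # ['A', 'B', 'C']
    print(t2)  # ['A', 'B', 'A']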
How to delete duplicate rows in flat file sources? Is there any option in Informatica?
Use a Sorter transformation; it has a "Distinct" option, make use of it.
Why did you use Update Strategy in your application?
Update Strategy is used to drive the data to be inserted, updated, or deleted depending on some condition. You can do this at the session level too, but there you cannot define any condition. For example, if you want to do both an update and an insert in one mapping, you create two flows and make one insert and one update depending on some condition. Refer to Update Strategy in the Transformation Guide for more information.
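As a rough illustration of the insert-or-update flow mentioned above, the row-level flagging logic is analogous to an Update Strategy expression such as IIF(ISNULL(existing_key), DD_INSERT, DD_UPDATE). The Python below only simulates that decision; the lookup of existing keys and all names are illustrative, not Informatica code.

    # Simulated row flagging: insert if the key is new, update if it already
    # exists in the target.
    DD_INSERT, DD_UPDATE = "insert", "update"

    def flag_rows(source_rows, existing_keys):
        flagged = []
        for row in source_rows:
            flag = DD_INSERT if row["id"] not in existing_keys else DD_UPDATE
            flagged.append((flag, row))
        return flagged

    existing = {1, 2}  # keys already present in the target
    incoming = [{"id": 1, "v": "x"}, {"id": 3, "v": "y"}]
    print(flag_rows(incoming, existing))  # id 1 -> update, id 3 -> insert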
What are the options in the target session of the Update Strategy transformation?
Update as Update:
Update records from the source update the matching rows in the target (the default).
Update as Insert:
This option specifies that all the update records from the source are flagged as inserts in the target. In other words, instead of updating the records in the target, they are inserted as new records.
Update else Insert:
This option enables Informatica to flag the records either for update if they already exist, or for insert if they are new records from the source.
What are the different types of Type 2 dimension mapping?
Type 2:
1. Version number
2. Flag
3. Date (effective date range)
What are the basic needs to join two sources in a source qualifier?
Two sources should have a primary and foreign key relationship.
Two sources should have matching data types.
What are the different options used to configure the sequential batches?
Two options:
- Run the session only if the previous session completes successfully.
- Always run the session.
What are conformed dimensions?
A data warehouse must provide consistent information for queries requesting
similar information. One method to maintain consistency is to create dimension
tables that are shared (and therefore conformed), and used by all applications
and data marts (dimensional models) in the data warehouse. Candidates for
shared or conformed dimensions include customers, time, products, and
geographical dimensions, such as the store dimension.
What are conformed facts?
Fact conformation means that if two facts exist in two separate locations, then
they must have the same name and definition. As examples, revenue and profit
are each facts that must be conformed. By conforming a fact, then all business
processes agree on one common definition for the revenue and profit measures.
Then, revenue and profit, even when taken from separate fact tables, can be
mathematically combined.
Establishing conformity
Developing a set of shared, conformed dimensions is a significant challenge. Any
dimensions that are common across the business processes must represent the
dimension information in the same way. That is, it must be conformed. Each
business process will typically have its own schema that contains a fact table,
several conforming dimension tables, and dimension tables unique to the
specific business function. The same is true for facts.
Degenerate dimensions
Before we discuss degenerate dimensions in detail, it is important to understand
the following:
A fact table may consist of the following data:
- Foreign keys to dimension tables
- Facts, which may be:
  - Additive
  - Semi-additive
  - Non-additive
  - Pseudo facts (such as 1 and 0 in the case of attendance tracking)
  - Textual facts (rarely the case)
  - Derived facts
  - Year-to-date facts
- Degenerate dimensions (one or more)
What is a degenerate dimension?
A degenerate dimension sounds a bit strange, but it is a dimension without
attributes. It is a transaction-based number which resides in the fact table. There
may be more than one degenerate dimension inside a fact table.
Identifying garbage dimensions
A garbage dimension is a dimension that consists of low-cardinality columns
such as codes, indicators, and status flags. The garbage dimension is also
referred to as a junk dimension. The attributes in a garbage dimension are not
related to any hierarchy.
Non-additive facts
Non-additive facts are facts which cannot be added meaningfully across any dimension. Examples include:
- Textual facts: adding textual facts does not result in any number; however, counting textual facts may result in a sensible number.
- Per-unit prices: adding unit prices does not produce any meaningful number.
- Percentages and ratios: adding percentages or ratios does not produce a meaningful result.
- Measures of intensity: measures of intensity, such as room temperature, cannot be added meaningfully across dimensions.
- Averages: adding averages does not produce a meaningful result.
Semi-additive facts
Semi-additive facts are facts which can be summarized across some dimensions but not others. Examples of semi-additive facts include the following:
- Account balances
- Quantity on hand
For example, adding an account balance across the different days of January results in an incorrect balance figure. However, if we average the account balance across each day of the month to find the daily average balance, the result is valid.
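A quick numeric check of the account-balance example, with made-up figures:

    # Made-up daily balances for one account over three days of January.
    daily_balances = [100.0, 120.0, 80.0]
    print(sum(daily_balances))                        # 300.0 -- not a real balance
    print(sum(daily_balances) / len(daily_balances))  # 100.0 -- valid daily average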
event-based fact tables
Event fact tables are tables that record events. For example, event fact tables are used to record events such as Web
page clicks and employee or student attendance. Events, such as a Web user clicking on a Web page of a Web site, do
not always result in facts. In other words, millions of such Web page click
events do not always result in sales. If we are interested in handling such event-based scenarios where there are no
facts, we use event fact tables, which either consist of pseudo facts or contain no facts at all (factless fact tables).
From a conceptual perspective, the event-based fact tables capture the
many-to-many relationships between the dimension tables.
Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository : A repository that functions individually and is unrelated to any other repository.
Global Repository : A centralized repository in a domain. This repository can contain objects shared across the repositories in the domain. The objects are shared through global shortcuts.
Local Repository : A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository : This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test, and deploy metadata into the production environment.
When using a dynamic lookup with a WHERE clause in the SQL override, make sure that you add a filter before the lookup. The filter should remove rows which do not satisfy the WHERE clause.
Reason
During dynamic lookups, while inserting records into the cache the WHERE clause is not evaluated; only the join condition is evaluated. So the lookup cache and the lookup table fall out of sync: records satisfying only the join condition are inserted into the lookup cache. It is better to put a filter before the lookup that applies the WHERE clause condition, so that the cache contains only records satisfying both the join condition and the WHERE clause.
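A rough simulation of that advice in plain Python (not an Informatica mapping; the predicate, key, and data are illustrative): apply the same condition as the SQL override's WHERE clause before rows reach the dynamic lookup, so every row inserted into the cache also satisfies it.

    # The upstream filter applies the same predicate as the lookup's WHERE
    # clause, so rows inserted into the dynamic cache also satisfy it.
    def where_clause(row):
        return row["status"] == "ACTIVE"  # illustrative predicate

    def dynamic_lookup_insert(cache, row, key="id"):
        if row[key] not in cache:
            cache[row[key]] = row         # new row goes into the cache

    cache = {}
    source = [{"id": 1, "status": "ACTIVE"}, {"id": 2, "status": "CLOSED"}]
    for row in filter(where_clause, source):  # the filter before the lookup
        dynamic_lookup_insert(cache, row)
    print(cache)  # only the ACTIVE row was cached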
1. Difference between Filter and Router?
Filter
You can filter rows in a mapping with the Filter transformation. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.
Router
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
As an active transformation, the Router transformation may change the number of rows passed through it. In a Router we can have multiple conditions.
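A plain-Python analogy of the difference (illustrative only, not Informatica code): a filter keeps rows that satisfy one condition and drops the rest, while a router evaluates several conditions and can send rows that match none of them to a default group.

    # Filter: one condition, non-matching rows are dropped.
    def filter_rows(rows, condition):
        return [r for r in rows if condition(r)]

    # Router: several named conditions; rows matching none go to DEFAULT.
    def route_rows(rows, conditions):
        groups = {name: [] for name in conditions}
        groups["DEFAULT"] = []
        for r in rows:
            matched = False
            for name, predicate in conditions.items():
                if predicate(r):
                    groups[name].append(r)
                    matched = True
            if not matched:
                groups["DEFAULT"].append(r)
        return groups

    rows = [{"amount": 5}, {"amount": 50}, {"amount": 500}]
    print(filter_rows(rows, lambda r: r["amount"] > 100))
    print(route_rows(rows, {"small": lambda r: r["amount"] < 10,
                            "large": lambda r: r["amount"] > 100}))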
Why is it said that a dynamic cache cannot be used with a flat file lookup in Informatica?
Nothing gets updated in a dynamic cache other than the cache itself. What happens in the file is a matter of what your mapping does to it, not the cache.
A lookup (dynamic or otherwise) is loaded from a source. The source can be anything you have defined in your environment: flat file, table, whatever.
The difference between a dynamic and a static cache is that in a dynamic cache one of the columns in the source must be identified as the primary key (separate from the lookup key), and it must be numeric. The cache uses the values in that column to figure out what the new key should be when you insert a new row into the cache.
If your flat file does not have such a column, you cannot use it in a dynamic lookup.
Enable Test Load
You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files and performs all pre- and post-session functions, as if running the full session.
The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets.
Number of Rows to Test
Enter the number of source rows you want to test in the Number of Rows to Test field.
You cannot perform a test load on sessions that use XML sources.
Note: You can perform a test load when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.