
Informatica Interview Questions

How to give different names for target tables in the mapping?


We can give the target a different name in the mapping and then override it at the session level. Double-click the session, then the target; there is a parameter called Table Name that we can override there, and it will be picked up from the session, just like a SQL override.

A scenario for the above: inserting rows into table A based on one column (the primary key), while on another path updating the same table A based on a different column as the key.

What is the difference between a star schema and a snowflake schema? When do we use each schema?
Star schema: when the dimension tables contain a small number of rows, we can go for a star schema. In this model the dimension tables are in de-normalized form. Good for data marts with simple relationships …

What is the difference between OLTP and OLAP?

OLTP:
1) Current data; short database transactions; online insert/update/delete; normalization is promoted; high-volume transactions; transaction recovery is necessary
2) Volatile data behavior
3) More users
4) E-R model for data modeling
5) Application oriented

OLAP:
1) Current and historical data; long transactions …
2) Non-volatile data behavior
3) Fewer users
4) Dimensional model for data modeling
5) Subject oriented

Why is the fact table in normal form?


The foreign keys of a fact table are the primary keys of the dimension tables. Since the fact table consists mostly of columns that are primary keys of other tables, it is by construction a normalized table.

Give examples of degenerate dimensions


When a source table has a single column with no descriptive attributes, we do not transfer this column to a dimension table; it is connected directly to the fact table. Such a column is called a degenerate dimension.
Ex: the best examples are "Invoice Number", "Bill Number", and "PO Number". These are all degenerate dimensions in transaction tables; they can be maintained in the fact table itself instead of creating separate dimensions for them.

What is incremental loading?

Incremental loading means loading only the ongoing changes (newly inserted and updated records) from the OLTP system, instead of reloading the entire source every run.
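The idea can be sketched in a few lines of Python (illustrative only; the column name `last_updated` and the in-memory rows are hypothetical stand-ins for a real source table and timestamp column):

```python
from datetime import datetime

def incremental_extract(rows, last_run_time):
    # Keep only rows inserted or updated since the previous run,
    # driven by a last-updated timestamp column.
    return [r for r in rows if r["last_updated"] > last_run_time]

rows = [
    {"id": 1, "last_updated": datetime(2024, 1, 1)},
    {"id": 2, "last_updated": datetime(2024, 1, 5)},
]
delta = incremental_extract(rows, datetime(2024, 1, 3))
# only row 2 changed after the last run, so only it is re-loaded
```

In a real mapping the same filter would live in the Source Qualifier query, typically comparing against a mapping variable such as $$LastRunDate.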

What is surrogate key ? Where we use it explain with examples

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use an Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values for the surrogate key.
It is useful because the natural primary key (i.e. Customer Number in the Customer table) can change, and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is probably better, and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.

Another benefit you can get from surrogate keys (SID) is :

Tracking the SCD - Slowly Changing Dimension.
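A minimal Python sketch of the idea (the AIRPORT_NAME values are hypothetical; in practice a Sequence Generator or database sequence produces the numbers):

```python
from itertools import count

def assign_surrogate_keys(natural_keys):
    # Give each distinct natural key a stable numeric surrogate key,
    # mimicking a sequence generator / identity column.
    seq = count(1)
    mapping = {}
    for nk in natural_keys:
        if nk not in mapping:
            mapping[nk] = next(seq)
    return mapping

keys = assign_surrogate_keys(["JFK", "LHR", "JFK", "SFO"])
# {'JFK': 1, 'LHR': 2, 'SFO': 3}
```

The surrogate stays fixed even if the natural key (AIRPORT_NAME) is later renamed, which is exactly what makes SCD tracking possible.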

What is the difference between an ODS and a staging area?

The ODS (Operational Data Store) is the first point in the data warehouse. It stores near-real-time data from daily transactions as the first instance of the data.

The staging area is the later part, which comes after the ODS. Here the data is cleansed and temporarily stored before being loaded into the data warehouse.

What are active / passive transformations?

Transformations can be active or passive. An active transformation can change the number of rows that
pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.

A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Active transformations

Advanced External Procedure


Aggregator
Application Source Qualifier
Filter
Joiner
Normalizer
Rank
Router
Update Strategy

Passive transformations

Lookup
Expression
Stored Procedure
Sequence Generator
External Procedure
XML Source Qualifier
Mapplet Input
Mapplet Output
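The active/passive distinction can be sketched in plain Python (illustrative only; Informatica transformations are configured, not coded, and the row fields here are hypothetical):

```python
def filter_rows(rows, predicate):
    # Active: the number of output rows can differ from the input.
    return [r for r in rows if predicate(r)]

def add_total(rows, tax_rate=0.1):
    # Passive: every input row passes through; only values change.
    return [{**r, "total": round(r["amount"] * (1 + tax_rate), 2)}
            for r in rows]

rows = [{"amount": 100.0}, {"amount": -5.0}]
kept = filter_rows(rows, lambda r: r["amount"] > 0)  # 1 row out of 2
enriched = add_total(rows)                           # still 2 rows
```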

What is a mapping, session, worklet, workflow, mapplet?


Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other configurations for that
mapping.
Workflow - controls the execution of tasks such as commands, emails and sessions.
Worklet - a workflow that can be called within a workflow.

How do we call shell scripts from informatica?


You can use a Command task to call the shell scripts, in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run
shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

What is the difference between Power Center & Power Mart?


PowerMart:
We can register only local repositories
Partitioning is not available
Does not support ERP sources

PowerCenter:

We can promote repositories to global

Partitioning is available
Supports ERP sources

ETL Testing in Informatica:

1. First check that the workflow exists in the specified folder.


2. Run the workflow. If the workflow succeeds, check that the target table is loaded with the proper data; otherwise analyze the session log for the root cause of the failure and discuss it with the developer.
3. Validate the email configuration, since a notification email should be sent when the workflow fails.

What are the different Lookup methods used in Informatica


A connected Lookup receives input from the pipeline, sends output to the pipeline, and can return any number of values; it does not require a return port.
An unconnected Lookup can return only one column; it contains a return port.

There are mainly 2 types of Lookup transformation:

1) Connected 2) Unconnected

Connected Lookup: 1) receives values directly from the pipeline; 2) can use either a dynamic or a static cache; 3) can return multiple values; 4) supports user-defined default values.

Unconnected Lookup: 1) receives values from a :LKP expression; 2) can use only a static cache; 3) returns only a single value; 4) does not support user-defined default values.
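A rough Python analogy for the two styles (the `customers` table, ids, and field names are invented for illustration; real lookups are configured in the Designer):

```python
# A hypothetical lookup table keyed on customer id.
customers = {101: {"name": "Acme", "city": "Austin"}}

def unconnected_lookup(cust_id, default=None):
    # Like calling :LKP.lkp_customer(cust_id) from an expression:
    # only one value (the designated return port) comes back.
    row = customers.get(cust_id)
    return row["name"] if row else default

def connected_lookup(cust_id):
    # A connected Lookup sits in the pipeline and can feed
    # several output ports at once.
    row = customers.get(cust_id)
    return (row["name"], row["city"]) if row else (None, None)

unconnected_lookup(101)   # 'Acme'
connected_lookup(999)     # (None, None)
```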

What is a source qualifier?


The Source Qualifier is the transformation that converts source data of any supported type into a relational format, so that further operations can easily be applied to the data.

The Source Qualifier is the default transformation.


Through the Source Qualifier transformation Informatica reads the data.
We can filter the data.
We can sort the data.
It is also used to join homogeneous source systems.
We can join any number of sources in a single Source Qualifier.
We cannot join flat files in the Source Qualifier, because flat files are heterogeneous; when we open a flat file in the Source Qualifier, all of these options are disabled.

Informatica architecture:

Client tools:
Repository Manager -> create, modify, and delete folders; manage privileges and access to the Repository Service
PowerCenter Designer -> source and target definitions, mappings, mapplets
Workflow Manager -> create tasks and connect them into workflows
Workflow Monitor -> display the results/output of workflow runs

PowerCenter services:
PowerCenter (PC) Service <---> repository database
PC Server <---> sources and targets

What are the 2 modes of data movement in the Informatica Server?

a) Unicode - the Integration Service allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters)
b) ASCII - the Integration Service holds all data in a single byte

These data movement modes should not be confused with the two target load types:


1) Normal mode, in which a separate DML statement is prepared and executed for every record.
2) Bulk mode, in which one DML statement is prepared and executed for many records at once, which improves performance.
Normally, if there are no data constraints (key constraints) on the target we use bulk mode; otherwise we use normal mode.

Data Loading Modes


1. Normal mode :- a commit is executed after every 10,000 records (by default).
2. Bulk mode :- a commit is executed after all records have been loaded.

$ & $$ in mapping or parameter file


$ denotes a session parameter, e.g. $DBConnection
$$ denotes a mapping parameter/variable, e.g. $$LASTRunDate

What are sessions and batches?


Session: a session is a set of instructions that tells the server how to move data to the target.
Batch: a batch is a set of one or more tasks (sessions, event waits, emails, commands, etc.).
There are two types of batches in Informatica:
1. Sequential: the tasks run one after another from source to target.
2. Concurrent: the tasks run simultaneously from source to target.

Which is better among incremental load, normal load, and bulk load?

Incremental load:
Incremental means that if today we processed 100 records, in tomorrow's run we extract only whatever records were newly inserted or updated after the previous run, based on a last-updated timestamp. This process is called incremental or delta loading.
Normal load:
In a normal load we process the entire source data into the target with constraint-based checking.
Bulk load:
In a bulk load we process the entire source data into the target without checking constraints.

Which transformation do you need when using COBOL sources as source definitions?
The Normalizer transformation, which is used to normalize the data.

What is the DTM (Data Transformation Manager)?


After the Load Manager performs validations for the session, it creates the DTM process, which is the second process associated with the session run. The primary purpose of the DTM process is to create and manage the threads that carry out the session tasks; the DTM also allocates process memory for...
Where do we use Teradata utilities (FastLoad, MultiLoad, TPump, FastExport) in Informatica?
There are 2 things you need to know here.
1) Sources / targets (Teradata)
You can go to the Source Analyzer and the target import tool in the Designer and simply import tables as sources or targets via a Teradata ODBC connection. This will allow you to use them and build your mappings as you deem fit.

2)Executing workflow with Teradata Utilities.


a) For Sources
for sources you can only use a RELATIONAL CONNECTION which has no link with teradata utilities so
its pretty normal way of doing.

b) For targets
To load any target table you have the option of creating either a "FastLoad connection", a "MultiLoad connection", or a "TPump connection" from the Workflow Manager > Connections > Loader > New... menu.

To configure these connections you may want help from someone with Teradata knowledge (TDPID and so on).

briefly about connections,


TDPID = <bla bla>cop1 in your HOSTS
DATABASE NAME = database which contains tables
ENV SQL = can leave empty
DATA SOURCE NAME = DSN defined to connect to TD

Informatica does not do a bulk load


to Teradata; it loads row by row (check the session log to confirm this). To load into Teradata we have to use the external loader option, either FastLoad or MultiLoad. There is another issue when using flat files: the Informatica process will not load delimited files to Teradata, only fixed-length files (you can find this in the documentation), so you need an Expression transformation or some other object to convert the CSV file to a fixed-length file, and then use the external loader to load it into Teradata. The whole process is very simple, because you do not have to write any FastLoad or MultiLoad scripts; Informatica generates them for you. The key is to convert the CSV file to a fixed-length file and then use the external loader to load it into Teradata, and the job will run in 5 to 10 minutes.

Teradata has 3 different types of Teradata loader processes, as follows:


Fastload
Mload
Tpump
Each loader process can be used in two different modes, as follows:
Staged Mode: The Informatica process does the following in this order:
Reads from the source data.
Creates a data file.
Invokes the loader process to load the table using the data file created.

Advantages: In the event of failures, you can recover using the Teradata recovery process.
Disadvantages: Staged mode is slower than Piped mode, and you need more disk space, as it can create
large data files.

Piped Mode: The Informatica process reads from the source and simultaneously pipes that data to the
loader to start loading the target table.

Advantages: Quicker than Staged mode, and you do not require large amounts of disk space because no
data files are created.
Disadvantages: In the event of failures, you cannot recover using the Teradata recovery process (because
tpump does row commits unlike fastload and mload).

FastLoad

You use the Fastload process on empty tables, such as loading staging tables and in initial loads where the
tables are empty.
When the Fastload process starts loading, it locks the target table, which means that processes (for
example, lookups) cannot access that table. One solution to this problem is to specify dummy SQL for the
look up overrides at the session level.
TIP: If a session fails during a FastLoad process, use SQL Assistant to run a simple SQL command (for example, count(*)) to determine whether the table is locked by a FastLoad process.
If a table is locked (for example, W_ORG_DS), use the following script to release the lock:
LOGON SDCNCR1/Siebel_qa1,sqa1;
BEGIN LOADING Siebel_qa1.W_ORG_DS
ERRORFILES Siebel_qa1.ET_W_ORG_DS,siebel_qa1.UV_W_ORG_DS;
END LOADING;
If you save the above text in a file called test.ctl, you would run this process by entering the following
command at a command prompt:
C:\fastload\test.ctl
TIP: To create a load script for a table, edit the test.ctl script above to change the login information, and
replace all occurrences of W_ORG_DS with the required target table name.
After a load process script runs successfully, you should be able to run the command 'select count(*)' on the target table. If you cannot release the lock, you might need to drop and re-create the table to remove the lock. If you do so, you must re-create the statistics.
TIP: Fastload is typically used in piped mode to load staging tables and initial loads. In the event of
errors, reload the entire data.
Mload

The Mload process is slower than Fastload but quicker than Tpump. The Mload process can work on both
empty tables as well as on tables with data. In the event of errors when running in piped mode, you cannot
recover the data.
Tpump

The TPump process is slower than MLoad but faster than ODBC. The TPump process does row commits, which enables you to recover processed operations, even if you use piped mode. In other words, if you restart the process, TPump starts loading data from the last committed data.
Tpump can be used in the following modes:
Tpump_Insert : Use to do inserts.
Tpump_Update : Use to do updates (this mode requires you to define the primary key in the Informatica
target table definition).
Tpump_Upsert : Use to do update otherwise insert (this mode requires you to define the primary key in
the Informatica target table definition).
Tpump_Delete: Use to do deletes (this mode requires you to define the primary key in the Informatica
target table definition).
Informatica uses the actual target table name to generate the error table and log table names used in its control file generation. If you have two instances of TPump loading into the same target table at the same time, you need to modify the session to use a different error table and log table name.
The Tpump load process in piped mode is useful for incremental loads, and where the table is not empty.
In the event of errors, restart the process and it starts re-loading from the last committed data.
Refer to Informatica documentation for information about configuring a session to use Teradata loaders.

What is Check In and Check Out in Informatica?

In Informatica we use check in and check out as a versioning tool. Whenever we want to edit any mapping/mapplet/workflow/session in the repository, we need to check it out first. Once we check it out, it becomes editable and we can implement our changes; once all the changes are done we can save it.
When we are done with the changes we check the object in, so that other users can also see what changes we have made.

Note: it is good practice to enter comments for every check in and check out. This lets other users know the purpose of your change.

What is a Debugger in Informatica and when to use it?

A debugger is used to troubleshoot errors in an Informatica mapping, either before running a session or after saving the mapping and running the session. To debug a mapping, first we need to configure the debugger and then run it within the Mapping Designer.
The Debugger makes use of the existing session or creates a debug session of its own to debug the
mapping.
Debugging can be done in either of the below situations.
Before running the session: once you are done with the mapping, you can debug it before creating the session to check the initial results.

After you run a session: when you encounter errors while running the session, you can go to the mapping and start the debugger with the existing session.

How to move mappings from Development folder to Production folder in Informatica

We can move objects from the Development folder to the Production folder in Informatica using either of two methods. Before making any changes to the Production folder, make sure you take a backup of all the objects.
Export and import
Export the mappings that you want to move from the DEV folder and save them as XML in some folder.
Take a backup of the Production mappings before replacing them.
Import these XML files into the Production folder.
Save the mappings.
When doing this, we need to check the following:
1. Check the Replace check box if the source or target is already present in the folder.
2. For other reusable transformations, such as source shortcuts, target shortcuts, or any other transformations (Sequence Generator, Lookup, etc.), choose Reuse (not Replace).
3. In the Global Copy options, in the conflict resolution wizard, select (or mark) Retain Sequence Generator, Normalizer or XML key current values.

Direct migration:
If Development and Production are in separate repositories, go to the Repository Manager and connect to the Development repository; then open the Production repository too. You can then drag and drop the folders from Dev to Production.
Note:
Problems can arise when we export and import objects separately, i.e. mapping, workflow, etc. The big problem is shortcuts. In this case:
1. Open the Development folder from the Repository Manager.
2. Select only the workflows related to the mapping and export just the workflow XML. This will take all the associated objects (mappings, sessions, etc.) along with the workflows.
3. Import this single file into your Prod environment.
This will export and import everything related to a mapping.

DAC Process Cycle


DAC is used to design, execute, monitor and diagnose execution plans
Setup: Database connections, ETL setup in Informatica
Design: Define Tasks, Groups and Execution plans
Execute: Define parameters, schedule and run the execution plans
Monitor: Monitoring the run time executions
Diagnose: in case of task failure, identifying the root cause and rerunning the task.
Test cases for data loading in Informatica
1) Check that the source and target database connections are fine and there are no issues accessing the source data. The database connections should be defined in the parameter file.

2) If it is a full load, check that the truncate option is enabled and working correctly. For Type 2 (SCD Type 2), the truncate option should be disabled.
3) Check the performance of the session while loading the data. This includes checking the threshold value once the data is loaded; performance becomes important when the number of records is huge.

4) Set Stop on Errors to 1 in error handling. This indicates how many non-fatal errors (reader, writer, and DTM errors) the Integration Service can encounter before it stops the session. Enter the number of non-fatal errors you want to allow before stopping the session; this will stop the workflow when Informatica encounters any error.

5) Check the option to fail the parent if the task fails. "Fail parent if this task fails" should be checked for all the sessions within a workflow.

6) Check that the logs are updated properly after the data load completes. This includes the session log and the workflow log.

7) Check that the mapping parameters and workflow parameters used in the mapping are defined correctly.
8) Compare the stage and target table counts:
Select Count(*) From Product_D
Select Count(*) From Product_Ds

9) Compare the attributes of the stage tables to those of the target tables; they should match.

Dynamic and Static Cache in Informatica

In the Informatica Lookup transformation we have the option to cache the lookup table (a cached lookup). If we do not use the lookup cache, it is called an uncached lookup.
In an uncached lookup we query the base table directly and return output values based on the lookup condition: if the lookup condition matches, the value is returned from the lookup table; if it is not satisfied, either NULL or the default value is returned. This is how an uncached lookup works.

Now we will see what a Cached Lookup is!


In Cached lookup the Integration Service creates a Cache whenever the first row in the Lookup is
processed. Once a Cache is created the Integration Service always queries the Cache instead of the Lookup
Table. This saves a lot of time.

Lookup Cache can be of different types like Dynamic Cache and Static Cache

What is a Static Cache?


The Integration Service creates a static cache by default when it builds a lookup cache. With a static cache, the Integration Service does not update the cache while it processes the transformation; this is why it is called static.

A static cache behaves like any cached lookup: once the cache is created, the Integration Service always queries the cache instead of the lookup table.
With a static cache, when the lookup condition is true the value is returned from the lookup table; otherwise NULL or the default value is returned.
The important point about a static cache is that you cannot insert into or update the cache.

What is a Dynamic Cache?


In Dynamic Cache we can insert or update rows in the cache when we pass the rows. The Integration
Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The
dynamic cache is synchronized with the target.
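A small Python sketch of what "dynamic" means here (the `id`/`name` fields are invented; the returned action plays the role of the NewLookupRow indicator that drives the target insert/update):

```python
def dynamic_lookup(cache, row, key="id"):
    # Insert the row when the key is missing, update the cache when
    # the cached row differs, and report the action taken.
    k = row[key]
    if k not in cache:
        cache[k] = dict(row)
        return "insert"
    if cache[k] != row:
        cache[k] = dict(row)
        return "update"
    return "no_change"

cache = {}
dynamic_lookup(cache, {"id": 1, "name": "Acme"})        # 'insert'
dynamic_lookup(cache, {"id": 1, "name": "Acme Corp"})   # 'update'
dynamic_lookup(cache, {"id": 1, "name": "Acme Corp"})   # 'no_change'
```

Because the cache is updated as rows pass through, it stays synchronized with the target, which is exactly what a static cache does not do.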

1. Connected or unconnected:
They differ in the way input is received. A connected Lookup receives input through the pipeline, whereas an unconnected Lookup receives input values from the result of a :LKP expression in another transformation.
2. Lookup via Flat File or Relational:
After creating Lookup Transformation we can lookup either on a Flat file or on relational tables.
When we do Lookup on Relational tables we have to connect to the required table which will be there in
the source list of Lookup. If it’s not there then we need to import the table definition for the Lookup
Transformation
When we do Lookup on Flat files the Designer invokes the Flat File Wizard and connects the source.
3. Cached or uncached:
The lookup cache can be of two types:
1. Dynamic cache: the Integration Service inserts or updates rows in the cache as rows pass through. This improves session performance and speeds up the activity.
2. Static cache: by default the lookup cache is static and does not change during the entire session.

What Actually is a Persistent Cache?

Normally a lookup is cached by default, which means that when we do a lookup on a table, Informatica goes to the lookup table and stores the data in a cache file; this avoids re-querying the table whenever we need the data again. Informatica makes use of the cache file, and this makes the lookup much faster.

But why do we need a persistent cache in a lookup? To use one, we check the "Lookup cache persistent" option in the Lookup transformation. When we do that, Informatica stores the cache file and does not delete it after the session or workflow run.

This becomes handy in situations where we use the same lookup in many mappings. Suppose we use the lookup LKP_GET_VALUE, with the same lookup condition and the same return and output ports, in 10 different mappings. If we do not use a persistent cache, we have to build the lookup cache 10 times, and if the table is huge it takes some time to build. This can be avoided by using a persistent cache.

Types of Lookup Caches in Informatica

Lookup Caches in Informatica


Static cache
Dynamic cache
Shared cache
Persistent cache
Static cache
A static cache behaves like any cached lookup: once the cache is created, the Integration Service always queries the cache instead of the lookup table.
When the lookup condition is true, the value is returned from the lookup table; otherwise NULL or the default value is returned. The important point about a static cache is that you cannot insert into or update the cache.

Dynamic cache
In Dynamic Cache we can insert or update rows in the cache when we pass the rows. The Integration
Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The
dynamic cache is synchronized with the target

Shared cache
With a shared cache, the Informatica server builds one cache that multiple Lookup transformations can use; once the cache is built for the first lookup, the other Lookup transformations reuse the same cache instead of building their own.
We can share the lookup cache between multiple transformations: an unnamed cache is shared between transformations in the same mapping, and a named cache between transformations in the same or different mappings.

Persistent cache
With a persistent cache, the Informatica server saves the lookup cache files when it processes the Lookup transformation and reuses them the next time. The Integration Service saves or deletes lookup cache files after a successful session run depending on whether the lookup cache is marked as persistent.
To make a lookup cache persistent, you need the following settings:
Lookup cache persistent: checked
Cache File Name Prefix: the named persistent cache file name
Re-cache from lookup source: check this only when the persistent cache must be rebuilt from the source

Recache from database


If the persistent cache is not synchronized with the lookup table you can configure the lookup
transformation to rebuild the lookup cache.

Difference between Decode and IIF in INFORMATICA

DECODE can be used in a SQL SELECT statement; IIF cannot be used in a SELECT statement.


DECODE gives better readability to users when testing one value against many alternatives; deeply nested IIFs do not read as well.
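DECODE's pair-matching semantics can be mimicked in a short Python sketch (the function and sample values are hypothetical; this is only an analogy for how search/result pairs plus a default are evaluated):

```python
def decode(value, *args):
    # Compare value against search/result pairs in order; a trailing
    # odd argument acts as the default (None when absent).
    args = list(args)
    default = args.pop() if len(args) % 2 else None
    for search, result in zip(args[::2], args[1::2]):
        if value == search:
            return result
    return default

decode(2, 1, "Low", 2, "Medium", 3, "High", "Unknown")   # 'Medium'
decode(9, 1, "Low", 2, "Medium", 3, "High", "Unknown")   # 'Unknown'
```

The equivalent IIF would need three nested calls, which is why DECODE reads better as the number of alternatives grows.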

Setting the Date/Time Display Format in Informatica Workflow Manager

The date and time format shown in the Workflow Manager is the one set in the Windows Control Panel of the PowerCenter Client machine. You can modify it using the steps below:
Go to Control Panel
Click on Regional Settings
Set the date and time format

How to run Informatica Workflow using Unix command


To run the workflow, first go to the command line and change to the folder where the executable is installed. You can usually find the pmcmd command in the \server\bin directory.

Syntax:
pmcmd startworkflow -sv <Integration Service Name> -d <Domain Name> -u <Integration Service
Username> -p <Password> -f <Folder Name> <Workflow>

pmcmd startworkflow -sv qpd_Service -d qpd_Domain -u Administrator -p admin -f Alex wf_sales_tax

Before that, we need to configure the environment variables and make the necessary changes.

Go to Control Panel -> System -> Advanced -> Environment Variables -> System variables, add a new system variable, and set its value to the path on your server machine where Informatica is installed (where the pmcmd.exe file is present).

Then, in the same System variables list, find the PATH variable, add a ";" at the end, and append that path.

Transaction Control Transformation in Informatica

Informatica PowerCenter allows us to control rollback and commit on transactions based on the set of rows that passes through the Transaction Control transformation. This lets you define whether a transaction should be committed or rolled back based on the rows that pass through, such as on an entry date or some other column. We can control this either at the mapping level or at the session level.

1)Mapping Level:
Inside the mapping we will be using Transaction Control transformation. And inside this transformation
we have an expression. Based on the return value of this expression we decide whether we have to commit,
roll back, or continue without any transaction changes.The transaction control expression uses the IIF
function to test each row against the condition.
Use the following syntax for the expression:
IIF (condition, value1, value2)
Use the following built-in variables in the Expression Editor when you create a transaction control
expression:
TC_CONTINUE_TRANSACTION
The Integration Service does not perform any transaction change for this row. This is the default value of
the expression.
TC_COMMIT_BEFORE
The Integration Service commits the transaction, begins a new transaction, and writes the current row to
the target. The current row is in the new transaction.
TC_COMMIT_AFTER
The Integration Service writes the current row to the target, commits the transaction, and begins a new
transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE
The Integration Service rolls back the current transaction, begins a new transaction, and writes the current
row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER
The Integration Service writes the current row to the target, rolls back the transaction, and begins a new
transaction. The current row is in the rolled back transaction
2) Session level:
When we run a session, the Integration Service evaluates the expression for each row; when it finds a commit row, it commits all rows in the transaction to the target table. When it evaluates a rollback row, it rolls back all rows in the transaction from the target or targets.
We can also configure a user-defined commit here in case the Integration Service fails to do so.
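The commit-on-column-change pattern can be sketched in Python (illustrative only; `entry_date` is the hypothetical driving column, and the returned groups stand in for committed transactions):

```python
def split_transactions(rows, key="entry_date"):
    # Whenever the driving column changes (TC_COMMIT_BEFORE), commit
    # the open transaction and start a new one; otherwise the row
    # continues the current transaction (TC_CONTINUE_TRANSACTION).
    transactions, current, last = [], [], object()
    for row in rows:
        if current and row[key] != last:
            transactions.append(current)  # commit before this row
            current = []
        current.append(row)
        last = row[key]
    if current:
        transactions.append(current)      # final commit at end of data
    return transactions

rows = [{"entry_date": "2024-01-01"},
        {"entry_date": "2024-01-01"},
        {"entry_date": "2024-01-02"}]
split_transactions(rows)  # two transactions: 2 rows, then 1 row
```

The real transaction control expression would be IIF(entry_date != $$PrevDate, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION), with $$PrevDate tracked as a mapping variable.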

What is Surrogate key

A surrogate key is the primary key for a dimension table. A surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.


Source Qualifier Query is not working when I run the workflow?


Check the following when you encounter an error in a Source Qualifier:
Always validate the source query before saving it and running the session; there may be a syntax error.
Check that the database login in the Source Qualifier and in the corresponding session are the same.
Test the SQL query in a client tool such as SQL Developer and confirm that the data is correct.
Check that all joins in the Source Qualifier are proper.
Make sure that all the ports in the select list of the Source Qualifier are connected to the next level.
The order in which the columns are given in the Source Qualifier query should match the order in the Ports
tab.
Review the session log file if you need further information.
How to improve session using Joiner Transformation?
In order to improve the session performance check the below things
1.Use Sorted Input to the Joiner
Use sorted input to the Joiner transformation for improving the session performance. This reduces the
time taken by the Integration Service.
2.Unsorted Data
For an unsorted Joiner transformation, use the source with fewer rows as the master source and the other
as detail.
3.Sorted Data
For a sorted Joiner transformation, use the source with fewer duplicate key values as the master source and
the other as detail.
4.Use Database Join
In some cases we cannot join at the session level; in that case we need to join at the database level. Joining
at the database level improves the session performance. For joining at the database level:
Create stored procedure to perform the join
Use the Source Qualifier transformation to perform the join.

How to do Top-N analysis in Oracle


select ROWNUM as RANK, ename,sal from(select ename,sal from emp ORDER BY sal DESC) WHERE
ROWNUM<=3;
RANK ENAME SAL
---------- ---------- ----------
1 KING 5000
2 SCOTT 3000
3 FORD 3000

select ROWNUM as RANK, ename,sal from(select ename,sal from emp ORDER BY sal) WHERE
ROWNUM<=3;
RANK ENAME SAL
---------- ---------- ----------
1 SMITH 800
2 JAMES 950
3 ADAMS 1100
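The same Top-N logic can be sketched in Python (sample data assumed): sort first, then take the first N rows, which mirrors the inner ORDER BY followed by the outer ROWNUM filter.

```python
# Illustrative sketch of the Top-N pattern above, with assumed sample data.
emp = [("SMITH", 800), ("JAMES", 950), ("ADAMS", 1100),
       ("FORD", 3000), ("SCOTT", 3000), ("KING", 5000)]

# Top 3 salaries: order descending first, then keep the first three rows,
# the same order of operations as the subquery ORDER BY plus ROWNUM <= 3.
top3 = sorted(emp, key=lambda r: r[1], reverse=True)[:3]
print(top3)  # KING (5000) first, then the two 3000 earners
```

Note that, as in the ROWNUM version, ties at the cutoff are resolved arbitrarily; an analytic function such as DENSE_RANK would be needed to keep all tied rows.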
SCD Type 1,Slowly Changing Dimension Use,Example,Advantage,Disadvantage

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In
other words, no history is kept.
In our example, recall we originally have the following table:

Customer Key Name State


1001 Williams New York

After Williams moved from New York to Los Angeles, the new information replaces the old record, and
we have the following table:
Customer Key Name State
1001 Williams Los Angeles
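The Type 1 overwrite can be sketched in a few lines of Python (hypothetical dictionary-based dimension, purely illustrative):

```python
# Minimal sketch of SCD Type 1: the new value simply overwrites the old one,
# keyed by the surrogate key; no history row is written anywhere.
dimension = {1001: {"name": "Williams", "state": "New York"}}

def scd_type1_update(customer_key, **changes):
    dimension[customer_key].update(changes)   # overwrite in place

scd_type1_update(1001, state="Los Angeles")
print(dimension[1001]["state"])  # Los Angeles -- "New York" is gone for good
```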

Advantages
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep
track of the old information.
Disadvantages
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in
this case, the company would not be able to know that Williams lived in New York before.
Usage
About 50% of the time.

When to use Type 1


Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep
track of historical changes.

SCD Type 3,Slowly Changing Dimension Use,Example,Advantage,Disadvantage

In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of
interest, one indicating the original value, and one indicating the current value. There will also be a column
that indicates when the current value becomes active.
In our example, recall we originally have the following table:

Customer Key Name State
1001 Williams New York

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
Customer Key Name Original State Current State Effective Date
After Williams moved from New York to Los Angeles, the original information gets updated, and we have
the following table (assuming the effective date of change is February 20, 2010):

Customer Key Name Original State Current State Effective Date
1001 Williams New York Los Angeles 20-FEB-2010
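A minimal Python sketch of the Type 3 update (hypothetical column names, for illustration only):

```python
# SCD Type 3: one extra column pair (original/current) plus an effective date,
# so only the immediately previous value is preserved.
row = {"customer_key": 1001, "name": "Williams",
       "original_state": "New York", "current_state": "New York",
       "effective_date": None}

def scd_type3_update(row, new_state, effective_date):
    # original_state keeps its first value; current_state is overwritten
    row["current_state"] = new_state
    row["effective_date"] = effective_date

scd_type3_update(row, "Los Angeles", "20-FEB-2010")
print(row["original_state"], "->", row["current_state"])  # New York -> Los Angeles
```

A second call with a newer state would overwrite "Los Angeles", which is exactly the limitation described under Disadvantages.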

Advantages
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if
Williams later moves to Texas on December 15, 2013, the Los Angeles information will be lost.
Usage
Type 3 is rarely used in actual practice.

When to use Type 3


Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to
track historical changes, and when such changes will only occur a finite number of times.

The important points/enhancements in OBIEE 11g compared to OBIEE 10g are listed below:
OBIEE 11g uses WebLogic Server as the application server, as compared to Oracle AS or OC4J in OBIEE
10g.
The clustering process is much easier and automated in OBIEE 11g.
We can now model lookup tables in the repository.
The new UI called Unified Framework now combines Answers, Dashboards, and Delivers.
A new column type called the hierarchical column is introduced.
BI Publisher is fully and seamlessly integrated with OBIEE 11g.
New time series functions PERIOD ROLLING and AGGREGATE AT are introduced.
In OBIEE 11g we can create KPIs to represent business metrics.
The aggregate persistence wizard creates indexes automatically.
The session variables get initialized when they are actually used in OBIEE 11g unlike OBIEE 10g where
they were initialized as soon as a user logs in.
OBIEE 11g now supports Ragged (Unbalanced) and Skipped Hierarchy.
You can also define Parent-Child hierarchy in OBIEE 11g as well.
SELECT_PHYSICAL command is supported in OBIEE 11g.
In OBIEE 11g there are some changes in the terminology as well.
iBots are renamed as Agents.
Requests are renamed as Analyses.
Charts are renamed as Graphs.
Presentation Columns are renamed as Attribute Columns.

Types of Tasks in Informatica


There are different types of tasks in the Informatica Workflow Manager which we use while running a
workflow; they are listed below.

Assignment Used to assign a value to a workflow variable


Command Used to run a shell command during the workflow
Control Used to stop or abort the workflow
Decision Specifies a condition to evaluate
Email Used to send email during the workflow
Event-Raise Notifies the Event-Wait task that an event has occurred
Event-Wait Waits for an event to complete in order to start the next task
Session Used to run a mapping created in the Designer by linking it to the session
Timer Waits for an already timed event to start

Session Parameters in Informatica


Session parameters, like mapping parameters, represent values you might want to change between
sessions, such as a database connection or source file. Use session parameters in the session properties, and
then define the parameters in a parameter file. You can specify the parameter file for the session to use in
the session properties. You can also specify it when you use pmcmd to start the session. The Workflow
Manager provides one built-in session parameter, $PMSessionLogFile. With $PMSessionLogFile, you can
change the name of the session log generated for the session. The Workflow Manager also allows you to
create user-defined session parameters.

Naming Conventions for User-Defined Session Parameters


Parameter Type Naming Convention
Database Connection $DBConnectionName
Source File $InputFileName
Target File $OutputFileName
Lookup File $LookupFileName
Reject File $BadFileName

How to remove Null values using Filter Transformation?


In the Filter transformation you can filter out rows having null values and spaces by using the ISNULL and
IS_SPACES functions.
For example, to filter out rows that contain null in the EMP_NAME column, use the below condition:
IIF(ISNULL(EMP_NAME), FALSE, TRUE)
This condition says that if the employee name is null then discard the row, else pass it to the next
transformation.
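The same filter logic can be sketched in Python (assumed row layout; `keep` plays the role of the filter condition, combining the ISNULL and IS_SPACES checks):

```python
# Sketch of the Filter condition: drop rows whose EMP_NAME is null (None)
# or contains only spaces; pass everything else to the next transformation.
rows = [{"EMP_NAME": "Smith"}, {"EMP_NAME": None},
        {"EMP_NAME": "   "}, {"EMP_NAME": "Jones"}]

def keep(row):
    name = row["EMP_NAME"]
    # equivalent of NOT ISNULL(...) AND NOT IS_SPACES(...)
    return name is not None and name.strip() != ""

passed = [r for r in rows if keep(r)]
print([r["EMP_NAME"] for r in passed])  # ['Smith', 'Jones']
```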

Reusable Transformations in InformaticaWhat is reusable transformation?


Reusable transformations are transformations that can be reused in multiple mappings or mapplets within
a folder, repository, or repository domain. They are developed using the Transformation Developer.
Instead of creating a new transformation, we can make a transformation reusable and use different
instances of the same reusable transformation in different mapplets and mappings. There is no limit to the
number of reusable transformations you can use.

Conformed Dimension in Obiee


The best definition of a conformed dimension is that it is a dimension which is consistent across the
whole business and can be linked with all the facts to which it relates. The best example is the Date
dimension, because its attributes (day, week, month, quarter, year, etc.) have the same meaning across all
the facts. The month January will be the same for all the departments in an organization. Unless some
departments operate on a different fiscal calendar from the rest of the organization, we can consider the
date dimension conformed.

All dimensions in your warehouse need to be conformed to get the full power of a data warehouse.
Below are some of the commonly used Conformed Dimensions:

Customer
Product
Date/Time
Employee
Account
Region /Territory
Vendor

Using the Copy As Command in Informatica Designer


In Informatica, once a mapping is created, if we want to overwrite that same mapping we should first
create a copy of it as a backup. To create a copy we use the Copy As command. When we copy a mapping
using Copy As, another mapping identical to the original is created. This reduces the complexity of
duplicating a mapping.
To use the Copy As command to copy a mapping:
Open a mapping in the Mapping Designer
Click Mappings > Copy As.
It will prompt for entering a new name. Enter a new mapping name
Click OK.

Regarding the error, it can be anything


1) Data type issues, especially while loading BLOB and CLOB columns
2) Code page setting of source and target to handle special characters.
Assumptions:
1) You have proper source and target tables in place
2) You have configured connections properly.
3) No network glitch
4) No space issue in the target...

About the code page setting -- it depends on the data you try to load. If it contains a lot of special
characters we prefer to load via UTF-8, as some code pages do not support them. Your Informatica
connection code page should match the source or target code page.

Which type of join is not possible in source qualifier?


>A heterogeneous join and a full outer join are not possible in the SQ.
>If you use a SQL override then all joins are possible, because the query executes in the database. In
Informatica itself, a full outer join is not possible in the SQ using the user-defined join tab; only equi, left
outer, and right outer joins are.

How to fail a session immediately when the first error occurs on log?
Click on the session --> Config Object --> Stop on errors = 1.
Stop on errors means the running session will stop/fail whenever it encounters any error. If you specify '1'
here, the session will stop on the very first error without processing any further data, and hence will fail.

why Update Strategy is active in T/R?


It is active because the DD_REJECT clause can restrict the number of rows output from the transformation;
any transformation that can change the number of rows is an active transformation.

The DD_DELETE property deletes the row from the table, hence decreasing the number of rows; this is
another reason the Update Strategy is active.

What is d difference between cdc and scd??


Change data capture is used to incrementally extract changed or new records from a source, so that you
don't download the whole database each time, while a slowly changing dimension is a way to apply
updates to a target so that the original data is preserved.
CDC refers to capturing the changes.
SCD refers to keeping and maintaining slowly changing dimensions.
If someone uses both for the same purpose then there is no difference; in your scenario, CDC and SCD will
be the same, except SCD is only for dimensions and CDC is for facts and/or other tables.

Q)What are the reasons why target bottleneck occurs? can any one give me at least 5 reasons?

1. Drop indexes and key constraints. We can drop and rebuild the indexes in the pre- and post-session.

2. Increase the checkpoint intervals.

3. Use External loading (SQL*Loader bulk loads to target files)

4. Increase database network packet size (at oracle level we do so in the tnsnames.ora or listner.ora and
in the informatica level we need to increase it inthe informatica server configuration and also in the
database server network memory).

5. Tune the Oracle target database.

6. Use partitioning concepts (pass-through partitioning, database partitioning, key partitioning,
hash key partitioning).

Q)Col1 Col2 Col3 Col4 Col5


1 A 10 21 10
1 A 15 22 29
1 A 22 23 20
In the Aggregator transformation:
Group by: Col1 & Col2
Sum(Col3), Sum(Col5)
Col4 is passed as it is...

What is the output? Will it pass to the target, or is it not possible?

A) 47, 23, 59. By default the Aggregator returns the last value from the group set, hence 23 is returned for Col4.
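The Aggregator behavior in this scenario can be sketched in Python (illustrative only): sums for the aggregated ports, and the last row's value winning for the pass-through port.

```python
# Sketch of the Aggregator semantics described above: SUM on Col3 and Col5,
# and for the pass-through port Col4 the last row of the group wins.
rows = [(1, "A", 10, 21, 10),
        (1, "A", 15, 22, 29),
        (1, "A", 22, 23, 20)]

groups = {}
for col1, col2, col3, col4, col5 in rows:
    g = groups.setdefault((col1, col2), {"sum3": 0, "sum5": 0, "col4": None})
    g["sum3"] += col3
    g["sum5"] += col5
    g["col4"] = col4          # overwritten each row -> last value survives

out = groups[(1, "A")]
print(out["sum3"], out["col4"], out["sum5"])  # 47 23 59
```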
Q)Difference b/w Informatica 8.x and 9.x?

1) Lookup can now be configured as an active transformation. 2) We have an option to configure the size of
the session log. 3) The new version of Informatica comes bundled with the Informatica Developer and
Analyst tools. 4) We have a deadlock-handling feature enabled in this version; before, if there was a
deadlock the session used to fail.

Draw the data model of your project?


Explain the Informatica 9 architecture?
How to configure a staging area?
Explain a few complex mappings?
What are tracing levels? And what are its different levels?
What is load balancing?
What is pushdown optimization? In which instance did you use pushdown optimization in your project?
What is a deployment group?
What is CDC? How to configure CDC?
What is ODS? How to configure ODS?
What are your roles in the production environment?
Which scheduler tool did you use in your project? Explain how you used it?
What are the different issues you faced in your project?
In which scenario did you use a persistent lookup cache in your project?
In which scenario did you use an unconnected Lookup transformation in your project, and why?
Give a scenario where you used incremental aggregation in your project, and why?
What are indexes? What are the types of indexes? Which index did you use in your project, and why?
Explain SCD Type 2? Why do you use a SQL override in the Lookup transformation in SCD Type 2?
Explain the reporting tool? Which reporting tool did you use and why?
What was your involvement in the preparation of the LLD (low-level design)?

How to create the staging area in your database


I collected some info.

Staging area and ODS

It is a concept we implement in data warehousing where we use it to load data from different kinds of
sources before loading the DWH. Staging contains only the current cleaned, profiled data whereas the
DWH contains all historical cleaned, profiled data.

The main purpose is that we cannot compare source-format data with the DWH directly, as the source data
will not be properly cleaned and may not match the cleaned data stored in the DWH. So, first we clean all
source data and load it temporarily to staging, from where we compare it with the DWH using SCDs.

Informatica can do inline extraction, transformation, and load itself.

However, in EDW systems there are multiple reasons why you may want to stage the data:

1. EDW requires a snapshot of data at a point in time to be loaded -

i.e. if you don't have staging and a load fails, you may have a different set of data getting extracted from
the source.
2. A staged source is easy to work with in bulk mode when you want to merge multiple sources and load to
the EDW.

- Sometimes all sources are not available at the same time; they may become available in different
slots.

3. SCD Type 2 and other complex operations may need a placeholder on the database in the form of staging,
where it is easier to build complex SQLs and then extract data instead of doing in-memory joins.

Whatever your source is, you will be implementing some logic to clean, profile, and reformat your source
data before loading to the target DWH, right?

Simple example: assume the source has a column Ename which has data with some spaces like ' abc '. So,
before loading to the DWH you will have some standards to follow, such as spaces need to be removed,
data should be in upper case, or you may extract part of a string, etc.

Now, if you don't have a temporary staging area, when you implement SCD, while comparing data ' abc '
will never match 'ABC'. But once you clean the data and load it to staging, the data will be cleaned to
'ABC', which can be compared with the DWH data (SCD implementation), match, and load accordingly.
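That cleanse-then-compare argument can be shown in two lines of Python (assumed cleansing rules: trim and upper-case, as in the example):

```python
# Sketch of why staging helps SCD comparison: raw source values are
# cleansed (trimmed, upper-cased) before being compared against the DWH.
def cleanse(value):
    return value.strip().upper()

source_value = " abc "
dwh_value = "ABC"

print(source_value == dwh_value)           # False -- raw compare fails
print(cleanse(source_value) == dwh_value)  # True  -- staged compare matches
```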

With staging:

OLTP / ERP / any source --[ETL: cleansing, profiling, reformatting, business standards]--> STAGE --[ETL: SCD implementation]--> DWH

Without staging:

OLTP / ERP / any source --[ETL: SCD implementation]--> DWH

Basically we don't perform business logic in the target area. Apart from that, there are a lot of issues like
data cleansing issues in the source system; in that case we need to maintain a staging area. If your data is
clean and you already perform transformations and handle cleansing issues through the ETL tool, or you
have storage issues, then you may not need a staging area. But as a standard a staging area must be used,
because the source data is always prone to error.

Even if your source and target are database tables or files, as per the requirement you may have to perform
some data cleansing operations before loading into the ODS layer; for this we use staging. If you don't
have any requirement to cleanse data, and just pass data from source systems to the ODS, then there is no
need to use a staging layer.

This is also done to avoid network overhead on the OLTP system. If you directly run complex queries on
the source system this might lock the tables for quite a period of time, so to avoid network congestion the
data is brought into the staging area... but this is not mandatory.
Different Types of Tracing Levels In Informatica

The tracing levels can be configured at the transformation And/OR session level in informatica. There are 4
different types of tracing levels. The different types of tracing levels are listed below:

Tracing levels:

None: Applicable only at session level. The Integration Service uses the tracing levels configured in the
mapping.

Terse: logs initialization information, error messages, and notification of rejected data in the session log
file.

Normal: Integration Service logs initialization and status information, errors encountered and skipped
rows due to transformation row errors. Summarizes session results, but not at the level of
individual rows.

Verbose Initialization: In addition to normal tracing, the Integration Service logs additional
initialization details; names of index and data files used, and detailed transformation statistics.

Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that
passes into the mapping. Also notes where the Integration Service truncates string data to fit the
precision of a column and provides detailed transformation statistics. When you configure the
tracing level to verbose data, the Integration Service writes row data for all rows in a block when it
processes a transformation.

Configuring Tracing Levels:

At the transformation level, you can find the Tracing Level option in the Properties tab. Select the required
tracing level. At the session level, go to the Config Object tab and select the option Override Tracing.

Selecting Tracing Levels:

The question is which tracing level to use. If you want to debug the session, use Verbose Data as it
provides complete information about the session run. However, do not use this tracing level in production,
because it will cause performance issues as the Integration Service writes the complete information
into the session log file.

In production use the normal tracing level. The normal tracing level is enough to identify most of the errors
when a session fails. As the Integration Service writes a smaller amount of data to the session log file, this
tracing level won't cause any performance issues.

What is data validation? How to explain it when asked?


And how do we do date validation and number validation when data is coming from files?

ANS) There are many ways to perform data validation. Once ETL is completed, the business team validates
the data. They either manually query the critical fields and check whether the implemented business
logic has met the requirements, or use an off-the-shelf product for data validation, for example the
Informatica Data Validation Option (DVO).
The business team will provide all the valid business dates. If a date is invalid then you will be defaulting
the values. This helps you see if you have any invalid dates coming from your source system, and also
helps the business take it forward to discuss with the data governance team to fix it at the source
system level.

Hi friends... I have one query, please help with this...


I have data in a student table like this:
name no marks subject
p 100 77 M
s 200 44 M
p 100 66 S
s 200 86 S
p 100 65 E
S 200 56 E
I want output like this:

Name no Maths Science English


P 100 77 66 65
S 200 44 86 56
ANS:) Actually this is the reverse of a Normalizer:
After a Sorter, in the Aggregator, do a group by on name, with 5 output ports:
o_name - name
o_no - no
o_maths - marks where subject='M' (aggregate expression)... and so on
1. Sorter --> (sort the source data based on name followed by subject) 2. Aggregator ---> now group by
name and populate the output columns per subject as >> maths: max(marks, subject='M'),
science: max(marks, subject='S'), english: max(marks, subject='E') 3. Connect to the target...

Oracle query:
select name, no, max(case when subject='M' then marks end) Maths, max(case when
subject='S' then marks end) Science,
max(case when subject='E' then marks end) English from students group by name, no order by name;

OR

select * from students pivot(max(marks) for subject in('M' as MATHS,'S' as Science,'E' as English));
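For illustration, the same rows-to-columns pivot (the "reverse of a Normalizer") can be sketched in Python over the sample data:

```python
# Sketch of the pivot: one output row per (name, no), subjects spread into columns.
rows = [("p", 100, 77, "M"), ("s", 200, 44, "M"),
        ("p", 100, 66, "S"), ("s", 200, 86, "S"),
        ("p", 100, 65, "E"), ("S", 200, 56, "E")]

pivot = {}
for name, no, marks, subject in rows:
    key = (name.upper(), no)               # normalize case, as in the expected output
    pivot.setdefault(key, {})[subject] = marks

for (name, no), subs in sorted(pivot.items()):
    print(name, no, subs["M"], subs["S"], subs["E"])
# P 100 77 66 65
# S 200 44 86 56
```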

Which one is not a target Option for file on the Servers?


a)FTP
b)Loader
c)MQ
d)ERP

MQ and FTP, I am sure, are available for target files. I am not sure about Loader and ERP; the answer
should be one of those two. I need to check on that and will update you.

I presume Loader & MQ

Which one need a common key to join?


a)joiner
b)source qualifier

Both the Joiner and the SQ need a common key or some relation between the source definitions that are
joined, else the mapping will be invalid.

1)How to Change SQL override Dynamically in source qualifier tfn?


--- I want to pass it through a parameter file and change the query dynamically.

ANS) Mapping parameter = $$SQL


In the SQL override:
Select a,b,c, $$SQL
In the param file:
$$SQL = d,e,f,g,h from table A
Note: Don't break the statement across lines in the param file; a line feed in the query will cause
failure.

2)How to Parameterized the Entire Filter condition in Filter Tfn?


--- I mean the first time I pass the condition as DEPTNO=$$DNO,
but now I want to pass the condition as JOB=$$JOB.
ANS) SQ - Select $$column1, $$column2, $$column3... from $$TABLE_OR_JOIN WHERE
START_DATE='$$MY_BUSINESS_DATE' $$FILTER
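The substitution mechanics can be sketched in Python (hypothetical parameter-file content; real Informatica reads the file itself, this only illustrates the text replacement):

```python
# Sketch of how a parameter-file entry completes a SQL override at run time:
# the $$SQL placeholder is replaced by the value read from the param file.
param_file = {"$$SQL": "d,e,f,g,h from table_A"}   # assumed parameter entry

override = "SELECT a,b,c, $$SQL"
for name, value in param_file.items():
    override = override.replace(name, value)

print(override)  # SELECT a,b,c, d,e,f,g,h from table_A
```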

What is meant by persistent cache?


Can we use Agg and Exp transformations in parallel in the mapping coming from the SQ? If so, why?

How many return ports max I can use in unconnected lookup transformation?

If I want to use more than 1 return port in unconnected lookup what is the best way to follow?

Data cache and index cache using in joiner and lookup transformations?

If I change the data type in SQ of any mapping what kind of error I may get?

Have you used parameter file concept? If so where the parameter file will be saved and how it will be
accessed?

What is the syntax of pmcmd?


Normalizer transformation behavior?

Can we use alias names in where condition in SQL query? If I use what will happen?

What is the difference between parameters and variables?

What is target load plan and target load order in informatica?

Without any condition, if I use an Agg transformation, what is the output from the transformation and how
many rows will it throw?

These Qs asked me in TCS:


Guys, please post your company interview Qs too in this forum so that we can help out each one here

The Qs :-

#1
Source:
a
b
a
b
c
d

tar1
a
b
c
d

tar2
a
b

#2
Can we update the target table without an Update Strategy?
What needs to be done to the target property? Or will a session-level property alone do?

#3
What are the types of lookups? What is the difference between static and dynamic cache?

#4
Asked me about my project... how you go through the SDLC... in my first support project how many
incidents you handled, etc.

#5
What happens if we don't sort the data before the Aggregator? Is it mandatory, and if not done will the session fail?

What is Informatica Parameter Files.


What is scd type-2. explain implementation ?
How to validate a flat file using date field ?
What is unit test case?
How to find unique records from flat file?
My source have 40 records loaded in target 4 wrong data. what will be the Manual test steps?
what is Materialized View?
what is Push Down Optimization ?
Performance tuning of mapping ?
Write a sql query to display those employees who are working under manager.
Write a sql query to display name of those employee who are getting the highest salary in their
department?
And
Implements above question in informatica.

SESSION Partitions

Pass Through
Round Robin
Key Range
Hash auto key
Hash User key

RoundRobin: splits the rows equally across all partitions

Key Range: Based on the port specified the integration service splits the rows

Hash auto key: The IS uses the sorted ports and group by ports to generate auto keys

Hash user keys: the user needs to specify the port on which the hash function should be applied to group
the rows

Pass through : all rows are passed from each partition.
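As a rough sketch of two of these partition types (hypothetical, not how the Integration Service is actually implemented), round robin deals rows out in turn, while key range routes each row by a port's value:

```python
# Illustrative sketches of round-robin and key-range partitioning, assuming 3
# and 2 partitions respectively.
def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):        # rows split equally, in turn
        parts[i % n].append(row)
    return parts

def key_range(rows, key, ranges):
    # ranges: one (low, high) pair per partition, applied to the key port
    parts = [[] for _ in ranges]
    for row in rows:
        for i, (low, high) in enumerate(ranges):
            if low <= row[key] < high:
                parts[i].append(row)
                break
    return parts

print(round_robin([1, 2, 3, 4, 5, 6], 3))   # [[1, 4], [2, 5], [3, 6]]
print(key_range([{"id": 5}, {"id": 15}], "id", [(0, 10), (10, 20)]))
```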

1. Difference between Informatica 7x and 8x?


2. Difference between connected and unconnected lookup transformation in Informatica?
3. Difference between stop and abort in Informatica?
4. Difference between Static and Dynamic caches?
5. What is Persistent Lookup cache? What is its significance?
6. Difference between and reusable transformation and mapplet?
7. How the Informatica server sorts the string values in Rank transformation?
8. Is sorter an active or passive transformation? When do we consider it to be active and passive?
9. Explain about Informatica server Architecture?
10. In update strategy Relational table or flat file which gives us more performance? Why?
11. What are the out put files that the Informatica server creates during running a session?
12. Can you explain what are error tables in Informatica are and how we do error handling in
Informatica?
13. Difference between constraint base loading and target load plan?
14. Difference between IIF and DECODE function?
15. How to import oracle sequence into Informatica?
16. What is parameter file?
17. Difference between Normal load and Bulk load?
18. How u will create header and footer in target using Informatica?
19. What are the session parameters?
20. Where does Informatica store rejected data? How do we view them?
21. What is difference between partitioning of relational target and file targets?
22. What are mapping parameters and variables in which situation we can use them?
23. What do you mean by direct loading and Indirect loading in session properties?
24. How do we implement recovery strategy while running concurrent batches?
25. Explain the versioning concept in Informatica?
26. What is Data driven?
27.What is batch? Explain the types of the batches?

28.What are the types of meta data repository stores?

29.Can you use the mapping parameters or variables created in one mapping into another mapping?

30.Why did we use stored procedure in our ETL Application?

31.When we can join tables at the Source qualifier itself, why do we go for joiner transformation?

32.What is the default join operation performed by the look up transformation?

33.What is hash table Informatica?

34.In a joiner transformation, you should specify the table with lesser rows as the master table. Why?

35.Difference between Cached lookup and Un-cached lookup?

36.Explain what DTM does when you start a work flow?

37.Explain what Load Manager does when you start a work flow?

38.In a sequential batch how do I stop one particular session from running?

39.What are the types of the aggregations available in Informatica?

40.How do I create Indexes after the load process is done?

41.How do we improve the performance of the aggregator transformation?

42.What are the different types of the caches available in Informatica? Explain in detail?

43.What is polling?

44.What are the limitations of the joiner transformation?

45.What is Mapplet?

46.What are active and passive transformations?


47.What are the options in the target session of update strategy transformation?

48.What is a code page? Explain the types of the code pages?

49.What do you mean rank cache?

50.How can you delete duplicate rows without using a Dynamic Lookup? Tell me any other way to delete
the duplicate rows using a lookup?
51.Can u copy the session in to a different folder or repository?
52.What is tracing level and what are its types?

53.What is a command that used to run a batch?

54.What are the unsupported repository objects for a mapplet?

55.If your workflow is running slow, what is your approach towards performance tuning?

56.What are the types of mapping wizards available in Informatica?

57.After dragging the ports of three sources (Sql server, oracle, Informix) to a single source qualifier, can we
map these three ports directly to target?

58.Why we use stored procedure transformation?

59.Which object is required by the debugger to create a valid debug session?

60.Can we use an active transformation after update strategy transformation?

61.Explain how we set the update strategy transformation at the mapping level and at the session level?

62.What is exact use of 'Online' and 'Offline' server connect Options while defining Work flow in Work
flow monitor? The system hangs when 'Online' Server connect option. The Informatica is installed on a
Personal laptop.

63.What is change data capture?

64.Write a session parameter file which will change the source and targets for every session. i.e different
source and targets for each session run ?

65.What are partition points?

66.What are the different threads in DTM process?

67.Can we do ranking on two ports? If yes explain how?

68.What is Transformation?

69.What does stored procedure transformation do in special as compared to other transformation?

70.How do you recognize whether the newly added rows got inserted or updated?

71.What is data cleansing?


72.My flat file’s size is 400 MB and I want to see the data inside the FF with out opening it? How do I do
that?

73.Difference between Filter and Router?

74.How do you handle the decimal places when you are importing the flat file?

75.What is the difference between $ & $$ in mapping or parameter file? In which case they are generally
used?
76.While importing the relational source definition from database, what are the meta data of source U
import?
77.Difference between Power mart & Power Center?

78.What kinds of sources and of targets can be used in Informatica?

79.If a sequence generator (with increment of 1) is connected to (say) 3 targets and each target uses the
NEXTVAL port, what value will each target get?

80.What do you mean by SQL override?

81.What is a shortcut in Informatica?

82.How does Informatica do variable initialization? Number/String/Date

83.How many different locks are available for repository objects

84.What are the transformations that use cache for performance?

85.What is the use of Forward/Reject rows in Mapping?

86.How many ways you can filter the records?

87.How to delete duplicate records from source database/Flat Files? Can we use post sql to delete these
records. In case of flat file, how can you delete duplicates before it starts loading?

88.You are required to perform “bulk loading” using Informatica on Oracle. What actions would you perform
at the Informatica and Oracle levels for a successful load?

89.What precautions do you need to take when you use a reusable Sequence Generator transformation for
concurrent sessions?

90.Is a negative increment possible in the Sequence Generator? If yes, how would you accomplish it?

91.In which directory does Informatica look for the parameter file, and what happens if it is missing when
you start the session? Does the session stop after it starts?

92.Informatica is complaining that the server could not be reached. What steps would you take?

93.You have more than five mappings that use the same lookup. How can you manage the lookup?

94.What will happen if you copy a mapping from one repository to another repository and there is no
identical source?

95.How can you limit the number of running sessions in a workflow?

96.An Aggregator transformation has 4 ports (sum(col1), group by col2, col3); which port should be the
output?

97.What is a dynamic lookup and what is the significance of NewLookupRow? How will you use them for
rejecting duplicate records?

98.If you have more than one pipeline in your mapping, how will you change the order of load?

99.When you export a workflow from Repository Manager, what does this xml contain? Workflow only?

100. Your session failed and when you try to open a log file, it complains that the session details are not
available. How would you trace the error? Which log file would you look for?

101.You want to attach a file from a particular directory as an email attachment using the ‘email task’ in
Informatica. How will you do it?

102. You have a requirement to be alerted of any long-running sessions in your workflow. How can you
create a workflow that will send you an email for sessions running more than 30 minutes? You can use any
method: shell script, stored procedure, or an Informatica mapping or workflow control.

Scenario 1: How can we load first and last record from a flat file source to target?

Solution:

Create two pipelines in a mapping:

the 1st pipeline captures the first record, and the 2nd one the last record.

1st pipeline:

src -> sq -> exp (take a numeric variable port that increments for every row and pass it through an output port 'O_Test') -> filter (pass only when O_Test = 1) -> tgt

2nd pipeline:

src -> sq -> agg (no group-by ports, so it passes only the last row) -> tgt

In the session, enable the 'Append if Exists' option for the 2nd instance of the target.
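The two-pipeline idea can be sketched in plain Python (field names here are illustrative, not from the original mapping):

```python
def first_and_last(rows):
    """Mimic the two pipelines: a row counter plus a filter for the
    first record, and an ungrouped aggregator (which retains only the
    last row it receives) for the last record."""
    v_count = 0
    first = last = None
    for row in rows:
        v_count += 1          # expression variable port incrementing per row
        if v_count == 1:      # filter condition: O_Test = 1
            first = row
        last = row            # aggregator with no group-by keeps the last row
    return first, last
```

Both results end up in the same target because the second instance appends rather than truncates.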

Scenario 2: How to find the nth row in a flat file? When the source is a table we can do top-N analysis
using ROWNUM and similar functionality using ROWID. How can we achieve the same functionality when the
source is a flat file?

Solution: In the Mapping Designer, go to Mappings -> Parameters and Variables.

Here we have two things: parameters (constant values passed to the mapping) and variables, which are
dynamic and can be stored as metadata for future runs. For example, suppose you want to do an incremental
load into table B from table A. Define a variable that holds the seqid from the source. Before you
write the data into the target, create an expression that takes the source seqid as input and a
variable Max_seqid as output, and update this value for each row. When the session finishes, Informatica
saves the last read seqid, and you can use it in your source qualifier the next time you run the mapping.
See the Informatica documentation for SETMAXVARIABLE and SETMINVARIABLE.

In this case, we just make use of a parameter to find the nth row.

Create a parameter - Last_row_number - and select integer or double as its datatype.

Now create a parameter file on the Unix box before you call the workflow,

something like this:

echo '[<FOLDERNAME>.WF:<WorkflowName>.ST:<SessionName>]'

count=`wc -l < filename`

echo "\$\$Last_row_number="$count

(Redirecting the file into wc makes it print only the count, without the filename.)

Name the parameter file <workflowname>.par, copy the complete path of the file, and update
the "Parameter filename" field under the Properties tab when editing the workflow.

You can then use this parameter in your mapping wherever you want; just precede it with $$.
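A runnable sketch of the parameter-file generation, using placeholder folder, workflow, and session names (they are not from the original text):

```shell
#!/bin/sh
# Build a parameter file holding the row count of a flat file,
# so the workflow can read it as $$Last_row_number at start-up.
SRC_FILE=/tmp/src_demo.txt
PARAM_FILE=/tmp/wf_demo.par

printf 'r1\nr2\nr3\n' > "$SRC_FILE"                 # demo source file

# '<' makes wc print only the number; tr strips any BSD-style padding
count=$(wc -l < "$SRC_FILE" | tr -d ' ')

{
  echo '[MyFolder.WF:wf_demo.ST:s_demo]'            # section header
  echo "\$\$Last_row_number=$count"                 # mapping parameter
} > "$PARAM_FILE"

cat "$PARAM_FILE"
```

The resulting file is what you would point the session's "Parameter filename" property at.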

Scenario 3: How to create flat file dynamically?

SRC FILE             TRGT 1               TRGT 2
-----------------    -----------------    -----------------
Eid  Name  Sal       Eid  Name  Sal       Eid  Name  Sal
10   a     100       10   a     100       20   b     100
20   b     100       10   c     200       20   d     300
10   c     200
20   d     300

Solution:

1. Sort the data coming from the source on EID.

2. Create a variable in an Expression transformation that tracks the change in EID. In this case, once the
data is sorted on EID it looks like:

EID Name SAL
10  a    100
10  c    200
20  b    100
20  d    300

Whenever there is a change in EID, the variable flags it:

variable1 = IIF(EID = PREV_EID, 0, 1)

3. Add a Transaction Control transformation to the mapping with a similar condition:

IIF(variable1 = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

This creates a new file whenever the EID value changes.

4. Add a "FileName" port to the target and pass in a value so that the file names are generated
dynamically as per your requirement.
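What the transaction-control logic does can be replayed in plain Python: rows sorted by Eid are flushed to a new "file" whenever Eid changes (the file-naming scheme below is illustrative only):

```python
def split_by_key(rows):
    """Group key-sorted (eid, name, sal) rows into separate files,
    starting a new file on every key change - the equivalent of
    TC_COMMIT_BEFORE firing when variable1 = 1."""
    files = {}                      # filename -> list of rows
    prev_eid = None
    current = None
    for eid, name, sal in rows:
        if eid != prev_eid:         # variable1 = 1 -> commit, open new file
            current = f"trgt_{eid}.txt"
            files[current] = []
        files[current].append((eid, name, sal))
        prev_eid = eid
    return files
```

Each dictionary entry corresponds to one dynamically generated target file.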

Scenario 4: I HAVE A SOURCE FILE CONTAINING

1|A,1|B,1|C,1|D,2|A,2|B,3|A,3|B

AND IN TARGET I SHOULD GET LIKE

1|A+B+C+D

2|A+B

3|A+B

Solution:

Follow the logic given below in the Expression transformation and you will get your output.

Ensure that all the ports mentioned below are variable ports (evaluated top to bottom) and the incoming
data is sorted by key, data.

Port (evaluated top to bottom)   Expression
V_CURNT (variable)               KEY
V_CURNT_DATA (variable)          DATA
v_OUT (variable)                 IIF(ISNULL(v_PREV_DATA) OR V_CURNT <> V_PREV,
                                     V_CURNT_DATA,
                                     V_PREV_DATA || '~' || V_CURNT_DATA)
o_OUT (output)                   v_OUT
V_PREV (variable)                V_CURNT
V_PREV_DATA (variable)           v_OUT

Values per row:

Port          Row1   Row2   Row3    Row4
V_CURNT       1      1      1       2
V_CURNT_DATA  a      b      c       d
v_OUT         a      a~b    a~b~c   d
o_OUT         a      a~b    a~b~c   d
V_PREV        null   1      1       1
V_PREV_DATA   null   a      a~b     a~b~c

(V_PREV is assigned after v_OUT, so while v_OUT is evaluated V_PREV still holds the previous row's key;
a key change therefore restarts the concatenation.)


After the Expression transformation, add an Aggregator transformation with 'key' as the group-by port; it
returns the last (fully concatenated) record for each key.
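The expression-plus-aggregator pair behaves like a group-wise string concatenation over key-sorted input, sketched here in Python with the same '~' separator:

```python
def concat_per_key(rows):
    """Concatenate data values per key over key-sorted (key, data) rows.
    The running concatenation mirrors v_OUT; keeping only the final value
    per key mirrors the aggregator grouping on 'key'."""
    result = {}
    for key, data in rows:          # rows must be sorted by key
        if key in result:           # same key as before -> extend the string
            result[key] = result[key] + '~' + data
        else:                       # key change -> restart
            result[key] = data
    return result                   # last (full) value per key survives
```

Running it on the scenario's input reproduces the expected 1|A~B~C~D, 2|A~B, 3|A~B output.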

All the above scenarios have been taken from the Informatica communities. In case anyone needs more
information about the scenarios discussed, they may contact us for clarification.

Scenario 1: My source file has 500,000 records. While fetching, records are skipped due to data type and
other issues, and finally only 100,000 records are fetched. The session properties show 100,000 as the
source count, but we are actually losing 400,000 records. How can I find the number of records that were skipped?
Solution: In the OPB_SESS_TASK_LOG repository table there is a count column SRC_FAILED_ROWS.

Scenario 2: Please provide different ways to achieve this..


How to normalize the following data:

id date

101 2/4/2008

101 4/4/2008

102 6/4/2008

102 4/4/2008

103 4/4/2008

104 8/4/2008

O/P - should have only one id with the min(date)

How to create a mapping for this ?

1 ---> Using Transformations

2 ---> Using SQ Override method..

Solution: Using transformations, you can use the Rank transformation: select the rank port on the date
column and group by id. In the Properties tab select Bottom and set the number of ranks to '1'. Using the
SQ override, a simple aggregate works: SELECT id, MIN(date) FROM t GROUP BY id.
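The same min-date-per-id logic, sketched in Python (dates are written as ISO strings so plain comparison orders them correctly; the interpretation of the sample's d/m/yyyy values is assumed):

```python
def min_date_per_id(rows):
    """Keep one date per id: the minimum - the equivalent of ranking
    bottom-1 on date within each id group."""
    best = {}
    for rid, date in rows:
        if rid not in best or date < best[rid]:
            best[rid] = date
    return best
```

Each id maps to exactly one (minimum) date, matching the required output.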

Scenario 3: I am loading records on a daily basis and the target is not a truncate-load. Suppose from the
source I am loading records like:

ID|Name

101|Apple

102|Orange

102|Banana

but the target already has records for ID 102 (10 records). What I need is to delete only yesterday's
records for ID 102 and then load today's records (2 records).

How to achieve this in Informatica?

Solution: You can achieve this by taking a lookup on your target table and matching on the ID column. After
the lookup, take an Expression and add a FLAG column that checks for a NULL value returned from the lookup.
After the expression, take 2 filters: in one filter pass the records with NULL values and insert those
records into the target.

If the value is not NULL, take an Update Strategy and update the old rows with the new ones.

Scenario 4:

Informatica Sequence Generator Scenarios

1) I have following set of records

Id | Name

101 | ABC

102 | DEF

101 | AMERICA

103 | AFRICA

102 | JAPAN

103 | CHINA

I need to generate sequence and populate in Target in the following manner

SID | ID | NAME

1 |101 | ABC

2 |101 | AMERICA

1 |102 | DEF

2 |102 | JAPAN

1 |103 | AFRICA

2 |103 | CHINA

How to implement the same in Informatica?

Solution:

1. Sort on Id.

2. Use an Expression t/f with the ports below (evaluated top to bottom):

V_cur_id -> v_pre_id                                   -- still holds the previous row's id at this point

V_pre_id -> i_id                                       -- i_id is the input port

V_seq_id -> IIF(v_cur_id = v_pre_id, v_seq_id + 1, 1)  -- default 0

O_seq_id -> v_seq_id
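The variable-port logic above can be replayed in Python: sort by id, then restart a counter whenever the id changes:

```python
def group_sequence(rows):
    """Assign a per-group sequence number to (id, name) rows, restarting
    at 1 on each id change - the V_seq_id expression from the mapping."""
    out = []
    prev_id, seq = None, 0
    for rid, name in sorted(rows, key=lambda r: r[0]):   # sorter on Id
        seq = seq + 1 if rid == prev_id else 1           # V_seq_id
        prev_id = rid                                    # remember current id
        out.append((seq, rid, name))
    return out
```

Python's sort is stable, so rows within an id keep their original relative order.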

Scenario 5 : I have my input as below : ( Using Expression)

10

10

10

20

20

30

O/P : 1, 2, 3, 1, 2, 1 (a running count that restarts for each new value)

How do we obtain this using Informatica?

Solution: First import the source, then use a Sorter transformation to sort on your column, then use an
Expression transformation.

In the expression create these ports, in this order:

1. column_num (coming from the sorter)

2. current_num = IIF(column_num = previous_num, first_value + 1, 1)

3. first_value = current_num

4. previous_num (new variable port) = column_num

Pass current_num to the target.


Below are the commands for taking a backup, followed by a shell script to automate the backup.

Write a shell script for taking a backup to automate the process.

name -- backup.sh

echo 'taking a domain backup'


infasetup.sh backupDomain -da tsorallg:1521 -du usernameofschema -dp pwdofschema -ds
servicename(ORCL) -dt typeofDB(Oracle) -df domainbackupfilename -dn domainname -f
echo 'connecting to repository'
cd c:\informatica\9.5.1\server\bin
pmrep connect -r repname -d domainname -n username -x pwd
sleep 2
echo 'taking backup of repository'
pmrep backup -o c:\path\repbackup.rep -b -f

INTERVIEW Questions from TechMahindra.


Difference between a table and a view?
Difference between TRUNCATE and DELETE?
Questions on indexes, primary keys and foreign keys?
Informatica:
Difference between the Joiner T/R and the SQ T/R? Why do we need to go for these two to join sources?
What is the Update Strategy T/R, and how does it work at the session level?
What is a mapplet? How does it work?

Informatica Interview Questions


1. While importing the relational source definition from the database, what are the metadata of source that
will be imported? The metadata of the source that will be imported are:
Source name
Database location
Column names
Data types
Key constraints
2. How many ways a relational source definition can be updated and what are they?
There are two ways to update the relational source definition:
Edit the definition
Re-import the definition
3. To import the flat file definition into the designer where should the flat file be placed?
Place the flat file in local folder in the local machine
4. To provide support for Mainframes source data, which files are used as a source definitions?
COBOL files
5. Which transformation is needed while using COBOL sources as source definitions?
As COBOL sources consist of denormalized data, the Normalizer transformation is required to normalize
the data.
6. How to create or import flat file definition in to the warehouse designer?
We cannot create or import flat file definition into warehouse designer directly. We can create or
import the file in source analyzer and then drag it into the warehouse designer.
7. What is a mapplet?
A mapplet is a set of transformations that you build in the mapplet designer and can be used in
multiple mappings.
8. What is a transformation?
It is a repository object that generates, modifies or passes data.
9. What are the designer tools for creating transformations?
Mapping designer
Transformation developer
Mapplet designer
10. What are active and passive transformations?
An active transformation can change the number of rows that pass through it. A passive
transformation does not change the number of rows that pass through it.
11. What are connected or unconnected transformations?
An unconnected transformation is not connected to other transformations in the mapping. Connected
transformation is connected to other transformations in the mapping pipeline.
12. How many ways are there to create ports?
There are two ways to create the ports:
Drag the port from another transformation
Click the add button on the ports tab.
13. What are the reusable transformations?
Reusable transformations can be used in multiple mappings and mapplets. When you need to include this
transformation into a mapping or a mapplet, an instance of it is dragged into the mapping or mapplet.
Since, the instance of reusable transformation is a pointer to that transformation, any change in the
reusable transformation will be inherited by all the instances.
14. What are the methods for creating reusable transformations?
Two methods:
Design it in the transformation developer.
Promote a standard transformation (Non reusable) from the mapping designer. After adding a
transformation to the mapping, we can promote it to the status of reusable transformation.
15. What are the unsupported repository objects for a mapplet?
COBOL source definition
Joiner transformations
Normalizer transformations
Non reusable sequence generator transformations.
Pre or post session stored procedures
Target definitions
Power mart 3.5 style Look Up functions
XML source definitions
IBM MQ source definitions
16. What are the mapping parameters and mapping variables?
Mapping parameter represents a constant value which is defined before running a session. A mapping
parameter retains the same value throughout the entire session. A parameter can be declared either in a
mapping or mapplet and can have a default value. We can specify the value of the parameter in the
parameter file and the session reads the parameter value from the parameter file.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The
informatica server saves the value of the mapping variable in the repository at the end of a session run and uses
that value the next time the session runs.
17. Can we use the mapping parameters or variables created in one mapping into another mapping?
NO. We can use the mapping parameters or variables only in the transformations of the same mapping or
mapplet in which we have created the mapping parameters or variables.
18. Can we use the mapping parameters or variables created in one mapping into any other reusable
transformation?
Yes, because an instance of a reusable transformation added to a mapping belongs to that mapping only.
19. How can we improve session performance in aggregator transformation?
Use sorted input. Sort the input on the ports which are specified as group by ports in aggregator.

20. What is aggregate cache in aggregator transformation?


The aggregator stores data in the aggregate cache until it completes aggregate calculations. When we run a
session that uses an aggregator transformation, the informatica server creates index and data caches in
memory to process the transformation. If the informatica server requires more space, it stores overflow
values in cache files.

1. What are the differences between joiner transformation and source qualifier transformation?
A joiner transformation can join heterogeneous data sources where as a source qualifier can join only
homogeneous sources. Source qualifier transformation can join data from only relational sources but
cannot join flat files.
2. What are the limitations of joiner transformation?
Both pipelines begin with the same original data source.
Both input pipelines originate from the same Source Qualifier transformation.
Both input pipelines originate from the same Normalizer transformation.
Both input pipelines originate from the same Joiner transformation.
Either input pipeline contains an Update Strategy transformation.
Either input pipeline contains a connected or unconnected Sequence Generator transformation.
3. What are the settings that you use to configure the joiner transformation?
The following settings are used to configure the joiner transformation:
Master and detail source
Type of join
Condition of the join
4. What are the join types in joiner transformation?
The join types are
Normal (Default)
Master outer
Detail outer
Full outer
5. What are the joiner caches?
When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the
master source and builds index and data caches based on the master rows. After building the caches, the
Joiner transformation reads records from the detail source and performs joins.
6. What is the look up transformation?
Lookup transformation is used to lookup data in a relational table, view and synonym. Informatica server
queries the look up table based on the lookup ports in the transformation. It compares the lookup
transformation port values to lookup table column values based on the look up condition.
7. Why use the lookup transformation?
Lookup transformation is used to perform the following tasks.
Get a related value.
Perform a calculation.
Update slowly changing dimension tables.
8. What are the types of lookup transformation?
The types of lookup transformation are Connected and unconnected.
9. What is meant by lookup caches?
The informatica server builds a cache in memory when it processes the first row of a data in a cached look
up transformation. It allocates memory for the cache based on the amount you configure in the
transformation or session properties. The informatica server stores condition values in the index cache and
output values in the data cache.

10. What are the types of lookup caches?


Persistent cache: You can save the lookup cache files and reuse them the next time the informatica server
processes a lookup transformation configured to use the cache.
Re-cache from database: If the persistent cache is not synchronized with the lookup table, you can
configure the lookup transformation to rebuild the lookup cache.
Static cache: you can configure a static or read-only cache for the lookup table. By default informatica
server creates a static cache. It caches the lookup table and lookup values in the cache for each row that
comes into the transformation. When the lookup condition is true, the informatica server does not update
the cache while it processes the lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into cache and the target, you can
create a look up transformation to use dynamic cache. The informatica server dynamically inserts data to
the target table.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed
cache between transformations in the same mapping.
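A dynamic cache is the one used to reject duplicates (see question 97 above): the first row with a key inserts into the cache, later rows with the same key hit the cache and can be filtered out. A hedged Python sketch of that behavior:

```python
def dynamic_cache_filter(rows, key):
    """Simulate a dynamic lookup cache: NewLookupRow = 1 (insert) for the
    first occurrence of a key, 0 (cache hit) for later duplicates, which
    a downstream filter then drops."""
    cache, kept = set(), []
    for row in rows:
        k = row[key]
        if k not in cache:       # NewLookupRow = 1 -> insert into cache
            cache.add(k)
            kept.append(row)     # route to target
        # NewLookupRow = 0 -> duplicate, dropped by the filter
    return kept
```

In the real transformation the cache and target stay in sync because the insert happens in both.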
11. Which transformation should we use to normalize the COBOL and relational sources?
Normalizer Transformation is used to normalize the data.
12. In which transformation you cannot drag ports into it?
Normalizer Transformation.

13. How the informatica server sorts the string values in Rank transformation?
When the informatica server runs in the ASCII data movement mode it sorts session data using Binary sort
order. If you configure the session to use a binary sort order, the informatica server calculates the binary
value of each string and returns the specified number of rows with the highest binary values for the string.

14. What are the rank caches?


During the session, the informatica server compares an input row with rows in the data cache. If the input
row out-ranks a stored row, the informatica server replaces the stored row with the input row. The
informatica server stores group information in an index cache and row data in a data cache.

15. What is the Rankindex port in Rank transformation?


The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica
Server uses the Rank Index port to store the ranking position for each record in a group.

16. What is the Router transformation?


A Router transformation is similar to a Filter transformation because both transformations allow you to use
a condition to test data. However, a Filter transformation tests data for one condition and drops the rows of
data that do not meet the condition. A Router transformation tests data for one or more conditions and
gives you the option to route rows of data that do not meet any of the conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router Transformation in a
mapping instead of creating multiple Filter transformations to perform the same task.

17. What are the types of groups in Router transformation?


The different types of groups in router transformation are
Input group
Output group
The output group contains two types. They are
User defined groups
Default group
18. What are the types of data that passes between informatica server and stored procedure?
Three types of data passes between the informatica server and stored procedure.
Input/Output parameters
Return Values
Status code.

19. What is the status code in stored procedure transformation?


Status code provides error handling for the informatica server during the session. The stored procedure
issues a status code that notifies whether or not the stored procedure completed successfully. This value
cannot be seen by the user; it is only used by the informatica server to determine whether to continue
running the session or stop.

20. What is the target load order?


You can specify the target load order based on source qualifiers in a mapping. If you have multiple
source qualifiers connected to multiple targets, you can designate the order in which the informatica server
loads data into the targets.
Informatica Interview Questions
1. What is polling?
Polling displays the updated information about the session in the monitor window. The monitor window
displays the status of each session when you poll the informatica server.
2. In which circumstances, informatica server creates Reject files?
The informatica server creates reject files when it encounters DD_Reject in an update strategy
transformation, when rows violate database constraints, or when fields in the rows are truncated or overflowed.

3. What are the data movement modes in informatica?


Data movement mode determines how informatica server handles the character data. You can choose the
data movement mode in the informatica server configuration settings. Two types of data movement modes
are available in informatica. They are ASCII mode and Unicode mode.

4. Define mapping and session?


Mapping: It is a set of source and target definitions linked by transformation objects that define the rules
for transformation.
Session: It is a set of instructions that describe how and when to move data from source to targets.
5. Can you generate reports in Informatica?
Yes. By using Metadata reporter we can generate reports in informatica.

6. What is metadata reporter?


It is a web based application that enables you to run reports against repository metadata. With a metadata
reporter, you can access information about the repository without having knowledge of SQL,
transformation language or underlying tables in the repository.
7. What is the default source option for update strategy transformation?
Data driven.

8. What is Data driven?


The informatica server follows the instructions coded in the update strategy transformations within the
mapping and determines how to flag the records for insert, update, delete or reject. If you do not choose
data driven option setting, the informatica server ignores all update strategy transformations in the
mapping.

9. What is source qualifier transformation?


When you add a relational or a flat file source definition to a mapping, you need to connect it to a source
qualifier transformation. The source qualifier transformation represents the records that the informatica
server reads when it runs a session.

10. What are the tasks that source qualifier perform?


Join data originating from the same source database.
Filter records when the informatica server reads source data.
Specify an outer join rather than the default inner join.
Specify sorted ports.
Select only distinct values from the source.
Create a custom query to issue a special SELECT statement for the informatica server to read the source data.
11. What is the default join that source qualifier provides?
Equi Join

12. What are the basic requirements to join two sources in a source qualifier transformation using default
join?
The two sources should have primary key and foreign key relationship.
The two sources should have matching data types
SQL Queries Interview Questions

1. Write a query to generate sequence numbers from 1 to the specified number N?


Solution:

SELECT LEVEL FROM DUAL CONNECT BY LEVEL<=&N;

2. Write a query to display only Friday dates from Jan 2000 till now?
Solution:

SELECT C_DATE,
TO_CHAR(C_DATE,'DY')
FROM
(
SELECT TO_DATE('01-JAN-2000','DD-MON-YYYY')+LEVEL-1 C_DATE
FROM DUAL
CONNECT BY LEVEL <=
(SYSDATE - TO_DATE('01-JAN-2000','DD-MON-YYYY')+1)
)
WHERE TO_CHAR(C_DATE,'DY') = 'FRI';

3. Write a query to duplicate each row based on the value in the repeat column? The input table data looks
like as below

Products, Repeat
----------------
A, 3
B, 5
C, 2

Now in the output data, the product A should be repeated 3 times, B should be repeated 5 times and C
should be repeated 2 times. The output will look like as below

Products, Repeat
----------------
A, 3
A, 3
A, 3
B, 5
B, 5
B, 5
B, 5
B, 5
C, 2
C, 2

Solution:

SELECT PRODUCTS,
REPEAT
FROM T,
( SELECT LEVEL L FROM DUAL
CONNECT BY LEVEL <= (SELECT MAX(REPEAT) FROM T)
)A
WHERE T.REPEAT >= A.L
ORDER BY T.PRODUCTS;

4. Write a query to display each letter of the word "SMILE" in a separate row?

S
M
I
L
E

Solution:

SELECT SUBSTR('SMILE',LEVEL,1) A
FROM DUAL
CONNECT BY LEVEL <=LENGTH('SMILE');

5. Convert the string "SMILE" to ASCII values? The output should look like 83,77,73,76,69, where 83 is the
ASCII value of S and so on.
The ASCII function gives the ASCII value of only one character. If you pass a string to the ASCII function,
it gives the ASCII value of the first letter in the string. Here are two solutions to get the ASCII values
of the whole string.

Solution1:

SELECT SUBSTR(DUMP('SMILE'),15)
FROM DUAL;

Solution2 (WM_CONCAT is undocumented; on Oracle 11gR2 and later, LISTAGG is the supported alternative):

SELECT WM_CONCAT(A)
FROM
(
SELECT ASCII(SUBSTR('SMILE',LEVEL,1)) A
FROM DUAL
CONNECT BY LEVEL <=LENGTH('SMILE')
);
SQL Queries Interview Questions - Oracle Part 1

To solve these interview questions on SQL queries you have to create the products, sales tables in your
oracle database. The "Create Table", "Insert" statements are provided below.

CREATE TABLE PRODUCTS


(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
Quantity INTEGER,
PRICE INTEGER
);

INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');


INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');
INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');
INSERT INTO PRODUCTS VALUES ( 400, 'LG');

INSERT INTO SALES VALUES ( 1, 100, 2010, 25, 5000);


INSERT INTO SALES VALUES ( 2, 100, 2011, 16, 5000);
INSERT INTO SALES VALUES ( 3, 100, 2012, 8, 5000);
INSERT INTO SALES VALUES ( 4, 200, 2010, 10, 9000);
INSERT INTO SALES VALUES ( 5, 200, 2011, 15, 9000);
INSERT INTO SALES VALUES ( 6, 200, 2012, 20, 9000);
INSERT INTO SALES VALUES ( 7, 300, 2010, 20, 7000);
INSERT INTO SALES VALUES ( 8, 300, 2011, 18, 7000);
INSERT INTO SALES VALUES ( 9, 300, 2012, 20, 7000);
COMMIT;

The products table contains the below data.

SELECT * FROM PRODUCTS;

PRODUCT_ID PRODUCT_NAME
-----------------------
100 Nokia
200 IPhone
300 Samsung
400 LG

The sales table contains the following data.

SELECT * FROM SALES;

SALE_ID PRODUCT_ID YEAR QUANTITY PRICE


--------------------------------------
1 100 2010 25 5000
2 100 2011 16 5000
3 100 2012 8 5000
4 200 2010 10 9000
5 200 2011 15 9000
6 200 2012 20 9000
7 300 2010 20 7000
8 300 2011 18 7000
9 300 2012 20 7000
Here Quantity is the number of products sold in each year. Price is the sale price of each product.

I hope you have created the tables in your oracle database. Now try to solve the below SQL queries.

1. Write a SQL query to find the products which have a continuous increase in sales every year?

Solution:

Here “IPhone” is the only product whose sales increase every year.

STEP1: First we will get the previous year sales for each product. The SQL query to do this is

SELECT P.PRODUCT_NAME,
S.YEAR,
S.QUANTITY,
LEAD(S.QUANTITY,1,0) OVER (
PARTITION BY P.PRODUCT_ID
ORDER BY S.YEAR DESC
) QUAN_PREV_YEAR
FROM PRODUCTS P,
SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID;

PRODUCT_NAME YEAR QUANTITY QUAN_PREV_YEAR


-----------------------------------------
Nokia 2012 8 16
Nokia 2011 16 25
Nokia 2010 25 0
IPhone 2012 20 15
IPhone 2011 15 10
IPhone 2010 10 0
Samsung 2012 20 18
Samsung 2011 18 20
Samsung 2010 20 0

Here the lead analytic function will get the quantity of a product in its previous year.

STEP2: We find the difference between a product's quantity and its previous year's quantity. If
this difference is greater than or equal to zero for all the rows, then the product's sales are constantly
increasing. The final query to get the required result is

SELECT PRODUCT_NAME
FROM
(
SELECT P.PRODUCT_NAME,
S.QUANTITY -
LEAD(S.QUANTITY,1,0) OVER (
PARTITION BY P.PRODUCT_ID
ORDER BY S.YEAR DESC
) QUAN_DIFF
FROM PRODUCTS P,
SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
)A
GROUP BY PRODUCT_NAME
HAVING MIN(QUAN_DIFF) >= 0;

PRODUCT_NAME
------------
IPhone

2. Write a SQL query to find the products which do not have sales at all?

Solution:

“LG” is the only product which does not have sales at all. This can be achieved in three ways.

Method1: Using left outer join.

SELECT P.PRODUCT_NAME
FROM PRODUCTS P
LEFT OUTER JOIN
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
WHERE S.QUANTITY IS NULL;

PRODUCT_NAME
------------
LG

Method2: Using the NOT IN operator.

SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN
(SELECT DISTINCT PRODUCT_ID FROM SALES);

PRODUCT_NAME
------------
LG

Method3: Using the NOT EXISTS operator.

SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE NOT EXISTS
(SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);

PRODUCT_NAME
------------
LG

3. Write a SQL query to find the products whose sales decreased in 2012 compared to 2011?

Solution:

Here Nokia is the only product whose sales decreased in year 2012 when compared with the sales in the
year 2011. The SQL query to get the required output is

SELECT P.PRODUCT_NAME
FROM PRODUCTS P,
SALES S_2012,
SALES S_2011
WHERE P.PRODUCT_ID = S_2012.PRODUCT_ID
AND S_2012.YEAR = 2012
AND S_2011.YEAR = 2011
AND S_2012.PRODUCT_ID = S_2011.PRODUCT_ID
AND S_2012.QUANTITY < S_2011.QUANTITY;

PRODUCT_NAME
------------
Nokia

4. Write a query to select the top product sold in each year?

Solution:

Nokia is the top product sold in the year 2010. Similarly, Samsung in 2011 and IPhone, Samsung in 2012.
The query for this is

SELECT PRODUCT_NAME,
YEAR
FROM
(
SELECT P.PRODUCT_NAME,
S.YEAR,
RANK() OVER (
PARTITION BY S.YEAR
ORDER BY S.QUANTITY DESC
) RNK
FROM PRODUCTS P,
SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
)A
WHERE RNK = 1;

PRODUCT_NAME YEAR
--------------------
Nokia 2010
Samsung 2011
IPhone 2012
Samsung 2012

5. Write a query to find the total sales of each product.?

Solution:

This is a simple query. You just need to group the data by PRODUCT_NAME and then find the sum of
sales.

SELECT P.PRODUCT_NAME,
NVL( SUM( S.QUANTITY*S.PRICE ), 0) TOTAL_SALES
FROM PRODUCTS P
LEFT OUTER JOIN
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
GROUP BY P.PRODUCT_NAME;

PRODUCT_NAME TOTAL_SALES
---------------------------
LG 0
IPhone 405000
Samsung 406000
Nokia 245000
SQL Queries Interview Questions - Oracle Part 2
In the previous part I used the PRODUCTS and SALES tables as an example. Here also I am using the same
tables, so just take a look at the tables above and it will be easy for you to understand the questions
mentioned here.
Solve the below examples by writing SQL queries.

1. Write a query to find the products whose quantity sold in a year should be greater than the average
quantity of the product sold across all the years?

Solution:

This can be solved with the help of a correlated subquery. The SQL query for this is

SELECT P.PRODUCT_NAME,
S.YEAR,
S.QUANTITY
FROM PRODUCTS P,
SALES S
WHERE P.PRODUCT_ID = S.PRODUCT_ID
AND S.QUANTITY >
(SELECT AVG(QUANTITY)
FROM SALES S1
WHERE S1.PRODUCT_ID = S.PRODUCT_ID
);

PRODUCT_NAME YEAR QUANTITY
--------------------------
Nokia 2010 25
IPhone 2012 20
Samsung 2012 20
Samsung 2010 20
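The correlated subquery translates directly to SQLite; this sketch (again on the assumed sample data) confirms which product-years beat their product's all-years average:

```python
import sqlite3

# Assumed in-memory copy of the article's PRODUCTS/SALES sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE sales (product_id INTEGER, year INTEGER, quantity INTEGER, price INTEGER);
INSERT INTO products VALUES (100,'Nokia'),(200,'IPhone'),(300,'Samsung'),(400,'LG');
INSERT INTO sales VALUES
 (100,2010,25,5000),(100,2011,16,5000),(100,2012,8,5000),
 (200,2010,10,9000),(200,2011,15,9000),(200,2012,20,9000),
 (300,2010,20,7000),(300,2011,18,7000),(300,2012,20,7000);
""")

# The inner SELECT is re-evaluated per outer row: it averages the
# quantities of the *same* product across all years.
rows = con.execute("""
    SELECT p.product_name, s.year, s.quantity
    FROM products p JOIN sales s ON p.product_id = s.product_id
    WHERE s.quantity > (SELECT AVG(s1.quantity)
                        FROM sales s1
                        WHERE s1.product_id = s.product_id)
""").fetchall()
print(rows)
```

Note that IPhone 2011 (quantity 15, average exactly 15) is excluded because the comparison is strictly greater than.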

2. Write a query to compare the product sales of "IPhone" and "Samsung" in each year? The output should
look as follows:

YEAR IPHONE_QUANT SAM_QUANT IPHONE_PRICE SAM_PRICE
---------------------------------------------------
2010 10 20 9000 7000
2011 15 18 9000 7000
2012 20 20 9000 7000

Solution:

By using self-join SQL query we can get the required result. The required SQL query is

SELECT S_I.YEAR,
S_I.QUANTITY IPHONE_QUANT,
S_S.QUANTITY SAM_QUANT,
S_I.PRICE IPHONE_PRICE,
S_S.PRICE SAM_PRICE
FROM PRODUCTS P_I,
SALES S_I,
PRODUCTS P_S,
SALES S_S
WHERE P_I.PRODUCT_ID = S_I.PRODUCT_ID
AND P_S.PRODUCT_ID = S_S.PRODUCT_ID
AND P_I.PRODUCT_NAME = 'IPhone'
AND P_S.PRODUCT_NAME = 'Samsung'
AND S_I.YEAR = S_S.YEAR;

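The self-join idea — joining the SALES table to itself once per product being compared — can be exercised on the assumed sample data like this:

```python
import sqlite3

# Assumed in-memory copy of the article's PRODUCTS/SALES sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE sales (product_id INTEGER, year INTEGER, quantity INTEGER, price INTEGER);
INSERT INTO products VALUES (100,'Nokia'),(200,'IPhone'),(300,'Samsung'),(400,'LG');
INSERT INTO sales VALUES
 (100,2010,25,5000),(100,2011,16,5000),(100,2012,8,5000),
 (200,2010,10,9000),(200,2011,15,9000),(200,2012,20,9000),
 (300,2010,20,7000),(300,2011,18,7000),(300,2012,20,7000);
""")

# Two independent copies of PRODUCTS/SALES, one filtered to IPhone,
# the other to Samsung, matched on the year.
rows = con.execute("""
    SELECT s_i.year, s_i.quantity, s_s.quantity, s_i.price, s_s.price
    FROM products p_i, sales s_i, products p_s, sales s_s
    WHERE p_i.product_id = s_i.product_id
      AND p_s.product_id = s_s.product_id
      AND p_i.product_name = 'IPhone'
      AND p_s.product_name = 'Samsung'
      AND s_i.year = s_s.year
    ORDER BY s_i.year
""").fetchall()
print(rows)
```
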
3. Write a query to find the ratios of the sales of a product?

Solution:

The ratio of a product is calculated as the total sales price in a particular year divided by the total sales price
across all years. Oracle provides the RATIO_TO_REPORT analytical function for finding such ratios. The SQL
query is

SELECT P.PRODUCT_NAME,
S.YEAR,
RATIO_TO_REPORT(S.QUANTITY*S.PRICE)
OVER(PARTITION BY P.PRODUCT_NAME ) SALES_RATIO
FROM PRODUCTS P,
SALES S
WHERE (P.PRODUCT_ID = S.PRODUCT_ID);

PRODUCT_NAME YEAR RATIO
-----------------------------
IPhone 2011 0.333333333
IPhone 2012 0.444444444
IPhone 2010 0.222222222
Nokia 2012 0.163265306
Nokia 2011 0.326530612
Nokia 2010 0.510204082
Samsung 2010 0.344827586
Samsung 2012 0.344827586
Samsung 2011 0.310344828
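SQLite has no RATIO_TO_REPORT, but the same ratio falls out of a windowed SUM (each row's sales divided by the partition total). This sketch on the assumed sample data reproduces the IPhone ratios above:

```python
import sqlite3

# Assumed in-memory copy of the article's PRODUCTS/SALES sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE sales (product_id INTEGER, year INTEGER, quantity INTEGER, price INTEGER);
INSERT INTO products VALUES (100,'Nokia'),(200,'IPhone'),(300,'Samsung'),(400,'LG');
INSERT INTO sales VALUES
 (100,2010,25,5000),(100,2011,16,5000),(100,2012,8,5000),
 (200,2010,10,9000),(200,2011,15,9000),(200,2012,20,9000),
 (300,2010,20,7000),(300,2011,18,7000),(300,2012,20,7000);
""")

# RATIO_TO_REPORT(x) OVER (PARTITION BY p) is equivalent to
# x / SUM(x) OVER (PARTITION BY p); multiply by 1.0 to force float division.
rows = con.execute("""
    SELECT p.product_name, s.year,
           1.0 * s.quantity * s.price /
           SUM(s.quantity * s.price) OVER (PARTITION BY p.product_name) AS ratio
    FROM products p JOIN sales s ON p.product_id = s.product_id
""").fetchall()
print(rows)
```

As a sanity check, the ratios within each product partition sum to 1.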

4. In the SALES table, the quantity of each product is stored in rows for every year. Now write a query to
transpose the quantity for each product and display it in columns. The output should look as follows:

PRODUCT_NAME QUAN_2010 QUAN_2011 QUAN_2012
------------------------------------------
IPhone 10 15 20
Samsung 20 18 20
Nokia 25 16 8

Solution:

Oracle 11g provides a pivot function to transpose the row data into column data. The SQL query for this is

SELECT * FROM
(
SELECT P.PRODUCT_NAME,
S.QUANTITY,
S.YEAR
FROM PRODUCTS P,
SALES S
WHERE (P.PRODUCT_ID = S.PRODUCT_ID)
)A
PIVOT ( MAX(QUANTITY) AS QUAN FOR (YEAR) IN (2010,2011,2012));

If you are not running an Oracle 11g database, then use the below query for transposing the row data into
column data.

SELECT P.PRODUCT_NAME,
MAX(DECODE(S.YEAR,2010, S.QUANTITY)) QUAN_2010,
MAX(DECODE(S.YEAR,2011, S.QUANTITY)) QUAN_2011,
MAX(DECODE(S.YEAR,2012, S.QUANTITY)) QUAN_2012
FROM PRODUCTS P,
SALES S
WHERE (P.PRODUCT_ID = S.PRODUCT_ID)
GROUP BY P.PRODUCT_NAME;
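The DECODE-based pivot is Oracle-specific, but the equivalent MAX(CASE ...) form is portable; here it is on the assumed sample data in SQLite:

```python
import sqlite3

# Assumed in-memory copy of the article's PRODUCTS/SALES sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE sales (product_id INTEGER, year INTEGER, quantity INTEGER, price INTEGER);
INSERT INTO products VALUES (100,'Nokia'),(200,'IPhone'),(300,'Samsung'),(400,'LG');
INSERT INTO sales VALUES
 (100,2010,25,5000),(100,2011,16,5000),(100,2012,8,5000),
 (200,2010,10,9000),(200,2011,15,9000),(200,2012,20,9000),
 (300,2010,20,7000),(300,2011,18,7000),(300,2012,20,7000);
""")

# CASE plays the role of Oracle's DECODE: each column keeps only the
# quantity for its year, and MAX collapses the group to one row per product.
rows = con.execute("""
    SELECT p.product_name,
           MAX(CASE WHEN s.year = 2010 THEN s.quantity END) AS quan_2010,
           MAX(CASE WHEN s.year = 2011 THEN s.quantity END) AS quan_2011,
           MAX(CASE WHEN s.year = 2012 THEN s.quantity END) AS quan_2012
    FROM products p JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.product_name
""").fetchall()
print(rows)
```
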

5. Write a query to find the number of products sold in each year?

Solution:

To get this result we have to group by year and then find the count. The SQL query for this question is

SELECT YEAR,
COUNT(1) NUM_PRODUCTS
FROM SALES
GROUP BY YEAR;

YEAR NUM_PRODUCTS
------------------
2010 3
2011 3
2012 3
SQL Queries Interview Questions - Oracle Part 4
1. Consider the following friends table as the source

Name, Friend_Name
-----------------
sam, ram
sam, vamsi
vamsi, ram
vamsi, jhon
ram, vijay
ram, anand

Here ram and vamsi are friends of sam; ram and jhon are friends of vamsi; and so on. Now write a query to
find the friends of friends of sam. For sam, the friends of friends are ram, jhon, vijay and anand. The output
should look as

Name, Friend_of_Friend
----------------------
sam, ram
sam, jhon
sam, vijay
sam, anand

Solution:

SELECT f1.name,
f2.friend_name as friend_of_friend
FROM friends f1,
friends f2
WHERE f1.name = 'sam'
AND f1.friend_name = f2.name;

2. This is an extension to problem 1. In the output, you can see ram displayed as a friend of friend. This is
because ram is a mutual friend of sam and vamsi. Now extend the above query to exclude mutual friends.
The output should look as

Name, Friend_of_Friend
----------------------
sam, jhon
sam, vijay
sam, anand

Solution:
SELECT f1.name,
f2.friend_name as friend_of_friend
FROM friends f1,
friends f2
WHERE f1.name = 'sam'
AND f1.friend_name = f2.name
AND NOT EXISTS
(SELECT 1 FROM friends f3
WHERE f3.name = f1.name
AND f3.friend_name = f2.friend_name);
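Both friend-of-friend queries run unchanged in SQLite; this sketch builds the friends table from the rows given above and checks that the NOT EXISTS clause removes the mutual friend ram:

```python
import sqlite3

# In-memory copy of the friends table from the problem statement.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE friends (name TEXT, friend_name TEXT);
INSERT INTO friends VALUES ('sam','ram'),('sam','vamsi'),('vamsi','ram'),
 ('vamsi','jhon'),('ram','vijay'),('ram','anand');
""")

# Problem 1: plain self-join — friends of sam's friends.
fof = con.execute("""
    SELECT f1.name, f2.friend_name
    FROM friends f1 JOIN friends f2 ON f1.friend_name = f2.name
    WHERE f1.name = 'sam'
""").fetchall()

# Problem 2: same join, but NOT EXISTS drops anyone who is already
# a direct friend of sam (the mutual friend, ram).
non_mutual = con.execute("""
    SELECT f1.name, f2.friend_name
    FROM friends f1 JOIN friends f2 ON f1.friend_name = f2.name
    WHERE f1.name = 'sam'
      AND NOT EXISTS (SELECT 1 FROM friends f3
                      WHERE f3.name = f1.name
                        AND f3.friend_name = f2.friend_name)
""").fetchall()
print(fof, non_mutual)
```
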

3. Write a query to get the top 5 products based on the quantity sold without using the row_number
analytical function? The source data looks as

Products, quantity_sold, year
-----------------------------
A, 200, 2009
B, 155, 2009
C, 455, 2009
D, 620, 2009
E, 135, 2009
F, 390, 2009
G, 999, 2010
H, 810, 2010
I, 910, 2010
J, 109, 2010
L, 260, 2010
M, 580, 2010

Solution:

ROWNUM is assigned to rows before the ORDER BY is applied, so the sort must go inside the inline view
and the ROWNUM filter outside it:

SELECT products,
quantity_sold,
year
FROM
(
SELECT products,
quantity_sold,
year
FROM t
ORDER BY quantity_sold DESC
)A
WHERE ROWNUM <= 5;
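In SQLite the same top-5 selection is simply ORDER BY with LIMIT, which stands in for Oracle's ROWNUM filter; this sketch uses the source rows listed in the problem:

```python
import sqlite3

# In-memory copy of the source table t from the problem statement.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (products TEXT, quantity_sold INTEGER, year INTEGER);
INSERT INTO t VALUES ('A',200,2009),('B',155,2009),('C',455,2009),('D',620,2009),
 ('E',135,2009),('F',390,2009),('G',999,2010),('H',810,2010),('I',910,2010),
 ('J',109,2010),('L',260,2010),('M',580,2010);
""")

# LIMIT is applied after the sort, so this is safe — unlike filtering on a
# ROWNUM assigned before ORDER BY.
top5 = con.execute("""
    SELECT products, quantity_sold, year
    FROM t ORDER BY quantity_sold DESC LIMIT 5
""").fetchall()
print(top5)
```
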

4. This is an extension to the problem 3. Write a query to produce the same output using row_number
analytical function?

Solution:

SELECT products,
quantity_sold,
year
FROM
(
SELECT products,
quantity_sold,
year,
row_number() OVER(
ORDER BY quantity_sold DESC) r
from t
)A
WHERE r <= 5;

5. This is an extension to problem 3. Write a query to get the top 5 products in each year based on the
quantity sold?

Solution:

SELECT products,
quantity_sold,
year
FROM
(
SELECT products,
quantity_sold,
year,
row_number() OVER(
PARTITION BY year
ORDER BY quantity_sold DESC) r
from t
)A
WHERE r <= 5;
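SQLite supports ROW_NUMBER with PARTITION BY as well (3.25+), so the per-year top-5 query runs as written; with six products per year in the sample data, exactly the lowest seller of each year drops out:

```python
import sqlite3

# In-memory copy of the source table t from problem 3.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (products TEXT, quantity_sold INTEGER, year INTEGER);
INSERT INTO t VALUES ('A',200,2009),('B',155,2009),('C',455,2009),('D',620,2009),
 ('E',135,2009),('F',390,2009),('G',999,2010),('H',810,2010),('I',910,2010),
 ('J',109,2010),('L',260,2010),('M',580,2010);
""")

# ROW_NUMBER restarts at 1 for each year, so r <= 5 keeps the
# top 5 sellers per year independently.
rows = con.execute("""
    SELECT products, quantity_sold, year FROM (
        SELECT products, quantity_sold, year,
               ROW_NUMBER() OVER (PARTITION BY year
                                  ORDER BY quantity_sold DESC) AS r
        FROM t
    ) WHERE r <= 5
""").fetchall()
print(rows)
```
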
SQL Query Interview Questions - Part 5
Write SQL queries for the below interview questions:

1. Load the below products table into the target table.

CREATE TABLE PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);

INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');
INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');
INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');
INSERT INTO PRODUCTS VALUES ( 400, 'LG');
INSERT INTO PRODUCTS VALUES ( 500, 'BlackBerry');
INSERT INTO PRODUCTS VALUES ( 600, 'Motorola');
COMMIT;

SELECT * FROM PRODUCTS;

PRODUCT_ID PRODUCT_NAME
-----------------------
100 Nokia
200 IPhone
300 Samsung
400 LG
500 BlackBerry
600 Motorola
The requirements for loading the target table are:

Select only 2 products randomly.

Do not select products which were already loaded into the target table within the last 30 days.

The target table should always contain only the products loaded within the last 30 days. It should not
contain products which were loaded more than 30 days ago.

Solution:

First we will create a target table. The target table will have an additional column INSERT_DATE to know
when a product is loaded into the target table. The target
table structure is

CREATE TABLE TGT_PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30),
INSERT_DATE DATE
);
The next step is to pick 2 products randomly and load them into the target table, skipping any products that
are already present in the target table.

INSERT INTO TGT_PRODUCTS
SELECT PRODUCT_ID,
PRODUCT_NAME,
SYSDATE INSERT_DATE
FROM
(
SELECT PRODUCT_ID,
PRODUCT_NAME
FROM PRODUCTS S
WHERE NOT EXISTS (
SELECT 1
FROM TGT_PRODUCTS T
WHERE T.PRODUCT_ID = S.PRODUCT_ID
)
ORDER BY DBMS_RANDOM.VALUE --Random number generator in oracle.
)A
WHERE ROWNUM <= 2;
The last step is to delete the products from the table which are loaded 30 days back.

DELETE FROM TGT_PRODUCTS
WHERE INSERT_DATE < SYSDATE - 30;
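The whole load can be sketched portably in SQLite, where ORDER BY RANDOM() stands in for DBMS_RANDOM.VALUE and LIMIT 2 for the ROWNUM filter; running it twice shows the NOT EXISTS guard never re-loads a product:

```python
import sqlite3

# Assumed in-memory copies of the PRODUCTS source and TGT_PRODUCTS target.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE tgt_products (product_id INTEGER, product_name TEXT, insert_date TEXT);
INSERT INTO products VALUES (100,'Nokia'),(200,'IPhone'),(300,'Samsung'),
 (400,'LG'),(500,'BlackBerry'),(600,'Motorola');
""")

def load_two():
    # Pick 2 random products not already in the target.
    con.execute("""
        INSERT INTO tgt_products
        SELECT product_id, product_name, DATE('now')
        FROM products s
        WHERE NOT EXISTS (SELECT 1 FROM tgt_products t
                          WHERE t.product_id = s.product_id)
        ORDER BY RANDOM() LIMIT 2""")
    # Purge rows loaded more than 30 days ago (none in this fresh run).
    con.execute("DELETE FROM tgt_products WHERE insert_date < DATE('now','-30 day')")

load_two()
load_two()
ids = [r[0] for r in con.execute("SELECT product_id FROM tgt_products").fetchall()]
print(sorted(ids))
```
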
2. Load the below CONTENTS table into the target table.

CREATE TABLE CONTENTS
(
CONTENT_ID INTEGER,
CONTENT_TYPE VARCHAR2(30)
);

INSERT INTO CONTENTS VALUES (1,'MOVIE');
INSERT INTO CONTENTS VALUES (2,'MOVIE');
INSERT INTO CONTENTS VALUES (3,'AUDIO');
INSERT INTO CONTENTS VALUES (4,'AUDIO');
INSERT INTO CONTENTS VALUES (5,'MAGAZINE');
INSERT INTO CONTENTS VALUES (6,'MAGAZINE');
COMMIT;

SELECT * FROM CONTENTS;

CONTENT_ID CONTENT_TYPE
-----------------------
1 MOVIE
2 MOVIE
3 AUDIO
4 AUDIO
5 MAGAZINE
6 MAGAZINE
The requirements to load the target table are:
Load only one content type at a time into the target table.

The target table should always contain only one content type.

The loading of content types should follow a round-robin style: first MOVIE, second AUDIO, third
MAGAZINE, then MOVIE again.

Solution:

First we will create a lookup table where we mention the priorities for the content types. The lookup table
“Create Statement” and data is shown below.

CREATE TABLE CONTENTS_LKP
(
CONTENT_TYPE VARCHAR2(30),
PRIORITY INTEGER,
LOAD_FLAG INTEGER
);

INSERT INTO CONTENTS_LKP VALUES('MOVIE',1,1);
INSERT INTO CONTENTS_LKP VALUES('AUDIO',2,0);
INSERT INTO CONTENTS_LKP VALUES('MAGAZINE',3,0);
COMMIT;

SELECT * FROM CONTENTS_LKP;

CONTENT_TYPE PRIORITY LOAD_FLAG
---------------------------------
MOVIE 1 1
AUDIO 2 0
MAGAZINE 3 0
Here if LOAD_FLAG is 1, then it indicates which content type needs to be loaded into the target table.
Only one content type will have LOAD_FLAG as 1. The other content types will have LOAD_FLAG as 0.
The target table structure is same as the source table structure.

The second step is to truncate the target table before loading the data

TRUNCATE TABLE TGT_CONTENTS;

The third step is to choose the appropriate content type from the lookup table to load the source data into
the target table.

INSERT INTO TGT_CONTENTS
SELECT CONTENT_ID,
CONTENT_TYPE
FROM CONTENTS
WHERE CONTENT_TYPE = (SELECT CONTENT_TYPE FROM CONTENTS_LKP WHERE
LOAD_FLAG=1);
The last step is to update the LOAD_FLAG of the Lookup table.

UPDATE CONTENTS_LKP
SET LOAD_FLAG = 0
WHERE LOAD_FLAG = 1;

UPDATE CONTENTS_LKP
SET LOAD_FLAG = 1
WHERE PRIORITY = (
SELECT DECODE( PRIORITY,(SELECT MAX(PRIORITY) FROM CONTENTS_LKP) ,1 , PRIORITY+1)
FROM CONTENTS_LKP
WHERE CONTENT_TYPE = (SELECT DISTINCT CONTENT_TYPE FROM TGT_CONTENTS)
);
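The flag rotation (Oracle's DECODE wrapping the priority back to 1 past the maximum) can be sketched in SQLite with a CASE expression; this assumed version rotates the flag three times and cycles back to MOVIE:

```python
import sqlite3

# Assumed in-memory copy of the CONTENTS_LKP lookup table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE contents_lkp (content_type TEXT, priority INTEGER, load_flag INTEGER);
INSERT INTO contents_lkp VALUES ('MOVIE',1,1),('AUDIO',2,0),('MAGAZINE',3,0);
""")

def rotate():
    # Move the flag to the next priority, wrapping to 1 after the maximum —
    # the same effect as the DECODE(PRIORITY, MAX(PRIORITY), 1, PRIORITY+1) trick.
    (cur,) = con.execute(
        "SELECT priority FROM contents_lkp WHERE load_flag = 1").fetchone()
    (top,) = con.execute("SELECT MAX(priority) FROM contents_lkp").fetchone()
    nxt = 1 if cur == top else cur + 1
    con.execute("UPDATE contents_lkp SET load_flag = "
                "CASE WHEN priority = ? THEN 1 ELSE 0 END", (nxt,))

seq = []
for _ in range(3):
    rotate()
    seq.append(con.execute(
        "SELECT content_type FROM contents_lkp WHERE load_flag = 1").fetchone()[0])
print(seq)
```

Exactly one row carries load_flag = 1 at any time, which is what guarantees the target table holds a single content type per load.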
1. About your project, your role and responsibilities?
2. Which transformations use cache, and which use index and data caches?
3. What are indexes and their use?
4. What is a view, and what is the difference between a view and a materialized view?
5. What is a Cartesian product?
6. Difference between stop and abort?
7. What is SCD? Explain SCD type 2 in full detail.
8. Difference between Bulk and Normal load?
9. What is a mapplet and its use?
10. Difference between Reusable and Shortcut?
11. List all active and passive transformations.
12. Project data flow?
13. Mapping and session level variables?
14. How to get the record count of source, target and rejected rows in one flat file?
15. How is an SQL sort different from the Informatica Sorter?

PROJECT EXPLANATION

This will be the project implementation summary for any of the BI solution

1. Business requirements from BRM (Requirement Phase)
2. FSD from Business Analyst (Requirement Phase)
3. Review with SME and Development team (Requirement Phase)
4. Logical Design Modelling (Modelling Phase)
5. Physical Design Modelling (Modelling Phase)
6. Development (Development phase and Unit Testing)
7. Move the Code To QA env (Testing Phase)
8. Sign off from QA team
9. Get the sign off from business people.
10. Cross functional Teams Reviews.
11. Finally move to production

Project flow includes the following:

1. From which OLTP systems you are getting your data
2. Tell about the frequency rate of data coming from these systems like daily/weekly/monthly (In my
current project we have divided our OLTP systems on the basis of transaction rate)
3. Tell about the approach you are following in your project like top-down or bottom-up.
4. If you are moving data into staging area then tell about staging tables which you have used. (in my
current project we have 90 staging tables)
5. Tell about business logic that you have applied like CDC, SCD or any other.
6. Tell about the dimension and fact tables of your project like:
Dimension 1 -- type 2 static
Dimension 2 -- type 2 dynamic so on..
similarly, tell about Fact tables like SALES, PURCHASE.
(Also tell about the size these tables)
7. Can also explain about the schema you have used STAR/SNOW FLAKE/GALAXY
8. Tell about the warehouse you are building e.g.:
Type (Oracle, Teradata, DB2), Size (5 TB), containing 80 million records of the last 5 yrs

Q) Does anyone have an idea how we can send an alert/mail when Informatica throughput falls below a
certain threshold?


One option is to write a shell script that extracts the throughput from the session log and sends the alert from there.

You can use %t in a post-session email. Another option is to create a new mapping that compares the
throughput with the metadata throughput, checks it against the limit, sets a flag accordingly, and sends an
email alert if the flag is set to Y.

SELECT SESSION_NAME,THRUPUT
FROM
OPB_SWIDGINST_LOG
,REP_SESS_LOG
WHERE
OPB_SWIDGINST_LOG.SESSION_ID=REP_SESS_LOG.SESSION_ID
AND REP_SESS_LOG.SESSION_NAME= 's_session_name';

Check whether this helps; if you need more details, feel free to ask.

%t : Source and target table details, including read throughput in bytes per second and write throughput in
rows per second. The Integration Service includes all information displayed in the session detail dialog
box.

1. Explain your Project?
2. What are your Daily routines?
3. How many mapping have you created all together in your project?
4. In which account does your Project Fall?
5. What is your Reporting Hierarchy?
6. How many complex mappings have you created?
7. Could you please describe a situation for which you developed such a complex mapping?
8. What is your Involvement in Performance tuning of your Project?
9. What is the Schema of your Project?
10. And why did you opt for that particular schema?
11. What are your Roles in this project?
12. Can I have one situation which you have adopted by which performance has improved dramatically?
13. Were you involved in more than two projects simultaneously?
14. Do you have any experience in the Production support?
15. What kinds of Testing have you done on your Project (Unit or Integration or System or UAT)?
16. What enhancements were done after testing?
17. How many Dimension Table are there in your Project and how are they linked to the fact table?
18. How do we do the Fact Load?
19. How did you implement CDC in your project?
20. How does your Mapping in File to Load look like?
21. How does your Mapping in Load to Stage look like?
22. How does your Mapping in Stage to ODS look like?
23. What is the size of your Data warehouse?
24. What is your Daily feed size and weekly feed size?
25. Which Approach (Top down or Bottom Up) was used in building your project?
26. How do you access your sources (are they flat files or relational)?
27. Have you developed any Stored Procedure or triggers in this project?
28. How did you use them and in which situation?
29. Did your Project go live?
30. What are the issues that you have faced while moving your project from the Test Environment to the
Production Environment?
31. What is the biggest Challenge that you encountered in this project?
32. What is the scheduler tool you have used in this project?
33. How did you schedule jobs using it?